More than one million news articles in 22 languages have been analysed using the latest technology to pinpoint the factors that influence and shape the news agenda in 27 European countries.
Every day hundreds of news outlets across Europe choose which stories to cover from a wide and diverse selection. While each outlet may make its choices based on individual criteria, clear patterns emerge when these choices are studied across many outlets and over a long period of time.
The researchers discovered that the news content chosen reflects national biases, as well as cultural, economic and geographic links between countries. For example, outlets from countries that trade heavily with each other and share the Eurozone are more likely to cover the same stories, as are countries that vote for each other in the Eurovision Song Contest. Deviation from ‘normal content’ is more pronounced in outlets from countries that do not use the Euro, or that joined the European Union later.
Professor Lewis said: “This approach has the potential to revolutionise the way we understand our media and information systems. It opens up the possibility of analysing the mediasphere on a global scale, using huge samples that traditional analytical techniques simply couldn’t countenance. It also allows us to use automated means to identify clusters and patterns of content, allowing us to reach a new level of objectivity in our analysis.”
Professor Cristianini, of the University of Bristol, added: “Automating the analysis of news content could have significant applications, due to the central role played by the news media in providing the information that people use to make sense of the world.”
Sadly, though, there is no elaboration on exactly which shared interests countries have, or exactly what kinds of issues outlying countries are more interested in. (Most likely, news about the non-EU countries on their other borders.)
The big potential use I can see for all of this is the automated discovery of potential stories of interest – a feed of ‘stories my local media are not reporting’. It would be interesting to see if the same techniques could work for the entire news output of a single country, so we could get an analysis of stories across the UK.
It seems the researchers really went above and beyond what should be possible for their study…
[…] the team was able to analyse 1,370,874 articles – a sample size well beyond existing research techniques.
As the paper is Creative Commons, I thought I’d post a few chunks of it here.
A trend towards automation of scientific research has recently resulted in what has been termed “data-driven inquiry” in various disciplines, including physics and biology. The automation of many tasks has been identified as a possible future also for the humanities and the social sciences, particularly in those disciplines concerned with the analysis of text, due to the recent availability of millions of books and news articles in digital format. In the social sciences, the analysis of news media is done largely by hand and in a hypothesis-driven fashion: the scholar needs to formulate a very specific assumption about the patterns that might be in the data, and then set out to verify if they are present or not.
In this study, we report what we think is the first large scale content-analysis of cross-linguistic text in the social sciences, by using various artificial intelligence techniques. We analyse 1.3 M news articles in 22 languages detecting a clear structure in the choice of stories covered by the various outlets. This is significantly affected by objective national, geographic, economic and cultural relations among outlets and countries, e.g., outlets from countries sharing strong economic ties are more likely to cover the same stories. We also show that the deviation from average content is significantly correlated with membership to the eurozone, as well as with the year of accession to the EU.
While independently making a multitude of small editorial decisions, the leading media of the 27 EU countries, over a period of six months, shaped the contents of the EU mediasphere in a way that reflects its deep geographic, economic and cultural relations. Detecting these subtle signals in a statistically rigorous way would be out of the reach of traditional methods. This analysis demonstrates the power of the available methods for significant automation of media content analysis.
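To make the “deviation from average content” finding a bit more concrete, here is a minimal sketch of one way such a score could be computed: represent each country’s coverage as a story-frequency vector and measure its distance from the EU-wide mean. The exact metric and the data below are my own invention, not the paper’s.

```python
# Hypothetical sketch, not the authors' actual metric: score each country
# by the Euclidean distance between its story-frequency vector and the
# mean vector across all countries.

def deviation_from_average(coverage):
    """coverage: dict mapping country -> list of story frequencies
    (same story order for every country). Returns a dict mapping
    country -> distance from the mean coverage vector."""
    countries = list(coverage)
    n_stories = len(coverage[countries[0]])
    # Mean frequency of each story across all countries
    mean = [sum(coverage[c][i] for c in countries) / len(countries)
            for i in range(n_stories)]
    return {c: sum((coverage[c][i] - mean[i]) ** 2
                   for i in range(n_stories)) ** 0.5
            for c in countries}

# Toy data: three countries, four stories
coverage = {
    "DE": [0.4, 0.3, 0.2, 0.1],
    "FR": [0.35, 0.3, 0.25, 0.1],
    "UK": [0.1, 0.2, 0.3, 0.4],
}
dev = deviation_from_average(coverage)
# The outlier ("UK" in this toy data) gets the largest deviation score
```

Scores like these could then be correlated with eurozone membership or year of EU accession, which is the shape of the result the paper reports.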
This is the network of the most significant relations among EU countries that cover the same stories in their media. The network has 27 nodes that correspond to the EU countries and 112 links between them. The sparseness was chosen as high as possible with the restriction that all countries must link to at least one other country.
b) For cultural relations we used data expressed in the voting patterns of EU countries competing in the Eurovision song contest from 1957 to 2003. We used the fraction of total points awarded by the country in question to each other country over the whole period of time. […]
Really? I thought this was amusing. While it’s probably a pretty good indicator of relations, surely some countries must vote based on the quality of the music? No?
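Music quality aside, the voting measure itself is simple to state, so here is a short sketch of how the quoted fraction could be computed from raw vote records. The vote data below is invented; the paper uses the real 1957–2003 results.

```python
# Sketch of the cultural-relations measure: for each country, the fraction
# of all points it awarded that went to each other country, summed over
# every contest. The votes below are made up for illustration.

def vote_fractions(votes):
    """votes: list of (giver, receiver, points) tuples across all contests.
    Returns {giver: {receiver: fraction of giver's total points}}."""
    totals = {}
    given = {}
    for giver, receiver, points in votes:
        totals[giver] = totals.get(giver, 0) + points
        given.setdefault(giver, {})
        given[giver][receiver] = given[giver].get(receiver, 0) + points
    return {g: {r: p / totals[g] for r, p in rs.items()}
            for g, rs in given.items()}

votes = [
    ("SE", "NO", 12), ("SE", "DK", 8), ("SE", "NO", 10),
    ("NO", "SE", 12),
]
frac = vote_fractions(votes)
# "SE" gave 30 points in total, 22 of them to "NO" -> fraction 22/30
```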
Our approach opens up the possibility of analysing the mediasphere on a global scale, using automated means to identify clusters and patterns of content. While this approach inevitably lacks the degree of qualitative subtlety provided by humans, we believe that it is a significant breakthrough in the analysis of media content, allowing for a data-driven approach to social sciences, and the exploitation of the huge amounts of digital data that have become available in recent years.
All excerpts above from The Structure of the EU Mediasphere – plosone.org