A simple, speculative, topical analysis of what news channels report.
News is inherently biased though it shouldn’t be. Biases may stem from the writer’s opinions, the channel’s affiliations, or the audience’s preferences. These biases should be teased apart through analysis.
It is interesting to note which topics different news channels lean toward writing about. I found a dataset on Kaggle detailing the news articles published by various major news channels, and I started digging to find biases.
Data
The dataset covers internet news published between 03.09.2019 and 04.11.2019 (DD.MM.YYYY), a span of about two months, from channels such as CNN, ABC News, The New York Times, CBS News, BBC News, The Irish Times, Reuters, Al Jazeera English, The Wall Street Journal, and Business Insider. There are 10,437 articles, each with 15 attributes. A snapshot of the data can be seen below.
For simplicity, I will be making calculated generalizations. I will be looking at topics that are popular in one news source but absent from the others. Most newsworthy events should be reported either by all channels or by a certain genre of news channels. For example, news from the UK and Ireland is more likely to be reported by BBC or The Irish Times, and global, non-Western news is more likely to be reported by Reuters or Al Jazeera.
Similarly, American news is more likely to be reported in American channels such as CNN, ABC, CBS, or the NY Times. Business and finance articles are more likely to be reported in the Wall Street Journal or Business Insider.
The frequencies of occurrences of the various news sources in the data can be seen below.
Most news channels have between 950 and 1,252 articles. Even the sources with fewer articles offer a sample large enough to determine their most common topics. ESPN also appears in the data but has too few data points, so I am not considering it.
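As a rough illustration of this filtering step, here is a minimal sketch that counts articles per source and drops sources below a threshold. It is not the notebook's code: the row layout (a dict with a `source` key), the function names, and the threshold are all assumptions, and it uses only the standard library rather than Pandas.

```python
from collections import Counter

def article_counts(rows):
    """Count articles per source; each row is a dict with a 'source' key (assumed layout)."""
    return Counter(row["source"] for row in rows)

def drop_small_sources(counts, min_articles):
    """Keep only the sources that have at least `min_articles` articles."""
    return {src: n for src, n in counts.items() if n >= min_articles}

# Toy rows for illustration only, not the real dataset.
rows = [
    {"source": "CNN"}, {"source": "CNN"}, {"source": "CNN"},
    {"source": "BBC News"}, {"source": "BBC News"},
    {"source": "ESPN"},
]

counts = article_counts(rows)
kept = drop_small_sources(counts, min_articles=2)  # hypothetical threshold
print(kept)
```

On the real data the threshold would need to be chosen by looking at the frequency chart; the value above is only for the toy rows.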
Code
I analyzed the data in Python using libraries such as Pandas and nltk. The code is hosted in a Kaggle notebook. The notebook is a mere playground for my analyses and is subject to further editing, organization, and modularization.
Method
I analyzed the descriptions of the articles to determine the mentioned topics (shout out nltk). For each news source, I consider the 200 most common topics. This set of topics generally represents the various subject matters of the news sources. I then ask the data questions and explore promising hypotheses. I do a critical, qualitative analysis of certain topic mentions.
The Natural Language Processing library, nltk, is used to identify proper nouns. Proper nouns usually form the subject of an article. Occasionally, it also picks up words that are not proper nouns because their first letter is capitalized; these words usually occur at the beginning of sentences.
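The idea of collecting topic candidates and keeping the most common ones can be sketched as follows. Note the substitution: this sketch does not call nltk's tagger. It uses a plain capitalization heuristic from the standard library as a stand-in, which reproduces the false positives described above (sentence-initial words such as "The"). The function names and the choice to count every occurrence, rather than once per article, are assumptions.

```python
import re
from collections import Counter

# Stand-in for a proper-noun tagger: a word that starts with a capital
# letter and has at least two letters. This is a heuristic, not nltk.
CAPITALIZED = re.compile(r"\b[A-Z][a-zA-Z]+\b")

def candidate_topics(description):
    """Return the capitalized words in one article description."""
    return CAPITALIZED.findall(description)

def top_topics(descriptions, n=200):
    """Return the n most common candidate topics as (topic, count) pairs."""
    counts = Counter()
    for text in descriptions:
        counts.update(candidate_topics(text))
    return counts.most_common(n)

# Toy descriptions for illustration only.
descriptions = [
    "Biden spoke in Iowa on Monday.",
    "The campaign of Biden grew.",
]
print(top_topics(descriptions, n=3))
```

In the toy input, "The" is counted as a topic even though it is not a proper noun, which is the kind of noise a later qualitative pass has to filter out.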
Standouts
During my analysis of the topics news sources write about, there were some standouts I would like to mention.
CBS News loves fear-mongering about the Taliban. And others don’t want to talk about it.
CBS News wrote 12 articles about the Taliban in the span of two months. Half of these articles report on the same topic: the cancellation of Trump’s peace talks with the Taliban. There is an overemphasized narrative of the Taliban being unwilling to negotiate with Americans and their ideals. Other major American news sources, such as CNN, NYT, and ABC, had no mention of the Taliban.
CNN has a strong preference for Biden over Bernie
CNN has an obvious Democrat bias. Interestingly, Biden is mentioned 17 times in CNN’s news descriptions, and Bernie only 5 times. CNN mentions Biden substantially more often than any other news source does. The New York Times has remarkably few mentions of either candidate, the fewest of any source.
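A comparison like this comes down to counting how often a name appears in each source's descriptions. Here is a minimal sketch under assumed names: the `source` and `description` keys and the `mentions_by_source` helper are hypothetical, and the rows are toy data rather than the real articles.

```python
import re
from collections import defaultdict

def mentions_by_source(rows, term):
    """Count whole-word, case-sensitive occurrences of `term` per source."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b")
    totals = defaultdict(int)
    for row in rows:
        totals[row["source"]] += len(pattern.findall(row["description"]))
    return dict(totals)

# Toy rows for illustration only.
rows = [
    {"source": "CNN", "description": "Biden leads as Biden allies rally."},
    {"source": "CNN", "description": "Bernie holds a rally."},
    {"source": "Reuters", "description": "Markets steady."},
]

biden = mentions_by_source(rows, "Biden")
bernie = mentions_by_source(rows, "Bernie")
print(biden, bernie)
```

Matching on whole words avoids counting substrings, but it still cannot tell apart two people who share a name, so raw counts like these remain a starting point for reading the articles, not a verdict.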
Disney pays ABC News
ABC wrote 29 articles about Disney, while all other sources wrote close to none. This is perhaps the most striking example of implicit product placement, and it is worth noting that ABC is owned by The Walt Disney Company. This coverage would probably decline after the pandemic.
Nearly all news channels, notably Al Jazeera, mention Saudi more than other countries. Even China.
I compiled the 200 most common topics for each news source. Of these 200, only 6 topics are mentioned in all sources. They are ‘New’, ‘House’, ‘Saudi’, ‘Donald’, ‘President’, and ‘National’. Al Jazeera has 49 mentions of Saudi, significantly more than any other channel. Al Jazeera is a Qatar-owned site.
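Finding the topics shared by every source is a set intersection over the per-source topic lists. The sketch below assumes a dict mapping each source name to its list of top topics; that structure, the helper name, and the toy values are illustrative assumptions.

```python
def shared_topics(top_topics_by_source):
    """Return the topics that appear in every source's topic list."""
    topic_sets = [set(topics) for topics in top_topics_by_source.values()]
    if not topic_sets:
        return set()
    return set.intersection(*topic_sets)

# Toy topic lists for illustration only.
top_topics_by_source = {
    "Source A": ["Saudi", "House", "Disney"],
    "Source B": ["Saudi", "House", "Lyft"],
    "Source C": ["House", "Saudi", "Yemen"],
}

common = shared_topics(top_topics_by_source)
print(sorted(common))
```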
Only ABC is doing a majority of state-based reporting
ABC is the only American news source mentioning Connecticut, Dakota, Illinois, Indiana, Kansas, Louisiana, Maryland, Minnesota, Mississippi, Oklahoma, and Pennsylvania.
Exclusively reported topics
A news channel would sometimes report topics others won’t. I wonder how purposeful the bias is.
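One way to compute exclusive topics is to subtract from each source's topic set the union of everyone else's. This sketch assumes the same hypothetical dict of per-source topic lists; under that assumption, "exclusive" means absent from the other sources' top lists, which is weaker than never being mentioned by them at all.

```python
def exclusive_topics(top_topics_by_source):
    """Map each source to the topics found in no other source's topic list."""
    result = {}
    for source, topics in top_topics_by_source.items():
        others = set()
        for other, other_topics in top_topics_by_source.items():
            if other != source:
                others |= set(other_topics)
        result[source] = set(topics) - others
    return result

# Toy topic lists for illustration only.
top_topics_by_source = {
    "Source A": ["Saudi", "Disney", "Halloween"],
    "Source B": ["Saudi", "Lyft"],
    "Source C": ["Saudi", "Yemen", "Lyft"],
}

exclusive = exclusive_topics(top_topics_by_source)
print(exclusive)
```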
Reuters
Only Reuters has reported on Brazil, Spain, Czech, Singapore, and Iran. Reuters is an international news channel that is supposed to be the most unbiased of the list. It has 7 articles on the USA’s Environmental Protection Agency, which is otherwise unreported.
BBC News
BBC loves sports. They’re the only ones talking about Europa (football), T20 (cricket), Arsenal (football), Stokes (cricket), and Vuelta a España (cycling), among other sports-related topics. Only the BBC, which is British, mentions the words Scotland and English.
The Irish Times
Whiskey, Rugby, and Theatre. Only The Irish Times seems to talk about Toner (Rugby player) and Kobe (location of Rugby World Cup 2019). There is significant coverage of the Dublin Theatre Festival and the Dublin Fringe Festival, and ‘Theatre’ is only mentioned in The Irish Times (as is Rosamund Pike).
ABC News
If you are writing 29 articles on Disney (7 on Disneyland) in 2 months when no one else is writing about it, you better be getting paid by them. Only ABC is writing about Christmas and Halloween. It was the only news source forewarning of Hurricane Humberto. The state-focused news reporting mentioned earlier is perhaps meant to appeal to audiences in these states.
CNN
Only news source mentioning Yang, Kavanaugh, Pelosi, Barr, Buttigieg, McCabe, and Volodymyr. CNN wrote 10 articles criticizing Volodymyr Zelensky, the Ukrainian president, and in many of them also Donald Trump. It is also the only channel mentioning the (now former) Democratic presidential candidates Andrew Yang (7 times) and Pete Buttigieg (6 times).
Business Insider
Surprisingly, the only source mentioning Zuckerberg, Porsche, Lyft, Nintendo, McDonald’s, Microsoft, WeWork, Pixel, Surface, and Mac. There are 10 articles on Lyft, and most of them read like advertisements. Most of the 7 articles on Zuckerberg criticize him. There are 3 articles on Mac and 19 on Microsoft, many of which are apparent advertisements for Surface. There are 8 articles on Nintendo, almost all raving about the Switch or Wii Fit. Business Insider appears to advertise the Google Pixel as well. There are also 11 articles on WeWork.
The New York Times
Only source mentioning ‘Chinese’ (11 times) and ‘Communist’ (7 times), often together as the Chinese Communist Party. The Hong Kong protests are covered by other sources too. Tennis, Myanmar, Harvard, NRA, and Nike are among its other exclusively reported topics.
Al Jazeera
Only source mentioning Sudan, Lebanon, Venezuela, Korea, Cairo, Ashoura, Tunisia, and Yemen, among other topics. There is a large focus on topics in the Middle East and North Africa. Al Jazeera has the highest number of exclusive topics of all news channels.
The Wall Street Journal
AT&T, Obama, Marvel, OPEC, Oil, WTO, Samsung, and Slack are among WSJ’s exclusively reported topics. WSJ has only 333 articles in the dataset, and hence fewer topics, each occurring less often.
Limitations
There are several limitations to the data and analysis. The data is only available for 2 months. I didn’t have access to the entire content of the articles and was analyzing only the descriptions. The dataset had unequal numbers of articles for the various sources. I treated mentions of a word as equivalent to coverage of a topic. Generalizations were made when interpreting the numbers. I only considered the 200 most common topics; there were more. Bias is omnipresent, and I perhaps had biases of my own about which topics to report. The most one can do is be cognizant and skeptical. Feel free to check out and mess around with the code. You can see my approach and analyses in more detail.
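One possible way to soften the unequal-sample-size limitation is to compare mention rates instead of raw counts. This is a suggestion, not something the analysis above did; the helper name and the numbers below are made up purely for illustration.

```python
def mentions_per_1000(mentions, n_articles):
    """Scale a raw mention count to a rate per 1,000 articles."""
    if n_articles <= 0:
        raise ValueError("n_articles must be positive")
    return mentions * 1000 / n_articles

# Hypothetical numbers: a small source with few mentions can still have
# a higher rate than a large source with more mentions.
small_source_rate = mentions_per_1000(mentions=5, n_articles=250)
large_source_rate = mentions_per_1000(mentions=10, n_articles=1000)
print(small_source_rate, large_source_rate)
```

Rates make sources of different sizes comparable, though very small counts still produce noisy rates, so they should be read with the same skepticism as the raw numbers.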
Next Steps
A good amount of this project was a qualitative, manual analysis of biases and sentiments in the writing. I would love to have access to the entire contents of articles and decipher the various sentiments using Machine Learning. Having a larger dataset with more articles can help determine biases more accurately. I would also like to explore the reasons for biases more carefully.
Conclusion
Analyses of news channel data indicate some obvious biases and advertisements. I explored some insights that stood out. I investigated topics reported exclusively by different news channels. A speculative analysis, this project emphasizes certain biases and encourages readers to be more aware of them. Know that there is an opinion and voice — which are not yours — behind everything you read.