Media is the fourth pillar of our democracy and a major source of accurate and precise information about events around the world. In the present digital age, online news websites have become an important source of news and information for the ordinary person. However, it has now become difficult to get precise information due to the menace of fake news. Fake news can be of two types: news that is factually incorrect, and news that is factual but presented in a manipulative manner, thus exhibiting a fictional style of writing. Both problems are equally serious and can cause great social damage if not tackled properly. A lot of work is already underway on detecting factually incorrect information published by online news agencies, but the latter problem of manipulatively written news content has received little attention.
It is important to note here that manipulation by itself is not a problem; it is an unavoidable ingredient of our daily life. Product advertisements and marketing in general are impossible without manipulation. Even teaching requires teachers to motivate students to work hard, which is also a kind of emotional manipulation. However, it becomes a problem when an opinion written in emotionally manipulative language is presented in the form of informative news. This can lead to unnecessary misunderstanding between groups of people, sometimes even resulting in violent outcomes. Nowadays, due to performance pressure and ideological alignments, news agencies show a strong tendency to use emotionally manipulative language in their articles, and this is a major problem that needs to be addressed.
In order to address this shortcoming, our team at IISER Bhopal has developed a Machine Learning based algorithm called Fictometer, with a high accuracy of around 96%, which can automatically detect a manipulative/imaginative style of writing in any English text. The algorithm takes an English text (in our case, a news article) as input and, using a method called Logistic Regression, outputs the probability of the text being written in a manipulative (fiction) or straightforward (non-fiction) manner. For example, it may classify one article as fiction with 84% probability and another with 72% probability. This is important because real-world articles do not fall into a black/white binary but lie on a wide spectrum, ranging from extreme manipulation to extreme straightforwardness and everywhere in between. As shown in Fig. 1, we have found that the two most notable features for classifying fiction and non-fiction writing styles are the adverb/adjective ratio and the adjective/pronoun ratio: fiction articles tend to have a higher adverb/adjective ratio, while non-fiction texts tend to have a higher adjective/pronoun ratio.
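To make the idea concrete, here is a minimal sketch of how a logistic regression model can turn the two ratios above into a fiction probability. Note that the weights and the example part-of-speech counts below are made up purely for illustration; Fictometer's actual coefficients are learned from labelled training data, and the real system also extracts the counts automatically from the text.

```python
import math

def fiction_probability(n_adverbs, n_adjectives, n_pronouns,
                        w_advadj=3.0, w_adjpron=-2.0, bias=0.0):
    """Estimate the probability that a text is written in a fiction style.

    The two features are the adverb/adjective and adjective/pronoun ratios.
    The default weights are illustrative assumptions, not the trained
    Fictometer coefficients: a positive weight on adverb/adjective and a
    negative weight on adjective/pronoun encode the observed trend that
    fiction has a higher adverb/adjective ratio and non-fiction a higher
    adjective/pronoun ratio.
    """
    adv_adj = n_adverbs / n_adjectives    # feature 1: adverb/adjective ratio
    adj_pron = n_adjectives / n_pronouns  # feature 2: adjective/pronoun ratio
    z = bias + w_advadj * adv_adj + w_adjpron * adj_pron
    return 1.0 / (1.0 + math.exp(-z))    # logistic (sigmoid) function

# Hypothetical counts for a fiction-like text: many adverbs, many pronouns.
p_fiction_like = fiction_probability(n_adverbs=30, n_adjectives=20, n_pronouns=40)

# Hypothetical counts for a non-fiction-like text: adjective-heavy, few pronouns.
p_nonfiction_like = fiction_probability(n_adverbs=10, n_adjectives=40, n_pronouns=10)
```

With these illustrative weights, the fiction-like counts yield a probability well above 0.5 and the non-fiction-like counts one well below 0.5, mirroring how the output is a graded score rather than a hard label.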
Figure 1: This graph shows the primary features that can be used to differentiate between texts of the fiction and non-fiction genres. As can be clearly seen, fiction articles tend to have a higher adverb/adjective ratio and non-fiction texts a higher adjective/pronoun ratio.
We applied this Machine Learning algorithm to news articles collected from six leading Indian English news websites over the last few months, and then calculated the percentage of manipulative/fiction articles published by each of these websites. As shown in Fig. 2, we analysed the published articles by category (Political, Entertainment, Sci-Tech, Sports and Business) and found that the percentage of manipulative fake news published by these news agencies is roughly 15–20% in most cases. A notable exception is News Agency 1 in the Sports category, where the percentage of fiction articles is close to 40%. This is very surprising, as one would expect articles in the Entertainment category to be more fictional.
Figure 2: Percentage of manipulative fake news articles (classified as fiction by our Machine Learning algorithm) published by some of the major news agencies of India over the last few months. Some columns in this graph are blank, which means we could not find a sufficient number of articles in that category published by that particular news agency.
An interesting observation from this analysis is that the percentage of manipulative fake news does not vary much across categories, and is also similar across news agencies. This hints at the possibility that the manipulative fake news published by various news agencies is perhaps not a deliberate attempt by journalists and editors to misguide the audience, but a natural outcome of trying to engage with it. Some amount of descriptive language is also necessary to retain the interest of the common reader. The situation would have been alarming if this percentage had been higher than, say, 30% on average.
It would be interesting to carry out a similar study of less popular news agencies to see whether they publish a higher percentage of manipulative news articles in order to capture market share. This is currently difficult because data in the required format is not easily available, but we hope to gain access in the near future. It would also be interesting to see how these statistics change when we move from English to other Indian languages such as Hindi. Our current algorithm does not achieve very high accuracy when applied directly to Hindi, owing to certain fundamental grammatical differences between English and Hindi, and we are working on suitable changes to the algorithm.
The team at IISER Bhopal which did this work consists of two faculty members (Dr. Kushal Shah and Dr. Rajakrishnan P. Rajkumar) and three students (Aditi Dave, Mohd. Rameez Qureshi and Sidharth Ranjan).