Claimspotting is a specially developed monitoring application that supports fact-checkers in verifying online content on the Telegram news platform. The term ‘Claimspotting’ combines the English words ‘claim’ and ‘spotting’. In the context of the application, it refers to the targeted identification of claims that could potentially be misinformation. The aim of the AI application is to make the work of journalistic fact-checkers more efficient by automating the time-consuming process of monitoring Telegram channels.
The application automatically searches and monitors around 200 Telegram channels daily. These channels are either known for spreading misinformation or have been identified as problematic by experts. It identifies posts with characteristics that, according to research, are commonly associated with misinformation. Once such posts are detected, Claimspotting marks and categorises them according to specific criteria. This allows fact-checkers to pinpoint potential misinformation and subsequently verify it.
Our AI application offers three main functions: 1) You can view posts from Telegram containing potential misinformation in a table. You can sort this content according to various criteria. 2) There is a search function that allows you to search the entire database of monitored Telegram posts. This way, you can find out if a particular claim is circulating in the observed channels. 3) There is also a dashboard that visualises trends in the data. For example, it shows which topics or narratives are particularly prevalent on a given day.
Various types of texts qualify as factual claims in the Claimspotting application. Primarily, factual claims are statements that assert a truth. This means they are statements that can be either true or false. They are not opinions or judgements of taste. Often, factual claims refer to evidence, such as links, quotations, numbers, etc.
Our table displays 21 different topics, ranging from the environment to migration. Posts labelled as ‘No Topic’ have no specific theme, which may be the case if they are too short. Posts classified as ‘Other’ have a theme but do not fit into any of the 21 categories. The topics in the table are derived from the Comparative Agendas Project, whose coding scheme has been used in numerous research projects over the years. This project has defined various policy areas that we use as topic categories.
By narratives, we mean typical misinformation narratives. We asked several fact-checkers which types of overarching narratives they frequently encounter. They provided us with a list of about 40 narratives. We condensed this list to about 20 narratives, because training a machine learning model with 40 classes proved too challenging. These narratives include statements like ‘Immigrants are more criminal than Germans’ or ‘Electric cars are worse for the environment than combustion engine cars.’ If a post supports one of these narratives, it is classified accordingly. The narratives listed in Claimspotting are a snapshot in time. We plan to update them on a regular basis.
Polarising claims are those that create a clear friend-enemy distinction. They refer to specific national, ethnic, or religious groups or portray elites as enemies or perpetrators. We adopted the taxonomy and data from the DeFaktS project.
Sensational claims are statements that are exaggerated to grab the reader's attention. This often occurs through the excessive use of capital letters or exclamation marks. We adopted the taxonomy and data from the DeFaktS project.
Siblings are Telegram posts that are semantically identical in content but differ in wording, grammar, length, or other characteristics. They could also be called paraphrases. An important criterion for semantic identity during the annotation process was whether the posts could be verified by the same fact-checking article. In other words, if Telegram post A can be proven true or false based on certain evidence, and the same judgement can be made for post B with the same evidence, we speak of semantic identity between A and B.
We track the number of forwards and views of a post. These engagement metrics are provided by Telegram. When a post is forwarded at an unusually fast rate, we refer to it as high diffusion. We use outlier detection to identify this. Technically, we speak of high diffusion when a post’s forwards are more than three standard deviations above the average.
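The three-standard-deviation rule can be illustrated with a minimal sketch. It is not the Claimspotting implementation: the function name `high_diffusion_flags`, the use of raw forward counts, and the choice of the sample standard deviation are assumptions made for illustration. The reference group over which the average is computed (for example per channel or per time window) is not stated here, so the sketch simply takes one list of counts.

```python
from statistics import mean, stdev


def high_diffusion_flags(forwards, threshold=3.0):
    """Return one boolean per post.

    A post is flagged when its forward count lies more than `threshold`
    sample standard deviations above the mean of all given counts.
    (Hypothetical helper; the reference group is an assumption.)
    """
    if len(forwards) < 2:
        # A standard deviation needs at least two observations.
        return [False] * len(forwards)
    mu = mean(forwards)
    sigma = stdev(forwards)
    if sigma == 0:
        # All posts have identical counts, so nothing is an outlier.
        return [False] * len(forwards)
    return [(count - mu) / sigma > threshold for count in forwards]


# Example: nineteen ordinary posts and one that spreads unusually widely.
counts = [10] * 19 + [1000]
flags = high_diffusion_flags(counts)
```

With very small groups, no post can reach a z-score of three, so a rule of this kind only becomes meaningful once enough posts are compared.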
Disinformation is usually understood as false information spread with harmful intent. Misinformation, by contrast, is false information regardless of the intent behind it. We refer to misinformation rather than disinformation because our AI application cannot infer the intent of the authors from a text. No software application can do that. Whether information is spread with harmful intent or simply because the person genuinely believes it is not something software can determine.
No, it does not. Detecting misinformation requires content verification, which is not the application's task, and we do not believe this should be done by software. What Claimspotting does is flag Telegram posts that meet certain criteria. These criteria are known from research and are often associated with misinformation. However, this does not mean that the content is actually misinformation. Therefore, we refer to potential misinformation. The application supports fact-checkers in monitoring such potential misinformation.
It is, of course, not possible to monitor all Telegram channels, as there are simply too many. Moreover, this would be of little use, since most channels are of little relevance for misinformation. Therefore, we specifically monitor channels that have previously been the subject of fact-checks and are known for spreading misinformation. Additionally, we have asked fact-checkers about channels they monitor. If you have suggestions for other channels that should be included or believe that a channel should be removed, please feel free to contact us. A complete list of monitored channels can be found here.
From a technical standpoint, it would be possible to monitor other platforms as well. The models work with texts, and although adjustments would be required for other formats, this is generally feasible. However, many platforms have tightened their restrictions in recent years on who is allowed to scrape data. For Facebook and Twitter, it is currently either not possible or very expensive to collect this data. We hope this will change soon so that we can monitor other platforms in the future.
The Claimspotting application consists of three main components: a web scraper, several machine learning and statistical models, and a web interface. The web scraper accesses around 200 Telegram channels every two hours between 6 AM and 10 PM. This means you can always see the latest posts from these channels during the day. In the next step, various methods are applied to analyse the content of the posts. These methods are primarily based on machine learning and statistics and provide general information about each post. Finally, the data is transmitted via an API to the web interface, where it is displayed.
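The scraping schedule described above can be sketched as follows. This is an illustration only: the function `scrape_times` and its parameters are hypothetical, and the sketch assumes that both the 6 AM and the 10 PM runs are included, which the description does not state explicitly.

```python
from datetime import date, datetime, time, timedelta


def scrape_times(day, start_hour=6, end_hour=22, step_hours=2):
    """List the planned scraper runs for one day.

    Assumes runs start at `start_hour`, repeat every `step_hours`,
    and include a final run at `end_hour` (an assumption).
    """
    current = datetime.combine(day, time(hour=start_hour))
    end = datetime.combine(day, time(hour=end_hour))
    runs = []
    while current <= end:
        runs.append(current)
        current += timedelta(hours=step_hours)
    return runs


# Example: the runs planned for an arbitrary day.
runs = scrape_times(date(2024, 1, 15))
run_hours = [run.hour for run in runs]
```

Under these assumptions, each channel is visited nine times per day, at 6, 8, 10, 12, 14, 16, 18, 20 and 22 o'clock.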
Many of the criteria used to analyse the Telegram posts are identified by machine learning tools. Our models work well but are not perfect. Machine learning is always about probability, not certainty. You may disagree with how some posts are classified, for example when a post is marked as supporting a particular narrative or assigned to a particular topic. The application aims to provide orientation amidst the abundance of content posted on Telegram. However, you should always double-check the results in each case.
The models used are customised XLM-RoBERTa Large models. XLM-RoBERTa was trained on text in about 100 languages. The Claimspotting models therefore work not only for German texts but for other languages as well. The embedding models, i.e., the models used to recognise siblings, are based on a different version of XLM-RoBERTa and were also customised for the task with our own data.
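How embeddings are compared to recognise siblings is not described here. A common approach, assumed for this sketch, is to compute the cosine similarity between the embedding vectors of two posts and to treat pairs above a threshold as siblings. The function names, the toy two-dimensional vectors, and the threshold of 0.9 are all hypothetical; real embeddings from an XLM-RoBERTa-based model have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction; treat it as dissimilar.
        return 0.0
    return dot / (norm_a * norm_b)


def find_siblings(embeddings, threshold=0.9):
    """Return pairs of post ids whose embeddings are highly similar.

    `embeddings` maps a post id to its vector. The threshold is an
    illustrative assumption, not a value used by Claimspotting.
    """
    ids = sorted(embeddings)
    pairs = []
    for index, first in enumerate(ids):
        for second in ids[index + 1:]:
            score = cosine_similarity(embeddings[first], embeddings[second])
            if score >= threshold:
                pairs.append((first, second))
    return pairs


# Example with toy vectors: posts "a" and "b" point in almost the same
# direction, post "c" points elsewhere.
toy_embeddings = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
sibling_pairs = find_siblings(toy_embeddings)
```

Comparing every pair of posts scales quadratically with the number of posts, so a production system would more likely rely on an approximate nearest-neighbour index; the sketch only shows the principle.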
You can find our code repository here, and the machine learning models here. If your question is not answered, do not hesitate to create an issue in our Claimspotting repository.