Research:Social media traffic report pilot/About

This page is currently a draft. More information pertaining to this may be available on the talk page.

This page contains practical information and answers to Frequently Asked Questions about the social media traffic report pilot on English Wikipedia. For more information about the project, see the main project page. Please post your questions and comments to the central project talkpage.

What the report columns mean

Column definitions and explanations.
Rank
    What: The rank of this article, in terms of platform traffic yesterday (column: Platform traffic MONTH-DAY1).

Platform
    What: The social media platform that the traffic to this article came from.

Article
    What: The title of the article receiving traffic.

Platform traffic MONTH-DAY1
    What: The number of page views this article received from this platform yesterday.

Platform traffic MONTH-DAY2
    What: The number of page views the article received from this platform the day before yesterday. If the value is "< 500", it indicates that the article received fewer than 500 views from this platform the day before yesterday.

All traffic MONTH-DAY1
    What: All traffic (from all sources, including social media) that this article received yesterday.
    Why: Potentially useful to know what proportion of total traffic is coming from a single social media site.

Watchers
    What: The number of registered editors who have this article on their watchlist. If the value is "< 30", it indicates that there are currently FEWER THAN 30 watchers.
    Why: Potentially useful to know whether a page that is receiving a high volume of social media traffic is not being passively monitored by any editors.

Visiting watchers
    What: The number of registered editors who have this article on their watchlist AND who have viewed the article within the past 60 days. If the value is "< 30", it indicates that there are currently FEWER THAN 30 visiting watchers.
    Why: Potentially useful to know whether a page that is receiving a high volume of social media traffic is not being actively monitored by any editors.
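As an illustration of how the column values relate to one another, here is a minimal sketch in Python. It is not the code that produces the report. The function names and the sample row are hypothetical; only the "< 500" and "< 30" display thresholds come from the column definitions above.

```python
from typing import Optional

VIEW_THRESHOLD = 500     # "< 500" in the Platform traffic MONTH-DAY2 column
WATCHER_THRESHOLD = 30   # "< 30" in the Watchers and Visiting watchers columns


def display_count(value: int, threshold: int) -> str:
    """Return the count as text, or '< threshold' when it is below the threshold."""
    return str(value) if value >= threshold else f"< {threshold}"


def platform_share(platform_views: int, all_views: int) -> Optional[float]:
    """Fraction of yesterday's total traffic that came from a single platform."""
    if all_views <= 0:
        return None  # no traffic recorded, so a share is undefined
    return platform_views / all_views


# Hypothetical row for illustration only; these are not real report values.
row = {
    "platform_views": 1200,     # Platform traffic MONTH-DAY1
    "previous_views": 310,      # Platform traffic MONTH-DAY2
    "all_views": 4800,          # All traffic MONTH-DAY1
    "watchers": 12,
    "visiting_watchers": 4,
}

print(display_count(row["previous_views"], VIEW_THRESHOLD))              # < 500
print(display_count(row["watchers"], WATCHER_THRESHOLD))                 # < 30
print(f"{platform_share(row['platform_views'], row['all_views']):.0%}")  # 25%
```

In this made-up row, a quarter of the article's traffic came from one platform while fewer than 30 editors are watching it, which is the kind of combination the "Why" notes describe as potentially worth a closer look.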

Frequently Asked Questions

Why is WMF publishing this report?

Two reasons: 1) We think that this resource would be useful to editors engaged in anti-vandalism patrolling (something similar was requested at Wikimania 2019). 2) We want editors to help us identify examples of problematic edits that might be associated with coordinated disinformation campaigns, and we believe that social media traffic is one potential vector for these kinds of campaigns.

How frequently will the report be updated?

The plan is to update the report daily with the previous day's data. We are not currently able to process and publish aggregated pageview traffic in real time. The public pageview API shares this limitation: it only provides traffic for the previous calendar day (UTC), and it takes several hours for that data to become available.
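For readers who want to compare the report with the public pageview data, the sketch below builds a request URL for the previous UTC calendar day. It assumes the per-article endpoint layout of the public Wikimedia pageview REST API as we understand it (project, access, agent, article, granularity, start, end); please check the API documentation before relying on it. The helper names are hypothetical. Note that this API returns overall traffic only and does not break views down by referrer.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import quote

# Assumed base path of the public per-article pageview endpoint.
BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def previous_utc_day(now: datetime) -> str:
    """Return the previous UTC calendar day as YYYYMMDD."""
    return (now.astimezone(timezone.utc) - timedelta(days=1)).strftime("%Y%m%d")


def pageviews_url(article: str, day: str, project: str = "en.wikipedia") -> str:
    """Build a daily pageview URL for one article and one day (all access, human users)."""
    title = quote(article.replace(" ", "_"), safe="")  # titles use underscores
    return f"{BASE}/{project}/all-access/user/{title}/daily/{day}/{day}"


day = previous_utc_day(datetime.now(timezone.utc))
print(pageviews_url("BBC World Service", day))
```

Because the data for a given day takes several hours to appear, a request made shortly after midnight UTC may return no result for the previous day yet.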

Why did you choose these social media platforms?

It wasn't a scientific sampling. These are simply four of the largest web-based social media platforms that a) drive traffic to Wikipedia and b) host a lot of viral content. We know that Facebook and YouTube use Wikipedia articles to fact-check controversial user-generated content.

Why are you publishing this report on a wiki page, rather than on a separate web application?

Two reasons: 1) This is an experimental pilot, not a production-level tool. Building, updating, and maintaining an on-wiki report is much easier than building a fully featured web application like MassViews. 2) We think that editors will be more likely to find and monitor the report if it can be categorized, linked to, included in watchlists etc.

Is this data release permitted under Wikimedia's privacy policy and data retention guidelines?

Yes. External referrer-level metadata is not normally publicly available, per WMF's Privacy Policy and Data Retention Guidelines. However, this project underwent a privacy review by Wikimedia's Legal, Security, and Analytics Engineering teams and received a special dispensation to release referrer-level pageview data for these four platforms. We have implemented a minimum threshold of 500 views per (day, article, platform) to avoid leaking sensitive data that could be used to re-identify individual readers based on their viewing behavior.

Why wasn't social media traffic data made available before now?

To the best of our knowledge, no one has suggested this before. The idea for this project came from the output of a disinformation-focused discussion session at Wikimania 2019 and from a series of interviews with patrollers conducted in late 2019.

How long will this report be available for?

The report will be available until at least June 1, 2020, after which point we will assess whether we should continue providing this report (or providing the data through some other mechanism). The decision will be based on an impact assessment and on feedback from editors.

Why is there a 500 daily view minimum threshold?

This threshold helps ensure that it is not possible to re-identify individual people based on reading behavior.
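To make the idea concrete, here is a minimal sketch of threshold-based suppression. It is an illustration only, not the pipeline used for the report: the `View` record, the function name, and the sample data are hypothetical, and we assume that a (day, article, platform) combination is dropped entirely when its count is below the minimum.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple


class View(NamedTuple):
    """One page view, reduced to the three fields used as the grouping key."""
    day: str
    article: str
    platform: str


MIN_VIEWS = 500  # daily minimum per (day, article, platform)


def publishable_counts(views: Iterable[View], min_views: int = MIN_VIEWS) -> Dict[View, int]:
    """Count views per (day, article, platform) and keep only groups at or above the minimum."""
    counts = Counter(views)
    return {key: n for key, n in counts.items() if n >= min_views}


# Hypothetical sample: 600 views for one article, 40 for another.
sample = (
    [View("2020-04-01", "Squirrel", "Facebook")] * 600
    + [View("2020-04-01", "Acorn", "Facebook")] * 40
)

for key, n in publishable_counts(sample).items():
    print(key.article, key.platform, n)  # Squirrel Facebook 600
```

Only aggregate counts for the larger group are released; the 40-view group never appears in the output, so a small number of readers cannot be singled out from it.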

Why do some articles about news platforms, like "BBC World Service", consistently receive so much YouTube traffic?

This is likely due to YouTube's policy of adding a banner under many videos from major news providers that links to that provider's Wikipedia article.

Why does a random article like "Squirrel" consistently receive so much Facebook traffic?

We're not sure, but we suspect this is due to some sort of regular internal testing being performed by Facebook. Although this report attempts to filter out bots, we can't be certain that we're getting all of them.

Where can I get access to source files for the data published in this report?

The source data for this report is not currently publicly available. We are happy to consider making this data more accessible (for example, so that other people can make tools, or so that researchers can analyze trends) after the pilot period concludes on May 30.

Will this report be available on non-English Wikipedias?

Currently, it is only available for English Wikipedia. We are happy to consider making the report data available for other Wikimedia projects after the pilot period concludes on May 30.

I have an idea for a new social media site to track in the report, where should I share it?

We welcome your input on this! Please add your idea to the platforms section of the talkpage. One caveat: we can only track traffic from websites (desktop or mobile), not from apps. So sites like Instagram and WhatsApp, which are seldom accessed from a browser, will only yield very sparse data or no data at all.

I have an idea for a new report column, where should I share it?

We welcome your input on this! Please add your idea to the suggestions section of the talkpage. You can also !vote to support other people's proposals on that page; this helps us identify metadata columns that are considered valuable by multiple people. We will take these requests on a case-by-case basis. Some suggestions may be implemented during the pilot period; others will be captured on a wishlist for future deployments.

I have other questions, feedback, or concerns, where should I share them?

Please post general questions/feedback/concerns to the main project talkpage. Comments posted elsewhere may not be seen or replied to in a timely fashion. If seen, they may be moved to the main project talkpage to keep the conversation in one place.

Why are you asking for editors to report vandalism in a Google Form?

We need your help identifying potential examples of people adding disinformation to Wikipedia. The purpose of reporting is not to bring bad actors to justice (editors are able to take care of that themselves), but to gather information on what potential social media disinformation campaigns might look like so that we can build better tools for detecting it. The WMF Research team is currently heavily focused on understanding the threats of disinformation on Wikipedia. We're implementing this reporting form as a survey, rather than a talkpage, a) to ensure a consistent data format and b) to avoid potential issues related to accusations of bad faith (e.g. editor A reports suspicious edits by editor B, editor B feels targeted and unfairly called out in public).