Earlier this week, Twitter announced an initiative to combat misinformation on their platform that they call Birdwatch.
How Birdwatch works: Volunteers sign up (assuming they meet all the requirements) and can add notes to fill in context on misleading Tweets. Other users can then rate these notes as helpful or not helpful. All of these notes and note ratings are completely transparent.
At its face, Birdwatch is an attempt to scale up the existing fact-checking capability used during the 2020 U.S. Elections while also crowdsourcing this decision-making.
I will give Twitter credit for two things, and only two things, before I get into the problems with their design.
- They’re distributing the power to fact-check bad tweets to their users rather than hoarding it for themselves.
- They correctly emphasized transparency as a goal for this tool.
But it’s not all sunshine and rainbows.
The Fatal Flaw of Birdwatch’s Design
There’s an old essay titled The Six Dumbest Ideas in Computer Security, which immediately identifies two problems with Birdwatch’s design. They also happen to be the first two items on the essay’s list!
- Default Permit
- Enumerating Badness
This is best illustrated by way of example.
Let’s assume there are two pathological liars hellbent on spreading misinformation on Twitter. They each tweet unsubstantiated claims about some facet of government or civil service. Birdwatch users catch only one of them, and correctly fact-check their tweet.
What happens to the other liar?
What happens if Birdwatch users can only identify one out of ten liars? One out of a hundred? One out of a thousand?! Et cetera.
To be clear: The biggest flaw in their product design is simply that their “notes” and “fact-checks” are only negative indicators, applied to known-bad tweets after the fact.
This will create a dark pattern: If a tweet slips past the Birdwatch users’ radars, it won’t be fact-checked. In turn, users won’t realize it’s misinformation. A popular term for the resulting conduct is coordinated inauthentic behavior.
This already happens to YouTube.
Hell, this is already happening to Twitter.
How To Fix Birdwatch
I wrote an entire essay on Defeating Coordinated Inauthentic Behavior at Scale in 2019. I highly recommend that anyone at Twitter interested in actually solving the misinformation problem give it careful consideration.
But in a nutshell, the most important fix is to change the state machine underlying Birdwatch from:
- No notes -> trustworthy
- Notes -> misinformation
…to something subtly different:
- No notes -> unvetted / be cautious
- Notes ->
- Negative notes -> misinformation
- Positive notes -> verified by experts
This effectively creates a traffic light signal for users: Tweets start as yellow (exercise caution, which is the default) and may become green (affirmed by notes) or red (experts disagree).
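The proposed state machine can be sketched in a few lines of code. This is a minimal illustration of the traffic-light idea, not Twitter's implementation; the names, thresholds, and tie-breaking rule are all my own assumptions (a real system would weight notes by contributor reputation and helpfulness ratings rather than counting them).

```python
from enum import Enum

class Verdict(Enum):
    CAUTION = "yellow"   # default state: unvetted, exercise caution
    VERIFIED = "green"   # positive notes prevail: affirmed by reviewers
    MISINFO = "red"      # negative notes prevail: flagged as misinformation

def birdwatch_verdict(positive_notes: int, negative_notes: int) -> Verdict:
    """Map note counts to a traffic-light verdict (hypothetical logic)."""
    if positive_notes == 0 and negative_notes == 0:
        # The crucial fix: no notes means "unvetted", not "trustworthy".
        return Verdict.CAUTION
    if negative_notes > positive_notes:
        return Verdict.MISINFO
    if positive_notes > negative_notes:
        return Verdict.VERIFIED
    # Conflicting notes: stay cautious rather than defaulting to trust.
    return Verdict.CAUTION
```

The key design choice is in the first branch: a tweet with no notes stays yellow, so evading reviewers no longer buys an implicit endorsement.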
What Would This Change Accomplish?
Malicious actors who evade Birdwatch will only manage to wrap their message in caution tape. (Metaphorically speaking, anyway.)
If their goal is to spread misinformation while convincing the recipients of their message that they’re speaking the truth, they’ll have to get a green light, which is ideally far more expensive to accomplish.
I would also recommend some kind of “this smells fishy” button to signal to Birdwatch contributors that a tweet needs fact-checking. Otherwise, users might self-select into filter bubbles where Birdwatch contributors are totally absent, and in turn come across claims that are completely unvetted and possibly ambiguous.
While I have your attention, here’s a quality of life suggestion, on the house:
Being able to link claims together (e.g. reposted images carrying the same false claim, like the Minions memes on Facebook) to deduplicate fact-checks would save a lot of unnecessary headache.
(Anyone who has used Stack Overflow will appreciate the utility of being able to say “this is a duplicate of $otherThing”.)
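A rough sketch of what that deduplication might look like: many tweets map to one canonical claim, so a single note covers every repost. The fingerprinting below is a deliberately naive normalized-text hash of my own invention; matching reposted images would realistically require perceptual hashing, which is out of scope here.

```python
import hashlib

def claim_fingerprint(text: str) -> str:
    # Normalize whitespace and case so trivial repost variants collide.
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

class ClaimIndex:
    """Maps claim fingerprints to fact-check notes (hypothetical design)."""

    def __init__(self) -> None:
        self._notes: dict[str, list[str]] = {}

    def add_note(self, tweet_text: str, note: str) -> None:
        self._notes.setdefault(claim_fingerprint(tweet_text), []).append(note)

    def notes_for(self, tweet_text: str) -> list[str]:
        # A repost of an already-checked claim inherits its existing notes.
        return self._notes.get(claim_fingerprint(tweet_text), [])
```

The point is the “mark as duplicate of $otherThing” workflow: a note attached once to the canonical claim automatically surfaces on every copy, instead of contributors re-litigating the same lie a thousand times.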
What If These Fundamental Flaws Remain Unfixed?
Although Birdwatch will probably meet the immediate goal of scaling up the fact-checking efforts beyond what Twitter can provide (and satisfy the public relations requirements of tangibly doing something to combat this problem), propagandists and conspiracy theorists will simply become incentivized to evade Birdwatch contributors’ detection while spreading their lies.
As I said above, coordinated inauthentic behavior is already happening. This isn’t some abstract threat that only academics care about.
To the aid of the malicious, most users will confuse tweets that evaded detection with tweets that didn’t warrant correction. This might even lead to users trusting misinformation more than they would before Birdwatch. This would be a total self-own for the entire Birdwatch project.
This post first appeared on Dhole Moments – Software, Security, Cryptography, And The Furry Fandom.