July 17, 2021

How does the Ground Truth Challenge work?

You may have seen our announcement of the Ground Truth Challenge. You may have even read through the rules. Still confused? Sometimes rule sets are hard to visualize, but have no fear: we've got you covered!

For the challenge to succeed, it must be easy for the 3 referees to make clear decisions, which means it's extremely important that submitters make the referees' work easier by following the submission guidelines precisely.

Here's a flowchart of the challenge pipeline. The first thing to realize is that the challenge is split into two stages: Challenge and Response.

All the steps are implemented in a Google Sheet that is publicly viewable so all can see the process as it happens. We use Zapier to read from and write to Twitter.

This is a good point to mention that the rules are the authoritative source of truth for all things related to this challenge. In case this article conflicts with the rules, the right answer is what is written in the rules.

Stage 1: Challenge

The first stage is where the requests for citation and the falsification candidates are submitted with the #gtcentry hashtag on Twitter. They make their way into the Raw Input sheet, where our volunteers (currently @iseravi) do a first sanity check, rejecting submissions with straightforward rule violations to save referee time. The idea is that the rules enforced at this stage could, in principle, be automated.

Once the submissions pass the sanity check, they show up in the review buffers of the 3 referees. Each referee reviews each submission and determines whether, in their judgement, it meets the bar.

As a reminder, Referees grade Falsification candidates on the following scale:

  • 5/5 challenge demonstrates target quote to be logically impossible
  • 4/5 challenge demonstrates target quote to be practically impossible
  • 3/5 challenge demonstrates target quote to be false with very high likelihood
  • 2/5 challenge demonstrates target quote to have likelihood of being false
  • 1/5 challenge demonstrates target quote to have low likelihood of being false
  • 0/5 challenge is fallacious, invalid, unsound, unclear, or does not present sufficient evidence.

They also grade Unsupported Claim candidates on the following scale:

  • 5/5 citation request demonstrates target quote as essential to be cited
  • 4/5 citation request demonstrates target quote as useful to be cited
  • 3/5 citation request demonstrates target quote as desirable to be cited
  • 2/5 citation request demonstrates target quote as optional to be cited
  • 1/5 citation request demonstrates target quote as excessive to be cited
  • 0/5 citation request is fallacious, invalid, unsound, unclear, or does not present sufficient evidence.

Any candidate that receives 9 or more points from our referees will make it to the "Stage 1 Validated" sheet, and be tweeted out from @BetterSkeptics with the #gtcvalid hashtag. That's the end of this stage and the start of the next.
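To make the threshold concrete, here is a minimal sketch of the Stage 1 check. It assumes the three referees' 0-5 grades are simply added together, so the maximum is 15 and the bar is 9. The function and constant names are hypothetical and are not taken from the actual Google Sheet.

```python
from typing import Sequence

# Values taken from the article: 3 referees, grades from 0 to 5, bar at 9 points.
NUM_REFEREES = 3
MAX_GRADE = 5
VALIDATION_THRESHOLD = 9


def passes_stage_one(grades: Sequence[int]) -> bool:
    """Return True if the summed referee grades reach the validation bar.

    Assumption: the "9 or more points" rule refers to the plain sum of grades.
    """
    if len(grades) != NUM_REFEREES:
        raise ValueError(f"expected {NUM_REFEREES} grades, got {len(grades)}")
    if any(not 0 <= grade <= MAX_GRADE for grade in grades):
        raise ValueError(f"grades must be between 0 and {MAX_GRADE}")
    return sum(grades) >= VALIDATION_THRESHOLD


if __name__ == "__main__":
    print(passes_stage_one([3, 3, 3]))  # sum is 9, meets the bar
    print(passes_stage_one([4, 4, 0]))  # sum is 8, falls short
```

Under this reading, three grades of 3/5 are just enough, while two strong grades cannot carry a submission that the third referee scores at 0 unless they add up to 9 on their own.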

Stage 2: Response

As soon as a #gtcvalid tweet is out from the @BetterSkeptics account, the 24h time window to submit responses to the validated submissions begins:

For Unsupported Claim submissions, anyone on Twitter can respond to our #gtcvalid tweets with supporting materials and citations. Once 24 hours pass, the thread will be made available to referees for review, except for submissions that receive no worthwhile replies, which will be declared Uncontested and promoted to Stage 2 Validated directly.

Referees will review as many replies as they can to each tweet, and each will determine whether they consider a specific citation to be a satisfactory citation for the stated claim. If so, they will mark the Unsupported Claim submission as reversed. If the majority of referees agree on which specific citation(s) they found satisfactory, then those replies will be considered the effective citations and share a reward worth $50.
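The majority rule and the shared reward can be sketched as follows. This is one possible reading, not the official procedure: it assumes a reply counts as effective when more than half of the referees accepted it, and that the reward is split equally among the effective replies. The function names and the reply identifiers are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def effective_replies(accepted_by_referee: Iterable[Set[str]]) -> Set[str]:
    """Return the replies that a majority of referees found satisfactory.

    Each element of accepted_by_referee is the set of reply IDs that one
    referee accepted.
    """
    picks = list(accepted_by_referee)
    majority = len(picks) // 2 + 1  # 2 of 3 referees
    counts = Counter(reply for accepted in picks for reply in accepted)
    return {reply for reply, votes in counts.items() if votes >= majority}


def split_reward(replies: Set[str], reward: float) -> Dict[str, float]:
    """Split the reward equally among the effective replies (an assumption)."""
    if not replies:
        return {}
    share = reward / len(replies)
    return {reply: share for reply in sorted(replies)}


if __name__ == "__main__":
    votes = [{"reply_a", "reply_b"}, {"reply_a"}, {"reply_c"}]
    winners = effective_replies(votes)
    print(winners)
    print(split_reward(winners, 50.0))
```

In the example, only reply_a is accepted by two of the three referees, so it alone would receive the $50. The rules remain the authority on how ties and splits are actually handled.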

For Falsification submissions, anyone on Twitter can respond to our #gtcvalid tweets with refutations. Once 24 hours pass, the thread will be made available to referees for review, except for submissions that receive no worthwhile replies, which will be declared Uncontested and promoted to Stage 2 Validated directly.

Referees will review as many replies as they can to each tweet, and each will determine whether they consider a specific rebuttal to be convincing. If so, they will mark the Falsification submission as reversed. If the majority of the referees agree on which specific refutation(s) they found convincing, then those replies will be considered the effective refutation(s) and share a reward worth $100.

Falsification and Unsupported Claim submissions that are not reversed according to the majority of the referees will be marked as Stage 2 Validated and be tweeted by the @BetterSkeptics account with the #gtcwin hashtag.
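Putting the Stage 2 rules together, the outcome for a single validated submission could be sketched like this. The enum, the function names, and the boolean inputs are hypothetical; the sketch assumes "majority" means more than half of the referees marking the submission as reversed.

```python
from enum import Enum
from typing import Sequence


class Outcome(Enum):
    UNCONTESTED = "uncontested"      # no worthwhile replies within 24 hours
    REVERSED = "reversed"            # majority of referees accepted a reply
    VALIDATED = "stage 2 validated"  # contested, but not reversed


def stage_two_outcome(has_worthwhile_replies: bool,
                      reversal_votes: Sequence[bool]) -> Outcome:
    """Decide the Stage 2 outcome for one #gtcvalid submission.

    reversal_votes holds one boolean per referee: True if that referee
    marked the submission as reversed.
    """
    if not has_worthwhile_replies:
        return Outcome.UNCONTESTED
    if sum(reversal_votes) > len(reversal_votes) / 2:
        return Outcome.REVERSED
    return Outcome.VALIDATED


def is_stage_two_validated(outcome: Outcome) -> bool:
    """Uncontested submissions are promoted to Stage 2 Validated directly."""
    return outcome in (Outcome.UNCONTESTED, Outcome.VALIDATED)


if __name__ == "__main__":
    print(stage_two_outcome(True, [True, True, False]))
    print(stage_two_outcome(True, [True, False, False]))
    print(stage_two_outcome(False, []))
```

With three referees, two reversal votes are enough to reverse a submission; one is not.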

What next?

Once the challenge time expires, and the additional time for the referees to finish evaluating all claims expires as well (including the necessary 24h Response periods for any remaining #gtcvalid submissions), we will have a definitive list of twice-validated #gtcwin submissions, as well as a set of citations for various claims in the four transcripts. We will produce an annotated copy of each transcript with relevant citations, marking specific claims as unsupported and other claims as falsified. While unsupported claims may yet find support that was not discovered during the challenge, we hope falsified claims will be conclusively falsified and therefore in need of revision.

All told, the result will bear quite some similarity to an academic paper, complete with submissions, alongside the results of a peer review for the authors to take into account. As mentioned in the challenge announcement, given the volume of the material, it is essentially impossible for spoken word to not contain some false statements. It is up to the podcast hosts and guests to use the output of this process as they see fit, and for everyone else to draw their own conclusions about the process, the output, and any resulting responses.