### Overall turking recipe
1. Run a pilot task on a small number of examples (100-500), with slightly more workers per HIT than your final task will use, to ensure wide participation.
- For categorical tasks: consider selecting examples whose answers you already know, which makes grading workers easier.
2. Assess the quality of each worker's responses:
- For categorical tasks: you can set up an autograder based on the responses you expect (if you have your own answers).
- For free-text tasks: download the CSV of results and scan through the HITs, sorted by workerID.
3. While scanning, make two lists of workerIDs:
- List of good workers.
- List of bad workers.
4. Create two qualifications on MTurk:
- For good workers: call it "GreatAtMyTask"
- For bad workers: call it "PreviouslyDoneMyTask"; a neutral name avoids upsetting workers by telling them they're bad at something.
5. Assign the bad workers to the bad-worker qualification and the good workers to the good-worker qualification.
6. Create a copy of your pilot task:
- Set the qualification requirements to "GreatAtMyTask" (plus any other quals you use)
- Reduce the number of workers/HIT to what you intended originally
7. If needed, re-run a small "qualification" batch and repeat steps 1-6.
Make sure to disallow both good and bad workers from doing this qualification task.
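
As a rough illustration of steps 2-7, here is a minimal Python sketch. It assumes a categorical task whose results CSV has `WorkerId`, `Input.example_id`, and `Answer.label` columns (the last two depend on your own HIT template and are hypothetical here), and that you already created the two qualifications and have their IDs. The `GOLD` answers, the 0.8 accuracy cutoff, and the helper names are all made up for the example; `client` is expected to behave like a boto3 MTurk client.

```python
import csv
from collections import defaultdict

# Hypothetical gold answers for pilot examples: example id -> expected label.
GOLD = {"ex1": "yes", "ex2": "no", "ex3": "yes"}


def grade_workers(csv_lines, gold, min_accuracy=0.8, min_answers=2):
    """Steps 2-3: split workers into (good, bad) lists by accuracy on gold examples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for row in csv.DictReader(csv_lines):
        example_id = row["Input.example_id"]  # assumed column name
        if example_id not in gold:
            continue  # no known answer, so this row cannot be autograded
        worker = row["WorkerId"]
        total[worker] += 1
        if row["Answer.label"] == gold[example_id]:  # assumed column name
            correct[worker] += 1
    good, bad = [], []
    for worker, n in sorted(total.items()):
        if n < min_answers:
            continue  # too little evidence either way; leave unassigned
        if correct[worker] / n >= min_accuracy:
            good.append(worker)
        else:
            bad.append(worker)
    return good, bad


def assign_qualification(client, qualification_type_id, worker_ids):
    """Step 5: grant an existing qualification to each worker, without notifying them."""
    for worker_id in worker_ids:
        client.associate_qualification_with_worker(
            QualificationTypeId=qualification_type_id,
            WorkerId=worker_id,
            IntegerValue=1,
            SendNotification=False,
        )


def final_task_requirements(good_qual_id):
    """Step 6: only workers holding the good-worker qualification may accept."""
    return [{
        "QualificationTypeId": good_qual_id,
        "Comparator": "Exists",
        "ActionsGuarded": "Accept",
    }]


def rerun_requirements(good_qual_id, bad_qual_id):
    """Step 7: hide the re-run batch from workers who hold either qualification."""
    return [
        {
            "QualificationTypeId": qual_id,
            "Comparator": "DoesNotExist",
            "ActionsGuarded": "DiscoverPreviewAndAccept",
        }
        for qual_id in (good_qual_id, bad_qual_id)
    ]
```

To use it, open the downloaded results file and pass the file object to `grade_workers`, then call `assign_qualification` once per list with a client such as `boto3.client("mturk")`. The lists returned by the two `*_requirements` helpers are meant to be passed as the `QualificationRequirements` argument when creating HITs. Workers with very few graded answers are deliberately left out of both lists, so they remain eligible for a later qualification batch.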
I compiled this high-level recipe based on valuable advice from current and former labmates who Turk (including [Emily Allaway](https://www.aclweb.org/anthology/people/e/emily-allaway/) and [Hannah Rashkin](https://homes.cs.washington.edu/~hrashkin/)) and on my own experience Turking (specifically, on the [ATOMIC](https://mosaickg.apps.allenai.org/kg_atomic), [SocialIQa](https://leaderboard.allenai.org/socialiqa/submissions/get-started), and [Social Bias Frames](https://homes.cs.washington.edu/~msap/social-bias-frames/) projects).