For my dissertation research at Carnegie Mellon University, I have created a national advertising campaign to recruit interview subjects via an online survey. The resulting interviews of U.S. residents age 18 and older will, in turn, inform the design of a final national survey.
It’s fun to return to two of my passions – connecting with people online and conducting quantitative survey research – EXCEPT when my survey gets flooded with spam! Once study info gets posted to the internet, anyone can copy it to a forum or group where people try to game paid surveys with repeated and/or inauthentic responses. This could max out my quota sampling before I reach the people who actually want to be part of this research.
Below are some of my tips for setting up the survey in Qualtrics, in order to address and prevent spam in my dataset:
- In Qualtrics’ survey settings, I have enabled RelevantID. This checks in the background for evidence that a response is a duplicate or otherwise a fraud, and reports the score in the metadata. This helps catch, for example, whether someone is using a different email to take the survey more than once, and thus increase the amount of compensation they are issued.
- The “Prevent Ballot Box Stuffing” setting (known as “Prevent Multiple Submissions” in the newer interface) will also help guard against spam duplicates. In past surveys, I have set this to only flag the repeat responses for review. However, for this national survey, I set it to prevent multiple submissions. A message tells anyone caught by this option that they are not able to take the survey more than once.
- Also in Qualtrics’ survey settings, I have enabled reCAPTCHA bot detection. This is not just the “Prove you are not a robot” challenge question (which I added to the second block in the survey flow). Invisible tech judges the likelihood that the participant is a bot, and reports the score in the metadata.
- With all of the above enabled, I can manually filter responses in Qualtrics’ Data & Analysis tab. On the top right, the Response Quality label is clickable. It takes me to a report of what issues, if any, the above checks have flagged, and gives me the option to view the problematic responses. Once in that filter, I can use the far-left column of check boxes to delete data and decrement quotas for any or all the selected responses.
- Even better, though, is to kick these out of the survey before they start. I set Embedded Data to record the above settings, at the top of the Survey Flow. Then, I set a branch near the top with conditions matched to the Embedded Data: a True for Q_BallotBoxStuffing and Q_RelevantIDDuplicate, and thresholds for Q_DuplicateScore, Q_RecaptchaScore and Q_FraudScore. If any of these conditions are met, the block returns End of Survey. See the below image or the Qualtrics page for Fraud Detection for more info.
- Finally, I want to help the real humans who respond to my ads to choose not to take it, if they judge that it’s not worth the risk of having a response thrown out. In my survey email’s auto-responder and in the Qualtrics block with the reCAPTCHA question, I include text to this effect: Note that only one response will be accepted. We may reject responses if the survey metadata reports duplication, low response quality and/or non-U.S. location, if the duration of the survey seems inconsistent with manual human response, or if the responses fail attention checks.
- Also see an older post, Tips from my online survey research in 2018: https://blog.corifaklaris.com/2018/11/09/tips-from-my-online-survey-research-in-2018