Use these settings to detect and prevent spam in a survey dataset.
For my dissertation research at Carnegie Mellon University, I have created a national advertising campaign to recruit interview subjects via an online survey. The resulting interviews of U.S. residents age 18 and older will, in turn, inform the design of a final national survey.
It’s fun to return to two of my passions – connecting with people online and conducting quantitative survey research – EXCEPT when my survey gets flooded with spam! Once study info gets posted to the internet, anyone can copy it to a forum or group where people try to game paid surveys with repeated and/or inauthentic responses. This could max out my quota sampling before I reach the people who actually want to be part of this research.
Below are some of my tips for setting up the survey in Qualtrics to detect and prevent spam in my dataset:
In Qualtrics’ survey settings, I have enabled RelevantID. This checks in the background for evidence that a response is a duplicate or otherwise fraudulent, and reports a score in the metadata. This helps catch, for example, whether someone is using a different email address to take the survey more than once and thereby collect compensation more than once.
The “Prevent Ballot Box Stuffing” setting (known as “Prevent Multiple Submissions” in the newer interface) will also help guard against spam duplicates. In past surveys, I have set this to only flag the repeat responses for review. However, for this national survey, I set it to prevent multiple submissions. A message tells anyone caught by this option that they are not able to take the survey more than once.
Also in Qualtrics’ survey settings, I have enabled reCAPTCHA bot detection. This is not just the “Prove you are not a robot” challenge question (which I added to the second block in the survey flow): invisible scoring technology judges the likelihood that the respondent is a bot, and reports the score in the metadata.
With all of the above enabled, I can manually filter responses in Qualtrics’ Data & Analysis tab. On the top right, the Response Quality label is clickable. It takes me to a report of what issues, if any, the above checks have flagged, and gives me the option to view the problematic responses. Once in that filter, I can use the far-left column of checkboxes to delete the data and decrement quotas for any or all of the selected responses.
Even better, though, is to kick these responses out of the survey before they start. At the top of the Survey Flow, I set Embedded Data to record the above settings. Then I set a branch near the top with conditions matched to the Embedded Data: True for Q_BallotBoxStuffing and Q_RelevantIDDuplicate, and thresholds for Q_DuplicateScore, Q_RecaptchaScore and Q_FraudScore. If any of these conditions are met, the branch routes to End of Survey. See the below image or the Qualtrics page for Fraud Detection for more info.
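The same branch conditions can also be applied after the fact, as a double-check on an exported dataset. Here is a minimal Python sketch, assuming a CSV export that includes the embedded-data fields named above; the thresholds are placeholders I made up for illustration, not values Qualtrics recommends, so tune them against your own pilot data:

```python
import csv
import io

# Illustrative thresholds -- assumptions for this sketch, not Qualtrics defaults.
DUPLICATE_SCORE_MAX = 75   # Q_DuplicateScore above this => likely duplicate
FRAUD_SCORE_MAX = 30       # Q_FraudScore above this => likely fraudulent
RECAPTCHA_SCORE_MIN = 0.5  # Q_RecaptchaScore below this => likely a bot

def screen_out(row):
    """Mirror the Survey Flow branch: True if any fraud condition is met."""
    if row.get("Q_BallotBoxStuffing", "").lower() == "true":
        return True
    if row.get("Q_RelevantIDDuplicate", "").lower() == "true":
        return True
    if float(row.get("Q_DuplicateScore") or 0) > DUPLICATE_SCORE_MAX:
        return True
    if float(row.get("Q_FraudScore") or 0) > FRAUD_SCORE_MAX:
        return True
    if float(row.get("Q_RecaptchaScore") or 1) < RECAPTCHA_SCORE_MIN:
        return True
    return False

# Toy export: two clean responses and two that should be screened out.
export = io.StringIO(
    "ResponseId,Q_BallotBoxStuffing,Q_RelevantIDDuplicate,"
    "Q_DuplicateScore,Q_FraudScore,Q_RecaptchaScore\n"
    "R_1,false,false,10,5,0.9\n"
    "R_2,true,false,10,5,0.9\n"
    "R_3,false,false,80,5,0.9\n"
    "R_4,false,false,10,5,0.8\n"
)
flagged = [r["ResponseId"] for r in csv.DictReader(export) if screen_out(r)]
print(flagged)  # R_2 (ballot box stuffing) and R_3 (high duplicate score)
```

Running this kind of audit on the export is a nice sanity check that the Survey Flow branch caught what it was supposed to.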
Finally, I want to help the real humans who respond to my ads choose not to take the survey, if they judge that it’s not worth the risk of having a response thrown out. In my survey email’s auto-responder and in the Qualtrics block with the reCAPTCHA question, I include text to this effect: Note that only one response will be accepted. We may reject responses if the survey metadata reports duplication, low response quality and/or non-U.S. location, if the duration of the survey seems inconsistent with manual human response, or if the responses fail attention checks.
One upside of video calls during the COVID-19 pandemic has been that I can attend or speak at virtually any location or event, without having to travel or move my schedule around too much. It’s helped me get more comfortable with public speaking, and exposed me to different audiences for my work.
In my latest public appearance, I joined fellow CMU grad student Tom Magelinski this spring at Bytes of Good Live, organized by Hack4Impact, a student-run nonprofit that promotes software for social good. We talked about our Social Cybersecurity research and what we know of careers in cybersecurity. The recording is available on YouTube, or click on the preview shown below to go to the video. Let me know what you think!
Phew, what a semester! I ran a section of our Programming Usable Interfaces course here at Carnegie Mellon University, and I mentored several student assistants and two research associates for our HCII Social Cybersecurity research project – all while taking a required course (Social Web, roughly a survey of Computer-Supported Collaborative Work and Social Computing) and an elective (Computer Science Pedagogy). Oh, and I finished all of this AMID A PANDEMIC, while WORKING FROM MY CRAMPED APARTMENT with TWO INCREDIBLY FUSSY CATS.
It has been a steep learning curve working out how best to use Zoom and other tools for university work. I found the following practices helped our sessions run best:
Be very explicit about what you want students to do. I wrote out a script where I would verbally tell students to post in the Chat window, raise their hands with the icon, or simply unmute to ask questions or make comments.
Use breakout rooms to facilitate discussion and social connection. No one will be able to see the discussion prompt slide once in the room, so it is best to keep the prompt general or re-post it via message once students have joined their groups.
Accept that you will only get about half the attention you would in an in-person class – the typical user is multitasking with in-home activities and distractions (in bed, cooking, managing kids or pets, doing laundry). This includes me when I’m not the lecturer, so I have empathy for this user persona!
No one will share their screen or audio if the group is too big to fit in gallery view. This unfortunately amplifies the distancing already present in any conversation held through a video-screen interface.
Rather than sharing links for additional material in Zoom chat, create a Slack workspace or use Piazza threads. Both are persistent and searchable, and Slack allows for lightweight engagement such as emojis. However, I also would upload to Canvas the (edited) chat transcript with my encoded mp4 file at the end of lab, for students who were not able to attend synchronously. I do not think a Canvas discussion thread is going to be useful for this, because the UI seems primarily designed for required discussion posts on assigned readings, but you could post it as an Announcement, which will be front and center for students.
Streaming videos can still be a fun and useful break in lecture – I had a lot of success showing a video demonstration of the Bootstrap grid system amid a lecture on using it for web design – but keep in mind that you need enough bandwidth to stream multimedia, and you need to configure your sharing settings properly: turn off your virtual background, or even your own camera altogether if needed, and check “Share computer sound” when starting the screen share.
Stick with one persistent Zoom link for each type of meeting. It was a lot easier for me to create one repeating calendar item with a persistent Zoom URL than to constantly hunt for whatever the new Zoom link was for that day’s class or meeting. I saved my personal Zoom link for office hours and other activities that would otherwise have happened in person in my office. Use the waiting room feature, or embed the password in the URL itself, if you are concerned about keeping the meeting limited to attendees unlikely to cause trouble. For one-off events, use a registration form to collect and vet attendees, then send a separate email with the actual Zoom link to the approved attendees.
Let me know in the comments what other practices you found helpful – or share on social media!