RE: Is the panel research business model creating a gold farming problem?
In response to a post on Research Rants: “Is the panel research business model creating a gold farming problem?”1
Granted, there is an incentive for people to game the system, but there are numerous ways to catch and contain this type of behaviour.
Bad Behaviour (Panelists)
Capture speeders. After the soft launch of an online survey, identify the median completion time and remove all individuals who took less than some percentage of it. (Remember to explain to your client that the higher the percentage of median time, the higher the cost and the more likely you are to throw out real respondents, so manage those expectations.)
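The speeder rule above can be sketched in a few lines. This is only an illustration: the `flag_speeders` name, the 40% cutoff, and the sample timings are all assumptions, and the right percentage is something to agree on with your client.

```python
from statistics import median

def flag_speeders(durations, fraction=0.4):
    """Return the IDs of respondents who finished in less than
    `fraction` of the median completion time (0.4 is an assumed default)."""
    cutoff = fraction * median(durations.values())
    return sorted(rid for rid, secs in durations.items() if secs < cutoff)

# Hypothetical soft-launch timings in seconds, keyed by respondent ID.
soft_launch = {"r1": 610, "r2": 580, "r3": 95, "r4": 720, "r5": 640}
print(flag_speeders(soft_launch))  # ['r3']
```

Raising `fraction` catches more speeders but also removes more legitimate fast readers, which is exactly the cost trade-off to explain up front.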
If questionnaires are designed with well balanced statements (both positive and negative, think – MBTI2), you can capture straightliners and Christmas trees3. While it’s unlikely that a goldfarmer will do either of these, your goal is still data quality, and even random answers can produce responses that make no logical sense. Online survey software is sophisticated enough to run these checks within the logic of the survey itself, making it unnecessary for analysts to waste time looking at these results.
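As a rough sketch of what such a check might look like on a single grid of scale questions: the function names are made up for illustration, and the "Christmas tree" rule here (every answer exactly one step from the previous one) is an assumed working definition, not a standard one.

```python
def is_straightliner(answers):
    """True when every item in the grid received the same answer."""
    return len(answers) > 1 and len(set(answers)) == 1

def is_christmas_tree(answers):
    """Assumed definition: each answer is exactly one scale point away
    from the previous one, so the clicks draw a staircase or zigzag."""
    steps = [b - a for a, b in zip(answers, answers[1:])]
    return len(steps) >= 3 and all(abs(step) == 1 for step in steps)

# Hypothetical answers on a 1-5 agreement scale.
print(is_straightliner([3, 3, 3, 3, 3, 3]))         # True
print(is_christmas_tree([1, 2, 3, 4, 5, 4, 3, 2]))  # True
print(is_christmas_tree([4, 2, 5, 1, 4, 2]))        # False
```

With balanced statements a straightliner ends up agreeing with both a statement and its opposite, which is why the pattern is worth flagging rather than just ugly.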
Use CAPTCHA and Flash elements in your surveys. CAPTCHA requires a human eye (for the most part), and Flash elements have to be clicked on; it’s not possible for JavaScript to inject a response. Anything that requires a human auto-magically becomes more costly to the person trying to cheat and therefore provides less of an incentive to farm for gold (YARRR!).
Panel Management
All panels should have some type of RDBMS 4 that allows managers to identify cheaters. Here are a few things that could be looked at on the panel side, all of which can be automated to blacklist respondents; it’s not profitable if every time a cheater does a survey they have to sign-up with a new email address.
If a respondent qualifies for everything, it’s more likely that they qualify for almost nothing. Take, for example, individuals who are 18-34 and have arthritis, diabetes, and sickle-cell anemia. If the incidence of this individual in the population is too small to believe, then why are you believing it?
Track demographic questions. A respondent is not very likely to go from making under $30,000 to $100,000 in a few months/weeks/<insert your survey lockout period here>. They are also not very likely to change gender, age group, ethnicity or number of children, within a similar time frame.
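A minimal sketch of that comparison between two profile snapshots follows. The field names, the income brackets, and the "more than one bracket" rule are all assumptions for illustration; a real panel would pull these from its own profile tables.

```python
# Fields assumed to stay fixed between two surveys inside the lockout period.
STABLE_FIELDS = ("gender", "age_group", "ethnicity", "children")
# Hypothetical income brackets, ordered from lowest to highest.
INCOME_BRACKETS = ["under_30k", "30k_60k", "60k_100k", "100k_plus"]

def suspicious_changes(previous, current):
    """List the profile fields whose change between snapshots looks implausible."""
    flags = [f for f in STABLE_FIELDS if previous.get(f) != current.get(f)]
    jump = abs(INCOME_BRACKETS.index(current["income"])
               - INCOME_BRACKETS.index(previous["income"]))
    if jump > 1:  # moving more than one bracket at once is treated as suspect
        flags.append("income")
    return flags

march = {"gender": "F", "age_group": "25-34", "ethnicity": "A",
         "children": 2, "income": "under_30k"}
april = dict(march, gender="M", income="100k_plus")
print(suspicious_changes(march, april))  # ['gender', 'income']
```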
Different respondents, same IP address. While there could be more than one person in the same household on the panel, when you start having two or more responses to a survey from the same IP, it should be questioned.
Comment: Rick Frank5, our fellow market research professional and owner of Dufferin Research6, points out that ISPs that supply dynamic IP addresses to their customers could create a problem with this method. If you are reviewing duplicate IP addresses over longer periods of time, we definitely agree. However, if you find that you have 4 responses from the same IP address in the space of a day, it’s very unlikely that the lease on that IP address has actually changed to another household.
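To keep the dynamic-IP caveat in view, the duplicate check can be limited to a short time window. The sketch below assumes responses arrive as `(respondent_id, ip, timestamp)` tuples; the function name, the one-day window, and the threshold are illustrative choices, not a prescribed method.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def shared_ips(responses, window=timedelta(days=1), min_respondents=2):
    """Map each IP to the respondents who used it within `window` of each other."""
    by_ip = defaultdict(list)
    for rid, ip, ts in responses:
        by_ip[ip].append((ts, rid))
    flagged = {}
    for ip, hits in by_ip.items():
        hits.sort()
        for i, (start, _) in enumerate(hits):
            # Distinct respondents seen within the window starting at this hit.
            ids = {rid for ts, rid in hits[i:] if ts - start <= window}
            if len(ids) >= min_respondents:
                flagged[ip] = sorted(ids)
                break
    return flagged

# Hypothetical responses; the addresses come from documentation-only ranges.
responses = [
    ("r1", "203.0.113.7", datetime(2024, 1, 5, 9, 0)),
    ("r2", "203.0.113.7", datetime(2024, 1, 5, 14, 30)),
    ("r3", "198.51.100.4", datetime(2024, 1, 5, 10, 0)),
    ("r4", "192.0.2.9", datetime(2024, 1, 1, 8, 0)),
    ("r5", "192.0.2.9", datetime(2024, 1, 31, 8, 0)),
]
print(shared_ips(responses))  # {'203.0.113.7': ['r1', 'r2']}
```

The pair on `192.0.2.9` is not flagged because the two responses are a month apart, which is the case where a recycled lease is a plausible explanation.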
IP addresses that do not match the geography of the individual in question. If you’re running a survey in North America and you’re receiving IP ranges from Madagascar – there’s a problem.
IP addresses that match known free proxy servers. It’s fairly easy to spoof an IP address through a proxy server, but there are lists out there and all it takes is a quick match query to find them.
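The "quick match query" could look something like this, using Python's standard `ipaddress` module. The proxy entries below are placeholders from documentation-only ranges, and the helper names are assumptions; a real list would come from whichever proxy list you trust.

```python
import ipaddress

def load_networks(entries):
    """Parse single addresses or CIDR ranges into network objects."""
    return [ipaddress.ip_network(entry) for entry in entries]

def matches_any(ip, networks):
    """True when `ip` falls inside any of the given networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)

# Placeholder proxy list: one single address and one range.
proxies = load_networks(["198.51.100.23", "203.0.113.0/24"])
print(matches_any("203.0.113.77", proxies))  # True
print(matches_any("192.0.2.1", proxies))     # False
```

The same membership test would serve for the geography check, assuming you have a list of IP ranges expected for the survey's region; no such list is supplied here.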
Look for bogus email addresses. A real person’s email address is unlikely to be a random string of letters or numbers; this goes for both the address and the domain.
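One crude way to approximate "random string" is to look at how many vowels and digits a name contains. The thresholds and function names below are guesses for illustration only, and a heuristic like this will misfire on some perfectly real names, so it is better suited to flagging addresses for review than to automatic blacklisting.

```python
VOWELS = set("aeiou")

def looks_random(token, min_vowel_share=0.2, max_digit_share=0.5):
    """Heuristic: mostly digits, or letters with almost no vowels."""
    chars = [c for c in token.lower() if c.isalnum()]
    if not chars:
        return True
    digits = sum(c.isdigit() for c in chars)
    if digits / len(chars) > max_digit_share:
        return True
    letters = [c for c in chars if c.isalpha()]
    vowels = sum(c in VOWELS for c in letters)
    return bool(letters) and vowels / len(letters) < min_vowel_share

def suspicious_email(address):
    """Check both the part before the @ and the domain name."""
    local, _, domain = address.partition("@")
    host = domain.rsplit(".", 1)[0]  # drop the top-level domain
    return looks_random(local) or looks_random(host)

print(suspicious_email("jane.smith@example.com"))  # False
print(suspicious_email("xkqzvbtr@gmail.com"))      # True
print(suspicious_email("bob@qwrtpsdf.net"))        # True
```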
Lastly, digital fingerprinting is becoming a possibility. It is not going to take long before someone comes out with an ActiveX control, Flash object, or some other type of software that panelists have to install on their computer to access surveys, and that verifies their identity (privacy concerns aside), blocks injected JavaScript, and so on.
There are numerous methods for identifying cheaters or farmers. How many of them panel suppliers actually use is an open question. The relatively swift movement to online research over the past few years has created many challenges, and clients are only beginning to ask the hard questions. Suppliers need to be prepared to answer them if they want to continue pulling in decent gross margins. The investment required is minimal in comparison to the benefit derived. With all of these methods available to panel suppliers, the propensity for gold farming is perhaps exaggerated. In our opinion, data quality has more to do with questionnaire design than perceptions of respondent “quality”. We definitely agree with our anonymous colleague7 that there’s a need to rethink the way things are done, in his or her words:
“If we want honest answers from real people, maybe we should rethink this entire insulting “we’ll pay you fifty cents to answer 120 repetitive questions about the minute differences between four brands of orange juice” business model.”
What methods are you using to manage data quality for online surveys?