Reader feedback mechanism

Ravpapa's survey is really interesting: I'm glad he did it.

It seems to me that any mechanism we use for gathering reader input will be gameable: that's inherent and unavoidable. But I don't believe we should let the perfect be the enemy of the good. I think there is real and obvious value in gathering assessments of the quality of specific articles: it's easy to imagine how we could use that information to drive quality up.

So ultimately -- if gathering reader feedback brings us 100 units of value, and 10 of those units need to be discarded as useless because they're tainted (e.g., feedback on controversial articles such as Creationism, Abortion, or Scientology), we would still be 90 units ahead of where we are today. Which would be great.

FWIW, in my past workplaces (mostly newsrooms) we have tended to use two kinds of quality assessment tools.

1) Quantitative data, collected via survey/questionnaire. The downside of survey/questionnaire data is that it yields non-expert assessments: basically, it gives you people's opinions. But it makes up in volume for what it lacks in depth -- it scales. You can gather data quickly if you have a large enough readership, and mostly the data is pretty good. You don't get nuance, but you get the basics. With quant/opinion data, you don't want to try to make the data do more than it's capable of. That means you would (a) aim to collect information that the general public can actually give you -- e.g., you would probably not ask highly specific editorial questions such as "are the sources high-quality?" or "are there sufficient citations?" -- and (b) interpret the data that comes in as opinion rather than fact -- e.g., "this article seems fair" rather than "this article is fair."

2) Qualitative assessments, generally collected via expert panel or focus group. These are higher-quality measures in that you are asking experts to make the assessment, but they don't scale, they are resource-intensive, and they are vulnerable to skew if the experts are biased. They are good for deep sampling (what Malcolm Gladwell calls thin-slicing). One example of this in our world would be the assessment of the German and English Wikipedia articles on mathematics, which I believe was presented at the German Wikipedia Academy in Berlin in 2008. In that assessment, a bilingual mathematics professor did a qualitative analysis of German and English Wikipedia math articles and made several meta-recommendations designed to advise Wikipedians on how to increase the quality of math articles generally. You can't get that kind of information from survey data.

Ultimately, I think we will need to use both types of measures, because they complement each other. We're using Rate This Article here on the strategy wiki (e.g., at the bottom of this page: http://strategy.wikimedia.org/wiki/Proposal:Volunteer_Toolkit), and I would like to see it, or some version of it, adopted for use across the projects. I don't know where we would begin with qualitative assessments: it strikes me they might be best done ad hoc by people involved with WikiProjects and/or chapters -- as was the case with the example I gave above.

Sue Gardner 20:44, 11 March 2010