Reader feedback mechanism

I just ran a reader survey on five articles that I wrote or had a hand in editing. I found the results extremely enlightening and helpful in my work as an editor, and also an important input to policy disputes in the project I am working on. You can see the results of the survey here.

I performed the survey by creating a survey form at www.surveymonkey.com and attaching a link at the top of each article (see, for example, this revision of one of the articles).

As an editor, I would love a tool that I could use to develop a survey with article-specific questions, attach it to the end of an article, and analyze the responses.

In numerous other forums, I have pointed out that Wikipedia is an editor-centric, rather than reader-centric, institution. All the mechanisms and rules of behavior are designed to foster cooperation within a community of editors. In this dynamic, the reader is most often shunted aside, so much so that, when I proposed my survey, there were editors who clearly didn't want to know what their readers were thinking.

This is something that has to change if Wikipedia is to move forward, and that change will occur only when features of the editing environment support the change. That is why I think a tool like this would be invaluable, not only to me but to the entire Wikipedia weltschaum.

Ravpapa 15:32, 8 March 2010

I worry about people gaming the survey. It's easier to measure editor responses and protect ourselves from sockpuppets and canvassing. But with readers, it's far easier to mount a campaign somewhere and just have a bunch of IP addresses add their input. Maybe it wouldn't be an issue on articles about classical music. But for political issues, or religious issues, or for promotion... you could see a whole mass of people trying to push the apparent quantity and quality of content towards their own agenda.

Randomran 16:00, 8 March 2010
 

I didn't envision this as something that would appear universally on all articles. Rather, an option that editors could use to get feedback from readers if they wanted it.

As for gaming and canvassing, it's hard for me to see how that would be a problem. If an editor saw that dozens of readers were signing up to urge him to support Obama, he could always ignore them, and shut down the survey.

I suggest that you take a look, if you haven't already, at the survey we did on the project, and see for yourself how powerful a tool it could be.

Ravpapa 17:05, 8 March 2010
 

Yeah, I agree it has the potential to do a lot of good. I wouldn't try to stop it from happening. The key would be finding ways to prevent its abuse.

Randomran 06:55, 9 March 2010
 

A really useful tool as a sensor, but as with any sensor, we should define what it's measuring and make sure the sensor isn't disturbing the measured system.

If you see a sudden peak in an article's daily visits right after you start a survey, then you have a problem.

KrebMarkt 22:09, 9 March 2010
 

That's a good point. I'm all for getting more reader feedback (and I like the survey idea), but checks like the one KrebMarkt mentions will help us contain the negative side effects and abuse.

Randomran 17:17, 10 March 2010
 

Ravpapa's survey is really interesting: I'm glad he did it.

It seems to me that any mechanism we use for gathering reader input will be gameable: that's inherent and unavoidable. But I don't believe we should let the perfect be the enemy of the good. I think there is real and obvious value in gathering assessments of quality WRT specific articles: it's easy to imagine how we could use that information to drive quality up.

So ultimately -- if gathering reader feedback brings us 100 units of value, but 10 of those units need to be discarded as useless because they're tainted (e.g., controversial articles such as Creationism, Abortion, Scientology) .... we would still be 90 units ahead of where we are today. Which would be great.

FWIW, in my past workplaces (mostly newsrooms) we have tended to use two kinds of quality assessment tools.

1) Quantitative data, collected via survey/questionnaire. The downside of survey/questionnaire data is that it yields non-expert assessments: basically, it gives you people's opinions. But it makes up for its lack of quality by giving you volume -- it scales. You can gather data quickly if you have a large enough readership, and mostly the data is pretty good. You don't get nuance, but you get the basics. With quant/opinion data, you don't want to try to make the data do more than it's capable of. Which means you would (a) aim to collect information that the general public can actually give you -- e.g., you would probably not ask highly specific editorial questions such as "are the sources high-quality" or "are there sufficient citations" -- and (b) interpret the data that comes in as opinion rather than fact -- e.g., "this article seems fair" rather than "this article is fair."

2) Qualitative assessments, generally collected via expert panel or focus group. These are higher-quality measures in that you are asking experts to make the assessment, but they don't scale, they are resource-intensive, and they are vulnerable to skew if the experts are biased. They are good for deep sampling (what Malcolm Gladwell calls thin-slicing). One example of this in our world would be the assessment done of the German and English Wikipedia articles on mathematics, which I believe was presented at the German Wikipedia Academy in Berlin in 2008. In that assessment, a bilingual mathematics professor did a qualitative analysis of German and English Wikipedia math articles, and made several meta-recommendations designed to advise Wikipedians on how to increase the quality of math articles generally. You can't get that kind of information from the survey data.

Ultimately, I think we will need to use both types of measures, because they complement each other. We're using Rate This Article here on the strategy wiki (e.g., at the bottom of this page: http://strategy.wikimedia.org/wiki/Proposal:Volunteer_Toolkit), and I would like to see it, or some version of it, adopted for usage across the projects. I don't know where we would begin with qualitative assessments: it strikes me they might be best done ad hoc by people involved with wiki-projects and/or chapters -- as was the case with the example I gave above.

Sue Gardner 20:44, 11 March 2010
 

Following up on my previous post.

Surveys around one article should be repeated to see if the results are reproducible.

Surveys across sets of articles (for example, the B-Class articles of the English Anime/Manga project) won't need to be repeated, because they are spread over a large number of articles, so errors and issues are more likely to stand out.

Things to take into account are:

  • How well the questions fit what is being evaluated
  • The influence of question ordering
  • External factors
KrebMarkt 21:26, 11 March 2010
 

Very interesting survey, Ravpapa. I think this type of information is very good to have. I do think that readers get less attention than editors when it comes to research, so I'm glad that there is some attention paid to this very important group. Reader feedback on articles is an interesting slice. I'd also like to get a better understanding of the reader -> editor transition. I've seen some academic research on this and we're also gaining some additional texture through the follow-up interviews with people who completed the Former Contributors Survey.

One thing we learned from the Former Contributors Survey is that even our casual users (editors in this case) really like to give us feedback. The overwhelming sentiment from that survey is that these users are a gold mine of information -- they just need to be asked.

Re: limitations of research -- as everyone knows, it's always tricky to find the balance between interpreting research too narrowly vs. too broadly. These types of discussions are exactly what we need to help us negotiate that line. Going forward, maybe we can start these conversations by asking "How can we use this information?" This perspective might help us figure out how to constructively use the data while at the same time keeping an eye on the inherent limitations. Sometimes the answer will be "we can't," which is fine. But other times, we may be able to find an appropriate application of the research that initially escaped notice.

A specific point about gaming -- I'm not sure if Survey Monkey has this capability, but Limesurvey enables you to restrict survey submissions to one per IP address. While this doesn't prevent the type of gaming described, it does make it a little more difficult.

Howief 01:09, 12 March 2010
 

I'm sure tools and means to make those surveys can be refined to limit abuse.

What I'm wary of is editors building a survey with a biased set of questions to push a PoV or further some other agenda.

KrebMarkt 07:14, 12 March 2010
 

I like that the survey is there as an option, and is only open for a limited duration. It makes it easier to tell if the feedback is from the usual set of readers, or if there is a sudden spike in activity aimed at putting undue influence on the survey. I wouldn't want to see a rating system that was there in perpetuity.

Randomran 13:29, 12 March 2010
 

Here is a "Point"-type set of questions:

  • Is this article informative?
  • Is this article interesting?
  • Should this article be kept?

@Randomran Surveys are meant to be of limited duration, or else they would be pointless: the article evolves, so what is being evaluated can have changed drastically between the start and the end of the survey if it takes too long.

KrebMarkt 14:30, 12 March 2010

I dunno -- I have a strong bias in favour of rate-this-article functionality on all articles in perpetuity :-)

Imagine all the many uses for that data. It would help us know which articles are poor, so editors could direct their energies towards them. Chapters could develop grant proposals for money to systematically increase quality in categories that rank particularly low. Editors who care about BLPs could identify the lowest-quality ones and dedicate sustained attention to fixing them. Professors could assign their classes to help clean up the 10 worst articles in their subject. If a particular language version was rated overall extremely high, other language versions could try to discern and emulate some of its practices. If a language version were rated particularly low, editors who speak that language could stage a quality improvement campaign helping it. Etc etc etc. The possibilities are endless :-)

Sue Gardner 23:37, 12 March 2010

I am afraid that getting a number of negative ratings on an article would be pretty demotivating for the principal editor of that article. It would definitely affect me that way, and it has happened to me in the past (for instance, when a bunch of trolls recently coordinated an attack on my FA, even though I knew they were trolls).

Yaroslav Blanter 18:59, 13 March 2010
 

Possibilities are endless, yeah :)

A permanent article rating feature is workable if we can track how the ratings evolve over time, so we can see the difference between earlier ratings and the most recent ones.
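
Bucketing the ratings by month would already show that evolution. A minimal sketch (Python, purely illustrative; the data shapes are an assumption, not a design):

    # Illustrative only: group ratings by month so earlier ratings can be
    # compared with the most recent ones as the article changes.
    from collections import defaultdict
    from statistics import mean

    def ratings_by_month(ratings):
        """ratings: iterable of (date, score) pairs -> {"YYYY-MM": average score}."""
        buckets = defaultdict(list)
        for when, score in ratings:
            buckets[when.strftime("%Y-%m")].append(score)
        return {month: round(mean(scores), 2) for month, scores in sorted(buckets.items())}

    # e.g. {"2010-01": 2.4, "2010-03": 3.8} would suggest the article has improved.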

KrebMarkt 09:20, 13 March 2010
 

I'm a little worried about permanent article rating... but here's a thought...

Most of the benefits of permanent article rating are to flag low-quality articles, not to identify high-quality ones. (Just what I'm gathering from looking at Sue's list.) But when it comes to exploits, I'm much more worried about people up-rating a biased article, and then using that as a reason to exclude new contributions ("you're lowering the quality"). I'm not as worried about people down-rating an article. What's the worst that happens? You force more people to give it their attention.

So instead of a permanent rating system, maybe what we need is a permanent "flag an issue" system? Something that helps us identify areas that need improvement, but cannot be used as a way to hold a bad article in its current form.

Randomran 16:12, 13 March 2010
 

I am glad there is interest in my survey. I think that if reader surveying is something we want to encourage, we need to make it part of the wiki environment - otherwise, editors won't do it.

This is not a complicated piece of software to develop, but it is also not a piece of cake. I would like to see a survey builder, which would include a set of standard questions (How old are you, how do you use Wikipedia, how frequently, and so on), a set of general topic questions (What is your relation to music/physics/Shakespearean drama?), and a set of article-related questions. Editors could build a survey by picking prewritten optional questions, and by creating their own specific questions (Do you think an infobox would enhance this article?). There would be two question types: short text answer and multiple choice (one choice only, with an option for a textual "other" response).

After building the survey, the editor would add a template to the article, with a parameter for the specific survey, and an end date, after which the survey automatically closes and the template switches to something like: "We did a survey of reader responses to this article. You are welcome to see the results here."

Survey results should be stored in an OpenDocs spreadsheet in some area in the Wiki, with an option for the surveytaker to create correlations and to add a textual analysis.
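
To make that concrete, here is a very rough sketch of what such a survey definition might look like (Python, purely illustrative; every name is invented, and this is not a spec):

    # Hypothetical data model for an editor-built article survey -- a sketch only.
    from dataclasses import dataclass, field
    from datetime import date
    from typing import List, Optional

    @dataclass
    class Question:
        text: str                  # e.g. "Do you think an infobox would enhance this article?"
        kind: str                  # "short_text" or "multiple_choice"
        choices: List[str] = field(default_factory=list)  # only used for multiple choice
        allow_other: bool = False  # free-text "other" option for multiple choice

    @dataclass
    class Survey:
        article: str               # the article the template points at
        end_date: date             # survey closes automatically after this date
        standard_questions: List[Question] = field(default_factory=list)  # age, usage, frequency...
        topic_questions: List[Question] = field(default_factory=list)     # relation to music/physics...
        article_questions: List[Question] = field(default_factory=list)   # editor-written questions

        def is_open(self, today: Optional[date] = None) -> bool:
            """When this turns False, the template would switch to a link to the results."""
            return (today or date.today()) <= self.end_date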

Is that a tall order?

Ravpapa 16:48, 14 March 2010
 

Heuuhh...

Creating the survey and results-reporting tools is not a big deal, and we agree that it can be a good thing. However, we are also not blind to how much that feature could be misused, even if the pluses outweigh the minuses.

So the question isn't whether to do it, but how to do it in a way that limits the possibilities of abuse & screw-ups.

KrebMarkt 16:55, 14 March 2010
 

Okay, okay. Just being a little impatient.

Ravpapa 17:13, 14 March 2010
 

We waited 9 years before Wikipedia created this place for brainstorming ;)

Besides, your idea caught the attention of Sue Gardner, someone from the Foundation, and she seems supportive of it.

If it can be done technically, and I think it can, the remaining hurdle is the modalities of implementation.

Surveys across a category won't be much of a problem, because the large size of the "experimental set" can balance out issues coming from a few of its articles. The issue is a survey on one article, which can turn awful if people start PoV pushing with it.

KrebMarkt 18:56, 14 March 2010
 

I think some kind of short-term survey is good, so long as people realize that the accuracy of the survey can be invalidated by a sudden spike in readership.

Randomran 01:14, 15 March 2010
 

There was a lot of negativity towards Ravpapa's methods over at EN:WP (see here), but I for one applaud him for his initiative and original thinking. As I say over there, we really do need "some official process whereby we can determine exactly who our readers are and solicit their opinions of the articles without them having to edit talkpages and us having to trawl those talk pages to collate the data. Some kind of WikiReaders forum perhaps?" It would seem that Ravpapa's basic idea is being considered favourably over here, which is great! Abuse? In what way? If people give their opinions then they give their opinions, and if they don't, they don't... I can't see much to go wrong there except perhaps some incivility, oversimplification or overcomplexification etc, things we have to deal with anyway. Especially if a survey is permanently linked to all articles. How about a general forum, though?

Jubileeclipman 00:32, 16 March 2010
 

Sue's idea of a universal "rate this article" type survey is intriguing, and I think it would be very valuable, but there are technical problems that would need to be worked out. Solving these technical problems would also resolve, to a large extent, the problem of ballot stuffing on controversial articles.

The main technical problem is that, if we want to collect statistics on users, and not just opinions (something I think would be necessary), we have to solve the problem of double counting respondents who rate more than one article. One way to do this would be to collect the URL of the respondent. Then, if the same URL responded to more than one survey, we could count the statistical data only once - or, we could check in real time whether the URL had already rated an article and, if so, not display the statistical questions. Either of these solutions involves technical complications.

Statistics on users would be important, not only for our general understanding of our readers, but also for understanding criticisms of articles. For example, we might find that high school students rate an article high, but readers with MAs and higher rate it low.

Collecting the URL of the respondent would also be a defense against ballot-stuffing for controversial articles. If the same URL tries to fill out a survey form for an article a second time, we can simply show him the same form he already filled out. This also enables him to access his answers, and change his mind about his rating of the article.
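
Roughly, those checks could work something like this (an illustrative sketch only: the storage and names are assumptions, and the address would be hashed rather than stored in the clear):

    # Sketch of per-respondent deduplication. Not a design -- just the logic.
    import hashlib

    profiles_seen = set()   # hashed addresses that have already answered the profile questions
    responses = {}          # (hashed address, article) -> previously submitted answers

    def respondent_key(address: str) -> str:
        return hashlib.sha256(address.encode()).hexdigest()

    def build_survey_view(address, article, all_questions, profile_questions):
        """Return (questions to show, previous answers if any) for this respondent."""
        key = respondent_key(address)
        previous = responses.get((key, article))
        if previous is not None:
            # Same respondent, same article: show the form already filled in, so they
            # can review and change their answers instead of voting twice.
            return all_questions, previous
        if key in profiles_seen:
            # Already profiled on another article: drop the statistical questions so
            # age/education/etc. are not counted twice in the sample.
            return [q for q in all_questions if q not in profile_questions], None
        return all_questions, None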

Another technical problem I foresee is that we would need to identify the articles being rated - not just the specific article, but also the projects and categories. This would enable projects to gather statistics on all their articles. The projects and categories to which an article belongs could, presumably, be collected from the articles.

Ravpapa 16:13, 16 March 2010
 

Ballot stuffing or astro-turfing could indeed be used to try to game the "rate this article" mechanism; however, I believe the appropriate response to this would be making the details of such votes open and available, so that editors can better analyse the responses and decide how much weight to give them - just as editors do with the votes of non-editors in deletion discussions.

Have the "Rate this article" mechanism at the bottom of every page. Include a box as well where people can leave a comment and a reference (automatically added to the talk page). Include a link (there at the bottom of the page) to the talk page so they can see other people's comments. Get the usability team to try out alternative formats for this, to see what layout gets the most comments and what layout gets the best comments.

Every now and then someone will start a campaign for people to leave comments on one of these pages, and we will give those comments appropriate weight. Meanwhile, on other pages, useful comments will point editors at issues needing attention, or good ratings will encourage the work.

Filceolaire 19:30, 16 March 2010
 

By "URL" I assume Ravpapa means "IP"? If so, that data is highly confidential unless the editor is anonymous. Only certain situations justify collecting IP addresses, e.g. sockpuppetry. Anyway, many users have dynamic IP addresses: they can just log out and log back in with a new number... This is often a problem for sockpuppet investigations, in fact, though comparative analysis can often resolve the issue.

Astroturfing is a problem in all "rate this" or "comment on that" discussions: it is par for the course, unfortunately. Openness and accountability would certainly help but we need to be clear from the off that meats and socks—as well as votestacking, forum-shopping, and other false consensus-building techniques described in [1]—are looked upon with extreme disapproval.

Jubileeclipman 19:12, 18 March 2010
 

Yes, I meant IP. And I certainly don't suggest that we should be storing IP addresses to make them available publicly. But I agree, there is sensitivity about this, and I also realize it is not foolproof.

The problem of ballot stuffing is not the primary problem that I was seeking to solve with this idea. The main problem is preventing the duplication of profiling information for statistical analysis. Consider the case where a reader replies to the survey for two separate articles. In all innocence, she replies to profiling questions (age, education level, profession, use of Wikipedia, or whatever else we are interested in asking) twice, thus skewing the sample.

An alternative way of handling this would be to ask the reader if she has already filled out profile information, and, if so, to remove those questions from the instance of the survey. But that means that we cannot correlate article ratings with profile information.

A third alternative would be to capture the IP address, but to store it for only a few hours or a day. In any case, its value as a method of avoiding duplication in the sample declines after a day, because of dynamic IP assignments.
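
That third alternative would need to keep very little. A sketch, again purely illustrative (the one-day window is an assumption):

    # Keep hashed addresses only long enough to catch same-day duplicates, then discard.
    import hashlib
    import time

    RETENTION_SECONDS = 24 * 60 * 60      # records expire after one day
    recent_respondents = {}               # hashed address -> time of last response

    def note_response(address: str) -> None:
        recent_respondents[hashlib.sha256(address.encode()).hexdigest()] = time.time()

    def seen_recently(address: str) -> bool:
        stamp = recent_respondents.get(hashlib.sha256(address.encode()).hexdigest())
        return stamp is not None and time.time() - stamp < RETENTION_SECONDS

    def purge_expired() -> None:
        cutoff = time.time() - RETENTION_SECONDS
        for key in [k for k, t in recent_respondents.items() if t < cutoff]:
            del recent_respondents[key]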

Ravpapa 19:53, 18 March 2010
 

The key to frowning upon votestacking and such is to be able to know when it is happening. Probably the safest test is if there is a sudden surge in traffic. If we have that covered, I'm okay with almost anything. But if we don't cover that, then I'd be adamantly opposed.
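
As a very rough illustration (the threshold is invented, not a proposal), the test could be as simple as comparing average daily visits before and during the survey:

    # Flag a survey whose article suddenly gets far more daily visits than usual,
    # which may mean the responses are being canvassed. Illustrative sketch only.
    from statistics import mean

    def traffic_spike(baseline_daily_visits, survey_daily_visits, factor=3.0):
        """True if visits during the survey far exceed the pre-survey baseline."""
        if not baseline_daily_visits or not survey_daily_visits:
            return False
        return mean(survey_daily_visits) > factor * mean(baseline_daily_visits)

    # A quiet article averaging ~200 visits/day that jumps to ~1,500/day right after
    # the survey opens would be flagged for a closer look.
    print(traffic_spike([190, 210, 205, 195], [1400, 1600, 1550]))   # True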

Randomran 23:05, 18 March 2010