Thoughts to date

What's come up strongly for me in the above threads:
We need to categorize the community and look at the position and needs of each group in relation to quality. Quality depends upon accepting that newcomers may need guidance (wizards etc.), won't know the norms, and will benefit from hand-holding. Experienced users are already involved; for them we need to find what demotivates and discourages them, and how to encourage uptake of other tasks too. Featured content editors often find themselves working alone on an article or in small, close groups. If consensus is to be a long-term fundamental of the wikis, then good, productive consensus-seeking mechanisms are needed.
Quality is not easily reduced to metrics, short of a direct poll of readers' views or some subset thereof. I'm not sure of any way round this. There's also a subtle issue - should we emphasize raising fewer articles to featured level, or should we emphasize the need for all articles to quickly reach some kind of "reasonable/good" level? Does the world assess us more by featured content (clear successes) or poor content (clear failures)?
The public assesses us on whether articles have obvious bias and lapses, reasonable style, and so on. The metrics should mirror what we want people to pay attention to. A crude metric might be based on simple things like tags (and their weight), or the words-to-citations ratio in each section. More sophisticated metrics might track articles at various stages of their life cycle: the proportion of new articles that have tags for major quality issues; how long those take to resolve; the proportion promoted to GA; how long that took; the number of articles reviewed for GA; and so on. (I choose GA as it seems a fair baseline for quality.) A major problem is that there isn't a way to capture reader concerns about articles, which would allow us to focus editorial attention there.
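The crude metric described above could be sketched roughly as follows. This is only an illustration: the tag names, weights, and the one-citation-per-150-words baseline are all invented for the example, not actual Wikipedia conventions or APIs.

```python
# Hypothetical sketch of a "crude quality metric": weight maintenance
# tags by severity and penalize a low words-to-citations ratio.
# All names, weights, and thresholds are invented for illustration.

TAG_WEIGHTS = {
    "citation needed": 1.0,
    "npov": 3.0,        # neutrality disputes weigh more heavily
    "unreferenced": 4.0,
    "cleanup": 0.5,
}

def crude_quality_score(word_count, citation_count, tags):
    """Return a 0-5 score: 5 = no issues, lower = more/heavier problems."""
    tag_penalty = sum(TAG_WEIGHTS.get(t, 0.5) for t in tags)
    # Expect at least one citation per ~150 words (arbitrary baseline).
    expected_cites = max(1, word_count // 150)
    cite_ratio = min(1.0, citation_count / expected_cites)
    raw = 5.0 * cite_ratio - tag_penalty
    return round(max(0.0, min(5.0, raw)), 1)

# A 900-word article with 6 citations and one minor tag scores well:
print(crude_quality_score(900, 6, ["citation needed"]))  # → 4.0
```

The point is not the exact formula but that something this simple is computable for every article, automatically, from data the wiki already has.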
  • Questions and issues:
  1. We don't know how to make a large community operate consensus effectively.
  2. We have specific tasks where we need to get more input, but in a volunteer community they don't get enough attention.
  3. We make specific demands of editors and have increasingly high expectations, but with mass editorship we need to learn how to better guide editors; the anarchic "go figure it out yourself" approach is limiting.
  4. We haven't worked out how to "lock in" good content, or review lost content. So articles erode unless curated by users who actively watch for poor edits - a watching process that is itself prone to OWNership, tendentiousness, attrition by POV warriors, and the usual editor departures.

Shortlist for possible 2 - 4 recommendations:

  1. Guiding and hand-holding newcomers: We need to guide newcomers better, and provide more for experienced users too, not just "figure it out yourself". For newcomers we need wizards, guidance, "question marks" you can click to see an explanatory popup of a key community term, and so on. Even simple things like automated user notes saying "Your edit to X got <reverted | questioned> because <issue from dropdown list>. You can fix this by..." or "It looks like you're adding a lot of text but you haven't included any citations" would all help.
  2. Recognition and enhancement for established users: A "Good Editor" standard, more insight into other wiki specialisms, masterclasses, better help in disputes (canonical example: a subject expert caught in an edit war), etc. Turning newcomers into established users is half the deal; keeping them enjoying their editing is the other half. We need to value them - and act like it, and show it!
  3. Reader feedback: We need to "complete the loop" by allowing review and feedback by readers, with a better way for editors to be notified of these, so editors know where to pay attention.
  4. Focus on the "low lying fruit": Focus on articles that are visibly substandard, as a starting point. Set defined quality baselines that articles should aim to reach within 1 - 3 months of creation, plus metrics and information that specifically focus on that gap (measured crudely via tags and clickable user feedback for issues). That's a useful standard for readers; it's much easier to measure; such issues are easy to fix; and this affects far more articles than any other likely targeted change. It's also easy to incentivize authors to fix such issues, and it feeds through to GA/FA articles and editors, so we need this anyway.
  5. Key tasks under-attended: We need to solve the problem that key tasks get too little attention; "drives" only go so far. I'm not sure how, though. I'd like to see some kind of simple "task manager" feed for the wikis to supplement patrolling, where simple matters (missing citations for facts, bias queries, etc.) all get added, and which users could filter based on their preferred areas, interests, or the articles they are reading/watching. ("This article has requests for help that match your filters, do you want to read them?" may get a much better response than a mere "citation needed".)
  6. Consensus mechanisms: We need to draw communities' attention to the overriding importance of deciding to improve their own consensus-building and difference-resolving methods, as a way to empower themselves to improve quality going forward.
  7. Quality lock-in and erosion: We need to find a better way to "lock in" quality against erosion - users who habitually watch articles but then leave, the long-term steady flow of poor/misinformed editors, deliberate edit warriors, people with long-term agendas, etc. Flagged revisions is one way, but a number of users object to it. How else can this be done?
  8. Automated crude quality ratings: If users can see that their article is rated at 3.2, and what's causing it to get that grade, they have a clearer incentive to get it to grade 4 or 5. Patrollers can also focus on poor articles to improve the long tail at the bottom, rather than just patrolling template categories. It doesn't have to be perfect, just reflect what we feel matters in articles, and how much it matters, as a spur to mediocre or poor articles and a way to inform and encourage their authors or readers. ("This article is only rated at 2.1 for quality. Click to see if you can help Wikipedia with any of these issues." Leveraging engagement is crucial.)
  9. Interwiki flow: We need to improve the ability of knowledge to flow between wikis. Being able to see the quality ratings of the same article on other wikis (not just their existence) is one possible way.
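The filterable "task manager" feed in item 5 might behave like the sketch below. Everything here - the Task structure, the field names, and the matching rules - is hypothetical, just to make the idea concrete.

```python
# Illustrative sketch of a "task manager" feed: open issues carry a
# type and topic, and a user's filters select the subset they care
# about. All structures and field names are hypothetical.

from dataclasses import dataclass

@dataclass
class Task:
    article: str
    issue: str      # e.g. "missing citation", "bias query"
    topic: str      # e.g. "history", "biology"

def matching_tasks(tasks, wanted_issues=None, wanted_topics=None, watched=None):
    """Return tasks matching a user's issue types, topics, or watchlist."""
    out = []
    for t in tasks:
        if watched and t.article in watched:
            out.append(t)       # always surface watched articles
            continue
        if wanted_issues and t.issue not in wanted_issues:
            continue
        if wanted_topics and t.topic not in wanted_topics:
            continue
        if wanted_issues or wanted_topics:
            out.append(t)
    return out

feed = [
    Task("Battle of Hastings", "missing citation", "history"),
    Task("Photosynthesis", "bias query", "biology"),
]
hits = matching_tasks(feed, wanted_issues={"missing citation"},
                      wanted_topics={"history"})
print([t.article for t in hits])  # → ['Battle of Hastings']
```

A reader who follows history articles would then see only citation requests in that area, rather than the whole backlog - the "requests for help that match your filters" prompt above.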

Ultimately a project like this is like shaping a river bed. What changes have the most significant effects long term? Which things most impede other changes?

Hopefully this may help focus the thread, or at least show where I am at the moment. If others add where they are (below), then we will have a good summary to continue with.

FT2 (Talk | email) 14:51, 26 November 2009

I think this is a good summary, and a good place to start. But I think it would be valuable to take a step back and get a better grip on the problem.

  • Why has quality not improved at the rate that we hope it would? What are the biggest barriers to quality on Wikipedia?

I think the task force has already touched on this a bit. "Teach newbies" is implicitly linked to the idea that the problem is a lack of skilled editors. "Improve consensus mechanisms" is linked to the earlier discussion that the decision-making process makes it hard to achieve quality. (And I agree that this last one is definitely a big problem.) And you're asking good questions like "what the heck does 'quality' even mean?" But it would be good to maybe go around the circle, and ask people why they think it's so hard to make Wikimedia projects into environments of quality.

The recommendations will be much more persuasive (and effective) if they're matched with big problems.

Randomran 17:55, 26 November 2009

"Does the world assess us more by featured content (clear successes) or poor content (clear failures)?" - I believe, based on my readings and informal interviews, it's the latter. Many readers don't really distinguish between GA or FA (in either case, they are better than anything else out there :D); they are rather annoyed/confused by poorly written (or non-existent) articles.

"We don't know how to make a large community operate consensus effectively." This needs to be researched more, but I think that usually consensus works well - as long as discussions are civil and good faith is assumed. It is when the civil atmosphere breaks down that problems appear.

I agree with your recommendations. In particular, "Recognition and enhancement for established users" - hear, hear. And in line with my opening paragraph in this reply, "Focus on the "low lying fruit"" is a very valid point (I'll make an amendment to my own earlier recommendations above based on that).

The only one I may slightly disagree with is "Quality lock-in and erosion". In my experience, most of my FAs that have been defeatured were not disrupted - our quality standards simply changed. That said, I think there is a danger that good content creators/patrollers may leave, and articles will deteriorate when left in the care of POV pushers. I am not, however, fond of flagged revisions as a solution (but I don't want to open that can of worms :>).

Piotrus 22:21, 26 November 2009

I agree on Wiki being judged by its lows, not its highs. The best way to improve quality universally seems to be recognized as reducing clearly substandard items. In customer service it's the adage about one unsatisfied customer telling 20 others. In card games it's "take care of your bottom (scores) and the tops will take care of themselves". In marketing it's called "managing expectations". McDonald's is loved because people get the expected baseline every time (even though it's basic), not because it sometimes serves Michelin-style cuisine in the hope that it makes up for the nauseating mess it served you last week. You get the idea :)

We probably have 2 million articles we could get pile-on help to improve to a recognized and reasonable quality baseline with ease. We have hundreds of thousands of members of the public who'd love us to make Wiki so smooth that they'd take to it like a fish to water, knowing that when they needed guidance or ideas on improvement, it was "just there". We can make huge strides by addressing the easy cases that make up the vast majority... and the beauty of it is, these are also things that are very amenable to automation (where assessing "brilliant prose" isn't), they scale, and they encourage incremental improvement in other ways too.

Make that the goal.

We do need to take care of established editors (further skills, new things to get into), and find ways to reduce the problems we have with low-quality editors... but GA/FA articles in themselves aren't really our priority or best focus at this time and level. Ruthless, but true. They'll be taken care of and flourish as a byproduct of other proposals' benefits, if we choose wisely.

FT2 (Talk | email) 23:50, 26 November 2009

Good point about judging Wikipedia by its lows, rather than its highs. If "FA" represents our highest quality standards, maybe what we need is a basic "safe enough to eat" quality standard. It's the difference between getting a stamp from the world's greatest food critic, versus a stamp from the FDA.

I'm not sure what that would look like in practice. If FAs are well researched with brilliant prose... maybe a "safe article" has at least the core concept and its importance verified, and no neutrality issues.

The question is what does the baseline get us. What do we do with articles that don't meet the baseline? What do we do with articles that inherently cannot meet the baseline?

Randomran 01:09, 27 November 2009

Yes, a "safe enough to eat" is exactly what's meant. Not necessarily "Good Article" but meets a set baseline for quality.

  1. What it gets us: It becomes much easier to drive or promote basic improvements, and it also educates the new users who write the articles. Hence it's likely to be an area where we can make good inroads, automatically, on a large scale, engaging the wider public, related to what critics most notice. Each of these is a distinct "positive".
  2. Example criteria (random ideas): Decent feedback from at least 30 readers... a sufficient word:citation ratio in each section... fewer than X tags per 1,000 words for major issues (citation needed, sourcing, NPOV, etc.)... no major article-quality tags in the last X days... Y thousand page views (to ensure sufficient eyeballs for a good chance that errors were noticed). Borrow ideas from the Good Article criteria and figure out which are essential baselines, which we need but not as strictly, and which we can somehow approximate by automation.
  3. What we do with sub-baseline content: We set up automated systems, "Help fix this!" buttons when someone views an article, feeds for individual substandard issues, a "Fix a random issue" button, everything we darn well can, and drive like hell the message that EVERYONE can help fix basics in articles - readers, people who've never used Wikipedia before, ANYONE. "You can look up a citation, here's how!" ... "You can check a fact, or whether a statement/section is fairly tagged, here's how!" ... "You just wrote an article, and I noticed some improvements that will help it stay on Wikipedia. Here are the top 2 items!" ... "This article has requests for help that match your filters, do you want to read them?" ... "This article is only rated at 2.1 for quality. Click to see if you can help Wikipedia with any of these issues" ... We push like hell, using automated methods, to make this kind of work automatic. That's what we do.
  4. Content that inherently fails the baseline will meet a brick wall, as usual.
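The example criteria in item 2 could combine into a single automated pass/fail check along these lines. All thresholds and field names below are invented placeholders for the X and Y values above, purely to show how the checks compose.

```python
# Sketch of a "safe enough to eat" baseline check built from the
# example criteria above. The thresholds (30 readers, 5 tags per
# 1000 words, 30 days, 5000 views) are invented placeholders.

def meets_baseline(article):
    """Return (passed, list_of_failed_criteria) for one article."""
    checks = {
        "enough reader feedback": article["feedback_count"] >= 30,
        "tag density ok": article["major_tags"] * 1000 / max(1, article["words"]) < 5,
        "no recent major tags": article["days_since_major_tag"] > 30,
        "enough page views": article["page_views"] >= 5000,
    }
    failures = [name for name, ok in checks.items() if not ok]
    return len(failures) == 0, failures

ok, why = meets_baseline({
    "feedback_count": 42, "major_tags": 2, "words": 1800,
    "days_since_major_tag": 90, "page_views": 12000,
})
print(ok, why)  # → True []
```

Because the check also returns *which* criteria failed, the same function could drive the "Help fix this!" prompts in item 3: each failed criterion becomes a concrete, actionable request shown to readers.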
FT2 (Talk | email) 02:07, 27 November 2009