Measuring quality (narrow focus)
- Themes and ideas from Archive 1:
- This is distinct from "obtaining feedback", although feedback is an important source of metrics. It's about how we ourselves can measure quality, quality metrics, and so on.
- Possible approaches:
- Several ways suggested to obtain useful metrics:
- Basic stuff such as tags, low levels of citing, etc can be automatically measured and evaluated.
- Specified quality criteria can be measured using automation. For example enwiki has various "automated peer review" systems for possible article issues and improving articles, including AndyZ's tool and the like.
- Stability is similarly measurable
- Creation of agreed standards (new article, baseline "good enough to eat" quality, good, featured) would mean that metrics for progress between these stages could be produced, such as time taken, common blocking issues, etc.
- User/reader feedback can be obtained, and profiled by user/reader topic knowledge.
- Woodwalker identified some 17 areas for metrics and similar measures in the thread Defining quality
We can also introduce a large number of quality levels (say 10: from a mini-stub to a featured article) and ask the Wiki-projects to grade it. The quality label should not be confused with the importance label.
Can we meaningfully define quality levels between "baseline quality" ("Good enough to eat") and "Good" articles? Not easy...? I suppose it might be up to the community concerned to create quality levels. A smaller number could be less confusing.
Right now we (depending on the project) have five: candidate for deletion, stub, ordinary article, FA, GA. I believe that it is not so difficult to extend to 10 (for instance, introducing complete articles and articles with correct but incomplete information), but of course I do not find it important whether it is 6, 8 or 10. What I find important is that everybody understands the rules of the game.
I think a baseline for quality is that you can write a neutral, verified sentence about what the subject is, and why it is important. e.g.: "World War II, or the Second World War (often abbreviated WWII or WW2), was a global military conflict which involved most of the world's nations, including all great powers, organised into two opposing military alliances: the Allies and the Axis. The war involved the mobilisation of over 100 million military personnel, making it the most widespread war in history." (I know it would be easy to get that article up to much higher quality than just those two sentences, but just wanted to throw that out as an example.)
We have some quality standards related to flagged revisions and these may help in separating the baseline quality and substandard articles. On ru.wp (which is different from de.wp but it is closer to what will be implemented on en.wp) the standards to flag an article are:
- does not contain obviously wrong statements, obvious copypaste and obvious defects like broken templates; basically, has not been vandalized;
- does contain at least one category;
- does contain at least one internal link;
- is not a speedy deletion candidate;
- all other problems like for instance lack of interwiki or sources are clearly marked.
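Some of the flagging criteria above are mechanical enough to check automatically. The following is only a rough sketch under stated assumptions: it assumes MediaWiki-style wikitext, uses the English "Category" prefix (which differs on other projects such as ru.wp), and the speedy-deletion template names are placeholders, not the real templates of any wiki. The first and last criteria (no vandalism, other problems clearly marked) still need a human reviewer.

```python
import re

# Assumption: English namespace prefix; other projects use a localized name.
CATEGORY_RE = re.compile(r"\[\[\s*Category\s*:[^\]|]+(?:\|[^\]]*)?\]\]", re.IGNORECASE)
# Matches any [[target]] or [[target|label]] link and captures the target.
LINK_RE = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")
# Captures the name of each {{template}} call.
TEMPLATE_RE = re.compile(r"\{\{\s*([^|}]+)")


def mechanical_flag_checks(wikitext, speedy_templates=("db", "delete")):
    """Check the criteria that need no human judgement.

    `speedy_templates` holds hypothetical template names; each wiki
    would supply its own list.
    """
    targets = [m.group(1).strip() for m in LINK_RE.finditer(wikitext)]
    # Crude heuristic: treat any target containing ":" as namespaced
    # (Category:, File:, interwiki), so it does not count as internal.
    internal = [t for t in targets if ":" not in t]
    used = {name.strip().lower() for name in TEMPLATE_RE.findall(wikitext)}
    speedy = {name.lower() for name in speedy_templates}
    return {
        "has_category": bool(CATEGORY_RE.search(wikitext)),
        "has_internal_link": bool(internal),
        "not_speedy_candidate": not (used & speedy),
    }


if __name__ == "__main__":
    sample = "'''Foo''' is a kind of [[bar]].\n[[Category:Things]]"
    print(mechanical_flag_checks(sample))
```

The colon heuristic will wrongly skip article titles that contain a colon, so a real tool would need the wiki's actual namespace list.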
Drawing a line between a baseline quality article and a GA-level article can be more difficult, but I think it is clear that a two-sentence article is not a GA.
Yes. Baseline is about basics and about avoiding negatives. In principle it's "something we wouldn't be ashamed to show the public".
Agree with Randomran and Yaroslav Blanter that even one reasonable sentence with a category and minimal links etc, can be acceptable content.
The key is that the sentence has to be verified in something independent and reliable. And it can't be verifying the mere existence, or just anything.
"Tommy Smith was a World War 2 Veteran.(cited to a personal website) World War 2 was a global conflict that involved most of the world's nations, including two major alliances: the Axis and the Allies.(cited to a published history book)"
I'm digging into details here. But details are important.
Keep it simple for the purposes of a recommendation: the idea of a baseline (A.K.A. "fit to eat", "not something we'd be ashamed of") plus examples of what a baseline standard might be.
Ideally it should be some simple definition that most communities would agree, in principle, is a basic need, and with basic requirements that any article can probably reach given an hour or two's work. So that we can easily agree all articles should be of this baseline quality.
If articles are not created that way, they should be brought up to it very soon, or else be created in some "Draft:" namespace until they meet it.
Hmm, good point about simplicity. I actually think that "no original research", "neutral point of view", and "verifiability" offer a solid baseline. "What Wikipedia is not", as well. We wouldn't want to go much simpler than that, or else we really throw quality out the window. So it's really a question of translating those rules into a simple baseline standard.
Yaroslav is right that we probably need to throw in some positive things too, like having categories and wikilinks.
I am not exactly sure what we are discussing, but I think writing some guidelines should be relatively easy. I am more worried here about the systematic bias issues: for instance, coverage of Israel on ar.wp. But maybe we should just leave these issues as a separate point and not discuss them here (as well as problematic topics on major projects). I believe even without these problematic topics we cover 99% of all the articles. Problematic articles should be marked as such and treated manually.
Possible baseline (feel free to amend or edit):
- Inclusion/encyclopedic - Article is likely to be encyclopedic and/or meet any inclusion criteria (NOT, Notability, etc), and does not fall within any rapid deletion process for that wiki (eg "speedy deletion" or "prod").
- Neutrality - Article has been reviewed by an uninvolved user and appears reasonably neutral
- Where neutrality issues exist, appropriate action and tags are in place
- Original research, Verifiability, Copyright - Article appears to meet these criteria.
- Cites - Key and controversial statements cited, and cites checked. Unchecked statements and uncited issues are tagged.
- Tone/style - Article is in an encyclopedic tone, in reasonable and readable language, broken into reasonable sections with encyclopedic section structure if necessary, and promotional style material has been removed.
- Intro/summary - Contains a broad overview, or for articles with an introduction, the introduction provides a broad overview.
- Links, templates and categories - obvious internal links are linked; external links are appropriate; very obvious navigation templates are included; at least one category.
- Checklist of common issues - A checklist of common issues is reviewed and the article tagged if needed (eg reference improvement needed, limited geographic scope, missing perspectives, relevant WikiProjects, etc)
- Other concerns - Conflict of interest, controversial or complex topic, or other specialist issues, either cleared, or clearly tagged and flagged for attention
Most articles could be assessed by such a checklist in minutes, and (except where there is an editing dispute) these kinds of basic issues fixed or properly tagged (as "open issues for attention") within an hour.
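As an illustration only, the draft checklist could be recorded per article in a small structure like the one below. This is a sketch of one possible reading, in which an article meets baseline when every item is either cleared or tagged as an open issue for attention; the identifiers and the three status values are made up for the example and are not part of any existing tool.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    OK = "ok"          # reviewer found no issue
    TAGGED = "tagged"  # issue exists but is tagged for attention
    OPEN = "open"      # issue exists and is neither fixed nor tagged


# Item names follow the draft list above; the identifiers are illustrative.
BASELINE_ITEMS = (
    "inclusion",
    "neutrality",
    "or_verifiability_copyright",
    "cites",
    "tone_style",
    "intro_summary",
    "links_templates_categories",
    "common_issues_checklist",
    "other_concerns",
)


@dataclass
class BaselineReview:
    title: str
    results: dict = field(default_factory=dict)

    def record(self, item, status):
        """Store the reviewer's finding for one checklist item."""
        if item not in BASELINE_ITEMS:
            raise KeyError(f"unknown checklist item: {item}")
        self.results[item] = status

    def outstanding(self):
        """Items not yet reviewed, or reviewed and left open."""
        return [
            item for item in BASELINE_ITEMS
            if self.results.get(item, Status.OPEN) is Status.OPEN
        ]

    def meets_baseline(self):
        return not self.outstanding()


if __name__ == "__main__":
    review = BaselineReview("Example article")
    for item in BASELINE_ITEMS:
        review.record(item, Status.OK)
    review.record("neutrality", Status.TAGGED)
    print(review.meets_baseline(), review.outstanding())
```

Whether a tag alone should be enough for every item (for example, missing sources) is exactly the question raised in the replies that follow, so the rule in `meets_baseline` is a placeholder for whatever each community decides.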
In item 5 we obviously need to replace "English" with the language of the project.
The rest is pretty much reasonable, but it depends on what we call baseline quality. For instance, the two-sentence article cited above would not be tagged as a baseline quality article since it does not contain an introduction. Also, it does not contain sources, even though it is pretty much obvious where the sources could be found. We can basically decide whether, in this example, it is acceptable to tag the absence of sources and intro rather than to require a quality reviewer to add them him/herself.
Item 5 ("English") edited.
As for the 2 sentence article, a baseline quality article that is very short might not need a separate introduction. That's a matter for the local community.
Maybe we don't mind a baseline article being short, so long as it's decent quality. Or maybe this means there are two levels of quality we can distinguish: "baseline quality" (any length, even just 2 sentences, but has the key features as above) and then "expanded baseline article" (long enough to have sections and separate introduction). I think even the shortest and most obvious topic should have sources for its key facts, to satisfy baseline quality.
Who adds them and is tagging enough - separate question. I think tagging for sources is different from tagging for NPOV (which is why I put sources, verifiability, OR etc separate from NPOV). Users can sometimes argue for years about if it's neutral. So tagging and discussion may be reasonable. But it should be easy to require key or contentious facts to be sourced/verifiable/not OR. Hence why I categorized those two separately.
I obviously support two levels - baseline quality and expanded baseline quality
I believe the 17 points I identified in my essay are a more exact and complete list of quality requirements.
I have a problem with #2, on neutrality. The problem is, in articles that present highly specialized knowledge, an average or uninformed outsider cannot judge whether the article is neutral. At best, they can judge whether it has the appearance of neutrality. For anyone who has lived in the US, this is precisely why so many people make fun of FOX news and even some CNN or other cable news shows: they use certain techniques to provide the image of neutrality, but to anyone who knows the topic, it is not neutral.
To judge real neutrality, one has to know what is or is not a fringe view. Also, neutrality may not be achieved by presenting "both" sides, which is why, in many articles on complex topics, committed editors often have to explain to newcomers (at least, newcomers to the article) why a "criticisms" section would not be appropriate. Diverging views often do not fall along a one-dimensional axis. What is more important than providing a pro and a con side is providing multiple views that emerge and carry weight in particular contexts. This is common in the social sciences and humanities. When it comes to a host of social science issues, there is a popular debate over "nature versus nurture" and predictably people expect there to be three views (nature, nurture, or half and half). An uninvolved or typical user may see these three sides provided in the article on race and intelligence and confirm that the article is neutral. But to sociologists studying differences in average IQ scores between Blacks and Whites in the US, there are many debates, none of which have anything to do with nature versus nurture. Geneticists doing twin studies to calculate the heritability of intelligence also are debating a couple of hot issues that are really not well explained using terms like nature versus nurture. A sociologist or geneticist reading the article would see it as a typical article addressing common questions lay-people have, but doing nothing to educate the general public about current scientific research. You would need someone who knows something about sociology or genetics to say YES - this article is providing a neutral account of the different views of sociologists and geneticists on the issues they are debating.
As far as I am concerned this is a really serious issue, but it only concerns maybe 1% of the articles. For this 1% one does indeed need to invite experts etc., but for the purpose of tagging the quality I am inclined to say: ok, let us have a special tag "highly specialized article where the neutrality cannot be checked", and then apply this tag and draw the attention of the specialized project.
If we're going to distinguish between "baseline quality" and "featured quality", we're going to have to distinguish between "basically neutral" and "completely neutral".
A rigorous report might survey all the major economists on health care to come up with a fair weighting of perspectives. They would cover the different economic/ethical/political viewpoints, leave out some fringe views, and support it all with substantial data. If it was significant, they would give some weight to emerging perspectives, noting that it is a reasonable but frequently debated viewpoint. That's a completely neutral article. It achieves our highest standards of "NPOV".
That means that a basically neutral article might have systemic bias. It might be CNN's facile "here are two perspectives", with no supporting data to critically evaluate those perspectives, and no interest in a third or fourth perspective that doesn't fit neatly into the X vs Y storyline. "Liberals assert that public health care is the best way to cover everyone, but conservatives note that this would be costly." No economic studies about costs, economies of scale, pooled risk, monopsonies for pharmaceuticals, etc... If you're lucky, some anecdotes about a few people's bad experiences with different health care systems. I would never present this as an authoritative source about health care. But would it be basically neutral? Probably. (Unless someone cherrypicked CNN stories to prove a point).
I shudder to concede that it would be "basically neutral". But I think that's what we mean when we say "baseline standard". It's basically trustworthy, with a few asterisks* that would need to be expanded upon.
Concur with Yaroslav. This is a baseline intended to help improve the 2 million articles and establish an expectation that all articles quickly meet a baseline standard. The few articles where an average user cannot judge good baseline, are outliers (ie, exceptions, or "1%" as Yaroslav says). They aren't in any way the majority or even a large minority. They will surely need specialist editing and review as Slrubenstein says.
The aim here is to establish "should/must meet baseline" as THE expectation, for all content, to the point ideally that no editor would think of allowing articles to not meet that standard, like no usual editor would think 3RR is inactionable. A bright line.
The fact that the bright line will not sufficiently check 100% of articles but only 99% is not necessarily a problem for the time being. I think it was Philippe who said "perfect is the enemy of good". Once we get most content that way and the expectation that "all articles must be quickly made to meet baseline and kept that way", then we can look at what more (if anything) is needed for the exceptional cases.
@ Randomran: agree in principle. Perfect NPOV is not trivial because it means reviewing and knowing the field neutrally. But a basic level of neutrality should be attainable as a minimum, to the point where a reasonable editor peer reviewing the article feels it's not glaringly unbalanced. As you say, a can of worms, but the guiding principle is a good one and will help. We can discuss the rest in a few years' time, once that step's achieved :)