Proposal:Develop systems for accuracy review

The status of this proposal is:
'Doing...'

Summary

Create a MediaWiki extension which supports blind, double-blind, and similar systems of accuracy review. Test it on authentically controversial subjects, then measure and publish the results.

Proposal

Peer review is one of the ways that instructional content can be reviewed for accuracy. Accuracy standards are one of the elements which distinguish educational content from mere instructional content, so having the best systems to review content for accuracy is one of the ways we might meet the UK charitable status criterion of combining the collection of knowledge with education.

A MediaWiki extension meeting this need may require access to authentication, captcha, voting, and translation services. For example, authentication and captcha together could be used to increase confidence in the identity of the reviewer. A voting system should be able to prevent intentional or unintentional defects in reviews. One method of blinding a text would be to translate it so that reviewers familiar with a different set of languages could be asked to review it. If automatic translation is used, an automatic back-translation might be used to confirm that the initial translation is suitable for blind review.
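The back-translation check described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `translate` callable stands in for whatever translation service the extension would use (no particular service is implied), the function names are hypothetical, and word-level similarity with a 0.8 threshold is an arbitrary, crude proxy for whether meaning was preserved.

```python
from difflib import SequenceMatcher
from typing import Callable

# Hypothetical signature for a translation service: (text, from_lang, to_lang) -> text.
Translator = Callable[[str, str, str], str]


def back_translation_score(original: str, translate: Translator,
                           source_lang: str, target_lang: str) -> float:
    """Translate to target_lang and back, then return a 0..1 similarity
    between the original and the round-tripped text."""
    forward = translate(original, source_lang, target_lang)
    back = translate(forward, target_lang, source_lang)
    # Compare lowercased word sequences so capitalization differences
    # do not lower the score.
    original_words = original.lower().split()
    back_words = back.lower().split()
    return SequenceMatcher(None, original_words, back_words).ratio()


def suitable_for_blind_review(score: float, threshold: float = 0.8) -> bool:
    """Assumed rule of thumb: accept the translation if the round trip
    keeps at least `threshold` similarity. The threshold is a guess."""
    return score >= threshold
```

A real trial would probably want a semantic comparison rather than surface word overlap, since a good translation can legitimately reorder or rephrase the text.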

For example, an item for review could be a selection of text or a URL (to a permanent article version, a difference between versions, etc.). Multiple reviewers would be selected at random from among those who have stated familiarity with the topic area. Those reviewers could be shown the text or URL (perhaps as part of a list of such items) over a securely authenticated, captcha-protected channel. They would be asked to vote on the accuracy of the item and would have the opportunity to fully explain their votes in comments. If a statistically significant number of votes agree, the item could be approved as accurate or rejected. When the votes do not agree, one or more additional voters would perform a tie-breaking function. Each voter's track record of agreement with other voters could be recorded secretly and used to:

- weight their vote, nullifying defective voters, and/or
- decide whether the voter is allowed to become a tie-breaker.
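One way the voting step might work is sketched below. It is not a specification: the two-sided binomial test against coin-flip voting, the 0.05 significance level, the smoothed agreement weight, and the tie-breaker threshold are all assumptions chosen for illustration, and every function name is hypothetical.

```python
from math import comb


def binomial_p_value(agree: int, total: int) -> float:
    """Probability of at least `agree` matching votes out of `total`
    if every reviewer voted by a fair coin flip."""
    return sum(comb(total, k) for k in range(agree, total + 1)) / 2 ** total


def decide(votes: list[bool], alpha: float = 0.05) -> str:
    """Return 'approved', 'rejected', or 'tie-break' for a list of votes
    (True = accurate, False = inaccurate)."""
    if not votes:
        return "tie-break"
    yes = sum(votes)
    no = len(votes) - yes
    # Two-sided test: either outcome could have been the majority.
    p = min(1.0, 2 * binomial_p_value(max(yes, no), len(votes)))
    if p >= alpha:
        return "tie-break"  # not significant, so more voters are needed
    return "approved" if yes > no else "rejected"


def agreement_weight(agreed: int, reviewed: int) -> float:
    """Track-record weight: smoothed share of past items on which the
    reviewer agreed with the final outcome (0.5 with no history)."""
    return (agreed + 1) / (reviewed + 2)


def weighted_share_accurate(votes: list[bool], weights: list[float]) -> float:
    """Share of total weight that voted 'accurate'."""
    return sum(w for v, w in zip(votes, weights) if v) / sum(weights)


def may_break_ties(weight: float, minimum: float = 0.7) -> bool:
    """Assumed policy: only reviewers with a strong record break ties."""
    return weight >= minimum
```

How weighted votes should be combined with the significance test is left open here; the sketch keeps the two separate so that either can be replaced during the trials.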

The number of voters required to obtain a statistically significant result is likely proportional to the extent to which a subject has proven controversial in the past.
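The relationship between controversy and panel size can be illustrated with a rough power calculation. The model is an assumption made only for this sketch: each reviewer independently sides with the eventual majority with probability `agreement_rate` (lower for more controversial subjects), and a result counts as significant under a two-sided binomial test against coin-flip voting. The function names and default values are hypothetical.

```python
from math import comb
from typing import Optional


def tail_probability(k: int, n: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(k, n + 1))


def smallest_significant_majority(n: int, alpha: float = 0.05) -> Optional[int]:
    """Smallest majority out of n votes that is significant under a
    two-sided test against coin-flip voting, or None if none is."""
    for k in range(n // 2 + 1, n + 1):
        if min(1.0, 2 * tail_probability(k, n)) < alpha:
            return k
    return None


def voters_required(agreement_rate: float, alpha: float = 0.05,
                    power: float = 0.8, max_n: int = 500) -> Optional[int]:
    """Smallest panel size at which reviewers with the given agreement
    rate reach a significant majority with probability >= power.
    Returns None if no panel of up to max_n voters is large enough."""
    for n in range(1, max_n + 1):
        k = smallest_significant_majority(n, alpha)
        if k is not None and tail_probability(k, n, agreement_rate) >= power:
            return n
    return None
```

Under this model, a lower agreement rate yields a larger required panel; for instance, `voters_required(0.7)` is larger than `voters_required(0.95)`. Whether real reviewers behave like independent voters is one of the things the proposed trials would need to establish.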

Such a system should be tested on real-world controversial subjects, the results should be evaluated by a set of independent subject matter experts, and the results should be published so that the community can evaluate the utility of such systems. If the trials succeed, the system could replace http://en.wikipedia.org/wiki/Wikipedia:Third_opinion and similar pages on other projects as the first stop for any content dispute.

Motivation

Content disputes are an ongoing problem, diminishing wikilove and leading to behavior disputes in areas concerning politics, religion, and race.

Key Questions

Does the number of reviewers improve the accuracy of review? At what point do the returns diminish? What proportion of the reviewers is best to be credentialed in the subject matter of the review?
Which methods of authentication and data transmission are least susceptible to leakage across the masking barriers? Does HTTPS authentication provide an advantage in actual practice?
Which methods of voting are the easiest and most resistant to defective and strategic voting? What criteria are the most important for voting systems to meet?
In what other ways can we make accuracy review more resistant to defective manipulation?
In what other ways can we increase the accuracy of the reviews' outcomes?
How can review throughput best be increased?
Is this National Science Foundation grant an appropriate source of funding for this project?
How are commercial organizations approaching these problems? (e.g. [1] and its "review" sub-page)

Potential Costs

Rough estimate: fewer than 1,200 person-hours of coder and documentation specialist time, plus a similar amount of expert review time to prove the effectiveness of the solution.

References

Proposal:Assessment content
- GIFT microformat extensions for accuracy review

Community Discussion

Do you have a thought about this proposal? A suggestion? Discuss this proposal by going to Proposal Talk:Develop systems for accuracy review.

Want to work on this proposal?

James Salsman
Michael Spece