
Proposal:Have WMF involved in archiving online citations

From Strategic Planning
Status (see valid statuses)

The status of this proposal is:
Request for Discussion / Sign-Ups

This proposal is associated with the bolded strategic priorities below.

  1. Achieve continued growth in readership
  2. Focus on quality content
  3. Increase Participation
  4. Ensure that the underlying project infrastructure is secure, stable, and sufficient to guarantee the permanence of the projects and support ongoing growth.
  5. Encourage Innovation

Please note: This proposal appears to be a duplication of meta:WikiScholar. - 16:24, 8 February 2011 (UTC)


The Wikimedia Foundation should be involved in archiving online citations as this is a fundamental need of Wikipedia, the largest project of the WMF.


An RfC to use cached webpages at Wikiwix.com is currently being discussed. Other options are still actively under discussion - see w:WT:WikiProject External links/Webcitebot2


The English Wikipedia alone is estimated to contain on the order of 17 million external links. They are used pervasively for referencing, for further reading, and for countless other purposes; some Wikipedia readers have mentioned that they find the external links more useful than the articles themselves. Unfortunately, dead external links (see WP:LINKROT) in citation templates are a major problem on Wikipedia, affecting its reliability and validity. The Internet Archive estimates that the average lifespan of a link is no more than 2 years, so the number of dead links will grow significantly over the coming decades.

Many efforts are undertaken to combat this problem, but most of them rely on outside companies, non-profits, and other organizations that are not under our control, which is far from an ideal solution.

The Wikimedia Foundation should be actively involved with this issue in an effort to have more control over this problem rather than being at the mercy of other institutions.

Key Questions

Potential Costs

Initial financial cost estimates range from $1,600 to $15,000 for hardware and $70 to $500 per month for operations.
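To put the two ranges on a common footing, the figures above can be combined into a rough first-year total. This is only a back-of-the-envelope sketch: it assumes a 12-month period and that the low hardware figure pairs with the low operations figure (and likewise for the high figures), neither of which the estimate itself states.

```python
# Figures taken from the estimate above (US dollars).
HARDWARE_USD = (1_600, 15_000)          # one-time hardware cost: (low, high)
MONTHLY_OPERATIONS_USD = (70, 500)      # recurring monthly cost: (low, high)


def first_year_cost(hardware: int, monthly: int, months: int = 12) -> int:
    """One-time hardware cost plus recurring operations for `months` months."""
    return hardware + monthly * months


# Assumption: low pairs with low, high pairs with high.
low = first_year_cost(HARDWARE_USD[0], MONTHLY_OPERATIONS_USD[0])
high = first_year_cost(HARDWARE_USD[1], MONTHLY_OPERATIONS_USD[1])

print(f"First-year cost: ${low:,} to ${high:,}")
# prints "First-year cost: $2,440 to $21,000"
```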

Copyright issues will need to be addressed and will likely fall under fair-use provisions, similar to the situation with Google's cache, the Internet Archive, and WebCite (see the legal case Field v. Google).


Community Discussion

Do you have a thought about this proposal? A suggestion? Discuss this proposal by going to Proposal talk:Have WMF involved in archiving online citations.


The Internet Archive stops access to old archived pages when a robots exclusion (robots.txt) blocks access to the site. This means that when a web site dies and its domain is taken over, the old content can suddenly be removed by someone other than the original publisher. This happens quite frequently because the links to such sites have value for ads.

If we were involved in archiving we would be able to avoid this problem. However, we should probably still have a method for the original author of a site to redact content. Hopefully it would be used very little, but the reasoning at Archive-it should be inspected, and some sort of mechanism will almost certainly be needed. 22:27, 6 February 2011 (UTC)

Version archived

I think we should archive content as soon as possible after a new citation is found and then put a mark on the citation so the archive can be viewed. I believe this archive should not then be overwritten by later archives of the citation unless an editor explicitly requests a new archive somehow, and in that case it should still be possible to get back to the earlier archive if the later one was requested wrongly. 22:27, 6 February 2011 (UTC)
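The policy in the comment above could be sketched roughly as an append-only store, where the automatic capture only ever happens once per URL and an explicit request adds a new version without discarding the old one. This is a minimal illustration, not a proposed implementation: the names `Snapshot`, `CitationArchive`, `capture_on_first_citation` and `request_new_capture` are all hypothetical, and fetching the page itself is left out.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Snapshot:
    """One archived copy of a cited page (hypothetical structure)."""
    content: bytes
    captured_at: datetime
    requested_by: Optional[str]  # None means the automatic first capture


@dataclass
class CitationArchive:
    """Append-only history of snapshots per cited URL (hypothetical)."""
    snapshots: Dict[str, List[Snapshot]] = field(default_factory=dict)

    def capture_on_first_citation(self, url: str, content: bytes) -> bool:
        """Archive automatically, but only if the URL has no archive yet."""
        if url in self.snapshots:
            return False  # never overwrite an existing archive automatically
        now = datetime.now(timezone.utc)
        self.snapshots[url] = [Snapshot(content, now, None)]
        return True

    def request_new_capture(self, url: str, content: bytes, editor: str) -> int:
        """An editor explicitly asks for a new archive; older ones are kept."""
        history = self.snapshots.setdefault(url, [])
        history.append(Snapshot(content, datetime.now(timezone.utc), editor))
        return len(history) - 1  # version number of the new snapshot

    def get(self, url: str, version: int = -1) -> Snapshot:
        """Return a given version; the default is the most recent one."""
        return self.snapshots[url][version]


archive = CitationArchive()
url = "http://example.org/cited-page"  # placeholder URL
archive.capture_on_first_citation(url, b"original text")
archive.capture_on_first_citation(url, b"changed text")   # ignored
version = archive.request_new_capture(url, b"changed text", "ExampleEditor")
earlier = archive.get(url, 0)  # the first archive is still reachable
```

Because nothing is ever deleted, a wrongly requested re-archive can be undone simply by pointing the citation back at version 0.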

Web page capture failure

Some content might be difficult to capture. If the archiver doesn't capture what an editor wants captured, a backup plan might be for the editor to print the web page to PDF and upload that, with some annotation, to the archiver. 22:27, 6 February 2011 (UTC)

Want to work on this proposal?

  1. .. Sign your name here!