Content quality/ja

Trends in online content

While Wikimedia dominates the online reference space, it appears to be positioned between several important online content trends:

  • Organizations increasingly relying on social media to provide new ways for people to share real-time information and news [2]
    • 367K CNN iReports worldwide
    • Facebook now has 250M active members worldwide
    • Over 3M tweets are sent per day

  • Blogs becoming a mainstream source of opinion, news, and expert information
    • 77% of active Internet users read blogs
    • An average of 900K blog posts are published per day

  • Key players expanding the supply of free online books and published works
    • >7K free public domain books at Amazon Kindle store
    • Google Books will soon distribute work released with Creative Commons licenses

  • Increasing momentum behind open educational resources
    • California calls for the adoption of digital math and science textbooks
    • President Obama proposes investing in free online courses to improve community colleges

What do these trends mean for Wikimedia? What other online content trends should the strategic planning process take into account?

Recent related news

Here is a link to one of the many recent articles about Google Books and Google's latest moves towards expanding its digital library

What is Wikimedia's current position in this content landscape?

Data on the size and growth of Wikipedia content

The current number of articles available for a sample of Wikipedias can be seen here: [3]

Article growth over time for these same Wikipedias can be seen here: [4]

Based on the number of new articles per day, English Wikipedia’s content growth rate appears to have been slowing since 2007: [5]

Data on content breadth and composition

A study by researchers at PARC titled “What’s in Wikipedia: Mapping Topics and Conflict Using Socially Annotated Category Structure” brings some data to bear on the question of what information is actually contained in English Wikipedia’s 2.96M articles. They found information covering 22M categories, which can be grouped into 11 overall topics with the following distribution and growth (2006-2008):

Culture and the arts is not only the largest topic, at twice the size of the next largest, but has also seen the most growth since 2006.

Is this the same for other language Wikipedias? How much content sharing currently goes on (e.g. through translation)?

English Wikipedia’s 2.96M articles, and the fact that the PARC researchers found that content could be mapped to 22M categories, also speak to the breadth of content that mass collaboration has made possible. Information comparing Wikipedia’s content breadth to that of other encyclopedic projects can be found at size comparisons. Some of the comparisons that seem most relevant to English Wikipedia have been updated whenever possible and can be seen here:

Data on "vital articles"

Some people argue, however, that content breadth should not be the true goal of Wikipedia. Instead, Wikipedia should focus on creating and improving the quality of a smaller group of "vital articles" that every encyclopedia needs to have.

Members of the community have been working to create a list of 1,000 "essential articles", or "basic subjects for which Wikipedia should have a corresponding high quality article". The current list of those articles can be seen here.

Using a slightly different set of topics, here is one way to look at the distribution of these vital articles:

And here is the same list sorted by number of page views.

For analysis of the quality of these current articles, please visit the Quality factbase page.

Data on content usage/affinity

Overall page hits per day for the same sample group of Wikipedias can be seen here: [6]

Average page hits per article per day can then be calculated, as is done here:

However, a closer look at the top 1000 pages in English Wikipedia (by average page hits per day for 2009) shows that the top pages get a disproportionate share of page views, and starts to hint at what happens as you move down the content "tail". The top 1000 pages receive 5% of daily page views while representing significantly less than 1% of total pages. [7]

Note: "Special", "Portal", and "Wikipedia" pages (e.g. Main Page, Search, Citation Needed) have been removed from these calculations in order to focus on content that is being viewed. Obvious redirects to other sites (e.g. YouTube, Facebook, Twitter, and MySpace) have also been removed for the same reason.
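
To make the "head versus tail" point concrete, the short Python sketch below works through the arithmetic using only figures quoted on this page. It assumes that "total pages" is roughly the 2.96M article count cited for English Wikipedia; that equivalence is an assumption, not something this page states, and the function names are purely illustrative.

```python
# Figures quoted on this page. Treating "total pages" as the 2.96M
# article count is an ASSUMPTION made for illustration only.
TOTAL_ARTICLES = 2_960_000   # English Wikipedia articles cited on this page
TOP_PAGES = 1_000            # size of the "head" being examined
TOP_VIEW_SHARE = 0.05        # top 1000 pages receive 5% of daily page views


def page_share(top_pages: int, total_pages: int) -> float:
    """Fraction of all pages that the top pages represent."""
    return top_pages / total_pages


def concentration(view_share: float, pages_share: float) -> float:
    """How many times more views the top pages get than an even split would give."""
    return view_share / pages_share


share = page_share(TOP_PAGES, TOTAL_ARTICLES)
factor = concentration(TOP_VIEW_SHARE, share)
print(f"Top pages are {share:.4%} of all pages")
print(f"They receive about {factor:.0f}x their proportional share of views")
```

Under that assumption, the top 1000 pages make up roughly 0.03% of all pages, so a 5% share of views is on the order of 150 times what an even distribution would give them.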

For the following analysis, the top 100 pages (by average daily page hits) for a sample of language Wikipedias were assigned to a set of general categories, with the following results:

As a note, all "Special", "Search" and "Portal" pages were removed from the analysis in an attempt to isolate the actual content that users are viewing.

What options does Wikimedia have for extending the scope of its content?

A preliminary list of broad options includes:

  • Continuing to expand content breadth and diversity (within and across languages)
  • Expanding the depth of existing content
    • Expanding support for research on Wikiversity to include research that is not content-related
  • Expanding to different types of content (different forms of content, for different users)
    • Opening new communities where existing communities have chosen to limit their market, to capture other market segments (e.g. Wikibooks limits itself to textbooks)

What initiatives could Wikimedia consider to support this scope extension?

A preliminary list includes:

  • Content donations
  • Content partnerships (e.g. with content institutions or other online encyclopedias)
  • Providing incentives for the community to focus content creation efforts

What is the potential impact of these content initiatives?

  • Adding more content while the number of frequent editors (>100) stagnates worsens the articles-per-frequent-editor ratio ("AFE ratio"). The English language edition is a prominent example of the consequences: if the AFE ratio goes beyond a certain value, the community is no longer able to guarantee the reliability of the content (i.e. keep it free of vandalism), and the online ecosystem gets out of control.

The articles-per-frequent-editor ratio ("AFE ratio") answers the question: "How many articles have to be controlled by one core community member?"
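
As a rough illustration of the definition above, the ratio is simply the article count divided by the number of frequent editors. The Python sketch below uses the German article count quoted on this page; the frequent-editor figure is a hypothetical placeholder, since the real counts are still pending, and the function name `afe_ratio` is introduced here only for illustration.

```python
def afe_ratio(articles: int, frequent_editors: int) -> float:
    """Articles per frequent editor; smaller means less to watch per person."""
    if frequent_editors <= 0:
        raise ValueError("frequent_editors must be positive")
    return articles / frequent_editors


# Article count for the German edition, as quoted on this page.
DE_ARTICLES = 904_000
# HYPOTHETICAL placeholder: the real frequent-editor count is still pending.
DE_FREQUENT_EDITORS_PLACEHOLDER = 1_000

ratio = afe_ratio(DE_ARTICLES, DE_FREQUENT_EDITORS_PLACEHOLDER)
print(f"AFE ratio with placeholder editor count: {ratio:.0f} articles per editor")
```

Once real frequent-editor counts are available, they can replace the placeholder to compare language editions directly.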

Current AFE ratios (May 2009)

Language edition | Articles (ch>200) | Frequent editors (e>100) | AFE ratio (smaller=better)
en | pending | pending | pending
de | 904,000 | pending | pending
sv | 300,000 | pending | pending

  • Increasing the number of market segments the service addresses might enlarge the pool of editors, through greater market penetration, by capturing editors who are not currently attracted.

Looking for input

Content data

  • Amount of content by category and topic for other language Wikipedias
  • Page views by category and topic
  • Additional data on content depth and breadth (for Wikipedia and other Wikimedia projects)
  • Content sources, and changes over time

Content context

  • What are the most important trends in traditional and social media that have had, or could have, an impact (either positive or negative) on Wikimedia and the way that it approaches content?
  • What are other options for thinking about where Wikipedia falls in a broader content landscape?
  • Has any research been done on topic distribution in other language Wikipedias?
  • What are perceived (or known) to be Wikimedia’s areas of content strength and where is there the most room for improvement? How is this the same or different across language Wikipedias?

  1. Cross, Tom, "Puppy smoothies: Improving the reliability of open, collaborative wikis"
  2. Statistics from [1]
  4. Wikipedia:Statistics
  5. Wikipedia:Statistics
  6. [2]
  7. [3]