Expand reach within large, well-connected populations

From Strategic Planning

Summary Issue: Wikimedia is under-penetrated in China & India, two of the largest and fastest growing Internet regions of the world

China and India will drive future global Internet growth

There are currently ~1.5 billion Internet users globally. According to Forrester Research, 38% of them live in Asia,[1] compared with 26% in Europe and 17% in North America. Three of the top ten Internet user bases are in Asian countries: China has the highest number of Internet users worldwide, with approximately 290 million users, followed by the US (~220 million), Japan (~88 million), and India (~80 million).[2]

Over the next five years, the online population is projected to grow significantly, and that growth will come largely from outside Europe and North America. Forrester Research's Global Online Population Forecast, 2008 to 2013[3] projects that the number of people online will grow from 1.5 billion in 2008 to 2.2 billion in 2013, driven by growth in Asia. Within five years, therefore, Asia will represent 43% of all Internet users, while the shares of global Internet users in Europe and North America will shrink to 22% and 13%, respectively. The same research suggests that India will surpass Japan, a mature market, to become the third-largest Internet user base worldwide. Forrester Research predicts that China, India, and to a lesser extent Indonesia, Pakistan, and the Philippines will see annual growth of 10% to 20% over the next five years.[4]
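The Forrester figures above imply some simple arithmetic that is worth making explicit. The following Python sketch uses only the rounded numbers quoted in this section to derive the implied compound annual growth rate of the global online population (roughly 8% a year) and to convert the regional shares into approximate user counts. The function names are illustrative, and the outputs are back-of-the-envelope estimates, not Forrester's own published figures.

```python
# Sketch: derived arithmetic based on the rounded Forrester figures quoted above.
# Outputs are back-of-the-envelope estimates, not Forrester's published numbers.


def implied_annual_growth(start: float, end: float, years: int) -> float:
    """Compound annual growth rate that takes `start` to `end` over `years` years."""
    return (end / start) ** (1 / years) - 1


def users_by_region(total: float, shares: dict[str, float]) -> dict[str, float]:
    """Convert each region's share of the global online population into a user count."""
    return {region: total * share for region, share in shares.items()}


if __name__ == "__main__":
    # Global online population: 1.5 billion (2008) -> 2.2 billion (2013).
    cagr = implied_annual_growth(1.5e9, 2.2e9, years=5)
    print(f"Implied global growth: {cagr:.1%} per year")  # about 8% per year

    forecasts = [
        (2008, 1.5e9, {"Asia": 0.38, "Europe": 0.26, "North America": 0.17}),
        (2013, 2.2e9, {"Asia": 0.43, "Europe": 0.22, "North America": 0.13}),
    ]
    for year, total, shares in forecasts:
        for region, users in users_by_region(total, shares).items():
            print(f"{year} {region}: ~{users / 1e6:.0f}M users")
```

The regional shares do not sum to 100% because only three regions are listed, so the remaining users are simply not modelled here.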

Wikimedia's worldwide penetration

Wikimedia, and in particular Wikipedia, which accounts for 96% of all page views from more than 330 million monthly visitors, has grown exponentially since its inception eight years ago. With 271 language editions, Wikipedia currently reaches approximately 20% of Internet users worldwide, and its reach is truly global: more than 25% of the online population of Europe and North America use Wikipedia, while estimates for Asia, Latin America, and Africa and the Middle East range from 10% to 30% of the online population. However, Wikipedia's penetration is not uniform, and there are still large untapped populations of Internet users, most significantly in China and India, two of the largest and fastest-growing countries of the connected world, where Wikipedia's penetration is estimated at 1% and between 8% and 20%, respectively.[5] It is therefore a strategic growth imperative for Wikimedia to consider approaches to creating vibrant Wikimedia platforms and communities in these countries in order to reach their large and growing Internet-using populations.

Figure 1: Global penetration of Wikipedia amongst Internet users

Regions with the strongest Wikimedia penetration have robust Wikipedias

Wikimedia is strongest in regions of the world where there is a robust Wikipedia in the principal spoken language (i.e., a Wikipedia with 500,000 or more articles, such as English, German, French, Spanish, and Portuguese). For example, Wikimedia projects are used by as many as 40% of Internet users in Canada, 30% to 35% in European countries such as the UK, France, and Germany, 25% in the US, and approximately 20% in Russia and Latin American countries.[6]

Wikipedia has also developed successfully in countries with unique national languages, achieving over 20% penetration of Internet users in Finland, Japan, and parts of Southeast Asia. Although English fluency is high in many of these countries, in each case there is a growing article count in the national language, which is also the language of higher education.[7] Further, the Finnish, Japanese, Vietnamese, and Thai Wikipedias are examples of projects in which "useful" articles (those larger than 1.5 kB, approximately 150 to 250 words) make up 25% or more of the total article count.[8]
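The "useful" article measure can be expressed as a simple size filter. The sketch below is a hypothetical Python illustration, not the tooling used to produce the cited statistics: it counts articles larger than a size threshold and checks whether they reach the 25% benchmark mentioned above. It assumes that 1.5 kB means 1,500 bytes (the source does not specify), and the article sizes in the example are invented for illustration.

```python
# Hypothetical sketch of the "useful article" metric: an article counts as
# useful if it is larger than 1.5 kB. Interpreting 1.5 kB as 1,500 bytes is
# an assumption; the source does not say whether 1 kB is 1,000 or 1,024 bytes.

USEFUL_THRESHOLD_BYTES = 1_500  # assumed interpretation of "1.5 kB"


def useful_share(
    article_sizes: list[int], threshold: int = USEFUL_THRESHOLD_BYTES
) -> tuple[int, float]:
    """Return (number of useful articles, useful articles as a share of the total)."""
    if not article_sizes:
        return 0, 0.0
    useful = sum(1 for size in article_sizes if size > threshold)
    return useful, useful / len(article_sizes)


def meets_benchmark(article_sizes: list[int], benchmark: float = 0.25) -> bool:
    """True if useful articles make up at least `benchmark` of all articles."""
    return useful_share(article_sizes)[1] >= benchmark


if __name__ == "__main__":
    # Invented article sizes in bytes, for illustration only.
    sizes = [320, 900, 1_450, 1_800, 2_400, 5_000, 700, 12_000]
    count, share = useful_share(sizes)
    print(f"{count} useful articles ({share:.0%} of total)")
    print(f"25% benchmark met: {meets_benchmark(sizes)}")
```

Article size is only a proxy for usefulness; it is used here because it is the measure the cited tables rely on.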

Wikipedias in Chinese and Indian local languages are relatively small

While Wikipedia has become a significant resource for much of the online world, there are still regions where, even though local language Wikipedias exist, few people use or contribute to them, and those Wikipedias have relatively few substantive articles. Most notably, approximately 380 million people online across China and India today speak at least one language for which a Wikipedia exists, yet Wikipedia penetration among them is as low as 5%.[9] These include languages such as Mandarin, Hindi, and Bengali, which have large speaker populations (>50M speakers) and sizeable online populations (>15M Internet users) but relatively small Wikipedias:

  • ~43,500 "useful" articles in Mandarin (~17% of total Mandarin articles)[10]
  • ~3,500 "useful" articles in Hindi (<10% of total Hindi articles)[11]
  • ~1,700 "useful" articles in Bengali (<15% of total Bengali articles)[12]

If Wikipedia were to increase its penetration among these speakers from the current 5% to the 20% it reaches in many other parts of the world, another 57 million people would be using Wikipedia. By 2013, over 450 million Chinese and Indians may be online.[13] If Wikimedia were to achieve a 20% penetration rate in these countries, the result could be some 90 million Wikipedians across China and India.
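Both projections in the preceding paragraph come from multiplying an online population by a penetration rate. A minimal Python sketch of that arithmetic, using the rounded figures quoted in the text (the function names are illustrative):

```python
# Sketch of the penetration arithmetic above. Inputs are the rounded figures
# quoted in the text; the function names are illustrative only.


def users_at_rate(online_population: float, penetration: float) -> float:
    """Number of Wikipedia users implied by a given penetration rate."""
    return online_population * penetration


def additional_users(online_population: float, current: float, target: float) -> float:
    """Extra users gained by raising penetration from `current` to `target`."""
    return users_at_rate(online_population, target) - users_at_rate(online_population, current)


if __name__ == "__main__":
    # ~380M online speakers of languages with a Wikipedia; 5% -> 20% penetration.
    extra = additional_users(380e6, current=0.05, target=0.20)
    print(f"Additional users today: ~{extra / 1e6:.0f}M")  # ~57M

    # ~450M Chinese and Indians online by 2013, at 20% penetration.
    future = users_at_rate(450e6, 0.20)
    print(f"Users by 2013 at 20%: ~{future / 1e6:.0f}M")  # ~90M
```

Because the inputs are rounded estimates, the outputs should be read as orders of magnitude rather than forecasts.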

Task Forces

China Task Force

With an online population of ~298 million, China has the largest number of Internet users in the world. Yet Wikipedia penetration there is below 1%, despite a robust 39% increase in article count over the past year. Wikimedia faces some challenges in China that it has not faced to a significant degree in other countries.

  • There are significant business competitors to Wikimedia in China. zhWikipedia has ~250,000 total articles and ~43,500 "useful" articles, compared with the 3 million articles reported by Hudong[14] and the more than one million reported by Baidu. In June 2009, Baidu launched an aggressive plan to compile the largest digital rural encyclopedia in China, covering 80% of all villages in the country, by offering rural Chinese financial incentives to contribute.[15] Both Hudong and Baidu pay their contributors, and both draw on other websites to compile their encyclopedias.[16] As of May 2009, zhWikipedia had ~2,000 volunteer editors (editors with 5+ edits), and the growth rate of this group has fallen from 30% two years ago to 14% over the last year.[17]
  • Government controls on information may limit contribution to and usage of zhWikipedia. Because Wikimedia places no systematic controls on the content of zhWikipedia, contributors may be exposed to risk as the government changes its restrictions on information. In contrast, Baidu and Hudong automatically censor contributions and retroactively censor articles, removing information deemed too sensitive. This protects contributors and readers alike from activity that might be illegal.[18]
  • Further, the servers for both the English and Chinese Wikipedias are located outside China. As a result, many users in China are charged foreign-usage fees by university computer labs, connection speeds can be significantly slower, and the sites are more readily blocked.

Despite these challenges, China remains the greatest untapped region of the connected world for Wikimedia. An increase in penetration from the current <1% to 20% would result in approximately 53M more people using Wikimedia. To determine the best path forward in China, Wikimedia must assess resource requirements and balance its values against the opportunity. For this purpose, a task force will further investigate the opportunities and challenges of expansion in China.

See China Task Force for the list of critical questions associated with this Task Force, as well as specific supporting materials.

India Task Force

Across India there are approximately 80 million Internet users who speak a variety of languages. Although exact figures for the number of English speakers in India do not exist, there is reason to believe that a high percentage of these Internet users also study or work in English, as English is the principal language of elite high schools and post-secondary institutions, as well as the common language for professional advancement.[19] With Wikipedias in English as well as in all 15 of the top South Asian languages, the potential for these Internet users to become Wikipedians is high.

While Wikipedia use among these online populations is higher on average than among Chinese Internet users, there are still opportunities to expand Wikipedia penetration in India, where an estimated 8% to 20% of online Indians currently use Wikipedia.[20] The vast majority of these users use enWikipedia.[21] Because the South Asian language Wikipedias have relatively few substantial articles, it is unclear whether current enWikipedia users would prefer other language Wikipedias, or whether Wikimedia could increase its penetration in India if projects in other languages were more robust.

As previously noted, South Asian language Wikipedias have fewer "substantive" articles than either zhWikipedia or high-penetration native-language Wikipedias, both in absolute count and as a percentage of total articles. Tamil, Malayalam, and Hindi have the highest numbers of "useful" articles, with 4,359, 3,508, and 3,461 articles respectively, representing 23%, 33%, and 10% of total articles in those languages as of July 2009. These low counts may be due to the relatively small number of active editors (those who make 5+ edits). There are 537 active contributors across all 15 South Asian language Wikipedias (a figure that may include some double counting), compared with 1,947 active editors on zhWikipedia. While the Hindi and Punjabi Wikipedias have seen their editor bases grow by more than 50% in the past year, the editor bases of the other Indian language Wikipedias are growing more slowly, and in a few cases are actually declining.[22]
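Because the paragraph above gives both the number of "useful" articles and their share of each Wikipedia, the approximate total size of each Wikipedia can be backed out by division. The following Python sketch does this and also expresses the active-editor comparison as a ratio. Since the input percentages are rounded, the implied totals are rough estimates only; the names used are illustrative.

```python
# Back-of-the-envelope sketch: implied total article counts from the "useful"
# article counts and percentages quoted above (July 2009). The percentages
# are rounded, so the implied totals are rough estimates only.

USEFUL_ARTICLES = {
    # language: (useful article count, useful share of total articles)
    "Tamil": (4_359, 0.23),
    "Malayalam": (3_508, 0.33),
    "Hindi": (3_461, 0.10),
}


def implied_total(useful: int, share: float) -> int:
    """Estimate the total article count as useful / share, rounded to a whole article."""
    if share <= 0:
        raise ValueError("share must be positive")
    return round(useful / share)


if __name__ == "__main__":
    for language, (useful, share) in USEFUL_ARTICLES.items():
        print(f"{language}: ~{implied_total(useful, share):,} total articles")

    # Active editors (5+ edits): zhWikipedia vs. all 15 South Asian language
    # Wikipedias combined (the latter may include some double counting).
    print(f"zhWikipedia has ~{1_947 / 537:.1f}x as many active editors")
```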

A case study of the Tamil Wikipedia, together with a discussion amongst active editors of Indian language Wikipedias, surfaced several barriers to the development of robust South Asian language Wikipedias. These include low awareness of the new tools available for typing in the top South Asian language scripts, as well as the predominance of English as the language of elite education and professional advancement. Tools for typing in South Asian language scripts are often not supported by the older computer systems typically available in India. These problems are compounded by a lack of South Asian language editors with sufficient technical expertise to create bots and adapt tools for these languages. The scarcity of written sources in South Asian languages adds to the difficulty of creating and enhancing articles under current Wikimedia guidelines and rules.[23]

Collectively, growth in South Asian language Wikipedias combined with further enWikipedia penetration could increase Wikimedia's reach in India by tens of millions of users. A better understanding of the barriers to using both English and South Asian language Wikipedias is required to determine how best to increase penetration in India. To this end, a task force will investigate the opportunities and challenges of expansion in India.

See India Task Force for the list of critical questions associated with this Task Force, as well as specific supporting materials.

Additional information and resources

References

  1. Forrester Research, Global Online Population Forecast, 2008 to 2013; "India to have 3rd largest number of Internet users by 2013", India Express.
  2. Information on Wikipedia Internet penetration is based on data from two sources: ComScore and the International Telecommunication Union (2008).
  3. Forrester Research, Global Online Population Forecast, 2008 to 2013.
  4. Forrester Research, Global Online Population Forecast, 2008 to 2013; "India to have 3rd largest number of Internet users by 2013", India Express.
  5. For sourcing data and additional information on global penetration of Wikipedia see Wikimedia penetration#Wikimedia penetration by country
  6. For sourcing data and additional information on global penetration of Wikipedia see Wikimedia penetration#Wikimedia penetration by country
  7. For sourcing data and additional information on global penetration of Wikipedia see Wikimedia penetration#Wikimedia penetration by country. Approximately 25% of Internet users in Asia, excluding the populations of India, China, Japan, Taiwan, Australia, and Korea, use Wikipedia. This estimate was derived from data from the International Telecommunication Union and ComScore.
  8. For information on article count, article growth rates, and the number of articles greater than 1.5 kB in Southeast Asia see Southeast Asia#Southeast Asian languages and their Wikipedias; for information on Europe see Europe#National languages of Europe of between three and 28 million speakers and their Wikipedias; for information on Japan see East Asia#Table of Major East Asian languages and their Wikipedias; for discussion on why article size can be a good measure of usefulness see Talk:South Asia
  9. For sourcing data and additional information on global penetration of Wikipedia see Wikimedia penetration#Wikimedia penetration by country
  10. See China#Languages of China and their Wikipedias.
  11. For more information on South Asian language Wikipedias, see South Asia#Table of Major South Asian languages and their Wikipedias.
  12. For more information on South Asian language Wikipedias, see South Asia#Table of Major South Asian languages and their Wikipedias.
  13. Forrester Research, Global Online Population Forecast, 2008 to 2013; "India to have 3rd largest number of Internet users by 2013", India Express.
  14. "'Chinese Wikipedia' offers social networking too", Hudong online encyclopedia.
  15. Baidu encyclopedia.
  16. For a detailed analysis of the competition from Baidu and Hudong, see the interview with Wikimedia advisory board member Ting Chen, page 3 (Ting Chen interview).
  17. For more information on zhWikipedia, see China#Languages of China and their Wikipedias.
  18. For information on harassment of Wikipedians by the Chinese government, see the interview with advisory board member Ting Chen, page 2 (Ting Chen interview).
  19. See the interview with Wikipedia advisory board member Achal Prabhala, page 1 (Achal Prabhala interview).
  20. See Wikimedia penetration#Wikimedia penetration by country.
  21. This statement is based on a preliminary analysis of one day of page visits to the English and Indian language Wikipedias. Further analysis over a longer period is needed to confirm it.
  22. See South Asia#Table of Major South Asian languages and their Wikipedias.
  23. For the case study on the Tamil Wikipedia, see Tamil case study; for the discussion amongst Indian language editors, see Talk:South Asia.