Task force/Recommendations/Local Language


Important note about implementation of policies

The local language task force is of the opinion that policies that are good for larger Wikimedia projects are not necessarily good for smaller projects. On small Wikimedia projects the most important thing is to get past the inflection point where a stable community forms, and to give this community the means to grow in volume of content and editors. Policies that are meant to steer larger projects in a certain direction will effectively kill a smaller project if they are applied there. There has been inter-task-force communication with the quality task force, where the implementation of quality policies was discussed. The English Wikipedia started with much less emphasis on source referencing; a similar environment is necessary for any small Wikimedia project to take off. The degree to which different policies should be implemented on different projects should therefore probably be decided on a per-project basis, by each project's own community. It is also important to make the communities of the different projects aware of the need for different policies, so that smaller projects do not automatically follow the policies of the larger projects but make qualified judgements about which policies fit their own project at its point on the growth curve.

Strategy 1: Outreach

Who can play an essential role in this strategy?
WMF Chapters MediaWiki developers Editors Other volunteers Third parties
Coordination. Make outreach resonate with local culture. Create outreach material.


Outline

Question/Problem

The open content movement is young even in countries where Wikipedia has succeeded, and even more so in less developed countries. This, together with the view that "expert knowledge cannot be produced by non-experts", hinders development of local language Wikipedias. Fostering a more positive view of open content would help.

For Wikimedia projects to grow in local regions, it is first of all important to make people aware of the usefulness of such projects, and also aware of why they should contribute. The thoughts underlying the movement therefore need to be brought to the minds and lips of the people in these regions for the projects to grow there. Initial contact with the ideas that underlie movements such as open content and open source happens, for most people, through casual conversations. In cultures where these ideas are not well known or well understood, the chance that anyone would be introduced to them through a casual conversation is, however, very small. It cannot be expected that people in these regions will encounter the underlying ideas for the first time through written text, when this was not the way of initial exposure even in regions where Wikimedia projects have succeeded. Channels other than written text therefore also have to be considered for outreach to be successful.

Strategy

Video is an excellent outreach tool that can replace the role of casual conversation and also trigger casual conversations about ideas connected to Wikimedia. Videos that explain the underlying ideas behind the movement and motivate people to use and contribute to the projects would be of benefit. The messages conveyed in these videos must, in a short time, make the concepts and opportunities of Wikimedia-style open content clear. The videos can also be used to educate people about how they can benefit from the Wikimedia projects.

Above all, people want tools and information that are relevant to their lives: better ways to produce food, health information, home crafts to produce income, ways to improve their quality of life, education for their children, social knowledge, or empowerment among groups that have traditionally felt unable to improve their lives, and the ability to share and learn with other similar communities. These are worldwide messages, although the specifics will vary between country, culture and demographic group. If people are given a means to access useful information and are also shown how it can help them, the uptake in many countries and groups will likely be strong and enthusiastic.

One way this could be done is to make a video showing how these tools and capabilities improve (or can improve) lives in different countries, as something people are doing right now. If a video is made that shows this across a range of cultures and places, it can be dubbed and subtitled into as many languages as possible (perhaps with local chapter help) and launched on-line. A more extensive effort could make the videos local by producing several videos, one for each region, that fit the culture there. This could also be extended into a Wikimedia TV channel on-line and freely available DVDs off-line, making this information highly accessible and very visible.

Other possible aspects include addressing questions of importance in the specific region and having celebrities who are influential in a certain region present the message there. The underlying motivation to contribute should be connected with the cultural norms of the region; helping others is a norm that is influential in every part of the world, even though it might be formulated in different ways. Also ensure that the presentation is encouraging. Choosing the right soundtrack for the right audience could be essential to put people in the right mood to use and contribute to the Wikimedia projects. Local chapters should be consulted or involved in the localisation process, and may also have a role in getting the message across to local media, bloggers and NGOs.

Assertion: Wikimedia cannot attract readers and contributors if its projects and purpose are unknown

Strategy 2: Stimulate creation of local content

Who can play an essential role in this strategy?
WMF Chapters MediaWiki developers Editors Other volunteers Third parties
Provide information about what is of local interest. Automate collection of information. Create content.
  • Google.com
  • Yahoo.com
  • Alexa.com


Outline

Question/Problem

To increase the use of and participation in local Wikimedia projects, it is important that the projects provide material that is of importance to the people in that region. The material provided on e.g. the English Wikipedia might in some regions not be of much interest, even if it were translated into the local languages. Even so, increasing the topic coverage on sites such as the English Wikipedia could have some effect, as it could increase the use of Wikimedia projects by English speakers in a region, raising awareness of the usefulness of these projects and in turn stimulating use of the projects in the local languages as well.

To stimulate creation of material that is of interest in different regions, it is important to know what is currently happening in each country. For example, on Wikipedia, articles relating to research in that country could be highlighted, and on Wikinews, current news about that country could be given priority. Likewise, on Wikibooks, Wikispecies and the other Wikimedia projects, recent developments in that particular country relating to the particular project should be given priority.

Strategy

Automatically collect data from the Wikimedia projects' traffic logs, search engines, Alexa and so on about what people in different regions want to read about. Then present the regional data, easily accessible from the region's main pages of the different Wikimedia projects, in a format that is easy to read and understand. The statistics could be collected in different categories such as current research, news, hot topics (topics that see a rise in interest for a couple of days, weeks or months, e.g. a wedding of important persons in the region or a season-long TV show), cultural interests (e.g. dance might have a higher number of searches in some regions than in others) and so on. Further encourage creation and expansion of articles about topics that occur on the list. It would be good if topics that already have articles of reasonably high quality could be filtered out, so that the list only contains topics that are not well covered and need to be improved; a sketch of such a filter is given below.
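To illustrate, here is a minimal Python sketch of such a filter. The regional interest scores are assumed to come from traffic logs, search-engine statistics or Alexa (the input list below is entirely made up), and article length is used as a crude stand-in for article quality; the project URL and the length threshold are likewise only illustrative.

  # Minimal sketch: keep high-interest topics whose articles are short or missing.
  # The interest data is assumed to be pre-aggregated per region (hypothetical input).
  import requests

  API = "https://en.wikipedia.org/w/api.php"  # project of interest, illustrative choice

  def article_length(title):
      """Return the byte length of an article, or 0 if it does not exist."""
      r = requests.get(API, params={"action": "query", "titles": title,
                                    "prop": "info", "format": "json"}, timeout=30)
      page = next(iter(r.json()["query"]["pages"].values()))
      return page.get("length", 0)  # missing pages have no "length" field

  def topics_needing_work(regional_interest, min_length=10000):
      """Filter (topic, interest score) pairs down to poorly covered topics."""
      ranked = sorted(regional_interest, key=lambda pair: -pair[1])
      return [(topic, score) for topic, score in ranked
              if article_length(topic) < min_length]

  # Hypothetical interest scores for one region:
  print(topics_needing_work([("Lake Victoria", 950), ("Eiffel Tower", 120)]))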

Also make sure that topics of higher interest in the given region are highlighted on the main pages of the different Wikimedia projects. Even if there is a very well written article about Denmark in Tajik, there might be other articles that are better suited to be featured on the Tajik main page.

Assertion: The major Wikipedias lack content that is of interest to people in many regions.

Surface of the Earth at night
Density of geotagged articles
Density of named places in www.geonames.org

People in different regions are interested in information of different character. An article about the Eiffel Tower or a European politician is of more interest to a European reader, while an article on Lake Victoria or an African politician can be of greater interest to an African reader. At the moment there is a lack of content that is of interest to people in regions that have seen little growth of their own projects.

Some comments from an article at floatingsheep [1] highlight these issues:

  • "Many Wikipedia articles (about half a million) are either about a place or an event that occurred within a place, and most of these geographic articles handily contain a set of coordinates that can be imported into mapping software."
  • "The country with the most articles is the United States (almost 90,000 articles), while most small island nations and city states have less than 100 articles."
  • "Almost all of Africa is poorly represented in Wikipedia".
  • "[T]here are more Wikipedia articles written about Antarctica than all but one of the fifty-three countries in Africa".
  • "[T]here are more Wikipedia articles written about the fictional places of Middle Earth and Discworld than about many countries in Africa".

The three images show the surface of the Earth at night, the density of geotagged articles about a region, and the density of named places in www.geonames.org, respectively.

Comparing the first two images, one finds a very high correlation between the density of geotagged articles and electric light. Comparing the second and third images, it is also easy to see the large potential for extending coverage.
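As an illustration of how such coverage gaps could be located automatically, the sketch below compares named places from www.geonames.org with geotagged Wikipedia articles. It assumes the public GeoNames search web service and the MediaWiki geosearch module (part of the GeoData extension); the parameter names, the 10 km radius and the "demo" username are assumptions that should be verified before use.

  # Rough sketch: list populated places in one country that have no nearby
  # geotagged Wikipedia article. Endpoints and parameters are assumptions.
  import requests

  GEONAMES = "http://api.geonames.org/searchJSON"
  WIKI_API = "https://en.wikipedia.org/w/api.php"

  def named_places(country_code, limit=20, username="demo"):
      """Populated places listed in geonames.org for one country."""
      r = requests.get(GEONAMES, params={"country": country_code, "featureClass": "P",
                                         "maxRows": limit, "username": username},
                       timeout=30)
      return r.json().get("geonames", [])

  def has_geotagged_article(lat, lon, radius_m=10000):
      """True if any Wikipedia article is geotagged within radius_m of the point."""
      r = requests.get(WIKI_API, params={"action": "query", "list": "geosearch",
                                         "gscoord": f"{lat}|{lon}", "gsradius": radius_m,
                                         "gslimit": 1, "format": "json"}, timeout=30)
      return bool(r.json().get("query", {}).get("geosearch"))

  # Example: places in Tanzania ("TZ") without a nearby geotagged article.
  for place in named_places("TZ"):
      if not has_geotagged_article(place["lat"], place["lng"]):
          print(place["name"])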

Strategy 3: Localization and internationalization of the MediaWiki software

Who can play an essential role in this strategy?
WMF Chapters MediaWiki developers Editors Other volunteers Third parties
  • Run campaign.
  • Provide monetary support.
Find local translators.
  • Localize the software; support all character sets and right to left script.
  • Internationalize the software.
Translators.
  • translatewiki.net
  • africanlocalization.net


Outline

Question/Problem

For local language projects to grow, it is important for editors to be able to interact with the MediaWiki software in their own language. It is also important that the software supports the characters of the local languages and that it supports right-to-left scripts.

Strategy

A three-stage process to get the localization done for all languages with one million or more native speakers is:

  1. Run a campaign to attract more volunteer translators to translatewiki.net.
  2. When the number of translated messages starts to plateau, put money into bounty rallies and pay-per-message solutions. (Estimated cost: $100,000 to get all messages translated with these methods alone.)
  3. Finally, if neither of these solutions is sufficient to translate the remaining messages, consider whether it is worth hiring professional translators to get the last messages translated. (Estimated cost: $1 million to get all messages translated with this method alone.)

To solve character set issues it would be a good idea to cooperate with already established open source communities that try to solve the same issues for other systems. More documents about African languages can be found at http://www.africanlocalisation.net/documents; the following document seems especially interesting: Characters needed for African orthographies in Latin writing system - http://www.africanlocalisation.net/content/characters-needed-African-orthographies-Latin-writing-system

The Task Force has not had time to research what the actual issues with right-to-left support are, but it is important to give right-to-left languages the same support as left-to-right languages.

Important note about paying for translation: Even though paying for translation might be an effective way of getting the translation done, it is important to realize that volunteers are likely to stop translating themselves if others get paid for the same work. In particular, hiring professional translators is likely to discourage volunteer translators. The bounty rallies and pay-per-message methods are less likely to discourage volunteers because everyone has a chance at a share, but it is still important to ensure that everyone has the same chance at that share. A monetary reward could also decrease intrinsic motivation, as explained in the Wikipedia article on the overjustification effect.

Important additional note: One more thing that has been brought forward is the necessity of developing and making available internationalization tools for media content. The Task Force has not had time to research what these issues are or how they can be solved, but one thing that has been mentioned is that SVG->PNG conversion gives strange results for some characters. Internationalizing the software is of course as important as localizing it.

Assertion: The MediaWiki software could be fully localized for all languages with one million or more native speakers for $1 million, $100,000, or cheaper.

The statistics on localisation of the MediaWiki system messages consist, as of the 25th of December, of a list of 323 different localisations. MediaWiki defines 362 localisations; some are excluded for various reasons, the most common being that the localisation definition was created for convenience or is still present for backward compatibility.

Among these languages the degree of localisation varies from 0% to 100%. The number of MediaWiki core non-optional system messages is 2,369, and the number of messages for extensions used by Wikimedia is 2,727 (as of 2009-12-25). The total number of messages to be translated is (2,369 + 2,727) * 323 = 1,646,008. The average localisation percentage of MediaWiki core is 46.87%; for MediaWiki extensions used by Wikimedia it is 20.61%.[2] This means that about 1,105,757 messages, or 67.1% of the total, have not been translated.
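As a quick sanity check, the totals above can be reproduced in a few lines of Python; all inputs are the figures quoted in this section.

  core_msgs, ext_msgs = 2369, 2727   # non-optional core / Wikimedia extension messages
  languages = 323                    # localisations with statistics
  total = (core_msgs + ext_msgs) * languages
  untranslated = (core_msgs * (1 - 0.4687) + ext_msgs * (1 - 0.2061)) * languages
  print(total, round(untranslated), round(untranslated / total * 100, 1))
  # -> 1646008, about 1.1 million untranslated, roughly 67% of the total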

Siebrand at translatewiki.net has estimated that a professional translator can translate 100 messages per hour. Translating the approximately 1,100,000 messages that are currently untranslated would therefore take about 11,000 hours. Counting the number of languages with more than one million native speakers on this list gives 275 such languages. Assuming that the percentage of untranslated messages is similar to that of the list of 323 languages, this means about 940,000 untranslated system messages. (The number of untranslated messages in these languages is probably lower than this, because it is the uppermost 275 languages that have been filtered out, which is likely to be reflected in a higher proportion of finished translations.) At the same translation speed as above, this means about 9,400 translation hours. There are several ways to get these messages translated.

  • Translators could be hired. An estimate from Siebrand is that the cost of hiring translators would be $85/hour plus 20% in overhead. This would mean that all the messages could be translated for 9,400 hours * $85/hour * 1.2 = $958,800, which is about 13% of the goal of this year's fund raiser. (The paid options are compared in a short calculation after this list.)
  • At translatewiki.net, translation rallies have been arranged where translators are awarded a share of €1,000 if they translate a certain minimum (500?) of system messages. This approach has resulted in a cost of about $0.08 per message. If it is possible to translate all 940,000 messages by this method with the same efficiency, the cost would be about $75,000, or, again compared to this year's fund raiser, 1% of the goal.
  • A third method could be to let translators register with a PayPal account and pay them $0.10 per message they translate. With the translation speed of 100 messages per hour that Siebrand has estimated, translators could earn about $10/hour. For translators in wealthy countries this is not a very high pay and could be seen more as an encouragement of volunteer work. In some less wealthy countries, probably including many that are currently under-localized, it would be a rather high hourly pay. This means that for quite well localized languages, where volunteering is more likely to happen, the money is more of an encouragement to do volunteer work, while for the less localized languages, where voluntary work is less likely to happen, it is more of an actual wage. It is, however, a problem how the quality of the translations can be assured with this method. To cover transaction fees and other overhead there could be a minimum threshold of translations that has to be reached before getting paid, with the first translations being unpaid to cover such expenses. The cost of this method would be $94,000, or about 1.3% of this year's fund raiser goal. The exact amount of $0.10 could be changed, giving another price.
  • A fourth method is to run a massive campaign on all Wikimedia projects that highlights the need for localization. Siebrand could work together with the WMF to arrange such a campaign. One way to run this campaign could be to use the fund raiser banner space to promote localization; local chapters could also be asked to promote localization. The cost of this method would be very low.
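The costs of the three paid options can be compared with a small calculation; all numbers are taken from the list above, and the 940,000-message figure is the estimate derived earlier (the campaign option is excluded since its cost is negligible).

  untranslated = 940_000          # estimated untranslated messages, languages > 1M speakers
  hours = untranslated / 100      # at the estimate of 100 messages per hour

  hired   = hours * 85 * 1.2      # professional translators, $85/hour plus 20% overhead
  rally   = untranslated * 0.08   # translation rallies, observed roughly $0.08 per message
  per_msg = untranslated * 0.10   # pay-per-message at $0.10 per message

  for name, cost in [("hired", hired), ("rally", rally), ("pay-per-message", per_msg)]:
      print(f"{name}: ${cost:,.0f}")
  # -> hired: $958,800   rally: $75,200   pay-per-message: $94,000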

Assertion: Many languages have character sets that differ from the Latin characters.

Latin characters are well supported by MediaWiki, but the character set of every language should be supported. For example, according to http://www.africanlocalisation.net/sites/default/files/AtypI08%20African%20fonts.pdf the African languages largely use Latin alphabets, but with a variety of character set extensions. Other scripts used in African languages include Arabic script, Ethiopic, Tifinagh, N'Ko, Vai, Kikakui, Bamum and Mandombe. There are probably even more character sets that need to be supported if one considers not only African languages.

Just adding together the numbers of speakers of the example languages in the link above that use extended Latin alphabets gives about 140 million speakers (slides 12-13), showing that coverage of different character sets is important.

Assertion: To solve character set issues it would be a good idea to cooperate with already established open source communities that try to solve the same issues for other systems.

ANLoc (http://africanlocalization.net) is one such community.

Some open source packages for African fonts are

  • Charis SIL and Doulos SIL
  • Gentium
  • DejaVu fonts
  • Liberation fonts (in progress)
  • Droid fonts (in progress)

Strategy 4: Minimize the bandwidth that is required to load pages

Who can play an essential role in this strategy?
WMF Chapters MediaWiki developers Editors Other volunteers Third parties
Infrastructure. Add "no media" feature to the software.


Outline

Question/Problem

Bandwidth differs significantly from region to region, and is a barrier to reach. If users within a region are not able to use any Wikimedia projects, be it their local Wikipedia or the English version, there will certainly not be any growth of the local projects from within that region.

This article contains a map that shows international bandwidth per capita as of 2005. In Europe and North America, where the growth of Wikimedia projects has been highest, the bandwidth is likewise high. In large parts of the Middle East, South Asia, and Africa the bandwidth per capita is less than 1 Mbps.

See also Regional bandwidth and this figure for more updated and detailed data.

Strategy

There are a couple of things that can be done to limit these problems:

  • Give visitors the opportunity to turn off automatic loading of media. A highly visible button labelled "Turn off media", or something more explanatory such as "Does Wikipedia load slowly? Click here to turn off media and decrease loading times.", could be displayed.
  • The local projects could be hosted locally to take advantage of the higher "within country" traffic speeds. (Be sure to catch the media's attention in case of such an action; if handled right, this could be an essential outreach move as well.)
  • Local mirrors or caches can, in the same way as local hosting, decrease loading times.

Note: There might be legal and technical problems with hosting, mirroring or caching Wikimedia projects outside the US that have to be considered. Caching could perhaps avoid some of the legal issues that hosting and mirroring present. Local hosting, mirroring and caching do not necessarily imply that this should be done in every country; it could, for example, be possible to find a country on each continent that complies with the legal framework that the WMF operates under. Such per-continent hosting, mirroring and caching would also increase access speeds.

Assertion: The amount of data that on average needs to be loaded when a new article is opened is estimated to be around 200 kB; in some cases more than 1 MB of data has to be loaded.

Using the Firefox extension Firebug, the amount of data that needed to be loaded when fully reloading the articles featured at en.wikipedia.org from the 1st to the 30th of December was around 340 kB, visiting the pages without being logged in. Moreover, even though no such large article was found in December, there was at least one in November for which 1.2 MB needed to be loaded, namely the article Wind, which was featured on the 18th of November.

Further, a random walk through Wikipedia articles, where a random blue link was followed from a featured article, then a new random blue link from that article, and so on, seemed to indicate that the amount of data that needed to be loaded was mostly in the interval 10 kB to 500 kB. This time no full reloading was done, in order to find out how much data has to be loaded when a random new article is opened but cached material is reused. Most of the material that was loaded came, according to Firebug, from upload.wikimedia.org; sometimes material from other destinations accounted for almost half the loaded data, but very often upload.wikimedia.org accounted for the most significant part. This indicates that when visiting Wikipedia, most of the data that is loaded is media.

Assertion: On 50kbps connections the average time it takes to load an article is estimated to be about 30 seconds.

Assume that 200 kB of data has to be loaded on average when visiting a new article, which seems reasonable from the random walk. With an Internet connection at 50 kbps it would, at full speed, take the user 200 * 8 / 50 = 32 seconds to load a new article. The featured article of the 18th of November would, under the same assumptions, take over three minutes to load at full speed.
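The same arithmetic for a few page sizes and connection speeds mentioned in this section (the 1 Mbps value is added only for comparison):

  def load_seconds(page_kb, speed_kbps):
      """Ideal full-speed transfer time for a page of page_kb kilobytes."""
      return page_kb * 8 / speed_kbps

  for page_kb in (200, 340, 1200):      # random-walk average, December featured, "Wind"
      for speed in (50, 114, 1000):     # slow connection, GPRS maximum, 1 Mbps
          print(f"{page_kb} kB at {speed} kbps: {load_seconds(page_kb, speed):.0f} s")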

Assertion: It is likely that many Internet connections in the near future will be 50 kbps or slower. Also, "within country" speeds are considerably higher than "outside country" speeds.

More interesting is the bandwidth per connection. Looking at "surf speed" statistics, the average "outside country" surf speeds for the countries in the list are very often 100 kbps or below, and values lower than 50 kbps are not uncommon. Most of the countries in that list are also quite well developed, which makes it very likely that Internet connections in less developed countries are even slower. The statistics do, however, also show that the "within country" speeds are often much higher than the "outside country" speeds.

With the large number of mobile phone subscriptions in developing countries, it is likely that many Internet connections in the near future will be through mobile technologies. GPRS has maximum connection speeds of 56-114 kbps, while 3G has a maximum of 14 Mbps down and 5.8 Mbps up. Because these are the maximum speeds of the mobile connections, it is likely that many will connect to the Internet through connections that are effectively slower than this.

Pages for collection and analysis of data

The following pages are used for collection and analysis of data related to the four recommendations above:

Additional thoughts from the planning process

Additional thoughts brought forward during the planning process can be found at Task force/Local language projects/Planning summary. These ideas can form a basis for any continued work on local language project related issues.