Which "local languages" are top priority and why?
For localisation the 'top priority', at least for me, from a translatewiki perspective, and in relation to my contract with the Wikimedia Foundation, are the 50 most spoken languages in the world. Recent weighted statistics are at translatewiki:Project:MediaWiki_localisation_in_the_50_most_spoken_languages.
Typically, languages spoken on multiple continents and languages spoken in Europe are doing great, with average scores of 97 or higher (100 is max.) in a Wikimedia context.
Much more worrying is the situation for languages from Asia (avg: 66) and Africa (avg: 46). Within these groups the languages worst off are Oriya, Zulu, Burmese, Hausa, Min Nan Chinese, Urdu, Wu Chinese and Sindhi, all with a score under 40.
The average localisation score in a Wikimedia context at the moment is 73, up from 68 a few months ago. My target is to get this to 83 by the end of Q3 2010. A lot of work must still be done for that.
I have secured some funds for myself to be able to devote a day a week to translatewiki.net and to give some bounties to translators to motivate them to periodically make an extra effort (by organising "Translation Rallies"). I am also currently looking into possibilities to work together with, for example, the Indian Wikimedians: making them responsible for progress in the localisation of the languages spoken in their country and providing some sort of reward for that (my current thought is a donation to the Indian Wikimedia Chapter, which has not been established yet).
I would also like to integrate the "meta translation community" with the "translatewiki.net community". Larger groups in localisation are able to provide more continuity and a higher quality.
Would you like to give some concrete advice and expectations of the work effort needed at Localisation, and also check whether my analysis in the second paragraph is realistic?
Done. I have corrected the number of translations per hour that can be expected, and you forgot about the extensions used by Wikimedia. I have rewritten the paragraph based on current statistics from translatewiki.net.
Thanks a lot! What is the expected price per hour, if we should even recommend that the WMF pay for getting this work done?
And do you have any statistics on what impact earlier localization efforts have had on the growth of Wikimedia projects? Any curves showing, for example, that the growth rate of a Wikipedia in some language increased soon after extensive localization was done for that language?
Unfortunately not. I have spoken about this with Erik Zachte in the past, and we have a clear picture of what we need to try and correlate the two, but the localisation data is not available in such a way that he can reliably add it to his traffic and growth statistics. That is something that is on my schedule for January. Then translatewiki.net will be spitting out a weighted localisation score per Wikimedia project language. Basically I have everything I need already, except for two things: a mapping of localisations in MediaWiki to Wikimedia projects, and averaging over those multiple localisations to one Wikimedia prefix.
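To make the two missing steps concrete, here is a minimal sketch of how they might look. Everything in it is an assumption for illustration: the sample scores, the weights, the mapping table and the function name are hypothetical and do not come from translatewiki.net.

```python
from collections import defaultdict

# Hypothetical sample data: (MediaWiki localisation code, score 0-100, weight).
# These numbers are invented for the example, not real statistics.
localisation_scores = [
    ("zh-hans", 98.0, 3.0),
    ("zh-hant", 90.0, 1.0),
    ("ha", 35.0, 1.0),
]

# Hypothetical mapping of MediaWiki localisations to Wikimedia project prefixes.
prefix_for_localisation = {"zh-hans": "zh", "zh-hant": "zh", "ha": "ha"}


def score_per_prefix(scores, mapping):
    """Average several localisations into one weighted score per prefix."""
    totals = defaultdict(lambda: [0.0, 0.0])  # prefix -> [weighted sum, weight sum]
    for code, score, weight in scores:
        prefix = mapping.get(code)
        if prefix is None:
            continue  # localisation without a Wikimedia project is skipped
        totals[prefix][0] += score * weight
        totals[prefix][1] += weight
    return {p: ws / w for p, (ws, w) in totals.items() if w > 0}


print(score_per_prefix(localisation_scores, prefix_for_localisation))
```

How the weights should be chosen (for instance by relative use of each variant) is left open here; the sketch only shows the mapping and the averaging.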
Is there any planned address where such information will be available in the future? And what do you think the cost would be if the translation were paid for? Are there also other models than hiring single translators, like the bounty rally you run on translatewiki.net right now?
- When available?
- Don't know. Depends on my making the data available and Erik Zachte integrating it in the statistics he publishes.
- Cost of hired translators?
- I would estimate around EUR 60/hour, and a 20% overhead.
- Alternative ways to get translations done?
- Well, at twn translations are done for free by translators. Sometimes we provide incentives, but I estimate that the number of translations we get during a rally only accounts for 10% or less of our total volume; it is more of a competitive/volume focused scenario that motivates a particular group of translators; most of them are also active outside of the 'rally periods'. Once all translation work is done for MediaWiki, it also becomes impossible to qualify for a cut of the rally stakes - which is a good thing. About 12 languages are so complete that their translators cannot qualify in the rally (which requires 500 new translations for MediaWiki). Other ways of motivating translators are regular newsletters by e-mail to keep reminding the registered translators to come back to translatewiki.net (with immediately visible results after sending out a mailing), chasing (potential) translators in their home (wiki) environments (this takes a lot of time) and regularly reporting on the localisation level of languages in the village pumps (this is an activity that GerardM performs on an almost monthly basis, which renders visible results every month). The last strategy that is used to keep localisation levels up is enforced by a policy of the Wikimedia Language Committee; they require a certain localisation level before a new language will get its own wiki project (there are also other requirements), and for new Wikimedia projects for languages with existing projects there is also such a requirement. If the language does not get enough support to meet the localisation requirements, no new projects are created. Other possible initiatives, which have not yet been undertaken as far as I know, would be to focus communication on the Wikimedia Chapters and ask them to actively recruit translators for the supported languages in their geographical area.
Personally I have made an effort towards Hausa language related mailing lists, and I'm still trying to get through to Indian Wikimedia users on the wikimediaindia-l mailing list, but that does not appear to be rendering much result, so I am currently exploring motivating them with a sum of money that will be awarded conditionally, but I have not yet secured funds for that.
Does that sufficiently answer your questions?
I am thinking about presenting different alternatives and cost estimates for getting the localization of the MediaWiki system messages done for the 275 languages with more than one million speakers.
- Hire translators for 9000 hours at an estimated cost of $85/hour and an overhead of 20%, which gives a cost of 9000 hours * $85/hour * 1.2 = $918,000. (That is 12.24% of this year's goal for the fundraiser.)
- Provide money for bounty rallies (are there any statistics on how many messages have been translated per dollar in such rallies before?)
- Encourage translation by paying $0.10 per translated message. It could increase the willingness of volunteers to translate, as they can earn about $10/hour with your estimate of translation speed. If translation is done by local people, this amount of money may for many of the languages even be a good salary. The cost with this method would be $90,000. (Which is 1.2% of this year's fundraiser.) Translators could register with PayPal accounts, and a minimum threshold of translations could be required before any payment is made, so that the first couple of translations could pay for transaction fees.
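The arithmetic behind these alternatives can be checked with a short sketch. The hours, rate, overhead, price per message and percentages are the figures quoted above; the rate of 100 messages per hour is not stated directly but is implied by "$10/hour" at $0.10 per message, and the function names are only illustrative.

```python
# Figures from the discussion above; names are illustrative only.
HOURS = 9000               # estimated translator hours
RATE_USD = 85              # estimated hourly rate for hired translators
OVERHEAD = 0.20            # 20% overhead
PRICE_PER_MESSAGE = 0.10   # proposed payment per translated message
MESSAGES_PER_HOUR = 100    # assumption: implied by $10/hour at $0.10/message


def hired_cost(hours, rate, overhead):
    """Cost of hiring translators, including overhead."""
    return hours * rate * (1 + overhead)


def per_message_cost(hours, messages_per_hour, price):
    """Cost of paying per translated message for the same volume of work."""
    return hours * messages_per_hour * price


def implied_goal(cost, share):
    """Fundraiser goal implied by a cost and its stated share of that goal."""
    return cost / share


hired = hired_cost(HOURS, RATE_USD, OVERHEAD)
per_message = per_message_cost(HOURS, MESSAGES_PER_HOUR, PRICE_PER_MESSAGE)

print("hired translators:", round(hired))
print("pay per message:  ", round(per_message))
# Both stated percentages should point at the same fundraiser goal.
print("implied goal:", round(implied_goal(hired, 0.1224)),
      round(implied_goal(per_message, 0.012)))
```

Both percentages imply the same fundraiser goal, so the two estimates are at least consistent with each other.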
The numbers sound about right. We have in the past had a Language project, funded by HIVOS through Stichting Open Progress, for under-resourced languages with over 100k speakers. We paid about 0.04-0.11 Euro per message in that, depending on the relative importance of the group the message was in. The critique I heard most from the professionals I talked to about it was that we didn't pay enough to attract what was potentially out there. But as I said elsewhere in this wiki, I am hesitant to allow people to make a month's living out of translating: it would/might cause (some of the) other translators to feel undervalued, possibly causing them to no longer translate. That is a very valid threat in the "pay translators" scenario (whether those paid are volunteers or professionals).
For the translation rally which has been held and analysed twice we had 35 and 36 qualifying translators. That means at least 18k translations for 1,000 Euro, so that is about 0.06 Euro per translation. It looks like the currently running rally will render a slightly lower result (I expect 25 to 30 qualifying translators - but we will see what this week's end result will be).
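As a quick check of that figure, here is a small sketch. It assumes that the 1,000 Euro is the stake for a single rally and that every qualifying translator contributed exactly the 500-translation minimum mentioned earlier, so the result is an upper bound on the cost per translation. The function name is illustrative.

```python
RALLY_BUDGET_EUR = 1000   # assumption: stake per rally
MIN_TRANSLATIONS = 500    # qualifying threshold per translator


def max_cost_per_translation(budget, qualifiers, threshold):
    """Upper bound on cost per translation.

    Each qualifier made at least `threshold` translations, so the real
    cost per translation can only be lower than this.
    """
    return budget / (qualifiers * threshold)


for qualifiers in (35, 36):
    cost = max_cost_per_translation(RALLY_BUDGET_EUR, qualifiers, MIN_TRANSLATIONS)
    print(qualifiers, "qualifiers:", round(cost, 3), "Euro per translation at most")
```

Translations by participants who did not reach the threshold are not counted here, which is another reason the real cost per translation is lower.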
Having a project paying translators for a defined subset of languages, time based, and not paying if the full commitment is not honoured, will work, I think. An issue I see is the validation of the work; how should we get that done?
In general I prefer the "lowly paid volunteer option", both for sustainability and feasibility reasons; I do not think there is any chance that the WMF will allocate 900k towards L10n, but I think it might allocate several tens of thousands of dollars towards it if there is a proper project plan. However, I think we could do something else first: really chase potential translators on their home wikis with a properly formulated pitch on why L10n is important and how they can help. That should keep us busy for a few months. Maybe we could even get the "global site notice" involved, like with the fundraisers, specifically targeted at wikis with a "Wikimedia L10n score" of under 85. This is something I could coordinate with the proper help from the WMF office. Only if we are not able to properly crowdsource, or maybe I should say only for those languages for which we prove unable to properly crowdsource, should we think about spending serious money on paying for localisation. The first step in the latter would be paying volunteers, and if that doesn't work, we should find professionals to do the job if the WMF thinks the language is important enough to spend the money on. How does that layered approach come across to you?
I understand that there is a valid threat that volunteer translators who are not able to benefit from the share might stop translating if others are paid for the same work. My idea, however, is to provide information about what the different methods would cost and then leave it to others to decide whether to proceed along any of these lines. I think the recommendation document could list a number of alternatives and expected costs, together with a warning about the potential threat that volunteer translators will quit if others are paid for the same work.
Is there a difference in the likelihood of volunteer translators quitting depending on the method? The first method would probably mean that a few translators are chosen and paid for doing the work, very likely discouraging volunteer translators from continuing their work. But would not the bounty rallies and pay per message methods be open to anyone to benefit from?
I have also worried about the risk of getting nonsense results if no validation is performed. How do you solve this in your bounty rallies? Is this not considered a problem because you have to have translator privileges to be able to translate?
I also wonder if you think that the bounty rallies are likely to be less and less effective in terms of messages/dollar, because the farther the process is taken, the fewer "easy" messages are left to be translated?
I think the idea to crowdsource is really good. Could we have a three step process:
- A global crowdsourcing campaign encouraging people to join in localization, including site notices and whatever else can be imagined.
- When the localization work starts to fall off, encourage people to continue and attract new translators by running bounty rallies and using a pay per translation system.
- Finally, for the messages that are left and don't get translated, decide whether any of them are worth translating and hire translators for this work.
Could the translatewiki.net site be customized to allow for all these possible stages? For example, enabling translators to add a PayPal account and automating transactions to translators?
- Is there a difference in the likelihood of volunteer translators quitting depending on the method?
- More professionals equals fewer to no volunteer translators for those particular languages, I would guess.
- Would not the bounty rallies and pay per message methods be open to anyone to benefit from?
- Even pay per message would require some commitment from the translator, IMNSHO, with regards to time and volume to keep things manageable.
- Rallies are likely to be less and less effective [..]. How do you solve this in your bounty rallies?
- I don't, as long as the results are still satisfying; I partially steer the results in the rules by demanding some groups to be translated before others, and in the currently running rally by adding a bonus for the two top contributors. If the results were expected to be too low in the current form, or if the latest rally does not render sufficient result, I would think up a way to get better results next time - basically trial and error.
- Is this not considered a problem because you have to have translator privileges to be able to translate?
- We give out translator privileges very easily and assume good faith until proven otherwise; it works quite well, indeed. When in doubt, we seek confirmation or another translator for the same language will notify staff, and we will take corrective measures.
- Would the less "easy" messages be left untranslated?
- See previous thoughts on steering and commitments in time and volume, but it is a threat, though minor.
- Possibilities for customising the site?
- Everything is possible; I would however request some WMF developer help and support on getting it done, as the WMF already has a lot of infrastructure in this area.