Local languages/Summary of language issues proposals

From Strategic Planning

This is an unfinished attempt at summarizing the language issues proposals. Please feel free to improve it.

Menu for navigation between different wikis

Proposal: Implement a menu to simplify navigation between different wikis, both between different languages and between different projects (e.g. Wikipedia, Wiktionary, ...). Because there are a lot of different projects and language versions, a smart dropdown box or similar navigation system would be needed. It could either let users include the wiki projects and languages of their choice, or display German and similar projects as soon as the user begins to type, for example, "GER..." or "Deu...".
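The type-ahead behaviour described in the proposal could be sketched roughly as follows. This is only an illustration: the `LANGUAGES` table and the `match_languages` function are hypothetical names invented for the example, and a real menu would draw on the full list of language editions.

```python
from typing import Dict, List, Tuple

# Hypothetical sample data: language code -> (English name, native name).
LANGUAGES: Dict[str, Tuple[str, str]] = {
    "de": ("German", "Deutsch"),
    "nl": ("Dutch", "Nederlands"),
    "da": ("Danish", "Dansk"),
    "el": ("Greek", "Ελληνικά"),
}


def match_languages(typed: str,
                    languages: Dict[str, Tuple[str, str]] = LANGUAGES) -> List[str]:
    """Return the codes whose code, English name or native name starts with `typed`."""
    prefix = typed.strip().casefold()
    if not prefix:
        return []  # nothing typed yet, so nothing to suggest
    hits = []
    for code, names in languages.items():
        candidates = (code,) + names
        # Case-insensitive prefix match against every known name of the language.
        if any(name.casefold().startswith(prefix) for name in candidates):
            hits.append(code)
    return sorted(hits)
```

With this sample table, typing "GER" or "Deu" both lead to the German edition, because the match is tried against the English name and the native name alike.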

Comments: The navigation already available on the main page www.wikipedia.org might provide this feature in a satisfactory way, and the whole of wikipedia.org can be searched with any search engine by typing site:wikipedia.org. Moreover, the proposal is not thought to be useful to many readers. Several projects already have this feature, but in Brya's experience it usually does not prove helpful, as links are made willy-nilly, regardless of whether there actually is useful content at the linked site. It just makes users feel they are in a huge, badly kept database instead of an encyclopedia. However, a shortcut key for switching to languages of one's own choice would be easy to implement, and the extra processing and storage costs would be virtually zero.

Links: Proposal:Change_languages_quickly


Unipedia

Proposal: One proposal says: "Create a version of Wikipedia which is to be developed for one language, translate all existing Wikipedia articles into this language and collect them on this new platform." Another proposal says: "Instead of having a wiki in every language there should be a translator bot that translates the posts into a universal language like English for storage on a server. When someone wants to view a page it is translated to their language and so are the edits they make. This would save a lot of space on your servers. However, on pages that must be localized this bot could be stopped. That way we could HAVE a "world in which every single person on the planet is given free access to the sum of all human knowledge" regardless of the language we speak."

Comments: Wikid77 opposes this idea: it is too difficult to make webpages seem well-balanced with logical titles across numerous languages, and there are inherent limits to the word-based lookup of articles by title. He says that he cannot emphasize enough the bias in titles: even within English, "American Civil War" versus "War of Northern Aggression" presents the same subject with differing priorities, the spoils of the victor versus the suffering of the losing parties in a war. He predicts extreme conflicts (of bias or prejudice) over which titles should be chosen when trying to homogenize all languages and cultures into a unified "unipedia". A unipedia would quickly become a "Tower of Babel", with users struggling to move forward while so many conflicting languages and cultures try to restructure the tower from their own viewpoints.

For automatic translation, the critique is that fully automated translation is a technology which has not yet arrived, nor will it in the near future. Machine-assisted translation is proposed instead, in which the computer translates only the parts it finds unambiguous, but no measure of the feasibility or usefulness of this is given.

One user is in favour of making use of computerised translation facilities, but is not sure that this should be done fully automatically without appropriate editing. His suggestion is that in the case of Wikipedia, a statistical approach could be implemented on the basis, say, of the priority and quality of articles (e.g. talk page criteria) as well as the number of accesses to articles. This would provide a means of deciding which articles seem to deserve inclusion, for example, in the English-language Wikipedia and which could be drawn from the English Wikipedia for inclusion in other languages. For the latter, articles in English about the country or countries in which a given language is spoken would merit special attention. The whole question of proper referencing could also be handled along these lines. This would mean that instead of finding a long string of articles in other languages on a given topic, those which have been given high marks for quality and priority would be distinguished from the others (perhaps by colour coding), and bilingual or multilingual editors would be encouraged to draw on computer translations as a means of creating or expanding articles in other languages.

Links: Proposal:Combination_of_all_languages_into_a_unipedia, Proposal:Content_translated_automatically


Moving articles between different wikis

Proposal: Make it easy to move articles between wikis. If an article is not good enough for Wikipedia but could be a good article on Wikiquote, there should be an option to move it there. It is annoying when you add an article and an admin removes it because this is not the right place for it.

Comments: The Transwiki process already does what this proposal suggests. It just needs to be made better known among editors, and especially among admins.

Links: Proposal:Easy_move_between_wikis

Make it possible to show articles or parts of articles from other languages

Proposal: Make it possible to show articles or parts of articles from other languages, either in original or machine-translated form, with a Wikipedia article. This could take several forms:

  • Enabling interwiki links from non-existing pages, perhaps also showing the page in one of the languages, or a machine translation of it
    Maybe with A central wiki for interlanguage links? See also Proposal:Content translated automatically.
  • Showing articles in other languages, in original or in translation, when the page is more extensive in that language, or when certain aspects of a subject are treated more extensively in some languages and less extensively or not at all in others
    Try and click ⇔ in the "In other languages" of Wikisource.
  • Using a tool like the previous one for editors, to find aspects of the subject that have been given relatively little attention, and possible sources for translation
  • Enabling one text to be used and edited on different wikis, so that material from Wikipedia can be used on Wikibooks or vice versa in such a way that changes on one wiki are automatically also made on the other.
    See mw:Manual:$wgEnableScaryTranscluding, mw:Extension:Labeled Section Transclusion.
  • Just show with the interwiki which languages seem to have significant extra information on the subject, and which pages are just stubs or partial translations.

Comments: Another problem is that our readers do not use interwiki links that much; many do not know they exist, and periodically someone arrives at the reference desk saying "hey, I have a great idea: link other languages' pages!"

Links: Proposal:Extended_use_of_interwiki


Software package that can integrate an automatic translator

Proposal: Develop a free software package into which an automatic translator (for example Google Translate) can be integrated. The integrated tool set plus translator would prepare articles so as to facilitate the work of translation, automating the routine tasks involved in translating Wikipedia articles. It would combine:

  • Translation of the text by an automatic translator that can be integrated freely
  • Insertion of the wiki code around the visible text
  • Use of Wikipedia itself to find, for the internal links in the original language, the corresponding articles in the new language
  • Copying of the Transwiki links and creation of a link to the original
  • Addition of the translation template.

Comments: The Wikipedia project is multilingual, and ideally human authors would spend their effort enriching content beyond what others have already done. Multilingualism has the disadvantage that, before content can be enriched, what has already been written in other languages must first be translated. People who do not know other languages sometimes find it more practical to start again from scratch, which causes duplication of effort.

Speakers of minority languages are all bilingual. If, in addition, the dominant language has a strong Wikipedia project, the minority-language projects are left in a very weak position, which aggravates the danger of losing these languages, a cultural heritage of great importance to humanity. Powerful translation aids, together with the bilingualism or multilingualism of the speakers, can partly compensate for the disadvantage faced by speakers of these languages.

The key issue is to build a modular environment that is independent of the software used to translate between any particular pair of languages. Each linguistic community could then deploy the language pairs of most interest to it, could seek financial support from local governments for developing specific translators, and could take advantage of future progress in machine translation technology.
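One way to picture such a modular environment is a registry in which translation engines are plugged in per language pair. The sketch below is only an assumption about how this might look: `TranslatorRegistry` and `toy_es_ca` are hypothetical names, and the toy engine is a stand-in for real translation software.

```python
from typing import Callable, Dict, Tuple

# A translation engine is modelled as any function from source text to target text.
Translator = Callable[[str], str]


class TranslatorRegistry:
    """Keeps the tool set independent of the engine used for each language pair."""

    def __init__(self) -> None:
        self._engines: Dict[Tuple[str, str], Translator] = {}

    def register(self, source: str, target: str, engine: Translator) -> None:
        # A community can add or replace the engine for the pair it cares about.
        self._engines[(source, target)] = engine

    def translate(self, text: str, source: str, target: str) -> str:
        try:
            engine = self._engines[(source, target)]
        except KeyError:
            raise LookupError(
                f"no translator registered for {source}->{target}"
            ) from None
        return engine(text)


def toy_es_ca(text: str) -> str:
    """Stand-in engine for illustration only: replaces a single word."""
    return text.replace("círculo", "cercle")
```

Because the rest of the tool set only calls `translate`, an engine for a given pair could later be swapped for a better one without touching anything else.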

Links: Proposal:Integrated_tool_to_assist_in_translations


Improve the possibility of cooperation between different Wikimedia projects

Proposal: Educational materials are being developed by volunteers in hundreds of languages and in various Wikimedia projects. Although this parallel development is a strength, volunteers do not in general coordinate with each other across languages and projects. This leads to inconsistencies and "re-inventing the wheel". Better integration would allow the volunteers to build on each other's work and be more productive.

We should develop mechanisms to foster the integration of contributors and materials across the various languages and WikiProjects. To achieve this integration, I recommend extending interwiki links, categories and WikiProjects, adding machine translation tools, and improving search tools across languages and Wikimedia projects.

I propose that we develop places where editors from different languages and projects can assemble to talk to one another (improved user coordination), and where an editor can easily find out what has been done already on their subject (improved search) and translate and incorporate that material into their own work (improved import).

How might we address these problems?

To improve search

One approach would be to extrapolate from the current system of interwiki links. We need a better method for editors to quickly survey the state of the corresponding articles, rather than systematically clicking on dozens of links. However, I note that there is not always a 1-to-1 relationship between articles on different language Wikipedias. We might use a common system of categories that are cross-linked and searchable from a particular language version of a particular project.

To improve import

Although I appreciate why we should scrupulously give our contributors credit for their work, the present system of import between projects seems cumbersome. It would be great if a system could be worked out to port material freely from one project or one language to another.
Machine translation might be a useful tool for our contributors to adapt other contributors' work to their project or language. Google's translation tools for Wikipedia might be applied to give contributors raw material from which to craft their own article.

To improve user coordination

On the English Wikipedia (and I presume others), editors working within a field coordinate their efforts in WikiProjects. We might be able to extend WikiProjects so that they automatically cross-list messages from their counterparts on other languages and projects. Here again, machine translation might be a helpful tool to facilitate conversations between, e.g., French and Chinese contributors.
More generally, I believe that strengthening and supporting WikiProjects is a good strategic goal for the Foundation. Chapters are good for representing localities, but I believe that a "guild" of contributors in a given area (say, geometry, or electrical engineering) have much more in common with one another than do the contributors from a single locality. A strong, successful WikiProject is the ideal point-of-contact for connections between a professional society of subject-matter experts and the Wikimedia community, as has been seen already in several instances, e.g., the Wikipedia workshops given by the Molecular and Cellular Biology WikiProject on the English Wikipedia.

Comments: For illustration, consider the field of geometry, which is studied every year by millions of students and which lies at the heart of many fields, such as engineering, surveying and physics. The ideas and methods of geometry do not depend on the language in which they're taught, and do not change with time. Therefore, all languages and all Wikimedia projects should be able to benefit from the best available wiki-material on a given subject. As an example, suppose that the best Wikipedia articles on "circle" and "triangle" and "cube" are found in the Spanish, Polish and Japanese Wikipedias, respectively. Ideally, the Spanish geometry contributors should be able to easily draw upon the Polish and Japanese materials to improve the Spanish "triangle" and "cube" articles, and vice versa for the Polish and Japanese geometry contributors. More generally, the Spanish geometry contributors should be able to coordinate with the geometry contributors of other languages to share best practices and develop consistent approaches to explaining the material. By pooling their strengths, small communities of editors could reach critical mass and cover fields much more efficiently by dividing the labor and working in parallel. Also, it's simply more fun to edit as part of a larger, thriving community.

Unfortunately, if we examine geometry resources across the Wikipedias and across projects (e.g., the English Wikipedia, Wikibooks, and Wikiversity), we find no such coordination. Contributors are often "re-inventing the wheel", categorization is inconsistent, excellent articles don't get translated, etc. Editors often work in isolation and don't benefit from informed reviews by their peers. An editor who decides to improve a specific article on their own Wikipedia has trouble finding what has already been done on that topic in other languages and projects.

Brya is not sure about this. A lot of the time he feels as if 'integrating' pages is the overriding concern of the project, to the extent that it actively suffocates content. He finds that when he makes a new entry, a bot will be along within two or three hours to put in the wrong interwikis. Juan de Vojníkov is also very sceptical of this proposal, because it would allow gaps in different languages to be filled artificially by means of imports and translation. In his opinion this would lead to a large quantity of low-quality data and to chaos.

However, some commenters also think the proposal is a good one. HenkvD likes this proposal in general and suggests an addition: if an article about Circle is missing in a language, just display the article in a related language, for instance Dutch for the dialect Wikipedias. Of course, a warning and a different background or similar should be shown to alert the reader that this is not the correct language. What is needed is (a) interwikis to not-yet-existing articles (red-link interwikis), and (b) a list of related languages. Dupuy also thinks this is an excellent proposal, and would especially support some sort of "single sign-on" user accounts (and profile pages) that are unified across all languages and projects.

Thamus, who is a somewhat experienced translator, agrees with the proposal except for the part about automatic translation, because he has never found a program that was helpful even in assisting translation.

Links: Proposal:Integration_across_languages_and_WMF_projects


Interoperability among wiki-based sites

Proposal: Interoperability among wiki-based sites, allowing for cross-site search capability and indexing as well as better import and export features.

Search

I propose a central server that indexes wiki-based sites, similar to Google, allowing for cross-site search. This would be a central database to which site owners could submit their wikis. If the community judges that a wiki is not blatant spam or nonsense and is not subject to frequent vandalism, it would be included in the index.

One-click export-import

I would like to see wiki sites work more closely together to export and import data directly, instead of requiring a user to export XML to their desktop and then import it into their wiki site. One idea: a user goes to the wiki site they wish to import to and gives the URL of the page they want to import, and the sites use the API to pull the page from one another.
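As a rough illustration of the "give a URL, pull the page through the API" idea, the sketch below turns an article URL into a MediaWiki Action API request for the page's current wikitext. It assumes the source wiki serves articles under `/wiki/` and its API at `/w/api.php`, as Wikimedia sites do; other wikis may use different paths. The function name `api_request_url` is hypothetical, and the sketch only builds the request URL, without fetching anything or handling page history and attribution.

```python
from urllib.parse import unquote, urlencode, urlsplit


def api_request_url(page_url: str,
                    script_path: str = "/w/api.php",
                    article_prefix: str = "/wiki/") -> str:
    """Build an Action API URL asking for the current wikitext of the given page.

    The default paths are an assumption (Wikimedia-style layout).
    """
    parts = urlsplit(page_url)
    if not parts.path.startswith(article_prefix):
        raise ValueError("URL does not look like a wiki article URL")
    # Article URLs use underscores and percent-encoding; page titles use spaces.
    title = unquote(parts.path[len(article_prefix):]).replace("_", " ")
    query = urlencode({
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "rvslots": "main",
        "titles": title,
        "format": "json",
    })
    return f"{parts.scheme}://{parts.netloc}{script_path}?{query}"
```

An importing wiki would still have to fetch this URL, parse the JSON response, and record where the text came from so that contributors keep their credit.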

Comments: My main motivation is my difficulty in finding topics using Google search. Often when I search for a product that I want to know more about, I get bombarded with e-commerce sites selling the product. Sometimes I will add "Wiki" to the end of my Google searches, which makes any available Wikipedia pages appear near the top, but Wikipedia does not always have what I seek. Wikimedia is a leading provider of Web 2.0 software, and I would like to see these Web 2.0 sites more integrated and interoperable. What I propose is a site similar to Google that crawls all sorts of wikis and can use a sort of PageRank system, as Google does, to order results by relevance. The thing is, even with the Firefox add-in, I personally do not know all of the different wiki sites I may want to search. I find I have better luck searching wikis than using Google if I just want general information on a subject. If there were a search engine that only searched wikis, I feel I would have an easier time finding what I seek. I am looking for a feature that lets me cut out all of the shareware and e-commerce sites from searches where I want information on a subject.

A federated search of multiple wiki sites might be useful, but that's something search engines could and should provide; you tell it to favor sites running MediaWiki and/or a specific list of sites. Pulling information from multiple wikis is tricky. The interwiki prefix table is implicitly a set of trusted wikis, but their page names don't match, won't match, and shouldn't match. Searching for "Spock" happens to go to the Star Trek character's page, but it could well go to a disambiguation page for him and Dr. Benjamin Spock. A domain-specific wiki can have a title like "First novel" or "McCoy" that corresponds to a different title in a more general wiki.

A one-click export-import of data raises the question of what data gets exported. If you are looking at the contents of infoboxes, there are research projects that already crawl Wikipedia pages and derive facts from them; see en:DBpedia. And you can already do this: see m:Help:Import (transwiki import), mw:API:Edit - Import.

Links: Proposal:Interoperability_(Wiki-to-Wiki_interoperability)


Statistical analysis

Proposal: Come up with a better way to account for the expected number of people we can serve with various incremental improvements to language offerings such as in Category:Proposals for new languages.

Comments: Some alternative to cost center accounting needs to be developed and articulated so that we aren't dependent on the whims and whimsy of standards bodies, which are not always transparent. There is currently no way to measure the point at which returns on investment in wider language coverage diminish. If we had better statistics about demand, we could explain our decisions about supporting various communities in a more accurate way.

One commenter firmly believes there are fairly reliable approaches to dealing with this problem. Wikipedia now has over 3 million articles in English but far fewer in the other languages, and one could start by making a statistical analysis of those pages which:

  • have high quality ratings, preferably coupled with medium to high priority ratings in the criteria found on talk pages;
  • or which have a high number of page accesses.

Under these conditions, it would be possible, for example, to alert editors of other languages to pay special attention to these English-language pages and suggest, where their language ability permits, that they use computerised pre-translation tools to assist in preparing versions in their own language. On the same basis, stub pages in other languages could often be enhanced on the basis of more thoroughly researched English pages with their usually more complete references. Finally, in the opposite direction, interesting articles in other language versions could be prioritised for inclusion in the English Wikipedia.
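The selection rule suggested by this commenter could be sketched as follows. The rating sets, the `min_views` threshold and the sample fields are illustrative assumptions, not agreed criteria; a real analysis would take talk-page ratings and page-access counts from actual data.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Assumed rating values, modelled on talk-page assessments; purely illustrative.
QUALITY_OK = {"FA", "A", "GA"}
PRIORITY_OK = {"Top", "High", "Mid"}


@dataclass
class Article:
    title: str
    quality: str        # talk-page quality rating
    priority: str       # talk-page priority rating
    monthly_views: int  # number of page accesses per month


def translation_candidates(articles: Iterable[Article],
                           min_views: int = 10_000) -> List[str]:
    """Pick pages that are well rated or heavily read, most-read first.

    `min_views` is an arbitrary threshold chosen for the example.
    """
    picked = [
        a for a in articles
        if (a.quality in QUALITY_OK and a.priority in PRIORITY_OK)
        or a.monthly_views >= min_views
    ]
    picked.sort(key=lambda a: a.monthly_views, reverse=True)
    return [a.title for a in picked]
```

The resulting list is only a starting point for alerting editors; which pages are actually worth translating remains a human decision.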


Links: Proposal:Language_demand_statistics


Unify different Wiki projects

Proposal: Today the WMF supports different projects in more than 260 languages: Wikipedia, Wikisource, Wikinews, Wiktionary, Wikiquote, Wikiversity, etc. These separate projects could be created because the communities of the English, German and French projects were large enough to split into projects with different main focuses. While the Wikipedias are very successful in many languages, the other projects are often much less active. Even a European language with an active Wikipedia, such as Dutch, may have trouble keeping its other projects alive (see Dutch Wikinews).

The aim of this proposal is to help small language communities keep their different wiki projects alive by merging them into a single wiki project. After a merge, editors and readers could focus on one project. Fighting vandalism would also be easier, because vandalism across the former projects would show up in a single "recent changes" list. For languages with an active Wikipedia, this could be a way to keep and support other wiki projects that have been proposed for deletion. For Wikipedias with a small community, it could be a way to gain more active users. It could also be suggested, before rejecting a request for a new project in a small language, when no active community has formed in the test project on the Incubator after some time.

Comments: The Alemannic Wikipedia could be a model for this proposal. The community of active users is rather small even on the Alemannic Wikipedia, and the other projects were nearly inactive. So the Alemannic Wikipedia community decided to merge all four projects into one and open separate namespaces for the different projects: Wikipedia, Wiktionary, Wikisource and Wikiquote.

John Vandenberg says that this proposal does have merits, however he would like to point out that texts in small languages can be maintained together on the multilingual sources wiki, which is designed for dead languages or collections which will never have a large community. It also acts as an incubator, able to incubate texts in a language for many years without worrying that the collection is not growing.

Links: Proposal:Merging_inactive_wikiprojects_of_small_languages


Multilingual Wiktionary

Proposal: The Wiktionary project, as it stands, is duplicating a lot of information across different languages. A multilingual Wiktionary has already been proposed, and adoption of this or a similar proposal would be a great benefit to all Wiktionaries.

Comments: At present, the goal of the Wiktionary projects is to create a database of dictionary entries in all languages for words in all languages. Unlike standard dictionaries, which cover only one language, Wiktionaries aim to cover all dictionaries (including bilingual dictionaries and, for some projects, thesauri) in one project. This undoubtedly means some information will be repeated across different Wiktionaries, such as translations and word forms. If this information could be stored in a central database or other data storage mechanism, for all Wiktionaries to access, the project would undoubtedly be much better off.

Brian Ammon agrees with this proposal and thinks Wiktionary should rather be something like what Commons is today. There is too much redundancy, because there is an article about a word in the Wiktionary of its native language and then again in every other language's Wiktionary (mostly of worse quality). The idea behind Wikipedia (and its sister projects) was to collaborate and share knowledge. Now he finds that with the (often belittled) Wiktionary project, every language version is trying to make a complete dictionary of all languages. It would be a lot easier and less time-consuming to merge all Wiktionaries so that only the user interface differs (as said, just like Commons). It is futile to have an article “Emergenz” in the German Wiktionary, below it a link to the translated (Italian) word “emergenza” in the German Wiktionary (which is odd enough), and then behind that the link to the word “emergenza” in the Italian Wiktionary. There should be some kind of template which is translated automatically depending on the ?uselang= function, so that only the parameters need to be added, such as “word type”, “plural forms”, “conjugation” etc., which are already present in many Wiktionaries.

The OmegaWiki project provides multilingual Wiktionary functionality. At the time, the WMF was not interested in it, and it had to be renamed from "WiktionaryZ" to OmegaWiki. The project exists, and there is room for talks about integrating functionality and content.

Lmaltier says that this proposal forgets something very important: to be able to discuss content, editors have to share a common language. The reason this is always forgotten is that discussions on these new projects are usually in English. What people who do not speak English think is often forgotten. There would be two categories of contributors: an English-speaking elite, and second-class editors who are not really able to participate fully in decisions. This is why the Wiktionaries have much (really much) more success than OmegaWiki: have a look at OmegaWiki's recent changes and you will see that there are only a few contributors, all of them able to speak English. Could you imagine discussions in a multilingual merged Wikipedia? It is the same problem for Wiktionary. The situation on Commons is similar: discussions are in English. But it is less of a problem there, because there is much less to be discussed (no articles). The current Wiktionary situation allows different policies to be tried, with the most effective solutions likely to be adopted by other Wiktionaries as well. Some of the information can be copied from one Wiktionary to another by bots (but only some of it, not all, of course). The fact that a multilingual wiki is impossible does not mean that a shared database is impossible: some information might be shared and accessible from all Wiktionaries without having to share a common language. An example is the list of anagrams of a word (for a given language). But even this limited common database would cause problems: what if an editor from the Albanian Wiktionary adds an anagram for a word, but an Albanian-speaking editor from the English Wiktionary removes it because, according to English Wiktionary policy, the added word should not be considered a word? The only way to solve the issue is to define a common policy. But discussing a common policy means sharing a common language.
Whichever way you look at it, it is a lost cause: very interesting at first sight, but very difficult to implement technically if contributions are to be kept easy, and impossible anyway for linguistic and psychological reasons. Why replace successful projects by trying to copy an experimental project, OmegaWiki, which is a failure? JackPotte adds that, apart from this, an attempt to list the multilingual information destined to be exported to all Wiktionaries was not a success: http://en.wiktionary.org/wiki/Wiktionary:Embassy.

Links: Proposal:Multilingual_Wiktionary