Talk:Missing Wikipedias
Contents
| Thread title | Replies | Last modified |
|---|---|---|
| Categorization of areas | 0 | 13:13, 26 May 2011 |
| Language categorization | 0 | 12:57, 26 May 2011 |
| Ideas | 1 | 09:51, 24 May 2011 |
The categorization of areas is somewhat specific, as it is based on language groups as well as geography:
- North Africa: "Afro-Asiatic Africa". Thus, Ethiopia, Somalia and a couple of other countries usually considered as "Sub-Saharan" go there.
- Sub-Saharan Africa: The rest of Africa.
- Americas: One of the usual segmentations -- South, Central, North.
- South Asia: Countries under significant influence of Indian culture, mostly with Indo-European and Dravidian languages: Afghanistan, Pakistan, India, Bangladesh, Nepal, Bhutan.
- West Asia: Middle East.
- South-East Asia: South-East Asia, including South China, whose languages belong to groups similar to those of South-East Asia.
- Polynesia and Philippines: Islands between the Indian and Pacific Oceans, including Malaysia, Taiwan and Papua New Guinea. Mostly Austronesian and isolated language groups.
- Siberia: Languages of Siberia.
- Continental Asia: Languages of the Asian part (east of the Caspian Sea) of the former Soviet Union, excluding Russia.
- Australia and Pacific: usual meaning, but without Papua New Guinea.
I've started to categorize languages according to some principles. Some of them are exact, some of them are arbitrary categories:
- Does the language have any Wikimedia test project (usually at Incubator) or not? If yes, that usually shows that there is interest in creating that project. We should see how things are going there, whether they need our help, and what kind of help they need.
- If literacy is low and there are no efforts to improve it, efforts should go that way.
- Is it a language without a writing system? If yes, efforts should go that way.
- Does the language have a Wikimedia project in a "macrolanguage"? That likely means that speakers would be happy to use their macrolanguage project or that they are already using it. However, it doesn't necessarily mean that, and it should be checked. We have a number of macrolanguage editions which probably cover hundreds of "individual" languages.
- Special cases. Up to now, there are three categories (described inside of that section):
  - "Macrolanguage" is widely used. (Arabic languages, Mongolian languages.)
  - Writing system gives de facto literacy in L1 if L2 is known. (Chinese languages.)
  - Languages are spoken in well developed areas of the world by a non-endangered population. It is assumed that the population wants or doesn't want Wikimedia projects for its own internal reasons. Examples are Mainfränkisch and Albanian Gheg (with Incubator project) and Upper Saxon (without Incubator project).
- Languages not in any of the categories above should be the second priority (after those with test projects). These are likely languages spoken by people without a [good] Internet connection, or left out for some other reason. Every case should be analyzed separately. If it is about low Internet penetration, then we should create alternative ways of reaching that population and instructing them how to create and edit a wiki encyclopedia in their language.
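The principles above could be sketched as a small tagging routine. This is only an illustration, not an agreed procedure: the field names, the `Category` labels and the `categorize` function are all hypothetical, and the sketch assumes that a language may fall into several categories at once (as Albanian Gheg does, being both a special case and an Incubator project).

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Category(Enum):
    # Labels are hypothetical names for the categories listed above.
    TEST_PROJECT = "has a Wikimedia test project"
    LOW_LITERACY = "low literacy, no efforts to improve it"
    NO_WRITING_SYSTEM = "no writing system"
    MACROLANGUAGE_PROJECT = "macrolanguage has a Wikimedia project"
    SPECIAL_CASE = "special case"
    UNCATEGORIZED = "none of the above (second priority)"


@dataclass
class Language:
    # All fields are assumptions about what would be recorded per language.
    name: str
    has_test_project: bool = False
    low_literacy_no_efforts: bool = False
    has_writing_system: bool = True
    macrolanguage_has_project: bool = False
    special_case: bool = False


def categorize(lang: Language) -> List[Category]:
    """Return every category that applies; UNCATEGORIZED if none does."""
    categories = []
    if lang.has_test_project:
        categories.append(Category.TEST_PROJECT)
    if lang.low_literacy_no_efforts:
        categories.append(Category.LOW_LITERACY)
    if not lang.has_writing_system:
        categories.append(Category.NO_WRITING_SYSTEM)
    if lang.macrolanguage_has_project:
        categories.append(Category.MACROLANGUAGE_PROJECT)
    if lang.special_case:
        categories.append(Category.SPECIAL_CASE)
    return categories or [Category.UNCATEGORIZED]


# Examples taken from the special cases mentioned above.
gheg = Language("Albanian Gheg", has_test_project=True, special_case=True)
upper_saxon = Language("Upper Saxon", special_case=True)
print(gheg.name, [c.name for c in categorize(gheg)])
print(upper_saxon.name, [c.name for c in categorize(upper_saxon)])
```

Returning a list rather than a single label avoids imposing an order on the categories, which the principles above do not define; every language would still need to be checked case by case.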
First of all, I think we should start by concentrating on languages that everybody considers independent languages. Generally, there is a greater need for these: speakers of Jin Chinese, for example, can probably read Standard Chinese if they are literate; literate speakers of most Bihari varieties read Hindi, I believe; and the same goes for Arabic varieties.
Additionally, deciding which varieties get separate Wikipedias will probably be problematic, and I think that going forward we cannot continue to blindly follow the Ethnologue classification of what is a dialect and what is a language. For example, I think it is very likely that most North African varieties of Arabic (probably with the exception of Hassaniya, which is supposedly extremely divergent, and obviously Egyptian, which is not a conventional North African variety except for some of the dialects in the west of Egypt) can and should share projects, since they are largely mutually intelligible and often considered a single language, referred to as Derija (see w:en:Derija), North African Arabic or Maghrebi Arabic. Rather than creating a Tunisian Arabic Wikipedia, a Moroccan Arabic Wikipedia, a Libyan Arabic Wikipedia and an Algerian Arabic Wikipedia, it would probably be better to just have a Derija Wikipedia. Unfortunately, as far as I am aware, no ISO code currently exists to refer collectively to North African Arabic varieties.
This leads to another thought of mine, which is that we (or "you", the language committee, or some Wikimedian) should be more active in submitting requests for changes to the ISO language codes when necessary. Rather than being slaves to the existing codes, we should be active in shaping them, guided by expert advice.
Thus, rather than simply denying a potential request for North African Arabic or giving it a code corresponding to a national variety of Arabic, or throwing our hands up and telling the requester it is their responsibility to request a change to the standard, we should ask an expert if they think that North African Arabic really constitutes a group of mutually intelligible varieties that can and should share the same literature, or if it is a group of varieties that are not mutually intelligible; if the response is that they are mutually intelligible, we should submit our own code change request. This way, we can help facilitate the creation of new Wikipedias in a more active way, rather than simply acting as gatekeepers to people who are probably completely new to the process.
In my view, these are the most essential Wikipedias we are missing:
- North African Arabic. Currently, no single ISO code; approximately 60 million speakers total.
- Nigerian Pidgin English.
- Jamaican Creole. There is already an incubator project with quite a few articles. It is the native language of almost all Jamaicans, as well as communities in Central America and elsewhere.
- Balochi. Also has an incubator project.
- Batak.
- Mòoré.
- Santali.
- Hiligaynon. Also has a test project.
- Quiche Maya.
- Huasteca Nahuatl. No single ISO code; considered 3 languages according to the Ethnologue, but they often use the same literature. More investigation is needed, but there are over 1 million speakers.
I am in the process of categorizing languages (cf. article). After that, there are a number of macrolanguages with Wikipedia editions, which also should be covered. Also, I intentionally removed some languages for which I knew that they have a macrolanguage project; they should be added again, so we can have a complete picture.
At the moment, I think that we should first make a good categorization, then create broadly defined priorities, then see where we have chapters and which languages fall within the strategic priorities, and only after that set the final priorities. For example, we could say that the first three languages from your list should be a priority, but we don't have chapters there, nor are those areas within the strategic priorities.