Proposal talk:A central wiki for interlanguage links

From Strategic Planning

For past discussion on a related topic, see:


Discussion

To my mind, the n² complexity is a good thing. Creating a special interwiki wiki is not necessary at all. In any case, there would have to be lots of language versions of this wiki, and I think that this single point alone makes the whole proposal unnecessary.

--DerAndre 14:17, 15 August 2009 (UTC)

I don't follow. If by "this Wiki" you mean the proposed new wiki / wikiarea, then the point about having it in many languages doesn't follow, because it's still one list, even with different language fascias, if you will.
On the other hand, I largely support this proposal as it stands, though the less complex the solution, the better. At the moment, bots mean that a single error is immediately propagated and can be hard to get rid of. This problem is only likely to increase. Jarry1250 15:36, 16 August 2009 (UTC)
I really like this proposal. Darkoneko 09:11, 17 August 2009 (UTC)
This is one of the important pain points of Wikipedia. In the current state, keeping interlanguage links consistent requires a huge amount of maintenance. The extension written by Nikola Smolenski does provide a starting point for a solution. However, it is far from perfect, as there is still a lot of confusion about how it is going to work (check the discussion page), and discussion around it has stalled. I believe that this point fits into the bigger problem of improving and consolidating the tools offered to Wikipedians to maintain a huge (and still growing) quality encyclopedia. 212.98.136.42 15:47, 18 August 2009 (UTC)

One good thing about this idea is that it solves the problem of deletionists. The English Wikipedia is the central Wikipedia, but it's too strict, and we often have a situation where there is a group of articles that can't be linked to the English Wikipedia because their subject is not considered notable enough there; the result is that non-English Wikipedians often can't find each other's articles and don't create interlanguage links. Hellerick 11:31, 24 August 2009 (UTC)

The wrong tool for the job

I like the idea, but why on Earth would the central site be a wiki? The optimal interface for collaboratively putting together lists of articles is very different to the optimal interface for collaboratively writing formatted prose. --Tango 13:57, 17 August 2009 (UTC)

Be it a wiki or not, I think that the "newer look" would be good. Obviously, this is "only" a cleaner way of doing things that we already do... Nemo 22:41, 19 August 2009 (UTC)
I agree, it might be another tool, as long as anybody can change it easily. A wiki has the advantage that no new tool and user interface is needed. A specialized tool might have other advantages, so in that case it might be a good solution too. HenkvD 09:54, 20 August 2009 (UTC)
Using a wiki is a fine idea, all you need to do is make a template which is used on each page of the interwiki. The wiki is just used to provide the infrastructure for collaborative editing. The template should be one where both the languages, and respectively, each possible translation within a language, have a standard delimiter that can be easily interpreted by a script reading the page's source. --Lyc. cooperi 12:22, 1 October 2009 (UTC)
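A minimal sketch of the kind of script described here, assuming a hypothetical `{{Interwiki|...}}` template in which `|` separates languages, `=` separates a language code from its titles, and `;` separates alternative translations within one language. The template name and delimiters are illustrative choices, not an existing convention; titles containing these characters would need escaping.

```python
import re

# Hypothetical template syntax: {{Interwiki|fr=Ciel|en=Heaven; Sky|de=Himmel}}
TEMPLATE_RE = re.compile(r"\{\{Interwiki\|(.*?)\}\}", re.DOTALL)


def parse_interwiki_template(source: str) -> dict[str, list[str]]:
    """Return {language code: [titles]} from the first Interwiki template in `source`."""
    match = TEMPLATE_RE.search(source)
    if match is None:
        return {}
    links: dict[str, list[str]] = {}
    for field in match.group(1).split("|"):  # one field per language
        if "=" not in field:
            continue  # skip malformed fields instead of guessing
        lang, titles = field.split("=", 1)
        # ";" separates alternative translations within one language
        links[lang.strip()] = [t.strip() for t in titles.split(";") if t.strip()]
    return links


print(parse_interwiki_template("{{Interwiki|fr=Ciel|en=Heaven; Sky|de=Himmel}}"))
# {'fr': ['Ciel'], 'en': ['Heaven', 'Sky'], 'de': ['Himmel']}
```

Because the wiki only stores text, any such script is the real consumer of the data, so the delimiters need to be fixed by convention before editors start filling in pages.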
Using a wiki or something similar will then open opportunities to create an ultimate collaborative translation tool that people will use to find translations of words, sentences, place names... --Thibho 02:31, 31 October 2010 (UTC)

Nice idea, but be careful

If there was always a 1:1 correspondence between articles in different languages, that would work. For example, in the Wiktionary, this would work, because the articles wiktionary:de:Tier and wiktionary:en:Tier both describe the German word "Tier" (i.e., animal).

But there are quite a few cases where there are no simple 1:1 relations, but rather 1:N, N:1 or even N:M relations between the different languages. For example, French wikipedia:fr:Ciel translates to wikipedia:en:Heaven and wikipedia:en:Sky. It would translate perfectly 1:1 into the German disambiguation page wikipedia:de:Himmel. On the other hand, English heaven should be mapped to wikipedia:de:Himmel (Religion), whereas sky should be mapped to wikipedia:de:Sternenhimmel and wikipedia:de:Himmel (planetär). You see, this is not at all a 1:1 relation. So what would you store in the central interwiki database?

I'm not saying that this proposal does not make sense; it actually does make very much sense, since in the majority of cases, there actually is a 1:1 relation. However, a solution also must cater for non-1:1 cases in a useful way. --Wutzofant 15:22, 20 August 2009 (UTC)

As locally saved interwikis will still work the old way, the worst thing that can happen is that the current interwikis stay in place and are not replaced by global interwikis. --Slomox 17:18, 20 August 2009 (UTC)
This is not a flaw of the proposal, but a flaw of languages. This is a problem that we already have, and we cannot get rid of it by technological means other than by allowing a single page to point to several other pages in another language. Micke 11:02, 24 August 2009 (UTC)
My idea here would be to have in a case like this multiple lists, with a statement for "also link to the pages mentioned on that other list" (as well as "also link to pages mentioned on that other list if there is no link to that language already"). I don't agree with your example by the way, but that's another issue. - Andre Engels 07:04, 31 August 2009 (UTC)
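One way to model the two kinds of statements proposed here, as a sketch with hypothetical names (`LinkList`, `also`, `fallback`): a list's own pages are always linked, pages on an "also" list are added unconditionally, and pages on a "fallback" list only fill languages that still have no link. In this sketch the statements are not followed transitively, and the page names are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class LinkList:
    """A central list of pages on one concept, keyed by language code (hypothetical model)."""
    pages: dict[str, str]
    also: list["LinkList"] = field(default_factory=list)      # "also link to the pages on that list"
    fallback: list["LinkList"] = field(default_factory=list)  # "...if there is no link to that language already"


def links_for(lst: LinkList, own_lang: str) -> dict[str, list[str]]:
    """Interlanguage links to show on the `own_lang` page of `lst`."""
    result: dict[str, list[str]] = {
        lang: [title] for lang, title in lst.pages.items() if lang != own_lang
    }
    for other in lst.also:
        for lang, title in other.pages.items():
            if lang != own_lang:
                result.setdefault(lang, []).append(title)
    # Fallback lists run last, so they only fill languages that are still missing.
    for other in lst.fallback:
        for lang, title in other.pages.items():
            if lang != own_lang and lang not in result:
                result[lang] = [title]
    return result


narrow = LinkList({"aa": "A", "bb": "B"})
broad = LinkList({"aa": "A2", "bb": "B2", "cc": "C2"})
narrow.fallback.append(broad)
print(links_for(narrow, "aa"))  # {'bb': ['B'], 'cc': ['C2']}
```

Here the page aa:A keeps its own bb link, and only gets a cc link from the broader list because it has none of its own.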


A specialized wiki for interlanguage links would solve several problems, one of the main ones being that article histories are now cluttered with numerous bot edits updating interwiki links. A secondary benefit would be that the bot power, and the machine power expended by the bots, could then be used for other purposes.

There are also a number of problems originating from homonyms and synonyms. In the paragraph above there is already an example with wikipedia:fr:Ciel, Wikipedia:en:Heaven, Wikipedia:en:Sky, Wikipedia:de:Himmel (Religion), wikipedia:de:Sternenhimmel, and wikipedia:de:Himmel (planetär). IMHO the central wiki should contain the finest level of granularity of all wikis. If I understand the above example correctly, and assuming its description is correct (which might not be the case according to ), the article wikipedia:fr:Ciel could link to both central:Heaven and central:sky. wikipedia:de:Himmel would also link to both central:Heaven and central:sky. Both central:Heaven and central:sky would link to wikipedia:de:Himmel and wikipedia:fr:Ciel. 1:N relations can thus be handled without too many problems. N:M relations are of course much more difficult, and the only solution I see is breaking them down into multiple N:M relations. Potentially, this is the same situation we have now; nothing new here.

Brya mentions another topic: what is notable. What we would definitely NOT need is a separate judgment on what is notable and what is not. For example, Wikipedia:en:Tribal Wars is not regarded as notable on en:, but has an article in a dozen or so other wikis. The policy should be rather straightforward: if a topic has an article in at least one wiki, it is noteworthy. Of course, if that one wiki deletes its article after relating it to a topic on central, the topic should be deleted on central too.

Locally saved interwikis can be replaced by 1 link to central, bots can do most of that work.

It seems logical to me to have one entry for each atomic subject, no matter whether it is in Wikipedia, Wiktionary or any other project. Of course, that raises a new question of what we would like to see as interwiki links: would we just like to see links to other Wikipedias, or also to Wiktionaries? And what about Wikiversity? It seems to me that that should be something configurable on the local wikis.

TeunSpaans 18:13, 10 June 2010 (UTC)
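The finest-granularity idea can be sketched as follows, reusing the Ciel/Heaven/Sky example above and assuming its description is correct. Each local article lists the central entries it covers (the `central:` names and the `covers` mapping are hypothetical), and an article's interlanguage links are derived as every other-language article that shares at least one central entry with it.

```python
from collections import defaultdict

# Hypothetical data: local article -> central entries (finest granularity) it covers.
covers = {
    "fr:Ciel": {"central:Heaven", "central:Sky"},
    "de:Himmel": {"central:Heaven", "central:Sky"},
    "en:Heaven": {"central:Heaven"},
    "en:Sky": {"central:Sky"},
}


def lang_of(article: str) -> str:
    return article.split(":", 1)[0]


def derived_links(covers: dict[str, set[str]]) -> dict[str, set[str]]:
    """Link each article to every other-language article sharing a central entry."""
    by_entry: dict[str, set[str]] = defaultdict(set)
    for article, entries in covers.items():
        for entry in entries:
            by_entry[entry].add(article)
    links = {}
    for article, entries in covers.items():
        peers = set().union(*(by_entry[e] for e in entries))
        links[article] = {p for p in peers if lang_of(p) != lang_of(article)}
    return links


print(sorted(derived_links(covers)["en:Sky"]))  # ['de:Himmel', 'fr:Ciel']
```

Because the links are computed rather than stored, a 1:N case such as fr:Ciel needs no special treatment; the open question remains how fine the central entries should be.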

I disagree that we should expect that things "really should be" 1:1. In particular, each Wikipedia currently gets to make its own decisions as to the degree to which it "lumps" or "splits" topics, and I would hope that will continue to be the case. - en:user:Jmabel 03:39, 30 June 2010 (UTC)

Asymmetry

As articles on the different Wikipedias do not match (a major problem being the English Wikipedia, which does not allow articles on notable topics but deletes these in favor of the agenda of whichever project claims ownership), it looks to me that what is needed are "negative iw's" ("this article does not equate to ...") to make it work. Otherwise the mess will just grow worse. - Brya 12:18, 27 August 2009 (UTC)
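A sketch of how "negative iw's" could constrain the merging that bots currently do, assuming positive links are processed in order and a merge is refused whenever it would put two articles declared non-equivalent into the same group. The function and the placeholder data are hypothetical; note that the result depends on the order in which links are processed.

```python
def group_with_negatives(positive, negative):
    """Merge positively linked articles into groups, refusing merges a negative link forbids.

    Returns (groups, rejected): groups maps each article to its frozenset group,
    rejected lists the positive links that were not applied.
    """
    groups: dict[str, frozenset[str]] = {}
    forbidden = {frozenset(pair) for pair in negative}
    rejected = []
    for a, b in positive:
        ga = groups.get(a, frozenset({a}))
        gb = groups.get(b, frozenset({b}))
        if ga == gb:
            continue  # already in the same group
        if any(frozenset((x, y)) in forbidden for x in ga for y in gb):
            rejected.append((a, b))  # "this article does not equate to ..."
            continue
        merged = ga | gb
        for member in merged:
            groups[member] = merged
    return groups, rejected


positive = [("en:A", "fr:B"), ("fr:B", "de:C"), ("de:C", "en:D")]
negative = [("en:A", "de:C")]
groups, rejected = group_with_negatives(positive, negative)
print(sorted(groups["en:A"]), rejected)  # ['en:A', 'fr:B'] [('fr:B', 'de:C')]
```

Without the negative link, the same three positive links would put en:A and en:D into one group.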

Brilliant

Implementation may not be the easiest and will require, as stated, some special tools to accommodate for language asymmetry, but overall this is a wonderful proposal. --Lyc. cooperi 12:24, 1 October 2009 (UTC)

Statistical approach needed

While I agree that something needs to be done to facilitate the interlanguage facilities, I strongly believe that we would gain much by using a statistically-based approach along the following lines:

  • prioritise links to the best researched articles on the basis of, for example, user feedback on quality, priority, etc., on the talk pages as well as number of hits (i.e. accesses) on the article itself;
  • develop facilities for translating the best articles (e.g. using Google language tools) into the languages where content is poor or completely missing;
  • ensure as far as possible that gaps, initially in the English Wikipedia, are highlighted or accessible in some way so that editors could be encouraged to develop content on the basis of the computerised translation and their own language knowledge of the original article(s);
  • help solve the problem of referencing, particularly for languages other than English, by providing translated info on sources, references, footnotes, external links, etc.

I could add much more here but would first like to see if anyone else is interested in this approach. In many computer applications, statistics are becoming recognised as a major aid in setting priorities and assisting in content creation. - Ipigott 14:59, 1 October 2009 (UTC)
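A rough sketch of the first three points, assuming each article has some quality rating in [0, 1] derived from talk-page feedback, a hit count, and the set of languages it is already linked to. The `Article` fields, the 0.7 weighting and the example numbers are all hypothetical; the point is only that (article, missing language) pairs can be ranked so that the best-researched, most-read articles surface first as translation candidates.

```python
from dataclasses import dataclass


@dataclass
class Article:
    title: str
    quality: float       # hypothetical rating in [0, 1] from talk-page feedback
    monthly_hits: int    # page accesses
    languages: set[str]  # languages already linked via interwikis


def translation_gaps(articles, target_langs, quality_weight=0.7):
    """Rank (score, title, missing language) triples, highest score first."""
    max_hits = max((a.monthly_hits for a in articles), default=0) or 1
    gaps = []
    for a in articles:
        popularity = a.monthly_hits / max_hits  # normalised to [0, 1]
        score = quality_weight * a.quality + (1 - quality_weight) * popularity
        for lang in sorted(target_langs - a.languages):
            gaps.append((score, a.title, lang))
    # sorted() is stable, so ties keep their language order
    return sorted(gaps, key=lambda gap: gap[0], reverse=True)


articles = [
    Article("Sky", quality=0.9, monthly_hits=1000, languages={"en", "de", "fr"}),
    Article("Heaven", quality=0.5, monthly_hits=4000, languages={"en"}),
]
for score, title, lang in translation_gaps(articles, {"de", "fr", "mg"}):
    print(f"{score:.3f} {title} -> {lang}")
```

The weighting is the real editorial decision here; the sketch only makes it explicit and adjustable.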

Problem

Problem: Words are categorical, and those categories don't always match up perfectly. While I can definitely see the value in this, I think it's better that each interwiki can be customized by those who know the languages in question. (I actually have, on rare occasions, run into this as an issue on the Norman Wikipedia, though usually the Norman fits in perfectly with the French and English.) Jade Knight 19:59, 26 August 2009 (UTC)

I actually think that this is one of the problems I hope to solve, rather than one that is created. In the current setup, some interwikis are created by hand, but most of the work is done by bots. And the bots work from the principle that interwiki links form an equivalence relation; that is, if [[xx:A]] refers to [[yy:B]] and [[yy:B]] to [[zz:C]], then [[xx:A]] should also refer to [[zz:C]] and [[yy:B]] should also refer to [[xx:A]]. In a new system, I think it would be much easier to come up with a method to handle these cases as well. One could then create something to say "the page(s) on this page should link to the page(s) on that page, but not the other way around" or "this concept is close enough to that concept that there should be interwiki links to that concept for all languages that do not have an article on this concept". It might be possible to implement this in the current system (using some <!-- comments -->), but that would be cumbersome at best. - Andre Engels 12:07, 27 August 2009 (UTC)
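The equivalence assumption described here can be sketched as a connected-components computation: links are treated as symmetric, every page reachable from a page is assumed equivalent to it, and a single mistaken link therefore merges two groups. This illustrates the principle only; it is not the code of any existing interwiki bot, and the page names are placeholders.

```python
from collections import defaultdict


def bot_closure(links):
    """Group pages as the equivalence assumption implies: every page in a component links to all others."""
    graph = defaultdict(set)
    for a, b in links:
        graph[a].add(b)
        graph[b].add(a)  # symmetry: [[yy:B]] should also refer to [[xx:A]]
    seen, components = set(), []
    for start in list(graph):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # transitivity: follow every chain of links
            page = stack.pop()
            if page not in component:
                component.add(page)
                stack.extend(graph[page] - component)
        seen |= component
        components.append(component)
    return components


def language_conflicts(component):
    """Languages that end up with more than one page in the same component."""
    by_lang = defaultdict(set)
    for page in component:
        by_lang[page.split(":", 1)[0]].add(page)
    return {lang: pages for lang, pages in by_lang.items() if len(pages) > 1}


# Suppose the last link is a mistake: the closure spreads it to every page.
links = [("xx:A", "yy:B"), ("yy:B", "zz:C"), ("zz:C", "xx:D")]
for component in bot_closure(links):
    print(sorted(component), language_conflicts(component))
```

Directional or "only if no link to that language exists" rules would have to replace the symmetric, transitive step in the loop above.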

Impact?

Some proposals will have massive impact on end-users, including non-editors. Some will have minimal impact. What will be the impact of this proposal on our end-users? -- Philippe 00:06, 3 September 2009 (UTC)

The impact will be large on those editors who add interwikis to their own or other articles (since the process of adding interwikis will change), and very small for others. It is a good thing you mention this, since it points to a need for an (easy) interface between the wikipedias and the database for editors, so that the change does not result in fewer pages actually having interwiki links. - Andre Engels 05:55, 25 September 2009 (UTC)

ISO 2788

One could use ISO 2788 for interwiki links.

UF Used for
USE/SYN Use synonym
BT Broader term
NT Narrower term
RT Related term
TT Top term

See also: http://meta.wikimedia.org/wiki/Talk:OmegaWiki#Connotations --Fasten 16:39, 28 October 2009 (UTC)
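In ISO 2788 these relationships come in reciprocal pairs (USE/UF, BT/NT, and RT with itself), so a central store could record one direction and derive the other. A sketch, assuming relations between interwiki entries are stored as (code, target) pairs per entry; the function names and the lemur example are illustrative, and the top term (TT) is computed by following BT links rather than stored.

```python
# Reciprocal pairs among the ISO 2788 relationship codes listed above.
RECIPROCAL = {"USE": "UF", "UF": "USE", "BT": "NT", "NT": "BT", "RT": "RT"}


def add_relation(thesaurus, source, code, target):
    """Record `source code target` and its reciprocal, e.g. X BT Y implies Y NT X."""
    thesaurus.setdefault(source, set()).add((code, target))
    inverse = RECIPROCAL.get(code)
    if inverse is not None:
        thesaurus.setdefault(target, set()).add((inverse, source))


def top_term(thesaurus, term):
    """Follow BT links upwards to the top term (TT); assumes the hierarchy has no cycles."""
    while True:
        broader = sorted(t for code, t in thesaurus.get(term, ()) if code == "BT")
        if not broader:
            return term
        term = broader[0]  # with several broader terms, pick one deterministically


thesaurus = {}
add_relation(thesaurus, "en:Gray mouse lemur", "BT", "en:Mouse lemur")
add_relation(thesaurus, "en:Mouse lemur", "BT", "en:Lemur")
print(sorted(thesaurus["en:Lemur"]))               # [('NT', 'en:Mouse lemur')]
print(top_term(thesaurus, "en:Gray mouse lemur"))  # en:Lemur
```

Deriving the reciprocal automatically avoids the one-sided links that bots currently have to repair.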

Inter-language stub articles

One could also try to generate inter-language stub articles from existing interwiki links and with automatic translation.

The advantage of a dedicated wiki could be that the dedicated central wiki wouldn't have to restrict lemmas to one language, which means stubs could transclude abstracts from different Wikipedias, possibly preferring the UN languages Arabic, Chinese, English, French, Russian and Spanish and the larger Wikipedias Japanese, German, Polish, Italian, Dutch and Portuguese. If no English article was available for a lemma, the inter-language stub could use an English word as lemma but transclude abstracts from other languages exclusively. --Fasten 17:12, 28 October 2009 (UTC)

Renaming an inter-language stub should maintain referential integrity for all language wikipedias without requiring editing of the articles (Proposal:Interwikis and categories outside article code). --Fasten 17:16, 28 October 2009 (UTC)
The central wiki could at the same time be the data wiki and allow transclusion of data objects. --Fasten 10:33, 29 October 2009 (UTC)

Contents

  • "Interlanguage conflicts will be much easier to resolve, as only one place needs to be changed." (0 replies, last modified 03:52, 28 May 2011)
  • Use Commons ? (0 replies, last modified 09:27, 1 December 2010)
  • Proposed merge with "Brand name consolidation" (1 reply, last modified 22:59, 10 November 2010)
  • Translation problems with titles (5 replies, last modified 13:05, 22 July 2010)
  • Wikimania 2010 (1 reply, last modified 21:20, 5 July 2010)
  • A step-by-step approach (1 reply, last modified 11:51, 4 July 2010)
  • Identification and asymmetry (5 replies, last modified 09:29, 2 July 2010)

"Interlanguage conflicts will be much easier to resolve, as only one place needs to be changed."

I disagree with the statement that Interlanguage conflicts will be much easier to resolve, as only one place needs to be changed.

Resolving ontological conflicts among groups of people who don't share a language is going to be as hard as it's ever been, because nothing in the proposal changes the way such conflicts are resolved (i.e. how consensus is achieved); the proposal merely changes the way the resolution is implemented once it has been reached. Stuartyeates 03:52, 28 May 2011 (UTC)


Use Commons ?

While I very much like the idea, I am worried about creating too many Wikiprojects. To me using Commons would be the best solution: we can consider Commons not so much as a file repository as a place to put resources useful to various other projects. The benefits I see are:

  • No need to create a new Wiki (probably easier and quicker)
  • There already is a multilingual-minded community there, as well as various internationalisation tools and templates. It could be useful if we want to extend the project beyond interwikis into a full-blown database.
  • Having an internationalized database would be very useful for Commons, which aims at multilinguality without creating language-specific versions like en.wikipedia or de.wikisource.
  • The name Commons is the most fitting I can think of for this kind of project.
Zolo 09:27, 1 December 2010

Proposed merge with "Brand name consolidation"

The suggestion[1] to merge this proposal with Proposal:Brand name consolidation is more than a little unclear to me. What does facilitating links between languages have to do with renaming the foundation and assimilating non-Wikipedia projects?

Ningauble 23:52, 9 November 2010

Absolutely nothing, as far as I can see.

Yair rand 22:59, 10 November 2010
 

Translation problems with titles

I think this issue has been touched on above, but those threads seem old, so I'll bring it up here (since the traffic is likely to pick up now).

A big problem that I don't immediately see a resolution to is where languages simply don't have words, or the capacity to create words, for topics that have names in English. We sometimes forget how flexible our language can be. For example, in Malagasy (from Madagascar, the land of the lemurs), the word pondiky means "mouse lemur" in general, but it is a regional term that can also be applied to more than three species, such as the Gray Mouse Lemur. There are also numerous other "mouse lemur" words, again all regional. There's been a push in Antananarivo (the capital) to standardize and broaden the vocabulary, but it's far from universally accepted. Even the "official" Malagasy word for "lemur", gidro, is a regional term for one or more species of true lemur. Maky is popularly used for "lemur", but is also the popular name for the Ring-tailed Lemur. Babakota is yet another popular term for "lemur", but is the regional name for the Indri. Maybe you can see what's happening here. The Malagasy of each region pick their most popular, tourist-drawing lemur species and make that name the "official term" for the word "lemur", while also preserving it as the Malagasy name for the species. And if a species' range covers multiple regions, it will have multiple names. Unlike English speakers, the Malagasy don't officially create new names as easily as we can, such as calling a new lemur species (some of which can't be visually distinguished anyway) "Sambirano Mouse Lemur", "Golden-brown Mouse Lemur", or "Reddish-gray Mouse Lemur".

And this is just the Malagasy language. I can just imagine what India is like with its dialects.

Can you think of a way around this? I honestly want to know, because I've considered working to get the lemur articles I write converted to Malagasy. But at this point, even if I did manage to get them translated, I can't figure out which Malagasy article to add them to. – VisionHolder « talk » 01:59, 30 June 2010 (UTC)


This happens in English too, in areas where English was widely spoken before mass communication, i.e. in England and Ireland. There are regional names for most small animals and plants.

As English spread over the globe, most English speakers elsewhere learned the names for these from books rather than at their grandmother's knee, so they only know the book name. Species introduced to England after mass communication also have only an official name.

My recommendation is that articles about the native species of Madagascar in Malagasy should mention all the local names and where they are used, with redirects to the page from each of those other names. The page name should be the 'official' name if such a thing exists. If there are a number of candidates, then the scientific name could be used as a compromise. If the same name is used for different species in different places (such as the European and American robins), then a disambiguation page is good.

My recommendation is, however, of no value. This is an issue that the editors of the Malagasy Wikipedia should decide among themselves. If you are the only editor interested in this at the moment, then get started on those pages and make them the best you can. When more editors come along, you can agree on the best way to do it and then write a manual of style page recording what was agreed.

Does that help?

Filceolaire 19:51, 1 July 2010

The central repository of interlanguage links for small mammals will probably work by looking for the scientific name, which, with a bit of luck, will get mentioned somewhere in the first sentence, whatever the local name.

Filceolaire 19:53, 1 July 2010

Articles on plant, animal and fungal species are added on many wikis, including English, using the scientific name as a lemma. It's necessary whenever there is no vernacular name in that particular language, and it's a handy solution whenever the vernacular name is ambiguous. Even in English there are hundreds of thousands (maybe millions) of species for which no reliable common name exists. Surely the Malagasy wiki could do this too?

Taking up Filceolaire's point, if the structure of the central wiki requires lemmas of some kind, I'd suggest that for the whole area of living species the scientific name would be a much better guide than the English (or any other natural language). In other words, go parallel to wikispecies, not to en:wiki, in this area. Would that work?

What does "bump this thread" mean?

Andrew Dalby 16:34, 3 July 2010

Maybe the lemma could be a simple numerical code, which would be language-neutral and solve problems about disambiguations. The content would feature a very brief description in several languages, pretty much like on Commons. Please note that this would require no further effort on behalf of wikipedians, because one should anyway check the page to see what its actual name is - using codes would just mean no discussions on what should be the exact page name on the interlanguage wiki, and it would be language- and culture-neutral. --WinstonSmith 12:13, 5 July 2010 (UTC)


Yes, numerical code, database-id-like title, please!

Nemo 13:05, 22 July 2010
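A sketch of a registry keyed by opaque numeric IDs, as suggested in this thread, assuming each entry holds short descriptions in several languages plus one title per wiki. The class and method names are hypothetical. Because wikis refer to the number rather than to a title, renaming an article only updates its own entry, and no other wiki needs editing.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class Concept:
    id: int
    descriptions: dict[str, str] = field(default_factory=dict)  # language code -> short description
    sitelinks: dict[str, str] = field(default_factory=dict)     # wiki code -> article title


class Registry:
    """Hypothetical central store keyed by language-neutral numeric IDs."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.concepts: dict[int, Concept] = {}

    def create(self, **descriptions: str) -> int:
        concept = Concept(next(self._ids), dict(descriptions))
        self.concepts[concept.id] = concept
        return concept.id

    def link(self, cid: int, wiki: str, title: str) -> None:
        """Attach (or rename) the article a wiki has for this concept."""
        self.concepts[cid].sitelinks[wiki] = title

    def interlanguage_links(self, cid: int, own_wiki: str) -> dict[str, str]:
        return {w: t for w, t in self.concepts[cid].sitelinks.items() if w != own_wiki}


registry = Registry()
lemur = registry.create(en="Small nocturnal primate of Madagascar")
registry.link(lemur, "en", "Gray mouse lemur")
registry.link(lemur, "species", "Microcebus murinus")
registry.link(lemur, "en", "Grey mouse lemur")  # a rename touches only one entry
print(lemur, registry.interlanguage_links(lemur, own_wiki="species"))
# 1 {'en': 'Grey mouse lemur'}
```

The descriptions exist only to help editors pick the right entry; nothing in the lookup depends on them or on any particular language.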
 
 
 
 
 

Wikimania 2010

Note that the interlanguage links topic will be discussed at Wikimania. BTW, does anybody have any suggestions about the presentation?

Incnis Mrsi 12:24, 4 July 2010

For interwiki examples, I propose you use English and Polish comparisons, since Poland is the host country. If the interface language is set to Polish on the Polish wiki and to English on the English wiki, it will give a good feel for what internationalization is about.

HenkvD 21:20, 5 July 2010
 

A step-by-step approach

As we can all see, the current system has a flexibility which is difficult to manage in a centralized system. An idea would be to maintain the current system and introduce a central interwiki repository, a common interwiki workspace (iw), which could be used in all clear-cut 1-to-1 cases (cities, people, etc.). For example, we could have only one interwiki for the city of New York, [[iw:New York city]], which would direct to the interwiki workspace containing the existing list of interwikis and possibly a defining comment ("New York City in the US, New York State; do not link the State of New York or any other city called New York here; see [[iw:New York (disambiguation)]] for other options"). This would cover more than 80% of the existing interwikis, leaving the remaining 20% to the flexibility of the existing system.

FocalPoint 07:13, 4 July 2010
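The hybrid scheme could be resolved at display time roughly as below. This is a sketch: the `[[iw:...]]` syntax follows the example above, while the crude regular expressions, the `workspace` dictionary and the rule that hand-made local links override the central list for their language are assumptions.

```python
import re

IW_MARKER = re.compile(r"\[\[iw:([^\]]+)\]\]")             # e.g. [[iw:New York city]]
LOCAL_LINK = re.compile(r"\[\[([a-z]{2,3}):([^\]]+)\]\]")  # e.g. [[de:New York City]]


def resolve_links(wikitext, own_lang, workspace):
    """Combine links from the central workspace with the page's own interwikis."""
    links = {}
    for name in IW_MARKER.findall(wikitext):
        for lang, title in workspace.get(name.strip(), {}).items():
            if lang != own_lang:
                links[lang] = title
    # Local links keep working the old way and win for their language,
    # which covers the non-1:1 cases left to the existing system.
    for lang, title in LOCAL_LINK.findall(wikitext):
        if lang not in ("iw", own_lang):  # "iw" is the central prefix in this sketch
            links[lang] = title.strip()
    return links


workspace = {"New York city": {"en": "New York City", "de": "New York City"}}
print(resolve_links("... [[iw:New York city]]", "nl", workspace))
# {'en': 'New York City', 'de': 'New York City'}
```

Letting local links win keeps the change incremental: a page that never adopts an `[[iw:...]]` marker behaves exactly as it does today.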

A step-by-step approach is a good idea (for wikis) anyway, so I second this.

HenkvD 11:51, 4 July 2010
 

Identification and asymmetry

Perhaps one approach to the problem of identification and asymmetry could be to consider the unordered list of interwiki links as an independent object. In the simplest case, where an article xx:P now has a link to yy:Q, the interwiki object becomes (xx:P, yy:Q). The software would add the link yy:Q automatically to the article xx:P, so implicitly yy:Q now links back to xx:P as well. If there is a link from yy:Q to zz:R, this generates another pair (yy:Q, zz:R). These are connected through yy:Q, and an (assisted) bot may decide to join them into (xx:P, yy:Q, zz:R). Now all three languages have two links. If de:himmel needs to link to en:heaven and en:sky, the two pairs (de:himmel, en:heaven) and (de:himmel, en:sky) are created. Since joining them would put two articles in the same language into one object, no join is proposed. N:M relations are also possible this way. It would need a lot of thought to make it manageable, but it looks feasible. −Woodstone 17:36, 30 June 2010 (UTC)

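A sketch of the pair-joining idea, under the assumption that a join is proposed only when two link sets share an article and the result would still contain at most one article per language. The greedy loop stands in for the (assisted) bot, and the function names are hypothetical. Each article's displayed links are then all other members of every set it belongs to.

```python
def lang(page: str) -> str:
    return page.split(":", 1)[0]


def can_join(a: frozenset, b: frozenset) -> bool:
    """Join only sets that share an article and keep one article per language."""
    union = a | b
    return bool(a & b) and len({lang(p) for p in union}) == len(union)


def join_pairs(pairs):
    """Repeatedly join link sets until no permitted join is left."""
    sets = [frozenset(p) for p in pairs]
    changed = True
    while changed:
        changed = False
        for i in range(len(sets)):
            for j in range(i + 1, len(sets)):
                if can_join(sets[i], sets[j]):
                    sets[i] = sets[i] | sets.pop(j)  # j > i, so index i is unaffected
                    changed = True
                    break
            if changed:
                break
    return sets


def links_for(page, sets):
    """Links the software would add to `page`: every other member of its sets."""
    return {p for s in sets if page in s for p in s if p != page}


pairs = [("xx:P", "yy:Q"), ("yy:Q", "zz:R"), ("de:himmel", "en:heaven"), ("de:himmel", "en:sky")]
sets = join_pairs(pairs)
print([sorted(s) for s in sets])
# [['xx:P', 'yy:Q', 'zz:R'], ['de:himmel', 'en:heaven'], ['de:himmel', 'en:sky']]
print(sorted(links_for("de:himmel", sets)))  # ['en:heaven', 'en:sky']
```

Since the sets, not the articles, are the stored objects, no article text or page history changes when a language is added.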

This sounds exactly like what OmegaWiki is doing: creating a database of 'defined meanings' and the definitions and words corresponding to these in every world language.

Filceolaire 20:09, 1 July 2010

There is similarity, but in the WP context the "words" are the existing "articles". The correspondences between the articles would be centrally maintained. On presentation of an article, the interwiki links would be added on the fly. Consequently, the page history is no longer burdened by all the bot edits adding another language. With this approach, it is not necessary to have a central name for each subject, which avoids the question of which language such a central name should be in. The list of articles in the various languages is the central object, identified by its content. The existing interwiki links are a good starting point for that content. −Woodstone 06:04, 2 July 2010 (UTC)

 

You mention de:himmel as an example of an N:M relation, but I think you are mistaken. Himmel is a word with two meanings. In fact, de:Himmel is a disambiguation page (Begriffsklärung). There are separate pages: de:Himmel (planetär), which links to en:Sky, and de:Himmel (Religion), which links to en:Heaven.

Most interwikis are in fact 1:1 relations. It is good to think about 1:N or N:M relations, but we have to bear in mind that only a very small number of articles could benefit from these complicated relations.

HenkvD 07:10, 2 July 2010

The example above may be wrong, and the current manually created interwikis may be mostly 1:1. However, the N:M problem is real when integrating the references across more (or all) languages. For example, if aa:A links to bb:B and cc:C (two translations), it is quite possible that bb:B links to dd:D and cc:C links to dd:E. So now there are two indirect links from aa:A to language dd, the articles D and E. This is certain to arise, and any proposed solution should have a way of dealing with it. My proposal (admittedly still a vague idea) is a possible direction of thought. In the example, it would create two sets of meanings: (aa:A, bb:B, cc:C, dd:D) and (aa:A, bb:B, cc:C, dd:E). It may be necessary to create a disambiguation (aa:A, bb:B, cc:C).

Woodstone 08:57, 2 July 2010
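This splitting can be sketched with a Cartesian product: where a group of linked articles contains several articles in one language, each candidate meaning set takes one article per language, and the articles common to all candidates form the core that might become a disambiguation. The helper is hypothetical, not an agreed design. The number of candidate sets grows multiplicatively with each conflicting language, which bears directly on the added-complexity question.

```python
from itertools import product


def meaning_sets(component):
    """Split linked articles into candidate sets with one article per language."""
    by_lang: dict[str, list[str]] = {}
    for page in sorted(component):
        by_lang.setdefault(page.split(":", 1)[0], []).append(page)
    choices = [by_lang[lang] for lang in sorted(by_lang)]
    return [set(combo) for combo in product(*choices)]


component = {"aa:A", "bb:B", "cc:C", "dd:D", "dd:E"}
candidates = meaning_sets(component)
core = set.intersection(*candidates)  # shared by every candidate
for candidate in candidates:
    print(sorted(candidate))
print(sorted(core))  # ['aa:A', 'bb:B', 'cc:C']
```

With one conflicting language this yields the two sets described above; a second language with two articles would already double that to four.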

I agree that there could be valid reasons to link to either dd:D or dd:E (your example), but is it really needed:

  • many cases might be mistakes (although I admit there could be valid cases)
  • will the added complexity outweigh the small number of occurrences?
  • the added complexity might also keep current or introduce new mistakes

My suggestion: Keep It Simple (if it is possible).

HenkvD 09:29, 2 July 2010