Proposal:Preparing a translation machine
Every proposal should be tied to one of the strategic priorities below.
Edit this page to help identify the priorities related to this proposal!
- Achieve continued growth in readership
- Focus on quality content
- Increase Participation
- Stabilize and improve the infrastructure
- Encourage Innovation
Summary
I propose preparing the building blocks needed for a future translation machine.
Proposal
- Accepting OmegaWiki as a Wikimedia Foundation project and improving it.
- Writing a bot that extracts as much information as possible from the Wiktionaries and offers it for inclusion in OmegaWiki, subject to a human check: the person entering an expression into OmegaWiki could accept the fields taken from Wiktionary by ticking checkboxes, and could modify and complete those fields before inclusion.
- Thinking of a way to involve the general public in a WikiGrammar, written in a rigorous way, like OmegaWiki (and unlike the Wiktionaries), so that this grammatical information could later be used by a translation machine.
- Contacting the Moses, OpenLogos and Apertium communities, and any other community working on an open-source translation machine, to see whether they would be interested in collaborating with the Wikimedia Foundation.
- Identifying one or two machine translation approaches to pursue.
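The human-check step described in the second point could be sketched as follows. This is only an illustration under stated assumptions: the `CandidateField` class, the `build_entry` function and the field names are hypothetical, and nothing here reflects the actual data model of OmegaWiki or of the Wiktionaries.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CandidateField:
    """One field the bot extracted from a Wiktionary entry (hypothetical shape)."""
    name: str                       # e.g. "definition", "part_of_speech"
    value: str                      # value as extracted by the bot
    accepted: bool = False          # state of the reviewer's checkbox
    override: Optional[str] = None  # reviewer's corrected value, if any


def build_entry(expression, language, candidates):
    """Keep only the fields the reviewer accepted or corrected."""
    entry = {"expression": expression, "language": language}
    for candidate in candidates:
        if candidate.override is not None:
            # A corrected field is included with the reviewer's text.
            entry[candidate.name] = candidate.override
        elif candidate.accepted:
            entry[candidate.name] = candidate.value
    return entry


candidates = [
    CandidateField("part_of_speech", "noun", accepted=True),
    CandidateField(
        "definition",
        "a small domesticated feline",
        override="a small domesticated carnivorous mammal",
    ),
    CandidateField("etymology", "unverified guess"),  # left unchecked, so dropped
]
entry = build_entry("cat", "en", candidates)
```

Nothing is written to the target dictionary unless a person ticked the box or typed a correction, which is the point of the human check.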
Additions
- Proposal:More precise referencing#Additions: precise interwiki links, so that translations can be shown as tooltips and used for statistical analysis.¹
- ¹ Moses is a statistical machine translation system that allows you to automatically train translation models for any language pair. All you need is a collection of translated texts (parallel corpus).
Motivation
- There is no high-quality translation machine on this planet. The least bad ones are proprietary, or only work between similar languages (typically between Romance languages).
- A translation machine has two parts: a lexical part (a dictionary that a program can read and use easily) and a programmatic part, which implements the grammar and translation rules. Up to now, both parts have been regarded as too technical for the general public to contribute to. OmegaWiki took a significant step towards a machine-readable, highly structured dictionary built from the knowledge of the general public. Further steps are needed on the lexical side, as discussed in OmegaWiki's community pages.
- I think it is possible to use the grammatical knowledge of the general public too, though how to do so still needs thought. It is not very difficult in some areas, such as inflections (conjugations, declensions etc., which are already recorded in the Wiktionaries) and word order in some languages (for instance, whether a French adjective is placed before or after the noun).
- I don't think we are ready to build a good translation machine now. If we start too quickly, the risk of major errors is high. This is why I propose that we build the bricks that will make it possible to build a translation machine later.
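As an illustration of the kind of grammatical knowledge mentioned above, the French adjective placement example could be recorded as plain data and applied by a very small program. This is a toy sketch: the `FRENCH_PRENOMINAL` set and the `french_noun_phrase` function are hypothetical names, the word list is deliberately tiny, and gender and number agreement are ignored.

```python
# Toy rule data; a real WikiGrammar would hold far more entries per language.
# Adjectives listed here precede the noun; all others are placed after it.
FRENCH_PRENOMINAL = {"petit", "grand", "beau", "bon", "jeune"}


def french_noun_phrase(noun, adjective):
    """Order a masculine singular noun and adjective; agreement is ignored."""
    if adjective in FRENCH_PRENOMINAL:
        return f"{adjective} {noun}"
    return f"{noun} {adjective}"


phrases = [
    french_noun_phrase("chat", "petit"),
    french_noun_phrase("chat", "noir"),
]
```

The rule itself stays in a data structure that contributors could edit, while the program that applies it stays generic. Real French is less tidy (some adjectives change meaning with their position), so a usable rule format would need more than a single set.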
Key Questions
- How to make a highly structured machine-readable WikiGrammar?
- What should be the internal structure of such a wiki: relational database (MySQL, for instance), XML, Prolog knowledge base, or other?
- How to interconnect or merge the Wiktionaries with OmegaWiki?
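To make the relational option in the second question more concrete, here is a minimal sketch of a concept-based dictionary stored in a relational database. It uses SQLite from the Python standard library as a stand-in for MySQL, and the `concept` and `expression` tables and the `translate` function are assumptions made for illustration, not the schema OmegaWiki actually uses.

```python
import sqlite3

# In-memory database; SQLite stands in for MySQL in this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE concept (id INTEGER PRIMARY KEY, gloss TEXT NOT NULL);
    CREATE TABLE expression (
        id INTEGER PRIMARY KEY,
        concept_id INTEGER NOT NULL REFERENCES concept(id),
        language TEXT NOT NULL,
        spelling TEXT NOT NULL
    );
""")
conn.execute("INSERT INTO concept (id, gloss) VALUES (1, 'domestic cat')")
conn.executemany(
    "INSERT INTO expression (concept_id, language, spelling) VALUES (?, ?, ?)",
    [(1, "en", "cat"), (1, "fr", "chat"), (1, "de", "Katze")],
)


def translate(spelling, source, target):
    """Return target-language expressions sharing a concept with the source word."""
    rows = conn.execute(
        """
        SELECT t.spelling
        FROM expression AS s
        JOIN expression AS t ON t.concept_id = s.concept_id
        WHERE s.spelling = ? AND s.language = ? AND t.language = ?
        ORDER BY t.spelling
        """,
        (spelling, source, target),
    ).fetchall()
    return [row[0] for row in rows]


french = translate("cat", "en", "fr")
```

Linking every expression to a language-independent concept means a word is entered once per language rather than once per language pair. XML or a Prolog knowledge base could express the same links; the choice mostly affects how rules and queries are written.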
Potential Costs
References
Web services
- Terminology database of the language service of the German Parliament (Bundestag)
- Linguee: a German-English dictionary that also lets you search millions of sentences translated by other people.
Community Discussion
Do you have a thought about this proposal? A suggestion? Discuss this proposal by going to Proposal talk:Preparing a translation machine.
Want to work on this proposal?
- .. Sign your name here!