Proposal:Text-to-speech

From Strategic Planning
Status

The status of this proposal is:
Request for Discussion / Sign-Ups

This proposal is associated with the bolded strategic priorities below.


  1. Achieve continued growth in readership
  2. Focus on quality content
  3. Increase participation
  4. Stabilize and improve the infrastructure
  5. Encourage innovation


Summary

A text-to-speech (TTS) tool. It could be added to the PDF creation tool, letting the user download an OGG audio file. It is probably not very high-priority, especially given the presumably high costs involved, but it could be considered in the distant future.

Motivation

People may like to have an audio version of an article (like an audiobook), but most articles have no spoken recording, and those that do are often outdated, sometimes VERY outdated. This is understandable given the organic nature of Wikipedia: unlike a Wikipedia article, an audio file is time-consuming and unwieldy to modify, so human readers cannot possibly keep up with the continuous edits and improvements. A TTS engine, on the other hand, would automate the entire process.

Key Questions

Before implementation:

  • Would it be feasible to licence an existing TTS engine?
  • If not, how much would it cost to develop a custom TTS engine?
  • English only, or other languages as well?

If implemented:

  • Would it use excessive server resources?
  • How can abuse be prevented? (i.e., people creating dummy pages with their own material just to have it read aloud, without actually contributing anything to Wikipedia)
  • OGG or MP3? (OGG is rarely supported by portable MP3 players.)
  • Do we need to specify an output format or create these outputs in advance? Wouldn't it be simpler to create freeware that blind users and others who want to listen to Wikipedia could download, so they can create sound files in whatever format they want, whenever they want?

Possible restrictions

  • Time limit. A page must stay up for at least X consecutive days before being TTS-enabled. Dummy and vandal pages are usually deleted almost immediately, so this measure alone would probably stop the large majority of abuse (perhaps 90%).
  • Usage limited to registered users
    • Sub-limitation: account age and activity (e.g., registered for at least X days or weeks, at least X non-trivial edits, or a combination of the two)
  • Restrict the TTS function to the most popular or important articles. Possible selection criteria: inclusion in the Wikipedia DVDs; "featured" status; accolades from a majority of users; peer review; or relevance as part of non-specialist common knowledge (e.g., "World War II" or "Human Brain" rather than "List of Naruto Characters" or "Obscure Afghan New Wave filmmaker nobody ever heard of").
  • Allow a finite number of downloads per day, week, or month. The limits could be lifted, or at least eased, for administrators, regular and well-known contributors, registered non-profit organizations, major donors, blind users, and the like.
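The restrictions above could be combined into a single eligibility check. The following Python sketch is one possible reading of them, not a specification: the thresholds (MIN_PAGE_AGE, MIN_ACCOUNT_AGE, MIN_NONTRIVIAL_EDITS, DOWNLOADS_PER_DAY) are hypothetical stand-ins for the proposal's unspecified "X" values, the names Page, User and DownloadQuota are invented for illustration, and requiring both account age and edit count is just one of the combinations mentioned.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional

# Hypothetical values for the proposal's unspecified "X" thresholds.
MIN_PAGE_AGE = timedelta(days=7)
MIN_ACCOUNT_AGE = timedelta(days=14)
MIN_NONTRIVIAL_EDITS = 10
DOWNLOADS_PER_DAY = 5


@dataclass
class Page:
    title: str
    created: datetime
    featured: bool = False  # one possible "popular article" criterion


@dataclass
class User:
    name: str
    registered: Optional[datetime]  # None for anonymous (unregistered) readers
    nontrivial_edits: int = 0
    exempt: bool = False  # e.g. administrators, blind users, non-profit partners


def page_is_tts_enabled(page: Page, now: datetime, featured_only: bool = False) -> bool:
    """Time limit: the page must have stayed up for MIN_PAGE_AGE; optionally featured only."""
    if now - page.created < MIN_PAGE_AGE:
        return False
    return page.featured or not featured_only


def user_may_download(user: User, now: datetime) -> bool:
    """Registered users only; non-exempt users also need account age and edits."""
    if user.registered is None:
        return False
    if user.exempt:
        return True
    old_enough = now - user.registered >= MIN_ACCOUNT_AGE
    active_enough = user.nontrivial_edits >= MIN_NONTRIVIAL_EDITS
    return old_enough and active_enough


class DownloadQuota:
    """Finite number of downloads per rolling time window; exempt users are unlimited."""

    def __init__(self, limit: int = DOWNLOADS_PER_DAY, window: timedelta = timedelta(days=1)):
        self.limit = limit
        self.window = window
        self._log: Dict[str, List[datetime]] = {}

    def try_download(self, user: User, now: datetime) -> bool:
        if not user_may_download(user, now):
            return False
        if user.exempt:
            return True
        # Keep only the downloads that fall inside the current window.
        recent = [t for t in self._log.get(user.name, []) if now - t < self.window]
        allowed = len(recent) < self.limit
        if allowed:
            recent.append(now)
        self._log[user.name] = recent
        return allowed
```

The quota uses a rolling window rather than calendar days, which avoids a burst of downloads right after midnight; a per-week or per-month limit only needs a different window value.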

Potential Costs

  • Software development and maintenance
  • Server resources
  • Human resources (especially if limits are implemented and human judgment is needed to decide whether to lift them)

How this would work (tentative, of course)

  1. A legitimate Wikipedia page is created
  2. After X days, the page is automatically flagged as TTS-Enabled (or TTS-Read, or whatever)
  3. As soon as this happens, the Wikimedia software runs TTS on the page and generates a time-stamped, version-stamped OGG file that can be downloaded directly.
  4. A textbox appears on the page informing users that the article was automatically read on Day X at hh:mm EST/PST, and that the audio file can be downloaded by clicking a link. (The download may be restricted to registered users and protected with a CAPTCHA to discourage the merely curious.)
  5. Steps 2 to 4 are repeated every X hours (12?) to include the latest legitimate edits. Previous recordings are kept on file in case the edits are reverted.
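Steps 3 to 5 could be sketched as follows. Everything here is an assumption for illustration: the names audio_filename, ArticleAudio and KEEP_VERSIONS are hypothetical, times are timezone-aware UTC, and the actual TTS engine is left out. The file name embeds the revision ID and a UTC timestamp to make each recording time-stamped and version-stamped. Because reverting an edit creates a new revision that restores old text, kept recordings are matched by a hash of the page text rather than by revision ID; skipping a re-read when the text has not changed is also an added assumption.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import List, Optional

REGENERATE_EVERY = timedelta(hours=12)  # the proposal's tentative "12?"
KEEP_VERSIONS = 5  # hypothetical: older recordings kept in case edits are reverted


def audio_filename(title: str, revision_id: int, read_at: datetime) -> str:
    """Time-stamped, version-stamped OGG file name (hypothetical naming scheme)."""
    stamp = read_at.astimezone(timezone.utc).strftime("%Y%m%dT%H%MZ")
    return f"{title.replace(' ', '_')}.rev{revision_id}.{stamp}.ogg"


@dataclass
class Recording:
    revision_id: int
    digest: str  # SHA-1 of the text that was read
    read_at: datetime
    filename: str


@dataclass
class ArticleAudio:
    title: str
    recordings: List[Recording] = field(default_factory=list)

    def due(self, now: datetime) -> bool:
        """True if the page was never read or the last reading is old enough."""
        if not self.recordings:
            return True
        return now - self.recordings[-1].read_at >= REGENERATE_EVERY

    def refresh(self, revision_id: int, text: str, now: datetime) -> Optional[Recording]:
        """Re-read the page if it is due and its text changed since the last reading."""
        if not self.due(now):
            return None
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
        if self.recordings and self.recordings[-1].digest == digest:
            return None  # text unchanged since the last reading
        for old in self.recordings:
            if old.digest == digest:
                # A revert restored text that was already read: reuse that file.
                self.recordings.remove(old)
                self.recordings.append(old)
                return old
        rec = Recording(revision_id, digest, now, audio_filename(self.title, revision_id, now))
        # Speech synthesis of `text` into rec.filename would run here.
        self.recordings.append(rec)
        del self.recordings[:-KEEP_VERSIONS]  # keep only the newest KEEP_VERSIONS
        return rec
```

A scheduler would call refresh() on each TTS-enabled page every REGENERATE_EVERY; the newest entry in recordings is the one the textbox in step 4 would link to.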

The process above assumes that no particular restrictions are imposed on the articles themselves, so every article is read regardless of relevance. If TTS is restricted to selected articles (probably the best solution), the workflow should include a validation step.

Various

  • At the beginning of each file, a pre-recorded introductory note could say something like: "This file has been read by the Wikipedia TTS Engine on (day-month-year) at (hour) and distributed under the CC-BY-SA license. You may distribute copies of this file provided you comply with the requirements of that license, for example by including this notice."
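Since the date and hour change with every reading, the variable part of the introductory note would have to be filled in per file. A minimal Python sketch, assuming a hypothetical helper named intro_notice and UTC times (the proposal does not fix a time zone); the resulting text would then be synthesized and placed at the start of the article audio.

```python
from datetime import datetime, timezone

# Wording taken from the proposal; {date} and {hour} are filled in per file.
NOTICE_TEMPLATE = (
    "This file has been read by the Wikipedia TTS Engine on {date} at {hour} "
    "and distributed under the CC-BY-SA license. You may distribute copies of "
    "this file provided you comply with the requirements of that license, "
    "for example by including this notice."
)


def intro_notice(read_at: datetime) -> str:
    """Return the notice text for one file, with the reading time in UTC."""
    utc = read_at.astimezone(timezone.utc)
    return NOTICE_TEMPLATE.format(date=utc.strftime("%d-%m-%Y"), hour=utc.strftime("%H:%M UTC"))
```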


Community Discussion

Do you have a thought about this proposal? A suggestion? Discuss this proposal by going to Proposal talk:Text-to-speech.

Want to work on this proposal?

  1. Sign your name here!