List of things that need to be free
Originally "10 things that will be free" (later "12 things"), from Jimbo's speech at Wikimania 2005.
This table analyses which content is currently covered by Wikimedia or other groups, and which content Wikimedia could possibly expand into. Details about the different content types are given in the list below the table.
Things that need to be free | Content type | Covered by Wikimedia | Covered by other organization | Could be covered by Wikimedia |
---|---|---|---|---|
Reference works | Encyclopedia | Wikipedia (possibly augmented by Wikidata) | EL (es), Hoodong (zh), baidupedia (zh) | |
Reference works | Dictionary | Wiktionary | Wordnik | OmegaWiki |
Reference works | Dictionary of quotations | Wikiquote | Lots of unfree and chaotic websites and several books | |
Reference works | Newspapers and sources | Wikinews (but not archives, as also suggested in this list) | NLA Australian Newspapers, Historic Australian Newspapers, 1803 to 1954; Indymedia | |
Reference works | Travel | | Wikitravel, WikiVoyage, World66 | Definitely something we could cover, as proposed here [1] |
Search | Indices of the entire public information sphere | | Google, Bing, DMOZ | Yes: Wikipedia and Wikidata provide a powerful semantic model; the problem is one of scaling and economics |
Social media | Social graph | | Facebook, Google+, MySpace | There's no really good Free alternative here yet: Diaspora does not seem to have got off the ground. Wikimedia already has a large community and a reputation for being trustworthy. |
Magazines & related publications | Magazines and journals out of print/historical | Wikisource | | |
Learning materials | Curriculum | Wikiversity, Wikibooks | Wikieducator, CNX, OCW | |
Art and media | Music | Commons (partial) | ChoralWiki | |
Art and media | Visual Art | Commons | | |
Art and media | Fictional Stories, Novels & Graphic Novels | Wikisource (fictional stories & novels) | Project Gutenberg (fictional stories & novels), WikiPen | |
The world | Photographs | Commons | Flickr, Facebook photos | |
The world | Audio and video recordings | Commons | Flickr, YouTube | |
The world | Field recordings | ? more detail needed | | |
The world | Maps | Commons (only as images) | OpenStreetMap, Wikimapia | |
The world | Public transport network maps and schedules | | | |
The world | Physical objects | | Thingiverse and Shapeways | |
The world | Address databases | | Limited freeing-up of postcode data at Free The Postcode | |
The world | Products & services reviews | | See also: Yelp, Zagat | |
Language | OCR | | ABBYY FineReader is the incumbent (not free); Tesseract (free) is far behind | |
Language | Translation memory [2] | | Google Translator Toolkit (not free); Translatewiki, OmegaWiki (currently not doing that, at least); m:Wikipedia Machine Translation Project#Existing free software; ... | Proposal:Free Translation Memory |
Language | Annotated speech corpora | | VoxForge | |
Language | Annotated text corpora | | Tatoeba, Copenhagen Dependency Treebanks (any others?) | |
Technical infrastructure | Formats | | Ogg Vorbis, ODF, XML, PDF (is this Free?), WOFF (is this Free?), KML (is this Free?), RDF, HTML, PNG, SVG ... | |
Technical infrastructure | Unique identifiers | | | |
Technical infrastructure | Public key/identity/trust management | | | |
Software | Operating systems | | Linux, GNU project, Debian... | |
Software | Middleware | | Apache project, OpenStack... | |
Software | Search algorithms | | | |
Software | Fonts | | Google web fonts, League of Movable Type, ... | |
Personal publication | Community sites | | | |
Personal publication | Broadcasts and event listings | | | |
Research and science | Academic journals | Wikisource | | An official journal proposed here [3] |
Research and science | Scientific data | Wikispecies, Wikidata | | |
Research and science | Usage information and hazard warnings for chemicals and medicines | | | |
Research and science | Genome data (all species) | | | |
Research and science | Prosthetics | | | |
Research and science | Biological parts | | Registry of Standard Biological Parts | |
Research and science | Know-how, notes and methods | | OpenWetWare | |
Public and government documents | The Law | Wikisource (texts) | | |
Public and government documents | Court records | | | |
Public and government documents | Public codes and standards | | | |
Public and government documents | Public spending and revenues | | OpenGov initiatives (usually by country) | |
Public and government documents | Births, marriages and deaths | | Rodovid, Werelate.org | |
Reference works
Encyclopedia
Encyclopedic overviews of every topic in the world.
Wikipedia is doing well in 10 languages, getting there in another 10, and making some progress in another 40. (Wikipedia is the only prominent resource available offline, primarily in English and German, and is making progress in another 10 languages.) In some 50 small languages its coverage is only mediocre, but it is still the best online resource in that language.
In the large languages, notability restrictions and limits on community size have narrowed the scope of current material.
- Coverage
  - 3.2M topics and 800k local images in English; 1M topics each in German and French; 16M across all other languages combined (Wikipedia)
- Popularity
  - 350M visitors/month (rank: 5). 1-2.5 pageviews/hr/pg for the largest wikis. Users: 10M, 130K active.
- Utility
  - Median 4 pageviews, 5 minutes on-site, 50% bounce rate. 25 internal links/pg (en, de).
Dictionary
A comprehensive list of terms, definitions, translations and use cases for a given language.
Wiktionary is doing relevant work, has a good corpus in 10 languages, and is getting there in another 30. OmegaWiki is getting the multilingual parts right. Wordnik is getting "all words and phrases defined by usage" right, but isn't very free. Dicologos remains a larger corpus of translated terms in many languages. Many small-language dictionaries have not been integrated.
Integration of Wiktionary into Wikipedia (for definitions, hover information, spell-checking, etc.) is nonexistent.
- Coverage
  - 1.6M words in French and English; 4M across all other languages (Wiktionary)
- Popularity
  - ?M visitors/month (rank: 800). 0.02-0.1 pageviews/hr/pg for the largest wikis.
- Utility
  - Median 2 pageviews, 2 minutes on-site, 65% bounce rate. 2 internal links/pg.
Newspapers and sources
People should have a right to know what's going on in their world. Press releases are the basic source of many stories. They should be archived and stored online, out of the control of whoever first issued them.
Magazines & related publications
People should be able to view the table of contents or summary of any magazine, not only for the current issue but also for past issues.
Thorough bibliographic research on a subject cannot be performed unless such information is freely available in a convenient form.
A magazine is a publishing format, there is no minimum standard for circulation, journalistic standards or even public availability. Some magazines are essentially vanity press, others are internal to particular organisations. There may even be some that are classified by their governments. It probably doesn't make sense to aim to include all, or even all publicly available ones.
Learning materials
Curriculum
Basic or modular versions of every major curriculum element in the world's major institutions of learning.
Art and media
Music
High-quality digital recordings of every (classical, out of copyright) work of composition, and of every (classical, out of copyright) notable performer.
Naxos has a fairly comprehensive library, but it is quite unfree; it offers 30-second snippets of recordings free to listen to.
High-quality scores of every (classical, out of copyright) work of composition.
Art
High-quality digital images and video of every (classical, out of copyright) work of static or performance art.
Fictional Stories, Novels & Graphic Novels
Out of copyright fictional stories, novels & graphic novels.
Project Gutenberg already covers fictional stories & novels.
The world
Photographs
Informational photos of every object, place, event, and notable figure in the world.
- Coverage
  - ~5M photos (of 6M media items on Commons)
- Popularity
  - 5M visitors/month (rank: 180). 1M users, 23K active. 0.05 pageviews/hr/pg.
- Utility
  - Median 2 pageviews, 2 minutes on-site, 80% bounce rate. 0.2 internal links/pg. Many links from Wikipedia (and other reusers?).
Audio and video recordings
High-quality recordings of every important (out of copyright) performance and speech. Informational video of every notable event in the world.
Field recordings
Field recordings of traditional music, dance, stories, personal reminiscences should be freely shared, just as the original source freely shared them.
Maps
Map imagery, street segment, and feature data for every part of the world. Surface, underground, earth, and underwater maps. Atmospheric maps.
OpenStreetMap is making good progress on street and feature data in Germany, the UK and a few other countries, and some progress in another 20. NASA and ESA? have provided good free surface images down to ?? resolution. Other map layers, including other data about city structure, are largely siloed in the archives of local IP holders.
Public transport network maps and schedules
Most are free but not always readily available.
Physical objects
As personal fabrication becomes a viable technology through projects like RepRap, the importance of sharing designs and design patterns of physical objects increases. Examples of sites where such designs are currently shared include Thingiverse and Shapeways (which is also a manufacturing service). In the wikis, blogs, and forums of the personal production community, there is also activity around improving the tools (software, hardware) of personal production and sharing underlying design patterns. Open Structures is an ambitious effort to develop a modular construction model of standardized parts, components and structures.
There's an overlapping community devoted to open source hardware, see the list of open source hardware projects, as well as the general maker community (see the Make Magazine Blog).
Products & services reviews
Technical infrastructure
Formats
High-fidelity, all-purpose formats for all useful classes of information and presentation. Including text, images, audio, video, other sense data, and mixed-media layout.
Images have great free formats. Voice and other audio have Ogg Speex and Vorbis. Fonts and other drawings have SVG. Video has Ogg Theora, which is fairly competitive with commercial formats.
Documents are getting there with ODT and RDF, but there are many competing standards. Layout design has few free options. Flash animations have a fairly free SWF format but limited free players and editors (Gnash is the main option; still a bit slow and incomplete in its implementation, with almost no editing tools).
Unique identifiers
Permanent unique identifiers for products (such as ISBNs and UPCs) and other things (URLs, post codes).
ISBNs are only partly free: they must be purchased, and the fees are very high when they are bought in small bundles. Some popular IDs are proprietary and lock creators or distributors into a particular system.
URLs are controlled but widely available from a variety of vendors. Once you own a URL you can attach as many sub-pages to it as you like, with no restrictions.
Geographic (latitude and longitude) coordinates are completely free: anyone can work out the coordinates of any object and publish them without restriction, even if they used a copyrighted map to find them.
Post codes are copyrighted in many jurisdictions, and the mail companies earn revenue from selling the right to use this database.
Wikipedia pages can be created for any subject or object; however, they are subject to review and are likely to be deleted if the subject is not considered sufficiently notable or if the information posted cannot be independently verified.
Search algorithms
Algorithms for indexing and finding material in enormous corpora such as the entire Internet.
An 'open search' alliance exists, but its work is not yet widely used.
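The basic data structure behind most text search is an inverted index, which maps each term to the documents that contain it. Below is a minimal Python sketch, assuming a naive lowercase tokenizer and boolean AND queries; the names `tokenize`, `build_index` and `search` are hypothetical, and real web-scale engines add ranking, compression and distribution on top of this.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Naive tokenizer (an assumption for illustration): lowercase, then
    # keep runs of ASCII letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    # Inverted index: map each term to the set of document IDs containing it.
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    # Boolean AND query: return IDs of documents containing every query term.
    terms = tokenize(query)
    if not terms:
        return set()
    # Copy the first posting set so the index itself is never mutated.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


if __name__ == "__main__":
    docs = {
        "a": "Free maps of the world",
        "b": "Free software and free formats",
        "c": "World news archives",
    }
    index = build_index(docs)
    print(search(index, "free world"))  # {'a'}
```

Only document "a" contains both "free" and "world", so the query returns just that ID. Intersecting posting sets is the simplest query strategy; ranked retrieval would instead score documents by term statistics.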
Personal publication
Community sites
Default standards that give creators the right to reuse works they create through community sites and interfaces, including the right to license them freely. Better still, default free-reuse licenses for community sites whose goal is to create something of lasting value to the world.
Open social networking
Software tools to let you link personal sites (blogs, User pages, etc.) on different domains so you can host your own blog and still friend, follow, 'like' pages on other web sites.
Broadcasts and event listings
Calendars of announced public events and broadcasts, from TV listings and radio programs to lists of events and performances.
Research and science
Academic journals
Open Access to all major journals.
The Open Access movement is making progress, with a focused mission and successes such as PLOS One and Western universities signing on to Open Access commitments. Journal aggregator groups have made progress in providing entire countries in the least-developed parts of the world with access to journals for small blanket fees.
Scientific data
The data used in all trusted research papers, experiments, and polls.
Limited progress so far, limited focused messaging outside of Science Commons.
Usage information and hazard warnings for chemicals and medicines
All chemicals have hazard warning and safe-use information. This should be freely available in standard formats.
All medicines have usage instructions and warnings. This should be freely available in standard formats.
The human genome
Our DNA is discovered rather than invented, so it is not an invention that can be patented. The same applies to the DNA of other creatures.
Prosthetics
The design of prosthetics is a classic case where having a body of freely shared designs will enable users of such prosthetics to customise their prosthetics to suit their particular needs and desires. Otherwise such users cannot control their own bodies and are left dependent on patent holders instead of being empowered to enable themselves.
As prosthetics develop over the next decades they will move from being used to compensate for disabilities to being used to give people enhanced abilities. In those circumstances it is important that people have control over what is in their bodies.
Biological parts
The emerging field of synthetic biology is beginning to tackle the problem of standardizing biological parts that encode specific biological functions, so that a synthetic biologist can program or design organisms with specific characteristics. The Registry of Standard Biological Parts is a first attempt to provide a repository of such parts. It uses MediaWiki in combination with additional specialized software to manage the parts.
Know-how, notes and methods
OpenWetWare is "a group of researchers that are interested in increasing the amount of organization, dissemination, and communication in biological research." Their wiki is a great example of knowledge sharing across many different labs.
Public and government documents
The Law
From municipal laws to international laws, statutes, their revision history, and relevant case law and debates should be both free and accessible.
Cornell and others have freed Federal and State law in the US. Other progress: ?
Court records
Court cases are held in public because it is important that justice is not just done but is seen to be done. Court records and decisions set precedents which can be as important as laws in determining what is and is not legal. In the Internet age, "in public" should mean on the Net.
Groklaw has done this for the SCO v IBM and SCO v Novell court cases, publishing not just the decisions but all the submissions, transcripts and affidavits as well.
Public codes and standards
From ISO standards to city planning and building standards.
Malamud's public.resource.org has freed many subsets of these documents, with charm and bustle and force. Universal adoption of better standards for sharing is slow to develop.
Births, marriages and deaths
Historic records of births, marriages and deaths are the foundation of genealogical research. They are public records which should stay public.
Public spending and revenues
To fight corruption, public spending and revenues should be freely accessible.
Material currently in copyright
Copyrights don't last forever. But they can last long enough that no legal copies are available when the copyright ends.
A repository of currently copyrighted material would aim to safely store copyrighted material and release it as and when it came out of copyright.