List of things that need to be free

From Strategic Planning

Originally "10 things that will be free" (PDF) (later "12 things") from Jimbo's speech at Wikimania in 2005.

This table analyzes which content is currently covered by Wikimedia or other groups, and which content Wikimedia could possibly expand into. Details about the different content types are found in the list below the table.

Things that need to be free Content type Covered by Wikimedia Covered by other organization Could be covered by Wikimedia
Reference works Encyclopedia Wikipedia (possibly augmented by WikiData) EL (es), Hoodong (zh), baidupedia (zh)
Dictionary Wiktionary wordnik OmegaWiki
Dictionary of quotations Wikiquote Lots of unfree and chaotic websites and several books
Newspapers and sources Wikinews (but not archives as also suggested in this list) NLA Australian Newspapers, Historic Australian Newspapers, 1803 to 1954; Indymedia
Travel Wikitravel, WikiVoyage, World66 Definitely something we could cover as proposed here [1]
Search Indices of the entire public information sphere Google, Bing, DMOZ Yes: Wikipedia and Wikidata provide a powerful semantic model; the problem is one of scaling and economics
Social media Social graph Facebook, Google+, MySpace There's no really good Free alternative here yet: Diaspora does not seem to have got off the ground yet. Wikimedia has a large community already, and has a reputation for being trustworthy.
Magazines & related publications Magazines and journals out of print/historical Wikisource
Learning materials Curriculum Wikiversity, Wikibooks Wikieducator, CNX, OCW
Art and media Music Commons (partial) Choralwiki
Visual Art Commons
Fictional Stories, Novels & Graphic Novels Wikisource (Fictional stories & novels) Project Gutenberg (Fictional stories & novels), WikiPen
The world Photographs Commons Flickr, Facebook photos
Audio and video recordings Commons Flickr, YouTube
Field recordings  ? more detail needed
Maps Commons (only as images) OpenStreetMap, Wikimapia
Public transport network maps and schedules
Physical objects Thingiverse and Shapeways
Address databases Limited freeing-up of postcode data at Free The Postcode
Products & services reviews See also: Yelp, Zagat's
Language OCR ABBYY FineReader is the incumbent (not free); Tesseract (free) is very far behind
Translation memory [2] Google Toolkit (not free); Translatewiki, Omegawiki (currently not doing that, at least); m:Wikipedia Machine Translation Project#Existing free software; ... Proposal:Free Translation Memory
Annotated speech corpora VoxForge
Annotated text corpora Tatoeba, Copenhagen Dependency Treebanks. (any others?)
Technical infrastructure Formats Ogg Vorbis, ODF, XML, PDF (is this Free?), WOFF (is this Free?), KML (is this Free?), RDF, HTML, PNG, SVG ...
Unique identifiers
Public key/identity/trust management
Software Operating systems Linux, GNU project, Debian...
Middleware Apache project, OpenStack...
Search algorithms
Fonts Google web fonts, League of Movable Type, ...
Personal publication Community sites
Broadcasts and event listings
Research and Science Academic journals Wikisource An official journal proposed here [3]
Scientific data Wikispecies, WikiData
Usage information and Hazard warnings for chemicals and medicines
Genome data (all species)
Prosthetics
Biological parts Registry of Standard Biological Parts
Know-how, notes and methods OpenWetWare
Public and government documents The Law Wikisource (texts)
Court records
Public codes and standards
Public spending and revenues OpenGov initiatives (usually by country)
Births, Marriages and Deaths Rodovid, Werelate.org

Reference works

Encyclopedia

Encyclopedic overviews of every topic in the world.

Wikipedia is doing OK in 10 languages, getting there in another 10, and making some progress in another 40. (Wikipedia is the only prominent resource available offline, primarily in English and German, with progress in another 10 languages.) In some 50 small languages its coverage is only mediocre, yet it is still the best online resource in that language.

In the large languages, notability restrictions and limits on community size have restricted the scope of current material.

Coverage
3.2M topics and 800k local images in English; 1M topics in German, French. 16M across all other langs combined (wikipedia)
Popularity
350M visitors/month (rank:5). 1-2.5 pageviews/hr/pg, for the largest wikis. Users: 10M, 130K active.
Utility
Median 4 pageviews, 5 minutes on-site, 50% bounce rate. 25 internal links/pg (en, de).

Dictionary

A comprehensive list of terms, definitions, translations and use cases for a given language.

Wiktionary is doing relevant work, has a good corpus in 10 languages, and is getting there in another 30. OmegaWiki is getting the multilingual bits right. Wordnik is getting "all words and phrases defined by usage" right, but isn't very free. Dicologos remains a larger corpus of translated terms in many languages. Many small-language dictionaries have not been integrated.

Integration of Wiktionary into Wikipedia (for definitions, hover-information, spelling checks, &c): nonexistent.

Coverage
1.6M words in French, English. 4M across all other langs (wiktionary)
Popularity
 ?M visitors/month (rank:800). 0.02-0.1 pageviews/hr/pg, for the largest wikis.
Utility
Median 2 pageviews, 2 minutes on-site, 65% bounce rate. 2 internal links/pg.

Newspapers and sources

People should have a right to know what's going on in their world. Press releases are the basic source of many stories. They should be archived and stored online, out of the control of whoever first issued them.

Magazines & related publications

People should be able to view the table of contents or summary of any magazine, not only for the current issue but also for past issues.

Thoughtful bibliographic research on a subject cannot be performed unless such information is freely available in a user-friendly form.

A magazine is a publishing format; there is no minimum standard for circulation, journalistic standards or even public availability. Some magazines are essentially vanity press, others are internal to particular organisations. There may even be some that are classified by their governments. It probably doesn't make sense to aim to include all magazines, or even all publicly available ones.

Learning materials

Curriculum

Basic or modular versions of every major curriculum element in the world's major institutions of learning.

Art and media

Music

High-quality digital recordings of every (classical, out of copyright) work of composition, and of every (classical, out of copyright) notable performer

Naxos has a fairly comprehensive library, but it is quite unfree: only 30-second snippets of recordings are free to listen to.

High-quality scores of every (classical, out of copyright) work of composition.

Art

High-quality digital images and video of every (classical, out of copyright) work of static or performance art.

Fictional Stories, Novels & Graphic Novels

Out of copyright fictional stories, novels & graphic novels.

Project Gutenberg already covers fictional stories & novels.

The world

Photographs

Informational photos of every object, place, event, and notable figure in the world.

Coverage
~5M photos (of 6M media items on Commons)
Popularity
5M visitors/month (rank: 180). 1M users, 23K active. 0.05 pageviews/pg/hr
Utility
Median 2 pageviews, 2 minutes on-site, 80% bounce rate. 0.2 internal links/pg. Many links from Wikipedia (and other reusers?)

Audio and video recordings

High-quality recordings of every important (out of copyright) performance and speech. Informational video of every notable event in the world.

Field recordings

Field recordings of traditional music, dance, stories, personal reminiscences should be freely shared, just as the original source freely shared them.

Maps

Map imagery, street segment, and feature data for every part of the world. Surface, underground, earth, and underwater maps. Atmospheric maps.

OpenStreetMap is making good progress on street and feature data in Germany, the UK and a few other countries, and some progress in another 20. NASA and ESA? have provided good free surface images down to ?? resolution. Other map layers, including other data about city structure, are pretty siloed in the archives of local IP holders.

Public transport network maps and schedules

Most are free but not always readily available.

Physical objects

As personal fabrication begins to become a viable technology through projects like RepRap, the importance of sharing designs and design patterns of physical objects increases. Examples of sites where such designs are currently shared include Thingiverse and Shapeways (which is also a manufacturing service). In the wikis, blogs, and forums of the personal production community, there's also activity around improving the tools (software, hardware) of personal production and sharing the underlying design patterns. Open Structures is an ambitious effort to develop a modular construction model of standardized parts, components and structures.

There's an overlapping community devoted to open source hardware (see the list of open source hardware projects), as well as the general maker community (see the Make Magazine Blog).

Products & services reviews

Technical infrastructure

Formats

High-fidelity, all-purpose formats for all useful classes of information and presentation, including text, images, audio, video, other sense data, and mixed-media layout.

Images have great free formats. Voice and other audio have Ogg Speex and Vorbis. Fonts and other drawings have SVG. Video has Ogg Theora, which is fairly competitive with commercial formats.

Documents are getting there with ODT and RDF, but there are many competing standards. Layout design has few free options. Flash animations have a fairly free SWF format but limited free players and editors (Gnash is the main option; still a bit slow and incomplete in its implementation, with almost no editing tools).

Unique Identifiers

Permanent unique identifiers for products (such as ISBNs and UPCs) and other things (URLs, post codes).

ISBNs are partly free, though small bundles must be purchased at very high fees. Some popular IDs are proprietary and lock creators or distributors into a particular system.
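As a small illustration of the "partly free" side, the ISBN-13 check-digit rule is publicly documented, so anyone can validate an identifier without paying for a number block. The sketch below assumes the standard ISBN-13 scheme (weights alternating 1 and 3, check digit chosen so the total is a multiple of 10). The function names are hypothetical and chosen only for this example.

```python
def isbn13_check_digit(first12: str) -> int:
    """Return the check digit for the first 12 digits of an ISBN-13."""
    if len(first12) != 12 or not (first12.isascii() and first12.isdigit()):
        raise ValueError("expected exactly 12 ASCII digits")
    # Weights alternate 1, 3, 1, 3, ... starting from the leftmost digit.
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(first12))
    # The check digit brings the weighted sum up to a multiple of 10.
    return (10 - total % 10) % 10


def is_valid_isbn13(isbn: str) -> bool:
    """Validate an ISBN-13 string, ignoring hyphens and spaces."""
    digits = isbn.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not (digits.isascii() and digits.isdigit()):
        return False
    return isbn13_check_digit(digits[:12]) == int(digits[12])
```

Validation of this kind only confirms that an identifier is well formed. It says nothing about whether the number has actually been assigned, which is the part that still depends on the paid registration system.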

URLs are controlled but are widely available from a variety of vendors. Once you own a URL you can attach it to as many sub-pages as you like with no restrictions.

Geographic (latitude and longitude) coordinates are completely free: anyone can work out the coordinates of any object and publish them without restriction, even if they use a copyrighted map to find these coordinates.

Post codes are copyrighted in many jurisdictions, and the mail companies get revenue from selling the right to use the post code database.

Wikipedia pages can be created for any subject or object; however, they are subject to review and are likely to be deleted if they are not considered sufficiently notable or if the information posted cannot be independently verified.

Search algorithms

Algorithms for indexing and finding material in enormous corpora like the entire Internet.

An 'open search' alliance exists; its work isn't much used yet.

Personal publication

Community sites

Default standards of giving creators the rights to reuse works they create through community sites and interfaces - including the right to license them freely. Better still, default free-reuse licenses for community sites where the goal is to create something of lasting value to the world.

Open Social networking

Software tools to let you link personal sites (blogs, User pages, etc.) on different domains so you can host your own blog and still friend, follow, 'like' pages on other web sites.

Broadcasts and event listings

Calendars of announced public events and broadcasts, from TV listings and radio programs to lists of events and performances.

Research and science

Academic journals

Open Access to all major journals.

The Open Access movement is making progress, with a focused mission and successes such as PLOS One and Western universities signing on to Open Access commitments. Journal aggregator groups have made progress in providing access to journals for small blanket fees to entire countries in the least-developed parts of the world.

Scientific data

The data used in all trusted research papers, experiments, and polls.

Limited progress so far, limited focused messaging outside of Science Commons.

Usage information and Hazard warnings for chemicals and medicines

All chemicals have hazard warning information and safe use information. This should be freely available in standard formats.

All medicines have usage instructions and warnings. These should be freely available in standard formats.

The Human genome

Our DNA is something that is discovered, so it is not an invention that can be patented. The same applies to the DNA of other creatures.

Prosthetics

The design of prosthetics is a classic case where having a body of freely shared designs will enable users of such prosthetics to customise their prosthetics to suit their particular needs and desires. Otherwise such users cannot control their own bodies and are left dependent on patent holders instead of being empowered to enable themselves.

As prosthetics develop over the next decades they will move from being used to compensate for disabilities to being used to give people enhanced abilities. In those circumstances it is important that people have control over what is in their bodies.

Biological parts

The emerging field of synthetic biology is beginning to tackle the problem of standardizing biological parts that encode specific biological functions, so that a synthetic biologist can program or design organisms with specific characteristics. The Registry of Standard Biological Parts is a first attempt to provide a repository of such parts. It uses MediaWiki in combination with additional specialized software to manage the parts.

Know-how, notes and methods

OpenWetWare is "a group of researchers that are interested in increasing the amount of organization, dissemination, and communication in biological research." Their wiki is a great example of knowledge sharing across many different labs.

Public and government documents

The Law

From municipal laws to international laws, statutes, their revision history, and relevant case law and debates should be both free and accessible.

Cornell and others have freed Federal and State law in the US. Other progress: ?

Court records

Court cases are held in public because it is important that justice is not just done but is seen to be done. Court records and decisions set precedents which can be as important as laws in determining what is and is not legal. In the internet age, "in public" should mean "on the net".

Groklaw has done this for the SCO v IBM and SCO v Novell court cases, publishing not just the decisions but all the submissions, transcripts and affidavits as well.

Public codes and standards

From ISO standards to city planning and building standards.

Malamud's public.resource.org has freed many subsets of these documents, with charm and bustle and force. Universal adoption of better standards for sharing is slow to develop.

Births, Marriages and Deaths

Historic records of births, marriages and deaths are the foundation of genealogical research. They are public records which should stay public.

Public spending and revenues

To fight corruption, public spending and revenues should be freely accessible.

Material currently in copyright

Copyrights don't last forever. But they can last long enough that no legal copies are available when the copyright ends.

A repository of currently copyrighted material would aim to safely store that material and release it as and when it comes out of copyright.