Proposal:Building a database of all books ever published

From Strategic Planning
Status (see valid statuses)

The status of this proposal is:
Request for Discussion / Sign-Ups

Every proposal should be tied to one of the strategic priorities below.



  1. Achieve continued growth in readership
  2. Focus on quality content
  3. Increase Participation
  4. Stabilize and improve the infrastructure
  5. Encourage Innovation



Building a Universal Library

Summary

There is currently no freely available database of all books ever published. Such a database is much needed, at least as a reference for Wikimedia projects. It needs to be multilingual and, given the size of the task, can only be built as a massive collaborative project. Wikimedia therefore has both an interest in such a project and the capacity to work on it.

Currently, the biggest free database of books is the Open Library (OL) project. It is, however, only at the beta stage and very far from comprehensive. OL claims to have 23 million book and author entries. However, many entries are duplicates of the same edition, not to mention of the same book, so the real number of unique entries is much lower. Wikisource has data that is not included in the OL database (and Wikipedia certainly does too, but I didn't really check). So it seems clear that we should work with Open Library if we decide to work on this.

Features needed

  1. The data needs a structured web site, not a plain wiki like MediaWiki.
  2. It would be best not to duplicate work in several places: OpenLibrary, Wikipedia, Wikisource, etc.
  3. A multilingual web site.
  4. A powerful search engine. The OL search engine is really not up to the task: many important search options are not available now (search by language, by publication date, by author's death date), and I don't think "search by publisher" is very useful.
  5. A big part of this data is already available, but scattered across various databases, in various languages, with various protocols, etc. So a big part of the work needs as much database management knowledge as librarian knowledge.
  6. What is most missing from these existing databases (IMO) is information about translations: there is no general database of translated works anywhere, at least not in English or French. It is very difficult to find out whether a translation exists for a given work. Wikisource has some of this information through interwiki links between work and author pages, but only for a (very) small number of works and authors.
  7. Missing features at OL include: links between related entries (like interwiki links), easy merging of similar content with redirects, an easy "request for deletion" process, etc. Some of these are planned for the next version of their software.
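
To illustrate the "easy merging of similar content" feature, here is a minimal sketch of how candidate duplicates could be detected before a human confirms the merge. It assumes each entry is a plain dictionary with hypothetical `id`, `author`, `title` and `year` fields; this is not OL's actual data model, and a real tool would need fuzzier matching than an exact comparison of normalized strings.

```python
import re
import unicodedata
from collections import defaultdict


def normalize(text):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^\w\s]", " ", no_accents.lower())
    return " ".join(cleaned.split())


def group_duplicates(records):
    """Return lists of record ids that share the same normalized key.

    The key (author, title, year) is an assumption made for this sketch;
    the field names are hypothetical.
    """
    groups = defaultdict(list)
    for record in records:
        key = (
            normalize(record["author"]),
            normalize(record["title"]),
            record.get("year"),
        )
        groups[key].append(record["id"])
    # Only keys seen more than once are merge candidates.
    return [ids for ids in groups.values() if len(ids) > 1]


if __name__ == "__main__":
    sample = [
        {"id": "a1", "author": "Verne, Jules", "title": "Voyage au centre de la Terre", "year": 1864},
        {"id": "a2", "author": "verne jules", "title": "Voyage au centre de la terre.", "year": 1864},
        {"id": "a3", "author": "Verne, Jules", "title": "L'Île mystérieuse", "year": 1874},
    ]
    print(group_duplicates(sample))
```

The output is only a list of merge candidates. The merge itself, and the redirect left behind, would still be an editorial decision, as on a wiki.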

Proposal

This is not necessarily a new Wikimedia project: the most likely scenario is helping OpenLibrary build a better project that is more useful for Wikimedia projects. For example:

  1. Extension for easy searching and linking to a book or an author in OpenLibrary.
    1. OpenLibrary has an API which would allow any relevant wiki article to be dynamically linked to their data, or an entry to be created every time new relevant data is added to a Wikimedia project.
    2. Another possible benefit: an automatic list of works/editions/translations in a Wikipedia article. You could add {{OpenLibrary|author=Jules Verne|lang=English}} and get a list of English translations of Jules Verne's works imported directly from their database. The problem is that, right now, Wikimedia projects often have more accurate and more detailed information than OpenLibrary.
    3. David Strauss did a quick implementation (basically a demo) of an OpenLibrary extension for MediaWiki. With very little code, he was able to search OL (via AJAX) and, when the user selected a given result, populate a Citation template. What was nice is that when no results came up for a given search, there was an "add to open library" button that brought you to the OL site to add your bibliographic information. It would be easy to build on this work to make a really powerful MW extension (and maybe some new templates, etc.) that would allow people to contribute to both MW and OL simultaneously.
  2. OpenLibrary already has a "link to Wikipedia" field, which needs improvement. OpenLibrary has author pages for 6.5 million author names. Some of these are "junk" duplicates that should be merged, but that still leaves a large number of authors. Each author page has a field for a Wikipedia URL, but only 1100 records have a value. Connecting author pages in OpenLibrary to Wikipedia biographies is just one area where we can do a lot without needing to start a new project.
  3.  OpenLibrary needs a field "full text at Wikisource".
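
The search-then-cite workflow described in item 1 could look roughly like the sketch below. It is not David Strauss's code. The search address and the result field names (`title`, `author_name`, `first_publish_year`) are assumptions about the OL search API and should be checked against its documentation, and the Citation parameters are only illustrative. The sketch builds the request URL and formats a result; it makes no network call.

```python
from urllib.parse import urlencode

# Assumption: OL exposes a JSON search endpoint at this address.
SEARCH_ENDPOINT = "https://openlibrary.org/search.json"


def build_search_url(author=None, title=None, limit=10):
    """Build a search URL for the given author and/or title."""
    params = {}
    if author:
        params["author"] = author
    if title:
        params["title"] = title
    params["limit"] = limit
    return SEARCH_ENDPOINT + "?" + urlencode(params)


def to_citation(result):
    """Render one search result (a dict) as a Citation template call.

    The result keys used here are assumed field names, not a confirmed schema.
    """
    fields = {
        "author": "; ".join(result.get("author_name", [])),
        "title": result.get("title", ""),
        "year": result.get("first_publish_year", ""),
    }
    # Skip empty fields so the template call stays compact.
    parts = [f"{name}={value}" for name, value in fields.items() if value]
    return "{{" + "|".join(["Citation"] + parts) + "}}"


if __name__ == "__main__":
    print(build_search_url(author="Jules Verne"))
    print(to_citation({
        "title": "Around the World in Eighty Days",
        "author_name": ["Jules Verne"],
        "first_publish_year": 1873,
    }))
```

A real extension would also need to escape pipe characters in titles and handle the "no results" case by linking to the OL page for adding a new entry.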

Motivation

The goal is a complete, collaborative, editable, versioned, multilingual, annotated book reference system that is easy to link to and matches the needs of Wikimedia projects. This is all about avoiding duplicate work between Wikimedia and OpenLibrary. It could also increase accuracy by cross-checking facts (dates, spelling of names and titles, etc.) between our projects.

Key Questions

  • How can Wikimedia help OL? Do they need our help, or do they only need to cooperate with us so that we understand how to work together better? What do we know how to do that they are only starting to explore?
  • How can OL help Wikimedia? Do we need their help, or just cooperation to better coordinate our work/references/community efforts? What do they know how to do that we are only starting to explore?

Potential Costs

References

Community Discussion

Do you have a thought about this proposal? A suggestion? Discuss this proposal by going to Proposal talk:Building a database of all books ever published.

Want to work on this proposal?

  1. .. Sign your name here!