Offline/refactor

The largest untapped potential for Wikimedia Foundation content is offline. The nearly 70% of the global population without Internet access[1] represents a huge opportunity for Wikimedia to address its primary mission. Content reuse and expanded contribution streams offer a large pool of new ways to bring Wikimedia content to "every single human being", and additional ways to "share in the sum of all knowledge"[2].

The Foundation does not need to make large initial investments in this strategic expansion: public, private, and charitable efforts are already investing in the field. Infrastructure development groups are currently challenging Foundation developers' ability to support their efforts, and researchers are working to enrich project content for further offline reuse. Funding from both public and private sources has previously been secured, and is still available, for projects in this strategy sphere.

Current Situation

Internet users per 100 inhabitants 1997-2007 ITU[3]

Recap findings of the TF.

Current WM Situation

Recap findings of the TF.

Key questions

  • Who are the actual consumers?
    • Not who are the target consumers, but who will actually use the product(s).
  • What is the potential effect of various offline initiatives contrasted with their predicted costs?
  • What role should the Wikimedia Foundation play?
    • Enabling third-party publishers, or creating/packaging/distributing product?
      • If the former, WMF relinquishes control of the market targeting. Does this still support the mandate?
  • What, specifically, is the offline strategy intended to do?
  • How, then, do we measure progress?
  • Which methods of linking offline usage back to the online community are best, and how should they be implemented?
  • What is the priority of offline strategy? That is, will any resources actually be budgeted for it, when will they be available, and how and by whom will they be distributed?

Options

Do nothing

Globally, Internet access is improving. The general trend suggests the solely-offline population will continue to decrease, although it also suggests a plateau within developed nations that never reaches 100%.

Within this strategy option, expanding the reach of WMF projects would involve continued investment of time and effort in developing local-language communities and expanding coverage of local topics to encourage regional audiences. Local technical infrastructure, such as Squid caches inside bottlenecked countries, may improve user experience and thus local contributor recruitment and retention.

The primary drawback of this option is that it slows development and adoption of content reuse technologies, and it also eliminates Wikimedia Foundation influence over the development or adoption of those technologies. For example, WMF has no representation in or knowledge of the Wiktionary RDF experiments by researchers at the University of Leipzig. This option may allow technology innovators to make WMF redundant or monopolise a market, if they develop a system which circumvents the Foundation's bottleneck.

Support third-party development

In an ideal case, creating large, valuable databases attracts people who use exactly that kind of data. These interested parties will develop tools integrating the data, which will spawn further application of the data in wider contexts. Such leveraged use of content has a much greater effect than simple publication. For example, most browsers perform spell-checking with every character typed into the MediaWiki edit box; that's well over 6,000 checks in writing just this particular article on the Strategy website, and all of them *could* be against a dictionary built from the Wiktionary project but instead are made against a much smaller word pool.

Wikimedia Foundation projects' content is available for reuse under various copyright licenses. As the largest single repository on the planet in many cases, it should also be the most widely republished resource. This is not currently the case. The Offline Task Force identified the primary obstruction to reuse as the data storage format.

Content is stored as unparsed text. There are no current third-party parsers, primarily because there is no MediaWiki parser specification. The difficulty of parsing MediaWiki-stored articles, due in large part to the complexity of template parsing, precludes their consideration for many desktop and cellphone projects.
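
To illustrate why template parsing is a barrier for reusers, here is a minimal Python sketch. It is not the MediaWiki parser and is not drawn from the Task Force's findings; the function names and the sample text are hypothetical. It contrasts a naive pattern match, which breaks on nested templates, with a brace-depth scan that recovers the outermost templates. Even the second approach ignores triple-brace parameters, nowiki sections, and template expansion itself, all of which a real parser specification would need to cover.

```python
import re


def naive_templates(wikitext: str) -> list[str]:
    """Naive approach: a non-greedy pattern, which mishandles nesting."""
    return re.findall(r"\{\{.*?\}\}", wikitext)


def top_level_templates(wikitext: str) -> list[str]:
    """Return the outermost {{...}} spans by tracking brace depth.

    Illustrative only: triple braces, <nowiki>, and parser functions
    are not handled.
    """
    templates = []
    depth = 0
    start = 0
    i = 0
    while i < len(wikitext) - 1:
        pair = wikitext[i:i + 2]
        if pair == "{{":
            if depth == 0:
                start = i  # remember where the outermost template opens
            depth += 1
            i += 2
        elif pair == "}}" and depth > 0:
            depth -= 1
            if depth == 0:
                templates.append(wikitext[start:i + 2])
            i += 2
        else:
            i += 1
    return templates


# Hypothetical sample: one template nested inside another.
sample = "Intro {{infobox|name={{lc:Foo}}|x=1}} text {{stub}}"
print(naive_templates(sample))
print(top_level_templates(sample))
```

On the sample, the naive match cuts the infobox off at the first closing braces, while the depth scan returns the full infobox and the stub template.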

The primary recommendation of the Offline Task Force was to focus on making WMF data more accessible for researchers and data reusers. Storing content in a structured format (such as XML) seems a significant step, although it would require careful planning prior to implementation. Developing and publishing a parser specification is also a fundamental element of supporting third parties.

Drawbacks:

  • A content storage change requires a major MediaWiki version: MW 2.0.
  • A parser specification may cause conflicts within the developer community.
  • Both cost developer time.

Mobile

Main page: Mobile

The second-priority recommendation of the Offline Task Force was to focus support on the cellphone hardware platform. The cellphone is currently the largest single digital platform in the world, dramatically outnumbering computers. Cellphones divide, not very clearly, into 'smart phone' handsets, with well-featured operating systems and complex applications, and less-sophisticated handsets, which account for 90% of the current world market. Future market penetration of cellphones in developing countries is hard to estimate: none of these countries has a mature market, and in the very diversified first-world nations cellphone usage is plateauing below 80%. Current forecasts suggest developing nations will have higher cellphone use, as there is less land-line infrastructure.

The Wikimedia Foundation is currently engaged in usability projects to improve the mobile experience for users who have digital access as part of their network provider plans. Other alternatives, such as SMS article request systems, other network-provided dynamic data, or OEM-provided static data, would involve expanding WMF mobile support.

Drawbacks:

  • Usability waterfall
  • Competing with third-party initiatives
  • Creating new technology without a known market
  • Marketing to network providers and OEMs

Audio

Audio Wikipedia gives users who are illiterate, as well as people with vision impairments, the opportunity to hear an article read aloud.

In the near to medium term, recording articles and improving text-to-speech integration may give this audience better access to Foundation project content, but consideration must be given to audio article navigation.

Drawbacks:

  • Time/storage sink in creating millions of recorded articles which will need constant maintenance.
  • Audio navigation in every language

Print

Main page: Proposal:Publish a collectors edition wikibreviated

Commercial encyclopedias are withdrawing from the print market, and/or are financially challenged. It is thus not clear if there is a market for print editions beyond the tiniest fraction of information articles, while the cost of printing, distributing, and updating paper versions of Wikimedia projects for millions of people would be very high.

There are contexts where printed editions still make sense and can be implemented on a small scale, particularly educational materials. With PediaPress's Collection extension, custom books may be printed directly from WMF project sites.

Drawbacks:

  • For WMF-printed materials: large initial expense, distribution costs, potentially high maintenance cost.
  • Shelf-life.
  • Co-ordination and marketing costs.

Digital

Main page: Proposal:Offline Wikipedia

There are many current and expanding projects which deliver digital editions of a subset of Wikimedia projects. Digital editions can be distributed in a variety of ways, from DVDs to USB thumb drives to MicroSD cards. Possibilities include static editions on cellphones, on computers not connected to the Internet, and even on in-flight entertainment systems.

Offline digital storage is particularly apt for systems which have intermittent internet connectivity, including laptops and other mobile devices. Especially for cellphones, it is easy to imagine apps which allow maintenance updating of the static database - either via scheduled downloads or via MicroSD updates/flashing on a computer. Offline storage standards and readers currently exist, as do technologies for creating custom collations of WMF-derived content; what is lacking are applications and interfaces for on-demand data updates across a range of platforms such as cellphone OSes.
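
As a rough illustration of what an on-demand update involves, the Python sketch below compares an offline snapshot with a newer published one. It assumes each snapshot can be summarised as a map from article title to revision number; that manifest format, the function name, and the sample data are all hypothetical and do not describe an existing WMF or reader interface.

```python
def plan_update(local: dict[str, int], remote: dict[str, int]) -> dict[str, list[str]]:
    """Compare title -> revision maps and list what the device must change.

    'fetch' holds articles that are new or have a newer revision remotely;
    'delete' holds articles that are no longer in the published collection.
    """
    fetch = sorted(t for t, rev in remote.items() if local.get(t, -1) < rev)
    delete = sorted(t for t in local if t not in remote)
    return {"fetch": fetch, "delete": delete}


# Hypothetical manifests for a device snapshot and a newer published edition.
local_manifest = {"Water": 10, "Fire": 7, "Old": 3}
remote_manifest = {"Water": 12, "Fire": 7, "Earth": 1}

plan = plan_update(local_manifest, remote_manifest)
print(plan)
```

Only the changed and new articles would need to travel over an intermittent connection or a MicroSD refresh, which is the main appeal of incremental updates over redistributing a full edition.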

Local, offline digital storage raises excellent opportunities for native OS editing. The ability to edit an article in a native editor already exists in MediaWiki. Integrating an offline reader with the MediaWiki API should not present a great technical challenge, and should be encouraged, but may involve improvements to MediaWiki's versioning engine.

Drawbacks:

  • Publication materials and associated costs (DVDs, USB thumb drives, MicroSD cards)
  • New software associated with on-demand updates
  • Development hours for improving MW versioning
  • Marketing costs.

Key resources

Fill in from TF, current.

Notes/References

  1. Internet World Stats, 30 June 2010.
  2. Wikimedia Foundation Vision: “Imagine a world in which every single human being can freely share in the sum of all knowledge. That's our commitment.”
  3. This data is 4 years old already. How relevant is it still? amgine