Wikimedia technology infrastructure

Current Infrastructure

Wikimedia has invested too little in site performance to meet current expectations, let alone to support growth.

Site Infrastructure

  • There are insufficient offsite backups of site text and Wikimedia Foundation office data.{{citation needed}}
    The Toolserver has a full copy of the MediaWiki databases (text is not fully available). Text backups are also easy to download from http://dumps.wikimedia.org/
  • There are very limited backups of media and Web service data.{{citation needed}}
    There is a second box with a copy of the images. Backups are primarily based on ZFS snapshots. [what is Web service data?]

Site Reliability

  • Sites are subject to frequent short-term outages.{{citation needed}}
  • Limited standardization and automation leave room for human error.{{citation needed}}
  • Lack of redundancy could lead to long-term outages.{{citation needed}}

Capacity

  • Current media upload capacity is low.{{citation needed}}
  • A significant increase in participation or features such as chat or real-time collaboration would dramatically increase demand on the servers.
    Something like Proposal:Real-time chat, which would place everybody visiting an article into a chat room, would. Making an IRC gateway easier to reach need not.

Scenarios

Linear Growth / No Shifts in Usage

Scaling to handle roughly three times the current number of visitors (1 billion/month vs. 330 million/month) would require:

  • Three times the computing capacity
  • No shifts in people resources
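
The arithmetic behind this scenario can be sketched as follows. This is a minimal illustration that assumes capacity scales linearly with visitors, as the scenario itself does. The visitor figures come from the text above; the server count of 100 is a hypothetical placeholder, not a real Wikimedia number.

```python
import math

# Monthly visitor figures quoted in the scenario above.
CURRENT_VISITORS = 330_000_000
TARGET_VISITORS = 1_000_000_000


def scale_factor(current: int, target: int) -> float:
    """Ratio of target traffic to current traffic."""
    return target / current


def servers_needed(current_servers: int, current: int, target: int) -> int:
    """Servers required at the target load, assuming purely linear scaling."""
    return math.ceil(current_servers * scale_factor(current, target))


factor = scale_factor(CURRENT_VISITORS, TARGET_VISITORS)
print(f"Traffic multiplier: {factor:.2f}")

# Hypothetical fleet size, used only to show the calculation.
HYPOTHETICAL_SERVERS = 100
print(servers_needed(HYPOTHETICAL_SERVERS, CURRENT_VISITORS, TARGET_VISITORS))
```

Under the linear assumption, the multiplier is slightly above three, so "three times the computing capacity" is a rounded figure.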

Increase in Editing / Viewing Ratio

More Multimedia Usage

Development Needs

A/B Testing

During the design and development of new features, feedback from user behavior is difficult and expensive to gather, which results in infrequent user feedback studies and inadequate sample sizes. An automated A/B testing system would allow us to try features out on a subset of users and compare their usage with that of the existing version, discovering user preferences and usability issues as well as vetting new ideas and prototypes.

  • Ability to send a subset of users to an alternative version of the site based on configuration information controlled by a web-based interface ('bucket testing').
  • Ability to capture usage information from user sessions. A very limited form of this concept exists as a MediaWiki extension called ClickTracking.
  • Facility to analyze usage patterns, especially comparisons between different versions of the software.
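
The three capabilities listed above can be sketched together in a few lines. This is an illustrative sketch under stated assumptions, not a description of any existing MediaWiki feature: the `EXPERIMENTS` dictionary stands in for configuration that a web-based interface would manage, and the experiment name, function names, and event format are all hypothetical.

```python
import hashlib

# Hypothetical experiment configuration; in a real system this would be
# edited through the web-based interface rather than hard-coded.
EXPERIMENTS = {
    "new-edit-toolbar": {"enabled": True, "percent": 10},
}


def bucket_for(user_id: str, experiment: str) -> int:
    """Map a user to a stable bucket in the range 0-99 by hashing."""
    key = f"{experiment}:{user_id}".encode("utf-8")
    return int(hashlib.sha256(key).hexdigest(), 16) % 100


def variant_for(user_id: str, experiment: str, config=EXPERIMENTS) -> str:
    """Return 'treatment' for the configured share of users, else 'control'."""
    settings = config.get(experiment)
    if not settings or not settings["enabled"]:
        return "control"
    if bucket_for(user_id, experiment) < settings["percent"]:
        return "treatment"
    return "control"


def rate_by_variant(events):
    """Compare variants: events is an iterable of (variant, succeeded) pairs."""
    totals = {}
    for variant, succeeded in events:
        shown, done = totals.get(variant, (0, 0))
        totals[variant] = (shown + 1, done + (1 if succeeded else 0))
    return {variant: done / shown for variant, (shown, done) in totals.items()}
```

Hashing the user identifier together with the experiment name keeps each user in the same version across sessions without storing any assignment, and it lets different experiments split users independently. A production system would also need a significance test before drawing conclusions from the compared rates.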