Regional bandwidth

From Strategic Planning

Problem description

Bandwidth differs significantly from region to region, and is a barrier to reach. If users within a region are not able to use any Wikimedia projects, be it their local Wikipedia or the English version, there will certainly not be any growth of the local projects from within that region.

This article contains a map that shows international bandwidth per capita as of 2005. In Europe and North America, where the growth of Wikimedia projects has been highest, bandwidth is likewise high. In large parts of the Middle East, South Asia, and Africa the bandwidth per capita is less than 1Mbps.

See also the Akamai State of the Internet Report 2010 Q1, figure 24 (average and average maximum connection speed, and average megabytes downloaded per month, by mobile provider).

More interesting is the bandwidth per connection. Looking at "surf speed" statistics, the average "outside country" surf speed in the listed countries is very often 100kbps or below, and values lower than 50kbps are not uncommon. Most of the countries in that list are also quite well developed, which makes it likely that Internet connections in less developed countries are even slower. The statistics do, however, also show that "within country" speeds are often much higher than "outside country" speeds.

Additionally, given the large number of mobile phone subscriptions in developing countries, it is quite likely that many Internet connections in the near future will be through mobile technologies (see also Task force/Recommendations/Offline#Outline of offline recommendation #2: Use of cellphones). GPRS has maximum connection speeds of 56-114kbps, while 3G has a theoretical maximum of 14Mbps down and 5.8Mbps up. To get a feel for the reality of the problem, use a mobile connection and try to read and edit a Wikipedia article when the signal is weak.
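To make these speeds concrete, here is a minimal sketch of the idealised transfer time for a 340kB article (the featured-article size measured in the technical analysis below) over the connection speeds just mentioned. The numbers are an upper bound on real-world behaviour: they assume the link runs at its nominal speed with no latency or protocol overhead.

```python
# Idealised transfer times for a 340 kB article over the speeds named above.
# These ignore latency, packet loss, and protocol overhead, so real mobile
# connections will be slower still.

ARTICLE_KB = 340  # fully loaded featured article, as measured with Firebug

# Nominal connection speeds in kbit/s
speeds_kbps = {
    "GPRS (low end)": 56,
    "GPRS (high end)": 114,
    "3G (theoretical max down)": 14_000,
}

def transfer_seconds(size_kb: float, speed_kbps: float) -> float:
    """Seconds to transfer size_kb kilobytes at speed_kbps kilobits per second."""
    return size_kb * 8 / speed_kbps

for name, speed in speeds_kbps.items():
    print(f"{name}: {transfer_seconds(ARTICLE_KB, speed):.1f} s")
```

At the low end of GPRS this works out to roughly 49 seconds for a single article, which matches the subjective experience the paragraph above suggests trying for yourself.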

Technical Analysis

Using the Firefox extension Firebug, the amount of data that had to be loaded when fully reloading the articles featured from the 1st to the 30th of December was around 340kB per article, visiting the pages without being logged in. Moreover, even though no such large article was found in December, there was at least one in November where 1.2MB had to be loaded: namely the article wind, featured on the 18th of November.

Further, a random walk through Wikipedia articles, in which a random blue link was followed from a featured article, then a new random blue link from the resulting article, and so on, seemed to indicate that the amount of data that had to be loaded mostly fell in the interval 10kB-500kB. This time no full reloading was done, so as to find out how much data has to be loaded when a random new article is opened but cached material is reused. According to Firebug, most of the loaded material came from one particular source; sometimes material from other destinations accounted for almost half the loaded data, but very often that one source accounted for the significant part. This indicates that when visiting Wikipedia, most of the information that is loaded is media. See also WikiStats: number of requests.

Now assume that on average 200kB of data has to be loaded when visiting a new article, which seems reasonable from the random walk. With an Internet connection at 50kbit/s it would, at full speed, take the user 200*8/50 = 32 seconds to load a new article. Under the same assumptions, the featured article of the 18th of November would take over three minutes to load at full speed.

Possible strategies for dealing with these problems

There are a couple of things that can be done to limit these problems:

  • Give visitors the opportunity to turn off automatic loading of media, through a highly visible button labeled "Turn off media", or something more explanatory such as "Does Wikipedia load slowly? Click here to turn off media and decrease loading times."
    • We could also reduce the required bandwidth by reducing the default thumbnail width (it used to be 180px, now moving to 220px with the Vector rollout) for page requests from certain countries; or with a button and a cookie; or by inviting users to create an account and set their preference. (Note that we should increase the number of images which don't have a specified thumb dimension.)
  • The local projects could be hosted locally to take advantage of the higher "within country" traffic speeds.
  • Local mirrors or caches can in the same way as local hosting decrease the loading times.
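The thumbnail-width idea in the list above can be roughly quantified. A simple sketch, under the simplifying assumption that thumbnail file size scales approximately with pixel area (i.e. with the square of the width, aspect ratio held fixed):

```python
# Rough estimate of the bandwidth saved by serving 180px-wide thumbnails
# instead of 220px ones. Assumption (a simplification): compressed thumbnail
# size scales roughly with pixel area, i.e. with the square of the width.

def relative_size(new_width: int, old_width: int) -> float:
    """Approximate file-size ratio of a thumbnail rescaled between two widths."""
    return (new_width / old_width) ** 2

ratio = relative_size(180, 220)
print(f"A 180px thumb is ~{ratio:.0%} the size of a 220px one "
      f"(~{1 - ratio:.0%} saved per image)")
```

Under that assumption, reverting to 180px thumbnails would save on the order of a third of the thumbnail bandwidth per image, which is significant given that the random walk above suggests media dominates page weight.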

Note: there might be legal and technical problems with hosting, mirroring or caching Wikimedia projects outside the US that have to be considered. Caching could perhaps avoid some of the legal issues that hosting and mirroring present. Local hosting, mirroring and caching do not necessarily imply that this should be done in every country. It could, for example, be possible to find one country on each continent that complies with the legal framework the WMF operates under. Such per-continent hosting, mirroring or caching would also increase access speed.

See also