Proposal talk:Offline Wikipedia

From Strategic Planning

The most recent dump of all the current versions of English Wikipedia articles is 5.1GB (compressed). I have the cheapest phone I could find, so I don't really know what kind of storage top of the range phones have, but 5.1GB is still rather large. According to Wikipedia the most storage space on any version of the iPhone (chosen vaguely at random, I'm guessing it is one of the best in terms of storage space) is 32GB, so it would fit but it would take up a large proportion of the space. Give it a couple of years, though, and this should be practical - storage space in mobile phones is growing faster than the English Wikipedia, I think. --Tango 02:52, 15 August 2009 (UTC)

This isn't a new idea

  • There have been a few stabs at this, not least the Wikipedia set on the OLPC laptops, Wikipedia 1.0 which led to the Wikipedia for Schools DVD, and the "wiki on a stick" USB-key Wikipedia of a couple of years ago. See also: m:Static content. If community members are interested in seeing this happen in an organized way, there are a lot of people who have considered the problem. Sj is probably the resident expert on all the efforts to gather offline Wikipedia content to date. I know there have also been some stabs at offline wiki technologies for offline editing. -- Phoebe 03:09, 15 August 2009 (UTC)
  • Then, if many people bring it up from time to time, it is probably a good idea. However, a system for interactively downloading selected parts of Wikipedia while the editor is doing something else would require some smart browser, or some superhuman and otherwise unstable Ajax hack. The topic here might be the browser software, not any specific MediaWiki software (or in future we might expect a MediaWiki browser?). Maybe the very good idea suffers from being hard to achieve? Rursus 10:58, 18 August 2009 (UTC)

How about DVDs?

I think offering Wikipedia on DVD makes sense, like Encarta and Britannica but for an affordable price of course. The number of DVDs can vary depending on the quality of the content, i.e. whether or not to include article extras such as full-resolution images. --Email4mobile 09:13, 15 August 2009 (UTC)

If you want to include images you can only have a tiny fraction of articles. The size I quoted above was just for the text. --Tango 13:59, 17 August 2009 (UTC)

Sanity Check

"Even a mobile phone has enough storage capacity to hold the entire text of Wikipedia in one language"
Whoa. Slow down there. Maybe some high-level phones could pull this off, like an iPhone, or some of the ultra high-end smartphones, but wide access to mobile devices with that kind of storage is still about 5 years off. It's just not practical to assume that your average mobile device would benefit from this initiative in the sense you're talking about. Perhaps selecting specific articles or general categories to keep cached would be a better idea than including the entire wiki.

"Several person-hours of software development, systems/network administration, and testing time."
Again, whoa. 'Several'? Try hundreds, maybe thousands depending on how well tested it needs to be before going live across WMF. Unless there's this amazing distributed, failure-tolerant wiki access system that has almost no bugs and was designed to just plug right into MediaWiki that no one's told me about, you'd need a team to develop that software. And making something that can access MW on that level and be reliable would take a long time. We're talking professional developers working on the code. Granted, some of the features you specify have already been developed, and you could use open source libraries to create some horrifying, Frankenstein-like mashup of BitTorrent, rsync, squid, and a WYSIWYG editor, but adapting those systems to do the task you're trying to achieve without being horribly inefficient and bloated might prove to be just as difficult as writing the functions from scratch.

"I know that there are no technical barriers to the goals I've outlined"
There are. See above.

Granted, perhaps this proposal could pick up traction if it was open source and placed rather prominently on WP where people who are likely to be coders would see it, but it could be rather difficult to get the clientside software to the level where it's usable enough to release, in order to increase interest and visibility of the project and bring in more developers to continue work. In other words, it could end up being rather difficult to bring the system up to a level where its popularity can keep it going.

A few final points:

  • I'd like to mention that it's 3AM here, and I don't really have the time to properly flesh out my arguments and possible alternatives before I collapse, but I wanted to get my foot in the door here so that I can continue working out the problems I've mentioned, and a few I haven't.
  • I apologize if I seem rather rude in my writing, it's just that being a developer, it can be a bit aggravating hearing someone say "I need you to make a program that allows offline access to Wikipedia, either directly from the servers, or with a peer to peer system, and make it reliable, make it possible to commit changes back to WP, use it to cache content and serve up out of date pages until a fresh version can be obtained, and make it run on a cellphone too. And it has to be stable enough to release to the masses for use on WP without misbehaving and erasing articles or anything weird like that. It'll only take a few hours of coding, right?". Things like that make me start smacking my head against the wall. It's harder than that.
  • I didn't read any of the links in the references section, so some of these issues I'm presenting might, in fact, have been solved long ago. But I don't really have time to read them, as outlined in point 1. If I'm wrong about something, just ignore me.
  • I'm not trying to shoot your idea down or anything, I'm just pointing out a few of the major flaws in your vision for the implementation of a great idea, one that I've wished for myself many times. Like I said, I don't really have time to write up some solutions to these problems, but I don't think there's anything on the list of issues that can't be solved. Maybe I'll write some more about it tomorrow.

--Lx45803 07:11, 20 August 2009 (UTC)

Downloadable cacher

If you mean what I think you do then it could be done a lot easier. A piece of software could be written that, when you go on a Wikipedia page, checks for a new version to download in the background while it displays a cached version for you. This wouldn't speed things up immediately, but frequently viewed pages and images, like the main page, logo and a few favourite pages, would be instant. The software could also delete pages that you haven't looked at in a while. Eddy 1000 12:32, 26 August 2009 (UTC)
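
As a rough illustration of the idea above (show the cached copy straight away, refresh it afterwards, drop pages that go unviewed), here is a minimal Python sketch. The names `PageCache`, `fetch`, `refresh_pending` and `evict_idle` are made up for this example and are not part of MediaWiki or any existing tool; `fetch` stands in for whatever would actually download a page, and the refresh is an explicit method call here rather than a real background thread.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Set


@dataclass
class CachedPage:
    text: str
    last_viewed: float  # when the reader last opened this page


class PageCache:
    """Serve the cached copy first, refresh it later, drop unused pages."""

    def __init__(self, fetch: Callable[[str], str], max_idle_seconds: float):
        self.fetch = fetch              # hypothetical downloader: title -> text
        self.max_idle_seconds = max_idle_seconds
        self.pages: Dict[str, CachedPage] = {}
        self.pending: Set[str] = set()  # titles queued for a later refresh

    def view(self, title: str, now: Optional[float] = None) -> str:
        now = time.time() if now is None else now
        page = self.pages.get(title)
        if page is None:
            # First visit: nothing is cached, so the reader waits for the download.
            page = CachedPage(self.fetch(title), now)
            self.pages[title] = page
        else:
            # Repeat visit: return the cached copy at once and queue a refresh.
            page.last_viewed = now
            self.pending.add(title)
        return page.text

    def refresh_pending(self) -> None:
        """Meant to be run by a background worker: re-download queued pages."""
        for title in sorted(self.pending):
            if title in self.pages:
                self.pages[title].text = self.fetch(title)
        self.pending.clear()

    def evict_idle(self, now: Optional[float] = None) -> None:
        """Delete pages that have not been viewed for max_idle_seconds."""
        now = time.time() if now is None else now
        idle = [title for title, page in self.pages.items()
                if now - page.last_viewed > self.max_idle_seconds]
        for title in idle:
            del self.pages[title]
```

The trade-off is the one described above: the first view of a page is no faster, and a repeat view may show a slightly stale copy until the refresh has run.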


Some proposals will have massive impact on end-users, including non-editors. Some will have minimal impact. What will be the impact of this proposal on our end-users? -- Philippe 00:14, 3 September 2009 (UTC)


There is an alternative to the two projects mentioned.. Okawix ... Thanks, GerardM 12:14, 9 September 2009 (UTC)

This is NOT an alternative. See my brief review: «It doesn't support the free standard format (i.e.; see also Not user friendly interface (I couldn't figure out how to add a new corpus), no "random page" feature. In the last release, many pages and templates which shouldn't be included, and blue links to non-included pages. (I don't know of other bugs, I tried it only for a minute.)» I'll give you additional details by e-mail. Nemo 11:09, 12 September 2009 (UTC) P.S.: Useful link to a recent thread on wikitech-l (I hadn't seen it).

Downloadable Wikipedia

I like this idea of having Wikipedia available while my internet connection is down. 5GB is nothing on a PC. Give me the ability to keep the entire text content of Wikipedia locally and I'd do so without a second thought. Text content without images is still better than no content.

The way I'd use it is that I'd still use the online Wikipedia in my browser by default. But if my internet connection were down, I'd fire up, say, wikipedia.exe and look things up in the cached database. Slightly old content is still better than no content when you want to know something. It would be like browsing the offline Encarta or Britannica, which would be at least equally old, and probably less complete.

On a technical note, the way I imagine it working is that it would first download the full version once, and then download the differences every now and then (weekly or even monthly would be acceptable), based on what version the user already has. You could have predefined rollup intervals, which would give you the chance to better compress the updates, thus minimizing traffic. And for each update you could create a predefined number of direct patches from old versions. For example, if the latest content version is 1.1.2, you could have rollups for 1.1.1->1.1.2, 1.1.0->1.1.2, and as far back as you want to go, to minimize traffic even further if clients miss a rollup. Creating patches for each previous version would be automated, and would only need to happen when each rollup is created.
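
To make the rollup scheme a bit more concrete, here is a small Python sketch of direct patches from older versions to the latest one. Everything in it is an assumption made for illustration: a snapshot is modelled as a plain dict of title to text, a patch as a dict of changed titles (with `None` marking a deleted article), and the function names are invented. Real dumps would need a proper diff and compression format.

```python
from typing import Dict, Optional

Snapshot = Dict[str, str]         # article title -> article text
Patch = Dict[str, Optional[str]]  # title -> new text, or None if deleted


def make_patch(old: Snapshot, new: Snapshot) -> Patch:
    """Record only the articles that changed, appeared or disappeared."""
    patch: Patch = {}
    for title, text in new.items():
        if old.get(title) != text:
            patch[title] = text
    for title in old:
        if title not in new:
            patch[title] = None
    return patch


def apply_patch(snapshot: Snapshot, patch: Patch) -> Snapshot:
    """Return a new snapshot with the patch applied."""
    result = dict(snapshot)
    for title, text in patch.items():
        if text is None:
            result.pop(title, None)
        else:
            result[title] = text
    return result


def build_rollups(history: Dict[str, Snapshot], latest: str,
                  keep: int) -> Dict[str, Patch]:
    """Build direct patches old -> latest for the `keep` newest older versions.

    Assumes `history` was filled oldest-first (dicts keep insertion order).
    """
    older = [version for version in history if version != latest]
    older = older[max(0, len(older) - keep):]
    return {version: make_patch(history[version], history[latest])
            for version in older}


def update_client(version: str, snapshot: Snapshot, latest: str,
                  rollups: Dict[str, Patch], full_latest: Snapshot) -> Snapshot:
    """Use a direct patch if one exists, otherwise fall back to a full download."""
    if version == latest:
        return snapshot
    patch = rollups.get(version)
    if patch is None:
        return dict(full_latest)  # client is too far behind for any rollup
    return apply_patch(snapshot, patch)
```

With versions 1.1.0, 1.1.1 and 1.1.2 in `history` and `keep=2`, `build_rollups` produces the 1.1.1->1.1.2 and 1.1.0->1.1.2 patches from the example, and a client on either version needs only one download to catch up.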

Missing the point about Wikipedia

While the idea is interesting, I think commercially packaging and selling Wikipedia would have only marginal benefits. I think you are missing the major point and strength of Wikipedia: it grows every single second, and it is constantly edited or added to by users from around the world. No other encyclopedia could claim to be more up to date. If there were an offline version today, it would be outdated in a matter of days; it would contain tons of errors, misinformation and mistakes, not to mention that a majority of important articles would be incomplete. It also bears mentioning that in this day and age, it's easier to go online and look up current info on Wikipedia, or elsewhere for that matter, than to have a large, incomplete offline version to look through. The task of commercially producing and distributing Wikipedia would also be a large burden. It might make sense if a third party were to pay Wikipedia for the rights to do that, but I doubt it would find much interest. Theo10011 16:38, 5 October 2009 (UTC)

I saw this the other day:[1] it's a WikiReader sold by Openmoko, a handheld reader for Wikipedia. It comes with an offline version of Wikipedia without the images. Here's an excerpt:

"the public Wikipedia XML file has, according to Openmoko, a size of somewhere between 25 and 30 GB and the company was able to compress it to less than 4 GB. And no, there are no images included, as the Wikipedia content with images is about 72 TB."

The offline Wikipedia version being discussed here would be without any images and about 25 GB uncompressed. I just thought that should be mentioned.


-- 09:02, 25 September 2009 (UTC)


Can we mark this as done now because of this? WereSpielChequers 13:02, 15 October 2009 (UTC)

For me, no. I'd like to see something that's WMF designed and/or specifically advanced by WMF. -- Philippe 15:26, 15 October 2009 (UTC)

Hey guys, I posted the link above. I have a question about the WikiReader mentioned there. Isn't the Wikimedia Foundation aware of or involved in the commercial use of Wikipedia? From a legal standpoint, shouldn't the WMF at least be consulted before someone uses an offline version of Wikipedia? The company selling this might be interested in getting involved officially with the Foundation through some revenue-sharing model or licensing deal, so that the WMF can provide up-to-date versions for the offline version and official support. Theo10011 00:24, 18 October 2009 (UTC)

I was assuming from this link that they had some sort of arrangement with the foundation that allowed them to use our trademarks. The data itself is of course licenced to allow such commercial reuse - provided we get appropriate credit for that, but I thought our logo was a different matter. WereSpielChequers 01:24, 18 October 2009 (UTC)
Don't mark this as done. Not everyone is going to buy the Wiki Reader (it costs $100, anyways) and a free solution is always preferable. MC10 18:37, 7 November 2010 (UTC)