Updating offline content

I've used offline Wikipedia implementations like Moulin. A major weakness of solutions like these is that it is hard to get small updates: I have to download the whole DVD image just to get an update, which is very time consuming over a slow internet connection like those here in Cameroon.

There needs to be a mechanism to pull in changes hour to hour, day to day, or month to month, depending on the user's preference. A way to share updates with others once they are downloaded would also be quite useful: perhaps the ability to extract all updates in the offline database newer than a given date and save them to a file. Saving those updates to a USB drive and then using it to update multiple computers seems like a good solution.
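
To make this concrete, here is a rough sketch of what such an export/import step could look like in Python. The SQLite "articles" table, the JSON bundle layout, and the file names are purely illustrative assumptions, not the storage format of any existing offline reader.

  # Sketch: export articles changed since a given date from a local offline
  # store into a single bundle file, and apply such a bundle on another
  # machine (e.g. after carrying it over on a USB drive).
  # The articles(title, content, last_modified) table is a hypothetical layout.
  import json
  import sqlite3
  import sys

  def export_updates(db_path, since_iso, out_path):
      """Write every article modified after `since_iso` to a JSON bundle."""
      conn = sqlite3.connect(db_path)
      rows = conn.execute(
          "SELECT title, content, last_modified FROM articles "
          "WHERE last_modified > ?", (since_iso,)
      ).fetchall()
      conn.close()
      bundle = [{"title": t, "content": c, "last_modified": m} for t, c, m in rows]
      with open(out_path, "w", encoding="utf-8") as f:
          json.dump(bundle, f)
      print(f"Exported {len(bundle)} updated articles to {out_path}")

  def apply_updates(db_path, bundle_path):
      """Insert or replace articles from a bundle copied from another machine."""
      with open(bundle_path, encoding="utf-8") as f:
          bundle = json.load(f)
      conn = sqlite3.connect(db_path)
      conn.executemany(
          "INSERT OR REPLACE INTO articles (title, content, last_modified) "
          "VALUES (:title, :content, :last_modified)", bundle
      )
      conn.commit()
      conn.close()
      print(f"Applied {len(bundle)} updates from {bundle_path}")

  if __name__ == "__main__":
      # python update_bundle.py export wiki.db 2009-10-01 updates.json
      # python update_bundle.py apply  wiki.db updates.json
      if sys.argv[1] == "export":
          export_updates(sys.argv[2], sys.argv[3], sys.argv[4])
      else:
          apply_updates(sys.argv[2], sys.argv[3])

The same bundle file could then be applied to any number of machines from the USB drive, which is exactly the sharing pattern described above.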

Different users could set up an update schedule that works for them. For instance, the university I work for would benefit from having an offline mirror of sorts that could update every few hours, while the internet cafe in town might only need to update its content every few days. Home users without internet could get updates from the internet cafe if they wanted.

Alecdhuse 06:51, 30 October 2009

(1) Making an offline version of MediaWiki content is a difficult and time-consuming task. (2) Doing that work periodically and providing incremental updates would be worthwhile. (3) Giving those incremental updates a precision of an hour would be even more worthwhile.

As someone with solid technical experience in this domain, my opinion is: (1) This should be a topic in its own right: how to do it correctly for all our projects. (2) This is a pretty technical challenge. The solution depends on the storage format, and as far as I know nobody has published anything about it yet. (3) This is pretty unrealistic at the moment; I'm also not sure it is necessary for end users, or that it is even theoretically possible without introducing heavy side effects.

I'm interested in any idea, work, or proposal about how to solve (2) with the ZIM format.
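
For what it's worth, one way to at least frame the producer side of the problem, deliberately independent of the ZIM internals, is to diff two content snapshots by checksum and emit only the entries that changed. The snapshot-as-directory layout and the JSON delta file below are made-up illustrations for discussion, not anything openZIM has specified or published.

  # Sketch: compute an incremental update between two content snapshots.
  # Each snapshot is assumed to be a directory of plain-text article files;
  # the delta manifest is an invented format, not part of ZIM.
  import hashlib
  import json
  from pathlib import Path

  def checksums(snapshot_dir):
      """Map relative article path -> SHA-256 of its content."""
      result = {}
      for path in Path(snapshot_dir).rglob("*"):
          if path.is_file():
              rel = str(path.relative_to(snapshot_dir))
              result[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
      return result

  def build_delta(old_dir, new_dir, delta_path):
      """Write a JSON manifest of changed/added entries and deletions."""
      old, new = checksums(old_dir), checksums(new_dir)
      changed = [p for p, h in new.items() if old.get(p) != h]
      deleted = [p for p in old if p not in new]
      delta = {
          "changed": {p: Path(new_dir, p).read_text(encoding="utf-8")
                      for p in changed},
          "deleted": deleted,
      }
      Path(delta_path).write_text(json.dumps(delta), encoding="utf-8")
      print(f"{len(changed)} changed/added, {len(deleted)} deleted")

The hard, format-dependent part that this sketch sidesteps is applying such a delta to an existing ZIM file in place, since the container packs and indexes its entries rather than storing them as separate files.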

Kelson 09:13, 4 November 2009

So could there be an automated process for getting online data offline? Something similar to the data dumps, but with the ability to request data added after a specific timestamp? If we intend to let people redistribute it, we should have some kind of verification.
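
On the "data added after a specific timestamp" part: the standard MediaWiki API already exposes a recent-changes list that can be queried from a given timestamp onward, so a fetcher could be sketched roughly as below. The list=recentchanges query and its parameters are the real API; the bundle file and the SHA-256 sidecar used for verification are assumptions of mine, not an existing feature.

  # Sketch: list pages changed since a timestamp via the MediaWiki API,
  # then write the result with a SHA-256 checksum so that redistributed
  # copies can be checked. Bundle format and checksum scheme are assumed.
  import hashlib
  import json
  import urllib.parse
  import urllib.request

  API = "https://en.wikipedia.org/w/api.php"

  def changed_titles_since(timestamp_iso, limit=500):
      """Titles changed since e.g. '2009-11-01T00:00:00Z' (first batch only)."""
      params = {
          "action": "query",
          "list": "recentchanges",
          "rcstart": timestamp_iso,
          "rcdir": "newer",            # start at the timestamp, move forward in time
          "rclimit": str(limit),
          "rcprop": "title|timestamp",
          "format": "json",
      }
      url = API + "?" + urllib.parse.urlencode(params)
      req = urllib.request.Request(url, headers={"User-Agent": "offline-update-sketch/0.1"})
      with urllib.request.urlopen(req) as resp:
          data = json.load(resp)
      return sorted({rc["title"] for rc in data["query"]["recentchanges"]})

  def write_verified_bundle(titles, out_path):
      """Write the change list plus its SHA-256 so recipients can check integrity."""
      payload = json.dumps(titles).encode("utf-8")
      digest = hashlib.sha256(payload).hexdigest()
      with open(out_path, "wb") as f:
          f.write(payload)
      with open(out_path + ".sha256", "w") as f:
          f.write(digest + "\n")
      return digest

A checksum only proves the file was not corrupted in transit; proving it really came from the Foundation would additionally need a signature over the bundle, which is a separate question.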

As for the offline format, ZIM may work; I don't have much knowledge of that project, but I will be looking into it as time allows.

Alecdhuse 14:15, 4 November 2009

"could there be an automated process for getting online data, offline?" Yes, we can. The Foundation will certainly do it and use the ZIM file format. Thomasz is in charge of such stuff on the WMF dev. side. Would be great to involve him in our discussions. This will be intensively discussed the 22 November in Basel during the next OpenZIM dev meeting.

Kelson 14:55, 4 November 2009

I'd like to see us putting out a specific collection - let's consider (say) Cameroon French release Version 1.700. This would have a broad selection of general topics, plus very thorough coverage of Cameroon and neighbouring countries. Let's say that's released on January 1st, 2014. We might then put out monthly updates - so on February 1st you could get Version 1.701, on March 1st 1.702, etc. These would be the same articles, but updated versions. Then, at the end of the year, we might review the actual content, add a few new articles and remove some that have become less important. On January 1st, 2015, you would be able to get Version 1.800, and the cycle would begin again.

Much of the work for this on en:WP and fr:WP could be done using the WikiProject assessments and SelectionBot; hopefully this or other solutions will become available elsewhere soon. For this type of versioning to work well, it would require three main new things:

  1. A reliable method for selecting vandalism-free article versions. I believe that WikiTrust will provide this.
  2. A good system of organization for releases that requires little manual maintenance. This would also include a way to see whether your article collection has a new version available, and then to download it if desired.
  3. A system whereby you need only download the changes to the articles, rather than the entire content (which will mostly be unchanged) - see the sketch after this list.
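
On point 3, the client side could be as simple as comparing a hash of each locally stored article against a manifest published with each release and fetching only the mismatches. The manifest URL, its JSON layout, and the per-article download URL below are hypothetical placeholders, not an existing service.

  # Sketch for point 3: download only the articles whose content changed,
  # by comparing local hashes with a published per-release manifest.
  # All URLs and the manifest format are invented for illustration.
  import hashlib
  import json
  import urllib.parse
  import urllib.request

  MANIFEST_URL = "https://example.org/releases/1.701/manifest.json"    # placeholder
  ARTICLE_URL = "https://example.org/releases/1.701/articles/{title}"  # placeholder

  def local_hashes(articles):
      """articles: dict of title -> article text already stored offline."""
      return {t: hashlib.sha256(text.encode("utf-8")).hexdigest()
              for t, text in articles.items()}

  def fetch_changed(articles):
      """Update `articles` in place, downloading only entries whose hash differs."""
      with urllib.request.urlopen(MANIFEST_URL) as resp:
          manifest = json.load(resp)           # expected shape: {title: sha256, ...}
      have = local_hashes(articles)
      to_fetch = [t for t, h in manifest.items() if have.get(t) != h]
      for title in to_fetch:
          url = ARTICLE_URL.format(title=urllib.parse.quote(title))
          with urllib.request.urlopen(url) as resp:
              articles[title] = resp.read().decode("utf-8")
      return to_fetch

Between monthly releases such as 1.701 and 1.702 most hashes would match, so only the genuinely edited articles would have to cross the wire.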

Walkerma 04:23, 24 November 2009