Community contributions

Priorities look fine so far. But you may have overlooked an important point, namely contributions from the huge community of Wikimedians worldwide.

I've frequently found insightful studies, graphs and metrics gathered by Wikimedians who posted them in many different places (user_talk pages, specific pages, etc.). More than once I've found that somebody else had already explored a question I was interested in. This is valuable info that, right now, is scattered all around.

What about creating an open lab for people to submit their own visualizations and studies based on Wikimedia data? Other projects such as Many Eyes have proven the power of this approach, and I can imagine how well it could work in an active community like this one.

Related to this, the first step to encourage community participation is to improve the frequency and quality of data sources (that is, DB dumps). I know it is a priority, but the fact that over the past 3 years DB dumps haven't been published regularly (especially for the big projects) might discourage some potential contributors. We could also think of ways to make it easier to play with these sources, like:

  • Sliced dumps (by #pages, by datetime, etc.).
  • Other formats (CSV or R data files with interesting fields to play with).
  • Improving documentation of existing tools to analyze Wikimedia data.
  • A common page introducing the standard "toolbox" to play with this data.
  • What about an R library supporting the most frequent operations and analyses performed on Wikimedia data?
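To make the "sliced dumps" and "other formats" ideas above a bit more concrete, here is a minimal Python sketch (not an existing tool) that streams an XML dump and writes one CSV row per revision, optionally stopping after a given number of pages. The function names, the `max_pages` parameter, the chosen CSV columns and the tiny `SAMPLE` document are all made up for illustration; the only assumption about the dump itself is the usual page > revision nesting with `title`, `id` and `timestamp` elements.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Made-up miniature dump, only for trying the functions below.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Alpha</title>
    <id>1</id>
    <revision><id>10</id><timestamp>2010-01-01T00:00:00Z</timestamp></revision>
    <revision><id>11</id><timestamp>2010-06-01T00:00:00Z</timestamp></revision>
  </page>
  <page>
    <title>Beta</title>
    <id>2</id>
    <revision><id>20</id><timestamp>2011-01-01T00:00:00Z</timestamp></revision>
  </page>
</mediawiki>"""


def local_name(tag):
    """Drop the '{namespace}' prefix ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def child_text(elem, name):
    """Return the text of the first direct child with this local name, or ''."""
    for child in elem:
        if local_name(child.tag) == name:
            return child.text or ""
    return ""


def iter_revisions(xml_source, max_pages=None):
    """Yield (page_title, revision_id, timestamp) tuples from a dump stream.

    max_pages is a hypothetical "slice by #pages" knob: stop after that
    many pages have been read.
    """
    pages_seen = 0
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title = child_text(elem, "title")
        for rev in elem:
            if local_name(rev.tag) == "revision":
                yield title, child_text(rev, "id"), child_text(rev, "timestamp")
        elem.clear()  # release the page's children once they are processed
        pages_seen += 1
        if max_pages is not None and pages_seen >= max_pages:
            break


def dump_to_csv(xml_source, csv_file, max_pages=None):
    """Write a header plus one row per revision; return the number of rows."""
    writer = csv.writer(csv_file)
    writer.writerow(["page_title", "revision_id", "timestamp"])
    count = 0
    for row in iter_revisions(xml_source, max_pages):
        writer.writerow(row)
        count += 1
    return count
```

For example, `dump_to_csv(io.BytesIO(SAMPLE), io.StringIO(), max_pages=1)` writes only the revisions of the first page. Reading the dump as a stream rather than loading it whole is a deliberate choice here, since the dumps of the big projects are far too large to parse in memory.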

The format of the dumps themselves could also be improved. Sometimes the XML is not very well defined, with repeated and non-unique fields.

Just some thoughts to start the discussion.

HTH.

GlimmerPhoenix 14:22, 7 January 2011

Great ideas. I especially like the one about creating a repository for research. Right now, we have a fairly simple research page which doesn't look like it's being maintained.

Re: your other points about data sources, have you seen the Editor Trends Study? The toolkit Diederik is developing addresses some of the issues you raise.

Howief 19:40, 19 January 2011