Summary of Discussion

Edited by 2 users.
Last edit: 00:50, 19 March 2011

I have summarized the main points raised on Talk:Editor_Trends_Study as of October 22nd, 2010. The issues fall into three groups:

1) More granular classification of editors

2) Definitions of a New Wikipedian and an Active Editor

3) Reinventing the wheel


Granular classification of editors

A number of people mentioned that a more granular classification of editors would give a better understanding of the different editor trajectories within the Wikipedia sites. One way to accommodate this is for the trends analytics software to allow customized definitions of editor groups. That way, even if we do not have enough time to investigate this right away, it will be relatively easy to conduct such an analysis yourself. This point is also closely related to the second point.
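To make the idea of customizable editor groups concrete, here is a minimal sketch in Python. It is not the editor_trends code: the function name `classify_editors`, the label names, the input format, and the default thresholds are all assumptions chosen for illustration, and the thresholds are placeholders you would replace with whatever definitions you want to study.

```python
from collections import Counter


def classify_editors(edits, new_threshold=10, active_threshold=5):
    """Assign group labels to editors based on configurable thresholds.

    edits: iterable of (editor, month) tuples, one tuple per edit,
           where month is a string such as "2010-10".
    new_threshold: cumulative edits needed for the "new_wikipedian" label
                   (placeholder value, an assumption).
    active_threshold: edits in a single month needed for the
                      "active_editor" label (placeholder value, an assumption).
    Returns a dict mapping each editor to a set of labels.
    """
    total = Counter()    # edits per editor over the whole period
    monthly = Counter()  # edits per (editor, month)
    for editor, month in edits:
        total[editor] += 1
        monthly[(editor, month)] += 1

    labels = {editor: set() for editor in total}
    for editor, count in total.items():
        if count >= new_threshold:
            labels[editor].add("new_wikipedian")
    for (editor, _month), count in monthly.items():
        if count >= active_threshold:
            labels[editor].add("active_editor")
    return labels
```

Because the thresholds are ordinary parameters, a finer classification only requires calling the function again with different values, or adding further labels in the same style.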

Definitions of a New Wikipedian and an Active Editor

Some people wondered how we chose the thresholds for New Wikipedian and Active Editor. To start, we will use the definitions that Erik Zachte's statistical analysis at stats.wikimedia.org has used in the past. This will reduce confusion and ease interpretation. However, we might want to refine these definitions if the results of the analysis suggest that we

Reinventing the wheel

Finally, some people warned us against reinventing the wheel by starting from scratch, as a lot has been done in the past. I wholeheartedly agree: Wikipedia has a rich history, and I am very grateful for all the links and papers that have been suggested. From a software point of view, we will write some new code. The reason is that (and this might be a limitation of my knowledge) there are no suitable tools yet that can create datasets and run the analyses in a timely fashion, independent of Wikipedia locale, to answer the questions we are raising. We are using a schemaless database from which we can create datasets that can be read by R, SPSS and Stata.
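As a rough illustration of that last step, the sketch below flattens schemaless records into CSV, a format that R, SPSS and Stata can all import. It does not reflect the actual editor_trends export code: the function name `records_to_csv` and the field names in the usage example are hypothetical, and the records are plain Python dictionaries standing in for documents from the database.

```python
import csv
import io


def records_to_csv(records, missing=""):
    """Write records with differing keys to one CSV table.

    records: list of dicts; each dict may have a different set of keys.
    missing: value written where a record lacks a column.
    Returns the CSV text as a string.
    """
    # The column set is the union of all keys, sorted for a stable order.
    fields = sorted({key for record in records for key in record})
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fields, restval=missing, lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical records: the second one has no "edits" field.
    sample = [
        {"editor": "a", "edits": 3},
        {"editor": "b", "first_edit": "2010-01"},
    ]
    print(records_to_csv(sample))
```

Taking the union of keys as the column set means no record has to follow a fixed schema; fields a record lacks are simply left empty in the output.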

If you want to help us

If you would like to help us, for example by running the analysis for your local Wikipedia site, you can do the following:

  • Check out the editor_trends package from Subversion and use the command-line interface to download a dump and create a dataset. You can help us test this functionality or, if you have a Python background, help improve the code.
Drdee 23:13, 22 October 2010