
## Quality gives fewer editors and updates

Edited by another user.
Last edit: 12:05, 16 May 2011

## Veteran editors

I've run over the numbers myself, and I graphed a logistic fit against the data. I think it's come out pretty good. The fitted function is $\frac{14988.4}{(1 + 3.50179 e^{-1.59948 (year-2006.06)})^{1.35287}}$

The export of the image was done for me by RoanKattouw, as the trial edition of the software used to make the export (Mathematica) doesn't allow exports. Martijn Hoekstra 21:19, 20 March 2011 (UTC)

Martijn Hoekstra 21:19, 20 March 2011
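For anyone who wants to read values off the fit without Mathematica, here is a minimal Python sketch that evaluates the fitted function quoted above. The constants are copied from the formula; the function and constant names are made up for illustration, and this is not the code that produced the graph.

```python
import math

# Parameters copied from the fitted function quoted in the post above.
# The names are hypothetical labels, not taken from the original analysis.
PLATEAU = 14988.4    # value the curve levels off at
Q = 3.50179          # coefficient inside the denominator
RATE = 1.59948       # growth rate per year
MIDPOINT = 2006.06   # year offset used in the exponent
SHAPE = 1.35287      # exponent applied to the denominator


def fitted_veterans(year: float) -> float:
    """Evaluate the generalised logistic fit at a (possibly fractional) year."""
    return PLATEAU / (1.0 + Q * math.exp(-RATE * (year - MIDPOINT))) ** SHAPE


if __name__ == "__main__":
    for year in (2004, 2005, 2006, 2007, 2008, 2009, 2010):
        print(year, round(fitted_veterans(year)))
```

The curve rises monotonically and flattens towards the plateau value of 14988.4, which is the "plateau" referred to in this thread.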

How did you define veteran editor? I'm trying to understand what's actually being plotted. Thanks!

Howief 22:03, 22 March 2011

I defined veteran editor as 'an editor who was still active at least 12 months after first editing Wikipedia', which IMO is a fairly 'fair' definition. I've got some data for 6 months and 24 months too, but 24 months isn't properly showing the plateau (yet). For the 2005 mark it's the number of editors retained for 12 months from the Jan 2004 tranche; for the 2005.25 mark it's the number of editors retained for 15 months from the Jan 2004 tranche plus the number of editors retained for 12 months from the April 2004 tranche, etc.

Martijn Hoekstra 18:27, 28 March 2011
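The summation described above can be sketched roughly as follows. This is only an illustration of the definition: the data layout, the function name, and the numbers are all invented, and months are counted from an arbitrary origin (0 = Jan 2004).

```python
from typing import Dict, Tuple

# retained[(cohort_start, age)] = number of editors from the tranche that
# started at month `cohort_start` who were still active `age` months later.
# This layout is an assumption made for the sketch.
Retained = Dict[Tuple[int, int], int]


def veterans_at(month: int, retained: Retained, min_age: int = 12) -> int:
    """Sum, over all tranches, the editors still active at `month`
    whose tranche is at least `min_age` months old by then."""
    total = 0
    for (cohort_start, age), count in retained.items():
        if cohort_start + age == month and age >= min_age:
            total += count
    return total


# Made-up numbers, purely to show the mechanics.
example: Retained = {
    (0, 12): 100,  # Jan 2004 tranche, still active after 12 months
    (0, 15): 90,   # Jan 2004 tranche, still active after 15 months
    (3, 12): 120,  # Apr 2004 tranche, still active after 12 months
    (3, 9): 150,   # Apr 2004 tranche at 9 months: too young to count
}

print(veterans_at(12, example))  # the "2005" mark: Jan 2004 tranche only
print(veterans_at(15, example))  # the "2005.25" mark: Jan + Apr 2004 tranches
```

Changing `min_age` to 6 or 24 would give the other two variants mentioned above, given retention data for those ages.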

## Fix survey graph: new users per day not month

The survey has a glaring error in describing the graph of new users in English Wikipedia as being new users per month. Instead, the text should state "new users per day each month", and the graph icon "–•–" should be labeled as "new users per day". Well, perhaps not everyone would consider the "per month" phrase a glaring error, but it is too low by a factor of about 30: enWP gains nearly 192,000 new users per month. As a sanity check, remember: almost nothing in English Wikipedia happens as rarely as a mere 10,000 times per month: new articles run 28,100 per month, and project/talk pages run 310,000 per month. Anyway, during recent years, the daily growth in registered users of English Wikipedia has been closer to 9,000 per day (currently: 6,290 new users per day in March 2011). So, the graph matches those figures of new users "per day" rather than per month. -Wikid77 18:15, 17 March 2011 (UTC)

Wikid77 18:15, 17 March 2011

Are you referring to this chart? This chart actually measures "New Wikipedians", not new Registered Users. The definition of "New Wikipedian" is the same as the one used on stats.wikimedia.org, namely a registered user that has made at least 10 lifetime (cumulative) edits in the main namespace (see this table for the raw data: http://stats.wikimedia.org/EN/TablesWikipediaEN.htm). This number is significantly lower than the number of registered users per day, which you've correctly stated is in the thousands per day.

Thanks for the vigilance on this. And yes, 30x difference would be considered a glaring error :)

Howief 20:31, 17 March 2011

How many users create a username, then stop at 9 edits? Most of the new usernames seem to go above 10 edits (even sockpuppets), then stop. Hence, when they exceed 10 edits, then the result is similar to all those active usernames appearing in numbers as if registered on the same day. Otherwise, to have only 6,400 new 10-edit users per month (only 8 per hour?), then the other 97% of new users must stop at 9 edits. Why would 97% stop there? That is why I raised the issue.

Wikid77 21:03, 18 March 2011
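The back-of-envelope figures in this exchange can be checked with a few lines of Python. The inputs are the numbers quoted in the thread; the 30.5-day month is an assumption (the thread does not say how the monthly figure was derived), and the variable names are just labels.

```python
# Assumed average month length; not stated anywhere in the thread.
DAYS_PER_MONTH = 30.5

new_users_per_day = 6_290          # March 2011 figure quoted in the thread
new_wikipedians_per_month = 6_400  # users reaching 10 edits, as quoted

# Registered users per month implied by the daily figure.
new_users_per_month = new_users_per_day * DAYS_PER_MONTH

# Share of new registrations that would reach 10 edits, if the chart's
# monthly figure is compared against monthly registrations.
share_reaching_10_edits = new_wikipedians_per_month / new_users_per_month

# The same monthly figure expressed per hour.
new_wikipedians_per_hour = new_wikipedians_per_month / (DAYS_PER_MONTH * 24)

print(f"new users per month:      {new_users_per_month:,.0f}")
print(f"share reaching 10 edits:  {share_reaching_10_edits:.1%}")
print(f"New Wikipedians per hour: {new_wikipedians_per_hour:.1f}")
```

Under these assumptions the results are in line with the "nearly 192,000", "97%" and "8 per hour" figures mentioned above.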

We don't have any data yet on users that create an account and then stop prior to completing their 10th edit. I'm hoping we can get some data on this soon. The decline in New Wikipedians can actually be broken down into two parts: 1) fewer users making their first edit and 2) of the users that make their first edit, fewer are making it to their 10th edit. Each of these requires a different approach to course-correct. I'm hoping to get numbers on the 1-10 edit fallout as well as absolute numbers on how many users make at least one edit and how that's trended over time.

One piece of data we do have -- we know that on the English Wikipedia, about 30% of users who create an account end up editing within the first 10 days. My guess is that this comprises the bulk of users who end up editing. Using those numbers can give us a very rough approximation for what the ratio of [1-9 editors]/[10+ editors] is.

216.38.130.162 21:36, 18 March 2011
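A very rough version of that approximation, as a sketch only. It combines the 30% figure with two monthly numbers quoted earlier in this thread (about 192,000 registrations and about 6,400 New Wikipedians), treats the 30% as if it covered everyone who ever edits, and ignores the fact that the figures come from different cohorts. The function name is made up.

```python
def ratio_1_to_9_vs_10_plus(registrations: float,
                            share_who_edit: float,
                            ten_plus_editors: float) -> float:
    """Rough ratio of [1-9 editors] to [10+ editors] for one period."""
    editors = registrations * share_who_edit   # accounts with at least 1 edit
    one_to_nine = editors - ten_plus_editors   # edited, but stopped before 10
    return one_to_nine / ten_plus_editors


# Inputs are the figures quoted in this thread, used here as assumptions.
ratio = ratio_1_to_9_vs_10_plus(registrations=192_000,
                                share_who_edit=0.30,
                                ten_plus_editors=6_400)
print(f"roughly {ratio:.0f} editors stop at 1-9 edits for each one reaching 10+")
```

Real data on the 1-10 edit fallout would replace all three inputs, so this is only an order-of-magnitude guess.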

## Data used to create the diagrams

It would be really helpful to publish the data tables used to draw the various diagrams. Otherwise it is quite difficult to create comparisons between the English Wikipedia discussed here and the findings pertaining to a project not included in the original study.

Thanks,

Bdamokos 12:58, 14 March 2011

The data is available, to my understanding, or at least you can build it yourself using Editor Trends Study/Software.

Steven Walling at work 00:00, 15 March 2011

I haven't found the data tables used for the diagrams and running the software for the English Wikipedia might take a week or so (seems a waste of my resources if it is only a matter of copy-pasting into a wiki page – I'd rather spend that time on running the software on wikis not previously studied - which is why it would be handy to have the data easily available for comparison).

Bdamokos 12:07, 15 March 2011

They actually aren't currently in data tables, to my understanding. The data sets are created by the software toolkit from the regular XML dumps, and then they are stored in MongoDB collections, which is a document-oriented database rather than a relational one with tables like MySQL. As for data in, say, simple CSV format, it might exist, but I don't know. My instinct is that all the tools and methods were open sourced because the actual data is so big that it's not something anyone can just download in one quick go, open up in Excel, and see what's going on in the graphs. It makes more sense for studies of this scale to let people generate the data themselves in whatever format they want.

Steven Walling at work 23:45, 15 March 2011

Hi. Sorry if I was unclear; what I am suggesting is that the graphs be made more accessible by providing the x and y values of the graph, e.g. on the image description pages. A patient person might add this info based on visual observation, but I expect there is an easier and faster way. (This would be a minuscule part of the data, and the ones creating the graphs probably have it in the software they used to create the picture files.)

I understand that the analytic software will be available pending the fixing of a bug. Thanks,

Bdamokos 00:12, 16 March 2011

Ah, I see. That is different.

Steven Walling at work 01:28, 16 March 2011

## Retention

Hi. Would it be possible to create a graph with the retention data but use the absolute numbers instead of the percentages?

(I am curious because, looking at the Hungarian Wikipedia, it seemed to me (the red and purple lines; the blue is the new editors, green the new Wikipedians) that the average number of retained users stagnated at a lower level while the number of new users increased constantly, which of course results in lower retention rates, but also hints at there being some kind of bottleneck as opposed to an ever more "hostile" environment... It would be interesting to see if this is similar in other Wikipedias as well.)

Bdamokos 15:13, 15 March 2011

I will include the absolute numbers in the upload so that you can manipulate any way you like.

Where did you get the data for the Hungarian Wikipedia chart? How are "new editors" defined? I'm assuming you're using the same definition for "New Wikipedians" (>=10 lifetime edits). And what are the red and purple lines?

Howief 20:38, 17 March 2011

Thanks.

I used the software for the editor trends study (unfortunately it failed at the last step, so I exported the data from MongoDB into CSV and have been trying to replicate the charts[1] using some Excel black magic; I'll check whether using the wikilytics software to export the data would give different results).

By new editors I meant people who have made at least 1 edit (the blue line); New Wikipedian is >=10 lifetime edits (green line); red line is the number of new editors who have edited in the 12th month after joining; purple line is the number of new Wikipedians who have edited in the 12th month after becoming a new Wikipedian.

Bdamokos 20:52, 17 March 2011
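For comparison with other wikis, the four lines defined above could be computed from a per-user edit log along these lines. This is a sketch, not the Wikilytics code: the input layout and all names are invented, "the 12th month after" is taken to mean the month index plus 12, and a New Wikipedian is counted in the month of their 10th edit; either of those conventions may differ from the ones actually used for the chart.

```python
from collections import defaultdict
from typing import Dict, List


def cohort_series(edits: Dict[str, List[int]],
                  threshold: int = 10,
                  offset: int = 12) -> Dict[str, Dict[int, int]]:
    """`edits` maps a user name to a list of month indices, one per edit.

    Returns four month -> count series:
      new_editors           users whose first edit falls in that month
      new_wikipedians       users whose `threshold`-th edit falls in that month
      retained_editors      new editors who also edited `offset` months later
      retained_wikipedians  New Wikipedians who also edited `offset` months later
    """
    names = ("new_editors", "new_wikipedians",
             "retained_editors", "retained_wikipedians")
    series = {name: defaultdict(int) for name in names}

    for months in edits.values():
        if not months:
            continue
        ordered = sorted(months)
        active = set(ordered)

        first = ordered[0]
        series["new_editors"][first] += 1
        if first + offset in active:
            series["retained_editors"][first] += 1

        if len(ordered) >= threshold:
            became = ordered[threshold - 1]  # month of the threshold-th edit
            series["new_wikipedians"][became] += 1
            if became + offset in active:
                series["retained_wikipedians"][became] += 1

    return {name: dict(counts) for name, counts in series.items()}


# Tiny made-up log: month 0 is an arbitrary starting month.
sample = {
    "a": [0] * 10 + [12],  # 10 edits in month 0, one more a year later
    "b": [0, 12],          # 2 edits, a year apart
    "c": [1],              # a single edit
}
result = cohort_series(sample)
print(result)
```

Dividing the retained series by the corresponding new series per month would give retention percentages, while the raw counts are the absolute numbers asked about at the top of this thread.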

I'm glad you're using Wikilytics! I think the new editors trend is very interesting. It's curious that the enormous spike in editors that made >=1 edit around Aug-Oct 2007 was not accompanied by a proportionate increase in New Wikipedians (>=10 edits). I would expect the increase in New Wikipedians to be less, but not that much less.

I just posted the links to the data used for the graphs here.

Howief 00:37, 18 March 2011

## Rename newbie to newcomer

Edited by another user.
Last edit: 14:27, 10 June 2011

This seems like a simple issue: just use the word "newcomer" to replace the word "newbie", as it is perhaps considered less offensive by some readers. Especially in the U.S., "political correctness" is still a severe issue, such as the recent firing of famous comedian Gilbert Gottfried, the former voice of the Aflac insurance duck, because of jokes he allegedly made about the en:2011 Sendai earthquake and tsunami in Japan. It might seem common sense to allow a comedian to make jokes, or just to admonish humor considered in bad taste; however, one wrong word can foment severe, rabid outrage, and people in the U.S. are often fired, or entirely lose their careers, over 1 wrong word, despite the ironic hypocrisy against America's treasured "en:Freedom of Speech", speech being far more free in Greece or other countries than in the U.S. Anyway, perhaps using the word "newcomer" more often would avoid some outrage against the survey report. -Wikid77 17:45, 17 March 2011 (UTC)

Wikid77 17:45, 17 March 2011

## No end of the world soon :)

Wonderful to see the study in such an advanced version. And the news is not half bad. Yes, there are problems, but the project is not crashing down. Great to see it has sparked more debates on improving community; hopefully over the next few years we will figure out how to revitalize the community.

I would like to see more analysis of the namespace editing and activity. In particular, regarding the people who leave and people who stay: what namespaces do they edit? How active are they? For example, I wonder if the people who stay tend to be more active in Wikipedia namespace, and those who leave, more active in the article namespace?

Piotrus 05:28, 12 March 2011

Very interesting, much here to digest, and many thanks to the people who created this useful study.

Yes, one quick reaction I have is "Well, is any of this really a problem?" It's possible that a lot of this is because Wikipedia is getting closer to being "finished", in that all the important articles have been written. That is what a lot of people like to do most, I suppose. I would expect the number of editors to decrease under this circumstance, and this is an effect of our success.

Herostratus 19:06, 13 March 2011

As far as the content/subject of what active editors are participating in, the Editor Trends Study itself doesn't address that, but my research that is gearing up now, the Contribution Taxonomy Project, is going to use the datasets from Editor Trends to try and put hard numbers to the different kinds of edits, in order to track more qualitative trends in editing. If there are certain kinds of activity that you think should be measured and want to help measure them, then please dive in on that project in the coming weeks.

Steven Walling at work 22:32, 13 March 2011

If this trend were seen only in the largest/most in-depth language version, I could agree that being nearly "finished" could be an explanation, but if we see such a decline in medium-size projects, then the explanation would have to be different.

GoEThe 16:06, 16 March 2011

Piotrus, thanks for your comments. Regarding the namespace analysis, Diederik has modified the Wikilytics code to incorporate other namespaces. Some students at Stanford have done some preliminary analysis on edit distribution by namespace and I will ask them to post their findings (which may take a while since they're in the middle of finals right now). Their analysis doesn't address your specific questions, but it's a start. I'd encourage you and others to use Wikilytics to delve into specific questions that you're interested in.

Howief 18:48, 14 March 2011

## Order of Findings 1-2-4-3-5

Is there a reason for numbering the findings in a different order than they are presented here (or for presenting them in a different order than they are numbered)? I should rather ask this question before renumbering the findings myself. Nol Aders 11:03, 13 March 2011 (UTC)

I figure that the creators of the study, Howie and Diederik, ordered these findings carefully for a reason. I would hold off on reordering them for now, thanks.

Steven Walling at work 22:27, 13 March 2011

In the summary, the findings are ordered for readability. Specifically, Findings #2 (retention drop. . .) and #3 (. . .not simply attributed to vandalism/experimentation) go together from a logical standpoint. We wanted to make sure the summary provided a basic, easy to understand overview for readers who are interested in the top-level findings.

The main body of the document has a slightly different flow. The data is more of the focus, so the approach we took here was to look at one set of analysis (e.g., the cohort analysis) and present the findings from the data. Finding #4 (retention not worsening) comes after Finding #2 because they are derived from the same graph. Finding #3 is from a different graph, which is why it is presented separately.

Howief 18:59, 14 March 2011

In the paragraph "Editor retention has not worsened over the past three years" (Finding #4), "the Seigenthaler controversy and the beginnings of the BLP policy" are mentioned. Could someone please give links to these subjects for the benefit of those who, like myself, have not followed those discussions? Thank you.

Chris55 16:06, 13 March 2011

## Are these the right questions?

Am I alone in finding some of the presuppositions of this analysis surprising? E.g.:

• The percentage of editors with less than one year of experience has fallen quickly since 2006.
Surely it would be worse if the opposite happened, or even if the proportions stayed the same.
• There's a dramatic decline in the number of "new Wikipedians" after May 2007 and I would have expected to see some exploration of this, but there is none.
Did something change at that date? Have any questionnaires or other measures been taken to find the reasons?
• The number of editors is only one measure of the success of wikipedia, yet it seems to be the only one considered.
Is not the number of readers also important? What about the quality of the articles?
• This is a multilingual project so I tried to check the stats on another site (German) and was surprised that not only was the executive director's message not totally translated into German but it referred back to this page for the stats. I couldn't find it at all on the French site.
My knowledge of other languages is limited and I'm not sure how I can relate to other language wikis or indeed whether the California base of Wikipedia is really doing it.

Chris55 20:46, 11 March 2011 (UTC)

Chris55 20:46, 11 March 2011

I clicked on the link at the top "Self-organisation around proposals" to find that it took me to the main page of strategy.wikimedia.org and to discover that it had been rather disgracefully defaced - but I can't change it. Somebody please...

Chris55 20:54, 11 March 2011

Number of readers and quality of articles are both heavily discussed elsewhere; see en:wiki/Wikipedia:Stats for some examples. Both are important and both are still improving year on year. Readership, quality of articles, number of articles and number of edits have long been among the measures of Wikimedia success. This study is focussed on a relatively new phenomenon: our dwindling editing community.

Now that we have this data, yes, we can try to work out what has caused the patterns.

WereSpielChequers 01:08, 13 March 2011

I take your point - obviously we need the stats. But most of the analysis is around a fall in the retention rates after 2005 - whereas what sticks out from the graphs is that something major happened in May 2007. The overall number of active editors, which had been climbing steadily, suddenly starts to fall. When you look at the change in new editors the change is even more extreme.

The most obvious possible cause is the release of version 1.10 of the software which happened in May 2007. But I can't see from the list of new features what might have changed things. Indeed 1.09 released in January of that year seems more significant with the release of the undo revision feature. But I don't know much about the actual use of the Wikimedia software in Wikipedia. Can someone on the inside please help? Chris55 16:03, 13 March 2011 (UTC)

Chris55 16:03, 13 March 2011

Well, I waited a few days to see what other major explanations people might have for the early March/May 2007 drop in new editors. So, again, I feel strongly that the drop was mainly due to the notorious banning of Wikipedia in U.S. schools and colleges, as demanded by academic officials beginning in February 2007. See the enWP essay about bans, with 19 news articles:

Other bans were suggested in England. Those early bans, coupled with the typical 3-month school vacations (June-August) seem to be what thwarted the English Wikipedia. Note how some similar reductions occurred in the German Wikipedia in early 2007 (which had the similar 50% of users younger than 22 years) despite deWP's faster initial growth of gaining new users, while the French Wikipedia graph shows only a steady, seasonal variation in adding new users, with no sign of bans in France, to deter new users from joining French WP. Because the French WP shows no signs (yet) of a dramatic decline in new users, the French statistics can be used as a control group, to exclude the effect of updates for MediaWiki release 1.09, as not having a strong impact to deter new users. Obviously any large colleges banning Wikipedia use, among 2,000 to 50,000 students per college, would cause a massive decline in user access during 2007-2008 and beyond. -Wikid77 19:37, 17 March 2011 (UTC)

Wikid77 19:37, 17 March 2011

## Scale

Could the x-axes, for example in this section, have the same scale? The English Wikipedia one starts in Jan 2001 and ends in Sep 2010, while the German graph starts at Feb 2002 and ends at Aug 2010, and so on. This would make visual comparisons easier for the readers. GoEThe 16:19, 11 March 2011 (UTC)

GoEThe 16:19, 11 March 2011