Troubleshooting
Samat, can you send me the console output? I need more information.
Whether I run python manage.py dataset or python manage.py -l Hungarian -c new_editor_count dataset, I get the same result:
in the console:
...
Final settings after parsing command line arguments:
Project: Wikipedia
Input directory: c:\wikimedia\hu\wiki
Output directory: c:\wikimedia\hu\wiki and subdirectories
Language: Hungarian / Magyar / hu
Start exporting dataset
Exporting data for chart: new_editor_count
Project: wikilytics
Dataset: huwiki_editors_dataset
wikilytics huwiki_editors_dataset new_wikipedian
100% |########################################################################|
Processing time: 0:00:07.050000
Storing dataset: C:\editor_trends\datasets\huwiki_new_editor_count_max_year=2012_min_year=2003.csv
Serializing dataset to wikilytics_charts
+----------+---------------+--------+---------------+---------+---------+---------+---------------+---------------------+---------------------+
| Variable | Mean          | Median | SD            | Minimum | Maximum | Num Obs | Num of        | First Obs           | Final Obs           |
|          |               |        |               |         |         |         | Unique Groups |                     |                     |
+----------+---------------+--------+---------------+---------+---------+---------+---------------+---------------------+---------------------+
| count    | 973.555555556 | 789.0  | 853.956836016 | 14      | 2154    | 8762    | 9             | 2003-07-09 06:43:25 | 2011-02-22 19:01:16 |
+----------+---------------+--------+---------------+---------+---------+---------+---------------+---------------------+---------------------+
Dataset contains 1 variables
Project: huwiki
JSON encoder: to_bar_json
Raw data was retrieved from: huwiki/huwiki_editors_dataset
None
Processing time: 0:00:07.090000
in the huwiki_new_editor_count_max_year=2012_min_year=2003.csv file:
"date count"
"1-1-2006:12-31-2006 789"
"1-1-2007:12-31-2007 1560"
"1-1-2005:12-31-2005 287"
"1-1-2004:12-31-2004 66"
"1-1-2003:12-31-2003 14"
"1-1-2010:12-31-2010 1613"
"1-1-2011:12-31-2011 308"
"1-1-2008:12-31-2008 2154"
"1-1-2009:12-31-2009 1971"
That's the correct behavior: new_editor_count is the default plugin, and it runs whenever you do not explicitly give a plugin name. So python manage.py dataset and python manage.py dataset -c new_editor_count give the same result. The data from the csv file looks good to me :) so I am happy to see that you are making progress. I will start preparing a video on how to replicate the editor trends study. Thanks for all the questions and feedback!
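For anyone who wants to double-check the csv rows against the summary table in the console output, here is a minimal sketch. It is not part of editor_trends. It assumes each row is a single quoted string holding a date range and a count separated by whitespace (the real file may use a tab), and the names RAW_ROWS and parse_row are made up for this example. The rows are written in no particular order, so the sketch sorts them by year first.

```python
import statistics

# Rows copied from huwiki_new_editor_count_max_year=2012_min_year=2003.csv
# as quoted in this thread. Assumption: date range and count are separated
# by whitespace inside one quoted string.
RAW_ROWS = [
    '"1-1-2006:12-31-2006 789"',
    '"1-1-2007:12-31-2007 1560"',
    '"1-1-2005:12-31-2005 287"',
    '"1-1-2004:12-31-2004 66"',
    '"1-1-2003:12-31-2003 14"',
    '"1-1-2010:12-31-2010 1613"',
    '"1-1-2011:12-31-2011 308"',
    '"1-1-2008:12-31-2008 2154"',
    '"1-1-2009:12-31-2009 1971"',
]


def parse_row(row):
    """Return (year, count) for a row like '"1-1-2006:12-31-2006 789"'."""
    date_range, count = row.strip().strip('"').split()
    start, _end = date_range.split(":")
    year = int(start.split("-")[2])  # dates are month-day-year
    return year, int(count)


# Sort by year so the series reads chronologically.
by_year = dict(sorted(parse_row(row) for row in RAW_ROWS))
counts = list(by_year.values())

for year, count in by_year.items():
    print(year, count)

print("total:", sum(counts))
print("mean:", statistics.mean(counts))
print("median:", statistics.median(counts))
print("sd (sample):", statistics.stdev(counts))
```

The total comes to 8762, which is the Num Obs value in the table, and the nine rows correspond to the 9 unique groups, so the file is simply one new-editor count per year.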
As I mentioned above, it works fine.
I'd like to repeat this study and generate the same figures for the Hungarian language that the result page shows for the big language versions, but after finishing the calculation I have only 9 numbers. I expected a more complex result file. :) Which plugins should I run? (Bdamokos and you wrote that many of them don't work yet.)
And I still have a small problem during the process (I'm not sure whether you can fix it or not):
BSON document too large, unable to store TXiKiBoT
BSON document too large, unable to store SieBot
BSON document too large, unable to store Xqbot
BSON document too large, unable to store Luckas-bot
BSON document too large, unable to store SamatBot
BSON document too large, unable to store AsgardBot
BSON document too large, unable to store DeniBot
What happens to these editors and their edits?
Thank you for all your trouble,
I've got the same errors, but as all these accounts belong to bots, I don't think it is a big loss if they are not stored among the humans.
(I think Diederik is currently making a video on how to replicate the study, so the second problem should be fixable as well...)
Exactly, these are bot edits, and they are discarded at the moment. The reason is a limitation of Mongo: as the error message says, the documents for these accounts are too large to store. With Mongo 1.8 this should be resolved, so if you are really interested in these edits, I suggest you wait for Mongo 1.8. Otherwise, there is nothing to worry about.