Troubleshooting
Dear Diederik,
Thank you for your answer. I'm waiting for the update.
Best regards,
Sorry for the delay. Please download the most recent version from Subversion and give it a spin. Let me know if it works. The documentation needs to be updated as well.
Dear Diederik,
I have tried this updated version, but I am afraid it still doesn't work properly.
After running python manage.py dataset, I got this message:
Traceback (most recent call last):
  File "manage.py", line 449, in <module>
    main()
  File "manage.py", line 422, in main
    rts = runtime_settings.RunTimeSettings(project, language, args)
  File "C:\editor_trends\classes\runtime_settings.py", line 62, in __init__
    self.targets = self.split_keywords(self.get_value('charts'))
  File "C:\editor_trends\classes\runtime_settings.py", line 115, in split_keywords
    keywords = keywords.split(',')
AttributeError: 'function' object has no attribute 'split'
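The traceback ends in split_keywords calling keywords.split(','), and the AttributeError says the 'charts' setting held a function rather than a string. A later traceback in this thread shows the -c default being set to inventory.available_analyses()['new_editor_count'], which suggests the default chart was a plugin function object rather than its name. Below is a minimal sketch of a more forgiving split_keywords, assuming the value may be a comma-separated string, a list, or a single callable; it is a standalone illustration with hypothetical behaviour, not the Wikilytics code.

```python
from typing import Any, List


def split_keywords(keywords: Any) -> List[str]:
    """Normalize a 'charts' setting into a list of chart (plugin) names.

    Hypothetical helper: accepts a comma-separated string, a list or tuple
    of names/callables, or a single callable such as a plugin function.
    """
    if keywords is None:
        return []
    if callable(keywords):
        # A plugin function was passed instead of its name: use its name.
        return [keywords.__name__]
    if isinstance(keywords, (list, tuple)):
        names: List[str] = []
        for item in keywords:
            names.extend(split_keywords(item))
        return names
    # Otherwise treat the value as a comma-separated string and drop blanks.
    return [part.strip() for part in str(keywords).split(",") if part.strip()]


def new_editor_count(var, editor, **kwargs):
    """Stand-in with the same name as the default plugin, for illustration only."""
    return None


print(split_keywords("new_editor_count, histogram_edits"))  # ['new_editor_count', 'histogram_edits']
print(split_keywords(new_editor_count))                     # ['new_editor_count']
```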
When I tried python manage.py -l Hungarian all, I got this message, and I couldn't find the resulting CSV file (where should I look for it?):
Starting dataset_launcher
Start exporting dataset
Processing time: 0:00:00.010000
Function dataset_launcher does not return a status, implement NOW
Could you please check the code again? Thank you, cheers,
I am looking at it right now.
Okay, try it again please. The right command is:
python manage.py -l Hungarian -c new_editor_count
The 'c' stands for chart; it specifies which kind of chart you want to generate. If you do not provide 'c', you will get an error.
Hmm. I am sorry, but I got this:
Traceback (most recent call last):
  File "C:\editor_trends\manage.py", line 583, in <module>
    main()
  File "C:\editor_trends\manage.py", line 554, in main
    project, language, parser, = init_args_parser()
  File "C:\editor_trends\manage.py", line 482, in init_args_parser
    default=inventory.available_analyses()['new_editor_count'])
  File "C:\editor_trends\analyses\inventory.py", line 41, in available_analyses
    plugins = import_libs(path)
  File "C:\editor_trends\analyses\inventory.py", line 67, in import_libs
    func = getattr(module, module_name)
AttributeError: 'module' object has no attribute 'list_makers'
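The failing call is getattr(module, module_name) in import_libs, which implies a convention where each file in the plugins directory defines a function with the same name as the file (list_makers.py must define list_makers). One hypothetical way to keep a single non-conforming or half-committed module from breaking every command is to skip it and report it instead of raising. The sketch below assumes that one-function-per-file convention and uses importlib (Python 3); import_plugins and demo are illustrative names, not part of Wikilytics.

```python
import importlib.util
import os
import tempfile
from typing import Callable, Dict, List, Tuple


def import_plugins(path: str) -> Tuple[Dict[str, Callable], List[str]]:
    """Load every <name>.py in *path* and pick the function called <name>.

    Modules that fail to import, or that lack a callable with the module's
    name, are returned in `skipped` instead of raising AttributeError.
    """
    plugins: Dict[str, Callable] = {}
    skipped: List[str] = []
    for filename in sorted(os.listdir(path)):
        name, ext = os.path.splitext(filename)
        if ext != ".py" or name.startswith("__"):
            continue
        spec = importlib.util.spec_from_file_location(name, os.path.join(path, filename))
        module = importlib.util.module_from_spec(spec)
        try:
            spec.loader.exec_module(module)
        except Exception:
            skipped.append(name)  # a broken module should not take down the others
            continue
        func = getattr(module, name, None)  # default value instead of AttributeError
        if callable(func):
            plugins[name] = func
        else:
            skipped.append(name)
    return plugins, skipped


def demo() -> Tuple[Dict[str, Callable], List[str]]:
    """Build a throwaway plugins directory with one good and one bad module."""
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "new_editor_count.py"), "w") as f:
            f.write("def new_editor_count(var, editor, **kwargs):\n    return 'ok'\n")
        with open(os.path.join(tmp, "list_makers.py"), "w") as f:
            f.write("def something_else():\n    return None\n")
        return import_plugins(tmp)


if __name__ == "__main__":
    plugins, skipped = demo()
    print(sorted(plugins), skipped)  # ['new_editor_count'] ['list_makers']
```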
I also needed to run easy_install texttable, which is not mentioned in the documentation. And I think you meant the command python manage.py -l Hungarian -c new_editor_count dataset above.
Thank you for your patience, best regards,
You were very unlucky :) Today somebody else started committing code as well, and that caused the problem. Try again; I have fixed it. Thanks for your patience.
It works fine now :) But I don't really understand the result: huwiki_new_editor_count_max_year=2012_min_year=2003.csv is a 230-byte file with 9 data lines. Is this CSV the result file? The wikimedia folder contains more than 3.5 GB in over 1,000 files, and the data folder contains 2 GB in 6 files. Could you please help me again? Thank you very much,
I've been trying the other plugins in the analyses directory, but all of them except new_editor_count seem to return some sort of error.
For example:
Microsoft Windows [Version 6.1.7600]
Copyright (c) 2009 Microsoft Corporation. All rights reserved.
C:\wikimedia\editor_trends>manage.py dataset -c histogram_edits
Wikilytics is (c) 2010-2011 by the Wikimedia Foundation.
Written by Diederik van Liere (dvanliere@gmail.com).
This software comes with ABSOLUTELY NO WARRANTY.
This is free software, and you are welcome to distribute it under certain conditions.
See the README.1ST file for more information.
Final settings after parsing command line arguments:
Project: Wikipedia
Input directory: c:\wikimedia\hu\wiki
Output directory: c:\wikimedia\hu\wiki and subdirectories
Language: Hungarian / Magyar / hu
Start exporting dataset
Exporting data for chart: histogram_edits
Project: wikilytics
Dataset: huwiki_editors_dataset
wikilytics huwiki_editors_dataset new_wikipedian
Process Analyzer-2:
Traceback (most recent call last):
  File "C:\Python27\lib\multiprocessing\process.py", line 232, in _bootstrap
    self.run()
  File "C:\wikimedia\editor_trends\classes\analytics.py", line 98, in run
    task.plugin(self.var, editor, dbname=self.rts.dbname)
  File "C:\wikimedia\editor_trends\analyses\plugins\histogram_edits.py", line 25, in histogram_edits
    var.add(new_wikipedian, cnt)
  File "C:\wikimedia\editor_trends\classes\dataset.py", line 290, in add
    start, end = self.set_date_range(date)
  File "C:\wikimedia\editor_trends\classes\dataset.py", line 146, in set_date_range
    return datetime.datetime(date.year, 12, 31), \
AttributeError: 'bool' object has no attribute 'year'
(Process Analyzer-3, Analyzer-4 and Analyzer-5 fail with the same traceback.)
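In this traceback, histogram_edits passes new_wikipedian to var.add, and set_date_range then calls date.year on it; the AttributeError shows that for some editors the value is the boolean False rather than a date. A plausible reading, not confirmed in the thread, is that False marks editors who never became new Wikipedians. Here is a minimal sketch of a guard, assuming editor['new_wikipedian'] is either a date or False and using a hypothetical edit_count field; it buckets by calendar year, matching the 1-1-YYYY:12-31-YYYY ranges in the CSV output later in this thread.

```python
import datetime
from collections import Counter
from typing import Dict, Tuple

YearRange = Tuple[datetime.date, datetime.date]


def year_range(date: datetime.date) -> YearRange:
    """Calendar-year bucket (Jan 1 to Dec 31) containing *date*."""
    return datetime.date(date.year, 1, 1), datetime.date(date.year, 12, 31)


def histogram_edits(histogram: Counter, editor: Dict) -> None:
    """Hypothetical plugin body: add an editor's edit count to their year bucket.

    Editors whose 'new_wikipedian' value is not a date (e.g. False) are
    skipped, so .year is never looked up on a bool.
    """
    new_wikipedian = editor.get("new_wikipedian")
    if not isinstance(new_wikipedian, datetime.date):  # datetime is a date subclass
        return
    histogram[year_range(new_wikipedian)] += editor.get("edit_count", 0)


editors = [
    {"new_wikipedian": datetime.datetime(2006, 5, 1), "edit_count": 12},
    {"new_wikipedian": False, "edit_count": 3},
    {"new_wikipedian": datetime.datetime(2006, 8, 1), "edit_count": 20},
]
histogram: Counter = Counter()
for editor in editors:
    histogram_edits(histogram, editor)
print(histogram)
```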
C:\wikimedia\editor_trends>manage.py dataset -c list_makers
...
Final settings after parsing command line arguments:
Project: Wikipedia
Input directory: c:\wikimedia\hu\wiki
Output directory: c:\wikimedia\hu\wiki and subdirectories
Language: Hungarian / Magyar / hu
Start exporting dataset
Exporting data for chart: list_makers
Project: wikilytics
Dataset: huwiki_editors_dataset
wikilytics huwiki_editors_dataset new_wikipedian
Process Analyzer-2:
Traceback (most recent call last):
  File "C:\Python27\lib\multiprocessing\process.py", line 232, in _bootstrap
    self.run()
  File "C:\wikimedia\editor_trends\classes\analytics.py", line 98, in run
    task.plugin(self.var, editor, dbname=self.rts.dbname)
  File "C:\wikimedia\editor_trends\analyses\plugins\list_makers.py", line 28, in list_makers
    for year in xrange(new_wikipedian.year, var.max_year):
NameError: global name 'new_wikipedian' is not defined
(Process Analyzer-3, Analyzer-4 and Analyzer-5 fail with the same traceback.)
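Here the NameError means that new_wikipedian is used inside list_makers without ever being assigned in that function. Below is a minimal sketch of the loop with the value read from the editor record first, assuming editor['new_wikipedian'] holds a date (or False when there is none) and a hypothetical username field; xrange becomes range for Python 3, and, as in the original call, max_year itself is excluded.

```python
import datetime
from collections import defaultdict
from typing import DefaultDict, Dict, List


def list_makers(by_year: DefaultDict[int, List[str]], editor: Dict, max_year: int) -> None:
    """Hypothetical sketch: record an editor under every year from becoming a
    new Wikipedian up to (but not including) max_year.
    """
    # Bind the name the traceback could not find, from the editor record.
    new_wikipedian = editor.get("new_wikipedian")
    if not isinstance(new_wikipedian, datetime.date):
        return  # no date (e.g. False): nothing to list
    # range() is Python 3's xrange(); like the original call, max_year is excluded.
    for year in range(new_wikipedian.year, max_year):
        by_year[year].append(editor["username"])
```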
Also
C:\wikimedia\editor_trends>manage.py dataset -c total_number_of_articles
...
Final settings after parsing command line arguments:
Project: Wikipedia
Input directory: c:\wikimedia\hu\wiki
Output directory: c:\wikimedia\hu\wiki and subdirectories
Language: Hungarian / Magyar / hu
Start exporting dataset
Exporting data for chart: total_number_of_articles
Project: wikilytics
Dataset: huwiki_editors_dataset
wikilytics huwiki_editors_dataset new_wikipedian
Process Analyzer-2:
Traceback (most recent call last):
  File "C:\Python27\lib\multiprocessing\process.py", line 232, in _bootstrap
    self.run()
  File "C:\wikimedia\editor_trends\classes\analytics.py", line 98, in run
    task.plugin(self.var, editor, dbname=self.rts.dbname)
  File "C:\wikimedia\editor_trends\analyses\plugins\total_number_of_articles.py", line 23, in total_number_of_articles
    edits = editor['edits'][year]
TypeError: list indices must be integers, not dict
(Process Analyzer-3, Analyzer-4 and Analyzer-5 fail with the same traceback.)
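This TypeError says that editor['edits'] is a list and the year used to index it is a dict, which suggests the plugin iterates over edit records while treating them as year keys. A minimal sketch that groups the list by year first, assuming each edit record is a dict with a datetime under 'date' and an 'article_id'; both field names are hypothetical, and counting distinct articles per year is only a guess at what total_number_of_articles is meant to compute.

```python
import datetime
from collections import defaultdict
from typing import Dict, Iterable, Set


def articles_per_year(edits: Iterable[Dict]) -> Dict[int, int]:
    """Count distinct articles edited per year from a *list* of edit records.

    The failing line indexed the list with one of these dicts, as if
    editor['edits'] were keyed by year; here the records are iterated instead.
    """
    articles: Dict[int, Set] = defaultdict(set)
    for edit in edits:  # iterate the records; do not use them as list indices
        articles[edit["date"].year].add(edit["article_id"])
    return {year: len(ids) for year, ids in sorted(articles.items())}
```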
I get the same errors on both Ubuntu and 64-bit Windows 7 with Python 2.7.
Once these problems are fixed (either on my end or in svn), is there a way to iterate through all possible plugins at once?
I fixed all of the new_wikipedian-related plugins. We are making a lot of changes to Wikilytics, so it will be inherently unstable at the moment, but thanks for letting me know. The list_makers and total_number_of_articles plugins are still in development and will not be ready for the next few weeks.
You can chain multiple charts like this: -c plugin1,plugin2
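A comma-separated -c value maps naturally onto an argparse type function, and running every available plugin (the earlier question about iterating through all of them) is then just a loop over a name-to-function registry. This is a sketch under those assumptions: build_parser, run_charts and the registry are hypothetical, and only the -c option from this thread is modelled (the dataset subcommand and other flags are omitted). Keeping the default as plugin names rather than plugin functions also avoids the 'function' object has no attribute 'split' error reported earlier.

```python
import argparse
from typing import Callable, Dict, List


def comma_list(value: str) -> List[str]:
    """argparse type: 'plugin1,plugin2' -> ['plugin1', 'plugin2']."""
    return [name.strip() for name in value.split(",") if name.strip()]


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser: only the -c option from the thread is modelled.
    parser = argparse.ArgumentParser(prog="manage.py")
    parser.add_argument("-c", dest="charts", type=comma_list,
                        default=["new_editor_count"],  # names, not functions
                        help="comma-separated chart (plugin) names")
    return parser


def run_charts(names: List[str], registry: Dict[str, Callable[[], str]]) -> List[str]:
    """Run each requested chart in order, failing early on unknown names."""
    unknown = [name for name in names if name not in registry]
    if unknown:
        raise SystemExit("unknown chart(s): " + ", ".join(unknown))
    return [registry[name]() for name in names]


registry = {"histogram_edits": lambda: "histogram", "new_editor_count": lambda: "count"}
args = build_parser().parse_args(["-c", "histogram_edits,new_editor_count"])
print(run_charts(args.charts, registry))        # ['histogram', 'count']
print(run_charts(sorted(registry), registry))   # every registered plugin
```

argparse applies the type function only to string defaults, so the list default above is used as-is.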
Thank you; at least one of them is working already, and I'll check the others. Can you please update the wiki page with a list of the plugins that should be working (so we don't bother you with questions about the ones in active development)?
Just send me an email directly. Things change so rapidly that I'd rather not keep a list of which plugins work and which don't. They should work; otherwise we are working on them :)
Ok. So far "histogram_edits", "new_editor_count", "time_to_new_wikipedian" and "total_number_of_new_wikipedians" work for me, though I'm not sure that is enough yet to replicate the findings of the study. I'll check the others from time to time after an svn update.
Can you explain what these do? I guess histogram_edits gives the total number of edits for every year? Does time_to_new_wikipedian give the average time, in seconds, to reach the 10th edit? And do new_editor_count and total_number_of_new_wikipedians give exactly the same result, the number of people who reached 10 edits in a given year?
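Writing the guessed definitions in this question down concretely makes them easier to check against the plugin output. A minimal sketch, assuming (as the question does, without confirmation in the thread) that an editor becomes a new Wikipedian at their 10th edit and that time to new Wikipedian is measured from the first edit; the input shape, a username mapped to a list of edit timestamps, is hypothetical.

```python
import datetime
from collections import Counter
from typing import Dict, List, Optional

THRESHOLD = 10  # assumption from the question: the 10th edit makes a new Wikipedian


def new_wikipedian_date(edit_times: List[datetime.datetime]) -> Optional[datetime.datetime]:
    """Timestamp of the THRESHOLD-th edit, or None if the editor never got there."""
    ordered = sorted(edit_times)
    return ordered[THRESHOLD - 1] if len(ordered) >= THRESHOLD else None


def describe(editors: Dict[str, List[datetime.datetime]]) -> Dict[str, object]:
    """Count new Wikipedians per year and the mean seconds from first edit to threshold."""
    per_year: Counter = Counter()
    seconds: List[float] = []
    for edit_times in editors.values():
        reached = new_wikipedian_date(edit_times)
        if reached is None:
            continue  # never reached the threshold
        per_year[reached.year] += 1
        seconds.append((reached - min(edit_times)).total_seconds())
    mean = sum(seconds) / len(seconds) if seconds else None
    return {"new_wikipedians_per_year": dict(per_year), "mean_seconds_to_threshold": mean}
```

Under these definitions, an editor's year is the year of the 10th edit, not of the first one.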
Thanks,
Samat: can you send me the console output, I need more information.
If I run either python manage.py dataset or python manage.py -l Hungarian -c new_editor_count dataset, I get the same result.
In the console:
...
Final settings after parsing command line arguments:
Project: Wikipedia
Input directory: c:\wikimedia\hu\wiki
Output directory: c:\wikimedia\hu\wiki and subdirectories
Language: Hungarian / Magyar / hu
Start exporting dataset
Exporting data for chart: new_editor_count
Project: wikilytics
Dataset: huwiki_editors_dataset
wikilytics huwiki_editors_dataset new_wikipedian
100% |########################################################################|
Processing time: 0:00:07.050000
Storing dataset: C:\editor_trends\datasets\huwiki_new_editor_count_max_year=2012_min_year=2003.csv
Serializing dataset to wikilytics_charts
+----------+---------------+--------+---------------+---------+---------+---------+---------------+---------------------+---------------------+
| Variable | Mean          | Median | SD            | Minimum | Maximum | Num Obs | Num of        | First Obs           | Final Obs           |
|          |               |        |               |         |         |         | Unique Groups |                     |                     |
+----------+---------------+--------+---------------+---------+---------+---------+---------------+---------------------+---------------------+
| count    | 973.555555556 | 789.0  | 853.956836016 | 14      | 2154    | 8762    | 9             | 2003-07-09 06:43:25 | 2011-02-22 19:01:16 |
+----------+---------------+--------+---------------+---------+---------+---------+---------------+---------------------+---------------------+
Dataset contains 1 variables
Project: huwiki
JSON encoder: to_bar_json
Raw data was retrieved from: huwiki/huwiki_editors_dataset
None
Processing time: 0:00:07.090000
In the huwiki_new_editor_count_max_year=2012_min_year=2003.csv file:
"date count"
"1-1-2006:12-31-2006 789"
"1-1-2007:12-31-2007 1560"
"1-1-2005:12-31-2005 287"
"1-1-2004:12-31-2004 66"
"1-1-2003:12-31-2003 14"
"1-1-2010:12-31-2010 1613"
"1-1-2011:12-31-2011 308"
"1-1-2008:12-31-2008 2154"
"1-1-2009:12-31-2009 1971"
That's the correct behavior, as new_editor_count is the default plugin that runs if you do not explicitly give a plugin name. So python manage.py dataset and python manage.py dataset -c new_editor_count give the same result. The data in the CSV file looks good to me :) so I am happy to see that you are making progress. I will start preparing a video on how to replicate the editor trends study. Thanks for all the questions and feedback!
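The nine rows in the CSV file above are one count per calendar year, and the console summary table is descriptive statistics over those nine values: Num Obs (8762) is their sum, Num of Unique Groups (9) is the number of years, and SD is the sample standard deviation. A minimal sketch that re-derives those figures, assuming the layout as posted (a header line, then a date range and a count separated by whitespace, possibly wrapped in quotes); parse_counts and summarize are illustrative names.

```python
import statistics
from typing import Dict, Iterable

# The rows of huwiki_new_editor_count_max_year=2012_min_year=2003.csv as posted in this thread.
SAMPLE = [
    '"date count"',
    '"1-1-2006:12-31-2006 789"', '"1-1-2007:12-31-2007 1560"', '"1-1-2005:12-31-2005 287"',
    '"1-1-2004:12-31-2004 66"', '"1-1-2003:12-31-2003 14"', '"1-1-2010:12-31-2010 1613"',
    '"1-1-2011:12-31-2011 308"', '"1-1-2008:12-31-2008 2154"', '"1-1-2009:12-31-2009 1971"',
]


def parse_counts(lines: Iterable[str]) -> Dict[int, int]:
    """Parse 'M-D-YYYY:M-D-YYYY count' rows into {year: count}, ordered by year."""
    counts: Dict[int, int] = {}
    for raw in lines:
        line = raw.strip().strip('"')
        if not line or line.split()[0] == "date":
            continue  # skip blank lines and the header
        date_range, count = line.split()
        year = int(date_range.split(":")[0].split("-")[2])  # '1-1-2006' -> 2006
        counts[year] = int(count)
    return dict(sorted(counts.items()))


def summarize(counts: Dict[int, int]) -> Dict[str, float]:
    """Recompute the figures shown in the console summary table."""
    values = list(counts.values())
    return {
        "num_obs": sum(values),
        "groups": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),  # sample standard deviation
    }


print(summarize(parse_counts(SAMPLE)))
```

To read the real file instead of SAMPLE, pass an open file object: with open(path) as f: summarize(parse_counts(f)).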
As I mentioned above, it works fine.
I'd like to repeat this study and generate the same figures for the Hungarian Wikipedia that the results page shows for the big language versions, but after finishing the calculation I have only 9 numbers. I had expected a more complex result file. :) Which plugins should I run? (Bdamokos and you wrote that many of them don't work yet.)
I also still have a small problem during processing (I'm not sure whether you can fix it or not):
BSON document too large, unable to store TXiKiBoT
BSON document too large, unable to store SieBot
BSON document too large, unable to store Xqbot
BSON document too large, unable to store Luckas-bot
BSON document too large, unable to store SamatBot
BSON document too large, unable to store AsgardBot
BSON document too large, unable to store DeniBot
What about these editors and their edits?
Thank you for all your trouble,
I've got the same errors, but as all these accounts belong to bots, I don't think it is a big loss if they are not stored among the humans.
(I think Diederik is currently making a video on how to replicate the study, so the second problem should be fixable as well...)
Exactly, these are bot edits and are discarded at the moment. The reason is a MongoDB limit on the maximum size of a single BSON document. With MongoDB 1.8 this should be resolved, so if you are really interested in these edits, I suggest you wait for MongoDB 1.8. Otherwise, there is nothing to worry about.
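Because the limit applies per document, one common workaround (not something Wikilytics is described as doing) is to spread a very active account's edits over several smaller documents. A rough sketch, assuming edits are plain dicts. It estimates size with json.dumps as a stand-in for the encoded BSON size, which the bson package shipped with pymongo could measure exactly, so MAX_DOC_BYTES is a hypothetical budget that should leave headroom below the server's real limit.

```python
import json
from typing import Dict, Iterator, List

MAX_DOC_BYTES = 1_000_000  # hypothetical budget, kept well below the server limit


def chunk_edits(username: str, edits: List[Dict], max_bytes: int = MAX_DOC_BYTES) -> Iterator[Dict]:
    """Yield {'username', 'part', 'edits'} documents whose estimated size stays under max_bytes.

    Size is approximated with len(json.dumps(edit)); a single edit larger than
    max_bytes still gets its own document.
    """
    part, batch, size = 0, [], 0
    for edit in edits:
        edit_size = len(json.dumps(edit, default=str))  # default=str handles datetimes
        if batch and size + edit_size > max_bytes:
            yield {"username": username, "part": part, "edits": batch}
            part, batch, size = part + 1, [], 0
        batch.append(edit)
        size += edit_size
    if batch:
        yield {"username": username, "part": part, "edits": batch}
```

Each yielded document could then be stored separately and reassembled by username and part when the editor is analysed.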