Troubleshooting

Fragment of a discussion from Talk:Wikilytics

Dear Diederik,

I have tried this updated version, but I am afraid it still doesn't work properly.

After running python manage.py dataset I got this message:

Traceback (most recent call last):
  File "manage.py", line 449, in <module>
    main()
  File "manage.py", line 422, in main
    rts = runtime_settings.RunTimeSettings(project, language, args)
  File "C:\editor_trends\classes\runtime_settings.py", line 62, in __init__
    self.targets = self.split_keywords(self.get_value('charts'))
  File "C:\editor_trends\classes\runtime_settings.py", line 115, in split_keywords
    keywords = keywords.split(',')
AttributeError: 'function' object has no attribute 'split'

When I tried python manage.py -l Hungarian all, I got this message, and I could not find the resulting CSV file (where should I look for it?):

Starting dataset_launcher
Start exporting dataset

Processing time: 0:00:00.010000
Function dataset_launcher does not return a status,                 implement NOW

Could you please check the code again? Thank you, cheers,

Samat17:28, 28 March 2011

I am looking at it right now.

Drdee17:58, 28 March 2011

Okay, try it again please. The right command is:

python manage.py -l Hungarian -c new_editor_count

The 'c' stands for chart and specifies what kind of chart you want to generate. If you do not provide 'c', you will get an error.

Drdee20:58, 28 March 2011
Edited by author.
Last edit: 11:00, 29 March 2011

Hmm. I am sorry, but I got this:

Traceback (most recent call last):
  File "C:\editor_trends\manage.py", line 583, in <module>
    main()
  File "C:\editor_trends\manage.py", line 554, in main
    project, language, parser, = init_args_parser()
  File "C:\editor_trends\manage.py", line 482, in init_args_parser
    default=inventory.available_analyses()['new_editor_count'])
  File "C:\editor_trends\analyses\inventory.py", line 41, in available_analyses
    plugins = import_libs(path)
  File "C:\editor_trends\analyses\inventory.py", line 67, in import_libs
    func = getattr(module, module_name)
AttributeError: 'module' object has no attribute 'list_makers'

I also had to run easy_install texttable, which is not mentioned in the documentation. And I think you meant the command python manage.py -l Hungarian -c new_editor_count dataset above.

Thank you for your patience, best regards,

Samat01:47, 29 March 2011

You were very unlucky :) Today somebody else started committing code as well, and that caused the problem. Try again, I have fixed it. Thanks for your patience.

Drdee03:54, 29 March 2011

It works fine now :) But I don't really understand the result: huwiki_new_editor_count_max_year=2012_min_year=2003.csv is a 230-byte file (9 data lines). Is this CSV the result file? The wikimedia folder contains more than 3.5 GB in over 1000 files, and the data folder contains 2 GB in 6 files. Could you please help me again? Thank you very much,

Samat10:17, 29 March 2011

I've been trying the other plugins under the analysis directory, but all of them seem to return some sort of error, except for the new editor count.

E.g.

Microsoft Windows [Version 6.1.7600]
Copyright (c) 2009 Microsoft Corporation. All rights reserved.

C:\wikimedia\editor_trends>manage.py dataset -c histogram_edits

Wikilytics is (c) 2010-2011 by the Wikimedia Foundation.
Written by Diederik van Liere (dvanliere@gmail.com).
This software comes with ABSOLUTELY NO WARRANTY. This is
    free software, and you are welcome to distribute it under certain
    conditions.
See the README.1ST file for more information.

Final settings after parsing command line arguments:
         Project: Wikipedia
 Input directory: c:\wikimedia\hu\wiki
Output directory: c:\wikimedia\hu\wiki and subdirectories
        Language: Hungarian / Magyar / hu
Start exporting dataset
Exporting data for chart: histogram_edits
Project: wikilytics
Dataset: huwiki_editors_dataset
wikilytics huwiki_editors_dataset new_wikipedian
Process Analyzer-2:                                                           |
Traceback (most recent call last):
  File "C:\Python27\lib\multiprocessing\process.py", line 232, in _bootstrap
    self.run()
  File "C:\wikimedia\editor_trends\classes\analytics.py", line 98, in run
    task.plugin(self.var, editor, dbname=self.rts.dbname)
  File "C:\wikimedia\editor_trends\analyses\plugins\histogram_edits.py", line 25, in histogram_edits
    var.add(new_wikipedian, cnt)
  File "C:\wikimedia\editor_trends\classes\dataset.py", line 290, in add
    start, end = self.set_date_range(date)
  File "C:\wikimedia\editor_trends\classes\dataset.py", line 146, in set_date_range
    return datetime.datetime(date.year, 12, 31), \
AttributeError: 'bool' object has no attribute 'year'
Process Analyzer-3:
Traceback (most recent call last):
  File "C:\Python27\lib\multiprocessing\process.py", line 232, in _bootstrap
    self.run()
  File "C:\wikimedia\editor_trends\classes\analytics.py", line 98, in run
    task.plugin(self.var, editor, dbname=self.rts.dbname)
  File "C:\wikimedia\editor_trends\analyses\plugins\histogram_edits.py", line 25, in histogram_edits
    var.add(new_wikipedian, cnt)
  File "C:\wikimedia\editor_trends\classes\dataset.py", line 290, in add
    start, end = self.set_date_range(date)
  File "C:\wikimedia\editor_trends\classes\dataset.py", line 146, in set_date_range
    return datetime.datetime(date.year, 12, 31), \
AttributeError: 'bool' object has no attribute 'year'
Process Analyzer-4:
Traceback (most recent call last):
  File "C:\Python27\lib\multiprocessing\process.py", line 232, in _bootstrap
    self.run()
  File "C:\wikimedia\editor_trends\classes\analytics.py", line 98, in run
    task.plugin(self.var, editor, dbname=self.rts.dbname)
  File "C:\wikimedia\editor_trends\analyses\plugins\histogram_edits.py", line 25, in histogram_edits
    var.add(new_wikipedian, cnt)
  File "C:\wikimedia\editor_trends\classes\dataset.py", line 290, in add
    start, end = self.set_date_range(date)
  File "C:\wikimedia\editor_trends\classes\dataset.py", line 146, in set_date_range
    return datetime.datetime(date.year, 12, 31), \
AttributeError: 'bool' object has no attribute 'year'
Process Analyzer-5:
Traceback (most recent call last):
  File "C:\Python27\lib\multiprocessing\process.py", line 232, in _bootstrap
    self.run()
  File "C:\wikimedia\editor_trends\classes\analytics.py", line 98, in run
    task.plugin(self.var, editor, dbname=self.rts.dbname)
  File "C:\wikimedia\editor_trends\analyses\plugins\histogram_edits.py", line 25, in histogram_edits
    var.add(new_wikipedian, cnt)
  File "C:\wikimedia\editor_trends\classes\dataset.py", line 290, in add
    start, end = self.set_date_range(date)
  File "C:\wikimedia\editor_trends\classes\dataset.py", line 146, in set_date_range
    return datetime.datetime(date.year, 12, 31), \
AttributeError: 'bool' object has no attribute 'year'


C:\wikimedia\editor_trends>manage.py dataset -c list_makers

Wikilytics is (c) 2010-2011 by the Wikimedia Foundation.
Written by Diederik van Liere (dvanliere@gmail.com).
This software comes with ABSOLUTELY NO WARRANTY. This is
    free software, and you are welcome to distribute it under certain
    conditions.
See the README.1ST file for more information.

Final settings after parsing command line arguments:
         Project: Wikipedia
 Input directory: c:\wikimedia\hu\wiki
Output directory: c:\wikimedia\hu\wiki and subdirectories
        Language: Hungarian / Magyar / hu
Start exporting dataset
Exporting data for chart: list_makers
Project: wikilytics
Dataset: huwiki_editors_dataset
wikilytics huwiki_editors_dataset new_wikipedian
Process Analyzer-2:                                                           |
Traceback (most recent call last):
  File "C:\Python27\lib\multiprocessing\process.py", line 232, in _bootstrap
    self.run()
  File "C:\wikimedia\editor_trends\classes\analytics.py", line 98, in run
    task.plugin(self.var, editor, dbname=self.rts.dbname)
  File "C:\wikimedia\editor_trends\analyses\plugins\list_makers.py", line 28, in list_makers
    for year in xrange(new_wikipedian.year, var.max_year):
NameError: global name 'new_wikipedian' is not defined
Process Analyzer-3:
Traceback (most recent call last):
  File "C:\Python27\lib\multiprocessing\process.py", line 232, in _bootstrap
    self.run()
  File "C:\wikimedia\editor_trends\classes\analytics.py", line 98, in run
    task.plugin(self.var, editor, dbname=self.rts.dbname)
  File "C:\wikimedia\editor_trends\analyses\plugins\list_makers.py", line 28, in list_makers
    for year in xrange(new_wikipedian.year, var.max_year):
NameError: global name 'new_wikipedian' is not defined
Process Analyzer-4:
Traceback (most recent call last):
  File "C:\Python27\lib\multiprocessing\process.py", line 232, in _bootstrap
    self.run()
  File "C:\wikimedia\editor_trends\classes\analytics.py", line 98, in run
    task.plugin(self.var, editor, dbname=self.rts.dbname)
  File "C:\wikimedia\editor_trends\analyses\plugins\list_makers.py", line 28, in list_makers
    for year in xrange(new_wikipedian.year, var.max_year):
NameError: global name 'new_wikipedian' is not defined
Process Analyzer-5:
Traceback (most recent call last):
  File "C:\Python27\lib\multiprocessing\process.py", line 232, in _bootstrap
    self.run()
  File "C:\wikimedia\editor_trends\classes\analytics.py", line 98, in run
    task.plugin(self.var, editor, dbname=self.rts.dbname)
  File "C:\wikimedia\editor_trends\analyses\plugins\list_makers.py", line 28, in list_makers
    for year in xrange(new_wikipedian.year, var.max_year):
NameError: global name 'new_wikipedian' is not defined

Also

Microsoft Windows [Version 6.1.7600]
Copyright (c) 2009 Microsoft Corporation. All rights reserved.

C:\wikimedia\editor_trends>manage.py dataset -c total_number_of_articles

Wikilytics is (c) 2010-2011 by the Wikimedia Foundation.
Written by Diederik van Liere (dvanliere@gmail.com).
This software comes with ABSOLUTELY NO WARRANTY. This is
    free software, and you are welcome to distribute it under certain
    conditions.
See the README.1ST file for more information.

Final settings after parsing command line arguments:
         Project: Wikipedia
 Input directory: c:\wikimedia\hu\wiki
Output directory: c:\wikimedia\hu\wiki and subdirectories
        Language: Hungarian / Magyar / hu
Start exporting dataset
Exporting data for chart: total_number_of_articles
Project: wikilytics
Dataset: huwiki_editors_dataset
wikilytics huwiki_editors_dataset new_wikipedian
Process Analyzer-2:                                                           |
Traceback (most recent call last):
  File "C:\Python27\lib\multiprocessing\process.py", line 232, in _bootstrap
    self.run()
  File "C:\wikimedia\editor_trends\classes\analytics.py", line 98, in run
    task.plugin(self.var, editor, dbname=self.rts.dbname)
  File "C:\wikimedia\editor_trends\analyses\plugins\total_number_of_articles.py", line 23, in total_number_of_articles
    edits = editor['edits'][year]
TypeError: list indices must be integers, not dict
Process Analyzer-3:
Traceback (most recent call last):
  File "C:\Python27\lib\multiprocessing\process.py", line 232, in _bootstrap
    self.run()
  File "C:\wikimedia\editor_trends\classes\analytics.py", line 98, in run
    task.plugin(self.var, editor, dbname=self.rts.dbname)
  File "C:\wikimedia\editor_trends\analyses\plugins\total_number_of_articles.py", line 23, in total_number_of_articles
    edits = editor['edits'][year]
TypeError: list indices must be integers, not dict
Process Analyzer-4:
Traceback (most recent call last):
  File "C:\Python27\lib\multiprocessing\process.py", line 232, in _bootstrap
    self.run()
  File "C:\wikimedia\editor_trends\classes\analytics.py", line 98, in run
    task.plugin(self.var, editor, dbname=self.rts.dbname)
  File "C:\wikimedia\editor_trends\analyses\plugins\total_number_of_articles.py", line 23, in total_number_of_articles
    edits = editor['edits'][year]
TypeError: list indices must be integers, not dict
Process Analyzer-5:
Traceback (most recent call last):
  File "C:\Python27\lib\multiprocessing\process.py", line 232, in _bootstrap
    self.run()
  File "C:\wikimedia\editor_trends\classes\analytics.py", line 98, in run
    task.plugin(self.var, editor, dbname=self.rts.dbname)
  File "C:\wikimedia\editor_trends\analyses\plugins\total_number_of_articles.py", line 23, in total_number_of_articles
    edits = editor['edits'][year]
TypeError: list indices must be integers, not dict

I get the same errors on Ubuntu and Win 7 64 bit Python 2.7.

Once these problems are fixed (either on my end or in svn), is there a way to iterate through all possible plugins at once?

Bdamokos15:45, 29 March 2011
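
The histogram_edits traceback above shows a boolean reaching code that expects a datetime (date.year is called on a bool). A minimal sketch of a defensive check follows. The helper names year_range and add_observation are hypothetical and are not part of Wikilytics; the assumption is only that a non-datetime value means "no date available" and can be skipped.

```python
import datetime


def year_range(date):
    """Return (Jan 1, Dec 31) of the year of `date`.

    Hypothetical helper: returns None when `date` is not a datetime,
    for example when a boolean flag is passed instead of a timestamp.
    """
    if not isinstance(date, datetime.datetime):
        return None
    return (datetime.datetime(date.year, 1, 1),
            datetime.datetime(date.year, 12, 31))


def add_observation(counts, date, value):
    """Add `value` to the yearly bucket of `date`; skip unusable dates.

    Returns True if the value was stored and False if it was skipped.
    """
    bounds = year_range(date)
    if bounds is None:
        return False
    counts[bounds] = counts.get(bounds, 0) + value
    return True


counts = {}
add_observation(counts, False, 5)                          # skipped, no crash
add_observation(counts, datetime.datetime(2009, 3, 1), 5)  # stored under 2009
```

Skipping silently is only one option; raising a descriptive error at the call site would make the underlying data problem more visible.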

I fixed all of the new_wikipedian related plugins. We are making a lot of changes to Wikilytics, so it is inherently unstable at the moment, but thanks for letting me know. The list_makers and total_number_of_articles plugins are in development and will not be ready in the coming weeks.


You can chain multiple charts like this: -c plugin1,plugin2

Drdee15:56, 29 March 2011
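
As an illustration of how a chained -c value such as plugin1,plugin2 could be parsed, here is a minimal sketch using argparse. It is not the actual manage.py parser: build_parser and the --charts long option are assumptions, and split_keywords only borrows its name from the first traceback in this thread. The default new_editor_count is the name that appears as the default in the init_args_parser traceback.

```python
import argparse


def split_keywords(value):
    """Split a comma-separated string into a list of stripped, non-empty names.

    Raises TypeError for non-string input, which gives a clearer message
    than "'function' object has no attribute 'split'".
    """
    if not isinstance(value, str):
        raise TypeError("expected a comma-separated string, got %s"
                        % type(value).__name__)
    return [name.strip() for name in value.split(",") if name.strip()]


def build_parser():
    """Build a tiny parser with only the chart option (hypothetical layout)."""
    parser = argparse.ArgumentParser(prog="manage.py")
    parser.add_argument("-c", "--charts", default="new_editor_count",
                        help="comma-separated list of chart plugins")
    return parser


args = build_parser().parse_args(["-c", "histogram_edits,new_editor_count"])
for chart in split_keywords(args.charts):
    print("Exporting data for chart: %s" % chart)
```

Keeping the default as a plain string means the same split_keywords path handles both the explicit and the default case.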

Thank you, at least one of them is working already and I'll see the others. Can you please update the wiki page with the list of plugins that should be working (so we don't disturb you with questions about the ones in active development)?

Bdamokos16:03, 29 March 2011

Just send me an email directly; things change so rapidly that I would rather not keep a list of which plugins work and which do not. They should work, and if they don't, we are working on them :)

Drdee16:14, 29 March 2011

Ok. So far "histogram_edits", "new_editor_count", "time_to_new_wikipedian" and "total_number_of_new_wikipedians" work for me, but I am not sure that is enough yet to replicate the findings of the study. I'll check the others from time to time after an svn update.

Can you explain what these do? I guess histogram_edits gives the total number of edits for every year? Does time_to_new_wikipedian give the average time to reach the 10th edit, in seconds? And do new_editor_count and total_number_of_new_wikipedians give exactly the same result, namely the number of people who reached 10 edits in a given year?

Thanks,

Bdamokos16:30, 29 March 2011

I will work on more documentation to help you out. Please keep sending feedback and questions!

Drdee17:59, 1 April 2011

Thank you, your continued help is much appreciated!

Bdamokos18:03, 1 April 2011

Samat: can you send me the console output, I need more information.

Drdee15:58, 29 March 2011

Whether I run python manage.py dataset or python manage.py -l Hungarian -c new_editor_count dataset, I get the same result:

in the console:

...

Final settings after parsing command line arguments:
         Project: Wikipedia
 Input directory: c:\wikimedia\hu\wiki
Output directory: c:\wikimedia\hu\wiki and subdirectories
        Language: Hungarian / Magyar / hu
Start exporting dataset
Exporting data for chart: new_editor_count
Project: wikilytics
Dataset: huwiki_editors_dataset
wikilytics huwiki_editors_dataset new_wikipedian
100% |########################################################################|
Processing time: 0:00:07.050000
Storing dataset: C:\editor_trends\datasets\huwiki_new_editor_count_max_year=2012_min_year=2003.csv
Serializing dataset to wikilytics_charts
+----------+---------------+--------+---------------+---------+---------+---------+---------------+---------------------+---------------------+
| Variable |          Mean | Median |            SD | Minimum | Maximum | Num Obs |        Num of |           First Obs |           Final Obs |
|          |               |        |               |         |         |         | Unique Groups |                     |                     |
+----------+---------------+--------+---------------+---------+---------+---------+---------------+---------------------+---------------------+
|    count | 973.555555556 |  789.0 | 853.956836016 |      14 |    2154 |    8762 |             9 | 2003-07-09 06:43:25 | 2011-02-22 19:01:16 |
+----------+---------------+--------+---------------+---------+---------+---------+---------------+---------------------+---------------------+
Dataset contains 1 variables
Project: huwiki
JSON encoder: to_bar_json
Raw data was retrieved from: huwiki/huwiki_editors_dataset
None

Processing time: 0:00:07.090000

in the huwiki_new_editor_count_max_year=2012_min_year=2003.csv file:

"date	count"
"1-1-2006:12-31-2006	789"
"1-1-2007:12-31-2007	1560"
"1-1-2005:12-31-2005	287"
"1-1-2004:12-31-2004	66"
"1-1-2003:12-31-2003	14"
"1-1-2010:12-31-2010	1613"
"1-1-2011:12-31-2011	308"
"1-1-2008:12-31-2008	2154"
"1-1-2009:12-31-2009	1971"
Samat10:14, 1 April 2011

That's the correct behavior as the new_editor_count is the default plugin that will run if you do not explicitly give a plugin name. So python manage.py dataset and python manage.py dataset -c new_editor_count give the same result. The data from the csv file looks good to me :) so I am happy to see that you are making progress. I will start preparing a video on how to replicate the editor trends study. Thanks for all the questions and feedback!

Drdee17:57, 1 April 2011
Edited by another user.
Last edit: 18:44, 1 April 2011
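
The nine CSV rows quoted above can be cross-checked against the summary table in the console output. The sketch below is independent of Wikilytics: it embeds the rows as tab-separated strings (an assumption about the file layout, based on how the rows were pasted), sorts them by year, and recomputes the total, mean, median and sample standard deviation.

```python
import statistics

# Rows as pasted in the thread; the tab separator is an assumption.
ROWS = [
    "1-1-2006:12-31-2006\t789",
    "1-1-2007:12-31-2007\t1560",
    "1-1-2005:12-31-2005\t287",
    "1-1-2004:12-31-2004\t66",
    "1-1-2003:12-31-2003\t14",
    "1-1-2010:12-31-2010\t1613",
    "1-1-2011:12-31-2011\t308",
    "1-1-2008:12-31-2008\t2154",
    "1-1-2009:12-31-2009\t1971",
]


def parse_row(row):
    """Turn '1-1-2006:12-31-2006<TAB>789' into (2006, 789)."""
    period, count = row.strip('"').split("\t")
    start = period.split(":")[0]          # e.g. '1-1-2006'
    return int(start.split("-")[2]), int(count)


by_year = dict(parse_row(row) for row in ROWS)
years = sorted(by_year)
counts = [by_year[year] for year in years]

total = sum(counts)
mean = total / len(counts)
median = statistics.median(counts)
sd = statistics.stdev(counts)             # sample standard deviation

for year in years:
    print(year, by_year[year])
print("total=%d mean=%.3f median=%s sd=%.3f" % (total, mean, median, sd))
```

The total (8762), mean and median agree with the Num Obs, Mean and Median columns, and the sample standard deviation agrees with the SD column, so the small file does contain the full yearly result for this plugin.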

As I mentioned above it works fine.

I'd like to repeat this study and generate the same figures for the Hungarian language version as the result page shows for the big language versions, but after finishing the calculation I have only 9 numbers. I expected a more complex result file. :) Which plugins should I run? (Bdamokos and you wrote that many of them don't work yet.)

I also still have a small problem during processing (I'm not sure whether you can fix it or not):

BSON document too large, unable to store TXiKiBoT                             |
BSON document too large, unable to store SieBot                               |
BSON document too large, unable to store Xqbot###                             |
BSON document too large, unable to store Luckas-bot##########                 |
BSON document too large, unable to store SamatBot################             |
BSON document too large, unable to store AsgardBot########################### |
BSON document too large, unable to store DeniBot

What happens to these editors and their edits?

Thank you for all your trouble,

Samat18:40, 1 April 2011

I've got the same errors, but as all these accounts belong to bots, I don't think it is a big loss if they are not stored among the humans.

(I think Diederik is currently making a video on how to replicate the study, so the second problem should be fixable as well...)

Bdamokos18:43, 1 April 2011

Exactly, these are bot edits and they are discarded at the moment. The reason is a limitation of Mongo on document size. With Mongo 1.8 this should be resolved, so if you are really interested in these edits then I suggest you wait for Mongo 1.8. Otherwise, there is nothing to worry about.

Drdee18:50, 1 April 2011
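
For readers who want to keep very large edit histories rather than discard them, one generic workaround is to split a record into several smaller batches before storing it. The sketch below is not what Wikilytics does. It uses the JSON-encoded size as a rough stand-in for the BSON size (an assumption), and max_bytes is a parameter to be set from the document size limit of the MongoDB version in use.

```python
import json


def chunk_edits(edits, max_bytes):
    """Group edits into consecutive batches.

    The summed JSON-encoded size of the edits in each batch stays within
    max_bytes. A single edit larger than max_bytes raises ValueError.
    """
    batches, current, current_size = [], [], 0
    for edit in edits:
        size = len(json.dumps(edit).encode("utf-8"))
        if size > max_bytes:
            raise ValueError("a single edit exceeds the size limit")
        if current and current_size + size > max_bytes:
            batches.append(current)       # close the full batch
            current, current_size = [], 0
        current.append(edit)
        current_size += size
    if current:
        batches.append(current)
    return batches


# Hypothetical edit records; each encodes to 10 bytes as JSON.
edits = [{"rev": i} for i in range(10)]
batches = chunk_edits(edits, max_bytes=25)
print(len(batches))
```

Each batch could then be stored as its own document keyed by editor name and batch number, at the cost of having to reassemble the history when reading it back.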

You are right, Mongo 1.8 solved this problem.

Samat12:44, 5 April 2011

You are right! I am waiting for the tutorial video :)

Thanks for everything, Diederik!

Samat18:53, 1 April 2011