Talk:Wikilytics
Contents
| Thread title | Replies | Last modified |
|---|---|---|
| Problem with transforming | 4 | 19:46, 6 April 2011 |
| Problem (?) with extracting | 8 | 19:51, 5 April 2011 |
| Troubleshooting | 25 | 12:44, 5 April 2011 |
| Plugin documentation | 0 | 17:18, 2 April 2011 |
| FYI | 0 | 21:38, 1 April 2011 |
| Store Wikipedia dump file | 1 | 17:59, 1 April 2011 |
| Extract Wikipedia dump file | 3 | 17:58, 1 April 2011 |
| Questions about the software. | 1 | 23:53, 3 December 2010 |
The python manage.py transform command gives me this message:
Start transforming dataset
wikilytics huwiki_editors_raw
38018
{u'date': datetime.datetime(2003, 9, 13, 7, 14, 41), u'article': 684328, u'ns': 505}
Traceback (most recent call last):
File "manage.py", line 461, in <module>
main()
File "manage.py", line 457, in main
args.func(rts, logger)
File "manage.py", line 328, in transformer_launcher
transformer.transform_editors_single_launcher(rts)
File "C:\editor_trends\etl\transformer.py", line 313, in transform_editors_single_launcher
editor()
File "C:\editor_trends\etl\transformer.py", line 80, in __call__
character_count = determine_edit_volume(edits, first_year, final_year)
File "C:\editor_trends\etl\transformer.py", line 226, in determine_edit_volume
if edit['delta'] < 0:
KeyError: 'delta'
Yes, you need to redo the store and transform phases, as I have made significant changes in the last couple of days (more variables have been added).
So open the mongo shell, enter use wikilytics, then enter show collections, and then enter:
db.huwiki_editors_raw.drop()
db.huwiki_editors_dataset.drop()
db.huwiki_articles_raw.drop()
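The same cleanup can be scripted from Python. This is a minimal sketch, assuming `db` is a pymongo Database object for the wikilytics database; `drop_stale_collections` is a hypothetical helper and not part of editor_trends. The collection names are the ones from the shell commands in this reply.

```python
def drop_stale_collections(db, wiki="huwiki"):
    """Drop the raw and derived collections so store/transform can be rerun.

    `db` is assumed to behave like a pymongo Database, i.e. it offers
    drop_collection(name). Returns the names that were dropped.
    """
    suffixes = ("editors_raw", "editors_dataset", "articles_raw")
    dropped = []
    for suffix in suffixes:
        name = "%s_%s" % (wiki, suffix)
        db.drop_collection(name)
        dropped.append(name)
    return dropped


# Possible usage with a current pymongo (assumption, requires a running server):
#   from pymongo import MongoClient
#   drop_stale_collections(MongoClient()["wikilytics"])
```

After dropping the collections, the store and transform commands have to be run again so the new fields are populated.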
I get the following error when trying to run manage.py transform:
Microsoft Windows [Version 6.1.7600]
Copyright (c) 2009 Microsoft Corporation. All rights reserved.
C:\wikimedia\editor_trends>manage.py transform
Final settings after parsing command line arguments:
Project: Wikipedia
Input directory: c:\wikimedia
Output directory: c:\wikimedia\hu\wiki and subdirectories
Language: Hungarian / Magyar / hu
Start transforming dataset
wikilytics huwiki_editors_raw
Traceback (most recent call last):
File "C:\wikimedia\editor_trends\manage.py", line 461, in <module>
main()
File "C:\wikimedia\editor_trends\manage.py", line 457, in main
args.func(rts, logger)
File "C:\wikimedia\editor_trends\manage.py", line 328, in transformer_launcher
transformer.transform_editors_single_launcher(rts)
File "C:\wikimedia\editor_trends\etl\transformer.py", line 310, in transform_editors_single_launcher
ids = db.retrieve_distinct_keys(rts.dbname, rts.editors_raw, 'editor')
File "C:\wikimedia\editor_trends\database\db.py", line 144, in retrieve_distinct_keys
ids = retrieve_distinct_keys_mapreduce(editors, field)
File "C:\wikimedia\editor_trends\database\db.py", line 156, in retrieve_distinct_keys_mapreduce
cursor = collection.map_reduce(map, reduce)
File "build\bdist.win-amd64\egg\pymongo\collection.py", line 943, in map_reduce
File "build\bdist.win-amd64\egg\pymongo\database.py", line 293, in command
File "build\bdist.win-amd64\egg\pymongo\helpers.py", line 119, in _check_command_response
pymongo.errors.OperationFailure: command SON([('mapreduce', u'huwiki_editors_raw'), ('map', Code('function () { emit(this.editor, 1)};', {})), ('reduce', Code('function()', {}))]) failed: db assertion failure
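The traceback shows the map/reduce command failing on the server side. As a cross-check that does not depend on map/reduce, the distinct editor ids can also be collected on the client. This is a minimal sketch, assuming each document in huwiki_editors_raw carries an 'editor' field (as the map function in the error suggests); `distinct_keys` is a hypothetical helper, not the editor_trends implementation.

```python
def distinct_keys(documents, field):
    """Return the sorted distinct values of `field` across an iterable of dicts.

    Documents that lack the field are skipped.
    """
    seen = set()
    for doc in documents:
        if field in doc:
            seen.add(doc[field])
    return sorted(seen)


# With pymongo this could be fed from a cursor (assumption), for example:
#   distinct_keys(collection.find({}, {"editor": 1}), "editor")
sample = [{"editor": 42}, {"editor": 7}, {"editor": 42}, {"article": 1}]
print(distinct_keys(sample, "editor"))  # [7, 42]
```

pymongo also offers Collection.distinct('editor'), which does the work on the server, though its result has to fit into a single BSON document, so it may not suit very large collections.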
The command python manage.py extract gives this message:
Extracting data from XML
c:\wikimedia c:\wikimedia\hu\wiki
c:\wikimedia\hu\wiki\huwiki-latest-stub-meta-history.xml
c:\wikimedia\hu\wiki\huwiki-latest-stub-meta-history.xml.gz
Process Process-1:
Traceback (most recent call last):
File "C:\Program Files\Python 2.7\lib\multiprocessing\process.py", line 232, in _bootstrap
self.run()
File "C:\Program Files\Python 2.7\lib\multiprocessing\process.py", line 88, in run
self._target(*self._args, **self._kwargs)
File "C:\editor_trends\etl\enricher.py", line 697, in stream_raw_xml
fh = file_utils.create_streaming_buffer(filename)
File "C:\editor_trends\utils\file_utils.py", line 222, in create_streaming_buffer
fh = create_txt_filehandle(path, None, 'r', 'utf-8')
File "C:\editor_trends\utils\file_utils.py", line 207, in create_txt_filehandle
path = os.path.join(location, filename)
File "C:\Program Files\Python 2.7\lib\ntpath.py", line 73, in join
elif isabs(b):
File "C:\Program Files\Python 2.7\lib\ntpath.py", line 57, in isabs
s = splitdrive(s)[1]
File "C:\Program Files\Python 2.7\lib\ntpath.py", line 125, in splitdrive
if p[1:2] == ':':
TypeError: 'NoneType' object is not subscriptable
After this message the dump extraction appears to proceed fine, but it stops at this line and nothing further happens:
Finished parsing bz2 archives
Following a Ctrl+C I got this message:
Traceback (most recent call last):
File "manage.py", line 461, in <module>
main()
File "manage.py", line 457, in main
args.func(rts, logger)
File "manage.py", line 281, in extract_launcher
enricher.launcher(rts)
File "C:\editor_trends\etl\enricher.py", line 823, in launcher
multiprocessor_launcher(function, dataset, storage, locks, rts)
File "C:\editor_trends\etl\enricher.py", line 777, in multiprocessor_launcher
input_queue.join()
File "C:\Program Files\Python 2.7\lib\multiprocessing\queues.py", line 316, in join
self._cond.wait()
File "C:\Program Files\Python 2.7\lib\multiprocessing\synchronize.py", line 220, in wait
self._wait_semaphore.acquire(True, timeout)
KeyboardInterrupt
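The TypeError in the first traceback of this post comes from os.path.join being handed None as its second argument (create_txt_filehandle is called with None in the filename position). This is a minimal sketch of a guard, assuming the first argument may already be a complete path; `build_path` is a hypothetical helper and not the actual fix in editor_trends.

```python
import os


def build_path(location, filename=None):
    """Join location and filename, tolerating a missing filename."""
    if location is None:
        raise ValueError("location must not be None")
    if filename is None:
        # Treat `location` as the complete path instead of joining with None.
        return location
    return os.path.join(location, filename)
```

Failing early with a clear message, or falling back to the full path, makes this kind of problem visible in the worker process before it reaches the path-handling code in the standard library.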
What OS are you using? And can you email me the log file from /logs/?
Win7 64bit. Sure.
OK, I am making a lot of changes (as you have noted), and I am really trying to get to a stable version as soon as possible. Thanks for your patience.
Hmm, I cannot replicate this. Could you please update to the most recent version on SVN and try again?
Hmm. I used the latest revision (85439; I update editor_trends through SVN before every run). I have just reproduced it:
Processing of huwiki-latest-stub-meta-history.xml took 0:21:48.918000
Number of articles: 512102
Number of revisions: 8256238
Finished parsing bz2 archives
[waited indefinitely, then pressed Ctrl+C]
Traceback (most recent call last):
File "manage.py", line 461, in <module>
main()
File "manage.py", line 457, in main
args.func(rts, logger)
File "manage.py", line 405, in all_launcher
res = function(rts, logger)
File "manage.py", line 281, in extract_launcher
enricher.launcher(rts)
File "C:\editor_trends\etl\enricher.py", line 823, in launcher
multiprocessor_launcher(function, dataset, storage, locks, rts)
File "C:\editor_trends\etl\enricher.py", line 777, in multiprocessor_launcher
input_queue.join()
File "C:\Program Files\Python 2.7\lib\multiprocessing\queues.py", line 316, in join
self._cond.wait()
File "C:\Program Files\Python 2.7\lib\multiprocessing\synchronize.py", line 220, in wait
self._wait_semaphore.acquire(True, timeout)
KeyboardInterrupt
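Both Ctrl+C tracebacks in this thread end in input_queue.join(), which waits until task_done() has been called once for every item put on the queue. One common cause of this symptom is a worker that stops before acknowledging its item; whether that is what happened here is not confirmed. This is a minimal sketch of the usual guard, using threads and queue.Queue for brevity (multiprocessing.JoinableQueue has the same join()/task_done() contract); the worker and the jobs are hypothetical.

```python
import queue
import threading


def worker(jobs, results):
    """Process jobs until a None sentinel arrives, acknowledging every item."""
    while True:
        job = jobs.get()
        try:
            if job is None:
                return  # sentinel: stop this worker
            results.append(10 // job)  # raises ZeroDivisionError for job == 0
        except ZeroDivisionError as exc:
            results.append("failed: %s" % exc)
        finally:
            # Always acknowledge the item; a missing task_done() leaves
            # jobs.join() waiting forever.
            jobs.task_done()


jobs = queue.Queue()
results = []
thread = threading.Thread(target=worker, args=(jobs, results))
thread.start()
for job in (5, 0, 2, None):
    jobs.put(job)
jobs.join()    # returns because every item, including the failing one, was acknowledged
thread.join()
print(results)
```

Putting task_done() in a finally block means a failing job is reported and the launcher can still finish, instead of hanging until it is interrupted.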
If you are running into any problems while using the Editor Trends Study/Software then please start a topic here.
Hi!
Thank you for this software. I tried it for the Hungarian Wikipedia in a Win7 Enterprise environment, but I could not finish the process. I followed the instructions step by step, and everything worked fine until I ran python manage.py export, which gave this message:
manage: error: invalid choice: 'export' (choose from 'sort', 'all', 'config', 'show_languages', 'transform', 'django', 'download', 'dataset', 'extract', 'store')
I also tried the python manage.py dataset command and got this message:
Traceback (most recent call last):
File "manage.py", line 450, in <module>
main()
File "manage.py", line 423, in main
rts = runtime_settings.RunTimeSettings(project, language, args)
File "c:\Program Files\Python 2.7\Scripts\editor_trends\classes\runtime_settings.py", line 62, in __init__
self.targets = self.split_keywords(self.get_value('charts'))
File "c:\Program Files\Python 2.7\Scripts\editor_trends\classes\runtime_settings.py", line 115, in split_keywords
keywords = keywords.split(',')
AttributeError: 'function' object has no attribute 'split'
After this I tried the python manage.py -l Hungarian all command and got this message at the end:
Starting dataset_launcher
Start exporting dataset
Processing time: 0:00:00.010000
Traceback (most recent call last):
File "manage.py", line 450, in <module>
main()
File "manage.py", line 446, in main
args.func(rts, logger)
File "manage.py", line 257, in all_launcher
res = function(rts, logger)
File "manage.py", line 205, in dataset_launcher
log.log_to_mongo(properties, 'dataset', 'export', stopwatch, event='finish')
NameError: global name 'properties' is not defined
Could you please help me find out what the problem is?
Dear Samat,
I think this is because the export function has been renamed to dataset. But that is throwing an error as well, and I will fix it. I will let you know when there is an update (hopefully soon). Thanks for reporting this. Best, Diederik
Dear Diederik,
Thank you for your answer. I'm waiting for the update.
Best regards,
Sorry for the delay. Please download the most recent version from Subversion and give it a spin. Let me know if it works. The documentation needs to be updated as well.
Dear Diederik,
I have tried this updated version, but I am afraid it still doesn't work properly.
After python manage.py dataset I got this message:
Traceback (most recent call last):
File "manage.py", line 449, in <module>
main()
File "manage.py", line 422, in main
rts = runtime_settings.RunTimeSettings(project, language, args)
File "C:\editor_trends\classes\runtime_settings.py", line 62, in __init__
self.targets = self.split_keywords(self.get_value('charts'))
File "C:\editor_trends\classes\runtime_settings.py", line 115, in split_keywords
keywords = keywords.split(',')
AttributeError: 'function' object has no attribute 'split'
When I tried python manage.py -l Hungarian all, I got this message, and I could not find the resulting CSV file (where should I look for it?):
Starting dataset_launcher
Start exporting dataset
Processing time: 0:00:00.010000
Function dataset_launcher does not return a status, implement NOW
Could you please check the code again? Thank you, cheers,
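The AttributeError in this report indicates that split_keywords received a function rather than a string, so the value looked up for 'charts' was apparently not a string at that point. This is a minimal sketch of a more defensive version, assuming keywords are meant to be a comma-separated string; it is an illustration only, not the actual runtime_settings code.

```python
def split_keywords(keywords):
    """Split a comma-separated string into a list of stripped, non-empty keywords."""
    if keywords is None:
        return []
    if not isinstance(keywords, str):
        # Fail with a message that names the unexpected type.
        raise TypeError(
            "expected a comma-separated string, got %s" % type(keywords).__name__
        )
    return [part.strip() for part in keywords.split(",") if part.strip()]
```

A check like this does not fix the underlying lookup, but it points at the wrong value directly instead of failing inside str.split.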
Hi, I have started documenting the plugins at http://meta.wikimedia.org/wiki/Wikilytics_Plugins. It is still a work in progress. Diederik
Since this software is being used for projects other than Editor Trends Study now, Diederik (drdee) requested that we move "Editor Trends Study/Software" to a more permanent home on Meta under the name Wikilytics. However, I'm still unsure about export/import of LiquidThreads to a MediaWiki site that lacks LT, so I'm leaving the Talk page alone for now.
Not sure if this is related to the recent issues with the extraction phase, but I'm seeing problems in the store phase after extraction and sorting seemed to finish without any major errors:
rfaulkner@wmf128:~/trunk/projects/editor_trends$ python manage.py -l Polish store
Wikilytics is (c) 2010-2011 by the Wikimedia Foundation.
Written by Diederik van Liere (dvanliere@gmail.com).
This software comes with ABSOLUTELY NO WARRANTY. This is
free software, and you are welcome to distribute it under certain
conditions.
See the README.1ST file for more information.
Final settings after parsing command line arguments:
Project: Wikipedia
Input directory: /home/rfaulkner/wikimedia/pl/wiki
Output directory: /home/rfaulkner/wikimedia/pl/wiki and subdirectories
Language: Polish / Polski / pl
Start storing data in MongoDB
Storing article titles...
/home/rfaulkner/wikimedia/pl/wiki
2 False AWK
Traceback (most recent call last):
File "manage.py", line 583, in <module>
main()
File "manage.py", line 579, in main
args.func(rts, logger)
File "manage.py", line 306, in store_launcher
store.launcher(rts)
File "/home/rfaulkner/trunk/projects/editor_trends/etl/store.py", line 106, in launcher
store_articles(rts)
File "/home/rfaulkner/trunk/projects/editor_trends/etl/store.py", line 96, in store_articles
collection.insert({'id':id, 'title':title})
UnboundLocalError: local variable 'id' referenced before assignment
Any clues as to what may be happening here? I recall there may have been an issue with XML parsing in cElementTree's iterparse. Would this be related?
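An UnboundLocalError like the one above typically means that the code path assigning id never ran, for example because an input line was empty or did not have the expected fields. This is a minimal sketch of a guard, assuming (hypothetically) that article titles are read from tab-separated lines of the form id, tab, title; `parse_article_line` is not part of store.py and the real file format may differ.

```python
def parse_article_line(line, delimiter="\t"):
    """Return (article_id, title) for a well-formed line, or None otherwise."""
    parts = line.rstrip("\r\n").split(delimiter, 1)
    if len(parts) != 2 or not parts[0].strip().isdigit():
        return None  # skip blank or malformed lines instead of using unset variables
    return int(parts[0]), parts[1]


lines = ["12\tWarszawa\n", "\n", "broken line\n", "34\tLublin\n"]
articles = [parsed for parsed in map(parse_article_line, lines) if parsed is not None]
print(articles)  # [(12, 'Warszawa'), (34, 'Lublin')]
```

Only lines that yield both an id and a title would then be passed on to collection.insert.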
Hi Diederik,
I had to manually unzip the dump. When I tried:
python manage.py -l Polish extract
The following resulted:
Final settings after parsing command line arguments:
Project: Wikipedia
Input directory: /home/rfaulkner/wikimedia/en/wiki
Output directory: /home/rfaulkner/wikimedia/en/wiki and subdirectories
Language: English / English / en
Extracting data from XML
/home/rfaulkner/wikimedia/en/wiki
Checking if dump file has been extracted...
Dump file enwiki-latest-stub-meta-history.xml.gz has not yet been extracted...
Unzipping zip file
Processing time: 0:00:00.028703
Launching process...
Launching process...
Launching process...
Launching process...
4
Process Process-3:
Traceback (most recent call last):
File "/usr/lib/python2.6/multiprocessing/process.py", line 232, in _bootstrap
self.run()
File "/usr/lib/python2.6/multiprocessing/process.py", line 88, in run
self._target(*self._args, **self._kwargs)
File "/home/rfaulkner/trunk/projects/editor_trends/etl/extracter.py", line 286, in parse_dumpfile
filesize = file_utils.determine_filesize(location, filename)
File "/home/rfaulkner/trunk/projects/editor_trends/utils/file_utils.py", line 243, in determine_filesize
3
There are no more jobs in the queue left.
2
There are no more jobs in the queue left.
1
There are no more jobs in the queue left.
return os.path.getsize(path)
File "/usr/lib/python2.6/genericpath.py", line 49, in getsize
return os.stat(filename).st_size
OSError: [Errno 2] No such file or directory: '/home/rfaulkner/wikimedia/en/wiki/enwiki-latest-stub-meta-history.xml'
^CTraceback (most recent call last):
File "manage.py", line 583, in <module>
main()
File "manage.py", line 579, in main
args.func(rts, logger)
File "manage.py", line 263, in extract_launcher
extracter.launcher(properties)
File "/home/rfaulkner/trunk/projects/editor_trends/etl/extracter.py", line 439, in launcher
tasks.join()
File "/usr/lib/python2.6/multiprocessing/queues.py", line 316, in join
self._cond.wait()
File "/usr/lib/python2.6/multiprocessing/synchronize.py", line 212, in wait
self._wait_semaphore.acquire(True, timeout)
KeyboardInterrupt
I manually ran the "extract" and it appears to be running with a progress bar, but I'm not sure what all of the output means. Also, what exactly happens in this step? A bit more description here would be helpful.
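For the manual unzip step mentioned in this post, a gzip-compressed dump can also be decompressed from Python using only the standard library. This is a minimal sketch, assuming the target name is simply the source name without its .gz suffix; `gunzip` is a hypothetical helper and not how the extract command itself handles archives.

```python
import gzip
import shutil


def gunzip(source):
    """Decompress `source` (ending in .gz) next to itself and return the new path."""
    if not source.endswith(".gz"):
        raise ValueError("expected a .gz file: %r" % source)
    target = source[:-3]
    with gzip.open(source, "rb") as compressed, open(target, "wb") as plain:
        # Stream the data in chunks so large dumps are not loaded into memory.
        shutil.copyfileobj(compressed, plain)
    return target
```

For example, gunzip('plwiki-latest-stub-meta-history.xml.gz') would write plwiki-latest-stub-meta-history.xml into the same directory, where the file name here is inferred from the naming pattern in the log above.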
For some reason it is unzipping the English version instead of the Polish version. I'll have a look.
Sorry, that exception was from another command where I wasn't specifying the Polish language. However, I still had to extract the Polish dump manually. I can reproduce that to get the actual error.
I posted some questions and comments here.