Store Wikipedia dump file
From Talk:Wikilytics
Not sure if this is related to the recent issues with the extraction phase, but I'm seeing problems in the store phase, even though extraction and sorting seemed to finish without any major errors:
rfaulkner@wmf128:~/trunk/projects/editor_trends$ python manage.py -l Polish store
Wikilytics is (c) 2010-2011 by the Wikimedia Foundation.
Written by Diederik van Liere (dvanliere@gmail.com).
This software comes with ABSOLUTELY NO WARRANTY.
This is free software, and you are welcome to distribute it under certain conditions.
See the README.1ST file for more information.

Final settings after parsing command line arguments:
Project: Wikipedia
Input directory: /home/rfaulkner/wikimedia/pl/wiki
Output directory: /home/rfaulkner/wikimedia/pl/wiki and subdirectories
Language: Polish / Polski / pl

Start storing data in MongoDB
Storing article titles...
/home/rfaulkner/wikimedia/pl/wiki 2 False AWK
Traceback (most recent call last):
  File "manage.py", line 583, in <module>
    main()
  File "manage.py", line 579, in main
    args.func(rts, logger)
  File "manage.py", line 306, in store_launcher
    store.launcher(rts)
  File "/home/rfaulkner/trunk/projects/editor_trends/etl/store.py", line 106, in launcher
    store_articles(rts)
  File "/home/rfaulkner/trunk/projects/editor_trends/etl/store.py", line 96, in store_articles
    collection.insert({'id':id, 'title':title})
UnboundLocalError: local variable 'id' referenced before assignment
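A note on the error itself: an UnboundLocalError means Python treats `id` as a local name inside `store_articles` (the function binds it somewhere, which also shadows the builtin `id()`), but the `collection.insert` call on line 96 ran before that binding happened. A common way to get there is a loop where `id` is only assigned when a line matches some condition, so an empty input or a malformed first line reaches the insert with nothing bound. The sketch below is hypothetical, not the actual `store.py`: it assumes tab-separated `id<TAB>title` lines and uses a plain list in place of the MongoDB collection.

```python
# Hypothetical illustration only: store.py's real input format and loop are
# not shown in the post. A plain list stands in for the MongoDB collection.


def store_titles_buggy(lines, collection):
    """Mimics one pattern that produces the error in the traceback."""
    for line in lines:
        parts = line.rstrip("\n").split("\t", 1)
        if len(parts) == 2:
            id, title = parts  # binding `id` makes it local for the whole function
        # A malformed first line reaches this call before `id` was ever bound
        # (UnboundLocalError); later malformed lines silently reuse stale values.
        collection.append({"id": id, "title": title})


def store_titles(lines, collection):
    """Insert one document per well-formed 'id<TAB>title' line."""
    stored = 0
    for line in lines:
        parts = line.rstrip("\n").split("\t", 1)
        if len(parts) != 2:
            continue  # skip malformed lines instead of reusing stale values
        article_id, title = parts
        collection.append({"id": article_id, "title": title})
        stored += 1
    if stored == 0:
        # Fail loudly: zero titles usually means the extraction step produced nothing.
        raise ValueError("no article titles found; check the extraction output")
    return stored


if __name__ == "__main__":
    docs = []
    print(store_titles(["1\tWarszawa", "2\tKraków"], docs))  # prints 2
```

Raising a ValueError when nothing is stored turns an empty or unexpected titles file into an explicit message instead of a confusing UnboundLocalError deep in the store step.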
Any clues as to what might be happening here? I recall there may have been an issue with XML parsing in cElementTree.iterparse; could this be related?
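One way to rule the XML parsing in or out is to stream the dump independently and count pages that come back without a page id or title. The sketch below is hypothetical and is not Wikilytics's extraction code: it uses the standard library's `xml.etree.ElementTree.iterparse` (Python 2's `cElementTree` exposes a function of the same name), and the helper names `local_name` and `extract_titles` are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the '{namespace}' prefix that MediaWiki export XML puts on tags."""
    return tag.rsplit("}", 1)[-1]


def extract_titles(source):
    """Stream a MediaWiki XML dump and return ([(page_id, title), ...], skipped).

    `skipped` counts <page> elements whose direct <id> or <title> child is
    missing or empty.
    """
    pairs, skipped = [], 0
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # first "start" event is the root <mediawiki> element
    for event, elem in context:
        if event != "end" or local_name(elem.tag) != "page":
            continue
        # Only direct children: the <id> nested in <revision> is a revision id.
        fields = {local_name(child.tag): child.text for child in elem}
        page_id, title = fields.get("id"), fields.get("title")
        if page_id and title:
            pairs.append((page_id, title))
        else:
            skipped += 1
        root.clear()  # detach finished pages so memory stays flat on large dumps
    return pairs, skipped


if __name__ == "__main__":
    sample = b"<mediawiki><page><title>Warszawa</title><id>1</id></page></mediawiki>"
    print(extract_titles(io.BytesIO(sample)))  # prints ([('1', 'Warszawa')], 0)
```

If a run over the Polish dump reports a large `skipped` count, or no pairs at all, the problem is upstream of the store step; if the pairs look complete, the store step's own input handling is the more likely culprit.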