en.WP uses a range of categories that are poorly organized and do not follow any standard knowledge classification scheme. This makes it harder to derive useful metadata from dumps.
Possibilities of adding categories
Parallel categorization within projects
It may be possible to introduce standardized knowledge categories as a parallel categorization initiative, alongside the existing plaintext categories.
- Decimal-style categories do not directly compete with plaintext category titles
- The Universal Decimal Classification (UDC) system may be available for use on a project such as Wikipedia (see User talk:JakobVoss)
Working within the available dumps, it may be possible to provide category cross-referencing to UDC categorization with post-processing.
- User:Hippietrail is currently developing tools to parse certain en.WP infoboxes to extract metadata as an extension of the MediaWiki dump DTD.
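
As a rough illustration of the post-processing idea, the sketch below pulls `[[Category:...]]` links out of a page's wikitext and looks each title up in a cross-reference table. The table `CATEGORY_TO_UDC` and the function names are hypothetical and not part of any existing tool. The three UDC class numbers shown are only a hand-picked sample; a real table would have to be compiled and maintained separately.

```python
import re

# Hypothetical, hand-maintained cross-reference table (assumption for
# illustration only): en.WP category title -> UDC class number.
CATEGORY_TO_UDC = {
    "Mathematics": "51",
    "Physics": "53",
    "Chemistry": "54",
}

# Matches [[Category:Title]] and [[Category:Title|sort key]] in wikitext.
CATEGORY_LINK = re.compile(
    r"\[\[\s*Category\s*:\s*([^\]|]+?)\s*(?:\|[^\]]*)?\]\]",
    re.IGNORECASE,
)


def extract_categories(wikitext):
    """Return the category titles linked from a page's wikitext, in order."""
    return [m.group(1).strip() for m in CATEGORY_LINK.finditer(wikitext)]


def cross_reference(wikitext, mapping=CATEGORY_TO_UDC):
    """Map each category title on the page to a UDC code, or None if unmapped."""
    return {title: mapping.get(title) for title in extract_categories(wikitext)}


if __name__ == "__main__":
    sample = "Some text. [[Category:Physics]] [[Category:Chemistry|Sort key]]"
    for title, udc in cross_reference(sample).items():
        print(title, udc)
```

Categories with no entry in the table come back as `None`, which keeps the unmapped titles visible so the table can be extended over time without touching the plaintext categories themselves.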