Open Web Analytics

Open Web Analytics (OWA) is an open source, PHP-based framework for web analytics. It is already integrated into MediaWiki as well as other PHP-based open source projects.

It has a large user community. Peter Adams, the creator, would like to expand the developer community. There are many bug reports and fixes, with about 4-5 patches per release. There are only two core engineers. The work is self-funded through consulting.

DOMstream recording for reviewing mouse tracks, with controllable recording frequency
Data warehousing with an extensible star schema: automatically tracks every request that's made across every dimension
Configurable to write to a local database (flat file) or to HTTP POST to another server running OWA (for high volume; this sets up an event queue)
Pluggable filters (hooks that we can write custom filters against)
Throughput: in asynchronous event queue mode it could handle very high throughput, but at some point the queue has to be played into the database in batch
Largest known deployment: 1,000,000 pageviews a day (somebody in France; Peter can give us pointers for learning more)
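
The event queue idea above (accept hits quickly, play them into the database later in batch) can be sketched roughly as follows. This is a minimal illustration in Python, not OWA's actual implementation: the EventQueue class, the events table and its columns, and the batch_size parameter are all hypothetical names chosen for the example.

```python
import json
import sqlite3
from collections import deque


class EventQueue:
    """Buffers tracking events in memory and writes them to the database in batches.

    Hypothetical sketch: the table layout and names are assumptions, not OWA's schema.
    """

    def __init__(self, conn, batch_size=100):
        self.conn = conn
        self.batch_size = batch_size
        self.pending = deque()
        conn.execute(
            "CREATE TABLE IF NOT EXISTS events ("
            "event_type TEXT, timestamp INTEGER, properties TEXT)"
        )

    def log(self, event_type, timestamp, **properties):
        """Queue one event; flush automatically once the batch is full."""
        row = (event_type, timestamp, json.dumps(properties, sort_keys=True))
        self.pending.append(row)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        """Write all queued events in one transaction and return how many were written."""
        rows = list(self.pending)
        if not rows:
            return 0
        with self.conn:  # commits on success, rolls back on error
            self.conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
        self.pending.clear()
        return len(rows)
```

Request handling only appends to the in-memory queue, so the cost of the database write is paid once per batch rather than once per hit. A real deployment would need a durable queue so that events survive a crash between flushes.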

Current version:

currently just page views
tracks every hit within a session (30-minute period of activity)
uses PHP hooks we already have in our pages
reporting interface is integrated as a special page
instrumentation comes out of the extension...no need for page tagging
heat maps of clicks on the page
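
As a rough illustration of the session rule in the list above, the sketch below groups hits into sessions. It assumes that "30-minute period of activity" means a session ends after 30 minutes without a hit from the same visitor; that reading, and the assign_sessions function and its tuple layout, are assumptions for the example rather than OWA's documented behaviour.

```python
SESSION_TIMEOUT = 30 * 60  # seconds of inactivity that end a session (assumed reading)


def assign_sessions(hits, timeout=SESSION_TIMEOUT):
    """Label each hit with a per-visitor session number.

    hits: iterable of (visitor_id, unix_timestamp) pairs, in any order.
    Returns a list of (visitor_id, unix_timestamp, session_number) sorted by time.
    """
    last_seen = {}   # visitor_id -> timestamp of that visitor's previous hit
    session_no = {}  # visitor_id -> current session number, starting at 1
    labelled = []
    for visitor, ts in sorted(hits, key=lambda hit: hit[1]):
        # A first hit, or a gap longer than the timeout, starts a new session.
        if visitor not in last_seen or ts - last_seen[visitor] > timeout:
            session_no[visitor] = session_no.get(visitor, 0) + 1
        last_seen[visitor] = ts
        labelled.append((visitor, ts, session_no[visitor]))
    return labelled
```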

Enhancements in 1.3:

canvas based reporting
unlimited action events - set of events that allow you to track any action you want
user roles
REST support

Stuff that's not being done yet:

aggregate path analysis needs a better interface
distributed processing
continuous summarisation for custom reports
heat maps: time ranges are supported, but currently not at a granularity smaller than 24 hours
DOMstreams are per user ID, but OWA doesn't track that yet
segmentation is missing, but custom dimensions can be added at the visitor level
vary the sample rate based on geo - would need to write an administrative interface to restrict invocation of tracker
Validate ComScore: Unique visitors per month, would we have to sample or could we handle typical volume? - Would depend on how we scale MySQL
No summary engine. This is nice because you never have to get beneath the summary, but really they don't have one yet because it's never been used on the scale of WMF. The event queue is the only answer for now.
Pathing and larger scale are both on the horizon for OWA. According to Peter, the biggest issue will be scale. There's no problem staging data with OWA; the issues will be handling the volume of data (batch write-out) and then whether we can get meaningful reports out of so much data.
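
For the geo-based sample rate item above, one possible approach is to decide per visitor, deterministically, whether the tracker is invoked at all. The sketch below is only an illustration of that idea: the should_track function, the SAMPLE_RATES table and its values are hypothetical, and it assumes the country code has already been resolved elsewhere.

```python
import hashlib

# Hypothetical per-country sample rates; anything not listed is fully tracked.
SAMPLE_RATES = {"US": 0.1, "FR": 0.5}
DEFAULT_RATE = 1.0


def should_track(visitor_id, country_code, rates=SAMPLE_RATES, default=DEFAULT_RATE):
    """Return True if this visitor falls inside the sample for their country.

    The visitor ID is hashed into a stable bucket in [0, 1), so the same
    visitor is always either in or out of the sample across requests.
    """
    rate = rates.get(country_code, default)
    digest = hashlib.sha256(visitor_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < rate
```

Hashing the visitor ID rather than drawing a random number per request keeps whole sessions intact, which matters if sampled data is later used for path or unique-visitor reports.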

Problems Peter ran into with MediaWiki: there is no install event to trigger creation of the schema

Caveats: since nobody has used OWA on a Wikimedia scale, we're not really sure of performance characteristics. We would need to pilot it. Also the system will throw a *lot* of data and we'll have to create an architecture to deal with that. On the other hand, we won't have the problem of re-negotiating our privacy policy because we'd be hosting that data ourselves.

Known Issues

Disables caching (why? Can we work around this?)
Permissions; writes to its extension directory from the web
Doesn't work well with Vector UI
Privacy -- Many privacy-violating options are on by default, and others are easy to turn on if you're an "admin" with this interface
- Need to set up user classes, some with options to edit and some with options to view only

Work that would be needed

anonymization of data before releasing it
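
As a starting point for discussion, two common anonymization steps are truncating IP addresses and replacing visitor IDs with keyed hashes. The sketch below is an assumption about what this work might involve, not an agreed design: the function names, the /24 and /48 prefix lengths, and the use of HMAC-SHA256 are all choices made for the example.

```python
import hashlib
import hmac
import ipaddress


def truncate_ip(ip):
    """Zero the host part of an address (assumed: keep /24 for IPv4, /48 for IPv6)."""
    addr = ipaddress.ip_address(ip)
    prefix = 24 if addr.version == 4 else 48
    network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    return str(network.network_address)


def pseudonymize(visitor_id, secret_key):
    """Replace a visitor ID with a keyed hash so the raw ID is never released.

    secret_key is bytes and must stay private; rotating it breaks linkability
    between releases.
    """
    return hmac.new(secret_key, visitor_id.encode("utf-8"), hashlib.sha256).hexdigest()
```

Steps like these reduce the risk of re-identification but do not guarantee anonymity on their own; request paths and timestamps can still identify individuals and would need separate review.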