Task force/Analytics/Feature prioritization

Short term priorities

Note: Domains are ordered alphabetically; the order does not imply ranking

Community Initiatives

Process owner: Rob Lanphier / Facilitator: Erik Zachte

Data warehouse for stats on content served, e.g. for the GLAM initiative (proposed, to be discussed at the next meeting)

Engineering

Process owner: Danese Cooper / Facilitator: Rob Lanphier

udp2log multicast support
OWA evaluation
mediawiki.org stats for Zak's doc work
Mobile
- Analyze both API and screen scraping logs for mobile apps.

External Reporting

Process owner: Erik Moeller / Facilitator: Erik Zachte

Restyled Report Card

Global Development

Process owner: Barry Newstead / Primary contact: Mani Pande

Basic Metrics for English Wikipedia

Total editors from India

Mobile Stats

Regional analysis

Infrastructure

Process owner: Danese Cooper / Facilitator: Tomasz Finc

Product Strategy

Process owner: Erik Moeller / Primary contact: Howie Fung / Facilitator: Nimish Gautam?

A/B testing
Participation analytics (A/B testing --> effects on participation)

User Experience

Process owner: Danese Cooper / Primary contact: Parul Vora /

Article Feedback

Process owner: Alolita Sharma / Primary contact: /

Basic analysis

Long term priority organization

Dashboard

Develop our own way of tracking how our projects are doing
Getting more fidelity around how different segments of our users are doing

Analytics for Specific Projects

Editor trends and community health (in support of product roadmap)
Analytics for specific initiatives:
- GLAM
- India
- Article Feedback
- Account Creation Improvement
- LiquidThreads
- Mobile

Ensuring our infrastructure can handle increased analytics demands

Formulate role of WMF vs tool server cluster in data capture/storage/aggregation/delivery
Fixing fragile/broken parts of the system (e.g. udp2log)
Deploying new tools that give us new views (e.g. OWA)
Increasing our development speed and paying down technical debt

Process/Priorities Thoughts

How we're going to organize features (attributes to assign to each feature):

By data source (e.g. squid logs vs OWA data capture)
By tool used to implement
By person/team responsible for implementing
By priority

How to set priority:

What feature development priority does it inform?
What program priority does it inform?
How essential is the data to executing on feature or program priority?
What deadlines loom?

Features to consider (priority in parentheses)

From the requirements doc:

Overall dashboard of site health; dashboard per segment
- Uniques (medium)
- Page views (done)
- Visits (medium)
- PV/visit (medium)
- Bounce rate (medium)
- Minutes/Visit (medium)
- Entry pages (medium)
- Exit Pages and destinations (medium)
- Traffic sources, referrer breakdown (high)
- % new vs repeat (high)
- Geographic breakdown (high)
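
The visit-based metrics in this list (visits, PV/visit, bounce rate, minutes/visit) all depend on how page views are grouped into visits. The following is a minimal sketch of one way to do that, not a description of any existing WMF tool. It assumes hypothetical `(visitor_id, timestamp, url)` records and a 30-minute inactivity cutoff between visits, which is a common web-analytics convention rather than something the requirements doc specifies.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumption: a visit ends after 30 minutes of inactivity. This cutoff is a
# common convention, not a value taken from the requirements doc.
SESSION_GAP = timedelta(minutes=30)


def split_into_visits(page_views):
    """Group (visitor_id, timestamp, url) records into per-visitor visits."""
    by_visitor = defaultdict(list)
    for visitor_id, timestamp, url in page_views:
        by_visitor[visitor_id].append((timestamp, url))

    visits = []
    for visitor_id, hits in by_visitor.items():
        hits.sort()
        current = [hits[0]]
        for prev, hit in zip(hits, hits[1:]):
            # A long enough pause closes the current visit and starts a new one.
            if hit[0] - prev[0] > SESSION_GAP:
                visits.append((visitor_id, current))
                current = []
            current.append(hit)
        visits.append((visitor_id, current))
    return visits


def dashboard_metrics(page_views):
    """Compute the basic dashboard numbers from raw page view records."""
    visits = split_into_visits(page_views)
    n = len(visits)
    total_pv = sum(len(hits) for _, hits in visits)
    # A bounce is a visit with exactly one page view.
    bounces = sum(1 for _, hits in visits if len(hits) == 1)
    total_minutes = sum(
        (hits[-1][0] - hits[0][0]).total_seconds() / 60 for _, hits in visits
    )
    return {
        "uniques": len({visitor_id for visitor_id, _ in visits}),
        "page_views": total_pv,
        "visits": n,
        "pv_per_visit": total_pv / n if n else 0.0,
        "bounce_rate": bounces / n if n else 0.0,
        "minutes_per_visit": total_minutes / n if n else 0.0,
    }


# Made-up sample data for illustration only.
t = datetime(2011, 1, 1, 12, 0)
sample = [
    ("a", t, "/wiki/Main_Page"),
    ("a", t + timedelta(minutes=5), "/wiki/India"),
    ("a", t + timedelta(hours=2), "/wiki/GLAM"),
    ("b", t, "/wiki/Main_Page"),
]
metrics = dashboard_metrics(sample)
```

With this sample, visitor "a" produces two visits (the two-hour pause splits them) and visitor "b" produces one, so two of the three visits are bounces. Note that minutes/visit measured this way counts single-page visits as zero minutes, which is a known limitation of log-based timing.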

Segmentation
- By project: highest level; everything below is for a particular project (high). Most important is that this becomes the primary filter, with the ones below acting as secondary filters
- Reader vs. editor (and now rater) (high)
- By geography - country (high)
- By geography - city level (medium)
- By referrer (high)
- By device type (?)

User pathing: ability to instrument specific paths and produce fallout reports. E.g., Account Creation Process, Editing flow, ratings flow (high)
- Segmentation (per above)
Split A/B testing (elements of pages and pages themselves) (depends)
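
To make the "fallout report" idea for user pathing concrete, here is a minimal sketch of how one could be computed. It assumes each instrumented path is an ordered list of step names and that each user's events are available in time order; the step names and the `fallout_report` function are hypothetical and only illustrate the shape of the output.

```python
def fallout_report(steps, user_events):
    """Count how many users reach each step of a path, in order.

    steps: ordered list of step names (hypothetical labels).
    user_events: dict mapping a user id to that user's event names,
        already sorted by time.
    Returns a list of (step, users_reached, users_lost_since_previous_step).
    """
    reached = [0] * len(steps)
    for events in user_events.values():
        next_step = 0
        for event in events:
            # Only count a step if all earlier steps were already completed.
            if next_step < len(steps) and event == steps[next_step]:
                reached[next_step] += 1
                next_step += 1

    report = []
    for i, (step, count) in enumerate(zip(steps, reached)):
        previous = reached[i - 1] if i > 0 else count
        report.append((step, count, previous - count))
    return report


# Made-up example: an account creation path with three instrumented steps.
steps = ["view_signup_form", "submit_form", "account_created"]
user_events = {
    "u1": ["view_signup_form", "submit_form", "account_created"],
    "u2": ["view_signup_form", "submit_form"],
    "u3": ["view_signup_form"],
    "u4": ["submit_form"],  # never entered the path at step one
}
report = fallout_report(steps, user_events)
```

Running the same function separately on each segment (per the segmentation list above) would give a per-segment fallout report without changing the logic.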

Open question: requirements around behavior x web intersection, e.g. Frank's account creation project and its impact on subsequent editing

Followup work: editor account age cohorts - Diederik (in progress)
Analysis of effect of reverts on editor retention - Diederik (high)

Editing history clusters: set first edit at t=0 and follow users over time to see what patterns emerge. (Cal-IT?) (medium)
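
As an illustration of the "set first edit at t=0" step, the sketch below re-bases each editor's edit timestamps on that editor's own first edit and counts edits per period since then. It is only a sketch under stated assumptions: the input is a hypothetical list of `(editor, timestamp)` pairs, and the 30-day period length is an arbitrary choice, not something this page specifies.

```python
from collections import defaultdict
from datetime import datetime


def edits_by_period_since_first_edit(edits, days_per_period=30):
    """Align every editor's history so their first edit falls in period 0.

    edits: list of (editor, timestamp) pairs, in any order.
    Returns {editor: {period_index: edit_count}}.
    """
    # First pass: find each editor's first edit (their personal t=0).
    first_edit = {}
    for editor, timestamp in edits:
        if editor not in first_edit or timestamp < first_edit[editor]:
            first_edit[editor] = timestamp

    # Second pass: bucket each edit by elapsed time since that first edit.
    timeline = defaultdict(lambda: defaultdict(int))
    for editor, timestamp in edits:
        period = (timestamp - first_edit[editor]).days // days_per_period
        timeline[editor][period] += 1
    return {editor: dict(periods) for editor, periods in timeline.items()}


# Made-up sample data for illustration only.
edits = [
    ("A", datetime(2010, 1, 1)),
    ("A", datetime(2010, 1, 15)),
    ("A", datetime(2010, 3, 5)),
    ("B", datetime(2010, 6, 1)),
]
aligned = edits_by_period_since_first_edit(edits)
```

Because every editor's series now starts at period 0, the per-editor vectors can be compared directly or passed to whatever clustering method is chosen later.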

Tracking where readers are from, regardless of language (segmentation of web analytics data) (high)
- Activity on one Wikipedia from users in another geography, e.g.
  - PV of English Wikipedia from India (both mobile and desktop)
  - New articles created on English Wikipedia from India, etc.
  - Editor activity on English Wikipedia from India

Tracking where editors are from, regardless of language (segmentation of Zachte's active editor data) (high)
Tracking user environment (medium)
- Screen resolution
- Computer horsepower
- Browser capabilities
Mobile: TBD

Engineering driven work:

OWA evaluation (high)
OWA integration/testing/improvement (tbd)
udp2log improvements (high)
rsyslog deployment (possible udp2log replacement) (medium)
Data Warehouse
Hadoop/Hive/Hbase
Analysis of API queries
XML Dumps/Snapshots: stable, consistent, valid, etc.