Task force/Analytics/Feature prioritization
Short term priorities
Note: Domains ordered alphabetically, does not imply ranking
Community Initiatives
Process owner: Rob Lanphier / Facilitator: Erik Zachte
- Data warehouse for stats on content served, e.g. for GLAM initiative (proposed, to be discussed on next meeting)
Engineering
Process owner: Danese Cooper / Facilitator: Rob Lanphier
- udp2log multicast support
- OWA evaluation
- mediawiki.org stats for Zak's doc work
- Mobile
- Analyze both API and screen scraping logs for mobile apps.
External Reporting
Process owner: Erik Moeller / Facilitator: Erik Zachte
Global Development
Process owner: Barry Newstead / Primary contact: Mani Pande
Infrastructure
Process owner: Danese Cooper / Facilitator: Tomasz Finc
Product Strategy
Process owner: Erik Moeller / Primary contact: Howie Fung / Facilitator: Nimish Gautam?
- A/B testing
- Participation analytics (A/B testing --> effects on participation)
User Experience
Process owner: Danese Cooper / Primary contact: Parul Vora /
Article Feedback
Process owner: Alolita Sharma / Primary contact: /
- Basic analysis
Long term priority organization
Dashboard
- Develop our own way of tracking how our projects are doing
- Getting more fidelity around how different segments of our users are doing
Analytics for Specific Projects
- Editor trends and community health (in support of product roadmap)
- Analytics for specific initiatives:
- GLAM
- India
- Article Feedback
- Account Creation Improvement
- LiquidThreads
- Mobile
Ensuring our infrastructure can handle increased analytics demands
- Formulate role of WMF vs tool server cluster in data capture/storage/aggregation/delivery
- Fixing fragile/broken parts of the system (e.g. udp2log)
- Deploying new tools that give us new views (e.g. OWA)
- Increasing our development speed and paying down technical debt
Process/Priorities Thoughts
How we're going to organize/attributes to assign to each feature:
- By data source (e.g. squid logs vs OWA data capture)
- By tool used to implement
- By person/team responsible for implementing
- By priority
How to set priority:
- What feature development priority does it inform?
- What programs priority does it inform?
- How essential is the data to executing on feature or program priority?
- What deadlines loom?
Features to consider (priority in parenthesis)
From the requirements doc:
- Overall dashboard of site health; dashboard per segment
- Uniques (medium)
- Page views (done)
- Visits (medium)
- PV/visit (medium)
- Bounce rate (medium)
- Minutes/Visit (medium)
- Entry pages (medium)
- Exit Pages and destinations (medium)
- Traffic sources, referrer breakdown (high)
- % new vs repeat (high)
- Geographic breakdown (high)
- Segmentation
- By project -- highest level; everything below is for a particular project (high) --> the highest priority is that this will be the primary filter with the ones below being secondary filters
- Reader vs. editor (and now rater) (high)
- By geography - country (high)
- By geography - city level (medium)
- By referrer (high)
- By device type (?)
- User pathing: ability to instrument specific paths and produce fallout reports. E.g., Account Creation Process, Editing flow, ratings flow (high)
- Segmentation (per above)
- Split A/B testing (elements of pages and pages themselves) (depends)
Open question: requirements around behavior x web intersection; frank's account creation project and impact on subsequent editing;
- Followup work: editor account age cohorts - Diederik (in progress)
- Analysis of effect of reverts on editor retention - Diederik (high)
- Editing history clusters: set first edit at t=0 and follow users over time to see what patterns emerge. (Cal-IT?) (medium)
- Tracking where readers are from, regardless of language (segmentation of web analytics data) (high)
- Activity on one Wikipedia from users in another geography, e.g.
- PV of English Wikipedia from India (both mobile and desktop)
- New articles created on English Wikipedia from India, etc.
- Editor activity on English Wikipedia from India
- Activity on one Wikipedia from users in another geography, e.g.
- Tracking where editors are from, regardless of languages (segmentation of Zachte's active editor data) (high)
- Tracking user environment (medium)
- Screen resolution
- Computer horsepower
- Browser capabilities
- Mobile : TBD
Engineering driven work:
- OWA evaluation (high)
- OWA integration/testing/improvement (tbd)
- udp2log improvements (high)
- rsyslog deployment (possible udp2log replacement) (medium)
- Data Warehouse
- Hadoop/Hive/Hbase
- Analysis of API queries
- XML Dumps/Snapshots - stable, consistent, valid, .. etc