Task force/Analytics/Requirements

These are requirements for an analytics infrastructure, gathered from several different projects.

It is good to keep an open mind about the limitations of analytics:

  1. Overreliance on numbers can lead to analysis paralysis (meaning there is always something more to research before a decision can be taken). Our community has made formidable strides based on 'be bold' for smaller actions, and 'think bold, but discuss with peers' for major choices.
  2. 'Objective' numbers can tell one story where 'subjective' inquiry and discussion lead to other interpretations. Example: Amsterdam police claim their crime stats clearly show that petty crime is on the decline: e.g. fewer and fewer bike thefts get reported. Many in the general public say: why would we report bike theft anyway, when it is not a priority for the police, and they never return a stolen bike?

Requirements

General

Based on the project-specific requirements listed below, five general requirements emerged. Requirements specific to individual projects are captured under each project's Activity Metrics section below.

Priority is listed as needed within six months, one year, or two years.

Requirement              | Priority  | Tracking, but needs improvement | Not tracking | Notes
Click paths              | 12 months | X                               |              | Usability is doing this for the Vector editing toolbar. Not currently tracking page paths.
Segmentation             | 6 months  |                                 | X            |
A/B Testing              |           |                                 | X            |
Heat maps                |           |                                 | X            |
Different types of pages |           |                                 | X            |

Architectural

See Task force/Analytics/Principles for a good list of these.

Privacy

The requirements are captured under Task force/Analytics/Principles. Possible ways to address privacy concerns:

  1. Limit what's tracked
    • Actual page titles are not necessarily interesting for UX, although other projects (Outreach or Strategy, for example) might care
  2. Binning information (e.g. bands of user edit counts; see the sketch after this list)
  3. Removing outliers
  4. Introduction
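
As an illustration of points 2 and 3, here is a minimal sketch (with hypothetical band edges, field names, and outlier cutoff) of binning per-user edit counts into coarse bands and dropping extreme outliers before anything is reported:

```python
from bisect import bisect_right

# Hypothetical band edges: 0, 1-9, 10-99, 100-999, 1000+ edits per month.
BAND_EDGES = [1, 10, 100, 1000]
BAND_LABELS = ["0", "1-9", "10-99", "100-999", "1000+"]

def edit_band(edit_count: int) -> str:
    """Map a raw per-user edit count to a coarse band, so exact counts are never reported."""
    return BAND_LABELS[bisect_right(BAND_EDGES, edit_count)]

def banded_report(edit_counts: list[int], outlier_cutoff: int = 10_000) -> dict[str, int]:
    """Count users per band, dropping extreme outliers that could identify individuals."""
    report = {label: 0 for label in BAND_LABELS}
    for count in edit_counts:
        if count > outlier_cutoff:  # remove outliers entirely rather than reporting them
            continue
        report[edit_band(count)] += 1
    return report

print(banded_report([0, 3, 42, 250, 1200, 55000]))
# {'0': 1, '1-9': 1, '10-99': 1, '100-999': 1, '1000+': 1}
```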

User Experience

Web Analytics

Web analytics will help us better understand how users interact with Wikipedia (the website) and other projects. In particular, we would be able to gain a more detailed understanding of what features are being used by which segments of users, how our users accomplish tasks, how changes on the site impact user behavior, etc. Some questions we would like to ask:

  • What do different segments of users (e.g., readers, editors, admins, etc.) do on the site? How do these activities differ between user segments? For example, when readers come to the site, where do they go? Where do they come from? Do they use the home page? How do they go from one Wikipedia article to another (if they do at all)? How long is the average session? How often do they visit a talk page? Similar questions may be asked for editors: when a user edits an article, what types of page views immediately preceded the edit? Do they go directly to the page (suggesting an a priori interest in the article)? Or do they view the site as a reader and edit in reaction to what they read?
  • Overall user metrics across segments could help us measure engagement (e.g., PV/month, PV/session)
  • How easily do users perform specific tasks? For example, how easily can users make an edit (edit-to-save ratio)? How does this vary across user segments?
  • What features do users use?
  • How do users interact with various elements of the navigation?
  • How do changes we make on the site affect all of the above?
  • How do changes we make correlate with changes in behavior (i.e., does something we do change one of these dimensions of user behavior over time, or at a certain time)? -> A/B testing

Feature requirements:

  • User segmentation (i.e., we should be able to get all of the analytics listed here broken down by user segment; see the sketch after this list). Segment may be defined a number of ways: logged in/logged out; reader/editor/admin; editor by activity (e.g., casual/heavy, by edit count, by edit size), etc.
  • User pathing
  • Task fallout
  • Entrance/exit pages
  • General stats
  • Task metrics
  • Heat map (?)
  • Split A/B testing (elements of pages and pages themselves)
  • Overall dashboard of site health; dashboard per segment
  • User environment (e.g., browser, resolution, js enabled, geocoding, etc.)
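
As a sketch of how the first item (user segmentation) could be applied to general stats such as PV/session, assume a hypothetical log of page-view events carrying a segment label and a session id; the same grouping would apply to any of the other metrics listed above:

```python
from collections import defaultdict

# Hypothetical page-view events; a real pipeline would read these from request logs.
pageviews = [
    {"segment": "reader", "session": "s1", "page": "Main_Page"},
    {"segment": "reader", "session": "s1", "page": "Amsterdam"},
    {"segment": "editor", "session": "s2", "page": "Amsterdam"},
    {"segment": "editor", "session": "s2", "page": "Talk:Amsterdam"},
    {"segment": "editor", "session": "s3", "page": "Amsterdam"},
]

def pv_per_session_by_segment(events):
    """Page views per session, broken down by user segment (reader/editor/admin, etc.)."""
    views = defaultdict(int)     # segment -> total page views
    sessions = defaultdict(set)  # segment -> distinct session ids
    for e in events:
        views[e["segment"]] += 1
        sessions[e["segment"]].add(e["session"])
    return {seg: views[seg] / len(sessions[seg]) for seg in views}

print(pv_per_session_by_segment(pageviews))
# {'reader': 2.0, 'editor': 1.5}
```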

Activity Metrics

Activity metrics will help us understand the broader interactions our users have with Wikipedia and other projects. Rather than being focused on how users interact with the site, these metrics are focused on the patterns of activity over time. Some of these items will need to be correlated with web data to get a more complete picture of user behavior. Some questions we want to ask:

  • How does editing cluster by frequency/edit size? Are there users who make many small edits versus users who make few large edits?
  • What is the role of reversion within users' editing patterns? Are there users that almost exclusively revert? That rarely revert?
  • How does reversion impact a new editor's likelihood of editing again? How does having the first edit reverted affect likelihood of subsequent edits?
  • How does a user transition from being a first-time editor to a more frequent editor?
  • What is the impact of new editor outreach on subsequent editing behavior?
  • How does editing behavior cluster around content areas? How does this vary across user type?
  • Is there a set of typical lifecycles that editors follow? For example, do users edit in spurts (edit a lot, go away, and then come back)?
  • Do different types of edits (e.g., article/talk/reversion/other) correlate with different lifecycles?
  • There is probably a concept of lifetime value of a user. If we can develop a good one, what are the predictive markers of a user with high lifetime value?

Feature requirements:

  • Editing histograms: there are many different types of histograms we can run, the simplest being number of edits.
  • Editing history clusters: set first edit at t=0 and follow users over time to see what patterns emerge.
  • Editing across content area: map of editing to some approximation of content area (maybe category?)
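
A minimal sketch of the first two feature requirements, assuming a hypothetical list of (user, timestamp) edit events: an edit-count histogram, plus editing histories re-aligned so that each user's first edit sits at t=0 (measured here in whole days):

```python
from collections import Counter
from datetime import datetime

# Hypothetical edit events: (user, timestamp).
edits = [
    ("alice", datetime(2010, 1, 1)), ("alice", datetime(2010, 1, 2)),
    ("alice", datetime(2010, 1, 30)), ("bob", datetime(2010, 2, 10)),
]

# 1. Editing histogram: how many users made exactly N edits.
edits_per_user = Counter(user for user, _ in edits)
histogram = Counter(edits_per_user.values())
print(histogram)  # Counter({3: 1, 1: 1}) -> one user with 3 edits, one with 1

# 2. Editing history clusters: align each user's first edit at t=0 (days since first edit).
first_edit = {}
aligned = {}
for user, ts in sorted(edits, key=lambda e: e[1]):
    first_edit.setdefault(user, ts)
    aligned.setdefault(user, []).append((ts - first_edit[user]).days)
print(aligned)  # {'alice': [0, 1, 29], 'bob': [0]}
```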

Community health metrics

Community health metrics should track basic population dynamics of different kinds of volunteers on Wikimedia projects, and try to understand the "cause of death" for both new and experienced editors. In trying to understand the hazards that contribute to community problems, we must also track different kinds of dispute resolution activity, as well as difficulties that prevent a contribution from being completed and accepted by the community.

  1. Lifecycle patterns
    • How can we compare the first 100 edits of active editors versus editors who tried Wikipedia and left?
      • User space activity (article space versus talk space versus Wikipedia space)
      • Reversions
      • Favorite article categories / talk pages
    • What happens in the last 100 edits of veteran (e.g.: 90 days or more of activity) editors that causes them to leave?
      • User space activity (article space versus talk space versus Wikipedia space)
      • Reversions
      • Favorite article categories / talk pages
    • Population dynamics
      • Number of active editors
      • Number of active administrators
      • Number of active editors who have been blocked
      • Classified by total number of edits
      • Classified by time on Wikipedia (1 month, 3 months, 6 months, 1 year, 2 years)
      • Classified by userboxes / user categories
    • "Death" dynamics
      • Left and retired
      • Left and wikibreak
      • Left and recent dispute resolution activity (e.g.: AN/I, ArbCom)
      • Left and recent block/ban
      • Left inexplicably
  2. Dispute patterns
    • Tracking dispute resolution activity over time:
      • Activity at ArbCom
      • Activity at the administrator noticeboard
      • Etc.
      • Administrator noticeboard issues marked as unresolved, versus resolved
    • Comparing discussions that reach consensus versus disputes with no consensus
      • Number of users participating
      • Number of administrators participating
      • Number of previously blocked users participating
    • Examining the lifecycle of editors who are blocked immediately versus those who are blocked indefinitely after one year of activity
      • User space activity (article space versus talk space versus Wikipedia space)
      • Reversions
      • Favorite article categories / talk pages
      • Administrator activity on the user's talk page
  3. Obstacle patterns
    • Number of abandoned edits (e.g.: clicked edit, and did not save)
      • Broken down by user experience level (e.g.: new versus experienced editors)
    • Number of edits reverted (see the revert-rate sketch after this list)
      • Broken down by user experience level (e.g.: are newer users more likely to be reverted? can we quantify the value of experience?)
      • Broken down by size in kb (e.g.: are larger edits more likely to be reverted?)
      • Broken down by article category (e.g.: are certain topics more closely guarded?)
      • Broken down by article quality (e.g.: are featured articles more closely guarded? are stubs less guarded?)
      • Broken down by whether it includes a citation or not (e.g.: what role does citing a fact have in reducing the likelihood of reversion?)
  4. Interproject population dynamics
    • Migration rate of people moving between the English Wikipedia and other language versions, in both directions
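
As a sketch of one of the obstacle-pattern breakdowns above (revert rate by user experience level, referenced in the list), assume hypothetical edit records that already carry a reverted flag and the editor's prior edit count; the other breakdowns (edit size, category, article quality, citations) would follow the same pattern:

```python
from collections import defaultdict

# Hypothetical edit records; prior_edits is the editor's edit count before this edit.
edit_records = [
    {"user": "newbie1",  "prior_edits": 2,    "reverted": True},
    {"user": "newbie2",  "prior_edits": 5,    "reverted": False},
    {"user": "veteran1", "prior_edits": 4000, "reverted": False},
    {"user": "veteran1", "prior_edits": 4001, "reverted": False},
    {"user": "veteran2", "prior_edits": 900,  "reverted": False},
    {"user": "veteran2", "prior_edits": 901,  "reverted": True},
]

def revert_rate_by_experience(records, new_editor_threshold=100):
    """Fraction of edits reverted, broken down by new versus experienced editors."""
    totals, reverted = defaultdict(int), defaultdict(int)
    for r in records:
        level = "new" if r["prior_edits"] < new_editor_threshold else "experienced"
        totals[level] += 1
        reverted[level] += r["reverted"]
    return {level: reverted[level] / totals[level] for level in totals}

print(revert_rate_by_experience(edit_records))
# {'new': 0.5, 'experienced': 0.25}
```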

Outreach Activity Metrics

Who is using and participating in different Wikimedia projects? This is critical to assess Wikimedia's reach in various geographic locations as well as to assess the effectiveness of targeted geographic interventions. Specifically:

  • number of unique visitors:
    • by project (a breakdown of unique visitors within each project by country)
    • by country (a breakdown of unique visitors within each country by project)
  • participants:
    • by project (a breakdown of participants within each project by country)
    • by country (a breakdown of participants within each country by project)

When trying to reach out to new potential readers, it is important to know what kind of content they are interested in. A broader range of readers would in turn lead to a broader range of editors. It is also important to be able to inform potential editors about what kind of content they can actually contribute. Specifically:

  • What do people want to read? Draw on Google, Yahoo, Alexa, Wikipedia traffic logs, etc. It should be possible to break this information down and analyze 'content demand' by category (region, language, sex, age, occupation, etc.), so that directed efforts to reach out to new reader groups can be informed.
  • What kind of content is underrepresented on Wikimedia projects? This could be passed along directly to editors. For example, by combining information about what Swahili-speaking internet users want to read with information about what content already exists on the Swahili Wikipedia, editors could be told directly which articles are most in need of creation or expansion.
  • What are editors' interests, education, hobbies, etc.? This would make it easier to point editors to content in need of creation or expansion that fits their interests.

Bandwidth might be a limiting factor for some visitors. So...

  • How much data needs to be loaded by visitors when they load an article? Make it possible to break this down by content type (article text, pictures, videos, scripts, etc.) so that efforts to minimize bandwidth requirements can be informed (see the sketch after this list).
  • What connection speeds do users and potential users have? Make it possible to break this down by region, so that we can judge how much data it is reasonable to expect people in different areas to be able to load.
  • Some way of measuring how localization of MediaWiki affects the local projects. Data on how the number of visitors, editors, articles, etc. in a specific language correlates with the number of translated MediaWiki system messages in that language would help judge how much effort should be put into localization.
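
A minimal sketch of the first bandwidth question (referenced above), assuming a hypothetical log of the resources fetched for a single article view, each tagged with a content type and a size in bytes:

```python
from collections import defaultdict

# Hypothetical resources fetched for a single article view (sizes in bytes).
resources = [
    {"type": "article text", "bytes": 45000},
    {"type": "pictures",     "bytes": 380000},
    {"type": "pictures",     "bytes": 120000},
    {"type": "scripts",      "bytes": 210000},
    {"type": "stylesheets",  "bytes": 60000},
]

def page_weight_by_type(resources):
    """Total bytes a visitor must load for one article, broken down by content type."""
    weight = defaultdict(int)
    for r in resources:
        weight[r["type"]] += r["bytes"]
    weight["total"] = sum(r["bytes"] for r in resources)
    return dict(weight)

print(page_weight_by_type(resources))
# {'article text': 45000, 'pictures': 500000, 'scripts': 210000, 'stylesheets': 60000, 'total': 815000}
```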

Fundraising Activity Metrics

Overall, our need is to identify which set of pages led a donor to donate. Alongside this, we need to know whether donors abandon their donation entirely or come back later. Once this data is in place, we can dig deeper and find regional/project/language differences.

These are different from the general requirements listed above in that they are specific to donation-related stats:

  • Ability to identify user paths
    • We need the ability to track each session from the clicked banner to possible donation and any abandonment (see the funnel sketch after this list)
    • The data can at most be an hour old
    • We need hourly stats retained for at least three fiscal years
  • Tracking will be limited to the foundation wiki and will be bound by our established privacy policy
  • Robust UI that allows us to easily see the multiple possible paths our donors take
  • Easy clear reporting of stats for public consumption
  • Breakdown of the same stats (banners, paths) by identifiable info (country or language)
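
A minimal sketch of the session-tracking requirement above (banner click through to donation or abandonment), assuming a hypothetical ordered event stream per session; in practice these events would come from landing-page and payment-processing logs:

```python
# Hypothetical ordered events per session, from the clicked banner onward.
sessions = {
    "s1": ["banner_click", "landing_page", "donation_form", "donation_complete"],
    "s2": ["banner_click", "landing_page"],
    "s3": ["banner_click", "landing_page", "donation_form"],
}

def funnel_report(sessions):
    """Classify each banner-click session as completed or abandoned (and where it stopped)."""
    report = {"completed": 0, "abandoned": {}}
    for events in sessions.values():
        if "donation_complete" in events:
            report["completed"] += 1
        else:
            last_step = events[-1]  # the page where the donor dropped off
            report["abandoned"][last_step] = report["abandoned"].get(last_step, 0) + 1
    return report

print(funnel_report(sessions))
# {'completed': 1, 'abandoned': {'landing_page': 1, 'donation_form': 1}}
```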

Strategy Activity Metrics

Strategy's current strategies and requirements for evaluation are at Evaluation/Community engagement. In general, we want analytics that will help measure community engagement. We want answers to questions such as:

  • Did LiquidThreads improve or impede participation? Break this down by existing and new contributors.
  • How effective were different methods of getting people to specific places on strategy? For example:
    • Site notices? Would love to do A/B testing of different wordings.
    • Dropping messages on Talk pages versus Village Pump versus Task Force talk pages
  • What are the most active pages? Break this down according to namespaces; for example, Proposals.
  • Who are the most active contributors? Break this down by day, week, month, etc.

Specific requirements:

  • Ability to identify user paths
  • Frequently accessed pages, broken down granularly (by day, by week, by month, etc.)
  • Frequently edited pages, broken down granularly
  • User segmentation
    • Task force members and members of different projects. For example, on Evaluation/Phase 2, I'd like to see what percentage of the activity came from task force members versus other participants on the strategy wiki.
  • LiquidThreads behavior.
    • Click-tracking
  • A/B testing of site notices (see the sketch after this list)
  • Social network analysis based on LiquidThreads threads or page edits
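
For the site-notice A/B testing requirement, a minimal sketch using hypothetical impression and click counts for two wordings; a two-proportion z-test (normal approximation) indicates whether the observed difference in click-through rate is likely to be real:

```python
from math import sqrt, erf

def two_proportion_z_test(clicks_a, views_a, clicks_b, views_b):
    """Two-sided z-test: do site-notice wordings A and B have different click-through rates?"""
    p_a, p_b = clicks_a / views_a, clicks_b / views_b
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    se = sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # two-sided, normal approximation
    return p_a, p_b, z, p_value

# Hypothetical counts: wording A shown 10,000 times with 120 clicks, B shown 10,000 times with 165.
ctr_a, ctr_b, z, p = two_proportion_z_test(120, 10_000, 165, 10_000)
print(f"CTR A={ctr_a:.2%}, CTR B={ctr_b:.2%}, z={z:.2f}, p={p:.4f}")
```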

Note that some of these stats are currently available at a high level through User:Erik Zachte's tools.