Task force/Analytics/2010-05-19 Andreas Weigend

Andreas Weigend visited the Wikimedia Foundation on May 19, 2010 for an informal "data day." Our goal was to talk about how Wikimedia could better think about and use data to help with its strategic priorities. Fifteen Foundation employees and six guests (including Andreas) participated. Notes were taken collaboratively on an instance of Etherpad.

Facebook vs. Twitter followers

As an opening exercise, each participant was asked to report (based on memory) their number of Facebook friends and of Twitter followers.

Then we were each asked to check our answers, followed by some discussion of what the differences meant.

Facebook/Twitter experiment results

What is a fundamentally private service? Intimacy

Andreas: "Privacy was a blip in history."

Who cares about privacy?

Daniel J. Solove, "'I've Got Nothing to Hide' and Other Misunderstandings of Privacy."

  • Catalogues multiple notions of privacy; hiding embarrassing information is only one component. Others include control over how information is used / shared and the ability to correct it.
  • Notion of privacy has changed (Facebook and Internet services are pushing this; also other technological advances, e.g. in genetics)

Moka: difference of web-literate populations that have instincts about what to share, what not to share; understand how information flows

Andreas: people are lazy and don't care until something big happens.

Moka: not laziness, ignorance.

Miller from Princeton published a paper in Scientific American in the 1980s: literacy has changed. Americans were "at the bottom" of general literacy.

Knowing vs understanding (factual knowledge vs. strategic thinking).

External representation of who you are by an external party vs. self-representation; self-representation was not possible before. Erving Goffman's work in the 1950s, The Presentation of Self in Everyday Life.

danah boyd on privacy in social networks, particularly how it pertains to children/adolescents.

Steven recently deleted his Facebook profile. Steven: "Facebook is not on the side of its users"; "interactions in person diluted by using FB so much."

Neil: interesting that Steven is leaving school and FB at the same time, sees FB as a tool for adults to keep in touch even without a strong in-person community to keep them connected.

"How do social networks change your notion of friendship?" Andreas looked at it in China and in the U.S.

Relevant blog posts from Andreas:

Wikimedia

Howie noted the lack of visibility of people on Wikipedia and asked for experiences from others.

  • RobLa: At Second Life, whether a user made a friend was a big predictor of whether they would stay active
  • EEKim: according to some in the Wiki community, 'retention' is less important than becoming a quality contributor
  • Others volunteer that they stayed Wikipedians because of relationships with admins

Wikipedia/Amazon parallel. Does it matter whether something is primarily social?

Barry: as we make decisions about what direction to go with the development of Wikimedia projects, can you help us think about how to consider data?

Ethics about data gathering and privacy; commitment to experimental rigor.

Andreas's PHAME framework for thinking about data:

Problem
Hypotheses
Actions
Metrics
Experiments

Amazon gets $100 for each co-branded credit card, gives $30 to the end user.

Hypothesis: giving it to them right away gives them incentive to spend.

Is it more effective to give the incentive money up front (immediately), or after the first purchase (a delayed incentive, but one that can be an incentive in itself: "oh, I have a credit!")? Given the set of data, what insights could we get from that?

Nobody finds great datasets anymore; they create great datasets through experiments.

A/B testing especially
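
A minimal sketch of how the up-front versus delayed incentive question could be framed as an A/B test. The counts are made up for illustration, "conversion" is assumed to mean that a cardholder goes on to spend, and the function name `two_proportion_z_test` is hypothetical. Nothing here reflects actual Amazon data or methods.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both arms convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: arm A gets the credit immediately,
# arm B gets it after the first purchase.
z, p = two_proportion_z_test(conv_a=120, n_a=1000, conv_b=150, n_b=1000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

The point of the sketch is the shape of the experiment: users are split at random into two arms, one metric is fixed in advance, and the difference is tested rather than eyeballed.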

Howie & Micah mention difficulty of getting data at Wikipedia, or predicting secondary social effects

You can't afford NOT to do experiments

Andreas: "you can't afford not to do experiments"

Strategic planning process expressed a strong preference for an experimental approach

Eugene: Wikimedia is at an inflection point. Up till now, culture has not been so supportive of an experimental approach.

That's not really true; both usability projects have started to use a research-driven approach.

I think Eugene meant prior to that even. He tends to acknowledge the data brought in by usability projects as an example of what's done RIGHT.

As a side note: research has been one of the main topics discussed by the UX team during the business planning process. Unfortunately, it's been hard to convince the executive staff to dedicate resources to research, because they feel there is a lot of unfinished dev work and many technical issues to fix first.

Pete: This is a tough bit for me to follow. It seems to me that Wikipedia exists ONLY because of a vast network of interrelated experiments. I could understand if the point is adopting some central notion about best practices with experimentation.
It's not the experiment itself, it's the type of data that's created as part of that experiment that's the issue for people. We can experiment all day, but we're limited in what data we can capture.
That point makes sense, but it's not what I heard Eugene saying. Maybe I misunderstood him.

Wikimedia privacy policy: "When a visitor requests or reads a page, or sends email to a Wikimedia server, no more information is collected than is typically collected by web sites. The Wikimedia Foundation may keep raw logs of such transactions, but these will not be published or used to track legitimate users."

There's a different Wikimedia Donor privacy policy.

Publicly available data can be found at: http://stats.wikimedia.org

  • Hits (any number of different actions can be found)

Other resources:

Privately held data lives in places like the fundraising team's database, the reader relations database of calls, the OTRS databases, and the volunteer database.

Data vs Infrastructure (plumbing vs

Questions

  • Role of social?
  • PROBLEMS THAT DATA / METRICS / EXPERIMENTS CAN HELP WITH
    • Fund raising
    • Quality of content
      • How to measure it?
    • Breadth of content
    • Editors / Contributors
      • number
      • who they are, where they come from, why they leave, what they expect, what their mental model is -- to design the best experience for them
    • Engagement?
    • Lifecycle
    • Comfy / friendly place to hang out at and contribute to
    • Keep barriers to entry as low as possible

Scientific method (from Pete's memory)

  1. Formulate a hypothesis
  2. Design an experiment
  3. Execute the experiment
  4. Gather the results
  5. Interpret the results
  6. Draw a conclusion
  7. (Formulate a new hypothesis)

http://en.wikipedia.org/wiki/Hypothetico-deductive_model

  1. Gather data (observations about something that is unknown, unexplained, or new).
  2. Hypothesize an explanation for those observations.
  3. Deduce a consequence of that explanation (a prediction). Formulate an experiment to see if the predicted consequence is observed.
  4. Wait for corroboration. If there is corroboration, go to step 3. If not, the hypothesis is falsified. Go to step 2.

reader

anon edit vs register

Erik: (hypothesis): one of the reasons to create an account may be to customize their *reading* experience, not necessarily a desire to edit

Andreas: For a reader, what makes them simply make an IP edit?

English Wikipedia: 32% of edits are from non-registered users. Dutch is 10%.

Neil: How does vandalism relate?

Robert: The majority of unregistered edits (~80%) are legitimate, compared to ~95% of registered edits being helpful.
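
A back-of-the-envelope sketch combining the figures quoted above. It assumes the ~80% and ~95% rates apply to English Wikipedia and that "legitimate" and "helpful" measure the same thing; neither assumption was stated in the discussion, and the variable names are only for illustration.

```python
# Approximate figures quoted in the discussion.
unregistered_share = 0.32   # share of English Wikipedia edits from unregistered users
unregistered_legit = 0.80   # share of unregistered edits that are legitimate
registered_legit = 0.95     # share of registered edits that are helpful

registered_share = 1 - unregistered_share

# Split all edits into four buckets: (registered or not) x (good or bad).
good_unreg = unregistered_share * unregistered_legit
good_reg = registered_share * registered_legit
bad_unreg = unregistered_share - good_unreg
bad_reg = registered_share - good_reg

overall_legit = good_unreg + good_reg
unreg_share_of_bad = bad_unreg / (bad_unreg + bad_reg)

print(f"overall legitimate share: {overall_legit:.1%}")
print(f"share of bad edits that are unregistered: {unreg_share_of_bad:.1%}")
```

Under these assumptions roughly 90% of all edits would be legitimate, and about 65% of the unhelpful ones would come from unregistered users, even though unregistered users make only about a third of edits.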

Howie: with a marketing campaign, you generate a certain type of user. Our current view is pretty monolithic.

Barry:

Rebecca: Are we assuming that registered users have a higher retention rate?

  • (could reason be to avoid harassment?)

(Pete: Danny Horn at Wikia is a good person to talk to about this)

Hypotheses:

  1. The cost of signing in is significant
  2. The cost of signing up is significant

Moka: developing incentives

Eugene: Wikipedians have done some research by making fake accounts and making edits. (User:WereSpielChequers)

Ed Chi attempted to normalize for vandalism.

Anonymous versus logged-in edits

  • How to measure quality? Edit persistence?
  • What about mistakenly anonymous edits (the editor forgot they were not logged in)?
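
One rough way to read "edit persistence" is the fraction of edits that are not reverted, compared across anonymous and logged-in editors. The sketch below assumes that reading; the sample data and the name `persistence_by_group` are hypothetical, and "not reverted" is only a crude proxy for quality.

```python
from collections import defaultdict

def persistence_by_group(edits):
    """Map each group label to the fraction of its edits that were not reverted.

    `edits` is an iterable of (group, reverted) pairs.
    """
    kept = defaultdict(int)
    total = defaultdict(int)
    for group, reverted in edits:
        total[group] += 1
        if not reverted:
            kept[group] += 1
    return {group: kept[group] / total[group] for group in total}

# Hypothetical sample: (group, was the edit reverted?)
sample_edits = [
    ("anonymous", False), ("anonymous", True), ("anonymous", False),
    ("logged_in", False), ("logged_in", False),
    ("logged_in", True), ("logged_in", False),
]
rates = persistence_by_group(sample_edits)
for group, rate in rates.items():
    print(f"{group}: {rate:.0%} of edits persisted")
```

Note that mistakenly anonymous edits would land in the "anonymous" group here, which is exactly the measurement problem raised in the second bullet.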

Weigend suggests surfacing the benefits of being logged in to the user (lower likelihood of reversion, etc.)

Ariel: what if being logged in becomes the new normal, and then new users are reverted at a similar rate?

We don't do a good job of expressing to end-users the benefits of getting an account

We discussed some graphs that Erik Zachte sent on editing and reverts.


Closing Comments

  • (ASW) how can we understand the real barriers to contribution?
  • (ASW) Most ppl just go to get some info -- parallel: airline vs just go on that trip with your friend
  • (ASW) who really knows who the "good" editors are?
  • (Neil K) Experiments > anecdotes & theories. Let's talk about getting better experiment infrastructure.
  • Moka: What is the life-cycle of an editor?
  • Pete Forsyth:
    • let's do experiments, but carefully examine assumptions, do it with the scientific method. (Let's esp. look at first step of scientific method, "use your experience to formulate a hypothesis/experimental question")
    • SHARED OWNERSHIP: Wikipedia is getting to the point where it's a major institution, and people have a desire to help that may transcend individual incentives. Let's leverage this to persuade editors.
  • Howie: 375M uniques per month! We're lucky to have such a base for experimentation. Our community loves data and information.
  • Micah's talking point picture: http://www.flickr.com/photos/35034358900@N01/4622563354/sizes/o/
    • Looking at the "funnel" from reading -> click edit -> click submit -> downstream results (reversions, quality...)
  • Rebecca: Register/not register. The #1 answer to "why did you donate" is "because you asked." There may be an important parallel here. Figure out how to invite people.
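
The "funnel" from Micah's talking point (reading -> click edit -> click submit -> downstream results) could be tabulated along these lines. This is a minimal sketch: the stage counts are invented, "not reverted" stands in for the downstream results, and `funnel_rates` is a hypothetical helper name.

```python
def funnel_rates(stages):
    """Given ordered (name, count) pairs, return step-to-step conversion rates."""
    rates = []
    for (prev_name, prev_count), (name, count) in zip(stages, stages[1:]):
        # Guard against an empty stage so the sketch never divides by zero.
        rate = count / prev_count if prev_count else 0.0
        rates.append((f"{prev_name} -> {name}", rate))
    return rates

# Hypothetical counts for one day of traffic.
stages = [
    ("read", 100000),
    ("click edit", 2000),
    ("click submit", 500),
    ("not reverted", 400),
]
for step, rate in funnel_rates(stages):
    print(f"{step}: {rate:.1%}")
```

Looking at each step separately shows where people drop out, which is where an experiment is most likely to be worth running.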

Andreas's story: Airline collecting data on whether people wanted to drink Coke or Pepsi. One student said, "All I care about is getting to Paris and having a good time with my girlfriend." Moral: Don't get too caught up with unimportant questions.

Suggestion

  • (ASW) Figure out the simplest thing you might want to know. Then ask something 10x simpler.
  • (ASW) Surface positive and negative effect on the community, consider "social capital" predictor