What's your perception of how the strategy process is going?
Some conversations are going well; I’m concerned about getting to closure in some areas. We got 2K applications from the call for volunteers, but now it sounds like we are dealing with a small number of people in the task forces. We may be falling back to people we know, drifting to “inside baseball,” the usual suspects talking about the same problems. We need to follow up to reengage those who applied and give them actionable items. We need to move them to do stuff. Volunteer workgroups can help build chapters in countries, work on technologies, and improve friendliness. There are things these people can do. We need a good process in place to mobilize that energy.
Can you provide us with a brief overview of the history of MediaWiki: who created it, for what purpose, and how it became the platform for Wikipedia?
When the project was started, one of the first things that Jimmy and Larry did was to find a software solution to run Wikipedia. They had heard about the concept of the wiki. By 2001, there were multiple wiki implementations, most of them open source. In the first year, they picked the open source script that runs MeatBallWiki. One of the important early developments for Wikipedia was to move away from the CamelCase model of making links – writing words together to become a hyperlink. The first software development was done by volunteers; it was customized by users of Wikipedia. The first wiki we used was a single script; it couldn’t scale for large-scale web traffic, and there was no database backend. In 2002, Magnus Manske, a student, developed a new wiki engine from scratch. It was installed in 2002 and initially caused serious growing pains. The site was slow, the features didn’t work, and there was lots of new functionality that hadn’t been built to scale, but it had a database backend. So it was a better foundation to build on, but it didn’t serve us well in the first 6 months. It was rewritten by another volunteer, Lee Daniel Crocker, who optimized the hell out of it and addressed performance issues so the site could scale.
Lee was joined by other volunteers, including Brion Vibber and Tim Starling, who maintained the code base and added new functionality. They gave it a new name, “MediaWiki.” Since then a lot has happened.
It has been substantially rearchitected in a number of ways, e.g. the backend revision storage had to be redesigned to fix some brokenness inherent in the old architecture. We’ve introduced new features like categories and templates that we now use every day. See <http://en.wikipedia.org/wiki/MediaWiki_release_history>. Some changes had a dramatic impact, like the introduction of categories. That was a straightforward new way to surface relevant and related content. A taxonomy emerged within weeks.
Templates also had a big effect. They were transformative; while they complicated the editing of text substantially, they made it possible to reuse blocks of text. It is a killer feature for an encyclopedia where you have the same layout in multiple pages. The introduction of templates was a blessing and a curse: it made things much easier for power users and steepened the learning curve for newcomers.
Can you describe the current state of the MediaWiki software? What are its strengths, its weaknesses?
MediaWiki has evolved alongside the community that has optimized it over time. It is built to serve Wikipedia’s purpose. Very little has been added because it was just a great idea. It had to satisfy a specific articulated need. Over the past 7 years, it has basically grown to accommodate the needs of the community. For example, one of MediaWiki's innovations was the introduction of namespaces to separate articles, discussions, and policies – this makes Wikipedia one of the few places where rules, policies and norms can be documented and emerge in a collaborative fashion.
Functionality like categories and media embedding serves the specific needs of the editing community. MediaWiki supports embedding of flags and maps well, as well as the rendering of mathematical formulas. References are an optimized feature. The drawback with all of that stuff is that it is driven by the existing highly technical community. They don’t mind if it is ugly, as long as they can use it; they don’t mind if it adds more challenges for the new editor, if it makes their life easier. We have a passionate encyclopedia community that the software has been optimized for. It makes it harder for new people.
What is the relationship between the Wikimedia Foundation and the MediaWiki software? How much control does the foundation have over the direction of the software?
It used to be that all development came from volunteers. There were no paid developers until 2005, but Jimmy Wales’ company allocated some of its paid technical staff’s time. What started to happen after 2005 was that the leadership role shifted from volunteers to paid staff. We recognized that we needed to stabilize and grow the developer community. Brion is the best example of this; Tim Starling was in the same situation. Those are people who would have probably burned out if we had just kept them in volunteer roles. The Foundation became more responsible for leadership; also paid staff could be more accountable for their work. This was just the beginning as the organization grew, with budget and staff. Now we employ a large development team, relative to other nonprofits. As a nonprofit we are investing significantly in software development. At this point, volunteers typically focus on specialized and highly focused small projects, not the advancement of the core as a whole. For example, we have a volunteer who maintains our search engine. He has been doing that for 2-3 years. Sometimes he spends a few hours a week on this, and other times a lot more. He just looks after the search engine. We verify his code and deploy it. He tells us what infrastructure he needs and we provide it.
On the staff end, we have grown to do things like the usability initiative, so that we can push strategic advancement in one particular area. It is hard to organize volunteers to do that. Prior to that shift – I would say that when things started – the software ecosystem was such that a single dedicated volunteer could rewrite the software. Lines of code and dependencies have increased dramatically, so single volunteers can’t do that anymore; they can work on specific components.
Who works on MediaWiki, and about how many people (e.g. volunteers, paid Foundation staff)? How is the work of the developers organized? Who sets the direction for the project?
MediaWiki is integrated because it is a single application, but it does have a wealth of extensions/plug-ins like the math functionality. Those extensions have designated maintainers. The person who writes it owns it. Some are better maintained and others are no longer actively advanced.
MediaWiki doesn’t have a lot of structure. The lead developer (Brion) has left. Tim is now the most senior developer. He just engaged in a long code review process of the latest changes that volunteers and staff have made. In any given month there are 20-30 active developers on MediaWiki. That includes paid developers. The core development team is 5 people, plus the 5-6 FTEs that are working on the usability project.
Most MediaWiki extensions live in Wikimedia's version control system. There are other extensions that others own; for example, some development activity happens outside of us at Wikia. They have built lots of add-ins for social networking features, for example. They make them open source, but they build them for their own purpose. Their extensions would not necessarily work for us, and they might also not be known to us.
In the ecology for MediaWiki, there is the Wikimedia Foundation and Wikia. Wikia is more successful than people realize, but they aren’t advancing MediaWiki in ways that are necessarily particularly helpful to us. There is also WikiEducator. There is some MediaWiki use in the scientific community, and there are many corporate users. Most of these are not investing in developing the core infrastructure.
What are your thoughts on the role of the CTO at Wikimedia and how has your thinking on that role evolved?
We are looking for someone who brings a mix of experience with open source communities and experience as a manager. We are not just looking for a coder or just for a non-profit manager. The Technology Department encompasses operations; operations staff require someone to look after them and to manage the ongoing expansion of the team. At this point in the organization, the challenges are more management than technical. We are separately hiring for a code maintainer position, a senior software developer.
What is the process for new people to become developers of MediaWiki?
We are liberal in giving people access. We don't bring in more people than we can handle. We aren’t actively recruiting volunteers right now. The experienced core team is overstretched. To manage an open source project you need senior people to review code. If it breaks, the site goes down. We can only absorb so much development within a time window, or we end up absorbing developers at a faster rate than we can handle.
There are 160 people who have access rights to the version control system. Software development is rarely something that someone does just as a hobby. If you get good at it, it will be your career path. If you then spend 10 hours a day doing that work, you are not going to want to volunteer large amounts of time with Wikipedia on top of it. We have a challenge with long-term retention of volunteers, which is why we want to offer our most active volunteers opportunities. Many of our contractors are students; hopefully some will transition to full-time work with us. There are people who have stuck around as volunteers, but now they are getting serious jobs, so we know they won’t be spending a lot of time with us.
Can you talk a bit more about the usability initiative and what progress is being made on that front?
For the organization, the usability project is an important milestone. It is the most serious strategic project in terms of software development that the Foundation has ever undertaken. There are dedicated staff who are working on it who are not doing anything else. They have a budget of $900K. They are focused on usability, not firefighting. As a whole it is successful in that it has performed. We have brought in developers who hadn’t worked on MediaWiki before. They are learning the ropes and now they are becoming productive. The risk is that just as they start getting productive, the project is wrapping up. This is why we want to extend its lifecycle. There are just a few months left in the project, and we want to retain those people.
What changes will come out of it?
We are going to meet the objectives in the grant proposal. The most significant change is the way the software looks to the editor. They are simplifying it by hiding complex mark-up and adding a new toolbar with access to important features. That will be the most dramatic change. They have been doing other stuff too. Based on their studies, they know that people don’t find it easy to navigate. They will redesign key navigation elements, increase the white space, make important links larger, and make it easier for people to do the things we want them to do.
Technology is one part of the user experience. If you have an emotionally negative experience with other editors, you remember that and may take it very seriously. But if you click edit and are confused, you will just go away. This may be why so many more people talk about the negative social environment rather than the technology issues with Wikipedia. The technology is still very important.
We do need to think about user experience beyond technology. One factor that disincentivizes people from editing is reversions. We need to think about how experienced editors interact with newbies. Experienced editors see it as a button-pushing transaction rather than something about the newbie. That’s why integrating faces and profiles into that process could make a difference. If you can see that the person you are about to revert is online and you can chat with them, that might help. We need to give experienced people mentoring tools, tools that can help them facilitate the entry of newbies to the community.
I have read that because MediaWiki lacks a defined syntax it is very difficult to create a WYSIWYG interface. Can you explain that and what it might take to create a WYSIWYG interface for MediaWiki?
While MediaWiki was intended to make the editing process more transparent than HTML, the problem is that it has defined shorthands, some of which are defined in a way that makes them hard to translate predictably. The way to understand how it will work is to throw stuff at it rather than look in documentation. It is hard for anyone outside of the Foundation to do something with wiki syntax. One example that was done outside the Foundation is Wiki-to-print: this add-on lets you take wiki syntax, turn it into a PDF, and print it out.
There are several ways you could move from wiki syntax to WYSIWYG. You could move from wiki syntax to enriched XHTML and present it in WYSIWYG form. You may want to be able to fall back to wiki syntax, which means you would want to be able to convert back and forth. This is not impossible; it is what Wikia is doing. They have been using it in production for months.
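The back-and-forth conversion mentioned above can be sketched for a very small subset of markup. This is only an illustration in Python, not how MediaWiki or Wikia's editor works: the function names `wiki_to_html` and `html_to_wiki`, the `/wiki/` link prefix, and the choice of three constructs (bold, italic, simple links) are all assumptions made for the example.

```python
import re

# Hypothetical sketch: round-trips a tiny subset of wiki markup
# ('''bold''', ''italic'', [[Page]] links). It does no HTML escaping and
# ignores nested or ambiguous markup, which is where the real difficulty lies.

def wiki_to_html(text: str) -> str:
    # Bold must be handled before italic, because ''' contains ''.
    text = re.sub(r"'''(.+?)'''", r"<b>\1</b>", text)
    text = re.sub(r"''(.+?)''", r"<i>\1</i>", text)
    # Simple links only: no pipe syntax, no namespaces.
    text = re.sub(r"\[\[([^\]|]+)\]\]", r'<a href="/wiki/\1">\1</a>', text)
    return text

def html_to_wiki(html: str) -> str:
    # Only convert links whose target and label match, so nothing is lost.
    html = re.sub(r'<a href="/wiki/([^"]+)">\1</a>', r"[[\1]]", html)
    html = re.sub(r"<b>(.+?)</b>", r"'''\1'''", html)
    html = re.sub(r"<i>(.+?)</i>", r"''\1''", html)
    return html

source = "'''MediaWiki''' runs ''Wikipedia''; see [[Main Page]]."
as_html = wiki_to_html(source)
# For this simple input, converting back recovers the original text.
assert html_to_wiki(as_html) == source
```

The round trip holds here only because the input avoids the shorthands that are hard to translate predictably; a production converter would need to handle those cases explicitly.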
What technology changes do you think are necessary to increase participation?
WYSIWYG is the next stage. If we were to extend the usability initiative by one year, we could get to the rich text editing environment.
What key priorities for technology would you identify?
We need to protect the infrastructure and data. We have vulnerabilities. Not all of the data is backed up or stored optimally or sufficiently.
We have issues with performance and reliability. There are scenarios where we could be down for four weeks due to having to rebuild the infrastructure.
In terms of performance, one thing we’d want to measure is if we get equivalent load times around the world, and if they are what they should be.
In terms of optimization for media and image files, there are strategies we need to explore. We are doing everything ourselves. We have our own caching infrastructure, and we are using that rather than worldwide content delivery networks. We are not serving a lot of large files to a lot of people, but at some point we may be. For now, we are under-optimized for the stuff we are doing.
In terms of metrics, we are not collecting as much information as we should be. We are getting gradually better as our needs and demand for data increase. An area where we have a gap is site performance metrics. This is not an unsolvable problem. We are building a relationship with Keynote, for example, to obtain global monitoring data.
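As a rough illustration of the "equivalent load times around the world" check described above, the following Python sketch compares per-region median load times against a budget and against the fastest region. The function name `regions_over_budget`, the region labels, the sample values, and the thresholds are all hypothetical; they are not Wikimedia or Keynote data.

```python
from statistics import median

def regions_over_budget(samples, budget_ms, tolerance=1.5):
    """Return regions whose median load time exceeds the budget or is more
    than `tolerance` times the fastest region's median.

    `samples` maps a region name to a list of load times in milliseconds.
    """
    medians = {region: median(times) for region, times in samples.items() if times}
    if not medians:
        return {}
    best = min(medians.values())
    return {
        region: m
        for region, m in medians.items()
        if m > budget_ms or m > best * tolerance
    }

# Made-up measurements, for illustration only.
samples = {
    "europe": [400, 420, 410],
    "asia": [900, 950, 1000],
    "north_america": [380, 390, 400],
}
slow = regions_over_budget(samples, budget_ms=800)
assert slow == {"asia": 950}
```

Using the median rather than the mean keeps a single slow request from flagging a region; whether that is the right statistic depends on what the monitoring data looks like.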
How critical is investment in technology that would optimize for new media?
We do need to increase capabilities there. It is a function of usability. We want to make it desirable for people to use. Commons is growing in participation and images. We are not seeing a trend in increased demand for uploading videos. We want to have that problem. We need to have software that makes it easier to upload, which is why we have a partnership with Kaltura, which is working to develop open video editing software and tools for uploading, sequencing, and editing video. As that matures, we’ll have growing demand. Even if demand grows fast, our infrastructure can handle it. I anticipate that in the next year, we will want to at least investigate content delivery networks. Many of these companies would want to help us. Everyone wants to be a hosting partner for Wikipedia.
What are other key technology issues for Wikimedia?
For example, functionality on Wikimedia Commons for adding image annotations. No staff looked at that code or approved it. There are a number of those features that are complex and powerful and built by volunteers. There is a programming ecosystem that is beyond our reach.
What we are trying to do is build the core team to build MediaWiki and to scan the larger ecosystem for innovations that should be factored into core functionality.
The German chapter invested in a tool server, so that you can access the databases and develop your own tools. Users create scripts and tools, some of which have become very widely used, such as interactive maps. People have built powerful stuff. The German chapter and the Foundation are funding this infrastructure. The problem is that there is no process for reviewing that work and integrating it into the core platform.