Proposal:Donated remote proxy server network

Status (see valid statuses)

The status of this proposal is:
Request for Discussion / Sign-Ups

Every proposal should be tied to one of the strategic priorities below.

Edit this page to help identify the priorities related to this proposal!


  1. Achieve continued growth in readership
  2. Focus on quality content
  3. Increase Participation
  4. Stabilize and improve the infrastructure
  5. Encourage Innovation



Summary

Create a network of volunteers who offer free remote proxy servers to take load off the central WMF sites.

Proposal

The idea concerns only the serving of existing, more or less stable pages. It does not cover editing or rapidly changing content such as Special:RecentChanges.

Serving the same pages again and again accounts for a big share of the traffic currently flowing between the squids in a few WMF locations and client browsers. The idea is to have more of those squids, and to have them in arbitrary places. This should not cost the WMF extra: the hardware is owned and maintained by volunteer donors, who also bear the traffic cost.

I can see two types of potential donors who would join this kind of network:

  1. Computing centers that have large leased lines currently used below capacity, otherwise unused rack space, a stock of spare servers, new machines needing "burn in", etc. They may have everything needed to run proxy servers for Wikipedia (or the WMF in general), and doing so costs them nothing extra except setting up the proxy server software once per machine and adding the machines to their DNS.
  2. Many people behind DSL lines already run BitTorrent, yeti@home, or other shared services. They might be willing to support Wikimedia as well by running a proxy, whose performance may vary with their own computer and network use.

Both kinds of donors should be expected to come and go: interests change, servers break down, and a computing center may not want extra traffic as its line fills up, until its paying customers force the next capacity increase, which again leaves it with plenty of unused capacity for a while.

To make it easy for potential collaborators to join the network, we should offer them an installation toolkit with a set of simple installation scripts. Without going into planning details here: once a proxy server has been set up and configured, linking it into the network, supervising its performance and availability, routing requests to it, and so on should all happen automatically.
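As a rough illustration of what automatic supervision could look like, here is a minimal Python sketch of a central registry that tracks donated proxies, probes them, and only offers the currently healthy ones for routing. Everything in it is hypothetical: the names `ProxyRegistry` and `ProxyRecord`, the `probe` callback, and the rule of dropping a proxy after a fixed number of consecutive failed probes are assumptions made for this example, not part of any existing WMF software.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class ProxyRecord:
    """State kept for one donated proxy (hypothetical fields)."""
    host: str
    consecutive_failures: int = 0
    last_latency_s: Optional[float] = None


class ProxyRegistry:
    """Tracks donated proxies and which of them may receive requests.

    `probe` is an assumed callback that returns True if the proxy at
    `host` answered a health check; a real one might fetch a known page.
    """

    def __init__(self, probe: Callable[[str], bool], max_failures: int = 3):
        self._probe = probe
        self._max_failures = max_failures
        self._proxies: Dict[str, ProxyRecord] = {}

    def register(self, host: str) -> None:
        # Registering the same host twice keeps its existing record.
        self._proxies.setdefault(host, ProxyRecord(host))

    def check_all(self) -> None:
        # Probe every known proxy once and update its failure counter.
        for record in self._proxies.values():
            start = time.monotonic()
            if self._probe(record.host):
                record.consecutive_failures = 0
                record.last_latency_s = time.monotonic() - start
            else:
                record.consecutive_failures += 1

    def available(self) -> List[str]:
        # A proxy stays routable until it fails max_failures probes in a row.
        return sorted(
            host
            for host, record in self._proxies.items()
            if record.consecutive_failures < self._max_failures
        )
```

Because donors come and go, a proxy is never deleted in this sketch. It simply drops out of `available()` while it fails its probes and returns as soon as one probe succeeds again.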

Motivation

  • Reducing server cost on the WMF side.
  • Getting more people involved who want to give something to mankind.
  • Building a community network of WMF proxy server supporters.
  • Keeping network traffic low, and possibly more local.

Key Questions

  • How to keep up with growing traffic?
  • How to reduce network traffic?
  • How to have wiki pages served quicker?
  • What evidence is there that this proposal would be a net benefit in performance?
  • What evidence is there that this proposal would be a net benefit in cost?
  • What evidence is there that the WMF's privacy obligations could be met under this proposal?
  • If it is beneficial why has it not been targeted by the foundation technical staff in charge of this area?

Potential Costs

  • Increased operating costs in keeping software current and secured on additional remote systems
  • Increased operating costs in monitoring the distributed infrastructure and following up with failure events
  • Development costs in building additional synchronization infrastructure (preventing stale pages)
    • The current infrastructure is only suitable for a few large clusters
  • Development costs in creating additional load-balancing software infrastructure
    • The current infrastructure works mostly because 'North America' and 'everywhere else' are pretty distinct in terms of internet routing tables. The same distinction doesn't exist at finer levels. The existing setup can only go about as fine as one cluster per continent.
  • Decreases in performance for logged-in users, because non-anonymous text can't be served by a proxy cluster.
  • Decreases in performance for random anonymous users due to location mismatching (no GSLB method is flawless; some people will get mapped to a faraway cluster)
  • The coarseness of existing GSLB methods makes things like hosting on end users' DSL lines totally unworkable.
    • At best we can direct all the users behind a particular ISP's name servers to a particular cluster. There is a reason that BitTorrent uses special client software.
    • Only a fraction of a percent of WMF's users could be expected to install special software in order to access a distributed version of the sites.
  • Increased exposure to legal attack in a larger number of jurisdictions.
  • Increased difficulty in resisting unjust surveillance orders.
  • Decreases in total reliability.
    • It's fairly easy to keep a small number of sites up and running, but even if each site stays just as reliable, you will see more failures with more sites. GSLB isn't instant: a client that has resolved to a particular cluster is fairly likely to keep trying that cluster for many hours, even when it is down. Many providers' recursive resolvers now partially ignore DNS TTLs.
  • Greatly increased risk of exposing private readership traffic information to third parties.
    • WMF's private reader data is worth millions. How will we know that a proxy capacity donor isn't donating merely to capture and sell the information?
  • Greatly increased risk of traffic modification. Anyone running a proxy could insert ads or subtly change content, and detecting this behaviour would be very difficult.
  • Greatly increased risk of disclosure of confidential editor information
  • Dilution of traffic endangers peering
    • The WMF currently offloads a significant fraction of its total traffic at no cost to other providers who have agreed to interconnect with the Wikimedia network for mutual benefit: it's better to get the traffic directly from Wikimedia than to pay for it to traverse an intermediate ISP. If the traffic is too diluted across its sources, then it will not be worth the trouble of configuring a peering interconnection, resulting in an overall increase in cost and decrease in performance.
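
The reliability point in the list above (more sites means more failures, even if each site stays just as reliable) can be made concrete with a small calculation. The Python sketch below assumes that sites fail independently and all share the same availability. Both are simplifying assumptions, and the function name and the 0.999 figure are illustrative only, not measured WMF numbers.

```python
def prob_any_site_down(per_site_availability: float, sites: int) -> float:
    """Probability that at least one of `sites` sites is down right now.

    Assumes independent failures and identical availability per site,
    which is a simplification made for this example.
    """
    if not 0.0 <= per_site_availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if sites < 0:
        raise ValueError("sites must be non-negative")
    # All sites are up with probability a**n; the complement is "some site is down".
    return 1.0 - per_site_availability ** sites


if __name__ == "__main__":
    for n in (3, 30, 300):
        print(n, prob_any_site_down(0.999, n))
```

Under these assumptions, with each site up 99.9% of the time, some site is down about 0.3% of the time with 3 sites but about 26% of the time with 300 sites. Because clients keep using a resolved cluster for hours, each such outage is visible to the users mapped to it.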



References

Community Discussion

Do you have a thought about this proposal? A suggestion? Discuss this proposal by going to Proposal talk:Donated remote proxy server network.

Want to work on this proposal?

  1. .. Sign your name here!