
Proposal talk:Distributed Infrastructure

From Strategic Planning
Latest comment: 14 years ago by Fasten in topic Symbiosis with centralized architecture

A few months ago I had the same idea. I think it would be amazing, but it's a very difficult task. We can share between volunteers:

  • storage (data)
  • computing power (task)

I think we can start with the second point; for example, we could use volunteer nodes to resize images or to render LaTeX formulas. -Wiso 12:06, 25 August 2009 (UTC)


Some proposals will have massive impact on end-users, including non-editors. Some will have minimal impact. What will be the impact of this proposal on our end-users? -- Philippe 00:08, 3 September 2009 (UTC)

I believe the overall impact of this kind of change is nearly unpredictable, and the outcome would depend on the implementation choices. If realized, it would be a kind of massively distributed p2p/grid bandwidth and storage offloading system, probably unprecedented as such. For end-users, in theory, it would have no impact, since it would be invisible (other than running in BOINC-like fashion on their computers, if they choose to).

How would this help?

Distributed computing generally works by trading off efficiency and speed for lower hardware costs. Jobs have to be queued up, everything has to be done multiple times to make sure the results are consistent, and there may be significant delays between download -> work starting and between work ending -> upload. For most of the processing-intensive aspects of MediaWiki (parsing articles, generating special pages), speed is critical. We can't have people going to a special page and being told "Your request has been sent to the grid, come back in 30 minutes for the results." Mr.Z-man 16:35, 17 September 2009 (UTC)
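
To make the "everything has to be done multiple times" point concrete, here is a minimal sketch of redundant-result validation, assuming each job is sent to several volunteers and a result is accepted only once enough of them agree. The function name and the `quorum` parameter are hypothetical and are not taken from BOINC or any other real system.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def validate_by_quorum(results: Sequence[Hashable], quorum: int = 2) -> Optional[Hashable]:
    """Return the result reported by at least `quorum` volunteers, else None.

    Hypothetical helper: `results` holds what each volunteer sent back
    for the same job (for example, a hash of a resized image).
    """
    if not results:
        return None
    # Find the most frequently reported result and how often it appeared.
    value, count = Counter(results).most_common(1)[0]
    # Accept it only if enough independent volunteers agree.
    return value if count >= quorum else None
```

With a quorum of 2, every job costs at least two computations plus the wait for the slower volunteer, which is the efficiency and latency trade-off described above.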

See Proposal:Distributed_Wikipedia, of which this proposal is a sub-proposal, for details on how all the "downsides" described here can be overcome. Lkcl 14:52, 30 September 2009 (UTC)
The vast majority of our 80,000 requests a second are to access information, and whilst the web is very fast in some parts of the world, in others 56k is still a good speed. In countries where Internet speeds are low and there are bottlenecks on the Internet, it could help if we had a local copy of the most heavily accessed content inside the bottleneck: edits would still have to go through the bottleneck, but searches would not. I don't see this being relevant in the first world, but in large areas beyond it this could make a difference.
According to the tech report at Wikimania, we are reducing our number of servers, as server technology is now increasing their capacity faster than we are growing. Meanwhile, the drive to green computing is encouraging more users who might once have considered distributed computing projects like SETI@home to switch their PCs off at the wall when not in use. So I suspect this would have limited scope or opportunity except to address third world bottlenecks on the web. WereSpielChequers 16:57, 17 September 2009 (UTC)
There is also a significant and growing number of people nowadays who keep their computers turned on for most of the day, green computing trend or not.
I think discussing whether people have PCs turned off or on is pointless here. However, the fact is that distributed computing projects like BOINC (SETI etc.) turned out to be highly successful. People who want to contribute this way simply do; it is another way of donating money.--Kozuch 12:21, 1 December 2009 (UTC)
1) The whole point of peer-to-peer distributed computing / distributed document sharing is that you ensure that the people who make the most use of the information also provide corresponding amounts of resources. The BitTorrent protocol is a good example: the more you share, the more you receive. Thus, arguments saying "ohh, well, if we rely on people's goodwill they'll just shut off the machines and thus shut down wikipedia" are demonstrably false. 2) This proposal is a sub-proposal of Proposal:Distributed_Wikipedia. Lkcl 14:52, 30 September 2009 (UTC)
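
The "more you share, the more you receive" idea can be illustrated with a simplified proportional-share sketch. This is not the actual BitTorrent choking algorithm; the function name, the units and the equal split for a cold start are assumptions made for illustration only.

```python
from typing import Dict


def allocate_upload(capacity_kbps: float, received_from: Dict[str, float]) -> Dict[str, float]:
    """Split our upload capacity among peers in proportion to what each gave us.

    Hypothetical helper: `received_from` maps a peer id to the amount of
    data that peer has uploaded to us so far.
    """
    if not received_from:
        return {}
    total = sum(received_from.values())
    if total == 0:
        # Nobody has contributed yet: split evenly so exchange can start.
        share = capacity_kbps / len(received_from)
        return {peer: share for peer in received_from}
    # Peers that contributed more get a proportionally larger share.
    return {peer: capacity_kbps * amount / total for peer, amount in received_from.items()}
```

Under this scheme a node that switches itself off contributes nothing and so earns no share from its peers, which is the incentive the comment above relies on.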
We can solve the problems WereSpielChequers mentions without devoting tons of time and effort to developing such a system. We already have Squid caches in Europe. All we would need is a rack in a datacenter somewhere in Africa and probably someone local to install/maintain the servers. We already have much of the software necessary to support it. Mr.Z-man 20:28, 30 September 2009 (UTC)
The point worth focusing on, as far as I gather, isn't so much storage as bandwidth itself: the unused bandwidth on growing numbers of computers with flat-rate Internet. Yes, it would require a lot of time and effort compared to simply adding infrastructure linearly. But it is also evident that in the long term it would be very desirable if you could cut bandwidth and storage costs by a factor of, say, up to 10. With that, you would also potentially greatly reduce the donations needed to keep things running, which also seems desirable to me.
As far as I can see, this kind of project will probably not be started outright, because it would probably be too hard and massive and because there are probably greater priorities at lower cost to be attended to ATM. Wiso made a pertinent suggestion here. Also, you can always wait until someone else develops an open system for this kind of public distribution and then just 'plug into' it. 21:57, 30 September 2009 (UTC) gst
That's just it, all you save is storage cost (bandwidth savings would probably be minimal and possibly negative, as you have to send updated versions of pages to thousands of local caches instead of just a couple). But storage space is cheap and constantly getting cheaper. Most (all?) distributed computing projects distribute the load of processing, which is minimal for caches. Mr.Z-man 22:45, 1 October 2009 (UTC)
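
The bandwidth argument can be put as a back-of-envelope calculation. The sketch below assumes a push model in which every edit sends a full copy of the page to every cache; the function name and all figures are illustrative assumptions, not measured Wikimedia numbers.

```python
def net_origin_bandwidth_change(num_caches: int, edits_per_day: int,
                                reads_offloaded_per_day: int, page_bytes: int) -> int:
    """Bytes per day the origin sends extra (positive) or saves (negative).

    Assumption: each edit is pushed as a full page to every cache, and
    each offloaded read would otherwise have cost one page from the origin.
    """
    push_cost = num_caches * edits_per_day * page_bytes
    read_savings = reads_offloaded_per_day * page_bytes
    return push_cost - read_savings


# Illustrative numbers only: 10 edits and 5,000 offloaded reads of a 100-byte page.
few_caches = net_origin_bandwidth_change(2, 10, 5000, 100)      # origin saves bandwidth
many_caches = net_origin_bandwidth_change(1000, 10, 5000, 100)  # origin sends more
```

With the same read and edit volume, the result flips from a saving to a net cost once the number of caches grows large enough, which is the "possibly negative" case described above.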

Proposal to merge with Proposal:Distributed_Wikipedia

This proposal is effectively a sub-goal of the Distributed Wikipedia proposal. The goals of the Distributed Infrastructure proposal contain exactly the same kinds of arguments, aims and motivations (but they are much more eloquently put here!). The Distributed Wikipedia proposal goes further than the Distributed Infrastructure proposal by advocating the creation of a second API through which any contributor may entirely bypass the "standard" HTTP wikipedia.org interface (in a similar way to the Facebook API) to create their own fully functional GUI replacement: for the iPhone, for embedded systems, for Twitter, for easier accessibility for people with learning difficulties or physical disabilities, for people who dislike JavaScript, for people who love nothing _but_ JavaScript (GWT, Pyjamas, Web 2.0 lovers etc.), for the OLPC project, etc. Lkcl 20:37, 30 September 2009 (UTC)

Ahh, I wouldn't agree that the motivations of 'Distributed Wikipedia' and 'Distributed Infrastructure' are exactly the same, and though they may have some similarities I'd definitely keep them separate for clarity's sake: not differentiating between content distribution and infrastructure distribution would just add to the confusion IMHO. Also, the various API proposals have a whole subchapter of their own: Call_for_proposals#API 22:48, 30 September 2009 (UTC) gst

Google App Engine

One might be able to run parts of the system on Google App Engine (or another cloud platform) on demand. --Fasten 12:45, 13 November 2009 (UTC)

See also: Proposal talk:Distributed backup of Wikimedia content#Distributed filesystems

Symbiosis with centralized architecture

It seems that developing a distributed infrastructure will be a large task both technically and "politically". What about starting it as a Wikimedia Labs project (under labs.wikimedia.org) and seeing how it performs? We would see how much interest there is and how the project fares with both developers and users.--Kozuch 12:45, 1 December 2009 (UTC)

If you fail to plan you plan to fail. --Fasten 09:59, 2 December 2009 (UTC)