Proposal talk:Data.wikimedia.org

From Strategic Planning

I'm fond of this idea. The more data-driven information we can get, the more we can make logical and rational choices about how to proceed in the future. -- Philippe 18:10, 18 August 2009 (UTC)

I think this should be one of the main WMF priorities, but it is more difficult than it seems. Semantic MediaWiki does a very good job with the usual article-property-value triplets (e.g. you can provide an exchangerate parameter in the currency infobox of the USD article, you can tell MediaWiki this means the exchange rate of USD is whatever by putting [[exchange rate::{{{exchangerate}}}]] somewhere in the template, and you can include its value anywhere else by writing {{#show:USD | ?exchange rate}}), but in the real world data tends to be much more complicated: it can change with time, sources can differ on the correct value, different estimations might have different precision and so on. --Tgr 17:13, 20 August 2009 (UTC)
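To make the complication concrete, here is a minimal Python sketch of a record that carries a date, a source and an optional uncertainty alongside the value. It is not how Semantic MediaWiki stores anything; the `Claim` class, the `claims_as_of` helper, the source names and the numbers are all hypothetical and purely illustrative.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Claim:
    """One sourced, dated statement about a property (hypothetical model)."""
    subject: str
    prop: str
    value: float
    as_of: date                          # when the value was valid
    source: str                          # who reported it
    uncertainty: Optional[float] = None  # precision, if the source gives one


def claims_as_of(claims: List[Claim], subject: str, prop: str, when: date) -> List[Claim]:
    """Return, per source, the most recent claim dated on or before `when`."""
    latest = {}
    for c in claims:
        if c.subject != subject or c.prop != prop or c.as_of > when:
            continue
        best = latest.get(c.source)
        if best is None or c.as_of > best.as_of:
            latest[c.source] = c
    return sorted(latest.values(), key=lambda c: c.source)


# Made-up values, for illustration only.
claims = [
    Claim("USD", "exchange rate (EUR)", 0.70, date(2009, 8, 19), "source A", 0.01),
    Claim("USD", "exchange rate (EUR)", 0.71, date(2009, 8, 20), "source A", 0.01),
    Claim("USD", "exchange rate (EUR)", 0.705, date(2009, 8, 20), "source B"),
]

for c in claims_as_of(claims, "USD", "exchange rate (EUR)", date(2009, 8, 20)):
    print(c.source, c.value, c.uncertainty)
```

A single property-value pair cannot hold this: the lookup has to return one answer per source rather than "the" value, and the caller still has to decide which source to trust.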

I don't see the point in starting with a link repository instead of actual data, but Semantic MediaWiki can also import from (possibly regularly changing) external files with the External Data subextension. --Tgr 17:19, 20 August 2009 (UTC)

This blog post might also be of interest: http://blog.werdn.us/2009/06/semantic-data-in-wikipedia/ --Tgr 17:21, 20 August 2009 (UTC)

Old discussion about this: m:Wikidata --Tgr 17:23, 20 August 2009 (UTC)

Thank you for pointing to these interesting pages; I will try to read them over the weekend. --Erzbischof 18:21, 20 August 2009 (UTC)
I think we have to be careful with this proposal: don't mix the problem of storing data with the problem of calling it. We first have to think about how to get and store the data. Then we can see how to access this data from the different wikis. Snipre 16:46, 21 August 2009 (UTC)

Combine with Proposal:A central repository of all language independent data

Semantic MediaWiki has trouble if a fact has more than three parts.

It can tell you <Berlin> <is capital of> <Germany>, but it is awkward to add a start date and an end date to this datum, much less a reference to the source of the info. I suggest we concentrate on making the data in infoboxes machine-parseable and translatable by moving it to a separate data store, as described in this proposal, and then transcluding it back into every wiki. See my comment on Proposal:A central repository of all language independent data. Filceolaire 15:31, 24 August 2009 (UTC)

Combine them yourself, be bold! We have a whole Category:Proposals for data-related features. Nemo 15:59, 24 August 2009 (UTC)

The Semantic Internal Objects extension can supposedly handle n-ary relations. --Tgr 18:10, 24 August 2009 (UTC)
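For readers unfamiliar with the n-ary problem raised in this thread: one common way to express a fact with more than three parts using only triples is to introduce an intermediate node and hang each part off it. The Python sketch below illustrates that general idea only; it is not a description of how Semantic Internal Objects is implemented, and the `reify` helper, the node id, the role names and the placeholder values are all hypothetical.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def reify(node_id: str, relation: str, roles: Dict[str, str]) -> List[Triple]:
    """Split one n-ary fact into plain triples that share an intermediate node."""
    triples = [(node_id, "type", relation)]
    triples += [(node_id, role, value) for role, value in roles.items()]
    return triples


def role_of(triples: List[Triple], node_id: str, role: str) -> List[str]:
    """Look up the value(s) stored under one role of the intermediate node."""
    return [o for s, p, o in triples if s == node_id and p == role]


# Placeholder values: dates and the reference are deliberately left unfilled.
fact = reify(
    "_:tenure1",
    "capital tenure",
    {
        "city": "Berlin",
        "country": "Germany",
        "start": "<start date>",
        "source": "<reference>",
    },
)

for triple in fact:
    print(triple)
```

The cost of this approach is that a simple question such as "what is the capital of Germany?" now needs a join through the intermediate node instead of a single triple lookup.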

Impact?

Some proposals will have massive impact on end-users, including non-editors. Some will have minimal impact. What will be the impact of this proposal on our end-users? -- Philippe 00:07, 3 September 2009 (UTC)

existing implementations

Hello,

This is the best idea I have seen so far. en:User:Erik_Möller already raised the issue back in 2004 or 2005, in an email to wikitech-l about metadata. At that time we added template support in MediaWiki, and Erik was wondering whether the data could actually be saved in the database, to let us query articles by metadata and update any data shared by many articles with a single mouse click. Imagine querying for "profession=actor" and "nation=usa": probably easier than manually categorizing everything.
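As a rough illustration of the kind of query meant here, assume each article's template parameters were available as a plain key/value mapping. The Python sketch below uses a hypothetical `query` helper and made-up article titles; nothing like this existed in MediaWiki at the time.

```python
from typing import Dict, List


def query(articles: Dict[str, Dict[str, str]], **criteria: str) -> List[str]:
    """Return the titles of articles whose metadata matches every criterion."""
    return sorted(
        title
        for title, fields in articles.items()
        if all(fields.get(key) == value for key, value in criteria.items())
    )


# Made-up articles and field values, for illustration only.
articles = {
    "Example Actor A": {"profession": "actor", "nation": "usa"},
    "Example Actor B": {"profession": "actor", "nation": "france"},
    "Example Writer C": {"profession": "writer", "nation": "usa"},
}

print(query(articles, profession="actor", nation="usa"))
```

With categories, the same result needs a manually maintained intersection category such as "American actors" for every combination of fields someone might ask about.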

There are two websites already doing this that might be worth a look. Maybe we can even "ally" with them to bring their data into Wikimedia:

http://www.swivel.com/ 
This lets you describe and upload data series very easily. The series is then available to anyone to build graphs from. You can even mix data series and build graphs using premade templates. I invite you to at least have a look at the main page.
http://www.freebase.com/ 
It is essentially our template system, only better and easier to query. The idea is to fill in information about a subject as terms ( key = value ).

As for data storage, it is probably best to avoid a relational database such as MySQL. Facebook and Twitter use newer kinds of data stores: Facebook open-sourced its key-value store ( http://incubator.apache.org/cassandra/ ), and Twitter uses Rinda (a tuple space implementation in Ruby, see http://en.wikipedia.org/wiki/Tuple_space ). Maybe they could even help us with the data modelling and with setting it up.

--Hashar 17:42, 22 September 2009 (UTC)

Relationship to Proposal:Move to an OpenURL-type mechanism for linking to sources / references

Proposal:Move to an OpenURL-type mechanism for linking to sources / references is a proposal to move towards metadata-driven references rather than URL-driven references. Doing so requires a data store for references, which could easily be part of any data.wikimedia.org. Stuartyeates 00:04, 26 September 2009 (UTC)