Task force/Offline/IRC/2009-12-15

From Strategic Planning

Philippe|Wiki checks clock. Do I have the time wrong?

[12:02pm] Amgine_: No, I don't think so.
[12:02pm] Amgine_: At least, that's why I'm here, and wizzy.
[12:03pm] Philippe|Wiki: Well, OK, then, I'll get more hot apple cider.
[12:03pm] Amgine_: Mmm, when you get back tell me, so I can go get more coffee.
[12:03pm] Philippe|Wiki: back.
[12:04pm] Amgine_: kk, my turn.
[12:05pm] walkerma joined the chat room.
[12:05pm] Philippe|Wiki: hey walkerma
[12:05pm] walkerma: Hi, sorry I'm a couple of minutes late!
[12:05pm] Philippe|Wiki: 'sokay. Amgine's gone for caffeine, he'll be right back.
[12:05pm] walkerma: Heiko can't make it today.
[12:05pm] Philippe|Wiki: and wizzy's on the phone
[12:06pm] walkerma: OK - we know how South Africans love their phones (and hair)!
[12:06pm] walkerma: I'll ping BozMo
[12:06pm] • Philippe|Wiki sits back to let walkerma take the heat for that one.
[12:06pm] Amgine_: back
[12:06pm] Philippe|Wiki: wb, amgine'
[12:07pm] wizzy: walkerma: not really here for another 30 mins
[12:07pm] Amgine_: I'm getting roundly ignored in -tech on the request for a talking channel on irc.wikimedia.
[12:07pm] Amgine_: So, content...
[12:07pm] Philippe|Wiki: Well, let's go with this and I'll add it to my list of stuff to do
[12:07pm] Amgine_: No, first, wikipock - Walkerma?
[12:08pm] walkerma: OK wizzy, I understand.
[12:08pm] walkerma: OK, briefly - I had a great chat with the founder of WikiPock
[12:09pm] Amgine_: <chuckles> Yep, that was brief...
[12:09pm] walkerma: He is very interested in working with us to produce useful new collections. I don't think he'd thought much about developing countries before, though
[12:10pm] Amgine_: Is WikiPock a 100% collection?
[12:10pm] walkerma: He thought that they would probably put together a memory card containing a collection of articles etc
[12:10pm] _schiste_ left the chat room. (Connection timed out)
[12:10pm] walkerma: Yes it is the complete English WP, and I think they throw in pt and es as well
[12:11pm] wizzy: i like the idea of downloading 'packs' - my interests
[12:11pm] walkerma: But it's missing the info boxes and all pictures
[12:11pm] Philippe|Wiki: tables render?
[12:11pm] walkerma: Some points:
[12:11pm] walkerma: 1. Things like tables and infoboxes will be included in Version 2, which will come out in the New Year
[12:13pm] walkerma: Version 2 will be faster, smaller, and include all the stuff that's missing right now. Anyway, some of their releases already contain tables etc - it's just that the USB key version that I have didn't, as it was put together very quickly for Kul as a proof of concept
[12:13pm] Philippe|Wiki: m'kay, thanks.
[12:14pm] walkerma: 2. They are interested in more customised collections - such as smaller collections with pictures, aimed at kids, or bigger collections where the top 30,000 articles might include some pictures - that sort of thing
[12:14pm] Amgine_: Two quick questions: are they working from dumps? for the infoboxen: are they parsing externally?
[12:15pm] walkerma: 3. They are also interested in the categorisation/organisation initiative we talked about (UDC, indexing, etc)
[12:15pm] walkerma: 4. He will join us on IRC next Tuesday if you're OK with it
[12:15pm] walkerma: That's it
[12:15pm] Amgine_: Very okay.
[12:16pm] walkerma: Amgine: Yes, he said they work from the standard XML dump (I didn't think it was standard XML, but I'm pretty ignorant on such things!)
[12:16pm] Amgine_: It's standard xml, but the MW DTD is very shallow (doesn't have much nuanced data)
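A note on the point above that the dumps are plain XML: because of that, a generic streaming XML reader is enough to walk them page by page. The following is only a minimal sketch. The sample document, its namespace string, and the helper names `local_name` and `iter_pages` are made up for illustration and are not taken from any WikiPock or WMF tool.

```python
import io
import xml.etree.ElementTree as ET

# Tiny made-up sample shaped like a MediaWiki export; real dumps carry
# many more elements per page (ids, timestamps, contributors, ...).
SAMPLE_DUMP = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.4/">
  <page>
    <title>cider</title>
    <revision>
      <text>==English==
===Noun===
# A drink made from apples.</text>
    </revision>
  </page>
  <page>
    <title>coffee</title>
    <revision>
      <text>==English==
===Noun===
# A drink made from roasted beans.</text>
    </revision>
  </page>
</mediawiki>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream):
    """Yield (title, wikitext) one page at a time instead of loading the whole dump."""
    title, text = None, ""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, ""
            elem.clear()  # release the finished page's subtree


pages = list(iter_pages(io.BytesIO(SAMPLE_DUMP.encode("utf-8"))))
```

Note that the text yielded here is still raw wikitext: templates are not expanded, which is the limitation raised later in this log.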
[12:17pm] Philippe|Wiki: I'm here, but i'm trying to pin werdna to the wall on fixing a couple of bugs... so I'm watching with half an eye.
[12:17pm] walkerma: Amgine: For infoboxes, he said they render just fine, it's just a bit fiddly - but he didn't give me the details
[12:17pm] Amgine_: <grins @ Philippe|Wiki> Werdna's here, too, probably watching with less than half an eye.
[12:17pm] Amgine_: (except when we mention Andrews)
[12:18pm] Amgine_: Will grill him next week, walkerma.
[12:18pm] Philippe|Wiki: walkerma: can someone please explain to me (in words of two syllables or less, preferably) why tables are such a big deal? Why are they hard? Is it the parser, or what?
[12:18pm] Amgine_: It's the parser.
[12:18pm] Philippe|Wiki: that's what i needed
[12:18pm] Philippe|Wiki: thanks
[12:19pm] walkerma: And most infoboxes are done as tables, right?
[12:19pm] Philippe|Wiki: yup
[12:19pm] Philippe|Wiki: LOTS of stuff is done as tables. Including whole articles (list of XYZ...)
[12:19pm] Amgine_: Tables *plus* extensive parser functions.
[12:19pm] walkerma: [[List of South American countries]]
[12:19pm] Philippe|Wiki: (btw, i loved it when ^demon looked me in the eye and said "forget everything you know about parsing. It's not a parser. It's a cruncher.")
[12:20pm] Amgine_: Speaking of parser, I interviewed two more developers, and sent out two more e-mail interviews.
[12:20pm] Amgine_: Yep, that's true.
[12:20pm] werdna: Amgine_: hmmmm?
[12:20pm] Amgine_: The #1 item from all interviews so far: parsing templates - and release a parser specification.
[12:21pm] Amgine_: (so that they can parse templates, etc.)
[12:21pm] Philippe|Wiki: (do we HAVE a parser specification?)
[12:21pm] Amgine_: No.
[12:22pm] Philippe|Wiki: that's gonna make it hard to release one.
[12:22pm] Amgine_: The parser accretes and morphs, and is very non-standardized, and it's amazing that MW continues to work so well.
[12:22pm] Amgine_: But, in part, that ends up costing in developers/development time.
[12:22pm] Philippe|Wiki: nods
[12:23pm] Philippe|Wiki: Amgine, I'm talking to eekim after office hours today... if I haven't gotten an answer for you on limesurvey before then, i'll do it then, i hope.
[12:24pm] Amgine_: I think I have that mentioned in the survey I created. I have about 8-10 project managers/devs who are willing to take the survey.
[12:24pm] Amgine_: With Hejk, Walkerma, Wizzy, I think we can probably get at least 10-20 more.
[12:25pm] walkerma left the chat room. (lindbohm.freenode.net irc.freenode.net)
[12:25pm] Philippe|Wiki: oops, there goes walkerma
[12:25pm] walkerma joined the chat room.
[12:25pm] Amgine_: That'll give us a small n for quantitative, but some good qualitative content to work with.
[12:26pm] Philippe|Wiki: 30 isn't a terrible n; you can build a decent survey off that if you appropriately present it. Yours looks good to me (i'd prefer fewer possible answers on one of the questions, but that's splitting hairs)
[12:27pm] walkerma left the chat room. ("ChatZilla 0.9.85 [Firefox 3.0.15/2009101601]")
[12:27pm] Amgine_: <grin> I think it averages 3.5 questions per survey.
[12:27pm] walkerma joined the chat room.
[12:27pm] Amgine_: I think that's brief enough to get developers to really answer things.
[12:27pm] Philippe|Wiki: survey design is fun.
[12:28pm] Amgine_: blah... I haven't brought it to our statisticians. They'll probably hate on me for the question designs.
[12:29pm] Amgine_: But I think that's for the v2.0 questionnaire, when we have a better idea of the real issues people are facing.
[12:29pm] Philippe|Wiki: Yeah, let's not get too tactical, please.... higher level = better for now. We'll have a chance to chart migration paths and such later
[12:29pm] wizzy: The main problem with wikipedia is the dumps / collections are so huge
[12:29pm] wizzy: It would be nice to mix and match - africa + mathematics
[12:29pm] wizzy: but that means a reader must look in a dir, and the articles won't link across collections
[12:29pm] Amgine_: Anyway, Walkerma? are you stable now?
[12:30pm] Philippe|Wiki: wizzy: Are you assuming doing that mix and match by.... what, categories?
[12:30pm] wizzy: so I guess we are stuck with what *we* think they need
[12:30pm] Amgine_: wizzy: one of the developers I'm talking with has a FF tool to work with remote archived dumps.
[12:30pm] Amgine_: So you don't have to download the dump at all.
[12:31pm] Amgine_: I'm also working on a dump query tool that does the same thing, only it's a class so it can be dropped into your own php scripts.
[12:32pm] Huib left the chat room. (lindbohm.freenode.net irc.freenode.net)
[12:32pm] wizzy left the chat room. (lindbohm.freenode.net irc.freenode.net)
[12:32pm] mark left the chat room. (lindbohm.freenode.net irc.freenode.net)
[12:32pm] FT2 left the chat room. (lindbohm.freenode.net irc.freenode.net)
[12:32pm] millosh left the chat room. (lindbohm.freenode.net irc.freenode.net)
[12:32pm] schiste left the chat room. (lindbohm.freenode.net irc.freenode.net)
[12:32pm] Amgine_: werdna: irc://irc.wikimedia.org/$lang.$site
[12:33pm] Philippe|Wiki: d'oh, another netsplit?
[12:33pm] Amgine_: <sighs>
[12:33pm] Amgine_: So, it's really just you and I Philippe|Wiki.
[12:33pm] Philippe|Wiki: yeah, looks like others are on lindbohm.
[12:33pm] Philippe|Wiki: a lot of 'em anyway
[12:34pm] Amgine_: Yes. So, let me give you a brief dump about en.Wiktionary, and probably wiktionaries in general.
[12:35pm] Philippe|Wiki: sweet, i need this.
[12:35pm] Amgine_: The content is fairly structured, so it would be a very good test case for semantic tags.
[12:35pm] Philippe|Wiki: hmm, good point.
[12:35pm] werdna: Amgine_: what about it?
[12:35pm] Amgine_: This content set is also the most likely to be used by third parties.
[12:36pm] Amgine_: It's an actual channel. Could it be voiced please?
[12:37pm] Amgine_: Philippe|Wiki: currently most linux installations can use DICT servers to power their spellcheckers; Wiktionary data could be parsed to do that very easily.
[12:37pm] • Philippe|Wiki nods
[12:37pm] Philippe|Wiki: so what's stopping it?
[12:38pm] Amgine_: The biggest problem is we need a nuanced structured data output, in the DICT format.
[12:38pm] Amgine_: So that we don't include, for example, Hindi terms in what should be an english-only dict.
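To illustrate the filtering problem just described, here is a rough sketch of pulling only the English section out of a multi-language entry. It assumes the entry uses level-2 `==Language==` headings and `#` definition lines, as mentioned elsewhere in this log. The sample entry and the function names are hypothetical, and the output is a plain list rather than an actual DICT-format database.

```python
import re

# Matches a level-2 heading such as "==English==" and nothing deeper.
L2_HEADING = re.compile(r"^==([^=]+)==\s*$")

# Made-up entry with two language sections.
SAMPLE_ENTRY = """==English==
===Noun===
# A small domesticated feline.
==Hindi==
===Noun===
# (placeholder definition for a different language)
"""


def language_section(wikitext, language):
    """Return only the lines that sit under the given ==Language== heading."""
    kept, inside = [], False
    for line in wikitext.splitlines():
        match = L2_HEADING.match(line)
        if match:
            inside = match.group(1).strip() == language
            continue
        if inside:
            kept.append(line)
    return "\n".join(kept)


def definitions(section):
    """Collect '# definition' lines, skipping '#:' examples, '#*' quotations and '##' sub-senses."""
    return [
        line[1:].strip()
        for line in section.splitlines()
        if line.startswith("#") and not line.startswith(("#:", "#*", "##"))
    ]


english_only = definitions(language_section(SAMPLE_ENTRY, "English"))
```

This only works as long as entries follow the heading convention; anything expressed through templates would still need expanding first.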
[12:39pm] Huib joined the chat room.
[12:39pm] schiste joined the chat room.
[12:39pm] wizzy joined the chat room.
[12:39pm] millosh joined the chat room.
[12:39pm] mark joined the chat room.
[12:39pm] FT2 joined the chat room.
[12:39pm] Philippe|Wiki: okay, and theoretically have the ability to join them to a couple if necessary (I think of "Cinco de Mayo" or something similar which might be in more than one)
[12:40pm] walkerma: OK, I'm back again!
[12:40pm] Amgine_: <nods> Exactly. So User:Hippietrail is currently working on a parser for the en.Wiktionary which creates a database with treed content.
[12:40pm] Amgine_: Hullo Walkerma!
[12:40pm] Philippe|Wiki: wb, walkerma
[12:40pm] Amgine_: The benefit is the data is then readily queried, so any kind of output can be created.
[12:41pm] gerardm- joined the chat room.
[12:41pm] Philippe|Wiki: okay, i follow.
[12:41pm] Amgine_: The problem is in two parts: each Wiktionary language would require a custom parsing system to retrieve the nuanced data.
[12:41pm] Amgine_: And this should be done by WMF, not a user on his netbook/Toolserver account.
[12:42pm] Amgine_: (and, btw, that's apparently the only computer the user has...)
[12:42pm] _sj_ joined the chat room.
[12:42pm] Amgine_: Hullo _SJ_
[12:42pm] walkerma: Are you discussing possible use of Wiktionary as spellchecker, and perhaps used inside WP articles too?
[12:43pm] walkerma: Hi _sj_ !
[12:43pm] Amgine_: That would be very cool, but mostly we're talking about making the content available in structured forms for re-use.
[12:43pm] Philippe|Wiki: OK, so someone take me through the path here.... what needs to be done next for this task force to emerge to recommendations of some type? it sounds like you're beginning to solidify around them.
[12:44pm] walkerma: I think we're working through a checklist - one item per meeting
[12:45pm] Amgine_: So far I see three recommendations: reduce hurdles to content re-use, add developers/hardware to help projects create products for reuse, and focus on cellphones as a high-value platform for offline delivery.
[12:45pm] walkerma: Hopefully by the end of today we have a consensus on what to recommend re content
[12:45pm] wizzy: *blink*
[12:46pm] walkerma: At least a start, anyway!
[12:46pm] Philippe|Wiki: Amgine_: OK, I'm happy with those, if you all are. Obviously, there's more work to be done on specificity and a path to them and such... but those give me a high-level idea of the output. Good.
[12:47pm] Amgine_: In case it got missed in the netsplits and so on, I have built a basic survey to be given to content-reusers asking for their input on what they see as issues/problems, and what they'd like us to do about 'em.
[12:48pm] Amgine_: Hopefully we will be able to get that survey up on a WMF server soon, and we can send it to people we want to take it.
[12:48pm] GerardM- left the chat room. (lindbohm.freenode.net irc.freenode.net)
[12:48pm] wizzy left the chat room. (lindbohm.freenode.net irc.freenode.net)
[12:48pm] mark left the chat room. (lindbohm.freenode.net irc.freenode.net)
[12:48pm] FT2 left the chat room. (lindbohm.freenode.net irc.freenode.net)
[12:48pm] millosh left the chat room. (lindbohm.freenode.net irc.freenode.net)
[12:48pm] schiste left the chat room. (lindbohm.freenode.net irc.freenode.net)
[12:48pm] Huib left the chat room. (lindbohm.freenode.net irc.freenode.net)
[12:49pm] Amgine_: <sighs>
[12:49pm] Philippe|Wiki: lol
[12:49pm] Philippe|Wiki: beautifully timed
[12:49pm] Amgine_: Walkerma: if you are still here: http://strategy.wikimedia.org/wiki/Task_force/Recommendations/Offline_1
[12:49pm] walkerma: wizzy: Do you have any to add?
[12:50pm] Amgine_: I'm also chatting with the developers of limesurvey about some issues I discovered.
[12:50pm] Amgine_: Wizzy got netsplitted, Walkerma.
[12:50pm] walkerma: OK
[12:51pm] Amgine_: Wave after wave. Someone is really being a dick.
[12:51pm] walkerma: Is this some kind of vandalism/attack, then?
[12:51pm] peteforsyth joined the chat room.
[12:51pm] Philippe|Wiki: hey peteforsyth
[12:51pm] Amgine_: walkerma: Yes.
[12:52pm] walkerma: Well, Amgine, could you tell us about your ideas on content
[12:52pm] Amgine_: Basically, someone has captured a bunch of zombie/virus-infected computers, and is trying to over-power the servers freenode has with useless complaints.
[12:52pm] walkerma: Some people have pretty empty lives, don't they...!
[12:53pm] peteforsyth: hey Philippe|Wiki !
[12:53pm] GerardM- joined the chat room.
[12:53pm] Huib joined the chat room.
[12:53pm] schiste joined the chat room.
[12:53pm] wizzy joined the chat room.
[12:53pm] millosh joined the chat room.
[12:53pm] mark joined the chat room.
[12:53pm] FT2 joined the chat room.
[12:53pm] peteforsyth: Just "landed" in SF
[12:53pm] Philippe|Wiki: hey peteforsyth: congrats. I'll be there the first couple weeks of January, look forward to catching up with you.
[12:53pm] Amgine_: WMF content is loosely structured: Wikipedia articles all have an introduction, usually have an infobox, and usually have references/bibliography for example.
[12:54pm] peteforsyth: That'll be great!
[12:54pm] Amgine_: Wiktionary is more structured: entries usually have language headers, parts of speech, definitions, etc.
[12:54pm] Amgine_: Each of these "structures" could be marked by machine-readable code, making the content easier to parse.
[12:54pm] walkerma: Yes, I see
[12:55pm] Amgine_: Dumps of content currently do not expand templates, which is the number one problem for data reuse.
[12:55pm] peteforsyth: Amgine: sounds like more work. When do we get to go to the beach?
[12:56pm] walkerma: Snowing hard here in upstate New York, the beach might be a bit cold
[12:56pm] Philippe|Wiki: peteforsyth: Eugene took me last time I was in SF... Whimper and whine about not being able to think in the office, and he usually caves.
[12:56pm] Amgine_: Since there is no 3rd party parser available, it is an extremely difficult task to either import the dump to a local installation and use an api to expand the templates, or to build a custom but not-complete parser to manage just the template expansions.
[12:57pm] peteforsyth: walkerma: where are you? I was just in Syracuse and Lake George area. (thankfully, before the snow came.)
[12:57pm] Philippe|Wiki: So, Amgine_ Do we have any idea of the level of effort involved in streamlining the dumps?
[12:57pm] Amgine_: Moderate. One suggestion from the m.wikipedia people is another API which serves html parsed content.
[12:57pm] Philippe|Wiki: hmm, interesting. OK.
[12:57pm] walkerma: I'm in http://en.wikipedia.org/wiki/Potsdam_(village),_New_York
[12:58pm] walkerma: Amgine, that sounds good
[12:58pm] peteforsyth: Judging by my experience with the WikiReader, I agree -- lack of templates is a pretty big drawback.
[12:58pm] Philippe|Wiki: yep.
[12:58pm] Philippe|Wiki: Tables and Templates. That's what we keep coming back to.
[12:58pm] Amgine_: Well, if there were a parser specification, other developers could build parsers in other languages - that would offload the development/maintenance cost from WMF to 3rd parties.
[12:59pm] Amgine_: So that is the preferred solution.
[12:59pm] walkerma: Well, if we could move on from that - What should we include besides WP and Wiktionary? I'd say WikiQuote, for sure
[1:00pm] Philippe|Wiki: Wikinews?
[1:00pm] Philippe|Wiki: I'm thinking archivally
[1:00pm] walkerma: Or should all WMF content be possibilities?
[1:00pm] Amgine_: The less-favourable options are: create many varieties of dumps to suit requests, or add developers to service project-specific development.
[1:00pm] walkerma: Philippe - I hadn't thought of it that way
[1:00pm] walkerma: Good idea
[1:01pm] walkerma: wizzy: What would people in RSA like to see, besides WP?
[1:01pm] Amgine_: Actually, the project I'd most like to see being given love is Wikisource - specifically to create processes for importing public court records.
[1:02pm] Philippe|Wiki: oooh, fascinating.
[1:02pm] wizzy: walkerma: I don't think they know. Vital Articles is a great start
[1:02pm] Amgine_: <imagines Wikisource as the free equivalent of westlaw.>
[1:02pm] peteforsyth: wow walkerma , you're way up there -- looks like a suburb of Montreal!
[1:03pm] walkerma: Some things like Wiktionary are probably quite small in terms of memory
[1:03pm] walkerma: Suburb of Ottawa, more like!
[1:03pm] peteforsyth: Amgine: that's a great vision. I have been talking with law professors about that. They're excited too.
[1:03pm] wizzy: I guess it is colored by my impressions - I want WP
[1:03pm] Amgine_: Not really. There are 1.5 million articles, and each "spelling" may appear in any number of languages.
[1:04pm] Amgine_: I believe there are 2000 different languages represented at least once in en.Wiktionary.
[1:04pm] Amgine_: And english is not the largest.
[1:04pm] walkerma: As I was saying before the netsplit, WikiPock throws in Wiktionary and Wikiquote for free
[1:04pm] peteforsyth: walkerma: Ah, OK. I rode my bicycle through Kingston and Watertown some years back -- nice ferry. Beautiful area.
[1:05pm] Amgine_: <nods> That's very cool, Walkerma.
[1:06pm] Amgine_: I wish Wiktionary was on a different software using structured data, but it isn't, so we have to work with what there is.
[1:06pm] Philippe|Wiki: Yeah, agreed.
[1:06pm] Philippe|Wiki: How hard is it to lay a data model on top of mediawiki? I'm thinking a custom extension or something?
[1:06pm] walkerma: I was also wondering about the feasibility of using non-WMF content that is done in MediaWiki software - things like WikiHow, WikiTravel, WikiEducator
[1:07pm] Amgine_: Philippe|Wiki: unfortunately, each language/project is very non-regular. So even though ru.Wiktionary has the same *type* of information as en.Wiktionary, it would require a completely separate data model to parse.
[1:08pm] Philippe|Wiki: ah
[1:08pm] Philippe|Wiki: okay
[1:08pm] walkerma: Aren't you really talking about Semantic Mediawiki, Philippe?
[1:08pm] Philippe|Wiki: I don't know. Am I?
[1:08pm] Amgine_: A fourth recommendation could be "create standard formats across projects"
[1:08pm] GerardM-: Philippe|Wiki: have you ever looked at OmegaWiki ?
[1:08pm] GerardM-: it is hard and it uses a custom extension
[1:08pm] walkerma: Amgine: Can you elaborate?
[1:08pm] GerardM-: and Amgine, OmegaWiki could use Wiktionary data ...
[1:08pm] GerardM-: <grin> as you know
[1:08pm] Amgine_: <grin> Yep, I know.
[1:09pm] Amgine_: Ask me again after Jan 12.
[1:09pm] Philippe|Wiki: GerardM-: I have looked. But I am not qualified to judge it.
[1:09pm] walkerma: http://semantic-mediawiki.org/wiki/Semantic_MediaWiki
[1:09pm] Amgine_: That model adds another parser requirement for data re-users.
[1:09pm] Amgine_: If it is implemented, every current re-user will have their custom tools broken.
[1:10pm] Amgine_: Not necessarily bad, just something to keep in mind.
[1:10pm] DragonFire1024 is now known as DragonFire_aw.
[1:11pm] wizzy left the chat room. (lindbohm.freenode.net irc.freenode.net)
[1:11pm] mark left the chat room. (lindbohm.freenode.net irc.freenode.net)
[1:11pm] GerardM- left the chat room. (lindbohm.freenode.net irc.freenode.net)
[1:11pm] FT2 left the chat room. (lindbohm.freenode.net irc.freenode.net)
[1:11pm] millosh left the chat room. (lindbohm.freenode.net irc.freenode.net)
[1:11pm] schiste left the chat room. (lindbohm.freenode.net irc.freenode.net)
[1:11pm] Huib left the chat room. (lindbohm.freenode.net irc.freenode.net)
[1:11pm] walkerma: Any thoughts on non-Mediawiki content?
[1:11pm] Amgine_: walkerma: elaborating: if all Wiktionaries used the *same* format of ==Language== ===Etymology=== ===Part of speech=== # definitions
[1:12pm] Amgine_: By having certain things the same across the languages, then a single parser can extract the metadata.
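As a sketch of what "a single parser" could look like if every Wiktionary shared the heading layout just described, the snippet below turns one entry into a nested structure keyed by language and then by level-3 heading. The sample entry and the name `parse_entry` are invented for illustration. Real entries contain templates and deeper heading levels that this does not handle (lines under level-4 and deeper headings are simply filed under the preceding level-3 heading).

```python
import re

# A heading is the same run of 2-6 '=' on both sides of the name.
HEADING = re.compile(r"^(={2,6})\s*(.*?)\s*\1\s*$")

# Made-up entry following the ==Language== / ===Section=== / # definition layout.
SAMPLE = """==English==
===Etymology===
From a hypothetical root.
===Noun===
# First sense.
# Second sense.
==French==
===Noun===
# Sense in the French section.
"""


def parse_entry(wikitext):
    """Build {language: {level-3 heading: [body lines]}} from one entry."""
    tree, language, section = {}, None, None
    for line in wikitext.splitlines():
        match = HEADING.match(line)
        if match:
            level, name = len(match.group(1)), match.group(2)
            if level == 2:
                language, section = name, None
                tree[language] = {}
            elif level == 3 and language is not None:
                section = name
                tree[language][section] = []
            continue
        if language is not None and section is not None and line.strip():
            # Drop the leading '#' from definition lines, keep other text as is.
            body = line[1:].strip() if line.startswith("#") else line.strip()
            tree[language][section].append(body)
    return tree


entry = parse_entry(SAMPLE)
```

Once the data is in a tree like this, any output format can be generated from it by walking the structure, which is the benefit described earlier in the log for the database with treed content.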
[1:12pm] Philippe|Wiki: afk for a sec
[1:14pm] Amgine_: On Wikipedia, for example, if all articles had a no-more-than 248 character section 0, a more in-depth section 1 introduction, and the L2 sections were standardized (like ==History==, ==References==, ==See also==, etc.)
[1:15pm] Philippe|Wiki: i'm back
[1:15pm] Amgine_: These kinds of standardizations across the whole project, so on de.Wikipedia and every other language, cause the data to be 'regular' and much much easier to parse.
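To make the idea of 'regular' articles concrete, here is a small sketch of a checker for the kind of standard suggested above: a lead of at most 248 characters and level-2 headings drawn from a fixed list. The limit and the three heading names come from the examples in the log. The set itself, the function name, and the sample texts are assumptions made for illustration, not an agreed standard.

```python
import re

# Hypothetical project-wide standard, based on the examples given in the log.
MAX_LEAD_CHARS = 248
ALLOWED_L2 = {"History", "References", "See also"}

# Level-2 headings only; the name may not contain '=' or a line break.
L2_HEADING = re.compile(r"^==([^=\n]+)==[ \t]*$", re.MULTILINE)


def check_article(wikitext):
    """Return a list of readable problems; an empty list means the article is 'regular'."""
    problems = []
    first = L2_HEADING.search(wikitext)
    # Section 0 is everything before the first level-2 heading.
    lead = (wikitext[:first.start()] if first else wikitext).strip()
    if len(lead) > MAX_LEAD_CHARS:
        problems.append(f"lead is {len(lead)} characters, limit is {MAX_LEAD_CHARS}")
    for match in L2_HEADING.finditer(wikitext):
        name = match.group(1).strip()
        if name not in ALLOWED_L2:
            problems.append(f"non-standard heading: {name}")
    return problems


good = "Short lead.\n==History==\nText.\n==References==\n"
bad = ("x" * 300) + "\n==Trivia==\nText.\n"
```

A check like this could run over a dump or, in principle, over each new revision; it reports deviations and does not attempt to fix them.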
[1:16pm] Amgine_: I don't think anyone is actually hearing me, Philippe|Wiki, so I'm mostly typing for the log.
[1:17pm] Amgine_: <pokes _sj_> I need a name at OLPC who is dealing with Wikipedia installs.
[1:17pm] gerardm- joined the chat room.
[1:17pm] Huib joined the chat room.
[1:17pm] schiste joined the chat room.
[1:17pm] wizzy joined the chat room.
[1:17pm] millosh joined the chat room.
[1:17pm] mark joined the chat room.
[1:17pm] FT2 joined the chat room.
[1:17pm] Philippe|Wiki: heh, the log appreciates it.
[1:17pm] walkerma: Amgine_ I hear you - I think it is an excellent proposal, and one that I think we should take through as a recommendation
[1:17pm] Amgine_: Wizzy: You'll want to read the log when we're done.
[1:18pm] Amgine_: We probably have missed some of your input.
[1:18pm] Amgine_: And you, ours.
[1:18pm] wizzy left the chat room. (lindbohm.freenode.net irc.freenode.net)
[1:18pm] mark left the chat room. (lindbohm.freenode.net irc.freenode.net)
[1:18pm] GerardM- left the chat room. (lindbohm.freenode.net irc.freenode.net)
[1:18pm] FT2 left the chat room. (lindbohm.freenode.net irc.freenode.net)
[1:18pm] millosh left the chat room. (lindbohm.freenode.net irc.freenode.net)
[1:18pm] schiste left the chat room. (lindbohm.freenode.net irc.freenode.net)
[1:18pm] Huib left the chat room. (lindbohm.freenode.net irc.freenode.net)
[1:19pm] Amgine_: walkerma: I have my doubts of being able to suggest L2 standards to en.Wikipedia.
[1:19pm] Amgine_: On the other hand, it might be adopted on de/fr if they negotiate it.
[1:19pm] Amgine_: And once several wikis are willing to regularize, it will be easier to expand the re-use of those wikis.
[1:20pm] Philippe|Wiki: Reusers would hate us if we change structure, wouldn't they?
[1:21pm] GerardM- joined the chat room.
[1:21pm] Huib joined the chat room.
[1:21pm] schiste joined the chat room.
[1:21pm] wizzy joined the chat room.
[1:21pm] millosh joined the chat room.
[1:21pm] mark joined the chat room.
[1:21pm] FT2 joined the chat room.
[1:21pm] walkerma: Amgine: I wonder if someone could write a script that could do the formatting - take the raw WP article or Wiktionary entry, and reformat it into our standard style. Would that be possible?
[1:21pm] Amgine_: The more standardized we get, the more problems it could create if we change. But the reality is that with a wiki such structures will never be perfectly implemented.
[1:22pm] Philippe|Wiki: heh, that's the beauty of wikis, isn't it?
[1:22pm] Amgine_: Walkerma: yes. Another of the developers I'm interviewing runs "AutoFormat" bot on en.Wiktionary, which checks every revision to see if the article edited meets the formatting standards they have created there.
[1:23pm] Amgine_: It manages literally thousands of checks on categories, sorts, header levels, and so on.
[1:23pm] walkerma: So we could have a recommended format for offline releases, complete with XML tags added in, that may look quite different from the original article in terms of just appearance
[1:23pm] walkerma: But our offline re-users could process it and organize it much more effectively
[1:23pm] Amgine_: You could also have all that in the online version.
[1:24pm] Amgine_: And it would look the same as the original.
[1:24pm] walkerma: Amgine - true - if this were 2001 it would be easy to implement that!
[1:24pm] • Philippe|Wiki writes: "Recommendation 2 from offline task force: time travel machine."
[1:25pm] • Philippe|Wiki continues: "assign this one to werdna."
[1:25pm] Amgine_: No, it would be easy to allow the tags via the parser, but it would take a community effort to actually insert them.
[1:26pm] Amgine_: For example: <wikipedia:bibliography> could be allowed, and the community would need to add it to the references section.
[1:26pm] walkerma: Perhaps we could get it working just for offline releases first - then you would be able to make a good case for adopting the same standards for the online version?
[1:26pm] Philippe|Wiki: Could any of that be bot-done, or does it require decision making?
[1:26pm] Amgine_: (it would be nice to encourage the development and implementation of a DTD)
[1:27pm] Amgine_: Adding the tags could be bot-done. The tags could also be added to a parsed dump.
[1:27pm] Amgine_: But both of those should be community initiatives - not part of this task force.
[1:27pm] walkerma: Yes
[1:28pm] • Philippe|Wiki nods. That can be discussed when we talk about a path to implementation of recommendations.
[1:28pm] Amgine_: <nods>
[1:28pm] walkerma: OK, before we break up, I'd like to reiterate my earlier questions - what other content besides WP (+ Wiktionary, WikiSource)?
[1:29pm] walkerma: What should be our priorities in terms of content?
[1:29pm] walkerma: WikiNews? WikiQuote? WikiBooks? WikiVersity? And non-WMF content?
[1:29pm] Philippe|Wiki: Is there a reason not to fully implement?
[1:29pm] Amgine_: Oi, not really.
[1:29pm] Philippe|Wiki: (pardon me, as usual, if it's a dumb question)
[1:30pm] Amgine_: There are a couple of points...
[1:30pm] Philippe|Wiki: i mean, i get that we'd need a test bed, but we could use testwiki for that... make sure it works... then roll out in phases, yes?
[1:30pm] Huib is now known as Huib|Goats.
[1:30pm] Amgine_: As far as implementation, are we talking about the structures/semantics?
[1:31pm] Philippe|Wiki: <shrug>
[1:31pm] Philippe|Wiki: you tell me
[1:31pm] Amgine_: Because, if so, and we need to both create project-wide standards and an initial parser, we should focus on Wiktionary as the most-structured/easiest to parse first.
[1:32pm] Amgine_: (although WP is a more desirable target - it's bigger, more contentious, and harder.)
[1:32pm] walkerma: I think I will recommend that we focus on Wiktionary as our no. 2 after WP, then consider taking certain WikiBooks for no. 3. WikiQuote would be probably quite small, though a bit disorganised, but it could probably be small enough to include without too much trouble
[1:33pm] walkerma: I think WikiBooks is patchy - there are a few real gems but you have to look for them, among a lot of partly-done efforts
[1:33pm] Philippe|Wiki: Is the idea that what works for WP will work consistently? My experience is that WP often requires custom EVERYTHING because of its monster scale.
[1:33pm] Philippe|Wiki: Where sometimes the smaller projects can do very well with a one-size-fits-most type of solution.
[1:34pm] Philippe|Wiki: if that's the case, I'd advocate one-size-fits-most first, and THEN wp
[1:34pm] Amgine_: Yes, that's my concern as well. Also, it's the most complex in article styles.
[1:34pm] walkerma: Philippe - that's one question I'd like to consider - I think each type content may have specific needs. For example the News content is rather different from Wiktionary
[1:34pm] Amgine_: On Wiktionary, for example, everything is (supposed to be) one type of article, with only a small set of possible headers.
[1:35pm] walkerma: WikiBooks is really a few large items rather than lots of small ones
[1:35pm] Philippe|Wiki: The other (frankly political) consideration is that doing one-size-fits-most first gives us a chance to show the other projects some love.
[1:35pm] walkerma: Amgine_ : That sort of standardisation should really help us, as long as it's mostly adhered to
[1:36pm] walkerma: Philippe: Yes, the political aspect is a significant one
[1:36pm] Amgine_: It's about 98% of the entries, according to people who are working with the data.
[1:36pm] Philippe|Wiki: but i'm putting the cart before the horse, I think
[1:36pm] Amgine_: It's also, luckily, mostly done by [[wikt:en:User:Hippietrail]].
[1:36pm] wizzy left the chat room. (lindbohm.freenode.net irc.freenode.net)
[1:36pm] mark left the chat room. (lindbohm.freenode.net irc.freenode.net)
[1:36pm] GerardM- left the chat room. (lindbohm.freenode.net irc.freenode.net)
[1:36pm] FT2 left the chat room. (lindbohm.freenode.net irc.freenode.net)
[1:36pm] millosh left the chat room. (lindbohm.freenode.net irc.freenode.net)
[1:36pm] schiste left the chat room. (lindbohm.freenode.net irc.freenode.net)
[1:36pm] Huib|Goats left the chat room. (lindbohm.freenode.net irc.freenode.net)
[1:36pm] Philippe|Wiki: here we go again
[1:36pm] Philippe|Wiki: poor lindbohm
[1:37pm] Amgine_: <chuckles> I see it as about once every 8-10 minutes.
[1:37pm] Amgine_: So, the zombies are probably set to all work in a wave.
[1:38pm] Philippe|Wiki: yep
[1:38pm] Amgine_: Anyway, if we can talk Andrew into helping on this, I think we could have results by the end of January.
[1:39pm] Amgine_: Possibly sooner.
[1:39pm] Amgine_: Is there any way we can contract him on the project?
[1:39pm] Philippe|Wiki: Andrew is pretty heavily resourced at the moment... but I can beg for his time when we get there
[1:39pm] Philippe|Wiki: by Andrew, you mean werdna?
[1:39pm] Philippe|Wiki: or another andrew?
[1:39pm] walkerma: Which Andrew is that?
[1:39pm] Amgine_: Different Andrew.
[1:39pm] Amgine_: But also from Australia.
[1:39pm] Philippe|Wiki: ahhh, okay
[1:40pm] Philippe|Wiki: I dunno, that's tech budget, and I'd have to check. I fear budget requests.
[1:40pm] walkerma: Amgine: Could you contact HippieTrail and see what the Wiktionary people might think about offline releases?
[1:40pm] Amgine_: <nods> I could get a project estimate from him in hours.
[1:40pm] Amgine_: <grin> I've been talking with 3 developers there (each on projects not directly related to each other)
[1:41pm] Philippe|Wiki: We'll need something ballpark eventually, but not yet. Let's focus on writing the recommendations up on a high level first and deal with nitty gritty after
[1:41pm] walkerma: I've done a little on WikiNews and WikiQuote, I'm willing to talk to them about making offline content from those
[1:41pm] Amgine_: You could say I have the english portion of that project pretty saturated.
[1:41pm] peteforsyth left the chat room.
[1:41pm] Amgine_: <nods> I helped establish Wikinews, and still have good contacts there as well.
[1:41pm] Kelson joined the chat room.
[1:41pm] GerardM- joined the chat room.
[1:41pm] Huib|Goats joined the chat room.
[1:41pm] schiste joined the chat room.
[1:41pm] wizzy joined the chat room.
[1:41pm] millosh joined the chat room.
[1:41pm] mark joined the chat room.
[1:41pm] FT2 joined the chat room.
[1:41pm] Philippe|Wiki: ...and they're back
[1:42pm] walkerma: Amgine: That would be great! I've only written 3-4 articles
[1:42pm] Philippe|Wiki: OK, can I ask that we bring this to a close? We've got office hours shortly
[1:42pm] walkerma: on WikiNews
[1:42pm] Amgine_: Sure we're mostly done I think.
[1:42pm] walkerma: Can I talk to the WikiHow people? I met a few of them in Buenos Aires
[1:42pm] Amgine_: <nods> Yes, please. Tell Jack hello.
[1:43pm] Amgine_: Dvortygirl is also a good contact there for the community, knows who is who.
[1:43pm] walkerma: Great, thanks
[1:43pm] Philippe|Wiki: Dvortygirl is around quite a lot. Nice lady. Met her in B.A.
[1:43pm] Amgine_: <nods> She's heading back to South America this week or next.
[1:44pm] Philippe|Wiki: As one who just got back from there... let her. It's hot and humid.
[1:44pm] Amgine_: heh...
[1:44pm] Philippe|Wiki: BTW... for planning purposes... I am around next week, but I'm out Monday and Tuesday of the following week.
[1:44pm] Philippe|Wiki: But you don't really need me at these things, so don't plan around me.
[1:45pm] Amgine_: <nod> I actually may be on the road next Tuesday, holiday travel, but will try to plan a wifi stop for this meeting.
[1:45pm] Amgine_: Same time, Walkerma?
[1:45pm] walkerma: I will be away with family from around 24 Dec- 1 Jan, but I should have some internet time
[1:45pm] walkerma: I think same time next week will be good for me
[1:45pm] Philippe|Wiki: same time next week works for me
[1:46pm] Philippe|Wiki: Thanks, folks, I'm going to take a couple minutes break and I'll post the log
[1:46pm] walkerma: Though I will be very tired - grades are due by 1500 ITC
[1:46pm] walkerma: UTC
[1:46pm] Amgine_: wizzy: 1800 UTC on next Tuesday?
[1:46pm] walkerma: and I will be probably grading all night as usual!