Task force/Offline/IRC/2009-12-10

From Strategic Planning

StrategyBot joined the chat room.

[11:02am] walkerma: Hi! Should we get started on the Offline meeting?
[11:02am] Amgine: Yes
[11:02am] walkerma: (The meeting about offline content)
[11:02am] Amgine: Will talk l8r werdna
[11:02am] Amgine: You first walkerma
[11:02am] Amgine: <waves again @ hejko>
[11:03am] hejko: hi, yes let's start
[11:03am] walkerma: Hi hejko! Since BozMo and Wizzy aren't here, I suggest we have a meeting to discuss various general topics. Is that OK?
[11:03am] Amgine: kk
[11:04am] hejko: ok
[11:04am] walkerma: I'd also like to get people's thoughts on topics for future meetings - I put a list of possible things here: http://strategy.wikimedia.org/wiki/Task_force/Offline/IRC#Possible_discussion_topics
[11:04am] walkerma: Do you have another specific topic that we need to consider?
[11:05am] walkerma: Do you have a preference about the ones we have listed so far?
[11:05am] sj| joined the chat room.
[11:05am] Amgine: is reading 'em
[11:05am] walkerma: Hi sj|
[11:06am] sj|: hi martin!
[11:06am] hejko: hi sj!
[11:06am] randmontoya left the chat room. (Read error: 145 (Connection timed out))
[11:06am] randmontoya_ is now known as randmontoya.
[11:06am] Amgine: First point that leaps out at me: all of that is Wikipedia-exclusive. Not a problem, just wondering if we should keep that focus or diffuse and include sister project content?
[11:07am] sj|: +1
[11:07am] Amgine: SJ|: http://strategy.wikimedia.org/wiki/Task_force/Offline/IRC#Possible_discussion_topics
[11:08am] • Philippe|Wiki waves at SJ too
[11:10am] walkerma: Amgine: That's why I put "Offline content" on the list, to discuss things besides WP - is that one your favourite for a coming agenda topic?
[11:10am] walkerma: Sorry, I meant to say "What content" - the #4 topic on the list
[11:10am] Philippe|Wiki: It's a fav for me I'd like to see it on the agenda.
[11:11am] hejko: I would like to discuss how we continue to work towards recommendations for the WMF.
[11:12am] walkerma: My examples are often taken from WP, but that doesn't mean we only need consider WP content!
[11:12am] Amgine: Walkerma: Yes, although it probably devolves to the same questions about data dumps.
[11:13am] Amgine: Hejko: I've continued work on two recommendations.
[11:13am] hejko: As far as I understood it we shall propose recommendations that are backed by statistics and interviews. I am not sure whether a detailed plan how to execute those recommendations is within the scope of our mandate.
[11:13am] walkerma: hejko: Could you draft a sentence or two to explain the specifics of how you see that issue?
[11:14am] hejko: Here http://strategy.wikimedia.org/wiki/Template:Recommendations it says that we should start early drafting recommendations.
[11:14am] Amgine: http://strategy.wikimedia.org/wiki/Task_force/Recommendations/Offline_2 and http://strategy.wikimedia.org/wiki/Task_force/Recommendations/Offline_1
[11:15am] Philippe|Wiki: I would also say that I'm not terribly concerned about this group coming to recommendations. I like the idea of coming to convergence, but I think you folks are doing fine - don't worry about the "early" statement.
[11:15am] hejko: Ah, good.
[11:15am] hejko: Offline 2 is not yet linked in the side-box
[11:15am] Amgine: oops.
[11:15am] • Philippe|Wiki fixes that
[11:16am] aude left the chat room. ("ChatZilla 0.9.85 :[Firefox 3.5.5/20091102152451]")
[11:16am] walkerma: I should mention - I'm in my last week of classes right now, with final exams next week and grading after that. I have two deadlines of December 21st; after that, I will have MUCH more time to focus on this work
[11:16am] Amgine: btw, I was e-mailing with Hampton Catlin, who is doing a lot of the development at http://m.wikipedia.org
[11:17am] hejko: Philippe|Wiki: any news from Sarah regarding a contact from the mobile phone industry?
[11:17am] Amgine: He had a suggestion that really seemed to feed into the Zim concept: an API which serves html-parsed content akin to the current API which serves wikisyntax.
[11:17am] Philippe|Wiki: hejko: I'll check with her today; I haven't heard anything
[11:18am] hejko: Philippe|Wiki: what about the "Please have recommendations complete by January 12, 2010." statement. Is this a fixed deadline to have our proposals considered by the WMF or is it more relaxed?
[11:19am] hejko: Because I think not much will happen over the next 2-3 weeks.
[11:19am] Philippe|Wiki: Both. It's the date by which we really need something prepared to take to the Board, but this project will be a continual evolution after that. The Board will evaluate and make recommendations and tweak and massage these ideas and will, I have no doubt, ask you to continue to work.
[11:20am] Amgine: I'm going to be available and working on this until Solstice eve, after that will hopefully be barely online until near Greg New Year's
[11:20am] walkerma: I've been in touch with WikiPock and Tomasz
[11:20am] walkerma: No real info yet
[11:20am] Amgine: Excellent!
[11:21am] Amgine: Not so excellent, but not unexpected.
[11:22am] Amgine: Should I post the e-mails I've been getting, or briefs about the e-mail threads?
[11:23am] walkerma: Amgine: I tend to treat emails as personal information, so I don't like to put them out there in public, unless the other person specifically says it's OK to do so. But some briefs would be very useful
[11:24am] Amgine: <nods>
[11:24am] hejko: Amgine: and please add some context
[11:24am] Philippe|Wiki: Likewise, I would lean toward summarizing them
[11:24am] Amgine: Will do, though I'm probably terrible at it.
[11:25am] Amgine: Okay, synopsis of my work this past week: Did research on getting copies of the full UDC (68k categories) - approximately $430 US, and no copies currently available in my town.
[11:26am] Amgine: There is a single copy held at the central public library as a reference tool.
[11:27am] Amgine: The editor I've been communicating with at UDC doesn't think Wikipedia would need the full 68k categories... but they currently have about 840 000 categories.
[11:28am] Philippe|Wiki: =:0X
[11:28am] Philippe|Wiki: That's a lot.
[11:28am] Amgine: Next topic is data dumps information: I have a couple people working with en.WP dumps, one of them is trying to parse a pair of en.WP infoboxes for metadata about languages.
[11:29am] Amgine: Apparently it's rather more complex than I'd hoped, but it is doable.
[11:29am] werdna: have you heard of dbpedia?
[11:29am] walkerma: Amgine: We have someone in chemistry, User:Beetstra, who is making our Chemboxes machine-readable
[11:30am] walkerma: werdna: Yes, I love dbpedia! I often include it in my presentations
[11:30am] Philippe|Wiki: /me notes that Amgine is doing a ton of work for someone who didn't want to join the task force. There's gotta be a barnstar for that....
[11:30am] Amgine: werdna: Yes, this andrew (yes, another andrew, from Aus too) is working with dbpedia.
[11:31am] Amgine: I spoke with several people at Semantic Mediawiki: if SMW is included, all dumps will require an additional level of parsing.
[11:31am] Amgine: So, for this Task Force SMW represents a harm.
[11:32am] Amgine: I'm still working through some essays I was assigned about context tagging, and some of it looks like it would be very useful if it were included at the save - by the contributors rather than by parsers.
[11:34am] Amgine: I spoke with a couple of outside projects using WMF data, one of them avoids using dumps entirely - relying on IRC bot-reported revisions to update their live mirrors of all WMF projects.
[11:34am] Amgine: Their data is not stored in MW databases, so they can do their own styles of post-parsing. This is the Wikawix group.
[11:36am] Amgine: Let's see... then I was talking with the mobile wikipedia person, who suggests a separate API for html-parsed output. That's a new conversation so I don't have a real good grasp of all the concepts.
[11:36am] Amgine: Done.
[11:36am] walkerma: Thanks a lot for all your work!
[11:37am] hejko: Amgine: yes, great work.
[11:38am] walkerma: I'm hoping that Tomasz will join us on ITC at some point, so we can discuss dumps, and hear the information from the horse's mouth!
[11:38am] hejko: But did one of them give a hint on what they'd consider a helpful improvement to the way the data is provided?
[11:38am] walkerma: Sorry, I meant on IRC
[11:39am] Amgine: hejko: All of them have said the current dumps are shallow, and need further contextual tags.
[11:39am] Amgine: The general agreements are in that very brief list I posted at Offline 1
[11:40am] Amgine: Of those working with dumps, all want templates expanded, preferably mined for metadata.
[11:40am] Amgine: Infoboxes are a big thing everyone mentions.
[11:41am] Amgine: Here's an example Catlin would like to see - in an html output - stand by for flood:
[11:41am] Philippe|Wiki: Amgine: I know there are issues with tables rendering correctly in the WikiReader... is that a factor in this as well?
[11:41am] Amgine: <article rev="123222" name="Haml">
[11:41am] Amgine: <infobox>
[11:41am] Amgine: htmlhtmlhtml
[11:41am] Amgine: </infobox>
[11:41am] Amgine: <languages>
[11:42am] Amgine: <language href="http://es.wikipedia.org/wiki/Håml" />
[11:42am] Amgine: </languages>
[11:42am] Amgine: <sections>
[11:42am] Amgine:
[11:42am] Amgine: </sections>
[11:42am] Amgine: <images>
[11:42am] Amgine: etcetcetc
[11:42am] Amgine: </images>
[11:42am] Amgine: </article>
[11:42am] Amgine: Philippe|Wiki: probably, since that's a form of wiki syntax that has to be parsed.
[11:42am] Philippe|Wiki: Yeah, it makes it fun for things like "List of XYZ..." when the tables don't show
[11:43am] walkerma: Amgine: At what level do we most need to get metadata out - do we need information about the article TOPIC, or do we need to get more data from the article CONTENT? Or do you think both are critical?
[11:44am] hejko: Amgine: Ok, so first of all they want easily parsable XHTML and further as much semantic annotation as possible, right?
[11:44am] Amgine: Both are critical, but the primary complaint at the moment is that everything must be parsed as wiki syntax, and includes no context tags to make it easy to parse.
[11:44am] Amgine: Yes Hejko.
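
To illustrate why consumers of the data ask for context tags, here is a minimal sketch of how a document shaped like the example Amgine pasted above could be read with Python's standard library. The sample document is a cleaned-up stand-in, not output from any real Wikimedia API: the element names follow the pasted example, while the placeholder infobox text and the `summarize_article` helper are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample modeled on the structure pasted in the chat.
# No real API is known to return this format; the content is placeholder text.
SAMPLE = """\
<article rev="123222" name="Haml">
  <infobox>placeholder infobox html</infobox>
  <languages>
    <language href="http://es.wikipedia.org/wiki/Haml" />
  </languages>
  <sections></sections>
  <images></images>
</article>
"""


def summarize_article(xml_text):
    """Pull a few fields out of the tagged article without parsing wiki syntax."""
    root = ET.fromstring(xml_text)
    return {
        "name": root.get("name"),
        "rev": int(root.get("rev")),
        # Interlanguage links are plain attributes on <language> elements.
        "language_links": [
            el.get("href") for el in root.findall("./languages/language")
        ],
        # The infobox is its own element, so no template parsing is needed.
        "has_infobox": root.find("infobox") is not None,
    }


summary = summarize_article(SAMPLE)
print(summary)
```

With tags like these, the infobox and interlanguage links can be located by element name alone, which is the kind of easy parsing the people working with dumps are asking for.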
[11:45am] walkerma: So Tom Bylander's XML version of our 0.7 release is looking more and more important!
[11:45am] hejko: Amgine: So I think this could lead to a very precise proposal.
[11:45am] Amgine: I should point out there's an extension to display interwiki pages side-by-side that allows the use of div tags to be proxy context tags.
[11:46am] Amgine: It's getting some attention on Wiktionary because it's in use on Wikisource.
[11:46am] Amgine: Hejko: It's not a specification by any means, but yes.
[11:49am] hejko: Amgine: As I said before I don't think that we need to propose implementation details but rather should generate proposals that are backed by some interviews or statistics. If the WMF considers this important they will probably set up another project to look at the nifty details.
[11:49am] randmontoya left the chat room. (Remote closed the connection)
[11:49am] randmontoya joined the chat room.
[11:49am] Amgine: I have also been talking with people on sister projects about what they'd like to see in the dumps. Oddly, I can't find anyone working on wikipedia who has any opinions about the dumps.
[11:50am] Amgine: <usually they're pretty opinionated about most things...>
[11:50am] Philippe|Wiki: Amgine, have you talked to Tomasz in our office?
[11:50am] flonight joined the chat room.
[11:50am] Philippe|Wiki: He's our XML dump god
[11:51am] walkerma: hejko: I think the boundary between "recommendation" and "implementation" is quite blurred - at least for me, someone who is technically challenged.
[11:51am] hejko: Philippe|Wiki: But I think this is more about whether they are generated on regular basis at all.
[11:51am] hejko: ... which was a major pain in the past.
[11:51am] Philippe|Wiki: I think that Tomasz probably has some fairly strong opinions on content too...
[11:52am] walkerma: Especially as recommendations and implementation are so interdependent
[11:52am] Philippe|Wiki: tfinc(at)wikimedia.org
[11:52am] Amgine: Yes, I tried to get him to make comments. He basically said everyone he works with is happy with the dumps, and that any dump solution is too contextual to make generalizations from.
[11:52am] Philippe|Wiki: Ah, okay
[11:52am] hejko: walkerma: yes, same for me. but I want to avoid that we get lost in details and fail to provide good arguments about why we propose something.
[11:54am] Amgine: Unfortunately, I destroyed my logs when I moved OSes last... <grumbles>
[11:55am] Philippe|Wiki: hejko, yes, please I'd like to get good arguments about proposals, and then focus on a migration path after.
[11:55am] Philippe|Wiki: But we should have some estimate of level of effort
[11:56am] Amgine: We must be near the end of the meeting. My coffeepot is empty.
[11:56am] walkerma: I think these issues - however technical - do seem to be a pretty major stumbling block for preparing offline releases, so we do need to cover them in some depth. But you're right, we will need to bring things together at some point into a coherent recommendation
[11:57am] Amgine: <nods> We have about a month plus change.
[11:57am] walkerma: sj| Are you around? Before we break up, would you be able to tell us briefly about your OLPC work?
[11:57am] walkerma: Amgine: When is the "Greg New Year" you mentioned?
[11:58am] Amgine: Gregorian New Year is Jan 1.
[11:58am] hejko: Amgine: Do you think it would be feasible to set up a questionnaire with some statements based on our current view of the situation with dumps and have them answered by the known offline projects? The answers (and derived stats) could be used to back our proposal.
[11:58am] walkerma: Aha! I wondered what the Greg was!
[11:59am] walkerma: hejko: Great idea.
[11:59am] Amgine: Hejko: I can try to build a quick instrument somewhere. Maybe the talk page of Offline 1?
[12:00pm] Amgine: Do we want open-ended questions, or answered questions?
[12:00pm] Amgine: (that is, qualitative or quantitative data collection?)
[12:00pm] hejko: I'd prefer quantitative as we already identified problems and possible solutions.
[12:00pm] Philippe|Wiki: (mix of both? Quant number with a free form entry for "other stuff"?)
[12:01pm] Amgine: Not good, Philippe|Wiki, as they analyse very differently.
[12:01pm] Amgine: But we can add some "tell us more" things.
[12:02pm] hejko: I'd like to collect info on the main pains. An invitation to provide additional comments and recommendations should of course also be available at the end.
[12:02pm] Philippe|Wiki: Well, we're not going for statistically valid here, correct? We're looking for working information. Let's not get too bogged down in stats.
[12:02pm] Philippe|Wiki: But it's your survey and you guys know better than I what you want to get out of it
[12:03pm] walkerma: I would suggest starting with some general questions to set the context, then a few specific ones like the format for dumps, and what metadata are needed
[12:03pm] walkerma: But it should be fairly short & sweet
[12:03pm] Amgine: Heh. I do health surveys on the side, Philippe|Wiki.
[12:05pm] Amgine: Mmm... I do actually have a development platform of limesurvey on the server downstairs... I spose I could just hack something up on that.
[12:05pm] brianmc: yah. My health survey was "living: condition terminal"
[12:06pm] Amgine: brianmc: you don't want to know what the current study survey is...
[12:06pm] brianmc: heh
[12:06pm] Philippe|Wiki: Amgine: We've got access to a LimeSurvey account if necessary
[12:06pm] Philippe|Wiki: We're using it for a survey from another task force
[12:07pm] Amgine: I thought it was for commons? <still grumbles about the questions there>
[12:07pm] hejko: I think if we can conclude with: 90% of the interviewees responded to the question "Would real XML dumps significantly improve your work?" with yes, this would strongly back our proposal.
[12:07pm] Philippe|Wiki: Amgine: not that one
[12:07pm] Philippe|Wiki: Amgine: It's a community health survey
[12:07pm] Amgine: Ah...
[12:07pm] walkerma: OK, is there anything else we want to cover today?
[12:08pm] Amgine: Okay hejko. I could build a survey to ensure that outcome, but I'd rather build an honest unbiased one.
[12:08pm] Amgine: hejko's topic.
[12:08pm] hejko:
[12:08pm] aude-wiki joined the chat room.
[12:09pm] walkerma: Amgine: http://www.youtube.com/watch?v=2yhN1IDLQjo
[12:10pm] walkerma: Is there anything else we need to cover today? Should we cover "What content" next Tuesday at 1700h UTC?
[12:10pm] hejko: That was just an example. Maybe we can do two surveys. A qualitative one now and, once we have refined the results, another that features easy to comprehend numbers. This is just meant as a favor to the WMF to help them with their decision without having a deeper understanding of the issue.
[12:10pm] Amgine: Can't do youtube on this platform, walkerma. If you send it to me as an e-mail I can try it later.
[12:11pm] walkerma: Just a comedy piece, from Yes Minister, titled "Opinion Polls: Getting the results you want"
[12:11pm] Amgine: <nods> Good point Hejko.
[12:11pm] Amgine: <grins @ walkerma>
[12:13pm] walkerma: BozMo has mentioned that 1700h is a bad time for him; should we perhaps meet at 1800h UTC to make it easier for him?
[12:13pm] Philippe|Wiki: OK by me
[12:14pm] walkerma: We don't have any Asians in this group, so I don't think it should be a problem for timezones
[12:14pm] walkerma: (is that correct?)
[12:16pm] • Philippe|Wiki has switched it on his schedule for next week
[12:17pm] Amgine: Works for me.
[12:19pm] walkerma: OK, SJ| seems to have gone quiet, so let's finish off here. I hope that we will be able to "show off" the new en:WP bot for you soon - this is really amazing, IMHO, and it shows what can be done for assessing & selecting. Not sure if the ideas will translate to other projects
[12:19pm] walkerma: Anything else?
[12:20pm] Philippe|Wiki: If nobody's got anything, I'll be happy to post the log to the Offline/IRC page :0)
[12:20pm] walkerma: OK, and we'll see everyone on Tuesday at 1800h UTC. If there's a change, I'll post it and email people.
[12:21pm] hejko: ok
[12:21pm] Philippe|Wiki: Thanks, folks
[12:21pm] walkerma: I want to check times with BozMo and Wizzy
[12:21pm] walkerma: Thanks everyone!