Proposal talk:Assessment content

From Strategic Planning

I wrote some strategy notes but posted them in the discussion thread instead of here. I didn't select the discussion format for any particular reason; I'm just still getting acquainted with this system. I think "discussion" would come in handy if there were more activity, but we seem to be fine so far. I can't think of a way to correct this error, though. Mbrad 02:29, 14 February 2010 (UTC)

It's not an error, just an experimental choice. Anyone can still make regular edits to this page, but I have no idea how watchlists will be affected. The "Liquid Threads" extension is new; we're just trying it out.
If you or anyone wants to help out, please join the #mediawiki channel on freenode IRC, then vote for this feature request and add yourself to its cc: list. Then install the Quiz extension on your local copy of MediaWiki and start with a GIFT format translator to the existing Quiz question format. Once someone submits an importer and exporter as a patch to the bug, we can start working on new database tables for each question in an adaptive testing system and an accuracy review system. Does that sound good? If so, please get on IRC and vote for and add yourself to the cc list of the bug, and talk with the experienced MediaWiki programmers about how they would approach whichever stage of the problem you're working on. Thank you. 18:43, 15 February 2010 (UTC)


So am I to understand that the idea would be to create article/subject/category based quizzes that people can take online? I think that's a great idea, but I'm not sure if that's what's being proposed here. --Bodnotbod 16:58, 18 August 2009 (UTC)

Yes, thanks, that is one of the things such a database would be able to provide, with the appropriate presentation interpreter. 05:28, 19 August 2009 (UTC)
This is sort of a fascinating idea. Off to rate it... :) -- Philippe 05:30, 19 August 2009 (UTC)
I think the summary should be modified. The value of adopting (or translating) a standard goes well beyond achieving charitable status in the UK. Mbrad 21:26, 7 February 2010 (UTC)
Thanks, yes, I revised it. Thanks again for your help. I hope I understand your suggestions because they all seem very good. 05:59, 9 February 2010 (UTC)

How we can do better than previous projects

Please familiarize yourself with these earlier MediaWiki extensions and a similar GPLed system:

Does either of them support using the default text editor to create questions and quizzes which don't have a multimedia component? Does either of them support using the default text editor to create and change questions and quizzes which do involve a multimedia component? 21:11, 23 August 2009 (UTC)

How to get started?

Assuming that some portion of [1] or other source(s) of funding is approved for this project, and/or a bunch of volunteers sign up for the task force, what is the right way to get started?

  • microformat definition, culling the good stuff from the existing QTI specs
  • input editor for assessment -- if the microformat is easy enough, we can just use the ordinary MediaWiki edit box most of the time, but not for questions with graphics or audio output or input components
    • What features are needed to support
    • can we make input editor(s) which support turning a set of assessment questions into a game?
      • based on the learner's score?
      • based on "choose your own adventure" style interaction?
  • output player -- how do we present the assessment items (questions)
    • how do we keep track of the results?
    • how do we decide which assessment question to offer next?
  • how can we make systems for people other than the question authors to validate new assessment items 09:02, 23 August 2009 (UTC)

Normalization of assessment items (questions) in database

Is there an RDBMS analysis of these data types? If so, where is it? Can we describe it in wikitables?

I can't find one. Here are the question states and fields in outline format:

assessment item state

  • incomplete
    • ambiguous
    • ungrammatical
    • non-sequitur
    • implies false assumption
    • circular
    • dependent on future circumstance or decision
  • open
    • hypothetical (also open -- but less so?)
    • answered
      • reviewed -- note: the fields necessary for this step don't yet appear below
        • complete (passed review)
          • asked
            • scored
              • challenged
                • assessed
        • rejected (failed review or assessment)
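
One way to read the outline above is as a small state machine. The following is a minimal sketch under that assumption: the transition table is inferred from the nesting of the outline and is not specified anywhere in this discussion, the sub-reasons for "incomplete" and the "hypothetical" variant of "open" are left out, and the names ItemState, ALLOWED and advance are hypothetical.

```python
from enum import Enum


class ItemState(Enum):
    INCOMPLETE = "incomplete"
    OPEN = "open"
    ANSWERED = "answered"
    REVIEWED = "reviewed"
    COMPLETE = "complete"
    REJECTED = "rejected"
    ASKED = "asked"
    SCORED = "scored"
    CHALLENGED = "challenged"
    ASSESSED = "assessed"


# Assumed transitions, read off the nesting of the outline; not a settled design.
ALLOWED = {
    ItemState.INCOMPLETE: {ItemState.OPEN},
    ItemState.OPEN: {ItemState.ANSWERED},
    ItemState.ANSWERED: {ItemState.REVIEWED},
    ItemState.REVIEWED: {ItemState.COMPLETE, ItemState.REJECTED},
    ItemState.COMPLETE: {ItemState.ASKED},
    ItemState.ASKED: {ItemState.SCORED},
    ItemState.SCORED: {ItemState.CHALLENGED},
    ItemState.CHALLENGED: {ItemState.ASSESSED},
    # "rejected (failed review or assessment)": assessment can also reject.
    ItemState.ASSESSED: {ItemState.REJECTED},
    ItemState.REJECTED: set(),
}


def advance(current: ItemState, target: ItemState) -> ItemState:
    """Return target if the move is allowed, otherwise raise ValueError."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target
```

Keeping the table as data rather than code would make it easy to change once the review fields are settled.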

assessment item fields

  • question text
    • suggested answers (e.g., "true, false")
      • context of a blank (following the question text and the blank)
        • optional default for blank
    • correct answer
      • list of correct answers (may have zero elements);
      • OR a pattern describing the list of correct answers
        • can be a (likely phoneme- or word-composite) hidden markov model representing pronunciation
    • summary statistics
      • list of answers given; for each:
        • answer
        • confidence
          • self-reported
          • derived
        • whether help was requested
        • score
        • user id
        • timestamps
          • presentation
          • answer
        • whether score was challenged -- note: this partially supports accuracy review
      • average score
    • relations to other questions
      • set of questions which help answer this question
      • set of questions which answering this question helps answer -- Can some of the directed graph of which questions assist in the answering of other questions be derived from categorization or must it be stored completely explicitly?
      • set of relations to other questions by relative difficulty; for each:
        • question
        • more or less difficult, and how much
    • optional
      • general help
      • specific help
      • hints explaining why wrong answers are wrong ('surprise result')

That should be enough to normalize from. Those do not include the fields necessary to support review per Proposal:Develop systems for accuracy review. Those fields need to be added because the list of elements including timestamps presents a 6NF-level problem. 17:59, 10 September 2009 (UTC)
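
As a starting point for normalization, the field outline above can be sketched as three record types, with one row per answer given instead of a list nested inside the question. This is only an illustration: every class and attribute name is an assumption, the review fields are omitted, and the average score is computed on demand, not stored.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class AnswerRecord:
    """One answer given to one item by one user."""
    user_id: int
    answer: str
    score: float
    presented_at: datetime
    answered_at: datetime
    self_reported_confidence: Optional[float] = None
    derived_confidence: Optional[float] = None
    help_requested: bool = False
    score_challenged: bool = False


@dataclass
class DifficultyRelation:
    """Relative difficulty of another item compared with this one."""
    other_item_id: int
    difference: float  # assumed convention: positive means the other item is harder


@dataclass
class AssessmentItem:
    item_id: int
    question_text: str
    suggested_answers: List[str] = field(default_factory=list)
    correct_answers: List[str] = field(default_factory=list)  # may be empty
    correct_pattern: Optional[str] = None  # alternative to an explicit list
    general_help: Optional[str] = None
    helped_by: List[int] = field(default_factory=list)  # ids of helping items
    difficulty_relations: List[DifficultyRelation] = field(default_factory=list)
    answers: List[AnswerRecord] = field(default_factory=list)

    def average_score(self) -> Optional[float]:
        """Derived summary statistic; None until someone has answered."""
        if not self.answers:
            return None
        return sum(a.score for a in self.answers) / len(self.answers)
```

In a relational schema each class would become its own table keyed by item id, which keeps the timestamped answer rows out of the question table.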

I have never done a sixth normal form normalization before, but this text from the other proposal and the question about at [2] should help:
a selection of text or a URL (to a permanent article version or diff, etc.) could be an item for which multiple, randomly-selected reviewers chosen for their stated familiarity with a topic area would be selected. Those reviewers could be shown that text or URL (perhaps as part of a list of such items) in a securely authenticated and captcha-ed channel. They would be asked to vote on the accuracy of the item, and have the opportunity to fully explain their votes in comments. If a statistically significant number of votes are in agreement, then the item could be approved as to veracity or rejected. When the votes are not in agreement, then additional voter(s) would perform a tie-breaking function. Each voter's track record in terms of agreement with other voters could be recorded secretly and used to (1) weight their vote to nullify defective voters, and/or (2) select whether the voter is allowed to become a tie-breaker. 20:39, 12 September 2009 (UTC)
So we need:
  • items to review (questions, diffs, or permanent links)
    • topic(s) of item
    • for each, a list of:
      • votes
      • comments
      • user id
  • reviewers (indexed by user id)
    • reviewers' stated familiarity with various topics (opt-in, derived from categories, or otherwise?)
    • authorization tokens from secure log in
    • authentication tokens from captcha responses
    • votes
    • comments on votes
    • track record of agreement with other voters
      • on a per-topic basis(?)
  • topics
    • some measure of controversiality or other description of voter agreement within the topic and resulting statistical significance of a given number of votes 02:58, 14 September 2009 (UTC)
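
The voting step could look roughly like the sketch below. It is hedged in several ways: a fixed agreement threshold and a minimum total weight stand in for the "statistically significant" test, which has not been defined; the weights are assumed to come from each reviewer's track record; and the function names tally and agreement_rate are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# (user id, True if the reviewer judged the item accurate)
Vote = Tuple[int, bool]


def tally(votes: List[Vote], weights: Dict[int, float],
          threshold: float = 0.75, min_weight: float = 3.0) -> Optional[bool]:
    """Return True (approved), False (rejected) or None (needs a tie-breaker).

    Reviewers without a recorded weight count as 1.0. The threshold and
    min_weight defaults are placeholders, not a significance test.
    """
    total = sum(weights.get(uid, 1.0) for uid, _ in votes)
    if total < min_weight:
        return None
    in_favor = sum(weights.get(uid, 1.0) for uid, accurate in votes if accurate)
    share = in_favor / total
    if share >= threshold:
        return True
    if share <= 1.0 - threshold:
        return False
    return None


def agreement_rate(history: List[Tuple[bool, bool]]) -> float:
    """Fraction of (vote, final outcome) pairs that matched; 1.0 for newcomers."""
    if not history:
        return 1.0
    return sum(1 for vote, outcome in history if vote == outcome) / len(history)
```

A None result is the case where additional voters would be drawn as tie-breakers, and agreement_rate is one simple candidate for the per-reviewer weight.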

There are more ideas for sequencing and aggregation in the QTI spec and this schema based on these rules from del Soldato, T. & du Boulay, B. (1995) "Implementation of Motivational Tactics in Tutoring Systems," Journal of Artificial Intelligence in Education, 6(4): 337-78. 00:17, 25 August 2009 (UTC)

Special Assessment Fields

Assuming that this assessment content will in large part be used as Wikiversity lesson content, we should take this opportunity to consider, as best we can, the special needs of online learners in this context. I've addressed a couple of subjects below, but there may be many more structural ideas that would affect this schema. Additionally, the "scholarship of assessment" is currently budding in academia as a central subject, including the emergence of full-time positions dedicated to assessment. Since assessment is relatively new as a subject of scholarship, interest in altogether new special assessment fields may evolve to support valuable new theories. For this reason, and especially since we are in the new arena of electronic assessment, some effort should be made to survey the prior scholarship on this subject, and at the very least this "Assessment Content Proposal" should be flexible. I haven't studied XML in a while, but I believe that means changing the schema from time to time while preserving QTI compatibility. I don't know much about QTI, schemas, or MediaWiki's software.

Item Response Theory Variables

Teacher-directed education is typically constrained by time. A lesson must be covered within a quarter or semester; a curriculum must be completed within x total credit hours. Our learners, however, are not constrained in this way. For the Wikiversity learner the constraints are instead potentially large gaps in time between segments of lessons, caused by numerous possible factors, as well as irregularity caused by random entry into a new subject. This creates unique opportunities that, provided we do not fail to produce quality content, present an exciting new global autodidactic potential.

One effective tool would be a subscription to periodic assessment of a subject that is on hold, employing the spacing effect via email or other "pushed" channels; alternatively, the assessment can simply remain passively available when the learner returns to the Wikiversity domain. Developing this kind of tool also opens up new possibilities for retaining material from traditional instructor-led courses completed long ago, which makes it useful for all of education.

However, that type of tool doesn't necessitate any new special assessment fields (other than perhaps the learning goals described below). What it does is provide a context for a computerized adaptive testing (CAT) tool. In short, CAT, as used in the GRE, adapts to the test taker by increasing or decreasing the difficulty of subsequent questions in response to the test taker's previous answer. The value of this type of test is in placing a learner in a lesson, whether returning to a subject after a long time or looking for a point of entry into a new field of study with a specific learning goal in mind (e.g., wanting to write a multilateration function). CAT systems often employ Item Response Theory to rank questions. This theory is most commonly used with three variables that describe a question, variables that would require fields in the QTI:

  • the item discrimination parameter (a sub i in the formulae)
  • the item difficulty parameter (b sub i)
  • the item guessing parameter (c sub i)
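
For reference, these three parameters are the ones in the standard three-parameter logistic (3PL) model, which gives the probability that a learner of ability θ answers item i correctly. Some texts include an extra scaling constant of about 1.7 in the exponent; it is omitted here.

```latex
P_i(\theta) = c_i + (1 - c_i)\,\frac{1}{1 + e^{-a_i(\theta - b_i)}}
```

With c_i = 0 and θ = b_i the probability is exactly 1/2, which is why b_i is read as the item's difficulty.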

The value of this type of test is that it provides navigation to a learner who may not know where they need to navigate to. Additionally, there are as yet no web-based CAT testing services. This might allow Wikiversity to become the focal point for content creation by agencies that are looking for just such a tool.
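
To make the adaptive step concrete, here is a minimal sketch of how a CAT player might pick the next question from the a, b, c parameters listed above. It assumes the usual three-parameter logistic form and chooses the unasked item with the highest Fisher information at the current ability estimate. The fixed-step ability update is a crude placeholder for a proper maximum-likelihood estimate, and all function names are hypothetical.

```python
import math
from typing import Dict, Iterable, Optional, Tuple

# (a, b, c) = discrimination, difficulty, guessing
Params = Tuple[float, float, float]


def p_correct(theta: float, a: float, b: float, c: float) -> float:
    """Probability of a correct answer under the 3PL model."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def information(theta: float, a: float, b: float, c: float) -> float:
    """Fisher information of a 3PL item at ability theta (requires c < 1)."""
    p = p_correct(theta, a, b, c)
    return a * a * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2


def next_item(theta: float, bank: Dict[str, Params],
              asked: Iterable[str]) -> Optional[str]:
    """Pick the unasked item that is most informative at theta, if any."""
    seen = set(asked)
    candidates = [key for key in bank if key not in seen]
    if not candidates:
        return None
    return max(candidates, key=lambda key: information(theta, *bank[key]))


def update_theta(theta: float, correct: bool, step: float = 0.5) -> float:
    """Placeholder update: move the estimate up or down by a fixed step."""
    return theta + step if correct else theta - step
```

For example, with a bank of three items of difficulty -2, 0 and 2, equal discrimination and no guessing, a learner with an estimate of 0 would be offered the middle item first.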

Just as a note I understand that the suggestion appears as though it is a complex solution but consider that among all of the WMF projects, Wikiversity has the highest learning curve, and this curve is specifically what Wikiversity content is about serving in the first place. I think it is highly appropriate for a Wikiversitarian, more than anybody else, to reach for the stars.

Learning Goals

Currently in academia there is a lot of recognition of the problem of linking "learning goals" to assessment. Often this means linking "higher order thinking" goals to assessment. As this suggests, the term "learning goal" may represent an item on Bloom's Taxonomy, but it may also represent a section of a lesson (such as employing L'Hôpital's rule on difficult limits). Whatever the case may be, a single "learning goal" field with special syntax could incorporate both complementary interpretations of "learning goal" in one place.

This type of field will make it easy to map a collection of assessment content to a lesson, while being free to exist in broader contexts (outside of the lesson it is embedded in), for example in:

  • the CAT tests I mentioned above
  • simpler entry assessment tests
  • cumulative subject tests
  • grouping assessment content for subject based trivia games

Aside from organizing assessment content, "learning goals" in and of themselves can provide even more useful navigation than what already exists on Wikiversity. In my opinion it is an important organizational category.


Some proposals will have massive impact on end-users, including non-editors. Some will have minimal impact. What will be the impact of this proposal on our end-users? -- Philippe 00:05, 3 September 2009 (UTC)

If they wanted to use assessment content for interactive instruction, it could be quite substantial. 16:03, 3 September 2009 (UTC)
In my opinion the impact of this proposal is quite high. Assessment is a critical content type in any form of learning. Wikiversity would be the primary beneficiary, but as the wikisister with the highest learning curve, and the one that addresses the highest learning curve, when enough editors have come over the hump this project may ultimately have the highest impact on humanity out of all of WMF's projects. Additionally, assessment itself is a new field of specialized study in academia, and any contributions WMF and its community could make in this area could also greatly serve its mission to expand all knowledge. Mbrad 21:22, 7 February 2010 (UTC)

GIFT picoformat needs to be extended

So as not to re-invent the wheel, Moodle's GIFT picoformat for quizzes looks really good for our purposes. However, in order to work with the del Soldato/du Boulay "Motivational Tactics" and the accuracy review cited above, each question would also need to be able to have additional information specified from "assessment item fields" above. I put a summary of the GIFT picoformat at and requirements for those extensions at followed by pending extension choices being discussed at 06:00, 9 February 2010 (UTC)
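
To give a feel for the format, here is a minimal sketch of a reader for a small subset of GIFT (one question per string, multiple choice with = and ~ markers, or true/false) and a writer for what I understand to be the Quiz extension's single-choice syntax. Feedback, answer weights, escapes and the other GIFT question types are not handled, the Quiz output should be checked against the extension's documentation, and the names GiftItem, parse_gift and to_quiz are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Optional ::title::, then the question text, then the {answer body}.
_QUESTION = re.compile(
    r"^(?:::(?P<title>.*?)::)?\s*(?P<text>.*?)\s*\{(?P<body>.*)\}\s*$", re.S)
# "=" marks a correct choice and "~" a wrong one.
_CHOICE = re.compile(r"([=~])\s*([^=~]+)")


@dataclass
class GiftItem:
    title: Optional[str]
    text: str
    choices: List[Tuple[str, bool]]  # (answer text, is it correct?)


def parse_gift(source: str) -> GiftItem:
    """Parse one multiple-choice or true/false question (subset of GIFT)."""
    match = _QUESTION.match(source.strip())
    if match is None:
        raise ValueError("not a single-question GIFT item")
    body = match.group("body").strip()
    if body.upper() in ("T", "TRUE", "F", "FALSE"):
        truth = body.upper() in ("T", "TRUE")
        choices = [("True", truth), ("False", not truth)]
    else:
        choices = [(text.strip(), mark == "=")
                   for mark, text in _CHOICE.findall(body)]
    return GiftItem(match.group("title"), match.group("text"), choices)


def to_quiz(item: GiftItem) -> str:
    """Render as a single-choice question in (assumed) Quiz extension syntax."""
    lines = ["<quiz>", "{" + item.text, '|type="()"}']
    for text, correct in item.choices:
        lines.append(("+ " if correct else "- ") + text)
    lines.append("</quiz>")
    return "\n".join(lines)
```

The extra per-question information from "assessment item fields" has no place in either syntax as sketched here, which is the gap the proposed GIFT extensions would need to fill.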

LiquidThreads discussion


Thread title | Replies | Last modified
Before April 30th | 10 | 00:21, 20 February 2010

Before April 30th

April 30th is the deadline, as I understand it, when WMF will have established their 5 year plan. If it is alright with you I'd like to work on this a bit.

I think it needs a few things, some of them simple and some not as simple.

For starters, we should be explicit about what we are asking for. Do we want developers to spend time on this, or do we just want developer support? Personally, as a student, I am looking for a Google Summer of Code project for the summer, and this sounds about perfect for me in terms of size and difficulty. If they agreed with the other aspects of this proposal, they would only need to provide mentorship.

Another aspect of this proposal that might need work is persuasion. We are asking WMF to allow us to adopt the proprietary standard of Moodle, and we should continue to generate arguments for this. In parts of WV there are pages that appear to have the competitive attitude that "We ARE not Moodle!". Whether or not this is the case, it is not uncommon for different institutions to form "silos", and we should take into account any resistance of that sort we may silently encounter. This is a problem that currently prevails across academic institutions, so why should WV be any different? Here is a video of a lecture at MIT entitled Effective Examples of Educational Technology and Priorities for Future Investment. The video is just over 1 hour and 20 minutes long, but the relevant discussion occurs within the first 30 minutes. It is the best citation I have at the moment. Hopefully I can follow the speaker's publications to find text citations, or find other places where this discussion is taking place. One of my planned activities before the deadline is to write an essay that provides this information and a new analysis of WV and Moodle as complementary technologies and communities.

I have more thoughts but if it's alright I'll share them a bit later. Also I would like to help with the proposal and I would just like to make sure that you are comfortable with me editing the proposal page. We'll be able to communicate and we'll have the history so I think it can work out ok.

02:18, 14 February 2010

Also, does anyone have any idea why this talk page cannot be edited to include new sections?

02:25, 14 February 2010

Never mind, I got it. It's at the bottom.

02:26, 14 February 2010

Please do work on this! I have very little time for it and can only do it in small bursts while I'm waiting on other work to get done. Have you installed MediaWiki and become comfortable with PHP? The first step might be to write a converter in any language from GIFT into traditional WV Quiz "tables" format. See if the other MediaWiki programmers on the #mediawiki IRC channel will let you put that translator up on the toolserver; if they won't, let me know here and I can help you host it too.

GIFT isn't proprietary in the traditional sense because Moodle is all GPLed. You can use any of its supported interoperability formats with any other tools, e.g. xml structure editors, etc. I've already made arrangements to share all of the underlying proposed technology with Moodle's Quiz programmer.

07:30, 14 February 2010

Great! I have both of those requirements. I have MediaWiki installed already and I am comfortable with PHP. I have a Sun Java programmer Cert and I have found that most languages are extremely similar. In any case I have completed an accredited course on PHP to make sure.

I shouldn't have used the word proprietary. I meant to say only that it is "identified" with the Moodle infrastructure and community domain and therefore may encounter "silo" thinking resistance. I think I've just read too many older sections of WV that seem to want the project to be evangelically limited in scope.

So you would suggest a proof-of-concept converter? I guess that it may be a fairly small project by most standards. If you are considering sharing this with a Moodle programmer, then you must be thinking of a two-way converter, which makes sense. We are also lacking an export function for WV quizzes in that case.

Also do you have a user account on WMF? I see various 99.x.x.x IP addresses around on this proposal's talk page and I can only presume that they are all you operating from a DHCP internet connection. Is this true?

19:53, 14 February 2010

I've asked a couple people at Google to support this as a Summer of Code project, and I'll be asking for them to match whatever I can raise at over the next few months or so. When do you start needing to get paid for the summer? I'll try to raise enough sponsorship to make sure that you will be able to do this.

18:43, 16 February 2010

I am hoping to start working on a project for WV at the end of May at the earliest. I didn't know that GSOC was funded by other sources outside Google, but even if that is not the case of course I am open to other sources of funding. I would just like to be able to see this project through instead of having to find work doing something on a project that I don't care as strongly about.

00:21, 20 February 2010

Thank you; yes, I don't log in here these days so I always know what new users have to expect. It helps me offer suggestions for improvement when the admins get worn down by the vandals and start adding elitist restrictions. But my email is on the MediaWiki bug, and yours should be too.

Please try out IRC #mediawiki and vote for and add yourself to the cc list of this feature request (you should see channel messages from an irc bot when you do), and talk with the experienced MediaWiki programmers there for tips when you get stuck. 18:49, 15 February 2010 (UTC)

18:49, 15 February 2010

When you get finished or bored with a GIFT importer/exporter, please consider writing the adaptive testing and question accuracy review algorithms as abstract PHP modules or pseudocode before implementing them inside MediaWiki's Quiz module, and put those here on this page above the LiquidThreads discussion so that Moodle Quiz programmer Tim Hunt and others will be able to find them. Or, if you think it would be easier, work out the implementation first and then abstract them into PHP modules or pseudocode. I can't say which is more likely to work better for you.

There may not be much database extension overlap between MediaWiki and Moodle, but I don't know either schema very well, so I hope I'm mistaken. Have you looked at the MediaWiki or Moodle SQL table schemas yet? They would need to be extended for both adaptive testing and accuracy review. It is unclear to me whether MediaWiki's database schema would need any changes for integrated conversion between GIFT and Quiz format in MediaWiki's Quiz module. Do you think it would?

19:08, 15 February 2010

I was cramming yesterday and I'm cramming today but tomorrow I'll be available to respond and participate.

17:07, 17 February 2010

I added myself to the bugzilla feature and voted. I haven't used IRC before but I'll try to sort that out. Previous attempts have not worked out properly.

I don't think GIFT translations will affect the schema. The translation would probably occur at the point when Parser.php calls the Quiz.php hook. I think, though I'm not positive, that the WV microformat is retrieved from the db and then rendered as HTML (pretty much when Parser would be doing the same thing), since it is tied to Parser.php. So by adding new parsing rules to identify GIFT, it could read GIFT from the same (page based?) db locations. Alternatively, we could translate GIFT to the WV microformat when we first come across it, but that probably means writing an entirely new extension.

If we wanted to go a step beyond, we could write an XSLT function into Quiz.php which may be able to read translation paths from a separate document. That way, if someone wants to add a new microformat or an XML format, they would only need to add the rules to the XSLT instead of updating Quiz.php. I think that is considerably more ambitious and might modestly affect the schema; I'm not sure.

As for Computer Adaptive Testing (CAT), I'll provide the documentation that you are asking for. I'd be interested in meeting Tim Hunt (online) as well, in case I have questions. Moodle does have an "adaptive questions" extension, and I'm curious about it. Ideally I would want to be able to share the questions and question rankings with Moodle. From what I understand, the larger the variety of quiz takers who have encountered a question, the more accurately we can rank the question according to the parameters of Item Response Theory (IRT), which is the most commonly used CAT ranking system.

00:14, 20 February 2010