Proposal talk:Assessment content

    From Strategic Planning

    I wrote some strategy text but edited it into the discussion instead of here. I didn't select the discussion format for any particular reason; I'm just still getting acquainted with this system. I think "discussion" would come in handy if there were more activity, but we seem to be fine so far. I can't think of any way to correct this error, though. Mbrad 02:29, 14 February 2010 (UTC)

    It's not an error, just an experimental choice. Anyone can still make regular edits to this page, but I have no idea how watchlists will be affected. The "Liquid Threads" extension is new; we're just trying it out.
    If you or anyone else wants to help out, please join the #mediawiki channel on freenode IRC, then vote for this feature request and add yourself to its cc: list. Then install the Quiz extension on your local copy of MediaWiki and start with a translator from the GIFT format to the existing Quiz question format. Once someone submits an importer and exporter as a patch to the bug, we can start working on new database tables for each question in an adaptive testing system and an accuracy review system. Does that sound good? If so, please talk with the experienced MediaWiki programmers on IRC about how they would approach whichever stage of the problem you're working on. Thank you. 18:43, 15 February 2010 (UTC)
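    As a sketch of that first translation step, here is a minimal GIFT-to-Quiz converter. Everything in it is illustrative: it handles only a bare single-line multiple-choice item (no feedback, weights, or escape sequences from the full GIFT grammar), and the output is a simplified rendering of the Quiz extension's wikitext, not a verified complete one.

```python
import re

def gift_to_quiz(gift: str) -> str:
    """Translate a single-line GIFT multiple-choice item into
    Quiz-extension-style wikitext. A sketch, not the full GIFT
    grammar: no feedback, answer weights, or escape handling."""
    m = re.match(r"(.*?)\{(.*)\}\s*$", gift.strip(), re.S)
    if not m:
        raise ValueError("not a recognizable GIFT item")
    stem, answers = m.group(1).strip(), m.group(2)
    lines = [f"{{{stem}", '|type="()"}']
    # GIFT marks correct answers with '=' and wrong ones with '~'
    for tok in re.findall(r"[=~][^=~]+", answers):
        mark = "+" if tok[0] == "=" else "-"
        lines.append(f"{mark} {tok[1:].strip()}")
    return "\n".join(lines)

print(gift_to_quiz("Who wrote Hamlet? {=Shakespeare ~Marlowe ~Jonson}"))
```

    A real importer would need the rest of the GIFT grammar (short-answer, numeric, matching, essay items) and would have to round-trip through an exporter, as the patch described above requires.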


    So am I to understand that the idea would be to create article/subject/category based quizzes that people can take online? I think that's a great idea, but I'm not sure if that's what's being proposed here. --Bodnotbod 16:58, 18 August 2009 (UTC)

    Yes, thanks, that is one of the things such a database would be able to provide, with the appropriate presentation interpreter. 05:28, 19 August 2009 (UTC)
    This is sort of a fascinating idea. Off to rate it... :) -- Philippe 05:30, 19 August 2009 (UTC)
    I think the summary should be modified. The value of adopting (or translating) a standard goes well beyond achieving charitable status in the UK. Mbrad 21:26, 7 February 2010 (UTC)
    Thanks, yes, I revised it. Thanks again for your help. I hope I understand your suggestions because they all seem very good. 05:59, 9 February 2010 (UTC)

    How we can do better than previous projects

    Please familiarize yourself with these earlier MediaWiki extensions and a similar GPLed system:

    Does either of them support using the default text editor to create questions and quizzes which don't have a multimedia component? Does either of them support using the default text editor to create and change questions and quizzes which do involve a multimedia component? 21:11, 23 August 2009 (UTC)

    How to get started?

    Assuming that some portion of [1] or other source(s) of funding is approved for this project, and/or a bunch of volunteers sign up for the task force, what is the right way to get started?

    • microformat definition, culling the good stuff from the existing QTI specs
    • input editor for assessment -- if the microformat is easy enough, we can just use the ordinary MediaWiki edit box most of the time, but not for questions with graphics or audio output or input components
      • What features are needed to support
      • can we make input editor(s) which support turning a set of assessment questions into a game?
        • based on the learner's score?
        • based on "choose your own adventure" style interaction?
    • output player -- how do we present the assessment items (questions)
      • how do we keep track of the results?
      • how do we decide which assessment question to offer next?
    • how can we make systems for people other than the question authors to validate new assessment items? 09:02, 23 August 2009 (UTC)

    Normalization of assessment items (questions) in database

    Is there an RDBMS analysis of these data types? If so, where is it? Can we describe it in wikitables?

    I can't find one. Here are the question states and fields in outline format:

    assessment item state

    • incomplete
      • ambiguous
      • ungrammatical
      • non-sequitur
      • implies false assumption
      • circular
      • dependent on future circumstance or decision
    • open
      • hypothetical (also open -- but less so?)
      • answered
        • reviewed -- note: the fields necessary for this step don't yet appear below
          • complete (passed review)
            • asked
              • scored
                • challenged
                  • assessed
          • rejected (failed review or assessment)

    assessment item fields

    • question text
      • suggested answers (e.g., "true, false")
        • context of a blank (following the question text and the blank)
          • optional default for blank
      • correct answer
        • list of correct answers (may have zero elements);
        • OR a pattern describing the list of correct answers
          • can be a (likely phoneme- or word-composite) hidden Markov model representing pronunciation
      • summary statistics
        • list of answers given; for each:
          • answer
          • confidence
            • self-reported
            • derived
          • whether help was requested
          • score
          • user id
          • timestamps
            • presentation
            • answer
          • whether score was challenged -- note: this partially supports accuracy review
        • average score
      • relations to other questions
        • set of questions which help answer this question
        • set of questions which answering this question helps answer -- Can some of the directed graph of which questions assist in the answering of other questions be derived from categorization or must it be stored completely explicitly?
        • set of relations to other questions by relative difficulty; for each:
          • question
          • more or less difficult, and how much
      • optional
        • general help
        • specific help
        • hints explaining why wrong answers are wrong ('surprise result')

    That should be enough to normalize from. These do not include the fields necessary to support review per Proposal:Develop systems for accuracy review. Those fields need to be added because the list of elements, including timestamps, presents a 6NF-level problem. 17:59, 10 September 2009 (UTC)
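    One possible normalization of the outline above, sketched as SQLite DDL. All table and column names here are assumptions for illustration, and the review fields from the other proposal are still omitted; the point is only that pulling the repeating timestamp group out into its own relation removes the problem the 6NF remark points at.

```python
import sqlite3

# Illustrative schema covering part of the "assessment item" outline.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item (
    item_id       INTEGER PRIMARY KEY,
    question_text TEXT NOT NULL,
    state         TEXT NOT NULL CHECK (state IN
        ('incomplete','open','answered','reviewed','complete',
         'asked','scored','challenged','assessed','rejected'))
);
CREATE TABLE correct_answer (          -- zero or more per item
    item_id INTEGER REFERENCES item,
    answer  TEXT NOT NULL
);
CREATE TABLE response (                -- one row per answer given
    response_id INTEGER PRIMARY KEY,
    item_id     INTEGER REFERENCES item,
    user_id     INTEGER,
    answer      TEXT,
    score       REAL,
    challenged  INTEGER DEFAULT 0      -- partial accuracy-review support
);
-- The timestamps ('presentation', 'answer') become rows here instead
-- of repeated columns, so the response relation stays free of the
-- repeating group.
CREATE TABLE response_event (
    response_id INTEGER REFERENCES response,
    kind        TEXT CHECK (kind IN ('presented','answered')),
    at          TEXT NOT NULL
);
""")
conn.execute("INSERT INTO item VALUES (1, '2+2=?', 'open')")
print(conn.execute("SELECT state FROM item").fetchone()[0])
```

    The question relations (helps-answer graph, relative difficulty) and summary statistics would each get similar child tables keyed on item_id.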

    I have never done a sixth normal form normalization before, but this text from the other proposal and the question asked at [2] should help:
    a selection of text or a url (to a permanent article version or diff, etc.) could be an item for which multiple, randomly-selected reviewers chosen for their stated familiarity with a topic area would be selected. Those reviewers could be shown that text or url (perhaps as part of a list of such items) in a securely authenticated and captcha-ed channel. They would be asked to vote on the accuracy of the item, and have the opportunity to fully explain their votes in comments. If a statistically significant number of votes are in agreement, then the item could be approved as to veracity or rejected. When the votes are not in agreement, then additional voter(s) would perform a tie-breaking function. Each voter's track record in terms of agreement with other voters could be recorded secretly and used to (1) weight their vote to nullify defective voters, and/or (2) used to select whether the voter is allowed to become a tie-breaker. 20:39, 12 September 2009 (UTC)
    So we need:
    • items to review (questions, diffs, or permanent links)
      • topic(s) of item
      • for each, a list of:
        • votes
        • comments
        • user id
    • reviewers (indexed by user id)
      • reviewers' stated familiarity with various topics (opt-in, derived from categories, or otherwise?)
      • authorization tokens from secure log in
      • authentication tokens from captcha responses
      • votes
      • comments on votes
      • track record of agreement with other voters
        • on a per-topic basis(?)
    • topics
      • some measure of controversiality or other description of voter agreement within the topic and resulting statistical significance of a given number of votes 02:58, 14 September 2009 (UTC)
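    The voting scheme above can be sketched in a few lines. The quorum and margin thresholds and the weighted-majority rule below are illustrative assumptions, not part of the proposal; a real implementation would use a proper significance test informed by the per-topic agreement measure.

```python
def tally(votes, weights=None, quorum=3, margin=0.6):
    """Aggregate accuracy-review votes.
    votes: list of (user_id, approve: bool) pairs.
    weights: optional per-user weights derived from the (secret)
    agreement track record; unknown users default to weight 1.0.
    Returns 'approve', 'reject', or 'tie-break'. The quorum and
    margin thresholds are illustrative, not from the proposal."""
    if len(votes) < quorum:
        return "tie-break"            # not enough votes yet
    weights = weights or {}
    total = sum(weights.get(u, 1.0) for u, _ in votes)
    yes = sum(weights.get(u, 1.0) for u, v in votes if v)
    share = yes / total
    if share >= margin:
        return "approve"
    if share <= 1 - margin:
        return "reject"
    return "tie-break"                # disagreement: escalate

print(tally([(1, True), (2, True), (3, False)]))
```

    Down-weighting a defective voter (item 1 in the quoted text) is just an entry in `weights`; selecting tie-breakers by track record would live outside this function.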

    There are more ideas for sequencing and aggregation in the QTI spec and this schema based on these rules from del Soldato, T. & du Boulay, B. (1995) "Implementation of Motivational Tactics in Tutoring Systems," Journal of Artificial Intelligence in Education, 6(4): 337-78. 00:17, 25 August 2009 (UTC)

    Special Assessment Fields

    Assuming that this assessment content will largely be employed as Wikiversity lesson content, we should take this opportunity to consider, as best we can, the special needs of online learners in this context. I've addressed a couple of subjects below, but there may be many more structural ideas that would impact this schema. Additionally, a "scholarship of assessment" is currently budding in academia as a central subject, including the emergence of full-time positions dedicated to assessment. Since assessment is relatively new as a subject of scholarship, interest in altogether new special assessment fields may evolve to support valuable new theories. For this reason, and especially since we are in the new arena of electronic assessment, some effort should be made to survey the prior scholarship on this subject, and at the very least this "Assessment Content Proposal" should be flexible. I haven't studied XML in a while, but I believe that means changing the schema from time to time while keeping the integrity of QTI compatibility. I don't know much about QTI or schemas, or about MediaWiki's software.

    Item Response Theory Variables

    Teacher-directed education is typically constrained by time: a lesson must be covered within a quarter or semester, a curriculum must be completed within x total credit hours. However, our learners are not constrained in this way. For the Wikiversity learner the constraints are instead formed by potentially large gaps in time between segments of lessons, caused by numerous possible factors, as well as irregularity caused by entering a new subject at a random point. This creates unique opportunities that, provided we produce quality content, present an exciting new global autodidactic potential.

    One effective tool would let a learner subscribe to periodic assessment of a subject that is on hold, employing the spacing effect via email or other "pushed" channels; alternatively, it could simply remain passively available when the learner returns to the Wikiversity domain. Development of this kind of tool also opens up new possibilities for retaining material from traditional instructor-led courses completed long ago, a tool useful for all of education.
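    A minimal sketch of such a spacing-effect schedule, assuming a simple expanding-interval rule. The doubling factor and one-day reset are illustrative choices, not a tested spaced-repetition algorithm; a real scheduler would tune these against the summary statistics stored with each item.

```python
from datetime import date, timedelta

def next_review(interval_days: int, recalled: bool) -> int:
    """Expanding-interval rule for 'pushed' review reminders:
    double the gap after a successful recall, reset to one day
    after a failure. Both constants are illustrative."""
    return interval_days * 2 if recalled else 1

# Simulate a learner's review history for one dormant subject.
interval = 1
today = date(2010, 2, 14)
schedule = []
for recalled in (True, True, True, False, True):
    interval = next_review(interval, recalled)
    today += timedelta(days=interval)
    schedule.append(interval)
print(schedule)   # successive gaps between reminder emails, in days
```

    The growing gaps are the point: material on hold gets touched just often enough to stay retained, without flooding the learner's inbox.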

    However, that type of tool doesn't necessitate any new special assessment fields (other than perhaps the learning goals described below). What it does is provide a context for a computerized adaptive testing (CAT) tool. In short, like the GRE, CAT adapts to the test-taker by increasing or decreasing the difficulty of subsequent questions in response to the test-taker's previous answers. The value of this type of test is in placing a learner in a lesson, whether returning to a subject after a long time or looking for a point of entry into a new field of study with a specific learning goal in mind (e.g., desiring to write a multilateration function). CAT systems often employ Item Response Theory to rank questions. This theory is most commonly used with three variables that describe the difficulty of a question, variables that would require fields in the QTI:

    • the item discrimination parameter (a sub i in the formulae)
    • the item difficulty parameter (b sub i)
    • the item guessing parameter (c sub i)
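    These three parameters define the three-parameter logistic (3PL) IRT model. A sketch follows, with hypothetical item parameters and a deliberately naive selection rule (real CAT systems maximize item information and re-estimate theta after each answer, rather than just picking the item closest to a 50% success chance):

```python
import math

def p_correct(theta, a, b, c):
    """3PL model: probability that a learner of ability theta answers
    item i correctly, from its discrimination (a), difficulty (b),
    and guessing (c) parameters."""
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

def pick_next(theta, items):
    """Naive CAT step (illustrative only): offer the item whose
    success probability is closest to 0.5 for this learner."""
    return min(items, key=lambda it: abs(p_correct(theta, *it) - 0.5))

# Hypothetical item bank: (a, b, c) triples.
items = [(1.0, -1.0, 0.2), (1.2, 0.0, 0.2), (0.8, 2.0, 0.25)]
print(pick_next(0.0, items))
```

    Storing a, b, and c per item is the schema-level requirement; the estimation machinery around them can evolve independently.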

    The value of this type of test is that it provides navigation to a learner who may not know where they need to navigate to. Additionally, there are as yet no web-based CAT testing services. This might allow Wikiversity to become the focal point for content creation by agencies that are looking for just such a tool.

    Just as a note: I understand that this suggestion appears complex, but consider that among all of the WMF projects, Wikiversity has the highest learning curve, and serving that curve is specifically what Wikiversity content is about in the first place. I think it is highly appropriate for a Wikiversitarian, more than anybody else, to reach for the stars.

    Learning Goals

    Currently in academia there is wide recognition of the problem of linking "learning goals" to assessment. Often this means linking "higher order thinking" goals to assessment. As this suggests, the term "learning goal" may represent an item on Bloom's Taxonomy, but it may also represent a section of a lesson (such as employing l'Hôpital's rule on difficult limits). Whatever the case may be, a single "learning goal" field with special syntax could incorporate both complementary interpretations of "learning goal" in one place.

    This type of field would make it easy to map a collection of assessment content to a lesson, while leaving that content free to exist in broader contexts (outside the lesson it is embedded in), for example in:

    • the CAT tests I mentioned above
    • simpler entry assessment tests
    • cumulative subject tests
    • grouping assessment content for subject based trivia games

    Aside from their role on assessment content, "learning goals" in and of themselves can provide even more useful navigation than what already exists on Wikiversity. In my opinion they form an important organizational category.


    Some proposals will have massive impact on end-users, including non-editors. Some will have minimal impact. What will be the impact of this proposal on our end-users? -- Philippe 00:05, 3 September 2009 (UTC)

    If they wanted to use assessment content for interactive instruction, it could be quite substantial. 16:03, 3 September 2009 (UTC)
    In my opinion the impact of this proposal is quite high. Assessment is a critical content type in any form of learning. Wikiversity would be the primary beneficiary, but as the sister wiki with the highest learning curve, and the one that addresses the highest learning curve, once enough editors have come over the hump this project may ultimately have the highest impact on humanity out of all of WMF's projects. Additionally, assessment itself is a new field of specialized study in academia, and any contributions WMF and its community make in this area could also greatly serve its mission to expand all knowledge. Mbrad 21:22, 7 February 2010 (UTC)

    GIFT picoformat needs to be extended

    So as not to re-invent the wheel, Moodle's GIFT picoformat for quizzes looks really good for our purposes. However, in order to work with the del Soldato/du Boulay "Motivational Tactics" and the accuracy review cited above, each question would also need to be able to have additional information specified from "assessment item fields" above. I put a summary of the GIFT picoformat at and requirements for those extensions at followed by pending extension choices being discussed at 06:00, 9 February 2010 (UTC)
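    For illustration, one way such an extension could work is to carry the extra "assessment item fields" in structured GIFT comments, which existing GIFT parsers would simply ignore. The `@field:` syntax below is invented for this sketch; it is not part of GIFT or of any pending extension choice:

```text
// GIFT comment lines begin with //; a hypothetical extension could
// reserve @-prefixed comments for the extra fields:
// @state: open
// @difficulty: b=0.4
// @goal: apply l'Hopital's rule to indeterminate limits
::Limits 1:: What is lim(x->0) sin(x)/x? {=1 ~0 ~undefined}
```

    Whatever concrete syntax is chosen, keeping the additions inside comments would preserve round-trip compatibility with stock Moodle GIFT import.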

    LiquidThreads discussion