Proposal talk:Implement and deploy checksum revision table

From Strategic Planning

Contents

Thread title                  Replies   Last modified
Hash in revision or text ?    1         13:17, 4 September 2011
Index                         1         13:11, 4 September 2011

Hash in revision or text ?

Right now the proposal title and the committed patch implement this in the revision table. Why is that, though? In my opinion it makes more sense in the text table (which the introductory paragraph of the proposal also mentions as the target table).

It's the hash of the text, not of the revision metadata. There can (and should) be multiple revisions with the same hash of the revision text. Right now MediaWiki only re-uses a text-table row if a revision is a direct revert of an earlier revision (using the "rollback" feature). If a normal undo takes place, or if several editors edited after the vandalism and the user had to dig back manually and re-save an old revision, then MediaWiki stores a second copy of the text.
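To make the idea concrete, here is a minimal sketch of what re-using text rows by hash could look like. It is not MediaWiki code: the `TextStore` class, its field names, and the choice of SHA-1 are all assumptions made for illustration, and a real implementation would probably also compare the stored text on a hash match to guard against collisions.

```python
import hashlib


class TextStore:
    """Toy stand-in for a text table that stores each distinct text once.

    Hypothetical structure, for illustration only.
    """

    def __init__(self):
        self.rows = {}      # text_id -> text
        self.by_hash = {}   # hex digest -> text_id (plays the role of an index)
        self.next_id = 1

    def save(self, text):
        """Return (text_id, digest), re-using an existing row when the hash matches."""
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()  # SHA-1 is an assumption
        text_id = self.by_hash.get(digest)
        if text_id is None:
            text_id = self.next_id
            self.next_id += 1
            self.rows[text_id] = text
            self.by_hash[digest] = text_id
        return text_id, digest


store = TextStore()
revisions = []
# Revision 3 manually restores the text of revision 1 (no "rollback" involved).
for rev_id, text in enumerate(["Good content", "VANDALISM", "Good content"], start=1):
    text_id, digest = store.save(text)
    revisions.append({"rev_id": rev_id, "text_id": text_id, "hash": digest})
```

In this sketch revisions 1 and 3 are separate revisions that point at the same text row, so only two texts are stored for three revisions.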

Anyway, just to bring this up: do we want it in the text table?

Krinkle 14:46, 3 September 2011

I think the revision table makes the most sense, because that way:

  • it can easily be exposed in the XML dump files
  • it can easily be exposed in the API
  • it will be available on the Toolserver

Drdee 13:16, 4 September 2011
 

Index

No index needed? If we want to re-use text-table rows and query by a generated hash when saving the revision text, we would need an index, right?

Krinkle 14:42, 3 September 2011

Yes, if you want to query by hash then you would obviously need an index, but I haven't yet heard a use case where we would really want to query the hash column often. In addition, the checksum will not always be unique across different pages: if two different pages have been blanked, they would have the same hash. So we might need a compound index in that case, but I would like to hear more use cases before we decide on including an index.
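As a rough illustration of the blanked-page case, the sketch below uses an in-memory SQLite database. The table layout, the `rev_checksum` column name, the index name, and the use of SHA-1 are assumptions for the example, not the schema from the patch.

```python
import hashlib
import sqlite3


def checksum(text):
    # SHA-1 is an assumption; the same point holds for any hash of the text.
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE revision ("
    " rev_id INTEGER PRIMARY KEY,"
    " rev_page INTEGER NOT NULL,"
    " rev_checksum TEXT NOT NULL)"  # hypothetical column name
)
# Compound index: page first, then checksum, for per-page lookups by hash.
conn.execute("CREATE INDEX page_checksum ON revision (rev_page, rev_checksum)")

conn.executemany(
    "INSERT INTO revision VALUES (?, ?, ?)",
    [
        (1, 10, checksum("Article about cats")),
        (2, 10, checksum("")),  # page 10 blanked
        (3, 20, checksum("Article about dogs")),
        (4, 20, checksum("")),  # page 20 blanked
    ],
)

blank = checksum("")
# The same checksum shows up on two different pages ...
across_pages = conn.execute(
    "SELECT COUNT(*) FROM revision WHERE rev_checksum = ?", (blank,)
).fetchone()[0]
# ... but narrowing by page gives a single revision.
within_page = conn.execute(
    "SELECT COUNT(*) FROM revision WHERE rev_page = ? AND rev_checksum = ?",
    (10, blank),
).fetchone()[0]
```

Whether such an index is worth its write cost still depends on the use cases; the sketch only shows why the checksum alone does not identify a page.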

Drdee 13:11, 4 September 2011