User Story #10335 (closed)

Opened 11 years ago

Closed 10 years ago

FS checksums for data integrity

Reported by: bpindelski
Owned by:
Priority: critical
Milestone: 5.x
Component: OmeroFs
Keywords: n.a.
Cc: omero-team@…
Story Points: n.a.
Sprint: n.a.
Importance: n.a.
Total Remaining Time: 0.0d
Estimated Remaining Time: n.a.

Description (last modified by bpindelski)

This story collects all tickets covering the work needed to put a checksum system in place that guarantees data integrity during an FS upload of a file.

Goal

To guarantee file integrity by calculating checksums during and after upload, and to offer the user a choice between checksum speed and security. This requirement does not supersede the integrity guaranteed by the transport layer of the OSI model, but enhances it.

Proposed workflow

During upload of a single image, a checksum of a specific type (low/medium/high security) is computed for each element of the image from the byte content of the file being uploaded. The checksum type and value are attached to the file before upload and transmitted together with the data. On the receiving side, the server reads the checksum type and calculates the value using the same algorithm. If the checksums match, the image is considered valid.
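
The workflow above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the OMERO implementation: the names `HASHERS`, `client_prepare` and `server_verify`, the type labels `"md5"` and `"sha1"`, and the dictionary used as a message are all hypothetical.

```python
import hashlib

# Hypothetical registry: checksum type label -> hasher factory.
HASHERS = {"md5": hashlib.md5, "sha1": hashlib.sha1}


def client_prepare(data: bytes, checksum_type: str) -> dict:
    """Client side: compute the checksum and attach its type and value."""
    value = HASHERS[checksum_type](data).hexdigest()
    return {"checksum_type": checksum_type, "checksum": value, "data": data}


def server_verify(message: dict) -> bool:
    """Server side: recompute with the declared algorithm and compare."""
    hasher = HASHERS[message["checksum_type"]]
    return hasher(message["data"]).hexdigest() == message["checksum"]
```

A matching pair of calls returns `True`; if the data is altered between the two calls, `server_verify` returns `False` and the image would be treated as invalid.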

Exceptions:

  • checksum mismatch after upload - image considered invalid, error returned to client, the user can stop the import process,
  • checksum calculation fails client-side due to algorithm error - import process stops automatically,
  • checksum calculation fails server-side due to algorithm error - error returned to client, the user can stop the import process,
  • checksum fails on corrupted file after n-th round of verification - error returned to client,
  • checksum capability mismatch between client and server - lowest common denominator chosen, user informed about the algorithm chosen.
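
The capability-mismatch case could be handled along these lines. This is a sketch of one possible reading of "lowest common denominator", not the agreed behaviour: the ranking `SECURITY_ORDER`, the function `negotiate` and its fallback rule (strongest shared algorithm that is weaker than the requested one, else the weakest shared algorithm) are assumptions made for illustration.

```python
# Hypothetical ranking, weakest first.
SECURITY_ORDER = ["adler32", "crc32", "md5", "sha1"]


def negotiate(requested, client_supported, server_supported):
    """Return (algorithm, fell_back) for a client/server pair.

    fell_back is True when the requested algorithm could not be used,
    so the caller can inform the user about the algorithm chosen.
    """
    common = [name for name in SECURITY_ORDER
              if name in client_supported and name in server_supported]
    if not common:
        raise ValueError("no checksum algorithm supported by both sides")
    if requested in common:
        return requested, False
    limit = SECURITY_ORDER.index(requested)
    # Shared algorithms that rank below the requested one.
    weaker = [name for name in common if SECURITY_ORDER.index(name) < limit]
    chosen = weaker[-1] if weaker else common[0]
    return chosen, True
```

For example, a client requesting `"sha1"` against a server that only offers `"adler32"`, `"crc32"` and `"md5"` would end up with `"md5"` and a fallback flag that can be used to notify the user.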

Implementation context

The checksum has to be calculated “on-the-fly” to avoid duplicating file I/O operations. Error correction is not a requirement at the current stage. External libraries can be used (with the caveat of supporting JDK 5). The transmission medium does not influence the quality of the checksum algorithm used, but in the future the file type might. The other language bindings (Python, C++) have to be considered during the design stage: the clients will want to work with a unified checksum naming scheme, and implementations of the checksum algorithms have to be available in both Python and C++.
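
Calculating the checksum "on-the-fly" can be illustrated with a single-pass copy loop that feeds each chunk to the hasher as it is written out, so the file is read only once. This is a minimal sketch, not the server code: `upload_with_checksum`, the chunk size and the choice of SHA1 are assumptions, and `sink` stands in for whatever transport the upload really uses.

```python
import hashlib
import io


def upload_with_checksum(source, sink, chunk_size=64 * 1024):
    """Copy source to sink in chunks, hashing each chunk as it passes.

    Returns the hex digest of everything that was copied.
    """
    hasher = hashlib.sha1()
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        hasher.update(chunk)  # checksum is updated with the same bytes...
        sink.write(chunk)     # ...that are sent, without re-reading the file
    return hasher.hexdigest()


if __name__ == "__main__":
    src = io.BytesIO(b"example image bytes" * 1000)
    dst = io.BytesIO()
    print(upload_with_checksum(src, dst))
```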

Proposed algorithms

Ordered by computational cost:

  • Adler-32,
  • CRC-32,
  • MD5,
  • Murmur hash,
  • SHA1.
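
As a rough illustration of a unified naming scheme on the Python side, four of the proposed algorithms are available in the standard library. The function `checksum` and the labels it accepts are hypothetical; Murmur hash is left out because the Python standard library does not provide it, so it would need an external package.

```python
import hashlib
import zlib


def checksum(name: str, data: bytes) -> str:
    """Return the checksum of data as lowercase hex for a given label."""
    if name == "adler32":
        return format(zlib.adler32(data) & 0xFFFFFFFF, "08x")
    if name == "crc32":
        return format(zlib.crc32(data) & 0xFFFFFFFF, "08x")
    if name in ("md5", "sha1"):
        return hashlib.new(name, data).hexdigest()
    raise ValueError("unsupported checksum type: " + name)


if __name__ == "__main__":
    for label in ("adler32", "crc32", "md5", "sha1"):
        print(label, checksum(label, b"abc"))
```

The C++ and Java sides would need to accept the same labels and produce the same hex strings for the scheme to be unified across bindings.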

Change History (9)

comment:1 Changed 11 years ago by agilo

  • Status changed from new to accepted

Updated status, related task in progress

comment:2 Changed 11 years ago by bpindelski

  • Description modified (diff)

comment:3 Changed 11 years ago by bpindelski

  • Description modified (diff)

comment:4 Changed 11 years ago by bpindelski

  • Description modified (diff)

comment:5 Changed 11 years ago by jamoore

Should we consider this work done with demo 3, or keep it open for the next round(s) of checksum work?

comment:6 Changed 11 years ago by mtbcarroll

I am okay with it being closed if open tickets do capture the details of work yet to do on it. For instance, one might consider that https://trac.openmicroscopy.org.uk/ome/ticket/10338#comment:15 includes some checksum capability mismatch discussion, but maybe we at least need to add C++ and Python code for whichever server-supported checksum algorithms beyond SHA1 can easily be supported in those client libraries, and have the clients use that code. One might also want to check with Petr in case the text on this ticket is useful grist for his test plan mill.

comment:7 Changed 10 years ago by mtbcarroll

Perhaps this story can now be closed.

comment:8 Changed 10 years ago by bpindelski

All tickets have been fixed and the checksum code has stabilised. Thanks for all the hard work, Mark.

comment:9 Changed 10 years ago by bpindelski

  • Resolution set to fixed
  • Status changed from accepted to closed