User Story #10335 (closed)
Opened 11 years ago
Closed 10 years ago
FS checksums for data integrity
Reported by: | bpindelski | Owned by: | |
---|---|---|---|
Priority: | critical | Milestone: | 5.x |
Component: | OmeroFs | Keywords: | n.a. |
Cc: | omero-team@… | Story Points: | n.a. |
Sprint: | n.a. | Importance: | n.a. |
Total Remaining Time: | 0.0d | Estimated Remaining Time: | n.a. |
Description (last modified by bpindelski)
This story collects all the tickets for the work needed to put a checksum system in place that guarantees data integrity when a file is uploaded via FS.
Goal
To guarantee file integrity by calculating checksums during and after upload, and to offer the user a choice between checksum speed and security. This requirement does not supersede the integrity guaranteed at the transport layer of the OSI model; it adds to it.
Proposed workflow
During the upload of a single image, a checksum of a chosen type (low, medium or high security) is computed for each file belonging to the image, using the byte content of the file being uploaded. The checksum type and value are attached to the file before upload and transmitted together with the data. On the receiving side, the server reads the checksum type and recalculates the value with the same algorithm. If the checksums match, the image is considered valid.
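A minimal Python sketch of this round trip, assuming both sides use Python's `hashlib` and that the checksum travels as an (algorithm name, hex digest) pair alongside the data. `client_prepare` and `server_verify` are hypothetical names for illustration, not part of any OMERO API.

```python
import hashlib


def client_prepare(data: bytes, algorithm: str = "sha1") -> tuple[str, str, bytes]:
    """Client side: compute the checksum and bundle it with the data.

    The (algorithm, digest, data) tuple is a hypothetical wire format.
    """
    digest = hashlib.new(algorithm, data).hexdigest()
    return algorithm, digest, data


def server_verify(algorithm: str, expected: str, data: bytes) -> bool:
    """Server side: recompute with the announced algorithm and compare."""
    actual = hashlib.new(algorithm, data).hexdigest()
    return actual == expected


if __name__ == "__main__":
    algo, digest, payload = client_prepare(b"pixel data")
    print(server_verify(algo, digest, payload))            # intact upload
    print(server_verify(algo, digest, payload + b"\x00"))  # corrupted upload
```

Sending the algorithm name with the digest is what lets the server pick the same algorithm without any prior agreement beyond a shared naming scheme.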
Exceptions:
- checksum mismatch after upload: image considered invalid, error returned to the client, and the user may stop the import process,
- checksum calculation fails client-side due to an algorithm error: import process stops automatically,
- checksum calculation fails server-side due to an algorithm error: error returned to the client, and the user may stop the import process,
- checksum still fails on a corrupted file after the n-th round of verification: error returned to the client,
- checksum capability mismatch between client and server: lowest common denominator chosen, and the user is informed of the chosen algorithm.
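The capability-mismatch case can be sketched as a negotiation over the two sides' supported algorithms. This assumes "lowest common denominator" means the strongest algorithm that both client and server support; the preference order simply mirrors the proposed-algorithm list, and the identifiers (`PREFERENCE`, `negotiate`, `NoCommonAlgorithm`) are hypothetical.

```python
# Hypothetical preference order, weakest to strongest, following the
# "Proposed algorithms" list. Names not listed here are never chosen.
PREFERENCE = ["adler32", "crc32", "md5", "murmur3", "sha1"]


class NoCommonAlgorithm(Exception):
    """Raised when client and server share no known checksum algorithm."""


def negotiate(client_algos: set[str], server_algos: set[str]) -> str:
    """Return the strongest algorithm present in both capability sets.

    Assumption: "lowest common denominator" is read as the best algorithm
    in the intersection of what the client and the server support.
    """
    common = client_algos & server_algos
    for name in reversed(PREFERENCE):  # try strongest first
        if name in common:
            return name
    raise NoCommonAlgorithm(sorted(client_algos), sorted(server_algos))


if __name__ == "__main__":
    chosen = negotiate({"crc32", "md5"}, {"md5", "sha1"})
    # Inform the user which algorithm was settled on.
    print(f"Checksum algorithm chosen: {chosen}")
```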
Implementation context
The checksum has to be calculated "on the fly" to avoid duplicating file I/O. Error correction is not a requirement at this stage. External libraries may be used, provided they support JDK 5. The transmission medium does not influence the quality of the checksum algorithm used, although in future the file type might. The other language bindings (Python, C++) have to be considered at the design stage: clients will want to work with a unified checksum naming scheme, and every checksum algorithm has to be implemented in both Python and C++.
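Calculating the checksum "on the fly" means updating it from the same buffers that are sent, so each file is read only once. A Python sketch, where `upload_with_checksum`, the `send` callback standing in for the transport, and the 64 KiB chunk size are all hypothetical choices for illustration:

```python
import hashlib
import io
from typing import BinaryIO, Callable

CHUNK_SIZE = 64 * 1024  # arbitrary chunk size chosen for this sketch


def upload_with_checksum(src: BinaryIO, send: Callable[[bytes], None],
                         algorithm: str = "sha1") -> str:
    """Stream `src` to `send` chunk by chunk, hashing each chunk in passing.

    Each chunk is read once, fed to the hash object, and handed to the
    hypothetical transport callback in the same loop iteration.
    """
    hasher = hashlib.new(algorithm)
    while True:
        chunk = src.read(CHUNK_SIZE)
        if not chunk:  # end of file
            break
        hasher.update(chunk)
        send(chunk)
    return hasher.hexdigest()


if __name__ == "__main__":
    received = bytearray()
    digest = upload_with_checksum(io.BytesIO(b"x" * 200_000), received.extend)
    # The server would hash `received` and compare against `digest`.
    print(digest == hashlib.sha1(bytes(received)).hexdigest())
```

Because `hashlib` objects are incremental, hashing chunk by chunk gives the same digest as hashing the whole file at once.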
Proposed algorithms
Ordered by computational cost:
- Adler-32,
- CRC-32,
- MD5,
- Murmur hash,
- SHA1.
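A unified naming scheme can be sketched as a registry from shared algorithm names to implementations that all return a lowercase hex string, so results from any binding compare the same way. The names below and the `CHECKSUMS` registry are hypothetical. Adler-32 and CRC-32 come from Python's `zlib`, and MD5 and SHA1 from `hashlib`; Murmur hash is left out because it is not in the Python standard library and would need a third-party package.

```python
import hashlib
import zlib
from typing import Callable, Dict


def _adler32(data: bytes) -> str:
    # Masking keeps the value unsigned on any Python version.
    return format(zlib.adler32(data) & 0xFFFFFFFF, "08x")


def _crc32(data: bytes) -> str:
    return format(zlib.crc32(data) & 0xFFFFFFFF, "08x")


def _md5(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


def _sha1(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


# Hypothetical shared names, kept in the order of the list above.
CHECKSUMS: Dict[str, Callable[[bytes], str]] = {
    "Adler-32": _adler32,
    "CRC-32": _crc32,
    "MD5": _md5,
    "SHA1": _sha1,
}


if __name__ == "__main__":
    for name, fn in CHECKSUMS.items():
        print(name, fn(b"123456789"))
```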
Change History (9)
comment:1 Changed 11 years ago by agilo
- Status changed from new to accepted
comment:2 Changed 11 years ago by bpindelski
- Description modified (diff)
comment:3 Changed 11 years ago by bpindelski
- Description modified (diff)
comment:4 Changed 11 years ago by bpindelski
- Description modified (diff)
comment:5 Changed 11 years ago by jamoore
Should we consider this work done with demo 3, or keep it open for the next round(s) of checksum work?
comment:6 Changed 11 years ago by mtbcarroll
I am okay with it being closed if open tickets do capture the details of the work still to do. For instance, one might consider that https://trac.openmicroscopy.org.uk/ome/ticket/10338#comment:15 includes some discussion of checksum capability mismatch, but we may at least need to add C++ and Python code for whichever server-supported checksum algorithms beyond SHA1 can easily be supported in those client libraries, and have the clients use that code. One might also want to check with Petr in case the text on this ticket is useful grist for his test plan mill.
comment:7 Changed 10 years ago by mtbcarroll
Perhaps this story can now be closed.
comment:8 Changed 10 years ago by bpindelski
All tickets have been fixed and the checksum code has stabilised. Thanks for all the hard work, Mark.
comment:9 Changed 10 years ago by bpindelski
- Resolution set to fixed
- Status changed from accepted to closed
Updated status, related task in progress