User Story #10335 (new)
Opened 11 years ago
Last modified 10 years ago
FS checksums for data integrity
Reported by: | bpindelski | Owned by: | |
---|---|---|---|
Priority: | critical | Milestone: | 5.x |
Component: | OmeroFs | Keywords: | n.a. |
Cc: | omero-team@… | Story Points: | n.a. |
Sprint: | n.a. | Importance: | n.a. |
Total Remaining Time: | 0.0d | Estimated Remaining Time: | n.a. |
Description
This story collects all the tickets for the work needed to put a checksum system in place that guarantees data integrity when a file is uploaded through FS.
Goal
To guarantee file integrity by calculating checksums during and after upload, and to offer the user a choice between checksum speed and security. This requirement does not replace the integrity guarantees provided at the transport layer of the OSI model; it complements them.
Proposed workflow
During the upload of a single image, a checksum of a chosen type (low, medium or high security) is computed for each element of the image from the byte content of the file being uploaded. The checksum type and value are attached to the file before upload and transmitted together with the data. On the receiving side, the server reads the checksum type and recalculates the value using the same algorithm. If the two checksums match, the image is considered valid.
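The workflow above can be sketched in Java as follows. This is a minimal, hypothetical sketch, not OMERO's API: the names `ChecksumWorkflow`, `ChecksumType` and `ChecksummedPayload` are illustrative, and the mapping of low/medium/high security to Adler-32, CRC-32 and SHA-256 is an assumption (SHA-256 stands in for a future entry of the "to be extended" algorithm list). Only JDK classes (`java.util.zip`, `java.security.MessageDigest`) are used.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.zip.Adler32;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

final class ChecksumWorkflow {
    // Hypothetical mapping of the story's security levels to concrete algorithms.
    enum ChecksumType { LOW_ADLER32, MEDIUM_CRC32, HIGH_SHA256 }

    // The checksum type and value travel together with the uploaded bytes.
    static final class ChecksummedPayload {
        final ChecksumType type;
        final String value;
        final byte[] data;

        ChecksummedPayload(ChecksumType type, String value, byte[] data) {
            this.type = type;
            this.value = value;
            this.data = data;
        }
    }

    // Compute a checksum of the given type over the raw bytes, as lowercase hex.
    static String compute(ChecksumType type, byte[] data) {
        switch (type) {
            case LOW_ADLER32:
                return fromChecksum(new Adler32(), data);
            case MEDIUM_CRC32:
                return fromChecksum(new CRC32(), data);
            case HIGH_SHA256:
                try {
                    return toHex(MessageDigest.getInstance("SHA-256").digest(data));
                } catch (NoSuchAlgorithmException e) {
                    // Algorithm error: per the story, the import must stop.
                    throw new IllegalStateException("SHA-256 unavailable", e);
                }
            default:
                throw new IllegalArgumentException("Unknown type: " + type);
        }
    }

    private static String fromChecksum(Checksum c, byte[] data) {
        c.update(data, 0, data.length);
        return Long.toHexString(c.getValue());
    }

    private static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Client side: attach type and value to the data before sending.
    static ChecksummedPayload prepare(ChecksumType type, byte[] data) {
        return new ChecksummedPayload(type, compute(type, data), data);
    }

    // Server side: recompute with the announced algorithm and compare.
    static boolean verify(ChecksummedPayload p) {
        return compute(p.type, p.data).equals(p.value);
    }

    public static void main(String[] args) {
        byte[] file = "example pixel data".getBytes(StandardCharsets.UTF_8);
        ChecksummedPayload sent = prepare(ChecksumType.MEDIUM_CRC32, file);
        System.out.println("valid: " + verify(sent));
    }
}
```

Sending the type alongside the value is what lets the server pick the matching algorithm without any prior agreement.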
Exceptions:
- checksum mismatch after upload - image considered invalid, error returned to client, the user can choose to stop the import process,
- checksum calculation fails client-side due to an algorithm error - import process stops automatically,
- checksum calculation fails server-side due to an algorithm error - error returned to client, the user can choose to stop the import process,
- checksum still fails after the n-th round of verification (corrupted file) - error returned to client,
- checksum capability mismatch between client and server - lowest common denominator chosen, user informed of the chosen algorithm.
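The last two exceptions can be sketched as below, under stated assumptions: each side advertises a list of algorithm names, "lowest common denominator" is read as the costliest algorithm that both sides have in common (the story does not define it further), and the n-th-round rule is modelled as re-running verification a bounded number of times. `ChecksumNegotiation`, `negotiate` and `verifyWithRetries` are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BooleanSupplier;

final class ChecksumNegotiation {
    // Cheapest first, following the "Proposed algorithms" list (to be extended).
    static final List<String> BY_COST = Arrays.asList("Adler-32", "CRC-32", "Murmur hash");

    // Return the costliest algorithm both sides support, or null if none is shared.
    static String negotiate(List<String> client, List<String> server) {
        String chosen = null;
        for (String alg : BY_COST) {
            if (client.contains(alg) && server.contains(alg)) {
                chosen = alg; // later entries are costlier, so keep overwriting
            }
        }
        return chosen;
    }

    // Re-run verification up to maxRounds times; false means an error goes to the client.
    static boolean verifyWithRetries(BooleanSupplier verifyOnce, int maxRounds) {
        for (int round = 1; round <= maxRounds; round++) {
            if (verifyOnce.getAsBoolean()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> client = Arrays.asList("Adler-32", "CRC-32", "Murmur hash");
        List<String> server = Arrays.asList("Adler-32", "CRC-32");
        String alg = negotiate(client, server);
        // The story requires informing the user about the chosen algorithm.
        System.out.println("Using checksum algorithm: " + alg);
        System.out.println("verified: " + verifyWithRetries(() -> false, 3));
    }
}
```

If the story instead intends the weakest shared algorithm, only the loop's choice of which match to keep would change.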
Implementation context
The checksum has to be calculated “on-the-fly” to avoid duplicating file I/O operations. Error correction is not a requirement at this stage. External libraries may be used, provided they support JDK 5. The choice of checksum algorithm does not depend on the transmission medium, although in the future it may depend on the file type.
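A sketch of the “on-the-fly” requirement using the JDK's `java.util.zip.CheckedInputStream`, which updates a `Checksum` as bytes are read, so the checksum is produced in the same pass that copies the file to its destination rather than by reading the file twice. CRC-32 is only an example here; `java.security.DigestInputStream` plays the same role for `MessageDigest` algorithms. `OnTheFlyChecksum` and `copyWithCrc32` are illustrative names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.CheckedInputStream;

final class OnTheFlyChecksum {
    // Copy `in` to `out` once, updating the CRC-32 as each chunk passes through.
    static long copyWithCrc32(InputStream in, OutputStream out) throws IOException {
        CheckedInputStream checked = new CheckedInputStream(in, new CRC32());
        byte[] buffer = new byte[8192];
        int n;
        while ((n = checked.read(buffer)) != -1) {
            out.write(buffer, 0, n); // the same bytes go to the upload target
        }
        return checked.getChecksum().getValue();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "abc".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long crc = copyWithCrc32(new ByteArrayInputStream(data), sink);
        System.out.println(Long.toHexString(crc)); // prints 352441c2
    }
}
```

The method leaves closing both streams to the caller, which owns them.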
Proposed algorithms
Ordered by computational cost (list to be extended):
- Adler-32,
- CRC-32,
- Murmur hash,