User Story #10335 (closed)
FS checksums for data integrity
|Reported by:||bpindelski||Owned by:|
|Total Remaining Time:||0.0d||Estimated Remaining Time:||n.a.|
Description (last modified by bpindelski)
This story collects all tickets for the work needed to put a checksum system in place that guarantees data integrity during an FS upload of a file.
The goal is to guarantee file integrity by calculating checksums during and after upload, and to offer the user a choice between checksum speed and security. This requirement does not supersede the integrity guaranteed by the transport layer of the OSI model, but enhances it.
During upload of a single image, a checksum of a specific type (low/medium/high security) is computed for each element of the image from the byte content of the file being uploaded. The checksum type and value are attached to the file before upload and transmitted together with the data. On the receiving side, the server reads the checksum type and calculates the value using the same algorithm. If the checksums match, the image is considered valid.
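The exchange described above can be sketched roughly as follows. This is only an illustration, not the implementation: the mapping of security levels to CRC32, MD5 and SHA-1, the function names and the dictionary used as a message are all assumptions, since the story does not fix any of them.

```python
import hashlib
import zlib


def compute_checksum(kind, data):
    """Return a hex checksum of `data` for a hypothetical security level."""
    # Assumed mapping: the story only names low/medium/high, not algorithms.
    if kind == "low":
        return format(zlib.crc32(data) & 0xFFFFFFFF, "08x")
    if kind == "medium":
        return hashlib.md5(data).hexdigest()
    if kind == "high":
        return hashlib.sha1(data).hexdigest()
    raise ValueError("unknown checksum type: %s" % kind)


def client_prepare(kind, data):
    """Client side: attach checksum type and value to the data before upload."""
    return {"type": kind, "value": compute_checksum(kind, data), "data": data}


def server_verify(message):
    """Server side: recompute with the same algorithm and compare the values."""
    recomputed = compute_checksum(message["type"], message["data"])
    return recomputed == message["value"]
```

A message built with `client_prepare("high", payload)` passes `server_verify` unchanged, and fails it if the data is altered in transit.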
- checksum mismatch after upload - image considered invalid, error returned to client, the user has the possibility to stop the import process,
- checksum calculation fails client-side due to algorithm error - import process stops automatically,
- checksum calculation fails server-side due to algorithm error - error returned to client, the user has the possibility to stop the import process,
- checksum fails on corrupted file after n-th round of verification - error returned to client,
- checksum capability mismatch between client and server - lowest common denominator chosen, user informed about the algorithm chosen.
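The capability-mismatch case could be handled along these lines. The sketch assumes that "lowest common denominator" means the strongest algorithm that both sides support; the algorithm names, their ordering and the `negotiate` function are hypothetical.

```python
# Hypothetical algorithm names, ordered from cheapest to most secure.
STRENGTH_ORDER = ["crc32", "md5", "sha1"]


def negotiate(client_algorithms, server_algorithms):
    """Pick the strongest algorithm supported by both client and server."""
    common = [
        name for name in STRENGTH_ORDER
        if name in client_algorithms and name in server_algorithms
    ]
    if not common:
        raise ValueError("client and server share no checksum algorithm")
    chosen = common[-1]
    # The story requires that the user is told which algorithm was chosen.
    notice = "Using checksum algorithm: %s" % chosen
    return chosen, notice
```

For a client supporting all three algorithms and a server supporting only `crc32` and `md5`, this selects `md5` and returns a notice that can be shown to the user.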
The checksum has to be calculated “on-the-fly” to avoid duplicating file I/O operations. Error correction is not a requirement at the current stage. External libraries can be used, provided they support JDK 5. The transmission medium does not influence the quality of the checksum algorithm used, but in the future the file type might. The other language bindings (Python, C++) have to be considered during the design stage: the clients will want to work with a unified checksum naming scheme, and implementations of the checksum algorithms have to be available in both Python and C++.
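A minimal sketch of the "on-the-fly" requirement, assuming the upload is a chunked copy from a readable source to a writable sink: each chunk is read once and fed to both the digest and the upload. The function name, the chunk size and the choice of SHA-1 are assumptions made for illustration.

```python
import hashlib


def upload_with_checksum(source, sink, chunk_size=64 * 1024):
    """Copy `source` to `sink` in chunks, hashing each chunk as it passes."""
    digest = hashlib.sha1()  # assumed algorithm; any incremental hash works
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)  # checksum updated from the bytes already read
        sink.write(chunk)     # same bytes go to the upload, no second read
    return digest.hexdigest()
```

Because the digest is updated incrementally, the file is never read a second time just to compute its checksum.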
Ordered by computational cost:
- Murmur hash,