Task #10338 (closed)
Opened 11 years ago
Closed 11 years ago
Checksum choice on import
Reported by: | bpindelski | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | 5.0.0-beta1 |
Component: | OmeroFs | Version: | 5.0.0-beta1 |
Keywords: | n.a. | Cc: | fs@… |
Resources: | n.a. | Referenced By: | n.a. |
References: | n.a. | Remaining Time: | n.a. |
Sprint: | FS demo 4.4 |
Description (last modified by bpindelski)
The user should be able to select a specific checksum algorithm (being aware of the speed/reliability trade-off - that will have to be added to the docs) to be used system-wide. This information has to be passed to the lower layers to init the checksum class with a specific algorithm. The setting should be changeable through the CLI interface.
In a later stage we might want an extra setting of the algorithm for non-image files (i.e. weaker checksum for .doc etc. files).
Change History (27)
comment:1 Changed 11 years ago by jmoore
- Sprint set to FS Demo 2
comment:2 Changed 11 years ago by jmoore
- Sprint FS Demo 2 deleted
- Summary changed from Bug: Checksum choice on import to Checksum choice on import
comment:3 Changed 11 years ago by bpindelski
Good call, Josh. It is indeed new code that's being added to address this issue.
comment:4 Changed 11 years ago by jmoore
- Sprint set to FS Demo 3
comment:5 Changed 11 years ago by mtbcarroll
- Cc mtbcarroll added
comment:6 Changed 11 years ago by mtbcarroll
In making the checksum algorithm client-selectable an obvious issue is how to adjust the API of raw file stores, because they have checksumming built in. Presumably one needn't select an algorithm for reading. At the moment, the checksumming starts on the first write.
For code that shouldn't allow algorithm selection, I'd prefer to hard-code the selection in the caller rather than having raw file stores default to SHA-1 (maybe scripts, user pics?), because then it forces callers to think about the choice and prevents us from accidentally defaulting when we meant to pass in the selection.
Would it be okay to add a setChecksumAlgorithm or suchlike to the raw file store operations: it must be called before the first write? For normal imports, I'd guess that the algorithm can be chosen in ImportSettings and then ManagedImportProcessI or somesuch could set the algorithm on clients' behalf when handing them a store for each file they upload.
comment:7 Changed 11 years ago by mtbcarroll
- Cc fs@… added; mtbcarroll removed
comment:8 Changed 11 years ago by mtbcarroll
The latest commit at http://github.com/mtbc/openmicroscopy/commits/checksum-selection-server (including the commit message) shows the current state of my work, in case it needs to be picked up and made at least somewhat usable while I am away this week.
comment:9 Changed 11 years ago by jamoore
Mark: since RFS isn't responsible for creating the original files, is there anything wrong with doing the algorithm detection via the OriginalFile instance itself?
comment:10 Changed 11 years ago by mtbcarroll
That sounds like an excellent idea that should reduce the changes and complexity. I will give it a try.
comment:11 Changed 11 years ago by mtbcarroll
https://github.com/openmicroscopy/openmicroscopy/pull/995 includes the model changes for storing checksum choice in OriginalFile.
comment:12 Changed 11 years ago by bpindelski
- Description modified (diff)
comment:13 Changed 11 years ago by mtbcarroll
Further work on the hash and hasher columns of originalfile is in https://github.com/openmicroscopy/openmicroscopy/pull/1020
comment:14 Changed 11 years ago by mtbcarroll
https://github.com/openmicroscopy/openmicroscopy/pull/1031 includes management of default checksum algorithm. Clients may now pass null to accept the server default.
This raises a few issues for me. The shorter term one is that my previous code did not allow a null so all the import paths I could find had the client explicitly specify SHA1. This now needs to be undone so that those clients' activity is, where appropriate, instead in accordance with the server default. We can test this by setting the default to be MD5, exercising a variety of upload paths, seeing if any SHA1 hashes appear in the originalfile table, and reviewing the instances in which they do.
comment:15 Changed 11 years ago by mtbcarroll
In the longer term I believe that there is an issue of clients that do not have an implementation for the server's default checksum algorithm and are now handed an import process that is demanding to use it. After some consideration of use cases, and tradeoffs between what's easy and what's desirable, I would suggest that the interaction should be that,
- the server returns to requiring the client to specify the checksum algorithm to create an import process
- the client is still offered the getChecksumAlgorithms API call to find out which are available, so that it may be sure to select one that is supported server-side.
Either the client asks the server which algorithms are supported, and it chooses one, or it lets the server default apply as follows,
- rather than a single default, the server is configured with a list of supported algorithms sorted in order of preference
- the client tells the server which algorithms it supports
- the server tells the client which algorithm to set (the first in the list that the client supports) in requesting an import process.
comment:16 Changed 11 years ago by Josh Moore <josh@…>
(In [5813fdb75dff3cd32efe2ec7893a0f200966a931/ome.git] on branch develop) Merge pull request #1031 from bpindelski/chksm-setting
Set checksum algorithm for import using bin/omero (see #10338)
comment:17 Changed 11 years ago by jburel
I am still not convinced that the users will be able to understand which algorithm to choice.
- Having it configurable in the server yes,
- returning the algo used.
- Displaying how strong the algo is: maybe
- Letting the user select: probably not
comment:18 Changed 11 years ago by jamoore
- Sprint changed from FS Demo 3 to FS Demo 4
Moving to next FS milestone.
comment:19 Changed 11 years ago by mtbcarroll
For when the server default algorithm is changed and clients are happy to accept the server default, some test plan should check where file uploads still result in non-default SHA1 being used.
comment:20 Changed 11 years ago by jburel
Is this still something we wish to pursue for demo 4.x, Great to have but I see that more like a long term goal
Moving it out of 4.x
comment:21 Changed 11 years ago by jburel
- Sprint FS demo 4.x deleted
comment:22 Changed 11 years ago by bpindelski
Looking at the comments in this ticket, I don't think we have a clear strategy on how to move forward with the checksum work. If I understand correctly, the only thing left to do would be to look at implementing the "different checksum per file type" scenario, and that's not a trivial task. I'd say there are more pressing matters, but happy to hear other opinions.
comment:23 Changed 11 years ago by mtbcarroll
One "left to do" thing would plausibly be comment 15; Josh has recently been doing some refactoring that might interact with it.
comment:24 follow-up: ↓ 25 Changed 11 years ago by mtbcarroll
I don't know if Josh's refactoring affected this; I don't mind owning comment 15 factored out into a new ticket.
comment:25 in reply to: ↑ 24 Changed 11 years ago by bpindelski
Replying to mtbcarroll:
I don't know if Josh's refactoring affected this; I don't mind owning comment 15 factored out into a new ticket.
If you're ok with creating an extra ticket and associating it with the story, please do. I'm happy to also route some resources back to looking at checksums.
comment:26 Changed 11 years ago by mtbcarroll
comment:27 Changed 11 years ago by jamoore
- Resolution set to fixed
- Sprint set to FS demo 4.4
- Status changed from new to closed
- Version set to 5.0.0-beta1
I don't see any reason not to. Thanks all.
Blazej, I'm going to be slightly more optimistic and say this isn't a bug but a new feature! :) Also moving out of the demo 2 sprint. Haven't created a demo 3 sprint, but likely we can move it there, no?