Warning: Can't synchronize with repository "(default)" (/home/git/ome.git does not appear to be a Git repository.). Look in the Trac log for more information.
Notice: In order to edit this ticket you need to be either: a Product Owner, The owner or the reporter of the ticket, or, in case of a Task not yet assigned, a team_member"

Task #2581 (closed)

Opened 11 years ago

Closed 9 years ago

Use and validate real SHA1s on all uploads

Reported by: jamoore Owned by: mtbcarroll
Priority: critical Milestone: 5.0.0-beta1
Component: OmeroFs Version: n.a.
Keywords: n.a. Cc: fs@…
Resources: n.a. Referenced By: n.a.
References: n.a. Remaining Time: n.a.
Sprint: FS demo 4.x

Description (last modified by mtbcarroll)

In many places, we are still setting OriginalFile.sha1 to "pending". These should all be fixed in order to detect file corruptions on uploads.

Change History (27)

comment:1 Changed 10 years ago by jmoore

Referencing ticket #2595 has changed sprint.

comment:2 Changed 9 years ago by cblackburn

Referencing ticket #2595 has changed sprint.

comment:3 Changed 9 years ago by jburel

Referencing ticket #2595 has changed sprint.

comment:4 Changed 9 years ago by jmoore

  • Component changed from General to OmeroFs
  • Milestone changed from Unscheduled to OMERO-5
  • Sprint set to FS Demo 2

comment:5 Changed 9 years ago by mtbcarroll

  • Owner set to mtbcarroll
  • Status changed from new to accepted

comment:6 Changed 9 years ago by mtbcarroll

Does anybody know why they were set to pending in the first place: is there a need to fire off an asynchronous checksummer for performance reasons, or something? Still looking at the actual cases ...

comment:7 Changed 9 years ago by jmoore

Primarily to save the overhead of generating the SHA1s in the calling thread. So they will likely need to be reviewed on a case by case basis. But if we now have the option of selecting something with less overhead than sha1 then perhaps we can do it in-thread.

comment:8 Changed 9 years ago by mtbcarroll

True. Also at some point, not tied to a client call, we'll want a separate checksummer job anyway for the subsequent re-checks of files.

comment:9 Changed 9 years ago by jmoore

Definitely. Initially a quartz job would probably suffice until we have a proper MQ.

comment:10 Changed 9 years ago by mtbcarroll

We seem to have curious use of SHA1 in,

  • server
    • PixelsImpl sets it to "Pending..."
  • blitz
    • OMEROMetadataStoreClient sets it for "Foo" for a Pixels instance
    • SharedResourcesI sets it to "UNKNOWN" for a new table
    • AbstractRepositoryI sets it to the repository UUID
    • CheckedPath can set it to ""
  • insight
    • OMEROGateway sets it to "pending".

comment:11 Changed 9 years ago by mtbcarroll

  • Cc fs@… added; jburel cxallan removed

comment:12 Changed 9 years ago by jmoore

Of all of these, the only one that I know to be used is AbstractRepositoryI all the rest amount to "we didn't calculate it". I don't know of any logic which actively expects any of "UNKNOWN", "pending", "Pending...", etc. (and certainly not Foo!) so it would seem fine to at least standardize though as a first step.

comment:13 Changed 9 years ago by jmoore

  • Sprint FS Demo 2 deleted

Not included in fs demo 2 tag. Removing from sprint

comment:14 follow-up: Changed 9 years ago by mtbcarroll

AbstractRepositoryI seems to (ab)use the sha1 column but in develop the RepositoryDaoImpl does use the UUID stored there and it all seems to work. Separately, develop's CheckedPath quite reasonably puts empty strings in there to signify directories and non-existent files, but it really makes me wonder if the column should be made nullable and then these other string constants can also simply be null.

OMEROMetadataStoreClient and PixelsImpl are concerned with Pixel instances' sha1 properties which perhaps this ticket doesn't care about. Though, dev_4_4's OMEROMetadataStoreClient also seems to be using it on OriginalFile instances when archiving on image import, and I haven't yet satisfied myself this gets fixed at the other end after upload.

Similarly, perhaps OMEROGateway's setting of it for file upload is fixed at the other end by the call stack eventually hitting RawFileBean.save but I've not yet seen how this happens.

What's going on with SharedResourcesI and the tables I'm not sure at all; at least, I'm doubtful that tables' checksums get either updated or consulted.

Some of this I'll try to resolve empirically with the help of a debugger and psql but anyone more familiar with the purpose of these methods and the flow of control among them is very welcome to chime in with spoilers.

comment:15 in reply to: ↑ 14 Changed 9 years ago by jmoore

For any of the upload scenarios, like the gateway, we should really move to a verification step to know that the file has actually been properly uploaded. As to nullability, certainly an option to consider.

comment:16 Changed 9 years ago by mtbcarroll

Regardless of any unresolvedness in the above, proposed first steps for dev_4_4 look to be to ensure that SHA1's are actually checked for,

  • OMEROGateway.uploadFile, using the checksum from RawFileBean.save (test using file attachments)
  • file import from Insight and CLI,
    • for original metadata import into Files/ (from RawFileBean.save's checksum)
    • for archived files, which currently have an SHA1 of "Pending" from OMEROMetadataStoreClient.createOriginalFileFromFile (different in develop).

comment:17 Changed 9 years ago by mtbcarroll

One issue is that of how to write integration tests for ImportLibrary that simulate import-time server/client checksum mismatch.

comment:18 Changed 9 years ago by jmoore

Catch a notification event, open a new RawFileStore and overwrite some of the data?

comment:19 Changed 9 years ago by mtbcarroll

http://github.com/openmicroscopy/openmicroscopy/pull/890 addresses some of the above. If it omits anything that is important to resolve in the near term, feel free to comment here in order to draw attention to that gap.

comment:20 Changed 9 years ago by mtbcarroll

On the topic of nullability for inappropriate OriginalFile instance properties, https://github.com/openmicroscopy/openmicroscopy/pull/892 is now opened.

comment:21 Changed 9 years ago by jmoore

  • Sprint set to FS Demo 3

comment:22 Changed 9 years ago by mtbcarroll

A shorter-term to-do here is checksums for attaching files through the web client. A good starting point might be to look at components/tools/OmeroWeb/omeroweb/webclient/controller/container.py which leaves a hash in the DB of 'pending'.

comment:23 Changed 9 years ago by mtbcarroll

  • Description modified (diff)
  • Sprint changed from FS Demo 3 to FS Demo 4
  • Summary changed from Use and validate real SHA1s on all uploads / downloads to Use and validate real SHA1s on all uploads

Split download verification out into http://trac.openmicroscopy.org.uk/ome/ticket/10754

comment:24 Changed 9 years ago by mtbcarroll

The above shorter-term to-do is addressed in https://github.com/openmicroscopy/openmicroscopy/pull/1073

I am not aware of any outstanding to-do's.

comment:25 Changed 9 years ago by bpindelski

One more thing left to fix on dev_4_4. Please see https://trac.openmicroscopy.org.uk/ome/ticket/10727.

Last edited 9 years ago by bpindelski (previous) (diff)

comment:26 Changed 9 years ago by bpindelski

Closing due the amount of work remaining on other fronts before Paris 2013. If the issues still cause bugs, separate tickets can be opened. Comments here might be useful in the future.

comment:27 Changed 9 years ago by bpindelski

  • Resolution set to fixed
  • Status changed from accepted to closed
Note: See TracTickets for help on using tickets. You may also have a look at Agilo extensions to the ticket.

1.3.13-PRO © 2008-2011 Agilo Software all rights reserved (this page was served in: 0.80721 sec.)

We're Hiring!