

Task #11192 (closed)

Opened 6 years ago

Closed 6 years ago

Last modified 6 years ago

Bug: Performance issues with ZipReader init

Reported by:	omero-qa	Owned by:	mlinkert
Priority:	minor	Milestone:	5.0.0-rc1
Component:	Bio-Formats	Version:	5.0.0-beta1
Keywords:	n.a.	Cc:	julio.mateos-langerak@…, jburel
Resources:	n.a.	Referenced By:	n.a.
References:	n.a.	Remaining Time:	n.a.
Sprint:	n.a.

Description

https://www.openmicroscopy.org/qa2/qa2/qa/feedback/7391/

Comment: Hi,

When I select a zip file to import, it starts to "prep" it and continues prepping for ages (it is a big zip file of 4 GB).

The problem is that I cannot cancel the prepping unless I quit the importer.

Cheers, Julio

Testing with a 610MB LSM file in a 380MB Zip container, initialisation of the reader takes around 45 minutes. The time appears to be spent primarily in the delegate ImageReader prior to starting up the LSMReader; the LSMReader initialisation and plane reading are relatively fast.

This looks like it might be taking a long while to identify the correct reader to use. However, we already have the image filename in the zip. Looking at initFile in ZipReader, I'm not entirely sure how reader.setId() ends up using the ZipHandle, since (AFAICS) we don't directly tell the reader to use the zip handle or the Location within the zip; I guess this must happen, but I can't yet see where. We're not passing the contained filename to reader.setId, so maybe this prevents the correct reader from being identified efficiently?


Change History (7)

comment:1 Changed 6 years ago by rleigh

Component changed from QA to Bio-Formats

comment:2 Changed 6 years ago by rleigh

Cc jburel added

comment:3 Changed 6 years ago by mlinkert

I'll do what I can here, but importing large Zip files is a really bad idea across the board, especially in FS. Identifying which reader to use for a zipped set of files is no different from identifying the reader for the same files on disk: we still need to look in the file to know for sure, as the filename itself is generally not sufficient.
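As a rough illustration of why the filename alone isn't enough, here is a minimal Java sketch (Java 9+, standard library only) that opens a Zip archive with java.util.zip.ZipFile and reads just the first four bytes of each entry to test for a TIFF header ("II" followed by 42, or "MM" followed by 42). LSM is a TIFF-based format, so an LSM entry would pass this test, but so would any other TIFF variant: the signature narrows the candidates without identifying the exact format, which is why a reader still has to parse further into the file. ZipTiffProbe, probe and looksLikeTiff are hypothetical names; this is not the Bio-Formats reader-selection code.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/** Hypothetical helper (not part of Bio-Formats): flags Zip entries that start with a TIFF header. */
final class ZipTiffProbe {

    /** True if the first n bytes hold a little-endian ("II*\0") or big-endian ("MM\0*") TIFF header. */
    static boolean looksLikeTiff(byte[] h, int n) {
        if (n < 4) {
            return false; // too short to carry a TIFF header
        }
        boolean little = h[0] == 'I' && h[1] == 'I' && h[2] == 42 && h[3] == 0;
        boolean big = h[0] == 'M' && h[1] == 'M' && h[2] == 0 && h[3] == 42;
        return little || big;
    }

    /** Maps each file entry name to whether its content starts like a TIFF. */
    static Map<String, Boolean> probe(File zip) throws IOException {
        Map<String, Boolean> result = new LinkedHashMap<>();
        try (ZipFile zf = new ZipFile(zip)) {
            Enumeration<? extends ZipEntry> entries = zf.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (entry.isDirectory()) {
                    continue;
                }
                // Open only this entry and read at most its first four bytes.
                try (InputStream in = zf.getInputStream(entry)) {
                    byte[] header = new byte[4];
                    int n = in.readNBytes(header, 0, header.length);
                    result.put(entry.getName(), looksLikeTiff(header, n));
                }
            }
        }
        return result;
    }
}
```

Because ZipFile locates entries through the archive's central directory, each entry's header can be read without first decompressing the entries stored before it.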

comment:4 Changed 6 years ago by jamoore

Couple of thoughts/questions:

Do we need to look into unzipping the contents, either client-side, server-side or both?
Will the reader caching help sufficiently under FS?

comment:5 Changed 6 years ago by rleigh

I think this depends on what exactly we expect of the ZipReader. What is its current use case? Currently it doesn't allow import of more than one file, so one can't upload a zip file containing multiple images; it only looks at the first one, and even then it seems to ignore that entry's name and use the name of the zip file instead. My question here is whether we are treating the zip as an image in its own right or just as a container of images. I *think* we're currently doing the former, but I would prefer the latter. I'd like to be able to upload a zip file of an entire dataset or screen and have it import as though it were a directory.

Currently it will be far, far faster to unzip the content client-side and use it directly. But I still don't see why it's so slow: it shouldn't need to try so many readers when the file has a unique extension, as with the LSM file above. Unless it is in fact trying that many readers and it really is this slow.
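Unzipping the content client-side before import, as suggested above, could look something like this minimal sketch (standard library only). It extracts every entry into a target directory so the files can be handed to the importer directly, and rejects entry names such as ../x that would resolve outside that directory. ZipExtractor and extractAll are hypothetical names, not an existing OMERO or Bio-Formats API.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/** Hypothetical client-side helper: extracts a Zip archive so its files can be imported directly. */
final class ZipExtractor {

    /** Extracts all entries under targetDir and returns the paths of the extracted files. */
    static List<Path> extractAll(File zip, Path targetDir) throws IOException {
        Path root = targetDir.toAbsolutePath().normalize();
        List<Path> extracted = new ArrayList<>();
        try (ZipFile zf = new ZipFile(zip)) {
            Enumeration<? extends ZipEntry> entries = zf.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                Path dest = root.resolve(entry.getName()).normalize();
                // Reject entries such as "../x" that would escape targetDir ("zip slip").
                if (!dest.startsWith(root)) {
                    throw new IOException("Entry outside target directory: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(dest);
                    continue;
                }
                // Parent directories are not always stored as separate entries.
                Files.createDirectories(dest.getParent());
                try (InputStream in = zf.getInputStream(entry)) {
                    Files.copy(in, dest, StandardCopyOption.REPLACE_EXISTING);
                }
                extracted.add(dest);
            }
        }
        return extracted;
    }
}
```

Extraction needs enough free disk space for the uncompressed contents of the archive.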

If we do cache the reader, it will definitely help: it's respectably fast once it's using the correct reader, albeit with the above limitations. But do we want to be storing the zip on the server side?

comment:6 Changed 6 years ago by mlinkert

Resolution set to fixed
Status changed from new to closed
Version set to 4.4.9

Should be fixed with: https://github.com/openmicroscopy/bioformats/pull/796

comment:7 Changed 6 years ago by jamoore

Milestone changed from Unscheduled to 5.0.0-beta2
Version changed from 4.4.9 to 5.0.0-beta1
