Task #11192 (closed)
Bug: Performance issues with ZipReader init
| Reported by: | omero-qa | Owned by: | mlinkert |
|---|---|---|---|
| Priority: | minor | Milestone: | 5.0.0-rc1 |
| Component: | Bio-Formats | Version: | 5.0.0-beta1 |
| Keywords: | n.a. | Cc: | julio.mateos-langerak@…, jburel |
| Resources: | n.a. | Referenced By: | n.a. |
| References: | n.a. | Remaining Time: | n.a. |
| Sprint: | n.a. |
Description
https://www.openmicroscopy.org/qa2/qa2/qa/feedback/7391/
Comment: Hi,
When I select a zip file to import, it starts to "prep" it and it continues prepping for ages (it is a big zip file of 4Gb).
The problem is that I cannot cancel the prepping unless I quit the importer.
Cheers, Julio
Testing with a 610MB LSM file in a 380MB Zip container: initialisation of the reader takes around 45 minutes. The time appears to be spent primarily in the delegate ImageReader, before the LSMReader starts up; the LSMReader initialisation and plane reading are relatively fast.
This looks like it might be taking a long while to identify the correct reader to use. However... we have the image filename in the zip already. Looking at initFile in ZipReader, I'm not entirely sure how reader.setId() translates to use of the ZipHandle, since we don't (AFAICS) directly tell the reader to use the zip handle or the Location in the zip; I guess this must happen, but I can't yet see where. We're not passing the contained filename to reader.setId, so maybe this prevents the correct reader from being identified efficiently?
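To illustrate the point about the contained filename: the entry names can be read from the zip's central directory without decompressing any image data. The following is only a sketch using Python's standard `zipfile` module, not Bio-Formats code; the helper name `entry_extensions` and the sample entry `scan/cells.lsm` are made up for the example.

```python
import io
import zipfile
from pathlib import PurePosixPath


def entry_extensions(zip_source):
    """Return (entry name, lowercase extension) for each file in the archive.

    Hypothetical helper: only the central directory is read, so no
    entry data is decompressed.
    """
    with zipfile.ZipFile(zip_source) as archive:
        return [
            (info.filename, PurePosixPath(info.filename).suffix.lower())
            for info in archive.infolist()
            if not info.is_dir()
        ]


# Build a tiny in-memory archive so the example is self-contained.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("scan/cells.lsm", b"placeholder")
buffer.seek(0)

entries = entry_extensions(buffer)
print(entries)
```

The extension obtained this way could at most serve as a hint for which reader to try first; it does not confirm the format.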
Change History (7)
comment:1 Changed 6 years ago by rleigh
- Component changed from QA to Bio-Formats
comment:2 Changed 6 years ago by rleigh
- Cc jburel added
comment:3 Changed 6 years ago by mlinkert
comment:4 Changed 6 years ago by jamoore
Couple of thoughts/questions:
- Do we need to look into unzipping the contents, either client-side, server-side or both?
- Will the reader caching help sufficiently under FS?
comment:5 Changed 6 years ago by rleigh
I think this depends upon what exactly we are expecting of the ZipReader. What is its current use case? Currently it doesn't allow import of more than one file, so one can't upload a zip file containing multiple images; it only looks at the first one, and even then it seems to ignore its name and use the name of the zipfile. My question here is whether we are treating the zip as an image in its own right, or just as a container of images. I *think* we're currently doing the former, but I would prefer the latter. I'd like to be able to upload a zipfile of an entire dataset or screen and have it import as though it were a directory.
Currently it will be far far faster to unzip the content client-side and use it directly. But I still don't see why it's so slow--it shouldn't need to try so many readers out when it has a unique extension as in the case of LSM above. Unless it is doing that and it really is this slow.
If we do cache the reader, it will definitely help. It's respectably fast once it's using the correct reader, though the above limitations still apply. But do we want to be storing the zip on the server side?
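For reference, the "unzip client-side and treat it as a directory" idea could look roughly like the sketch below. It uses only Python's standard library and is not how the importer works; `unpack_for_import` and the `plate1/...` entry names are hypothetical.

```python
import io
import tempfile
import zipfile
from pathlib import Path


def unpack_for_import(zip_source, target_dir):
    """Extract the archive and return every contained file, sorted by path.

    Hypothetical helper: each returned path could then be handed to an
    importer exactly as if the user had selected a directory.
    """
    with zipfile.ZipFile(zip_source) as archive:
        archive.extractall(target_dir)
    return sorted(p for p in Path(target_dir).rglob("*") if p.is_file())


# Build a small in-memory archive holding two (fake) images.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("plate1/a.lsm", b"a")
    archive.writestr("plate1/b.lsm", b"b")
buffer.seek(0)

with tempfile.TemporaryDirectory() as tmp:
    files = unpack_for_import(buffer, tmp)
    unpacked_names = [p.relative_to(tmp).as_posix() for p in files]

print(unpacked_names)
```

This trades temporary disk space on the client for ordinary file access afterwards, and it sidesteps the single-image limitation because every entry becomes a regular file.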
comment:6 Changed 6 years ago by mlinkert
- Resolution set to fixed
- Status changed from new to closed
- Version set to 4.4.9
Should be fixed with: https://github.com/openmicroscopy/bioformats/pull/796
comment:7 Changed 6 years ago by jamoore
- Milestone changed from Unscheduled to 5.0.0-beta2
- Version changed from 4.4.9 to 5.0.0-beta1
I'll do what I can here, but importing large Zip files is a really bad idea across the board, especially in FS. Identifying which reader to use for a zipped set of files is no different from identifying the reader for the same files on disk - we still need to look in the file to know for sure as the filename itself is generally not sufficient.
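A small sketch of why the filename alone is not enough: two entries can share the `.lsm` extension while only one carries a TIFF header (LSM is, as far as I know, a TIFF-based format). The code reads just the first four bytes of an entry using Python's standard `zipfile` module. It is an illustration, not the Bio-Formats detection logic; `entry_looks_like_tiff` and the entry names are hypothetical.

```python
import io
import zipfile

# The two byte-order variants of the classic TIFF header.
TIFF_MAGICS = (b"II*\x00", b"MM\x00*")


def entry_looks_like_tiff(zip_source, entry_name):
    """Check the first four bytes of one zip entry against the TIFF magic.

    Hypothetical helper: only the start of the entry is decompressed.
    """
    with zipfile.ZipFile(zip_source) as archive:
        with archive.open(entry_name) as stream:
            header = stream.read(4)
    return header in TIFF_MAGICS


# Two entries with the same extension but different content.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("real.lsm", b"II*\x00" + bytes(16))
    archive.writestr("fake.lsm", b"not an image at all")

buffer.seek(0)
real_is_tiff = entry_looks_like_tiff(buffer, "real.lsm")
buffer.seek(0)
fake_is_tiff = entry_looks_like_tiff(buffer, "fake.lsm")

print(real_is_tiff, fake_is_tiff)
```

Even a matching header only narrows the choice to the TIFF family, so a reader would still have to inspect further metadata to tell LSM apart from other TIFF variants.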