Task #10740 (closed)
RFE: Import of large plates- timing
Reported by: | pwalczysko | Owned by: | cblackburn |
---|---|---|---|
Priority: | critical | Milestone: | 5.0.0-rc1 |
Component: | Import | Version: | 5.0.0-beta1 |
Keywords: | FS import | Cc: | ux@…, fs@… |
Resources: | n.a. | Referenced By: | n.a. |
References: | n.a. | Remaining Time: | n.a. |
Sprint: | n.a. |
Description
When importing plates like
/Volumes/ome/data_repo/from_skyking/flex/anatole/Meas_05(2012-05-16_18-51-33)/001009001.flex
/Volumes/ome/data_repo/from_skyking/flex/christophe/B5_1 GFP RFP Draq5 20times 7planes step2.flex
/Volumes/ome/data_repo/from_skyking/flex/anatole/001004000.flex
The part of the import process which takes the majority of time is not Upload any more, but Scanning / Pending
(Pending in the following sense: it was shown as message in Importer, when the item was obviously on the turn to be imported and the previous import in the queue is fully done.)
This is in contrast with large images problematics - we have spent a lot of effort to make Upload phase shorter and make it more user-friendly - possibly the scanning / pending might use some boost as well.
Attachments (8)
Change History (33)
comment:1 Changed 11 years ago by jamoore
- Sprint changed from FS Demo 3 to FS Demo 4
comment:2 Changed 11 years ago by jburel
Another option, will be to upload directly and scan server side but that might imply that "junk" could end up on the server.
- Need way to report
- UI to present what is on the server.
- ability to link files uploaded to data if required etc.
This is more a story than just a RFE
comment:3 Changed 10 years ago by pwalczysko
- Milestone changed from 5.0.0 to 5.0.0-beta1
Again reminded of this when preparing the big plates import for dev_4_4 DB to be upgraded to develop DB on omero4-demo server. Imported about 150-200 GB of data - took 2 working days (the nights were not used for imports) = about 24 hours. Again, scanning swallowed the vast majority of the time. Although this is the dev_4_4 importer, it is obviously something which will be very similar on develop (see above comments).
The formats imported: Operetta and InCell?
data_repo/public/images/HCS/INCELL2000/INMAC384-DAPI-CM-eGFP_59223_1 imported ~16x 9.1 GB each
data_repo/public/images/HCS/Operetta imported ~1x, 4GB
comment:4 Changed 10 years ago by pwalczysko
- Priority changed from major to critical
- Version set to 5.0.0-beta1
comment:5 Changed 10 years ago by jamoore
- Milestone changed from 5.0.0-beta1 to 5.0.0-beta2
- Sprint FS demo 4.x deleted
Was previously in 5.0.0. Pushing to beta2 for an initial glance, but may well need to be pushed back out.
comment:6 Changed 10 years ago by cblackburn
Could this be related this issue http://trac.openmicroscopy.org.uk/ome/ticket/11250#comment:8 ?
comment:7 Changed 10 years ago by rleigh
One thing I noticed which looks like a bottleneck is during the "Scanning" phase, I see a lot of database inserts in the Blitz log. These inserts look like they are all separate operations; could these all be batched up into a single transaction? 100000 inserts in separate transactions is going to kill performance.
comment:8 Changed 10 years ago by cblackburn
- Owner set to cblackburn
- Status changed from new to accepted
comment:9 Changed 10 years ago by jamoore
cblackburn: Certainly fixes from http://trac.openmicroscopy.org.uk/ome/ticket/11250 and to a lesser degree https://github.com/openmicroscopy/openmicroscopy/pull/1878 will have improved this, but I don't think they will have made it go away.
rleigh: do you have an example of such a log? What file format? Which file set?
comment:10 Changed 10 years ago by pwalczysko
@jamoore: @rleigh indicates import of .png format screenshots in the first comment here http://trac.openmicroscopy.org.uk/ome/ticket/11250#comment:8.
comment:11 Changed 10 years ago by rleigh
josh: The plate was /ome/data_repo/plate 11001_Plate_136/ and I don't have any logs I'm afraid; I was just looking at tail -f Blitz-0.log during the various stages of the import.
comment:12 Changed 10 years ago by cblackburn
Firstly, It would be good to be clear about what this ticket is for. From comment 3 above Petr mentions slow 4.4 plate imports. But are plate imports under FS faster, comparing like with like? It sounds like FS imports are faster than 4.4 imports but yet this ticket is marked as critical for FS. I don't doubt there are improvements that can be made to all imports. But will FS offer faster plate imports for users or is there a specific problem highlighted here?
Secondly, in looking at CLI imports versus Insight imports it looks like the feedback between the two doesn't quite tally. The scanning and upload stages are similar. But on CLI the gap between the FILESET_UPLOAD_END and METADATA_IMPORTED events is about 17s. On Insight the gap between the upload appearing to end and the metadata starting to be imported is 40s (this is the pending stage) then the metadata takes another 10s to import. These two periods just don't add up so I wonder if either or both sets of feedback is representative of what is actually happening.
comment:13 Changed 10 years ago by pwalczysko
Imported a plate data_repo/public/images/HCS/INCELL2000/INMAC384-DAPI-CM-eGFP_59223_1/ into omero4-demo server (a develop server) using Insight #513.mac
user-4 Read-only-1 Screen Retest....
15.58 put the plate into the import Queue
15.59 press the Import button
16.04 up till now, the message in Importer was "scanning", which now changed to "Unknown format"
16.09 start of the recording of the attached movie
16.42 message in Importer changed to "pending"
16.44 Upload started in Importer, the progress bar appears
17.08 Upload finished
17.10 Reading pixels message in Importer
17.14 Generating thumbnails message in importer
17.15 Import finished with green tick in Importer
17.20 Thumbnails viewed ok (they could be probably viewed okay immediately at 17.15, but were not)
Total import time on develop: 76 minutes
Total time spent on scanning (or any pre-upload state) on develop: 45 minutes
Compare the Blitzlog and the movie of the Importer (started at 16.09) which are attached.
Changed 10 years ago by pwalczysko
Changed 10 years ago by pwalczysko
comment:14 Changed 10 years ago by pwalczysko
On the same machine, but on an omero4 server, I have imported the same plate as in the previous comment. These 2 imports can be directly compared. This comment is dev_4_4 speed, whereas the previous comment documents develop speed.
Imported a plate data_repo/public/images/HCS/INCELL2000/INMAC384-DAPI-CM-eGFP_59223_1/ into omero4-demo server (a dev_4_4 erver) using Insight (latest Howe build)
user: kyle, Group Nevis group, Screen Retest...
18.22 put the plate to Queue
18.23 press Import button in Insight
18.23 start recording the attached movie - the message in the importer is "scanning"
19.02 (approximately, + - one minute) the message in the importer changed to "Analysing"
19.04 in importer, the files counter started to count down
19.28 message in the importer changed to Processing
19.30 import in importer UI finished with green tick
19.31 images of the plate were viewed and found OK
Total import time on dev_4_4: 68 minutes.
Total time spent on "scanning" and "analysing" in dev_4_4: 41 minutes
See attached Blitzlog and movie (movie started at 18,23).
Changed 10 years ago by pwalczysko
comment:15 Changed 10 years ago by pwalczysko
To summarize: From the previous 2 comments analysis it is clear that
- FS plate import is not faster, but rather slightly slower than the dev_4_4 plate import.
- The main cultprit on the speed of import is the "scanning" phase - this is the phase which matters most
- The discrepancy between CLI imoprt and logs is not something I can comment on, but the Blitzlogs are comparable to the Insight Importer outputs and movies as attached above - how well they bind, I do not know, as I do not understand the Blitzlog.
- The critical status of this ticket comes not from the fact that this is somehow upgrade related (which it is not). Neither it comes from the fact that this is a regression (although slightly slower import on FS can be regarded as regression). The main reason is that we for some reason still assume that large plates import is FASTER on FS - which it should be, but IS definitely NOT. (see also our FS presentation in Paris - the numbers also suggested that FS import is slower).
comment:16 Changed 10 years ago by jamoore
Thanks, guys. I just tried importing the INCELL*223 plate locally and found this during the "Scanning" phase:
"Thread-49" prio=10 tid=0x00007f7244418800 nid=0x18d1 in Object.wait() [0x00007f7206b4b000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x00000000c2daaf98> (a IceInternal.Outgoing) at java.lang.Object.wait(Object.java:503) at IceInternal.Outgoing.invoke(Outgoing.java:125) - locked <0x00000000c2daaf98> (a IceInternal.Outgoing) at omero.grid._ManagedRepositoryDelM.importFileset(_ManagedRepositoryDelM.java:48) at omero.grid.ManagedRepositoryPrxHelper.importFileset(ManagedRepositoryPrxHelper.java:91) at omero.grid.ManagedRepositoryPrxHelper.importFileset(ManagedRepositoryPrxHelper.java:41) at ome.formats.importer.ImportLibrary.createImport(ImportLibrary.java:264) at org.openmicroscopy.shoola.env.data.OMEROGateway.importImageFile(OMEROGateway.java:6303) at org.openmicroscopy.shoola.env.data.OmeroImageServiceImpl.importFile(OmeroImageServiceImpl.java:1230) at org.openmicroscopy.shoola.env.data.views.calls.ImagesImporter.importFile(ImagesImporter.java:77) at org.openmicroscopy.shoola.env.data.views.calls.ImagesImporter.access$000(ImagesImporter.java:53) at org.openmicroscopy.shoola.env.data.views.calls.ImagesImporter$1.doCall(ImagesImporter.java:102)
i.e. perhaps some of the issue comes from missed events, which I think has been recorded elsewhere. Insight was already saving state to the server, but still listed as "Scanning". Would that explain the discrepancy between Insight and CLI?
If that's the case, perhaps the loss of time comes from the total number of calls to setId (both client-side and server-side). The build on omero4-demo doesn't contain my https://github.com/openmicroscopy/openmicroscopy/pull/1878 branch which should remove one or two of the calls to setId. I'll finish profiling and see if I can detect any other bottlenecks.
comment:17 Changed 10 years ago by jburel
Events like FILESET_UPLOAD_START or FILE_UPLOAD_STARTED are not returned immediately after the scanning phase.
This is noticeable for the incell files, not the case for the flex from test_image_good.
comment:18 Changed 10 years ago by jamoore
The following two screenshots show the time taken for 2 of the import phases (server-side). 28 seconds are taken up with creating the directory structure, while the import itself (the 5 or 6 "steps" of ManagedImportRequestI) take 112 seconds.
I'm currently looking at the makeDirs methods to see if there's any easy fix.
comment:19 Changed 10 years ago by jamoore
PR opened which saves time in the create directory phase: https://github.com/openmicroscopy/openmicroscopy/pull/1885 - in one example siRNA (mias) import, suggestImportPaths went from 16 to 6 seconds on my local machine.
comment:20 Changed 10 years ago by jburel
A new event will also be posted so the user is notified.
comment:21 Changed 10 years ago by jamoore
In the case of MIAS, the above changes have reduced most of the overhead down to the actual reading of bytes. I'll attach a screenshot of what I'm seeing: namely that the MinimalTiffReader.setId calls are taking much of the time. If we can introduce memoization at that level, we should be able to recover a good deal.
Changed 10 years ago by jamoore
relative time between setId and openBytes for TIFFs in MIAS filesets
comment:22 Changed 10 years ago by cblackburn
- Resolution set to fixed
- Status changed from accepted to closed
For Beta 2 some performance issues have been addressed in the PRs cited. Once Beta 2 is released wider performance issues can be examined in the next stage. Tickets will be opened once some further profiling has taken place.
comment:23 Changed 10 years ago by jmoore <josh@…>
(In [5b00d39b5ad9249950b3a61bfe70ed62d749e179/ome.git] on branch develop) Reduce the number of transactions for createOrFixUserDir (See #10740)
comment:24 Changed 10 years ago by jmoore <josh@…>
(In [1ee9822f930f4ff9c08c461795a8c230a7b86a25/ome.git] on branch develop) Encapsulate all mkdir calls into one tx (See #10740)
A single transaction can be used from suggestImportPaths
to create all directories needed for an entire fileset.
This brings one call of importFileset() from 16 down
to 6 seconds.
comment:25 Changed 10 years ago by Josh Moore <josh@…>
(In [f19739f8eb5a1a9f5fceafa422dd9b6dfe17b7ce/ome.git] on branch develop) Merge pull request #1885 from joshmoore/10740-plate-imports
Reduce the number of transactions for import (See #10740)
Moving to next FS milestone for cleanup. If this is more appropriate elsewhere, please feel free to move it.