Task #12012 (new)
Opened 10 years ago
Last modified 10 years ago
BUG: Importing a large number of files fails — at Version 3
Reported by: | spli | Owned by: | |
---|---|---|---|
Priority: | critical | Milestone: | 5.0.1 |
Component: | Import | Version: | 5.0.0-rc3 |
Keywords: | n.a. | Cc: | rleigh, java@… |
Resources: | n.a. | Referenced By: | n.a. |
References: | n.a. | Remaining Time: | n.a. |
Sprint: | n.a. |
Description (last modified by jamoore)
OMERO.clients-5.0.0-ice35-b19.mac.zip, gretzky: OMERO.web 5.0.0-ice33-b19.
Import a dataset with a large number of files, e.g. /ome/team/simon/digits/train-png.tar.bz (42,000 files split into 10 directories).
The import failed after importing 2492 files. The server appears to have open file handles to a log file for each image:
omero@gretzky:~$ lsof -u omero | grep /nightshade/OMERO/omero-c6100-dec-5-2013-valuable/OMERO-merge-develop-FS/ManagedRepository/ |wc -l 2503
e.g.
... /nightshade/OMERO/omero-c6100-dec-5-2013-valuable/OMERO-merge-develop-FS/ManagedRepository/user-4_5/2014-02/24/17-38-55.156.log ...
Insight has managed to get itself stuck at 100% cpu with an Abnormal Termination:
java.lang.Exception: Abnormal termination due to an uncaught exception. java.lang.StackOverflowError at java.awt.AWTEventMulticaster.componentHidden(AWTEventMulticaster.java:172) ...1022 copies of this line cut... at java.awt.AWTEventMulticaster.componentHidden(AWTEventMulticaster.java:172) Abnormal termination due to an uncaught exception. java.lang.StackOverflowError at java.awt.AWTEventMulticaster.componentHidden(AWTEventMulticaster.java:172) ...1022 copies of this line cut... at java.awt.AWTEventMulticaster.componentHidden(AWTEventMulticaster.java:172) Exception in thread "AWT-EventQueue-0" at org.openmicroscopy.shoola.env.ui.UserNotifierImpl.showErrorDialog(UserNotifierImpl.java:191) at org.openmicroscopy.shoola.env.ui.UserNotifierImpl.notifyError(UserNotifierImpl.java:291) at org.openmicroscopy.shoola.env.AbnormalExitHandler.doTermination(AbnormalExitHandler.java:147) at org.openmicroscopy.shoola.env.AbnormalExitHandler.terminate(AbnormalExitHandler.java:85) at org.openmicroscopy.shoola.env.AWTExceptionHanlder.handle(AWTExceptionHanlder.java:99) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at java.awt.EventDispatchThread.handleException(EventDispatchThread.java:412) at java.awt.EventDispatchThread.processException(EventDispatchThread.java:318) at java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:312) at java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:211) at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:201) at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:196) at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:188) at java.awt.EventDispatchThread.run(EventDispatchThread.java:122)
Whilst this dialog is displayed Insight appears to continue attempting to upload files, judging by omeroinsight.log:
... 2014-02-24 17:42:14,583 INFO [ ome.formats.importer.ImportCandidates] ( Threa d-18) 1 file(s) parsed into 1 group(s) with 1 call(s) to setId in 1ms. (32ms tot al) [0 unknowns] 2014-02-24 17:42:14,583 INFO [ ome.formats.importer.ImportConfig] ( Threa d-18) OMERO Version: 5.0.0-ice35-b19 2014-02-24 17:42:14,584 INFO [ ome.formats.importer.ImportConfig] ( Threa d-18) Bioformats version: 5.0.0-ice35-b19 revision: cd2a5db date: 24 February 20 14 2014-02-24 17:42:14,584 DEBUG [ o.o.shoola.util.NetworkChecker] ( Threa d-18) Cached networkup: true 2014-02-24 17:42:14,608 INFO [ ome.formats.importer.ImportConfig] ( Thread-18) OMERO Version: 5.0.0-ice35-b19 2014-02-24 17:42:14,608 INFO [ ome.formats.importer.ImportConfig] ( Thread-18) Bioformats version: 5.0.0-ice35-b19 revision: cd2a5db date: 24 February 2014 2014-02-24 17:42:14,608 INFO [ ome.formats.importer.ImportCandidates] ( Thread-18) Depth: 4 Metadata Level: MINIMUM 2014-02-24 17:42:14,623 INFO [ loci.formats.ImageReader] ( Thread-18) APNGReader initializing /Users/simon/omero-test/tmp/train-png/1/00001447.png 2014-02-24 17:42:14,640 INFO [ ome.formats.importer.ImportCandidates] ( Thread-18) 1 file(s) parsed into 1 group(s) with 1 call(s) to setId in 1ms. (32ms total) [0 unknowns] 2014-02-24 17:42:14,640 INFO [ ome.formats.importer.ImportConfig] ( Thread-18) OMERO Version: 5.0.0-ice35-b19 2014-02-24 17:42:14,640 INFO [ ome.formats.importer.ImportConfig] ( Thread-18) Bioformats version: 5.0.0-ice35-b19 revision: cd2a5db date: 24 February 2014 2014-02-24 17:42:14,640 DEBUG [ o.o.shoola.util.NetworkChecker] ( Thread-18) Cached networkup: true 2014-02-24 17:42:14,661 INFO [ ome.formats.importer.ImportConfig] ( Thread-18) OMERO Version: 5.0.0-ice35-b19 2014-02-24 17:42:14,661 INFO [ ome.formats.importer.ImportConfig] ( Thread-18) Bioformats version: 5.0.0-ice35-b19 revision: cd2a5db date: 24 February 2014 2014-02-24 17:42:14,661 INFO [ ome.formats.importer.ImportCandidates] ( Thread-18) Depth: 4 Metadata Level: MINIMUM 2014-02-24 17:42:14,677 INFO [ loci.formats.ImageReader] ( Thread-18) APNGReader initializing /Users/simon/omero-test/tmp/train-png/1/00039986.png ...
Though there is no sign of any imports happening server-side.
Blitz-0.log is full of errors:
grep ERROR OMERO-CURRENT/var/log/Blitz-0.log ... 2014-02-24 18:06:44,215 ERROR [ omero.util.ServantHolder] (l.Server-2) servantsPerSession reached for 51689bc8-e36f-430d-9f81-11d0182904de: 10000 2014-02-24 18:06:47,298 ERROR [ omero.util.ServantHolder] (l.Server-4) servantsPerSession reached for 51689bc8-e36f-430d-9f81-11d0182904de: 10000 2014-02-24 18:06:47,360 ERROR [ omero.util.ServantHolder] (l.Server-2) servantsPerSession reached for 51689bc8-e36f-430d-9f81-11d0182904de: 10000 2014-02-24 18:06:49,916 ERROR [ omero.util.ServantHolder] (l.Server-3) servantsPerSession reached for 51689bc8-e36f-430d-9f81-11d0182904de: 10000 2014-02-24 18:06:50,043 ERROR [ omero.util.ServantHolder] (l.Server-2) servantsPerSession reached for 51689bc8-e36f-430d-9f81-11d0182904de: 10000 ...
Change History (3)
comment:1 Changed 10 years ago by rleigh
comment:2 Changed 10 years ago by spli
File handles were not cleaned up after 15 minutes. However after Insight was started again and a new import kicked off they were closed.
Note I've seen the same issue with a different server on the 5.1 branch... handles remained open after over 1 hour but seemed to be cleared when a new import started.
comment:3 Changed 10 years ago by jamoore
- Description modified (diff)
- Milestone changed from Unscheduled to 5.0.1
- Priority changed from major to critical
I missed an obvious point earlier: there's no reason that a single fileset should have that many servants / filehandles left open. i.e. there is an additional problem.
However, what I said about the failure to call close on abnormal termination still holds: that will leave an import unclosed which is "A Bad Thing (TM)". We should A) try to do whatever cleanup is possible in such cases (or avoid abnormal terminiations??) and perhaps B) check for closed imports on connect. This will have to be fairly careful to also test the connected client count.
This is exactly the issue I triggered last Thursday: lsof showed hundreds of open import logs.