Warning: Can't synchronize with repository "(default)" (/home/git/ome.git does not appear to be a Git repository.). Look in the Trac log for more information.
Notice: In order to edit this ticket you need to be either: a Product Owner, The owner or the reporter of the ticket, or, in case of a Task not yet assigned, a team_member"

Task #12012 (new)

Opened 10 years ago

Last modified 10 years ago

BUG: Importing a large number of files fails — at Version 3

Reported by: spli Owned by:
Priority: critical Milestone: 5.0.1
Component: Import Version: 5.0.0-rc3
Keywords: n.a. Cc: rleigh, java@…
Resources: n.a. Referenced By: n.a.
References: n.a. Remaining Time: n.a.
Sprint: n.a.

Description (last modified by jamoore)

OMERO.clients-5.0.0-ice35-b19.mac.zip, gretzky: OMERO.web 5.0.0-ice33-b19.

Import a dataset with a large number of files, e.g. /ome/team/simon/digits/train-png.tar.bz (42,000 files split into 10 directories).

The import failed after importing 2492 files. The server appears to have open file handles to a log file for each image:

omero@gretzky:~$ lsof -u omero | grep /nightshade/OMERO/omero-c6100-dec-5-2013-valuable/OMERO-merge-develop-FS/ManagedRepository/ |wc -l
2503

e.g.

...
/nightshade/OMERO/omero-c6100-dec-5-2013-valuable/OMERO-merge-develop-FS/ManagedRepository/user-4_5/2014-02/24/17-38-55.156.log
...

Insight has managed to get itself stuck at 100% cpu with an Abnormal Termination:

java.lang.Exception: Abnormal termination due to an uncaught exception.
java.lang.StackOverflowError
	at java.awt.AWTEventMulticaster.componentHidden(AWTEventMulticaster.java:172)
...1022 copies of this line cut...
	at java.awt.AWTEventMulticaster.componentHidden(AWTEventMulticaster.java:172)
Abnormal termination due to an uncaught exception.
java.lang.StackOverflowError
	at java.awt.AWTEventMulticaster.componentHidden(AWTEventMulticaster.java:172)
...1022 copies of this line cut...
	at java.awt.AWTEventMulticaster.componentHidden(AWTEventMulticaster.java:172)
Exception in thread "AWT-EventQueue-0"

	at org.openmicroscopy.shoola.env.ui.UserNotifierImpl.showErrorDialog(UserNotifierImpl.java:191)
	at org.openmicroscopy.shoola.env.ui.UserNotifierImpl.notifyError(UserNotifierImpl.java:291)
	at org.openmicroscopy.shoola.env.AbnormalExitHandler.doTermination(AbnormalExitHandler.java:147)
	at org.openmicroscopy.shoola.env.AbnormalExitHandler.terminate(AbnormalExitHandler.java:85)
	at org.openmicroscopy.shoola.env.AWTExceptionHanlder.handle(AWTExceptionHanlder.java:99)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at java.awt.EventDispatchThread.handleException(EventDispatchThread.java:412)
	at java.awt.EventDispatchThread.processException(EventDispatchThread.java:318)
	at java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:312)
	at java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:211)
	at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:201)
	at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:196)
	at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:188)
	at java.awt.EventDispatchThread.run(EventDispatchThread.java:122)

Whilst this dialog is displayed Insight appears to continue attempting to upload files, judging by omeroinsight.log:

...
2014-02-24 17:42:14,583 INFO  [   ome.formats.importer.ImportCandidates] ( Threa
d-18) 1 file(s) parsed into 1 group(s) with 1 call(s) to setId in 1ms. (32ms tot
al) [0 unknowns]
2014-02-24 17:42:14,583 INFO  [       ome.formats.importer.ImportConfig] ( Threa
d-18) OMERO Version: 5.0.0-ice35-b19
2014-02-24 17:42:14,584 INFO  [       ome.formats.importer.ImportConfig] ( Threa
d-18) Bioformats version: 5.0.0-ice35-b19 revision: cd2a5db date: 24 February 20
14
2014-02-24 17:42:14,584 DEBUG [          o.o.shoola.util.NetworkChecker] ( Threa
d-18) Cached networkup: true
2014-02-24 17:42:14,608 INFO  [       ome.formats.importer.ImportConfig] ( Thread-18) OMERO Version: 5.0.0-ice35-b19
2014-02-24 17:42:14,608 INFO  [       ome.formats.importer.ImportConfig] ( Thread-18) Bioformats version: 5.0.0-ice35-b19 revision: cd2a5db date: 24 February 2014
2014-02-24 17:42:14,608 INFO  [   ome.formats.importer.ImportCandidates] ( Thread-18) Depth: 4 Metadata Level: MINIMUM
2014-02-24 17:42:14,623 INFO  [                loci.formats.ImageReader] ( Thread-18) APNGReader initializing /Users/simon/omero-test/tmp/train-png/1/00001447.png
2014-02-24 17:42:14,640 INFO  [   ome.formats.importer.ImportCandidates] ( Thread-18) 1 file(s) parsed into 1 group(s) with 1 call(s) to setId in 1ms. (32ms total) [0 unknowns]
2014-02-24 17:42:14,640 INFO  [       ome.formats.importer.ImportConfig] ( Thread-18) OMERO Version: 5.0.0-ice35-b19
2014-02-24 17:42:14,640 INFO  [       ome.formats.importer.ImportConfig] ( Thread-18) Bioformats version: 5.0.0-ice35-b19 revision: cd2a5db date: 24 February 2014
2014-02-24 17:42:14,640 DEBUG [          o.o.shoola.util.NetworkChecker] ( Thread-18) Cached networkup: true
2014-02-24 17:42:14,661 INFO  [       ome.formats.importer.ImportConfig] ( Thread-18) OMERO Version: 5.0.0-ice35-b19
2014-02-24 17:42:14,661 INFO  [       ome.formats.importer.ImportConfig] ( Thread-18) Bioformats version: 5.0.0-ice35-b19 revision: cd2a5db date: 24 February 2014
2014-02-24 17:42:14,661 INFO  [   ome.formats.importer.ImportCandidates] ( Thread-18) Depth: 4 Metadata Level: MINIMUM
2014-02-24 17:42:14,677 INFO  [                loci.formats.ImageReader] ( Thread-18) APNGReader initializing /Users/simon/omero-test/tmp/train-png/1/00039986.png
...

Though there is no sign of any imports happening server-side.

Blitz-0.log is full of errors:

grep ERROR OMERO-CURRENT/var/log/Blitz-0.log

...
2014-02-24 18:06:44,215 ERROR [                omero.util.ServantHolder] (l.Server-2) servantsPerSession reached for 51689bc8-e36f-430d-9f81-11d0182904de: 10000
2014-02-24 18:06:47,298 ERROR [                omero.util.ServantHolder] (l.Server-4) servantsPerSession reached for 51689bc8-e36f-430d-9f81-11d0182904de: 10000
2014-02-24 18:06:47,360 ERROR [                omero.util.ServantHolder] (l.Server-2) servantsPerSession reached for 51689bc8-e36f-430d-9f81-11d0182904de: 10000
2014-02-24 18:06:49,916 ERROR [                omero.util.ServantHolder] (l.Server-3) servantsPerSession reached for 51689bc8-e36f-430d-9f81-11d0182904de: 10000
2014-02-24 18:06:50,043 ERROR [                omero.util.ServantHolder] (l.Server-2) servantsPerSession reached for 51689bc8-e36f-430d-9f81-11d0182904de: 10000
...

Change History (3)

comment:1 Changed 10 years ago by rleigh

This is exactly the issue I triggered last Thursday: lsof showed hundreds of open import logs.

comment:2 Changed 10 years ago by spli

File handles were not cleaned up after 15 minutes. However after Insight was started again and a new import kicked off they were closed.

Note I've seen the same issue with a different server on the 5.1 branch... handles remained open after over 1 hour but seemed to be cleared when a new import started.

comment:3 Changed 10 years ago by jamoore

  • Description modified (diff)
  • Milestone changed from Unscheduled to 5.0.1
  • Priority changed from major to critical

I missed an obvious point earlier: there's no reason that a single fileset should have that many servants / filehandles left open. i.e. there is an additional problem.

However, what I said about the failure to call close on abnormal termination still holds: that will leave an import unclosed which is "A Bad Thing (TM)". We should A) try to do whatever cleanup is possible in such cases (or avoid abnormal terminiations??) and perhaps B) check for closed imports on connect. This will have to be fairly careful to also test the connected client count.

Note: See TracTickets for help on using tickets. You may also have a look at Agilo extensions to the ticket.

1.3.13-PRO © 2008-2011 Agilo Software all rights reserved (this page was served in: 0.69218 sec.)

We're Hiring!