Warning: Can't synchronize with repository "(default)" (/home/git/ome.git does not appear to be a Git repository.). Look in the Trac log for more information.
Notice: In order to edit this ticket you need to be either: a Product Owner, The owner or the reporter of the ticket, or, in case of a Task not yet assigned, a team_member"

Task #12012 (closed)

Opened 10 years ago

Closed 10 years ago

BUG: Importing a large number of files fails

Reported by: spli Owned by: dlindner
Priority: critical Milestone: 5.0.1
Component: Insight Version: 5.0.0-rc3
Keywords: n.a. Cc: rleigh, java@…
Resources: n.a. Referenced By: n.a.
References: n.a. Remaining Time: n.a.
Sprint: n.a.

Description (last modified by jamoore)

OMERO.clients-5.0.0-ice35-b19.mac.zip, gretzky: OMERO.web 5.0.0-ice33-b19.

Import a dataset with a large number of files, e.g. /ome/team/simon/digits/train-png.tar.bz (42,000 files split into 10 directories).

The import failed after importing 2492 files. The server appears to have open file handles to a log file for each image:

omero@gretzky:~$ lsof -u omero | grep /nightshade/OMERO/omero-c6100-dec-5-2013-valuable/OMERO-merge-develop-FS/ManagedRepository/ |wc -l
2503

e.g.

...
/nightshade/OMERO/omero-c6100-dec-5-2013-valuable/OMERO-merge-develop-FS/ManagedRepository/user-4_5/2014-02/24/17-38-55.156.log
...

Insight has managed to get itself stuck at 100% cpu with an Abnormal Termination:

java.lang.Exception: Abnormal termination due to an uncaught exception.
java.lang.StackOverflowError
	at java.awt.AWTEventMulticaster.componentHidden(AWTEventMulticaster.java:172)
...1022 copies of this line cut...
	at java.awt.AWTEventMulticaster.componentHidden(AWTEventMulticaster.java:172)
Abnormal termination due to an uncaught exception.
java.lang.StackOverflowError
	at java.awt.AWTEventMulticaster.componentHidden(AWTEventMulticaster.java:172)
...1022 copies of this line cut...
	at java.awt.AWTEventMulticaster.componentHidden(AWTEventMulticaster.java:172)
Exception in thread "AWT-EventQueue-0"

	at org.openmicroscopy.shoola.env.ui.UserNotifierImpl.showErrorDialog(UserNotifierImpl.java:191)
	at org.openmicroscopy.shoola.env.ui.UserNotifierImpl.notifyError(UserNotifierImpl.java:291)
	at org.openmicroscopy.shoola.env.AbnormalExitHandler.doTermination(AbnormalExitHandler.java:147)
	at org.openmicroscopy.shoola.env.AbnormalExitHandler.terminate(AbnormalExitHandler.java:85)
	at org.openmicroscopy.shoola.env.AWTExceptionHanlder.handle(AWTExceptionHanlder.java:99)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at java.awt.EventDispatchThread.handleException(EventDispatchThread.java:412)
	at java.awt.EventDispatchThread.processException(EventDispatchThread.java:318)
	at java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:312)
	at java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:211)
	at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:201)
	at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:196)
	at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:188)
	at java.awt.EventDispatchThread.run(EventDispatchThread.java:122)

Whilst this dialog is displayed Insight appears to continue attempting to upload files, judging by omeroinsight.log:

...
2014-02-24 17:42:14,583 INFO  [   ome.formats.importer.ImportCandidates] ( Threa
d-18) 1 file(s) parsed into 1 group(s) with 1 call(s) to setId in 1ms. (32ms tot
al) [0 unknowns]
2014-02-24 17:42:14,583 INFO  [       ome.formats.importer.ImportConfig] ( Threa
d-18) OMERO Version: 5.0.0-ice35-b19
2014-02-24 17:42:14,584 INFO  [       ome.formats.importer.ImportConfig] ( Threa
d-18) Bioformats version: 5.0.0-ice35-b19 revision: cd2a5db date: 24 February 20
14
2014-02-24 17:42:14,584 DEBUG [          o.o.shoola.util.NetworkChecker] ( Threa
d-18) Cached networkup: true
2014-02-24 17:42:14,608 INFO  [       ome.formats.importer.ImportConfig] ( Thread-18) OMERO Version: 5.0.0-ice35-b19
2014-02-24 17:42:14,608 INFO  [       ome.formats.importer.ImportConfig] ( Thread-18) Bioformats version: 5.0.0-ice35-b19 revision: cd2a5db date: 24 February 2014
2014-02-24 17:42:14,608 INFO  [   ome.formats.importer.ImportCandidates] ( Thread-18) Depth: 4 Metadata Level: MINIMUM
2014-02-24 17:42:14,623 INFO  [                loci.formats.ImageReader] ( Thread-18) APNGReader initializing /Users/simon/omero-test/tmp/train-png/1/00001447.png
2014-02-24 17:42:14,640 INFO  [   ome.formats.importer.ImportCandidates] ( Thread-18) 1 file(s) parsed into 1 group(s) with 1 call(s) to setId in 1ms. (32ms total) [0 unknowns]
2014-02-24 17:42:14,640 INFO  [       ome.formats.importer.ImportConfig] ( Thread-18) OMERO Version: 5.0.0-ice35-b19
2014-02-24 17:42:14,640 INFO  [       ome.formats.importer.ImportConfig] ( Thread-18) Bioformats version: 5.0.0-ice35-b19 revision: cd2a5db date: 24 February 2014
2014-02-24 17:42:14,640 DEBUG [          o.o.shoola.util.NetworkChecker] ( Thread-18) Cached networkup: true
2014-02-24 17:42:14,661 INFO  [       ome.formats.importer.ImportConfig] ( Thread-18) OMERO Version: 5.0.0-ice35-b19
2014-02-24 17:42:14,661 INFO  [       ome.formats.importer.ImportConfig] ( Thread-18) Bioformats version: 5.0.0-ice35-b19 revision: cd2a5db date: 24 February 2014
2014-02-24 17:42:14,661 INFO  [   ome.formats.importer.ImportCandidates] ( Thread-18) Depth: 4 Metadata Level: MINIMUM
2014-02-24 17:42:14,677 INFO  [                loci.formats.ImageReader] ( Thread-18) APNGReader initializing /Users/simon/omero-test/tmp/train-png/1/00039986.png
...

Though there is no sign of any imports happening server-side.

Blitz-0.log is full of errors:

grep ERROR OMERO-CURRENT/var/log/Blitz-0.log

...
2014-02-24 18:06:44,215 ERROR [                omero.util.ServantHolder] (l.Server-2) servantsPerSession reached for 51689bc8-e36f-430d-9f81-11d0182904de: 10000
2014-02-24 18:06:47,298 ERROR [                omero.util.ServantHolder] (l.Server-4) servantsPerSession reached for 51689bc8-e36f-430d-9f81-11d0182904de: 10000
2014-02-24 18:06:47,360 ERROR [                omero.util.ServantHolder] (l.Server-2) servantsPerSession reached for 51689bc8-e36f-430d-9f81-11d0182904de: 10000
2014-02-24 18:06:49,916 ERROR [                omero.util.ServantHolder] (l.Server-3) servantsPerSession reached for 51689bc8-e36f-430d-9f81-11d0182904de: 10000
2014-02-24 18:06:50,043 ERROR [                omero.util.ServantHolder] (l.Server-2) servantsPerSession reached for 51689bc8-e36f-430d-9f81-11d0182904de: 10000
...

Change History (21)

comment:1 Changed 10 years ago by rleigh

This is exactly the issue I triggered last Thursday: lsof showed hundreds of open import logs.

comment:2 Changed 10 years ago by spli

File handles were not cleaned up after 15 minutes. However after Insight was started again and a new import kicked off they were closed.

Note I've seen the same issue with a different server on the 5.1 branch... handles remained open after over 1 hour but seemed to be cleared when a new import started.

comment:3 Changed 10 years ago by jamoore

  • Description modified (diff)
  • Milestone changed from Unscheduled to 5.0.1
  • Priority changed from major to critical

I missed an obvious point earlier: there's no reason that a single fileset should have that many servants / filehandles left open. i.e. there is an additional problem.

However, what I said about the failure to call close on abnormal termination still holds: that will leave an import unclosed which is "A Bad Thing (TM)". We should A) try to do whatever cleanup is possible in such cases (or avoid abnormal terminiations??) and perhaps B) check for closed imports on connect. This will have to be fairly careful to also test the connected client count.

comment:4 Changed 10 years ago by jamoore

rleigh: how did you trigger the same issue?

comment:5 Changed 10 years ago by rleigh

josh: I imported a few directories of images on gretzky with insight. All the imports started failing after there were around ~800 import logs left open. Verified with lsof at the time by spli.

comment:6 Changed 10 years ago by jamoore

Roger: in your case there were only import log files, right? And Simon: in yours there were also data files?

comment:7 Changed 10 years ago by rleigh

IIRC there were files other than import log files also open, but only a handful in comparison to the log files.

comment:8 Changed 10 years ago by spli

I've just reproduced this on ome-ci-c6-07 (5.0.0-ice34-b19, Java 1.7 as opposed to 1.6 on gretzky). Also failed at the same point, after uploading 2492 files (note the importer reports 2496 files as imported, though only 2492 are visible in the web client). 2128 open file handles on the server, almost all to log files.

comment:9 Changed 10 years ago by jamoore

Ok knowing that it's only the log files is somewhat re-assuring: See https://trello.com/c/P4wOlFcd/86-close-sifted-logs

The timeout for these files is currently 30 minutes. You can set it in etc/logback.xml to something lower if you'd like to test. (How long did your whole import take?)

As for what happened to the 4 imported but unseen files, that would require more investigation.

comment:10 Changed 10 years ago by spli

Around 45 minutes, then Insight crashed with around 1600 files still to go.

comment:11 Changed 10 years ago by jamoore

Can you try reproducing with the CLI?

comment:12 Changed 10 years ago by spli

bin/omero import -s gretzky.openmicroscopy.org.uk -c -d 2113 train-png/0/000* worked (4132 files imported)

comment:13 Changed 10 years ago by cblackburn

See Updates to slf4j/logback on openmicroscopy

and Updates to slf4j/logback on bioformats

These should allow a time-out to be set for the SiftingAppender or for markers to be investigated as a way of closing the import log files.

comment:14 Changed 10 years ago by jamoore

Moving the pure logback issue to #12021 (yes lucky bug number!). This issue should be solely used for the insight crash on the large number of files.

comment:15 Changed 10 years ago by bpindelski

  • Owner set to bpindelski

comment:16 Changed 10 years ago by bpindelski

  • Component changed from Import to Insight

comment:17 Changed 10 years ago by bpindelski

After analysing the heap taken after Insight threw an OOM exception, it can be seen that the biggest memory consumers are 30800 instances of org.openmicroscopy.shoola.agents.fsimporter.util.FileImportComponent? (643 MB heap space) and 72800 instances of org.openmicroscopy.shoola.env.data.util.StatusLabel? (513 MB heap space). I wonder if the first solution to this problem would require introducing some kind of paging system to the importer window?

comment:18 Changed 10 years ago by bpindelski

The method where this excessive allocation happens is org.openmicroscopy.shoola.agents.fsimporter.util.FileImportComponent.insertFiles(Map<File, StatusLabel>).

comment:19 Changed 10 years ago by bpindelski

One solution suggested during the team meeting was to hard-limit the number of files handled by the importer window. This would require adding a setting to one of Insight's config XML files and using it during import. Then a dialog would be displayed and the import aborted.

comment:20 Changed 10 years ago by dlindner

  • Owner changed from bpindelski to dlindner

Limit number of files to import to 2000

comment:21 Changed 10 years ago by dlindner

  • Resolution set to fixed
  • Status changed from new to closed
Note: See TracTickets for help on using tickets. You may also have a look at Agilo extensions to the ticket.

1.3.13-PRO © 2008-2011 Agilo Software all rights reserved (this page was served in: 0.68417 sec.)

We're Hiring!