Task #11675 (closed)

Opened 6 years ago

Closed 6 years ago

Last modified 5 years ago

BUG: java.io.IOException: Map failed

Reported by:	spli	Owned by:
Priority:	critical	Milestone:	5.0.0-rc1
Component:	General	Version:	4.4.9
Keywords:	n.a.	Cc:	java@…
Resources:	n.a.	Referenced By:	n.a.
References:	n.a.	Remaining Time:	n.a.
Sprint:	n.a.

Description (last modified by mtbcarroll)

See https://www.openmicroscopy.org/community/viewtopic.php?f=5&t=7351&start=10#p13116

Java releases the mapping behind a MappedByteBuffer only when the buffer object itself is garbage-collected, so mappings can accumulate and lead to java.io.IOException: Map failed being thrown in
ome.io.nio.RomioPixelBuffer.getRegion(RomioPixelBuffer.java:343)

See for example

The last link suggests the following workaround

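// First attempt; map() throws java.io.IOException ("Map failed") when the mapping cannot be created.
ByteBuffer buffer;
try {
    buffer = channel.map(READ_ONLY, ofs, n);
} catch (java.io.IOException e) {
    // Ask the JVM to collect unreachable buffers so their mappings are released, then retry once.
    System.gc();
    System.runFinalization();
    buffer = channel.map(READ_ONLY, ofs, n);
}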

Note other uses of MappedByteBuffer may need to be checked.
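
Since other uses of MappedByteBuffer may need the same treatment, the workaround quoted above could be packaged as a reusable helper. The following is only a minimal sketch: the class name RetryingMapper and the method mapReadOnly are hypothetical and not part of OMERO, and System.gc() is merely a hint to the JVM, so the retry is not guaranteed to succeed.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileChannel.MapMode;

/** Hypothetical helper (not OMERO code): retries a failed mapping once after hinting GC. */
final class RetryingMapper {
    private RetryingMapper() {}

    static MappedByteBuffer mapReadOnly(FileChannel channel, long offset, long size)
            throws IOException {
        try {
            return channel.map(MapMode.READ_ONLY, offset, size);
        } catch (IOException first) {
            // Unreachable mapped buffers are only unmapped once collected,
            // so request a collection before trying again.
            System.gc();
            System.runFinalization();
            try {
                return channel.map(MapMode.READ_ONLY, offset, size);
            } catch (IOException second) {
                // Keep the original failure visible in the stack trace.
                second.addSuppressed(first);
                throw second;
            }
        }
    }
}
```

A caller would replace channel.map(READ_ONLY, ofs, n) with RetryingMapper.mapReadOnly(channel, ofs, n); the second failure, if any, still propagates as an IOException.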

Java version as reported by Douglas:

$ java -version
java version "1.6.0_33"
Java(TM) SE Runtime Environment (build 1.6.0_33-b03)
Java HotSpot(TM) 64-Bit Server VM (build 20.8-b03, mixed mode)

Change History (26)

comment:1 Changed 6 years ago by jamoore

Milestone changed from Unscheduled to 5.0.0-beta2
Priority changed from minor to critical

comment:2 Changed 6 years ago by mtbcarroll

Owner set to mtbcarroll

comment:3 Changed 6 years ago by mtbcarroll

Description modified (diff)
Status changed from new to accepted

From the point of view of adjusting existing code that uses mapped byte buffers, the cleanest-looking adjustment would involve writing another ByteBuffer subclass that offers a close() method that nulls a wrapped byte buffer and then gets GC and finalization run in a separate thread. However, the new subclass would have to be in the java.nio package because none of the constructors are public: is that acceptable?
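
One way to get a close() method without placing a class in java.nio would be composition rather than subclassing. The sketch below is an assumption about how that might look, not the approach taken in OMERO: ReleasableBuffer is a hypothetical name, and the wrapper can only drop its own reference, so any caller still holding the ByteBuffer returned by get() keeps the mapping alive.

```java
import java.nio.ByteBuffer;

/** Hypothetical wrapper (not OMERO code): nulls its buffer on close and hints GC off-thread. */
final class ReleasableBuffer implements AutoCloseable {
    private ByteBuffer buffer;

    ReleasableBuffer(ByteBuffer buffer) {
        if (buffer == null) {
            throw new NullPointerException("buffer");
        }
        this.buffer = buffer;
    }

    /** Returns the wrapped buffer; callers should not retain it beyond close(). */
    synchronized ByteBuffer get() {
        if (buffer == null) {
            throw new IllegalStateException("buffer already closed");
        }
        return buffer;
    }

    synchronized boolean isClosed() {
        return buffer == null;
    }

    @Override
    public synchronized void close() {
        if (buffer == null) {
            return; // closing twice is a no-op
        }
        buffer = null;
        // Run the GC hint in a separate daemon thread so close() returns promptly.
        Thread collector = new Thread(() -> {
            System.gc();
            System.runFinalization();
        }, "buffer-release");
        collector.setDaemon(true);
        collector.start();
    }
}
```

Because the class implements AutoCloseable it can be used in a try-with-resources block on Java 7 and later.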

comment:4 Changed 6 years ago by mtbcarroll

(I don't want to just wrap the buffer-using code with exception-catching because it may consume a bunch of memory and successfully complete only to cause an OOM from elsewhere in the codebase.)

comment:5 Changed 6 years ago by jamoore

Mark: was there any resolution on which Java versions suffer from this? Adding calls to gc and finalization seems less than ideal if we could prevent it.

comment:6 Changed 6 years ago by spli

https://www.openmicroscopy.org/community/viewtopic.php?f=5&t=7351&start=20#p13123 suggests it might not be a problem with Java 7. Then again, since it's to do with memory allocation/GC, I wouldn't be surprised if it's intermittent. Can we reproduce this with one of our test files on Java 6?

comment:7 Changed 6 years ago by mtbcarroll

http://bugs.sun.com/view_bug.do?bug_id=4724038 suggests that the general issue is not fixed, though there is a rather incomplete-looking stab at relief at http://bugs.sun.com/view_bug.do?bug_id=6417205 in what looks to be 1.6b86.

Another option would be to simply refactor away from using MappedByteBuffers at all for this.

comment:8 Changed 6 years ago by mtbcarroll

The reports of problems that I've seen have been particular to 32-bit JREs only. Perhaps with 64-bit the GC and finalization always occur before address space is depleted.

comment:9 Changed 6 years ago by spli

According to Douglas the original server error occurred with 64-bit Java 1.6.

comment:10 Changed 6 years ago by mtbcarroll

I will see if I can write a failing test.

(Update: no luck so far!)

Last edited 6 years ago by mtbcarroll (previous) (diff)

comment:11 Changed 6 years ago by mtbcarroll

He wouldn't be running the server on Windows, right?

comment:12 Changed 6 years ago by jamoore

Server log certainly used /home-style paths.

comment:13 Changed 6 years ago by mtbcarroll

Okay. Still no luck reproducing this, with OpenJDK 6 and Oracle Java SDK 1.6.

comment:14 Changed 6 years ago by mtbcarroll

Do we have the actual problem images, or some idea of which of ours may behave similarly? (Especially if the problem can be reliably reproduced.)

Last edited 6 years ago by mtbcarroll (previous) (diff)

comment:15 Changed 6 years ago by mtbcarroll

Note: whatever fixes are made for this ticket should also be tested on Windows, as its native memory allocation and management aren't necessarily as good.

comment:16 Changed 6 years ago by mtbcarroll

It should be noted that something like ((sun.nio.ch.DirectBuffer) buffer).cleaner().clean() may be of use even though it is horrifying and should be avoided. In particular, though it seems presently okay for both OpenJDK and Oracle's SDK, there is no official guarantee that the class will remain available.
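
To avoid a compile-time dependency on sun.nio.ch.DirectBuffer, the same idea can be attempted through reflection. This is a hedged sketch, not the code from any OMERO pull request: BufferCleaner and tryUnmap are hypothetical names, the first path assumes the sun.misc.Unsafe.invokeCleaner method available on Java 9 and later, the second assumes the pre-Java 9 cleaner().clean() chain, and both rely on unsupported JDK internals. After a successful call the buffer must never be touched again, since accessing unmapped memory can crash the JVM.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.nio.ByteBuffer;

/** Hypothetical helper (not OMERO code): best-effort early release of a direct or mapped buffer. */
final class BufferCleaner {
    private BufferCleaner() {}

    /** Returns true only if a cleaner was invoked; never throws. */
    static boolean tryUnmap(ByteBuffer buffer) {
        if (buffer == null || !buffer.isDirect()) {
            return false; // heap buffers have nothing to release
        }
        return cleanViaUnsafe(buffer) || cleanViaCleaner(buffer);
    }

    // Java 9+: sun.misc.Unsafe.invokeCleaner(ByteBuffer).
    private static boolean cleanViaUnsafe(ByteBuffer buffer) {
        try {
            Class<?> unsafeClass = Class.forName("sun.misc.Unsafe");
            Field theUnsafe = unsafeClass.getDeclaredField("theUnsafe");
            theUnsafe.setAccessible(true);
            Object unsafe = theUnsafe.get(null);
            Method invokeCleaner = unsafeClass.getMethod("invokeCleaner", ByteBuffer.class);
            invokeCleaner.invoke(unsafe, buffer);
            return true;
        } catch (Exception e) {
            // Method absent (Java 8 and earlier), access denied, or buffer is a slice/duplicate.
            return false;
        }
    }

    // Java 8 and earlier: equivalent of ((sun.nio.ch.DirectBuffer) buffer).cleaner().clean().
    private static boolean cleanViaCleaner(ByteBuffer buffer) {
        try {
            Method cleanerMethod = buffer.getClass().getMethod("cleaner");
            cleanerMethod.setAccessible(true);
            Object cleaner = cleanerMethod.invoke(buffer);
            if (cleaner == null) {
                return false; // slices and duplicates carry no cleaner of their own
            }
            Method clean = cleaner.getClass().getMethod("clean");
            clean.setAccessible(true);
            clean.invoke(cleaner);
            return true;
        } catch (Exception e) {
            // Internals not reachable on this JVM; leave release to the garbage collector.
            return false;
        }
    }
}
```

Returning a boolean instead of throwing lets callers fall back to ordinary garbage collection when the internals are unavailable, which matches the concern that the class is not guaranteed to remain available.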

Last edited 6 years ago by mtbcarroll (previous) (diff)

comment:17 Changed 6 years ago by mtbcarroll

"may be of use": https://github.com/openmicroscopy/openmicroscopy/pull/1865 opened accordingly. It affects the code quoted in this ticket's description, i.e. in calculating the checksum.

comment:18 Changed 6 years ago by jamoore

Owner mtbcarroll deleted

With Mark away, this will need to get looked into by someone else.

comment:19 Changed 6 years ago by jamoore

One Sun error report also mentions http://stackoverflow.com/questions/3773775/default-for-xxmaxdirectmemorysize which, if useful (assuming we can ever reproduce this), would give users something to try without needing to recompile.

comment:20 Changed 6 years ago by jamoore

Suggestion from Chris: see #6083 which was the same condition. Solution was to use a 64bit JVM.

comment:21 Changed 6 years ago by jamoore

Resolution set to fixed
Status changed from accepted to closed

I've added a static JVM configuration setting: omero.pixeldata.dispose in https://github.com/openmicroscopy/openmicroscopy/pull/1884

With that, we've probably done all we can do to contain this issue while giving people options, and I'm closing. If we find a reproducible test case, we can consider extending this to test for a 32-bit JVM, check for low memory, etc.
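
The possible extension mentioned above (honour the setting, otherwise test for a 32-bit JVM) might be sketched as follows. Everything here is an assumption for illustration: DisposeSetting and its methods are hypothetical names, reading omero.pixeldata.dispose as a JVM system property is a guess at how the value could be supplied rather than a description of the merged change, and the 32-bit check is a heuristic based on the HotSpot-specific sun.arch.data.model property with os.arch as a fallback.

```java
/** Hypothetical sketch (not OMERO code): decides whether buffers should be disposed eagerly. */
final class DisposeSetting {
    /** Setting name from this ticket; treating it as a system property is an assumption. */
    static final String PROPERTY = "omero.pixeldata.dispose";

    private DisposeSetting() {}

    /** An explicit setting wins; otherwise dispose only on what looks like a 32-bit JVM. */
    static boolean shouldDispose() {
        String configured = System.getProperty(PROPERTY);
        if (configured != null) {
            return Boolean.parseBoolean(configured.trim());
        }
        return looksLike32Bit(System.getProperty("sun.arch.data.model"),
                              System.getProperty("os.arch"));
    }

    /** Heuristic only: trusts the data model when it is 32 or 64, else inspects the arch name. */
    static boolean looksLike32Bit(String dataModel, String osArch) {
        if ("32".equals(dataModel)) {
            return true;
        }
        if ("64".equals(dataModel)) {
            return false;
        }
        // Architecture names such as amd64, x86_64 and aarch64 contain "64";
        // this check misclassifies 64-bit names that do not, for example s390x.
        return osArch != null && !osArch.contains("64");
    }
}
```

Started with -Domero.pixeldata.dispose=false, such a check would skip disposal regardless of the JVM's bitness.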

comment:22 Changed 6 years ago by jmoore <josh@…>

(In [da313a48b014f5cba64a6075c6dac5aeba9ad00e/ome.git] on branch develop) Add omero.pixeldata.dispose for static config (See #11675)

Since the cleaning should only be necessary under certain conditions
(most likely a 32-bit JVM), the disposing of the ByteBuffers held by
PixelData instances can be en-/disabled with omero.pixeldata.dispose

comment:23 Changed 6 years ago by Josh Moore <josh@…>

(In [60666baced537e3d42efbc6aa9dbabc2f39147b7/ome.git] on branch develop) Merge pull request #1884 from joshmoore/11675-config-dispose

Add omero.pixeldata.dispose for static config (See #11675)

comment:24 Changed 6 years ago by mtbcarroll

Note https://github.com/openmicroscopy/openmicroscopy/pull/1865#issuecomment-30127891 about exception handling in clean().

comment:25 Changed 5 years ago by jmoore <josh@…>

(In [d0d85da25ca5a14bd7c427c795adc43a53d03077/ome.git] on branch develop) Set omero.pixeldata.dispose=true by default (See #11675)

Regularly, the long_running Python tests cause my JVM to
segfault locally when out of memory conditions occur.
Setting dispose to true prevents this from happening.

This proposes setting true as the default. If issues
arise, sysadmins can manually set it back to false.

comment:26 Changed 5 years ago by Josh Moore <josh@…>

(In [da0b2239db83cef3b9aea8ee0bea6027872956df/ome.git] on branch develop) Merge pull request #3371 from joshmoore/dispose-true

Set omero.pixeldata.dispose=true by default (See #11675)
