Bug #1059 (closed)
Opened 16 years ago
Closed 16 years ago
OutOfMemory exception thrown after 2 hours of importing
Reported by: | jamoore | Owned by: | jamoore
---|---|---|---
Priority: | blocker | Cc: | cxallan, bwzloranger, jrswedlow
Sprint: | n.a. | |
Total Remaining Time: | n.a. | |
Description
After 2 hours of importing a screen, the JBoss server hung with an OOM. The thread dump shows no hung threads. A heap dump shows hundreds of megabytes tied up in char[] instances, ultimately linked to the JBoss class loader. Possible cause:
https://jira.jboss.org/jira/browse/JBAS-4593
As in the JIRA ticket, there are more than 10,000 JMX InvocationContexts still in memory. This issue, however, was supposedly fixed in JBoss 4.2.2.
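For reference, the kind of heap dump analysed here can also be captured from inside the running server on a HotSpot JVM, which makes it easier to re-check what is retaining the char[] data. This is only a minimal sketch; the class name and output file are illustrative:

```java
import java.lang.management.ManagementFactory;

import com.sun.management.HotSpotDiagnosticMXBean;

/**
 * Minimal sketch (not part of the server code) showing one way to capture a
 * heap dump programmatically so the retained char[] / InvocationContext
 * instances can be inspected in a profiler. Requires a Sun/HotSpot JVM.
 */
public class HeapDumper {
    public static void main(String[] args) throws Exception {
        HotSpotDiagnosticMXBean hotspot = ManagementFactory.newPlatformMXBeanProxy(
                ManagementFactory.getPlatformMBeanServer(),
                "com.sun.management:type=HotSpotDiagnostic",
                HotSpotDiagnosticMXBean.class);
        // "true" restricts the dump to live (reachable) objects, which makes
        // it easier to see what is actually being retained.
        hotspot.dumpHeap("import-oom.hprof", true);
    }
}
```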
Attachments (1)
Change History (7)
comment:1 Changed 16 years ago by jmoore
Changed 16 years ago by jmoore
Attaching a summary of various gc tests. From README.txt:

Running various gc parameters under jboss/blitz and/or jprofiler. (Note: ConcMarkSweep doesn't work well under jprofiler.)

- gc1: standard gc under jboss. When requesting a full gc, boom.
- gc2: parallel gc under jboss. Same.
- gc3: concurrent gc. Couldn't request a full gc (jprofiler doesn't support it), but ran and ran.
- gc4: standard gc under blitz. Same as gc1.
- gc5: concurrent gc under blitz. jprofiler could request a full gc. Ran and ran.
- gc6: going back to standard gc to see how long things run without jprofiler. No stop. Seems to be an issue with heavy load (also caused by the profiler).
- gc7: trying gc6 with the profiler to confirm. No full gc requested. Still OOM.
- gc8: retrying gc7 under jprofiler, looking at allocations. Disconnected jprofiler before the OOM, and blitz recovered.
- gc9: adding NewRatio=8 to gc8 to test a hypothesis. Does work better; survives multiple gcs. Gen2 filled up and was completely cleared each time.
- gc10: doubled NewRatio. Works even better. Full gcs got memory down into the 200M range (dependent on how close to the limit). May have spoken too soon: the increase from 600M to 700M (absolute) max hung somewhat; no throughput, etc. In general the throughput is getting worse, and the last gc @ 700M only got down into the 300M range. Despite the large mountain peaks, it does keep going.
- gc11: trying -XX:+ScavengeBeforeFullGC rather than NewRatio. Without a call to gc, was able to survive a full gc @ 700M down to ~300M. Calling several myself was not significantly helpful.
- gc12: trying -XX:NewSize=big. Seems to work well. A steady (expected) ratchet effect: memory growth -> clearing.
- gc13: trying gc12 with NewRatio=8.
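For anyone re-running these tests, here is a rough sketch of watching the generations from inside the JVM rather than via jprofiler, to compare settings such as -XX:NewRatio=8 or -XX:NewSize. The class name and sample interval are arbitrary:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

/**
 * Rough sketch of in-process GC monitoring: prints memory pool occupancy
 * (eden, survivor, old gen, etc.) and cumulative collection counts/times,
 * which is roughly the data being compared across the gc1-gc13 runs above.
 */
public class GcWatcher {
    public static void main(String[] args) throws InterruptedException {
        while (true) {
            for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
                MemoryUsage u = pool.getUsage();
                // u.getMax() may be -1 if the pool has no defined maximum.
                System.out.printf("%-25s %,15d / %,15d bytes%n",
                        pool.getName(), u.getUsed(), u.getMax());
            }
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                System.out.printf("%-25s count=%d time=%dms%n",
                        gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
            }
            Thread.sleep(10000); // sample every 10s during an import
        }
    }
}
```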
comment:2 Changed 16 years ago by jmoore
- Cc jason added
Possible solutions to this include:
- Move indexing to another process (see the sketch after this list). Cons: may require too much memory for many, since Hibernate will be running twice; it also complicates deployment in the JBoss case.
- Throttle indexing and/or import. Cons: to get this in for milestone:3.0-Beta3.1 would require pushing things back. The OmeroThrottling infrastructure is in place, but the checks are not being performed.
- Ship with improved GC settings. Cons: we still have not found the optimal settings, though this can most likely be done. It also complicates deployment, since the optimal settings depend on the resources available on the server machine.
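For the first option, a minimal sketch of what launching the indexer out of process could look like. The entry-point class, classpath, and heap size are hypothetical and not the actual OMERO deployment layout:

```java
import java.io.File;
import java.io.IOException;

/**
 * Sketch of launching the full-text indexer as a separate JVM so its heap is
 * isolated from the import. "ome.fulltext.IndexerMain", the classpath, and
 * the -Xmx value are illustrative placeholders only.
 */
public class IndexerLauncher {
    public static void main(String[] args) throws IOException {
        ProcessBuilder pb = new ProcessBuilder(
                "java",
                "-Xmx512m",                  // indexer gets its own heap budget
                "-cp", "lib/server/*",       // hypothetical classpath
                "ome.fulltext.IndexerMain"); // hypothetical entry point
        pb.redirectErrorStream(true);        // merge stderr into stdout
        pb.directory(new File("."));
        Process indexer = pb.start();
        System.out.println("Indexer started; pid handling and log capture omitted.");
    }
}
```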
comment:3 Changed 16 years ago by jmoore
- Milestone changed from 3.0-Beta3.1 to 3.0-Beta4
Pushing. We may have to release a server-only point release if users run into this issue.
comment:4 Changed 16 years ago by jmoore
The primary method for working around this is to get the full-text indexer into a separate process. In addition, some (very high) hard-throttling limits will be put in place on a per-thread basis (e.g. one thread can't load/write 100K objects in a single method call). Testing will then have to show what needs to be changed to make the importer's huge imports more successful.
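A minimal sketch of what such a per-thread hard limit might look like. The class name, method names, and the exception used here are hypothetical, not the actual OmeroThrottling API:

```java
/**
 * Hypothetical per-thread hard throttle: each service-method invocation
 * resets a counter, and every object loaded or written bumps it; exceeding
 * the (illustrative) 100K limit fails the call instead of exhausting memory.
 */
public class ObjectCountGuard {

    private static final int MAX_OBJECTS_PER_CALL = 100000;

    private static final ThreadLocal<int[]> count = new ThreadLocal<int[]>() {
        @Override
        protected int[] initialValue() {
            return new int[] { 0 };
        }
    };

    /** Called at the start of each service method invocation. */
    public static void reset() {
        count.get()[0] = 0;
    }

    /** Called once per object (or batch of n objects) loaded or written. */
    public static void increment(int n) {
        int[] c = count.get();
        c[0] += n;
        if (c[0] > MAX_OBJECTS_PER_CALL) {
            throw new IllegalStateException("Hard throttle: more than "
                    + MAX_OBJECTS_PER_CALL + " objects touched in a single method call");
        }
    }
}
```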
comment:5 Changed 16 years ago by jmoore
So, this almost certainly has nothing to do with JBAS-4593; rather, it is not a traditional memory leak at all. What is most likely happening is that so many short-lived objects are being created, with indexing and a screen import running at the same time, that the garbage collector cannot keep up. This explains why simple profiling shows no memory increase after a small import and a garbage collection.
Some other symptoms:
All of which better handle the large number of short-lived objects. Attempts to do the same under import still failed.
comment:6 Changed 16 years ago by jmoore
- Resolution set to fixed
- Status changed from new to closed
With the indexer in a separate thread and the numerous improvements to the importer, this seems to be solved. Obviously, other memory issues will pop up again, but closing for now.