User Story #11936 (accepted)

Opened 6 years ago

Last modified 4 years ago

Improve search indexing robustness, performance, and reliability

Reported by: jballanco-x
Owned by: jballanco-x
Priority: major
Milestone: Asynchronous
Component: General
Keywords: n.a.
Cc: atarkowska, ejrozbicki, jamoore, cxallan, jburel
Story Points: n.a.
Sprint: n.a.
Importance: n.a.
Total Remaining Time: 0.0d
Estimated Remaining Time: n.a.

Description

Under certain circumstances, the Indexer process can get into an Out Of Memory state and, subsequently, images that have had tags added to them are not findable via a search on the tag.

We suspect that in such cases the Indexer process is doing far more work than it needs to, that this is exacerbating an (as yet unidentified) memory leak, and that when the Indexer reaches the OOM state it does not die gracefully. This means that at the very least we need to:

  • Improve the performance of the Indexer. Currently, events are processed in order in limited batches. When a large number of updates are made to the same image, this can result in the same image being re-indexed for each batch, instead of only after all the queued-up events related to the image have been seen.
  • Ensure that the Indexer can survive an OOM situation and continue making progress in processing pending events.
  • Find and eliminate the source of the memory leak. Even when the indexer gets stuck doing too much work, it should not be running out of memory.
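
The first point above can be sketched as follows. This is a minimal illustration, not the Indexer's actual code: IndexEvent, EventCoalescer, and their fields are hypothetical names, and the sketch assumes that all pending events can be inspected before any re-indexing starts. It collapses the pending events so that each target object appears once, positioned at its most recent event.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical event record: which object changed, and where in the event log. */
final class IndexEvent {
    final long eventId;
    final String entityType;
    final long entityId;

    IndexEvent(long eventId, String entityType, long entityId) {
        this.eventId = eventId;
        this.entityType = entityType;
        this.entityId = entityId;
    }
}

/** Hypothetical helper that removes redundant re-index requests. */
final class EventCoalescer {
    /**
     * Returns one event per (entityType, entityId) target, keeping only the
     * latest event for each target. Targets are ordered by their latest event.
     */
    static List<IndexEvent> coalesce(List<IndexEvent> pending) {
        Map<String, IndexEvent> latest = new LinkedHashMap<>();
        for (IndexEvent e : pending) {
            String key = e.entityType + ":" + e.entityId;
            // Remove first so that the re-insert moves the key to the end.
            latest.remove(key);
            latest.put(key, e);
        }
        return new ArrayList<>(latest.values());
    }
}
```

With this approach, an image that received many tag updates would be re-indexed once, after its last queued event, rather than once per batch.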

Attachments (3)

Indexer-0.log (1.3 MB) - added by jballanco-x 6 years ago.
Log of an Indexer getting into an OOM state
Indexer-0.oom.emil.heap.dmp.xz (9.7 MB) - added by ejrozbicki 6 years ago.
Java heap dump of failed indexer. omero 4.4.10
Indexer-0.oom.heap.dmp.xz (11.1 MB) - added by jballanco-x 6 years ago.
Heap dump of Indexer in OOM state

Change History (11)

Changed 6 years ago by jballanco-x

Log of an Indexer getting into an OOM state

Changed 6 years ago by ejrozbicki

Java heap dump of failed indexer. omero 4.4.10

Changed 6 years ago by jballanco-x

Heap dump of Indexer in OOM state

comment:1 Changed 6 years ago by jburel

  • Cc jburel added

comment:2 Changed 6 years ago by jballanco-x

After further investigation, it seems likely that Hibernate is the culprit leading to the heap overflow. It does not appear that anything is leaking, but it does seem that loading items from the database during indexing is exhausting the default 256 MB heap.

There are a few things we will try to resolve the issue:

Additionally, it seems like we should be able to catch threads that die from a heap OOM state and restart them, so that at least the entire Indexer process doesn't eventually hang. Along these lines we will:

  • investigate why the thread-pool threads are not recovering from the OOM state
  • improve the granularity of updating the Indexer's progress, so we don't have to repeat entire batches
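
A rough sketch of both points, under stated assumptions: ResilientBatchRunner and its methods are hypothetical and are not part of the Indexer. The sketch records progress after each event instead of after each batch, and it catches a failure (including OutOfMemoryError) inside the worker so the calling thread stays alive and can retry from the last completed event.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongConsumer;

/** Hypothetical runner that checkpoints per event and survives a failed event. */
final class ResilientBatchRunner {
    // Id of the last event that was indexed successfully (-1 if none yet).
    private final AtomicLong lastCompleted = new AtomicLong(-1);

    /**
     * Processes event ids in order and stops at the first failure.
     * Returns how many events were completed in this call.
     */
    int run(List<Long> eventIds, LongConsumer indexer) {
        int done = 0;
        for (long id : eventIds) {
            try {
                indexer.accept(id);
                lastCompleted.set(id); // per-event checkpoint, not per-batch
                done++;
            } catch (OutOfMemoryError | RuntimeException e) {
                // Do not let the failure escape and kill the worker thread.
                // The caller can retry after lastCompleted(), for example
                // with a smaller batch.
                break;
            }
        }
        return done;
    }

    long lastCompleted() {
        return lastCompleted.get();
    }
}
```

Catching OutOfMemoryError is only safe if the memory held by the failed unit of work becomes unreachable afterwards, so this is a mitigation rather than a fix. Note also that, in general Java terms, a periodic task scheduled through ScheduledExecutorService is not run again once one execution throws; whether that is what happens in the Indexer has not been established here.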

comment:3 Changed 6 years ago by jballanco-x

  • Milestone changed from 5.0.0 to 5.0.1
  • Priority changed from blocker to critical

Key components have been fixed for 5.0.0 so that search will work in most situations. Remaining fixes are being prepared for 5.0.1. Thus, this story should no longer be blocking 5.0.0 (except for documentation being written).

comment:4 Changed 6 years ago by jballanco-x

  • Priority changed from critical to major
  • Summary changed from "Searching for tags sometimes does not return expected images" to "Improve search indexing robustness, performance, and reliability"

comment:5 Changed 6 years ago by jballanco-x

  • Milestone changed from 5.0.1 to 5.0.2

comment:6 Changed 5 years ago by agilo

  • Status changed from new to accepted

Updated status, related task in progress

comment:7 Changed 5 years ago by jamoore

  • Milestone changed from 5.1.0-m3 to 5.x

I don't see anything else happening for 5.1.0. Pushing to 5.x

comment:8 Changed 4 years ago by jamoore

  • Milestone changed from 5.x to Asynchronous