Task #11948 (closed)
Update indexer progress during batch processing
| Reported by: | jballanco-x | Owned by: | jamoore |
|---|---|---|---|
| Priority: | critical | Milestone: | 5.0.3 |
| Component: | General | Version: | 4.4.10 |
| Keywords: | full-text indexing, search | Cc: | |
| Resources: | n.a. | Referenced By: | https://trello.com/c/NDjNwZnm/52-bug-search-5-0-3 |
| References: | n.a. | Remaining Time: | n.a. |
| Sprint: | n.a. | | |
Description
Currently, Indexer worker threads process events in batches and only update their progress as a batch completes. Instead, we should update the progress marker as each event is processed so that, should a worker thread encounter an unrecoverable error during batch processing, the entire batch does not need to be reprocessed.
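The per-event update described above can be sketched as follows. This is a minimal illustration, not the actual OMERO indexer API: the class and method names (`BatchIndexer`, `persistProgress`, etc.) are hypothetical, and the point is simply that the progress marker advances inside the loop, so a crash mid-batch loses at most the event in flight rather than the whole batch.

```java
import java.util.List;

// Hypothetical sketch: progress is persisted after every event,
// not once per batch, so an unrecoverable error mid-batch does
// not force the already-processed events to be reindexed.
public class BatchIndexer {
    private long lastProcessedId = -1;

    /** Process a batch, persisting progress after each event. */
    public void processBatch(List<Long> eventIds) {
        for (long id : eventIds) {
            indexEvent(id);           // may fail on an unindexable object
            lastProcessedId = id;     // advance marker per event, not per batch
            persistProgress(lastProcessedId);
        }
    }

    private void indexEvent(long id) { /* hypothetical indexing work */ }

    private void persistProgress(long id) { /* hypothetical write to disk */ }

    public long getLastProcessedId() { return lastProcessedId; }
}
```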
Attachments (1)
Change History (19)
comment:1 Changed 6 years ago by jballanco-x
comment:2 Changed 5 years ago by jballanco-x
Pull request sent: https://github.com/openmicroscopy/openmicroscopy/pull/2086
However, an issue remains. Since the event currently being processed is recorded as it is pulled off the log, we never get a chance to retry events that failed to process.
comment:3 follow-up: ↓ 4 Changed 5 years ago by jamoore
- Cc jamoore added
The only thing I can think of is:
version=1 id=100 retries=0 state=inprogress
followed by
version=1 id=100 retries=0 state=failed
or similar. But perhaps that's getting too complicated.
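The proposed state line could be parsed along these lines. Only the `key=value` format comes from the comment above; the record type and field handling here are hypothetical.

```java
// Hypothetical parser for a state line such as
// "version=1 id=100 retries=0 state=inprogress".
public class ProgressRecord {
    public final int version;
    public final long id;
    public final int retries;
    public final String state;

    public ProgressRecord(int version, long id, int retries, String state) {
        this.version = version;
        this.id = id;
        this.retries = retries;
        this.state = state;
    }

    /** Parse space-separated "key=value" pairs. */
    public static ProgressRecord parse(String line) {
        int version = 0;
        long id = -1;
        int retries = 0;
        String state = "";
        for (String pair : line.trim().split("\\s+")) {
            String[] kv = pair.split("=", 2);
            switch (kv[0]) {
                case "version": version = Integer.parseInt(kv[1]); break;
                case "id":      id = Long.parseLong(kv[1]);        break;
                case "retries": retries = Integer.parseInt(kv[1]); break;
                case "state":   state = kv[1];                     break;
            }
        }
        return new ProgressRecord(version, id, retries, state);
    }
}
```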
comment:4 in reply to: ↑ 3 Changed 5 years ago by jballanco-x
The problem stems more from the fact that the EventLogLoader is only involved in pulling out events for processing. Failures have typically been occurring inside the loop processing retrieved events, in FullTextIndexer#doIndexing. So we'd have to add code that deals with the event ID log to FullTextIndexer as well, breaking through what little abstraction we currently have around pulling events out of the backlog...
comment:5 Changed 5 years ago by jamoore
That makes sense. So a "failed" state, fundamentally, doesn't make sense. Whatever ID is in the file on startup, then, needs to be considered "inprogress", right? Unless, perhaps, on shutdown, a marker was set to say that all was well? Perhaps simply a close() on the log loader API would suffice?
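The clean-shutdown idea above might look like the following sketch: close() writes a marker file, startup consumes it, and a missing marker means the last recorded ID must be treated as in-progress. All names here are hypothetical; this is not the actual log loader API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of a "clean shutdown" marker: if the marker
// is absent on startup, the last recorded event ID is suspect and
// should be treated as in-progress (possibly needing a retry).
public class LogLoaderState {
    private final Path marker;

    public LogLoaderState(Path marker) {
        this.marker = marker;
    }

    /** On startup: no clean marker means the last ID is in-progress. */
    public boolean lastIdIsInProgress() {
        return !Files.exists(marker);
    }

    /** Startup consumes the marker so a later crash is detectable. */
    public void start() {
        try {
            Files.deleteIfExists(marker);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Clean shutdown: record that all was well. */
    public void close() {
        try {
            Files.createFile(marker);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```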
comment:6 Changed 5 years ago by jballanco-x
- Milestone changed from 5.0.0 to 5.0.1
- Priority changed from blocker to critical
comment:7 Changed 5 years ago by jballanco-x
Referencing ticket #11936 has changed sprint.
comment:8 Changed 5 years ago by jballanco-x
Updated strategy described in the pull request... still not a complete solution, however.
comment:9 Changed 5 years ago by spli
I'm not sure if you received a notification; I've attached part of the log.
comment:10 Changed 5 years ago by jballanco-x
Thanks!
comment:11 Changed 5 years ago by jballanco-x
- Milestone changed from 5.0.1 to 5.0.2
comment:12 Changed 5 years ago by jballanco-x
Referencing ticket #11936 has changed sprint.
comment:14 Changed 5 years ago by jamoore
- Cc jamoore removed
- Owner changed from jballanco-x to jamoore
comment:15 Changed 5 years ago by jamoore
- Referenced By set to https://trello.com/c/NDjNwZnm/52-bug-search-5-0-3
comment:16 Changed 5 years ago by jamoore
- Status changed from new to accepted
A better, or at least easier, solution will likely be to have an event log loader that handles failures more gracefully by:
- not throwing on exceptions
- retrying failures
- adding annotations to failed objects
At the same time, a partitioning query will be used to reduce the total number of objects indexed.
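The failure-tolerant loader described above could be sketched as follows. This is hypothetical (the real implementation landed as the EventLogQueue in the linked PR): exceptions are caught rather than propagated, failures are retried up to a limit, and objects that still fail are recorded for annotation and skipped.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a loader that never throws during indexing:
// each event is retried a bounded number of times; persistent failures
// are collected (standing in for attaching an annotation) and skipped.
public class FailureTolerantLoader {
    private final List<Long> annotatedFailures = new ArrayList<>();
    private final int maxRetries;

    public FailureTolerantLoader(int maxRetries) {
        this.maxRetries = maxRetries;
    }

    /** Index one event without propagating exceptions. */
    public void process(long id, EventIndexer indexer) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                indexer.index(id);
                return; // success: nothing to record
            } catch (RuntimeException e) {
                // swallow and retry rather than aborting the batch
            }
        }
        annotatedFailures.add(id); // stand-in for annotating the failed object
    }

    public List<Long> getAnnotatedFailures() {
        return annotatedFailures;
    }

    /** Hypothetical indexing callback. */
    public interface EventIndexer {
        void index(long id);
    }
}
```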
comment:17 Changed 5 years ago by jamoore
- Resolution set to fixed
- Status changed from accepted to closed
PR opened: https://github.com/openmicroscopy/openmicroscopy/pull/2639
Errors will be marked with annotations and the indexer will continue.
comment:18 Changed 5 years ago by jmoore <josh@…>
(In [fb6eec1d9aa778c4a2c89b54ffb80e4bf558d246/ome.git] on branch develop) EventLogQueue: new persistent event log loader (See #11948)
This PELL should significantly improve the performance of
the background indexer. Making use of the new partition
method in SqlAction, it should not re-try multiple objects.
And by adding a failure queue, it should not get stuck when
an object is unindexable.
The strategy for resolving this issue will be to record (in a file on disk) the last event ID whose indexing was attempted, along with how many attempts have been made. If the attempt count exceeds a preconfigured maximum, indexing of that event will be skipped in favor of continuing with later events in the event stream, and a warning will be logged.
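The skip-after-N-attempts logic above can be sketched like this. Persistence to the on-disk file is elided here, and the class name and warning format are hypothetical; the sketch shows only the attempt counting and skip decision.

```java
// Hypothetical sketch of the attempt-counting strategy: track how many
// times the same event ID has been attempted, and skip it (with a
// warning) once a configured maximum is exceeded. Writing the ID and
// count to a file on disk is omitted for brevity.
public class AttemptTracker {
    private final int maxAttempts;
    private long lastId = -1;
    private int attempts = 0;

    public AttemptTracker(int maxAttempts) {
        this.maxAttempts = maxAttempts;
    }

    /** Returns true if the event should be indexed, false if skipped. */
    public boolean shouldAttempt(long eventId) {
        if (eventId == lastId) {
            attempts++;
        } else {
            lastId = eventId;
            attempts = 1;
        }
        if (attempts > maxAttempts) {
            System.err.println("WARN: skipping event " + eventId
                    + " after " + maxAttempts + " attempts");
            return false;
        }
        return true;
    }
}
```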