Task #11948 (closed)
Update indexer progress during batch processing
| Reported by: | jballanco-x | Owned by: | jamoore |
|---|---|---|---|
| Priority: | critical | Milestone: | 5.0.3 |
| Component: | General | Version: | 4.4.10 |
| Keywords: | full-text indexing, search | Cc: | |
| Resources: | n.a. | Referenced By: | https://trello.com/c/NDjNwZnm/52-bug-search-5-0-3 |
| References: | n.a. | Remaining Time: | n.a. |
| Sprint: | n.a. | | |
Description
Currently, Indexer worker threads process events in batches and only update their progress as a batch completes. Instead, we should update the progress marker as each event is processed so that, should a worker thread encounter an unrecoverable error during batch processing, the entire batch does not need to be reprocessed.
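The per-event update described above can be sketched as follows. This is a minimal illustration, not the actual OMERO indexer API: the class and method names (`BatchIndexer`, `persistProgress`, etc.) are hypothetical, and the point is simply that the progress marker advances inside the loop, so a crash mid-batch loses at most the event in flight rather than the whole batch.

```java
import java.util.List;

// Hypothetical sketch: progress is persisted after every event,
// not once per batch, so an unrecoverable error mid-batch does
// not force the already-processed events to be reindexed.
public class BatchIndexer {
    private long lastProcessedId = -1;

    /** Process a batch, persisting progress after each event. */
    public void processBatch(List<Long> eventIds) {
        for (long id : eventIds) {
            indexEvent(id);           // may fail on an unindexable object
            lastProcessedId = id;     // advance marker per event, not per batch
            persistProgress(lastProcessedId);
        }
    }

    private void indexEvent(long id) { /* hypothetical indexing work */ }

    private void persistProgress(long id) { /* hypothetical write to disk */ }

    public long getLastProcessedId() { return lastProcessedId; }
}
```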
Attachments (1)
Change History (19)
comment:1 Changed 6 years ago by jballanco-x
comment:2 Changed 5 years ago by jballanco-x
Pull request sent: https://github.com/openmicroscopy/openmicroscopy/pull/2086
However, an issue remains. Since the event currently being processed is recorded as it is pulled off the log, we never get a chance to retry events that failed to process.
comment:3 follow-up: ↓ 4 Changed 5 years ago by jamoore
- Cc jamoore added
The only thing I can think of is:
version=1 id=100 retries=0 state=inprogress
followed by
version=1 id=100 retries=0 state=failed
or similar. But perhaps that's getting too complicated.
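The proposed state line could be parsed along these lines. Only the `key=value` format comes from the comment above; the record type and field handling here are hypothetical.

```java
// Hypothetical parser for a state line such as
// "version=1 id=100 retries=0 state=inprogress".
public class ProgressRecord {
    public final int version;
    public final long id;
    public final int retries;
    public final String state;

    public ProgressRecord(int version, long id, int retries, String state) {
        this.version = version;
        this.id = id;
        this.retries = retries;
        this.state = state;
    }

    /** Parse space-separated "key=value" pairs. */
    public static ProgressRecord parse(String line) {
        int version = 0;
        long id = -1;
        int retries = 0;
        String state = "";
        for (String pair : line.trim().split("\\s+")) {
            String[] kv = pair.split("=", 2);
            switch (kv[0]) {
                case "version": version = Integer.parseInt(kv[1]); break;
                case "id":      id = Long.parseLong(kv[1]);        break;
                case "retries": retries = Integer.parseInt(kv[1]); break;
                case "state":   state = kv[1];                     break;
            }
        }
        return new ProgressRecord(version, id, retries, state);
    }
}
```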
comment:4 in reply to: ↑ 3 Changed 5 years ago by jballanco-x
The problem stems more from the fact that the EventLogLoader is only involved in pulling out events for processing. Failures have typically been occurring inside the loop processing retrieved events, in FullTextIndexer#doIndexing. So we'd have to add code that deals with the event ID log to FullTextIndexer as well, breaking through what little abstraction we currently have around pulling events out of the backlog...
comment:5 Changed 5 years ago by jamoore
That makes sense. So a "failed" state, fundamentally, doesn't make sense. Whatever ID is in the file on startup, then, needs to be considered "inprogress", right? Unless, perhaps, on shutdown, a marker was set to say that all was well? Perhaps simply a close() on the log loader API would suffice?
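The clean-shutdown idea above might look like the following sketch: close() writes a marker file, startup consumes it, and a missing marker means the last recorded ID must be treated as in-progress. All names here are hypothetical; this is not the actual log loader API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of a "clean shutdown" marker: if the marker
// is absent on startup, the last recorded event ID is suspect and
// should be treated as in-progress (possibly needing a retry).
public class LogLoaderState {
    private final Path marker;

    public LogLoaderState(Path marker) {
        this.marker = marker;
    }

    /** On startup: no clean marker means the last ID is in-progress. */
    public boolean lastIdIsInProgress() {
        return !Files.exists(marker);
    }

    /** Startup consumes the marker so a later crash is detectable. */
    public void start() {
        try {
            Files.deleteIfExists(marker);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Clean shutdown: record that all was well. */
    public void close() {
        try {
            Files.createFile(marker);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```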
comment:6 Changed 5 years ago by jballanco-x
- Milestone changed from 5.0.0 to 5.0.1
- Priority changed from blocker to critical
comment:7 Changed 5 years ago by jballanco-x
Referencing ticket #11936 has changed sprint.
comment:8 Changed 5 years ago by jballanco-x
Updated strategy described in the pull request... still not a complete solution, however.
comment:9 Changed 5 years ago by spli
I'm not sure if you received a notification; I've attached part of the log.
comment:10 Changed 5 years ago by jballanco-x
Thanks!
comment:11 Changed 5 years ago by jballanco-x
- Milestone changed from 5.0.1 to 5.0.2
comment:12 Changed 5 years ago by jballanco-x
Referencing ticket #11936 has changed sprint.
comment:14 Changed 5 years ago by jamoore
- Cc jamoore removed
- Owner changed from jballanco-x to jamoore
comment:15 Changed 5 years ago by jamoore
- Referenced By set to https://trello.com/c/NDjNwZnm/52-bug-search-5-0-3
comment:16 Changed 5 years ago by jamoore
- Status changed from new to accepted
A better, or at least easier, solution will likely be to have an event log loader that handles failures more gracefully by:
- not throwing on exceptions
- retrying failures
- adding annotations to failed objects
At the same time, a partitioning query will be used to reduce the total number of objects indexed.
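The failure-tolerant loader described above could be sketched as follows. This is hypothetical (the real implementation landed as the EventLogQueue in the linked PR): exceptions are caught rather than propagated, failures are retried up to a limit, and objects that still fail are recorded for annotation and skipped.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a loader that never throws during indexing:
// each event is retried a bounded number of times; persistent failures
// are collected (standing in for attaching an annotation) and skipped.
public class FailureTolerantLoader {
    private final List<Long> annotatedFailures = new ArrayList<>();
    private final int maxRetries;

    public FailureTolerantLoader(int maxRetries) {
        this.maxRetries = maxRetries;
    }

    /** Index one event without propagating exceptions. */
    public void process(long id, EventIndexer indexer) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                indexer.index(id);
                return; // success: nothing to record
            } catch (RuntimeException e) {
                // swallow and retry rather than aborting the batch
            }
        }
        annotatedFailures.add(id); // stand-in for annotating the failed object
    }

    public List<Long> getAnnotatedFailures() {
        return annotatedFailures;
    }

    /** Hypothetical indexing callback. */
    public interface EventIndexer {
        void index(long id);
    }
}
```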
comment:17 Changed 5 years ago by jamoore
- Resolution set to fixed
- Status changed from accepted to closed
PR opened: https://github.com/openmicroscopy/openmicroscopy/pull/2639
Errors will be marked with annotations and the indexer will continue.
comment:18 Changed 5 years ago by jmoore <josh@…>
(In [fb6eec1d9aa778c4a2c89b54ffb80e4bf558d246/ome.git] on branch develop) EventLogQueue: new persistent event log loader (See #11948)
This PELL should significantly improve the performance of
the background indexer. Making use of the new partition
method in SqlAction, it should not re-try multiple objects.
And by adding a failure queue, it should not get stuck when
an object is unindexable.
The strategy for resolving this issue will be to record (in a file on disk) the last event ID whose indexing was attempted, along with how many attempts have been made. If the attempt count exceeds a preconfigured maximum, indexing of that event will be skipped in favor of continuing with later events in the event stream, and a warning will be logged.
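The skip-after-N-attempts logic above can be sketched like this. Persistence to the on-disk file is elided here, and the class name and warning format are hypothetical; the sketch shows only the attempt counting and skip decision.

```java
// Hypothetical sketch of the attempt-counting strategy: track how many
// times the same event ID has been attempted, and skip it (with a
// warning) once a configured maximum is exceeded. Writing the ID and
// count to a file on disk is omitted for brevity.
public class AttemptTracker {
    private final int maxAttempts;
    private long lastId = -1;
    private int attempts = 0;

    public AttemptTracker(int maxAttempts) {
        this.maxAttempts = maxAttempts;
    }

    /** Returns true if the event should be indexed, false if skipped. */
    public boolean shouldAttempt(long eventId) {
        if (eventId == lastId) {
            attempts++;
        } else {
            lastId = eventId;
            attempts = 1;
        }
        if (attempts > maxAttempts) {
            System.err.println("WARN: skipping event " + eventId
                    + " after " + maxAttempts + " attempts");
            return false;
        }
        return true;
    }
}
```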