Context Navigation

Notice: In order to edit this ticket you need to be either: a Product Owner, The owner or the reporter of the ticket, or, in case of a Task not yet assigned, a team_member"

Task #11182 (closed)

Opened 6 years ago

Closed 5 years ago

Last modified 5 years ago

Bug: duplicate REINDEX logs created

Reported by:	jamoore	Owned by:	mtbcarroll
Priority:	blocker	Milestone:	5.1.0-m4
Component:	ORM	Version:	4.4.8
Keywords:	triggers	Cc:	java@…, jballanco-x
Resources:	n.a.	Referenced By:	n.a.
References:	n.a.	Remaining Time:	0.0d
Sprint:	n.a.

Description

See: https://www.openmicroscopy.org/community/viewtopic.php?f=4&t=5406

REINDEX logs are being created once per annotation per image rather than once per image. This will be another situation in with a "on-commit" trigger would be most appropriate. A workaround will need to be found.

Note: this will require modification to the stable, production database.

References

Change History (20)

comment:1 Changed 6 years ago by jamoore

Cc mtbcarroll added

comment:2 Changed 6 years ago by jamoore

Cc java@… added; jburel mtbcarroll removed
Priority changed from critical to blocker
Version set to 4.4.8

I don't see us being able to safely come up with a solution for this in the timescale we've outlined. I'm moving to 4.4.x as "blocker", which should be the next round of issues we look at.

comment:3 Changed 6 years ago by jamoore

Milestone changed from OMERO-4.4.9 to OMERO-4.4.x

comment:4 Changed 5 years ago by jamoore

Milestone changed from 5.x to 5.0.2

Moving all remaining blockers to 5.0.2 for re-evaluation.

comment:5 Changed 5 years ago by bpindelski

Having had a quick look, it seems that https://github.com/openmicroscopy/openmicroscopy/blob/dev_5_0/sql/psql/OMERO5.0__0/psql-footer.sql#L1019 is calling https://github.com/openmicroscopy/openmicroscopy/blob/dev_5_0/sql/psql/OMERO5.0__0/psql-footer.sql#L909, which then inserts the REINDEX row (https://github.com/openmicroscopy/openmicroscopy/blob/dev_5_0/sql/psql/OMERO5.0__0/psql-footer.sql#L909). Would using NOTIFY help in any way here?

comment:6 Changed 5 years ago by jamoore

Blazej: How would you see that working? I guess if we could notify on the last annotation somehow....but wouldn't that be in another transaction?

comment:7 Changed 5 years ago by bpindelski

Josh: That was more of a guess than a suggestion for a fix. Postgres internals are still uncharted land for me, but research showed that "on commit" procedures aren't available in the RDBMS (e.g. http://www.postgresql.org/message-id/4682.1211206238@sss.pgh.pa.us).

comment:8 Changed 5 years ago by jamoore

Cc jballanco-x added
Milestone changed from 5.0.2 to 5.1.0-m1

I'm inclined to try fixing this via a DB patch in 5.1. If anyone has thoughts on how we could address this in 5.0 (even with an emergency DB patch), please speak up. Alternatively, JB's work may make the REINDEX commands unnecessary (though it would require triggering events at the MQ level of the same form)

comment:9 Changed 5 years ago by jamoore

Note: the eventLogQueue logic from 5.0.3 should reduce the impact of these duplicate REINDEX statements, though it would still be better to prevent them.

comment:10 Changed 5 years ago by bpindelski

I had another look at this ticket. It seems that two triggers (image_annotation_link_event_trigger for the imageannotationlink table and annotation_trigger for the annotation table) end up generating some sort of Cartesian product of stored procedure calls. Further investigation required into what should exactly happen. One simple solution would be to slim down the annotation_trigger.

comment:11 Changed 5 years ago by jamoore

Milestone changed from 5.1.0-m1 to 5.1.0-m2

comment:12 Changed 5 years ago by jamoore

We should probably discuss the impact of this. Is it still a blocker? Is it fixable at the moment?

comment:13 Changed 5 years ago by mtbcarroll

I may be analyzing the event log wrongly but at first blush it seems that applying Chgrp2 to test_images_good/bruker/wis/RA_060110.051/ generates many millions of REINDEX entries. (Each of the 21 images has 286 file annotations noting the companion files.) My PostgreSQL now seems to have taken up a part-time job of autovacuum: VACUUM ANALYZE public.eventlog. I think that bpindelski's Cartesian product suggestion may well have merit. (There are 6,006 file annotations and, if one squares that, one gets many millions.)

Last edited 5 years ago by mtbcarroll (previous) (diff)

comment:14 Changed 5 years ago by mtbcarroll

Milestone changed from 5.1.0-m2 to 5.1.0-m4
Owner set to mtbcarroll

This needs to be fixed.

comment:15 Changed 5 years ago by mtbcarroll

Status changed from new to accepted

To deduplicate, we need to see more than one row at a time. I'd say we populate _updated_annotations from the first transaction, issue the NOTIFY, then the listener starts a new transaction to process _updated_annotations and fill eventlog. (In the first transaction we could also capture and pass on the value of _current_or_new_event() so the event ID is correct.) I can do this on Tuesday if a DB patch number is available.

Last edited 5 years ago by mtbcarroll (previous) (diff)

comment:16 Changed 5 years ago by mtbcarroll

Actually, should be able to simply do an upsert into the eventlog table in the first transaction. I'll give that a try.

comment:17 Changed 5 years ago by mtbcarroll

Actually, upsert might be worse: it may not scale so well as the eventlog table grows. The NOTIFY solution can deduplicate just in the current _updated_annotations and add all the resulting rows to eventlog without checking any existing ones. Thoughts?

comment:18 Changed 5 years ago by mtbcarroll

I'll try out the NOTIFY solution, replacing the temporary table with a normal one that also notes event IDs.

comment:19 Changed 5 years ago by mtbcarroll

Resolution set to fixed
Status changed from accepted to closed

Fixed by https://github.com/openmicroscopy/openmicroscopy/pull/3286.

comment:20 Changed 5 years ago by Josh Moore <josh@…>

Remaining Time set to 0

(In [01e18f82844dca8c4dc9bb84db6e50a18407e9aa/ome.git] on branch develop) Merge pull request #3286 from mtbc/trac-11182-duplicate-reindex

fix #11182: avoid Cartesian product of REINDEX eventlog entries

Note: See TracTickets for help on using tickets. You may also have a look at Agilo extensions to the ticket.

Download in other formats: