Task #11182 (closed)
Bug: duplicate REINDEX logs created
Reported by: | jamoore | Owned by: | mtbcarroll |
---|---|---|---|
Priority: | blocker | Milestone: | 5.1.0-m4 |
Component: | ORM | Version: | 4.4.8 |
Keywords: | triggers | Cc: | java@…, jballanco-x |
Resources: | n.a. | Referenced By: | n.a. |
References: | n.a. | Remaining Time: | 0.0d |
Sprint: | n.a. |
Description
See: https://www.openmicroscopy.org/community/viewtopic.php?f=4&t=5406
REINDEX logs are being created once per annotation per image rather than once per image. This will be another situation in with a "on-commit" trigger would be most appropriate. A workaround will need to be found.
Note: this will require modification to the stable, production database.
Change History (20)
comment:1 Changed 11 years ago by jamoore
- Cc mtbcarroll added
comment:2 Changed 11 years ago by jamoore
- Cc java@… added; jburel mtbcarroll removed
- Priority changed from critical to blocker
- Version set to 4.4.8
comment:3 Changed 11 years ago by jamoore
- Milestone changed from OMERO-4.4.9 to OMERO-4.4.x
comment:4 Changed 10 years ago by jamoore
- Milestone changed from 5.x to 5.0.2
Moving all remaining blockers to 5.0.2 for re-evaluation.
comment:5 Changed 10 years ago by bpindelski
Having had a quick look, it seems that https://github.com/openmicroscopy/openmicroscopy/blob/dev_5_0/sql/psql/OMERO5.0__0/psql-footer.sql#L1019 is calling https://github.com/openmicroscopy/openmicroscopy/blob/dev_5_0/sql/psql/OMERO5.0__0/psql-footer.sql#L909, which then inserts the REINDEX row (https://github.com/openmicroscopy/openmicroscopy/blob/dev_5_0/sql/psql/OMERO5.0__0/psql-footer.sql#L909). Would using NOTIFY help in any way here?
comment:6 Changed 10 years ago by jamoore
Blazej: How would you see that working? I guess if we could notify on the last annotation somehow....but wouldn't that be in another transaction?
comment:7 Changed 10 years ago by bpindelski
Josh: That was more of a guess than a suggestion for a fix. Postgres internals are still uncharted land for me, but research showed that "on commit" procedures aren't available in the RDBMS (e.g. http://www.postgresql.org/message-id/4682.1211206238@sss.pgh.pa.us).
comment:8 Changed 10 years ago by jamoore
- Cc jballanco-x added
- Milestone changed from 5.0.2 to 5.1.0-m1
I'm inclined to try fixing this via a DB patch in 5.1. If anyone has thoughts on how we could address this in 5.0 (even with an emergency DB patch), please speak up. Alternatively, JB's work may make the REINDEX commands unnecessary (though it would require triggering events at the MQ level of the same form)
comment:9 Changed 10 years ago by jamoore
Note: the eventLogQueue logic from 5.0.3 should reduce the impact of these duplicate REINDEX statements, though it would still be better to prevent them.
comment:10 Changed 10 years ago by bpindelski
I had another look at this ticket. It seems that two triggers (image_annotation_link_event_trigger for the imageannotationlink table and annotation_trigger for the annotation table) end up generating some sort of Cartesian product of stored procedure calls. Further investigation required into what should exactly happen. One simple solution would be to slim down the annotation_trigger.
comment:11 Changed 10 years ago by jamoore
- Milestone changed from 5.1.0-m1 to 5.1.0-m2
comment:12 Changed 10 years ago by jamoore
We should probably discuss the impact of this. Is it still a blocker? Is it fixable at the moment?
comment:13 Changed 10 years ago by mtbcarroll
I may be analyzing the event log wrongly but at first blush it seems that applying Chgrp2 to test_images_good/bruker/wis/RA_060110.051/ generates many millions of REINDEX entries. (Each of the 21 images has 286 file annotations noting the companion files.) My PostgreSQL now seems to have taken up a part-time job of autovacuum: VACUUM ANALYZE public.eventlog. I think that bpindelski's Cartesian product suggestion may well have merit. (There are 6,006 file annotations and, if one squares that, one gets many millions.)
comment:14 Changed 10 years ago by mtbcarroll
- Milestone changed from 5.1.0-m2 to 5.1.0-m4
- Owner set to mtbcarroll
This needs to be fixed.
comment:15 Changed 10 years ago by mtbcarroll
- Status changed from new to accepted
To deduplicate, we need to see more than one row at a time. I'd say we populate _updated_annotations from the first transaction, issue the NOTIFY, then the listener starts a new transaction to process _updated_annotations and fill eventlog. (In the first transaction we could also capture and pass on the value of _current_or_new_event() so the event ID is correct.) I can do this on Tuesday if a DB patch number is available.
comment:16 Changed 10 years ago by mtbcarroll
Actually, should be able to simply do an upsert into the eventlog table in the first transaction. I'll give that a try.
comment:17 Changed 10 years ago by mtbcarroll
Actually, upsert might be worse: it may not scale so well as the eventlog table grows. The NOTIFY solution can deduplicate just in the current _updated_annotations and add all the resulting rows to eventlog without checking any existing ones. Thoughts?
comment:18 Changed 10 years ago by mtbcarroll
I'll try out the NOTIFY solution, replacing the temporary table with a normal one that also notes event IDs.
comment:19 Changed 10 years ago by mtbcarroll
- Resolution set to fixed
- Status changed from accepted to closed
comment:20 Changed 10 years ago by Josh Moore <josh@…>
- Remaining Time set to 0
(In [01e18f82844dca8c4dc9bb84db6e50a18407e9aa/ome.git] on branch develop) Merge pull request #3286 from mtbc/trac-11182-duplicate-reindex
fix #11182: avoid Cartesian product of REINDEX eventlog entries
I don't see us being able to safely come up with a solution for this in the timescale we've outlined. I'm moving to 4.4.x as "blocker", which should be the next round of issues we look at.