Warning: Can't synchronize with repository "(default)" (/home/git/ome.git does not appear to be a Git repository.). Look in the Trac log for more information.
Notice: In order to edit this ticket you need to be either: a Product Owner, The owner or the reporter of the ticket, or, in case of a Task not yet assigned, a team_member"

Task #11182 (closed)

Opened 11 years ago

Closed 10 years ago

Last modified 10 years ago

Bug: duplicate REINDEX logs created

Reported by: jamoore Owned by: mtbcarroll
Priority: blocker Milestone: 5.1.0-m4
Component: ORM Version: 4.4.8
Keywords: triggers Cc: java@…, jballanco-x
Resources: n.a. Referenced By: n.a.
References: n.a. Remaining Time: 0.0d
Sprint: n.a.

Description

See: https://www.openmicroscopy.org/community/viewtopic.php?f=4&t=5406

REINDEX logs are being created once per annotation per image rather than once per image. This will be another situation in with a "on-commit" trigger would be most appropriate. A workaround will need to be found.

Note: this will require modification to the stable, production database.

Change History (20)

comment:1 Changed 11 years ago by jamoore

  • Cc mtbcarroll added

comment:2 Changed 11 years ago by jamoore

  • Cc java@… added; jburel mtbcarroll removed
  • Priority changed from critical to blocker
  • Version set to 4.4.8

I don't see us being able to safely come up with a solution for this in the timescale we've outlined. I'm moving to 4.4.x as "blocker", which should be the next round of issues we look at.

comment:3 Changed 11 years ago by jamoore

  • Milestone changed from OMERO-4.4.9 to OMERO-4.4.x

comment:4 Changed 10 years ago by jamoore

  • Milestone changed from 5.x to 5.0.2

Moving all remaining blockers to 5.0.2 for re-evaluation.

comment:6 Changed 10 years ago by jamoore

Blazej: How would you see that working? I guess if we could notify on the last annotation somehow....but wouldn't that be in another transaction?

comment:7 Changed 10 years ago by bpindelski

Josh: That was more of a guess than a suggestion for a fix. Postgres internals are still uncharted land for me, but research showed that "on commit" procedures aren't available in the RDBMS (e.g. http://www.postgresql.org/message-id/4682.1211206238@sss.pgh.pa.us).

comment:8 Changed 10 years ago by jamoore

  • Cc jballanco-x added
  • Milestone changed from 5.0.2 to 5.1.0-m1

I'm inclined to try fixing this via a DB patch in 5.1. If anyone has thoughts on how we could address this in 5.0 (even with an emergency DB patch), please speak up. Alternatively, JB's work may make the REINDEX commands unnecessary (though it would require triggering events at the MQ level of the same form)

comment:9 Changed 10 years ago by jamoore

Note: the eventLogQueue logic from 5.0.3 should reduce the impact of these duplicate REINDEX statements, though it would still be better to prevent them.

comment:10 Changed 10 years ago by bpindelski

I had another look at this ticket. It seems that two triggers (image_annotation_link_event_trigger for the imageannotationlink table and annotation_trigger for the annotation table) end up generating some sort of Cartesian product of stored procedure calls. Further investigation required into what should exactly happen. One simple solution would be to slim down the annotation_trigger.

comment:11 Changed 10 years ago by jamoore

  • Milestone changed from 5.1.0-m1 to 5.1.0-m2

comment:12 Changed 10 years ago by jamoore

We should probably discuss the impact of this. Is it still a blocker? Is it fixable at the moment?

comment:13 Changed 10 years ago by mtbcarroll

I may be analyzing the event log wrongly but at first blush it seems that applying Chgrp2 to test_images_good/bruker/wis/RA_060110.051/ generates many millions of REINDEX entries. (Each of the 21 images has 286 file annotations noting the companion files.) My PostgreSQL now seems to have taken up a part-time job of autovacuum: VACUUM ANALYZE public.eventlog. I think that bpindelski's Cartesian product suggestion may well have merit. (There are 6,006 file annotations and, if one squares that, one gets many millions.)

Last edited 10 years ago by mtbcarroll (previous) (diff)

comment:14 Changed 10 years ago by mtbcarroll

  • Milestone changed from 5.1.0-m2 to 5.1.0-m4
  • Owner set to mtbcarroll

This needs to be fixed.

comment:15 Changed 10 years ago by mtbcarroll

  • Status changed from new to accepted

To deduplicate, we need to see more than one row at a time. I'd say we populate _updated_annotations from the first transaction, issue the NOTIFY, then the listener starts a new transaction to process _updated_annotations and fill eventlog. (In the first transaction we could also capture and pass on the value of _current_or_new_event() so the event ID is correct.) I can do this on Tuesday if a DB patch number is available.

Last edited 10 years ago by mtbcarroll (previous) (diff)

comment:16 Changed 10 years ago by mtbcarroll

Actually, should be able to simply do an upsert into the eventlog table in the first transaction. I'll give that a try.

comment:17 Changed 10 years ago by mtbcarroll

Actually, upsert might be worse: it may not scale so well as the eventlog table grows. The NOTIFY solution can deduplicate just in the current _updated_annotations and add all the resulting rows to eventlog without checking any existing ones. Thoughts?

comment:18 Changed 10 years ago by mtbcarroll

I'll try out the NOTIFY solution, replacing the temporary table with a normal one that also notes event IDs.

comment:19 Changed 10 years ago by mtbcarroll

  • Resolution set to fixed
  • Status changed from accepted to closed

comment:20 Changed 10 years ago by Josh Moore <josh@…>

  • Remaining Time set to 0

(In [01e18f82844dca8c4dc9bb84db6e50a18407e9aa/ome.git] on branch develop) Merge pull request #3286 from mtbc/trac-11182-duplicate-reindex

fix #11182: avoid Cartesian product of REINDEX eventlog entries

Note: See TracTickets for help on using tickets. You may also have a look at Agilo extensions to the ticket.

1.3.13-PRO © 2008-2011 Agilo Software all rights reserved (this page was served in: 0.181059 sec.)

We're Hiring!