Warning: Can't synchronize with repository "(default)" (/home/git/ome.git does not appear to be a Git repository.). Look in the Trac log for more information.
Notice: In order to edit this ticket you need to be either: a Product Owner, The owner or the reporter of the ticket, or, in case of a Task not yet assigned, a team_member"

Task #11054 (closed)

Opened 11 years ago

Closed 11 years ago

Last modified 11 years ago

Bug: HCS delete performance atrocious

Reported by: jamoore Owned by: jamoore
Priority: blocker Milestone: 5.0.0-beta1
Component: Services Version: n.a.
Keywords: fs Cc: fs@…
Resources: n.a. Referenced By: n.a.
References: n.a. Remaining Time: 0.0d
Sprint: FS Demo 4.3

Description

HCS deletes spend an inordinate amount of time in the LOOP loop of GraphTables as seen on adhoc2 yesterday. Stepping through that loop today, it's being caused by the links from images to filesets and then back down to the used files. In a normal fileset, this only causes 10s of extra calls. In HCS data it causes on the order of 10^3 x 10^3 or more.

Mark has said that with his preprocessor work filesets should be injected in the chgrp/delete lists before any plates, and therefore the linking of images to usedFiles may no longer be necessary.

Change History (17)

comment:1 Changed 11 years ago by mtbcarroll

  • Sprint set to FS Demo 4.3

comment:2 Changed 11 years ago by mtbcarroll

I just tried my embryonic testing framework for the preprocessor and it does look like screen and plate delete requests do indeed get prefixed with fileset requests for non-split MIFs.

comment:3 follow-up: Changed 11 years ago by jamoore

  • Remaining Time set to 0.25

Thanks Mark. On a pushed branch?

Note: plate delete is working. Now just trying to verify that I haven't broken everything else.

comment:4 Changed 11 years ago by jamoore

  • Status changed from new to accepted

comment:5 Changed 11 years ago by mtbcarroll

Not yet, going to clean up the code tomorrow and add to the existing PR for #10890, shouldn't be too much work from where I now am, and none of my tests so far have revealed any problems in the preprocessor. Could create and push to a different branch tonight though if that would help you.

comment:6 in reply to: ↑ 3 Changed 11 years ago by mtbcarroll

By the way, congratulations on plate delete. (-: The preprocessor should handle dataset and project deletion just fine too.

comment:7 Changed 11 years ago by mtbcarroll

Okay, pushed them to https://github.com/openmicroscopy/openmicroscopy/pull/1235

The tests themselves definitely need a proper going over and expansion, which I hope to do soon, but I guess my work-in-progress is better than no tests at all, and if there are any specific cases you want to quickly confirm then it should be easy for you to adapt the existing tests to add extra ones.

comment:8 Changed 11 years ago by jamoore

  • Remaining Time changed from 0.25 to 0
  • Resolution set to fixed
  • Status changed from accepted to closed

comment:9 Changed 11 years ago by jmoore <josh@…>

(In [82db4c737722a55469efd6eab4b4fec7d78fe744/ome.git] on branch develop) Add ExtendedMetadata?.getSQLJoin (See #11054)

In order to make use of select distinct on (...), the
ome.services.graph package needs to be able to drop down
into raw SQL rather than HQL. Doing so, loses all of the
HQL dot notation, meaning it's necessary to lookup the
SQL-style joins for walking the graph.

This isn't a complete solution in that it doesn't take
joined subclasses into account, but works for the current
uses.

comment:10 Changed 11 years ago by jmoore <josh@…>

(In [e71f936b6d24710a7bf4c03e19340c06c25cacf0/ome.git] on branch develop) Permit 'select distinct on' raw SQL (See #11054, #9496)

Previously, only UPDATE and DELETE statements were allowed.
HQL, however, does not properly support distinct on (...)
and therefore it was necessary to include this workaround.

comment:11 Changed 11 years ago by jmoore <josh@…>

(In [7d9182b55343dac6cd1fed45234220bfc45501b2/ome.git] on branch develop) Refactor NULL handling into GraphStep? (See #11054)

In preparation for NULLing UploadJob?.versionInfo as a
workaround for SQL joined-subclass handling, the delete
step logic for nulling Pixels.relatedTo must be expanded
to also handle nulling versionInfo including during chgrp.

comment:12 Changed 11 years ago by jmoore <josh@…>

(In [ded6f7a542892c4be6329b1042f5a8da299d1d04/ome.git] on branch develop) Add FilesetGraphSpec? to solve performance issues (Fix #11054)

"select distinct on (...)" is used to only load a /Fileset
once for an entire subgraph. To handle the variable intermediate
links, -2 is inserted into the array of IDs, so that matching
in GraphTables? completes appropriately.

/Filesets are now only handled from the contain types /Plate
and /Dataset.

NB: this is still not working because the super spec which
wraps the FilesetGraphSpec? is not performing the proper
logic.

comment:13 Changed 11 years ago by jmoore <josh@…>

(In [ed0735610fc6e75a4de4912edb006e61096c656c/ome.git] on branch develop) Use DoAll? even with single-image filesets (See #11054)

Since preprocessing is necessary in order to handle Filesets
rather than images, all chgrps that are not via a container
should take place via a DoAll?.

comment:14 Changed 11 years ago by jmoore <josh@…>

(In [79ea4922c2a00df1202c9983ac6ee370569b19e0/ome.git] on branch develop) Revert "Permit 'select distinct on' raw SQL (See #11054, #9496)"

This reverts commit e71f936b6d24710a7bf4c03e19340c06c25cacf0.

This workaround was intended only for allowing use of SQL's
"select distinct on (...)". Since another solution was found
for the same problem (namely dropping all the intermediate
IDs from the select list), this need no longer be permitted.

comment:15 Changed 11 years ago by jmoore <josh@…>

(In [99865a91fc892e01fbb9850a4e742e6c13e69e34/ome.git] on branch develop) Order filesets last (Fix #11045, See #11054)

Rather than inject filesets at the first possible location
in the DoAll? requests list, place them after all related
requests.

comment:16 Changed 11 years ago by jmoore <josh@…>

(In [6875b38bcbbfd5cfe58d4e730fa9b6199009ef15/ome.git] on branch develop) Remove spec for fileset from dataset (See #11054)

All chgrp and delete activities need now be performed
within a DoAll? due to the ordering of the Fileset
element. It's not currently possible to support the
DoAll? and non-DoAll? case.

comment:17 Changed 11 years ago by jmoore <josh@…>

(In [92b69fe02b0a709f5fdd5aa34974ac4132cdd833/ome.git] on branch develop) Add finish() phase to IRequest (See #11045, #11054, etc)

Now that Fileset handling is happening after containers and
images, it's necessary to have a phase hook which checks that
all filesets have been deleted along with the related images.

This amounts to a trigger pre-commit, but that's not provided
by postgresql. Instead, we collect all of the fileset IDs in
the DeleteStep? instances and then check that they have been
removed afterwards. The primary drawback is that this no longer
provides a GraphConstraintERR but rather just a regular ERR.

See #10846

Note: if this requirement is ever removed, the entire finish()
phase could be rolled back. Alternatively, this could replace
the ChgrpValidation? steps.

Note: See TracTickets for help on using tickets. You may also have a look at Agilo extensions to the ticket.

1.3.13-PRO © 2008-2011 Agilo Software all rights reserved (this page was served in: 0.65751 sec.)

We're Hiring!