Task #11540 (closed)
RFE: Add a DB-based BackOff implementation
Reported by: | bpindelski | Owned by: | bpindelski |
---|---|---|---|
Priority: | major | Milestone: | 5.0.0-rc1 |
Component: | Services | Version: | 5.0.0-beta1 |
Keywords: | n.a. | Cc: | jamoore |
Resources: | n.a. | Referenced By: | n.a. |
References: | n.a. | Remaining Time: | n.a. |
Sprint: | n.a. | | |
Description (last modified by bpindelski)
As a result of a discussion on https://github.com/openmicroscopy/openmicroscopy/pull/1562, and because intermittent LockTimeouts are still occurring in the OmeroJava-integration-develop job, it makes sense to add a BackOff mechanism based on database throughput.
At the same time, add the changes from https://github.com/bpindelski/openmicroscopy/tree/backoff-deprecated to the resulting PR.
Change History (10)
comment:1 Changed 11 years ago by bpindelski
- Description modified (diff)
comment:2 Changed 11 years ago by bpindelski
comment:3 Changed 11 years ago by jamoore
We definitely need to work on the overall performance, but until that time, we're left with either A) bumping up the timeout or B) performing a slow SPW delete beforehand to have an estimate for the backoff.
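Option B could be sketched roughly as follows. This is a hypothetical Java helper, not code from the OMERO test suite. It times a small reference operation once, then scales the measured duration by the relative size of the real workload. The class name, the safety factor and the floor value (borrowed from the 5.2-second backoff mentioned later in this ticket) are all assumptions for illustration.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch of "option B": time a slow reference delete once,
 * then derive the backoff for a larger delete by scaling that measurement.
 */
public final class CalibratedBackoff {

    /** Runs the reference operation and returns its wall-clock duration in ms. */
    static long timeMillis(Runnable referenceOp) {
        long start = System.nanoTime();
        referenceOp.run();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    /**
     * Scales a measured duration by (targetSize / referenceSize), multiplies
     * by a safety factor and never returns less than floorMillis.
     */
    static long estimateBackoff(long referenceMillis, int referenceSize,
                                int targetSize, double safetyFactor, long floorMillis) {
        if (referenceSize <= 0) {
            throw new IllegalArgumentException("referenceSize must be positive");
        }
        double perUnit = (double) referenceMillis / referenceSize;
        long estimate = (long) Math.ceil(perUnit * targetSize * safetyFactor);
        return Math.max(floorMillis, estimate);
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for a slow delete of a one-plate screen.
        long measured = timeMillis(() -> { /* e.g. delete a small SPW graph */ });
        // Estimate the backoff for a four-plate screen, with a 1.5x margin.
        long backoff = estimateBackoff(measured, 1, 4, 1.5, 5200L);
        System.out.println("backoff ms = " + backoff);
    }
}
```

The floor keeps a very fast calibration run (for example on a warm database) from producing an unrealistically short backoff.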
comment:4 Changed 11 years ago by bpindelski
After a discussion with Josh, we agreed that the best solution for now is to re-run the OMERO-integration-develop job manually when timeouts arise. In the longer term, a separate target (e.g. "longrunning") with a configurable backoff value ("-Dbackoff=5000") might be a good solution for all tests that fail due to LockTimeouts. Ideally, we will revisit the underlying server code and make the tests resilient to timing issues. Closing for now.
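A minimal sketch of how a test could pick up such a configurable value, assuming the target passes it as a JVM system property named "backoff" (as in "-Dbackoff=5000"). The class name and its default are hypothetical; Long.getLong is the standard JDK call for reading a numeric system property with a fallback.

```java
/**
 * Hypothetical helper: reads the backoff in milliseconds from the "backoff"
 * system property (e.g. -Dbackoff=5000), falling back to a default.
 */
public final class BackoffConfig {

    /** Assumed default; the ticket only mentions 5000 as an example value. */
    static final long DEFAULT_BACKOFF_MS = 5000L;

    static long backoffMillis() {
        // Long.getLong returns the default if the property is missing or not a valid long.
        long value = Long.getLong("backoff", DEFAULT_BACKOFF_MS);
        if (value <= 0) {
            throw new IllegalStateException("backoff must be positive, got " + value);
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println("Using backoff of " + backoffMillis() + " ms");
    }
}
```

Reading the value at call time rather than caching it in a static field lets a single test run override it with System.setProperty.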
comment:5 Changed 11 years ago by bpindelski
- Resolution set to fixed
- Status changed from new to closed
comment:6 Changed 11 years ago by jamoore
Discussed with Blazej. Currently we expect both "regular" and "largish" deletes to finish in the same time. In retrospect, this is clearly unrealistic: if we expect delete(Plate) to finish within the specified backoff time (here 5.2 seconds), then delete(Screen(Plate(), Plate())) could realistically require twice that time. Blazej is going to add screen.sizeOfPlateLinks() * backoff to the job to see if that prevents this from happening again. If not, we should probably consider entire classes of operations and their required times.
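The proposed scaling could look roughly like this. ScreenLike is a hypothetical stand-in for the screen model object, of which only sizeOfPlateLinks() is used; the one-plate floor is an added assumption so that an empty screen still gets one full backoff period.

```java
/**
 * Sketch of the rule discussed above: allow one backoff period per plate
 * linked to the screen being deleted.
 */
public final class ScaledTimeout {

    /** Hypothetical minimal view of a screen; only the plate-link count matters. */
    interface ScreenLike {
        int sizeOfPlateLinks();
    }

    static long timeoutFor(ScreenLike screen, long backoffMillis) {
        // Never allow less than one backoff period, even for an empty screen.
        int plates = Math.max(1, screen.sizeOfPlateLinks());
        // multiplyExact throws on overflow instead of silently wrapping around.
        return Math.multiplyExact((long) plates, backoffMillis);
    }

    public static void main(String[] args) {
        ScreenLike twoPlates = () -> 2;
        System.out.println(timeoutFor(twoPlates, 5200L)); // prints 10400
    }
}
```

With the 5.2-second backoff above, a two-plate screen gets 10.4 seconds, matching the "twice that time" estimate.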
comment:7 Changed 10 years ago by Blazej Pindelski <bpindelski@…>
(In [ad4644744bc60c8df2b1c630c4f22dce473b91c6/ome.git] on branch develop) Fix SPW delete test timeouts (see #11540).
comment:8 Changed 10 years ago by Josh Moore <josh@…>
(In [37f9444f2be920052c37090d54070b0d4b87b6be/ome.git] on branch develop) Merge pull request #1797 from bpindelski/spw-delete-testfix
Fix SPW delete test timeouts (see #11540).
comment:9 Changed 10 years ago by Blazej Pindelski <bpindelski@…>
(In [8791d8890ebc41e48318ab8f637fe2242fb73b02/ome.git] on branch dev_4_4) Fix SPW delete test timeouts (see #11540).
comment:10 Changed 10 years ago by Josh Moore <josh@…>
(In [9667291b3628803c477d732647a68cc2d85066b0/ome.git] on branch dev_4_4) Merge pull request #1855 from bpindelski/rebased/dev_4_4/spw-delete-testfix
Fix SPW delete test timeouts (see #11540). (rebased onto dev_4_4)
Josh: The build history of http://hudson.openmicroscopy.org.uk/job/OmeroJava-integration-develop/ is starting to look too yellow for me. Is the LockTimeout that happens during SPW deletion on develop a symptom of a more general issue? Maybe that delete command will stop timing out if we improve the performance of screen removal? If not, I'd like to discuss the specifics of a BackOff implementation that would help us get rid of that test timeout once and for all.