Notice: In order to edit this ticket you need to be either: a Product Owner, The owner or the reporter of the ticket, or, in case of a Task not yet assigned, a team_member"

Task #11084 (closed)

Opened 7 years ago

Closed 6 years ago

Last modified 6 years ago

Improve deletion of original files

Reported by: cblackburn Owned by: mtbcarroll
Priority: major Milestone: 5.1.0-m4
Component: Services Version: n.a.
Keywords: n.a. Cc: fs@…
Resources: n.a. Referenced By: n.a.
References: n.a. Remaining Time: 0.0d
Sprint: n.a.


Currently original files are deleted one per transaction. This adds considerably to the delete time for plates. Changing so that file deletes are done as a batch should improve delete performance.

Change History (25)

comment:1 Changed 7 years ago by cblackburn

  • Cc fs@… added
  • Component changed from General to Services
  • Priority changed from minor to major

comment:2 Changed 7 years ago by jburel

  • Sprint changed from FS Demo 4.3 to FS demo 4.4

Moved from sprint FS Demo 4.3

comment:3 Changed 7 years ago by mtbcarroll

  • Owner changed from cblackburn to mtbcarroll

comment:4 Changed 7 years ago by jamoore

  • Sprint changed from FS demo 4.4 to FS demo 4.x

Pushing to post Paris.

comment:5 Changed 7 years ago by mtbcarroll

  • Sprint FS demo 4.x deleted

Josh: this looks relevant to code that you have worked on lately, any pointers?

comment:6 Changed 7 years ago by jamoore

Mark: if you're referring to the speed up of the import paths by reduction in transactions, then no real pointer other than to profile and make sure that's the actual cause. Most likely this requires some refactoring in order to get everything into a single call of Executor.execute which is what I did for import.

comment:7 Changed 6 years ago by mtbcarroll

How to determine that original files are deleted one per transaction, for plates or other cases of concern? It seems odd that this would be the case. Is the problem that a savepoint is created for each within the one transaction?

comment:8 Changed 6 years ago by jamoore

Certainly not a savepoint issue like graphs. More a RepositoryDao API issue.

comment:9 Changed 6 years ago by mtbcarroll

  • Milestone changed from 5.x to 5.1.0
  • Version set to 5.0.5

Promoted to 5.1.0 to match #10475.

(I do indeed notice with the graphs reimplementation that even after the graph traversal is all done, the actual deletion of the files is surprisingly slow.)

comment:10 Changed 6 years ago by mtbcarroll

  • Version 5.0.5 deleted

No, I didn't set the Version, bah.

comment:11 Changed 6 years ago by mtbcarroll

I notice that SqlAction allows the _fs_deletelog table to be searched by any column other than params, yet the table is wholly unindexed.

comment:12 Changed 6 years ago by jamoore

Good catch. I'd propose I add these to my DB upgrade for m3.

comment:13 Changed 6 years ago by mtbcarroll

It could also be useful to allow DeleteLogMessage and DeleteLog to carry a set of IDs instead of just one.

comment:14 Changed 6 years ago by mtbcarroll

A look at the logs finds that during Delete2I deletion of a plate there is plenty of,

2014-11-24 14:07:14,885 INFO  [                 org.perf4j.TimingLogger] (2-thread-4) start[1416838034882] time[2] tag[$16.doWork]
2014-11-24 14:07:14,885 INFO  [] (2-thread-4)  Rslt:	()
2014-11-24 14:07:14,885 INFO  [] (2-thread-4)  Executor.doWork --<file=996repo=c9c14ba0-592a-4777-a0cf-6cf9e4215c58>)
2014-11-24 14:07:14,885 INFO  [] (2-thread-4)  Args:	[null, InternalSF@1517296254]
2014-11-24 14:07:14,886 INFO  [] (2-thread-4)  Auth:	user=0,group=0,event=null(Internal),sess=e8a4c338-c244-417f-b473-0b1c472c24b2
2014-11-24 14:07:14,888 INFO  [                 org.perf4j.TimingLogger] (2-thread-4) start[1416838034885] time[2] tag[$16.doWork]
2014-11-24 14:07:14,888 INFO  [] (2-thread-4)  Rslt:	()
2014-11-24 14:07:14,888 INFO  [] (2-thread-4)  Executor.doWork --<file=996repo=4fc7caa6-5ea5-4486-b86b-de704a6d106f>)
2014-11-24 14:07:14,888 INFO  [] (2-thread-4)  Args:	[null, InternalSF@1517296254]
2014-11-24 14:07:14,889 INFO  [] (2-thread-4)  Auth:	user=0,group=0,event=null(Internal),sess=e8a4c338-c244-417f-b473-0b1c472c24b2
2014-11-24 14:07:14,891 INFO  [                 org.perf4j.TimingLogger] (2-thread-4) start[1416838034888] time[2] tag[$16.doWork]
2014-11-24 14:07:14,891 INFO  [] (2-thread-4)  Rslt:	(DeleteLog<event=19457file=996owner=239group=321path=user-2_239/2014-11/24/12-36-20.930/siRNA_PRIM1_03102008/001-365700055641/results/name=Well0001_mode1_z000_t000_detail_2008-09-18-10h48m54s.txtrepo=4fc7caa6-5ea5-4486-b86b-de704a6d106f>)
2014-11-24 14:07:14,891 INFO  [] (2-thread-4)  Executor.doWork --<file=996repo=4fc7caa6-5ea5-4486-b86b-de704a6d106f>)
2014-11-24 14:07:14,891 INFO  [] (2-thread-4)  Args:	[null, InternalSF@1517296254]
2014-11-24 14:07:14,903 INFO  [] (2-thread-4)  Auth:	user=0,group=0,event=19585(Internal),sess=e8a4c338-c244-417f-b473-0b1c472c24b2
2014-11-24 14:07:14,907 INFO  [                 org.perf4j.TimingLogger] (2-thread-4) start[1416838034891] time[15] tag[$17.doWork]
2014-11-24 14:07:14,907 INFO  [] (2-thread-4)  Rslt:	1
2014-11-24 14:07:14,907 INFO  [] (2-thread-4)  Executor.doWork --<file=996repo=ScriptRepo>)
2014-11-24 14:07:14,907 INFO  [] (2-thread-4)  Args:	[null, InternalSF@1517296254]
2014-11-24 14:07:14,909 INFO  [] (2-thread-4)  Auth:	user=0,group=0,event=null(Internal),sess=e8a4c338-c244-417f-b473-0b1c472c24b2
2014-11-24 14:07:14,911 INFO  [                 org.perf4j.TimingLogger] (2-thread-4) start[1416838034907] time[3] tag[$16.doWork]
2014-11-24 14:07:14,911 INFO  [] (2-thread-4)  Rslt:	()
2014-11-24 14:07:14,911 INFO  [] (2-thread-4)  Executor.doWork --<file=999repo=c9c14ba0-592a-4777-a0cf-6cf9e4215c58>)
2014-11-24 14:07:14,911 INFO  [] (2-thread-4)  Args:	[null, InternalSF@1517296254]
2014-11-24 14:07:14,913 INFO  [] (2-thread-4)  Auth:	user=0,group=0,event=null(Internal),sess=e8a4c338-c244-417f-b473-0b1c472c24b2

comment:15 Changed 6 years ago by jamoore

  • Milestone changed from 5.1.0 to 5.1.0-m3
  • Owner changed from mtbcarroll to jamoore

I'll start looking at the db index side of this tomorrow.

comment:16 Changed 6 years ago by jamoore

The indexes don't seem to have a huge effect:

After adding them, I started working on the number of transactions but haven't yet gotten to a point where I feel this is a MUST-INCLUDE.

comment:17 Changed 6 years ago by mtbcarroll

Thank you for the update; I guess maybe the table never gets appreciably large. I'll certainly be interested to learn if the transactions do make more of a difference, or if you gain clues as to the slowness' principal causes.

comment:18 Changed 6 years ago by mtbcarroll

I added an index to originalfile.path in case that speeds up _fs_dir_delete() usefully.

comment:19 Changed 6 years ago by jamoore

  • Owner changed from jamoore to mtbcarroll

comment:20 Changed 6 years ago by mtbcarroll

Josh: Where's your branch and what more needs done?

comment:21 Changed 6 years ago by jamoore

I'll assume you're again not getting trello notifications :) --

comment:22 Changed 6 years ago by mtbcarroll

  • Status changed from new to accepted

comment:23 Changed 6 years ago by mtbcarroll

  • Resolution set to fixed
  • Status changed from accepted to closed

comment:24 Changed 6 years ago by jmoore <josh@…>

  • Remaining Time set to 0

(In [3682570ced474c64bd6597faad46101e5f9c7ebc/ome.git] on branch develop) Reduce transactions for file deletion (Fix #11084)

Looping through large numbers of files took significant
time due to the number of transactions involved. This
now wraps all the DeleteLogMessages? into a single
DeleteLogsMessage? which can be handled in bulk.

comment:25 Changed 6 years ago by Josh Moore <josh@…>

(In [568a8f76fd1e902841f15224b65b89ce88f9b062/ome.git] on branch develop) Merge pull request #3399 from mtbc/fs-delete-logs

fix #11084: combine original file deletions into fewer transactions

Note: See TracTickets for help on using tickets. You may also have a look at Agilo extensions to the ticket.

1.3.13-PRO © 2008-2011 Agilo Software all rights reserved (this page was served in: 0.136183 sec.)

We're Hiring!