Task #10362 (accepted)
Opened 12 years ago
Last modified 9 years ago
OMERO.scripts should call enableKeepAlive() before running a script
Reported by: | spli | Owned by: | spli |
---|---|---|---|
Priority: | minor | Milestone: | Unscheduled |
Component: | Scripting | Version: | n.a. |
Keywords: | n.a. | Cc: | analysis@… |
Resources: | n.a. | Referenced By: | n.a. |
References: | n.a. | Remaining Time: | 1.0d |
Sprint: | n.a. |
Description
If a long running script doesn't communicate with the server for a while the session will timeout (defaults 10mins). enableKeepAlive() should be called before passing the client object to the script.
Change History (16)
comment:1 Changed 12 years ago by spli
- Status changed from new to accepted
comment:2 Changed 12 years ago by jmoore
comment:3 Changed 12 years ago by spli
I can't find anything that calls ProcessI.check()
git grep '\.check(' doesn't show anything relevant.
The only thing vaguely similar is ProcessI.poll()
https://github.com/openmicroscopy/openmicroscopy/blob/develop/components/tools/OmeroPy/src/omero/processor.py#L441
but that's called by a remote client to check on the script status, and in any case the session id is different from that of the client object in the script (client.getSessionId()).
comment:4 Changed 12 years ago by jmoore
If the process is getting added to the Resources object in the context, then it's "check" method should be looked up via getattr: https://github.com/openmicroscopy/openmicroscopy/blob/develop/components/tools/OmeroPy/src/omero/util/__init__.py#L653
comment:5 Changed 12 years ago by spli
I added self.status("ProcessI.check: Session = %s, self.uuid = %s" % (self.ctx.getSession(), self.uuid)) to ProcessI.check
which outputs the following in Processor-0.log at one minute intervals
... 2013-02-14 11:39:48,383 INFO [ omero.processor.ProcessI] (Thread-2 ) <proc:39798,rc=1,uuid=c1c2d2d7-95ba-41c8-baca-fc40b22b3a10> : ProcessI.check: Session = session-057ffbce-3bea-4fb4-b36c-b902127d3398/028eb04d-df22-494d-9718-2e8ab3aee212 -t @ BlitzAdapters, self.uuid = c1c2d2d7-95ba-41c8-baca-fc40b22b3a10 2013-02-14 11:40:48,394 INFO [ omero.processor.ProcessI] (Thread-2 ) <proc:39798,rc=1,uuid=c1c2d2d7-95ba-41c8-baca-fc40b22b3a10> : ProcessI.check: Session = session-057ffbce-3bea-4fb4-b36c-b902127d3398/028eb04d-df22-494d-9718-2e8ab3aee212 -t @ BlitzAdapters, self.uuid = c1c2d2d7-95ba-41c8-baca-fc40b22b3a10 2013-02-14 11:41:48,403 INFO [ omero.processor.ProcessI] (Thread-2 ) <proc:39798,rc=1,uuid=c1c2d2d7-95ba-41c8-baca-fc40b22b3a10> : ProcessI.check: Session = session-057ffbce-3bea-4fb4-b36c-b902127d3398/028eb04d-df22-494d-9718-2e8ab3aee212 -t @ BlitzAdapters, self.uuid = c1c2d2d7-95ba-41c8-baca-fc40b22b3a10 ...
Running https://github.com/manics/omero-pychrm/blob/master/scripts/Long_Running_Script_Test.py
(with the addition of a line to output 'Session: %s\n session id: %s\n' % (session, client.getSessionId())):
$ omero script launch 2657 Number_Of_Messages=72 Log_Interval=10 Using /Users/simon/work/openmicroscopy/dist/bin/omero Previous session expired for test1 on localhost:4064 Server: [localhost] Username: [test1] Password: Created session 647eee7a-e2e2-41e8-9937-5035a91702e4 (test1@localhost:4064). Idle timeout: 10.0 min. Current group: test-read-annotate Job 1652 ready Waiting.... Callback received: FINISHED *** start stdout (id=2701)*** * {'Number_Of_Messages': 72L, 'File_Prefix': 'Test', 'Log_Interval': 10L} * Session: session-4bb51cf5-a5c6-4ca8-b795-ab772cb3defa/c1c2d2d7-95ba-41c8-baca-fc40b22b3a10 -t @ BlitzAdapters * session id: c1c2d2d7-95ba-41c8-baca-fc40b22b3a10 * Output file: /tmp/Test-20130214T112603Z-uH_SIO.txt * Duration: 0:12:00.101778 * *** end stdout *** *** start stderr (id=2702)*** * Traceback (most recent call last): * File "./script", line 92, in <module> * runScript() * File "./script", line 86, in runScript * client.setOutput('Message', rstring(message)) * File "/Users/simon/work/openmicroscopy/dist/lib/python/omero/clients.py", line 915, in setOutput * self._env(False, "setOutput", key, value) * File "/Users/simon/work/openmicroscopy/dist/lib/python/omero/clients.py", line 881, in _env * a = session.getAdminService() * File "/Users/simon/work/openmicroscopy/dist/lib/python/omero_API_ice.py", line 209, in getAdminService * return _M_omero.api.ServiceFactory._op_getAdminService.invoke(self, ((), _ctx)) * Ice.ConnectionLostException: Ice.ConnectionLostException: * recv() returned zero * *** end stderr *** *** out parameters *** *** done ***
comment:6 Changed 12 years ago by spli
<deleted>
Ignore this, it was an artifact of the asdditional debug/logging I added.
comment:7 Changed 12 years ago by spli
So assuming it's just the Ice connection that's gone, I guess the check() call wouldn't cover this since the script is run as a separate process?
comment:8 Changed 12 years ago by jburel
- Sprint changed from 2013-02-12 (5) to 2013-03-12 (6))
Moved from sprint 2013-02-12 (5)
comment:9 Changed 12 years ago by jmoore
Ah, I could see how that would happen. You're right. The Glacier2 session has its own timeout which gets bumped any time any call is made. Nothing in ProcessI is telling Glacier2 that that actual client is alive, only to keep the session alive. It *might* suffice to use joinSession() in the check method so that the same Glacier2 session is accessed, but 1) I'm not sure it'll work and 2) it makes processor.py the bottleneck, since if it hangs, all the subprocesses die.
Two things to keep in mind: if enableKeepAlive() is called in the script, we'll have to make very sure that the client gets cleaned up appropriately; and if that works, we can remove the check() call from ProcessI.
comment:10 Changed 12 years ago by spli
- Remaining Time set to 1
comment:11 Changed 12 years ago by spli
This looks harder than expected. The client/session used to launch the script in ProcessI is different from the one used by the script.
An omero.client is created in scripts.client() (omero/scripts.py:353) which is called by the executing script, not the server. The session is created in the same function (:385). If I enableKeepAlive() here I can't see a way to get a reference to this client/session from the parent ProcessI to clean-up, though it seems to eventually timeout around 15 minutes after the script completes. (Why 15 minutes? Session and Glacier timeouts are 10 minutes.)
comment:12 Changed 12 years ago by jburel
- Sprint changed from 2013-03-12 (6)) to 2013-04-09 (7))
Moved from sprint 2013-03-12 (6))
comment:13 Changed 12 years ago by jmoore
Simon: I don't have any immediate ideas from your description above. Happy to walk through code with you if that will help.
comment:14 Changed 11 years ago by jburel
- Sprint changed from 2013-04-09 (7)) to 2013-05-07 (8)
Moved from sprint 2013-04-09 (7))
comment:15 Changed 11 years ago by jburel
- Milestone changed from OMERO-4.4.x to OMERO-4.4.9
- Sprint changed from 2013-05-07 (8) to Blocker 4.4.9 (1)
Moved from sprint 2013-05-07 (8)
comment:16 Changed 9 years ago by jamoore
- Milestone changed from 5.x to Unscheduled
Simon, https://github.com/openmicroscopy/openmicroscopy/blob/develop/components/tools/OmeroPy/src/omero/processor.py#L289 shows the method that I would have expected to keep the session for the ProcessI servant alive.