Notice: In order to edit this ticket you need to be either: a Product Owner, The owner or the reporter of the ticket, or, in case of a Task not yet assigned, a team_member"

Task #10362 (accepted)

Opened 7 years ago

Last modified 4 years ago

OMERO.scripts should call enableKeepAlive() before running a script

Reported by: spli Owned by: spli
Priority: minor Milestone: Unscheduled
Component: Scripting Version: n.a.
Keywords: n.a. Cc: analysis@…
Resources: n.a. Referenced By: n.a.
References: n.a. Remaining Time: 1.0d
Sprint: n.a.

Description

If a long running script doesn't communicate with the server for a while the session will timeout (defaults 10mins). enableKeepAlive() should be called before passing the client object to the script.

Change History (16)

comment:1 Changed 7 years ago by spli

  • Status changed from new to accepted

comment:2 Changed 7 years ago by jmoore

Simon, https://github.com/openmicroscopy/openmicroscopy/blob/develop/components/tools/OmeroPy/src/omero/processor.py#L289 shows the method that I would have expected to keep the session for the ProcessI servant alive.

comment:3 Changed 7 years ago by spli

I can't find anything that calls ProcessI.check()

git grep '\.check(' doesn't show anything relevant.

The only thing vaguely similar is ProcessI.poll()
https://github.com/openmicroscopy/openmicroscopy/blob/develop/components/tools/OmeroPy/src/omero/processor.py#L441

but that's called by a remote client to check on the script status, and in any case the session id is different from that of the client object in the script (client.getSessionId()).

comment:4 Changed 7 years ago by jmoore

If the process is getting added to the Resources object in the context, then it's "check" method should be looked up via getattr: https://github.com/openmicroscopy/openmicroscopy/blob/develop/components/tools/OmeroPy/src/omero/util/__init__.py#L653

comment:5 Changed 7 years ago by spli

I added self.status("ProcessI.check: Session = %s, self.uuid = %s" % (self.ctx.getSession(), self.uuid)) to ProcessI.check
which outputs the following in Processor-0.log at one minute intervals

...
2013-02-14 11:39:48,383 INFO  [                omero.processor.ProcessI] (Thread-2  ) <proc:39798,rc=1,uuid=c1c2d2d7-95ba-41c8-baca-fc40b22b3a10> : ProcessI.check: Session = session-057ffbce-3bea-4fb4-b36c-b902127d3398/028eb04d-df22-494d-9718-2e8ab3aee212 -t @ BlitzAdapters, self.uuid = c1c2d2d7-95ba-41c8-baca-fc40b22b3a10
2013-02-14 11:40:48,394 INFO  [                omero.processor.ProcessI] (Thread-2  ) <proc:39798,rc=1,uuid=c1c2d2d7-95ba-41c8-baca-fc40b22b3a10> : ProcessI.check: Session = session-057ffbce-3bea-4fb4-b36c-b902127d3398/028eb04d-df22-494d-9718-2e8ab3aee212 -t @ BlitzAdapters, self.uuid = c1c2d2d7-95ba-41c8-baca-fc40b22b3a10
2013-02-14 11:41:48,403 INFO  [                omero.processor.ProcessI] (Thread-2  ) <proc:39798,rc=1,uuid=c1c2d2d7-95ba-41c8-baca-fc40b22b3a10> : ProcessI.check: Session = session-057ffbce-3bea-4fb4-b36c-b902127d3398/028eb04d-df22-494d-9718-2e8ab3aee212 -t @ BlitzAdapters, self.uuid = c1c2d2d7-95ba-41c8-baca-fc40b22b3a10
...

Running https://github.com/manics/omero-pychrm/blob/master/scripts/Long_Running_Script_Test.py
(with the addition of a line to output 'Session: %s\n session id: %s\n' % (session, client.getSessionId())):

$ omero script launch 2657 Number_Of_Messages=72 Log_Interval=10
Using /Users/simon/work/openmicroscopy/dist/bin/omero
Previous session expired for test1 on localhost:4064
Server: [localhost]
Username: [test1]
Password:
Created session 647eee7a-e2e2-41e8-9937-5035a91702e4 (test1@localhost:4064). Idle timeout: 10.0 min. Current group: test-read-annotate
Job 1652 ready
Waiting....
Callback received: FINISHED

        *** start stdout (id=2701)***
        * {'Number_Of_Messages': 72L, 'File_Prefix': 'Test', 'Log_Interval': 10L}
        * Session: session-4bb51cf5-a5c6-4ca8-b795-ab772cb3defa/c1c2d2d7-95ba-41c8-baca-fc40b22b3a10 -t @ BlitzAdapters
        *  session id: c1c2d2d7-95ba-41c8-baca-fc40b22b3a10
        * Output file: /tmp/Test-20130214T112603Z-uH_SIO.txt
        * Duration: 0:12:00.101778
        *
        *** end stdout ***


        *** start stderr (id=2702)***
        * Traceback (most recent call last):
        *   File "./script", line 92, in <module>
        *     runScript()
        *   File "./script", line 86, in runScript
        *     client.setOutput('Message', rstring(message))
        *   File "/Users/simon/work/openmicroscopy/dist/lib/python/omero/clients.py", line 915, in setOutput
        *     self._env(False, "setOutput", key, value)
        *   File "/Users/simon/work/openmicroscopy/dist/lib/python/omero/clients.py", line 881, in _env
        *     a = session.getAdminService()
        *   File "/Users/simon/work/openmicroscopy/dist/lib/python/omero_API_ice.py", line 209, in getAdminService
        *     return _M_omero.api.ServiceFactory._op_getAdminService.invoke(self, ((), _ctx))
        * Ice.ConnectionLostException: Ice.ConnectionLostException:
        * recv() returned zero
        *
        *** end stderr ***


        *** out parameters ***
        ***  done ***

comment:6 Changed 7 years ago by spli

<deleted>
Ignore this, it was an artifact of the asdditional debug/logging I added.

Last edited 7 years ago by spli (previous) (diff)

comment:7 Changed 7 years ago by spli

So assuming it's just the Ice connection that's gone, I guess the check() call wouldn't cover this since the script is run as a separate process?

Last edited 7 years ago by spli (previous) (diff)

comment:8 Changed 7 years ago by jburel

  • Sprint changed from 2013-02-12 (5) to 2013-03-12 (6))

Moved from sprint 2013-02-12 (5)

comment:9 Changed 7 years ago by jmoore

Ah, I could see how that would happen. You're right. The Glacier2 session has its own timeout which gets bumped any time any call is made. Nothing in ProcessI is telling Glacier2 that that actual client is alive, only to keep the session alive. It *might* suffice to use joinSession() in the check method so that the same Glacier2 session is accessed, but 1) I'm not sure it'll work and 2) it makes processor.py the bottleneck, since if it hangs, all the subprocesses die.

Two things to keep in mind: if enableKeepAlive() is called in the script, we'll have to make very sure that the client gets cleaned up appropriately; and if that works, we can remove the check() call from ProcessI.

comment:10 Changed 7 years ago by spli

  • Remaining Time set to 1

comment:11 Changed 7 years ago by spli

This looks harder than expected. The client/session used to launch the script in ProcessI is different from the one used by the script.

An omero.client is created in scripts.client() (omero/scripts.py:353) which is called by the executing script, not the server. The session is created in the same function (:385). If I enableKeepAlive() here I can't see a way to get a reference to this client/session from the parent ProcessI to clean-up, though it seems to eventually timeout around 15 minutes after the script completes. (Why 15 minutes? Session and Glacier timeouts are 10 minutes.)

comment:12 Changed 7 years ago by jburel

  • Sprint changed from 2013-03-12 (6)) to 2013-04-09 (7))

Moved from sprint 2013-03-12 (6))

comment:13 Changed 7 years ago by jmoore

Simon: I don't have any immediate ideas from your description above. Happy to walk through code with you if that will help.

comment:14 Changed 7 years ago by jburel

  • Sprint changed from 2013-04-09 (7)) to 2013-05-07 (8)

Moved from sprint 2013-04-09 (7))

comment:15 Changed 6 years ago by jburel

  • Milestone changed from OMERO-4.4.x to OMERO-4.4.9
  • Sprint changed from 2013-05-07 (8) to Blocker 4.4.9 (1)

Moved from sprint 2013-05-07 (8)

comment:16 Changed 4 years ago by jamoore

  • Milestone changed from 5.x to Unscheduled
Note: See TracTickets for help on using tickets. You may also have a look at Agilo extensions to the ticket.

1.3.13-PRO © 2008-2011 Agilo Software all rights reserved (this page was served in: 0.75938 sec.)

We're Hiring!