Bug #1078 (closed)

Opened 16 years ago

Closed 15 years ago

Deadlocks in client when talking to Blitz

Reported by: jamoore
Owned by: jamoore
Priority: critical
Cc: cxallan, atarkowska, dzmacdonald, cblackburn, jburel, carlos@…
Sprint: n.a.
Total Remaining Time: n.a.

Description

Related to OmeroThrottling, there are cases (server-side bugs) in which the client can hang. Many of these have been fixed; they stem from missing try/finally blocks, which let a failure escape without ice_response() or ice_exception() ever being called.
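
For illustration, here is a minimal sketch of the server-side pattern being described, assuming an Ice AMD servant written in Python (OMERO's actual servants and method names may differ; execute_query is a hypothetical helper):

  # Hedged sketch: an AMD servant must always complete its callback,
  # otherwise the client-side invocation blocks forever.
  def findAllByQuery_async(self, cb, query, params, current=None):
      try:
          result = self.execute_query(query, params)  # hypothetical helper
          cb.ice_response(result)
      except Exception as e:
          # Without this (or a finally that guarantees one of the two calls),
          # a failure escapes and the caller never gets a reply.
          cb.ice_exception(e)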

In general, such hangs can be bounded by setting `Ice.Override.Timeout` on a client-wide basis or by setting the timeout on a particular proxy:

   proxy = proxy.ice_timeout(msecs)  # Ice timeouts are given in milliseconds

See http://zeroc.com/doc/Ice-3.3.0/manual/Adv_server.29.12.html for more, especially information on the fatality of timeouts:

"You should also be aware that timeouts are considered fatal error conditions by the Ice run time and result in connection closure on the client side. Furthermore, any other requests pending on the same connection also fail with an exception. Timeouts are meant to be used to prevent a client from blocking indefinitely in case something has gone wrong with the server; they are not meant as a mechanism to routinely abort requests that take longer than intended."

Please report any instances that you see. One in particular, from Chris:

11:15:28  chris@jabber: If you get Ice.ConnectionException anywhere and then try and do anything with a service you're deadlocked.
...
11:16:36  chris@jabber: Last night when I had a running Python interpreter for testing I of course left it too long, then did s.createSession() again, hit a service I had before and then boom, deadlock.
...
11:17:40  chris@jabber: Basically:
11:17:49  chris@jabber: s = c.createSession("root", "ome")
11:18:01  chris@jabber: query = s.getQueryService()
11:18:07  chris@jabber: ... wait some time
11:18:21  chris@jabber: query.findAllByQuery(...)
11:18:40  chris@jabber: Ice.ConnectionException (ie. session timeout)
11:18:56  chris@jabber: s = c.createSession("root", "ome")
11:19:07  chris@jabber: query.findAllByQuery(...)
11:19:13  chris@jabber: ... wait forever

Change History (7)

comment:1 Changed 15 years ago by jmoore

Another almost certainly related issue:

  c = omero.client()
  s = c.createSession()
  # wait for timeout which now calls closeSession() in the callback
  c._getCb() # --> NoneType on self.__oa
  c.createSession() # HANGS!
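
A hedged sketch of a client-side workaround, assuming the hang comes from reusing the stale client object after its session has been reaped; the safe path is to discard it and start from a fresh omero.client:

  c = omero.client()
  s = c.createSession()
  # ... wait for the timeout, as above ...
  try:
      c.closeSession()       # best-effort cleanup of the dead client
  except Exception:
      pass
  c = omero.client()         # fresh client, fresh Ice communicator
  s = c.createSession()      # nothing from the old client is reused
  query = s.getQueryService()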

comment:2 Changed 15 years ago by jburel

  • Cc jburel added

comment:3 Changed 15 years ago by jmoore

  • Cc carlos@… added

From Carlos:

Hey,

So I can't mimic this issue locally. Several things happened, but it all
starts with a MemoryLimitException on renderCompressed, followed by an
attempt to recreate the rendering engine that fails with a lost
connection.

Now weblitz tries to reconnect and then calls renderCompressed again, which
correctly fails because the automatic retry does not prepare the pixels.
The exact way this happens apparently gets me into an infinite loop, as the
process stays at 20% CPU forever (look at pid 19440 for example @ envy).

The following is a (too big yet not sufficiently detailed) stack trace of
where the thread is after the above:

#0  0x962f13ae in __semwait_signal ()
#1  0x9631c326 in _pthread_cond_wait ()
#2  0x9631bd0d in pthread_cond_wait$UNIX2003 ()
#3  0x011621a5 in IceUtil::Monitor<IceUtil::Mutex>::wait ()
#4  0x012123ff in IceInternal::Outgoing::invoke ()
#5  0x0123cb28 in IceDelegateM::Ice::Object::ice_invoke ()
#6  0x0123be8a in IceProxy::Ice::Object::ice_invoke ()
#7  0x0123c064 in IceProxy::Ice::Object::ice_invoke ()
#8  0x010233f2 in IcePy::SyncTypedInvocation::invoke ()
#9  0x01019ab2 in operationInvoke ()
#10 0x0018d806 in PyEval_EvalFrameEx ()
#11 0x0018f45b in PyEval_EvalCodeEx ()
#12 0x00139c27 in PyFunction_SetClosure ()
#13 0x0011fd3d in PyObject_Call ()
#14 0x0018dfb8 in PyEval_EvalFrameEx ()
#15 0x0018f45b in PyEval_EvalCodeEx ()
#16 0x00139c27 in PyFunction_SetClosure ()
#17 0x0011fd3d in PyObject_Call ()
#18 0x0018dfb8 in PyEval_EvalFrameEx ()
#19 0x0018f45b in PyEval_EvalCodeEx ()
#20 0x0018da85 in PyEval_EvalFrameEx ()
#21 0x0018f45b in PyEval_EvalCodeEx ()
#22 0x00139c27 in PyFunction_SetClosure ()
#23 0x0011fd3d in PyObject_Call ()
#24 0x0018dfb8 in PyEval_EvalFrameEx ()
#25 0x0018f45b in PyEval_EvalCodeEx ()
#26 0x0018da85 in PyEval_EvalFrameEx ()
#27 0x0018f45b in PyEval_EvalCodeEx ()
#28 0x00139c27 in PyFunction_SetClosure ()
#29 0x0011fd3d in PyObject_Call ()
#30 0x0018dfb8 in PyEval_EvalFrameEx ()
#31 0x0018f45b in PyEval_EvalCodeEx ()
#32 0x00139c27 in PyFunction_SetClosure ()
#33 0x0011fd3d in PyObject_Call ()
#34 0x0018dfb8 in PyEval_EvalFrameEx ()
#35 0x0018d9e8 in PyEval_EvalFrameEx ()
#36 0x0018f45b in PyEval_EvalCodeEx ()
#37 0x00139c27 in PyFunction_SetClosure ()
#38 0x0011fd3d in PyObject_Call ()
#39 0x001285f8 in PyMethod_New ()
#40 0x0011fd3d in PyObject_Call ()
#41 0x001624b4 in _PyObject_SlotCompare ()
#42 0x0011fd3d in PyObject_Call ()
#43 0x0018db1a in PyEval_EvalFrameEx ()
#44 0x0018f45b in PyEval_EvalCodeEx ()
#45 0x0018da85 in PyEval_EvalFrameEx ()
#46 0x0018d9e8 in PyEval_EvalFrameEx ()
#47 0x0018d9e8 in PyEval_EvalFrameEx ()
#48 0x0018d9e8 in PyEval_EvalFrameEx ()
#49 0x0018d9e8 in PyEval_EvalFrameEx ()
#50 0x0018d9e8 in PyEval_EvalFrameEx ()
#51 0x0018d9e8 in PyEval_EvalFrameEx ()
#52 0x0018d9e8 in PyEval_EvalFrameEx ()
#53 0x0018f45b in PyEval_EvalCodeEx ()
#54 0x00139c27 in PyFunction_SetClosure ()
#55 0x0011fd3d in PyObject_Call ()
#56 0x001285f8 in PyMethod_New ()
#57 0x0011fd3d in PyObject_Call ()
#58 0x0018db1a in PyEval_EvalFrameEx ()
#59 0x0018d9e8 in PyEval_EvalFrameEx ()
#60 0x0018f45b in PyEval_EvalCodeEx ()
#61 0x0018da85 in PyEval_EvalFrameEx ()
#62 0x0018f45b in PyEval_EvalCodeEx ()
#63 0x00139c27 in PyFunction_SetClosure ()
#64 0x0011fd3d in PyObject_Call ()
#65 0x0018dfb8 in PyEval_EvalFrameEx ()
#66 0x0018f45b in PyEval_EvalCodeEx ()
#67 0x00139c27 in PyFunction_SetClosure ()
#68 0x0011fd3d in PyObject_Call ()
#69 0x0018dfb8 in PyEval_EvalFrameEx ()
#70 0x0018d9e8 in PyEval_EvalFrameEx ()
#71 0x0018d9e8 in PyEval_EvalFrameEx ()
#72 0x0018f45b in PyEval_EvalCodeEx ()
#73 0x0018da85 in PyEval_EvalFrameEx ()
#74 0x0018f45b in PyEval_EvalCodeEx ()
#75 0x0018f548 in PyEval_EvalCode ()
#76 0x001a69ec in PyErr_Display ()
#77 0x001a7016 in PyRun_FileExFlags ()
#78 0x001a8982 in PyRun_SimpleFileExFlags ()
#79 0x001b3c03 in Py_Main ()
#80 0x00001fca in ?? ()

I really think this is not weblitz-specific, but right now I don't have
the brain power to look further. If I am to handle the
MemoryLimitException specifically (i.e. if it is a weblitz-specific
problem), please keep the envy server with the current settings so I can
reproduce!

then:

It was not django, it turns out. I keep changing my mind, I know, but this
time I can actually reproduce this issue manually :)

So the basic issue is calling renderingEngine.renderCompressed on an image
that triggers a MemoryLimitException (id=58) and then calling it again on
the same engine. This hangs forever at the same position of the stack
trace I sent previously.

I'm catching this specifically in weblitz and it no longer hangs for me,
although I'm still working on a few details before posting, specifically
the fact that the connection is losing the session after the above issue
happens and shouldn't.

Almost there!

comment:4 Changed 15 years ago by jmoore

Reading the Ice forums for "hang", my current leads are the following (a hedged configuration sketch for the first two follows this list):

  • ACM (active connection management): the connection is closed by ACM and the wait happens while Ice tries to reconnect, possibly to a port that is no longer active;
  • connection timeouts: if the ACM explanation holds, a connection timeout on its own might suffice;
  • thread starvation: with the new callbacks, perhaps some recursive call is blocking.
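
A hedged configuration sketch for the first two leads (the property names are from Ice 3.3; the values are illustrative, not settings recommended by this ticket):

  import Ice

  props = Ice.createProperties([])
  props.setProperty("Ice.ACM.Client", "0")                  # seconds; 0 disables client-side ACM
  props.setProperty("Ice.Override.ConnectTimeout", "5000")  # connect timeout in milliseconds

  init_data = Ice.InitializationData()
  init_data.properties = props
  communicator = Ice.initialize(init_data)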

comment:5 Changed 15 years ago by jmoore

Carlos, can you let me know what's missing from the example:

import omero, Ice, time
c = omero.client()
s = c.createSession()
r = s.createRenderingEngine()
r.lookupPixels(59)
r.lookupRenderingDef(59)
r.load()
r.resetDefaults()
r.lookupRenderingDef(59)
r.load()

pd = omero.romio.PlaneDef()
pd.slice = 0
pd.t = 0
pd.z = 0

try:
    r.renderCompressed(pd)
except Ice.MemoryLimitException:
    print "MemoryLimitException"

r.renderCompressed(pd)

Testing against envy (id=59), from a Mac with 4.0 and 3.2, and from envy itself with 3.2, I get:

MemoryLimitException
Traceback (most recent call last):
  File "maxmemory.py", line 24, in <module>
    r.renderCompressed(pd)
  File "/home/josh/root/green/weblitz/weblitz_gateway/blitz_gateway/lib/omero_API_ice.py", line 2956, in renderCompressed
    return _M_omero.api.RenderingEngine._op_renderCompressed.invoke(self, ((_def, ), _ctx))
Ice.ConnectionLostException: Ice.ConnectionLostException:
recv() returned zero
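
For reference, a hedged sketch of the kind of guard Carlos describes for weblitz: catch the MemoryLimitException explicitly and stop re-invoking on that rendering engine, since the retry is exactly what hangs in the repro above. How the caller then rebuilds the engine or connection is left out, and this is not presented as the eventual weblitz fix:

  import Ice

  def try_render_compressed(engine, pd):
      try:
          return engine.renderCompressed(pd)
      except Ice.MemoryLimitException:
          # The connection behind this proxy is no longer usable; do not call
          # engine.renderCompressed(pd) again here -- signal the caller instead.
          return None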

comment:6 Changed 15 years ago by jmoore

  • Milestone changed from OMERO-Beta4 to OMERO-Beta4.1

r3461 adds ConnectTimeout, which should solve much of this.

On the other hand, errors in AMD usage -- i.e. not calling ice_response() or ice_exception() -- will still block, and we'll have to handle those as they happen. (The other possibility would be to set Ice.Override.Timeout, but with some of our long-running calls at the moment that could inadvertently kill the user session.) There may also still be issues with IceStorm.
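
A hedged sketch of the per-proxy alternative mentioned above, following the naming of the earlier examples: bound only the short calls and leave long-running proxies untimed, so a fatal timeout cannot kill the user's session mid-call (values in milliseconds; the choice of which proxies to bound is illustrative):

  query = s.getQueryService()
  query = query.ice_timeout(10000)   # 10 s cap on short query calls

  # Long-running services keep the default, untimed proxy so their legitimate
  # slow invocations are never aborted by a timeout.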

Moving for the moment; just keep an eye out for hanging situations.

comment:7 Changed 15 years ago by jmoore

  • Milestone changed from Unscheduled to OMERO-Beta4.1
  • Resolution set to fixed
  • Status changed from new to closed

Seems to have settled down in the 4.0.x line. Reopen as necessary.
