Bug #1078 (closed)
Opened 16 years ago
Closed 15 years ago
Deadlocks in client when talking to Blitz
Reported by: | jamoore | Owned by: | jamoore
---|---|---|---
Priority: | critical | Cc: | cxallan, atarkowska, dzmacdonald, cblackburn, jburel, carlos@…
Sprint: | n.a. | |
Total Remaining Time: | n.a. | |
Description
Related to OmeroThrottling, there are cases (server-side bugs) in which the client can hang. Many of these have been fixed; they were caused by missing try/finally blocks, which left ice_response() and ice_exception() uncalled.
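The failure mode can be shown without a server. In the sketch below, `Callback` is a stand-in for the object Ice passes to an AMD `<op>_async` servant method, and `getConfigValue_async` is a hypothetical servant operation; the point is the guard that guarantees exactly one of ice_response()/ice_exception() fires on every path:

```python
class Callback:
    """Stand-in for the AMD callback Ice passes to an <op>_async method."""

    def __init__(self):
        self.answered = False

    def ice_response(self, result):
        self.answered = True
        self.result = result

    def ice_exception(self, ex):
        self.answered = True
        self.error = ex


def getConfigValue_async(cb, key, store):
    # Hypothetical servant method. Without this guard, an exception in the
    # body would leave the callback unanswered and the client blocked forever.
    try:
        cb.ice_response(store[key])
    except Exception as ex:
        cb.ice_exception(ex)


ok, bad = Callback(), Callback()
store = {"omero.version": "4.0"}
getConfigValue_async(ok, "omero.version", store)  # answered via ice_response
getConfigValue_async(bad, "no.such.key", store)   # answered via ice_exception
```

Either way the client gets an answer; the fixed server-side bugs were exactly the paths where neither method was reached.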
In general, such hangs can be worked around by setting `Ice.Override.Timeout` client-wide, or by setting the timeout on a particular proxy (Ice timeouts are in milliseconds):
proxy = proxy.ice_timeout(msecs)
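For the client-wide variant, the override can also be set in an Ice properties file. A minimal sketch with an illustrative value in milliseconds, not a recommendation:

```
Ice.Override.Timeout=10000
```

As the ZeroC manual stresses, a triggered timeout closes the whole connection, so this is a safety net against a wedged server, not routine flow control.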
See http://zeroc.com/doc/Ice-3.3.0/manual/Adv_server.29.12.html for more, especially the warning that timeouts are fatal to the connection:
"You should also be aware that timeouts are considered fatal error conditions by the Ice run time and result in connection closure on the client side. Furthermore, any other requests pending on the same connection also fail with an exception. Timeouts are meant to be used to prevent a client from blocking indefinitely in case something has gone wrong with the server; they are not meant as a mechanism to routinely abort requests that take longer than intended."
Please report any instances which you see. One in particular from Chris:
11:15:28 chris@jabber: If you get Ice.ConnectionException anywhere and then try and do anything with a service you're deadlocked.
...
11:16:36 chris@jabber: Last night when I had a running Python interpreter for testing I of course left it too long, then did s.createSession() again, hit a service I had before and then boom, deadlock.
...
11:17:40 chris@jabber: Basically:
11:17:49 chris@jabber: s = c.createSession("root", "ome")
11:18:01 chris@jabber: query = s.getQueryService()
11:18:07 chris@jabber: ... wait some time
11:18:21 chris@jabber: query.findAllByQuery(...)
11:18:40 chris@jabber: Ice.ConnectionException (ie. session timeout)
11:18:56 chris@jabber: s = c.createSession("root", "ome")
11:19:07 chris@jabber: query.findAllByQuery(...)
11:19:13 chris@jabber: ... wait forever
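A server-free sketch of that trap: the service proxy stays bound to the session it came from, so recreating the session without re-fetching the proxy can never work. All classes here are toy stand-ins, not the OMERO API; the real client deadlocked where this model raises:

```python
class DeadSessionError(Exception):
    pass


class QueryService:
    """Toy proxy: bound to the one session it was created from."""

    def __init__(self, session):
        self._session = session

    def findAllByQuery(self, q):
        if not self._session.alive:
            # The real Ice client deadlocked here; failing fast is the
            # behaviour a fixed client should show instead.
            raise DeadSessionError("proxy bound to a closed session")
        return []


class Session:
    def __init__(self):
        self.alive = True

    def getQueryService(self):
        return QueryService(self)


s = Session()
query = s.getQueryService()
s.alive = False                   # server-side session timeout

s = Session()                     # new session, but `query` is still stale
try:
    query.findAllByQuery("select i from Image i")
except DeadSessionError:
    query = s.getQueryService()   # fix: re-fetch the proxy from the new session
result = query.findAllByQuery("select i from Image i")
```

The fix Chris's log implies is the `except` branch: after any connection or session failure, re-fetch every service proxy from the new session rather than reusing the old handles.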
Change History (7)
comment:1 Changed 15 years ago by jmoore
comment:2 Changed 15 years ago by jburel
- Cc jburel added
comment:3 Changed 15 years ago by jmoore
- Cc carlos@… added
From Carlos:
Hey. So I can't mimic this issue locally. Several things happened, but it all starts with a MemoryLimitException on renderCompressed, followed by an attempt to recreate the rendering engine that fails with a lost connection. Now weblitz tries to reconnect, and then renderCompressed again, which correctly fails because the automatic retry does not prepare the pixels. The exact way this happens apparently gets me in an infinite loop, as the processes stay at 20% CPU forever (look at pid 19440 for example @ envy). The following is a (too big yet not sufficiently detailed) stack trace of where the thread is after the above:
#0 0x962f13ae in __semwait_signal ()
#1 0x9631c326 in _pthread_cond_wait ()
#2 0x9631bd0d in pthread_cond_wait$UNIX2003 ()
#3 0x011621a5 in IceUtil::Monitor<IceUtil::Mutex>::wait ()
#4 0x012123ff in IceInternal::Outgoing::invoke ()
#5 0x0123cb28 in IceDelegateM::Ice::Object::ice_invoke ()
#6 0x0123be8a in IceProxy::Ice::Object::ice_invoke ()
#7 0x0123c064 in IceProxy::Ice::Object::ice_invoke ()
#8 0x010233f2 in IcePy::SyncTypedInvocation::invoke ()
#9 0x01019ab2 in operationInvoke ()
#10 0x0018d806 in PyEval_EvalFrameEx ()
#11 0x0018f45b in PyEval_EvalCodeEx ()
#12 0x00139c27 in PyFunction_SetClosure ()
#13 0x0011fd3d in PyObject_Call ()
#14 0x0018dfb8 in PyEval_EvalFrameEx ()
#15 0x0018f45b in PyEval_EvalCodeEx ()
#16 0x00139c27 in PyFunction_SetClosure ()
#17 0x0011fd3d in PyObject_Call ()
#18 0x0018dfb8 in PyEval_EvalFrameEx ()
#19 0x0018f45b in PyEval_EvalCodeEx ()
#20 0x0018da85 in PyEval_EvalFrameEx ()
#21 0x0018f45b in PyEval_EvalCodeEx ()
#22 0x00139c27 in PyFunction_SetClosure ()
#23 0x0011fd3d in PyObject_Call ()
#24 0x0018dfb8 in PyEval_EvalFrameEx ()
#25 0x0018f45b in PyEval_EvalCodeEx ()
#26 0x0018da85 in PyEval_EvalFrameEx ()
#27 0x0018f45b in PyEval_EvalCodeEx ()
#28 0x00139c27 in PyFunction_SetClosure ()
#29 0x0011fd3d in PyObject_Call ()
#30 0x0018dfb8 in PyEval_EvalFrameEx ()
#31 0x0018f45b in PyEval_EvalCodeEx ()
#32 0x00139c27 in PyFunction_SetClosure ()
#33 0x0011fd3d in PyObject_Call ()
#34 0x0018dfb8 in PyEval_EvalFrameEx ()
#35 0x0018d9e8 in PyEval_EvalFrameEx ()
#36 0x0018f45b in PyEval_EvalCodeEx ()
#37 0x00139c27 in PyFunction_SetClosure ()
#38 0x0011fd3d in PyObject_Call ()
#39 0x001285f8 in PyMethod_New ()
#40 0x0011fd3d in PyObject_Call ()
#41 0x001624b4 in _PyObject_SlotCompare ()
#42 0x0011fd3d in PyObject_Call ()
#43 0x0018db1a in PyEval_EvalFrameEx ()
#44 0x0018f45b in PyEval_EvalCodeEx ()
#45 0x0018da85 in PyEval_EvalFrameEx ()
#46 0x0018d9e8 in PyEval_EvalFrameEx ()
#47 0x0018d9e8 in PyEval_EvalFrameEx ()
#48 0x0018d9e8 in PyEval_EvalFrameEx ()
#49 0x0018d9e8 in PyEval_EvalFrameEx ()
#50 0x0018d9e8 in PyEval_EvalFrameEx ()
#51 0x0018d9e8 in PyEval_EvalFrameEx ()
#52 0x0018d9e8 in PyEval_EvalFrameEx ()
#53 0x0018f45b in PyEval_EvalCodeEx ()
#54 0x00139c27 in PyFunction_SetClosure ()
#55 0x0011fd3d in PyObject_Call ()
#56 0x001285f8 in PyMethod_New ()
#57 0x0011fd3d in PyObject_Call ()
#58 0x0018db1a in PyEval_EvalFrameEx ()
#59 0x0018d9e8 in PyEval_EvalFrameEx ()
#60 0x0018f45b in PyEval_EvalCodeEx ()
#61 0x0018da85 in PyEval_EvalFrameEx ()
#62 0x0018f45b in PyEval_EvalCodeEx ()
#63 0x00139c27 in PyFunction_SetClosure ()
#64 0x0011fd3d in PyObject_Call ()
#65 0x0018dfb8 in PyEval_EvalFrameEx ()
#66 0x0018f45b in PyEval_EvalCodeEx ()
#67 0x00139c27 in PyFunction_SetClosure ()
#68 0x0011fd3d in PyObject_Call ()
#69 0x0018dfb8 in PyEval_EvalFrameEx ()
#70 0x0018d9e8 in PyEval_EvalFrameEx ()
#71 0x0018d9e8 in PyEval_EvalFrameEx ()
#72 0x0018f45b in PyEval_EvalCodeEx ()
#73 0x0018da85 in PyEval_EvalFrameEx ()
#74 0x0018f45b in PyEval_EvalCodeEx ()
#75 0x0018f548 in PyEval_EvalCode ()
#76 0x001a69ec in PyErr_Display ()
#77 0x001a7016 in PyRun_FileExFlags ()
#78 0x001a8982 in PyRun_SimpleFileExFlags ()
#79 0x001b3c03 in Py_Main ()
#80 0x00001fca in ?? ()
I really think this is not weblitz specific, but right now I don't have the brain power to look further.
If I am to handle the MemoryLimitException specifically (i.e. if it is a weblitz specific problem) please keep the envy server with the current settings so I can reproduce!
then:
It was not django, it turns out. I keep changing my mind, I know, but this time I can actually reproduce this issue manually :) So the basic issue is calling renderingEngine.renderCompressed on an image that triggers a MemoryLimitException (id=58) and then calling it again on the same engine. This hangs forever at the same position of the stack trace I sent previously. I'm catching this specifically in weblitz and it no longer hangs for me, although I'm still working on a few details before posting, specifically the fact that the connection is losing the session after the above issue happens and shouldn't. Almost there!
comment:4 Changed 15 years ago by jmoore
Reading the Ice forums for "hang", my current leads are:
- ACM (active connection management) - that the connection is killed and the wait is happening while Ice tries to reconnect possibly to a port that's not active;
- connection timeouts - if ACM is enough, it might suffice to have a connection timeout;
- or thread starvation - with the new callbacks perhaps there's some recursive call that's blocking.
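The first and third leads can be probed from client configuration alone. A sketch, assuming standard Ice property names (values illustrative; Ice.ACM.Client is in seconds, so 0 disables client-side active connection management, and the thread pool sizes test the starvation theory):

```
Ice.ACM.Client=0
Ice.ThreadPool.Client.Size=2
Ice.ThreadPool.Client.SizeMax=8
```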
comment:5 Changed 15 years ago by jmoore
Carlos, can you let me know what's missing from the example:
import omero, Ice, time
c = omero.client()
s = c.createSession()
r = s.createRenderingEngine()
r.lookupPixels(59)
r.lookupRenderingDef(59)
r.load()
r.resetDefaults()
r.lookupRenderingDef(59)
r.load()
pd = omero.romio.PlaneDef()
pd.slice = 0
pd.t = 0
pd.z = 0
try:
    r.renderCompressed(pd)
except Ice.MemoryLimitException:
    print "MemoryLimitException"
r.renderCompressed(pd)
Testing against envy (id=59) from a Mac with 4.0 and 3.2, and from envy itself with 3.2, I get:
MemoryLimitException
Traceback (most recent call last):
  File "maxmemory.py", line 24, in <module>
    r.renderCompressed(pd)
  File "/home/josh/root/green/weblitz/weblitz_gateway/blitz_gateway/lib/omero_API_ice.py", line 2956, in renderCompressed
    return _M_omero.api.RenderingEngine._op_renderCompressed.invoke(self, ((_def, ), _ctx))
Ice.ConnectionLostException: Ice.ConnectionLostException: recv() returned zero
comment:6 Changed 15 years ago by jmoore
- Milestone changed from OMERO-Beta4 to OMERO-Beta4.1
r3461 adds ConnectTimeout which should solve much of this.
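Assuming this maps onto Ice's standard connect-timeout override (r3461 itself isn't shown here), the corresponding client property would be something like the following, with an illustrative value in milliseconds:

```
Ice.Override.ConnectTimeout=5000
```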
On the other hand, errors in AMD usage -- i.e. failing to call ice_response() or ice_exception() -- will still block, and we'll have to handle those as they happen. (The other possibility would be to set Ice.Override.Timeout, but with some of our long-running calls at the moment, that could inadvertently kill the user session.) And there may also still be issues with IceStorm.
Moving for the moment; just keep an eye out for hanging situations.
comment:7 Changed 15 years ago by jmoore
- Milestone changed from Unscheduled to OMERO-Beta4.1
- Resolution set to fixed
- Status changed from new to closed
Seems to have settled down in the 4.0.x line. Reopen as necessary.
Another almost certainly related issue: