Task #9477 (closed)
BUG: unittest leads to deadlock
Reported by: | cneves | Owned by: | jamoore |
---|---|---|---|
Priority: | blocker | Milestone: | OMERO-4.4.4 |
Component: | OmeroPy | Version: | n.a. |
Keywords: | n.a. | Cc: | python-team@… |
Resources: | n.a. | Referenced By: | n.a. |
References: | n.a. | Remaining Time: | 0.0d |
Sprint: | 2012-08-28 (3) |
Description
Since early this week, develop + https://github.com/cneves/openmicroscopy/tree/postdecorators/7202_unittest_review deadlocks when running python setup.py test -s test.gatewaytest.chgrp.ChgrpTest.testMultiDatasetDoAll
The lockup always happens in this particular test, and the place where it locks up is https://github.com/openmicroscopy/openmicroscopy/blob/develop/components/tools/OmeroPy/src/omero/clients.py#L827
I haven't yet reduced the test to find out what triggers this, but since it doesn't always happen it looks like some sort of race condition. With that in mind, I added a 10 ms sleep, which prevents the deadlock completely when placed after https://github.com/openmicroscopy/openmicroscopy/blob/develop/components/tools/OmeroPy/src/omero/clients.py#L856

    try:
        r = 1
        while r > 0:
            count += 1
            r = svc.closeSession(s)
            time.sleep(0.01)
    except omero.RemovedSessionException:
        pass
    except:
        self.__logger.warning("Unknown exception while closing all references", exc_info = True)

    # Now the server-side session is dead, call closeSession()
    self.closeSession()

Placed there, the sleep prevents the call to self.closeSession() from locking, whereas placing that same sleep before the svc.closeSession(s) line, or after

    def closeSession(self):
        """
        Closes the Router connection created by createSession().
        Due to a bug in Ice, only one connection is allowed per
        communicator, so we also destroy the communicator.
        """
        self.__lock.acquire()

makes little difference. It does seem to make the deadlocks slightly less consistent, but most runs of that test still end in deadlock.
There is one extra detail in play: the test infrastructure, as well as the test proper, creates and destroys sessions and connections, so this code path is exercised considerably. Even so, as far as my testing goes, the deadlock only happens in this particular unit test, and only on its tearDown code path.
I will now attempt to identify where this test is different and triggers the deadlock.
Change History (8)
comment:1 Changed 12 years ago by jmoore
- Milestone changed from Unscheduled to OMERO-4.4.2
comment:2 Changed 12 years ago by jmoore
- Sprint set to 2012-08-28 (3)
comment:3 Changed 12 years ago by cneves
https://github.com/cneves/openmicroscopy/commit/dada5b7b2a51de44616ebeab88c99d7f75c9c71d properly closes a callback, which makes the time.sleep unneeded and prevents this deadlock. While it shouldn't be possible to deadlock the client in this way, it is also required to properly close all servants.
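To illustrate the point about closing servants, here is a minimal sketch of guaranteeing that a callback is closed even when the code using it fails. `FakeCallback`, `registry` and `run_with_callback` are hypothetical names invented for this example; they are not the OMERO API and not the code in the commit above.

```python
from contextlib import closing


class FakeCallback(object):
    """Hypothetical stand-in for a client-side callback servant."""

    def __init__(self, registry):
        self.registry = registry
        self.registry.add(self)  # pretend the servant is now registered

    def close(self):
        # Unregister the servant so that shutdown has nothing left to wait for.
        self.registry.discard(self)


def run_with_callback(registry, work):
    # closing() calls cb.close() on exit, even if work() raises.
    with closing(FakeCallback(registry)) as cb:
        return work(cb)


registry = set()
# While work() runs, exactly one servant is registered.
result = run_with_callback(registry, lambda cb: len(registry))
```

A plain try/finally around the test body achieves the same thing; the important part is that close() runs on every exit path.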
comment:4 Changed 12 years ago by jmoore
Hang takes place at this point:
    ****************************************************************************************************
    Depth: 1
      File "test/gatewaytest/chgrp.py", line 296, in <module>
        unittest.main()
      File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py", line 817, in __init__
        self.runTests()
      File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py", line 861, in runTests
        result = testRunner.run(self.test)
      File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py", line 753, in run
        test(result)
      File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py", line 464, in __call__
        return self.run(*args, **kwds)
      File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py", line 460, in run
        test(result)
      File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py", line 464, in __call__
        return self.run(*args, **kwds)
      File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py", line 460, in run
        test(result)
      File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py", line 300, in __call__
        return self.run(*args, **kwds)
      File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py", line 289, in run
        self.tearDown()
      File "/Users/moore/git/components/tools/OmeroPy/build/lib/omero/gateway/scripts/testdb_create.py", line 103, in tearDown
        self.gateway.seppuku()
      File "/Users/moore/git/components/tools/OmeroPy/build/lib/omero/gateway/__init__.py", line 1484, in seppuku
        self._closeSession()
      File "/Users/moore/git/components/tools/OmeroPy/build/lib/omero/gateway/__init__.py", line 1602, in _closeSession
        self.c.killSession()
      File "/Users/moore/git/components/tools/OmeroPy/build/lib/omero/clients.py", line 887, in killSession
        self.closeSession()
      File "/Users/moore/git/components/tools/OmeroPy/build/lib/omero/clients.py", line 803, in closeSession
        self.__lock.acquire()
      File "/Users/moore/git/components/tools/OmeroPy/build/lib/omero/clients.py", line 106, in acquire
        TB()
    ****************************************************************************************************
    Depth: 2
      File "test/gatewaytest/chgrp.py", line 296, in <module>
        unittest.main()
      File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py", line 817, in __init__
        self.runTests()
      File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py", line 861, in runTests
        result = testRunner.run(self.test)
      File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py", line 753, in run
        test(result)
      File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py", line 464, in __call__
        return self.run(*args, **kwds)
      File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py", line 460, in run
        test(result)
      File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py", line 464, in __call__
        return self.run(*args, **kwds)
      File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py", line 460, in run
        test(result)
      File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py", line 300, in __call__
        return self.run(*args, **kwds)
      File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py", line 289, in run
        self.tearDown()
      File "/Users/moore/git/components/tools/OmeroPy/build/lib/omero/gateway/scripts/testdb_create.py", line 103, in tearDown
        self.gateway.seppuku()
      File "/Users/moore/git/components/tools/OmeroPy/build/lib/omero/gateway/__init__.py", line 1484, in seppuku
        self._closeSession()
      File "/Users/moore/git/components/tools/OmeroPy/build/lib/omero/gateway/__init__.py", line 1602, in _closeSession
        self.c.killSession()
      File "/Users/moore/git/components/tools/OmeroPy/build/lib/omero/clients.py", line 887, in killSession
        self.closeSession()
      File "/Users/moore/git/components/tools/OmeroPy/build/lib/omero/clients.py", line 807, in closeSession
        self.stopKeepAlive()
      File "/Users/moore/git/components/tools/OmeroPy/build/lib/omero/clients.py", line 626, in stopKeepAlive
        self.__lock.acquire()
      File "/Users/moore/git/components/tools/OmeroPy/build/lib/omero/clients.py", line 106, in acquire
        TB()
    ----------------------------------------------------------------------------------------------------
    Releasing...
      File "test/gatewaytest/chgrp.py", line 296, in <module>
        unittest.main()
      File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py", line 817, in __init__
        self.runTests()
      File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py", line 861, in runTests
        result = testRunner.run(self.test)
      File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py", line 753, in run
        test(result)
      File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py", line 464, in __call__
        return self.run(*args, **kwds)
      File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py", line 460, in run
        test(result)
      File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py", line 464, in __call__
        return self.run(*args, **kwds)
      File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py", line 460, in run
        test(result)
      File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py", line 300, in __call__
        return self.run(*args, **kwds)
      File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py", line 289, in run
        self.tearDown()
      File "/Users/moore/git/components/tools/OmeroPy/build/lib/omero/gateway/scripts/testdb_create.py", line 103, in tearDown
        self.gateway.seppuku()
      File "/Users/moore/git/components/tools/OmeroPy/build/lib/omero/gateway/__init__.py", line 1484, in seppuku
        self._closeSession()
      File "/Users/moore/git/components/tools/OmeroPy/build/lib/omero/gateway/__init__.py", line 1602, in _closeSession
        self.c.killSession()
      File "/Users/moore/git/components/tools/OmeroPy/build/lib/omero/clients.py", line 887, in killSession
        self.closeSession()
      File "/Users/moore/git/components/tools/OmeroPy/build/lib/omero/clients.py", line 807, in closeSession
        self.stopKeepAlive()
      File "/Users/moore/git/components/tools/OmeroPy/build/lib/omero/clients.py", line 635, in stopKeepAlive
        self.__lock.release()
      File "/Users/moore/git/components/tools/OmeroPy/build/lib/omero/clients.py", line 111, in release
        TB()
    ****************************************************************************************************
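The `TB()` frames inside `acquire` and `release` suggest that the client lock was temporarily wrapped so that it prints a stack trace on every use. The actual instrumentation is not shown in this ticket, so the following is only a sketch of how such a wrapper could look. The class name `TracingRLock`, the `depth` property and the exact output layout are assumptions.

```python
import threading
import traceback


class TracingRLock(object):
    """Re-entrant lock that prints the caller's stack on acquire and release.

    Hypothetical debugging helper; not the code used in omero/clients.py.
    """

    def __init__(self):
        self._lock = threading.RLock()
        self._depth = 0  # only modified while the lock is held

    def acquire(self, blocking=True):
        # Print the stack before blocking so that a hang still leaves a trace.
        print("*" * 100)
        traceback.print_stack()
        got = self._lock.acquire(blocking)
        if got:
            self._depth += 1
            print("Depth: %d" % self._depth)
        return got

    def release(self):
        print("-" * 100)
        print("Releasing...")
        traceback.print_stack()
        self._depth -= 1
        self._lock.release()

    @property
    def depth(self):
        return self._depth


lock = TracingRLock()
lock.acquire()
lock.acquire()  # same thread, so the re-entrant acquire succeeds
depth_inside = lock.depth
lock.release()
lock.release()
```

Swapping a wrapper like this in for the real lock shows which call paths take the lock and how deeply they nest, which is the information the trace above provides.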
comment:5 Changed 12 years ago by jmoore
More specifically, this is caused by the use of getSession in DetailsI by the ObjectFactoryRegistrar, i.e. if an object was returned in a background thread, the getSession call would block. To work around this, I'll add a non-blocking version of getSession().
comment:6 Changed 12 years ago by jmoore
- Resolution set to fixed
- Status changed from new to closed
comment:7 Changed 12 years ago by jmoore <josh@…>
- Remaining Time set to 0
(In [f316e42bfe99545b4385f0129d6edc11e0b76e38/ome.git] on branch develop) Add getSession(False) for non-blocking lookup (Fix #9477)
The call to getSession in the ctor of DetailsI was hanging
since it was within an Ice thread which ic.destroy was
waiting on. Other uses of getSession should not hang on
destroy.
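The hang described in this commit message can be reproduced in miniature with plain threads. This is only a sketch under stated assumptions: `Client`, `get_session`, `destroy` and `run` are hypothetical names, not the omero.clients API, and a `join()` with a timeout stands in for ic.destroy waiting on an Ice thread.

```python
import threading


class Client(object):
    """Hypothetical client; illustrates the locking pattern only."""

    def __init__(self):
        self._lock = threading.RLock()
        self._session = "session-1"

    def get_session(self, blocking=True):
        # With blocking=False, give up at once if another thread holds the lock.
        if not self._lock.acquire(blocking):
            return None
        try:
            return self._session
        finally:
            self._lock.release()

    def destroy(self, worker, started, join_timeout):
        # Hold the lock while waiting for the worker, as a shutdown path might.
        with self._lock:
            started.set()
            # Without the timeout, a blocking lookup in the worker hangs here.
            worker.join(timeout=join_timeout)
            self._session = None
            return not worker.is_alive()


def run(blocking, join_timeout=0.5):
    """Return (worker_finished, values_seen_by_worker)."""
    client = Client()
    started = threading.Event()
    seen = []

    def background():
        started.wait()  # make sure destroy() already holds the lock
        seen.append(client.get_session(blocking))

    worker = threading.Thread(target=background)
    worker.daemon = True
    worker.start()
    finished = client.destroy(worker, started, join_timeout)
    return finished, seen


finished_nonblocking, seen_nonblocking = run(blocking=False)
```

With `blocking=False` the worker returns `None` immediately and the shutdown completes. With `blocking=True` the worker is still stuck on the lock when the join times out, which corresponds to the deadlock seen in tearDown.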
comment:8 Changed 12 years ago by Carlos Neves <carlos@…>
(In [dada5b7b2a51de44616ebeab88c99d7f75c9c71d/ome.git] on branch develop) Preventing potential deadlock by properly closing callback servent (see #9477)
Carlos, feel free to put these in the current milestone.