Task #9477 (closed)

Opened 12 years ago

Closed 12 years ago

Last modified 12 years ago

BUG: unittest leads to deadlock

Reported by: cneves Owned by: jamoore
Priority: blocker Milestone: OMERO-4.4.4
Component: OmeroPy Version: n.a.
Keywords: n.a. Cc: python-team@…
Resources: n.a. Referenced By: n.a.
References: n.a. Remaining Time: 0.0d
Sprint: 2012-08-28 (3)

Description

Since early this week, develop + https://github.com/cneves/openmicroscopy/tree/postdecorators/7202_unittest_review has led to a deadlock when running python setup.py test -s test.gatewaytest.chgrp.ChgrpTest.testMultiDatasetDoAll

The lockup always happens in this particular test, and the place where it locks up is https://github.com/openmicroscopy/openmicroscopy/blob/develop/components/tools/OmeroPy/src/omero/clients.py#L827

I haven't yet reduced the particular test to find out what is triggering this, but since it doesn't always happen it feels like some sort of race condition. With that in mind, I added a 10 ms sleep, which prevents the deadlock completely when placed after https://github.com/openmicroscopy/openmicroscopy/blob/develop/components/tools/OmeroPy/src/omero/clients.py#L856:

        try:
            r = 1
            while r > 0:
                count += 1
                r = svc.closeSession(s)
                time.sleep(0.01)
        except omero.RemovedSessionException:
            pass
        except:
            self.__logger.warning("Unknown exception while closing all references", exc_info = True)

        # Now the server-side session is dead, call closeSession()
        self.closeSession()

This placement prevents the call to self.closeSession from locking, whereas placing that same sleep before the svc.closeSession(s) line or after

    def closeSession(self):
        """
        Closes the Router connection created by createSession(). Due to a bug in Ice,
        only one connection is allowed per communicator, so we also destroy the communicator.
        """

        self.__lock.acquire()

makes little difference. It does seem to make the deadlocks slightly less consistent, but most runs of that test still end in deadlock.

There is one extra detail in play: the test infrastructure, as well as the test proper, creates and destroys sessions and connections, so this code path is exercised considerably. Even so, as far as my testing goes, the deadlock only happens in this particular unittest and on the tearDown code path for the test.

I will now attempt to identify where this test is different and triggers the deadlock.

Change History (8)

comment:1 Changed 12 years ago by jmoore

  • Milestone changed from Unscheduled to OMERO-4.4.2

Carlos, feel free to put these in the current milestone.

comment:2 Changed 12 years ago by jmoore

  • Sprint set to 2012-08-28 (3)

comment:3 Changed 12 years ago by cneves

https://github.com/cneves/openmicroscopy/commit/dada5b7b2a51de44616ebeab88c99d7f75c9c71d properly closes a callback, which makes the time.sleep unneeded and prevents this deadlock. While it shouldn't be possible to deadlock the client in this way, it is also required to properly close all servants.

comment:4 Changed 12 years ago by jmoore

Hang takes place at this point:

****************************************************************************************************
Depth:  1
  File "test/gatewaytest/chgrp.py", line 296, in <module>
    unittest.main()
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py", line 817, in __init__
    self.runTests()
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py", line 861, in runTests
    result = testRunner.run(self.test)
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py", line 753, in run
    test(result)
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py", line 464, in __call__
    return self.run(*args, **kwds)
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py", line 460, in run
    test(result)
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py", line 464, in __call__
    return self.run(*args, **kwds)
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py", line 460, in run
    test(result)
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py", line 300, in __call__
    return self.run(*args, **kwds)
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py", line 289, in run
    self.tearDown()
  File "/Users/moore/git/components/tools/OmeroPy/build/lib/omero/gateway/scripts/testdb_create.py", line 103, in tearDown
    self.gateway.seppuku()
  File "/Users/moore/git/components/tools/OmeroPy/build/lib/omero/gateway/__init__.py", line 1484, in seppuku
    self._closeSession()
  File "/Users/moore/git/components/tools/OmeroPy/build/lib/omero/gateway/__init__.py", line 1602, in _closeSession
    self.c.killSession()
  File "/Users/moore/git/components/tools/OmeroPy/build/lib/omero/clients.py", line 887, in killSession
    self.closeSession()
  File "/Users/moore/git/components/tools/OmeroPy/build/lib/omero/clients.py", line 803, in closeSession
    self.__lock.acquire()
  File "/Users/moore/git/components/tools/OmeroPy/build/lib/omero/clients.py", line 106, in acquire
    TB()
****************************************************************************************************
Depth:  2
  File "test/gatewaytest/chgrp.py", line 296, in <module>
    unittest.main()
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py", line 817, in __init__
    self.runTests()
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py", line 861, in runTests
    result = testRunner.run(self.test)
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py", line 753, in run
    test(result)
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py", line 464, in __call__
    return self.run(*args, **kwds)
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py", line 460, in run
    test(result)
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py", line 464, in __call__
    return self.run(*args, **kwds)
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py", line 460, in run
    test(result)
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py", line 300, in __call__
    return self.run(*args, **kwds)
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py", line 289, in run
    self.tearDown()
  File "/Users/moore/git/components/tools/OmeroPy/build/lib/omero/gateway/scripts/testdb_create.py", line 103, in tearDown
    self.gateway.seppuku()
  File "/Users/moore/git/components/tools/OmeroPy/build/lib/omero/gateway/__init__.py", line 1484, in seppuku
    self._closeSession()
  File "/Users/moore/git/components/tools/OmeroPy/build/lib/omero/gateway/__init__.py", line 1602, in _closeSession
    self.c.killSession()
  File "/Users/moore/git/components/tools/OmeroPy/build/lib/omero/clients.py", line 887, in killSession
    self.closeSession()
  File "/Users/moore/git/components/tools/OmeroPy/build/lib/omero/clients.py", line 807, in closeSession
    self.stopKeepAlive()
  File "/Users/moore/git/components/tools/OmeroPy/build/lib/omero/clients.py", line 626, in stopKeepAlive
    self.__lock.acquire()
  File "/Users/moore/git/components/tools/OmeroPy/build/lib/omero/clients.py", line 106, in acquire
    TB()
----------------------------------------------------------------------------------------------------
Releasing...
  File "test/gatewaytest/chgrp.py", line 296, in <module>
    unittest.main()
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py", line 817, in __init__
    self.runTests()
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py", line 861, in runTests
    result = testRunner.run(self.test)
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py", line 753, in run
    test(result)
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py", line 464, in __call__
    return self.run(*args, **kwds)
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py", line 460, in run
    test(result)
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py", line 464, in __call__
    return self.run(*args, **kwds)
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py", line 460, in run
    test(result)
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py", line 300, in __call__
    return self.run(*args, **kwds)
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py", line 289, in run
    self.tearDown()
  File "/Users/moore/git/components/tools/OmeroPy/build/lib/omero/gateway/scripts/testdb_create.py", line 103, in tearDown
    self.gateway.seppuku()
  File "/Users/moore/git/components/tools/OmeroPy/build/lib/omero/gateway/__init__.py", line 1484, in seppuku
    self._closeSession()
  File "/Users/moore/git/components/tools/OmeroPy/build/lib/omero/gateway/__init__.py", line 1602, in _closeSession
    self.c.killSession()
  File "/Users/moore/git/components/tools/OmeroPy/build/lib/omero/clients.py", line 887, in killSession
    self.closeSession()
  File "/Users/moore/git/components/tools/OmeroPy/build/lib/omero/clients.py", line 807, in closeSession
    self.stopKeepAlive()
  File "/Users/moore/git/components/tools/OmeroPy/build/lib/omero/clients.py", line 635, in stopKeepAlive
    self.__lock.release()
  File "/Users/moore/git/components/tools/OmeroPy/build/lib/omero/clients.py", line 111, in release
    TB()
****************************************************************************************************
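The layout of the output above (a depth counter on acquire, "Releasing..." on release, each followed by a stack trace) suggests a lock wrapper that records a traceback on every call. The following is only a rough sketch of how such instrumentation could look; TracingLock and its attributes are hypothetical names, not the code actually present in clients.py.

```python
import threading
import traceback


class TracingLock:
    """Hypothetical debugging wrapper around threading.RLock.

    Records a label and the caller's stack on every acquire/release so
    that nested acquisitions can be followed in a log.
    """

    def __init__(self):
        self._lock = threading.RLock()
        self._depth = 0      # only modified while the lock is held
        self.events = []     # list of (label, formatted stack) tuples

    def acquire(self):
        self._lock.acquire()
        self._depth += 1
        self._record("Depth: %d" % self._depth)

    def release(self):
        self._record("Releasing...")
        self._depth -= 1
        self._lock.release()

    def _record(self, label):
        # Drop the last two frames (_record and acquire/release) so the
        # stack ends at the code that asked for the lock.
        stack = "".join(traceback.format_stack()[:-2])
        self.events.append((label, stack))


def demo():
    lock = TracingLock()
    lock.acquire()   # outer call, e.g. a closeSession-like method
    lock.acquire()   # nested call, e.g. a stopKeepAlive-like method
    lock.release()
    lock.release()
    return [label for label, _ in lock.events]
```

Because the wrapper is built on a re-entrant lock, a nested acquire from the same thread (Depth 2 above) succeeds; a hang would therefore point at a different thread contending for the lock rather than at the nesting itself.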

comment:5 Changed 12 years ago by jmoore

More specifically, this is caused by the use of getSession in DetailsI by the ObjectFactoryRegistrar: if an object was returned in a background thread, the getSession call would block. To work around this, I'll add a non-blocking version of getSession().
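As an illustration of the pattern described in this comment, here is a minimal sketch, assuming a client that holds its lock while waiting for a background thread to finish. The Client class, get_session, and close_session are hypothetical stand-ins and not the OMERO API; the worker thread stands in for an Ice thread constructing a DetailsI.

```python
import threading


class Client:
    """Hypothetical stand-in for the real client; illustration only."""

    def __init__(self):
        self._lock = threading.RLock()
        self._session = "session-1"

    def get_session(self, blocking=True):
        # With blocking=False, return None at once if another thread
        # currently holds the lock instead of waiting for it.
        if not self._lock.acquire(blocking):
            return None
        try:
            return self._session
        finally:
            self._lock.release()

    def close_session(self, worker):
        # Hold the lock while waiting for the worker, mimicking a
        # shutdown that waits on background threads under the lock.
        with self._lock:
            worker.start()
            worker.join(timeout=5)  # timeout so the sketch cannot hang
            self._session = None
            return not worker.is_alive()


def run_demo():
    client = Client()
    seen = []
    # The worker performs a non-blocking lookup, so it finishes even
    # though the main thread owns the lock for the whole shutdown.
    worker = threading.Thread(
        target=lambda: seen.append(client.get_session(blocking=False)))
    finished = client.close_session(worker)
    return finished, seen
```

With a blocking lookup in the worker, the join would only return once the timeout expired, since the worker waits for a lock that the joining thread will not release until the join completes. Without the timeout, that is the same kind of hang as reported in this ticket.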

comment:6 Changed 12 years ago by jmoore

  • Resolution set to fixed
  • Status changed from new to closed

comment:7 Changed 12 years ago by jmoore <josh@…>

  • Remaining Time set to 0

(In [f316e42bfe99545b4385f0129d6edc11e0b76e38/ome.git] on branch develop) Add getSession(False) for non-blocking lookup (Fix #9477)

The call to getSession in the ctor of DetailsI was hanging
since it was within an Ice thread which ic.destroy was
waiting on. Other uses of getSession should not hang on
destroy.

comment:8 Changed 12 years ago by Carlos Neves <carlos@…>

(In [dada5b7b2a51de44616ebeab88c99d7f75c9c71d/ome.git] on branch develop) Preventing potential deadlock by properly closing callback servent (see #9477)
