Warning: Can't synchronize with repository "(default)" (/home/git/ome.git does not appear to be a Git repository.). Look in the Trac log for more information.
Notice: In order to edit this ticket you need to be either: a Product Owner, The owner or the reporter of the ticket, or, in case of a Task not yet assigned, a team_member"

Task #2894 (closed)

Opened 14 years ago

Closed 12 years ago

Bug: 32 simultaneous logins hangs the server

Reported by: jamoore Owned by: jamoore
Priority: critical Milestone: OMERO-4.4
Component: Performance Version: n.a.
Keywords: n.a. Cc: jburel
Resources: n.a. Referenced By: n.a.
References: n.a. Remaining Time: 0.0d
Sprint: 2012-01-17 (6)

Description

threads = [T() for x in range(32)]
for t in threads: t.start()

in which each thread creates a session hangs the server.

Change History (4)

comment:1 Changed 14 years ago by jmoore

Happens for other actions as well, for example creating data as in #2893.

comment:2 Changed 13 years ago by jmoore

In 4.3, when just creating a session, 100 threads can't stop the server. But even a call as simple as getEventContext can:

#!/usr/bin/env PYTHONPATH=lib/python:/opt/Ice-3.3.1-64/python python

import sys
import omero
import omero.util
import threading
import logging


if len(sys.argv) > 1:
    NUM = int(sys.argv[1])
else:
    NUM = 32

omero.util.configure_logging()

class T(threading.Thread):
    def run(self):
        self.go = True
        count = 1
        while self.go:
            client = omero.client("localhost", 4064)
            try:
                logging.info(count)
                session = client.createSession("root", "ome")
                admin = session.getAdminService()
                admin.getEventContext()
            finally:
                count += 1
                client.closeSession()

threads = [T() for x in range(NUM)]
[x.start() for x in threads]

try:
    sys.stdin.readline()
finally:
    for x in threads:
        x.go = False
    [x.join() for x in threads]

comment:3 Changed 12 years ago by jmoore

  • Component changed from General to Performance
  • Milestone changed from Unscheduled to OMERO-Beta4.4
  • Owner set to jmoore
  • Remaining Time set to 0.5
  • Sprint set to 2012-01-17 (6)
  • Status changed from new to accepted

Reviewing the state of this in the current code base.

comment:4 Changed 12 years ago by jmoore

  • Cc jburel added
  • Remaining Time changed from 0.5 to 0
  • Resolution set to worksforme
  • Status changed from accepted to closed

(See comment below about DatabaseBusyException)

With a very aggressive timeout:

omero.sessions.sync_force=300000
omero.sessions.sync_interval=60000
omero.sessions.timeout=15000

it's possible that the 32 threads can hang the server long-enough (many tens of seconds) for sessions to timeout causing:

2012-01-05 16:30:11,365 INFO  [                                    root] (Thread-7  ) 15
2012-01-05 16:30:11,378 INFO  [                                    root] (Thread-16 ) 14
2012-01-05 16:30:11,384 INFO  [                                    root] (Thread-2  ) 14
2012-01-05 16:30:11,415 INFO  [                                    root] (Thread-30 ) 14
2012-01-05 16:30:11,416 INFO  [                                    root] (Thread-15 ) 14
2012-01-05 16:30:11,420 INFO  [                                    root] (Thread-17 ) 15
lib/python/omero_ServerErrors_ice.py:115: DeprecationWarning: BaseException.message has been deprecated as of Python 2.6
  self.message = message
lib/python/omero_api_IAdmin_ice.py:798: DeprecationWarning: BaseException.message has been deprecated as of Python 2.6
  return _M_omero.api.IAdmin._op_getEventContext.invoke(self, ((), _ctx))
lib/python/omero_ServerErrors_ice.py:186: DeprecationWarning: BaseException.message has been deprecated as of Python 2.6
  return IcePy.stringifyException(self)
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/threading.py", line 522, in __bootstrap_inner
    self.run()
  File "./t.py", line 29, in run
    admin.getEventContext()
  File "/Users/moore/GlencoeSoftware.git/git/omero.git/dist/lib/python/omero_api_IAdmin_ice.py", line 798, in getEventContext
    return _M_omero.api.IAdmin._op_getEventContext.invoke(self, ((), _ctx))
SessionTimeoutException: exception ::omero::SessionTimeoutException
{
    serverStackTrace = ome.conditions.SessionTimeoutException: Session (started=2012-01-05 16:30:11.226, hits=3, last access=2012-01-05 16:30:11.347) exceeded timeToIdle (15000) by 93006 ms
	at ome.services.sessions.state.SessionCache.getDataNullOrThrowOnTimeout(SessionCache.java:422)
	at ome.services.sessions.state.SessionCache.getSessionContext(SessionCache.java:351)
	at ome.services.sessions.SessionManagerImpl.getEventContext(SessionManagerImpl.java:679)
	at ome.security.basic.BasicSecuritySystem.loadEventContext(BasicSecuritySystem.java:333)
	at ome.security.basic.EventHandler.invoke(EventHandler.java:118)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
	at org.springframework.orm.hibernate3.HibernateInterceptor.invoke(HibernateInterceptor.java:111)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
	at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:108)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
	at ome.tools.hibernate.ProxyCleanupFilter$Interceptor.invoke(ProxyCleanupFilter.java:231)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
	at ome.services.util.ServiceHandler.invoke(ServiceHandler.java:116)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
	at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:202)
	at $Proxy70.getEventContext(Unknown Source)
	at sun.reflect.GeneratedMethodAccessor537.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:307)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
	at ome.security.basic.BasicSecurityWiring.invoke(BasicSecurityWiring.java:95)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
	at ome.services.blitz.fire.AopContextInitializer.invoke(AopContextInitializer.java:43)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
	at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:202)
	at $Proxy70.getEventContext(Unknown Source)
	at sun.reflect.GeneratedMethodAccessor555.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at ome.services.blitz.util.IceMethodInvoker.invoke(IceMethodInvoker.java:179)
	at ome.services.throttling.Callback.run(Callback.java:56)
	at ome.services.throttling.InThreadThrottlingStrategy.callInvokerOnRawArgs(InThreadThrottlingStrategy.java:56)
	at ome.services.blitz.impl.AbstractAmdServant.callInvokerOnRawArgs(AbstractAmdServant.java:136)
	at ome.services.blitz.impl.AdminI.getEventContext_async(AdminI.java:217)
	at omero.api._IAdminTie.getEventContext_async(_IAdminTie.java:213)
	at omero.api._IAdminDisp.___getEventContext(_IAdminDisp.java:1425)
	at omero.api._IAdminDisp.__dispatch(_IAdminDisp.java:1581)
	at IceInternal.Incoming.invoke(Incoming.java:159)
	at Ice.ConnectionI.invokeAll(ConnectionI.java:2357)
	at Ice.ConnectionI.dispatch(ConnectionI.java:1208)
	at Ice.ConnectionI.message(ConnectionI.java:1163)
	at IceInternal.ThreadPool.run(ThreadPool.java:302)
	at IceInternal.ThreadPool.access$300(ThreadPool.java:12)
	at IceInternal.ThreadPool$EventHandlerThread.run(ThreadPool.java:643)
	at java.lang.Thread.run(Thread.java:680)

    serverExceptionClass = ome.conditions.SessionTimeoutException
    message = Session (started=2012-01-05 16:30:11.226, hits=3, last access=2012-01-05 16:30:11.347) exceeded timeToIdle (15000) by 93006 ms
}

However, once the bottleneck is passed, the threads keep on going:

...
2012-01-05 16:34:33,621 INFO  [                                    root] (Thread-30 ) 712
2012-01-05 16:34:33,646 INFO  [                                    root] (Thread-16 ) 712
2012-01-05 16:34:33,667 INFO  [                                    root] (Thread-7  ) 713
2012-01-05 16:34:33,752 INFO  [                                    root] (Thread-2  ) 714
2012-01-05 16:34:33,771 INFO  [                                    root] (Thread-15 ) 710
2012-01-05 16:34:33,841 INFO  [                                    root] (Thread-17 ) 715
2012-01-05 16:34:33,844 INFO  [                                    root] (Thread-30 ) 713
2012-01-05 16:34:33,848 INFO  [                                    root] (Thread-16 ) 713
2012-01-05 16:34:33,898 INFO  [                                    root] (Thread-7  ) 714
2012-01-05 16:34:33,958 INFO  [                                    root] (Thread-2  ) 715
2012-01-05 16:34:33,990 INFO  [                                    root] (Thread-15 ) 711
...

With the standard timeouts, as soon as Session.reload takes place, the server can hang until the db connection acquisition timeouts:

Exception in thread Thread-30:
Traceback (most recent call last):
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/threading.py", line 522, in __bootstrap_inner
    self.run()
  File "./t.py", line 35, in run
    admin.getEventContext()
  File "/Users/moore/GlencoeSoftware.git/git/omero.git/dist/lib/python/omero_api_IAdmin_ice.py", line 798, in getEventContext
    return _M_omero.api.IAdmin._op_getEventContext.invoke(self, ((), _ctx))
DatabaseBusyException: exception ::omero::DatabaseBusyException
{
    serverStackTrace = ome.conditions.DatabaseBusyException: Cannot acquire connection
	at ome.services.db.SelfCorrectingDataSource.callWithRetries(SelfCorrectingDataSource.java:91)
	at ome.services.db.SelfCorrectingDataSource.getConnection(SelfCorrectingDataSource.java:59)
	at org.springframework.jdbc.datasource.DataSourceTransactionManager.doBegin(DataSourceTransactionManager.java:203)
	at org.springframework.transaction.support.AbstractPlatformTransactionManager.getTransaction(AbstractPlatformTransactionManager.java:371)
	at org.springframework.transaction.interceptor.TransactionAspectSupport.createTransactionIfNecessary(TransactionAspectSupport.java:354)
	at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:103)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
	at ome.tools.hibernate.ProxyCleanupFilter$Interceptor.invoke(ProxyCleanupFilter.java:231)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
	at ome.services.util.ServiceHandler.invoke(ServiceHandler.java:116)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
	at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:202)
	at $Proxy75.getEventContext(Unknown Source)
	at sun.reflect.GeneratedMethodAccessor287.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:307)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
	at ome.security.basic.BasicSecurityWiring.invoke(BasicSecurityWiring.java:95)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
	at ome.services.blitz.fire.AopContextInitializer.invoke(AopContextInitializer.java:43)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
	at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:202)
	at $Proxy75.getEventContext(Unknown Source)
	at sun.reflect.GeneratedMethodAccessor289.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at ome.services.blitz.util.IceMethodInvoker.invoke(IceMethodInvoker.java:179)
	at ome.services.throttling.Callback.run(Callback.java:56)
	at ome.services.throttling.InThreadThrottlingStrategy.callInvokerOnRawArgs(InThreadThrottlingStrategy.java:56)
	at ome.services.blitz.impl.AbstractAmdServant.callInvokerOnRawArgs(AbstractAmdServant.java:136)
	at ome.services.blitz.impl.AdminI.getEventContext_async(AdminI.java:217)
	at omero.api._IAdminTie.getEventContext_async(_IAdminTie.java:213)
	at omero.api._IAdminDisp.___getEventContext(_IAdminDisp.java:1425)
	at omero.api._IAdminDisp.__dispatch(_IAdminDisp.java:1581)
	at IceInternal.Incoming.invoke(Incoming.java:159)
	at Ice.ConnectionI.invokeAll(ConnectionI.java:2357)
	at Ice.ConnectionI.dispatch(ConnectionI.java:1208)
	at Ice.ConnectionI.message(ConnectionI.java:1163)
	at IceInternal.ThreadPool.run(ThreadPool.java:302)
	at IceInternal.ThreadPool.access$300(ThreadPool.java:12)
	at IceInternal.ThreadPool$EventHandlerThread.run(ThreadPool.java:643)
	at java.lang.Thread.run(Thread.java:680)

    serverExceptionClass = ome.conditions.DatabaseBusyException
    message = Cannot acquire connection
    backOff = 0
}

21 <T(Thread-3, stopped 4324855808)>
2 <T(Thread-4, stopped 4325392384)>
Exception in thread Thread-18:
Traceback (most recent call last):
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/threading.py", line 522, in __bootstrap_inner
    self.run()
  File "./t.py", line 38, in run
    client.closeSession()
  File "/Users/moore/GlencoeSoftware.git/git/omero.git/dist/lib/python/omero/clients.py", line 748, in closeSession
    self.getRouter(oldIc).destroySession()
  File "/usr/local/Cellar/zeroc-ice34/3.4.2/python/Glacier2_Router_ice.py", line 441, in destroySession
    return _M_Glacier2.Router._op_destroySession.invoke(self, ((), _ctx))
SessionNotExistException: exception ::Glacier2::SessionNotExistException
{
}

21 <T(Thread-5, stopped 4325928960)>
21 <T(Thread-6, stopped 4326465536)>
22 <T(Thread-7, stopped 4327002112)>
2 <T(Thread-8, stopped 4327538688)>
21 <T(Thread-9, stopped 4328075264)>
2 <T(Thread-10, stopped 4328611840)>
21 <T(Thread-11, stopped 4329148416)>
21 <T(Thread-12, stopped 4329684992)>
2 <T(Thread-13, stopped 4330221568)>
21 <T(Thread-14, stopped 4330758144)>
21 <T(Thread-15, stopped 4331294720)>
21 <T(Thread-16, stopped 4331831296)>
1 <T(Thread-17, stopped 4332367872)>
22 <T(Thread-18, stopped 4332904448)>
2 <T(Thread-19, stopped 4333441024)>
2 <T(Thread-20, stopped 4333977600)>
20 <T(Thread-21, stopped 4334514176)>
22 <T(Thread-22, stopped 4336390144)>
2 <T(Thread-23, stopped 4346875904)>
21 <T(Thread-24, stopped 4347412480)>
2 <T(Thread-25, stopped 4347949056)>
21 <T(Thread-26, stopped 4348485632)>
21 <T(Thread-27, stopped 4349022208)>
21 <T(Thread-28, stopped 4349558784)>
21 <T(Thread-29, stopped 4350095360)>
22 <T(Thread-30, stopped 4350631936)>
2 <T(Thread-31, stopped 4351168512)>
21 <T(Thread-32, stopped 4351705088)>

As long as clients properly handle the DatabaseBusyException and backoff, then there doesn't seem to be any looming issue, other than a temporary hang which is based on the number of available PG connections. The server doesn't hang indefinitely.

Note: See TracTickets for help on using tickets. You may also have a look at Agilo extensions to the ticket.

1.3.13-PRO © 2008-2011 Agilo Software all rights reserved (this page was served in: 0.67992 sec.)

We're Hiring!