Task #2894 (closed)
Opened 14 years ago
Closed 12 years ago
Bug: 32 simultaneous logins hangs the server
Reported by: | jamoore | Owned by: | jamoore |
---|---|---|---|
Priority: | critical | Milestone: | OMERO-4.4 |
Component: | Performance | Version: | n.a. |
Keywords: | n.a. | Cc: | jburel |
Resources: | n.a. | Referenced By: | n.a. |
References: | n.a. | Remaining Time: | 0.0d |
Sprint: | 2012-01-17 (6) |
Description
A script containing:

    threads = [T() for x in range(32)]
    for t in threads:
        t.start()

in which each thread creates a session hangs the server.
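The reproduction pattern can be sketched as a self-contained script. Note that `create_session` below is a hypothetical stub standing in for `omero.client(...).createSession(...)`; the full reproduction against a live server is in comment 2:

```python
import threading
import time

def create_session():
    # Hypothetical stand-in for omero.client(...).createSession(...).
    # The real reproduction logs in against a running OMERO server.
    time.sleep(0.01)

class T(threading.Thread):
    def run(self):
        create_session()

# 32 threads logging in simultaneously is enough to hang the server.
threads = [T() for _ in range(32)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("all %d threads finished" % len(threads))  # → all 32 threads finished
```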
Change History (4)
comment:1 Changed 14 years ago by jmoore
comment:2 Changed 13 years ago by jmoore
In 4.3, 100 threads that merely create a session cannot stop the server, but even a call as simple as getEventContext can:
    #!/usr/bin/env PYTHONPATH=lib/python:/opt/Ice-3.3.1-64/python python
    import sys
    import omero
    import omero.util
    import threading
    import logging

    if len(sys.argv) > 1:
        NUM = int(sys.argv[1])
    else:
        NUM = 32

    omero.util.configure_logging()

    class T(threading.Thread):
        def run(self):
            self.go = True
            count = 1
            while self.go:
                client = omero.client("localhost", 4064)
                try:
                    logging.info(count)
                    session = client.createSession("root", "ome")
                    admin = session.getAdminService()
                    admin.getEventContext()
                finally:
                    count += 1
                    client.closeSession()

    threads = [T() for x in range(NUM)]
    [x.start() for x in threads]
    try:
        sys.stdin.readline()
    finally:
        for x in threads:
            x.go = False
        [x.join() for x in threads]
comment:3 Changed 12 years ago by jmoore
- Component changed from General to Performance
- Milestone changed from Unscheduled to OMERO-Beta4.4
- Owner set to jmoore
- Remaining Time set to 0.5
- Sprint set to 2012-01-17 (6)
- Status changed from new to accepted
Reviewing the state of this in the current code base.
comment:4 Changed 12 years ago by jmoore
- Cc jburel added
- Remaining Time changed from 0.5 to 0
- Resolution set to worksforme
- Status changed from accepted to closed
(See comment below about DatabaseBusyException)
With a very aggressive timeout:
    omero.sessions.sync_force=300000
    omero.sessions.sync_interval=60000
    omero.sessions.timeout=15000
it's possible that the 32 threads can hang the server long enough (many tens of seconds) for sessions to time out, causing:
    2012-01-05 16:30:11,365 INFO [ root] (Thread-7 ) 15
    2012-01-05 16:30:11,378 INFO [ root] (Thread-16 ) 14
    2012-01-05 16:30:11,384 INFO [ root] (Thread-2 ) 14
    2012-01-05 16:30:11,415 INFO [ root] (Thread-30 ) 14
    2012-01-05 16:30:11,416 INFO [ root] (Thread-15 ) 14
    2012-01-05 16:30:11,420 INFO [ root] (Thread-17 ) 15
    lib/python/omero_ServerErrors_ice.py:115: DeprecationWarning: BaseException.message has been deprecated as of Python 2.6
      self.message = message
    lib/python/omero_api_IAdmin_ice.py:798: DeprecationWarning: BaseException.message has been deprecated as of Python 2.6
      return _M_omero.api.IAdmin._op_getEventContext.invoke(self, ((), _ctx))
    lib/python/omero_ServerErrors_ice.py:186: DeprecationWarning: BaseException.message has been deprecated as of Python 2.6
      return IcePy.stringifyException(self)
    Exception in thread Thread-1:
    Traceback (most recent call last):
      File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/threading.py", line 522, in __bootstrap_inner
        self.run()
      File "./t.py", line 29, in run
        admin.getEventContext()
      File "/Users/moore/GlencoeSoftware.git/git/omero.git/dist/lib/python/omero_api_IAdmin_ice.py", line 798, in getEventContext
        return _M_omero.api.IAdmin._op_getEventContext.invoke(self, ((), _ctx))
    SessionTimeoutException: exception ::omero::SessionTimeoutException
    {
        serverStackTrace = ome.conditions.SessionTimeoutException: Session (started=2012-01-05 16:30:11.226, hits=3, last access=2012-01-05 16:30:11.347) exceeded timeToIdle (15000) by 93006 ms
            at ome.services.sessions.state.SessionCache.getDataNullOrThrowOnTimeout(SessionCache.java:422)
            at ome.services.sessions.state.SessionCache.getSessionContext(SessionCache.java:351)
            at ome.services.sessions.SessionManagerImpl.getEventContext(SessionManagerImpl.java:679)
            at ome.security.basic.BasicSecuritySystem.loadEventContext(BasicSecuritySystem.java:333)
            at ome.security.basic.EventHandler.invoke(EventHandler.java:118)
            at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
            at org.springframework.orm.hibernate3.HibernateInterceptor.invoke(HibernateInterceptor.java:111)
            at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
            at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:108)
            at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
            at ome.tools.hibernate.ProxyCleanupFilter$Interceptor.invoke(ProxyCleanupFilter.java:231)
            at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
            at ome.services.util.ServiceHandler.invoke(ServiceHandler.java:116)
            at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
            at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:202)
            at $Proxy70.getEventContext(Unknown Source)
            at sun.reflect.GeneratedMethodAccessor537.invoke(Unknown Source)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
            at java.lang.reflect.Method.invoke(Method.java:597)
            at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:307)
            at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
            at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
            at ome.security.basic.BasicSecurityWiring.invoke(BasicSecurityWiring.java:95)
            at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
            at ome.services.blitz.fire.AopContextInitializer.invoke(AopContextInitializer.java:43)
            at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
            at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:202)
            at $Proxy70.getEventContext(Unknown Source)
            at sun.reflect.GeneratedMethodAccessor555.invoke(Unknown Source)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
            at java.lang.reflect.Method.invoke(Method.java:597)
            at ome.services.blitz.util.IceMethodInvoker.invoke(IceMethodInvoker.java:179)
            at ome.services.throttling.Callback.run(Callback.java:56)
            at ome.services.throttling.InThreadThrottlingStrategy.callInvokerOnRawArgs(InThreadThrottlingStrategy.java:56)
            at ome.services.blitz.impl.AbstractAmdServant.callInvokerOnRawArgs(AbstractAmdServant.java:136)
            at ome.services.blitz.impl.AdminI.getEventContext_async(AdminI.java:217)
            at omero.api._IAdminTie.getEventContext_async(_IAdminTie.java:213)
            at omero.api._IAdminDisp.___getEventContext(_IAdminDisp.java:1425)
            at omero.api._IAdminDisp.__dispatch(_IAdminDisp.java:1581)
            at IceInternal.Incoming.invoke(Incoming.java:159)
            at Ice.ConnectionI.invokeAll(ConnectionI.java:2357)
            at Ice.ConnectionI.dispatch(ConnectionI.java:1208)
            at Ice.ConnectionI.message(ConnectionI.java:1163)
            at IceInternal.ThreadPool.run(ThreadPool.java:302)
            at IceInternal.ThreadPool.access$300(ThreadPool.java:12)
            at IceInternal.ThreadPool$EventHandlerThread.run(ThreadPool.java:643)
            at java.lang.Thread.run(Thread.java:680)
        serverExceptionClass = ome.conditions.SessionTimeoutException
        message = Session (started=2012-01-05 16:30:11.226, hits=3, last access=2012-01-05 16:30:11.347) exceeded timeToIdle (15000) by 93006 ms
    }
However, once the bottleneck is passed, the threads keep on going:
    ...
    2012-01-05 16:34:33,621 INFO [ root] (Thread-30 ) 712
    2012-01-05 16:34:33,646 INFO [ root] (Thread-16 ) 712
    2012-01-05 16:34:33,667 INFO [ root] (Thread-7 ) 713
    2012-01-05 16:34:33,752 INFO [ root] (Thread-2 ) 714
    2012-01-05 16:34:33,771 INFO [ root] (Thread-15 ) 710
    2012-01-05 16:34:33,841 INFO [ root] (Thread-17 ) 715
    2012-01-05 16:34:33,844 INFO [ root] (Thread-30 ) 713
    2012-01-05 16:34:33,848 INFO [ root] (Thread-16 ) 713
    2012-01-05 16:34:33,898 INFO [ root] (Thread-7 ) 714
    2012-01-05 16:34:33,958 INFO [ root] (Thread-2 ) 715
    2012-01-05 16:34:33,990 INFO [ root] (Thread-15 ) 711
    ...
With the standard timeouts, as soon as Session.reload takes place, the server can hang until the db connection acquisition times out:
    Exception in thread Thread-30:
    Traceback (most recent call last):
      File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/threading.py", line 522, in __bootstrap_inner
        self.run()
      File "./t.py", line 35, in run
        admin.getEventContext()
      File "/Users/moore/GlencoeSoftware.git/git/omero.git/dist/lib/python/omero_api_IAdmin_ice.py", line 798, in getEventContext
        return _M_omero.api.IAdmin._op_getEventContext.invoke(self, ((), _ctx))
    DatabaseBusyException: exception ::omero::DatabaseBusyException
    {
        serverStackTrace = ome.conditions.DatabaseBusyException: Cannot acquire connection
            at ome.services.db.SelfCorrectingDataSource.callWithRetries(SelfCorrectingDataSource.java:91)
            at ome.services.db.SelfCorrectingDataSource.getConnection(SelfCorrectingDataSource.java:59)
            at org.springframework.jdbc.datasource.DataSourceTransactionManager.doBegin(DataSourceTransactionManager.java:203)
            at org.springframework.transaction.support.AbstractPlatformTransactionManager.getTransaction(AbstractPlatformTransactionManager.java:371)
            at org.springframework.transaction.interceptor.TransactionAspectSupport.createTransactionIfNecessary(TransactionAspectSupport.java:354)
            at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:103)
            at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
            at ome.tools.hibernate.ProxyCleanupFilter$Interceptor.invoke(ProxyCleanupFilter.java:231)
            at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
            at ome.services.util.ServiceHandler.invoke(ServiceHandler.java:116)
            at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
            at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:202)
            at $Proxy75.getEventContext(Unknown Source)
            at sun.reflect.GeneratedMethodAccessor287.invoke(Unknown Source)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
            at java.lang.reflect.Method.invoke(Method.java:597)
            at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:307)
            at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
            at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
            at ome.security.basic.BasicSecurityWiring.invoke(BasicSecurityWiring.java:95)
            at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
            at ome.services.blitz.fire.AopContextInitializer.invoke(AopContextInitializer.java:43)
            at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
            at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:202)
            at $Proxy75.getEventContext(Unknown Source)
            at sun.reflect.GeneratedMethodAccessor289.invoke(Unknown Source)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
            at java.lang.reflect.Method.invoke(Method.java:597)
            at ome.services.blitz.util.IceMethodInvoker.invoke(IceMethodInvoker.java:179)
            at ome.services.throttling.Callback.run(Callback.java:56)
            at ome.services.throttling.InThreadThrottlingStrategy.callInvokerOnRawArgs(InThreadThrottlingStrategy.java:56)
            at ome.services.blitz.impl.AbstractAmdServant.callInvokerOnRawArgs(AbstractAmdServant.java:136)
            at ome.services.blitz.impl.AdminI.getEventContext_async(AdminI.java:217)
            at omero.api._IAdminTie.getEventContext_async(_IAdminTie.java:213)
            at omero.api._IAdminDisp.___getEventContext(_IAdminDisp.java:1425)
            at omero.api._IAdminDisp.__dispatch(_IAdminDisp.java:1581)
            at IceInternal.Incoming.invoke(Incoming.java:159)
            at Ice.ConnectionI.invokeAll(ConnectionI.java:2357)
            at Ice.ConnectionI.dispatch(ConnectionI.java:1208)
            at Ice.ConnectionI.message(ConnectionI.java:1163)
            at IceInternal.ThreadPool.run(ThreadPool.java:302)
            at IceInternal.ThreadPool.access$300(ThreadPool.java:12)
            at IceInternal.ThreadPool$EventHandlerThread.run(ThreadPool.java:643)
            at java.lang.Thread.run(Thread.java:680)
        serverExceptionClass = ome.conditions.DatabaseBusyException
        message = Cannot acquire connection
        backOff = 0
    }
    21 <T(Thread-3, stopped 4324855808)>
    2 <T(Thread-4, stopped 4325392384)>
    Exception in thread Thread-18:
    Traceback (most recent call last):
      File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/threading.py", line 522, in __bootstrap_inner
        self.run()
      File "./t.py", line 38, in run
        client.closeSession()
      File "/Users/moore/GlencoeSoftware.git/git/omero.git/dist/lib/python/omero/clients.py", line 748, in closeSession
        self.getRouter(oldIc).destroySession()
      File "/usr/local/Cellar/zeroc-ice34/3.4.2/python/Glacier2_Router_ice.py", line 441, in destroySession
        return _M_Glacier2.Router._op_destroySession.invoke(self, ((), _ctx))
    SessionNotExistException: exception ::Glacier2::SessionNotExistException
    {
    }
    21 <T(Thread-5, stopped 4325928960)>
    21 <T(Thread-6, stopped 4326465536)>
    22 <T(Thread-7, stopped 4327002112)>
    2 <T(Thread-8, stopped 4327538688)>
    21 <T(Thread-9, stopped 4328075264)>
    2 <T(Thread-10, stopped 4328611840)>
    21 <T(Thread-11, stopped 4329148416)>
    21 <T(Thread-12, stopped 4329684992)>
    2 <T(Thread-13, stopped 4330221568)>
    21 <T(Thread-14, stopped 4330758144)>
    21 <T(Thread-15, stopped 4331294720)>
    21 <T(Thread-16, stopped 4331831296)>
    1 <T(Thread-17, stopped 4332367872)>
    22 <T(Thread-18, stopped 4332904448)>
    2 <T(Thread-19, stopped 4333441024)>
    2 <T(Thread-20, stopped 4333977600)>
    20 <T(Thread-21, stopped 4334514176)>
    22 <T(Thread-22, stopped 4336390144)>
    2 <T(Thread-23, stopped 4346875904)>
    21 <T(Thread-24, stopped 4347412480)>
    2 <T(Thread-25, stopped 4347949056)>
    21 <T(Thread-26, stopped 4348485632)>
    21 <T(Thread-27, stopped 4349022208)>
    21 <T(Thread-28, stopped 4349558784)>
    21 <T(Thread-29, stopped 4350095360)>
    22 <T(Thread-30, stopped 4350631936)>
    2 <T(Thread-31, stopped 4351168512)>
    21 <T(Thread-32, stopped 4351705088)>
As long as clients properly handle the DatabaseBusyException and back off, there doesn't seem to be any looming issue beyond a temporary hang whose length depends on the number of available PG connections. The server doesn't hang indefinitely.
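The client-side handling described above can be sketched as follows. Note the hedges: `DatabaseBusyException` here is a local stand-in for `omero.DatabaseBusyException` (which carries the `backOff` field visible in the trace above), and `call_with_backoff` is a hypothetical helper, not part of the OMERO API:

```python
import time

class DatabaseBusyException(Exception):
    """Local stand-in for omero.DatabaseBusyException."""
    def __init__(self, backOff=0):
        Exception.__init__(self)
        self.backOff = backOff  # server's backoff hint, in ms

def call_with_backoff(fn, retries=5, base_delay=0.05):
    """Retry fn, sleeping with exponential backoff on DatabaseBusyException."""
    for attempt in range(retries):
        try:
            return fn()
        except DatabaseBusyException as dbe:
            # Honour the server's backOff hint if given; otherwise
            # back off exponentially from base_delay.
            delay = (dbe.backOff / 1000.0) or base_delay * (2 ** attempt)
            time.sleep(delay)
    return fn()  # final attempt; let any exception propagate

# Demo: a call that fails twice with DatabaseBusyException, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise DatabaseBusyException()
    return "ok"

print(call_with_backoff(flaky))  # → ok
```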
The same happens for other actions as well, for example creating data as in #2893.