Bug #906 (closed)
Opened 11 years ago
Closed 11 years ago
Server deadlock on empty connection pool
| Reported by: | jburel | Owned by: | jamoore |
|---|---|---|---|
| Priority: | blocker | Cc: | cxallan, atarkowska |
| Sprint: | n.a. | Total Remaining Time: | n.a. |
Description (last modified by jmoore)
When too many methods are executing simultaneously, the connection pool can lock up, causing the server to freeze and all methods to fail. kill -9 is required to stop the server.
One symptom of the freeze is that the FullTextThread blocks while indexing, causing other FullTextThreads to print:
INFO [FullTextThread] Currently running; skipping
Change History (31)
comment:1 Changed 11 years ago by jburel
- Description modified (diff)
comment:2 Changed 11 years ago by jburel
- Cc jason removed
comment:3 Changed 11 years ago by jburel
- Description modified (diff)
comment:4 Changed 11 years ago by jmoore
- Cc calln added
- Component changed from General to Services
- Owner changed from jmoore, callan to jmoore
- Status changed from new to assigned
comment:5 Changed 11 years ago by jmoore
- Cc callan added; calln removed
Is there any chance at all of getting the log that led up to this? Apparently there's a race condition of sorts in which the full text thread continually tries to get into a protected piece of code.
comment:6 Changed 11 years ago by jmoore
Specifically, it looks like a single thread has gotten stuck and is therefore holding the lock on DetailsFieldBridge. Once this happens, you should get lots of "Inactive; skipping" messages after trying to shut down, rather than "Currently running; skipping" messages.
How did you try to shut down? The only other possibility is that Spring failed to call "stop()" on the FullTextThread.
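For illustration, a minimal sketch of the skip-if-busy behaviour being described (a hypothetical simplification, not the actual FullTextThread code; the real guard is the lock on DetailsFieldBridge):

import java.util.concurrent.locks.ReentrantLock;

// Hypothetical simplification of the behaviour described above: each
// scheduled indexing run skips itself if a previous run still holds the lock.
public class IndexerRun implements Runnable {
    private static final ReentrantLock lock = new ReentrantLock();
    private volatile boolean active = true; // cleared by stop()

    public void run() {
        if (!active) {
            log("Inactive; skipping"); // expected after a shutdown request
            return;
        }
        if (!lock.tryLock()) {
            log("Currently running; skipping"); // a stuck run produces an endless stream of these
            return;
        }
        try {
            doIndex();
        } finally {
            lock.unlock();
        }
    }

    public void stop() { active = false; }
    private void doIndex() { /* indexing work */ }
    private void log(String msg) { System.out.println("INFO [FullTextThread] " + msg); }
}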
comment:7 Changed 11 years ago by jburel
INFO [FullTextThread] Currently running; skipping
At that point, I had not tried to shut down the server.
The same message was then displayed over and over.
After a while, I tried Ctrl-C to stop the server, but it didn't work.
comment:8 Changed 11 years ago by jmoore
Understood. Skipping before shutting down means the full-text thread is taking a VERY long time; a cycle usually takes 0.1-5 seconds at most. Did you upload a very large text file? Also, could you check whether that JBoss log is still available, no matter how large it is?
comment:9 Changed 11 years ago by jmoore
comment:10 Changed 11 years ago by cxallan
- Summary changed from server Freeze to Server Deadlock While Indexing
- Version 3.0-M3 deleted
Deadlock data follows...
Server revision:
callan@mage ~/OMERO3 $ svn info
Path: .
URL: svn+ssh://cvs.openmicroscopy.org.uk/home/svn/omero/trunk
Repository Root: svn+ssh://cvs.openmicroscopy.org.uk/home/svn/omero
Repository UUID: 05709c45-44f0-0310-885b-81a1db45b4a6
Revision: 2365
Node Kind: directory
Schedule: normal
Last Changed Author: jburel
Last Changed Rev: 2365
Last Changed Date: 2008-05-11 17:18:47 +0100 (Sun, 11 May 2008)
PostgreSQL lock data:
omero3=# select * from pg_stat_activity ;
datid | datname | procpid | usesysid | usename | current_query | query_start | backend_start | client_addr | client_port
---------+---------+---------+----------+---------+---------------+-------------------------------+-------------------------------+-------------+-------------
1018661 | omero3 | 32004 | 16384 | callan | <IDLE> | 2008-05-12 12:10:28.320578+01 | 2008-05-12 11:54:28.454397+01 | 127.0.0.1 | 60658
1018661 | omero3 | 32003 | 16384 | callan | <IDLE> | 2008-05-12 12:10:28.224922+01 | 2008-05-12 11:54:28.455642+01 | 127.0.0.1 | 45924
1018661 | omero3 | 32002 | 16384 | callan | <IDLE> | 2008-05-12 12:10:28.080963+01 | 2008-05-12 11:54:28.457481+01 | 127.0.0.1 | 40919
1018661 | omero3 | 24261 | 16384 | callan | <IDLE> | 2008-05-07 15:09:47.376005+01 | 2008-04-23 21:24:40.008894+01 | | -1
1018661 | omero3 | 32033 | 16384 | callan | <IDLE> | 2008-05-12 12:10:28.155431+01 | 2008-05-12 11:56:30.042641+01 | 127.0.0.1 | 49437
1018661 | omero3 | 32035 | 16384 | callan | <IDLE> | 2008-05-12 12:10:28.085952+01 | 2008-05-12 11:56:30.042656+01 | 127.0.0.1 | 59075
1018661 | omero3 | 32034 | 16384 | callan | <IDLE> | 2008-05-12 12:10:28.25795+01 | 2008-05-12 11:56:30.047677+01 | 127.0.0.1 | 39904
1018661 | omero3 | 32044 | 16384 | callan | <IDLE> | 2008-05-12 12:10:28.180709+01 | 2008-05-12 11:57:30.471054+01 | 127.0.0.1 | 38096
1018661 | omero3 | 32045 | 16384 | callan | <IDLE> | 2008-05-12 12:10:28.321916+01 | 2008-05-12 11:57:30.471104+01 | 127.0.0.1 | 58917
1018661 | omero3 | 32046 | 16384 | callan | <IDLE> | 2008-05-12 12:10:28.313033+01 | 2008-05-12 11:57:30.495012+01 | 127.0.0.1 | 45406
1018661 | omero3 | 32128 | 16384 | callan | <IDLE> | 2008-05-12 12:10:28.29668+01 | 2008-05-12 12:10:28.061232+01 | 127.0.0.1 | 40196
1018661 | omero3 | 32129 | 16384 | callan | <IDLE> | 2008-05-12 12:10:28.35524+01 | 2008-05-12 12:10:28.063689+01 | 127.0.0.1 | 37537
1018661 | omero3 | 32127 | 16384 | callan | <IDLE> | 2008-05-12 12:10:28.071263+01 | 2008-05-12 12:10:28.070155+01 | 127.0.0.1 | 48789
1018661 | omero3 | 32131 | 16384 | callan | <IDLE> | 2008-05-12 12:10:28.432773+01 | 2008-05-12 12:10:28.084959+01 | 127.0.0.1 | 49964
1018661 | omero3 | 32132 | 16384 | callan | <IDLE> | 2008-05-12 12:10:28.379633+01 | 2008-05-12 12:10:28.081523+01 | 127.0.0.1 | 38263
1018661 | omero3 | 32134 | 16384 | callan | <IDLE> | 2008-05-12 12:10:28.254855+01 | 2008-05-12 12:10:28.105267+01 | 127.0.0.1 | 55950
1018661 | omero3 | 32193 | 16384 | callan | <IDLE> | 2008-05-12 12:23:40.912671+01 | 2008-05-12 12:23:40.912608+01 | | -1
(17 rows)
omero3=# -- http://www.postgresql.org/docs/7.4/interactive/view-pg-locks.html
omero3=# SELECT pg_class.relname AS table,
omero3-# pg_database.datname AS database,
omero3-# transaction, pid, mode, granted
omero3-# FROM pg_locks, pg_class, pg_database
omero3-# WHERE pg_locks.relation = pg_class.oid
omero3-# AND pg_locks.database = pg_database.oid;
table | database | transaction | pid | mode | granted
--------------------+----------+-------------+-------+-----------------+---------
pg_class | omero3 | 11882818 | 32195 | AccessShareLock | t
pg_class_oid_index | omero3 | 11882818 | 32195 | AccessShareLock | t
pg_locks | omero3 | 11882818 | 32195 | AccessShareLock | t
(3 rows)
omero3=#
omero3=# -- Christian Storm <storm AT iparadigms.com>
omero3=# select pg_stat_activity.datname,
omero3-# pg_class.relname,
omero3-# pg_locks.transaction,
omero3-# pg_locks.mode,
omero3-# pg_locks.granted,
omero3-# pg_stat_activity.usename,
omero3-# substr(pg_stat_activity.current_query,1,30),
omero3-# pg_stat_activity.query_start,
omero3-# age(now(),
omero3(# pg_stat_activity.query_start) as "age",
omero3-# pg_stat_activity.procpid
omero3-# from pg_stat_activity,pg_locks left outer join pg_class on (pg_locks.relation = pg_class.oid)
omero3-# where pg_locks.pid=pg_stat_activity.procpid
omero3-# order by query_start desc;
datname | relname | transaction | mode | granted | usename | substr | query_start | age | procpid
---------+------------------+-------------+-----------------+---------+---------+--------+-------------------------------+-----------------+---------
omero3 | pg_authid | 11882819 | AccessShareLock | t | callan | <IDLE> | 2008-05-12 12:24:06.354396+01 | 00:00:20.647701 | 32195
omero3 | pg_stat_activity | 11882819 | AccessShareLock | t | callan | <IDLE> | 2008-05-12 12:24:06.354396+01 | 00:00:20.647701 | 32195
omero3 | | 11882819 | ExclusiveLock | t | callan | <IDLE> | 2008-05-12 12:24:06.354396+01 | 00:00:20.647701 | 32195
omero3 | pg_database | 11882819 | AccessShareLock | t | callan | <IDLE> | 2008-05-12 12:24:06.354396+01 | 00:00:20.647701 | 32195
omero3 | pg_locks | 11882819 | AccessShareLock | t | callan | <IDLE> | 2008-05-12 12:24:06.354396+01 | 00:00:20.647701 | 32195
omero3 | pg_class | 11882819 | AccessShareLock | t | callan | <IDLE> | 2008-05-12 12:24:06.354396+01 | 00:00:20.647701 | 32195
(6 rows)
comment:11 Changed 11 years ago by cxallan
Further data from an earlier deadlock with the same server version:
omero3=# select * from pg_stat_activity ;
datid | datname | procpid | usesysid | usename | current_query | query_start | backend_start | client_addr | client_port
---------+---------+---------+----------+---------+------------------------------+-------------------------------+-------------------------------+-------------+-------------
1018661 | omero3 | 8911 | 16384 | callan | <IDLE> | 2008-04-28 10:35:34.754785+01 | 2008-03-21 16:42:21.116586+00 | 127.0.0.1 | 59277
1018661 | omero3 | 8912 | 16384 | callan | <command string not enabled> | | 2008-03-21 16:42:21.116618+00 | 127.0.0.1 | 37319
1018661 | omero3 | 8913 | 16384 | callan | <IDLE> | 2008-05-07 12:22:27.041159+01 | 2008-03-21 16:42:21.116623+00 | 127.0.0.1 | 39331
1018661 | omero3 | 24261 | 16384 | callan | <IDLE> | 2008-05-07 15:09:47.376005+01 | 2008-04-23 21:24:40.008894+01 | | -1
1018661 | omero3 | 31386 | 16384 | callan | <IDLE> | 2008-05-12 11:46:44.681304+01 | 2008-05-12 08:37:27.119246+01 | 127.0.0.1 | 49153
1018661 | omero3 | 31387 | 16384 | callan | <IDLE> | 2008-05-12 11:46:44.831645+01 | 2008-05-12 08:37:27.125101+01 | 127.0.0.1 | 32792
1018661 | omero3 | 8987 | 16384 | callan | <command string not enabled> | | 2008-03-21 16:43:00.929327+00 | 127.0.0.1 | 52465
1018661 | omero3 | 8988 | 16384 | callan | <IDLE> | 2008-05-12 11:48:00.034243+01 | 2008-03-21 16:43:01.091478+00 | 127.0.0.1 | 58609
1018661 | omero3 | 8989 | 16384 | callan | <IDLE> | 2008-05-12 09:09:30.038081+01 | 2008-03-21 16:43:00.933228+00 | 127.0.0.1 | 34683
1018661 | omero3 | 31388 | 16384 | callan | <IDLE> | 2008-05-12 11:46:44.797632+01 | 2008-05-12 08:37:27.127697+01 | 127.0.0.1 | 38993
1018661 | omero3 | 31593 | 16384 | callan | <IDLE> | 2008-05-12 11:46:44.816352+01 | 2008-05-12 10:29:15.747345+01 | 127.0.0.1 | 37697
1018661 | omero3 | 31595 | 16384 | callan | <IDLE> | 2008-05-12 11:46:44.694687+01 | 2008-05-12 10:29:15.752309+01 | 127.0.0.1 | 54951
1018661 | omero3 | 31594 | 16384 | callan | <IDLE> | 2008-05-12 11:46:44.706963+01 | 2008-05-12 10:29:15.752355+01 | 127.0.0.1 | 60211
1018661 | omero3 | 28389 | 16384 | callan | <IDLE> | 2008-04-28 10:30:37.091669+01 | 2008-03-31 16:45:16.877403+01 | 127.0.0.1 | 35712
1018661 | omero3 | 28391 | 16384 | callan | <IDLE> | 2008-05-07 14:10:46.744597+01 | 2008-03-31 16:45:16.879851+01 | 127.0.0.1 | 60063
1018661 | omero3 | 28392 | 16384 | callan | <command string not enabled> | | 2008-03-31 16:45:16.883708+01 | 127.0.0.1 | 43758
1018661 | omero3 | 31685 | 16384 | callan | <IDLE> | 2008-05-12 11:46:44.664839+01 | 2008-05-12 11:06:21.466437+01 | 127.0.0.1 | 41580
1018661 | omero3 | 31686 | 16384 | callan | <IDLE> | 2008-05-12 11:46:44.622948+01 | 2008-05-12 11:06:21.466419+01 | 127.0.0.1 | 37106
1018661 | omero3 | 31687 | 16384 | callan | <IDLE> | 2008-05-12 11:46:44.536756+01 | 2008-05-12 11:06:21.470411+01 | 127.0.0.1 | 45767
1018661 | omero3 | 31785 | 16384 | callan | <IDLE> | 2008-05-12 11:46:44.518946+01 | 2008-05-12 11:46:44.513495+01 | 127.0.0.1 | 37486
1018661 | omero3 | 31786 | 16384 | callan | <IDLE> | 2008-05-12 11:46:44.777663+01 | 2008-05-12 11:46:44.513926+01 | 127.0.0.1 | 44982
1018661 | omero3 | 31787 | 16384 | callan | <IDLE> | 2008-05-12 11:46:44.869679+01 | 2008-05-12 11:46:44.513941+01 | 127.0.0.1 | 42208
1018661 | omero3 | 31796 | 16384 | callan | <IDLE> | 2008-05-12 11:46:44.698011+01 | 2008-05-12 11:46:44.628062+01 | 127.0.0.1 | 36093
1018661 | omero3 | 31798 | 16384 | callan | <IDLE> | 2008-05-12 11:46:44.886039+01 | 2008-05-12 11:46:44.638314+01 | 127.0.0.1 | 46696
1018661 | omero3 | 31800 | 16384 | callan | <IDLE> | 2008-05-12 11:46:44.65131+01 | 2008-05-12 11:46:44.649937+01 | 127.0.0.1 | 39196
1018661 | omero3 | 31861 | 16384 | callan | <IDLE> | 2008-05-12 11:48:21.690855+01 | 2008-05-12 11:48:21.690763+01 | | -1
omero3=# -- http://www.postgresql.org/docs/7.4/interactive/view-pg-locks.html
omero3=# SELECT pg_class.relname AS table,
omero3-# pg_database.datname AS database,
omero3-# transaction, pid, mode, granted
omero3-# FROM pg_locks, pg_class, pg_database
omero3-# WHERE pg_locks.relation = pg_class.oid
omero3-# AND pg_locks.database = pg_database.oid;
table | database | transaction | pid | mode | granted
--------------------+----------+-------------+-------+-----------------+---------
pg_locks | omero3 | 11871408 | 31861 | AccessShareLock | t
pg_class_oid_index | omero3 | 11871408 | 31861 | AccessShareLock | t
pg_class | omero3 | 11871408 | 31861 | AccessShareLock | t
(3 rows)
omero3=#
omero3=# -- Christian Storm <storm AT iparadigms.com>
omero3=# select pg_stat_activity.datname,
omero3-# pg_class.relname,
omero3-# pg_locks.transaction,
omero3-# pg_locks.mode,
omero3-# pg_locks.granted,
omero3-# pg_stat_activity.usename,
omero3-# substr(pg_stat_activity.current_query,1,30),
omero3-# pg_stat_activity.query_start,
omero3-# age(now(),
omero3(# pg_stat_activity.query_start) as "age",
omero3-# pg_stat_activity.procpid
omero3-# from pg_stat_activity,pg_locks left outer join pg_class on (pg_locks.relation = pg_class.oid)
omero3-# where pg_locks.pid=pg_stat_activity.procpid
omero3-# order by query_start desc;
datname | relname | transaction | mode | granted | usename | substr | query_start | age | procpid
---------+------------------+-------------+-----------------+---------+---------+--------+-------------------------------+-----------------+---------
omero3 | pg_locks | 11871409 | AccessShareLock | t | callan | <IDLE> | 2008-05-12 11:48:26.245359+01 | 00:03:55.993249 | 31861
omero3 | pg_authid | 11871409 | AccessShareLock | t | callan | <IDLE> | 2008-05-12 11:48:26.245359+01 | 00:03:55.993249 | 31861
omero3 | pg_database | 11871409 | AccessShareLock | t | callan | <IDLE> | 2008-05-12 11:48:26.245359+01 | 00:03:55.993249 | 31861
omero3 | | 11871409 | ExclusiveLock | t | callan | <IDLE> | 2008-05-12 11:48:26.245359+01 | 00:03:55.993249 | 31861
omero3 | pg_class | 11871409 | AccessShareLock | t | callan | <IDLE> | 2008-05-12 11:48:26.245359+01 | 00:03:55.993249 | 31861
omero3 | pg_stat_activity | 11871409 | AccessShareLock | t | callan | <IDLE> | 2008-05-12 11:48:26.245359+01 | 00:03:55.993249 | 31861
(6 rows)
comment:12 Changed 11 years ago by cxallan
Deadlock number three:
omero3=# -- http://www.postgresql.org/docs/7.4/interactive/view-pg-locks.html
omero3=# SELECT pg_class.relname AS table,
omero3-# pg_database.datname AS database,
omero3-# transaction, pid, mode, granted
omero3-# FROM pg_locks, pg_class, pg_database
omero3-# WHERE pg_locks.relation = pg_class.oid
omero3-# AND pg_locks.database = pg_database.oid;
table | database | transaction | pid | mode | granted
--------------------+----------+-------------+-------+-----------------+---------
pg_class_oid_index | omero3 | 11902738 | 32747 | AccessShareLock | t
pg_locks | omero3 | 11902738 | 32747 | AccessShareLock | t
pg_class | omero3 | 11902738 | 32747 | AccessShareLock | t
(3 rows)
omero3=#
omero3=# -- Christian Storm <storm AT iparadigms.com>
omero3=# select pg_stat_activity.datname,
omero3-# pg_class.relname,
omero3-# pg_locks.transaction,
omero3-# pg_locks.mode,
omero3-# pg_locks.granted,
omero3-# pg_stat_activity.usename,
omero3-# substr(pg_stat_activity.current_query,1,30),
omero3-# pg_stat_activity.query_start,
omero3-# age(now(),
omero3(# pg_stat_activity.query_start) as "age",
omero3-# pg_stat_activity.procpid
omero3-# from pg_stat_activity,pg_locks left outer join pg_class on (pg_locks.relation = pg_class.oid)
omero3-# where pg_locks.pid=pg_stat_activity.procpid
omero3-# order by query_start desc;
datname | relname | transaction | mode | granted | usename | substr | query_start | age | procpid
---------+------------------+-------------+-----------------+---------+---------+--------+-------------------------------+-----------------+---------
omero3 | pg_database | 11902739 | AccessShareLock | t | callan | <IDLE> | 2008-05-12 14:11:14.988843+01 | 00:00:01.365515 | 32747
omero3 | pg_locks | 11902739 | AccessShareLock | t | callan | <IDLE> | 2008-05-12 14:11:14.988843+01 | 00:00:01.365515 | 32747
omero3 | | 11902739 | ExclusiveLock | t | callan | <IDLE> | 2008-05-12 14:11:14.988843+01 | 00:00:01.365515 | 32747
omero3 | pg_authid | 11902739 | AccessShareLock | t | callan | <IDLE> | 2008-05-12 14:11:14.988843+01 | 00:00:01.365515 | 32747
omero3 | pg_stat_activity | 11902739 | AccessShareLock | t | callan | <IDLE> | 2008-05-12 14:11:14.988843+01 | 00:00:01.365515 | 32747
omero3 | pg_class | 11902739 | AccessShareLock | t | callan | <IDLE> | 2008-05-12 14:11:14.988843+01 | 00:00:01.365515 | 32747
(6 rows)
comment:13 Changed 11 years ago by cxallan
Information from Michael:
Chris 14:11: Were you importing or just working with data?
Michael Porter 14:11: Using Insight. In fact, wasn't even doing anything fancy, just browsing the treeview.
comment:14 Changed 11 years ago by jmoore
cf. r2139
comment:15 Changed 11 years ago by jmoore
- Description modified (diff)
- Summary changed from Server Deadlock While Indexing to Server deadlock on empty connection pool
Finally reproduced on my box. All it takes to reproduce the server freeze is to start 50 threads at the same time, all of which call IAdmin.getEventContext(). c3p0 (the JDBC connection pool) then locks up. Looking into solutions.
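A minimal sketch of that reproduction, assuming a client-side ServiceFactory that hands out the IAdmin proxy (the setup names here are assumptions; only IAdmin.getEventContext() comes from this ticket):

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PoolFreezeRepro {
    public static void main(String[] args) throws InterruptedException {
        final IAdmin admin = new ServiceFactory().getAdminService(); // hypothetical client setup
        final CountDownLatch start = new CountDownLatch(1);
        ExecutorService exec = Executors.newFixedThreadPool(50);
        for (int i = 0; i < 50; i++) {
            exec.submit(new Runnable() {
                public void run() {
                    try {
                        start.await();           // release all 50 callers at once
                        admin.getEventContext(); // each call checks out a JDBC connection
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            });
        }
        start.countDown(); // fire "simultaneously"; c3p0 then locks up
        exec.shutdown();
    }
}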
comment:16 Changed 11 years ago by cxallan
This might be related to an issue that J-M noticed a little while ago where he could exhaust the number of sockets (or threads, I can't remember which) that the server has available. This was especially noticeable when many thumbnail retrieval threads were running at the same time inside Insight.
J-M, have you noticed this again since you revamped the way you're retrieving thumbnails?
comment:17 Changed 11 years ago by jburel
I minimized the number of calls for counting, but I haven't made the thumbnail change yet.
That might be why Michael, with his large dataset, has the problem.
I will put that at the top of my TODO list.
comment:18 Changed 11 years ago by jmoore
This is definitely a DoS, though at the moment I can only pinpoint it to SQL. Jean-Marie, if you know whether this has anything to do with the number of threads, please let me know.
Status: this seems to be tied to the number of postgres connections that are allowed. Using the default 30, I could "simultaneously" call up to 35 IAdmin.getEventContext methods (more than 30 simply because of the timing). Reducing the number of postgres connections to 10 allowed only 6 simultaneous calls; only 6 because each method may use multiple SQL connections, for sessions, etc. (Interestingly, increasing postgres max_connections to 100 also maxed out at 30.)
The GOOD thing about reducing the number so low is that it forced c3p0 to realize that there was a deadlock, which allowed a regular server halt as opposed to using `kill -9`. I'll play with the parameters for c3p0 and see if I can't teach it to recognize deadlocks with lots of connections.
In the end, probably the only permanent cure is a throttle setting which roughly matches the max number of connections to postgres (a rough sketch follows the test log below).
09:48:10,205 INFO [ServiceHandler] Meth: getEventContext
09:48:10,205 INFO [ServiceHandler] Args: ()
09:48:10,247 INFO [EventHandler] Auth: user=100,group=2,event=12209(User)
09:48:10,281 INFO [ServiceHandler] Rslt: ome.system.SimpleEventContext@3e925e
09:48:10,517 INFO [ServiceHandler] Meth: getEventContext
09:48:10,517 INFO [ServiceHandler] Args: ()
09:48:10,519 INFO [ServiceHandler] Meth: getEventContext
09:48:10,520 INFO [ServiceHandler] Args: ()
09:48:10,624 INFO [EventHandler] Auth: user=100,group=2,event=12210(User)
09:48:10,632 INFO [EventHandler] Auth: user=100,group=2,event=12211(User)
09:48:10,657 INFO [ServiceHandler] Rslt: ome.system.SimpleEventContext@8ef279
09:48:10,659 INFO [ServiceHandler] Rslt: ome.system.SimpleEventContext@831dff
09:48:11,030 INFO [ServiceHandler] Meth: getEventContext
09:48:11,030 INFO [ServiceHandler] Args: ()
09:48:11,326 INFO [ServiceHandler] Meth: getEventContext
09:48:11,326 INFO [ServiceHandler] Args: ()
09:48:11,679 INFO [ServiceHandler] Meth: getEventContext
09:48:11,680 INFO [ServiceHandler] Args: ()
09:48:11,820 INFO [EventHandler] Auth: user=100,group=2,event=12212(User)
09:48:11,825 INFO [EventHandler] Auth: user=100,group=2,event=12213(User)
09:48:11,826 INFO [EventHandler] Auth: user=100,group=2,event=12214(User)
09:48:11,910 INFO [ServiceHandler] Rslt: ome.system.SimpleEventContext@ccf2f9
09:48:11,915 INFO [ServiceHandler] Rslt: ome.system.SimpleEventContext@d28476
09:48:11,925 INFO [ServiceHandler] Rslt: ome.system.SimpleEventContext@5b7bd1
09:48:12,036 INFO [ServiceHandler] Meth: getEventContext
09:48:12,036 INFO [ServiceHandler] Args: ()
09:48:12,077 INFO [ServiceHandler] Meth: getEventContext
09:48:12,077 INFO [ServiceHandler] Args: ()
09:48:12,118 INFO [ServiceHandler] Meth: getEventContext
09:48:12,118 INFO [ServiceHandler] Args: ()
09:48:12,152 INFO [ServiceHandler] Meth: getEventContext
09:48:12,152 INFO [ServiceHandler] Args: ()
09:48:12,259 INFO [EventHandler] Auth: user=100,group=2,event=12215(User)
09:48:12,285 INFO [EventHandler] Auth: user=100,group=2,event=12216(User)
09:48:12,300 INFO [EventHandler] Auth: user=100,group=2,event=12218(User)
09:48:12,305 INFO [EventHandler] Auth: user=100,group=2,event=12217(User)
09:48:12,385 INFO [ServiceHandler] Rslt: ome.system.SimpleEventContext@d8b4ba
09:48:12,402 INFO [ServiceHandler] Rslt: ome.system.SimpleEventContext@c2339
09:48:12,408 INFO [ServiceHandler] Rslt: ome.system.SimpleEventContext@afb49d
09:48:12,417 INFO [ServiceHandler] Rslt: ome.system.SimpleEventContext@bcdd50
09:48:12,527 INFO [ServiceHandler] Meth: getEventContext
09:48:12,527 INFO [ServiceHandler] Args: ()
09:48:12,534 INFO [ServiceHandler] Meth: getEventContext
09:48:12,534 INFO [ServiceHandler] Args: ()
09:48:12,603 INFO [ServiceHandler] Meth: getEventContext
09:48:12,603 INFO [ServiceHandler] Args: ()
09:48:12,639 INFO [ServiceHandler] Meth: getEventContext
09:48:12,639 INFO [ServiceHandler] Args: ()
09:48:12,710 INFO [ServiceHandler] Meth: getEventContext
09:48:12,711 INFO [ServiceHandler] Args: ()
09:48:12,724 INFO [EventHandler] Auth: user=100,group=2,event=12219(User)
09:48:12,739 INFO [EventHandler] Auth: user=100,group=2,event=12220(User)
09:48:12,799 INFO [EventHandler] Auth: user=100,group=2,event=12221(User)
09:48:12,832 INFO [EventHandler] Auth: user=100,group=2,event=12222(User)
09:48:12,833 INFO [ServiceHandler] Rslt: ome.system.SimpleEventContext@834822
09:48:12,859 INFO [EventHandler] Auth: user=100,group=2,event=12223(User)
09:48:12,881 INFO [ServiceHandler] Rslt: ome.system.SimpleEventContext@1ecf06
09:48:12,894 INFO [ServiceHandler] Rslt: ome.system.SimpleEventContext@f709a2
09:48:12,905 INFO [ServiceHandler] Rslt: ome.system.SimpleEventContext@90b8a5
09:48:12,922 INFO [ServiceHandler] Rslt: ome.system.SimpleEventContext@82901c
09:48:13,054 INFO [ServiceHandler] Meth: getEventContext
09:48:13,054 INFO [ServiceHandler] Args: ()
09:48:13,056 INFO [ServiceHandler] Meth: getEventContext
09:48:13,056 INFO [ServiceHandler] Args: ()
09:48:13,061 INFO [ServiceHandler] Meth: getEventContext
09:48:13,061 INFO [ServiceHandler] Args: ()
09:48:13,062 INFO [ServiceHandler] Meth: getEventContext
09:48:13,062 INFO [ServiceHandler] Args: ()
09:48:13,064 INFO [ServiceHandler] Meth: getEventContext
09:48:13,064 INFO [ServiceHandler] Args: ()
09:48:13,066 INFO [ServiceHandler] Meth: getEventContext
09:48:13,066 INFO [ServiceHandler] Args: ()
09:48:13,180 INFO [EventHandler] Auth: user=100,group=2,event=12224(User)
09:48:13,275 INFO [EventHandler] Auth: user=100,group=2,event=12227(User)
09:48:13,293 INFO [EventHandler] Auth: user=100,group=2,event=12225(User)
09:48:13,381 INFO [ServiceHandler] Rslt: ome.system.SimpleEventContext@3a58c1
09:48:13,382 INFO [ServiceHandler] Rslt: ome.system.SimpleEventContext@a2d046
09:48:13,384 INFO [EventHandler] Auth: user=100,group=2,event=12226(User)
09:48:13,409 INFO [ServiceHandler] Rslt: ome.system.SimpleEventContext@9cc91d
09:48:13,455 INFO [ServiceHandler] Rslt: ome.system.SimpleEventContext@7c297a
09:48:27,224 WARN [ThreadPoolAsynchronousRunner] com.mchange.v2.async.ThreadPoolAsynchronousRunner$DeadlockDetector@eaec4 -- APPARENT DEADLOCK!!! Creating emergency threads for unassigned pending tasks!
09:48:27,237 WARN [ThreadPoolAsynchronousRunner] com.mchange.v2.async.ThreadPoolAsynchronousRunner$DeadlockDetector@eaec4 -- APPARENT DEADLOCK!!! Complete Status:
Managed Threads: 3
Active Threads: 3
Active Tasks:
com.mchange.v2.resourcepool.BasicResourcePool$AcquireTask@683e5e (com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread-#0)
com.mchange.v2.resourcepool.BasicResourcePool$AcquireTask@d6ca90 (com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread-#2)
com.mchange.v2.resourcepool.BasicResourcePool$AcquireTask@8ce38e (com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread-#1)
Pending Tasks:
com.mchange.v2.resourcepool.BasicResourcePool$AcquireTask@af8cbc
com.mchange.v2.resourcepool.BasicResourcePool$AcquireTask@40c470
com.mchange.v2.resourcepool.BasicResourcePool$1RefurbishCheckinResourceTask@209ff6
com.mchange.v2.resourcepool.BasicResourcePool$1RefurbishCheckinResourceTask@e5f80f
com.mchange.v2.resourcepool.BasicResourcePool$1RefurbishCheckinResourceTask@d590
com.mchange.v2.resourcepool.BasicResourcePool$1RefurbishCheckinResourceTask@c4e2fd
com.mchange.v2.resourcepool.BasicResourcePool$1RefurbishCheckinResourceTask@127a7e
com.mchange.v2.resourcepool.BasicResourcePool$1RefurbishCheckinResourceTask@2825d5
com.mchange.v2.resourcepool.BasicResourcePool$1RefurbishCheckinResourceTask@cbd3f4
com.mchange.v2.resourcepool.BasicResourcePool$1RefurbishCheckinResourceTask@c862a3
Pool thread stack traces:
Thread[com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread-#2,5,jboss]
java.lang.Thread.sleep(Native Method)
com.mchange.v2.resourcepool.BasicResourcePool$AcquireTask.run(BasicResourcePool.java:1805)
com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread.run(ThreadPoolAsynchronousRunner.java:547)
Thread[com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread-#0,5,jboss]
java.lang.Thread.sleep(Native Method)
com.mchange.v2.resourcepool.BasicResourcePool$AcquireTask.run(BasicResourcePool.java:1805)
com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread.run(ThreadPoolAsynchronousRunner.java:547)
Thread[com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread-#1,5,jboss]
java.lang.Thread.sleep(Native Method)
com.mchange.v2.resourcepool.BasicResourcePool$AcquireTask.run(BasicResourcePool.java:1805)
com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread.run(ThreadPoolAsynchronousRunner.java:547)
09:48:27,296 INFO [EventHandler] Auth: user=100,group=2,event=12228(User)
09:48:27,314 INFO [EventHandler] Auth: user=100,group=2,event=12229(User)
09:48:27,314 INFO [ServiceHandler] Rslt: ome.system.SimpleEventContext@34369e
09:48:27,334 INFO [ServiceHandler] Rslt: ome.system.SimpleEventContext@3539d7
09:48:30,012 INFO [ServiceHandler] Meth: doWork
09:48:30,013 INFO [ServiceHandler] Args: [null, null, ome.tools.spring.InternalServiceFactory@17aae1]
09:48:30,041 INFO [EventHandler] Auth: user=0,group=0,event=12230(FullText)
09:48:30,070 INFO [FullTextIndexer] Indexed: ome.model.meta.Session:Id_702
09:48:30,660 INFO [ServiceHandler] Rslt: null
09:48:41,892 WARN [BasicResourcePool] com.mchange.v2.resourcepool.BasicResourcePool$AcquireTask@d6ca90 -- Acquisition Attempt Failed!!! Clearing pending acquires. While trying to acquire a needed new resource, we failed to succeed more than the maximum number of allowed acquisition attempts (30). Last acquisition attempt exception:
org.postgresql.util.PSQLException: FATAL: sorry, too many clients already
at org.postgresql.core.v3.ConnectionFactoryImpl.readStartupMessages(ConnectionFactoryImpl.java:443)
at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:98)
at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:65)
at org.postgresql.jdbc2.AbstractJdbc2Connection.<init>(AbstractJdbc2Connection.java:116)
at org.postgresql.jdbc3.AbstractJdbc3Connection.<init>(AbstractJdbc3Connection.java:30)
at org.postgresql.jdbc3.Jdbc3Connection.<init>(Jdbc3Connection.java:24)
at org.postgresql.Driver.makeConnection(Driver.java:369)
at org.postgresql.Driver.connect(Driver.java:245)
at com.mchange.v2.c3p0.DriverManagerDataSource.getConnection(DriverManagerDataSource.java:135)
at com.mchange.v2.c3p0.WrapperConnectionPoolDataSource.getPooledConnection(WrapperConnectionPoolDataSource.java:182)
at com.mchange.v2.c3p0.WrapperConnectionPoolDataSource.getPooledConnection(WrapperConnectionPoolDataSource.java:171)
at com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool$1PooledConnectionResourcePoolManager.acquireResource(C3P0PooledConnectionPool.java:137)
at com.mchange.v2.resourcepool.BasicResourcePool.doAcquire(BasicResourcePool.java:1014)
at com.mchange.v2.resourcepool.BasicResourcePool.access$800(BasicResourcePool.java:32)
at com.mchange.v2.resourcepool.BasicResourcePool$AcquireTask.run(BasicResourcePool.java:1810)
at com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread.run(ThreadPoolAsynchronousRunner.java:547)
09:48:42,049 WARN [BasicResourcePool] com.mchange.v2.resourcepool.BasicResourcePool$AcquireTask@683e5e -- Acquisition Attempt Failed!!! Clearing pending acquires. While trying to acquire a needed new resource, we failed to succeed more than the maximum number of allowed acquisition attempts (30). Last acquisition attempt exception:
org.postgresql.util.PSQLException: FATAL: sorry, too many clients already
at org.postgresql.core.v3.ConnectionFactoryImpl.readStartupMessages(ConnectionFactoryImpl.java:443)
at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:98)
at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:65)
at org.postgresql.jdbc2.AbstractJdbc2Connection.<init>(AbstractJdbc2Connection.java:116)
at org.postgresql.jdbc3.AbstractJdbc3Connection.<init>(AbstractJdbc3Connection.java:30)
at org.postgresql.jdbc3.Jdbc3Connection.<init>(Jdbc3Connection.java:24)
at org.postgresql.Driver.makeConnection(Driver.java:369)
at org.postgresql.Driver.connect(Driver.java:245)
at com.mchange.v2.c3p0.DriverManagerDataSource.getConnection(DriverManagerDataSource.java:135)
at com.mchange.v2.c3p0.WrapperConnectionPoolDataSource.getPooledConnection(WrapperConnectionPoolDataSource.java:182)
at com.mchange.v2.c3p0.WrapperConnectionPoolDataSource.getPooledConnection(WrapperConnectionPoolDataSource.java:171)
at com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool$1PooledConnectionResourcePoolManager.acquireResource(C3P0PooledConnectionPool.java:137)
at com.mchange.v2.resourcepool.BasicResourcePool.doAcquire(BasicResourcePool.java:1014)
at com.mchange.v2.resourcepool.BasicResourcePool.access$800(BasicResourcePool.java:32)
at com.mchange.v2.resourcepool.BasicResourcePool$AcquireTask.run(BasicResourcePool.java:1810)
at com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread.run(ThreadPoolAsynchronousRunner.java:547)
09:48:42,355 WARN [BasicResourcePool] com.mchange.v2.resourcepool.BasicResourcePool$AcquireTask@8ce38e -- Acquisition Attempt Failed!!! Clearing pending acquires. While trying to acquire a needed new resource, we failed to succeed more than the maximum number of allowed acquisition attempts (30). Last acquisition attempt exception:
org.postgresql.util.PSQLException: FATAL: sorry, too many clients already
at org.postgresql.core.v3.ConnectionFactoryImpl.readStartupMessages(ConnectionFactoryImpl.java:443)
at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:98)
at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:65)
at org.postgresql.jdbc2.AbstractJdbc2Connection.<init>(AbstractJdbc2Connection.java:116)
at org.postgresql.jdbc3.AbstractJdbc3Connection.<init>(AbstractJdbc3Connection.java:30)
at org.postgresql.jdbc3.Jdbc3Connection.<init>(Jdbc3Connection.java:24)
at org.postgresql.Driver.makeConnection(Driver.java:369)
at org.postgresql.Driver.connect(Driver.java:245)
at com.mchange.v2.c3p0.DriverManagerDataSource.getConnection(DriverManagerDataSource.java:135)
at com.mchange.v2.c3p0.WrapperConnectionPoolDataSource.getPooledConnection(WrapperConnectionPoolDataSource.java:182)
at com.mchange.v2.c3p0.WrapperConnectionPoolDataSource.getPooledConnection(WrapperConnectionPoolDataSource.java:171)
at com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool$1PooledConnectionResourcePoolManager.acquireResource(C3P0PooledConnectionPool.java:137)
at com.mchange.v2.resourcepool.BasicResourcePool.doAcquire(BasicResourcePool.java:1014)
at com.mchange.v2.resourcepool.BasicResourcePool.access$800(BasicResourcePool.java:32)
at com.mchange.v2.resourcepool.BasicResourcePool$AcquireTask.run(BasicResourcePool.java:1810)
at com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread.run(ThreadPoolAsynchronousRunner.java:547)
09:48:56,643 WARN [BasicResourcePool] com.mchange.v2.resourcepool.BasicResourcePool$AcquireTask@40c470 -- Acquisition Attempt Failed!!! Clearing pending acquires. While trying to acquire a needed new resource, we failed to succeed more than the maximum number of allowed acquisition attempts (30). Last acquisition attempt exception:
org.postgresql.util.PSQLException: FATAL: sorry, too many clients already
at org.postgresql.core.v3.ConnectionFactoryImpl.readStartupMessages(ConnectionFactoryImpl.java:443)
at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:98)
at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:65)
at org.postgresql.jdbc2.AbstractJdbc2Connection.<init>(AbstractJdbc2Connection.java:116)
at org.postgresql.jdbc3.AbstractJdbc3Connection.<init>(AbstractJdbc3Connection.java:30)
at org.postgresql.jdbc3.Jdbc3Connection.<init>(Jdbc3Connection.java:24)
at org.postgresql.Driver.makeConnection(Driver.java:369)
at org.postgresql.Driver.connect(Driver.java:245)
at com.mchange.v2.c3p0.DriverManagerDataSource.getConnection(DriverManagerDataSource.java:135)
at com.mchange.v2.c3p0.WrapperConnectionPoolDataSource.getPooledConnection(WrapperConnectionPoolDataSource.java:182)
at com.mchange.v2.c3p0.WrapperConnectionPoolDataSource.getPooledConnection(WrapperConnectionPoolDataSource.java:171)
at com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool$1PooledConnectionResourcePoolManager.acquireResource(C3P0PooledConnectionPool.java:137)
at com.mchange.v2.resourcepool.BasicResourcePool.doAcquire(BasicResourcePool.java:1014)
at com.mchange.v2.resourcepool.BasicResourcePool.access$800(BasicResourcePool.java:32)
at com.mchange.v2.resourcepool.BasicResourcePool$AcquireTask.run(BasicResourcePool.java:1810)
at com.mchange.v2.async.ThreadPerTaskAsynchronousRunner$TaskThread.run(ThreadPerTaskAsynchronousRunner.java:255)
09:48:56,646 WARN [BasicResourcePool] com.mchange.v2.resourcepool.BasicResourcePool$AcquireTask@af8cbc -- Acquisition Attempt Failed!!! Clearing pending acquires. While trying to acquire a needed new resource, we failed to succeed more than the maximum number of allowed acquisition attempts (30). Last acquisition attempt exception:
org.postgresql.util.PSQLException: FATAL: sorry, too many clients already
at org.postgresql.core.v3.ConnectionFactoryImpl.readStartupMessages(ConnectionFactoryImpl.java:443)
at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:98)
at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:65)
at org.postgresql.jdbc2.AbstractJdbc2Connection.<init>(AbstractJdbc2Connection.java:116)
at org.postgresql.jdbc3.AbstractJdbc3Connection.<init>(AbstractJdbc3Connection.java:30)
at org.postgresql.jdbc3.Jdbc3Connection.<init>(Jdbc3Connection.java:24)
at org.postgresql.Driver.makeConnection(Driver.java:369)
at org.postgresql.Driver.connect(Driver.java:245)
at com.mchange.v2.c3p0.DriverManagerDataSource.getConnection(DriverManagerDataSource.java:135)
at com.mchange.v2.c3p0.WrapperConnectionPoolDataSource.getPooledConnection(WrapperConnectionPoolDataSource.java:182)
at com.mchange.v2.c3p0.WrapperConnectionPoolDataSource.getPooledConnection(WrapperConnectionPoolDataSource.java:171)
at com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool$1PooledConnectionResourcePoolManager.acquireResource(C3P0PooledConnectionPool.java:137)
at com.mchange.v2.resourcepool.BasicResourcePool.doAcquire(BasicResourcePool.java:1014)
at com.mchange.v2.resourcepool.BasicResourcePool.access$800(BasicResourcePool.java:32)
at com.mchange.v2.resourcepool.BasicResourcePool$AcquireTask.run(BasicResourcePool.java:1810)
at com.mchange.v2.async.ThreadPerTaskAsynchronousRunner$TaskThread.run(ThreadPerTaskAsynchronousRunner.java:255)
09:49:00,013 INFO [ServiceHandler] Meth: doWork
09:49:00,014 INFO [ServiceHandler] Args: [null, null, ome.tools.spring.InternalServiceFactory@d2a0dc]
09:49:00,058 INFO [EventHandler] Auth: user=0,group=0,event=12231(FullText)
09:49:00,087 INFO [ServiceHandler] Rslt: null
...
09:52:23,411 INFO [TransactionManagerService] Stopping recovery manager
09:52:23,474 INFO [Server] Shutdown complete
Shutdown complete
Halting VM
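As a rough illustration of the throttle idea from the start of this comment (a hypothetical sketch, not the eventual fix): size the permits to roughly the JDBC pool, remembering that each method may use more than one connection:

import java.util.concurrent.Callable;
import java.util.concurrent.Semaphore;

// Hypothetical sketch: cap concurrent method executions near the pool size
// so callers queue in the application layer instead of deadlocking in c3p0.
public class MethodThrottle {
    private final Semaphore permits;

    public MethodThrottle(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent, true); // fair queueing
    }

    public <T> T invoke(Callable<T> method) throws Exception {
        permits.acquire(); // block until a slot is free
        try {
            return method.call(); // at most maxConcurrent methods hold SQL connections
        } finally {
            permits.release();
        }
    }
}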
comment:19 Changed 11 years ago by jmoore
Confirmed in blitz. Confirmed with and without indexing thread running.
Adding the following c3p0.properties allows scaling to hundreds of simultaneous IAdmin.getEventContext() calls without the need for kill -9. Specifically, up to 500 calls from OmeroPy, at which point ConnectionLost exceptions start to be thrown, and above 200 (250, 350, and 400 threads passed) for RMI/Java before some (currently unknown) failure occurs:
c3p0.initialPoolSize=30
c3p0.minPoolSize=30
c3p0.maxPoolSize=63
Note: 63 is one less than the maximum number of connections allowed by my postgres.
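For anyone tuning this on another box: the server-side ceiling that maxPoolSize must stay below can be read straight from PostgreSQL (standard SQL, nothing OMERO-specific), leaving headroom for psql sessions and other clients:

-- The hard limit that c3p0.maxPoolSize must stay safely below.
SHOW max_connections;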
comment:20 Changed 11 years ago by jmoore
Upgrading to c3p0 0.9.1.2 as suggested in HHH-3189 doesn't seem to significantly increase the number of threads possible without c3p0.properties (<60), but the number of threads with c3p0.properties and version 0.9.1.2 is slightly higher: in Java, perhaps as many as 500 (never more) when warm, though these figures fluctuate widely.
comment:21 Changed 11 years ago by jmoore
r2373 has an initial commit with c3p0 changes.
Beginning to outline how to configure OMERO for numerous concurrent threads under wiki:ScalingOmero
comment:22 Changed 11 years ago by jmoore
- Cc atarkowska added
How are we looking, everyone? Any more freezes? Ola had another issue with the settings in c3p0.properties causing "too many clients" exceptions; they probably need to be tuned.
comment:23 Changed 11 years ago by jburel
I have not seen the problem since the modifications.
comment:24 Changed 11 years ago by jmoore
- Resolution set to fixed
- Status changed from assigned to closed
Closing, c3p0.properties configuration will be handled elsewhere.
comment:25 Changed 11 years ago by jmoore
- Resolution fixed deleted
- Status changed from closed to reopened
Reopening, since it looks like the current configuration for c3p0:
- assumes too many connections for many users
- would eventually still lead to a server freeze
comment:26 Changed 11 years ago by jmoore
r2451 changes the etc/c3p0.properties to better handle scaling. With a maxPoolSize of just 20 (the default), I can scale on my box to 300+ threads. With maxPoolSize at 50, I can scale to 1000+ threads. (With a connectionTimeout of 5 seconds, and a test time of 30 seconds.)
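For reference, a sketch of how those two knobs map onto c3p0 property names (illustrative values only, not the actual r2451 contents; c3p0 calls the connection timeout "checkoutTimeout" and takes milliseconds):

# Illustrative values based on the figures quoted above.
c3p0.maxPoolSize=20
# The 5-second connection timeout, in milliseconds.
c3p0.checkoutTimeout=5000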
The commit also includes a "weakening" of the lock on FullTextThread.stop(). There's really no reason that, if the server is going down, we should wait more than a minute or two to kill the thread.
Chris, let me know how testing goes.
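A sketch of what such a weakening might look like (hypothetical; the actual r2451 change is not reproduced here): instead of blocking indefinitely on the indexing lock during shutdown, stop() waits a bounded time and then gives up:

import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: bound the wait for a stuck indexing run at shutdown.
public class BoundedStop {
    private final ReentrantLock indexingLock;

    public BoundedStop(ReentrantLock indexingLock) {
        this.indexingLock = indexingLock;
    }

    public void stop() throws InterruptedException {
        if (indexingLock.tryLock(2, TimeUnit.MINUTES)) {
            try {
                // Clean shutdown: the current indexing cycle finished in time.
            } finally {
                indexingLock.unlock();
            }
        } else {
            // Give up after two minutes and let the server halt anyway.
        }
    }
}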
comment:27 Changed 11 years ago by jmoore
Also for reference:
http://www.mchange.com/projects/c3p0/index.html#configuration_properties
comment:28 Changed 11 years ago by jmoore
See also #996
comment:29 Changed 11 years ago by jmoore
- Resolution set to fixed
- Status changed from reopened to closed
Sorry, pressed the submit button by mistake.