Task #7325 (closed)
DOC: bin/omero admin restart - failed
Reported by: | wmoore | Owned by: | jmoore |
---|---|---|---|
Priority: | major | Milestone: | OMERO-4.4 |
Component: | Deployment | Version: | n.a. |
Keywords: | n.a. | Cc: | cxallan, jburel |
Resources: | n.a. | Referenced By: | n.a. |
References: | n.a. | Remaining Time: | n.a. |
Sprint: | 2012-01-17 (6) |
Description (last modified by jmoore)
After opening my laptop...
See comments for diagnostics and $ ps aux
jrs-macbookpro-25107:OMERO will$ omero admin restart
error: node `master' couldn't be reached: the node is not active
Was the server already stopped?
Waiting on shutdown. Use CTRL-C to exit
.............................
Failed to shutdown some components after 5 minutes
jrs-macbookpro-25107:OMERO will$ omero admin start
Server already running
jrs-macbookpro-25107:OMERO will$ omero admin diagnostics
See the separate but possibly related issues below; the solution for those may be to retry on certain types of failures.
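A minimal sketch of what "retries on certain types of failures" could look like. Everything here is hypothetical: `TransientError`, `retry`, and their parameters are illustrative names, not part of the OMERO CLI, and which failures count as retryable is exactly the open question.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def retry(action, attempts=3, delay=1.0, sleep=time.sleep):
    """Call action() until it succeeds or the attempts run out.

    Only TransientError is retried; any other exception propagates
    immediately. The last TransientError is re-raised on exhaustion.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except TransientError as err:
            last_error = err
            if attempt < attempts:
                sleep(delay)  # fixed pause between attempts
    raise last_error
```

A caller would wrap the flaky step (for example, the shutdown request) in a function that raises `TransientError` for the failure types deemed retryable and pass it to `retry`.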
Change History (10)
comment:1 Changed 13 years ago by wmoore
comment:2 Changed 13 years ago by wmoore
- Owner set to jmoore
================================================================================
OMERO Diagnostics 4.3.3
================================================================================
Commands: java -version 1.6.0 (/usr/bin/java)
Commands: python -V 2.6.1 (/Users/will/apps/OMERO.libs/bin/python -- 2 others)
Commands: icegridnode --version 3.3.1 (/Users/will/apps/OMERO.libs/bin/icegridnode)
Commands: icegridadmin --version 3.3.1 (/Users/will/apps/OMERO.libs/bin/icegridadmin)
Commands: psql --version 9.0.4 (/usr/local/bin/psql)
Server: icegridnode running
Server: Blitz-0 active (pid = 73774, enabled)
Server: DropBox inactive (disabled)
Server: FileServer active (pid = 73776, enabled)
Server: Indexer-0 active (pid = 73777, enabled)
Server: MonitorServer inactive (disabled)
Server: OMERO.Glacier2 active (pid = 73779, enabled)
Server: OMERO.IceStorm active (pid = 73780, enabled)
Server: PixelData-0 active (pid = 73781, enabled)
Server: Processor-0 active (pid = 73782, enabled)
Server: Tables-0 active (pid = 73783, enabled)
Server: TestDropBox inactive (enabled)
Log dir: /Users/will/Desktop/OMERO/dist/var/log exists
Log files: Blitz-0.log 29.0 MB errors=55 warnings=541
Log files: DropBox.log 12.0 KB errors=10 warnings=0
Log files: FileServer.log 3.0 KB
Log files: Indexer-0.log 42.0 KB
Log files: MonitorServer.log 8.0 KB errors=10 warnings=0
Log files: OMEROweb.log 1.0 MB errors=218 warnings=1
Log files: PixelData-0.log 25.0 KB errors=1 warnings=0
Log files: Processor-0.log 874.0 KB errors=0 warnings=329
Log files: Tables-0.log 224.0 KB errors=0 warnings=319
Log files: TestDropBox.log n/a
Log files: master.err 0.0 KB
Log files: master.out 0.0 KB
Log files: Total size 32.07 MB
Parsing Blitz-0.log:[line:30] => Server restarted <=
Parsing Blitz-0.log:[line:50225] => Server restarted <=
Parsing Blitz-0.log:[line:53714] => Server restarted <=
Parsing Blitz-0.log:[line:64630] => Server restarted <=
Parsing Blitz-0.log:[line:79985] => Server restarted <=
Parsing Blitz-0.log:[line:85793] => Server restarted <=
Parsing Blitz-0.log:[line:96678] => Server restarted <=
Parsing Blitz-0.log:[line:100911] => Server restarted <=
Parsing Blitz-0.log:[line:170752] => Server restarted <=
Parsing Blitz-0.log:[line:172077] => Server restarted <=
Environment:OMERO_HOME=/Users/will/Desktop/OMERO/dist
Environment:OMERO_NODE=(unset)
Environment:OMERO_MASTER=(unset)
Environment:PATH=/Users/will/apps/OMERO.libs/Cellar/zeroc-ice33/bin:/usr/local/bin:/Users/will/apps/OMERO.libs/bin:/usr/local/lib/node_modules:/Users/will/Desktop/OMERO/dist/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin:/Applications/gitflow/
Environment:ICE_HOME=/Users/will/apps/OMERO.libs/Cellar/zeroc-ice33
Environment:LD_LIBRARY_PATH=(unset)
Environment:DYLD_LIBRARY_PATH=:/Users/will/apps/OMERO.libs/lib
OMERO data dir: '/OMERO' Exists? True Is writable? True
OMERO.web status... Traceback (most recent call last):
  File "/Users/will/Desktop/OMERO/dist/bin/omero", line 123, in <module>
    rv = omero.cli.argv()
  File "/Users/will/Desktop/OMERO/dist/lib/python/omero/cli.py", line 1172, in argv
    cli.invoke(args[1:])
  File "/Users/will/Desktop/OMERO/dist/lib/python/omero/cli.py", line 722, in invoke
    stop = self.onecmd(line, previous_args)
  File "/Users/will/Desktop/OMERO/dist/lib/python/omero/cli.py", line 791, in onecmd
    self.execute(line, previous_args)
  File "/Users/will/Desktop/OMERO/dist/lib/python/omero/cli.py", line 871, in execute
    args.func(args)
  File "/Users/will/Desktop/OMERO/dist/lib/python/omero/plugins/prefs.py", line 67, in open_and_close_config
    return func(*args, **kwargs)
  File "/Users/will/Desktop/OMERO/dist/lib/python/omero/plugins/admin.py", line 780, in diagnostics
    WebControl().status(args)
  File "/Users/will/Desktop/OMERO/dist/lib/python/omero/plugins/web.py", line 391, in status
    import omeroweb.settings as settings
ImportError: No module named omeroweb.settings
jrs-macbookpro-25107:OMERO will$ omero config get
omero.web.application_server=development
omero.web.debug=True
comment:3 Changed 13 years ago by wmoore
Another restart problem. This followed some work at home with a dodgy internet connection, which probably caused the need to restart (IP address change?). I then tried restarting the next morning at work (the last 2 restarts).
jrs-macbookpro-25107:OMERO will$ omero admin restart
error: node `master' couldn't be reached: the node is not active
Was the server already stopped?
Waiting on shutdown. Use CTRL-C to exit
.............................
Failed to shutdown some components after 5 minutes
jrs-macbookpro-25107:OMERO will$ omero admin stop
Waiting on shutdown. Use CTRL-C to exit
.jrs-macbookpro-25107:OMERO will$ omero admin start
No descriptor given. Using etc/grid/default.xml
Waiting on startup. Use CTRL-C to exit
jrs-macbookpro-25107:OMERO will$ omero config set omero.sessions.timeout 20000
jrs-macbookpro-25107:OMERO will$ omero admin restart
Waiting on shutdown. Use CTRL-C to exit
.No descriptor given. Using etc/grid/default.xml
Waiting on startup. Use CTRL-C to exit
jrs-macbookpro-25107:OMERO will$ omero admin restart
Waiting on shutdown. Use CTRL-C to exit
.No descriptor given. Using etc/grid/default.xml
Waiting on startup. Use CTRL-C to exit
jrs-macbookpro-25107:OMERO will$ tail -f dist/var/log/Blitz-0.log
2011-12-07 23:07:39,004 INFO [ ome.services.util.ServiceHandler] (l.Server-9) Args: [select distinct obj from Experimenter as obj left outer join fetch obj.groupExperimenterMap as map left outer join fetch map.parent g where obj.id in (:ids), PARAMS:ids=ArrayList(1) ]
2011-12-07 23:07:39,005 INFO [ ome.security.basic.EventHandler] (l.Server-9) Auth: user=2,group=3,event=null(User),sess=5ef34d52-1c6f-4020-b3d6-8567e386ac98
2011-12-07 23:07:39,009 INFO [ org.perf4j.TimingLogger] (l.Server-9) start[1323299259004] time[5] tag[omero.call.success.ome.logic.QueryImpl.findByQuery]
2011-12-07 23:07:39,009 INFO [ ome.services.util.ServiceHandler] (l.Server-9) Rslt: ome.model.meta.Experimenter:Id_2
2011-12-07 23:07:39,013 INFO [e.services.sessions.SessionContext$Count] (l.Server-5) -Reference count: 14831435-ff56-46b6-97b3-1f7db16e0847=0
2011-12-07 23:07:39,014 INFO [ ome.services.util.ServiceHandler] (l.Server-0) Meth: interface ome.api.IAdmin.lookupGroup
2011-12-07 23:07:39,014 INFO [ ome.services.util.ServiceHandler] (l.Server-0) Args: [user]
2011-12-07 23:07:39,015 INFO [ ome.security.basic.EventHandler] (l.Server-0) Auth: user=2,group=3,event=null(User),sess=5ef34d52-1c6f-4020-b3d6-8567e386ac98
2011-12-07 23:07:39,039 INFO [ org.perf4j.TimingLogger] (l.Server-0) start[1323299259014] time[25] tag[omero.call.success.ome.logic.AdminImpl.lookupGroup]
2011-12-07 23:07:39,039 INFO [ ome.services.util.ServiceHandler] (l.Server-0) Rslt: ome.model.meta.ExperimenterGroup:Id_1
2011-12-07 23:08:30,035 INFO [ ome.services.util.ServiceHandler] (l.Server-1) Executor.doWork -- ome.services.sessions.SessionManagerImpl.createSession
2011-12-07 23:08:30,035 INFO [ ome.services.util.ServiceHandler] (l.Server-1) Args: [null, InternalSF@1225998403]
2011-12-07 23:08:30,036 INFO [ ome.services.util.ServiceHandler] (l.Server-2) Executor.doWork -- ome.services.sessions.SessionManagerImpl.createSession
2011-12-07 23:08:30,036 INFO [ ome.services.util.ServiceHandler] (l.Server-2) Args: [null, InternalSF@1225998403]
2011-12-07 23:08:30,046 INFO [ ome.security.basic.EventHandler] (l.Server-1) Auth: user=0,group=0,event=51131(Sessions),sess=8ef97cbc-3a22-43e7-a3b6-4905dd252829
2011-12-07 23:08:30,046 INFO [ ome.security.basic.EventHandler] (l.Server-2) Auth: user=0,group=0,event=51132(Sessions),sess=8ef97cbc-3a22-43e7-a3b6-4905dd252829
2011-12-07 23:08:30,057 INFO [ ome.security.basic.CurrentDetails] (l.Server-1) Adding log:INSERT,class ome.model.meta.Session,14271
2011-12-07 23:08:30,057 INFO [ ome.security.basic.CurrentDetails] (l.Server-2) Adding log:INSERT,class ome.model.meta.Session,14272
2011-12-07 23:08:30,067 INFO [ org.perf4j.TimingLogger] (l.Server-1) start[1323299310035] time[32] tag[omero.call.success.ome.services.sessions.SessionManagerImpl$2.doWork]
2011-12-07 23:08:30,067 INFO [ ome.services.util.ServiceHandler] (l.Server-1) Rslt: (ome.model.meta.Experimenter:Id_0, ome.model.meta.ExperimenterGroup:Id_0, [0, 1], ... 4 more)
2011-12-07 23:08:30,067 INFO [ org.perf4j.TimingLogger] (l.Server-2) start[1323299310036] time[31] tag[omero.call.success.ome.services.sessions.SessionManagerImpl$2.doWork]
2011-12-07 23:08:30,067 INFO [ ome.services.util.ServiceHandler] (l.Server-2) Rslt: (ome.model.meta.Experimenter:Id_0, ome.model.meta.ExperimenterGroup:Id_0, [0, 1], ... 4 more)
2011-12-07 23:08:30,068 INFO [ ome.services.blitz.fire.SessionManagerI] (l.Server-1) Created session ServiceFactoryI(session-4c1a3040-c157-4aec-aa5a-3d75e05ada8e/e015e67d-5b49-4302-82c4-14a8fc08ba77) for user root (agent=Python service)
2011-12-07 23:08:30,068 INFO [ ome.services.blitz.fire.SessionManagerI] (l.Server-2) Created session ServiceFactoryI(session-ee18ac57-d52b-4e94-a55d-f62509fed456/55102c50-27c8-4d03-8b15-276da0e99186) for user root (agent=Python service)
2011-12-07 23:08:30,069 INFO [ ome.services.blitz.impl.ServiceFactoryI] (l.Server-0) Added servant to adapter: 55102c50-27c8-4d03-8b15-276da0e99186/ee18ac57-d52b-4e94-a55d-f62509fed456omero.grid.SharedResources(omero.grid._SharedResourcesTie@49e3c996)
2011-12-07 23:08:53,833 INFO [ ome.services.blitz.impl.ServiceFactoryI] (l.Server-7) Added servant to adapter: 5ef34d52-1c6f-4020-b3d6-8567e386ac98/f8e4838e-517e-4c8e-8053-d0bf21a69da9omero.api.IConfig(omero.api._IConfigTie@1833b96c)
2011-12-07 23:09:00,018 INFO [ ome.services.blitz.fire.SessionManagerI] (3-thread-1) Performing requestHeartbeats
^C
jrs-macbookpro-25107:OMERO will$ git status
# On branch ajax_helper
# Changes not staged for commit:
# (use "git add <file>..." to update what will be committed)
# (use "git checkout -- <file>..." to discard changes in working directory)
#
# modified: components/tools/OmeroWeb/omeroweb/common/static/common/javascript/popup.js
# modified: components/tools/OmeroWeb/omeroweb/webclient/templates/webclient/annotations/metadata_general.html
#
no changes added to commit (use "git add" and/or "git commit -a")
jrs-macbookpro-25107:OMERO will$ omero admin restart
Waiting on shutdown. Use CTRL-C to exit
......No descriptor given. Using etc/grid/default.xml
icegridnode: failure occurred in daemon: service caught unhandled Ice exception: FileUtil.cpp:477: IceUtil::FileLockedException: could not lock file: `var/registry/__Freeze/lock' syscall exception: Resource temporarily unavailable
jrs-macbookpro-25107:OMERO will$ omero admin restart
Server not running
No descriptor given. Using etc/grid/default.xml
Waiting on startup. Use CTRL-C to exit
jrs-macbookpro-25107:OMERO will$
comment:4 Changed 13 years ago by wmoore
master.err for restart above
jrs-macbookpro-25107:OMERO will$ cat dist/var/log/master.err
-! 12/2/11 09:26:37:548 warning: Blitz-0-Ice.ThreadPool.Server-6: dispatch exception:
   identity: session-aba9b137-b6ac-4baf-84f9-d6e254edc562/a4a6d8ac-fd16-4306-bc09-d846d39fb82e
   facet:
   operation: destroy
   remote host: 10.12.2.172 remote port: 61057
   Ice.ObjectAdapterDeactivatedException
   name = "BlitzAdapter"
   at Ice.ObjectAdapterI.checkForDeactivation(ObjectAdapterI.java:1121)
   at Ice.ObjectAdapterI.findFacet(ObjectAdapterI.java:505)
   at Ice.ObjectAdapterI.find(ObjectAdapterI.java:499)
   at ome.services.blitz.impl.ServiceFactoryI.unregisterServant(ServiceFactoryI.java:1005)
   at ome.services.blitz.impl.ServiceFactoryI.doDestroy(ServiceFactoryI.java:801)
   at ome.services.blitz.impl.ServiceFactoryI.destroy(ServiceFactoryI.java:703)
   at Glacier2._SessionDisp.___destroy(_SessionDisp.java:104)
   at omero.api._ServiceFactoryDisp.__dispatch(_ServiceFactoryDisp.java:1412)
   at IceInternal.Incoming.invoke(Incoming.java:159)
   at Ice.ConnectionI.invokeAll(ConnectionI.java:2357)
   at Ice.ConnectionI.dispatch(ConnectionI.java:1208)
   at Ice.ConnectionI.message(ConnectionI.java:1163)
   at IceInternal.ThreadPool.run(ThreadPool.java:302)
   at IceInternal.ThreadPool.access$300(ThreadPool.java:12)
   at IceInternal.ThreadPool$EventHandlerThread.run(ThreadPool.java:643)
   at java.lang.Thread.run(Thread.java:680)
-! 12/2/11 09:26:37:548 warning: Blitz-0-Ice.ThreadPool.Server-5: dispatch exception:
   identity: session-d1004155-2312-4e2d-9042-a3d79f83a18a/59d84cf8-ee11-44a3-9b02-0391ccc4cca8
   facet:
   operation: destroy
   remote host: 10.12.2.172 remote port: 61054
   Ice.ObjectAdapterDeactivatedException
   name = "BlitzAdapter"
   at Ice.ObjectAdapterI.checkForDeactivation(ObjectAdapterI.java:1121)
   at Ice.ObjectAdapterI.findFacet(ObjectAdapterI.java:505)
   at Ice.ObjectAdapterI.find(ObjectAdapterI.java:499)
   at ome.services.blitz.impl.ServiceFactoryI.unregisterServant(ServiceFactoryI.java:1005)
   at ome.services.blitz.impl.ServiceFactoryI.doDestroy(ServiceFactoryI.java:801)
   at ome.services.blitz.impl.ServiceFactoryI.destroy(ServiceFactoryI.java:703)
   at Glacier2._SessionDisp.___destroy(_SessionDisp.java:104)
   at omero.api._ServiceFactoryDisp.__dispatch(_ServiceFactoryDisp.java:1412)
   at IceInternal.Incoming.invoke(Incoming.java:159)
   at Ice.ConnectionI.invokeAll(ConnectionI.java:2357)
   at Ice.ConnectionI.dispatch(ConnectionI.java:1208)
   at Ice.ConnectionI.message(ConnectionI.java:1163)
   at IceInternal.ThreadPool.run(ThreadPool.java:302)
   at IceInternal.ThreadPool.access$300(ThreadPool.java:12)
   at IceInternal.ThreadPool$EventHandlerThread.run(ThreadPool.java:643)
   at java.lang.Thread.run(Thread.java:680)
!! 12/03/11 21:01:41.518 icegridnode: error: service caught unhandled Ice exception: FileUtil.cpp:477: IceUtil::FileLockedException: could not lock file: `var/registry/__Freeze/lock' syscall exception: Resource temporarily unavailable
-! 12/04/11 08:08:59.944 OMERO.Glacier2: warning: dispatch exception: Network.cpp:1178: Ice::ConnectFailedException: connect failed: No route to host
   identity: session-71ca903c-cd65-49b5-a224-b60676209eec/8322950c-2c74-4bda-9810-f9398c25d47d
   facet:
   operation: keepAllAlive
   remote host: 127.0.0.1 remote port: 51739
-! 12/04/11 08:08:59.955 OMERO.Glacier2: warning: dispatch exception: Network.cpp:1178: Ice::ConnectFailedException: connect failed: No route to host
   identity: session-63dbf2f0-9c1e-4425-b624-d09f08e0b1ea/8322950c-2c74-4bda-9810-f9398c25d47d
   facet:
   operation: keepAllAlive
   remote host: 127.0.0.1 remote port: 51742
-! 12/04/11 08:10:04.957 OMERO.Glacier2: warning: dispatch exception: Network.cpp:1184: Ice::SocketException: socket exception: Host is down
   identity: session-71ca903c-cd65-49b5-a224-b60676209eec/8322950c-2c74-4bda-9810-f9398c25d47d
   facet:
   operation: keepAllAlive
   remote host: 127.0.0.1 remote port: 51739
-! 12/04/11 08:10:04.958 OMERO.Glacier2: warning: dispatch exception: Network.cpp:1184: Ice::SocketException: socket exception: Host is down
   identity: session-63dbf2f0-9c1e-4425-b624-d09f08e0b1ea/8322950c-2c74-4bda-9810-f9398c25d47d
   facet:
   operation: keepAllAlive
   remote host: 127.0.0.1 remote port: 51742
!! 12/08/11 09:43:45.229 icegridnode: error: service caught unhandled Ice exception: FileUtil.cpp:477: IceUtil::FileLockedException: could not lock file: `var/registry/__Freeze/lock' syscall exception: Resource temporarily unavailable
comment:5 Changed 13 years ago by jmoore
- Description modified (diff)
comment:6 Changed 13 years ago by jmoore
- Cc cxallan added
Working through this, Will: the only way this printout can occur is if "ping node" passes but "shutdown node" fails. I could capture the "node is not active" message (not terribly robust), but then what? Do we tell the user how to kill the process?
The process "5825" cannot be stopped. Use "kill 5825" ...
Thoughts? Adding Chris to the conversation.
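For discussion, the case above (ping succeeds, shutdown fails) can be written as a small decision function. This is only a sketch: `shutdown_hint`, `ping_ok`, `shutdown_ok`, and `pid` are hypothetical names, not existing CLI code, and how the pid would be obtained is left open.

```python
def shutdown_hint(ping_ok, shutdown_ok, pid=None):
    """Return a message for the user, or None if there is nothing to report.

    ping_ok:     result of the (assumed) "ping node" check
    shutdown_ok: result of the (assumed) "shutdown node" request
    pid:         process id of the stuck node, if it could be determined
    """
    if not ping_ok:
        # Nothing answered the ping, so there is nothing to stop.
        return "Server not running"
    if shutdown_ok:
        return None
    if pid is None:
        # Ping passed but shutdown failed, and the pid is unknown.
        return "The node answered a ping but could not be shut down."
    return 'The process "%d" cannot be stopped. Use "kill %d" ...' % (pid, pid)
```

Keeping the decision separate from the actual ping and shutdown calls would make the message logic testable without a running server.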
comment:7 Changed 13 years ago by jmoore
- Cc jburel added
- Sprint set to 2012-01-17 (6)
- Status changed from new to accepted
- Summary changed from Bug: bin/omero admin restart - failed to DOC: bin/omero admin restart - failed
Like #5576, this is a developer-only issue. Adding to the FAQ as the solution for both of these.
comment:8 Changed 13 years ago by jmoore
- Resolution set to fixed
- Status changed from accepted to closed
Added to troubleshooting.
comment:9 Changed 13 years ago by jmoore <josh@…>
(In [b04ee7b866e2b94e33bd9ec66581e3bb5b3b7c43/ome.git] on branch develop) Failing test shutdown (See #7325)
comment:10 Changed 13 years ago by jmoore <josh@…>
(In [d84045cfcd7a5155dec4b7aea1ee8a25f64577cc/ome.git] on branch develop) Try adding restart wait (See #7325)
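The commit itself is not shown in this ticket, so the following is only a guess at the general shape of a "restart wait": poll until the old node is confirmed down before starting the new one, and give up after a timeout. `wait_until` and its parameters are hypothetical; the 300-second default simply mirrors the "after 5 minutes" message in the output above.

```python
import time


def wait_until(condition, timeout=300.0, interval=1.0,
               clock=time.monotonic, sleep=time.sleep):
    """Poll condition() until it returns True or timeout seconds elapse.

    Returns True if the condition was met, False on timeout.
    clock and sleep are injectable so the loop can be tested quickly.
    """
    deadline = clock() + timeout
    while True:
        if condition():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A restart could then call something like `wait_until(node_is_down)` between the stop and start steps, where `node_is_down` is an assumed check (for example, a failed ping) and not an existing function.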