Task #11709 (closed)
BUG: Tile loading failure not retried
Reported by: | pwalczysko | Owned by: | wmoore |
---|---|---|---|
Priority: | critical | Milestone: | 5.0.0-rc1 |
Component: | Web | Version: | 5.0.0-beta1 |
Keywords: | n.a. | Cc: | fs@…, ux@…, mlinkert |
Resources: | n.a. | Referenced By: | n.a. |
References: | n.a. | Remaining Time: | n.a. |
Sprint: | OMERO 5 Beta 2 (1) |
Description
Reported during Phase I of upgrade testing by @bpindelski (line 17 of the gdoc https://docs.google.com/spreadsheet/ccc?key=0AoKiTAl8UOxndGt5bm5mOWl4M1lMc1NkQzVVRW5CZGc&usp=drive_web#gid=13). Testing was performed on the omero4-demo server with a develop database that had been upgraded from a dev_4_4 database (originally from Howe). All images were the original dev_4_4 images, upgraded; no new FS imports were in the DB at that point. bpindelski was viewing big images.
bpindelski:
"Broken image" icons for some tiles. Tiles re-appear when zoom level changed. Might be related to caching.
PW: Cannot repeat on this image, but repeated twice on IE 9 (Win7) and Firefox (Mac 10.8) on gatan (see 2 screenshots), so it is repeatable there. On Howe this very image has no problems. Is this upgrade-specific?
Attachments (4)
Change History (22)
comment:1 Changed 10 years ago by pwalczysko
comment:2 Changed 10 years ago by jamoore
- Version changed from 3.0-Beta1 to 5.0.0-beta1
comment:3 Changed 10 years ago by pwalczysko
@wmoore: I have imported the image to Gretzky. Tested Firefox on Mac - no problems on Gretzky detected.
comment:4 Changed 10 years ago by wmoore
I'm seeing this in the omero4-demo web logs when tiles are missing:
2013-11-18 11:17:17,223 WARNI [ omero.gateway] (proc.26515) debug:3514 LockTimeout on <class 'omeroweb.webclient.webclient_gateway.OmeroWebSafeCallWrapper'> to <eb0f5296-aef1-407d-84eb-00672990b7a9omero.api.RenderingEngine> load((<ServiceOptsDict: {'omero.session.uuid': 'e5531aef-2e2b-47c4-b52c-64e58aafc42d', 'omero.group': '6', 'omero.user': '13', 'omero.client.uuid': '08deb50b-e2ca-408c-bfb8-4c6c53c6bc97'}>,), {})
Traceback (most recent call last):
  File "/opt/hudson/workspace/OMERO-merge-develop-FS/src/dist/lib/python/omero/gateway/__init__.py", line 3532, in __call__
    return self.f(*args, **kwargs)
  File "/opt/hudson/workspace/OMERO-merge-develop-FS/src/dist/lib/python/omero_api_RenderingEngine_ice.py", line 410, in load
    return _M_omero.api.RenderingEngine._op_load.invoke(self, ((), _ctx))
LockTimeout: exception ::omero::LockTimeout
{
    serverStackTrace = ome.conditions.LockTimeout: /repositories/OMERO-merge-develop-FS/Pixels/Dir-007/7532_pyramid is locked by others
        at ome.io.bioformats.BfPyramidPixelBuffer.initializeReader(BfPyramidPixelBuffer.java:189)
        at ome.io.bioformats.BfPyramidPixelBuffer.<init>(BfPyramidPixelBuffer.java:173)
        at ome.io.bioformats.BfPyramidPixelBuffer.<init>(BfPyramidPixelBuffer.java:143)
        at ome.io.nio.PixelsService.createPyramidPixelBuffer(PixelsService.java:766)
        at ome.io.nio.PixelsService._getPixelBuffer(PixelsService.java:487)
        ...
        at Ice.ConnectionI.message(ConnectionI.java:1163)
        at IceInternal.ThreadPool.run(ThreadPool.java:302)
        at IceInternal.ThreadPool.access$300(ThreadPool.java:12)
        at IceInternal.ThreadPool$EventHandlerThread.run(ThreadPool.java:643)
        at java.lang.Thread.run(Thread.java:722)
    serverExceptionClass = ome.conditions.LockTimeout
    message = /repositories/OMERO-merge-develop-FS/Pixels/Dir-007/7532_pyramid is locked by others
    backOff = 15000
    seconds = 0
}
2013-11-18 11:17:17,235 ERROR [ omeroweb.feedback.views] (proc.26515) handler500:141 handler500: Server error
Rendering Engine init fails during image.setActiveChannels().
comment:5 Changed 10 years ago by wmoore
To reproduce, log in as user-10 to https://omero4-demo.openmicroscopy.org:1443/webclient/img_detail/9732/ and zoom, pan etc. Not browser specific as far as I can see.
comment:6 Changed 10 years ago by wmoore
Blitz image.getChannels() is failing to get a rendering engine initialised. Since it has a decorator stating that we can ignore ConcurrencyExceptions, we don't handle this. We decided to ignore the exception here because getChannels() is sometimes used only to get the channel names, and we don't need the RE then. Really, we should be trying again after a "short" time (how long is short?).
@assert_re(ignoreExceptions=(omero.ConcurrencyException))
def getChannels (self):
comment:7 Changed 10 years ago by wmoore
One solution would be to wrap setActiveChannels() with an @assert_re() (not ignoring ConcurrencyException).
Then, assert_re.call() needs to handle ConcurrencyException - or we have to handle it somewhere else?
comment:8 Changed 10 years ago by wmoore
- Owner changed from web-team@… to wmoore
comment:9 Changed 10 years ago by wmoore
Chris - do you have an idea about how we want to handle this? If we assume that after a ConcurrencyException we can retry very quickly (e.g. in less than a second), then potentially we could do this directly in the Blitz Gateway, under assert_re.call(). However, if the wait needs to be a lot longer, then we probably need to return something that tells the Javascript to try again some time later, which is much harder. Since we're expecting to get back a rendered tile (jpeg), what can we send back that tells PanoJS to try loading the tile again in 10 - 15 seconds (or longer)?
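To make the two options concrete, here is a minimal Python sketch of the "retry quickly in the gateway" idea. It is not the Blitz Gateway's assert_re implementation: the decorator name retry_on_concurrency, its parameters, and the stand-in ConcurrencyException class are all hypothetical, and backOff is assumed to be in milliseconds (which would match the backOff = 15000 in the log and the 10 - 15 seconds mentioned above). When the suggested back-off is longer than the gateway is willing to block, the sketch re-raises so that a caller could tell the client to retry later.

```python
import functools
import time


class ConcurrencyException(Exception):
    """Stand-in for omero.ConcurrencyException (assumption: lets the sketch run without OMERO)."""

    def __init__(self, message="", back_off_ms=0):
        super().__init__(message)
        # Assumed to be milliseconds, mirroring the backOff field in the log.
        self.backOff = back_off_ms


def retry_on_concurrency(max_attempts=3, max_wait_s=1.0, sleep=time.sleep):
    """Hypothetical decorator: retry a call that raises ConcurrencyException.

    Re-raises when attempts are exhausted, or when the suggested back-off
    exceeds max_wait_s, so the web request is never blocked for long.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except ConcurrencyException as exc:
                    wait_s = exc.backOff / 1000.0
                    if attempt == max_attempts or wait_s > max_wait_s:
                        raise  # too slow or out of attempts: let the caller decide
                    sleep(wait_s)
        return wrapper
    return decorator
```

With a 15000 ms back-off like the one logged above, this sketch would give up on the first failure rather than hold the request open, which is the case where the client side has to do the retrying.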
comment:10 Changed 10 years ago by wmoore
Summary of Skype call - Josh, Chris, Carlos & me (Will). Josh will look at reducing the pyramid File lock to be a read-only lock, allowing many simultaneous reads/inits. I will look at adding error handling to the tiles (or tiles container) http://api.jquery.com/error/. If a tile fails to load, retry the load ONCE. If that fails, see if we can give the user a "refresh" button that reloads tiles (just failed tiles??) without losing viewport location.
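The actual fix belongs in PanoJS (JavaScript) via per-tile error handlers, so the following Python sketch only models the policy agreed on the call: retry a failed tile once, remember tiles that still fail, and let a later "refresh" reload just those. The function names load_tiles and refresh_failed, and the load callable (assumed to raise IOError on failure), are hypothetical and not part of PanoJS or OMERO.web.

```python
def load_tiles(tile_ids, load, max_retries=1):
    """Try each tile; retry a failed load up to max_retries times.

    `load` is a hypothetical callable that returns tile data or raises IOError.
    Returns (loaded, failed): `loaded` maps tile id -> data, `failed` lists
    the ids that never loaded and could be offered to a "refresh" action.
    """
    loaded, failed = {}, []
    for tile_id in tile_ids:
        for attempt in range(max_retries + 1):
            try:
                loaded[tile_id] = load(tile_id)
                break
            except IOError:
                if attempt == max_retries:
                    failed.append(tile_id)
    return loaded, failed


def refresh_failed(loaded, failed, load):
    """Reload only previously failed tiles; tiles already loaded are left untouched."""
    new_loaded, still_failed = load_tiles(failed, load, max_retries=0)
    loaded.update(new_loaded)
    return loaded, still_failed
```

Because refresh_failed only touches the failed ids, the tiles already on screen (and therefore the viewport location) would not need to change in a real viewer.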
comment:11 Changed 10 years ago by wmoore
Seems that event bubbling of "error" doesn't work http://forum.jquery.com/topic/error-event-with-live so we'll have to add it to each tile img.
comment:12 Changed 10 years ago by wmoore
- Summary changed from BUG: Tile loading stuck after upgrade to BUG: Tile loading failure not retried
Intermittent tile failure is also an issue on dev_4_4, e.g. pan around this public image at 100%: https://nightshade.openmicroscopy.org/webgateway/img_detail/3946831/
PR opened on dev_4_4 for testing https://github.com/openmicroscopy/openmicroscopy/pull/1839
comment:13 Changed 10 years ago by jamoore
Can this be closed with the open PRs or is there more work so that this needs to be pushed?
comment:14 Changed 10 years ago by wmoore
- Resolution set to fixed
- Status changed from new to closed
At one point we discussed the idea of having a "Refresh" button to reload tiles that failed on their second attempt (or never loaded for some other reason), but I'm not sure this is necessary now. Certainly not in this "Phase 1". So we can close this now.
comment:15 Changed 10 years ago by Will Moore <will@…>
(In [e33174f56a9d80b7830b22ec3dfee6b57dc955a0/ome.git] on branch develop) Initial error handling of tile image loading in PanoJS See #11709
comment:16 Changed 10 years ago by Josh Moore <josh@…>
(In [0ca9b6a1a2c60436923f222b4fa5e7775cc851ac/ome.git] on branch develop) Merge pull request #1818 from will-moore/tile_loading_11709
Error handling of tile image loading in PanoJS See #11709
comment:17 Changed 10 years ago by Will Moore <will@…>
(In [2880017202c2d44491e7f6e88a535876bce0c407/ome.git] on branch dev_4_4) Initial error handling of tile image loading in PanoJS See #11709
comment:18 Changed 10 years ago by Josh Moore <josh@…>
(In [31830e3f2735d4da8253f4f8ba74570ab096c99c/ome.git] on branch dev_4_4) Merge pull request #1839 from will-moore/tile_loading_11709_dev_4_4
Error handling of tile image loading in PanoJS See #11709
Was able to repeat on Win7 IE9 even after clearing my browser cache.