Requirement #7902 (new)
MQ support in backend
|Reported by:||jamoore||Owned by:||jamoore|
|Cc:||omero-team@…, ckm@…, bpindelski, jkoetsie@…||Business Value:||n.a.|
|Total Story Points:||n.a.||Roif:||n.a.|
|Mandatory Story Points:||n.a.|
Description (last modified by jmoore)
The addition of a message queue (MQ) should allow any OMERO server component to submit a work item for later processing by another component, potentially in another process. The MQ should store messages durably so that they survive a restart, deliver each message at most once ("at-most-once" semantics), and notify system administrators of failures in some way for later handling.
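As a rough illustration of these three properties (durable storage, at-most-once delivery, recorded failures), here is a minimal sketch backed by SQLite. The class `DurableQueue`, its table layout and the queue name used below are all hypothetical and not part of OMERO; a real deployment would more likely use an existing MQ product.

```python
import json
import sqlite3


class DurableQueue:
    """Hypothetical SQLite-backed queue; a sketch, not OMERO code."""

    def __init__(self, path):
        # Messages live in a file, so they are still there after a restart.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS messages ("
            " id INTEGER PRIMARY KEY AUTOINCREMENT,"
            " queue TEXT NOT NULL,"
            " body TEXT NOT NULL,"
            " state TEXT NOT NULL DEFAULT 'pending',"
            " error TEXT)"
        )
        self.db.commit()

    def submit(self, queue, payload):
        with self.db:  # commits on success, rolls back on error
            self.db.execute(
                "INSERT INTO messages (queue, body) VALUES (?, ?)",
                (queue, json.dumps(payload)),
            )

    def process_next(self, queue, handler):
        """Hand the oldest pending message to handler, at most once.

        Returns False when the queue is empty. Assumes a single consumer;
        concurrent consumers would need stricter locking.
        """
        with self.db:
            row = self.db.execute(
                "SELECT id, body FROM messages"
                " WHERE queue = ? AND state = 'pending' ORDER BY id LIMIT 1",
                (queue,),
            ).fetchone()
            if row is None:
                return False
            msg_id, body = row
            # Mark as delivered BEFORE handling: a crash in the handler
            # can lose the message but can never deliver it twice.
            self.db.execute(
                "UPDATE messages SET state = 'delivered' WHERE id = ?",
                (msg_id,),
            )
        try:
            handler(json.loads(body))
        except Exception as exc:
            # Record the failure for administrators instead of blocking.
            with self.db:
                self.db.execute(
                    "UPDATE messages SET state = 'failed', error = ?"
                    " WHERE id = ?",
                    (str(exc), msg_id),
                )
        return True

    def failures(self, queue):
        return self.db.execute(
            "SELECT id, error FROM messages"
            " WHERE queue = ? AND state = 'failed' ORDER BY id",
            (queue,),
        ).fetchall()
```

Marking the row as delivered before calling the handler is what gives at-most-once behaviour; marking it afterwards would give at-least-once instead.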
As model objects are saved to the database, they generate EventLogs (in the eventlog table) which store the type and id of the affected row along with which action took place (INSERT, UPDATE, DELETE). There are also some specialized events (e.g. "PIXELDATA") which signify special handling. Currently, two internal components make use of these EventLogs in an MQ-like system. These icegridnode servers are typically named "Indexer-0" and "PixelData-0". They poll the eventlog table for updates, storing their current position in the configuration table. Each of these processes can become stuck if the processing of a particular EventLog ID fails. Further, they each consume a substantial amount of memory in order to have access to the Hibernate session, though this may not be a requirement we can work around immediately.
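The "stuck" behaviour can be modelled in a few lines. This is only a simplified model of the polling pattern described above, not the actual Indexer-0 or PixelData-0 code; the function name `poll_eventlogs` and the dictionary layout of a log entry are assumptions made for illustration.

```python
def poll_eventlogs(eventlogs, position, process):
    """Process every log with id > position, in id order.

    Returns the new position to be stored. On the first failure the
    position is left where it was, so the next poll retries the same
    id and nothing behind it is ever reached.
    """
    for log in sorted(eventlogs, key=lambda entry: entry["id"]):
        if log["id"] <= position:
            continue  # already handled in an earlier poll
        try:
            process(log)
        except Exception:
            return position  # stuck on this id
        position = log["id"]
    return position
```

With three logs where processing of id 2 always fails, repeated polls keep returning position 1, and id 3 is never processed.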
Components to use MQ
- OmeroSearch: Search messages should likely contain the entire serialized Lucene Document along with links to any files on disk which need processing. This would allow SearchBridges to remain functioning largely unchanged while upholding the single-writer requirement of Lucene.
- PixelData: Each time a request is made for an image pyramid that does not yet exist, the Pixel graph should be sent to an MQ instance which has access to the proper image files. From these, a pyramid file (e.g. /OMERO/Pixels/001_pyramid) should be generated. Ideally, no processing would take place if the pyramid has been generated in the meantime.
- OmeroFs: A monitoring service provides FS-level events about files which have been created in a configured directory. A second component, the DropBox client, receives these events and possibly kicks off the import of data. The DropBox client has extensive threading/waiting configuration to handle eventualities that may be better served by an MQ. (See #1429)
- OmeroScripts: Processor-0 icegridnode servers accept jobs to be run in sub-processes. At the moment, these are unbounded queues which can quickly overwhelm the server.
- LDAP Plugin (#7745): When users or groups are synchronized from LDAP, all members of those groups along with administrators should be notified by email. Since this could be a large number of emails to send, this should happen asynchronously.
- Delete, etc.: Any other background activities which are handled by ThreadPool are also candidates for MQ-integration.
Requirements for the MQ
- Deployment: Deploying across *nix and Windows should be trivial. Keeping multiple instances running and all the messages safely persisted (probably under /OMERO) should require minimal involvement from the sysadmin.
- Monitoring: Administrators and possibly even regular users should be able to view how many messages are currently piled up in a given queue. For example, if the search/indexer queue has thousands of waiting items, then it's likely that a just-saved object will not yet be searchable.
- Configuration: The pixel data use-case would benefit from being able to give priority to certain users, or at least to lower the priority of users who have filled up the queue very quickly.
- Failures: There should be a failure queue which can be accessed by administrators (and displayed in all GUIs and have notifications sent by email, etc) for any tasks which fail so as to not block further tasks. The current system simply logs at ERROR which is not sufficient.
- JVM restart: As mentioned above, the current Java processes (Indexer-0 and PixelData-0) have substantial heap sizes. If we could remove the Hibernate requirement of these processes by passing all the data they need (or having them function as clients only), then the Java process could be shut down while the queue is empty. This would require some form of launcher system, but that will likely be needed anyway for processing.
- Client submission: It's unlikely that clients will need to submit or handle items directly, but where they do, this interaction will need to be wrapped with an Ice interface.
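The monitoring and configuration points, together with the unbounded-queue problem noted for OmeroScripts, could be combined roughly as in the following sketch. `FairQueue` and its penalty rule (each pending item from a user pushes that user's next item one step back) are hypothetical choices made for illustration, not an agreed design.

```python
import heapq
import itertools
from collections import Counter


class FairQueue:
    """Hypothetical bounded priority queue with a per-user penalty."""

    def __init__(self, max_size=None):
        self._heap = []
        self._order = itertools.count()  # tie-breaker: submission order
        self._pending = Counter()        # items waiting, per user
        self._max_size = max_size

    def submit(self, user, item, priority=0):
        # Bounded: refuse work rather than overwhelm the server.
        if self._max_size is not None and len(self._heap) >= self._max_size:
            raise OverflowError("queue is full")
        # Lower numbers are served first; users with many pending
        # items get a larger number for each additional item.
        effective = priority + self._pending[user]
        heapq.heappush(
            self._heap, (effective, next(self._order), user, item)
        )
        self._pending[user] += 1

    def pop(self):
        _, _, user, item = heapq.heappop(self._heap)
        self._pending[user] -= 1
        return user, item

    def depth(self):
        """Number of waiting items, for display to administrators."""
        return len(self._heap)
```

If one user submits three pyramid requests and a second user then submits one, the second user's request is served after the first user's first request rather than after all three.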
Once the MQ has been successfully implemented, the same infrastructure could likely be reused for the entire client & server notification system (#2114/#860). This would allow events (like "This image has been deleted") to propagate to other clients interested in the same information. This would be a pubsub topic as opposed to a queue.
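The difference between the two delivery models can be shown with a small in-memory sketch. The class names `WorkQueue` and `Topic` are hypothetical and stand in for whatever the chosen MQ provides: each message on a queue goes to exactly one consumer, while each message published to a topic goes to every subscriber.

```python
from collections import deque


class WorkQueue:
    """Each message is taken by exactly one consumer."""

    def __init__(self):
        self._items = deque()

    def put(self, message):
        self._items.append(message)

    def take(self):
        # Returns None when nothing is waiting.
        return self._items.popleft() if self._items else None


class Topic:
    """Each message is delivered to every current subscriber."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, message):
        for callback in self._subscribers:
            callback(message)
```

A notification such as "This image has been deleted" fits the topic model, since every interested client should receive it, whereas an indexing job fits the queue model, since it should be carried out only once.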