Requirement #10940 (new)
|Reported by:||sleo-x||Owned by:|
|Cc:||llianas-x, analysis@…, mtbcarroll||Business Value:||n.a.|
|Total Story Points:||n.a.||Roif:||n.a.|
|Mandatory Story Points:||n.a.|
Integration of a graph-oriented DB into OMERO. The new service could be named "OMERO.graph", and its purpose would be to persistently store the network of connections between related OMERO objects.
At CRS4, we use OMERO to implement Biobank, a framework for managing biomedical data built on top of a customized (i.e., we developed our own models) version of OMERO core.
Our data model focuses on the "chain of custody" concept, where (almost) every object is linked to some other object that created it as the result of an action. The chain of events that leads to the creation of an object is dynamic and unpredictable: in some cases, we need to reconstruct it for an arbitrary object, e.g., to retrieve all genotyping information for a given experimental subject.
Doing this by directly querying the OMERO DB does not scale: it takes a large number of queries, and joins are not an option because the tables involved are not known in advance. This led us to the conclusion that we needed to complement the OMERO-based object repository with a fast graph traversal system which, at first, we implemented in-memory with python-graph. Needless to say, this is not much more scalable than the direct approach, mainly because of the memory requirements (we have to load almost all objects in advance).
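To make the traversal concrete, here is a minimal sketch of the in-memory approach. It uses plain standard-library Python rather than the python-graph API, and the class name, method names and object ids (CustodyGraph, add_action, "Individual:1", ...) are hypothetical and are not taken from Biobank.

```python
from collections import defaultdict, deque


class CustodyGraph:
    """Directed graph: an edge source -> target means 'target was created from source'."""

    def __init__(self):
        # Maps an object id to the ids of the objects created from it.
        self._children = defaultdict(set)

    def add_action(self, source_id, target_id):
        """Record that an action on source_id produced target_id."""
        self._children[source_id].add(target_id)

    def descendants(self, start_id):
        """Return every object reachable from start_id (breadth-first search)."""
        seen = set()
        todo = deque([start_id])
        while todo:
            node = todo.popleft()
            # .get() avoids creating empty entries for leaf nodes.
            for child in self._children.get(node, ()):
                if child not in seen:
                    seen.add(child)
                    todo.append(child)
        return seen


# Hypothetical chain of custody for two experimental subjects.
graph = CustodyGraph()
graph.add_action("Individual:1", "BloodSample:7")
graph.add_action("BloodSample:7", "DNASample:12")
graph.add_action("DNASample:12", "GenotypeDataSample:40")
graph.add_action("Individual:2", "BloodSample:8")

# Everything derived from the first subject, whatever the object types are.
derived = graph.descendants("Individual:1")
```

The traversal itself is cheap; the cost is that every edge must be held in memory before the first query can be answered.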
Recently, we experimented with a scalable solution that uses a graph-oriented DB (Neo4j) and a message queue (RabbitMQ). To keep Neo4j in sync with OMERO, we implemented a custom event-driven protocol based on the exchange of messages via RabbitMQ.
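The general shape of such an event-driven sync can be sketched as follows. This is only an illustration under loud assumptions: a standard-library queue.Queue stands in for RabbitMQ, a dict stands in for Neo4j, and the JSON message fields ("kind", "id", "source") are invented for this example and are not our actual protocol.

```python
import json
import queue


def make_event(kind, obj_id, source_id=None):
    """Serialize a hypothetical sync message (field names are assumptions)."""
    return json.dumps({"kind": kind, "id": obj_id, "source": source_id})


class GraphMirror:
    """Stand-in for the graph DB: maps each object id to the id it was created from."""

    def __init__(self):
        self.edges = {}

    def apply(self, message):
        """Apply one save/update/delete message to the mirror."""
        event = json.loads(message)
        if event["kind"] in ("save", "update"):
            self.edges[event["id"]] = event["source"]
        elif event["kind"] == "delete":
            self.edges.pop(event["id"], None)
        else:
            raise ValueError("unknown event kind: %r" % event["kind"])


def drain(channel, mirror):
    """Consume all pending messages; return how many were handled."""
    handled = 0
    while True:
        try:
            message = channel.get_nowait()
        except queue.Empty:
            return handled
        mirror.apply(message)
        handled += 1


# The repository side publishes one message per save/update/delete.
channel = queue.Queue()
channel.put(make_event("save", "BloodSample:7", "Individual:1"))
channel.put(make_event("save", "DNASample:12", "BloodSample:7"))
channel.put(make_event("delete", "DNASample:12"))

mirror = GraphMirror()
handled = drain(channel, mirror)
```

With the graph store embedded in the server, the publishing step and the queue would no longer be needed: the equivalent of apply() could be called from the save/update/delete hooks directly.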
Integrating Neo4j into OMERO as a service (embedding Neo4j in Java applications is directly supported) would greatly simplify the synchronization process as this could be triggered directly by OMERO save/update/delete events.
Any application that needs to keep track of relationships between OMERO objects that cannot be simply modeled through a SQL-like approach would benefit from this.
Simone Leo & Luca Lianas