Requirement #10940 (new)

Opened 11 years ago

Last modified 8 years ago

OMERO.graph service

Reported by: sleo-x
Owned by:
Priority: minor
Milestone: Unscheduled
Component: Services
Keywords: graph,nosql
Cc: llianas-x, analysis@…, mtbcarroll
Business Value: n.a.
Total Story Points: n.a.
Roif: n.a.
Mandatory Story Points: n.a.

Description

Integration of a graph-oriented DB into OMERO. The new service could be named "OMERO.graph", and its purpose would be to persistently store the network of connections between related OMERO objects.

At CRS4, we use OMERO to implement Biobank, a framework for managing biomedical data built on top of a customized (i.e., we developed our own models) version of OMERO core.

Our data model focuses on the "chain of custody" concept, where (almost) every object is linked to some other object that created it as the result of an action. The chain of events that leads to the creation of an object is dynamic and unpredictable: in some cases, we need to reconstruct it for an arbitrary object, e.g., to retrieve all genotyping information for a given experimental subject.

Doing this by directly querying the OMERO DB does not scale, due to the large number of queries and the fact that you can't perform joins, since you don't know in advance which tables are involved. This led us to the conclusion that we needed to complement the OMERO-based object repository with a fast graph traversal system, which we first implemented in memory with python-graph. Needless to say, this is not much more scalable than the direct approach, mainly because of the memory requirements (we have to load almost all objects in advance).
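To make the traversal problem concrete, here is a minimal in-memory sketch in plain Python. It does not use python-graph, and the object identifiers, type names and the `links` mapping are all invented for illustration; they are not Biobank's actual model. The point is only that one breadth-first walk finds every derived object without knowing in advance which object types (tables) lie along the path.

```python
from collections import deque

# Hypothetical chain of custody: each key is an object, each value lists the
# objects created from it by some action. All names and types are invented.
links = {
    "Individual:1": ["BloodSample:7"],
    "BloodSample:7": ["DNASample:12"],
    "DNASample:12": ["GenotypeData:40", "GenotypeData:41"],
    "GenotypeData:40": [],
    "GenotypeData:41": [],
}


def descendants(graph, start):
    """Return every object reachable from start, in breadth-first order."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


def genotyping_for(graph, subject):
    """Collect the GenotypeData objects derived, at any depth, from subject."""
    return [n for n in descendants(graph, subject)
            if n.startswith("GenotypeData:")]


print(genotyping_for(links, "Individual:1"))
```

The whole `links` mapping has to be in memory before the walk can start, which is the scalability limit described above.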

Recently, we experimented with a scalable solution that uses a graph-oriented DB (Neo4j) and a message queue (RabbitMQ). To keep Neo4j in sync with OMERO, we implemented a custom event-driven protocol based on the exchange of messages via RabbitMQ.

Integrating Neo4j into OMERO as a service (embedding Neo4j in Java applications is directly supported) would greatly simplify the synchronization process as this could be triggered directly by OMERO save/update/delete events.
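As a rough sketch of what event-triggered synchronization could look like, the Python below applies save/update/delete events to an in-memory stand-in for the graph store. Everything here is an assumption made for illustration: the `GraphMirror` class, the event dictionary layout (`kind`, `id`, `props`, `created_from`, `action`) and the object names are hypothetical, and neither Neo4j nor RabbitMQ nor any OMERO API is involved.

```python
class GraphMirror:
    """In-memory stand-in for the graph store (hypothetical, not Neo4j)."""

    def __init__(self):
        self.nodes = {}  # node id -> properties
        self.edges = {}  # (source id, target id) -> properties of the action

    def apply(self, event):
        """Apply one save/update/delete event, given as a dict."""
        kind = event["kind"]
        obj_id = event["id"]
        if kind == "save":
            self.nodes[obj_id] = dict(event.get("props", {}))
            source = event.get("created_from")
            if source is not None:
                # Record which object this one was created from, and how.
                self.edges[(source, obj_id)] = dict(event.get("action", {}))
        elif kind == "update":
            self.nodes.setdefault(obj_id, {}).update(event.get("props", {}))
        elif kind == "delete":
            self.nodes.pop(obj_id, None)
            # Drop every edge that touches the deleted node.
            self.edges = {pair: props for pair, props in self.edges.items()
                          if obj_id not in pair}
        else:
            raise ValueError("unknown event kind: %r" % (kind,))


# Hypothetical event stream, in the order the events would be emitted.
events = [
    {"kind": "save", "id": "Image:1", "props": {"name": "raw.tif"}},
    {"kind": "save", "id": "Image:2", "props": {"name": "processed.tif"},
     "created_from": "Image:1", "action": {"program": "deconvolution"}},
    {"kind": "update", "id": "Image:2", "props": {"name": "final.tif"}},
    {"kind": "delete", "id": "Image:1"},
]

mirror = GraphMirror()
for event in events:
    mirror.apply(event)
```

With the graph embedded in the server, `apply` would be called from the save/update/delete hooks themselves rather than from a message consumer, which removes the separate queue from the picture.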

Any application that needs to keep track of relationships between OMERO objects that cannot be simply modeled through a SQL-like approach would benefit from this.

Simone Leo & Luca Lianas

Change History (3)

comment:1 Changed 11 years ago by spli

  • Cc analysis@… added

comment:2 Changed 11 years ago by llianas-x

Josh's email in the ome-devel mailing list

Thanks, Simone!

Either here or on the ticket, could you add a pointer to your use of the neo4j graph API? I.e. what functionality/queries/etc would be critical for you to be able to migrate?

Cheers,
~Josh

I think that the OMERO.graph service should provide the following functionalities:

  • generic node/edge create/delete/update functionalities: this will allow users to create their own graphs (if this kind of generic functionality is appropriate)
  • automatic node/edge create/delete triggered by object creation/deletion within OMERO (this is what we're currently doing in Biobank). We have to decide which objects are going to be mapped as graph nodes or edges and how the trigger should be activated. For instance, we could create a node for every Image object within OMERO and, when a new Image is produced as the result of manipulations on an existing one, we could create an edge that connects the corresponding nodes to keep track of this event. Information on how the new Image was created (creation date, program used, etc.) could be added to the edge itself
  • simple node traversal, with functionalities like:
    • retrieve all nodes connected to a given one (we could also filter by edge type or direction, add a maximum depth, etc.)
    • retrieve all incoming/outgoing edges for a specific node
    • for trees, retrieve a node's ancestor and/or children (in Biobank the graph reduces to a tree)
  • custom Cypher queries, similar to the current custom HQL queries
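The create/delete and traversal operations listed above could be sketched as follows. This is an in-memory illustration under stated assumptions, not a proposed OMERO.graph interface and not Neo4j's API: the class `GraphSketch`, its method names, the `direction` values ("in", "out", "both") and the edge types are all hypothetical. Custom Cypher queries are not covered.

```python
from collections import deque


class GraphSketch:
    """Toy directed graph with typed edges (hypothetical names throughout)."""

    def __init__(self):
        self.nodes = {}   # node id -> properties
        self.edges = []   # (source, target, edge_type, properties)

    def create_node(self, node_id, **props):
        self.nodes[node_id] = props

    def create_edge(self, source, target, edge_type, **props):
        self.edges.append((source, target, edge_type, props))

    def delete_node(self, node_id):
        self.nodes.pop(node_id, None)
        self.edges = [e for e in self.edges if node_id not in (e[0], e[1])]

    def edges_of(self, node_id, direction="both"):
        """Return the incoming and/or outgoing edges of a node."""
        found = []
        for edge in self.edges:
            if direction in ("out", "both") and edge[0] == node_id:
                found.append(edge)
            elif direction in ("in", "both") and edge[1] == node_id:
                found.append(edge)
        return found

    def _steps(self, node_id, edge_type, direction):
        # Yield the neighbours reachable over one matching edge.
        for source, target, etype, _props in self.edges:
            if edge_type is not None and etype != edge_type:
                continue
            if direction in ("out", "both") and source == node_id:
                yield target
            if direction in ("in", "both") and target == node_id:
                yield source

    def connected(self, node_id, edge_type=None, direction="both",
                  max_depth=None):
        """Breadth-first search with optional type, direction and depth."""
        seen = {node_id}
        found = []
        queue = deque([(node_id, 0)])
        while queue:
            current, depth = queue.popleft()
            if max_depth is not None and depth >= max_depth:
                continue
            for nxt in self._steps(current, edge_type, direction):
                if nxt not in seen:
                    seen.add(nxt)
                    found.append(nxt)
                    queue.append((nxt, depth + 1))
        return found


g = GraphSketch()
for name in ("Image:1", "Image:2", "Image:3", "Image:4"):
    g.create_node(name)
# Edge properties record how the new Image was produced.
g.create_edge("Image:1", "Image:2", "derived", program="crop")
g.create_edge("Image:2", "Image:3", "derived", program="denoise")
g.create_edge("Image:1", "Image:4", "copied")

ancestors = g.connected("Image:3", direction="in")
children = g.connected("Image:1", edge_type="derived", direction="out",
                       max_depth=1)
```

In a tree-shaped graph such as Biobank's, `direction="in"` gives a node's ancestors and `direction="out"` with `max_depth=1` gives its direct children.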

comment:3 Changed 8 years ago by mtbcarroll

  • Cc mtbcarroll added