Requirement #4625 (closed)
OMERO/HIC data storage & analysis
|Reported by:|jamoore|Owned by:| |
|Cc:|jburel, cxallan, jrswedlow|Business Value:|n.a.|
|Total Story Points:|n.a.|RoI:|n.a.|
|Mandatory Story Points:|n.a.|
Description (last modified by jmoore)
The initial data used for the first demo(s) will be a representative
subset of the GoDARTS data provided by Andy and Alison (step 0). This will
approximate the project sets exported to "research data centers". The data
will then be loaded into OMERO.tables (step 1) via a command-line script. Some
work has been done on a generic loader, which may be reusable for this task;
otherwise, a custom script will be written.
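The loading step might be sketched as follows. This is a minimal stand-in, not the actual loader: the type-inference helper, column names, and sample values are all illustrative, and the real script would turn the typed columns into OMERO.tables column objects and push them to the server rather than stopping at the in-memory form.

```python
import csv
import io

def infer_type(values):
    """Guess a column type (long, double, or string) from sample values."""
    for caster, name in ((int, "long"), (float, "double")):
        try:
            for v in values:
                caster(v)
            return name
        except ValueError:
            continue
    return "string"

def load_table(csv_text):
    """Parse CSV text into (header, types, rows) ready for a table backend.

    A real loader would map these typed columns onto OMERO.tables columns
    and upload them; here we stop at the typed in-memory form.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = list(reader)
    types = [infer_type([row[i] for row in rows]) for i in range(len(header))]
    return header, types, rows

# Hypothetical GoDARTS-like sample (fabricated values, illustration only)
sample = "patient_id,hba1c,drug\n1001,7.2,metformin\n1002,6.8,insulin\n"
header, types, rows = load_table(sample)
print(types)  # ['long', 'double', 'string']
```

Type inference keeps the loader generic across project sets, which is the property that would make it reusable beyond this pilot.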
Once the data is in OMERO, another command-line script will be written to
export the OMERO.tables data to a CSV file (step 2). This represents the
current state of the researchers' workflow: it is easy to implement on the
OMERO side but adds no security to the system. The script should include
functionality for choosing columns and filtering the exported data, and its
usability should be validated by the researchers. Other options may need to be
added later: export to TSV or XLS, more advanced querying, etc.
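The column-selection and filtering behaviour of the export script might look like the sketch below. The function name and arguments are assumptions for illustration; the real script would read from OMERO.tables rather than an in-memory list.

```python
import csv
import io

def export_csv(header, rows, columns=None, row_filter=None):
    """Write selected columns of a table to CSV text.

    columns:    subset of header names to keep (None = all columns).
    row_filter: predicate over a {column: value} dict (None = keep all rows).
    A TSV variant would simply pass delimiter="\t" to csv.writer.
    """
    columns = columns or header
    idx = [header.index(c) for c in columns]
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(columns)
    for row in rows:
        record = dict(zip(header, row))
        if row_filter is None or row_filter(record):
            writer.writerow([row[i] for i in idx])
    return out.getvalue()

# Hypothetical sample rows (fabricated values, illustration only)
header = ["patient_id", "hba1c", "drug"]
rows = [["1001", "7.2", "metformin"], ["1002", "6.8", "insulin"]]
print(export_csv(header, rows, columns=["patient_id", "hba1c"],
                 row_filter=lambda r: float(r["hba1c"]) > 7.0))
```

Taking the filter as a predicate keeps the command-line surface small while leaving room for the more advanced querying mentioned above.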
The next steps will add security constraints on full export to support
the Safe Haven requirements. Any API methods used by partial export will have
auditing added (step 3), so that it is clear which researchers have accessed
what data. Further, classes of authorization will be added to each column in
the data set (step 4). Levels may include (from least to most secure): full
access, aggregations, aggregations without outliers, correlations, absolute
subset, and admin-only access. With column security in place, full export can
be disabled, leaving only partial data export (step 5). At this point, the
researchers should again be asked for feedback to determine what features must
be added to keep the modified workflow viable for them.
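Steps 3 and 4 could be sketched together as an audit decorator plus per-column authorization levels. Everything here is an assumption for discussion: the decorator, the `Access` level names (taken from the list above), and the column mapping are illustrative, and a real implementation would write audit records to a tamper-evident store rather than a logger.

```python
import functools
import logging
from enum import IntEnum

logging.basicConfig(level=logging.INFO)
audit_log = logging.getLogger("audit")

def audited(func):
    """Record who called which export method with which arguments (step 3)."""
    @functools.wraps(func)
    def wrapper(user, *args, **kwargs):
        audit_log.info("user=%s method=%s args=%r", user, func.__name__, args)
        return func(user, *args, **kwargs)
    return wrapper

class Access(IntEnum):
    """Per-column authorization levels, least to most secure (step 4)."""
    FULL = 0
    AGGREGATIONS = 1
    AGGREGATIONS_NO_OUTLIERS = 2
    CORRELATIONS = 3
    ABSOLUTE_SUBSET = 4
    ADMIN_ONLY = 5

# Hypothetical per-column assignments (illustration only)
COLUMN_ACCESS = {"patient_id": Access.ADMIN_ONLY, "hba1c": Access.AGGREGATIONS}

@audited
def exportable_columns(user, user_level):
    """Return the columns this user's clearance level permits exporting."""
    return [c for c, lvl in COLUMN_ACCESS.items() if lvl <= user_level]

print(exportable_columns("alice", Access.AGGREGATIONS))  # ['hba1c']
```

With the level check in place at the export boundary, disabling full export (step 5) reduces to refusing any request whose column set exceeds the caller's clearance.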
The final step (6) for the initial phase is then to allow researchers to
submit a script for execution on the entire data set.
- we will work with an anonymised subset of the GoDARTS data as the basis for the pilot
- we will define a couple of analyses that can be demonstrated
- we will define the requirements, stories and tasks
- we will work on porting a couple of key scripts (e.g. genome imputation, date of death validation, drug exposure)
- the initial focus (i.e. for July) will be on developing basic tools for data loading, querying, and export, all of which are to be audited; the researchers will then use the exports as usual with their preferred tools.
- the overall focus will be on developing APIs for a key set of defined tools (e.g. R, STATA, PLINK, VCFTOOLS) so that these can interact directly with the OMERO architecture
Change History (7)
comment:1 Changed 10 years ago by jmoore
- Summary changed from Safe Haven data storage & analysis to OMERO/HIC data storage & analysis