User Story #6320 (closed)

Opened 10 years ago

Closed 8 years ago

HIC: Dataset silo layer

Reported by: jamoore
Owned by:
Priority: critical
Milestone: Unscheduled
Component: General
Keywords: n.a.
Cc: hic@…
Story Points: n.a.
Sprint: n.a.
Importance: n.a.
Total Remaining Time: n.a.
Estimated Remaining Time: n.a.

Description (last modified by jburel)

So far we have focused on the project silo model, the data for which was manually anonymised, extracted and transferred from the mssql servers on the NHS network at HIC to the HIC/OMERO server on the UNI network.

It is going to be useful for other stories that we develop the dataset silo model as a parent layer of the project silo within this pilot.

The dataset silo is a complete anonymised mirror of the NHS datasets held on the mssql servers at HIC. The project silos are then prepared on a project-by-project basis from the dataset silo, all within the HIC/OMERO architecture. This will include anonymisation, data cleaning and modelling steps.

Josh has developed a diagram (pdf) outlining the proposed structure and the relationship with existing tickets.

Each of the clinical data files we've provided from the GoDARTS project represents a dataset. For instance, the files for SMR, RX (prescribing), BIOCHEM, etc. are all separate datasets. I think the only way we can work this in terms of the pilot project governance is to use the GoDARTS data as the real data and create a mass of fake data based on the schemas. These would then form the dataset silo, from which we can rebuild the project silo and test that users only get to work with the project data they are supposed to.
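As a rough illustration of the fake-data idea above, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `SMR_SCHEMA` columns, the `fake_rows` and `build_project_silo` helpers, and the use of a `patient_id` column to decide what belongs to a project. It is not how the HIC/OMERO pilot stores or filters data.

```python
import random
import string

# Hypothetical schema for one dataset; column names and kinds are illustrative only.
SMR_SCHEMA = {"patient_id": "id", "admission_year": "year", "diagnosis_code": "code"}


def fake_value(kind, rng):
    """Return one fake value for a column kind."""
    if kind == "id":
        return "".join(rng.choice(string.digits) for _ in range(10))
    if kind == "year":
        return rng.randint(1990, 2010)
    if kind == "code":
        return rng.choice(string.ascii_uppercase) + f"{rng.randint(0, 99):02d}"
    raise ValueError(f"unknown column kind: {kind}")


def fake_rows(schema, n, seed=0):
    """Generate n fake rows matching the schema; seeded so runs are repeatable."""
    rng = random.Random(seed)
    return [{col: fake_value(kind, rng) for col, kind in schema.items()}
            for _ in range(n)]


def build_project_silo(dataset_silo, approved_ids):
    """Keep only the rows whose patient_id is approved for the project."""
    return {name: [row for row in rows if row["patient_id"] in approved_ids]
            for name, rows in dataset_silo.items()}
```

A test along these lines would mix a few "real" rows with the fake ones in the dataset silo, rebuild the project silo, and assert that none of the fake rows appear in it.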

There are various reasons why this is useful: simplifying/automating data flows, governance and audit, and end-user aggregate queries (e.g. study estimates). Of particular interest are governance and the following 2 use cases:

  1. The audit trail should allow dataset inspection, e.g. which projects have used SMR. In theory a data owner (the custodians, patients or Caldicotts) may want to see how safe our model is or who is using 'their' data.
  2. The embedding of risk prediction models, privacy impact assessments and disclosure controls. These are mechanisms that are being discussed in the SHIP blueprint as ways to facilitate risk-based and proportionate data governance.
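The first use case could be sketched as a simple in-memory audit trail. The `AuditTrail` class, its method names and the event fields below are hypothetical and only show the kind of query a data owner might run. A real implementation would presumably sit on top of whatever event log the server keeps.

```python
class AuditTrail:
    """Hypothetical record of which project and user accessed which dataset."""

    def __init__(self):
        self._events = []

    def record(self, project, dataset, user):
        """Log one access of a dataset by a user working in a project."""
        self._events.append({"project": project, "dataset": dataset, "user": user})

    def projects_using(self, dataset):
        """Return the sorted names of projects that have used the dataset."""
        return sorted({e["project"] for e in self._events if e["dataset"] == dataset})

    def users_of(self, dataset):
        """Return the sorted names of users who have accessed the dataset."""
        return sorted({e["user"] for e in self._events if e["dataset"] == dataset})
```

With events recorded for two projects against SMR, `projects_using("SMR")` would list both, which is the "which projects have used SMR" question from the list above.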

The following points are from the Jan 2011 draft of the blueprint. Once the final version is available I'll update this story, although this may need pulling out into a separate story.

  • Assessing privacy risks is an integral component of a data controller’s responsibilities and should form a central part of their privacy policy. This process should include the identification of confidentiality, security and privacy risks of any data handling including linkages, storage and access considerations. The Information Commissioner's Office have developed a privacy impact assessment handbook http://bit.ly/A2cga containing guidance for carrying out risk assessments.
  • Appropriate disclosure control should be applied to all outputs; this should be carried out under the authority and oversight of the designated privacy officer. The Information Services Division (ISD) of NHS Scotland have developed various documents on data protection and confidentiality http://bit.ly/nPOMZY and of particular importance is their protocol for disclosure control.

Implementation

A dataset silo in OMERO should be exportable to a re-anonymized project silo. If the work for #4652 (server-side API) is completed, then this work can be implemented as sub-directories in the silo fs repository. Along with the links between input tables and output tables, metadata about any operations (see #6321) that were performed on the data should also be recorded.
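One way to picture the recorded links and operation metadata is a small manifest written into a per-project sub-directory. This is only a sketch under assumptions: the `record_export` function, the `manifest.json` file name and the directory layout are invented here for illustration and are not part of the OMERO API or of #4652.

```python
import json
from pathlib import Path


def record_export(silo_root, project, links, operations):
    """Write a manifest describing one export from the dataset silo.

    links:      list of {"input": ..., "output": ...} table pairs
    operations: list of dicts describing each step applied to the data
    """
    # Hypothetical layout: one sub-directory per project silo.
    project_dir = Path(silo_root) / project
    project_dir.mkdir(parents=True, exist_ok=True)

    manifest = {"project": project, "links": links, "operations": operations}
    path = project_dir / "manifest.json"
    path.write_text(json.dumps(manifest, indent=2))
    return path
```

Keeping the manifest next to the exported tables means that inspecting a project sub-directory shows both the data and how it was derived.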

Change History (6)

comment:1 Changed 10 years ago by jmoore

  • Cc hic@… added; szwells removed
  • Description modified (diff)
  • Milestone changed from Unscheduled to OMERO-Beta4.3.2

comment:2 Changed 9 years ago by adjudson

This is related to (and possibly a duplicate of) user story #6330.

comment:3 Changed 9 years ago by jmoore

  • Description modified (diff)
  • Summary changed from HIC: Export dataset to project to HIC: Dataset silo layer

Merging description from #6330

comment:4 Changed 9 years ago by jburel

  • Description modified (diff)
  • Milestone changed from OMERO-Beta4.3.2 to OME-5.0

comment:5 Changed 9 years ago by jmoore

  • Milestone changed from OMERO-Beta4.4 to Unscheduled

Moving to "Unscheduled" as 4.4.0 release approaches.

comment:6 Changed 8 years ago by jamoore

  • Resolution set to invalid
  • Status changed from new to closed

Closing all specific HIC tasks.
