Notice: In order to edit this ticket you need to be either: a Product Owner, The owner or the reporter of the ticket, or, in case of a Task not yet assigned, a team_member"

Task #10304 (new)

Opened 6 years ago

Last modified 3 years ago

Bug: Original metadata is unquoted and not safe to parse

Reported by: rleigh Owned by:
Priority: critical Milestone: Metadata
Component: OmeroPy Version: 4.4.9
Keywords: n.a. Cc: jburel, florian.leitner@…, java@…
Resources: n.a. Referenced By: n.a.
References: n.a. Remaining Time: n.a.
Sprint: n.a.

Description

Affects 4.4+

Original metadata can potentially contain both = symbols (in both key and value) and can also contain newlines. Currently, when retrieving the original metadata, for example with:

      <XMLAnnotation ID="Annotation:65">
        <Value>
           <OriginalMetadata xmlns="openmicroscopy.org/OriginalMetadata">
              <Key>test User-Comment[0]</Key>
              <Value>Test test test test 
Entry entry entry 
Test test test test</Value>
           </OriginalMetadata>
        </Value>
     </XMLAnnotation>

this would result in

test User-Comment[0]=Test test test test 
Entry entry entry 
Test test test test

With the current server and gateway code, there is no quoting, so the second two lines will not be associated with the first.

It would be better if the data provided by the server was properly quoted so that the data is unabiguous and parsing can be robust. This could include:

  • surrounding key and value with quotes (")
  • allowing quotes inside keys and values with escaping (\")
  • allowing escaping of escaping (
    )

The server can encode keys and values using this simple scheme. And the client/gateway code can decode it before presenting it to the user. Keys and values are between unquoted quotes. And the escaping can be stripped from the separated keys and values.

Regards,
Roger

Change History (16)

comment:1 Changed 6 years ago by rleigh

The escaping of escaping which trac mangled was just two backslashes.

comment:2 Changed 6 years ago by jamoore

  • Cc jburel florian.leitner@… java@… added; jburel florian.leitner@… removed
  • Milestone changed from Unscheduled to 5.0.0-beta2
  • Version set to 4.4.9

Mark: could you look at this in slot it into other remaining "original metadata" work. This may not longer be an issue with OMERO5 in which case we'll need to decide whether or not we will backport any fixes to OMERO4.

comment:3 Changed 6 years ago by mtbcarroll

Another difficult example is the value for Recording #1 Notes from zeiss-lsm/sample files.mdb/sample files.mdb [XY-ch].

Does the OMERO 5 server still issue these text-format key-value lists (beyond the original_metadata.txt generated in a pre-upgrade import)?

comment:4 Changed 6 years ago by jamoore

  • Owner set to mlinkert

Melissa: any thoughts?

comment:5 Changed 6 years ago by mlinkert

Easiest is probably to add to this:

https://github.com/openmicroscopy/bioformats/blob/develop/components/scifio/src/loci/formats/FormatReader.java#L275

to remove and/or escape '=', '\n', '\r', ', and?'. Then everything received by the server should be valid and parseable. Are there any other characters that you all have encountered that have caused a problem?

comment:6 Changed 6 years ago by mlinkert

Roger or Mark, did either of you have a test case that I can use to make sure I'm on the right track here (e.g. import a specific file and click a specific button in Insight)?

comment:7 Changed 6 years ago by rleigh

Hmm, looking back the problem was originally identified when adding support for LIF User-Comment fields, so it's quite possible that the LIF file(s) containing User-Comment may produce "unsafe" original metadata. But I can't recall offhand which LIF files contained the test comments. However, it may well be sufficient just to add the example annotation above to an OME-TIFF and import that?

comment:8 Changed 6 years ago by mtbcarroll

Melissa, data_repo/test_images_metadata/zeiss-lsm/sample files.mdb/sample files.mdb has some alarming stuff for the value of Recording #1 Notes key of sample files.mdb [XY-ch]. Also, I believe that data_repo/test_images_good/dv/mt1_R3D_D3D.dv has a key named Min, Max, Mean (w=617.0 nm). (So, both those keys have a "special" character.)

comment:9 Changed 6 years ago by mlinkert

  • Owner mlinkert deleted

https://github.com/openmicroscopy/bioformats/pull/821 opened to fix this going forward. Any data already imported will still have this problem, though - that will likely need to be sorted out in OMERO.

comment:10 Changed 6 years ago by jamoore

  • Priority changed from major to critical

Leaving the PR 821 part of this ticket for beta2 and pushing the rest of the work to beta3.

comment:11 Changed 6 years ago by jamoore

  • Milestone changed from 5.0.0-beta2 to 5.0.0-beta3

(forgot to change milestone)

comment:12 Changed 4 years ago by jamoore

  • Milestone changed from 5.1.0 to 5.1.1

comment:13 Changed 4 years ago by jburel

  • Milestone changed from 5.1.1 to 5.1.3

Moving to 5.1.3 when we work on import and annotation support

comment:14 Changed 4 years ago by jamoore

  • Milestone changed from 5.1.4 to OMERO-5.1.4

Splitting 5.1.4 due to milestone decoupling

comment:15 Changed 4 years ago by sbesson

  • Milestone changed from OMERO-5.1.4 to 5.x

As per discussion today, pushing to 5.x

comment:16 Changed 3 years ago by jamoore

  • Milestone changed from 5.x to Metadata
Note: See TracTickets for help on using tickets. You may also have a look at Agilo extensions to the ticket.

1.3.13-PRO © 2008-2011 Agilo Software all rights reserved (this page was served in: 0.154568 sec.)

We're Hiring!