Task #11341 (closed)
Map added to new Annotation and other elements
Reported by: | ajpatterson | Owned by: | ajpatterson |
---|---|---|---|
Priority: | major | Milestone: | 5.1.0-m1 |
Component: | Specification | Version: | 5.0.0-beta1 |
Keywords: | schema | Cc: | jamoore, mlinkert, jburel, rleigh |
Resources: | n.a. | Referenced By: | n.a. |
References: | n.a. | Remaining Time: | n.a. |
Sprint: | n.a. |
Description (last modified by ajpatterson)
Add a representation of a map (database H store column) to the schema.
This needs to be a compact representation of key-value pairs that can be stored in the database as an H store column.
e.g.
    <SA:MapAnnotation ID="Annotation:2">
      <SA:Value Terminator="|" Separator=",">
        <Pairs>a,1|b,2|c,3</Pairs>
      </SA:Value>
    </SA:MapAnnotation>
These maps will be added as a new complex type, which will be used in a new kind of annotation, MapAnnotation. The new map complex type can also be added directly to some of the current elements. Suggestions so far have been Instrument and WellSample.
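For illustration, a minimal Python sketch of how a consumer might decode the Pairs string from the example above. It assumes that keys and values never contain the terminator and that keys never contain the separator, since the ticket does not define any escaping rules. The function name `parse_pairs` is hypothetical and not part of the schema or of Bio-Formats.

```python
def parse_pairs(text, separator=",", terminator="|"):
    """Decode a delimited string such as 'a,1|b,2|c,3' into a dict.

    Assumption: no escaping is defined, so the terminator must not occur
    inside a key or value, and the separator must not occur inside a key.
    """
    result = {}
    for record in text.split(terminator):
        if not record:
            continue  # tolerate an empty record, e.g. after a trailing terminator
        # Split on the first separator only, so values may contain it.
        key, found, value = record.partition(separator)
        if not found:
            raise ValueError(f"missing separator in record {record!r}")
        result[key] = value
    return result


pairs = parse_pairs("a,1|b,2|c,3", separator=",", terminator="|")
print(pairs)
```

The same function handles other delimiter choices, for example `parse_pairs("key1=value1\nkey2=value2", separator="=", terminator="\n")`.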
Change History (12)
comment:1 Changed 11 years ago by ajpatterson
comment:2 Changed 11 years ago by ajpatterson
- Description modified (diff)
- Status changed from new to accepted
comment:3 Changed 11 years ago by rleigh
I would suggest that to represent key=value pairs, the default separator should be "=" and the default terminator should be "\n" (newline).
comment:4 Changed 11 years ago by ajpatterson
I do not want to define defaults in the XML, if they are present then it can be hard to tell if the value is set to the default or just not provided.
comment:5 Changed 11 years ago by rleigh
I think it's entirely reasonable to pick sane defaults and stick with them. Why is a newline not a sensible default terminator? Very few people will want or need to change that. And while a variety of separators may get used, using "=" is also sensible. It's not CSV, so comma isn't an obvious choice. Storing key-value pairs was the original intent, so let's have it do that by default!
Also, what's the need for the Value and Pair elements? Why can't it be this simple and compact?:
    <SA:MapAnnotation ID="Annotation:2">
    key1=value1
    key2=value2
    key3=value3
    </SA:MapAnnotation>
or, with a separator specified
    <SA:MapAnnotation ID="Annotation:2" Separator=": ">
    key1: value1
    key2: value2
    key3: value3
    </SA:MapAnnotation>
comment:6 Changed 11 years ago by jamoore
I'd add a proposal for something more like: <Map><entry key="...">value</entry></Map>. ("Pair" would also work) This will limit the keys to pretty sensible values, while allowing values to vary more widely.
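As a rough sketch of what the element-per-entry form would mean for a consumer, the following Python reads such a map with the standard library XML parser alone. The un-namespaced `Map` and `entry` names follow the snippet in this comment and are assumptions; the final schema may differ. `parse_map_element` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Sample document in the proposed element-per-entry form (names assumed).
xml_text = (
    '<Map>'
    '<entry key="a">1</entry>'
    '<entry key="b">2</entry>'
    '<entry key="c">3</entry>'
    '</Map>'
)


def parse_map_element(xml_text):
    """Build a dict from <entry key="...">value</entry> children of the root."""
    root = ET.fromstring(xml_text)
    # entry.text is None for an empty element, so fall back to "".
    return {entry.get("key"): (entry.text or "") for entry in root.findall("entry")}


print(parse_map_element(xml_text))
```

No custom splitting logic is needed here; the XML parser also takes care of escaping inside values.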
comment:7 Changed 11 years ago by rleigh
@jamoore, this was what Andrew's original version did, which we discussed last week. The disadvantages of this approach are that it retains a pile of XML bloat, and also doesn't allow verbatim copying of original metadata into the XML; the reason for the flexible separator/terminator was to permit direct copying of original metadata. Not having to translate into XML was one of the primary features we discussed.
comment:8 Changed 11 years ago by jamoore
The flip side, of course, is that it requires all *consumers* of the XML to have or write their own parser. The bloat is about 13 (<e k=""></e>) characters per entry as opposed to 2 (,|), which I would argue is worth it to prevent reparsing. I certainly hadn't thought of being able to copy-n-paste into the XML as a requirement, but perhaps if that is a design goal and we're expecting to get huge maps, we could just leave those items in external files? (This is something we will equally need for OMERO.tables, i.e. having csv-like data that has the Right Thing done with it, without needing parsing into XML.)
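The per-entry overhead can be checked with a small Python sketch that serializes the same pairs both ways and subtracts the payload. The short `e`/`k` names follow the snippet above; both encoders are hypothetical, perform no escaping, and the delimited form writes a terminator after every pair (including the last) so the per-entry count is uniform.

```python
def encode_delimited(pairs, separator=",", terminator="|"):
    """Delimited form, with a terminator after every pair (assumption)."""
    return "".join(f"{k}{separator}{v}{terminator}" for k, v in pairs.items())


def encode_elements(pairs, tag="e", attr="k"):
    """Element-per-entry form; no XML escaping is applied in this sketch."""
    return "".join(f'<{tag} {attr}="{k}">{v}</{tag}>' for k, v in pairs.items())


def overhead_per_entry(encoded, pairs):
    """Characters per entry that are markup rather than key/value data."""
    payload = sum(len(k) + len(v) for k, v in pairs.items())
    return (len(encoded) - payload) / len(pairs)


pairs = {"a": "1", "b": "2", "c": "3"}
print(overhead_per_entry(encode_delimited(pairs), pairs))
print(overhead_per_entry(encode_elements(pairs), pairs))
```

With these exact names the element form costs 12 characters of markup per entry and the delimited form 2; longer tag or attribute names raise the first figure accordingly.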
comment:9 Changed 11 years ago by rleigh
I wouldn't say direct copy-paste, but rather that it's possible to preserve the original metadata verbatim without any loss, and to attach it to the parts of the model where it makes sense. Some formats are already generating several thousand annotations, and the translation to an individual annotation per key does not scale in terms of space wastage and parsing overhead; likewise the markup in the <Value> tag ends up adding a vast amount of wastage relative to the original data size. While it does mean parsing this properly occurs outside the XML parser, it's no different than e.g. Timestamp or the other custom types in this respect, and it can map directly to the map/hash implementation of the language in use.
Using ZVI as an example, this attaches a huge quantity of key-value pairs to each plane. Currently this is all global metadata, but a big file can end up with many thousands of values, and this will allow them to be attached, essentially verbatim, exactly where they belong in the same place as the original format (well, translated from binary to text, but essentially identical). And for formats which store the metadata as plain text, the metadata can be copied directly by the reader.
comment:10 Changed 11 years ago by jamoore
What do you mean verbatim? What format are they in the file? What other file formats have this type of format? Will we support all of them? Is 1000x10 (10k) really that much worse than 1000x2 (2k) for the benefit of a parseable format? We might need to move this to voice to not annoy everyone ;)
comment:11 Changed 10 years ago by ajpatterson
- Resolution set to fixed
- Status changed from accepted to closed
comment:12 Changed 10 years ago by ajpatterson
- Milestone changed from 5.x to 5.1.0-m1
Now updated to not use XML but define a separator and terminator instead.
see https://github.com/qidane/bioformats/compare/timestamps...map-annotation