Task #11894 (new)
Opened 11 years ago
Last modified 8 years ago
Review 256-char-limit on OMERO model text properties
Reported by: | jburel | Owned by: | mtbcarroll |
---|---|---|---|
Priority: | major | Milestone: | OME-Files |
Component: | Model | Version: | 4.4.9 |
Keywords: | n.a. | Cc: | ux@…, jballanco-x, cxallan |
Resources: | n.a. | Referenced By: | n.a. |
References: | n.a. | Remaining Time: | n.a. |
Sprint: | n.a. |
Description
Currently model has no limit for channel's name. The db limits to 256 characters.
This leads to error like #11893
256 length seems reasonable. That is also the limit we have for object's name e.g. image.
Change History (27)
comment:1 Changed 11 years ago by jburel
- Milestone changed from 5.x to 5.1.0
comment:2 Changed 10 years ago by jamoore
- Cc ajpatterson added
- Owner changed from ajpatterson to mtbcarroll
Perhaps worth a quick thought about all of our "name" attributes?
comment:3 Changed 10 years ago by mtbcarroll
Perhaps, but I doubt I'll be changing the OME XML object model for 5.1.0!
comment:4 Changed 10 years ago by jamoore
No indeed. :) I was more thinking if everyone was still agreed that a length of 256 for (display)name etc. was still ok. I.e. DB changes only at this point, and this then moves on to being a model change that we have to decide on.
comment:5 Changed 10 years ago by mtbcarroll
- Cc ux@… added; jamoore wamoore removed
It seems perfectly reasonable to me -- I think it counts in Unicode code points rather than bytes -- but those who know more about microscopy may differ.
comment:6 Changed 10 years ago by mtbcarroll
Does anything need doing here for 5.1.0?
comment:7 Changed 10 years ago by jburel
cf. discussion on Tuesday about map annotation and length.
256 was not enough. We could only have a db chance this time, and model as suggested by Josh for 5.2
The change should not be limited to channel only
comment:8 Changed 10 years ago by mtbcarroll
- Milestone changed from 5.1.0 to 5.2.0-m1
Should review string property limits in general. User and group names needn't be enormous, but perhaps Hibernate annotation generation should default to length=4096 for most properties.
comment:9 Changed 9 years ago by mtbcarroll
Search for ": string" on http://www.openmicroscopy.org/site/support/omero5.2/developers/Model/EveryObject.html
comment:10 Changed 9 years ago by jamoore
- Milestone changed from 5.2.0 to OMERO-5.2.0
Splitting due to milestone decoupling.
comment:11 Changed 9 years ago by mtbcarroll
To make a start on aligning DB text columns with the OME-XML schema I have created https://docs.google.com/spreadsheets/d/12SezN7oLUbxb7QPKM6Z8PaEXtWS8SxuIOd66-Eobdm8/edit which includes the DB text columns that I suspect may interestingly appear in OME-XML XSDs.
comment:12 Changed 9 years ago by mtbcarroll
One proposal just to throw out there out of curiosity: Why not just make all the text properties TEXT in the OMERO DB? Why limit the lengths in either the DB or OME-XML? Do we get storage benefits in PostgreSQL beyond what I'd expect? Or is the benefit coming more from the OME-XML (or UI) side?
comment:13 follow-up: ↓ 15 Changed 9 years ago by jamoore
One reason might be the arbitrary 4000 char limit in Oracle. Perhaps a brief discussion with JB? Otherwise, no objections from me.
comment:14 Changed 9 years ago by mtbcarroll
- Cc jballanco-x added; ajpatterson removed
comment:15 in reply to: ↑ 13 Changed 9 years ago by jballanco-x
Replying to jamoore:
One reason might be the arbitrary 4000 char limit in Oracle. Perhaps a brief discussion with JB? Otherwise, no objections from me.
Well, if we sub NCLOBs for the current VARCHAR2s, then the limit increases to 128 TB, which I hope should be sufficient!
More concerning is the potential impact on storage. Based on past experience, subbing TEXT fields for VARCHARs can have non-negligible impact on on-disk layout for the tables involved. From the spreadsheet it seems like the majority of the columns that are being proposed to switch to TEXT are "description", which makes me think a better solution might be to break "description" out as its own entity.
comment:16 follow-up: ↓ 17 Changed 9 years ago by mtbcarroll
Thanks! https://wiki.postgresql.org/wiki/TOAST looks rather encouraging TEXT-wise and http://www.postgresql.org/docs/9.4/static/storage-toast.html has information too. Is the past experience an Oracle-side issue where even shorter TEXT values aren't staying in-row?
comment:17 in reply to: ↑ 16 Changed 9 years ago by jballanco-x
Actually my past experience is related to MySQL (though I'll do a bit of research to see what Oracle has to say as well). Specifically, from this page:
- Instances of BLOB or TEXT columns in the result of a query that is processed using a temporary table causes the server to use a table on disk rather than in memory because the MEMORY storage engine does not support those data types (see Section 8.4.4, “How MySQL Uses Internal Temporary Tables”). Use of disk incurs a performance penalty, so include BLOB or TEXT columns in the query result only if they are really needed. For example, avoid using SELECT *, which selects all columns.
So even though the tables themselves are packed efficiently (similar to Postgres), the separate storage requirements for large text lead to degraded performance on queries...
...IF those queries are doing select * ... or otherwise including the text fields when they're not needed (which is the situation I found myself in the past). So long as our ORM is well behaved and we're careful with the custom queries we write, then we should be ok. The suggestion to put the text fields in a separate table is just a "best practice" I've followed in the past that makes it more difficult to accidentally write a query against a table containing a TEXT column that degrades performance.
comment:18 Changed 9 years ago by jamoore
Hibernate will definitely actively load those fields unless they are a subobject (like Maps)
comment:19 Changed 9 years ago by jburel
- Milestone changed from OMERO-5.2.1 to OMERO-5.2.2
Milestone OMERO-5.2.1 deleted
comment:20 Changed 9 years ago by jburel
- Milestone changed from OMERO-5.2.2 to OMERO-5.2.1
Milestone OMERO-5.2.2 deleted
comment:21 Changed 9 years ago by mtbcarroll
Following further discussion the plan is to go ahead and make the change to TEXT in the expectation that the impact on Oracle won't be terrible; the PostgreSQL anecdotes generally look good. In the future we should watch out for any impact on queries (we don't want to often be eagerly fetching enormous values) in case we do have to somehow break description, etc. out more separately.
comment:22 Changed 9 years ago by jburel
- Milestone changed from OMERO-5.2.2 to OME-Files
comment:23 Changed 8 years ago by mtbcarroll
A larger issue may be for clients: accidentally retrieving en masse, even attempting to display, what may be truly huge strings.
comment:24 Changed 8 years ago by mtbcarroll
At the moment giving a dataset a really long name in Insight has it report an exception: DSAccessException: Cannot access data. Cannot update the object.
comment:25 Changed 8 years ago by mtbcarroll
In glancing through the model the most obvious-to-me candidates for switching to TEXT are the name property of Annotation, Dataset, Folder, Image, LogicalChannel, OriginalFile, Plate, PlateAcquisition, Project, Reagent, RenderingDef, Roi, Screen, StageLabel. Also, the properties ImportJob.imageName, ImportJob.imageDescription, Screen.protocolDescription, Screen.reagentSetDescription. Does this proposal seem okay for OMERO 5.3?
comment:26 Changed 8 years ago by mtbcarroll
- Cc cxallan added
comment:27 Changed 8 years ago by mtbcarroll
- Summary changed from Db model difference Channel's name to Review 256-char-limit on OMERO model text properties
Opened https://github.com/openmicroscopy/openmicroscopy/pull/4882 to fix the more pressing cases, leaving remaining properties for subsequent review.
This is an easy model fix that should be done in 5.1