Warning: Can't synchronize with repository "(default)" (/home/git/ome.git does not appear to be a Git repository.). Look in the Trac log for more information.
Notice: In order to edit this ticket you need to be either: a Product Owner, The owner or the reporter of the ticket, or, in case of a Task not yet assigned, a team_member"

Task #11894 (new)

Opened 10 years ago

Last modified 8 years ago

Review 256-char-limit on OMERO model text properties

Reported by: jburel Owned by: mtbcarroll
Priority: major Milestone: OME-Files
Component: Model Version: 4.4.9
Keywords: n.a. Cc: ux@…, jballanco-x, cxallan
Resources: n.a. Referenced By: n.a.
References: n.a. Remaining Time: n.a.
Sprint: n.a.

Description

Currently model has no limit for channel's name. The db limits to 256 characters.
This leads to error like #11893
256 length seems reasonable. That is also the limit we have for object's name e.g. image.

Change History (27)

comment:1 Changed 10 years ago by jburel

  • Milestone changed from 5.x to 5.1.0

This is an easy model fix that should be done in 5.1

comment:2 Changed 9 years ago by jamoore

  • Cc ajpatterson added
  • Owner changed from ajpatterson to mtbcarroll

Perhaps worth a quick thought about all of our "name" attributes?

comment:3 Changed 9 years ago by mtbcarroll

Perhaps, but I doubt I'll be changing the OME XML object model for 5.1.0!

comment:4 Changed 9 years ago by jamoore

No indeed. :) I was more thinking if everyone was still agreed that a length of 256 for (display)name etc. was still ok. I.e. DB changes only at this point, and this then moves on to being a model change that we have to decide on.

comment:5 Changed 9 years ago by mtbcarroll

  • Cc ux@… added; jamoore wamoore removed

It seems perfectly reasonable to me -- I think it counts in Unicode code points rather than bytes -- but those who know more about microscopy may differ.

comment:6 Changed 9 years ago by mtbcarroll

Does anything need doing here for 5.1.0?

comment:7 Changed 9 years ago by jburel

cf. discussion on Tuesday about map annotation and length.
256 was not enough. We could only have a db chance this time, and model as suggested by Josh for 5.2
The change should not be limited to channel only

comment:8 Changed 9 years ago by mtbcarroll

  • Milestone changed from 5.1.0 to 5.2.0-m1

Should review string property limits in general. User and group names needn't be enormous, but perhaps Hibernate annotation generation should default to length=4096 for most properties.

comment:10 Changed 9 years ago by jamoore

  • Milestone changed from 5.2.0 to OMERO-5.2.0

Splitting due to milestone decoupling.

comment:11 Changed 9 years ago by mtbcarroll

To make a start on aligning DB text columns with the OME-XML schema I have created https://docs.google.com/spreadsheets/d/12SezN7oLUbxb7QPKM6Z8PaEXtWS8SxuIOd66-Eobdm8/edit which includes the DB text columns that I suspect may interestingly appear in OME-XML XSDs.

comment:12 Changed 9 years ago by mtbcarroll

One proposal just to throw out there out of curiosity: Why not just make all the text properties TEXT in the OMERO DB? Why limit the lengths in either the DB or OME-XML? Do we get storage benefits in PostgreSQL beyond what I'd expect? Or is the benefit coming more from the OME-XML (or UI) side?

comment:13 follow-up: Changed 9 years ago by jamoore

One reason might be the arbitrary 4000 char limit in Oracle. Perhaps a brief discussion with JB? Otherwise, no objections from me.

comment:14 Changed 9 years ago by mtbcarroll

  • Cc jballanco-x added; ajpatterson removed

comment:15 in reply to: ↑ 13 Changed 9 years ago by jballanco-x

Replying to jamoore:

One reason might be the arbitrary 4000 char limit in Oracle. Perhaps a brief discussion with JB? Otherwise, no objections from me.

Well, if we sub NCLOBs for the current VARCHAR2s, then the limit increases to 128 TB, which I hope should be sufficient!

More concerning is the potential impact on storage. Based on past experience, subbing TEXT fields for VARCHARs can have non-negligible impact on on-disk layout for the tables involved. From the spreadsheet it seems like the majority of the columns that are being proposed to switch to TEXT are "description", which makes me think a better solution might be to break "description" out as its own entity.

comment:16 follow-up: Changed 8 years ago by mtbcarroll

Thanks! https://wiki.postgresql.org/wiki/TOAST looks rather encouraging TEXT-wise and http://www.postgresql.org/docs/9.4/static/storage-toast.html has information too. Is the past experience an Oracle-side issue where even shorter TEXT values aren't staying in-row?

comment:17 in reply to: ↑ 16 Changed 8 years ago by jballanco-x

Actually my past experience is related to MySQL (though I'll do a bit of research to see what Oracle has to say as well). Specifically, from http://dev.mysql.com/doc/refman/5.7/en/blob.html this] page:

  • Instances of BLOB or TEXT columns in the result of a query that is processed using a temporary table causes the server to use a table on disk rather than in memory because the MEMORY storage engine does not support those data types (see Section 8.4.4, “How MySQL Uses Internal Temporary Tables”). Use of disk incurs a performance penalty, so include BLOB or TEXT columns in the query result only if they are really needed. For example, avoid using SELECT *, which selects all columns.

So even though the tables themselves are packed efficiently (similar to Postgres), the separate storage requirements for large text lead to degraded performance on queries...

...IF those queries are doing select * ... or otherwise including the text fields when they're not needed (which is the situation I found myself in the past). So long as our ORM is well behaved and we're careful with the custom queries we write, then we should be ok. The suggestion to put the text fields in a separate table is just a "best practice" I've followed in the past that makes it more difficult to accidentally write a query against a table containing a TEXT column that degrades performance.

Version 1, edited 8 years ago by jballanco-x (previous) (next) (diff)

comment:18 Changed 8 years ago by jamoore

Hibernate will definitely actively load those fields unless they are a subobject (like Maps)

comment:19 Changed 8 years ago by jburel

  • Milestone changed from OMERO-5.2.1 to OMERO-5.2.2

Milestone OMERO-5.2.1 deleted

comment:20 Changed 8 years ago by jburel

  • Milestone changed from OMERO-5.2.2 to OMERO-5.2.1

Milestone OMERO-5.2.2 deleted

comment:21 Changed 8 years ago by mtbcarroll

Following further discussion the plan is to go ahead and make the change to TEXT in the expectation that the impact on Oracle won't be terrible; the PostgreSQL anecdotes generally look good. In the future we should watch out for any impact on queries (we don't want to often be eagerly fetching enormous values) in case we do have to somehow break description, etc. out more separately.

comment:22 Changed 8 years ago by jburel

  • Milestone changed from OMERO-5.2.2 to OME-Files

comment:23 Changed 8 years ago by mtbcarroll

A larger issue may be for clients: accidentally retrieving en masse, even attempting to display, what may be truly huge strings.

comment:24 Changed 8 years ago by mtbcarroll

At the moment giving a dataset a really long name in Insight has it report an exception: DSAccessException: Cannot access data. Cannot update the object.

comment:25 Changed 8 years ago by mtbcarroll

In glancing through the model the most obvious-to-me candidates for switching to TEXT are the name property of Annotation, Dataset, Folder, Image, LogicalChannel, OriginalFile, Plate, PlateAcquisition, Project, Reagent, RenderingDef, Roi, Screen, StageLabel. Also, the properties ImportJob.imageName, ImportJob.imageDescription, Screen.protocolDescription, Screen.reagentSetDescription. Does this proposal seem okay for OMERO 5.3?

comment:27 Changed 8 years ago by mtbcarroll

  • Summary changed from Db model difference Channel's name to Review 256-char-limit on OMERO model text properties

Opened https://github.com/openmicroscopy/openmicroscopy/pull/4882 to fix the more pressing cases, leaving remaining properties for subsequent review.

Note: See TracTickets for help on using tickets. You may also have a look at Agilo extensions to the ticket.

1.3.13-PRO © 2008-2011 Agilo Software all rights reserved (this page was served in: 0.150607 sec.)

We're Hiring!