Task #10930

Opened 11 years ago

Last modified 9 years ago

Arrange Lucene index better for multiply hashed files

Reported by: mtbcarroll
Priority: minor
Milestone: Unscheduled
Component: Search
Version: n.a.
Keywords: n.a.
Cc: server@…
Resources: n.a.
Referenced By: n.a.
References: n.a.
Remaining Time: n.a.
Sprint: n.a.


In the longer term for OMERO 5.x, the hash algorithm used for files may change, and new hashes will then be generated. A Lucene index might end up with multiple hashes recorded for the same file.

FullTextBridge.handleFileAnnotation's approach won't work for multiple algorithms, because a search can't match each file.hash value to the file.hasher that produced it. A better approach to indexing might be something like:

// Index each hash under a field named after the algorithm that produced it.
if (file.getHasher() != null && file.getHash() != null) {
    add(document, "file.hash." + file.getHasher().getValue(), file.getHash(), opts);
}
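
To illustrate the matching problem, here is a minimal, self-contained sketch. It does not use the Lucene or OMERO APIs: the document is modelled as a plain map from field name to values, and HashFieldSketch, indexFlat, indexPerHasher and matches are hypothetical names introduced only for this example. It assumes the flat layout stores the hasher and the hash in two independent fields, and the hasher names and hash values are placeholders.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a Lucene document: field name -> indexed values.
// Not the Lucene or OMERO API; it only illustrates the field naming.
class HashFieldSketch {

    // Add one value to a named field of the in-memory "document".
    static void add(Map<String, List<String>> document, String field, String value) {
        document.computeIfAbsent(field, k -> new ArrayList<>()).add(value);
    }

    // Assumed flat layout: hasher and hash go into two independent fields,
    // so the link between a hash and its algorithm is lost.
    static void indexFlat(Map<String, List<String>> document, String hasher, String hash) {
        if (hasher != null && hash != null) {
            add(document, "file.hasher", hasher);
            add(document, "file.hash", hash);
        }
    }

    // Layout proposed in the ticket: the hasher name is part of the field name.
    static void indexPerHasher(Map<String, List<String>> document, String hasher, String hash) {
        if (hasher != null && hash != null) {
            add(document, "file.hash." + hasher, hash);
        }
    }

    // True if the document holds the given value in the given field.
    static boolean matches(Map<String, List<String>> document, String field, String value) {
        return document.getOrDefault(field, Collections.emptyList()).contains(value);
    }

    public static void main(String[] args) {
        // One file hashed with two algorithms (placeholder values).
        Map<String, List<String>> flat = new HashMap<>();
        indexFlat(flat, "SHA1-160", "aaa");
        indexFlat(flat, "MD5-128", "bbb");

        Map<String, List<String>> perHasher = new HashMap<>();
        indexPerHasher(perHasher, "SHA1-160", "aaa");
        indexPerHasher(perHasher, "MD5-128", "bbb");

        // Query: "MD5-128 hash equals aaa", which is wrong for this file.
        boolean flatHit = matches(flat, "file.hasher", "MD5-128")
                && matches(flat, "file.hash", "aaa");
        boolean perHasherHit = matches(perHasher, "file.hash.MD5-128", "aaa");

        System.out.println("flat layout matches mismatched pair: " + flatHit);
        System.out.println("per-hasher layout matches mismatched pair: " + perHasherHit);
    }
}
```

With the flat layout, the mismatched query still hits because each field is checked independently. With per-algorithm field names, a query has to name the algorithm in the field, so only the correct pairing can match.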

Change History

comment:1 Changed 9 years ago by mtbcarroll

  • Cc server@… added; omero-team@… niko@… removed
  • Version set to OMERO-5.1.3

comment:2 Changed 9 years ago by mtbcarroll

  • Version OMERO-5.1.3 deleted

comment:3 Changed 9 years ago by jamoore

The Lucene document is rewritten each time, so only the one hash stored in the DB will be searchable until there's a DB upgrade.

