
Task #10930 (new)

Opened 7 years ago

Last modified 4 years ago

Arrange Lucene index better for multiply hashed files

Reported by: mtbcarroll
Owned by:
Priority: minor
Milestone: Unscheduled
Component: Search
Version: n.a.
Keywords: n.a.
Cc: server@…
Resources: n.a.
Referenced By: n.a.
References: n.a.
Remaining Time: n.a.
Sprint: n.a.

Description

In the longer term for OMERO 5.x, the hash algorithm used for files may change, with new hashes generated as a result. A Lucene index might then end up with multiple hashes recorded for the same file.

The approach in FullTextBridge.handleFileAnnotation won't work for multiple algorithms, because a search cannot match each file.hash value to its corresponding file.hasher value. A better approach to indexing might be something like:

if (file.getHasher() != null && file.getHash() != null) {
    add(document, "file.hash." + file.getHasher().getValue(), file.getHash(), opts);
}
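
To illustrate the difference, here is a minimal, self-contained sketch. It does not use Lucene or OMERO classes: a plain Map stands in for a document with multi-valued fields, and HashIndexSketch, indexFlat, indexPerHasher and matches are hypothetical names invented for this example. It assumes, as in the scenario above, that two hashes for the same file end up in one document, and that the flat layout writes separate file.hasher and file.hash fields. The hash strings are placeholders, not real digests.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: a Map of field name -> values stands in for a Lucene document.
class HashIndexSketch {

    // Append one value to a (possibly multi-valued) field.
    static void add(Map<String, List<String>> doc, String field, String value) {
        doc.computeIfAbsent(field, k -> new ArrayList<>()).add(value);
    }

    // Flat layout (assumed): hasher and hash are written to two unrelated fields.
    static void indexFlat(Map<String, List<String>> doc, String hasher, String hash) {
        if (hasher != null && hash != null) {
            add(doc, "file.hasher", hasher);
            add(doc, "file.hash", hash);
        }
    }

    // Proposed layout: the hasher name becomes part of the field name.
    static void indexPerHasher(Map<String, List<String>> doc, String hasher, String hash) {
        if (hasher != null && hash != null) {
            add(doc, "file.hash." + hasher, hash);
        }
    }

    // Exact-term match of a value against one field.
    static boolean matches(Map<String, List<String>> doc, String field, String value) {
        return doc.getOrDefault(field, Collections.emptyList()).contains(value);
    }

    public static void main(String[] args) {
        Map<String, List<String>> flat = new HashMap<>();
        indexFlat(flat, "SHA1-160", "sha1-placeholder");
        indexFlat(flat, "MD5-128", "md5-placeholder");

        Map<String, List<String>> perHasher = new HashMap<>();
        indexPerHasher(perHasher, "SHA1-160", "sha1-placeholder");
        indexPerHasher(perHasher, "MD5-128", "md5-placeholder");

        // Query: "SHA1-160 hash equals md5-placeholder". This should not match.
        boolean flatHit = matches(flat, "file.hasher", "SHA1-160")
                && matches(flat, "file.hash", "md5-placeholder");
        boolean perHasherHit = matches(perHasher, "file.hash.SHA1-160", "md5-placeholder");

        System.out.println("flat layout matches wrong pair: " + flatHit);
        System.out.println("per-hasher layout matches wrong pair: " + perHasherHit);
    }
}
```

In the flat layout the two conditions are satisfied by values from different hasher/hash pairs, so the mismatched query still hits. With one field per algorithm, the pairing is carried by the field name and the same query misses.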

Change History (3)

comment:1 Changed 4 years ago by mtbcarroll

  • Cc server@… added; omero-team@… niko@… removed
  • Version set to OMERO-5.1.3

comment:2 Changed 4 years ago by mtbcarroll

  • Version OMERO-5.1.3 deleted

comment:3 Changed 4 years ago by jamoore

The Lucene document is rewritten each time. Only the one hash stored in the DB will be searchable until there's a DB upgrade.
