Task #10930 (new)

Opened 6 years ago

Last modified 4 years ago

Arrange Lucene index better for multiply hashed files

Reported by:	mtbcarroll	Owned by:
Priority:	minor	Milestone:	Unscheduled
Component:	Search	Version:	n.a.
Keywords:	n.a.	Cc:	server@…
Resources:	n.a.	Referenced By:	n.a.
References:	n.a.	Remaining Time:	n.a.
Sprint:	n.a.

Description

In the longer term for OMERO 5.x, the hash algorithms used for files may change, and new hashes will then be generated. A Lucene index could end up with multiple hashes recorded for the same file.

FullTextBridge.handleFileAnnotation's approach won't work for multiple algorithms, because a search cannot match each file.hash value to its corresponding file.hasher. A better approach to indexing might be something like:

if (file.getHasher() != null && file.getHash() != null) {
    add(document, "file.hash." + file.getHasher().getValue(), file.getHash(), opts);
}
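
To illustrate why the per-algorithm field name helps, here is a minimal, self-contained sketch. It does not use Lucene or OMERO classes: the map-based "document", the class name HashFieldSketch, and all method names are hypothetical stand-ins, and the hasher names and hash strings are only example values. It assumes the existing scheme stores file.hasher and file.hash as two independent fields, which is what the description above implies.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class HashFieldSketch {

    /** Hypothetical stand-in for an index document: field name -> values. */
    static Map<String, List<String>> newDocument() {
        return new LinkedHashMap<>();
    }

    /** Append a value to a (possibly multi-valued) field. */
    static void add(Map<String, List<String>> doc, String field, String value) {
        doc.computeIfAbsent(field, k -> new ArrayList<>()).add(value);
    }

    static boolean has(Map<String, List<String>> doc, String field, String value) {
        return doc.getOrDefault(field, Collections.<String>emptyList()).contains(value);
    }

    /** Assumed existing scheme: hasher and hash go into two unrelated fields. */
    static void indexFlat(Map<String, List<String>> doc, String hasher, String hash) {
        add(doc, "file.hasher", hasher);
        add(doc, "file.hash", hash);
    }

    /** Proposed scheme: the hasher name becomes part of the field name. */
    static void indexPerHasher(Map<String, List<String>> doc, String hasher, String hash) {
        if (hasher != null && hash != null) {
            add(doc, "file.hash." + hasher, hash);
        }
    }

    /** Flat lookup: both terms must be present, but their pairing is lost. */
    static boolean matchesFlat(Map<String, List<String>> doc, String hasher, String hash) {
        return has(doc, "file.hasher", hasher) && has(doc, "file.hash", hash);
    }

    /** Per-hasher lookup: the field name itself ties the hash to its algorithm. */
    static boolean matchesPerHasher(Map<String, List<String>> doc, String hasher, String hash) {
        return has(doc, "file.hash." + hasher, hash);
    }

    public static void main(String[] args) {
        Map<String, List<String>> flat = newDocument();
        indexFlat(flat, "SHA1-160", "aaa");
        indexFlat(flat, "MD5-128", "bbb");

        Map<String, List<String>> perHasher = newDocument();
        indexPerHasher(perHasher, "SHA1-160", "aaa");
        indexPerHasher(perHasher, "MD5-128", "bbb");

        // "bbb" is the MD5 value, yet the flat scheme accepts it as SHA1.
        System.out.println(matchesFlat(flat, "SHA1-160", "bbb"));
        System.out.println(matchesPerHasher(perHasher, "SHA1-160", "bbb"));
    }
}
```

With two hashes on one file, the flat lookup accepts a mismatched hasher/hash pair, while the per-hasher field rejects it. A real query would target the field file.hash.<hasher> in the same way.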

Change History (3)

comment:1 Changed 4 years ago by mtbcarroll

Cc server@… added; omero-team@… niko@… removed
Version set to OMERO-5.1.3

comment:2 Changed 4 years ago by mtbcarroll

Version OMERO-5.1.3 deleted

comment:3 Changed 4 years ago by jamoore

The Lucene document is rewritten each time. Only the one hash stored in the DB will be searchable until there's a DB upgrade.
