User Story #3164 (closed)

Opened 13 years ago

Closed 10 years ago

Last modified 5 years ago

BUG: Searching returns no results for wildcard searches

Reported by: atarkowska Owned by: jamoore
Priority: blocker Milestone: 5.0.3
Component: Services Keywords: n.a.
Cc: jrswedlow, java@…, wmoore Story Points: n.a.
Sprint: n.a. Importance: n.a.
Total Remaining Time: n.a. Estimated Remaining Time: n.a.

Description (last modified by jmoore)

This is caused by the FullTextAnalyzer (ticket:1010) not being used for wildcard searches:

There's a proposed workaround in lucene-misc-2.4.1.jar, the AnalyzingQueryParser, which does pass the search string to the analyzer, even with wildcards. Using it, however, may require more investigation; the JIRA link above in particular illustrates some of the issues one can run into.
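To make the difference between the two strategies concrete, here is a toy Python model. It is not Lucene's code. It assumes the analyzer simply lowercases text and splits it on non-alphanumeric characters, so the example name "search_test_1.tif" from this ticket is indexed as the tokens search, test, 1 and tif. `analyze`, `raw_wildcard`, `analyzed_wildcard` and `matches` are hypothetical names. The "raw" strategy matches the wildcard term against each token as typed; the "analyzed" strategy first runs each literal fragment between wildcards through the analyzer.

```python
import fnmatch
import re


def analyze(text):
    # Toy analyzer (assumption): lowercase, split on non-alphanumerics.
    return re.findall(r"[a-z0-9]+", text.lower())


def raw_wildcard(term):
    # Default-parser behaviour in this model: the term is only lowercased.
    return term.lower()


def analyzed_wildcard(term):
    # Analyzing behaviour in this model: run each literal fragment between
    # wildcard characters through the analyzer, keeping the wildcards.
    out = []
    for part in re.split(r"([*?])", term):
        if part in ("*", "?"):
            out.append(part)
        elif part:
            toks = analyze(part)
            if len(toks) > 1:
                # One fragment turning into several tokens cannot be
                # expressed as a single wildcard pattern here.
                raise ValueError(f"fragment {part!r} splits into {toks}")
            out.extend(toks)
    return "".join(out)


def matches(pattern, name):
    # A wildcard query hits if any single indexed token matches the pattern.
    return any(fnmatch.fnmatchcase(tok, pattern) for tok in analyze(name))


name = "search_test_1.tif"
for term in ["*.tif", "*tif", "search*tif"]:
    print(term, matches(raw_wildcard(term), name),
          matches(analyzed_wildcard(term), name))
```

In this model "*.tif" only matches once the analyzer has stripped the ".", because no indexed token ever contains a dot. The toy does not reproduce the "*earch" and "*h" regressions listed below for the real AnalyzingQueryParser.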

Extending Ola's test with the following search terms:

        texts = ("*earch", "*h", "search tif", "search",
                 "test", "tag", "t*", "search_test",
                 "*test*.tif", "search*tif", "s .tif",
                 ".tif", "tif", "*tif",
                 "s*.tif", "*.tif")

I see the following terms fail for the default QueryParser:

  • *.tif
  • search*tif
  • s*.tif
  • *test*.tif


For the new AnalyzingQueryParser, these fail:

  • *earch
  • *h
  • search*tif
  • s*.tif
  • *test*.tif

So, we can get "*.tif" back, but at the cost of "*earch" and "*h". With further investigation, we can probably come up with something that makes each of these cases pass, but other searches may then start to fail.

Possibly related is #1011, which would not use an analyzer at all on some fields, like Image.name, so that the underscores in "search_test_1.tif" don't get removed.
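A minimal sketch of what skipping the analyzer on a field could change, under two assumptions: the analyzed field is modelled as lowercase-and-split-on-non-alphanumerics, and the un-analyzed field keeps the whole value as a single term. `analyzed_terms`, `untokenized_terms` and `wildcard_hit` are hypothetical names, not OMERO or Lucene APIs.

```python
import fnmatch
import re


def analyzed_terms(value):
    # Toy analyzer (assumption): lowercase, split on non-alphanumerics,
    # which is what removes the underscores.
    return re.findall(r"[a-z0-9]+", value.lower())


def untokenized_terms(value):
    # Hypothetical "no analyzer" field: the whole value is one term.
    return [value]


def wildcard_hit(pattern, terms):
    # The query hits if any single indexed term matches the pattern.
    return any(fnmatch.fnmatchcase(t, pattern) for t in terms)


name = "search_test_1.tif"
for pattern in ["search_test*", "*.tif", "*test*.tif"]:
    print(pattern,
          wildcard_hit(pattern, analyzed_terms(name)),
          wildcard_hit(pattern, untokenized_terms(name)))
```

In this model all three patterns miss on the analyzed field and hit on the untokenized one. The trade-off is that a plain word such as "test" no longer matches the untokenized field on its own, so indexing the name both ways is one option worth considering.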

I'll commit the extended test and the lucene-misc jars and we can discuss further.

Update

This issue is apparently not restricted to leading wildcards; other forms of wildcard search fail as well. Moving to 4.3 for review.

Matching: test-project-a-b-c
=============================================
                         Query Found  Ok?
                          test    21 GOOD
                  test-project    21 GOOD
                 test\-project    21 GOOD
                         test-    21 GOOD
                 test-project-    21 GOOD
               test\-project\-    21 GOOD
                         test*    21 GOOD
                 test-project*     0 FAIL
                test\-project*     0 FAIL
                        test-*     0 FAIL
                test-project-*     0 FAIL
              test\-project\-*     0 FAIL
                    name:test*    21 GOOD
            name:test-project*     0 FAIL
           name:test\-project*     0 FAIL
                    name:test*    21 GOOD
        name:test name:project    21 GOOD
                  test project    21 GOOD
                test* project*    21 GOOD
                test- project-    21 GOOD
              test-* project-*     0 FAIL
            test-project-a-b-c    21 GOOD
                         a-b-c    21 GOOD
                         a b c    21 GOOD
                            t*    21 GOOD
                            p*    21 GOOD
                            a*    21 GOOD
                            b*    21 GOOD
                            c*    21 GOOD
                         t* p*    21 GOOD
                         proj*    21 GOOD
                    tes* proj*    21 GOOD
                  tes*-project     0 FAIL
                    test-proj*     0 FAIL
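The GOOD/FAIL pattern above is consistent with wildcard terms skipping the analyzer while plain terms are analyzed. A toy Python model under that assumption: the name is indexed by an analyzer that lowercases and splits on non-alphanumeric characters (a stand-in for FullTextAnalyzer, not its real behaviour), wildcard clauses are only lowercased and compared with each indexed token, plain clauses are analyzed and matched as a phrase, field prefixes are ignored, and every clause must match. `analyze`, `clause_matches` and `search` are hypothetical names. Under these assumptions it reproduces the GOOD/FAIL column for every row above, though not the result counts.

```python
import fnmatch
import re


def analyze(text):
    """Toy analyzer (assumption): lowercase, split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())


def clause_matches(clause, tokens):
    """Return True if a single query clause matches the indexed tokens."""
    clause = clause.split(":", 1)[-1]  # ignore field prefixes like "name:"
    clause = clause.replace("\\", "")  # drop query escapes such as "\-"
    if "*" in clause or "?" in clause:
        # Wildcard clause: lowercased but NOT analyzed, then compared
        # with each indexed token on its own.
        pattern = clause.lower()
        return any(fnmatch.fnmatchcase(tok, pattern) for tok in tokens)
    # Plain clause: analyzed, then matched as a phrase (adjacent tokens).
    terms = analyze(clause)
    n = len(terms)
    return n > 0 and any(
        tokens[i:i + n] == terms for i in range(len(tokens) - n + 1)
    )


def search(query, name):
    """Assumption: every whitespace-separated clause must match."""
    tokens = analyze(name)
    return all(clause_matches(c, tokens) for c in query.split())


name = "test-project-a-b-c"
for q in ["test-project", r"test\-project", "test*", "test-project*",
          "test-*", "tes* proj*", "tes*-project", "name:test-project*"]:
    print(f"{q:>20} {'GOOD' if search(q, name) else 'FAIL'}")
```

In this model "test-project*" fails simply because the un-analyzed pattern still contains "-", while the indexed tokens never do.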


Change History (19)

comment:1 Changed 13 years ago by atarkowska

  • Component changed from General to Services
  • Priority changed from minor to blocker

comment:2 Changed 13 years ago by atarkowska

(In [8385]) test, see #3164

comment:3 Changed 13 years ago by jmoore

  • Description modified (diff)
  • Summary changed from BUG: Searching returns no results to BUG: Searching returns no results for leading wildcard term

comment:4 Changed 13 years ago by jmoore

(In [8386]) More tests of wildcard searching; and lucene-misc (See #3164)

comment:6 Changed 13 years ago by atarkowska

I'm not sure if this was known, but in a collaborative group I am able to use "*tif", etc. If I switch to a read-only or private group, no results are returned.

comment:7 Changed 13 years ago by jmoore

(In [8393]) Adding search.py to integration_suite.py (See #3164)

comment:8 Changed 13 years ago by jmoore

(In [8396]) Testing various groups and disabling broken strings (See #3164)

comment:9 Changed 13 years ago by jburel

  • Sprint changed from 2010-10-28 (18) to 2010-11-11 (19)

Moved from sprint 2010-10-28 (18)

comment:10 Changed 13 years ago by jmoore

  • Milestone changed from OMERO-Beta4.2.1 to Unscheduled
  • Sprint 2010-11-11 (19) deleted

Not making any code modifications to support this for 4.2.1. I've linked this under #2097 (4.2+ search fixes) and am moving it to "unscheduled". Hopefully we will have a large search review after the big images work.

comment:11 Changed 13 years ago by jmoore

  • Description modified (diff)
  • Milestone changed from Unscheduled to OMERO-Beta4.3

comment:12 Changed 13 years ago by jmoore

  • Description modified (diff)
  • Summary changed from BUG: Searching returns no results for leading wildcard term to BUG: Searching returns no results for wildcard searches

comment:13 Changed 13 years ago by jmoore

  • Description modified (diff)

comment:14 Changed 13 years ago by jmoore

  • Type changed from Task to User Story

This is going to take significant time and in fact may not be fixable in 4.3. Turning into a story.

comment:15 Changed 13 years ago by jmoore

  • Milestone changed from OMERO-Beta4.3 to Unscheduled

comment:16 Changed 10 years ago by jamoore

  • Cc jrswedlow java@… wmoore added
  • Milestone changed from Unscheduled to 5.0.3
  • Resolution set to fixed
  • Status changed from new to closed

Considering the improvements in wildcard handling in 5.0.3, I'm closing this. Of course, as outlined in https://trello.com/c/INhtQu6q/21-search-tng, there are still other improvements that can be made (via the analyzer and n-grams), but the base issues here are taken care of.

comment:17 Changed 10 years ago by pwalczysko

@jamoore Fine with closing this, because in principle this works now. The only remaining questions here are the *.svs-type queries, i.e. the "wildcard immediately followed by a non-alphanumeric character" problem. I understand that this is currently being worked around in the clients to give "svs".

comment:18 Changed 10 years ago by dlindner

Yes, "*.svs" effectively does a search for "svs" (it's handled on the server).
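How the server rewrites such terms is not described in this ticket. As a hedged sketch only: one rewrite that turns "*.svs" into "svs" while leaving "*earch" and "*tif" alone is to drop a leading wildcard only when it is immediately followed by non-alphanumeric characters. `normalize_term` is a hypothetical name, not OMERO's code.

```python
import re

# Hypothetical rewrite rule (not OMERO's actual implementation): a leading
# run of wildcards followed by one or more non-alphanumeric characters
# is removed, e.g. "*.svs" -> "svs".
_LEADING_WILDCARD_PUNCT = re.compile(r"^[*?]+[^0-9A-Za-z*?]+")


def normalize_term(term):
    stripped = _LEADING_WILDCARD_PUNCT.sub("", term)
    # Keep the original term if nothing searchable would be left.
    return stripped or term


for t in ["*.svs", "*earch", "*tif", "test*"]:
    print(t, "->", normalize_term(t))
```

With an analyzer that splits on punctuation, "svs" is presumably the token the extension is indexed under, so the rewritten term still finds those files, although it will also match "svs" appearing elsewhere in a name.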
