Task #897 (closed)

Opened 11 years ago

Closed 11 years ago

Leading Wildcard Searching

Reported by: cxallan Owned by: jamoore
Priority: major Milestone: 3.0-Beta3
Component: Search Version: n.a.
Keywords: REVIEW, performance, search, lucene Cc: jamoore, atarkowska, jburel
Resources: n.a. Referenced By: n.a.
References: n.a. Remaining Time: n.a.
Sprint: n.a.

Description

The current search API cannot perform leading wildcard searches such as "*term". A method should be added to ISearch that turns leading wildcard processing on and off.
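The requested on/off switch could behave roughly as in the sketch below. This is only an illustration: `ToySearch`, `set_allow_leading_wildcard`, and `check_query` are hypothetical names and do not reflect the actual ISearch signature, which this ticket does not show.

```python
class ToySearch:
    """Hypothetical stand-in for a search service; not the real ISearch API."""

    def __init__(self):
        # Assumption: leading wildcards are disabled until explicitly enabled.
        self.allow_leading_wildcard = False

    def set_allow_leading_wildcard(self, allow):
        """Turn leading wildcard processing on or off."""
        self.allow_leading_wildcard = bool(allow)

    def check_query(self, query):
        """Return the query unchanged, or raise if it is not permitted."""
        if query[:1] in ("*", "?") and not self.allow_leading_wildcard:
            raise ValueError("leading wildcard not allowed: %r" % query)
        return query
```

With the flag off, `check_query("*term")` raises, while a trailing wildcard such as `"term*"` passes either way.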

Change History (7)

comment:1 Changed 11 years ago by cxallan

  • Owner changed from sfrank to callan
  • Status changed from new to assigned

Initial version in r2308 from Aleksandra's patch.

comment:2 Changed 11 years ago by jmoore

  • Owner changed from callan to jmoore
  • Status changed from assigned to new

r2327 modifies the API and adds a test

comment:3 Changed 11 years ago by jmoore

  • Status changed from new to assigned

From Ola:

When you search by text "*"

query = "*"

def searchImages(self, query=None, created=None):
    search = self.createSearchService()
    search.onlyType('Image')
    search.addOrderByAsc("name")

    if query:
        search.setAllowLeadingWildcard()
        search.byFullText(str(query))
    if search.hasNext():
        for e in search.results():
            yield ImageWrapper(self, e, cache)


2008/04/29 14:46 +0100 [-] [OMERO.blitz] 3354191    [l.Server-0] WARN           ome.services.util.ServiceHandler  - Unknown exception thrown.
2008/04/29 14:46 +0100 [-] [OMERO.blitz]
2008/04/29 14:46 +0100 [-] [OMERO.blitz] org.apache.lucene.search.BooleanQuery$TooManyClauses: maxClauseCount is set to 1024
2008/04/29 14:46 +0100 [-] [OMERO.blitz]        at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:165)
2008/04/29 14:46 +0100 [-] [OMERO.blitz]        at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:156)
2008/04/29 14:46 +0100 [-] [OMERO.blitz]        at org.apache.lucene.search.MultiTermQuery.rewrite(MultiTermQuery.java:63)

comment:4 Changed 11 years ago by jmoore

A query of "*" makes Lucene try to expand the wildcard into one clause for every term in the index, which exceeds maxClauseCount. I added a check to prevent it.

See:

"(3) so what's the deal with maxClauseCount?   If you have a big index,
with lots of terms then a sufficiently general prefix/wildcard can be
rewritten into a really honking big BooleanQuery, which can take up a lot
of RAM (for all of those TermQueries and TermWeights and TermScorers)
and can take a lot of time to execute.  If you've got gobs and gobs
of RAM, and don't care how long your queries take, then
set the maxClauseCount to MAX_INT and forget about it.  maxClauseCount is
just there as a safety valve to protect you."
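The rewrite described in the quote can be imitated in a few lines of plain Python. This is a toy model, not Lucene code: `expand_wildcard`, `TooManyClauses`, and the in-memory term list are assumptions made for illustration, and only the 1024 limit comes from the stack trace above.

```python
import fnmatch

MAX_CLAUSE_COUNT = 1024  # the limit reported in the stack trace above


class TooManyClauses(Exception):
    """Toy counterpart of the Lucene exception named in the log."""


def expand_wildcard(pattern, index_terms, max_clauses=MAX_CLAUSE_COUNT):
    """Expand a wildcard pattern into one clause per matching index term."""
    clauses = []
    for term in index_terms:
        if fnmatch.fnmatchcase(term, pattern):
            # The safety valve: refuse to build an oversized query.
            if len(clauses) >= max_clauses:
                raise TooManyClauses(
                    "maxClauseCount is set to %d" % max_clauses)
            clauses.append(term)
    return clauses


# A made-up index with 2000 distinct terms.
terms = ["term%04d" % i for i in range(2000)]
narrow = expand_wildcard("term000*", terms)  # matches only ten terms
```

Calling `expand_wildcard("*", terms)` on the same index matches all 2000 terms and raises `TooManyClauses`, which mirrors the failure Ola reported.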

comment:5 Changed 11 years ago by jmoore

comment:6 Changed 11 years ago by jmoore

comment:7 Changed 11 years ago by jmoore

  • Keywords REVIEW performance search lucene added
  • Resolution set to fixed
  • Status changed from assigned to closed

Leading wildcard searching is now fully supported (with appropriate ApiUsageExceptions, etc.). We will need to keep an eye on performance. Adding REVIEW to keywords.
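The ticket does not show the check that was added, so the following is only a guess at its general shape: reject queries that consist of nothing but wildcards before they reach the index. `validate_full_text_query` is a hypothetical helper, and the `ApiUsageException` class here is a local stand-in that merely borrows the name used above.

```python
class ApiUsageException(Exception):
    """Local stand-in named after the exception mentioned in the ticket."""


def validate_full_text_query(query):
    """Return the trimmed query, or raise if it cannot be searched safely."""
    text = (query or "").strip()
    if not text:
        raise ApiUsageException("query must not be empty")
    # A query made only of wildcards would match every term in the index.
    if all(ch in "*?" or ch.isspace() for ch in text):
        raise ApiUsageException("query must contain more than wildcards")
    return text
```

Under this sketch `"*term"` is accepted, while `"*"` is refused up front instead of failing later inside the query rewrite.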
