Task #897 (closed)
Opened 11 years ago
Closed 11 years ago
Leading Wildcard Searching
| Reported by: | cxallan | Owned by: | jamoore |
|---|---|---|---|
| Priority: | major | Milestone: | 3.0-Beta3 |
| Component: | Search | Version: | n.a. |
| Keywords: | REVIEW, performance, search, lucene | Cc: | jamoore, atarkowska, jburel |
| Resources: | n.a. | Referenced By: | n.a. |
| References: | n.a. | Remaining Time: | n.a. |
| Sprint: | n.a. |
Description
Leading wildcard searches such as "*term" are impossible with the current search API. A method should be added to ISearch that allows leading wildcard processing to be turned on and off.
Change History (7)
comment:1 Changed 11 years ago by cxallan
- Owner changed from sfrank to callan
- Status changed from new to assigned
comment:2 Changed 11 years ago by jmoore
- Owner changed from callan to jmoore
- Status changed from assigned to new
r2327 modifies the API and adds a test
comment:3 Changed 11 years ago by jmoore
- Status changed from new to assigned
From Ola:
When you search by text "*"
    query = "*"

    def searchImages (self, query=None, created=None):
        search = self.createSearchService()
        search.onlyType('Image')
        search.addOrderByAsc("name")
        if query:
            search.setAllowLeadingWildcard()
            search.byFullText(str(query))
        if search.hasNext():
            for e in search.results():
                yield ImageWrapper(self, e, cache)
    2008/04/29 14:46 +0100 [-] [OMERO.blitz] 3354191 [l.Server-0] WARN ome.services.util.ServiceHandler - Unknown exception thrown.
    2008/04/29 14:46 +0100 [-] [OMERO.blitz]
    2008/04/29 14:46 +0100 [-] [OMERO.blitz] org.apache.lucene.search.BooleanQuery$TooManyClauses: maxClauseCount is set to 1024
    2008/04/29 14:46 +0100 [-] [OMERO.blitz] at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:165)
    2008/04/29 14:46 +0100 [-] [OMERO.blitz] at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:156)
    2008/04/29 14:46 +0100 [-] [OMERO.blitz] at org.apache.lucene.search.MultiTermQuery.rewrite(MultiTermQuery.java:63)
comment:4 Changed 11 years ago by jmoore
A bare "*" query basically tries to expand to every term in the index, which is what exceeds maxClauseCount in the trace above. I added a check to prevent it.
See:
- Google: lucene maxClauseCount 1024 wildcard
- Lucene Faq
- Good thread on the issue, specifically this post, which implies that we should perhaps even enforce a minimum character limit > 2 for wildcard searches. Quote:
"(3) so what's the deal with maxClauseCount? If you have a big index, with lots of terms then a sufficiently general prefix/wildcard can be rewritten into a really honking big BooleanQuery, which can take up a lot of RAM (for all of those TermQueries and TermWeights and TermScorers) and can take a lot of time to execute. If you've got gobs and gobs of RAM, and don't care how long your queries take, then set the maxClauseCount to MAX_INT and forget about it. maxClauseCount is just there as a safety valve to protect you."
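The expansion described in the quote can be illustrated with a small pure-Python sketch. It does not use Lucene at all: `expand_wildcard`, the local `TooManyClauses` class, and the toy term list are hypothetical names that only mimic the behaviour, on the assumption that each index term matching the pattern becomes one clause and the rewrite fails once the limit is exceeded.

```python
from fnmatch import fnmatchcase

MAX_CLAUSE_COUNT = 1024  # same default limit that appears in the log above


class TooManyClauses(Exception):
    """Local stand-in for Lucene's BooleanQuery.TooManyClauses (illustration only)."""


def expand_wildcard(pattern, index_terms, max_clauses=MAX_CLAUSE_COUNT):
    """Return the index terms a wildcard pattern would be rewritten into.

    Raises TooManyClauses as soon as more than max_clauses terms match.
    """
    clauses = []
    for term in index_terms:
        if fnmatchcase(term, pattern):
            clauses.append(term)
            if len(clauses) > max_clauses:
                raise TooManyClauses(
                    "maxClauseCount is set to %d" % max_clauses)
    return clauses


# Toy index with 2000 distinct terms (made-up data).
terms = ["term%04d" % i for i in range(2000)]

# A narrow leading-wildcard pattern expands to very few clauses.
print(expand_wildcard("*0001", terms))

# A bare "*" matches every term and trips the safety valve.
try:
    expand_wildcard("*", terms)
except TooManyClauses as exc:
    print("rejected:", exc)
```

The point of the sketch is only that the cost depends on how many index terms match, not on the length of the query string, which is why a very short or bare wildcard is the dangerous case.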
comment:5 Changed 11 years ago by jmoore
comment:6 Changed 11 years ago by jmoore
comment:7 Changed 11 years ago by jmoore
- Keywords REVIEW performance search lucene added
- Resolution set to fixed
- Status changed from assigned to closed
Leading wildcard searching is now fully supported (with appropriate ApiUsageExceptions, etc.). We will need to keep an eye on performance. Adding REVIEW to keywords.
Initial version in r2308 from Aleksandra's patch.
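For reviewers, a guard of the kind mentioned in comment:4 might look roughly like the sketch below. The actual server-side check is not shown in this ticket, so `check_wildcard_query`, `MIN_LITERAL_CHARS`, and the local `ApiUsageException` class are assumptions made for illustration; the threshold of 3 is just one reading of the "> 2" suggestion from the thread.

```python
MIN_LITERAL_CHARS = 3  # assumed threshold; the thread only suggests "> 2"


class ApiUsageException(Exception):
    """Local stand-in for the server-side exception of the same name."""


def check_wildcard_query(query, min_literal=MIN_LITERAL_CHARS):
    """Reject queries that consist mostly of wildcards.

    Counts the characters that are neither '*' nor '?' and raises
    ApiUsageException if fewer than min_literal remain.
    """
    literal = [c for c in query.strip() if c not in "*?"]
    if len(literal) < min_literal:
        raise ApiUsageException(
            "Query %r needs at least %d non-wildcard characters"
            % (query, min_literal))
    return query


# "*term" passes the guard; a bare "*" is rejected before reaching the index.
print(check_wildcard_query("*term"))
try:
    check_wildcard_query("*")
except ApiUsageException as exc:
    print("rejected:", exc)
```

Rejecting the query up front turns the unknown server exception from comment:3 into an error the client can report, at the cost of refusing some short but legitimate patterns.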