Task #11979 (closed)

Opened 10 years ago

Closed 10 years ago

Last modified 10 years ago

Reject large files from Indexing

Reported by: jballanco-x
Owned by: jballanco-x
Priority: blocker
Milestone: 5.0.0
Component: Search
Version: 4.4.10
Keywords: n.a.
Cc:
Resources: n.a.
Referenced By: n.a.
References: n.a.
Remaining Time: n.a.
Sprint: n.a.

Description

It has been observed that attempting to index very large files (size > heap) can corrupt the Lucene index for an image, such that files, tags, and other metadata added to the image after the large file are no longer searchable.

Until we can narrow down why these indexes become corrupted or craft a workaround, we should reject large files from indexing to prevent the corruption.

Change History (5)

comment:1 Changed 10 years ago by jballanco-x

First step in this task is to figure out exactly where the file-size cut-off should be. So far testing indicates that files larger than 296 MB with a 256 MB heap are too large. Further testing will attempt to discern if "large" is independent of heap size and/or total size of files already indexed.

We should endeavor to find a good default cut-off, but the exact cut-off will be end-user configurable.

comment:2 Changed 10 years ago by jballanco-x

Testing indicates that the maximum indexable file size varies with heap size. With a 256MB heap, a 126MB file indexes fine but a 256MB file causes index corruption. With a 512MB heap, the 256MB file is fine but a 512MB file is not. Setting half of the heap space as the default maximum for now while investigation into the exact cut-off continues.
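
For illustration, the half-heap default with a user-configurable override could look roughly like the sketch below. This is not the code from the merged PR: the class name `IndexSizeLimit` and its methods are hypothetical, and the only real API used is `Runtime.getRuntime().maxMemory()`. The comparison is inclusive because the testing above found a file of exactly half the heap (256MB with a 512MB heap) to index fine.

```java
// Hypothetical helper, not the actual FullTextParser change.
public final class IndexSizeLimit {

    private IndexSizeLimit() {}

    /** Provisional default: half of the JVM's maximum heap, in bytes. */
    public static long defaultMaxBytes() {
        // maxMemory() reports the -Xmx limit, or Long.MAX_VALUE if unbounded.
        return Runtime.getRuntime().maxMemory() / 2;
    }

    /**
     * Resolve the effective limit from an optional configured value
     * (assumed here to be a byte count given as a decimal string).
     * Missing, non-numeric, or non-positive values fall back to the default.
     */
    public static long resolveMaxBytes(String configured) {
        if (configured == null || configured.trim().isEmpty()) {
            return defaultMaxBytes();
        }
        try {
            long value = Long.parseLong(configured.trim());
            return value > 0 ? value : defaultMaxBytes();
        } catch (NumberFormatException e) {
            return defaultMaxBytes();
        }
    }

    /** True if a file of the given size is small enough to be indexed. */
    public static boolean shouldIndex(long fileSizeBytes, long maxBytes) {
        return fileSizeBytes <= maxBytes;
    }
}
```

A caller would compute the limit once, for example `long max = IndexSizeLimit.resolveMaxBytes(configuredValue);`, and skip any file for which `shouldIndex(file.length(), max)` is false, ideally logging the skipped file so users know why it is not searchable.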

comment:4 Changed 10 years ago by jballanco-x

  • Resolution set to fixed
  • Status changed from new to closed

PR merged. Ready for 5.0.0

comment:5 Changed 10 years ago by Josh Moore <josh@…>

(In [dac7d403a79109bf954f2c8ac25a90ad4b3fd8f3/ome.git] on branch develop) Merge pull request #2142 from jballanc/rebased/develop/limit-indexed-file-size

Limit max file size for FullTextParser (see #11979) (rebased onto develop)
