Task #11980 (new)

Opened 7 years ago

Last modified 5 years ago

Extract text from large PDFs for indexing

Reported by: jballanco-x
Owned by: jballanco-x
Priority: major
Milestone: Asynchronous
Component: Search
Version: 4.4.10
Keywords: search, full text indexing
Cc:
Resources: n.a.
Referenced By: n.a.
References: n.a.
Remaining Time: n.a.
Sprint: n.a.

Description

A PDF file may contain many images, pushing its total size over the large-file rejection limit (see #11979) even though its text content alone is below that limit. We need a mechanism that extracts the text from such files and checks the extracted text against the size limit independently of the parent file's size. If the text alone is below the cut-off, it should still be indexed.
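
As a rough illustration of the check described above, here is a minimal Python sketch. It is not the project's implementation: the function name `text_within_limit`, the constant `MAX_INDEXABLE_BYTES` and its value are all hypothetical, and the PDF extraction itself is assumed to be done by some library that can hand over the text page by page. The sketch measures only the extracted text, ignores the size of the parent file, and stops reading as soon as the text exceeds the limit.

```python
from typing import Iterable, Optional

# Hypothetical cut-off in bytes; the real limit would come from the
# search configuration referred to in #11979.
MAX_INDEXABLE_BYTES = 10 * 1024 * 1024


def text_within_limit(pages: Iterable[str],
                      limit: int = MAX_INDEXABLE_BYTES) -> Optional[str]:
    """Return the concatenated page text if its UTF-8 size is at most
    `limit` bytes, otherwise None.

    `pages` is consumed lazily, so a page-by-page extractor is not asked
    for further pages once the limit has been exceeded.
    """
    chunks = []
    total = 0
    for page in pages:
        total += len(page.encode("utf-8"))
        if total > limit:
            return None  # text alone is too large: do not index
        chunks.append(page)
    return "".join(chunks)


# Usage with stand-in page text instead of a real PDF extractor.
small = text_within_limit(["first page ", "second page"], limit=64)
too_big = text_within_limit(["x" * 40, "y" * 40], limit=64)
print(small is not None, too_big is None)
```

Taking an iterable of pages rather than a file path keeps the size check independent of whichever PDF library ends up doing the extraction, and lets a generator-based extractor stop early on very large documents.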

Change History (8)

comment:1 Changed 7 years ago by jballanco-x

Referencing ticket #11936 has changed sprint.

comment:2 Changed 7 years ago by jballanco-x

  • Milestone changed from 5.0.1 to 5.0.2

comment:3 Changed 7 years ago by jballanco-x

Referencing ticket #11936 has changed sprint.

comment:4 Changed 7 years ago by jamoore

Referencing ticket #11936 has changed sprint.

comment:5 Changed 6 years ago by jamoore

  • Milestone changed from 5.1.0-m4 to 5.x

Pushing out.

comment:8 Changed 5 years ago by jamoore

  • Milestone changed from 5.x to Asynchronous

comment:6 Changed 5 years ago by jamoore

Referencing ticket #11936 has changed sprint.

comment:7 Changed 5 years ago by jamoore

Referencing ticket #11936 has changed sprint.
