PHP app needs to display .pdf file in browser with highlighted word(s)

For years I have wanted my PHP app to display a .pdf file in a browser, any browser, that will go directly to a specified page number and highlight words that are passed in as arguments.

To date, I can display the file and go to the page number, but I have never succeeded in getting the word(s) highlighted.

This is the string:

window.open('https://Filename.pdf#page=' + pg + '&search="' + word + '"', '_blank');

I've tried to keep up with browser enhancements over those years, but have not, to this point, found out how I can successfully send a word, or list of words, and have them highlighted.

(I have to pre-populate the ^F search in the browser's PDF Reader with the word, and then once there, click the magnifying glass to see them).

Has this been solved yet?

1 answer

Answer by K J:

There is no guarantee that any given browser can search inside a PDF from the URL: the viewer is secured against keyboard entry, and many PDF readers do not allow external access.

PDF.js is not a normal embedded system binary; it runs as script, so its external worker.js handling application/pdf has access to the external PDF without compromise.

Therefore the following URL will work in Firefox or other unencumbered PDF.js variants:

https://www.w3.org/TR/1998/REC-html40-19980424/html40.pdf#page=60&search=search
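
Since the behaviour above depends on PDF.js, one way to get it in browsers other than Firefox is to serve the PDF through a self-hosted copy of the PDF.js viewer. The following is only a sketch under assumptions: the viewer location (`https://example.com/pdfjs/web/viewer.html`), the PDF location, and the function name `buildPdfJsViewerUrl` are hypothetical, and the stock viewer generally expects the PDF to be on the same origin as the viewer. It relies on the viewer reading `page` and `search` from the URL fragment, as in the link above.

```javascript
// Hypothetical location of a self-hosted PDF.js viewer; adjust to your own deployment.
const VIEWER_URL = 'https://example.com/pdfjs/web/viewer.html';

// Build a viewer URL that opens `pdfUrl` at `page` and searches for `words`.
function buildPdfJsViewerUrl(pdfUrl, page, words) {
  // The PDF location travels in the query string, so it must be fully encoded.
  const file = encodeURIComponent(pdfUrl);
  // The words are joined with spaces and encoded for use in the fragment.
  const search = encodeURIComponent(words.join(' '));
  return `${VIEWER_URL}?file=${file}#page=${page}&search=${search}`;
}

const viewerUrl = buildPdfJsViewerUrl('https://example.com/docs/html40.pdf', 60, ['search']);
console.log(viewerUrl);
// In a browser, the result could then be opened with: window.open(viewerUrl, '_blank');
```

Because the viewer is ordinary JavaScript served by the site, this should not depend on which PDF renderer the visitor's browser ships with.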

In Adobe's own .pdf#fragment implementation (Acrobat DC, from the authors of the feature), the search function is triggered really slowly. Moreover, it does not highlight the result on the page; it only seeks out all matching entries.

Most other PDF renderers in Chromium-based browsers, including "Powered by Acrobat" (even in the same browser), will ignore the search phrase. In fact, Adobe does not even respect the .pdf#page= redirection in MS Edge.

These parameters were always declared an option, not a mandate, and in effect Adobe's documentation deprecated them as not being a standard. However, they are included in the ISO standard under "Annex O (normative) Fragment identifiers", where search takes a word list, not a phrase. It is described as:

Open the document and search for one or more words, selecting the first matching word in the document. The wordList argument defines the search words and shall be a string enclosed within quotation marks comprised of individual words separated by space characters. Note 1: Some browsers require that the space characters be encoded appropriately for a URI.

Thus, more correctly, the following form should be used. Note, however, that it does NOT select the first word on this page, nor does it navigate to page 6, which should be the "first matching word in the document"! Nor does the search return all the occurrences: searching manually for "search" alone gives a much larger result.

https://www.w3.org/TR/1998/REC-html40-19980424/html40.pdf#search="search%20words"

Correction: it will work in Acrobat DC if the search is one word, not two!
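
The fragment syntax quoted from Annex O can be assembled programmatically. This is a minimal sketch, not a guarantee that any particular viewer honours it: the function name `buildPdfFragmentUrl` and its `page` and `words` options are illustrative assumptions. It wraps the word list in quotation marks and encodes the separating spaces as %20, as the note in the standard suggests.

```javascript
// Build a PDF URL with an Annex O style fragment (page and/or quoted search word list).
function buildPdfFragmentUrl(pdfUrl, options = {}) {
  const parts = [];
  // page is emitted only when it is a positive integer.
  if (Number.isInteger(options.page) && options.page > 0) {
    parts.push(`page=${options.page}`);
  }
  // search is a quoted list of words separated by URI-encoded spaces.
  if (Array.isArray(options.words) && options.words.length > 0) {
    const wordList = options.words.map((w) => encodeURIComponent(w)).join('%20');
    parts.push(`search="${wordList}"`);
  }
  return parts.length > 0 ? `${pdfUrl}#${parts.join('&')}` : pdfUrl;
}

const base = 'https://www.w3.org/TR/1998/REC-html40-19980424/html40.pdf';
console.log(buildPdfFragmentUrl(base, { words: ['search', 'words'] }));
console.log(buildPdfFragmentUrl(base, { page: 60, words: ['search'] }));
```

Given the single-word behaviour noted above, passing one word at a time is the safer choice when the target is Acrobat DC.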
