SearchWP Xpdf Integration
SearchWP can extract plain text from PDF files uploaded to your WordPress site, a feature unique to SearchWP. Out of the box it attempts this using only PHP, but because the PDF format is complex and varied, the content is sometimes not extracted properly. Enter Xpdf.
The Xpdf Integration Extension lets you offload the work PHP would otherwise do to process your PDF files to the command line tools provided by Xpdf. These tools are very fast and accurate at extracting content from your PDFs. Once you activate the Extension, follow the on-screen instructions to complete the installation. Once it is installed, SearchWP delegates the extraction of PDF content to Xpdf.
Installing Xpdf command line tools
This Extension lets you use Xpdf to extract the content of your PDFs.
Important: the Xpdf command line tools are not included in the Extension download. You need to download the command line tools yourself and upload them to a non-public location outside your Web root, following the steps below.
After you have downloaded the command line tools for your server, you should do the following:
- Extract the xpdf-tools-linux-4.03.tar.gz archive (your version number may differ)
- From the extracted bin32 or bin64 directory (whichever matches your server's architecture), upload the pdftotext binary to a non-public location outside your Web root.
- From the same bin32 or bin64 directory, upload the pdfinfo binary to a non-public location outside your Web root.
- Ensure that both pdftotext and pdfinfo have the appropriate permissions and can be executed by the PHP user on your server.
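If you have shell access to the server, the steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not an official installer: install_xpdf_tools is a hypothetical helper name, and the sketch assumes the archive unpacks into a single top-level folder containing bin32 and bin64.

```shell
# Sketch only: install_xpdf_tools is a hypothetical helper, not part of SearchWP or Xpdf.
# Usage: install_xpdf_tools ARCHIVE bin32|bin64 DEST_DIR
install_xpdf_tools() {
  archive="$1"   # e.g. xpdf-tools-linux-4.03.tar.gz (version may differ)
  bin_dir="$2"   # bin32 or bin64, matching the server architecture
  dest="$3"      # private directory outside the Web root

  work="$(mktemp -d)" || return 1

  # Assumes the archive unpacks into a single top-level folder,
  # which --strip-components=1 removes.
  tar -xzf "$archive" -C "$work" --strip-components=1 || return 1

  mkdir -p "$dest" || return 1
  cp "$work/$bin_dir/pdftotext" "$work/$bin_dir/pdfinfo" "$dest/" || return 1

  # 755 lets any user, including the PHP user, execute the binaries.
  chmod 755 "$dest/pdftotext" "$dest/pdfinfo" || return 1

  rm -rf "$work"
}
```

A call might look like install_xpdf_tools xpdf-tools-linux-4.03.tar.gz bin64 /home/example/private/xpdf, where the destination path is a placeholder for a directory outside your Web root.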
The final step is to tell SearchWP Xpdf Integration where pdftotext and pdfinfo are installed on your server. To do this:
Add the following to your SearchWP Customizations plugin, replacing /path/to/pdftotext with the actual path to the pdftotext and pdfinfo binaries (the files themselves, not the folder that contains them) on your server.
Manually Testing Xpdf Integration
Once you have uploaded and activated the Xpdf Integration Extension and defined your path to pdftotext, you can manually verify that Xpdf text extraction is working as expected on specific PDFs uploaded to your Media library. To begin, go to the SearchWP Settings page (Settings > SearchWP), find the Xpdf Integration link in the Extensions section of the settings screen, and click it.
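Separately from the settings screen, the binaries can be sanity-checked from a shell on the server. The sketch below is an illustration only: check_xpdf_binary and extracts_text are hypothetical helper names. It relies on pdftotext writing to standard output when the output file is given as "-". The executable check applies to whichever user runs the script, so for a meaningful result it should be run as the PHP user.

```shell
# Sketch only: both helpers are hypothetical, not part of SearchWP or Xpdf.

# Succeeds if $1 is a regular, executable file (not a folder).
check_xpdf_binary() {
  if [ -d "$1" ]; then
    echo "$1 is a folder; point to the binary itself" >&2
    return 1
  fi
  if [ ! -f "$1" ] || [ ! -x "$1" ]; then
    echo "$1 is missing or not executable" >&2
    return 1
  fi
}

# Succeeds if the pdftotext binary at $1 extracts visible text from the PDF at $2.
# Passing "-" as the output file makes pdftotext write to standard output.
extracts_text() {
  check_xpdf_binary "$1" || return 1
  raw="$("$1" "$2" - 2>/dev/null)" || return 1
  # Drop whitespace (including form feeds) so blank pages do not count as text.
  stripped="$(printf '%s' "$raw" | tr -d '[:space:]')"
  [ -n "$stripped" ]
}
```

For instance, extracts_text /path/to/pdftotext sample.pdf && echo "extraction works" prints the message only when the binary runs and returns some text; sample.pdf stands in for any PDF you know contains selectable text.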