Python Script: PDF Extract
While playing around with a couple of other scripts, I got the idea to extract data from PDFs. Nothing fancy here: the script searches recursively for PDFs, extracts the text from the first page of each one, and appends it to a text file, output.txt.
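#!/usr/bin/python3
import glob

import PyPDF2

folder_path = './'

# Recursively find every PDF under folder_path.
for filename in glob.iglob(folder_path + '**/*.pdf', recursive=True):
    with open(filename, 'rb') as pdf_file:
        # PdfReader replaces the PdfFileReader class removed in PyPDF2 3.0.
        pdf_reader = PyPDF2.PdfReader(pdf_file, strict=False)
        # Only the first page of each PDF is extracted.
        first_page = pdf_reader.pages[0]
        # Append so that text from earlier PDFs is kept.
        with open('./output.txt', 'a+') as out_file:
            out_file.write(first_page.extract_text())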
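
The script above only reads the first page of each PDF. If you want every page, something like the following might work. It is a rough sketch, not a tested replacement: it assumes PyPDF2 2.0 or later (for PdfReader, pages, and extract_text), and the helper names find_pdfs, extract_all_pages, and main are made up for this example.

```python
from pathlib import Path


def find_pdfs(root):
    """Return every .pdf file under root, searched recursively, in sorted order."""
    return sorted(p for p in Path(root).rglob('*.pdf') if p.is_file())


def extract_all_pages(pdf_path):
    """Return the text of every page in one PDF, joined by newlines."""
    # Imported here so find_pdfs can be used without PyPDF2 installed.
    import PyPDF2

    with open(pdf_path, 'rb') as pdf_file:
        reader = PyPDF2.PdfReader(pdf_file, strict=False)
        # extract_text() may return an empty string for scanned or image-only pages.
        return '\n'.join(page.extract_text() or '' for page in reader.pages)


def main(root='./', output='./output.txt'):
    """Append the text of every PDF under root to the output file."""
    # Append mode matches the original script: earlier output is preserved.
    with open(output, 'a') as out_file:
        for pdf_path in find_pdfs(root):
            out_file.write(extract_all_pages(pdf_path) + '\n')
```

Calling main() from the folder you want to scan appends to output.txt the same way the original does. Keeping the file search separate from the extraction makes it easy to check which PDFs will be picked up before reading any of them.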