https://medium.com/@debusinha2009/processing-pdf-data-with-apache-pdfbox-and-apache-spark-at-scale-on-databricks-85b4f8daee78