Monday, January 19, 2015

Ensuring long-term access: PDF validation with JHOVE?

Ensuring long-term access: PDF validation with JHOVE? Yvonne Friese. ZBW - Leibniz Information Centre for Economics.  PDF Association. December 17, 2014.
JHOVE is an open source tool for identifying, characterizing and validating twelve common formats such as pdf, tiff, jpeg, aiff and wave.  Pages within a PDF file are usually stored as a page tree, allowing the user to reach a given page as quickly as possible. Common advice for long-term archiving is to preferentially use the PDF/A format. However, this no longer matches to the day-to-day reality of many workflows which use JHOVE for validation tests. The differences between PDF and PDF/A means that there there can be validation errors. JHOVE’s PDF module is certainly capable of validating PDF/A files but the feature does not work well.  The process does not analyze the content of the data streams, meaning that it cannot validate PDF/A compliance in line with ISO standards. JHOVE is not suited to PDF/A validation but there currently are no alternatives to JHOVE for validating standard PDFs.

JHOVE can still be useful, provided users understand its error reports and are aware of ways to resolve them. Even with the problems JHOVE remains an excellent option for providing initial guidance.

[In our own institution, we have found JHOVE to be useful in identifying PDF files that have potential problems. Each problem for each source needs to be examined to decide if there is a preservation risk.]

No comments: