Tuesday, 3 March 2009

JHove Batch Processing

What is JHove?

JHove is a Java utility that identifies and validates file formats. It is used extensively in the digital preservation field.

The problem

At work I was given the task of getting JHove to batch process all the files in a folder. It handles individual documents correctly through the GUI, but we could not find any examples of it being run from the command line against a batch of files. After looking at it for a while, it seemed that we would have to build a separate application to call JHove for each file in a folder.

Finally we worked out how to do batch processing by using JHove alone.

Here is the batch file we created:

java -jar c:\jhove\bin\JHoveApp.jar -o c:\output.xml -h audit -c C:\jhove\conf\jhove.conf "C:\Documents and Settings\DigitalArchives\Desktop\testjhove"


Send the results to c:\output.xml
-o c:\output.xml

The type of output you want (the output handler). You need to set it to 'audit' in order to do batch processing
-h audit

Use the configuration file c:\jhove\conf\jhove.conf (this is the default config file, and it can be changed to specify which types of files you want to test for)
-c c:\jhove\conf\jhove.conf

The folder that contains the files to audit
"C:\Documents and Settings\DigitalArchives\Desktop\testjhove"


Things that would improve the process are as follows

  • Allow multiple folders to be audited in one go
  • Create a front end for the process that will allow you to pick a folder to audit and choose an output file.
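
As a rough illustration of the first improvement, here is a minimal Python sketch that runs the same JHove command once per folder. It only uses the options shown above (-o, -h audit, -c). The function names, the "<folder name>-audit.xml" naming scheme and the default output location are my own assumptions for the example, not anything JHove provides, and the paths are the ones from our machine, so adjust them to suit.

```python
import subprocess
from pathlib import PureWindowsPath

# Paths taken from the batch file above; change them for your own install.
JHOVE_JAR = r"c:\jhove\bin\JHoveApp.jar"
JHOVE_CONF = r"C:\jhove\conf\jhove.conf"


def build_audit_command(folder, output_file):
    """Return the argument list for one JHove audit run over `folder`."""
    return [
        "java", "-jar", JHOVE_JAR,
        "-o", output_file,   # where the results are written
        "-h", "audit",       # audit output handler, as in the batch file
        "-c", JHOVE_CONF,    # configuration file
        folder,              # folder to audit
    ]


def output_name_for(folder, output_dir="c:\\"):
    """Hypothetical naming scheme: <output_dir>\\<folder name>-audit.xml."""
    name = PureWindowsPath(folder).name
    return str(PureWindowsPath(output_dir) / f"{name}-audit.xml")


def audit_folders(folders, run=subprocess.run):
    """Run JHove once per folder, each run writing its own output file."""
    for folder in folders:
        cmd = build_audit_command(folder, output_name_for(folder))
        # Passing a list means paths containing spaces need no extra quoting.
        run(cmd, check=True)
```

Calling audit_folders with a list of folder paths would then produce one audit file per folder. A front end, such as a folder picker, would only need to collect the list of folders and pass it to the same function.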

Please let me know if you are using JHove and how you are using it.


Gary McGath said...

You can also drag a folder to the JHOVE GUI window, and it will process all the files in the folder.

Sony said...

@gary thanks for your comment. I didn't know you could do that. I'll try it out tomorrow.

Yvonne said...

Thank you Gary, that helped a lot! Very convenient :-)