Tuesday 3 March 2009

JHove Batch Processing


What is JHove?

JHove is a Java utility that can be used to identify and validate document formats. It is used extensively in the digital preservation field.

The problem

At work I was given the task of getting JHove to batch process all the files in a folder. It handles individual documents correctly through the GUI, but I could find no examples of it being run from the command line on a batch of files. After looking at it for a while, it seemed that we would have to build a separate application to call JHove for each file in a folder.

In the end we worked out how to do batch processing using JHove alone.

Here is the batch file we created:


java -jar c:\jhove\bin\JHoveApp.jar -o c:\output.xml -h audit -c C:\jhove\conf\jhove.conf "C:\Documents and Settings\DigitalArchives\Desktop\testjhove"


Parameters:-


Send the results to c:\output.xml
-o c:\output.xml


The type of output you want. It needs to be set to 'audit' in order to do batch processing
-h audit


Use the configuration file c:\jhove\conf\jhove.conf (this is the default config file; it can be changed to specify which types of files you want to test for)
-c c:\jhove\conf\jhove.conf


The folder that contains the files to audit
"C:\Documents and Settings\DigitalArchives\Desktop\testjhove"
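
If you would rather launch the same command from a script than from a batch file, here is a minimal Python sketch. It only wraps the parameters described above. The function names build_jhove_command and run_audit are made up for illustration, and the paths are simply the ones from our batch file, so treat them as assumptions about your install.

```python
import subprocess

# Paths copied from the batch file above; adjust them for your own install.
JHOVE_JAR = r"c:\jhove\bin\JHoveApp.jar"
JHOVE_CONF = r"C:\jhove\conf\jhove.conf"


def build_jhove_command(folder, output_file):
    """Return the argument list for one audit run over `folder` (hypothetical helper)."""
    return [
        "java", "-jar", JHOVE_JAR,
        "-o", output_file,   # send the results to this file
        "-h", "audit",       # use the 'audit' output type
        "-c", JHOVE_CONF,    # configuration file
        folder,              # folder that contains the files to audit
    ]


def run_audit(folder, output_file):
    """Run JHove once; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(build_jhove_command(folder, output_file), check=True)
```

Because the arguments are passed as a list, a folder name with spaces in it (like "Documents and Settings") does not need the extra quotes that the batch file needs.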


Improvements

Things that would improve the process:

  • Allow multiple folders to be audited
  • Create a front end for the process that will allow you to pick a folder to audit and choose an output file.
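
As a rough idea of how the first improvement might look, here is a Python sketch that runs one audit per folder and gives each folder its own output file. This is not something we have built. The names plan_audits and audit_folders, the output naming scheme, and the example folders in the usage note are all assumptions for illustration; the JHove parameters are the same ones used in the batch file.

```python
import subprocess
from pathlib import PureWindowsPath

# Paths copied from the batch file; adjust them for your own install.
JHOVE_JAR = r"c:\jhove\bin\JHoveApp.jar"
JHOVE_CONF = r"C:\jhove\conf\jhove.conf"


def plan_audits(folders, output_dir):
    """Pair each folder with its own output file, e.g. testjhove -> testjhove.xml."""
    plans = []
    for folder in folders:
        name = PureWindowsPath(folder).name
        output_file = str(PureWindowsPath(output_dir) / f"{name}.xml")
        plans.append((folder, output_file))
    return plans


def audit_folders(folders, output_dir, runner=subprocess.run):
    """Run one JHove audit per folder; pass a different `runner` for a dry run."""
    for folder, output_file in plan_audits(folders, output_dir):
        runner(["java", "-jar", JHOVE_JAR, "-o", output_file,
                "-h", "audit", "-c", JHOVE_CONF, folder], check=True)
```

Calling audit_folders([r"C:\data\scans", r"C:\data\reports"], r"C:\audit") would write C:\audit\scans.xml and C:\audit\reports.xml. Two folders with the same final name would overwrite each other's output, so a real version would need a better naming scheme.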

Please let me know if you are using JHove and how you are using it.

3 comments:

Gary McGath said...

You can also drag a folder to the JHOVE GUI window, and it will process all the files in the folder.

Sony said...

@gary thanks for your comment. I didn't know you could do that. I'll try it out tomorrow.

Yvonne said...

Thank you Gary, that helped a lot! Very convenient :-)
Best,
Yvonne