JHove Batch Processing


What is JHove?

JHove is a Java utility that can be used to identify document types. It is used extensively in the digital preservation field.

The problem

At work I was given the task of getting JHove to batch process all the files in a folder. It handles individual documents correctly through the GUI, but we could find no examples of it being run from the command line on a batch of files. After looking at it for a while, it seemed that we would have to build a separate application to call JHove once for each file in a folder.

Finally, we worked out how to do batch processing using JHove alone.

Here is the batch file we created:


java -jar c:\jhove\bin\JHoveApp.jar -o c:\output.xml -h audit -c C:\jhove\conf\jhove.conf "C:\Documents and Settings\DigitalArchives\Desktop\testjhove"
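If you would rather launch the same command from a script than from a batch file, it can be assembled as an argument list. This is only a minimal sketch in Python: the function names build_jhove_command and run_jhove are made up for illustration, and it assumes java is on the PATH. The paths and options are the ones from the batch file above.

```python
import subprocess


def build_jhove_command(jar, output, conf, folder, handler="audit"):
    """Return the JHove command from the batch file as a list of arguments.

    Passing a list means the folder path, which contains spaces, does not
    need the surrounding quotes that the batch file requires.
    """
    return [
        "java", "-jar", jar,
        "-o", output,    # file that receives the results
        "-h", handler,   # output handler; 'audit' is the one used for batches
        "-c", conf,      # JHove configuration file
        folder,          # folder that contains the files to audit
    ]


def run_jhove(command):
    """Run the command and raise CalledProcessError if java exits with an error."""
    subprocess.run(command, check=True)


command = build_jhove_command(
    jar=r"c:\jhove\bin\JHoveApp.jar",
    output=r"c:\output.xml",
    conf=r"C:\jhove\conf\jhove.conf",
    folder=r"C:\Documents and Settings\DigitalArchives\Desktop\testjhove",
)
# run_jhove(command)  # uncomment on a machine where Java and JHove are installed
```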


Parameters:


Send the results to c:\output.xml
-o c:\output.xml


The type of output you want to have. You need to set it to 'audit' in order to do batch processing
-h audit


Use the configuration file c:\jhove\conf\jhove.conf (this is the default config file; it can be changed to specify which types of files you want to test for)
-c c:\jhove\conf\jhove.conf


The folder that contains the files to audit
"C:\Documents and Settings\DigitalArchives\Desktop\testjhove"


Improvements

Things that would improve the process are as follows:

  • Allow multiple folders to be audited
  • Create a front end for the process that will allow you to pick a folder to audit and choose an output file.
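The first improvement could be approached by running JHove once per folder and giving each run its own output file. The following is a rough sketch in Python, not something we have built: the function name audit_commands and the idea of naming each output file after its folder are my own assumptions, and the jar and config paths are simply the ones from the batch file above.

```python
from pathlib import PureWindowsPath

# Paths taken from the batch file earlier in this post.
JHOVE_JAR = r"c:\jhove\bin\JHoveApp.jar"
JHOVE_CONF = r"C:\jhove\conf\jhove.conf"


def audit_commands(folders, output_dir):
    """Build one JHove audit command per folder.

    Each command writes to <output_dir>\\<folder name>.xml. Two folders that
    share the same final name would overwrite each other's output, so this
    sketch assumes the folder names are distinct.
    """
    commands = []
    for folder in folders:
        name = PureWindowsPath(folder).name
        output = PureWindowsPath(output_dir) / f"{name}.xml"
        commands.append([
            "java", "-jar", JHOVE_JAR,
            "-o", str(output),
            "-h", "audit",
            "-c", JHOVE_CONF,
            folder,
        ])
    return commands
```

Each list that comes back can be handed to subprocess.run(cmd, check=True) in turn. A front end would then only need to collect the list of folders and the output directory.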

Please let me know if you are using JHove and how you are using it.

Comments

Gary McGath said…
You can also drag a folder to the JHOVE GUI window, and it will process all the files in the folder.
Sony said…
@gary thanks for your comment. I didn't know you could do that. I'll try it out tomorrow.
Yvonne said…
Thank you Gary, that helped a lot! Very convenient :-)
Best,
Yvonne