DROID and Siegfried Flashcards
Go into greater detail about both of these tools.
What is…
PRONOM
- PRONOM is a digital preservation technical registry. It is maintained by The National Archives, UK, though its core architecture has changed little since 2004.
- In the community, PRONOM serves as a centralised source of file format signatures.
- When a tool that uses PRONOM's signatures identifies a file format, it also gives the user that format's unique identifier.
What is a…
PUID
PRONOM Unique Identifiers (PUIDs) are assigned to all formats in the PRONOM registry. There are two primary types: fmt and x-fmt. The latter resulted from a historical error, when x-fmt identifiers were made available to the public. A decision was later made to keep x-fmt identifiers for the sake of continuity as a standard. There is no longer any semantic difference between the two types: the x- prefix no longer marks an identifier as experimental, and x-fmt identifiers are equivalent to fmt identifiers.
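PUIDs follow a simple type/number shape, so a tool can check one before using it. The sketch below is a minimal validator. `parse_puid` and `Puid` are hypothetical names, not part of any PRONOM tool. It accepts only the two types described above and rejects any other prefix.

```python
import re
from typing import NamedTuple

# Accepts only the two PUID types described above: "fmt/<n>" and "x-fmt/<n>".
PUID_PATTERN = re.compile(r"^(fmt|x-fmt)/(\d+)$")


class Puid(NamedTuple):
    kind: str    # "fmt" or "x-fmt"
    number: int  # the numeric part of the identifier


def parse_puid(text: str) -> Puid:
    """Split a PUID such as 'fmt/43' into its type and number."""
    match = PUID_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not an fmt or x-fmt PUID: {text!r}")
    return Puid(kind=match.group(1), number=int(match.group(2)))
```

The two types carry no semantic difference, so `kind` is kept only so that the full identifier can be rebuilt. Nothing should branch on it.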
What are…
PRONOM web services
PRONOM delivers signature files to tools via web services. DROID, for example, first calls one web service to check for new signatures. If new signatures exist, it calls a second web service to download them in the form of a 'signature file'. A second type of signature file, the container signature file, is downloaded using more traditional web techniques: the client checks a web page's Last-Modified date to see whether new data is available.
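The Last-Modified check for container signatures can be sketched with Python's standard library. The sketch assumes the server sends a standard HTTP `Last-Modified` header. The helper names, and the idea of comparing that header against the time of the local copy, are illustrative assumptions. This is not DROID's actual update code, and no real signature URL is given here.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional
from urllib.request import Request, urlopen


def fetch_last_modified(url: str) -> Optional[str]:
    """Send an HTTP HEAD request and return the Last-Modified header, if present."""
    with urlopen(Request(url, method="HEAD"), timeout=30) as response:
        return response.headers.get("Last-Modified")


def needs_update(last_modified: Optional[str], local_copy_time: datetime) -> bool:
    """Return True when the remote copy looks newer than the local one.

    A missing header is treated as "download to be safe".
    Naive datetimes are assumed to be in UTC so the comparison is valid.
    """
    if last_modified is None:
        return True
    remote_time = parsedate_to_datetime(last_modified)
    if remote_time.tzinfo is None:
        remote_time = remote_time.replace(tzinfo=timezone.utc)
    if local_copy_time.tzinfo is None:
        local_copy_time = local_copy_time.replace(tzinfo=timezone.utc)
    return remote_time > local_copy_time
```

A caller would pass the header returned by `fetch_last_modified(url)` to `needs_update`, and download the file only when the result is True. Treating a missing header as a reason to update is a cautious design choice.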
What is…
PRONOM XML
PRONOM records can also be accessed as XML, making it possible to download and remix them. The links look like:
http://www.nationalarchives.gov.uk/PRONOM/fmt/{no}
http://www.nationalarchives.gov.uk/PRONOM/x-fmt/{no}
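Filling in the `{no}` placeholder is simple string formatting. In this sketch, `pronom_record_url` is a hypothetical helper and the base URL comes from the link patterns above. The optional `suffix` parameter exists because requesting a `.xml` variant of a record URL is an assumption that should be verified against the live site.

```python
import re

# Base URL taken from the link patterns above.
PRONOM_BASE = "http://www.nationalarchives.gov.uk/PRONOM"


def pronom_record_url(puid: str, suffix: str = "") -> str:
    """Build the PRONOM record link for an fmt/ or x-fmt/ PUID.

    `suffix` is appended verbatim. Passing ".xml" to request the XML
    serialisation is an assumption to check against the live site.
    """
    if re.fullmatch(r"(fmt|x-fmt)/\d+", puid) is None:
        raise ValueError(f"unexpected PUID: {puid!r}")
    return f"{PRONOM_BASE}/{puid}{suffix}"
```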
What is a…
DROID Signature File
- A DROID signature file is an XML file containing a snapshot of PRONOM at the time the file was generated.
- A signature file is split into two sections, or three for container signature files. The two main sections are a list of file formats with their metadata (e.g. the format's MIMEType) and the signatures those formats map to.
- A container signature file contains a third section of 'trigger PUIDs', that is, PUIDs which, when matched, trigger container identification.
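To make this structure concrete, the sketch below parses a heavily abridged, hypothetical excerpt shaped like a DROID signature file, plus a container-style `TriggerPuid` list. The element and attribute names (`InternalSignature`, `FileFormat`, `InternalSignatureID`, `TriggerPuid` and so on) reflect our understanding of the format and should be checked against a real signature file. The IDs, names, PUID and byte values are made up.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical, heavily abridged excerpt shaped like a DROID signature file.
SIGNATURE_SAMPLE = """<FFSignatureFile xmlns="http://www.nationalarchives.gov.uk/pronom/SignatureFile">
  <InternalSignatureCollection>
    <InternalSignature ID="10" Specificity="Specific">
      <ByteSequence Reference="BOFoffset">
        <SubSequence Position="1" SubSeqMinOffset="0" SubSeqMaxOffset="0">
          <Sequence>474946383961</Sequence>
        </SubSequence>
      </ByteSequence>
    </InternalSignature>
  </InternalSignatureCollection>
  <FileFormatCollection>
    <FileFormat ID="1" Name="Example Format" PUID="fmt/0" MIMEType="image/example">
      <InternalSignatureID>10</InternalSignatureID>
    </FileFormat>
  </FileFormatCollection>
</FFSignatureFile>"""

# Hypothetical excerpt of a container signature file's trigger section.
CONTAINER_SAMPLE = """<ContainerSignatureMapping>
  <TriggerPuids>
    <TriggerPuid ContainerType="ZIP" Puid="fmt/0"/>
  </TriggerPuids>
</ContainerSignatureMapping>"""


def local_name(tag: str) -> str:
    """Drop any '{namespace}' prefix so lookups work with or without one."""
    return tag.rsplit("}", 1)[-1]


def load_signature_file(xml_text: str) -> Dict[str, dict]:
    """Return the signature section and the format section as dictionaries."""
    root = ET.fromstring(xml_text)
    signatures: Dict[str, List[str]] = {}  # signature ID -> byte sequences
    formats: Dict[str, dict] = {}          # PUID -> metadata and signature IDs
    for element in root.iter():
        name = local_name(element.tag)
        if name == "InternalSignature":
            signatures[element.get("ID")] = [
                child.text.strip()
                for child in element.iter()
                if local_name(child.tag) == "Sequence" and child.text
            ]
        elif name == "FileFormat":
            formats[element.get("PUID")] = {
                "name": element.get("Name"),
                "mime": element.get("MIMEType"),
                "signature_ids": [
                    child.text.strip()
                    for child in element
                    if local_name(child.tag) == "InternalSignatureID" and child.text
                ],
            }
    return {"signatures": signatures, "formats": formats}


def sequences_for(puid: str, data: Dict[str, dict]) -> List[str]:
    """Follow a format's signature IDs to the byte sequences they contain."""
    ids = data["formats"][puid]["signature_ids"]
    return [seq for sid in ids for seq in data["signatures"].get(sid, [])]


def load_trigger_puids(xml_text: str) -> Dict[str, str]:
    """Map each trigger PUID to the container type it triggers."""
    root = ET.fromstring(xml_text)
    return {
        el.get("Puid"): el.get("ContainerType")
        for el in root.iter()
        if local_name(el.tag) == "TriggerPuid"
    }
```

The format-to-signature mapping goes through signature IDs. Resolving a PUID to its byte sequences therefore takes two lookups.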
What is a…
PRONOM Release
A PRONOM release happens when The National Archives, UK, runs a publishing job. Importantly, the draft information in the database is published onto the web, and a signature file is created via a database stored procedure and uploaded to a location where it can be accessed via a web service.
What are…
PRONOM Release Notes
The PRONOM release notes are published in XML form and are available from the PRONOM index page on the web. Each release is summarised in terms of:
- New Records: records for file formats that now have PUIDs
- Updated Records: format records in PRONOM whose information has been updated in some way, including signature changes
- New Signatures: file formats that now have signatures associated with them and can be identified via PRONOM
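The three categories can be illustrated by diffing two snapshots of format records. This is a hypothetical sketch, not how The National Archives produces its release notes. `FormatRecord`, `summarise_release` and the placeholder PUIDs are illustrative.

```python
from typing import Dict, List, NamedTuple, Tuple


class FormatRecord(NamedTuple):
    name: str
    mime: str
    signature_ids: Tuple[str, ...]  # empty when the format has no signature


class ReleaseSummary(NamedTuple):
    new_records: List[str]
    updated_records: List[str]
    new_signatures: List[str]


def summarise_release(
    old: Dict[str, FormatRecord], new: Dict[str, FormatRecord]
) -> ReleaseSummary:
    """Compare two snapshots keyed by PUID and sort changes into the three categories."""
    new_records = sorted(puid for puid in new if puid not in old)
    # Any change at all, including to signatures, counts as an update.
    updated_records = sorted(
        puid for puid in new if puid in old and new[puid] != old[puid]
    )
    # A format gains a signature when it has one now but had none before.
    new_signatures = sorted(
        puid
        for puid, record in new.items()
        if record.signature_ids and not (puid in old and old[puid].signature_ids)
    )
    return ReleaseSummary(new_records, updated_records, new_signatures)
```

One design choice is worth noting. A brand-new record that already has a signature is counted under both New Records and New Signatures.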
What is…
pronom@nationalarchives.gsi.gov.uk
The email address to send format requests to at The National Archives, UK.
What is…
DROID-list Google Group
An open community and a good first place to discuss new file format signatures for PRONOM. Because the group is open, members are invited to help with others' identification issues: signatures can be shared, and so can the work of fixing them. PRONOM development benefits from having as much information as possible about a file format and its potential signature. Otherwise, all of this work would fall to PRONOM's own developers.
What is…
DROID
DROID was the first client tool to use PRONOM signatures. It can be pointed at one or more directories, which it recurses through. Each file is then matched against the signatures in the signature file, and DROID returns a PUID only for files that match. For every file, matched or not, DROID also outputs other metadata, including the last-modified date and, if selected, a checksum.
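A DROID-style profile of the metadata that does not depend on identification can be sketched with the standard library. This is not DROID's implementation. The sketch recurses through directories, records each file's size and last-modified date, and optionally computes a SHA-256 checksum. The choice of SHA-256 is an assumption made for the sketch. Matching against signatures, and therefore PUIDs, is left out.

```python
import hashlib
import os
from datetime import datetime, timezone
from typing import Iterable, Iterator, NamedTuple, Optional


class FileEntry(NamedTuple):
    path: str
    size: int
    last_modified: str       # ISO 8601 timestamp in UTC
    checksum: Optional[str]  # hex digest, or None when hashing is off


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def profile(roots: Iterable[str], with_checksum: bool = False) -> Iterator[FileEntry]:
    """Recurse through one or more directories, yielding per-file metadata."""
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for filename in sorted(filenames):
                path = os.path.join(dirpath, filename)
                stat = os.stat(path)
                modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
                yield FileEntry(
                    path=path,
                    size=stat.st_size,
                    last_modified=modified.isoformat(),
                    checksum=sha256_of(path) if with_checksum else None,
                )
```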
What is…
Fido
Fido was the second client tool to use PRONOM signatures, though it supports only a subset of them. Fido is written in Python and uses traditional regular expressions to match files against signatures. This meant converting the PRONOM signatures into a form that a standard regular expression engine can understand. Fido is used in Archivematica and is still maintained under the stewardship of the Open Preservation Foundation.
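Fido's approach of turning PRONOM signatures into regular expressions can be illustrated with a small translator. This sketch handles only a simplified subset of the PRONOM/DROID byte-sequence syntax: hex bytes, `??`, `{n}`, `{n-m}`, `{n-*}` and `*`. It is not Fido's converter, and alternatives and byte ranges are deliberately unsupported.

```python
import re

# Tokens from a simplified subset of PRONOM/DROID byte-sequence syntax:
#   hex byte pair (e.g. 4D), ?? (any single byte), {n} (skip exactly n bytes),
#   {n-m} (skip n to m bytes), {n-*} (skip at least n bytes), * (skip any amount).
# Alternatives "(..|..)" and byte ranges "[..]" are not handled here.
TOKEN = re.compile(r"\{(\d+)(?:-(\d+|\*))?\}|\?\?|\*|[0-9A-Fa-f]{2}")


def pronom_to_regex(sequence: str, anchor_bof: bool = True) -> "re.Pattern[bytes]":
    """Translate a PRONOM-style hex sequence into a compiled bytes regex."""
    compact = "".join(sequence.split())
    parts = []
    position = 0
    while position < len(compact):
        match = TOKEN.match(compact, position)
        if match is None:
            raise ValueError(f"unsupported syntax at {compact[position:]!r}")
        token = match.group(0)
        if token == "??":
            parts.append(b".")
        elif token == "*":
            parts.append(b".*?")
        elif token.startswith("{"):
            low, high = match.group(1), match.group(2)
            if high is None:
                parts.append(b".{%d}" % int(low))
            elif high == "*":
                parts.append(b".{%d,}" % int(low))
            else:
                parts.append(b".{%d,%d}" % (int(low), int(high)))
        else:
            # A literal byte, escaped in case it is a regex metacharacter.
            parts.append(re.escape(bytes([int(token, 16)])))
        position = match.end()
    prefix = b"\\A" if anchor_bof else b""
    # DOTALL lets "." match every byte value, including the newline byte 0x0A.
    return re.compile(prefix + b"".join(parts), re.DOTALL)
```

For example, `pronom_to_regex("47494638??61")` matches data beginning with either `GIF87a` or `GIF89a`.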
What is…
Siegfried
Siegfried is a more recent DROID-like tool that uses all of the signature information available to DROID. It uses a different matching algorithm but returns equivalent metadata. Siegfried has a number of strengths:
- It is the first such tool to draw on additional sources of file format signatures, including MIME-info signatures from freedesktop.org and a set of signatures from the Library of Congress
- It is primarily command-line based, which makes it easier to integrate into workflows
- Like DROID, it is open source
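Because Siegfried runs from the command line, it can be driven from a script. The sketch assumes `sf` is on the PATH. It also assumes that the `-json` output contains a `files` list, whose entries each carry a `filename` and a list of `matches` with `ns` and `id` keys. Check these assumptions against the output of your installed version. The helper names are hypothetical.

```python
import json
import subprocess
from typing import List, Tuple


def run_siegfried(path: str) -> dict:
    """Run `sf` with JSON output on a file or directory and parse the result.

    Assumes `sf` is installed and on PATH.
    """
    result = subprocess.run(
        ["sf", "-json", path], capture_output=True, text=True, check=True
    )
    return json.loads(result.stdout)


def pronom_ids(report: dict) -> List[Tuple[str, str]]:
    """Extract (filename, PUID) pairs for matches in the 'pronom' namespace."""
    pairs = []
    for entry in report.get("files", []):
        for match in entry.get("matches", []):
            if match.get("ns") == "pronom":
                pairs.append((entry["filename"], match["id"]))
    return pairs
```

Keeping the parsing separate from the subprocess call means `pronom_ids` can be tested on saved reports without running Siegfried.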
What is…
Brunnhilde
Brunnhilde is a reporting companion tool for Siegfried, created by Tim Walsh. It is part of BitCurator implementations and also integrates reports from other sources, such as the ClamAV virus checker.
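The kind of summary a reporting companion produces can be sketched as a simple count of formats. This is a hypothetical illustration, not Brunnhilde's code. It takes (filename, PUID) pairs and assumes that unidentified files are reported with the ID `UNKNOWN`.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def format_report(
    pairs: Iterable[Tuple[str, str]], unknown_id: str = "UNKNOWN"
) -> Tuple[List[Tuple[str, int]], int]:
    """Count files per PUID (most common first) and count unidentified files separately."""
    counts = Counter(puid for _filename, puid in pairs)
    unidentified = counts.pop(unknown_id, 0)
    return counts.most_common(), unidentified


def render_report(pairs: Iterable[Tuple[str, str]]) -> str:
    """Render the counts as simple tab-separated lines."""
    ranked, unidentified = format_report(pairs)
    lines = [f"{puid}\t{count}" for puid, count in ranked]
    lines.append(f"unidentified\t{unidentified}")
    return "\n".join(lines)
```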
What is…
Roy
Roy is a utility, created alongside Siegfried, that lets users customise signature files and therefore what Siegfried can do with them. For example, it is possible to set custom offsets to match against, or to limit the formats included in the signature file, e.g. only image formats for digitisation workflows.
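Roy is driven from the command line, so a workflow might assemble its arguments programmatically. The helper below is hypothetical. The `-limit`, `-bof` and `-eof` flags of `roy build` are assumptions based on our reading of the Siegfried documentation, and should be confirmed with roy's own help before use.

```python
from typing import List, Optional, Sequence


def roy_build_args(
    limit: Optional[Sequence[str]] = None,
    bof: Optional[int] = None,
    eof: Optional[int] = None,
) -> List[str]:
    """Assemble a `roy build` command line (flag names are assumptions)."""
    args = ["roy", "build"]
    if limit:
        # Restrict the signature file to these PUIDs only.
        args += ["-limit", ",".join(limit)]
    if bof is not None:
        # Cap on offsets measured from the beginning of the file.
        args += ["-bof", str(bof)]
    if eof is not None:
        # Cap on offsets measured from the end of the file.
        args += ["-eof", str(eof)]
    return args
```

The resulting list could be passed to `subprocess.run(..., check=True)`. Building the list separately keeps the choice of formats and offsets easy to test and review.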
What is an…
Offset
Offsets are central to how a signature works: they specify where in a file certain byte patterns (signature patterns) are expected to be found. DROID and Siegfried both offer settings that limit how far into a file an offset can reach. These settings can speed up format identification, since scanning less data lets a scan finish sooner, but there is a trade-off: if a signature's pattern lies beyond the limit, it will not be found, so some files may go unidentified or be misidentified.
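The trade-off can be shown with a small search function. This is a hypothetical sketch, not how DROID or Siegfried scan files. `find_at_offset` looks for a byte pattern that must start between a minimum and a maximum offset from the beginning of the data. An optional `scan_limit` mimics a cap on how many bytes are read.

```python
from typing import Optional


def find_at_offset(
    data: bytes,
    pattern: bytes,
    min_offset: int,
    max_offset: int,
    scan_limit: Optional[int] = None,
) -> Optional[int]:
    """Return where `pattern` starts if it begins between min_offset and max_offset.

    `scan_limit`, when given, caps how many bytes from the start are examined.
    """
    window = data if scan_limit is None else data[:scan_limit]
    # The pattern must start no later than max_offset and fit inside the window.
    position = window.find(pattern, min_offset, max_offset + len(pattern))
    return None if position == -1 else position
```

Take data whose pattern starts at byte 100. With a 64-byte scan limit, the search examines less data but misses the match entirely. That is the trade-off described above.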