Tools Flashcards
Learn about software and hardware tools that may be important to know in digital preservation.
What is…
PRONOM is a registry of file formats that is maintained by The National Archives, UK.
PRONOM delivers new file format information to a tool called DROID, which uses it to identify files in collections and assign a unique identifier.
What is…
DROID is a tool that can be used to automatically identify a file’s format using ‘file format signatures’ that it downloads from PRONOM.
DROID assigns each identified file format a unique identifier called a PUID (PRONOM Unique Identifier).
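As a rough illustration of signature-based identification, the sketch below matches the leading bytes of a file against a tiny hand-written table. The table, the function names, and the labels are assumptions made for this example. Real PRONOM signatures are richer (offsets, end-of-file sequences, wildcards), and DROID reports genuine PUIDs rather than these placeholder labels.

```python
# Hypothetical, hand-written signature table: (leading bytes, placeholder label).
# The labels are illustrative stand-ins, not real PUIDs.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "example/png"),
    (b"%PDF-", "example/pdf"),
    (b"GIF87a", "example/gif"),
    (b"GIF89a", "example/gif"),
    (b"\xff\xd8\xff", "example/jpeg"),
]


def identify(data: bytes) -> str:
    """Return the label of the first signature that matches the start of data."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"


def identify_file(path: str) -> str:
    """Read only the first few bytes of a file and identify it."""
    with open(path, "rb") as handle:
        return identify(handle.read(16))
```

Matching on content rather than on the file extension is the design point: a file renamed to the wrong extension is still identified by its bytes.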
What is…
Nanite is a programming library that wraps DROID in a way that makes it possible for software developers to incorporate file format identification in their programs.
What is…
JHOVE is a tool that checks whether a file conforms to the specification of its format. For example, it can check whether date formats used in certain file types are standardized.
JHOVE can do this for approximately 12 formats, but the software makes it easy for more to be programmed.
What is…
Siegfried is a DROID-like command-line tool that also uses PRONOM information to identify file formats.
Siegfried is similar to DROID, but adds other mechanisms to identify file formats and alternative ways for users to interact with it.
What is…
File is a Linux-based tool for identifying file formats. Unlike DROID and Siegfried, it does not return unique identifiers for what it finds.
File uses a different mechanism and a different corpus of information to identify the format of a digital object.
What is…
An online resource of community contributed ‘recipes’ (commands) for processing audio visual files through the open source audio visual transcode and characterization tool ffmpeg.
What is…
- A free and open source tool for working with audio and video.
- ffmpeg can characterize multimedia, even output visual analyses.
- ffmpeg can transcode it into other file formats, and perform many other manipulations.
- Developed and maintained by the ffmpeg team.
What is…
A utility for transferring data across file systems while maintaining key file system properties such as last-modified dates and user permissions.
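The idea of keeping file system properties intact during a transfer can be sketched with Python's standard library. This is not the utility described above, only an illustration of the same principle: `shutil.copy2` copies a file and attempts to preserve its metadata, and the `demo` helper (a name chosen for this example) checks that the last-modified time survives the copy.

```python
import os
import shutil
import tempfile


def copy_preserving_metadata(source: str, destination: str) -> None:
    """Copy a file along with its last-modified time and permission bits."""
    # shutil.copy2 copies the content, then attempts to preserve file metadata.
    shutil.copy2(source, destination)


def demo() -> bool:
    """Copy a file with an old timestamp and report whether the mtime survived."""
    with tempfile.TemporaryDirectory() as workdir:
        source = os.path.join(workdir, "original.txt")
        destination = os.path.join(workdir, "copy.txt")
        with open(source, "w", encoding="utf-8") as handle:
            handle.write("example content\n")
        # Set the access and modification times to a fixed date in the past.
        os.utime(source, (946684800, 946684800))  # 2000-01-01T00:00:00Z
        copy_preserving_metadata(source, destination)
        return int(os.stat(destination).st_mtime) == 946684800
```

A plain content copy would stamp the new file with the time of the copy, which is why preservation workflows favour transfer tools that carry these properties across.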
What is…
Not strictly a digital preservation tool, but useful nonetheless: it annotates Linux commands for users and enables those annotations to be shared.
What is…
Vera PDF
A free and open source tool for the validation of PDF/A files. Vera PDF provides some support for other PDF variants.
What is…
A tool written in Python that characterizes JPEG2000 (JP2) files. Important in digitization workflows where JP2 is increasingly used in place of TIFF for its savings in storage space.
What is…
A tool by Martin Hoppenheit to reduce the number of signatures in the DROID signature file, e.g. for quicker identification in digitization workflows that involve only image formats.
What is…
A large-scale, OAIS (Open Archival Information System) compliant system that implements large pieces of the digital preservation workflow from ingest to delivery. Rosetta is maintained by the company Ex Libris.
What is…
RODA is an open-source digital repository, designed for preservation and developed in Portugal. The repository supports all the main functional components of the OAIS model.
What is…
Open source digital preservation system maintained by Artefactual. Younger than Preservica and Rosetta, Archivematica has a growing user-base, and a different support model to the two mentioned.
What is…
Originally called Safety Deposit Box, Preservica is an OAIS compliant digital preservation system maintained by Preservica in Abingdon, Oxfordshire, UK.
What is…
Safety Deposit Box
The first four implementations of the Preservica digital preservation system went under the name Safety Deposit Box. Organisations such as The National Archives, UK, and the Swiss Federal Archives were some of the first to adopt this system.
What is…
Apache Tika
A tool maintained by the Apache Software Foundation capable of extracting metadata and content from a range of file formats including PDF, Microsoft Office, Rich Text Format, and XML.
What is…
A registry of digital forensics tools and training courses developed in 2016 that will prove useful for finding tools for dissecting and interpreting digital files for preservation and access.
What is the…
Just Solve the File Format Problem
A wiki-style registry of file formats that can be edited by all users. It differs from PRONOM in that anyone can add information, and so it is a good idea to submit something to this wiki first, or in concert with PRONOM, for the benefit of the community.
Just Solve It is an initiative of the Internet Archive.
What is a…
Write Blocker
Forensics hardware that blocks the ability to write to a storage device, thus protecting data and its evidentiary value. Write blocking tools are available from companies such as Tableau and Wiebetech.
What is a…
A USB controller and write blocker for legacy floppy disk drives. It allows us to use 3.5-inch and 5.25-inch disk drives on modern computer hardware.
What is the…
SuperCard Pro
A USB controller and write blocker for legacy floppy disk drives, specifically 3.5-inch and 5.25-inch disk drives. One of a handful of alternatives to KryoFlux.
What is important about…
A useful way for those in digital preservation to connect with the community. An active forum with lots of branches out to other resources.
What is…
- A portal, search engine, and API that connects metadata about content at Australian GLAM institutions.
- Trove makes this information findable.
- Trove is a collaboration between the National Library of Australia and Australia’s State and Territory libraries.
What is…
- A technical registry that describes tools useful for long term digital preservation.
- Acts primarily as a finding and evaluation tool to help practitioners find the tools they need to preserve digital data.
- COPTR collates this knowledge in one place instead of organisations competing against each other with their own registries.
What is…
- ‘Twitter’ archiving (twarc) is a command line tool and Python library for archiving Twitter JSON data.
- Each tweet is represented as a JSON object that is exactly what was returned from the Twitter API.
- In addition to letting you collect tweets, twarc can also help you collect data on users and trends.
What is a…
- Also known as ISO 28500:2009.
- A standardised file format for storing the result of a web crawl – the output of a web archiving effort.
- WARC files aggregate many WARC records.
- WARC can encode any other file format – as you’d expect of any potential digital object on the web.
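To make the idea of a WARC record more concrete, the sketch below parses a single hand-written record: a version line, a set of named header fields, a blank line, and then a content block whose size is given by `Content-Length`. The sample record, its URI, and the function name are assumptions made for illustration. Real WARC files are usually compressed and hold many records, so a dedicated library is the better choice in practice.

```python
from typing import Dict, Tuple

# Hand-written sample record for illustration only; the URI and ID are made up.
SAMPLE_RECORD = (
    b"WARC/1.0\r\n"
    b"WARC-Type: resource\r\n"
    b"WARC-Record-ID: <urn:uuid:00000000-0000-0000-0000-000000000000>\r\n"
    b"WARC-Date: 2017-01-01T00:00:00Z\r\n"
    b"WARC-Target-URI: http://example.com/hello.txt\r\n"
    b"Content-Type: text/plain\r\n"
    b"Content-Length: 13\r\n"
    b"\r\n"
    b"Hello, world!"
    b"\r\n\r\n"
)


def parse_warc_record(raw: bytes) -> Tuple[str, Dict[str, str], bytes]:
    """Split one WARC record into its version line, named fields, and content block."""
    # The header ends at the first blank line (CRLF CRLF).
    header_part, _, rest = raw.partition(b"\r\n\r\n")
    lines = header_part.decode("utf-8").split("\r\n")
    version = lines[0]
    fields = {}
    for line in lines[1:]:
        # Split on the first colon only, since values may contain colons.
        name, _, value = line.partition(":")
        fields[name.strip()] = value.strip()
    # Content-Length tells us how many bytes of the block belong to this record.
    length = int(fields["Content-Length"])
    return version, fields, rest[:length]
```

Because the content block is treated as opaque bytes of a declared length, the same record structure can carry HTML, images, or any other format.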
What is the…
Wayback Machine
A search engine and API for the archived web. Hosted by the Internet Archive, based in San Francisco.
What is…
A standard for accessing and interacting with various web archives across the globe. Based on existing internet standards and the capabilities of the Wayback Machine.
What is…
A search engine for the UK web archive at the British Library that enables both trend analysis, and content search and retrieval.
What is the…
nestor Seal
An extended self-assessment process, based on the standard DIN 31644, that recognizes the trustworthiness of a digital archive.
If a nestor assessment yields a positive result, the archive is entitled to publicise this by using the nestor Seal for Trustworthy Digital Archives.
What is an…
AV Preserve ISO 16363 Assessment
An assessment which determines how close a digital archive is to passing the benchmark for a Trusted Digital Repository (TDR). AVPreserve are one of the companies offering this service.
What is…
A method of packaging files and information about them for transferring across a network and into storage - a ‘bag’.
Bags can potentially be used as Submission Information Packages (SIPs) in a digital repository.
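A minimal sketch of the packaging idea follows, assuming the BagIt layout: payload files under `data/`, a `bagit.txt` declaration, and a `manifest-sha256.txt` listing one checksum per payload file. The function names and the sample payload are hypothetical, and optional parts of a bag (tag manifests, `bag-info.txt`) are left out.

```python
import hashlib
import os
import tempfile


def make_bag(bag_dir: str, payload: dict) -> None:
    """Write payload files under data/ plus a declaration and a SHA-256 manifest."""
    data_dir = os.path.join(bag_dir, "data")
    os.makedirs(data_dir)
    manifest_lines = []
    for name, content in sorted(payload.items()):
        with open(os.path.join(data_dir, name), "wb") as handle:
            handle.write(content)
        digest = hashlib.sha256(content).hexdigest()
        manifest_lines.append(f"{digest}  data/{name}\n")
    with open(os.path.join(bag_dir, "bagit.txt"), "w", encoding="utf-8") as handle:
        handle.write("BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n")
    manifest_path = os.path.join(bag_dir, "manifest-sha256.txt")
    with open(manifest_path, "w", encoding="utf-8") as handle:
        handle.writelines(manifest_lines)


def verify_bag(bag_dir: str) -> bool:
    """Recompute each checksum listed in the manifest and compare it."""
    manifest_path = os.path.join(bag_dir, "manifest-sha256.txt")
    with open(manifest_path, encoding="utf-8") as handle:
        for line in handle:
            expected, relative_path = line.split(maxsplit=1)
            target = os.path.join(bag_dir, relative_path.strip())
            with open(target, "rb") as payload_file:
                if hashlib.sha256(payload_file.read()).hexdigest() != expected:
                    return False
    return True


def demo() -> tuple:
    """Build a bag, verify it, alter a payload file, and verify it again."""
    with tempfile.TemporaryDirectory() as workdir:
        bag_dir = os.path.join(workdir, "example_bag")
        make_bag(bag_dir, {"letter.txt": b"Dear archive,\n"})
        before = verify_bag(bag_dir)
        with open(os.path.join(bag_dir, "data", "letter.txt"), "wb") as handle:
            handle.write(b"altered in transit\n")
        after = verify_bag(bag_dir)
        return before, after
```

The manifest is what makes a bag useful for transfer: the receiver can recompute the checksums and detect any file that changed on the way.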
What is the…
UK Web Archive (UKWA)
The UK Web Archive is hosted by the British Library and supported by a number of partners in the UK. Part of the collection is searchable and can be found online. Web archives collected under legal deposit law in the UK have their access restricted to various reading rooms in the UK.
What is the…
UK Government Web Archive
The web archive of UK government maintained by The National Archives UK.
The archive is an exemplar of why we archive the web, and good case-studies appear, for example, during a machinery of government change.
The UK Government Web Archive also archives UK Government Twitter feeds.
What is…
Library Carpentry
A set of open source tutorials and lessons available on GitHub to help teach librarians and archivists digital literacy skills required in this era.
What is…
The Signal
- The Library of Congress blog whose basic intent is to discuss digital stewardship.
- The blog covers other aspects of computer technology, most especially management, transmission and use of data.
- It covers new developments that have an impact on digital preservation and access.
- Contributors come from across the archives and digital preservation community.
What is…
The OPF Blog
A blog hosted by the Open Preservation Foundation (OPF) that invites free and open discussion of digital preservation issues and the tools we use in the community.
The blog has a low barrier to authoring and is free to sign up to; as such it has a wide and varied range of contributors, and blogs to read through.
What is…
Heritrix is a web crawler created by the Internet Archive and designed specifically for web archiving. The last stable release of Heritrix was in 2014.
What is…
GNU Wget is a free utility for downloading files from the web. It supports HTTP, HTTPS, and FTP protocols, as well as retrieval through HTTP proxies. Wget is useful for scripting the download of files on the web via shell scripting tools such as Bash.
What is…
The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is a protocol for exposing to the outside world what is held inside a digital repository, and has numerous applications such as enabling the transmission of metadata about items for an archival catalogue.
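As a small illustration of how a harvester talks to a repository, the sketch below builds OAI-PMH request URLs. Requests are ordinary HTTP queries with a `verb` parameter (such as `Identify` or `ListRecords`) plus verb-specific arguments. The base URL is a hypothetical endpoint and the function name is chosen for this example; no request is actually sent.

```python
from urllib.parse import urlencode

# Hypothetical repository endpoint used only for illustration.
BASE_URL = "https://repository.example.org/oai"


def build_request(verb: str, **arguments: str) -> str:
    """Build an OAI-PMH request URL from a verb and its arguments."""
    query = urlencode({"verb": verb, **arguments})
    return f"{BASE_URL}?{query}"


# Ask the repository to describe itself.
identify_url = build_request("Identify")

# Ask for all records expressed as unqualified Dublin Core.
records_url = build_request("ListRecords", metadataPrefix="oai_dc")
```

The response to each request is an XML document, which the harvester would then parse to extract the metadata records.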
What is a…
Hex Editor
Hex (hexadecimal) editors, e.g. HxD, are tools for representing the binary content of a file in hexadecimal form, usually in contiguous rows of bytes.
A hex editor is often split into two view panes. The left pane shows the hexadecimal form of the binary content of a file. The right pane shows the characters that can be rendered using the ASCII encoding scheme, or another scheme supported by the tool such as EBCDIC.
Hex editors are important tools for developing file format signatures.
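The two-pane view can be approximated in a few lines. The sketch below is an illustration rather than a replacement for a real hex editor: it prints an offset, the bytes in hexadecimal, and the printable ASCII characters, with a dot standing in for anything that cannot be rendered. The function name and layout are assumptions for this example.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Render bytes as rows of offset, hexadecimal values, and printable ASCII."""
    rows = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_pane = " ".join(f"{byte:02x}" for byte in chunk)
        # Bytes outside the printable ASCII range are shown as a dot.
        text_pane = "".join(chr(byte) if 32 <= byte < 127 else "." for byte in chunk)
        # Pad the hex pane so the text pane lines up on a short final row.
        rows.append(f"{offset:08x}  {hex_pane:<{width * 3 - 1}}  {text_pane}")
    return "\n".join(rows)
```

Dumping the first bytes of several files of the same format side by side is a common way to spot the recurring byte sequences that end up in a signature.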