Archivematica Flashcards
Learn about the open-source Archivematica digital preservation system.
Archivematica
A system that generates digital preservation packages in accordance with the OAIS (Open Archival Information System) reference model. Archivematica creates SIPs (submission information packages) and AIPs (archival information packages), both structured according to the BagIt specification (RFC 8493). It can also generate DIPs (dissemination information packages) for use in systems other than Archivematica.
Dashboard
Often treated as synonymous with ‘an Archivematica’, the dashboard is the user-facing component of the system. It is a web interface whose horizontal menu is a series of tabs that map to the OAIS stages of the preservation workflow. The two main screens, Transfer and Ingest, display the status of the various microservice jobs.
Pipeline
A synonym for an Archivematica instance, or ‘an Archivematica’. An organization might run multiple pipelines across different servers so that it can process different material types in different ways.
Storage Service
A separate system which manages the different package types output by different Archivematica pipelines. A storage service can connect to multiple pipelines. A pipeline is usually connected to a single storage service.
Transfer Type
Transfer types are analogous to material flows in other systems, such as the Rosetta digital preservation system. A transfer type allows different material, e.g. a dataset, to be handled differently by an Archivematica workflow.
API
Both Archivematica and the Storage Service have APIs that can be used to start transfers, get transfer status updates, and manage packages, as well as to perform other assorted functions such as downloading individual files from AIPs.
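As a rough sketch of the first use case, the snippet below builds (but does not send) a request that would start a transfer. The endpoint path `/api/v2beta/package`, the `ApiKey` authorization header, and the base64-encoded `location_uuid:path` form are assumptions that should be checked against the API documentation for your Archivematica version. The host, user, key, location UUID, and path are placeholders.

```python
import base64
import json
import urllib.request


def build_start_transfer_request(base_url, user, api_key, name,
                                 location_uuid, path, transfer_type="standard"):
    """Build (without sending) a request asking Archivematica to start a transfer."""
    # Assumption: the path is sent as base64("<location_uuid>:<relative path>").
    encoded_path = base64.b64encode(f"{location_uuid}:{path}".encode()).decode()
    body = {"name": name, "type": transfer_type, "path": encoded_path}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v2beta/package",  # assumed endpoint
        data=json.dumps(body).encode(),
        headers={
            "Authorization": f"ApiKey {user}:{api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


# All values below are placeholders, not real credentials or locations.
request = build_start_transfer_request(
    "http://archivematica.example.org", "demo", "placeholder-key",
    "my-transfer", "00000000-0000-0000-0000-000000000000", "incoming/my-transfer",
)
print(request.get_method(), request.full_url)
```

Passing the built request to `urllib.request.urlopen` would actually submit it. Keeping construction separate from sending makes the payload easy to inspect first.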
Storage Space
A method of connecting to a file system or quasi-file-system using different protocols. Examples include local file system, Amazon S3, and Dataverse API. The storage space lets the user establish locations for transferring data, processing data, and storing archival packages.
Storage Location
A storage location is a specific area of a storage space designated for a purpose such as ‘transfer source’, ‘AIP storage’, ‘DIP storage’, ‘processing’, or ‘replication’. Other purposes include an area for failed or rejected transfers.
Transfer
The process of taking material from a transfer source, running it through various microservice jobs, and generating a SIP in Archivematica. Transfer is responsible for processes such as virus scanning and file format identification.
Ingest
The process of turning a SIP into an AIP in Archivematica. The Ingest phase includes normalization, i.e. creating derivative files for access or in preservation-friendly formats. It is also responsible for generating an AIP METS file containing information about the structure and content of the AIP.
Standard Transfer
A transfer type in Archivematica. A standard transfer implements a basic preservation workflow and makes no additional assumptions about the structure of the content transferred or about how it should be arranged on disk.
Processing Configuration
Often referred to by its file name, processingMCP.xml, the processing configuration affects every transfer type and provides granular control over which microservices run. For example, users can decide in the processing configuration whether or not to normalize files, or leave that as a decision to be made during the ingest workflow.
Automated Processing Configuration
A processing configuration in which all decision points have been resolved. For example, a user can choose at run time where to send an AIP (say, a media storage location or a generic storage location), or can determine this up front with a customized processing configuration. A transfer or ingest whose decision points are all resolved by a processing configuration can be described as ‘automated’.
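To illustrate what ‘all decision points resolved’ means, here is a minimal sketch that reads a processing configuration and reports which decision points still lack a preconfigured choice. It assumes the file consists of `preconfiguredChoice` elements, each with an `appliesTo` (the decision point) and a `goToChain` (the chosen option). The identifiers in the sample are made-up placeholders (real files use UUIDs), and the list of decision points passed in is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; real files identify decision points and choices by UUID.
SAMPLE_CONFIG = """
<processingMCP>
  <preconfiguredChoices>
    <preconfiguredChoice>
      <appliesTo>normalize-decision</appliesTo>
      <goToChain>normalize-for-preservation</goToChain>
    </preconfiguredChoice>
    <preconfiguredChoice>
      <appliesTo>store-aip-location-decision</appliesTo>
      <goToChain>media-storage-location</goToChain>
    </preconfiguredChoice>
  </preconfiguredChoices>
</processingMCP>
"""


def resolved_choices(xml_text):
    """Map each preconfigured decision point to its chosen option."""
    root = ET.fromstring(xml_text)
    return {
        choice.findtext("appliesTo"): choice.findtext("goToChain")
        for choice in root.iter("preconfiguredChoice")
    }


def unresolved(xml_text, decision_points):
    """Return the decision points that the configuration leaves open."""
    return sorted(set(decision_points) - set(resolved_choices(xml_text)))


# An empty list means the workflow would need no user input, i.e. is 'automated'.
print(unresolved(SAMPLE_CONFIG, ["normalize-decision", "store-aip-location-decision"]))
```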
Unzipped Bag
A transfer type in Archivematica where the content received is in BagIt format and is validated as such before being processed into an AIP. If additional information about the transfer is provided in bag-info.txt, it is mapped to the AIP METS.
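For a sense of what bag-info.txt holds, the sketch below parses its `Label: value` lines, treating lines that begin with whitespace as continuations of the previous value, as RFC 8493 describes. It is an illustration only, not how Archivematica itself reads the file, and the sample content is invented.

```python
# Invented sample content; the labels are reserved elements from RFC 8493.
SAMPLE_BAG_INFO = (
    "Source-Organization: Example Archive\n"
    "External-Description: A small collection of\n"
    "  digitised photographs.\n"
    "Bagging-Date: 2020-01-01\n"
)


def parse_bag_info(text):
    """Return (label, value) pairs in file order; labels may repeat."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line[0] in " \t" and entries:
            # Continuation line: fold it into the previous entry's value.
            label, value = entries[-1]
            entries[-1] = (label, f"{value} {line.strip()}")
        else:
            # Split on the first colon only, so values may contain colons.
            label, _, value = line.partition(":")
            entries.append((label.strip(), value.strip()))
    return entries


for label, value in parse_bag_info(SAMPLE_BAG_INFO):
    print(f"{label} = {value}")
```

A list of pairs is used rather than a dictionary because the same label can legitimately appear more than once.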
Zipped Bag
Identical to the unzipped bag transfer type, except that the bag is decompressed before being validated and processed further. Archivematica uses the Library of Congress bagit.py tool to work with bags.
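The core of bag validation is comparing payload files against the checksums in the manifest. The sketch below shows that one check using only the Python standard library. It is not bagit.py and not Archivematica's code, and it deliberately skips other parts of validation (for example, detecting payload files missing from the manifest). The demo bag and its contents are invented.

```python
import hashlib
import tempfile
from pathlib import Path


def payload_matches_manifest(bag_dir, algorithm="sha256"):
    """Check every manifest entry against the checksum of the file it names."""
    bag_dir = Path(bag_dir)
    manifest = bag_dir / f"manifest-{algorithm}.txt"
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        # Each manifest line is "<checksum> <path relative to the bag root>".
        expected, relative_path = line.split(maxsplit=1)
        data = (bag_dir / relative_path).read_bytes()
        if hashlib.new(algorithm, data).hexdigest() != expected.lower():
            return False
    return True


def make_demo_bag(parent):
    """Create a tiny, invented bag with one payload file."""
    bag = Path(parent) / "demo-bag"
    (bag / "data").mkdir(parents=True)
    payload = b"hello, archive\n"
    (bag / "data" / "hello.txt").write_bytes(payload)
    (bag / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8"
    )
    (bag / "manifest-sha256.txt").write_text(
        f"{hashlib.sha256(payload).hexdigest()}  data/hello.txt\n", encoding="utf-8"
    )
    return bag


with tempfile.TemporaryDirectory() as tmp:
    print(payload_matches_manifest(make_demo_bag(tmp)))
```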
Dataverse
A transfer type that enables users to connect to a Dataverse via the Dataverse API and download datasets to be processed into an AIP. Metadata associated with a Dataverse dataset is mapped to the AIP METS. At present it is only possible to retrieve a dataset from Dataverse, not to upload an Archivematica output to it.
Disk Image
A transfer type that allows a disk image to be selected and processed by Archivematica. It employs different tools to enable standard features such as file format identification of the disk image’s content. Additional disk-image-specific metadata can be added by the user at the point of transfer and will be preserved in the AIP METS.
Dublin Core Metadata
Items, directories, and the AIP itself can be described using the Dublin Core Metadata Element Set, maintained by the Dublin Core Metadata Initiative (DCMI). All elements are repeatable and will appear in the descriptive metadata section of the AIP METS. Metadata can be provided in CSV or JSON as part of a transfer, or edited during the transfer or ingest workflows.
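A minimal sketch of producing such a CSV follows. The layout is an assumption to verify against the Archivematica documentation: a `filename` column, `dc.`-prefixed element columns, and a repeated column header for each repeated element. The file paths and values are invented.

```python
import csv
import io


def dublin_core_csv(records):
    """records: list of (filename, [(element, value), ...]) tuples."""
    # Work out how many columns each element needs (its maximum repeat count).
    counts = {}
    for _, pairs in records:
        per_record = {}
        for element, _ in pairs:
            per_record[element] = per_record.get(element, 0) + 1
        for element, n in per_record.items():
            counts[element] = max(counts.get(element, 0), n)

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # Assumed layout: repeated elements get repeated column headers.
    writer.writerow(["filename"] + [f"dc.{e}" for e, n in counts.items() for _ in range(n)])
    for filename, pairs in records:
        values = {}
        for element, value in pairs:
            values.setdefault(element, []).append(value)
        row = [filename]
        for element, n in counts.items():
            cells = values.get(element, [])
            row.extend(cells + [""] * (n - len(cells)))  # pad unused columns
        writer.writerow(row)
    return out.getvalue()


# Invented example: one file with a repeated subject, and one directory.
print(dublin_core_csv([
    ("objects/photo.tif", [("title", "Harbour at dawn"),
                           ("subject", "Harbours"), ("subject", "Photographs")]),
    ("objects/letters", [("title", "Letters")]),
]))
```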
UUID
Archivematica uses UUIDs for everything! UUIDs identify microservice jobs, transfers, and ingests. The AIP is assigned a UUID, as are the AIP’s contents. A UUID might look as follows: “ebc9fc1c-6243-4461-842c-215eba47e379”.
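Python's standard `uuid` module can parse and check identifiers like the one above, which is handy when scripting against package names or API responses. The helper name `is_uuid` is made up for this sketch.

```python
import uuid

# The example identifier from the card above.
example = uuid.UUID("ebc9fc1c-6243-4461-842c-215eba47e379")
print(example.version)  # 4, i.e. a randomly generated UUID


def is_uuid(text):
    """Return True if text is a UUID in canonical hyphenated form."""
    try:
        return str(uuid.UUID(text)) == text.lower()
    except ValueError:
        return False


print(is_uuid("ebc9fc1c-6243-4461-842c-215eba47e379"))
print(is_uuid("not-a-uuid"))
```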
DIP Upload
If a transfer has been normalized for access, a DIP will be created by Archivematica. The DIP can optionally be stored, or uploaded to another system such as AtoM (Access to Memory). The DIP contains the Dublin Core metadata associated with the transfer as well as the access derivatives.
Automation Tools
A set of utility scripts that are used to perform tasks on archival data during pre-ingest, ingest, or post-ingest. The primary use case for the automation tools is to let users process content continually at set intervals, automatically and without the need for intervention.