Archivematica Flashcards

Learn about the open source Archivematica digital preservation system.

1
Q

Archivematica

A

A system that can generate digital preservation packages in accordance with the OAIS reference model for digital preservation. Archivematica can create SIPs and AIPs (Submission Information Packages and Archival Information Packages), both structured according to the BagIt specification (RFC 8493). Archivematica can also generate DIPs (Dissemination Information Packages) for access in systems other than Archivematica.

2
Q

Dashboard

A

Often used as a synonym for ‘Archivematica’, the Dashboard is the user-facing, web-based component of the system. Its horizontal menu is a series of tabs that correspond to the OAIS stages of the preservation workflow. The two main screens, Transfer and Ingest, display the status of the various microservice jobs.

3
Q

Pipeline

A

A synonym for a dashboard or an ‘Archivematica’, i.e. a single Archivematica installation. An organization might run multiple pipelines across different servers to process different material types in different ways.

4
Q

Storage Service

A

A separate system which manages the different package types output by different Archivematica pipelines. A storage service can connect to multiple pipelines. A pipeline is usually connected to a single storage service.

5
Q

Transfer Type

A

Transfer types are analogous to material flows in other systems, such as the Rosetta digital preservation system. A transfer type allows different kinds of material, e.g. a dataset, to be handled differently by an Archivematica workflow.

6
Q

API

A

Both Archivematica and the Storage Service have APIs that can be used to start transfers, get transfer status updates, and manage packages. They also support other functions, such as downloading individual files from AIPs.
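The following minimal Python sketch shows how a client might prepare two Archivematica API calls: starting a standard transfer and polling its status. It makes several assumptions that you should check against the API documentation for your Archivematica version. These are the endpoints /api/transfer/start_transfer/ and /api/transfer/status/<uuid>/, the "ApiKey <username>:<key>" authorization header, and transfer paths sent base64-encoded as "<location UUID>:<path>". The functions only build urllib Request objects and never contact a server. The base URL, credentials, and location UUID are placeholders.

```python
import base64
import urllib.parse
import urllib.request


def auth_headers(username, api_key):
    # Assumed header format: "ApiKey <username>:<api key>".
    return {"Authorization": f"ApiKey {username}:{api_key}"}


def encode_transfer_path(location_uuid, relative_path):
    # Assumed encoding: base64 of "<transfer source location UUID>:<path>".
    raw = f"{location_uuid}:{relative_path}".encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def start_transfer_request(base_url, username, api_key, name,
                           location_uuid, relative_path,
                           transfer_type="standard"):
    """Build (but do not send) a POST to the assumed start_transfer endpoint."""
    form = urllib.parse.urlencode({
        "name": name,
        "type": transfer_type,
        "paths[]": encode_transfer_path(location_uuid, relative_path),
    }).encode("ascii")
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/transfer/start_transfer/",
        data=form,
        headers=auth_headers(username, api_key),
        method="POST",
    )


def transfer_status_request(base_url, username, api_key, transfer_uuid):
    """Build (but do not send) a GET for the status of one transfer."""
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/transfer/status/{transfer_uuid}/",
        headers=auth_headers(username, api_key),
    )
```

To send a request, pass it to urllib.request.urlopen(req) and parse the JSON response with json.load.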

7
Q

Storage Space

A

A method of connecting to a file system or quasi-file-system using a particular protocol. Examples include the local file system, Amazon S3, and the Dataverse API. Within a storage space, the user establishes locations for transfer sources and processing, as well as areas for storing archival packages.

8
Q

Storage Location

A

A storage location is a specific area of a storage space designated for a particular purpose, such as ‘transfer source’, ‘AIP storage’, ‘DIP storage’, ‘processing’, or ‘replication’. There are also a number of other purposes, such as an area for failed or rejected transfers.
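The relationship between spaces and locations can be pictured as a one-to-many data structure. The sketch below is a hypothetical model for illustration only; the Space, Location, and Purpose names are not the Storage Service's own classes. A space records a protocol and a base path, and each location is a sub-path of that space with a single purpose.

```python
from dataclasses import dataclass, field
from enum import Enum


class Purpose(Enum):
    # Hypothetical labels for some of the purposes listed above.
    TRANSFER_SOURCE = "transfer source"
    AIP_STORAGE = "AIP storage"
    DIP_STORAGE = "DIP storage"
    PROCESSING = "processing"
    REPLICATION = "replication"


@dataclass
class Location:
    relative_path: str  # sub-path inside the owning space
    purpose: Purpose


@dataclass
class Space:
    protocol: str  # e.g. "local filesystem", "S3", "Dataverse"
    base_path: str
    locations: list = field(default_factory=list)

    def add_location(self, relative_path, purpose):
        location = Location(relative_path, purpose)
        self.locations.append(location)
        return location

    def locations_for(self, purpose):
        return [loc for loc in self.locations if loc.purpose is purpose]

    def full_path(self, location):
        # Join the space's base path and the location's sub-path.
        return f"{self.base_path.rstrip('/')}/{location.relative_path.strip('/')}"
```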

9
Q

Transfer

A

The process of taking material from a transfer source, running it through various microservice jobs, and generating a SIP in Archivematica. Transfer is responsible for processes such as virus scanning and file format identification.

10
Q

Ingest

A

The process of turning a SIP into an AIP in Archivematica. The Ingest phase includes normalization, i.e. creating derivative files for access, or preservation-friendly versions of the original files. It is also responsible for generating the AIP METS file, which contains information about the structure and content of the AIP.

11
Q

Standard Transfer

A

A transfer type in Archivematica. A standard transfer implements a basic preservation workflow and makes no additional assumptions about the structure of the content transferred or about how it should be arranged on disk.

12
Q

Processing Configuration

A

Stored as processingMCP.xml, the processing configuration applies to every transfer type and provides granular control over which microservices run. For example, users can decide in the processing configuration whether or not to normalize files, or leave that choice as a decision to be made during the ingest workflow.
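The following sketch assumes the commonly seen layout of processingMCP.xml: a preconfiguredChoices element whose preconfiguredChoice children each pair an appliesTo decision-point ID with a goToChain answer. Under that assumption, Python's standard library can read the preconfigured decisions. The UUIDs in the sample are placeholders, not real decision-point IDs.

```python
import xml.etree.ElementTree as ET

# Placeholder sample; the UUIDs are not real Archivematica decision points.
SAMPLE = """<processingMCP>
  <preconfiguredChoices>
    <preconfiguredChoice>
      <appliesTo>11111111-1111-4111-8111-111111111111</appliesTo>
      <goToChain>22222222-2222-4222-8222-222222222222</goToChain>
    </preconfiguredChoice>
  </preconfiguredChoices>
</processingMCP>"""


def read_choices(xml_text):
    """Map each decision point (appliesTo) to its preconfigured answer (goToChain)."""
    root = ET.fromstring(xml_text)
    choices = {}
    for choice in root.iter("preconfiguredChoice"):
        applies_to = (choice.findtext("appliesTo") or "").strip()
        go_to = (choice.findtext("goToChain") or "").strip()
        if applies_to:
            choices[applies_to] = go_to
    return choices
```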

13
Q

Automated Processing Configuration

A

Describes a processing configuration in which every decision point has been resolved in advance. For example, a user can select at run-time where to send an AIP (e.g. a media storage location or a generic storage location), or they can make that choice up front in a customized processing configuration. A transfer or ingest whose decision points are all resolved by a processing configuration can be described as ‘automated’.
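Put another way, a configuration is ‘automated’ for a workflow when every decision point that workflow reaches has a preconfigured answer. The sketch below assumes the usual processingMCP.xml layout, with preconfiguredChoice elements that each have an appliesTo child. It takes the list of decision-point IDs as input, and the IDs in the test are hypothetical.

```python
import xml.etree.ElementTree as ET


def unresolved_decisions(xml_text, decision_points):
    """Return the decision points that have no preconfigured choice."""
    root = ET.fromstring(xml_text)
    resolved = {
        (choice.findtext("appliesTo") or "").strip()
        for choice in root.iter("preconfiguredChoice")
    }
    return [point for point in decision_points if point not in resolved]


def is_automated(xml_text, decision_points):
    # Automated means nothing is left for a user to decide at run-time.
    return not unresolved_decisions(xml_text, decision_points)
```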

14
Q

Unzipped Bag

A

A transfer type in Archivematica where the content is received in BagIt format and validated as such before being processed into an AIP. Any additional information about the transfer provided in bag-info.txt is mapped to the AIP METS.

15
Q

Zipped Bag

A

Identical to the unzipped bag transfer type, except that the bag is decompressed before being validated and processed further. Archivematica uses the Library of Congress Bagit.py tool to work with bags.
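To illustrate what validation means for an unzipped or zipped bag, here is a simplified sketch of the core BagIt check using only the standard library. Every file listed in manifest-sha256.txt must exist with a matching checksum, and no payload file under data/ may be unlisted. This is not Bagit.py. It skips parts of RFC 8493, such as tag manifests, other hash algorithms, fetch.txt, percent-encoded paths, and bag-info.txt checks. A zipped bag is simply extracted to a temporary directory first.

```python
import hashlib
import tempfile
import zipfile
from pathlib import Path


def _sha256(path):
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_bag(bag_dir):
    """Return a list of problems found; an empty list means the bag passed."""
    bag = Path(bag_dir)
    problems = []
    if not (bag / "bagit.txt").is_file():
        problems.append("missing bagit.txt")
    manifest = bag / "manifest-sha256.txt"
    if not manifest.is_file():
        return problems + ["missing manifest-sha256.txt"]
    listed = set()
    # Each manifest line is "<checksum> <path relative to the bag>".
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        expected, rel_path = line.split(maxsplit=1)
        listed.add(rel_path)
        target = bag / rel_path
        if not target.is_file():
            problems.append(f"missing file {rel_path}")
        elif _sha256(target) != expected.lower():
            problems.append(f"checksum mismatch {rel_path}")
    # Payload files that the manifest does not mention are also a failure.
    payload = bag / "data"
    if payload.is_dir():
        for path in sorted(payload.rglob("*")):
            rel_path = path.relative_to(bag).as_posix()
            if path.is_file() and rel_path not in listed:
                problems.append(f"unlisted file {rel_path}")
    return problems


def validate_zipped_bag(zip_path):
    """Extract a zipped bag to a temporary directory, then validate it."""
    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(tmp)
        root = Path(tmp)
        # Zipped bags usually wrap the bag in one top-level directory.
        if not (root / "bagit.txt").is_file():
            subdirs = [p for p in root.iterdir() if p.is_dir()]
            if len(subdirs) == 1:
                root = subdirs[0]
        return validate_bag(root)
```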

16
Q

Dataverse

A

A transfer type that enables users to connect to a Dataverse instance via the Dataverse API and download datasets to be processed into AIPs. Metadata associated with a Dataverse dataset is mapped to the AIP METS. At present it is only possible to retrieve datasets from Dataverse, not to upload Archivematica output to it.

17
Q

Disk Image

A

A transfer type that allows disk images to be processed by Archivematica. It employs different tools to enable standard features such as file format identification of the disk image’s contents. Additional disk-image-specific metadata can be added by the user at the point of transfer and will be preserved in the AIP METS.

18
Q

Dublin Core Metadata

A

Items, directories, and the AIP itself can be described using the Dublin Core Metadata Element Set, maintained by the Dublin Core Metadata Initiative (DCMI). All elements are repeatable and appear in the descriptive metadata section (dmdSec) of the AIP METS. Metadata can be provided in CSV or JSON as part of a transfer, or edited during the transfer or ingest workflows.

19
Q

UUID

A

Archivematica uses UUIDs for almost everything! UUIDs identify individual microservice jobs, transfers, and ingests. The AIP is assigned a UUID, as are the files it contains. A UUID looks like this: “ebc9fc1c-6243-4461-842c-215eba47e379”.
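Python's standard uuid module can generate and check identifiers of this shape. The example UUID above is a random (version 4) UUID, so the sketch assumes that is the kind you want to mint or recognize.

```python
import uuid


def new_identifier() -> str:
    # A random (version 4) UUID in canonical hyphenated form.
    return str(uuid.uuid4())


def is_uuid4(value: str) -> bool:
    """True if value is a canonical, hyphenated version 4 UUID (any case)."""
    try:
        parsed = uuid.UUID(value)
    except ValueError:
        return False
    # uuid.UUID also accepts braces, "urn:uuid:" prefixes and unhyphenated hex,
    # so compare against the canonical form to insist on the usual layout.
    return str(parsed) == value.lower() and parsed.version == 4
```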

20
Q

DIP Upload

A

If a transfer has been normalized for access, Archivematica creates a DIP. The DIP can optionally be stored, and can optionally be uploaded to another system such as AtoM (Access to Memory). The DIP contains the access derivatives along with any Dublin Core metadata associated with the transfer.

21
Q

Automation Tools

A

A set of utility scripts used to perform tasks on archival data during pre-ingest, ingest, or post-ingest. The primary use case for the automation tools is to process content continuously at set intervals, automatically and without manual intervention.

22
Q

Fixity

A

Fixity is a utility application that lets users run fixity checks across all the AIPs in the Storage Service. It relies on a Storage Service API endpoint that performs the check and writes the status back to the Storage Service database. The results are returned by the endpoint and are also visible in the Storage Service’s Packages tab.
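Here is a hedged sketch of how a script might ask the Storage Service for a fixity check on one AIP and summarize the answer. It assumes three things: the endpoint GET /api/v2/file/<uuid>/check_fixity/, the same "ApiKey" authorization header used by the other APIs, and a JSON response with "success" and "message" fields. Verify these against your Storage Service version. The request is only built, not sent.

```python
import json
import urllib.request


def check_fixity_request(ss_url, username, api_key, aip_uuid):
    """Build (but do not send) a fixity-check request for one AIP."""
    # Assumed endpoint and header format; confirm for your Storage Service.
    return urllib.request.Request(
        f"{ss_url.rstrip('/')}/api/v2/file/{aip_uuid}/check_fixity/",
        headers={"Authorization": f"ApiKey {username}:{api_key}"},
    )


def summarize_fixity(response_body):
    """Turn an assumed {"success": ..., "message": ...} body into a short status."""
    data = json.loads(response_body)
    message = data.get("message", "")
    if data.get("success") is True:
        return "passed"
    if data.get("success") is False:
        return f"failed: {message}"
    # Anything else (e.g. null) is treated as "no result".
    return f"unknown: {message}"
```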

23
Q

Transfer Tab

A

The transfer tab shows the success statuses of all the microservice jobs that are run in the process of creating a SIP.

24
Q

Backlog Tab

A

A SIP can be sent to a backlog and retrieved later for arrangement in the appraisal tab. The backlog tab lists all of the SIPs that can be accessed.

25
Q

Appraisal Tab

A

The appraisal tab can be used to arrange SIP contents and to combine multiple SIPs into one AIP. It can also display information extracted by the bulk_extractor tool and provide previews of a small number of file types.

26
Q

Ingest Tab

A

The ingest tab shows the success statuses of all the microservice jobs that are run in the process of creating an AIP.

27
Q

Preservation Planning

A

Preservation planning is synonymous with the format policy register (FPR). The specific commands to be run on specific file format types are controlled from here. The FPR maintains commands for identification, characterization, normalization, transcription, validation, and verification. Format types must have a PRONOM entry and be identifiable by the Siegfried or FIDO identification tools.

28
Q

Archival Storage Tab

A

AIPs can be retrieved and downloaded from the archival storage tab. AIPs can be searched for based on the content of their dmdSecs in the METS metadata (if an AIP has been created with Dublin Core (DC) metadata). Pointer files can also be retrieved and viewed from this tab.

29
Q

Pointer File

A

Pointer files are METS files that describe an archival package’s container. The metadata in a pointer file is designed to keep the archival package as accessible as possible in the future. For example, if a package is compressed or encrypted, the most important thing to know is that it is. The next most important thing is a description (pointers) of how to decompress or decrypt it. This information, along with other instructions, can be found in the pointer file.

30
Q

Administration

A

The administration tab enables users to configure parts of an Archivematica dashboard, for example, create new users; connect the pipeline to a storage service; or establish the parameters of a processing configuration.

31
Q

Packages

A

Packages that Archivematica can create include the SIP (Submission Information Package), AIP (Archival Information Package), and DIP (Dissemination Information Package). Packages are usually uncompressed and can be downloaded as a group in a tar file. Alternatively, they can be compressed using the 7z compression tool with a range of algorithms, including bzip2 and LZMA. Archivematica packages are intended to be system-independent, so it should be possible to import them into other systems (preservation or otherwise), both today and in the future.

32
Q

Bag (Packages)

A

SIPs and AIPs in Archivematica are self-contained bags in accordance with the BagIt specification (RFC 8493).

33
Q

Reingest

A

Reingest is the process by which an AIP is sent back through the transfer and ingest microservice jobs so that information about the package’s contents can be updated. dmdSec elements of the METS are marked as original or updated, and techMD sections are marked as superseded as new metadata is added. Tools and knowledge about digital objects evolve, so it stands to reason that an AIP and the knowledge within it can be updated too. Reingest serves this use case and helps to support preservation. Reingest can be full, partial (ingest microservice jobs only), or metadata-only.

34
Q

Normalization

A

Normalization is the process of creating derivative digital objects. These can be for access, e.g. converting TIFF images to JPEG because the files are smaller for the web. They can also be for preservation, e.g. converting JPEG 2000 images to TIFF because TIFF is perceived to be more stable. The original objects, the preservation masters, are always preserved. Rules for normalization can be configured in the FPR, and normalization is an opt-in process. It is just one of the approaches to preservation that Archivematica supports.

35
Q

metadata.csv

A

metadata.csv is one mechanism for users to provide metadata about the objects in a transfer. Files, directories, and the SIP/AIP itself can be described using the Dublin Core Metadata Element Set.
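The following sketch writes a metadata.csv with Python's csv module. It assumes the conventions described in Archivematica's documentation. There is a filename column whose values are paths starting with objects/, and the bare value objects describes the SIP/AIP as a whole. Each element gets its own dc.<element> column, and repeating a column name supplies a second value for a repeatable element. The titles, names, and paths are placeholders.

```python
import csv
import io

# Assumed convention: repeating "dc.subject" gives two values for one element.
HEADER = ["filename", "dc.title", "dc.creator", "dc.subject", "dc.subject"]

# Placeholder rows: the whole SIP/AIP, one directory, and one file.
ROWS = [
    ["objects", "Example collection", "Example Archive", "letters", "photographs"],
    ["objects/photos", "Photographs", "", "photographs", ""],
    ["objects/photos/img001.tif", "Image 1", "Unknown photographer", "", ""],
]


def build_metadata_csv(header, rows):
    """Return the CSV text for a metadata.csv file."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()
```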

36
Q

custom_structmap.xml

A

A logical METS structMap that can be associated with a transfer to describe a structure for the digital objects other than the one they have on disk in the AIP. A logical structMap can add structure to a flat list of digital objects, e.g. a group of JPEG files might be arranged into the shape of a book. An individual object, e.g. a single audio recording, can also be given structure, e.g. by describing time ranges and the intellectual content of each.
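The sketch below uses ElementTree to build a logical structMap that arranges page images into a single ‘Book’ div. The METS namespace is real. Several other details are assumptions for illustration: the TYPE and LABEL values, the structMap ID, and referencing files by name in the fptr FILEID attribute. Compare the output with Archivematica's custom structMap documentation before using it.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)


def _q(tag):
    # Qualify a tag name with the METS namespace.
    return f"{{{METS_NS}}}{tag}"


def book_structmap(label, page_files):
    """Build a logical structMap grouping page images into one book."""
    mets = ET.Element(_q("mets"))
    smap = ET.SubElement(mets, _q("structMap"), TYPE="logical", ID="structMap_custom")
    book = ET.SubElement(smap, _q("div"), TYPE="Book", LABEL=label)
    for number, filename in enumerate(page_files, start=1):
        page = ET.SubElement(book, _q("div"), TYPE="Page", LABEL=f"Page {number}")
        # Assumption: files are referenced by name via FILEID.
        ET.SubElement(page, _q("fptr"), FILEID=filename)
    return ET.tostring(mets, encoding="unicode")
```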

37
Q

identifiers.json

A

A mechanism by which users can supply previously minted identifiers for digital objects. An identifier might come with published content, e.g. in the form of DOIs or Handles (HDL). Identifiers might also be minted by an organization’s catalog and used to pair descriptive records with the objects in the AIP once the AIP has been stored.
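The following sketch assumes that identifiers.json is a JSON list in which each entry names a file path and a list of {"identifier", "identifierType"} objects. Under that assumption, the file can be produced with Python's json module. The path, the identifier values, and the type labels are placeholders.

```python
import json


def identifiers_entry(path, identifiers):
    """One entry: a transfer-relative path plus (type, value) pairs."""
    return {
        "file": path,
        "identifiers": [
            {"identifier": value, "identifierType": id_type}
            for id_type, value in identifiers
        ],
    }


def build_identifiers_json(entries):
    # Assumed top-level shape: a list of per-file entries.
    return json.dumps(entries, indent=2)
```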

38
Q

Logs folder

A

The logs folder, stored within the AIP itself, is a collection of detailed output generated by microservice jobs. Logs might be used to identify issues with an AIP’s generation. They may also be informative where METS/PREMIS is not the right place for some of Archivematica’s output, for example, personally identifying information found by forensic tools such as bulk_extractor.

39
Q

Metadata folder

A

The metadata folder is stored within the AIP package and contains metadata generated by Archivematica, as well as any additional submission documentation provided by the user on transfer.

40
Q

Submission Documentation

A

Submission documentation might include rights or other legal information about the contents of a transfer. It is stored in the metadata folder of an Archivematica AIP.

41
Q

Transfer METS

A

The transfer METS.xml file contains all the descriptive and technical metadata output by the transfer microservice jobs, for example file format identifiers and validation output for specific file formats. The transfer METS also describes the structure of the SIP.

42
Q

AIP METS

A

The AIP METS.xml file contains all the descriptive and technical metadata output by the Archivematica microservice jobs, and also describes the structure of the AIP. Aside from the contents of the AIP itself, the METS/PREMIS in the AIP METS is the most important addition that Archivematica makes to an archival package.

43
Q

Policy Checks

A

MediaConch policies can describe the expected technical characteristics of media files. Archivematica can use these policies to help accept or reject content. They may be especially useful in digitization workflows where content is provided by an external party and immediately uploaded to Archivematica.

44
Q

README.html

A

README.html is a file included with the AIP that describes its structure for users. It provides a plain-language description of the AIP, one small measure to support understanding of the AIP in the future.

45
Q

Examine Contents

A

A microservice job that runs the bulk_extractor forensics tool to identify personally identifying information in transferred content.

46
Q

Replication

A

A mechanism in the Archivematica Storage Service to duplicate AIPs and send them to a separate storage location, following the LOCKSS (Lots of Copies Keep Stuff Safe) principle of digital preservation.

47
Q

File Format Identification

A

A microservice job that can be controlled by the format policy register to run Siegfried or Fido on transferred files and output the results to PREMIS in the transfer and AIP METS.
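Siegfried can report its results as JSON (sf -json <file>). The sketch below pulls the PRONOM IDs out of such a report. It assumes that the report has a files list with filename and matches fields, and that each match carries a namespace (ns) and an id. The sample report in the test is trimmed and hand-written, not real tool output.

```python
import json


def pronom_ids(sf_json):
    """Map each filename in a Siegfried JSON report to its PRONOM match IDs."""
    report = json.loads(sf_json)
    results = {}
    for entry in report.get("files", []):
        # Keep only matches from the PRONOM namespace.
        ids = [m.get("id") for m in entry.get("matches", []) if m.get("ns") == "pronom"]
        results[entry.get("filename")] = ids
    return results
```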

48
Q

Uncompressed AIP

A

Describes the layout of an AIP on disk. Uncompressed AIPs are arranged on the file system as a series of folders and files conforming with the AIP structure. Uncompressed AIPs can be downloaded as tar files.

49
Q

Compressed AIP

A

An AIP can be compressed into a single archive file and downloaded and extracted as such. Archivematica uses the 7z compression tool to compress this style of AIP and store it on disk.
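Here is a hedged sketch of driving the 7-Zip command-line tool from Python. The a (add) command, the -t7z archive type, and the -m0= method switch are standard 7z options. The exact command and options that Archivematica runs are not reproduced here. The mapping of the algorithm names bzip2 and lzma to 7z method names is an assumption for illustration.

```python
import shutil
import subprocess

# Hypothetical mapping from user-facing names to 7z method names.
SEVEN_ZIP_METHODS = {"bzip2": "BZip2", "lzma": "LZMA"}


def compress_command(source_dir, archive_path, algorithm="lzma"):
    """Build the 7z argument list: add source_dir to a .7z archive."""
    method = SEVEN_ZIP_METHODS[algorithm]
    return ["7z", "a", "-t7z", f"-m0={method}", archive_path, source_dir]


def compress_aip(source_dir, archive_path, algorithm="lzma"):
    """Run 7z if it is installed; raise if it is missing or fails."""
    if shutil.which("7z") is None:
        raise RuntimeError("the 7z command-line tool was not found on PATH")
    subprocess.run(compress_command(source_dir, archive_path, algorithm), check=True)
```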

50
Q

7-Zip

A

Also known as 7z. An open source compression library and command-line tool supporting a number of algorithms, which Archivematica surfaces to the user for compressing packages.

51
Q

Bagit.py

A

The bagging tool used by Archivematica to create SIPs and AIPs, and to verify their integrity when requested by the user.

52
Q

Transcribe contents

A

Archivematica’s microservice job which wraps the open source Tesseract optical character recognition (OCR) tool. OCR outputs are found in an AIP’s logs directory.

53
Q

Microservice

A

A microservice in Archivematica describes a grouping of individual jobs directed at a single purpose, e.g. the Normalization microservice contains a number of jobs associated with creating and describing normalized versions of the digital objects associated with a digital transfer. A microservice may have different connotations in different industry settings.

54
Q

Microservice job

A

A microservice job is a script that performs a specific function in Archivematica. For example, one job might verify user-supplied checksum values, while another might be responsible for saving an AIP to the Storage Service. A microservice job is typically a Python script.