Chapter 8: Data Structures and CAATTs for Data Extraction Flashcards
Data structures have two fundamental components:
organization and access method
_______________ refers to the way records are physically arranged on the secondary storage device. This may be either sequential or random.
Organization
The _______________ is the technique used to locate records and to navigate through the database or file.
access method
Under this arrangement, for example, the record with key value 1875 is placed in the physical storage space immediately following the record with key value 1874. Thus, all records in the file lie in contiguous storage spaces in a specified sequence (ascending or descending) arranged by their primary key.
sequential structure
An ________________ is so named because, in addition to the actual data file, there exists a separate index that is itself a file of record addresses. This index contains the numeric value of the physical disk storage location (cylinder, surface, and record block) for each record in the associated data file.
indexed structure
Records in an _________________ are dispersed throughout a disk without regard for their physical proximity to other related records.
indexed random file
The ___________________________ structure is used for very large files that require routine batch processing and a moderate degree of individual record processing. For instance, the customer file of a public utility company will be processed in batch mode for billing purposes and directly accessed in response to individual customer queries.
virtual storage access method (VSAM)
A VSAM file has three physical components:
the indexes
the prime data storage area
the overflow area.
A ______________ employs an algorithm that converts the primary key of a record directly into a storage address.
hashing structure
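A minimal sketch of the idea, assuming a division-remainder algorithm and a table of 97 storage locations; both choices are illustrative and not taken from the chapter. Keys that hash to the same location are simply kept together in a list here.

```python
# Hypothetical hashing sketch: the prime 97 and the list-per-location
# collision handling are assumptions made for illustration only.
NUM_LOCATIONS = 97


def hash_address(primary_key: int, num_locations: int = NUM_LOCATIONS) -> int:
    """Convert a numeric primary key directly into a storage location number."""
    return primary_key % num_locations


# Simulated storage: location number -> list of records stored there.
storage: dict[int, list[dict]] = {}


def store(record: dict) -> None:
    storage.setdefault(hash_address(record["key"]), []).append(record)


def fetch(primary_key: int):
    """Go straight to the computed location; no index search is needed."""
    for record in storage.get(hash_address(primary_key), []):
        if record["key"] == primary_key:
            return record
    return None


for key in (1874, 1875, 1971):  # 1874 and 1971 land on the same location
    store({"key": key})
```

The lookup cost does not depend on file size, which is where the access-speed advantage comes from.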
The principal advantage of hashing is _____________________.
access speed
A ________________ is used to create a linked-list file.
pointer structure
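As a rough illustration, a linked-list file can be modeled as records that each carry a pointer to the next record. The storage addresses (10, 42, 7) and field names below are made up for the example.

```python
# Hypothetical linked-list file: address -> record, where "next" is the
# pointer field holding the address of the following record (None = end).
records = {
    10: {"key": 1874, "next": 42},
    42: {"key": 1875, "next": 7},
    7: {"key": 1876, "next": None},
}


def traverse(head_address):
    """Follow the pointers from the first record and collect keys in order."""
    keys = []
    address = head_address
    while address is not None:
        record = records[address]
        keys.append(record["key"])
        address = record["next"]
    return keys


ordered_keys = traverse(10)
```

The records are scattered across addresses, yet following the pointers returns them in key order.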
A ___________________ contains the actual disk storage location (cylinder, surface, and record number) needed by the disk controller. This physical address allows the system to access the record directly without obtaining further information. This method has the advantage of speed, since it does not need to be manipulated further to determine a record’s location.
physical address pointer
A _____________ contains the relative position of a record in the file. For example, the pointer could specify the 135th record in the file. This must be further manipulated to convert it to the actual physical address. The conversion software calculates this by using the physical address of the beginning of the file, the length of each record in the file, and the relative address of the record being sought.
relative address pointer
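The conversion described on this card can be sketched as simple arithmetic. The sketch assumes fixed-length records, byte offsets, and positions counted from 1; the starting address and record length are invented values.

```python
def relative_to_physical(file_start: int, record_length: int, relative_position: int) -> int:
    """Convert a 1-based relative record position into a byte address.

    Assumes every record has the same length and records are stored
    back to back from file_start.
    """
    if relative_position < 1:
        raise ValueError("relative_position is counted from 1")
    return file_start + (relative_position - 1) * record_length


# Hypothetical file that starts at byte 4096 with 200-byte records.
address_of_135th = relative_to_physical(4096, 200, 135)
```

With these assumed values the 135th record starts 134 record lengths past the beginning of the file.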
A _________________ contains the primary key of the related record. This key value is then converted into the record’s physical address by a hashing algorithm.
logical key pointer
This structure uses an index in conjunction with a sequential file organization. It facilitates both direct access to individual records and batch processing of the entire file. Multiple indexes can be used to create a cross-reference, called an inverted list, which allows even more flexible access to data.
indexed sequential file structure
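A small sketch of the two access paths, assuming the file is a list kept in key order and the index is a sorted list of keys searched with binary search. The customer data are invented.

```python
from bisect import bisect_left

# Hypothetical data file, stored in ascending primary-key order.
records = [(1871, "Adams"), (1874, "Baker"), (1875, "Chen"), (1880, "Diaz")]

# The index: one key per record, in the same order as the file.
index_keys = [key for key, _ in records]


def direct_lookup(key):
    """Direct access: search the index, then read that single record."""
    position = bisect_left(index_keys, key)
    if position < len(index_keys) and index_keys[position] == key:
        return records[position]
    return None


def batch_process():
    """Batch processing: read the whole file in key sequence."""
    return [name for _, name in records]
```

A second index built on a different field (for example, a dictionary keyed by city) would give the cross-reference the card calls an inverted list.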
An _______ is anything about which the organization wishes to capture data. These may be physical, such as inventories, customers, or employees. They may also be conceptual, such as sales (to a customer), accounts receivable (AR), or accounts payable (AP).
entity
The term _____________ is used to describe the number of instances or records that pertain to a specific entity.
occurrence
______________ are the data elements that define an entity.
Attributes
The labeled line connecting two entities in a data model describes the nature of the ___________ between them.
association
_____________ is the degree of association between two entities.
Cardinality
__________ describes the number of possible occurrences in one table that are associated with a single occurrence in a related table.
cardinality
Four basic forms of cardinality are possible:
zero or one (0,1)
one and only one (1,1)
zero or many (0,M)
one or many (1,M).
The value of at least one attribute in each occurrence (row) must be unique. This attribute is the ______________________.
primary key
Logically related tables need to be physically connected to achieve the associations described in the data model using _________________.
foreign keys
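To make the link concrete, here is a sketch using Python's built-in sqlite3 module. The customer and invoice tables, their columns, and the sample rows are hypothetical; the point is only that the invoice table carries the customer's primary key as a foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# cust_num is the primary key of customer ...
conn.execute("CREATE TABLE customer (cust_num INTEGER PRIMARY KEY, name TEXT)")
# ... and is embedded in invoice as a foreign key to connect the two tables.
conn.execute(
    "CREATE TABLE invoice ("
    "inv_num INTEGER PRIMARY KEY, "
    "cust_num INTEGER REFERENCES customer(cust_num), "
    "amount REAL)"
)

conn.execute("INSERT INTO customer VALUES (1, 'Acme')")
conn.executemany(
    "INSERT INTO invoice VALUES (?, ?, ?)",
    [(100, 1, 250.0), (101, 1, 75.5)],
)

# The join follows the foreign key to realize the 1:M association.
rows = conn.execute(
    "SELECT c.name, i.inv_num, i.amount "
    "FROM customer c JOIN invoice i ON i.cust_num = c.cust_num "
    "ORDER BY i.inv_num"
).fetchall()
```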
A ____________ is the set of data that a particular user sees. Examples of this are computer screens for entering or viewing data, management reports, or source documents such as an invoice.
user view
Improperly normalized tables can cause DBMS processing problems that restrict, or even deny, users access to the information they need. Such tables exhibit negative operational symptoms called _________________.
anomalies
To be free of anomalies, tables must be normalized to the _______________________.
third normal form (3NF) level.
The _____________ results from data redundancy in an unnormalized table.
update anomaly
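A toy illustration, with invented inventory rows, of how redundancy produces this problem: the supplier's address is repeated on every part row, so changing it in one place leaves the table contradicting itself.

```python
# Hypothetical unnormalized table: supplier address repeated per part.
inventory = [
    {"part": 1, "supplier": 22, "supplier_addr": "12 Oak St"},
    {"part": 2, "supplier": 22, "supplier_addr": "12 Oak St"},
]

# The supplier moves, but only one of the redundant copies is updated.
inventory[0]["supplier_addr"] = "99 Elm Ave"
addresses_on_file = {row["supplier_addr"] for row in inventory if row["supplier"] == 22}

# Normalized alternative: the address is stored once, in a supplier table,
# and part rows carry only the supplier number.
suppliers = {22: {"supplier_addr": "12 Oak St"}}
parts = [{"part": 1, "supplier": 22}, {"part": 2, "supplier": 22}]
suppliers[22]["supplier_addr"] = "99 Elm Ave"  # one update, no conflict
normalized_addresses = {suppliers[p["supplier"]]["supplier_addr"] for p in parts}
```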
The _______________ involves the unintentional deletion of data from a table.
deletion anomaly
_________________ is a component of a much larger systems development process that involves extensive analysis of user needs.
Database design
Combining the data needs of all users into a single schema or enterprise-wide view is called _____________________.
view integration
The objective of the ____________________, also known as continuous auditing, is to identify important transactions while they are being processed and extract copies of them in real time.
embedded audit module (EAM)
An ____________ is a specially programmed module embedded in a host application to capture predetermined transaction types for subsequent analysis.
embedded audit module (EAM)
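The concept can be sketched as a hook inside the host application's transaction routine. Everything here is hypothetical: the function names, the materiality threshold of 50,000, and the in-memory audit file are placeholders rather than a description of any real EAM.

```python
# Hypothetical audit file that the auditor reviews later.
audit_file = []

MATERIALITY_THRESHOLD = 50_000  # assumed capture criterion


def is_predetermined_type(transaction: dict) -> bool:
    """Audit criterion: capture transactions at or above the threshold."""
    return transaction["amount"] >= MATERIALITY_THRESHOLD


def process_transaction(transaction: dict) -> None:
    """Host application routine with the audit module embedded in it."""
    # ... normal application processing would happen here ...
    if is_predetermined_type(transaction):
        # Copy the transaction as it is processed, for subsequent analysis.
        audit_file.append(dict(transaction))


for amount in (10, 50_000, 75_000):
    process_transaction({"amount": amount})
```

The extra check runs on every transaction, which hints at why EAMs carry an operational cost.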
Disadvantages of EAMs
- Operational Efficiency - EAMs decrease operational performance.
- Verifying EAM Integrity - EAMs may not be a viable audit technique in environments with a high level of program maintenance.
___________________ is the most widely used CAATT for IS auditing. It allows auditors to access electronically coded data files and perform various operations on their contents.
Generalized audit software (GAS)
The widespread popularity of GAS is due to four factors:
(1) GAS languages are easy to use and require little computer background on the part of the auditor;
(2) many GAS products can be used on both mainframe and PC systems;
(3) auditors can perform their tests independent of the client’s computer service staff; and
(4) GAS can be used to audit the data stored in most file structures and formats.
_______________________ was designed as a meta-language for auditors to access data stored in various digital formats and to test them comprehensively.
ACL (audit command language)
One of ACL’s strengths is the ability to read data stored in most formats.
ACL uses the __________________ for this purpose.
data definition feature
______________ are expressions that search for records that meet the filter criteria.
Filters
________________________ allows the auditor to use logical operators such as AND, OR, <, >, NOT, and others to define and test conditions of any complexity and to process only those records that match specific conditions.
ACL’s expression builder
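The same kind of condition can be sketched in plain Python. This is not ACL syntax; the sales records and field names are invented, and the expression simply combines OR, >, AND, and NOT the way a filter would.

```python
# Hypothetical sales file.
sales = [
    {"invoice": 1, "region": "East", "amount": 1200.0, "paid": True},
    {"invoice": 2, "region": "West", "amount": 300.0, "paid": False},
    {"invoice": 3, "region": "East", "amount": 80.0, "paid": False},
    {"invoice": 4, "region": "North", "amount": 950.0, "paid": False},
]


def matches(record: dict) -> bool:
    """(region is East OR amount > 500) AND NOT paid."""
    return (record["region"] == "East" or record["amount"] > 500) and not record["paid"]


# Only the records that satisfy the filter are processed further.
filtered_invoices = [record["invoice"] for record in sales if matches(record)]
```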
_____________________________ feature allows the auditor to view the distribution of records that fall into specified strata.
ACL’s stratification
Data can be stratified on any numeric field such as sales price, unit cost, quantity sold, and so on. The data are summarized and classified by strata, which can be equal in size (called ___________) or vary in size (called ___________).
intervals, free
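A generic sketch of the two approaches, not ACL's own implementation. It assumes half-open strata (lower bound included, upper bound excluded, except that the top equal-width stratum includes the maximum); the unit costs and boundaries are invented.

```python
from bisect import bisect_right


def stratify_intervals(values, minimum, maximum, n_strata):
    """Count values per stratum when all strata have the same width."""
    width = (maximum - minimum) / n_strata
    counts = [0] * n_strata
    for value in values:
        if minimum <= value <= maximum:
            # The maximum itself is placed in the top stratum.
            position = min(int((value - minimum) / width), n_strata - 1)
            counts[position] += 1
    return counts


def stratify_free(values, boundaries):
    """Count values per stratum when the auditor sets unequal boundaries.

    boundaries is ascending; stratum i covers boundaries[i] up to,
    but not including, boundaries[i + 1].
    """
    counts = [0] * (len(boundaries) - 1)
    for value in values:
        position = bisect_right(boundaries, value) - 1
        if 0 <= position < len(counts):
            counts[position] += 1
    return counts


unit_costs = [5, 12, 18, 25, 40, 95]
equal_counts = stratify_intervals(unit_costs, 0, 100, 4)   # strata 25 wide
free_counts = stratify_free(unit_costs, [0, 10, 50, 100])  # unequal strata
```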
ACL offers many sampling methods for statistical analysis. Two of the most frequently used are ________________________.
record sampling and monetary unit sampling (MUS).
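The difference between the two methods can be sketched generically. This is not ACL's implementation: record sampling is shown as a simple random draw in which every record has an equal chance, and MUS as a fixed-interval selection over cumulative amounts, so larger balances are more likely to be picked. The invoices, interval, and starting point are invented, and amounts are assumed to be positive.

```python
import random


def record_sample(records, sample_size, seed=0):
    """Record sampling: each record has the same chance of selection."""
    return random.Random(seed).sample(records, sample_size)


def monetary_unit_sample(records, interval, start):
    """MUS: select the record containing every interval-th monetary unit.

    start is the first monetary unit selected (1 <= start <= interval).
    """
    selected = []
    cumulative = 0
    next_unit = start
    for record in records:
        cumulative += record["amount"]
        while next_unit <= cumulative:
            # A large record may contain several selected units; keep it once.
            if not selected or selected[-1] is not record:
                selected.append(record)
            next_unit += interval
    return selected


invoices = [
    {"id": 1, "amount": 100},
    {"id": 2, "amount": 900},
    {"id": 3, "amount": 50},
    {"id": 4, "amount": 2000},
    {"id": 5, "amount": 450},
]
mus_ids = [r["id"] for r in monetary_unit_sample(invoices, interval=1000, start=500)]
random_pick = record_sample(invoices, sample_size=2)
```

In this example the small invoices 1 and 3 are skipped by MUS, while the 2,000 invoice is certain to be selected because it is larger than the interval.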