Data Management Flashcards
What is a data dictionary?
- It is a repository of metadata (data that describes other data, such as file size, author, or date created) about the properties of fields in a relational database table.
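Ex (a minimal sketch, not a full data dictionary): SQLite exposes field-level metadata through `PRAGMA table_info`, which can be read like a small data dictionary. The `employees` table, its columns, and the `describe_table` helper are hypothetical names chosen for illustration.

```python
import sqlite3


def describe_table(conn, table_name):
    """Return field-level metadata for one table as a list of dicts."""
    rows = conn.execute(f"PRAGMA table_info({table_name})").fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    return [
        {
            "field": r[1],
            "type": r[2],
            "not_null": bool(r[3]),
            "primary_key": bool(r[5]),
        }
        for r in rows
    ]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees ("
    " employee_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " hire_date TEXT)"
)

dictionary = describe_table(conn, "employees")
for entry in dictionary:
    print(entry)
```

Each printed entry describes one field (its name, data type, and key status) rather than any employee data, which is the distinction between metadata and data.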
What is a database schema?
It is a diagram of the underlying structure of the database that shows tables, primary keys, and relationships
What is an entity relationship diagram?
- It is a database schema that outlines the entities, relationships among entities, and attributes for each entity.
- ERDs do not specify the acceptable range of values for each field.
What is SQL?
- Structured Query Language
- It can be used to query relational databases
- To query data stored in multiple tables, the tables must be JOINED on a shared attribute (field)
What are the 4 types of joins in SQL?
- Inner Join
- Left Outer Join
- Right Outer Join
- Full Outer Join
What is a FULL OUTER JOIN?
It is a join that retrieves all records from both tables, regardless of whether there is a match; fields with no matching record are returned as NULL
What is an INNER JOIN?
- It is a join that retrieves only those records that have a match in both tables
- Ex: Joining an employee table to a workshop attendance table would list only those employees who attended a workshop
What is a LEFT OUTER JOIN?
It returns all records from the left table and only the matched records from the right table
What is a RIGHT OUTER JOIN?
It returns all records from the right table and only the matched records from the left table
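Ex (a hedged sketch of the four joins): the `employees` and `attendance` tables and their rows are hypothetical. Older SQLite versions lack RIGHT and FULL OUTER JOIN, so as an assumption this sketch emulates the right join by swapping the table order in a LEFT OUTER JOIN, and builds the full outer join result in Python from the left and right results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE attendance (employee_id INTEGER, workshop TEXT);
INSERT INTO employees VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
INSERT INTO attendance VALUES (1, 'Excel'), (3, 'SQL'), (4, 'Ethics');
""")

# INNER JOIN: only employees who have an attendance record.
inner = conn.execute("""
    SELECT e.name, a.workshop
    FROM employees e
    INNER JOIN attendance a ON e.employee_id = a.employee_id
    ORDER BY e.employee_id
""").fetchall()

# LEFT OUTER JOIN: every employee, with NULL (None) where no workshop matches.
left = conn.execute("""
    SELECT e.name, a.workshop
    FROM employees e
    LEFT OUTER JOIN attendance a ON e.employee_id = a.employee_id
    ORDER BY e.employee_id
""").fetchall()

# RIGHT OUTER JOIN, emulated: every attendance record, with NULL (None)
# where no employee matches. The tables are swapped around a LEFT JOIN.
right = conn.execute("""
    SELECT e.name, a.workshop
    FROM attendance a
    LEFT OUTER JOIN employees e ON e.employee_id = a.employee_id
    ORDER BY a.employee_id
""").fetchall()

# FULL OUTER JOIN, emulated: all left-join rows plus the attendance rows
# that matched no employee.
full = left + [row for row in right if row[0] is None]

for label, rows in [("inner", inner), ("left", left), ("right", right), ("full", full)]:
    print(label, rows)
```

Ben attended nothing and the 'Ethics' record has no matching employee, so those two rows are what distinguish the four results.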
What is normal form?
Normal forms are progressive sets of rules for organizing tables. Each normal form adds more rules to the one before, subdividing large tables into smaller tables that are linked by primary and foreign keys
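Ex (a minimal sketch of subdividing one table, not a full walk through each normal form): the flat table repeats the department name on every employee row, and the normalized version moves departments into their own table linked by a foreign key. All table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One large table: the department name is repeated for every employee.
CREATE TABLE employees_flat (
    employee_id INTEGER PRIMARY KEY,
    name TEXT,
    dept_name TEXT
);
INSERT INTO employees_flat VALUES
    (1, 'Ana', 'Audit'), (2, 'Ben', 'Audit'), (3, 'Cy', 'Tax');

-- Two smaller tables: each department is stored once and referenced by key.
CREATE TABLE departments (
    dept_id INTEGER PRIMARY KEY,
    dept_name TEXT UNIQUE
);
CREATE TABLE employees (
    employee_id INTEGER PRIMARY KEY,
    name TEXT,
    dept_id INTEGER REFERENCES departments(dept_id)
);
INSERT INTO departments (dept_name)
    SELECT DISTINCT dept_name FROM employees_flat ORDER BY dept_name;
INSERT INTO employees
    SELECT f.employee_id, f.name, d.dept_id
    FROM employees_flat f
    JOIN departments d ON d.dept_name = f.dept_name;
""")

dept_count = conn.execute("SELECT COUNT(*) FROM departments").fetchone()[0]

# Joining on the foreign key rebuilds the original rows without stored repetition.
rebuilt = conn.execute("""
    SELECT e.employee_id, e.name, d.dept_name
    FROM employees e
    JOIN departments d ON d.dept_id = e.dept_id
    ORDER BY e.employee_id
""").fetchall()
print(dept_count, rebuilt)
```

Because each department name now lives in one row, a rename is a single update rather than one update per employee.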
What is a check digit?
- It is a digit within an identifying number (a number with no obvious meaning) that is determined by a formula applied to the rest of the number
- Ex: A bank account number is checked on entry to verify that it is a validly formed account number
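Ex (a hedged sketch): the card does not say which formula is used, so this swaps in the Luhn (mod 10) formula, which is commonly used for payment card numbers, purely as an illustration. The function names are hypothetical.

```python
def luhn_check_digit(payload: str) -> int:
    """Compute the Luhn check digit for a digit string that lacks one."""
    total = 0
    # Walk right to left, doubling every second digit starting with the
    # rightmost payload digit; a doubled value above 9 has 9 subtracted.
    for position, char in enumerate(reversed(payload)):
        digit = int(char)
        if position % 2 == 0:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    # The check digit brings the overall total up to a multiple of 10.
    return (10 - total % 10) % 10


def is_valid(number: str) -> bool:
    """Check that the last digit matches the formula applied to the rest."""
    if not number.isdigit() or len(number) < 2:
        return False
    return luhn_check_digit(number[:-1]) == int(number[-1])


print(luhn_check_digit("7992739871"))
print(is_valid("79927398713"), is_valid("79927398710"))
```

A check like this catches most single-digit typing errors at input time, but it only shows the number is well formed, not that the account exists.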
What are application controls?
They are controls that are specific to individual programs and include input, processing, and output controls
What are the 3 types of database schemas?
- CONCEPTUAL: Shows the big picture of the database in initial phases of its design. Ex: Entity Relationship Diagram. It shows relationships.
- LOGICAL: Shows the attributes and primary keys that will be included for each table in the database. Does not show relationships.
- PHYSICAL: Shows the technical implementation of the logical model once the software has been determined
What are data warehouses?
- They are databases designed to collect and store large amounts of historical, structured data from multiple databases
- The primary purpose is to serve as central repositories
- They must be continuously updated with current and relevant data so that decision makers can make accurate decisions
What is a data mart?
- It is a smaller version of a data warehouse that is intended for a specific purpose (ex: a department)
- It is not for broader organization-wide purposes
What are star schemas?
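- They are schemas that have fewer tables than snowflake schemas, because their dimension tables are not split into further sub-tables
- This improves query performance, as fewer tables have to be joined, although the redundant data in denormalized dimension tables generally takes more storage space than a snowflake schema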
- Used for data warehouses and data marts
- They store data in a denormalized form in central fact tables holding numerical data and in associated dimension tables, which provide descriptive context (month of sale) to the fact tables
- This configuration reduces the number of joins required to retrieve data, leading to faster queries.
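Ex (a minimal sketch of a star schema): one central fact table holds the numerical data and each dimension table is one join away. The table names, columns, and sales figures are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive context for the facts.
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, month TEXT);
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT);

-- Fact table: numerical measures plus a key into each dimension.
CREATE TABLE fact_sales (
    date_id INTEGER REFERENCES dim_date(date_id),
    product_id INTEGER REFERENCES dim_product(product_id),
    amount REAL
);

INSERT INTO dim_date VALUES (1, 'January'), (2, 'February');
INSERT INTO dim_product VALUES (10, 'Widget'), (20, 'Gadget');
INSERT INTO fact_sales VALUES (1, 10, 100.0), (1, 20, 50.0), (2, 10, 75.0);
""")

# A single join from the fact table to one dimension answers "sales by month".
sales_by_month = conn.execute("""
    SELECT d.month, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_date d ON d.date_id = f.date_id
    GROUP BY d.date_id, d.month
    ORDER BY d.date_id
""").fetchall()
print(sales_by_month)
```

In a snowflake design, `dim_date` might itself be split into further tables (for example, a separate quarter table), which would add joins to the same query.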
What are snowflake schemas?
They are schemas that normalize the dimension tables into additional related tables; they are best for organizations needing to perform complex queries
According to QC 10, A Firm’s System of Quality Control:
a firm should accept an engagement only when its staff are COMPETENT to perform the work
How are data retrieved from databases?
By using queries
To meet quality control standards, engagement team members must be competent in what SQL language?
Data Query Language (DQL), the part of SQL used to retrieve data (ex: SELECT statements)
What is a data lake?
- It is a data storage architecture that can hold massive amounts of data cheaply, no matter the format
- Data is stored in an original, raw format
- They support advanced analysis, such as AI and machine learning
If an employee’s department appears differently on 2+ tables, what is violated?
- Referential integrity
- This enforces the rule that a foreign key in one table can refer only to an existing primary key value in another table
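Ex (a hedged sketch of the rule being enforced): the `departments` and `employees` tables are hypothetical. SQLite does not enforce foreign keys unless `PRAGMA foreign_keys = ON` is set, so the sketch turns enforcement on first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Foreign key enforcement is off by default in SQLite.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE departments (dept_id INTEGER PRIMARY KEY, dept_name TEXT)")
conn.execute(
    "CREATE TABLE employees ("
    " employee_id INTEGER PRIMARY KEY,"
    " name TEXT,"
    " dept_id INTEGER REFERENCES departments(dept_id))"
)

conn.execute("INSERT INTO departments VALUES (1, 'Audit')")
# Accepted: department 1 exists as a primary key in departments.
conn.execute("INSERT INTO employees VALUES (100, 'Ana', 1)")

try:
    # Rejected: department 99 does not exist, so the foreign key has no target.
    conn.execute("INSERT INTO employees VALUES (101, 'Ben', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

employee_count = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
print(rejected, employee_count)
```

Storing only the department key on the employee row, rather than a typed-in department name, is what keeps the department from appearing differently across tables.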