Data Management Flashcards

1
Q

What is a data dictionary?

A
  • It is something that stores metadata (a set of data that gives info about other data) (Ex of metadata: file size, author, date created) related to properties of fields in a relational database table.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

What is a database schema?

A

It is a diagram of the underlying structure of the database that shows tables, primary keys, and relationships

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

What is an entity relationship diagram?

A
  • It is a database schema that outlines the entities, relationships among entities, and attributes for each entity.
  • ERDs do not specify the acceptable range of values for each field.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

What is SQL?

A
  • Structured Query Language
  • It can be used to query relational databases
  • To query data stored in multiple tables, the tables must be JOINED on a shared attribute (field)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

What are the 4 types of joins in SQL?

A
  • Inner Join
  • Left Outer Join
  • Right Outer Join
  • Full Outer Join
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

What is a FULL OUTER JOIN?

A

It is a join that would retrieve all records from both datasets, regardless of whether there is a match

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

What is an INNER JOIN?

A
  • It is a join that would retrieve only those records that match both tables
  • The output would list only those who attended a workshop, for example
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What is a LEFT OUTER JOIN?

A

returns all records from the left table and matched records from the right table

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

What is a RIGHT OUTER JOIN?

A

returns all records from the right table and matched records from the left table

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

What is normal form?

A

The rule is that each normal form adds more rules to the one before, subdividing large tables into smaller tables that are associated with primary and foreign keys

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

What is a check digit?

A
  • numbers with no obvious meaning in which 1 of the digits is determined by a formula applied to the rest of the number
  • Ex: Bank account number is verified to ensure that the number entered is a valid bank account number
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

What are application controls?

A

They are controls that are specific to individual programs and include input, processing, and output controls

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

What are the 3 types of database schemas?

A
  • CONCEPTUAL: Shows the big picture of the database in initial phases of its design. Ex: Entity Relationship Diagram. It shows relationships.
  • LOGICAL: Shows the attributes and primary keys that will be included for each table in the database. Does not show relationships.
  • PHYSICAL: Shows the technical implementation of the logical model once the software has been determined
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

What are data warehouses?

A
  • They are databases designed to collect and store large amounts of historical, structured data from multiple databases
  • The primary purpose is to serve as central repositories
  • They must be continuously updated with current and relevant data so that decision makers can make accurate decisions
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

What is a data mart?

A
  • It is not for broader organization-wide purposes
  • It is a smaller version of a data warehouse that is intended for a specific purpose (ex: a department)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

What are star schemas?

A
  • They are schemas that require less storage space because they have fewer tables than snowflake schemas
  • This improves performance, as the computer has less to search
  • Used for data warehouses and data marts
  • They store data in a denormalized form in central fact tables holding numerical data and in associated dimension tables, which provide descriptive context (month of sale) to the fact tables
  • This configuration reduces the number of joins required to retrieve data, leading to faster queries.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

What are snowflake schemas?

A

They are schemas that are best for organizations needing to perform complex queries

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

According to QC 10, A Firm’s System of Quality Control:

A

a firm should accept an engagement only when its staff are COMPETENT to perform the work

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

How are data retrieved from databases?

A

By using queries

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

To meet quality control standards, engagement team members must be competent in what SQL language?

A

Data query language

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

What is a data lake?

A
  • It is a data storage architecture that can hold massive amounts of data cheaply, no matter the format
  • Data is stored in an original, raw format
  • They support multidimensional analysis, like AI and machine learning
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
22
Q

If an employee’s department appears differently on 2+ tables, what is violated?

A
  • Referential integrity
  • This enforces a rule that the foreign key in 1 table can refer only to the primary key in another table
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
23
Q

referential integrity

A

A database design concept where a foreign key in a table points to a primary key in another table, so that any changes in 1 table affect relevant data linked to it

24
Q

What should automatically update all related tables?

A

An ATTRIBUTE CHANGE in 1 table should automatically update all related tables to keep data consistent

25
Q

What is a field check?

A
  • It validates whether data entered is the correct length and format
  • Ex: Time card entry is verified to ensure the hours entered are numeric
26
Q

What is a validity check?

A
  • It compares entered data with a list of acceptable entries
  • Ex: 2 letter state abbreviation will be checked against a list of acceptable choices
27
Q

What is a reasonableness check?

A

It ensures that the entered data falls within predetermined limits

28
Q

What does an asterisk do in SQL langauge?

A

It retrieves all fields from a table

29
Q

What does WHERE clause do in SQL language?

A

It restricts the output to only those records that satisfy certain criteria

30
Q

What does ORDER BY clause do in SQL language?

A

It sorts the output in ascending (small to big, or first to last) order by default OR in descending order (big to small, or last to first) by attaching the DESC keyword

31
Q

What is application programming interface?

A
  • It is software that connects internal and external data and systems in real time without human intervention
  • Ex: Weather forecast in real time
32
Q

What are the 4 types of constraints?

A
  • Domain (Ex: MM/DD/YY must be entered in a certain format
  • Entity
  • Referential integrity
  • Key
33
Q

What are constraints?

A

They are PREVENTITIVE controls that restrict the type of data and range of values that can be entered into a field in a relational database table

34
Q

What is a UPC?

A
  • Universal product code
  • These digits identify a single retail product and its manufacturer with a check digit
  • UPC can be found on a barcode
35
Q

What is electronic data interchange (EDI)?

A
  • EDI is the asynchronous (not happening at the same time) computer to computer exchange of documents among business partners in an electronic format
  • The EDI network can only be accessed by authorized organizations, so it is a secure method of transferring B2B data
36
Q

What does the DISTINCT keyword in SQL represent?

A

It can remove duplicates from the output by being included in the SELECT clause

37
Q

What are the 3 types of data structures?

A
  • Structured Data (Ex: user ID in smartphone app)
  • Semi-structured Data (Ex: Tagged post from social media)
  • Unstructured Data (Ex: PDFs and video )
38
Q

What is the DISPOSAL PHASE of the data life cycle?

A
  • It includes archive, deletion, and destruction
  • In this phase, data is archived when its no longer actively used but must remain accessible just in case
39
Q

What is database normalization?

A
  • It decreases data redundancy by creating tables and relationships that follow rules known as “normal forms”
40
Q

What does AVG function do in SQL?

A

It calculates the average of a field in a table

41
Q

What does HAVING do in SQL?

A

It applies criteria at the group level

42
Q

What does WHERE do in SQL?

A

It applies criteria at the individual record level

43
Q

The correct order of execution of SQL queries

A

1) From - Specifies the tables from which to extract the data
2) Where - Filters the data
3) Group by - Aggregates the data
4) Having - Filters aggregated data
5) Select - Specifies the fields to be extracted
6) Order by - Sorts the returned data
7) Limit - Restricts the rows returned

44
Q

What is a limit test?

A
  • AKA Reasonableness check or Limit Check
  • It’s a comparison of data entered with limits that are set
  • Ex: Time card entry is verified to make sure that hours entered don’t exceed 24 hours in a day
45
Q

To be in the 1NF:

A
  • Each record must have a primary key
  • Repeating groups are eliminated
  • Each cell must hold a single piece of data
46
Q

To be in the 2NF:

A
  • Database must already be in the 1NF
  • Separate tables should be created for sets of values that apply to multiple records
  • These tables must be related with a foreign key
47
Q

What is a primary key?

A
  • Think of a row of data in a table, like a house
  • The house has all kinds of things in it, like a TV, couch, table, as do all the other houses on the street
  • How do I know which house is mine?
  • The best way to do this is to give the house a number (a special number that no one else is allowed to use on their house)
  • That number is called a PRIMARY KEY
48
Q

What is referential integrity?

A
  • Referential integrity is a term used in database design to describe the relationship between two tables.
  • It is important because it ensures that all data in a database remains consistent and up to date.
  • It helps to prevent incorrect records from being added, deleted, or modified.
  • It makes certain that each foreign key in a table point to a unique primary key value in another table.
  • This will ensure that data is not lost, and it will also help you maintain data quality.
49
Q

What is a domain contstraint?

A
  • They restrict the attribute data type (character, date) and may also require the data to be entered in a specific format
  • Ex: MM/DD/YYYY instead of M/D/Y
  • Input fields may be programmed to automatically reformat the data or notify the user to reformat the data
50
Q

Left JOIN + IS NULL condition =

A

This can be used to find records in one table that don’t exist in another table

51
Q

Union join would:

A

Return all records from both tables

52
Q

WHERE Vendor ID is NULL condition does what?

A

It limits the output to only those records for which the VENDOR ID in the PurchaseOrder table doesn’t exist as a VendorID in the Vendor table

53
Q

What are the 5 commonly used AGGREGATE FUNCTIONS IN SQL?

A

COUNT: Returns the number of records in a field
AVERAGE: Computes the average of values in a field
SUM: Adds all the values in a field
MAX: Finds the largest value in a field
MIN: Finds the smallest value in a field

54
Q

What is ROBOTIC PROCESS AUTOMATION?

A

It is a process that automates manual tasks (data entry) through the use of bots, which replicate the actions or clicks a human would do

55
Q

What does EXTRACT TRANSFORM LOAD focus on?

A

It focuses on batch processing and consolidating data from different sources into a SINGLE REPOSITORY

56
Q

To be in the 3NF:

A

All non-prime table attributes depend on the primary key

Ex: A denormalized customer table holds information about invoices. However, the only information that depends on the customer ID is the customer’s name and address, so an additional table is needed to separate information that is not dependent on the primary key, achieving the 3NF. The primary key in this example is the customer ID.

57
Q

What is a logical database schema?

A
  • It is a type of database schema
  • The logical schema must be completed before implementation because the logical schema is the basis for the physical schema
  • The physical schema is then implemented in the database software once the type of database (relational) is known
  • Logical schemas show the attributes (fields) and primary keys that will be included for each table in the database