2 Data Structures, Types, and Formats Flashcards

1
Q

What is the main focus of this chapter?

A

Data storage and the various formats of data.

2
Q

What are the two main categories of databases?

A
  • Structured
  • Unstructured
3
Q

What defines a structured database?

A

It follows a standardized format with a clear and logical structure.

4
Q

What are the two main archetypes of structured databases?

A
  • Defined rows/columns
  • Key-value pairs
5
Q

How are defined rows and columns organized?

A

In tables or spreadsheets where columns represent variables and rows represent data points.

6
Q

What do key-value pairs represent in a structured database?

A

Data objects where each object has the same set of keys with different values.
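
As a rough illustration, the same small dataset can be written as defined rows and columns or as key-value objects. The field names and values below are made up for the example.

```python
# Hypothetical data: the column names and values are invented for illustration.

# Defined rows/columns: columns are variables, rows are data points.
columns = ["name", "age"]
rows = [["Ada", 36], ["Grace", 45]]

# Key-value pairs: each object has the same set of keys with different values.
records = [dict(zip(columns, row)) for row in rows]

for record in records:
    print(record)
```

Both forms hold the same information; only the layout differs.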

7
Q

What characterizes unstructured data?

A

No attempt is made to organize it, and it is often stored as individual files.

8
Q

What are the two groups of unstructured data?

A
  • Undefined fields
  • Machine data
9
Q

What types of file formats are included in undefined fields?

A
  • Text files
  • Audio files
  • Video files
  • Images
  • Social media data
  • Emails
10
Q

What is machine data?

A

Data automatically generated by software without human intervention.

11
Q

What is the difference between relational and non-relational databases?

A

Relational databases store information and relationships, while non-relational databases store information only.

12
Q

What language is primarily used for querying relational databases?

A

Structured Query Language (SQL)

13
Q

True or False: All SQL databases are structured and relational.

A

True

14
Q

True or False: All non-relational databases are unstructured.

A

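False. Key-value pairs are non-relational but still structured, because each object has the same set of keys.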

15
Q

What are the two most basic types of data schemas covered in this chapter?

A
  • Star schema
  • Snowflake schema
16
Q

What is the structure of a star schema?

A

A central fact table (the key table) with dimension tables connected directly to it.
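
A minimal sketch of this layout, using Python's built-in `sqlite3` module. The table names (`fact_sales`, `dim_product`, `dim_store`), columns, and rows are hypothetical and chosen only to show the shape: every dimension table joins directly to the central table.

```python
import sqlite3

# In-memory database with made-up tables and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (
    product_id INTEGER PRIMARY KEY,
    product_name TEXT,
    category_name TEXT          -- repeated per product: redundant, denormalized
);
CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE fact_sales (
    sale_id INTEGER PRIMARY KEY,
    product_id INTEGER REFERENCES dim_product(product_id),
    store_id INTEGER REFERENCES dim_store(store_id),
    amount REAL
);
INSERT INTO dim_product VALUES (1, 'Kettle', 'Kitchen'), (2, 'Lamp', 'Lighting');
INSERT INTO dim_store VALUES (1, 'Leeds'), (2, 'York');
INSERT INTO fact_sales VALUES (1, 1, 1, 20.0), (2, 2, 1, 15.0), (3, 1, 2, 20.0);
""")

# One join per dimension: each dimension table connects straight to the center.
star_rows = conn.execute("""
SELECT p.category_name, s.city, SUM(f.amount)
FROM fact_sales AS f
JOIN dim_product AS p ON p.product_id = f.product_id
JOIN dim_store AS s ON s.store_id = f.store_id
GROUP BY p.category_name, s.city
ORDER BY p.category_name, s.city
""").fetchall()

for row in star_rows:
    print(row)
```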

17
Q

What are the pros of a star schema?

A
  • Simple
  • Fewer joins required
  • Easier to understand
18
Q

What are the cons of a star schema?

A
  • High redundancy
  • Denormalized
19
Q

What distinguishes a snowflake schema from a star schema?

A

It has two levels of dimension tables instead of one.

20
Q

What are the pros of a snowflake schema?

A
  • Low redundancy
  • Normalized
21
Q

What are the cons of a snowflake schema?

A
  • More complicated
  • More joins required
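
The trade-off can be sketched with Python's built-in `sqlite3` module. All table and column names here are hypothetical. The category name is stored once in its own second-level dimension table (low redundancy), but a query by category needs two joins to reach it from the central table.

```python
import sqlite3

# In-memory database with made-up tables and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE dim_product (
    product_id INTEGER PRIMARY KEY,
    product_name TEXT,
    category_id INTEGER REFERENCES dim_category(category_id)
);
CREATE TABLE fact_sales (
    sale_id INTEGER PRIMARY KEY,
    product_id INTEGER REFERENCES dim_product(product_id),
    amount REAL
);
INSERT INTO dim_category VALUES (1, 'Kitchen');
INSERT INTO dim_product VALUES (1, 'Kettle', 1), (2, 'Toaster', 1);
INSERT INTO fact_sales VALUES (1, 1, 20.0), (2, 2, 30.0);
""")

# Two joins: central table -> first-level dimension -> second-level dimension.
snowflake_rows = conn.execute("""
SELECT c.category_name, SUM(f.amount)
FROM fact_sales AS f
JOIN dim_product AS p ON p.product_id = f.product_id
JOIN dim_category AS c ON c.category_id = p.category_id
GROUP BY c.category_name
""").fetchall()

print(snowflake_rows)
```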
22
Q

What is a data warehouse?

A

A database used for structured relational tables, holding large amounts of processed transactional data.

23
Q

What is a data mart?

A

A specialized subset of a data warehouse holding processed information on a specific topic.

24
Q

What is a data lake?

A

A storage system for large amounts of raw, unprocessed data, which can be structured, unstructured, or a combination.

25
Q

Fill in the blank: A data mart is designed to be _______ enough for analysts or customer support employees to access by themselves.

A

[self-service]

26
Q

What is a data mart?

A

A data mart is a subset of a data warehouse that contains customer-facing data and is designed for self-service access by analysts or customer support employees.

Data marts prioritize ease of use and often follow a star schema.

27
Q

What is a data lake?

A

A data lake stores large amounts of raw, unprocessed data, which can be structured, unstructured, or a combination of both.

Data lakes are often used by data scientists and do not follow any specific schema.

28
Q

Who typically creates data warehouses and data lakes?

A

Data warehouses and data lakes are usually created by specialized data engineers, and many companies have only one.

They are often created through third-party services or software.

29
Q

What are some popular data warehouse tools?

A
  • Snowflake
  • Hevo
  • Amazon Web Services (AWS) data warehouse tools
  • Microsoft Azure data warehouse tools
  • Google data warehouse tools
30
Q

What are the two options for updating a current value in a dataset?

A
  • Overwrite historical values
  • Keep historical values
31
Q

What is the benefit of overwriting historical values?

A

It keeps your dataset smaller and simpler, but historical data is lost.

Historical data is required for trend analysis.

32
Q

What columns are added to keep historical values?

A
  • Active Record
  • Active Start
  • Active End
33
Q

What does the Active Record column indicate?

A

It indicates whether the specified value is the most current value, marked as Yes or No.
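
One way to picture keeping historical values is the sketch below. The `update_value` helper, the dictionary keys, and the sample customer are hypothetical; the three tracking columns follow the cards above. An update closes the current row and appends a new active one instead of overwriting.

```python
from datetime import date

def update_value(history, key, new_value, change_date):
    """Mark the active row for `key` as inactive, then append the new value."""
    for row in history:
        if row["key"] == key and row["active_record"] == "Yes":
            row["active_record"] = "No"
            row["active_end"] = change_date
    history.append({
        "key": key,
        "value": new_value,
        "active_record": "Yes",      # most current value
        "active_start": change_date,
        "active_end": None,          # still active, so no end date yet
    })

# Made-up starting data: one customer with one active city.
history = [{
    "key": "customer_1",
    "value": "Leeds",
    "active_record": "Yes",
    "active_start": date(2020, 1, 1),
    "active_end": None,
}]

update_value(history, "customer_1", "York", date(2023, 6, 1))

for row in history:
    print(row)
```

The old row stays in the dataset with Active Record set to No, so it remains available for trend analysis.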

34
Q

What are the consequences of changing the number of variables being recorded?

A

It creates null values in the dataset.

Null values occur whether you are adding new columns or removing existing ones.
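
A small sketch of why this happens, using made-up records. Python's `None` stands in for a null value, and the `city` column is a hypothetical addition.

```python
# Rows recorded before and after a hypothetical "city" column was added.
old_rows = [{"name": "Ada", "age": 36}]
new_rows = [{"name": "Grace", "age": 45, "city": "York"}]

all_columns = ["name", "age", "city"]

# Rows that predate the new column have nothing to put in it, so they get None.
combined = [
    {column: row.get(column) for column in all_columns}
    for row in old_rows + new_rows
]

for row in combined:
    print(row)
```

Removing a column has the mirror-image effect: rows recorded after the removal have no value for it.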

35
Q

What are the four common data types that everyone working with data should know?

A
  • Date
  • Numeric
  • Alphanumeric
  • Currency
36
Q

How should dates be formatted according to ISO recommendations?

A
  • YYYY-MM-DD
  • YYYY-MM-DD HH:MI:SS
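
As a quick sketch, both layouts can be produced with Python's standard `datetime` module. The timestamp is arbitrary, and `MI` on the card corresponds to minutes (`%M` in Python's format codes).

```python
from datetime import datetime

# Arbitrary example timestamp.
moment = datetime(2024, 3, 9, 14, 5, 30)

date_only = moment.strftime("%Y-%m-%d")               # YYYY-MM-DD
date_and_time = moment.strftime("%Y-%m-%d %H:%M:%S")  # YYYY-MM-DD HH:MI:SS

print(date_only)
print(date_and_time)

# The same format string parses the text back into a datetime.
parsed = datetime.strptime(date_and_time, "%Y-%m-%d %H:%M:%S")
```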
37
Q

What is numeric data?

A

Numeric data is made up of numbers, which can be whole numbers or decimals.

38
Q

What does alphanumeric data include?

A

Alphanumeric data includes numbers and letters, except for values in scientific notation.

39
Q

What is currency data?

A

Currency data includes monetary values, typically denoted with a currency symbol.

40
Q

What are the two types of numeric data?

A
  • Discrete
  • Continuous
41
Q

What defines discrete variables?

A

Discrete variables are counts that usually describe whole numbers or integers.

42
Q

What are continuous variables?

A

Continuous variables can represent an infinite number of values between two points and are often measured as decimals.

43
Q

What are the three main types of categorical variables?

A
  • Binary
  • Nominal
  • Ordinal
44
Q

What distinguishes independent variables from dependent variables?

A

Independent variables are manipulated directly, while dependent variables are measured and depend on independent variables.

45
Q

What file types are commonly encountered by data analysts?

A
  • Text (TXT)
  • Image (JPEG)
  • Audio (MP3)
  • Video (MP4)
  • Flat (CSV)
  • Website (HTML)
46
Q

What are flat files?

A

Flat files contain a simple two-dimensional dataset or spreadsheet, such as TSV or CSV.

47
Q

What is the difference between TSV and CSV?

A

TSV values are separated by tabs, while CSV values are separated by commas.
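
A short sketch with Python's standard `csv` module, using made-up file contents. The only change needed to read the TSV version is the delimiter.

```python
import csv
import io

# The same hypothetical two-dimensional dataset in both flat-file formats.
csv_text = "name,age\nAda,36\n"
tsv_text = "name\tage\nAda\t36\n"

csv_rows = list(csv.reader(io.StringIO(csv_text)))                  # commas (default)
tsv_rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))  # tabs

print(csv_rows)
print(tsv_rows)
```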

48
Q

What are the common image file types?

A
  • JPG/JPEG
  • PNG
  • GIF
  • BMP
  • RAW
49
Q

What formats do audio files commonly use?

A
  • MP3
  • WAV
  • WMA
  • AAC
  • ALAC
50
Q

What are some popular video file formats?

A
  • MP4
  • WMV
  • MOV
  • FLV
  • AVI
51
Q

What are the website file types recognized by the exam?

A
  • HTML
  • XML
  • JSON
52
Q

What is the purpose of HTML?

A

HTML is used to structure websites and store information between tags.

53
Q

What distinguishes XML from HTML?

A

XML tags have no pre-determined meanings and can be customized, while HTML tags have specific meanings.
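
For illustration, the sketch below parses a tiny XML snippet with Python's standard `xml.etree.ElementTree` module. The tag names `thermometer` and `reading` are invented for the example; XML attaches no built-in meaning to them, unlike an HTML tag such as `<p>`.

```python
import xml.etree.ElementTree as ET

# Custom, made-up tags: their meaning comes from whoever defines the document.
xml_text = "<thermometer><reading unit='C'>21.5</reading></thermometer>"

root = ET.fromstring(xml_text)
reading = root.find("reading")

value = float(reading.text)   # text stored between the tags
unit = reading.get("unit")    # attribute on the tag

print(root.tag, value, unit)
```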

54
Q

What is HTML primarily used for?

A

Website structure and occasionally passing information

55
Q

How is information stored in HTML?

A

Between tags that create elements with specific meanings

56
Q

What is XML similar to?

A

HTML

57
Q

In XML, what is unique about the tags?

A

They have no pre-determined meanings

58
Q

What is a key feature of JSON?

A

Specializes in storing and passing information

59
Q

How does a JSON file structure data?

A

Contains a list of data objects using key-value pairs
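
A minimal sketch using Python's standard `json` module. The JSON text is made up and shows a list of data objects, each built from key-value pairs.

```python
import json

# Hypothetical JSON content: a list of objects with the same keys.
json_text = '[{"name": "Ada", "age": 36}, {"name": "Grace", "age": 45}]'

people = json.loads(json_text)   # JSON text -> Python list of dicts

for person in people:
    print(person["name"], person["age"])

# Converting back to text and parsing again returns the same data.
round_trip = json.loads(json.dumps(people))
```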

60
Q

What is the main difference between JSON and HTML/XML?

A

JSON does not contribute to website structure

61
Q

What types of databases were covered in this chapter?

A

Structured and unstructured databases

62
Q

What are the two types of databases discussed?

A

Relational and non-relational databases

63
Q

What types of schemas were mentioned?

A

Star and snowflake schemas

64
Q

What are the three types of data storage mentioned?

A

Data warehouses, data marts, and data lakes

65
Q

What is a characteristic of a data lake?

A

Focuses on raw, unprocessed data

66
Q

True or False: JSON can have pre-determined tag meanings like XML.

67
Q

Fill in the blank: A smart thermometer sends data to a ______.

A

local database

68
Q

What type of schema is most appropriate for non-technical client-facing agents?

A

Star schema

69
Q

What will historic values be for a newly added column in a dataset?

A

Null values

70
Q

What type of data does a file with the ‘.png’ extension contain?

A

Image data