Chapter 5 Flashcards

1
Q

four v’s of big data

A

volume, velocity, variety, and veracity

2
Q

data volume

A

amount of data created and stored by an organization

3
Q

data velocity

A

pace at which data is created and stored

4
Q

data variety

A

different forms data can take

5
Q

data veracity

A

quality or trustworthiness of data

6
Q

analytics mindset is the ability to

A

ask the right questions; extract, transform, and load relevant data; apply appropriate data analytic techniques; interpret and share results with stakeholders

7
Q

asking the right questions is the 1st step of the analytics mindset: establishing objectives that are SMART

A

specific, measurable, achievable, relevant, timely

8
Q

ETL process

A

extracting, transforming, and loading data
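A minimal Python sketch of the three ETL stages, assuming a hypothetical invoice CSV as the source and an in-memory SQLite table as the target. The function names `extract`, `transform`, and `load` and all field names are illustrative, not a standard API.

```python
import csv
import io
import sqlite3

# Hypothetical source data; in practice this would be extracted from a file or system.
RAW_CSV = """invoice_id,customer,amount
1001, acme corp ,250.00
1002,Globex,
1003,initech,99.5
"""

def extract(text):
    """Extract: read raw rows from a CSV source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: standardize names, convert types, drop rows missing an amount."""
    clean = []
    for row in rows:
        if not row["amount"].strip():
            continue  # incomplete record; a real process might log or repair it
        clean.append((int(row["invoice_id"]),
                      row["customer"].strip().title(),
                      float(row["amount"])))
    return clean

def load(rows, conn):
    """Load: insert the transformed rows into a target table."""
    conn.execute("CREATE TABLE invoices (id INTEGER, customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO invoices VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
print(conn.execute("SELECT * FROM invoices").fetchall())
# → [(1001, 'Acme Corp', 250.0), (1003, 'Initech', 99.5)]
```

Keeping each stage in its own function makes it easy to see which step changed the data, which also helps when documenting the transformation process.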

9
Q

structured data

A

data that is highly organized and fits into fixed fields

10
Q

unstructured data

A

data that has no uniform structure

11
Q

semi-structured data

A

data that is organized in some ways but not organized enough to be inserted into a relational database
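A small illustration, assuming JSON as the semi-structured format: the records share a general shape, but their fields differ, so they must be forced into fixed columns before loading into a relational table. The records and the `COLUMNS` list are hypothetical.

```python
import json

# Hypothetical semi-structured records: same general shape, but fields vary.
records = json.loads("""
[
  {"id": 1, "name": "Ann", "email": "ann@example.com"},
  {"id": 2, "name": "Bo", "phones": ["555-0100", "555-0101"]}
]
""")

# Fixed fields of an assumed relational target table.
COLUMNS = ["id", "name", "email", "phone"]

rows = []
for rec in records:
    row = {
        "id": rec["id"],
        "name": rec["name"],
        "email": rec.get("email"),                   # missing key becomes None
        "phone": (rec.get("phones") or [None])[0],   # keep only the first phone
    }
    rows.append(tuple(row[c] for c in COLUMNS))

print(rows)
# → [(1, 'Ann', 'ann@example.com', None), (2, 'Bo', None, '555-0100')]
```

Note that the second phone number is lost: fitting semi-structured data into fixed fields usually requires choices like this, or extra tables.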

12
Q

data warehouses store

A

structured data

13
Q

data lake

A

collection of structured, semi-structured, and unstructured data in a single location

14
Q

dark data

A

info the organization has collected and stored that would be useful for analysis but is ignored rather than analyzed

15
Q

data swamps

A

data repositories that aren't accurately documented, so the stored data can't be properly identified and analyzed

17
Q

flat file

A

text file that contains data from multiple tables or sources and merges it into a single row
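A sketch of building a flat file, assuming two hypothetical source tables (customers and orders): each order is merged with its customer's fields so that everything about one order sits in a single row of a CSV text file.

```python
import csv
import io

# Hypothetical source tables (for example, exported from two database tables).
customers = {1: {"customer_name": "Acme", "region": "West"},
             2: {"customer_name": "Globex", "region": "East"}}
orders = [{"order_id": 501, "customer_id": 1, "total": 120.0},
          {"order_id": 502, "customer_id": 2, "total": 75.5}]

# Merge each order with its customer's fields into one flat row.
out = io.StringIO()
fields = ["order_id", "customer_id", "customer_name", "region", "total"]
writer = csv.DictWriter(out, fieldnames=fields)
writer.writeheader()
for order in orders:
    writer.writerow({**order, **customers[order["customer_id"]]})

print(out.getvalue())
```

The customer's name and region are repeated on every order row; that redundancy is the trade-off a flat file makes for being easy to share and load.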

18
Q

delimiter

A

character that marks the end of 1 field and the beginning of the next

19
Q

text qualifier

A

2 characters that indicate the beginning and end of a field and tell the program to ignore any delimiters contained between the characters
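A short Python example of a delimiter and a text qualifier working together, assuming a hypothetical pipe-delimited line with double quotes as the text qualifier. Python's standard `csv` module accepts both as the `delimiter` and `quotechar` arguments.

```python
import csv
import io

# Hypothetical pipe-delimited line; the double quote is the text qualifier.
line = '1001|"Smith | Sons, Inc."|250.00\n'

# Naive split: the delimiter inside the company name breaks the field apart.
naive = line.strip().split("|")

# csv honors the text qualifier and ignores delimiters between the quotes.
parsed = next(csv.reader(io.StringIO(line), delimiter="|", quotechar='"'))

print(naive)   # 4 fields; the company name is split in two
print(parsed)  # ['1001', 'Smith | Sons, Inc.', '250.00']
```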

20
Q

4 steps for transforming data

A

understand the data and desired outcome
standardize, structure, and clean data
validate data quality and verify data meets data requirements
document the transformation process
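A minimal sketch of the standardize and validate steps, assuming hypothetical records with mixed date formats and an assumed requirement that amounts be positive. The comments stand in for the documentation step; a real process would record these rules more formally.

```python
from datetime import datetime

# Hypothetical raw records with inconsistent date formats and a bad amount.
raw = [
    {"date": "2024-01-05", "amount": "100.00"},
    {"date": "01/07/2024", "amount": "45.10"},
    {"date": "2024-01-09", "amount": "-3"},
]

def standardize_date(text):
    """Standardize: accept two assumed formats and emit ISO 8601 (YYYY-MM-DD)."""
    for fmt in ("%Y-%m-%d", "%m/%d/%Y"):
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            pass
    raise ValueError(f"unrecognized date: {text!r}")

# Standardize and clean: one date format, numeric amounts.
clean = [{"date": standardize_date(r["date"]), "amount": float(r["amount"])} for r in raw]

# Validate: flag records that break the assumed requirement (amounts must be positive).
problems = [r for r in clean if r["amount"] <= 0]

print(clean)
print("failed validation:", problems)
```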

21
Q

descriptive analytics

A

info that results from examination of data to understand the past
“what happened?”
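A tiny descriptive-analytics example, assuming hypothetical monthly sales figures: summary statistics describe what already happened without explaining why.

```python
import statistics

# Hypothetical monthly sales figures (past data).
sales = {"Jan": 1200, "Feb": 950, "Mar": 1430, "Apr": 1100}

values = list(sales.values())
print("total:", sum(values))                      # 4680
print("mean:", statistics.mean(values))           # 1170
print("median:", statistics.median(values))       # 1150.0
print("best month:", max(sales, key=sales.get))   # Mar
```

Answering "why was March highest?" would move into diagnostic analytics.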

22
Q

diagnostic analytics

A

builds on descriptive analytics to answer “why did this happen?”
attempts to determine causal relationships

23
Q

predictive analytics

A

answers “what might happen in the future?”

24
Q

prescriptive analytics

A

info that provides a recommendation of what should happen
“what should be done?”

25
Q

common way people interpret results incorrectly is

A

confusing correlation with causation

26
Q

correlation

A

tells if 2 things happen at the same time

27
Q

causation

A

tells if the occurrence of 1 thing causes the occurrence of the 2nd thing

28
Q

components of sharing results

A

remember the question that initiated the analytics process, consider audience, data visualization

29
Q

good principles of visualization design

A

choosing right type of visualization, simplifying presentation of data, emphasizing what's important, representing data ethically

30
Q

automation

A

application of machines to automatically perform a task once performed by humans

31
Q

what is one tool that can be used to automate ETL tasks?

A

robotic process automation