Quiz 1 Flashcards

Question 1

Q

What are the steps of the data analysis pipeline?

Answer

A

Figure out the question.
Find/get relevant data.
Clean & prepare the data.
Analyze the data.
Interpret & present results.

Question 2

Q

Why is a full data analysis broken up into many steps?

Answer

A

It is impractical to rerun the first few steps over and over (e.g., repeated API calls), so each step's result is saved and reused by the next step.
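A minimal sketch of saving an expensive step's output so later runs can skip it. `fetch_raw_data` is a hypothetical stand-in for something slow like a batch of API calls, and the CSV cache file is an assumption; any storage format would do.

```python
import os
import pandas as pd

def fetch_raw_data():
    # Hypothetical stand-in for an expensive step (e.g., many API calls).
    return pd.DataFrame({'city': ['A', 'B'], 'population': [100, 200]})

def get_raw_data(cache_path='raw_data.csv'):
    # If an earlier run already saved this step's output, reuse it.
    if os.path.exists(cache_path):
        return pd.read_csv(cache_path)
    # Otherwise run the expensive step once and save the result for next time.
    data = fetch_raw_data()
    data.to_csv(cache_path, index=False)
    return data
```

The later steps (cleaning, analysis) can then be rerun as often as needed without repeating the fetch.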

Question 3

Q

What does np.vectorize do?

Answer

A

It turns a function written for single values into a function that operates on an entire array, element by element. It is a convenience (essentially a loop), not a speedup.
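A small sketch, using a hypothetical function `clip_to_zero` that only works on one number at a time because of its `if`:

```python
import numpy as np

def clip_to_zero(x):
    # Written for a single number: an `if` on a whole array would raise an error.
    if x < 0:
        return 0.0
    return float(x)

# np.vectorize wraps it so it is applied to each element of an array.
clip_array = np.vectorize(clip_to_zero, otypes=[float])

values = np.array([-2.0, 0.5, 3.0, -0.1])
result = clip_array(values)  # -> 0.0, 0.5, 3.0, 0.0
```

For this particular function a built-in array operation such as `np.maximum(values, 0.0)` gives the same result and is usually faster.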

Question 4

Q

What is NumPy good for?

Answer

A

Storing and efficiently operating on arrays of numbers, including multi-dimensional arrays.
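A short sketch of whole-array operations; the temperature values are made up for illustration.

```python
import numpy as np

# A 2-D array: 3 rows by 2 columns of float64 values.
temps = np.array([[10.0, 12.0],
                  [15.0, 17.0],
                  [20.0, 22.0]])

# Arithmetic applies to every element at once, with no explicit Python loop.
fahrenheit = temps * 9 / 5 + 32

# Boolean indexing selects the elements that meet a condition.
warm = temps[temps > 14]
```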

Question 5

Q

What is pandas good for?

Answer

A

Manipulating tabular data: loading, cleaning, filtering, combining, and summarizing it.

Question 6

Q

What is a Series in pandas?

Answer

A

A 1-D array of values with an index labelling each value, stored as a NumPy array. Each column of a DataFrame is a Series.
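A minimal sketch of a Series (values made up for illustration), showing the index labels, the underlying NumPy array, and that a DataFrame column is itself a Series:

```python
import pandas as pd

# 1-D values plus an index that labels each value.
s = pd.Series([100, 250, 175], index=['a', 'b', 'c'], name='count')

# The values themselves are held in a NumPy array.
values = s.to_numpy()

# Selecting one column of a DataFrame also gives a Series.
df = pd.DataFrame({'count': [100, 250, 175]}, index=['a', 'b', 'c'])
col = df['count']
```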

Question 7

Q

What is a DataFrame in pandas?

Answer

A

A 2-D table of data: a collection of columns (each one a Series) that share a common row index.
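A short sketch of a DataFrame built from made-up city data. It shows that columns can have different types and that a new column can be computed from existing ones without a loop.

```python
import pandas as pd

# Each key becomes a column; all columns share the row index 0, 1, 2.
df = pd.DataFrame({
    'city': ['a', 'b', 'c'],
    'area': [120.5, 80.0, 15000.0],
    'population': [1000, 500, 20000],
})

# Selecting one column gives a Series.
areas = df['area']

# Column arithmetic is applied row by row.
df['density'] = df['population'] / df['area']
```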

Question 8

Q

Define the steps of extract-transform-load (ETL).

Answer

A

Extract: get the data you need.

Transform: fix and clean the data, and get it into the form you want to work with.

Load: load it into the next step of your pipeline.
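A minimal ETL sketch with pandas. The inline CSV text stands in for a real file or API response, and the specific cleaning done in `transform` (parsing dates, dropping rows with a missing temperature) is just an assumed example of "fixing" the data.

```python
import io
import pandas as pd

# Hypothetical raw input; in practice this might be a file or an API response.
RAW_CSV = """date,temperature
2024-01-01,5.0
2024-01-02,
2024-01-03,7.5
"""

def extract(source):
    # Extract: read the raw data.
    return pd.read_csv(io.StringIO(source))

def transform(data):
    # Transform: parse the dates and drop rows with a missing measurement.
    data = data.copy()
    data['date'] = pd.to_datetime(data['date'])
    return data.dropna(subset=['temperature'])

def load(data, path):
    # Load: save the clean data where the next pipeline step can read it.
    data.to_csv(path, index=False)

clean = transform(extract(RAW_CSV))
```

Calling something like `load(clean, 'clean_data.csv')` would then hand the result to the next step.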

Question 9

Q

What are signal processing and filtering algorithms?

Answer

A

Signal processing uses filtering algorithms to remove noise from a signal.
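As an illustration, here is one of the simplest filters, a centred moving average. It is simpler than the LOESS and Kalman methods covered on other cards. The signal is made up: a straight line plus random noise.

```python
import numpy as np
import pandas as pd

# Made-up noisy signal: a straight line plus Gaussian noise.
rng = np.random.default_rng(0)
x = np.arange(100)
signal = 0.5 * x
noisy = pd.Series(signal + rng.normal(0, 3, size=x.size))

# Replace each point by the mean of a 9-point window centred on it.
# Averaging neighbours cancels out much of the random noise.
smoothed = noisy.rolling(window=9, center=True).mean()
```

The first and last few points are NaN because their window does not fit inside the data.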

Question 10

Q

What does LOESS smoothing do?

Answer

A

It's a technique to smooth a curve, removing the noise.

Question 11

Q

LOESS smoothing takes a local area of the data and fits a line to it. We have to decide how big this area is.

What happens if we pick a small area or a large area?

Answer

A

If small, the fit is more sensitive to noise.

If large, the fit is less sensitive to real changes in the signal.
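A hand-written sketch of basic LOESS, to make the "local area" concrete. At each point it fits a weighted straight line to the nearest `frac` of the data, using tricube weights so closer points count more. It skips the robustness iterations that library versions add. In practice you would more likely call a library function such as statsmodels' `lowess`, whose `frac` parameter plays the same role. The sine-plus-noise data is made up.

```python
import numpy as np

def loess(x, y, frac=0.3):
    """Basic LOESS sketch: local weighted linear fits, no robustness iterations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    # Size of the local area: at least 3 points so a line can be fitted.
    k = min(n, max(3, int(np.ceil(frac * n))))
    design = np.column_stack([np.ones(n), x])  # intercept and slope columns
    fitted = np.empty(n)
    for i in range(n):
        dist = np.abs(x - x[i])
        # Window half-width: distance to the k-th nearest point.
        h = max(np.sort(dist)[k - 1], 1e-12)
        # Tricube weights: 1 at x[i], falling to 0 at the edge of the window.
        w = (1 - np.clip(dist / h, 0, 1) ** 3) ** 3
        # Weighted least squares: scale each row by sqrt(weight).
        sw = np.sqrt(w)
        coef, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
        fitted[i] = coef[0] + coef[1] * x[i]
    return fitted

# Made-up data: a sine wave plus noise.
rng = np.random.default_rng(1)
xs = np.linspace(0, 2 * np.pi, 100)
ys = np.sin(xs) + rng.normal(0, 0.3, size=xs.size)
wiggly = loess(xs, ys, frac=0.05)  # small area: follows the noise
flat = loess(xs, ys, frac=0.9)     # large area: smooths away real changes
```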

Question 12

Q

Is LOESS better with lots of samples or sparse samples?

Answer

A

Better with more samples, since each local fit needs enough nearby points.

Question 13

Q

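True or false: LOESS's parameters are y, then x.

Answer

A

True, at least for statsmodels' lowess, whose signature is lowess(endog, exog, ...): the y values (endog) come first, then the x values (exog).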

Question 14

Q

What is Kalman filtering?

Answer

A

It lets you combine what you know (noisy observations and a prediction of how the values change) to estimate the most likely true value.
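A minimal 1-D Kalman filter sketch, written by hand rather than with a library. It assumes the simplest possible prediction model, "the true value stays the same", and the readings are made-up noise around a constant.

```python
import numpy as np

def kalman_1d(observations, obs_var, trans_var, initial_mean, initial_var):
    """Minimal 1-D Kalman filter with a 'value stays the same' prediction."""
    mean, var = initial_mean, initial_var
    estimates = []
    for z in observations:
        # Predict: the value is assumed unchanged, but our uncertainty grows.
        var = var + trans_var
        # Update: blend prediction and observation, weighted by their variances.
        gain = var / (var + obs_var)
        mean = mean + gain * (z - mean)
        var = (1 - gain) * var
        estimates.append(mean)
    return np.array(estimates)

# Made-up readings: a true value of 20 plus noise with standard deviation 2.
rng = np.random.default_rng(2)
truth = 20.0
readings = truth + rng.normal(0, 2.0, size=200)
est = kalman_1d(readings, obs_var=4.0, trans_var=0.01,
                initial_mean=readings[0], initial_var=4.0)
```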

Question 15

Q

What does the Kalman filter need?

Answer

A

We need to give the variance of:
1. our observations
2. our predictions

and the covariance between each pair of variables.

We also need matrices for both our observations and our predictions (e.g., a transition matrix that turns the current state into the prediction).

The covariance matrices express our uncertainty in the measurements and predictions.
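A sketch of one predict-and-update step in matrix form, to show where each input goes. It uses common textbook names: F (transition matrix), H (observation matrix), Q (transition covariance), R (observation covariance). If you use a library such as pykalman, these roughly correspond to its `transition_matrices`, `observation_matrices`, `transition_covariance`, and `observation_covariance` parameters. The position/velocity state and all numbers are assumptions for illustration.

```python
import numpy as np

def kalman_step(x, P, z, F, H, Q, R):
    """One predict + update step of a Kalman filter (matrix form)."""
    # Predict with the transition matrix F; Q adds prediction uncertainty.
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Update with observation z; R is the measurement uncertainty.
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)  # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

# Hypothetical 2-variable state: [position, velocity], time step of 1.
F = np.array([[1.0, 1.0],
              [0.0, 1.0]])   # prediction: position += velocity
H = np.eye(2)                # both variables are observed directly
R = np.diag([4.0, 1.0])      # observation variances (covariances 0 here)
Q = np.diag([0.1, 0.1])      # transition (prediction) variances
x = np.array([0.0, 1.0])     # current estimate
P = np.eye(2)                # current uncertainty
x, P = kalman_step(x, P, np.array([1.2, 0.9]), F, H, Q, R)
```

The updated position lands between the prediction (1.0) and the observation (1.2), pulled only slightly toward the observation because its variance (4.0) is large.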

Question 16

Q

In the observation covariance matrix, which expresses errors in the observations, what do lower and higher values mean?

Answer

A

Lower values: less sensor error, so the observations/measurements have more effect on the result.

Higher values: more noise in the observations, so they have less effect on the result.

Question 17

Q

The transition_covariance says what you think about the error in your prediction. What do lower and higher values mean?

Answer

A

Lower: less prediction error (less noise), so the prediction affects the result more.

Higher: the prediction is less accurate, so it affects the result less and the observations count for more.
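In the scalar case the trade-off in the last two cards comes down to one ratio, the Kalman gain. This sketch uses a hypothetical helper name, `kalman_gain`. A gain near 1 means the estimate moves almost all the way to the new observation; a gain near 0 means it mostly keeps the prediction.

```python
def kalman_gain(prediction_var, observation_var):
    """Scalar Kalman gain: how far the estimate moves toward a new observation."""
    return prediction_var / (prediction_var + observation_var)

# Low observation variance: trust the observation.
trust_obs = kalman_gain(1.0, 0.1)        # about 0.91
# High observation variance: trust the prediction.
trust_pred = kalman_gain(1.0, 10.0)      # about 0.09
# Higher prediction (transition) variance shifts weight toward the observation.
uncertain_pred = kalman_gain(10.0, 1.0)  # about 0.91
```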

Question 18

Q

What does it mean to impute data?

Answer

A

Replacing missing values (or outliers that were deleted) with plausible, calculated values.
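A small sketch of two common ways to impute with pandas, on made-up readings. The rule for deciding that 500.0 is an outlier (anything over 100) is an assumption for this example.

```python
import numpy as np
import pandas as pd

# Made-up readings with one missing value and one obvious outlier.
readings = pd.Series([10.0, 11.0, np.nan, 12.0, 500.0, 13.0])

# Delete the outlier by turning it into a missing value.
cleaned = readings.mask(readings > 100)

# Option 1: fill each gap by linear interpolation between its neighbours.
interpolated = cleaned.interpolate()

# Option 2: fill every gap with the mean of the known values.
mean_filled = cleaned.fillna(cleaned.mean())
```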

Question 19

Q

What is entity resolution (or record linkage)?

Answer

A

The process of finding multiple records that actually refer to the same entity.
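A toy sketch of the idea: normalize each name into a matching key, then give all records with the same key one entity ID. The names are made up, and real entity resolution usually needs fuzzier matching (typos, abbreviations) than this exact-key approach.

```python
import pandas as pd

# Made-up records: three spellings of what is probably one business.
records = pd.DataFrame({
    'name': ["Joe's Cafe", "joes cafe", " JOE'S CAFE ", "Main St Bakery"],
})

def normalize(name):
    # Lowercase, drop punctuation, and collapse whitespace so that
    # trivially different spellings produce the same key.
    kept = ''.join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return ' '.join(kept.split())

records['key'] = records['name'].map(normalize)
# Records sharing a key are treated as the same entity.
records['entity_id'] = records.groupby('key').ngroup()
```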

Question 20

Q

What is the difference between:
city_data = city_data[city_data['area'] <= 10000]
city_data = city_data['area'] <= 10000

Answer

A

city_data = city_data[city_data['area'] <= 10000]
filters the rows, keeping only those where area <= 10000 (all columns are kept).

city_data = city_data['area'] <= 10000
replaces the whole dataset with a single boolean Series: True or False for each row, depending on whether its area <= 10000.
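A small demonstration on a made-up `city_data` table, keeping the mask and the filtered result in separate variables so both can be inspected:

```python
import pandas as pd

city_data = pd.DataFrame({'city': ['a', 'b', 'c'],
                          'area': [500.0, 25000.0, 9999.0]})

# The comparison alone gives a boolean Series, one value per row.
mask = city_data['area'] <= 10000   # True, False, True

# Using that Series to index the DataFrame keeps only the True rows.
filtered = city_data[mask]          # rows 'a' and 'c', both columns
```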

Question 21

Q

How do you write sums in NumPy and in pandas?

Answer

A

numpy:
np.sum(totals, axis=x)

pandas:
totals.sum(axis=x)
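A quick sketch of what the axis argument does, on a small made-up array. axis=0 collapses the rows (one total per column), and axis=1 collapses the columns (one total per row):

```python
import numpy as np
import pandas as pd

totals = np.array([[1, 2, 3],
                   [4, 5, 6]])

col_sums = np.sum(totals, axis=0)   # [5, 7, 9]
row_sums = np.sum(totals, axis=1)   # [6, 15]

df = pd.DataFrame(totals, columns=['a', 'b', 'c'])
df_col_sums = df.sum(axis=0)        # a: 5, b: 7, c: 9
df_row_sums = df.sum(axis=1)        # 6, 15
```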

Question 22

Q

You have a DataFrame called counts with a column called 'date'. Make it a datetime column.

Answer

A

counts['date'] = pd.to_datetime(counts['date'])
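A runnable version with a made-up `counts` table. After the conversion, the `.dt` accessor works on the column:

```python
import pandas as pd

counts = pd.DataFrame({'date': ['2024-01-15', '2024-02-01'], 'count': [3, 5]})

# Parse the strings into datetime values.
counts['date'] = pd.to_datetime(counts['date'])

# Date parts are now available, e.g. the month of each row.
months = counts['date'].dt.month
```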

(22 cards)