Quiz 1 Flashcards
what are the steps of the data analysis pipeline
- Figure out the question.
- Find/get relevant data.
- Clean & prepare the data.
- Analyze the data.
- Interpret & present results.
why is a full data analysis broken up into many steps
it is impractical to rerun the first few steps over and over, e.g. API calls.
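One way to avoid rerunning an expensive early step is to save its result to a file and reuse it. This is a minimal sketch; `fetch_data`, `load_or_fetch`, and the file name are hypothetical stand-ins, and the records are made-up values.

```python
import json
import os
import tempfile

def fetch_data():
    # Hypothetical stand-in for a slow or rate-limited step, such as an API call.
    return [{"id": 1, "value": 10}, {"id": 2, "value": 20}]

def load_or_fetch(cache_path):
    # Reuse the saved result if an earlier run already produced it.
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            return json.load(f)
    data = fetch_data()
    with open(cache_path, "w") as f:
        json.dump(data, f)
    return data

cache_path = os.path.join(tempfile.mkdtemp(), "raw_data.json")
first = load_or_fetch(cache_path)   # calls fetch_data and writes the file
second = load_or_fetch(cache_path)  # reads the file, no fetch needed
```

Later steps (cleaning, analysis) can then be rerun as often as needed against the saved file.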
what does np.vectorize do
turns a function into a function that can operate on an entire array in an element by element fashion
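A small sketch of `np.vectorize`; the function `clip_negative` and the sample values are made up for illustration.

```python
import numpy as np

def clip_negative(x):
    # Written for a single number: negative values become 0.
    return x if x > 0 else 0

# Wrap the scalar function so it can be called on a whole array.
clip_all = np.vectorize(clip_negative)

values = np.array([-2, 3, -1, 5])
result = clip_all(values)  # array with values 0, 3, 0, 5
```

`np.vectorize` is a convenience: it still calls the Python function once per element, so a built-in NumPy operation (here `np.maximum(values, 0)`) is usually faster when one exists.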
what is NumPy good for
storing and operating on arrays
what is pandas good for
pandas is good for manipulating data
what is a Series in pandas
a 1D labeled array, stored as a NumPy array; one column of a DataFrame
what is a DataFrame in pandas
a 2D table of data: a collection of Series (columns) that share the same index
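A short sketch of how a Series and a DataFrame relate; the column names and values are made up for illustration.

```python
import pandas as pd

# A DataFrame is a table; each column is a Series.
df = pd.DataFrame({
    "city": ["Vancouver", "Burnaby", "Surrey"],
    "temp_c": [12.5, 11.0, 13.2],
})

temps = df["temp_c"]          # selecting one column gives a Series
as_numpy = temps.to_numpy()   # the values as a NumPy array
warm = df[df["temp_c"] > 12]  # filter rows with a boolean Series
```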
define the steps of extract-transform-load
extract: get the data you need
transform: fix the data, clean data, get it in a form you want to work with
load: load into next step of your pipeline
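The three steps can be sketched as three small functions. This is only an illustration: the CSV text, the column names, and the choice to drop rows with missing scores are all assumptions.

```python
import io
import pandas as pd

# Made-up raw data with one missing score.
RAW_CSV = """name,score
alice,90
bob,
carol,75
"""

def extract(source):
    # Extract: get the data you need.
    return pd.read_csv(source)

def transform(df):
    # Transform: drop incomplete rows and fix formats and types.
    df = df.dropna(subset=["score"]).copy()
    df["name"] = df["name"].str.title()
    df["score"] = df["score"].astype(int)
    return df

def load(df):
    # Load: hand the cleaned data to the next step, here as CSV text.
    return df.to_csv(index=False)

cleaned = transform(extract(io.StringIO(RAW_CSV)))
output = load(cleaned)
```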
what are signal processing and filtering algorithms
signal processing uses filtering algorithms to remove noise from a signal
what does LOESS smoothing do
it’s a technique to smooth a curve, to remove the noise
LOESS smoothing takes a local area of the data and fits a line to it. We have to make a decision on how big this area is.
What happens if we pick a small area or a large area?
if small, we are more sensitive to noise
if large, we are less sensitive to signal changes
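The trade-off can be seen with a simplified LOESS written in plain NumPy. This is a sketch, not a library implementation: `local_linear_smooth` is a hypothetical helper, `frac` is the fraction of points in each local area, the test signal is synthetic, and the x values are assumed to be distinct.

```python
import numpy as np

def local_linear_smooth(x, y, frac):
    """Simplified LOESS: fit a weighted line to the nearest frac of the points."""
    n = len(x)
    k = max(4, int(np.ceil(frac * n)))  # number of points in each local area
    smoothed = np.empty(n)
    for i in range(n):
        dist = np.abs(x - x[i])
        nearest = np.argsort(dist)[:k]
        d_max = dist[nearest].max()
        # Tricube weights: nearby points count more, the farthest counts zero.
        w = (1 - (dist[nearest] / d_max) ** 3) ** 3
        # np.polyfit applies w to the unsquared residuals, so pass sqrt(w).
        slope, intercept = np.polyfit(x[nearest], y[nearest], 1, w=np.sqrt(w))
        smoothed[i] = slope * x[i] + intercept
    return smoothed

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 200)
signal = np.sin(x)
y = signal + rng.normal(0, 0.3, size=x.size)

small_area = local_linear_smooth(x, y, frac=0.05)  # follows the noise more
large_area = local_linear_smooth(x, y, frac=0.8)   # flattens the real signal

def roughness(curve):
    # Average size of the second differences: higher means more wiggly.
    return np.abs(np.diff(curve, 2)).mean()
```

Comparing `roughness(small_area)` with `roughness(large_area)`, and each curve's distance from `signal`, shows both effects: the small area gives a wigglier curve, the large area drifts away from the true sine wave.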
is LOESS better with lots of samples or sparse samples
better with more samples
true or false: LOESS’s parameters are y then x (as in statsmodels’ lowess(endog, exog))
true
what is Kalman filtering
it lets you express what you know (noisy observations plus a prediction of how the values should change) to predict the most likely value for the truth
what does the Kalman filter need
we need to give the variance of
1. our observations
2. our predictions
and the covariance between each pair
we also need matrices for both our observations and predictions
the covariance matrices express our uncertainty in the measurements and predictions
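With a single variable, the matrices above shrink to single numbers, which makes the idea easier to see. This is a minimal sketch under assumptions: `kalman_1d` is a hypothetical helper, the prediction is simply "the value stays the same", and the data is synthetic.

```python
import numpy as np

def kalman_1d(observations, obs_var, pred_var, initial_value, initial_var=1.0):
    """Scalar Kalman filter whose prediction is 'the value does not change'."""
    estimate, var = initial_value, initial_var
    estimates = []
    for z in observations:
        # Predict: same value, but uncertainty grows by the prediction variance.
        var = var + pred_var
        # Update: the gain decides how far to move toward the observation.
        gain = var / (var + obs_var)
        estimate = estimate + gain * (z - estimate)
        var = (1 - gain) * var
        estimates.append(estimate)
    return np.array(estimates)

rng = np.random.default_rng(0)
truth = np.full(100, 5.0)                       # synthetic constant signal
noisy = truth + rng.normal(0, 1.0, size=100)    # observations with variance 1
filtered = kalman_1d(noisy, obs_var=1.0, pred_var=0.01, initial_value=noisy[0])
```

A large `obs_var` relative to `pred_var` tells the filter to trust its predictions more than the observations, so the output is smoother; the reverse makes it follow the observations closely.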