Midterm Flashcards
what can data do (4)
Describe the current state of an organization or process
Detect anomalous events
Diagnose the causes of events and behaviors
Predict future events
describe the 4 steps in the data science workflow
data collection and storage
data preparation
exploration and visualization
experimentation and prediction
what are the 3 applications of data science
traditional machine learning
internet of things
deep learning
what do we need for machine learning
a well defined question
a set of example data
a new set of data to use our algorithm on
what is deep learning
many neurons work together (neural networks with many layers)
requires much more training data
used in complex problems: image classification, language understanding
what is supervised machine learning
making predictions from data that has features and labels (known outcomes)
what is churn prediction
predicting whether a customer is likely to cancel their subscription to a service in the future
what is clustering and what are 3 use cases
dividing data into groups (clusters) of similar items, without predefined labels
use cases:
customer segmentation
image segmentation
anomaly detection
how do you slice a list in python
list[start:end], where start is inclusive and end is exclusive (both are optional)
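A quick check of the slicing rule, using a made-up list:

```python
letters = ["a", "b", "c", "d", "e"]

middle = letters[1:3]   # start index 1 included, end index 3 excluded
head = letters[:2]      # omitted start defaults to 0
tail = letters[3:]      # omitted end runs to the end of the list

print(middle, head, tail)  # ['b', 'c'] ['a', 'b'] ['d', 'e']
```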
how do you delete an element in a list
del list[index]  (del(list[index]) also works; the parentheses only group)
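A small sketch with a made-up list; del removes the element in place and returns nothing:

```python
letters = ["a", "b", "c", "d"]

del letters[1]   # removes the element at index 1 ("b") from the list itself

print(letters)   # ['a', 'c', 'd']
```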
does python work by reference or assignment
reference: y = x makes y refer to the same object as x, so changes made through y also show up in x
how can you make a copy of a list instead of referencing the original
y = x[:]
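The difference between sharing a reference and copying, shown with a made-up list. list(x) and x.copy() are other ways to get the same kind of (shallow) copy as x[:]:

```python
x = [1, 2, 3]
y = x          # y is a second name for the same list object
y[0] = 99
print(x)       # [99, 2, 3]  (x changed too)

x = [1, 2, 3]
z = x[:]       # slicing builds a new list (a shallow copy)
z[0] = 99
print(x)       # [1, 2, 3]  (x is unchanged)
```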
what are the 3 parameters of np.random.normal()
loc: distribution mean
scale: distribution standard deviation
size: number of samples
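A sketch passing the three parameters by keyword; the mean 1.75 and standard deviation 0.20 are arbitrary values chosen for illustration, and the seed is fixed only so the run is reproducible:

```python
import numpy as np

np.random.seed(0)  # fix the random seed for reproducible results

# loc = mean, scale = standard deviation, size = number of samples
samples = np.random.normal(loc=1.75, scale=0.20, size=5000)

print(samples.shape)   # (5000,)
print(samples.mean())  # close to 1.75
print(samples.std())   # close to 0.20
```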
how to check if "x" is a key in dictionary y
"x" in y
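A short example with a made-up dictionary; note that in only looks at keys, so checking values needs .values():

```python
europe = {"spain": "madrid", "france": "paris"}

print("spain" in europe)             # True: "spain" is a key
print("madrid" in europe)            # False: in checks keys, not values
print("madrid" in europe.values())   # True: search the values explicitly
```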
what is pandas
high-level data manipulation tool built on NumPy
suppose brics is a dataframe. what is the difference between brics["country"] and brics[["country"]]
the first returns a Series: the country values labelled with the row index
the second returns a DataFrame with a single column, country
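A sketch using a small hypothetical version of brics (only country and capital columns, with two-letter row labels) to compare the two bracket styles:

```python
import pandas as pd

# Hypothetical, reduced version of the brics DataFrame
brics = pd.DataFrame(
    {"country": ["Brazil", "Russia", "India", "China", "South Africa"],
     "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"]},
    index=["BR", "RU", "IN", "CH", "SA"],
)

single = brics["country"]     # one column name -> Series
double = brics[["country"]]   # list of column names -> DataFrame

print(type(single).__name__)  # Series
print(type(double).__name__)  # DataFrame
```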
what is the type of brics[1:4] considering brics is a dataframe
DataFrame (the rows at positions 1, 2, and 3; the end position 4 is excluded)
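A check on a small hypothetical brics DataFrame; an integer slice inside single brackets selects rows by position:

```python
import pandas as pd

# Hypothetical, reduced version of the brics DataFrame
brics = pd.DataFrame(
    {"country": ["Brazil", "Russia", "India", "China", "South Africa"]},
    index=["BR", "RU", "IN", "CH", "SA"],
)

rows = brics[1:4]   # row positions 1, 2, 3

print(type(rows).__name__)  # DataFrame
print(list(rows.index))     # ['RU', 'IN', 'CH']
```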
what is the difference between df.loc[row_label, col_label] and df.iloc[row_int, col_int]
loc selects rows and columns by label, while iloc selects them by integer position
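The same cell reached both ways, on a small hypothetical brics DataFrame:

```python
import pandas as pd

# Hypothetical, reduced version of the brics DataFrame
brics = pd.DataFrame(
    {"country": ["Brazil", "Russia", "India"],
     "capital": ["Brasilia", "Moscow", "New Delhi"]},
    index=["BR", "RU", "IN"],
)

by_label = brics.loc["RU", "capital"]   # row label "RU", column label "capital"
by_position = brics.iloc[1, 1]          # row position 1, column position 1

print(by_label, by_position)  # Moscow Moscow
```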
how to use logical operators with numpy arrays
np.logical_and()
np.logical_or()
np.logical_not()
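A sketch of the three functions on a hypothetical array of BMI values; each works element by element and returns a boolean array that can filter the original. The operators &, | and ~ do the same on boolean arrays:

```python
import numpy as np

bmi = np.array([21.9, 20.9, 21.7, 24.2, 21.4])  # hypothetical values

between = np.logical_and(bmi > 21, bmi < 22)   # both conditions
outside = np.logical_or(bmi < 21, bmi > 24)    # at least one condition
not_between = np.logical_not(between)          # flip each element

print(between.tolist())       # [True, False, True, False, True]
print(bmi[between].tolist())  # [21.9, 21.7, 21.4]
print(outside.tolist())       # [False, True, False, True, False]
```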
create a for loop that loops through a list and prints the index and its value
for index, height in enumerate(fam):
…
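A complete version of such a loop, assuming fam is a list of heights (the values here are made up):

```python
fam = [1.73, 1.68, 1.71, 1.89]  # hypothetical heights in metres

for index, height in enumerate(fam):
    # enumerate yields (index, value) pairs, counting from 0 by default
    print(f"index {index}: {height}")
```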
loop over the contents of a dictionary
for key, value in worlds.items():
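A runnable sketch with a made-up dictionary of capitals; .items() yields (key, value) pairs, in insertion order on Python 3.7+:

```python
europe = {"spain": "madrid", "france": "paris", "germany": "berlin"}

for key, value in europe.items():
    print(f"the capital of {key} is {value}")
```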
how to loop through a dataframe printing index and row content
for index, row in brics.iterrows():
    print(index)
    print(row)  # row is a pandas Series indexed by column name
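A sketch on a small hypothetical brics DataFrame, showing that each row is a Series whose values can be looked up by column name:

```python
import pandas as pd

# Hypothetical, reduced version of the brics DataFrame
brics = pd.DataFrame(
    {"country": ["Brazil", "Russia", "India"],
     "capital": ["Brasilia", "Moscow", "New Delhi"]},
    index=["BR", "RU", "IN"],
)

for index, row in brics.iterrows():
    # index is the row label; row is a Series keyed by column name
    print(index + ": " + row["capital"])
```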
what does the following do
brics["country"].apply(len)
returns a Series with the length of each country name (same index as brics); it only becomes a column if you assign it to one
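A sketch on a hypothetical brics DataFrame; the column name name_length is made up for illustration:

```python
import pandas as pd

# Hypothetical, reduced version of the brics DataFrame
brics = pd.DataFrame(
    {"country": ["Brazil", "Russia", "India"]},
    index=["BR", "RU", "IN"],
)

lengths = brics["country"].apply(len)  # Series: len() applied to each value
brics["name_length"] = lengths         # assignment is what adds the column

print(brics)
```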
sort a dataframe by multiple columns, ascending on one and descending on another
df.sort_values(["col1", "col2"], ascending=[True, False])
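A sketch with hypothetical data: rows are ordered by col1 ascending first, and ties in col1 are broken by col2 descending:

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({"col1": ["b", "a", "b", "a"], "col2": [1, 2, 3, 4]})

result = df.sort_values(["col1", "col2"], ascending=[True, False])

print(result)
# rows come out in index order 3, 1, 2, 0:
# ("a", 4), ("a", 2), ("b", 3), ("b", 1)
```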
how to subset a dataframe on 2 conditions
h[cond1 & cond2]  (both conditions true)
h[cond1 | cond2]  (at least one condition true)
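A sketch with a hypothetical h DataFrame. Each condition needs its own parentheses because & and | bind more tightly than comparisons like == and >:

```python
import pandas as pd

# Hypothetical data
h = pd.DataFrame({"state": ["arizona", "virginia", "arizona", "texas"],
                  "individuals": [10, 5, 30, 7]})

both = h[(h["state"] == "arizona") & (h["individuals"] > 20)]
either = h[(h["state"] == "virginia") | (h["individuals"] > 20)]

print(both)    # only row 2
print(either)  # rows 1 and 2
```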
how to return all rows where the value in column “state” in a dataframe is one of 3 predetermined values
h1 = h[h["state"].isin(["north", "virginia", "arizona"])]
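A sketch with a hypothetical h DataFrame and a made-up list of three states; isin returns a boolean Series that keeps a row when its value appears in the list:

```python
import pandas as pd

# Hypothetical data
h = pd.DataFrame({"state": ["arizona", "texas", "virginia", "ohio", "texas"]})

wanted = ["arizona", "virginia", "ohio"]
h1 = h[h["state"].isin(wanted)]

print(h1)  # rows 0, 2 and 3
```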