Exploratory Data Analysis Flashcards
Data expressed on a numerical scale are what data type?
numeric
Data that can take only a specific set of values representing a set of possible categories (enums, enumerated, factors, nominal) are what data type?
categorical
Cite the two numerical data types
continuous and discrete
Cite the two special cases of categorical data
binary and ordinal
Data that can take on any value in an interval (float, numeric)
continuous
Data that can take only integer values, such as counts
discrete
True or False: Data typing in software acts as a signal for how to process the data
True
Rectangular data (like a spreadsheet) is the basic structure for statistical and machine learning models. What is this structure called?
dataframe
A column (series) within a table is commonly referred to as a _______?
feature
Many data science projects involve predicting an ______?
outcome (dependent variable, response, target, output)
A row in a table is referred to as a ______?
record
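The table vocabulary above (feature, outcome, record) can be sketched with plain Python, assuming a list of dicts stands in for a data frame. The column names and values below are hypothetical and exist only for illustration.

```python
# Hypothetical rectangular data: each dict is one record (row),
# each key is one feature (column). All values are made up.
records = [
    {"size_sqft": 850, "bedrooms": 2, "price": 210000},
    {"size_sqft": 1200, "bedrooms": 3, "price": 305000},
    {"size_sqft": 640, "bedrooms": 1, "price": 158000},
]

# A feature is one column read across all records.
size_feature = [row["size_sqft"] for row in records]

# The outcome (target) is the column we would try to predict.
outcome = [row["price"] for row in records]

# A single record is one row of the table.
first_record = records[0]
```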
What is the sum of all values divided by the number of values?
mean
The sum of each value times its weight, divided by the sum of the weights
weighted mean
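A minimal sketch of the mean and weighted mean, using only the standard library. The helper name `weighted_mean` and the sample numbers are assumptions for illustration.

```python
from statistics import fmean

def weighted_mean(values, weights):
    """Sum of value * weight divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight

values = [2.0, 4.0, 10.0]   # illustrative data
weights = [1.0, 1.0, 2.0]   # illustrative weights

plain = fmean(values)                       # (2 + 4 + 10) / 3
weighted = weighted_mean(values, weights)   # (2 + 4 + 20) / 4
```

With equal weights the weighted mean reduces to the ordinary mean.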
The value such that one-half of the data lies above it and one-half below
median
The value such that P percent of the data lies below
percentile (quantile)
The value such that one-half of the sum of the weights lies above and one-half below in the sorted data
weighted median
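The median, percentile, and weighted median can be sketched as follows. Percentiles have several competing conventions; this sketch assumes the nearest-rank convention, and the function names are hypothetical.

```python
import math
from statistics import median

def percentile_nearest_rank(values, p):
    """Smallest value with at least p percent of the data at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]

def weighted_median(values, weights):
    """First sorted value at which the running weight reaches half the total."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2
    running = 0.0
    for value, weight in pairs:
        running += weight
        if running >= half:
            return value
    raise ValueError("values must not be empty")

data = [1, 3, 5, 7, 9]                                 # illustrative data
middle = median(data)                                  # plain median
p80 = percentile_nearest_rank(data, 80)                # 80th percentile
heavy_tail = weighted_median(data, [1, 1, 1, 1, 6])    # last value dominates
```

A large weight on one record pulls the weighted median toward that record, even though the plain median ignores the weights.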
The average of all values after dropping a fixed number of extreme values
trimmed mean (truncated mean)
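A short sketch of the trimmed mean, assuming the same number of values, `k`, is dropped from each end of the sorted data. The function name and the sample data are illustrative.

```python
def trimmed_mean(values, k):
    """Mean after dropping the k smallest and k largest values."""
    if k < 0 or 2 * k >= len(values):
        raise ValueError("k must satisfy 0 <= k < len(values) / 2")
    ordered = sorted(values)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

data = [1, 2, 3, 4, 100]            # illustrative data with one extreme value
ordinary = sum(data) / len(data)    # pulled upward by the value 100
trimmed = trimmed_mean(data, 1)     # drops 1 and 100 before averaging
```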
Not sensitive to extreme values
robust (resistant)
What is a data value that is very different from most of the data?
outlier (extreme value)
The differences between the observed values and the estimate of location
deviations
The sum of squared deviations from the mean divided by n-1 where n is the number of data values.
variance
The square root of the variance
standard deviation
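The variance and standard deviation definitions above can be checked against the standard library. The helper `sample_variance` is a hypothetical name, and the data are illustrative.

```python
import math
from statistics import fmean, variance

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are required")
    center = fmean(values)
    return sum((x - center) ** 2 for x in values) / (n - 1)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # illustrative data

var_manual = sample_variance(data)    # squared deviations sum to 32, n - 1 = 7
var_stdlib = variance(data)           # statistics.variance also divides by n - 1
std_manual = math.sqrt(var_manual)    # standard deviation
```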
The mean of the absolute values of the deviations from the mean (L1-norm, Manhattan norm)
mean absolute deviation
The median of the absolute values of the deviations from the median
median absolute deviation from the median (MAD)
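A sketch contrasting the two absolute-deviation measures. It follows the standard definition of the median absolute deviation (MAD) as the median of the absolute deviations from the median. The function names and data are illustrative.

```python
from statistics import fmean, median

def mean_absolute_deviation(values):
    """Mean of the absolute deviations from the mean."""
    center = fmean(values)
    return fmean([abs(x - center) for x in values])

def median_absolute_deviation(values):
    """Median of the absolute deviations from the median."""
    center = median(values)
    return median([abs(x - center) for x in values])

data = [1, 2, 3, 4, 100]                    # illustrative data with one outlier
mean_abs = mean_absolute_deviation(data)    # inflated by the outlier
mad = median_absolute_deviation(data)       # robust to the outlier
```

On this data the mean absolute deviation is far larger than the MAD, because only the MAD is resistant to the single extreme value.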