final Flashcards
benefits of python
large ecosystem
community support
readability and simplicity
versatility
interactivity
integration
python basic data types
str, float, int, bool
tuple vs list vs dictionary
tuple: (), immutable - only methods are count() and index()
list: [], mutable
dictionary: {key: value}, mutable
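A minimal sketch of the three containers, using made-up values (the names `point`, `scores`, and `ages` are illustrative only):

```python
# Tuple: immutable; its only methods are count() and index()
point = (3, 4, 3)
threes = point.count(3)      # how many times 3 appears
first_four = point.index(4)  # position of the first 4

# List: mutable, so elements can be added or changed in place
scores = [90, 85]
scores.append(70)
scores[0] = 95

# Dictionary: maps keys to values and is also mutable
ages = {"ann": 30}
ages["bob"] = 25

# Trying to change a tuple element raises TypeError
try:
    point[0] = 10
    tuple_changed = True
except TypeError:
    tuple_changed = False
```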
what is a function? what is a library?
function: block of organized, reusable code used to perform a single, related action
library: collection of pre-written code that provides ready-made functions and methods to accomplish specific tasks
* reusable code, promoting modularity, reducing redundancy + time used
common python libraries: numpy, pandas, matplotlib, scikit-learn, math
what does math.ceil() do?
numpy: numerical operations, arrays
pandas: data transformation, analysis, dataframes
matplotlib: data visualization, plotting
sklearn: tools for machine learning, predictive analytics
math: mathematical operations
math.ceil(): rounds up to integer
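A short illustration of math.ceil() from the standard library; the input values are arbitrary examples:

```python
import math

# math.ceil() returns the smallest integer greater than or equal to its argument
up_positive = math.ceil(2.1)    # rounds up to 3
up_negative = math.ceil(-2.1)   # rounds toward positive infinity, giving -2
already_int = math.ceil(5.0)    # a whole number is returned unchanged, as the int 5
```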
types of analytics
descriptive: what happened?
predictive: what could happen?
prescriptive: what should happen?
diagnostic: why did it happen?
predictive analytics: noise, models, environment
noise: other factors impacting observations
models: mathematical approximations
environment: success depends on environment
linear regression
one continuous response variable, one or more continuous explanatory variables
use x to predict y by mapping a straight line through the data
the line is determined by OLS
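A rough sketch of how OLS picks the line when there is a single explanatory variable: the slope and intercept are the values that minimize the sum of squared residuals. The helper `fit_line` and the data are hypothetical, chosen so the points lie exactly on y = 1 + 2x:

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the OLS line for one explanatory variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = sum of cross-deviations / sum of squared deviations of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The OLS line always passes through the point (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Made-up data lying exactly on y = 1 + 2x
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
intercept, slope = fit_line(xs, ys)

# Use x to predict y for a new observation
prediction = intercept + slope * 5.0
```

In practice a library such as scikit-learn or numpy would do this fit; the hand-written version is only meant to show what "determined by OLS" means.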
types of predictive analytics
predict values
* exact value
* probability
* proportion
predict categories
* nominal groups
* ordinal groups (probability groups)
assumptions of linear regression
- error terms follow a normal distribution
- mean of error terms = 0
- variance of the error terms is constant, and independent of X
- error terms are independent of each other
- no multicollinearity
interpreting model estimates: coef, SD, t, P
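coef: estimated constant (intercept) or slope of each term
SD: standard error of the coefficient, i.e. how much the estimate would vary from sample to sample
t: coefficient divided by its standard error; a large absolute value suggests the coefficient differs from 0
P: probability of a t value at least this extreme if the true coefficient were 0; a small value (commonly < 0.05) indicates significance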
R squared
adjusted R-squared?
how much variation in Y is explained by X in linear regression
* never decreases when more variables are included, even uninformative ones
* adjusted R-squared: adjusts for multiple predictors, decreases when additional variables do not contribute to model’s significance
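A small sketch of both measures, assuming the usual definitions R² = 1 - SS_res/SS_tot and adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1), where n is the number of observations and k the number of predictors. The function names and the numbers are illustrative only:

```python
def r_squared(ys, predictions):
    """Share of the variation in y explained by the model."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predictions))  # unexplained
    ss_tot = sum((y - mean_y) ** 2 for y in ys)                  # total
    return 1 - ss_res / ss_tot

def adjusted_r_squared(r2, n, k):
    """Penalize R-squared for the number of predictors k, given n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Same raw R-squared, different numbers of predictors (hypothetical values)
r2 = 0.80
adj_two_predictors = adjusted_r_squared(r2, n=50, k=2)
adj_five_predictors = adjusted_r_squared(r2, n=50, k=5)
```

With the raw R² held fixed, the adjusted value falls as predictors are added, which is why extra variables that explain nothing lower it.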
continuous vs binary response
continuous:
* values are (-inf, +inf) or (0, +inf)
* fits straight line
* ex. profitability, attendance, capacity
binary:
* {0, 1}
* fits a logit (S-shaped) curve
* ex. win/loss, survival, normal/failure
types of binary responses
winning percentage
probability of 1
failure rate
winning probability
what is logistic regression?
continuous or discrete explanatory variables predict the probability of a binary categorical response
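A minimal sketch of the prediction step, assuming a model with one explanatory variable whose coefficients have already been estimated. The intercept of -3.0 and slope of 0.5 are invented for illustration, not fitted to any data:

```python
import math

def predict_probability(x, intercept, slope):
    """Map a linear score (the log-odds) to a probability between 0 and 1."""
    log_odds = intercept + slope * x
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical coefficients
intercept, slope = -3.0, 0.5

p_low = predict_probability(2.0, intercept, slope)    # log-odds = -2
p_mid = predict_probability(6.0, intercept, slope)    # log-odds = 0
p_high = predict_probability(10.0, intercept, slope)  # log-odds = 2

# Turn the probability into a binary prediction with a 0.5 cutoff (an assumed threshold)
predicted_class = 1 if p_high >= 0.5 else 0
```

The output is always strictly between 0 and 1, which is what makes the logit curve suitable for a {0, 1} response where a straight line is not.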