final Flashcards by Yvonne Ilao

benefits of python

large ecosystem
community support
readability and simplicity
versatility
interactivity
integration

python basic data types

str, float, int, bool

tuple vs list vs dictionary

tuple: (), immutable; its only methods are count() and index()
list: [], mutable
dictionary: {key: value}, mutable
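
A quick sketch of the three containers. The variable names and values are made up for illustration.

```python
# Tuple: immutable; count() and index() are its only methods.
point = (3, 4, 3)
threes = point.count(3)       # how many times 3 appears
first_four = point.index(4)   # position of the first 4

# List: mutable, so it can grow and change in place.
scores = [90, 85]
scores.append(70)
scores[0] = 95

# Dictionary: maps keys to values and can also be changed in place.
ages = {"ana": 31}
ages["ben"] = 27

# Assigning to a tuple element raises TypeError.
try:
    point[0] = 10
    tuple_changed = True
except TypeError:
    tuple_changed = False
```

After this runs, `scores` and `ages` hold the updated values, while `point` is unchanged.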

what is a function? what is a library?

function: block of organized, reusable code used to perform a single, related action

library: collections of pre-written code that provide ready-made functions and methods to accomplish specific tasks
* reusable code, promoting modularity, reducing redundancy + time used

common python libraries: numpy, pandas, matplotlib, scikit-learn, math

what does math.ceil() do?

numpy: numerical operations, arrays
pandas: data transformation, analysis, dataframes
matplotlib: data visualization, plotting
scikit-learn (sklearn): tools for machine learning, predictive analytics
math: mathematical operations

math.ceil(): rounds up to the nearest integer
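
A short illustration of math.ceil(), with math.floor() shown only for comparison. The input values are arbitrary.

```python
import math

up_positive = math.ceil(2.1)    # smallest integer >= 2.1
up_negative = math.ceil(-2.1)   # rounds toward positive infinity, not away from zero
unchanged = math.ceil(5.0)      # whole numbers stay the same
down = math.floor(2.9)          # floor goes the other way
```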

types of analytics

descriptive: what happened?
predictive: what could happen?
prescriptive: what should happen?
diagnostic: why did it happen?

predictive analytics: noise, models, environment

noise: other factors impacting observations
models: mathematical approximations
environment: success depends on environment

linear regression

one continuous response variable, one or more continuous explanatory variables

use x to predict y by mapping a straight line through the data

the line is determined by ordinary least squares (OLS), which minimizes the sum of squared residuals
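
A minimal sketch of OLS for a single explanatory variable, using the closed-form slope and intercept. The function name `ols_fit` and the data points are hypothetical; in practice a library such as numpy or scikit-learn would usually do this.

```python
def ols_fit(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope = covariance of x and y divided by variance of x.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point of means.
    intercept = y_mean - slope * x_mean
    return intercept, slope


xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
intercept, slope = ols_fit(xs, ys)
predicted_at_6 = intercept + slope * 6   # use x to predict y
```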

types of predictive analytics

predict values
* exact value
* probability
* proportion

predict categories
* nominal groups
* ordinal groups (probability groups)

assumptions of linear regression

error terms follow normal dist
mean of error terms = 0
variance of the error terms is constant, and independent of X
error terms are independent of each other
no multicollinearity

interpreting model estimates: coef, SD, t, P

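coef: estimated constant/slope of each term
SD: standard error, how much the coefficient estimate varies
t: coefficient divided by its standard error; a larger |t| is stronger evidence that the coefficient is not 0
P: probability of a t at least this extreme if the true coefficient were 0; a small P means significant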

R squared

adjusted R-squared?

how much of the variation in Y is explained by X in linear regression
* never decreases when more variables are included
* adjusted R-squared: adjusts for the number of predictors, decreases when additional variables do not contribute enough to the model’s fit
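
A small sketch of both measures, assuming the usual definitions R² = 1 - SS_res/SS_tot and adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1), where n is the number of observations and k the number of predictors. The function names and data are hypothetical.

```python
def r_squared(y_actual, y_predicted):
    """Share of the variation in y explained by the model."""
    y_mean = sum(y_actual) / len(y_actual)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_actual, y_predicted))
    ss_tot = sum((y - y_mean) ** 2 for y in y_actual)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n_obs, n_predictors):
    """Penalize R-squared for the number of predictors used."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


y_actual = [1, 2, 3, 4, 5]
y_predicted = [1.1, 1.9, 3.2, 3.8, 5.0]

r2 = r_squared(y_actual, y_predicted)
adj_one_predictor = adjusted_r_squared(r2, n_obs=5, n_predictors=1)
adj_two_predictors = adjusted_r_squared(r2, n_obs=5, n_predictors=2)
```

With R² held fixed, the adjusted value drops as the predictor count rises, which is the penalty for variables that add nothing to the fit.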

continuous vs binary response

continuous:
* values are (-inf, +inf) or (0, +inf)
* fits straight line
* ex. profitability, attendance, capacity

binary:
* {0, 1}
* fits a logit (S-shaped) curve
* ex. win/loss, survival, normal/failure

types of binary responses

winning percentage
probability of 1
failure rate
winning prob

what is logistic regression?

continuous or discrete explanatory variables predict a binary categorical response variable

how do we transform linear y to a probability dist? how do we calculate odds?

assumptions of logit

y = 1 if y* >= 0 (y* is the unobserved linear outcome)
y = 0 if y* < 0

odds = Pr(Y=1)/Pr(Y = 0)

error term follows a logistic distribution
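
A hedged sketch of how a linear score becomes a probability and then odds, assuming the standard logit form Pr(Y=1) = 1 / (1 + e^(-score)). The function names and the score 1.5 are illustrative only.

```python
import math


def logistic(linear_score):
    """Map a linear score (the non-random part of y*) to Pr(Y = 1)."""
    return 1 / (1 + math.exp(-linear_score))


def odds(p):
    """odds = Pr(Y=1) / Pr(Y=0)."""
    return p / (1 - p)


p_at_zero = logistic(0.0)          # a score of 0 is the 50/50 point
p = logistic(1.5)
log_odds = math.log(odds(p))       # taking log-odds recovers the linear score
predicted_class = 1 if p >= 0.5 else 0
```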

interpretation of logistic regression: coefficient, z-value, p-value

coefficient: for each 1 unit increase in Xk, the odds are multiplied by exp(Bk), holding the other variables constant

z-value: how many standard errors the estimated coefficient is away from 0 on a standard normal curve; |z| should be > 2 to be statistically significant

p-value: < 0.05 to be statistically significant
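
The coefficient rule can be checked numerically. This sketch assumes a one-variable logit model with made-up coefficients `b0`, `b1` and a made-up standard error; under that model the odds equal exp(b0 + b1*x).

```python
import math

b0, b1 = -1.0, 0.7          # hypothetical fitted coefficients
standard_error = 0.25       # hypothetical standard error of b1


def odds_at(x):
    """Odds of Y=1 at a given x under the logit model."""
    return math.exp(b0 + b1 * x)


# Raising x by one unit multiplies the odds by exp(b1), whatever x is.
odds_ratio = odds_at(3) / odds_at(2)

# z-value: coefficient measured in standard errors away from 0.
z_value = b1 / standard_error
significant = abs(z_value) > 2
```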

accuracy/hit rate, true positive rate, true negative rate

accuracy rate: correct predictions/all predictions

TPR: true positives/all actual positives (true positives + false negatives)

TNR: true negatives/all actual negatives (true negatives + false positives)
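
A small sketch computing the three rates from actual and predicted labels, taking TPR as the share of actual positives that were predicted positive and TNR as the share of actual negatives that were predicted negative. The function name and the label lists are hypothetical.

```python
def classification_rates(actual, predicted):
    """Return accuracy, true positive rate, and true negative rate."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "tpr": tp / (tp + fn),   # denominator: all actual positives
        "tnr": tn / (tn + fp),   # denominator: all actual negatives
    }


actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
rates = classification_rates(actual, predicted)
```

Here 7 of 10 predictions are correct, 3 of 4 actual positives are caught, and 4 of 6 actual negatives are caught.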

machine learning + types

machine learning: gives computers the ability to learn without being explicitly programmed

supervised: we know the results so we can check accuracy; inputs > training > outputs
unsupervised: does not predict anything, just identifies patterns/structures (ex. clustering); inputs > outputs
reinforcement: learns from +ve and -ve reinforcement to maximize rewards; inputs > outputs > rewards

regression vs classification tree

regression: response variable continuous; classifies to value ranges

classification: response variable discrete/categorical; classifies to categories

overfitting, pruning, cross-validation, random forests

what do CARTs depend on?

overfitting: model describes random error/noise instead of underlying relationship; easily happens in CART models
* pruning: pruning leaves to reduce overfitting

cross-validation: repeatedly splits the data into training and validation sets to detect overfitting and choose the best model

random forests: construct many decision trees at training time and output the mode of the classes (classification) or the mean prediction (regression)

CARTs depend on: training sample, variables used, algorithms used

gini impurity, entropy

both are impurity measures used to decide which attribute to split on, starting with the top of the tree

gini impurity: weighted avg of 1 - P(true)^2 - P(false)^2 for each leaf

entropy: -P(true)*log2 P(true) - P(false)*log2 P(false) for each leaf; used to calculate information gain (entropy before - entropy after)
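
A sketch of both measures for a True/False split, using the gini formula above and the standard Shannon entropy in bits. The helper names and the leaf counts are invented for illustration.

```python
import math


def gini(p_true):
    """Gini impurity of one leaf: 1 - P(true)^2 - P(false)^2."""
    return 1 - p_true ** 2 - (1 - p_true) ** 2


def entropy(p_true):
    """Shannon entropy of one leaf in bits; 0 * log2(0) is treated as 0."""
    total = 0.0
    for p in (p_true, 1 - p_true):
        if p > 0:
            total -= p * math.log2(p)
    return total


def weighted(measure, leaves):
    """Weighted average of a measure; leaves is a list of (n_true, n_false)."""
    n_total = sum(t + f for t, f in leaves)
    return sum((t + f) / n_total * measure(t / (t + f)) for t, f in leaves)


# Parent node has 5 True and 5 False; the split sends them to two leaves.
leaves = [(4, 1), (1, 4)]
gini_after_split = weighted(gini, leaves)
info_gain = entropy(5 / 10) - weighted(entropy, leaves)   # before - after
```

The attribute whose split gives the lowest weighted gini (or the highest information gain) is the one placed at the top of the tree.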

prescriptive analytics + subcategories

examples

takes what we know (descriptive) to forecast what could happen (predictive) and decide what to do (prescriptive)

produces a reliable path to an optimal solution to business needs
characterized by rules, constraints, thresholds
helps managers make decisions under complex environments
* mathematical programming
* evolutionary computation
* probabilistic models
* simulation
* logic-based models

farm, self-driving car, flight prices

decision, objective, predictive models, environment

decision → decision variables are the input
objectives and measurable outcomes → output
predictive models → understanding input/output relationship
environment → complexity

features of prescriptive analytics: automation, customization, optimization, intelligence

**automation** → build analytics into products/services
**customization** → different solutions to different customers/scenarios
**optimization** → from a good solution to the best solution
**intelligence** → measurable outcomes, feedback loop, learning

prescriptive analytics methods: experimentation, collaborative filtering, monte carlo simulations | user-based, item-based?

**experimentation**: A/B testing, ex. obama campaign donation button
**collaborative filtering**: user-based, item-based recommendations; makes predictions about a user by collecting preferences from many users
**monte carlo simulations**: rely on repeated random sampling to obtain numerical results, using randomness to solve problems that might be deterministic in principle

user-based: find similar users and recommend what they liked
item-based: recommend items similar to those the user previously liked
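
A toy sketch of user-based collaborative filtering: find the most similar other user, then suggest items that user rated and the target has not seen. Cosine similarity is one common choice of similarity and is an assumption here, as are the `ratings` data and the function names.

```python
import math

# Hypothetical ratings: user -> {item: rating}
ratings = {
    "ann": {"A": 5, "B": 4, "C": 1},
    "bob": {"A": 5, "B": 5, "D": 4},
    "cat": {"A": 1, "C": 5, "E": 4},
}


def cosine(u, v):
    """Cosine similarity between two users' rating dictionaries."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = math.sqrt(sum(r * r for r in u.values()))
    norm_v = math.sqrt(sum(r * r for r in v.values()))
    return dot / (norm_u * norm_v)


def recommend(user):
    """Return (most similar user, their unseen items, best-rated first)."""
    others = [name for name in ratings if name != user]
    nearest = max(others, key=lambda name: cosine(ratings[user], ratings[name]))
    unseen = {i: r for i, r in ratings[nearest].items() if i not in ratings[user]}
    return nearest, sorted(unseen, key=unseen.get, reverse=True)


nearest_user, suggestions = recommend("ann")
```

An item-based version would compare items by the users who rated them instead of comparing users by the items they rated.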

product customization vs price discrimination (degrees)

product customization: individualized recommendation, cross selling, up selling

price discrimination:
1. individual reservation price
2. quantity discounts
3. segmentation

when does monte carlo apply? | adv of simulation

when it applies:
1. problem is well-defined
2. uncertainty in the problem/situation
3. complicated structure in the problem
4. not easy to get the exact solution

advantages: uses computer power, gives a good approximation, shows the chances of each outcome, compares best vs worst scenario

problems solved with monte carlo

**integration**: approximate local curvatures in complex integration problems
**optimization**: find the best solution; find the tip of the dome
**estimation**: simulate probability distributions, bayesian data analysis
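
A minimal sketch of the integration use case: estimating the integral of x^2 from 0 to 1 (exact value 1/3) by averaging the function at random points. The function name `mc_integrate`, the seed, and the sample size are arbitrary choices for illustration.

```python
import random


def mc_integrate(f, a, b, n_samples, seed=0):
    """Estimate the integral of f over [a, b] by repeated random sampling."""
    rng = random.Random(seed)   # fixed seed so the run is repeatable
    total = sum(f(rng.uniform(a, b)) for _ in range(n_samples))
    # Average height of f times the width of the interval.
    return (b - a) * total / n_samples


estimate = mc_integrate(lambda x: x ** 2, 0.0, 1.0, n_samples=100_000)
error = abs(estimate - 1 / 3)
```

The answer is deterministic in principle, but sampling gives a good approximation, and the error shrinks as the number of samples grows.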

LP: what is the objective? what changes do we have to make to the problem to put it into python? | how to define constraints, inequality vs equality

obj: max/min a linear obj function, s.t. constraints
* convert to min; list the coefficients of each decision variable (negative if max)
* defining constraints: convert to <=, define LHS (might be negative) as A_ub and RHS as b_ub
* if equality, LHS = A_eq, RHS = b_eq
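
A sketch of those conversion steps on a made-up problem: maximize 3x + 2y subject to x + y <= 4, x + 3y >= 3, 0 <= x <= 3, y >= 0. The names A_ub and b_ub match the keyword arguments of `scipy.optimize.linprog`, which is assumed to be the solver the card has in mind; SciPy must be installed for `solve_lp` to run.

```python
def build_lp():
    """Rewrite the max problem in the minimize / <= form a solver expects."""
    # Maximize 3x + 2y  ->  minimize -3x - 2y (coefficients negated).
    c = [-3, -2]
    # x + y  <= 4   is already <=.
    # x + 3y >= 3   becomes  -x - 3y <= -3  after multiplying by -1.
    A_ub = [[1, 1], [-1, -3]]
    b_ub = [4, -3]
    # Simple limits on single variables go in bounds: 0 <= x <= 3, y >= 0.
    bounds = [(0, 3), (0, None)]
    return c, A_ub, b_ub, bounds


def solve_lp():
    """Solve the problem; an equality constraint would use A_eq and b_eq."""
    from scipy.optimize import linprog   # assumes SciPy is available

    c, A_ub, b_ub, bounds = build_lp()
    result = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    # Flip the sign back to report the maximized objective.
    return result.x, -result.fun
```

Calling `solve_lp()` should return a solution near x = 3, y = 1 with objective value 11.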

IP: what is the objective? what changes do we have to make to the problem to put it into python?

obj: determine which decision variables are turned off or on
* define each variable in binary terms, make obj function
* constraints are defined for the vector of all variables, ex. [0,1,0]; if changing from >= might use -1
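
To show the on/off idea without a solver, this sketch simply tries every 0/1 vector and keeps the best feasible one. Brute-force enumeration is a stand-in that only works for a handful of variables, not the method a real integer-programming solver uses. The project values, costs, and budget are invented.

```python
from itertools import product

values = [10, 6, 4]   # hypothetical payoff of each project
costs = [5, 4, 3]     # hypothetical cost of each project
budget = 8            # constraint: total cost <= budget

best_choice = None
best_value = float("-inf")

# Each candidate is a vector such as (0, 1, 0): 1 = on, 0 = off.
for choice in product([0, 1], repeat=len(values)):
    total_cost = sum(c * x for c, x in zip(costs, choice))
    if total_cost > budget:
        continue   # infeasible, violates the constraint
    total_value = sum(v * x for v, x in zip(values, choice))
    if total_value > best_value:
        best_choice, best_value = choice, total_value
```

Projects 1 and 3 are switched on: together they cost exactly 8 and pay 14, while adding project 2 to project 1 would exceed the budget.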

poisson distribution uses, assumptions | indiv prob vs cumulative prob

models the number of discrete events in a given time period, ex. defects per day
λ = rate = mean = variance
* dist is centered around λ

req:
* indiv events can't happen at the same time
* events are independent
* chance of an event doesn't depend on time since last event

indiv prob: probability of exactly x events
cumulative prob: probability of up to x events
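
A short sketch of the two probabilities, using the Poisson formula P(X = x) = e^(-λ) λ^x / x!. The function names and the rate of 2 defects per day are illustrative.

```python
import math


def poisson_pmf(x, lam):
    """Individual probability: exactly x events when the rate is lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


def poisson_cdf(x, lam):
    """Cumulative probability: x events or fewer."""
    return sum(poisson_pmf(k, lam) for k in range(x + 1))


lam = 2                               # e.g. 2 defects per day on average
p_exactly_3 = poisson_pmf(3, lam)     # about 0.18
p_up_to_3 = poisson_cdf(3, lam)       # about 0.86
```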

final Flashcards

(32 cards)