Final Flashcards

1
Q

benefits of python

A

large ecosystem
community support
readability and simplicity
versatility
interactivity
integration

2
Q

python basic data types

A

str, float, int, bool

3
Q

tuple vs list vs dictionary

A

tuple: (), immutable; only has count() and index() methods
list: [], mutable
dictionary: {key: value}
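A quick sketch contrasting the three containers (illustrative values):

```python
# Tuples are immutable: only count() and index() are available.
point = (3, 4)
assert point.count(3) == 1 and point.index(4) == 1

# Lists are mutable: elements can be added, removed, reordered.
scores = [10, 20]
scores.append(30)
assert scores == [10, 20, 30]

# Dictionaries map keys to values.
roster = {"QB": "Brady"}
roster["WR"] = "Moss"
assert roster["WR"] == "Moss"
```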

4
Q

what is a function? what is a library?

A

function: block of organized, reusable code used to perform a single, related action

library: collections of pre-written code that provide ready-made functions and methods to accomplish specific tasks
* reusable code, promoting modularity, reducing redundancy + time used

5
Q

common python libraries: numpy, pandas, matplotlib, scikit-learn, math

what does math.ceil() do?

A

numpy: numerical operations, arrays
pandas: data transformation, analysis, dataframes
matplotlib: data visualization, plotting
sklearn: tools for machine learning, predictive analytics
math: mathematical operations

math.ceil(): rounds up to integer
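A short illustration of math.ceil() (and its counterpart math.floor()):

```python
import math

# math.ceil() rounds up to the nearest integer;
# math.floor() rounds down.
assert math.ceil(2.1) == 3
assert math.ceil(-2.1) == -2  # "up" means toward +infinity
assert math.floor(2.9) == 2
```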

6
Q

types of analytics

A

descriptive: what happened?
predictive: what could happen?
prescriptive: what should happen?
diagnostic: why did it happen?

7
Q

predictive analytics: noise, models, environment

A

noise: other factors impacting observations
models: mathematical approximations
environment: a model's success depends on the environment it is deployed in

8
Q

linear regression

A

one continuous response variable, one or more continuous explanatory variables

use x to predict y by mapping a straight line through the data

the line is determined by OLS
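The OLS line for one predictor can be computed by hand; a minimal sketch with made-up data:

```python
# OLS slope/intercept for simple linear regression (illustrative data)
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 8.0, 9.8]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
# slope = cov(x, y) / var(x); the intercept makes the line pass through the means
slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) \
        / sum((xi - mean_x) ** 2 for xi in x)
intercept = mean_y - slope * mean_x

assert abs(slope - 1.95) < 1e-6      # true slope is about 2
assert abs(intercept - 0.15) < 1e-6  # true intercept is about 0
```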

9
Q

types of predictive analytics

A

predict values
* exact value
* probability
* proportion

predict categories
* nominal groups
* ordinal groups (probability groups)

10
Q

assumptions of linear regression

A
  1. error terms follow normal dist
  2. mean of error terms = 0
  3. variance of the error terms is constant, and independent of X
  4. error terms are independent of each other
  5. no multicollinearity
11
Q

interpreting model estimates: coef, SD, t, P

A

coef: constant/slope of each term
SD (standard error): how much the estimated coefficient varies across samples
t: coef / SD; magnitude > 2 suggests statistical significance
P: probability of seeing an estimate this large if the true coefficient were 0; < 0.05 suggests significance

12
Q

R squared

adjusted R-squared?

A

how much variation in Y is explained by X in linear regression
* increases with more variables included
* adjusted R-squared: adjusts for multiple predictors, decreases when additional variables do not contribute to model’s significance
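A minimal sketch computing R-squared and adjusted R-squared from hypothetical predictions:

```python
# R^2 = 1 - SS_res / SS_tot: share of variation in y explained by the model
y     = [2.0, 4.0, 6.0, 8.0]
y_hat = [2.2, 3.8, 6.1, 7.9]   # hypothetical model predictions

mean_y = sum(y) / len(y)
ss_tot = sum((yi - mean_y) ** 2 for yi in y)
ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
r2 = 1 - ss_res / ss_tot

# Adjusted R^2 penalizes extra predictors (n observations, p predictors)
n, p = len(y), 1
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)

assert r2 > 0.99
assert adj_r2 < r2   # the adjustment can only lower R^2
```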

13
Q

continuous vs binary response

A

continuous:
* values are (-inf, +inf) or (0, +inf)
* fits straight line
* ex. profitability, attendance, capacity

binary:
* {0, 1}
* logit line
* ex. win/loss, survival, normal/failure

14
Q

types of binary responses

A

winning percentage
probability of 1
failure rate
winning prob

15
Q

what is logistic regression?

A

continuous/discrete variable predicts binary categorical variable

16
Q

how do we transform linear y to a probability dist? how do we calculate odds?

assumptions of logit

A

y = 1 if y* >= 0
y = 0 if y* < 0
(y* is the latent linear index b0 + b1*x + error)

odds = Pr(Y=1) / Pr(Y=0)

assumption: the error term follows a logistic distribution
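The logistic (sigmoid) transform and the odds can be sketched in pure Python:

```python
import math

# The sigmoid maps a linear score to a probability in (0, 1)
def sigmoid(z):
    return 1 / (1 + math.exp(-z))

p = sigmoid(0.0)
assert p == 0.5            # score 0 -> 50% probability

# odds = Pr(Y=1) / Pr(Y=0); the log-odds recovers the linear score
odds = p / (1 - p)
assert odds == 1.0
assert abs(math.log(sigmoid(2.0) / (1 - sigmoid(2.0))) - 2.0) < 1e-9
```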

17
Q

interpretation of logistic regression: coefficient, z-value, p-value

A

coefficient: for each 1-unit increase in Xk, the odds are multiplied by exp(Bk)

z-value: how many SDs the estimated coefficients are away from 0 on a standard normal curve; should be >2 to be statistically significant

p-value: < 0.05 to be statistically significant

18
Q

accuracy/hit rate, true positive rate, true negative rate

A

accuracy rate: correct predictions / all predictions

TPR (sensitivity): true positives / all actual positives

TNR (specificity): true negatives / all actual negatives
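These rates can be computed from hypothetical confusion-matrix counts:

```python
# Hypothetical confusion-matrix counts
tp, fn = 40, 10   # actual positives: caught vs missed
tn, fp = 30, 20   # actual negatives: caught vs missed

accuracy = (tp + tn) / (tp + tn + fp + fn)
tpr = tp / (tp + fn)   # share of actual positives correctly predicted
tnr = tn / (tn + fp)   # share of actual negatives correctly predicted

assert accuracy == 0.7
assert tpr == 0.8
assert tnr == 0.6
```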

19
Q

machine learning + types

A

machine learning: gives computers the ability to learn without being explicitly programmed

  • supervised: we know the results so we can check accuracy; inputs > training > outputs
  • unsupervised: does not predict anything, just identifies patterns/structures (ex. clustering); inputs > outputs
  • reinforcement: learns from +ve and -ve reinforcement to maximize rewards; inputs > outputs > rewards
20
Q

regression vs classification tree

A

regression: response variable continuous; classifies to value ranges

classification: response variable discrete/categorical; classifies to categories

21
Q

overfitting, pruning, cross-validation, random forests

what do CARTs depend on?

A

overfitting: model describes random error/noise instead of underlying relationship; easily happens in CART models
* pruning: pruning leaves to reduce overfitting

cross-validation: mixes training sets to avoid overfitting and choose the best model

random forests: construct many decision trees at training time and output the mode of the classes (classification) or the mean prediction (regression)

CARTs depend on: training sample, variables used, algorithm used

22
Q

gini impurity, entropy

A

both are criteria for deciding which attribute gives the best split at each node (including the top of the tree)

gini impurity: weighted avg of 1 - P(true)^2 - P(false)^2 across the leaves of a candidate split; lower is better

entropy: -Σ p·log2(p) over the classes in a leaf; used to calculate information gain (entropy before split - weighted entropy after)
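Both criteria can be sketched for a two-class leaf (illustrative only):

```python
import math

def gini(p_true):
    # impurity of a leaf whose class proportions are p and 1 - p
    return 1 - p_true ** 2 - (1 - p_true) ** 2

def entropy(p_true):
    # -sum(p * log2(p)); skip zero proportions to avoid log(0)
    terms = [p for p in (p_true, 1 - p_true) if p > 0]
    return -sum(p * math.log2(p) for p in terms)

assert gini(0.5) == 0.5       # maximally impure leaf
assert gini(1.0) == 0.0       # pure leaf
assert entropy(0.5) == 1.0
assert entropy(1.0) == 0.0
```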

23
Q

prescriptive analytics + subcategories

examples

A

takes what we know (descriptive) to forecast what could happen (predictive) and decide what to do (prescriptive)

produces a reliable path to an optimal solution to business needs
characterized by rules, constraints, thresholds
helps managers make decisions under complex environments
* mathematical programming
* evolutionary computation
* probabilistic models
* simulation
* logic-based models

farm, self-driving car, flight prices

24
Q

decision, objective, predictive models. environment

A

decision → decision variables are the input
objectives and measurable outcomes → output
predictive models → understanding input/output relationship
environment → complexity

25
Q

features of predictive analytics: automation, customization, optimization, intelligence

A

automation → build analytics into products/services
customization → different solutions for different customers/scenarios
optimization → from a good solution to the best solution
intelligence → measurable outcomes, feedback loop, learning
26
Q

prescriptive analytics methods: experimentation, collaborative filtering, monte carlo simulations

user-based vs item-based?

A

experimentation: A/B testing, ex. obama campaign donation button

collaborative filtering: user-based and item-based recommendations; makes predictions about a user by collecting preferences from many users

monte carlo simulations: rely on repeated random sampling to obtain numerical results, using randomness to solve problems that might be deterministic in principle

user-based: find similar users and recommend what they liked
item-based: recommend items similar to previously liked items
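A toy user-based collaborative-filtering sketch (hypothetical ratings; similarity here is a simple negative squared distance, an assumption for illustration):

```python
# Recommend to "ann" whatever her most similar user liked that she hasn't seen.
ratings = {
    "ann": {"A": 5, "B": 4},
    "bob": {"A": 5, "B": 4, "C": 5},
    "cat": {"A": 1, "B": 2, "C": 1},
}

def similarity(u, v):
    # agreement on co-rated items: higher (less negative) = more similar
    shared = set(u) & set(v)
    return -sum((u[i] - v[i]) ** 2 for i in shared)

target = ratings["ann"]
neighbor = max(("bob", "cat"), key=lambda name: similarity(target, ratings[name]))
recs = [item for item in ratings[neighbor] if item not in target]

assert neighbor == "bob"
assert recs == ["C"]
```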
27
Q

product customization vs price discrimination (degrees)

A

product customization: individualized recommendation, cross selling, up selling

price discrimination degrees:
1. individual reservation price
2. quantity discounts
3. segmentation
28
Q

when does monte carlo apply?

advantages of simulation?

A

applies when:
1. the problem is well-defined
2. there is uncertainty in the problem/situation
3. the problem has a complicated structure
4. an exact solution is not easy to get

advantages: leverages computer power, gives a good approximation, shows the chances of each outcome, reveals best vs worst scenarios
29
Q

problems solved with monte carlo

A

integration: approximate local curvatures in complex integration problems

optimization: find the best solution; find the tip of the dome

estimation: simulate probability distributions, bayesian data analysis
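A classic monte carlo estimation example, approximating pi by random sampling:

```python
import random

# Sample points in the unit square; the fraction landing inside the
# quarter circle x^2 + y^2 <= 1 approximates pi / 4.
random.seed(0)  # fixed seed for reproducibility
n = 100_000
inside = sum(1 for _ in range(n)
             if random.random() ** 2 + random.random() ** 2 <= 1)
pi_est = 4 * inside / n

assert abs(pi_est - 3.14159) < 0.05  # close to pi, within sampling error
```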
30
Q

LP: what is the objective? what changes do we have to make to the problem to put it into python?

how to define constraints, inequality vs equality?

A

objective: max/min a linear objective function, subject to constraints
* convert to min; list the coefficients of each decision variable (negated if maximizing)
* defining constraints: convert to <=; define the LHS coefficients (may be negative) as A_ub and the RHS as b_ub
* for equality constraints, LHS = A_eq, RHS = b_eq
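A minimal sketch using scipy.optimize.linprog (the source of the A_ub/b_ub names) on a made-up maximization problem; assumes scipy is installed:

```python
from scipy.optimize import linprog

# Maximize 3x + 2y subject to x + y <= 4, x <= 2, x >= 0, y >= 0.
# linprog minimizes, so the objective coefficients are negated.
c = [-3, -2]
A_ub = [[1, 1],
        [1, 0]]
b_ub = [4, 2]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
assert res.success
assert abs(-res.fun - 10) < 1e-6   # optimum at x=2, y=2: 3*2 + 2*2 = 10
```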
31
Q

IP: what is the objective? what changes do we have to make to the problem to put it into python?

A

objective: determine which decision variables are turned off or on
* define each variable in binary terms and build the objective function
* constraints are defined over the vector of all variables, ex. [0, 1, 0]; if converting from >= to <=, multiply both sides by -1
32
Q

poisson distribution: uses, assumptions

individual prob vs cumulative prob?

A

models the number of discrete events in a given time period, ex. defects per day
λ = rate = mean = variance; the distribution is centered around λ

requirements:
* individual events can't happen at the same time
* events are independent
* the rate doesn't depend on time since the last event

individual prob: probability of exactly x events; cumulative prob: probability of up to x events
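The individual (pmf) and cumulative (cdf) probabilities can be sketched in pure Python (λ = 2 is an assumed example rate):

```python
import math

# Poisson pmf: P(X = x) = exp(-lam) * lam**x / x!
def poisson_pmf(x, lam):
    return math.exp(-lam) * lam ** x / math.factorial(x)

# Cumulative probability of up to x events: sum of the individual pmfs
def poisson_cdf(x, lam):
    return sum(poisson_pmf(k, lam) for k in range(x + 1))

lam = 2.0  # e.g. 2 defects per day on average
assert abs(poisson_pmf(0, lam) - math.exp(-2)) < 1e-12
assert abs(poisson_cdf(1, lam) - 3 * math.exp(-2)) < 1e-12  # P(0)+P(1) = e^-2 * (1+2)
assert abs(sum(poisson_pmf(k, lam) for k in range(100)) - 1) < 1e-9  # pmf sums to 1
```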