final Flashcards

1
Q

benefits of python

A

large ecosystem
community support
readability and simplicity
versatility
interactivity
integration

2
Q

python basic data types

A

str, float, int, bool
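
A quick illustration of the four basic types (toy values assumed):

```python
# One literal of each basic built-in type
name = "Ada"        # str
pi = 3.14           # float
count = 42          # int
is_valid = True     # bool

print(type(name).__name__, type(pi).__name__,
      type(count).__name__, type(is_valid).__name__)
# str float int bool
```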

3
Q

tuple vs list vs dictionary

A

tuple: (), immutable - only count/index
list: [], mutable
dictionary: {key: value}
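
A minimal sketch of the mutability difference (toy values assumed):

```python
point = (1, 2)          # tuple: immutable; only .count() and .index()
scores = [90, 85]       # list: mutable
ages = {"ann": 30}      # dict: key -> value pairs

scores.append(70)       # lists can change in place
ages["bob"] = 25        # dicts can gain new keys

try:
    point[0] = 9        # tuples cannot be modified
except TypeError as e:
    print("immutable:", e)
```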

4
Q

what is a function? what is a library?

A

function: block of organized, reusable code used to perform a single, related action

library: collections of pre-written code that provide ready-made functions and methods to accomplish specific tasks
* reusable code, promoting modularity, reducing redundancy + time used
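
A small illustration using the stdlib `math` library; `circle_area` is a hypothetical function name:

```python
import math  # a library: pre-written, reusable code

def circle_area(radius):
    """A function: a reusable block performing one related action."""
    return math.pi * radius ** 2

print(circle_area(2))  # ~12.566
```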

5
Q

common python libraries: numpy, pandas, matplotlib, scikit-learn, math

what does math.ceil() do?

A

numpy: numerical operations, arrays
pandas: data transformation, analysis, dataframes
matplotlib: data visualization, plotting
sklearn: tools for machine learning, predictive analytics
math: mathematical operations

math.ceil(): rounds up to integer
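
For example:

```python
import math

print(math.ceil(2.1))   # 3  -> rounds up to the nearest integer
print(math.ceil(-2.1))  # -2 -> "up" means toward +infinity
print(math.floor(2.9))  # 2  -> floor rounds down, for contrast
```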

6
Q

types of analytics

A

descriptive: what happened?
predictive: what could happen?
prescriptive: what should happen?
diagnostic: why did it happen?

7
Q

predictive analytics: noise, models, environment

A

noise: other factors impacting observations
models: mathematical approximations
environment: a model's success depends on the environment it operates in

8
Q

linear regression

A

one continuous response variable, one or more continuous explanatory variables

use x to predict y by mapping a straight line through the data

the line is determined by OLS
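
A minimal OLS sketch using NumPy's least-squares line fit (toy data assumed):

```python
import numpy as np

# Toy data: y roughly follows 2x + 1 plus noise
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 9.0, 10.8])

# OLS fit of a straight line y = slope*x + intercept
slope, intercept = np.polyfit(x, y, deg=1)
print(slope, intercept)

y_pred = slope * x + intercept  # use x to predict y
```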

9
Q

types of predictive analytics

A

predict values
* exact value
* probability
* proportion

predict categories
* nominal groups
* ordinal groups (probability groups)

10
Q

assumptions of linear regression

A
  1. error terms follow normal dist
  2. mean of error terms = 0
  3. variance of the error terms is constant, and independent of X
  4. error terms are independent of each other
  5. no multicollinearity
11
Q

interpreting model estimates: coef, SD, t, P

A

coef: constant/slope of each term
SD (the coefficient's standard error): how much the estimated coefficient varies from sample to sample
t: coef divided by its standard error; |t| > 2 suggests significance
P: probability of an estimate this large if the true coefficient were 0; < 0.05 → statistically significant

12
Q

R squared

adjusted R-squared?

A

how much variation in Y is explained by X in linear regression
* increases with more variables included
* adjusted R-squared: adjusts for multiple predictors, decreases when additional variables do not contribute to model’s significance
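
Both quantities can be sketched directly (hypothetical helper names):

```python
import numpy as np

def r_squared(y, y_pred):
    # share of variation in Y explained by the model
    ss_res = np.sum((y - y_pred) ** 2)      # unexplained variation
    ss_tot = np.sum((y - np.mean(y)) ** 2)  # total variation in Y
    return 1 - ss_res / ss_tot

def adjusted_r_squared(r2, n, k):
    # n observations, k predictors; penalizes uninformative extra variables
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)
```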

13
Q

continuous vs binary response

A

continuous:
* values are (-inf, +inf) or (0, +inf)
* fits straight line
* ex. profitability, attendance, capacity

binary:
* {0, 1}
* logit line
* ex. win/loss, survival, normal/failure

14
Q

types of binary responses

A

winning percentage / winning probability
probability of 1
failure rate

15
Q

what is logistic regression?

A

continuous/discrete explanatory variables predict a binary categorical response variable

16
Q

how do we transform linear y to a probability dist? how do we calculate odds?

assumptions of logit

A

y = 1 if y* >= 0
y = 0 if y* < 0

odds = Pr(Y=1)/Pr(Y = 0)

error term follows a logistic distribution
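
The transform and the odds can be sketched as (toy coefficient value assumed):

```python
import math

def sigmoid(xb):
    # maps the linear predictor xb in (-inf, inf) to a probability in (0, 1)
    return 1 / (1 + math.exp(-xb))

p = sigmoid(0.8)
odds = p / (1 - p)   # odds = Pr(Y=1)/Pr(Y=0)
print(p, odds)       # for the logistic model, odds = exp(0.8)
```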

17
Q

interpretation of logistic regression: coefficient, z-value, p-value

A

for each 1 unit increase in Xk, odds is multiplied by exp(Bk)

z-value: how many SDs the estimated coefficient is away from 0 on a standard normal curve; |z| should be > 2 to be statistically significant

p-value: < 0.05 to be statistically significant

18
Q

accuracy/hit rate, true positive rate, true negative rate

A

accuracy rate: correct predictions / all predictions

TPR (sensitivity): true positives / all actual positives

TNR (specificity): true negatives / all actual negatives
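
A sketch from raw confusion-matrix counts (toy counts assumed; `rates` is a hypothetical helper):

```python
def rates(tp, fp, tn, fn):
    accuracy = (tp + tn) / (tp + fp + tn + fn)  # correct / all predictions
    tpr = tp / (tp + fn)                        # true positives / actual positives
    tnr = tn / (tn + fp)                        # true negatives / actual negatives
    return accuracy, tpr, tnr

print(rates(tp=40, fp=10, tn=45, fn=5))
```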

19
Q

machine learning + types

A

machine learning: gives computers the ability to learn without being explicitly programmed

  • supervised: we know the results so we can check accuracy; inputs > training > outputs
  • unsupervised: does not predict anything, just identifies patterns/structures (ex. clustering); inputs > outputs
  • reinforcement: learns from +ve and -ve reinforcement to maximize rewards; inputs > outputs > rewards
20
Q

regression vs classification tree

A

regression: response variable continuous; classifies to value ranges

classification: response variable discrete/categorical; classifies to categories

21
Q

overfitting, pruning, cross-validation, random forests

what do CARTs depend on?

A

overfitting: model describes random error/noise instead of the underlying relationship; easily happens in CART models
* pruning: removing leaves to reduce overfitting

cross-validation: mixes training sets to avoid overfitting and choose the best model

random forests: construct many decision trees at training time and output the mode of the classes (classification) or the mean prediction (regression)

CARTs depend on: the training sample, the variables used, and the algorithms used

22
Q

gini impurity, entropy

A

both are algorithms to determine which attribute is at the top of the tree

gini impurity: weighted avg of 1 - P(true)^2 - P(false)^2 for each leaf

entropy: measures the disorder of True/False labels in a leaf; used to calculate information gain (entropy before split - entropy after split)
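
Both measures sketched for a single two-class leaf, where `p_true` is the fraction of True labels:

```python
import math

def gini_impurity(p_true):
    # 1 - P(true)^2 - P(false)^2 for a single leaf
    return 1 - p_true**2 - (1 - p_true)**2

def entropy(p_true):
    # Shannon entropy in bits; 0 for a pure leaf, 1 for a 50/50 split
    if p_true in (0, 1):
        return 0.0
    p_false = 1 - p_true
    return -(p_true * math.log2(p_true) + p_false * math.log2(p_false))

print(gini_impurity(0.5), entropy(0.5))  # 0.5 1.0 -> maximally impure
print(gini_impurity(1.0), entropy(1.0))  # 0.0 0.0 -> pure leaf
```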

23
Q

prescriptive analytics + subcategories

examples

A

takes what we know (descriptive) to forecast what could happen (predictive) and decide what to do (prescriptive)

produces a reliable path to an optimal solution to business needs
characterized by rules, constraints, thresholds
helps managers make decisions under complex environments
* mathematical programming
* evolutionary computation
* probabilistic models
* simulation
* logic-based models

farm, self-driving car, flight prices

24
Q

decision, objective, predictive models, environment

A

decision → decision variables are the input
objectives and measurable outcomes → output
predictive models → understanding input/output relationship
environment → complexity

25
Q

features of prescriptive analytics: automation, customization, optimization, intelligence

A

automation → build analytics into products/services
customization → different solutions to different customers/scenarios
optimization → from a good solution to the best solution
intelligence → measurable outcomes, feedback loop, learning

26
Q

prescriptive analytics methods: experimentation, collaborative filtering, monte carlo simulations

user-based, item-based?

A

experimentation: A/B testing, ex. obama campaign donation button
collaborative filtering: user-based, item-based recommendations; makes predictions about a user by collecting preferences from many
monte carlo simulations: rely on repeated random sampling to obtain numerical results
using randomness to solve problems that might be deterministic in principle

user-based: find similar users and recommend what they liked; item-based: recommend items similar to ones previously liked

27
Q

product customization vs price discrimination (degrees)

A

product customization: individualized recommendations, cross-selling, up-selling

price discrimination:
1. individual reservation price
2. quantity discounts
3. segmentation

28
Q

when does monte carlo apply?

adv of simulation

A
  1. problem well-defined
  2. uncertainty in the problem/situation
  3. complicated structure in the problem
  4. not easy to get the exact solution

leverages computer power; gives good approximations; helps understand the chances of each outcome; compares best vs worst scenarios

29
Q

problems solved with monte carlo

A

integration — approximate local curvatures in complex integration problems
optimization — find the best solution; find the tip of the dome
estimation — simulate probability distributions, bayesian data analysis
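
A classic estimation example: approximating π by repeated random sampling (sample size and seed assumed):

```python
import random

random.seed(0)

# Monte Carlo: estimate pi as 4 * (fraction of random points in the
# unit square that land inside the quarter circle of radius 1)
n = 100_000
inside = sum(1 for _ in range(n)
             if random.random()**2 + random.random()**2 <= 1)
pi_estimate = 4 * inside / n
print(pi_estimate)  # close to 3.14159; more samples -> better approximation
```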

30
Q

LP: what is the objective? what changes do we have to make to the problem to put it into python?

how to define constraints, inequality vs equality

A

obj: max/min a linear objective function, subject to constraints

  • convert to min; list the coefficients of each decision variable (negative if max)
  • defining constraints: convert to <=, define LHS (might be negative) as A_ub and RHS as b_ub
  • if equality, LHS = A_eq, RHS = b_eq
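
The steps above can be sketched with SciPy's `linprog` (toy maximization problem assumed):

```python
from scipy.optimize import linprog

# Maximize 3x + 2y  subject to  x + y <= 4,  x <= 2,  x, y >= 0
# linprog minimizes, so negate the objective coefficients
c = [-3, -2]
A_ub = [[1, 1],   # x + y <= 4
        [1, 0]]   # x     <= 2
b_ub = [4, 2]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.x)     # optimal decision variables
print(-res.fun)  # undo the sign flip to recover the maximum
```
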
31
Q

IP: what is the objective? what changes do we have to make to the problem to put it into python?

A

obj: determine which decision variables are turned off or on

  • define each variable in binary terms, make obj function
  • constraints are defined for the vector of all variables, ex. [0,1,0]; if changing from >= might use -1
32
Q

poisson distribution uses, assumptions

indiv prob vs cumulative prob

A

models the number of discrete events in a given time period, ex. defects per day
λ = rate; it equals both the mean and the variance
* dist is centered around λ

req:
* individual events can't happen at the same time
* events are independent
* rate doesn't depend on time since the last event

prob of exactly x or up to x
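
Both probabilities sketched from the Poisson PMF (hypothetical helper names; λ = 3 assumed):

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    # individual probability: P(X = x) for rate lam
    return lam**x * exp(-lam) / factorial(x)

def poisson_cdf(x, lam):
    # cumulative probability: P(X <= x)
    return sum(poisson_pmf(k, lam) for k in range(x + 1))

# e.g. defects per day with lam = 3
print(poisson_pmf(2, 3))  # P(exactly 2 defects)
print(poisson_cdf(2, 3))  # P(up to 2 defects)
```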