Exam 1: Lectures 1, 2, 3 Flashcards

1
Q

Business Analytics

A

refers to the skills, technologies, and practices for continuous iterative exploration and investigation of past business performance (e.g., sales and return on investment) to gain insight and drive business planning

2
Q

Descriptive Analytics

A

Tools that summarize what happened

3
Q

Prescriptive Analytics

A

Statistical techniques that make predictions and then suggest decision options to take advantage of the predictions

4
Q

Predictive Analytics

A

A variety of statistical techniques that analyze data to make predictions about the future

5
Q

Business Analytics Advantages

A

1) Drive Revenue
2) Save Money
3) Encourage Experimentation
4) Side-step Politics
5) Persuade Executives

6
Q

4 Key Challenges in Doing Business Analytics

A

1) Managing 6V’s of Big Data
2) Growth of Unstructured Data
3) Underestimating the Hard Work
4) Hiring the Right Person(s)

7
Q

4 Main Elements of Data-Driven Tasks

A

1) Data Access
2) Data Management
3) Data Analysis
4) Data Presentation

8
Q

Model

A

an abstraction of a real problem that tries to capture the essence and key features of the problem

9
Q

Key Challenges of Managing 6V’s of Big Data

A

1) Volume
2) Velocity
3) Variety
4) Volatility
5) Validity
6) Value

10
Q

Volume

A

Big data implies large volumes of data

11
Q

Velocity

A

The speed at which data arrives and must be processed

12
Q

Variety

A

Data comes from many sources and in many types, both structured and unstructured

13
Q

Volatility

A

It refers to how long data is valid and how long it should be stored

14
Q

Validity

A

Data should be correct and accurate for the intended use

15
Q

Seven step modeling process

A

1) Define the problem
2) Collect and summarize data
3) Develop a model
4) Verify the model
5) Select one or more suitable decisions
6) Present the results to the organization
7) Implement the model and update it over time

16
Q

graphs

A

bar charts, pie charts, histograms, scatter charts, and time series graphs

17
Q

numerical summary measures

A

counts, percentages, averages, and measures of variability

18
Q

tables of summary measures

A

totals, averages, and counts, grouped by categories

19
Q

population

A

includes all of the entities of interest in a study (people, households, machines, etc.)

20
Q

sample

A

a subset of the population, often randomly chosen and preferably representative of the population as a whole

21
Q

Four Scales of Measurement

A

1) Nominal
2) Ordinal
3) Interval
4) Ratio

22
Q

Nominal

A

variables that have two or more categories without any natural order. Two levels: gender (male and female); multiple levels: marital status (single, married, divorced, widowed)

23
Q

ordinal

A

a categorical variable whose possible categories are ordered. Example: education level (less than high school, high school, college degree, graduate degree)

24
Q

interval

A

a measure that is ordered and has equal distances between adjacent values, but no natural zero. Example: temperature, where the difference between 10°C and 20°C is the same as the difference between 20°C and 30°C

25
Q

ratio

A

interval variables with the added condition of a natural zero (origin). Examples: money, sales revenue

26
Q

interquartile range

A

the third quartile minus the first quartile
Thus, it is the range of the middle 50% of the data
It is less sensitive to extreme values than the range
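
A minimal sketch of this card in Python, assuming the standard library's `statistics.quantiles` with the "inclusive" method (textbooks and spreadsheets use slightly different quartile conventions, so the exact quartiles can differ). The data values are made up for illustration.

```python
import statistics

def iqr(values):
    # With n=4, statistics.quantiles returns the three quartile cut points.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

# Hypothetical data with one extreme value (100).
data = [1, 2, 3, 4, 5, 6, 7, 8, 100]
data_range = max(data) - min(data)  # stretched by the extreme value
data_iqr = iqr(data)                # spread of the middle 50% only
```

Here the extreme value inflates the range but leaves the interquartile range untouched.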

27
Q

variance

A

essentially the average of the squared deviations from the mean
If $X_i$ is a typical observation, its squared deviation from the mean is $(X_i - \text{mean})^2$

28
Q

range

A

the maximum value minus the minimum value

29
Q

standard deviation

A

the square root of the variance
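
The variance and standard deviation cards can be checked by hand with a short sketch. The data are made up; note that "essentially the average" hides a choice of divisor: dividing by n gives the population variance, while the sample variance divides by n - 1.

```python
import math
import statistics

# Hypothetical observations.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(data) / len(data)
squared_devs = [(x - mean) ** 2 for x in data]

pop_var = sum(squared_devs) / len(data)            # divide by n
sample_var = sum(squared_devs) / (len(data) - 1)   # divide by n - 1
pop_sd = math.sqrt(pop_var)                        # standard deviation
```

The standard library's `statistics.pvariance` and `statistics.variance` compute the same two quantities.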

30
Q

skewness

A

occurs when there is a lack of symmetry

31
Q

kurtosis

A

has to do with the “fatness” of the tails of the distribution relative to the tails of a normal distribution

32
Q

Statisticians generally consider a value as an outlier if

A

it is more than three standard deviations from the mean
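
As a rough illustration of the three-standard-deviation rule, the sketch below flags values far from the mean. The function name and data are hypothetical, and the sample standard deviation is an assumed choice.

```python
import statistics

def three_sigma_outliers(values):
    # Flag values more than three sample standard deviations from the mean.
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [x for x in values if abs(x - mean) > 3 * sd]

# Hypothetical data: twenty ordinary values and one extreme value.
data = [10, 11, 9, 10, 12, 8, 10, 11, 9, 10] * 2 + [100]
outliers = three_sigma_outliers(data)
```

With very small samples no value can be three standard deviations from the mean, so the rule only becomes useful with a reasonable number of observations.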

33
Q

dummy variable

A

a 0–1 coded variable for a specific category
It is coded as 1 for all observations in that category and 0 for all observations not in that category
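
A small sketch of 0-1 coding in plain Python; the helper name `dummy` and the sample values are illustrative only.

```python
def dummy(values, category):
    # 1 for observations in the category, 0 for all others.
    return [1 if v == category else 0 for v in values]

marital_status = ["single", "married", "divorced", "married", "widowed"]
is_married = dummy(marital_status, "married")
```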

34
Q

bin variable

A

corresponds to a numerical variable that has been categorized into discrete categories
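
One possible way to build a bin variable, sketched with the standard library's `bisect`. The cut points (30 and 50) and labels are assumptions chosen for the example.

```python
import bisect

def bin_label(value, edges, labels):
    # bisect_right counts how many edges are <= value,
    # which is the index of the bin the value falls into.
    return labels[bisect.bisect_right(edges, value)]

edges = [30, 50]                                # hypothetical cut points
labels = ["under 30", "30-49", "50 and over"]   # one more label than edges
ages = [22, 30, 45, 50, 67]
age_bins = [bin_label(a, edges, labels) for a in ages]
```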

35
Q

when a distribution has a negative skew, ____ is typically larger than ____ (the order reverses for a positive skew)

A

median, mean
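
A quick check of how skew pulls the mean away from the median, using made-up data. The long tail drags the mean in its direction, so the mean is typically below the median under negative skew and above it under positive skew.

```python
import statistics

# Hypothetical right-skewed (positively skewed) data: one large value in the tail.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 4, 50]
# Mirror image: left-skewed (negatively skewed) data.
left_skewed = [-x for x in right_skewed]

right_mean = statistics.mean(right_skewed)
right_median = statistics.median(right_skewed)
left_mean = statistics.mean(left_skewed)
left_median = statistics.median(left_skewed)
```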

36
Q

Two Types of Estimators

A

1) Point Estimators
2) Interval Estimators

37
Q

Point Estimators

A

to estimate a population characteristic with a single value

38
Q

Interval Estimators

A

to estimate a population characteristic with an interval, or range, of values

39
Q

simple random sampling mechanism

A

the sample mean is typically used as a “best guess” for the population mean. This estimate is a point estimate
The accuracy of the point estimate is measured by its standard error, which is the standard deviation of the sampling distribution of the point estimate
A confidence interval (with 95% confidence) for the population mean extends to approximately two standard errors on either side of the sample mean
From the central limit theorem, the sampling distribution of 𝑋̅ is approximately normal when n is reasonably large
There is approximately a 95% chance that any particular 𝑋̅ will be within two standard errors of the population mean μ
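
The "two standard errors" rule can be sketched as follows. The function name is hypothetical, the sample is simulated rather than real, and the standard error is estimated in the usual way as the sample standard deviation divided by the square root of n.

```python
import math
import random
import statistics

def approx_95_ci(sample):
    # Point estimate: the sample mean.
    xbar = statistics.mean(sample)
    # Estimated standard error of the sample mean: s / sqrt(n).
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    # Approximate 95% interval: two standard errors on either side.
    return xbar - 2 * se, xbar + 2 * se

# Simulated data for illustration only (seeded so it is repeatable).
rng = random.Random(0)
sample = [rng.gauss(50, 10) for _ in range(100)]
low, high = approx_95_ci(sample)
```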

40
Q

In simple random sampling, if we have 10,000 customers and want to select 1,000 of them at random, each customer should have a ___ chance of being selected

A

1 in 10
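
A minimal sketch of this draw with the standard library's `random.sample`, which selects without replacement so that every customer is equally likely to be chosen. The customer IDs are placeholders.

```python
import random

rng = random.Random(42)                  # seeded only so the example repeats
customers = list(range(1, 10001))        # hypothetical customer IDs 1..10000
selected = rng.sample(customers, 1000)   # simple random sample, no repeats

# Each customer's chance of selection: sample size / population size.
selection_chance = len(selected) / len(customers)
```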

41
Q

typical sampling mistakes

A

1) Unrepresentative sample
2) Biased respondents
3) Low response rate (non-response bias)
4) Biased questions

42
Q

unrepresentative sample

A

Sample does not represent population

43
Q

biased respondents

A

Respondents incorrectly answer sensitive questions such as annual income

44
Q

low response rate (non-response bias)

A

Only a few respondents participate in the survey

45
Q

biased questions

A

Poorly worded questions make it hard to interpret what respondents' answers mean

46
Q

confidence interval

A

a range of values we are fairly sure our true value lies in

47
Q

systematic sampling

A

is a type of probability sampling method in which sample members from a larger population are selected according to a random starting point and a fixed periodic interval. This interval, called the sampling interval, is calculated by dividing the population size by the desired sample size
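
The procedure described above can be sketched as below. The function name is hypothetical, and rounding the interval down when the population size is not an exact multiple of the sample size is an assumption of this sketch.

```python
import random

def systematic_sample(population, sample_size, rng):
    # Sampling interval: population size divided by desired sample size.
    interval = len(population) // sample_size
    # Random starting point within the first interval.
    start = rng.randrange(interval)
    # Then take every interval-th member.
    return [population[start + i * interval] for i in range(sample_size)]

population = list(range(10000))   # placeholder population
sample = systematic_sample(population, 1000, random.Random(1))
```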

48
Q

stratified sampling

A

Suppose various subpopulations within the total population can be identified. These subpopulations are strata
Instead of taking a simple random sample from the entire population, it might make more sense to select a simple random sample from each stratum separately
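
A sketch of sampling each stratum separately. The card does not say how many units to take per stratum; taking the same fraction from each (proportional allocation) is an assumption here, and the strata names and sizes are made up.

```python
import random

def stratified_sample(strata, fraction, rng):
    sample = {}
    for name, members in strata.items():
        # Assumed proportional allocation: same fraction from every stratum.
        k = round(len(members) * fraction)
        # Simple random sample within this stratum.
        sample[name] = rng.sample(members, k)
    return sample

strata = {
    "north": list(range(600)),        # hypothetical stratum of 600 units
    "south": list(range(600, 1000)),  # hypothetical stratum of 400 units
}
chosen = stratified_sample(strata, 0.1, random.Random(7))
```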

49
Q

cluster sampling

A

the population is separated into clusters, such as cities or city blocks, and then a random sample of the clusters is selected
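
A sketch of the cluster step. The card only says that clusters are sampled; keeping every member of each selected cluster, as done here, is one common variant and is an assumption of this example. The city names and members are placeholders.

```python
import random

def cluster_sample(clusters, n_clusters, rng):
    # Randomly choose whole clusters, not individual members.
    chosen = rng.sample(sorted(clusters), n_clusters)
    # Assumed variant: keep all members of each chosen cluster.
    return {name: clusters[name] for name in chosen}

clusters = {
    "city_a": ["a1", "a2", "a3"],
    "city_b": ["b1", "b2"],
    "city_c": ["c1", "c2", "c3", "c4"],
    "city_d": ["d1"],
}
picked = cluster_sample(clusters, 2, random.Random(3))
```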

50
Q

p-value

A

the probability of obtaining a result at least as extreme as what was actually observed, when the null hypothesis is true
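
As one concrete case, the sketch below computes a two-sided p-value for a z test statistic using `statistics.NormalDist`. The choice of a z-test and the function name are assumptions for illustration; other tests use other reference distributions.

```python
from statistics import NormalDist

def two_sided_p_value(z):
    # Probability, under the null hypothesis, of a standard normal value
    # at least as far from zero as the observed z (in either direction).
    return 2 * (1 - NormalDist().cdf(abs(z)))

p_at_196 = two_sided_p_value(1.96)   # close to 0.05
p_at_zero = two_sided_p_value(0.0)   # no evidence against the null
```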

51
Q

What percentage of observations fall within 1, 2, and 3 standard deviations of the mean when a variable X follows a normal distribution?

A

Approximately 68%, 95%, and 99.7%
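
These percentages can be reproduced from the standard normal distribution; a short sketch using `statistics.NormalDist` (the helper name is illustrative):

```python
from statistics import NormalDist

def share_within(k):
    # Probability that a normal variable falls within k standard
    # deviations of its mean.
    z = NormalDist()
    return z.cdf(k) - z.cdf(-k)

within_1 = share_within(1)
within_2 = share_within(2)
within_3 = share_within(3)
```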

52
Q

When do we reject or fail to reject the null hypothesis at the 0.05 significance level?

A

p-value < 0.05: reject; p-value > 0.05: fail to reject

53
Q

hypothesis

A

a claim that can be tested statistically

54
Q

one-tailed alternative

A

supported only by evidence in a single direction

55
Q

two-tailed alternative

A

supported by evidence in either of two directions

56
Q

how to deal with missing values in variables

A

One option is to simply ignore them. Then you will have to be aware of how the software deals with missing values

Another option is to fill in missing values with the average of nonmissing values. We use this option!

A third option is to examine the nonmissing values in the row of a missing value; these values might provide clues on what the missing value should be
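
The second option (filling with the average of nonmissing values) might look like this in plain Python. Representing a missing value as `None` and the function name are assumptions of this sketch.

```python
import statistics

def fill_with_mean(values):
    # Average only the values that are present.
    present = [v for v in values if v is not None]
    mean = statistics.mean(present)
    # Replace each missing entry with that average.
    return [mean if v is None else v for v in values]

sales = [10, None, 20, 30, None]   # hypothetical column with gaps
filled = fill_with_mean(sales)
```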
