Midterm 1 Flashcards

1
Q

What do measures of central tendency yield?

A

Measures of central tendency yield information about the center (middle) of a group of numbers

2
Q

What is the mode?

A

Mode: the most frequently occurring value in a data set
* Applicable to all levels of data measurement
* Sometimes no mode exists or there is more than one mode (bimodal or multimodal)
* Often used with nominal/ordinal data (e.g., determining the most common hair color or blood type)
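A minimal Python sketch of finding the mode with the standard-library `statistics` module (`multimode` needs Python 3.8 or later). The blood-type values are hypothetical and only illustrate a bimodal nominal data set.

```python
from statistics import mode, multimode

# Hypothetical nominal data: blood types of eight donors
blood_types = ["O", "A", "O", "B", "A", "O", "AB", "A"]

# multimode returns every value tied for the highest frequency,
# in the order each value first appears
print(multimode(blood_types))  # ['O', 'A'], so the data are bimodal

# With a single most frequent value, mode returns that value
print(mode([2, 3, 3, 5, 7]))  # 3
```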

3
Q

What is the median? What are some advantages and disadvantages?

A

Median: the middle value in an ordered array of numbers
* Arrange the values in order (an ordered array)
* The median is the center number of the ordered array or, with an even number of observations, the average of the middle two terms
* Advantage: not affected by extreme values, so often preferable to the mean when the data include some unusually large or small observations (e.g., income in the U.S., house prices in a given area)
* Disadvantage: it does not use all of the information in the data
* Data measurement level must be at least ordinal
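A short Python sketch using the standard-library `statistics` module (the numbers are made up) showing the odd and even cases, and how one extreme value pulls the mean while leaving the median alone:

```python
from statistics import mean, median

odd = [3, 1, 7, 5, 9]       # ordered: 1, 3, 5, 7, 9, so the middle value is 5
even = [3, 1, 7, 5, 9, 11]  # ordered: 1, 3, 5, 7, 9, 11, so the median is (5 + 7) / 2

print(median(odd))   # 5
print(median(even))  # 6.0

# Replace the largest value with an extreme observation
with_outlier = [3, 1, 7, 5, 900]
print(mean(odd), mean(with_outlier))      # the mean jumps from 5 to 183.2
print(median(odd), median(with_outlier))  # the median stays at 5
```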

4
Q

What is the arithmetic mean?

A

Arithmetic Mean: the average of a group of numbers, computed as the sum of the values divided by the number of values
* Most common measure of central tendency
* Includes all information in the data set

5
Q

What are percentiles?

A

Percentiles: measures of central tendency that divide a group of data into 100 parts
* At least n% of the data lie at or below the nth percentile, and at most (100 − n)% of the data lie above the nth percentile
* Example: the 90th percentile indicates that at least 90% of the data are equal to or less than it, and at most 10% of the data lie above it
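A minimal Python sketch, assuming one common textbook convention for locating a percentile: compute i = (P/100)·n in the ordered data; if i is a whole number, average the ith and (i + 1)th values, otherwise round i up and take that value. Statistical software uses several other interpolation rules and may return slightly different values. The function name `percentile` and the data are hypothetical.

```python
import math

def percentile(data, p):
    """Pth percentile via the location rule i = (p / 100) * n (one textbook convention)."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(data)
    n = len(ordered)
    i = p * n / 100  # location of the percentile in the ordered array (1-based)
    if i.is_integer():
        k = int(i)
        # Whole-number location: average the kth and (k + 1)th ordered values
        return (ordered[k - 1] + ordered[k]) / 2
    # Fractional location: round up to the next whole position
    return ordered[math.ceil(i) - 1]

data = [14, 12, 19, 23, 5, 13, 28, 17]  # hypothetical values
print(percentile(data, 30))  # i = 2.4, rounded up to position 3 -> 13
print(percentile(data, 25))  # i = 2.0 -> average of 12 and 13 = 12.5
```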

6
Q

What are quartiles?

A

Quartiles: measures of central tendency that divide a group of data into four subgroups
* 25% of the data set is below the first quartile (Q1)
* 50% of the data set is below the second quartile (Q2, also called the median)
* 75% of the data set is below the third quartile (Q3)
* 100% of the data set is at or below the fourth quartile

7
Q

What are measures of variability?

A

Measures of variability: describe the spread or dispersion of a set of data
* Distributions may have the same mean but different variability

8
Q

Explain range and give an advantage and a disadvantage

A

Range: the difference between the largest and the smallest values in a set of data
* Advantage – easy to compute
* Disadvantage – affected by extreme values

9
Q

What is interquartile range?

A

Interquartile range: the range of values between the first and third quartiles (IQR = Q3 − Q1)
* Range of the “middle half”; middle 50%
* Useful when analysts are interested in the middle 50% and not the extremes
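A minimal Python sketch contrasting the range with the interquartile range. It assumes the standard library's `statistics.quantiles` with `method="inclusive"`, which interpolates between ordered values; hand calculations with a textbook location rule can give slightly different quartiles. The data are hypothetical.

```python
from statistics import quantiles

data = [14, 12, 19, 23, 5, 13, 28, 17]  # hypothetical values

data_range = max(data) - min(data)  # largest minus smallest: 28 - 5 = 23

# Quartile cut points Q1, Q2 (median), Q3
q1, q2, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1  # spread of the middle 50% of the data

print(data_range)  # 23
print(q1, q2, q3)  # 12.75 15.5 20.0
print(iqr)         # 7.25
```

Adding one very large value to `data` would change the range a great deal but move the IQR only slightly, which is the robustness difference the two cards describe.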

10
Q

What is variance?

A

Variance: average of the squared deviations about the arithmetic mean for a set of numbers

11
Q

What is the Standard Deviation and what does it allow for?

A

Standard Deviation: square root of the variance
* Closely related to the variance but more easily interpretable
* The standard deviation allows us to apply the empirical rule and Chebyshev’s Theorem
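A minimal Python sketch computing the population variance directly from its definition (the average squared deviation about the mean) and checking it against the standard library's `statistics.pvariance`; the standard deviation is its square root. The helper name `population_variance` and the data are hypothetical.

```python
import math
from statistics import pvariance, pstdev

def population_variance(values):
    mu = sum(values) / len(values)                  # arithmetic mean
    squared_devs = [(x - mu) ** 2 for x in values]  # squared deviations about the mean
    return sum(squared_devs) / len(values)          # their average

data = [5, 9, 16, 17, 18]  # hypothetical values; mean = 13

var = population_variance(data)  # (64 + 16 + 9 + 16 + 25) / 5 = 26.0
sd = math.sqrt(var)              # standard deviation: about 5.10

print(var, pvariance(data))
print(round(sd, 2), round(pstdev(data), 2))
```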

12
Q

Explain the empirical rule

A

Empirical rule: states the approximate percentage of values that lie within a given number of standard deviations of the mean of a set of data if the data are normally distributed (about 68% within ±1σ, 95% within ±2σ, and 99.7% within ±3σ)
* Data must be normally distributed
* Since many real-world variables are approximately normally distributed, the empirical rule is widely used
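A quick Python check of where the empirical-rule percentages come from: for a normal distribution, the share of values within k standard deviations of the mean is erf(k/√2), computed here with `math.erf`. The helper name `share_within` is hypothetical.

```python
import math

def share_within(k):
    """Proportion of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```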

13
Q

Explain Chebyshev’s Theorem

A

Chebyshev’s theorem gives the minimum percentage of values that lie within k standard deviations of the mean: at least 1 − 1/k² for any k > 1. If the distribution is closer to normal, the actual percentage will be greater

Unlike the empirical rule, the data can have any distribution

For example, at least 75% of the data lie within 2 standard deviations of the mean (1 − 1/2² = 0.75), no matter how the data are distributed
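A minimal Python sketch of the Chebyshev bound 1 − 1/k², set beside the exact normal-distribution share (via `math.erf`) to show that the guarantee is conservative when the data happen to be normal. The function names are hypothetical.

```python
import math

def chebyshev_minimum(k):
    """Minimum proportion of any distribution within k standard deviations of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem applies for k > 1")
    return 1 - 1 / k ** 2

def normal_share(k):
    """Exact proportion within k standard deviations when the data are normal."""
    return math.erf(k / math.sqrt(2))

for k in (2, 3):
    print(f"k = {k}: at least {chebyshev_minimum(k):.1%} (normal: {normal_share(k):.1%})")
# k = 2: at least 75.0% (normal: 95.4%)
# k = 3: at least 88.9% (normal: 99.7%)
```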

14
Q

Sample variance and standard deviation estimate what? Why is the denominator important?

A

The sample variance and standard deviation are used as estimators of the population values

The denominator is (n − 1) rather than N, which makes the sample variance an unbiased estimator of the population variance; dividing by n would systematically underestimate it
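A small Python sketch contrasting the two denominators with the standard library: `statistics.pvariance` divides by N (population formula), while `statistics.variance` divides by n − 1 (sample formula). The data are hypothetical.

```python
from statistics import pvariance, variance, stdev

sample = [5, 9, 16, 17, 18]  # hypothetical sample; squared deviations about the mean sum to 130

print(pvariance(sample))  # divides by N: 130 / 5
print(variance(sample))   # divides by n - 1: 130 / 4
print(stdev(sample))      # sample standard deviation: square root of variance(sample)
```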

15
Q

Explain z-scores and what the z scores represent if positive or negative

A

z Scores: represent the number of standard deviations a value (x) is above or below the mean for normally distributed data: z = (x − μ) / σ

Negative z scores indicate that the raw value (x) is below the mean; positive z scores indicate x values above the mean
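A minimal Python sketch of z = (x − μ)/σ; the exam mean of 70 and standard deviation of 8 are hypothetical.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

# Hypothetical exam with mean 70 and standard deviation 8
print(z_score(86, 70, 8))  # 2.0, so 86 is 2 standard deviations above the mean
print(z_score(62, 70, 8))  # -1.0, so 62 is 1 standard deviation below the mean
```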

16
Q

What is the coefficient of variation? What does it measure?

A

The Coefficient of Variation: ratio of the standard deviation to the mean, expressed as a percentage: CV = (σ / μ) · 100%

The CV measures variability relative to the mean and can be used as a measure of risk
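A minimal Python sketch of CV = (σ/μ)·100 using `statistics.pstdev` and `statistics.mean`. The two price series are hypothetical; they show that a series with the smaller standard deviation can still carry more risk relative to its mean.

```python
from statistics import mean, pstdev

def coefficient_of_variation(values):
    """Population standard deviation as a percentage of the mean."""
    return pstdev(values) / mean(values) * 100

stock_a = [57, 68, 64, 71, 62]  # hypothetical prices, mean 64.4, sigma about 4.84
stock_b = [12, 17, 8, 15, 13]   # hypothetical prices, mean 13, sigma about 3.03

cv_a = coefficient_of_variation(stock_a)
cv_b = coefficient_of_variation(stock_b)
print(round(cv_a, 2), round(cv_b, 2))  # about 7.52 and 23.33
```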

17
Q

What are measures of shape?

A

tools that can be used to describe the shape of a distribution of data

18
Q

What is skewness?

A

Skewness: a distribution is skewed when it is asymmetrical (lacks symmetry)

The skewed portion is the long, thin tail of the curve

19
Q

Explain in depth how measures of central tendency relate to skewness

A
  • The relative positions of the mean, median, and mode indicate the skew
  • Symmetric: mean, median, and mode are equal
  • Negatively skewed: mean is less than the median, which is less than the mode
  • Positively skewed: mode is less than the median, which is less than the mean
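A small Python check of the positive-skew ordering (mode < median < mean) on a hypothetical data set with a long right tail, using the standard-library `statistics` module:

```python
from statistics import mean, median, mode

# Hypothetical data with a long, thin tail of large values (positive skew)
right_skewed = [2, 3, 3, 3, 4, 5, 6, 9, 15]

print(mode(right_skewed))    # 3
print(median(right_skewed))  # 4
print(mean(right_skewed))    # about 5.56, pulled toward the tail
```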
20
Q

What does Kurtosis describe?

A

the amount of peakedness of a distribution

21
Q

Explain the box-and-whisker plot

A

a diagram that utilizes the upper and lower quartiles along with the median and the two most extreme values to depict a distribution graphically
Sometimes called the 5-number summary
* A box is drawn around the median with the upper and lower quartiles as the box endpoints (hinges)
* The interquartile range is used to construct the inner fences: Q1 − 1.5 ∙ IQR and Q3 + 1.5 ∙ IQR
* If data fall outside the inner fences, outer fences are constructed: Q1 − 3.0 ∙ IQR and Q3 + 3.0 ∙ IQR
* A line segment (whisker) is drawn from the lower hinge of the box outward to the smallest data value
* A second whisker is drawn from the upper hinge to the largest data value

22
Q

What is the use of box and whisker plots?

A

One use of box-and-whisker plots is to find outliers
* Data values that fall outside the mainstream of values in a distribution are called outliers
o Sometimes merely extremes of the data
o Sometimes due to measurement or recording error
o Sometimes so unusual that they should not be considered with the rest of the data
* Values that are outside the inner fences but inside the outer fences are mild outliers
* Values that fall outside the outer fences are extreme outliers
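A minimal Python sketch of the fence rules: inner fences at Q1 − 1.5·IQR and Q3 + 1.5·IQR, outer fences at Q1 − 3.0·IQR and Q3 + 3.0·IQR. The quartiles come from `statistics.quantiles(..., method="inclusive")`, which is an assumption (textbooks often use their own location rule). The function name `classify_outliers` and the data are hypothetical.

```python
from statistics import quantiles

def classify_outliers(data):
    """Split values into mild and extreme outliers using inner and outer fences."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")  # quartile convention is an assumption
    iqr = q3 - q1
    inner_low, inner_high = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # inner fences
    outer_low, outer_high = q1 - 3.0 * iqr, q3 + 3.0 * iqr  # outer fences
    mild, extreme = [], []
    for x in sorted(data):
        if x < outer_low or x > outer_high:
            extreme.append(x)  # outside the outer fences
        elif x < inner_low or x > inner_high:
            mild.append(x)     # between the inner and outer fences
    return mild, extreme

data = [10, 12, 13, 14, 15, 16, 17, 18, 30, 60]  # hypothetical values
print(classify_outliers(data))  # ([30], [60])
```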

Another use is to determine if the distribution is skewed
* The position of the median in the box gives information about the skew of the middle 50% of the data
o If the median is to the left, the middle 50% is skewed right
o If the median is to the right, the middle 50% is skewed left
* The length of the whiskers shows the skew of the outer values

23
Q

Why does business analytics use descriptive statistics?

A
  • Descriptive statistics are the foundational statistical techniques and numerical measures used to gain an initial understanding of data in business analytics
  • Descriptive statistics allow a business analyst to begin to mine and understand any meanings and/or relationships that might exist in data
24
Q

What is a (random) experiment? Give an example

A

a process that produces well-defined outcome(s)
Sampling every 200th bottle of cola and weighing it

25
Q

What is an event? Give an example

A

an outcome of an experiment

There are 10 bottles that are too full

26
Q

What is an elementary event? Give an example

A

event that cannot be decomposed or broken down into other events

o Elementary events are denoted by lowercase letters

o Suppose that the experiment is to roll a die
o Elementary events are to roll a 1, a 2, a 3, etc.
o In this case, there are six elementary events, e1, e2, etc.

27
Q

What is the sample space?

A

a complete listing of all elementary events (all possible outcomes) for a random experiment

28
Q

What is the classical method of assigning probability?

A

The probability of an individual event occurring is determined by the ratio of the number of items in a population that contain the event (ne) to the total number of items in the population (N): P(E) = ne / N

  • Because ne can never be greater than N, the highest value of a probability is 1
  • The lowest probability, if none of the N possibilities has the desired characteristic, e, is 0
  • Thus, 0≤P(E)≤1
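A minimal Python sketch of P(E) = ne/N, kept as an exact fraction with the standard library's `fractions.Fraction`; the function name `classical_probability` is hypothetical and the die example is only an illustration.

```python
from fractions import Fraction

def classical_probability(n_e, n_total):
    """P(E) = n_e / N, returned as an exact fraction."""
    if not 0 <= n_e <= n_total:
        raise ValueError("n_e must lie between 0 and N")
    return Fraction(n_e, n_total)

# One roll of a fair die: E = "roll an even number" = {2, 4, 6}, so n_e = 3 and N = 6
print(classical_probability(3, 6))  # 1/2
```

The range check encodes 0 ≤ P(E) ≤ 1: n_e can be neither negative nor larger than N.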
29
Q

What is a priori probability?

A

(classical probability): the probability can be determined before the experiment takes place

30
Q

What is the relative frequency of occurrence (empirical probability)?

A

Probability of an event occurring is equal to the number of times the event has occurred in the past divided by the total number of opportunities for the event to have occurred

Based on historical data; the past may or may not be a good predictor of the future

31
Q

What is subjective probability? Give an example

A
  • Based on the insights or feelings of the person determining the probability
  • Different individuals may (correctly or incorrectly) assign different numeric probabilities to the same event
  • The subjective approach is usually limited to experiments that are unrepeatable

Example: an experienced airline mechanic estimates the probability that a particular plane will have a certain type of defect

32
Q

Explain the Venn diagram structure of probability

A
  • Rectangular area represents the sample space for the random experiment and contains all possible outcomes.
  • Circle represents event A and contains only the outcomes that belong to A.
  • Shaded region of the rectangle contains all outcomes not in event A.
33
Q

What are mutually exclusive events?

A
  • Mutually Exclusive Events
    o Events with no common outcomes
    o Occurrence of one event precludes the occurrence of the other event
    o Example: if you toss a coin once and get heads, you cannot also get tails on that toss
34
Q

What are collectively exhaustive events?

A

o Contains all possible elementary events for an experiment
* Rolling a die: {1,2,3,4,5,6}
* Generating a random integer: {< 5, = 5, > 5}
o The sample space for an experiment can be described as mutually exclusive (events do not have any outcome in common) and collectively exhaustive

35
Q

What are complementary events?

A

o Given an event X, the complement of X is defined to be the event consisting of all outcomes that are not in X.
o Complementary events are denoted X′ (or X̄), which is pronounced as “not X”
o In any probability application, either event X or its complement X′ must occur.
P(X′) = 1 − P(X)

36
Q

Explain unions and intersections

A
  • Set notation is the use of braces to group numbers
  • The union of sets X, Y is denoted X ∪ Y
    o Given two events X and Y, the union of X and Y is defined as the event containing all outcomes belonging to X or Y or both
  • The intersection of sets X, Y is denoted X ∩ Y
    o An element is part of the intersection if it is in both set X and set Y
37
Q

Describe the addition laws. When does a special case arise?

A

The General Law of Addition (addition law) is used to find the probability of the union of two events
* The probability that event X or event Y or both will occur (at least one of the two events will occur)
P(X ∪ Y) = P(X) + P(Y) − P(X ∩ Y)

A special case arises for mutually exclusive events: because P(X ∩ Y) = 0, the law reduces to P(X ∪ Y) = P(X) + P(Y)
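A minimal Python sketch checking the general law of addition on one roll of a fair die, using Python sets for events and `fractions.Fraction` for exact probabilities. The events chosen are only illustrations.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # one roll of a fair die

def p(event):
    """Classical probability of an event (a subset of the sample space)."""
    return Fraction(len(event), len(sample_space))

X = {2, 4, 6}  # roll an even number
Y = {4, 5, 6}  # roll a number greater than 3

lhs = p(X | Y)                 # P(X ∪ Y) counted directly: {2, 4, 5, 6}
rhs = p(X) + p(Y) - p(X & Y)   # general law of addition
print(lhs, rhs)  # 2/3 2/3

# Special case: mutually exclusive events share no outcomes, so P(A ∩ B) = 0
A, B = {1}, {6}
print(p(A | B) == p(A) + p(B))  # True
```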

38
Q

Under addition laws, what is a probability matrix, joint probabilities, and marginal probabilities?

A

A probability matrix displays the intersection (joint) probabilities along with the marginal probabilities of a given problem
* When values give the probability of the intersection of two events, the probabilities are called joint probabilities.
o Inner cells show joint probabilities
* Marginal probabilities are found by summing the joint probabilities in the corresponding row or column of the joint probability table.
o Outer cells show marginal probabilities
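A minimal Python sketch of a 2 × 2 probability matrix: the inner cells hold joint probabilities, and the marginal probabilities are row and column sums. The department and shift categories and their probabilities are hypothetical.

```python
from fractions import Fraction

rows = ["Sales", "Operations"]
cols = ["Day shift", "Night shift"]

# Inner cells: hypothetical joint probabilities, e.g. P(Sales ∩ Day shift) = 20/100
joint = {
    ("Sales", "Day shift"): Fraction(20, 100),
    ("Sales", "Night shift"): Fraction(10, 100),
    ("Operations", "Day shift"): Fraction(30, 100),
    ("Operations", "Night shift"): Fraction(40, 100),
}

# Outer cells: marginal probabilities are the row sums and column sums of the joint probabilities
row_marginal = {r: sum(joint[(r, c)] for c in cols) for r in rows}
col_marginal = {c: sum(joint[(r, c)] for r in rows) for c in cols}

# Prints "Sales 3/10", then "Operations 7/10"
for r in rows:
    print(r, row_marginal[r])
# Prints "Day shift 1/2", then "Night shift 1/2"
for c in cols:
    print(c, col_marginal[c])
```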

39
Q

What is the counting rule? Give an example

A

The mn Counting Rule:
* If an operation can be done m ways and a second operation can be done n ways, then there are mn ways for the two operations to occur in order
o A cafeteria offers 5 salads, 4 meats, 8 vegetables, 3 breads, 4 desserts, and 3 drinks
* How many meals are available?
* 5 × 4 × 8 × 3 × 4 × 3 = 5760
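The mn counting rule extends to any number of operations by multiplying the number of ways for each. A short Python check of the cafeteria example with `math.prod` (Python 3.8 or later):

```python
import math

# Number of choices for each course in the cafeteria example
choices = {"salads": 5, "meats": 4, "vegetables": 8, "breads": 3, "desserts": 4, "drinks": 3}

meals = math.prod(choices.values())  # multiply the ways for each operation
print(meals)  # 5760
```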

40
Q

Explain sampling from a population with replacement

A

Sampling from a Population with Replacement:
* Sampling n items from a population of size N with replacement provides N^n possibilities
o Six lottery numbers are drawn from the digits 0 to 9, with replacement: 10^6 = 1,000,000 possibilities

41
Q

Explain sampling from a population without replacement

A

Sampling n items from a population of size N without replacement provides the following number of possibilities: NCn = N! / (n!(N − n)!)
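A minimal Python sketch of both sampling counts: N**n for sampling with replacement (ordered draws, as in the lottery example) and `math.comb(N, n)` = N!/(n!(N − n)!) for sampling without replacement when order does not matter. `math.comb` needs Python 3.8 or later; the function names are hypothetical.

```python
import math

def with_replacement(N, n):
    """Ordered samples of size n drawn with replacement from N items."""
    return N ** n

def without_replacement(N, n):
    """Unordered samples of size n drawn without replacement: N! / (n! (N - n)!)."""
    return math.comb(N, n)

print(with_replacement(10, 6))     # 1000000 (six digits from 0-9, with replacement)
print(without_replacement(10, 6))  # 210
```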

42
Q

What are independent events?

A

o The occurrence or nonoccurrence of one event does not affect the occurrence or nonoccurrence of the other event(s)
o The probability of someone wearing glasses is unlikely to affect the probability that the person likes milk
o Many events are not independent
* The probability of carrying an umbrella changes when the weather forecast predicts rain
If events are independent, then:
P(X | Y) = P(X), and P(Y | X) = P(Y)
P(X | Y) is the probability that X occurs given that Y has occurred.

43
Q

What is conditional probability?

A

Conditional probability: When the probability of one event is dependent on whether some related event has already occurred.

Conditional probabilities can be computed as the ratio of joint probability to a marginal probability.
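A minimal Python sketch computing a conditional probability as joint over marginal, P(X | Y) = P(X ∩ Y) / P(Y), on one roll of a fair die; the events are only illustrations.

```python
from fractions import Fraction

sample_space = set(range(1, 7))  # one roll of a fair die

def p(event):
    return Fraction(len(event), len(sample_space))

even = {2, 4, 6}
greater_than_3 = {4, 5, 6}

# P(even | greater than 3) = P(even ∩ greater than 3) / P(greater than 3)
p_cond = p(even & greater_than_3) / p(greater_than_3)
print(p_cond)  # 2/3
```

Knowing the roll is greater than 3 raises the probability of "even" from 1/2 to 2/3, so these two events are not independent.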

44
Q

What are multiplication laws?

A

General Law of Multiplication
P(X ∩ Y) = P(X) · P(Y | X) = P(Y) · P(X | Y)
* Used to find the joint probability

45
Q

What is the special law of multiplication?

A

Special Law of Multiplication
* If X and Y are independent,
P(X ∩ Y) = P(X) · P(Y)
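A minimal Python sketch that tests independence by checking whether P(X ∩ Y) equals P(X) · P(Y), using events on one roll of a fair die; the helper names are hypothetical.

```python
from fractions import Fraction

S = set(range(1, 7))  # one roll of a fair die

def p(event):
    return Fraction(len(event), len(S))

def independent(x, y):
    """True when the special law of multiplication holds exactly."""
    return p(x & y) == p(x) * p(y)

even = {2, 4, 6}
at_most_4 = {1, 2, 3, 4}
greater_than_3 = {4, 5, 6}

print(independent(even, at_most_4))       # True:  1/3 == 1/2 * 2/3
print(independent(even, greater_than_3))  # False: 1/3 != 1/2 * 1/2
```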

46
Q

What are independent events under conditional probability?

A

Independent Events
If events are independent, then
P(X | Y) = P(X) and P(Y | X) = P(Y)

47
Q

Explain the law of conditional probability

A

Law of Conditional Probability: the conditional probability of X occurring, given that Y is known or has occurred, is expressed as
P(X | Y) = P(X ∩ Y) / P(Y) = P(X) · P(Y | X) / P(Y)

48
Q

What is Bayes’ Rule?

A

Bayes’ Rule extends the use of the law of conditional probabilities to allow revision of original probabilities with new information
P(Xi | Y) = P(Xi) · P(Y | Xi) / [P(X1) · P(Y | X1) + P(X2) · P(Y | X2) + ⋯ + P(Xn) · P(Y | Xn)]

o The denominator is a weighted average of the conditional probabilities, with the weights being the prior probabilities
o The formula allows statisticians to incorporate new information to revise probability estimates
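A minimal Python sketch of Bayes' rule: each prior is multiplied by its conditional probability, and the products are divided by their sum (the weighted average in the denominator). The two-supplier scenario and all of its numbers are hypothetical.

```python
def bayes(priors, likelihoods):
    """Revised (posterior) probabilities P(Xi | Y) from priors P(Xi) and likelihoods P(Y | Xi)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]  # P(Xi) * P(Y | Xi)
    total = sum(joint)  # denominator: the priors weight the conditional probabilities
    return [j / total for j in joint]

# Hypothetical: supplier A provides 65% of parts (8% defective), supplier B 35% (12% defective).
# Given that a part is defective, which supplier most likely made it?
posteriors = bayes([0.65, 0.35], [0.08, 0.12])
print([round(x, 4) for x in posteriors])  # [0.5532, 0.4468]
```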

49
Q

What is statistics?

A

o A science dealing with the collection, analysis, interpretation, and presentation of numerical data
o Collect data -> analyze data -> interpret data -> present findings

50
Q

Population Vs Sample

A

Population:
* A collection of all persons, objects, or items under study
* Can be broadly or narrowly defined

Census: gathering data from the whole population

Sample: gathering data on a subset of the population
* Should be representative of the whole population
* Use information about the sample to infer about the population

51
Q

What are the two branches of statistics?

A

Descriptive
* Uses data gathered on a group to describe or reach conclusions about that same group
* Produces graphical or numerical summaries of data

Inferential
* Gathers data from a sample and uses the statistics generated to reach conclusions about the population from which the sample was taken
* Sometimes called inductive statistics

52
Q

What is a parameter?

A

Parameter: descriptive measure of the population

53
Q

What is a statistic?

A
  • Statistic: descriptive measure of a sample
54
Q

What are the levels of data measurement?

A

Nominal -> ordinal -> interval -> ratio data (levels of data)

55
Q

What is nominal data?

A

Nominal: Used only to classify or categorize
* No quantitative value statement is implied
* Lowest level of measurement
* Examples
o Profession (doctor, lawyer…)
o Sex (male, female)
o Eye color (brown, green, blue)
o Location (zip code)
* Often best represented by pie charts
* For example, class numbers such as 9200 and 9201 are simply names: no quantitative meaning is attached to the numbers, so they are nominal (class 9201 does not have more seats than class 9200)

56
Q

What is ordinal data?

A

Ordinal: ranking or ordering
* Distances between ranks are not always equal
* Nominal and ordinal data are nonmetric data or qualitative data because their measurements are imprecise
* Examples
o Ranking mutual funds by risk
o 50 most-admired companies
o Coffee cup size
* Often used in surveys
o Like a professor very much, not very much, worst professor ever

57
Q

What is interval data?

A

Interval: numerical data in which the distances between consecutive numbers have meaning
* Interval data have equal intervals

Example
* Fahrenheit temperature scale
o The zero point is a matter of convenience or convention
o A temperature of 0 does not mean that there is no temperature
o The amounts of heat between consecutive readings are the same
* Time
o 0 is simply a value on the scale, not the absence of time

58
Q

What is ratio data?

A

Ratio: numerical data in which the distances between consecutive numbers have meaning and the zero value represents the absence of the characteristic being studied
* Highest level of data measurement
* Interval and ratio data are called metric or quantitative data because their measurements are precise
* Example
o Volume
o Weight
o Kelvin temperature

59
Q

Metric Vs Nonmetric data

A

o Nonmetric: nominal and ordinal (qualitative)
o Metric: interval and ratio (quantitative)

60
Q

What does parametric statistics require?

A

require interval or ratio data

61
Q

What can nonparametric statistics be used with?

A

can be used with any data, but nominal and ordinal data require nonparametric methods

62
Q

What is Big data?

A

A collection of large and complex datasets from different sources that are difficult to process using traditional data management and processing applications

63
Q

What are the 5 V’s?

A

Volume
* Ever-increasing size of data and databases

Velocity
* The speed with which the data are available and can be processed

Variety
* Different forms and sources of data

Veracity
* Data quality, correctness, and accuracy

Value
* Sometimes considered a fifth characteristic

64
Q

What are the categories of Business Analytics?

A
  • Descriptive analytics
  • Predictive analytics
  • Prescriptive analytics
65
Q

What is descriptive analytics?

A

Descriptive analytics: takes traditional data and describes what has happened or is happening in a business
o Used to discover hidden relationships and patterns
o Simplest and most commonly used category
o Data visualization
o Also called reporting analytics

66
Q

What is Predictive Analytics?

A

Predictive analytics: finds relationships in the data that are not readily apparent with descriptive analytics
o Patterns or relationships are extrapolated forward in time and the past is used to make predictions about the future
o Topics include regression, time series, forecasting, data mining, statistical modeling, machine learning techniques, decision tree models, and neural networks

67
Q

What is Prescriptive analytics?

A

Prescriptive analytics: examines current trends and likely forecasts to make better decisions
o Optimization models are an example of prescriptive analytics
o Takes uncertainty into account, recommends ways to mitigate risks, and tries to foresee the effects of future decisions
o Uses a set of mathematical techniques that determine optimal decisions given a complex set of objectives, requirements, and constraints
o Topics include management science or operations research aimed at optimizing performance of a system such as mathematical programming, simulation, and network analysis

68
Q

What is data mining?

A

Data mining: collecting, exploring, and analyzing large volumes of data to uncover hidden patterns to enhance decision making

69
Q

What is Data visualization?

A

Data visualization: the study of the visual representation of data; it is employed to convey data or information by imparting it as visual objects

70
Q

What is a discrete random variable?

A

Discrete random variable
o The set of all possible values is finite or countably infinite
o Usually takes nonnegative whole-number values
o Example: 6 people are randomly selected from a population and the number of left-handed people is counted. The random variable is discrete because the only possible values are {0, 1, 2, 3, 4, 5, 6}; a non-whole number cannot occur.

71
Q

What are continuous distributions?

A

Take on values at every point over a given interval
o No gaps or un-assumed values
o Are generated from things that are measured
o Examples: time, weight, height, and volume
o Once this type of data is recorded, it becomes discrete data because the values are rounded off to a discrete number

72
Q

What are the 3 types of discrete distributions?

A
  • Binomial distribution
  • Poisson distribution
  • Hypergeometric distribution
73
Q

What are the 6 continuous distributions?

A
  • Uniform distribution
  • Normal distribution
  • Exponential distribution
  • t distribution
  • Chi-square distribution
  • F distribution
74
Q

What are the binomial assumptions?

A

Binomial assumptions
* The experiment involves n identical trials
* Each trial has only 2 possible outcomes, denoted as success or failure
* Each trial is independent of the previous trials
* The terms p and q remain constant throughout the experiment, where p is the probability of getting a success on any one trial and q = 1 − p is the probability of getting a failure on any one trial

75
Q

Binomial trials must be what?

A

Independent
o This means that either the experiment is by nature one that produces independent trials, or the experiment is conducted with replacement.

76
Q

Explain the mean and standard deviation for a binomial distribution?

A

A binomial distribution has an expected value, or long-run average, denoted by μ, where μ = n · p

The standard deviation of a binomial distribution is σ = √(n · p · q)
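A minimal Python sketch of the binomial formulas μ = n·p and σ = √(n·p·q), checked against the mean recovered from the probability function P(x) = nCx · p^x · q^(n − x) (using `math.comb`, Python 3.8 or later). The values n = 6 and p = 0.1 are hypothetical.

```python
import math

def binomial_pmf(x, n, p):
    """P(X = x) = nCx * p**x * q**(n - x), with q = 1 - p."""
    q = 1 - p
    return math.comb(n, x) * p ** x * q ** (n - x)

n, p = 6, 0.1  # hypothetical: 6 independent trials, success probability 0.1 on each
q = 1 - p

mu = n * p                    # expected value: 0.6
sigma = math.sqrt(n * p * q)  # standard deviation: sqrt(0.54), about 0.73

# The same mean, recovered from the full distribution over x = 0, 1, ..., n
mean_from_pmf = sum(x * binomial_pmf(x, n, p) for x in range(n + 1))
print(round(mu, 4), round(mean_from_pmf, 4), round(sigma, 4))
```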

77
Q

What is the Poisson distribution?

A
  • Poisson distribution focuses only on the number of discrete occurrences over some interval or continuum
  • Another discrete distribution
  • Has been referred to as the law of improbable events
  • Often used to describe the number of random arrivals per some time interval
78
Q

What are the characteristics of the Poisson distribution?

A
  • Discrete distribution
  • Describes rare events
  • Each occurrence is independent of the other occurrences
  • It describes discrete occurrences over a continuum or interval
  • The occurrences in each interval can range from zero to infinity
  • The expected number of occurrences must hold constant throughout the experiment
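A minimal Python sketch assuming the standard Poisson probability function P(x) = λ^x · e^(−λ) / x!, where λ is the constant mean number of occurrences per interval; the rate of 3.2 calls per minute is hypothetical.

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) = lam**x * e**(-lam) / x!, with lam the mean number of occurrences per interval."""
    return lam ** x * math.exp(-lam) / math.factorial(x)

lam = 3.2  # hypothetical: an average of 3.2 calls per minute

# Probabilities of 0, 1, 2, and 3 calls in a given minute
for x in range(4):
    print(x, round(poisson_pmf(x, lam), 4))
```

Because occurrences can range from zero to infinity, the probabilities sum to 1 only over all x; in practice the terms become negligible quickly.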
79
Q

Give an example of a Poisson distribution

A
  • Number of telephone calls per minute at a small business
  • Number of hazardous waste sites per province in Canada
80
Q
A