Statistics Flashcards

1
Q

Probability

A

A mathematical tool to study randomness, dealing with the likelihood of an event occurring.

2
Q

Statistics

A

The science that deals with the collection, analysis, interpretation, and presentation of data.

3
Q

Descriptive Statistics

A

Organizing and summarizing data.

4
Q

Inferential Statistics

A

Drawing conclusions from data, using formal methods to determine how confident we can be in those conclusions.

5
Q

Population

A

A collection of persons, things, or objects under study.

6
Q

Sample

A

A subset of a population that is studied directly to gain information about the larger population.

7
Q

Statistic

A

A number that represents a property of a sample

8
Q

Parameter

A

A numerical characteristic of a population that can be estimated by a statistic

9
Q

Representative Sample

A

A sample that accurately represents the parameters of the whole population

10
Q

Variable

A

A characteristic or measurement that can be determined for each member of the population.

Typically denoted as X or Y

11
Q

Numerical Variable

A

A variable that takes on values with equal units, such as weight in pounds or time in hours.

12
Q

Categorical Variables

A

Variables that identify a category that the object is in.

13
Q

Data

A

The values of a variable

14
Q

Datum

A

A single value.

15
Q

Qualitative Data

A

The result of categorizing or describing attributes of a population. AKA categorical data.

16
Q

Quantitative Data

A

Numbers. The result of counting or measuring attributes of a population.

17
Q

Quantitative Discrete Data

A

Data that is measured on a scale that has a finite number of values within a finite interval.

18
Q

Quantitative Continuous Data

A

Data measured on a scale that has an infinite number of values within a finite interval

19
Q

Pie Chart

A

A graph in which categories of data are represented by wedges of a disk, each proportional in size to the percentage of individuals in that category

20
Q

Bar Graph

A

A graph in which the length of the bar is proportional to the number of individuals in each category

21
Q

Pareto Chart

A

A bar graph in which bars are ordered from largest to smallest

22
Q

Random Sampling

A

A sampling method in which each individual has an equal chance of being selected for the sample

23
Q

Simple Random Sample

A

A random sampling method in which any group of n individuals is equally likely to be chosen as any other group of n individuals

24
Q

Stratified Sample

A

A sample obtained by dividing the population into groups called strata and then taking a proportionate number from each stratum

25
Q

Cluster Sample

A

A sampling method in which one divides the population into clusters or groups and then randomly selects some of the clusters.

26
Q

Systematic Sample

A

A sampling method in which a starting point is chosen at random and then every nth piece of data from the population is added to the sample

27
Q

Convenience Sampling

A

A non-random method of sampling that involves taking the data that is readily available.

28
Q

Sampling with Replacement

A

A chosen member is returned to the population before the next selection. This allows for the possibility of being chosen more than once.

29
Q

Sampling without Replacement

A

When a member of a population can only be chosen once.
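
As a rough illustration of the difference between the two sampling schemes, the sketch below draws five members from a made-up population of ten, once with replacement and once without. The population, the sample size, and the seed are assumptions chosen only for the example.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable
population = list(range(1, 11))  # hypothetical population of 10 members

# With replacement: each pick goes back into the population,
# so the same member may appear more than once.
with_replacement = random.choices(population, k=5)

# Without replacement: each member can be chosen at most once.
without_replacement = random.sample(population, k=5)

print("with replacement:   ", with_replacement)
print("without replacement:", without_replacement)
```

`random.choices` may repeat a member while `random.sample` never does, which mirrors the two definitions.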

30
Q

Sampling Errors

A

Errors in data resulting from the sampling process such as too small of a sample size

31
Q

Nonsampling Errors

A

Errors in data not resulting from the sampling process such as a defective counter.

32
Q

Sampling Bias

A

Created when some members of a population are more likely to be chosen than other members.

33
Q

Level of Measurement

A

The way a set of data is measured

34
Q

Nominal Scale

A

Used to measure qualitative data. The categories are not ordered in any way

35
Q

Ordinal Scale

A

Similar to the nominal scale, it categorizes. But unlike the nominal scale, it is able to order the data.

36
Q

Interval Scale

A

A measuring scale that has a definite ordering and meaningful differences between data points, but no true starting point (zero)

37
Q

Ratio Scale

A

A quantitative measuring scale in which there is a starting point (0), and ratios can be calculated between data points

38
Q

Frequency

A

The number of times a value of the data occurs

39
Q

Relative Frequency

A

The ratio of the frequency of a particular data point to the total number of outcomes.

40
Q

Cumulative Relative Frequency

A

The sum of all previous relative frequencies plus the relative frequency of the current value.
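
A minimal sketch of frequency, relative frequency, and cumulative relative frequency, using a small made-up data set (the values are hypothetical). The cumulative column includes the current row, so its last entry should equal 1 up to rounding.

```python
from collections import Counter

# Hypothetical data: hours of sleep reported by 10 students.
data = [6, 7, 7, 8, 6, 7, 9, 8, 7, 6]

freq = Counter(data)  # frequency: how many times each value occurs
n = len(data)         # total number of outcomes

cumulative = 0.0
table = []
for value in sorted(freq):
    rel = freq[value] / n  # relative frequency
    cumulative += rel      # running total, including this row
    table.append((value, freq[value], rel, cumulative))

for row in table:
    print(row)  # (value, frequency, relative, cumulative relative)
```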

41
Q

Explanatory Variable

A

The variable that causes a change in another. AKA independent variable.

42
Q

Response Variable

A

A variable that changes as a result of a change in the explanatory variable. AKA dependent variable.

43
Q

Treatments

A

The different values of the explanatory variable

44
Q

Experimental Unit

A

A single object or individual to be measured

45
Q

Lurking Variables

A

Additional variables that can cloud a study

46
Q

Random Assignment

A

Refers to randomly assigning the experimental units to the treatment groups.

47
Q

Control Group

A

A group that is given a placebo, a treatment that cannot influence the response variable

48
Q

Blinding

A

When a person involved in a research study does not know who is receiving the active treatments and who is receiving the placebo

49
Q

Double Blind Experiment

A

A research study in which both the researchers and the subjects are blinded

50
Q

Descriptive Statistics

A

An area of statistics concerned with displaying data through numerical and graphical ways.

51
Q

Stem-and-Leaf Graph or Stemplot

A

A two column table, [‘stem’, ‘leaf’], with the leaf being the data point’s final significant digit and the stem being the rest of the digits. The rows are listed in order from least to greatest.

52
Q

Outlier

A

An observation of data that does not fit the rest of the data. Sometimes called an extreme value.

53
Q

Line Graph

A

A graph that uses the x-axis to plot one variable and the y-axis to plot another variable. Line segments are used to connect each point.

54
Q

Bar Graphs

A

A graph that uses bars to display the magnitude of the data.

55
Q

Histogram

A

A graph that consists of adjoining boxes. The horizontal axis is labeled with the data it represents while the vertical axis is labeled with either the frequency or relative frequency.

56
Q

Frequency Polygon

A

A line graph with the data on the x axis and the frequency on the y axis

57
Q

Time Series Graph

A

A graph with time on the horizontal axis and the data on the vertical axis

58
Q

Quartiles

A

Measures of location on the horizontal axis. Q1 (25%), Q2 (50% or median), Q3 (75%).

Divides ordered data into quarters.

59
Q

Percentiles

A

Divides ordered data into hundredths.

60
Q

Median

A

The center of the ordered data. If the number N of data points is even, then the median is the average of the values at positions N/2 and N/2 + 1. If N is odd, then it is the value of the ((N-1)/2)+1 data point.

61
Q

Interquartile Range (IQR)

A

The spread between the first and third quartile.

IQR = Q3-Q1
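
One way to compute the quartiles and the IQR is sketched below. The data are hypothetical, and `statistics.quantiles` is used with its default "exclusive" method; textbooks and software differ slightly in how they locate Q1 and Q3, so other conventions may give other values.

```python
import statistics

# Hypothetical data set of 11 ordered values.
data = [1, 3, 4, 6, 7, 9, 10, 12, 15, 18, 21]

# Three cut points that divide the ordered data into quarters.
q1, q2, q3 = statistics.quantiles(data, n=4)

iqr = q3 - q1  # IQR = Q3 - Q1

print("Q1:", q1, "median (Q2):", q2, "Q3:", q3, "IQR:", iqr)
```

For this data set the second quartile matches `statistics.median(data)`.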

62
Q

Box Plots or Box-Whisker Plots

A

Gives a good image of the concentration of data.

Constructed with the minimum value, Q1, the median (Q2), Q3, and the maximum value.

The min/max are the endpoints of the axis, Q1 marks the edge of the box closest to the min and Q3 marks the edge of the box closest to the max.

     |--------|=========|====|-----------------------|
  min      Q1      Median     Q3                    max
63
Q

Mean

A

The sum of N data points divided by N

64
Q

Median

A

For an ordered set V of N data points (indexed from 1): the value V[(N+1)/2] if N is odd, or (V[N/2]+V[N/2+1])/2 if N is even

65
Q

Mode

A

The most frequent value
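
The three measures of center can be sketched in a few lines. The data set is hypothetical; the median is written out by hand to mirror the odd/even rule, using Python's 0-based indexing (so position k becomes index k-1).

```python
import statistics

data = [2, 3, 3, 5, 7, 10]  # hypothetical data, N = 6

# Mean: sum of the N data points divided by N.
mean = sum(data) / len(data)

# Median: middle value of the ordered data (average of the two middle
# values when N is even).
ordered = sorted(data)
n = len(ordered)
if n % 2 == 1:
    median = ordered[(n + 1) // 2 - 1]
else:
    median = (ordered[n // 2 - 1] + ordered[n // 2]) / 2

# Mode: the most frequent value.
mode = statistics.mode(data)

print(mean, median, mode)
```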

66
Q

The Law of Large Numbers

A

As the sample size increases, the sample mean tends to get closer and closer to the population mean

67
Q

Sampling Distribution

A

The distribution of the values that a statistic could take across all possible samples of a given size from a population

68
Q

Symmetrical Distribution

A

Occurs if a vertical line can be drawn at some point for which the image of the left side will mirror the image to the right side

69
Q

Skewed to the Left

A

When a distribution has a longer tail on the left side, so the data is pulled out to the left of the mode.

70
Q

Skewed to the Right

A

When a distribution has a longer tail on the right side, so the data is pulled out to the right of the mode.

71
Q

Standard Deviation

A

A widely used measure of variation.

A number that measures how far data is from the mean.

value = mean + (#ofSTDEV)(standard deviation)

72
Q

Deviation

A

If x is a measured value of a data point in a data set with a mean M, then

deviation = x-M

73
Q

Variance

A

The average of the squares of deviations

x – x̄ for sample

x - μ for population

74
Q

Standard Deviation Formula

A

Sample: s = √(∑(x - x̄)²/(n-1))

Population: σ = √(∑(x - μ)²/N)
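
A short sketch of both formulas, checked against Python's `statistics` module. The data are hypothetical; the same values are treated first as a sample (divide by n-1) and then as a whole population (divide by N) only to show the difference.

```python
import math
import statistics

data = [4, 8, 6, 5, 3, 7]  # hypothetical measurements
n = len(data)
mean = sum(data) / n

# Sum of squared deviations from the mean.
squared_deviations = sum((x - mean) ** 2 for x in data)

s = math.sqrt(squared_deviations / (n - 1))  # sample standard deviation
sigma = math.sqrt(squared_deviations / n)    # population standard deviation

print("s =", s, "sigma =", sigma)
```

`statistics.stdev` and `statistics.pstdev` implement the sample and population versions respectively.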

75
Q

Standard Error of Mean

A

The standard deviation of the sampling distribution of the mean
σ ∕ √n

σ is the standard deviation of the population

n is the sample size

76
Q

Sampling Variability of a Statistic

A

A measure of how much a statistic varies from one sample to another

77
Q

Z-Score

A

The number of standard deviations from the mean

78
Q

Chebyshev’s Rule

A

For any data set:
At least 75% of data is within 2 standard deviations of the mean
At least 89% of data is within 3 standard deviations of the mean
At least 95% of data is within 4.5 standard deviations of the mean
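
The three percentages come from the general bound 1 - 1/k² for k standard deviations (k > 1). That formula is not written on the card; the sketch below, with a hypothetical helper name, assumes it and reproduces the listed values.

```python
def chebyshev_bound(k):
    """Minimum fraction of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2

for k in (2, 3, 4.5):
    print(k, round(chebyshev_bound(k), 4))
```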

79
Q

Empirical Rule

A

For data with a bell-shaped, symmetric distribution:

~68% of data is within 1 standard deviation of the mean
~95% of data is within 2 standard deviations of the mean
>99% of data is within 3 standard deviations of the mean
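
As a check on these percentages, the sketch below assumes the data follow an exactly normal distribution and uses the standard normal cdf from Python's `statistics.NormalDist`. The helper name is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(fraction_within(k), 4))
```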

80
Q

Probability

A

A measure associated with how certain we are of outcomes of a particular experiment or event

81
Q

Experiment

A

Planned operation carried out under controlled conditions

82
Q

Chance Experiment

A

An experiment in which the result is not predetermined

83
Q

Event

A

Any combination of outcomes.

Represented by uppercase letters like A or B

The probability of an event A is written P(A)

84
Q

Outcome

A

The result of an experiment

85
Q

Sample Space

A

The set of all possible outcomes of an experiment

86
Q

Equally Likely

A

Each outcome of an experiment occurs with equal probability

87
Q

Law of Large Numbers

A

As the number of repetitions of an experiment is increased, the relative frequency (# times of a particular outcome/# of total outcomes or repetitions) obtained in the experiment tends to become closer and closer to the theoretical probability

88
Q

Empirical

A

Often used in place of the word observed.

Observed result = empirical result

89
Q

OR Event

A

The outcome is in the event A OR B if the outcome is in A or is in B or is in A and B

90
Q

AND Event

A

An outcome is in A AND B if and only if the outcome is in both A and B

91
Q

Complement

A

All the outcomes that are not in an event A, denoted as A’ (A prime).

92
Q

Conditional Probability

A

The conditional probability of A given B is the probability that an event A occurs given that an event B has already occurred.

P(A|B)

P(A|B) = P(A AND B) / P(B)

93
Q

Independent Events

A

Two events are independent if the occurrence of one event does not affect the chance that the other occurs.

Events A and B are independent if the following are true:

  • P(A | B) = P(A)
  • P(B | A) = P(B)
  • P(A and B) = P(A)P(B)
94
Q

Mutually Exclusive Events

A

Events that cannot occur at the same time.

P(A AND B) = 0

95
Q

Multiplication Rule

A

If A and B are two events defined on a sample space:

P(A AND B) = P(B)P(A | B)

96
Q

Addition Rule

A

If A and B are defined on a sample space:

P(A OR B) = P(A) + P(B) - P(A and B)
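
The conditional probability formula and the multiplication and addition rules can be checked on a small example. The sketch assumes one roll of a fair six-sided die (equally likely outcomes); the events A and B are chosen only for illustration.

```python
from fractions import Fraction

sample_space = set(range(1, 7))  # one roll of a fair six-sided die
A = {2, 4, 6}                    # the roll is even
B = {4, 5, 6}                    # the roll is greater than 3

def P(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

p_a_and_b = P(A & B)            # outcomes in both A and B
p_a_or_b = P(A | B)             # outcomes in A or B (or both)
p_a_given_b = p_a_and_b / P(B)  # P(A|B) = P(A AND B) / P(B)

print("P(A AND B) =", p_a_and_b)
print("P(A OR B)  =", p_a_or_b)
print("P(A|B)     =", p_a_given_b)
print("independent?", p_a_given_b == P(A))
```

Here P(A|B) differs from P(A), so these two events are not independent.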

97
Q

Contingency Table

A

A table consisting of at least two rows and two columns that shows the observed frequency of two variables.

98
Q

Tree Diagram

A

Consists of nodes and branches. Each node represents an event, and each branch is labeled with its probability or frequency.

99
Q

Venn Diagram

A

A box that represents the sample space, and circles/ellipses that represent the individual events. The overlap of circles represents a common outcome between two events.

100
Q

Random Variable

A

A variable that describes the outcomes of a statistical experiment in words.

Denoted by upper case letters like X or Y.

Lower case letters like x or y denote the value of the variable.

Example:

X = the number of heads you get when you toss three fair coins

x = 0,1,2,3

101
Q

Discrete Probability Distribution Function (Discrete PDF)

A

Has two characteristics:

  1. Each probability is between zero and one, inclusive.
  2. The sum of the probabilities is one.
102
Q

Expected Value

A

AKA long-term average or mean

The average value that is expected when an experiment is repeated over the long term

Denoted by μ

μ = Σ (x · P(x))

103
Q

Standard Deviation of a Probability Distribution

A

Denoted by σ.

σ = SQRT( Σ[ (x - μ)^2 · P(x) ])

The square root of the sum of the squared deviations from the mean, each multiplied by its probability
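
A worked sketch of the expected value and standard deviation formulas, using X = the number of heads in three tosses of a fair coin. The probabilities 1/8, 3/8, 3/8, 1/8 follow from counting the eight equally likely outcomes.

```python
import math
from fractions import Fraction

# Discrete PDF of X = number of heads in three fair coin tosses.
pdf = {
    0: Fraction(1, 8),
    1: Fraction(3, 8),
    2: Fraction(3, 8),
    3: Fraction(1, 8),
}

total = sum(pdf.values())  # a valid PDF sums to one

mu = sum(x * p for x, p in pdf.items())                    # expected value
variance = sum((x - mu) ** 2 * p for x, p in pdf.items())  # sum of (x - mu)^2 * P(x)
sigma = math.sqrt(variance)                                # standard deviation

print("mu =", mu, "variance =", variance, "sigma =", round(sigma, 4))
```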

104
Q

Binomial Experiment

A
  1. Fixed number of trials, denoted by n
  2. Only two possible outcomes for each trial: “success” and “failure”. The letter p denotes the probability of success on one trial and q represents the probability of failure on one trial. p + q = 1
  3. The n trials are independent and are repeated using identical conditions.
105
Q

Binomial Probability Distribution

A

The outcomes of a binomial experiment

The random variable X = the number of successes in the n independent trials

The mean μ = np

The variance σ² = npq

The standard deviation σ = √(npq)
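
The sketch below checks μ = np and σ² = npq numerically. It assumes the standard binomial probability formula P(X = x) = C(n, x) · p^x · q^(n-x), which is not written on the card, and uses hypothetical values n = 10 and p = 0.3.

```python
import math

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 10, 0.3  # hypothetical number of trials and success probability
q = 1 - p

pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

mean = sum(x * px for x, px in enumerate(pmf))
variance = sum((x - mean) ** 2 * px for x, px in enumerate(pmf))

print("mean:", mean, "vs np =", n * p)
print("variance:", variance, "vs npq =", n * p * q)
print("standard deviation:", math.sqrt(variance))
```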

106
Q

Bernoulli Trial

A

A binomial experiment in which n=1.

107
Q

X~B(n,p)

A

X is a random variable with a binomial distribution.

B = binomial probability distribution function with parameters n = number of trials and p = probability of success on each trial

108
Q

Geometric Experiment

A
  1. One or more Bernoulli trials with all failures except the last one. In other words, trials are repeated until a success.
  2. The number of trials must be greater than 0 but has no upper limit.
  3. Each trial has the same p and q.

X = # of trials until first success.

109
Q

Geometric Probability Distribution Function

A

X~G(p)

X is a random variable with a geometric distribution.

p = the probability of a success for each trial

110
Q

Hypergeometric Experiment

A
  1. Take samples from two groups
  2. Concerned with a group of interest, “the first group”
  3. Sample without replacement
  4. Each pick is not independent, given the sampling is done without replacement
  5. These are not Bernoulli Trials

X = # of items from the group of interest

111
Q

Poisson Experiment

A
  1. Poisson probability distribution gives the probability of a number of events occurring in a fixed interval of time or space if these events happen with a known average rate and independently of the time since the last event.
  2. The Poisson distribution may be used to approximate the binomial if the probability of success is “small” (such as .01) and the number of trials is “large” (such as 1,000)

X = the number of occurrences in the interval of interest

112
Q

Poisson Probability Distribution Function

A

X~P(μ)

X is a random variable with a Poisson distribution

The parameter μ (or λ) = the mean for the interval of interest.

The standard deviation of the Poisson distribution with mean μ is σ = √μ
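
A rough numerical check of the Poisson approximation to the binomial, using n = 1,000 trials and p = 0.01 so that μ = np = 10. The sketch assumes the standard formulas P(X = x) = e^(-μ) · μ^x / x! for the Poisson and C(n, x) · p^x · q^(n-x) for the binomial; neither is written on the cards.

```python
import math

def poisson_pmf(x, mu):
    """P(X = x) for X ~ P(mu)."""
    return math.exp(-mu) * mu ** x / math.factorial(x)

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 1000, 0.01  # "large" n and "small" p
mu = n * p         # mean for the interval of interest
sigma = math.sqrt(mu)

for x in (5, 10, 15):
    print(x, round(poisson_pmf(x, mu), 5), round(binomial_pmf(x, n, p), 5))
```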

113
Q

Probability Density Function (pdf)

A

The function f(x) that represents a continuous probability distribution.

114
Q

Cumulative Distribution Function (cdf)

A

Measures the area under the curve of the pdf: the cdf gives P(X ≤ x), the area to the left of x.

  1. Outcomes are measured, not counted
  2. The area under the curve is equal to 1
  3. Probabilities are found for intervals of x rather than individual values of x
  4. P(c < x < d) is the area under the curve between c and d
  5. P( x = c) = 0
  6. P( c < x < d) = P(c <= x <= d)
115
Q

Exponential Distribution

A

Often concerned with the amount of time until an event occurs.

Typically has greater numbers of small values and lesser numbers of large values.

116
Q

Memoryless Property

A

P(X > r + t | X > r) = P(X > t) for all r, t >= 0

The probability of waiting more than an additional time t, given that an amount of time r has already passed, is the same as the probability of waiting more than t from the start, regardless of how much time has passed.
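
The property can be checked numerically for an exponential distribution. The sketch assumes the survival function P(X > x) = e^(-mx) with decay parameter m, which is not written on this card; the values of m, r, and t are hypothetical.

```python
import math

def survival(x, m):
    """P(X > x) for an exponential distribution with decay parameter m."""
    return math.exp(-m * x)

m = 0.25          # hypothetical decay parameter (mean waiting time 1/m = 4)
r, t = 3.0, 2.0   # time already waited, additional time

# P(X > r + t | X > r) = P(X > r + t) / P(X > r)
conditional = survival(r + t, m) / survival(r, m)

print(conditional, survival(t, m))
```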

117
Q

Conditional Probability

A

The likelihood that an event will occur given that another event has already occurred.

118
Q

Decay Parameter

A

Describes the rate at which probabilities decay to zero for increasing values of x.

It is the value m in the probability density function f(x) = me^(-mx)

m=1/μ

μ = mean of the random variable

119
Q

Uniform Distribution

A

A continuous random variable that has equally likely outcomes over the domain, a < x < b

120
Q

Standard Normal Distribution

A

A normal distribution of standardized values called z-scores.

Z~N(0,1): the normal distribution X~N(μ,σ) with μ = 0 and σ = 1

121
Q

Z-Scores

A

Measured in units of standard deviation.

z = (x - μ) / σ

122
Q

Normal Distribution PDF

A

f(x) = (1 / (σ•√(2π)))•e^(-(1/2)•((x-μ)/σ)²)
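
A direct translation of the formula, compared against `statistics.NormalDist.pdf` from the Python standard library. The function name and the test values (μ = 65, σ = 4, x = 70) are hypothetical.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Normal density: (1 / (sigma * sqrt(2*pi))) * e^(-(1/2) * z^2)."""
    z = (x - mu) / sigma  # z-score of x
    return (1 / (sigma * math.sqrt(2 * math.pi))) * math.exp(-0.5 * z ** 2)

mu, sigma, x = 65, 4, 70  # hypothetical parameters and data value

print(normal_pdf(x, mu, sigma))
print(NormalDist(mu, sigma).pdf(x))  # library value for comparison
```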

123
Q

Central Limit Theorem

A

If samples of a large enough size n are drawn repeatedly and each sample’s mean is calculated, the histogram created from those means will approximate a normal bell shape.

124
Q

The Central Limit Theorem for Sums

A

If one repeatedly draws samples of a given size and calculates the sum of each sample, the sums will tend to follow a normal distribution as the sample size grows.

The normal distribution has a mean equal to the original mean multiplied by the sample size.

The normal distribution has a standard deviation equal to the original standard deviation multiplied by the square root of the sample size.
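
A small simulation can illustrate the mean and standard deviation of the sums. The sketch assumes rolls of a fair six-sided die (mean 3.5, standard deviation √(35/12)); the sample size, number of repetitions, and seed are arbitrary choices.

```python
import math
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

n = 30         # sample size
trials = 5000  # number of repeated samples

# Sum of each sample of n die rolls.
sums = [sum(random.randint(1, 6) for _ in range(n)) for _ in range(trials)]

die_mean = 3.5
die_sd = math.sqrt(35 / 12)

expected_mean = n * die_mean         # original mean times sample size
expected_sd = math.sqrt(n) * die_sd  # original SD times sqrt(sample size)

print("mean of sums:", statistics.mean(sums), "expected:", expected_mean)
print("SD of sums:  ", statistics.stdev(sums), "expected:", expected_sd)
```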

125
Q

Inferential Statistics

A

Part of statistics concerned with using sample data to make generalizations about an unknown population.

126
Q

Point Estimates

A

Single values used to estimate a parameter within a population.

127
Q

Confidence Interval

A

An interval estimate for an unknown population parameter, built so that a given percentage of such intervals (the confidence level) would contain the parameter.

(point estimate - margin of error, point estimate + margin of error)
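
As one concrete case, the sketch below builds a confidence interval for a population mean. It assumes the population standard deviation σ is known and the sample mean is approximately normal, so the margin of error is z · σ / √n; the function name and the numbers are hypothetical.

```python
import math
from statistics import NormalDist

def mean_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """(point estimate - margin of error, point estimate + margin of error)."""
    # Critical z value leaving (1 - confidence) / 2 in each tail.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Hypothetical sample: mean 68, known sigma 3, n = 36, 90% confidence.
low, high = mean_confidence_interval(68.0, 3.0, 36, confidence=0.90)
print(round(low, 2), round(high, 2))
```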

128
Q

Hypothesis Test

A

Collecting data from a sample and evaluating the data.

  1. Set up two contradictory hypotheses
  2. Collect sample data
  3. Determine the correct distribution to perform the hypothesis test
  4. Analyze sample data by performing calculations that will ultimately allow you to reject or decline to reject the null hypothesis
  5. Make a decision and write a meaningful conclusion
129
Q

Null Hypothesis

A

Denoted as H₀

A statement of no difference between the variables - they are not related.

130
Q

Alternative Hypothesis

A

Denoted as Hₐ

A claim that contradicts the null hypothesis.

The hypothesis that the researcher is trying to prove

131
Q

Hypothesis Testing Outcomes

A
  1. Do not reject null / null is true
  2. Reject the null / null is true
  3. Do not reject null / null is false
  4. Reject the null / null is false
132
Q

Type I Error

A

Rejecting the null when it is actually true

133
Q

Type II Error

A

Not rejecting the null when it is actually false

134
Q

P(Type I Error)

A

Probability of a type I error.

Denoted as α

135
Q

P(Type II Error)

A

Probability of a type II error

Denoted as β

136
Q

Power of the Test

A

The probability of rejecting the null when it is false; equal to 1 - β

137
Q

P-Value

A

The probability, assuming the null hypothesis is true, of getting a result at least as extreme as the one observed in the sample

138
Q

Rejecting or Not Rejecting Null Hypothesis

A
  1. if α > p-value, reject null.
  2. if α ≤ p-value, do not reject null.
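
The decision rule is simple enough to write as a one-line function. The function name, the default α = 0.05, and the p-values are hypothetical.

```python
def decide(p_value, alpha=0.05):
    """Apply the rule: reject the null only when alpha > p-value."""
    return "reject null" if alpha > p_value else "do not reject null"

print(decide(0.03))              # alpha = 0.05 > 0.03
print(decide(0.05))              # alpha = p-value, so no rejection
print(decide(0.03, alpha=0.01))  # a stricter alpha changes the decision
```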

139
Q

Independent Groups

A

Sample groups that are independent from each other

140
Q

Matched Groups

A

Two samples that are dependent on each other

141
Q

Standard Error

A

An estimated standard deviation for a hypothesis test of the difference in sample means

sqrt( (s_1)^2/n_1 + (s_2)^2/n_2 )

142
Q

Chi-Square Distribution

A

X~χ²_df

df is the degrees of freedom

μ = df

σ = sqrt( 2(df) )

143
Q

Student’s t-distribution

A
  • the graph is similar to the standard normal curve
  • the mean is zero and the distribution is symmetric about zero
  • has more probability in its tails than a standard normal distribution
144
Q

Degrees of Freedom

A

The number of independent pieces of information needed to calculate a statistic

145
Q

Goodness-of-Fit Test

A

∑ₖ(O - E)² / E

O = observed values
E = expected values
k = number of different data cells or categories
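
A minimal sketch of the test statistic for hypothetical counts: 100 observations spread over k = 5 categories, with 20 expected in each. The degrees of freedom, df = k - 1, is the usual choice for this test and is an assumption here, since the card does not state it.

```python
observed = [18, 22, 20, 25, 15]  # hypothetical observed counts (O)
expected = [20, 20, 20, 20, 20]  # expected counts (E) under the null

# Sum over the k categories of (O - E)^2 / E.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

k = len(observed)
df = k - 1  # assumed degrees of freedom for a goodness-of-fit test

print("test statistic:", chi_square, "df:", df)
```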
146
Q

Multivariate

A

Data containing multiple variables

147
Q

Linear Regression

A

The process of finding the line that best fits the data

148
Q

Error or Residual

A

The difference between the y value of a data point at x and the regression line at x.

149
Q

Sum of Squared Errors (SSE)

A

∑𝜀²

150
Q

Correlation Coefficient r

A

𝑟= (𝑛𝛴(𝑥𝑦)−(𝛴𝑥)(𝛴𝑦)) / √[[𝑛𝛴𝑥²-(𝛴𝑥)²][𝑛𝛴𝑦²−(𝛴𝑦)²]]

-1 <= r <= 1
Values of r closer to -1 or 1 indicate a stronger linear relationship between x and y
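
The formula for r translates directly into code. The five (x, y) pairs below are hypothetical.

```python
import math

x = [1, 2, 3, 4, 5]  # hypothetical paired data
y = [2, 4, 5, 4, 5]
n = len(x)

sum_x = sum(x)
sum_y = sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)
sum_y2 = sum(yi ** 2 for yi in y)

numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print("r =", round(r, 4), "r squared =", round(r ** 2, 4))
```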

151
Q

Coefficient of Determination

A

The square of the correlation coefficient

Represents the percent of variation in the dependent variable y that can be explained by variation in the independent variable x using the regression line.

152
Q

Significance of the Correlation Coefficient

A

A hypothesis test to decide whether the linear relationship in the sample data is strong enough to use to model the relationship in the population

153
Q

Population vs Sample Correlation Coefficient

A

ρ = population correlation coefficient

ρ is unknown

r = sample correlation coefficient

r is known, calculated from the sample data

154
Q

Significance Level

A

The probability of rejecting a null hypothesis when it is true

155
Q

Outliers

A

Observed data points that are far from the least squares line

Usually identified as being further than two standard deviations from the best-fit line

s = sqrt( SSE / (n-2) )

s is the standard deviation

SSE = sum of squared errors

n = number of data points
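
The two-standard-deviation rule can be sketched as follows. The data are hypothetical, and the line is fitted with the usual least-squares formulas for slope and intercept, which the cards do not spell out.

```python
import math

x = [1, 2, 3, 4, 5, 6, 7, 8]  # hypothetical data
y = [2.1, 3.9, 6.2, 8.0, 9.8, 18.0, 14.1, 16.0]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares slope and intercept (standard formulas, assumed here).
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
intercept = y_bar - slope * x_bar

# Residual: observed y minus the line's value at the same x.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

sse = sum(e ** 2 for e in residuals)  # sum of squared errors
s = math.sqrt(sse / (n - 2))          # standard deviation of the residuals

# Flag points more than two standard deviations from the line.
outliers = [(xi, yi) for xi, yi, e in zip(x, y, residuals) if abs(e) > 2 * s]

print("s =", round(s, 3), "outliers:", outliers)
```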

156
Q

Analysis of Variance (ANOVA)

A

Used to determine the existence of a statistically significant difference among several group means

Assumptions:

  1. Each population from which a sample is taken is normal
  2. All samples are randomly selected and independent
  3. The populations are assumed to have equal SD or variances
  4. The factor is a categorical variable
  5. The response is a numerical variable