Stats Flashcards

1
Q

Most common observational study?

A

Surveys

2
Q

What are surveys? (Observational study)

A

Questionnaires presented to individuals, selected from a POPULATION OF INTEREST

3
Q

What is the role of surveys (what they can and can’t do)?

A
  • Can only report relationships between variables
  • Cannot claim CAUSE and EFFECT

4
Q

What is an experiment?

A

A systematic procedure carried out under controlled conditions

5
Q

What is the role of experiments (3)?

A
  • To discover an unknown effect
  • To illustrate a known effect
  • To test OR establish a hypothesis
6
Q

What should experiments be designed to do?

A

Minimise BIASES that might occur

7
Q

When analysing a process, experiments are used to evaluate…

A
  • Which PROCESS INPUTS have a significant impact on the PROCESS OUTPUTS
8
Q

What is the name of the methodology covering the several different ways to collect experimental process input/output information?

A

Design of Experiments (DOE)

9
Q

Purpose of experimentation… (6)

A
  • Comparing alternatives
  • Identifying the significant inputs (factors) which affect the output (response)
    I.e. separating the vital few from the trivial many
  • Achieving an OPTIMAL PROCESS OUTPUT (response)
  • Reduce Variability
  • Minimizing, Maximizing, or Targeting an Output
  • Achieve product & process robustness
10
Q

To minimize bias, you need to…

A

Select your sample of individuals randomly!

11
Q

What are the three types of data that can be collected? (3)

A
  • Categorical data
  • Numerical data
  • Ordinal data
12
Q

What is Categorical data?

A

Records qualities or characteristics about the individual, such as eye color or opinions (agree/disagree)
(NB: if numbers are used to label the categories, they do not have "real numerical meaning")

13
Q

What is Numerical data?

A

Records measurements or counts regarding each individual

14
Q

What is Ordinal data?

A

In between categorical and numerical: the data fall into categories, but the categories have a meaningful order (e.g. rankings from 1st to 5th, best to worst)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

If the data set contains an even number of values… (median)

A

The median is the average of the two values that are in the middle
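
A minimal Python sketch of this rule, for illustration only. The function name `median` and the sample values are made up for the example.

```python
def median(values):
    """Median of a non-empty list of numbers."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single middle value
        return ordered[mid]
    # Even count: average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

# Sorted values are 1, 3, 5, 7, so the two middle values are 3 and 5
even_median = median([7, 1, 5, 3])
```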

16
Q

Standard Deviation? (definition)

A

Quantifies the typical distance from any value in the data set to the centre

17
Q

Standard Deviation (equation)

A

$\sigma = \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
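
As a rough sketch, the formula (with the n - 1 divisor) can be computed step by step in Python. The function name `sample_std` and the data values are illustrative assumptions, not part of the deck.

```python
import math

def sample_std(values):
    """Standard deviation with the n - 1 divisor."""
    n = len(values)
    mean = sum(values) / n
    # Sum of squared distances from the mean
    squared = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared / (n - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # mean is 5, squared distances sum to 32
s = sample_std(data)
```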

18
Q

Properties of standard deviation

A
  • Is never negative
  • Smallest possible value is zero
  • Affected by OUTLIERS
  • Has the same UNITS as the original data
19
Q

A random variable is…

A

a variable whose possible values are numerical outcomes of a RANDOM PHENOMENON

20
Q

Types of random variables:

A
  • Continuous
  • Discrete

21
Q

A probability distribution is…

A

a list of possible values of a random variable, together with their probabilities

22
Q

A binomial distribution is…

A

a frequency distribution of the possible number of successful outcomes in a given number of trials, each of which has the same probability of success (i.e. every trial ends in SUCCESS or FAILURE)

23
Q

Characteristics of a Binomial Distribution (4)

A
  • Must be a fixed number of trials (n)
  • Only two outcomes: SUCCESS/FAILURE
  • The probability of success (p) must remain the same for each trial
  • The outcomes of each trial must be INDEPENDENT of each other
24
Q

If a random variable X has a binomial distribution, PROBABILITIES for X can be calculated using the following formula:

A

$P(X = x) = \binom{n}{x}\, p^{x} (1 - p)^{n - x}$
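
A small Python sketch of the formula, assuming the standard library's `math.comb` for "n choose x". The function name `binomial_pmf` and the coin-toss numbers are illustrative.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for x successes in n trials with success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Exactly 2 heads in 4 fair coin tosses: 6 * 0.5**4
prob_two_heads = binomial_pmf(2, 4, 0.5)
```

Summing `binomial_pmf(x, n, p)` over x = 0..n should give 1, which is a quick sanity check on any implementation.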

25
Q

Binomial Distribution parameters:

A
n = no. trials
x = no. successes
n-x = no. fails
p = success probability (any trial)
1-p = failure probability
26
Q

A binomial random variable X can take values between…

A

0 and n (the least/most no. of successes in n trials)

27
Q

For a binomial random variable the mean is:

A

µ = n·p

28
Q

The variance of a random variable is…

A

The weighted average of the squared distances from the mean

29
Q

The variance of a binomial random variable is… (formula)

A

sigma^2 = n*p(1-p)
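
A hedged numerical check in Python: for a binomial variable, the "weighted average of squared distances from the mean" definition should agree with the n·p and n·p(1-p) shortcuts. The values n = 10 and p = 0.3 are arbitrary choices for the example.

```python
from math import comb

n, p = 10, 0.3  # illustrative values only

# Probability of each possible number of successes, 0..n
pmf = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]

# Mean: each value weighted by its probability
mean = sum(x * px for x, px in enumerate(pmf))

# Variance: squared distances from the mean, weighted by probability
variance = sum((x - mean) ** 2 * px for x, px in enumerate(pmf))
```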

30
Q

Discrete random variable:

A

A variable which can only take a countable number of values

31
Q

Continuous random variable:

A

A random variable that can take on any value within AN INTERVAL (or has so many possible values that they might as well be considered continuous)

32
Q

The most widely used distribution for continuous random variables:

A

The normal distribution

33
Q

The Normal Distribution: Definition

A

Random Variable X follows a normal distribution if its values fall into a bell-shaped continuous curve that is symmetric

34
Q

The Normal Distribution: Fundamental characteristics (3)

A
  • The area under the curve is EQUAL TO UNITY
  • It has symmetry about the centre (i.e., it has 50% of values less than the mean and 50% greater than the mean)
  • Each normal distribution is described via the mean, µ, and the standard deviation, σ
35
Q

Inflection points (a.k.a. saddle points) of the normal curve:

A

Where the bell-shaped curve changes from concave down to concave up.

36
Q

Distance between the mean and the inflection (saddle) points

A

1 σ

37
Q

For any normal distribution, almost all its values lie within __ standard deviations of the mean

A

3

38
Q

The Standard Normal Distribution, AKA:

A

The Z-Distribution

39
Q

The Standard Normal Distribution has mean equal to:

A

0

40
Q

The Standard Normal Distribution has S.D. equal to:

A

Unity

41
Q

The normal random variable of a standard normal distribution is called a…

A

Standard score / z-value

42
Q

A value on the Z-distribution represents…

A

the number of standard deviations the data is above or below the mean

43
Q

68% of Standard normal distribution values are:

A

within 1 σ of the mean

44
Q

95% of Standard normal distribution values are:

A

within 2 σs of the mean

45
Q

99.7% of Standard normal distribution values are:

A

within 3 σs of the mean
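
The 68 / 95 / 99.7 figures can be checked with a short Python sketch, assuming `statistics.NormalDist` from the standard library (Python 3.8+). The variable names are illustrative.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Share of values within k standard deviations of the mean, for k = 1, 2, 3
within = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}
```

The results are approximately 0.683, 0.954 and 0.997, so the percentages on the cards are rounded values.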

46
Q

To change a value of X into a value of Z, you can use this formula:

A

z = (X - µ)/σ

47
Q

If your problem follows a normal distribution, what steps (a to f) do you follow to find a probability for X?

A

a) Define your problem as either P(X<a), P(X>b), or P(a<X<b)
b) Calculate the corresponding z-values via: Z = (X - µ)/σ
c) Find the probability for the transformed Z-value using the Z-table
d) If P(X<a), the table value is the answer: problem solved!
e) If P(X>b), the result is one minus the probability determined under c): problem solved!
f) If P(a<X<b), find the probabilities for a and b as in c) and subtract the results: problem solved!
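
A sketch of this procedure in Python, under the assumption that `statistics.NormalDist().cdf` stands in for the Z-table lookup. The function name `normal_probability`, its parameters, and the example numbers (mean 100, standard deviation 15) are hypothetical.

```python
from statistics import NormalDist

def normal_probability(mu, sigma, a=None, b=None):
    """Return P(X < a), P(X > b), or P(a < X < b) for a normal X."""
    std_normal = NormalDist()  # the Z-distribution: mean 0, sd 1

    def table(x):
        z = (x - mu) / sigma      # transform the X value into a z-value
        return std_normal.cdf(z)  # Z-table lookup: P(Z < z)

    if a is not None and b is not None:
        return table(b) - table(a)  # P(a < X < b): subtract the two results
    if a is not None:
        return table(a)             # P(X < a): the table value itself
    if b is not None:
        return 1 - table(b)         # P(X > b): one minus the table value
    raise ValueError("give a, b, or both")

p_less = normal_probability(100, 15, a=115)
p_greater = normal_probability(100, 15, b=130)
p_between = normal_probability(100, 15, a=85, b=115)
```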

48
Q

When a sample of data is taken from a given population of data…

A

the statistical results/characteristics vary from sample to sample

49
Q

To build the sampling distribution of the sample mean (3):

A

1) Take a sample of values from random variable X (population)
2) Calculate the mean of the sample, x~
3) Repeat steps 1) and 2) over and over again
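
The three steps can be simulated roughly as follows. This is a sketch only: the uniform population, the sample size of 25, and the 2,000 repeats are arbitrary assumptions made for the example.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# A made-up, non-normal population of 10,000 values
population = [random.uniform(0, 10) for _ in range(10_000)]

def sample_means(population, n, repeats):
    """Steps 1 to 3: draw a sample of size n, take its mean, repeat."""
    return [
        statistics.fmean(random.sample(population, n))
        for _ in range(repeats)
    ]

means = sample_means(population, n=25, repeats=2_000)
```

The list `means` approximates the sampling distribution: its average should sit close to the population mean, and its spread should be close to the population standard deviation divided by sqrt(25).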

50
Q

All the sample means result in a new population which is denoted using random variable

A

X~

51
Q

The sampling distribution of the sample means gives all the possible values of the sample mean and quantifies…

A

how often they occur

52
Q

A sampling distribution has its own…

A

shape, centre, and variability.

53
Q

The mean of SAMPLING DISTRIBUTION X~ is denoted as:

A

µx~

54
Q

The variability characterising a population of values (X) is quantified in terms of

A

Standard deviations

55
Q

The variability in the sample mean X~ is measured in terms of standard errors

A

σx~ = σx/sqrt n

56
Q

If the distribution of X is normal, then also the distribution of X~ is…

A

normal

57
Q

If the distribution of X is unknown or not normal, according to the Central Limit Theorem (CLT), the distribution of X~ can be…

A

approximated with a normal distribution

58
Q

The sampling distribution of X~ can be approximated by the normal distribution if: (2)

A
  • The population has mean µ and standard deviation σ
  • A sufficient number of LARGE, RANDOM samples are taken

59
Q

Further, the larger the sample size, n, the closer the distribution of the sample means will be to a…

A

normal distribution

60
Q

Probability for X~ (formula)

A

Z = (X~-µx~)/(σx/sqrt n)
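
A minimal Python sketch of using this formula to find a probability for the sample mean, taking µx~ equal to the population mean µ. The function name and the numbers (µ = 100, σx = 15, n = 9) are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist

def sample_mean_probability_below(value, mu, sigma, n):
    """P(sample mean < value) for samples of size n."""
    standard_error = sigma / sqrt(n)
    z = (value - mu) / standard_error
    return NormalDist().cdf(z)  # Z-table lookup: P(Z < z)

# Standard error is 15 / 3 = 5, so 105 is one standard error above the mean
p_below = sample_mean_probability_below(105, mu=100, sigma=15, n=9)
```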

61
Q

Confidence Interval:

A

A range of values so defined that there is a specified probability that the value of a parameter lies within it
- sample statistic ± (margin of error) gives a range of likely values for the parameter under investigation.

62
Q

The goal when making an estimate using a confidence interval is to

A

minimise the margin of error.

63
Q

The size of the margin of error is affected by:

A

1) Confidence level
2) Sample size
3) Variability in the population

64
Q

Confidence Level:

A

The probability that the value of a parameter falls within a specified range of values.
… in other words, the confidence level of a confidence interval corresponds to the percentage of the time the result would be correct if numerous random samples were taken.

65
Q

For a given confidence level, the number of standard errors to be added and subtracted (±) is proportional to…

A

z*, the critical value, which is determined from the standard normal (Z-) distribution

66
Q

The confidence interval for a population mean is:

A

x~ ± z*(σx/sqrt n)
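
A hedged Python sketch of this interval, assuming σx is known and taking z* from `statistics.NormalDist().inv_cdf`. The function name and the example inputs are illustrative.

```python
from math import sqrt
from statistics import NormalDist

def mean_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Return (low, high) for the population mean."""
    # z* leaves (1 - confidence) / 2 in each tail of the Z-distribution
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z_star * sigma / sqrt(n)  # margin of error
    return sample_mean - margin, sample_mean + margin

# Sample mean 50, sigma 10, n = 100: margin is about 1.96
low, high = mean_confidence_interval(50.0, 10.0, 100)
```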

67
Q

This means that as n increases both the standard error and the margin of error decrease, with this resulting in a

A

narrower confidence interval

68
Q

as the confidence level increases,

A

the margin of error increases

69
Q

When estimating a population mean, the sample size needed to achieve the desired margin of error can be estimated a priori via the following formula:

A

n = (z*σx/MOE)^2 (rounded up to the next integer)
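
For illustration, a short Python sketch of the sample-size formula, rounding up with `math.ceil`. The function name and the inputs (σx = 10, margin of error 2, 95% confidence) are made up for the example.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(sigma, margin_of_error, confidence=0.95):
    """Smallest n whose margin of error does not exceed the target."""
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)
    return ceil((z_star * sigma / margin_of_error) ** 2)

# (1.96 * 10 / 2) ** 2 is about 96.04, which rounds up to 97
n_needed = required_sample_size(sigma=10.0, margin_of_error=2.0)
```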

70
Q

If σx is unknown,

A

a pilot test can be run in order to make a rough estimate

71
Q

For a given sample size n, the margin of error can be estimated (very roughly!) via the following formula:

A

MOE ≈ 1/sqrt n

72
Q

Variability (also called spread or dispersion) refers to how

A

spread out a set of data is. Variability is measured in terms of standard errors/deviations

73
Q

To compare two different populations, it is common practice to calculate the confidence interval for the difference of two population means as:

A

x~-y~ ± z*sqrt(σ1^2/n1 + σ2^2/n2)
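
A sketch of the two-population interval in Python. Note that the terms under the square root are the population variances (σ squared) divided by the sample sizes. The function name and the example figures are assumptions.

```python
from math import sqrt
from statistics import NormalDist

def difference_confidence_interval(mean1, sigma1, n1,
                                   mean2, sigma2, n2,
                                   confidence=0.95):
    """Return (low, high) for the difference of two population means."""
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)
    # Variances add; the standard error is the square root of their sum
    margin = z_star * sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    diff = mean1 - mean2
    return diff - margin, diff + margin

# 9/36 + 16/64 = 0.5, so the standard error is sqrt(0.5)
low, high = difference_confidence_interval(10, 3, 36, 8, 4, 64)
```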

74
Q

A hypothesis test is

A

a procedure that uses data from a sample to confirm or deny a claim about a population

75
Q

Every hypothesis test is based on two hypotheses, i.e.:

A
  • the null hypothesis (denoted H0)
  • the research (or alternative) hypothesis (denoted Ha)

76
Q

Ha can be formed in three different ways, the population parameter is _____ to the claimed value (3)

A
  • Not equal to
  • Larger than
  • Smaller than
77
Q

The null hypothesis is set up so that H0 is

A

true unless some data and statistics demonstrate otherwise

78
Q

a statistically significant result is when:

A

H0 is rejected in favour of Ha

79
Q

As soon as the z-value of interest is known, proceed as follows:

A

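⊗ if Ha is the less than alternative then: p-value = P(Z < z-value), i.e. the Z-table probability for the z-value
⊗ if Ha is the greater than alternative then: p-value = 1 - P(Z < z-value)
⊗ if Ha is the not-equal-to alternative then: p-value = 2 × P(Z < -|z-value|), i.e. twice the smaller tail probability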
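
A rough Python sketch of the three cases, where the Z-table probability P(Z < z) is taken from `statistics.NormalDist().cdf`. The function name `p_value` and the labels for the alternatives are hypothetical.

```python
from statistics import NormalDist

def p_value(z, alternative):
    """p-value for a z test statistic under the given alternative Ha."""
    table = NormalDist().cdf  # Z-table lookup: P(Z < z)
    if alternative == "less":
        return table(z)
    if alternative == "greater":
        return 1 - table(z)
    if alternative == "not equal":
        # Twice the smaller tail probability
        return 2 * table(-abs(z))
    raise ValueError("alternative must be 'less', 'greater' or 'not equal'")

p_less = p_value(-1.96, "less")
p_greater = p_value(1.96, "greater")
p_two_sided = p_value(1.96, "not equal")
```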

80
Q

bivariate data set

A

each observation is described using two variables, x and y

81
Q

After organising your bivariate data set, you can…

A

⊗ look for patterns
⊗ find a possible correlation
⊗ predict a value for y for a given value for x
⊗ summarise the dataset with scatterplots

82
Q

given a bivariate data set, it is important to quantify

A

STRENGTH & DIRECTION of linear relationship

83
Q

n in the correlation coefficient equation is..

A

the number of pairs of data

84
Q

we have a strong linear relationship when

A

|r| > 0.6 (i.e. r > +0.6 or r < -0.6)

85
Q

the correlation coefficient is dimensionless, so that changing the units of X and Y

A

does not affect r

86
Q

the correlation coefficient does not change if variables X and Y are

A

switched in the data set

87
Q

The coefficient of determination, R^2 (the square of the Pearson product-moment correlation coefficient, r), ranges between

A

0 to 1 for no to perfect linear correlation (r itself ranges from -1 to +1)

88
Q

Function y=f(x) can be determined using a regression line provided that: (2)

A
  • the data in the scatterplot follow (roughly) a linear distribution
  • we have a strong linear relationship between x and y, i.e. |r| > 0.6
89
Q

To determine m and b, you can use the following relationship

A

m = r(σy/σx), b = y~ - m·x~ (where x~ and y~ are the means of x and y)
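
A minimal Python sketch of fitting the line y = mx + b this way. The intercept is taken as b = (mean of y) - m·(mean of x), the standard least-squares result. The function name and the data points are illustrative assumptions.

```python
from statistics import fmean, stdev

def regression_line(xs, ys):
    """Return (r, m, b) for the regression line y = m*x + b."""
    n = len(xs)
    x_bar, y_bar = fmean(xs), fmean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    # Correlation coefficient from n pairs of data
    r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (
        (n - 1) * s_x * s_y
    )
    m = r * s_y / s_x      # slope
    b = y_bar - m * x_bar  # intercept
    return r, m, b

# Points lying exactly on y = 2x + 1
r, m, b = regression_line([1, 2, 3, 4], [3, 5, 7, 9])
```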

90
Q

A log-log regression line is expressed mathematically as:

A

y= a x^k

91
Q

log-log line

A

Y = mX + b (X = log x, Y = log y; for y = a x^k, m = k and b = log a)
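
As a sketch, a power law y = a x^k can be recovered by fitting a straight line to (log x, log y), since log y = k log x + log a. Base-10 logs, the function name `fit_power_law`, and the data are assumptions for the example.

```python
from math import log10
from statistics import fmean

def fit_power_law(xs, ys):
    """Fit y = a * x**k via a least-squares line through (log x, log y)."""
    X = [log10(x) for x in xs]
    Y = [log10(y) for y in ys]
    X_bar, Y_bar = fmean(X), fmean(Y)
    # Slope and intercept of the line Y = m*X + b
    m = sum((u - X_bar) * (v - Y_bar) for u, v in zip(X, Y)) / sum(
        (u - X_bar) ** 2 for u in X
    )
    b = Y_bar - m * X_bar
    return 10**b, m  # a = 10**b, k = m

# Points lying exactly on y = 3 * x**2
a, k = fit_power_law([1, 2, 4, 8], [3, 12, 48, 192])
```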