Statistics Flashcards

1
Q

Bivariate data

A

Data relating to pairs of variables

2
Q

Variables that are statistically related

A

Correlated

3
Q

How do you identify correlation

A

Scatter graph

4
Q

What goes on x axis

A

Explanatory or independent variable

5
Q

Population

A

The set of things you are interested in

E.g. all people in the UK

6
Q

Census

A

Observes or measures every member of the population

7
Q

Parameter

A

A number that describes the entire population

E.g. the mean or standard deviation

8
Q

Sample

A

Subset of a population

Used to find out information about the whole population

9
Q

Statistic

A

A value calculated from a sample
E.g. the mean or standard deviation of the sample that can be used to estimate the mean of the population or standard deviation of the population

10
Q

Sampling unit

A

An individual unit from the population

E.g. a particular person living in the UK

11
Q

Sampling frame

A

A list of all the sampling units in the population

E.g. the electoral register for the UK

12
Q

Advantage of using a sample over a census

A

Quicker, fewer people have to respond and less data to process
Less expensive

13
Q

Advantage of using a census over a sample

A

Should be a completely accurate result

A sample may not be large enough to give information about small sub groups of the population

14
Q

Sample disadvantage

A

Data may not be accurate

Sample might not be large enough to give information about small sub groups

15
Q

Census disadvantage

A

Takes a long time and expensive
Hard to process the large quantities of data
Cannot be used if the testing process destroys the items being tested

16
Q

Advantages of sampling

A

Quick and not as expensive
Fewer people have to respond
Less data to process

17
Q

Advantages of a census

A

Should give a completely accurate result

18
Q

If you want to know the mean number of sweets in a packet of sweets, why is it not possible to use a census

A

Destroying all the sweets

Can’t use a census if all the sampling units are being destroyed

19
Q

3 methods of random sampling

A

Simple random sampling
Systematic sampling
Stratified sampling

20
Q

Simple random sampling method

A

Number all the items in the population
Use a random number generator to select sample of desired size
If a number is repeated, ignore it and generate another number for the item to be sampled
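
The three steps of this card can be sketched in Python. This is only an illustration: the function name, the population of 200 and the sample of 10 are assumptions, and the seed is there purely so the run is repeatable.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Return sample_size distinct item numbers drawn from 1..population_size."""
    if sample_size > population_size:
        raise ValueError("sample cannot be larger than the population")
    rng = random.Random(seed)  # seed only makes the run repeatable
    chosen = []
    while len(chosen) < sample_size:
        number = rng.randint(1, population_size)  # random number generator
        if number in chosen:
            continue  # number already drawn: ignore it and generate another
        chosen.append(number)
    return chosen

# Hypothetical population numbered 1 to 200, sample of 10
sample = simple_random_sample(population_size=200, sample_size=10, seed=1)
print(sample)
```

Every item number has the same chance of being drawn, which is what makes the sample a simple random sample.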

21
Q

Systematic sampling method

A

Number all items in the population
Let n=population size/sample size
Use random number generator from 1 to n to select the first item
Choose every nth item
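
A minimal Python sketch of the same method, assuming the population size divides into whole intervals (integer division is used otherwise). The function name and the numbers 100 and 20 are made up for illustration.

```python
import random

def systematic_sample(population_size, sample_size, seed=None):
    """Pick a random start in 1..n, then take every nth item number."""
    rng = random.Random(seed)  # seed only makes the run repeatable
    n = population_size // sample_size  # interval between chosen items
    start = rng.randint(1, n)           # random first item between 1 and n
    return [start + i * n for i in range(sample_size)]

# Hypothetical population of 100 items, sample of 20, so n = 5
sample = systematic_sample(population_size=100, sample_size=20, seed=3)
print(sample)
```

Only the first item is chosen at random; every later item is fixed by the interval n.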

22
Q

Stratified sampling method

A

The population divided into groups
Decide how many to sample from each group using…
(Number in group/Number in population)×sample size
Use simple random sampling to select the items from each group
So it is proportional and representative
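
As a rough illustration of the proportional formula, the sketch below splits a hypothetical school population into two year groups and samples each one at random. The group names, sizes and function name are assumptions, not part of the card.

```python
import random

def stratified_sample(groups, sample_size, seed=None):
    """groups maps a group name to the list of items in that group."""
    rng = random.Random(seed)  # seed only makes the run repeatable
    population_size = sum(len(items) for items in groups.values())
    sample = {}
    for name, items in groups.items():
        # (number in group / number in population) x sample size
        k = round(len(items) * sample_size / population_size)
        # simple random sampling (without repeats) inside the group
        sample[name] = rng.sample(items, k)
    return sample

# Hypothetical population: 120 students in Year 12 and 80 in Year 13
groups = {
    "Year 12": list(range(1, 121)),
    "Year 13": list(range(121, 201)),
}
sample = stratified_sample(groups, sample_size=20, seed=2)
print({name: len(items) for name, items in sample.items()})
```

With these numbers the groups contribute 12 and 8 items. When the formula does not give whole numbers, the rounded group sizes may not add up exactly to the sample size and need adjusting by hand.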

23
Q

2 methods of non random sampling

A

Opportunity sampling/convenience sampling

Quota sampling

24
Q

Opportunity sampling method

A

Sample consists of any items available to be sampled
Sample whoever is available until the required number is reached, then ignore any further items
E.g. who walks into the frozen aisle of a supermarket

25
Q

Quota sampling method

A

The population divided into groups
Decide how many to sample from each group using…
(Number in group/Number in population)×sample size
Sample the first “X” for each group and ignore any further items

26
Q

Advantages of simple random sampling

A

Free of bias
Easy and cheap to implement for small populations and small samples
Each sampling unit has a known and equal chance of selection

27
Q

Disadvantages of simple random sampling

A

Not suitable for a large population size or sample size
Large samples are expensive and time consuming and disruptive
Need a sampling frame

28
Q

Advantages of systematic sampling

A

Simple and quick to use

Suitable for a large sample/population

29
Q

Disadvantages of systematic sampling

A

Sampling frame needed

Can introduce bias if the sampling frame isn’t random

30
Q

Advantages of stratified sampling

A

Sample accurately reflects the population

Guarantees proportional representation of groups within a population

31
Q

Disadvantages of stratified sampling

A

Population must be clearly classified into discrete groups

Selection within each group suffers same as simple random sampling e.g. need a sampling frame

32
Q

Advantages of opportunity sampling

A

Easy to carry out

Inexpensive

33
Q

Disadvantages of opportunity sampling

A

Unlikely to provide representative sample
Highly dependent on individual researcher
Not random so may introduce bias

34
Q

Advantages of quota sampling

A

Allows a small sample to still be representative of the population
No sampling frame needed
Quick and easy and inexpensive
Allows for easy comparison between different groups in a population

35
Q

Disadvantages of quota sampling

A

Not random so may introduce bias
Population must be divided into groups which can be costly or inaccurate
Non-responses are not recorded as such
Increasing scope of study increases number of groups which adds time and expense

36
Q

Mean

A

A numerical measure

Given by x̄=Σx/n

37
Q

What’s a median

A

A numerical measure
Its position is given by (n+1)/2 for non-grouped data
And n/2 for grouped data

38
Q

Mode

A

Most common value

39
Q

Range

A

Difference between the highest and lowest data value

40
Q

Lower Quartile

A

Q1
Point that is a quarter of the way along an ordered data set
Given by (n+1)/4 for non-grouped data
And n/4 for grouped data

41
Q

Upper Quartile

A

Q3
Point that is three quarters of the way along an ordered data set
Given by 3(n+1)/4 for non-grouped data
And 3n/4 for grouped data

42
Q

IQR

A

Interquartile range
The difference between the lower and upper quartile
Q3-Q1

43
Q

Variance

A

A measure of spread of data
σ²=Σ(x-x̄)²/n

Where x̄ is the mean
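
A short worked example may help here. The data set below is made up, and the code applies the variance definition (mean of the squared deviations from the mean) directly, then takes the square root for the standard deviation.

```python
import math

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data values
n = len(data)

mean = sum(data) / n                                # x̄ = Σx/n
variance = sum((x - mean) ** 2 for x in data) / n   # Σ(x - x̄)²/n
standard_deviation = math.sqrt(variance)            # σ = sqrt of variance

print(mean, variance, standard_deviation)
```

For this data the squared deviations add up to 32, so the variance is 32/8 = 4 and the standard deviation is 2.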

44
Q

Standard deviation

A

A measure of spread of data

σ=sqrt of variance

45
Q

Can you use your calculator to get the median for linear interpolation

A

No

Not accurate

46
Q

How do you use a calculator to get the mean, median, standard deviation, variance and quartiles

A
Shift
Menu/setup
3. Statistics
Frequency on
Menu/setup
6. Statistics
1. 1-Variable
Input values
AC (sets table)
OPTN
2 (1-Variable calc)
47
Q

Discrete data

A

Can only take certain values and can have gaps

shoe size, money, number of sweets

48
Q

Median for grouped

A

n/2

49
Q

Median for non-grouped data

A

(n+1)/2

50
Q

Continuous data

A

Can take any value in a certain range

height, time, length

51
Q

Linear interpolation assumption

A

Assuming that the data values are evenly distributed within each class
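
To show what this assumption means in practice, here is a hedged sketch of linear interpolation for a grouped median. The class boundaries, frequencies and function name are invented for the example; the position used for grouped data is n/2.

```python
def interpolate(classes, position):
    """classes is a list of (lower boundary, upper boundary, frequency)."""
    cumulative = 0  # cumulative frequency before the current class
    for lower, upper, frequency in classes:
        if cumulative + frequency >= position:
            # assume the values are spread evenly across the class
            fraction = (position - cumulative) / frequency
            return lower + fraction * (upper - lower)
        cumulative += frequency
    raise ValueError("position is beyond the end of the data")

# Hypothetical grouped data: 30 values in four classes
classes = [(0, 10, 5), (10, 20, 12), (20, 30, 8), (30, 40, 5)]
n = sum(frequency for _, _, frequency in classes)

median = interpolate(classes, n / 2)  # 15th value lies in the 10-20 class
print(round(median, 2))
```

Here the 15th value is 10 of the way into a class of 12 values, so the estimate is 10 + (10/12) × 10, about 18.33.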

52
Q

How do you work out standard deviation

A

Root of variance

Or

Page 3/4 of formula book
Where

53
Q

What is coding

A

A way of simplifying statistical calculations

Each data value is coded to make a new set of data values that are easier to work with

54
Q

Coding formula for mean and standard deviation

A

Mean: ȳ=(x̄-a)/b

Standard deviation: σy=σx/b
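
These two rules can be checked numerically. The sketch assumes the coding is y=(x-a)/b, with a=100 and b=10 chosen arbitrarily and a made-up data set.

```python
import statistics

a, b = 100, 10                      # hypothetical coding constants
x = [110, 120, 130, 150, 190]       # hypothetical original data
y = [(value - a) / b for value in x]  # coded data: y = (x - a) / b

mean_x = statistics.mean(x)
mean_y = statistics.mean(y)
sd_x = statistics.pstdev(x)  # population standard deviation (divides by n)
sd_y = statistics.pstdev(y)

print(mean_y, (mean_x - a) / b)  # the two means agree
print(sd_y, sd_x / b)            # the two standard deviations agree
```

Subtracting a shifts the mean but leaves the spread alone, which is why a appears in the mean rule only.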

55
Q

What is an outlier

A

An extreme piece of data which differs significantly from other observed data values
Expected formula will be given in exam

56
Q

What does it mean to clean data

A

Remove outliers
But keep the outliers in unless told otherwise
Mark with an x if you are able to identify them

57
Q

Advantage of mode

A

Useful for non numerical data
Not usually affected by outliers or omissions
Always an observed data value

58
Q

Disadvantage of mode

A

Does not use all data values
May not be representative if low frequency
May not be representative if in a small population

59
Q

Advantage of median

A

Not usually affected by outliers or errors

60
Q

Disadvantage of median

A

Not always a data value

Does not use all data values

61
Q

Advantage of mean

A

When data is large a few extreme values have little effect

Uses all data values

62
Q

Disadvantage of mean

A

May not always be a data value

Affected by outliers and errors if in a small population

63
Q

Advantage of range as a measure of spread

A

Reflects the full data set

64
Q

Disadvantage of range

A

Distorted by outliers

65
Q

Advantage of using the IQR as a measure of spread

A

Not distorted by outliers

66
Q

Disadvantages of using the IQR as a measure of spread

A

Does not reflect the full data set

67
Q

Advantage of using the standard deviation as a measure of spread

A

When data is large a few outliers have negligible impact

68
Q

Disadvantage of using the standard deviation as a measure of spread

A

When a data set is small a few outliers have a large impact

69
Q

What is a box plot

A

Can be drawn to represent important features of data
AKA FIVE FIGURE SUMMARY since it displays the lowest and highest values, the quartiles and the median
Can display any outliers with an x or *

70
Q

When can cumulative frequency be used

A

For grouped data

Can be an alternative way to estimate the median, quartiles or percentiles

71
Q

Do you include outliers in range

A

Yes

Unless told otherwise

72
Q

How do you construct a cumulative frequency graph

A

Calculate cumulative frequency
Appropriate scale
Plot each point at the upper class boundary, NOT the middle of the class
Find the quartile necessary by using the cumulative frequency to read off the value of ‘variable’ like height or time

73
Q

When can a histogram be used

A

Grouped continuous data

Gives a good picture of how data is distributed and allows you to see the rough location and shape of the data spread

74
Q

Relationship between area and frequency in histogram

A

Area is proportional to frequency

75
Q

Frequency density

A
Frequency/class width
Assume there is an equal spread so you use the midpoint of each class
Don't join first and last point
76
Q

What is a frequency polygon

A

Obtained by joining the middle of the top of each bar

77
Q

How do you construct a histogram

A
Frequency density: frequency/class width
Frequency density on the y axis
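
A small numerical sketch of the frequency density calculation, using made-up classes of unequal width. It also checks that each bar's area (density × width) gives back the frequency.

```python
# Hypothetical grouped data: (lower boundary, upper boundary, frequency)
classes = [(0, 10, 5), (10, 15, 10), (15, 20, 15), (20, 40, 8)]

for lower, upper, frequency in classes:
    class_width = upper - lower
    frequency_density = frequency / class_width  # height of the bar
    area = frequency_density * class_width       # equals the frequency
    print(f"{lower}-{upper}: density {frequency_density}, area {area}")

densities = [frequency / (upper - lower) for lower, upper, frequency in classes]
```

The widest class (20 to 40) has the lowest bar even though its frequency is higher than that of the first class, which is why the height must be frequency density and not frequency.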
78
Q

Assumption for using a frequency polygon

A

That the data is spread equally in classes

79
Q

What is an experiment

A

A repeatable process that gives rise to a number of outcomes

80
Q

What is an event

A

A collection of outcomes from an experiment to which a probability is assigned

81
Q

What is a sample space

A

A set of all possible outcomes

82
Q

How are probabilities written

A

Decimals

Fractions

83
Q

What is a random variable

A

A variable whose value depends on the outcome of an event

84
Q

Sample space in terms of discrete probability distribution

A

Range of values a random variable can take

85
Q

When is a variable discrete

A

If it can only take certain numerical values

86
Q

What is a probability distribution

A

Describes the probability of every outcome in the sample space
Several ways this can be displayed e.g
Table
Probability mass function

87
Q

How can probability distribution be displayed

A

Table

Probability mass function

88
Q

What is a discrete uniform distribution

A

Probability distribution in which the probability of each outcome is the same

89
Q

What is a probability density function

A

The distribution for a continuous random variable

The area under the graph of this function represents probability

90
Q

Explain the notation X~B(n,p)

A

Notation for the binomial distribution of X
Where n is the number of trials carried out
And p is the probability of success

91
Q

When can a binomial distribution be applied

A

Two outcomes only, win and lose
There are a fixed number of trials, n
The probability of success is the same for each trial, p
All trials are independent

92
Q

How do you find the probability for a binomial that’s equal to something

A
Menu
7
4 - Binomial PD
2
Enter with =
93
Q

How do you find the probability for a binomial that’s less than or equal to

A
Menu
7
Down
1 - Binomial CD
2
Enter with =
94
Q

How do you find the probability of a binomial that’s less than something

A

Calculator only works out less than or equal to

Change it: P(X < k) = P(X ≤ k-1)

95
Q

How do you find the probability of a binomial that’s greater than or greater than or equal to

A

Calculator only works out less than or equal to

Change it and use 1 minus: P(X > k) = 1 - P(X ≤ k) and P(X ≥ k) = 1 - P(X ≤ k-1)
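
The calculator's Binomial PD and Binomial CD can be imitated in a few lines of Python, which makes the "change it" step explicit. This is a sketch with a hypothetical X~B(10, 0.5); the function names are made up.

```python
from math import comb

def binomial_pd(k, n, p):
    """P(X = k) for X~B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binomial_cd(k, n, p):
    """P(X <= k) for X~B(n, p); an empty sum (k < 0) gives 0."""
    return sum(binomial_pd(i, n, p) for i in range(k + 1))

n, p = 10, 0.5  # hypothetical distribution X~B(10, 0.5)

p_equal = binomial_pd(3, n, p)         # P(X = 3)
p_at_most = binomial_cd(3, n, p)       # P(X <= 3)
p_less = binomial_cd(2, n, p)          # P(X < 3)  = P(X <= 2)
p_greater = 1 - binomial_cd(3, n, p)   # P(X > 3)  = 1 - P(X <= 3)
p_at_least = 1 - binomial_cd(2, n, p)  # P(X >= 3) = 1 - P(X <= 2)

print(p_equal, p_at_most, p_less, p_greater, p_at_least)
```

Note that "less than 3" and "at least 3" both use the cumulative value at 2, not 3, because X only takes whole-number values.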

96
Q

How do you calculate the expected value of a binomial distribution

A

np

97
Q

What is the expectation of a distribution

A

The long-term average
If the experiment was repeated many times the expected value would be the average of the outcomes
E[X] = μ = np
Where n is the number of trials and p is the probability of success

98
Q

What is a hypothesis

A

A statement made about the value of a population parameter

Testing a hypothesis is done by carrying out an experiment or taking a sample from the population

99
Q

What is a test statistic in terms of hypothesis testing

A

The result of the experiment or the statistic that is calculated from the sample
In order to carry out the test there must be two hypotheses

100
Q

What is a null hypothesis

A

H0
The default position
What is expected

101
Q

What is an alternative hypothesis

A

H1
Describes an alternative possibility
More, less, different

102
Q

What is a one tailed test

A

Describes when you are testing whether a parameter is more or less than some number

103
Q

What is a two tailed test

A

When you are testing whether a parameter is not equal to some number

104
Q

2 methods to conduct a hypothesis test

A

Find critical region and compare to test statistic

Find the probability of being at least as extreme as the test statistic and comparing to significance level

105
Q

How do you construct a hypothesis testing conclusion

A

Compare the probability to significance level or test statistic to critical region
Accept/Reject H0
State the outcome - is/is not enough evidence to suggest…

106
Q

Method for hypothesis testing with probabilities

A

State hypotheses
Assume H0 true and state the distribution being used
Expected value and diagram
Calculator to find the probability of interest
Compare to the given significance level and be careful of the tail of interest
If the probability is greater than the given significance then accept H0
Conclusion
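
The method above can be traced with a made-up example: a coin is thrown 20 times and lands heads 15 times, and the test is H0: p=0.5 against H1: p>0.5 at the 5% level. All of these numbers are assumptions chosen for illustration.

```python
from math import comb

def binomial_cd(k, n, p):
    """P(X <= k) for X~B(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

# Hypotheses: H0: p = 0.5, H1: p > 0.5 (one-tailed, upper tail)
n, p0 = 20, 0.5            # assume H0 is true, so X~B(20, 0.5)
observed = 15              # test statistic: number of heads seen
significance_level = 0.05

expected_value = n * p0    # 10, so 15 is in the upper tail

# Probability of a result at least as extreme as the one observed
p_value = 1 - binomial_cd(observed - 1, n, p0)  # P(X >= 15)

reject_h0 = p_value < significance_level
print(round(p_value, 4), "reject H0" if reject_h0 else "accept H0")
```

P(X ≥ 15) is about 0.0207, which is less than 0.05, so H0 is rejected: there is sufficient evidence to suggest the coin is biased towards heads.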

107
Q

When do you accept or reject H0 in hypothesis testing with probabilities

A

When calculated probability is greater than significance level you accept H0 so insufficient evidence to suggest…

When the calculated probability is less than the significance level then you reject H0 so there is sufficient evidence to suggest…

108
Q

What is a critical region

A

The region of a probability distribution which, if the test statistic falls within, would cause the null hypothesis to be rejected
The critical value is the first value to fall inside the critical region

109
Q

What is the critical value

A

The first value to fall inside the critical region

110
Q

Acceptance region

A

Region of a probability distribution which, if the test statistic falls within, would cause the null hypothesis to be accepted (not rejected)

111
Q

What is the actual significance level of a hypothesis test

A

The probability of the test statistic falling in the critical region assuming the null hypothesis is correct

112
Q

What does the location of the critical region depend on

A

The type of alternative hypothesis

113
Q

What is the significance level

A

The probability of incorrectly rejecting the null hypothesis

114
Q

How do you find the critical region

A

State hypotheses
Assume H0 true and state binomial distribution
Calculate the expected value
Determine whether the critical region is before or after this
‘What numbers lie in the “significance level” percent?’
Menu
7
Down
Binomial CD
1 list
Input estimate numbers until you get a suitable value
The probability of the critical region must be lower than the significance level: use P(X ≤ c) at the bottom and P(X ≥ c) = 1 - P(X ≤ c-1) at the top
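
Instead of trying estimates in the calculator list, the same search can be written as a loop. This sketch assumes X~B(20, 0.5) under H0 and a 5% significance level in one tail; the function names are made up, and a tail is accepted only when its probability is strictly less than the significance level.

```python
from math import comb

def binomial_cd(k, n, p):
    """P(X <= k) for X~B(n, p); an empty sum (k < 0) gives 0."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def lower_critical_value(n, p, significance_level):
    """Largest c with P(X <= c) below the significance level (None if none)."""
    critical = None
    for k in range(n + 1):
        if binomial_cd(k, n, p) < significance_level:
            critical = k
        else:
            break
    return critical

def upper_critical_value(n, p, significance_level):
    """Smallest c with P(X >= c) below the significance level (None if none)."""
    for k in range(n + 1):
        if 1 - binomial_cd(k - 1, n, p) < significance_level:
            return k
    return None

n, p0, level = 20, 0.5, 0.05  # hypothetical test
lower = lower_critical_value(n, p0, level)   # critical region X <= lower
upper = upper_critical_value(n, p0, level)   # critical region X >= upper

# Actual significance level: probability of landing in the critical region
actual_lower = binomial_cd(lower, n, p0)
actual_upper = 1 - binomial_cd(upper - 1, n, p0)
print(lower, upper, round(actual_lower, 4), round(actual_upper, 4))
```

For this distribution the lower critical region is X ≤ 5 and the upper one is X ≥ 15, each with an actual significance level of about 0.0207.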

115
Q

How do you find the actual significance level

A

Once you’ve found the critical region

It is the probability that corresponds to it

116
Q

What do you have to be careful of when finding the critical region above the expected value

A

The calculator gives P(X ≤ x), so find the first x where P(X ≤ x) is above 1 - significance level, then add one to that x to get the critical value

117
Q

How do you test hypotheses with the critical region

A

State hypotheses
Assume H0 true and state binomial distribution
E[X] and graph to determine location of critical region
Find critical region:
Menu 7
Down Binomial CD
1 List
Input approximate values
If test value is in the critical region then you reject H0
If test value is not in the critical region it is in the acceptance region for H0 so accept
Conclusion

118
Q

Ø

A

The empty set

No intersections

119
Q

Definition and formula for mutually exclusive events

A

When the events have no outcomes in common

P(AnB) = 0 and P(AuB) = P(A) + P(B)

120
Q

Definition for independent events and formula

A

When one event has no effect on another

P(AnB) = P(A) x P(B)
Formula used to prove and test if independent

121
Q

What can tree diagrams be used for

A

To show two or more events happening in succession

122
Q

Explain the notation P(B|A)

A

The probability that B occurs given A has already occurred

123
Q

What is conditional probability

A

A way of modelling situations in which the probability of an event can change depending on the outcome of a previous event

124
Q

Formula for conditional probability

A

P(B|A) =P(BnA)/P(A)

125
Q

Rule for independent events in conditional probability

A
P(A|B) = P(A|B') = P(A)
P(B|A) = P(B|A') = P(B)
126
Q

Addition formula for the events A and B

A

P(AuB) = P(A) + P(B) - P(AnB)

127
Q

Multiplication rule for conditional probability

A

P(B|A) = P(BnA)/P(A)
So
P(AnB) = P(B|A) x P(A)
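
A quick numerical check of the probability formulas on these cards, using exact fractions. The three starting probabilities are invented for the example.

```python
from fractions import Fraction

# Hypothetical probabilities for two events A and B
p_a = Fraction(1, 2)
p_b = Fraction(1, 3)
p_a_and_b = Fraction(1, 6)  # P(AnB)

p_b_given_a = p_a_and_b / p_a        # P(B|A) = P(BnA)/P(A)
p_a_or_b = p_a + p_b - p_a_and_b     # P(AuB) = P(A) + P(B) - P(AnB)

# Multiplication rule rebuilds P(AnB) from the conditional probability
rebuilt = p_b_given_a * p_a

independent = p_a_and_b == p_a * p_b  # test for independence
mutually_exclusive = p_a_and_b == 0   # test for mutual exclusivity

print(p_b_given_a, p_a_or_b, independent, mutually_exclusive)
```

Here P(B|A) equals P(B), which is another way of seeing that A and B are independent; they are not mutually exclusive because P(AnB) is not zero.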

128
Q

Binomial vs normal distribution

A

Binomial is for discrete data

Normal is for continuous

129
Q

What is a continuous random variable

A

A variable that can take any one of infinitely many values

130
Q

What is the normal distribution

A

A continuous probability distribution that can model naturally occurring characteristics

131
Q

Notation for normal distribution

A

X~N(μ,σ²)

If X is a normally distributed random variable with the population mean μ and variance σ²

132
Q

What are the conditions for normal distribution

A

Symmetrical, mean=median=mode
Has a bell shaped curve with asymptote at each end
Has a total area under the curve of 1
Has points of inflection at μ+σ and μ-σ

133
Q

What is a point of inflection

A

Convex to concave or vice versa

134
Q

Rules for a normally distributed variable

A

Approximately 68% of data lies within one standard deviation of the mean (μ+/-σ)
Approximately 95% of the data lies within two standard deviations of the mean (μ+/-2σ)
Nearly all data (99.7%) lies within three standard deviations of the mean (μ+/-3σ)
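
These percentages can be confirmed with Python's standard library. The mean of 100 and standard deviation of 15 are arbitrary; any normal distribution gives the same three proportions.

```python
from statistics import NormalDist

mu, sigma = 100, 15  # hypothetical mean and standard deviation
X = NormalDist(mu=mu, sigma=sigma)

def proportion_within(k):
    """P(mu - k*sigma < X < mu + k*sigma)."""
    return X.cdf(mu + k * sigma) - X.cdf(mu - k * sigma)

for k in (1, 2, 3):
    print(k, round(proportion_within(k), 4))
```

The output is roughly 0.6827, 0.9545 and 0.9973, matching the 68%, 95% and 99.7% figures.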

135
Q

How do you find probabilities using the normal distribution

A
Menu 7: Distribution
2: Normal CD
Enter μ and σ using =
Fill in the upper and lower boundaries
If only one boundary to use...
Lower = -99999
Upper = 99999
136
Q

Explain P(X=a)=0

A

For a continuous random variable, the probability of any single exact value, a, is zero
Probability is the area under the curve, and a single value has no width so it has no area

137
Q

When do you use the inverse normal

A

When given a probability to calculate a value that satisfies an inequality
Calculator only calculates less than

138
Q

How do you calculate inverse normal

A

Menu 7: Distribution
3: Inverse normal
Area is the area less than the value that satisfies inequality
Since calculator only works out less than
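
The inverse normal step can be sketched with `NormalDist.inv_cdf` from Python's standard library, which also takes the area to the left of the value. The distribution X~N(50, 4²) and the two probabilities are assumptions for the example.

```python
from statistics import NormalDist

X = NormalDist(mu=50, sigma=4)  # hypothetical X~N(50, 4²)

# P(X < a) = 0.9: the area given is already "less than"
a = X.inv_cdf(0.9)

# P(X > b) = 0.2: convert to the area less than b first, 1 - 0.2 = 0.8
b = X.inv_cdf(1 - 0.2)

print(round(a, 2), round(b, 2))
```

A "greater than" probability always has to be turned into the area below the value before it is entered.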

139
Q

Why can’t you use PD for P(X<=1)

A

Must be CD

Since X can also be zero

140
Q

How do you standardise a normally distributed variable

A

By coding the data

So that it is modelled by the standard normal distribution

141
Q

Why is the standard normal distribution useful

A

To standardise a normally distributed variable

By coding it

142
Q

Rules for the standard normal distribution

A

Z~N(0,1)

Has mean 0 and standard deviation 1

143
Q

What can the standard normal distribution be used to find

A

μ or σ if they are unknown

Z=(x-μ)/σ

144
Q

How do you use the standard normal distribution to find μ or σ

A

Z~N(0,1)
Draw both graphs with equivalent areas
Find value of z for which P(Z>/etc)=area
Z=(x-μ)/σ to get value
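
As an illustration of this method, suppose X~N(μ, 3²) and P(X < 20) = 0.9, with μ unknown. These numbers are invented; the steps are to find z from the standard normal distribution and then rearrange Z=(x-μ)/σ.

```python
from statistics import NormalDist

Z = NormalDist(mu=0, sigma=1)  # standard normal distribution Z~N(0,1)

x, sigma = 20, 3      # hypothetical known values
area_below = 0.9      # given: P(X < 20) = 0.9

z = Z.inv_cdf(area_below)  # value of z with P(Z < z) = 0.9

# Rearranging z = (x - mu)/sigma gives mu = x - z*sigma
mu = x - z * sigma
print(round(z, 4), round(mu, 3))
```

The same rearrangement finds σ when μ is known instead: σ = (x-μ)/z.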

145
Q

How can you test hypotheses about the mean of a normally distributed random variable

A

By looking at the mean of a sample called the sample mean

146
Q

Formulas to use for hypothesis testing with the normal distribution

A

For a random sample of size n taken from a random variable X~N(μ,σ²), the sample mean distribution is given by

X̄~N(μ,σ²/n)

The mean is the same but the variance is different

147
Q

What must be used when completing a hypothesis test with the normal distribution

A

The sample mean

Because you are using a sample of a given size and extrapolating that to give conclusions about the whole population

148
Q

Method for using hypothesis testing with normal distribution

A
State hypotheses
Assume H0 true and state the sample mean distribution 
Sketch the graph
Find the probability 
Compare
Conclusion
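
These steps can be followed through with invented numbers: X~N(μ, 5²), a sample of 25 with sample mean 52, testing H0: μ=50 against H1: μ>50 at the 5% level. Under H0 the sample mean has distribution N(50, 5²/25).

```python
from math import sqrt
from statistics import NormalDist

# Hypotheses: H0: mu = 50, H1: mu > 50 (one-tailed, upper tail)
mu0, sigma = 50, 5        # hypothetical population values under H0
n = 25                    # sample size
sample_mean = 52          # observed sample mean
significance_level = 0.05

# Sample mean distribution under H0: N(mu0, sigma²/n)
sample_mean_dist = NormalDist(mu=mu0, sigma=sigma / sqrt(n))

# Probability of a sample mean at least as large as the one observed
p_value = 1 - sample_mean_dist.cdf(sample_mean)

reject_h0 = p_value < significance_level
print(round(p_value, 4), "reject H0" if reject_h0 else "accept H0")
```

The standard deviation of the sample mean is 5/√25 = 1, so 52 is two standard deviations above 50 and the probability is about 0.0228, which is below 0.05.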
149
Q

What goes on y axis

A

Response/dependent variable

Expected to change in response to the other variable

150
Q

What is a regression line

A

A line which fits as well as possible to the points on the scatter graph
Useful to identify a trend

151
Q

PMCC

A

Product Moment Correlation Coefficient
Provides information on the type and strength of the correlation between two variables
Described by ‘r’

152
Q

PMCC of 1

A

Perfect positive correlation

153
Q

PMCC of -1

A

Perfect negative correlation

154
Q

PMCC of 0

A

No correlation

155
Q

PMCC between -0.2 and 0.2

A

Weak/poor correlation

156
Q

PMCC of 0.75 to 1

A

Strong positive correlation

157
Q

PMCC of -0.75 to -1

A

Strong negative correlation

158
Q

Type and strength of correlation for a town’s annual income and the crime rate

A

Moderate negative correlation

159
Q

Give the type and strength of correlation for the heights of fathers and their sons

A

Positive correlation

160
Q

Give the type and strength of correlation between the cooking time for a chicken and the weight of the chicken

A

Strong positive correlation

161
Q

Give the type and strength of the correlation between shoe size and salary

A

No correlation

162
Q

When is a prediction made using a regression line unreliable

A

When the prediction is made in different conditions than those for the original sample data

163
Q

Interpolation in terms of regression line

A

Using a regression line to make predictions which fall within the range of observed data
Stronger correlation means more reliable prediction

164
Q

Extrapolation in terms of regression line

A

Making predictions outside of the range of observed data

Unreliable since no evidence that the pattern extends beyond the observed range

165
Q

How do you find the regression line and PMCC (r)

A
Frequency off
Menu 6: Statistics
2: y=ax+b
Enter data items x and y in table
OPTN 4: Regression calc
Displays a and b for the regression line of form y=ax+b and 'r'/PMCC
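
For comparison with the calculator output, the values a, b and r can be computed from the usual least squares formulas. This is a sketch with made-up data; the names sxx, sxy and syy are just labels for the sums of squares and products.

```python
import math

# Hypothetical paired data
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

a = sxy / sxx                  # gradient of the regression line y = ax + b
b = mean_y - a * mean_x        # intercept
r = sxy / math.sqrt(sxx * syy)  # PMCC

print(round(a, 3), round(b, 3), round(r, 4))
```

For this data the line is approximately y = 1.99x + 0.05 and r is very close to 1, a strong positive correlation.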
166
Q

How do you put frequency on

A

Shift
Setup
3
1 or 2

167
Q

What is causal correlation

A

When a change in one variable does affect the other

168
Q

What is spurious correlation

A

Correlation without causal connection

169
Q

What is a regression line

A

Line of best fit
Y=c+mx

C: when “x” is zero “units”, “C” is the predicted number of “y”

M: every increase in “x” by “1 unit” corresponds to an increase/decrease in “y” by “m units”

170
Q

What is curve fitting used for

A

To model polynomial and exponential relationships

171
Q

Polynomial curve fitting equations

A
If y=ax^n
Then log(y)=log(a)+nlog(x)
Where Y=log(y) and X=log(x), giving the straight line Y=log(a)+nX
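
A hedged numerical check of the polynomial case: the data below were generated from y=2x³, so plotting log(y) against log(x) should give a straight line with gradient 3 and intercept log(2). The helper `fit_line` is a made-up name for an ordinary least squares fit.

```python
import math

def fit_line(xs, ys):
    """Least squares gradient and intercept for ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((xi - mean_x) ** 2 for xi in xs)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(xs, ys))
    gradient = sxy / sxx
    return gradient, mean_y - gradient * mean_x

# Hypothetical data following y = 2x^3
x = [1, 2, 3, 4]
y = [2, 16, 54, 128]

log_x = [math.log10(v) for v in x]
log_y = [math.log10(v) for v in y]

gradient, intercept = fit_line(log_x, log_y)
n_power = gradient       # gradient of the log-log line is n
a = 10 ** intercept      # intercept is log(a), so a = 10^intercept

print(round(n_power, 4), round(a, 4))
```

Both axes are logged here because the unknown power n multiplies log(x).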
172
Q

Exponential curve fitting equations

A
If y=kb^x
Then log(y)=log(k)+xlog(b)
Where Y=log(y), giving the straight line Y=log(k)+xlog(b)
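
A similar check for the exponential case: the data below were generated from y=3×2^x, so plotting log(y) against x should give a straight line with gradient log(2) and intercept log(3). The helper `fit_line` is a made-up name for an ordinary least squares fit.

```python
import math

def fit_line(xs, ys):
    """Least squares gradient and intercept for ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((xi - mean_x) ** 2 for xi in xs)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(xs, ys))
    gradient = sxy / sxx
    return gradient, mean_y - gradient * mean_x

# Hypothetical data following y = 3 * 2^x
x = [0, 1, 2, 3, 4]
y = [3, 6, 12, 24, 48]

log_y = [math.log10(v) for v in y]  # only y is logged for this model

gradient, intercept = fit_line(x, log_y)
b = 10 ** gradient    # gradient is log(b)
k = 10 ** intercept   # intercept is log(k)

print(round(k, 4), round(b, 4))
```

Only y is logged in this case because x appears as the power, so it stays as it is on the horizontal axis.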
173
Q

a and b in y=ab^t

A

a=initial value of y (the value when t=0)

b=factor by which y is multiplied each time t increases by 1

174
Q

Why can you use hypothesis testing with correlation coefficient

A

To determine whether the product moment correlation coefficient, r, for a particular sample indicates that there is likely to be a linear relationship within the whole population

175
Q

r vs ρ for correlation hypothesis testing

A

r is PMCC for a sample

ρ is PMCC for the population

176
Q

Explain the hypotheses for correlation hypothesis testing

A

H0: ρ=0
H1: ρ>0 or ρ<0 or ρ≠0

Positive correlation, negative correlation, correlation

177
Q

Method to find the critical region with PMCC then test hypothesis

A

Page 37 of the formula book: read off the critical value for r using the significance level and sample size
Sketch a number line to determine whether r is negative or positive
Assume no correlation (H0) to test the alternative hypothesis
If r lies in the critical region (further from 0 than the critical value) then reject H0
Conclusion

178
Q

What is the large data set

A

Contains the weather data
For 5 UK weather stations
And 3 weather stations overseas

179
Q

Why can’t you predict x for a value of y from the regression line y=mx+c

A

Regression line for y on x

Can only reliably be used to predict the y value