4 statistics and probability Flashcards

1
Q

discrete data

A

something you can count

3
Q

continuous data

A

something you measure

4
Q

a hypothesis

A

a statement you test to see if it is true or false

5
Q

raw data

A

data before it has been analysed or processed

6
Q

primary data

A

data you collect yourself

7
Q

secondary data

A

data you use which someone else has collected

8
Q

categorical data

A

data given as words (categories), not numbers

9
Q

numerical data

A

data given as numbers

10
Q

types of numerical data

A

continuous or discrete

11
Q

ordinal data

A

categorical data that has a natural order (e.g. small, medium, large)

12
Q

advantages of secondary data

A

already available
cheaper
easier to obtain

13
Q

advantages of primary data

A

more reliable

you are aware of any bias

14
Q

ways of collecting data

A

measurement or experiment
survey or questionnaire
modelling or simulation

15
Q

mistakes to avoid when doing surveys or questionnaires

A
asking the wrong people or a biased sample
asking leading questions
asking confusing questions
asking personal questions
asking questions that are too open-ended
16
Q

random sample

A

every member of the population has the same probability of being included
the members of a genuinely random sample have to be selected independently.

17
Q

ways of collecting a sample

A

convenience
systematic
genuinely random

18
Q

convenience sample

A

asking whoever is easiest to get hold of

19
Q

systematic sample

A

choosing members at regular intervals, e.g. asking every 3rd person

20
Q

genuinely random sample

A

picking out of a hat or using a random number generator
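The systematic and genuinely random methods can be sketched in Python. This is a minimal illustration, assuming the population is held in a list; the names `people`, `systematic_sample` and `simple_random_sample` are hypothetical, and the `seed` argument only makes the random choice repeatable.

```python
import random


def systematic_sample(population, k, start=0):
    """Take every k-th member of the list, beginning at index `start`."""
    return population[start::k]


def simple_random_sample(population, n, seed=None):
    """Choose n different members; every member is equally likely to be picked."""
    rng = random.Random(seed)
    return rng.sample(population, n)


# hypothetical population of 30 people
people = [f"person{i}" for i in range(1, 31)]

print(systematic_sample(people, 3, start=2))  # person3, person6, ..., person30
print(simple_random_sample(people, 5, seed=1))
```

The random version is the electronic equivalent of picking names out of a hat: `random.sample` never picks the same member twice.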

21
Q

quota sampling

A

Choosing a fixed number (quota) of members of the population with certain characteristics, e.g. 20 men and 20 women; the members within each quota are not chosen at random.

22
Q

stratified sampling

A

Choosing a random sample in a way that the proportion of certain characteristics matches the proportion of those characteristics in the population.
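In a stratified sample, the number taken from each group is usually in proportion to that group's share of the population: n_h = n × N_h / N. A minimal Python sketch, assuming hypothetical year-group sizes; a random sample of the calculated size would then be taken from each group.

```python
def stratified_allocation(strata_sizes, sample_size):
    """How many to sample from each stratum so the sample matches population proportions.

    Each share is sample_size * stratum_size / population_size, rounded to a whole
    number, so in some cases the shares may not add up exactly to sample_size.
    """
    population_size = sum(strata_sizes.values())
    return {
        name: round(sample_size * size / population_size)
        for name, size in strata_sizes.items()
    }


# hypothetical school with 600 pupils; we want a sample of 40
year_groups = {"Year 7": 180, "Year 8": 150, "Year 9": 120, "Year 10": 150}
print(stratified_allocation(year_groups, 40))
# {'Year 7': 12, 'Year 8': 10, 'Year 9': 8, 'Year 10': 10}
```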

24
Q

hypothesis

A

a statement you test to see if it is true or false

25
Q

raw data

A

data before analysis or processing

28
Q

categorical data

A

data given as words, not numbers

29
Q

numerical data

A

data given as numbers

30
Q

ordinal data

A

categorical data with a natural order

32
Q

adv of primary data

A

more reliable

you are aware of any bias

33
Q

ways of collecting primary data

A

measurement or experiment
survey or questionnaire
modelling or simulation

34
Q

random sample

A

every member of the population has the same probability of being included. members are selected independently.

35
Q

what is the opposite of a census

A

a sample: a census surveys every member of the population, a sample surveys only part of it

36
Q

convenience sampling

A

asking friends or those easy to ask

37
Q

systematic sampling

A

e.g. asking every 3rd person

38
Q

genuinely random sampling

A

pick out of a hat or use the random number generator on a calculator.

39
Q

quota sampling

A

the population is divided into groups. a given number is surveyed from each group.

40
Q

cluster sampling

A

the population is divided into groups or clusters. a random sample of clusters is chosen and every item in it is surveyed. a large number of small clusters minimises the chances of being unrepresentative.

41
Q

opinion polls

A

large-scale opinion polls often use a combination of cluster and quota sampling, with groups based on things like geographical area and age. the sample size is large, but it is only a small proportion of the population. opinions also change over time.

42
Q

what is a uniform distribution

A

flat/even: every value or class has about the same frequency

43
Q

what is a normal distribution

A

symmetrical and bell-shaped, peaked in the middle
mean, median and mode are all in the same place (the middle)
also called the Gaussian distribution

44
Q

what is negatively skewed

A

the peak is towards the right and the long tail stretches to the left (the graph leads up to the right)

45
Q

what is positively skewed

A

the peak is towards the left and the long tail stretches to the right (the graph rises quickly on the left, then decreases)

46
Q

box plot left skewed

A

the box is towards the right with a long left whisker, and the median line is towards the right of the box

47
Q

box plot right skewed

A

the box is towards the left with a long right whisker, and the median line is towards the left of the box

48
Q

normal distribution and standard deviations

A

about 68% (roughly 70%) of the data lies within 1 standard deviation of the mean; the remaining 32% (roughly 30%) lies further out, about 16% (roughly 15%) in each tail
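These percentages come from the normal distribution. As a check, Python's `math.erf` gives the exact proportion within k standard deviations of the mean, P(|Z| < k) = erf(k/√2). A small sketch; `proportion_within` is a hypothetical helper name.

```python
import math


def proportion_within(k):
    """Proportion of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


within_1 = proportion_within(1)
print(f"within 1 sd: {within_1:.3f}")                       # within 1 sd: 0.683
print(f"each tail beyond 1 sd: {(1 - within_1) / 2:.3f}")   # each tail beyond 1 sd: 0.159
print(f"within 2 sd: {proportion_within(2):.3f}")           # within 2 sd: 0.954
```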

49
Q

box plot name

A

box and whisker diagram

50
Q

the ends of the box in a box plot are the

A

lower quartile (Q1) and upper quartile (Q3); the length of the box is the interquartile range (IQR = Q3 - Q1)

51
Q

outlier definition

A

a data value that is
more than 2 standard deviations away from the mean (histogram)
OR
more than 1.5 × IQR beyond the nearer quartile
(box and whisker)
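Both outlier rules can be sketched in Python with the standard `statistics` module. This is an illustration under stated assumptions: the data is hypothetical, the 2-standard-deviation rule uses the population standard deviation, and `statistics.quantiles` uses its default "exclusive" method, so its quartiles may differ slightly from a calculator or textbook rule.

```python
import statistics


def iqr_outliers(data):
    """Values more than 1.5 x IQR below Q1 or above Q3 (box and whisker rule)."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low or x > high]


def sd_outliers(data):
    """Values more than 2 standard deviations from the mean (population sd assumed)."""
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)
    return [x for x in data if abs(x - mean) > 2 * sd]


scores = [2, 4, 5, 5, 6, 7, 8, 30]   # hypothetical data
print(iqr_outliers(scores))  # [30]
print(sd_outliers(scores))   # [30]
```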

52
Q

benefits of a curve in a cumulative frequency diagram

A

a smooth curve shows the gradient changing gradually, so where the frequency in a class is smaller, the curve flattens a little. straight lines only join the plotted points and do not show how the data changes between them.

53
Q

datum

A

a single piece of data

54
Q

why do bars not touch with discrete data

A

because there is no continuity between columns

55
Q

what graph do you use for continuous data

A

histograms

56
Q

what graph do you use for discrete data

A

bar graph

57
Q

what graph do you use for cumulative frequency

A

a line graph (a cumulative frequency polygon or curve)

58
Q

how do you plot points on a cf graph

A

plot each cumulative frequency against the upper boundary of its class
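A sketch of how the points for a cumulative frequency graph can be found, assuming classes are given as (lower boundary, upper boundary, frequency) tuples; the data and the `cumulative_points` name are hypothetical. Each running total is paired with the upper boundary of its class, and the graph starts at 0 on the first lower boundary.

```python
from itertools import accumulate


def cumulative_points(classes):
    """Points for a cumulative frequency graph from (lower, upper, frequency) classes.

    Each running total is plotted at the upper boundary of its class, and the graph
    starts at the lower boundary of the first class with a cumulative frequency of 0.
    """
    uppers = [upper for _, upper, _ in classes]
    running_totals = accumulate(freq for _, _, freq in classes)
    return [(classes[0][0], 0)] + list(zip(uppers, running_totals))


# hypothetical journey times in minutes: 0-10, 10-20, 20-30, 30-40
times = [(0, 10, 4), (10, 20, 9), (20, 30, 12), (30, 40, 5)]
print(cumulative_points(times))
# [(0, 0), (10, 4), (20, 13), (30, 25), (40, 30)]
```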

59
Q

For data grouped into intervals or classes, we may identify the following:

A
mid-interval values
interval width (though it is not common to have a varying interval width)
lower interval boundary
upper interval boundary
modal class (the class with the highest frequency or the tallest class in the diagram; be aware, use the tallest class in the frequency diagram, not in the cumulative frequency diagram).
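For grouped data, the mid-interval values are used to estimate the mean, and the modal class is read off the frequencies. A minimal sketch with hypothetical data and a hypothetical `grouped_summary` helper, assuming equal class widths so that the highest frequency is also the tallest bar.

```python
def grouped_summary(classes):
    """Mid-interval values, estimated mean and modal class for (lower, upper, frequency) classes.

    The estimated mean treats every value in a class as if it were at the
    mid-interval value. The modal class is the one with the highest frequency,
    which assumes equal class widths.
    """
    mids = [(lower + upper) / 2 for lower, upper, _ in classes]
    total = sum(freq for _, _, freq in classes)
    estimated_mean = sum(mid * freq for mid, (_, _, freq) in zip(mids, classes)) / total
    lower, upper, _ = max(classes, key=lambda c: c[2])
    return mids, estimated_mean, (lower, upper)


# hypothetical journey times in minutes: 0-10, 10-20, 20-30, 30-40
times = [(0, 10, 4), (10, 20, 9), (20, 30, 12), (30, 40, 5)]
print(grouped_summary(times))
# ([5.0, 15.0, 25.0, 35.0], 21.0, (20, 30))
```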
60
Q

what is the 5 number summary

A
minimum
Q1
median
Q3
maximum
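The five-number summary can be computed with Python's `statistics.quantiles`. A short sketch with hypothetical data; note the assumption that the default "exclusive" quartile method is acceptable, since calculators and textbooks sometimes use slightly different quartile rules.

```python
import statistics


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum)."""
    q1, median, q3 = statistics.quantiles(data, n=4)
    return min(data), q1, median, q3, max(data)


print(five_number_summary([3, 7, 8, 5, 12, 14, 21, 13, 18]))
# (3, 6.0, 12.0, 16.0, 21)
```

For this data the result matches the "median of each half" method often used by hand.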
61
Q

when does a box and whisker plot suggest a normal distribution

A

when it is roughly symmetrical: the median is in the middle of the box and the whiskers are about the same length

62
Q

cumulative frequency polygon

A

The data points are connected by straight lines, which assumes the data is spread evenly within each interval.

63
Q

cumulative frequency curve

A

All the data points are connected by a smooth curve

64
Q

no correlation is

A

dots scattered with no clear pattern

65
Q

strong positive correlation is

A

a line going up to the right with all the dots very close to that line

66
Q

perfect negative correlation

A

a line going down to the right with all the dots on it

67
Q

moderate negative correlation is

A

going gently down to the right with dots around it

68
Q

weak positive correlation is

A

a line faintly going up to the right with dots all around it

69
Q

what is the r of no correlation

A

about 0

70
Q

what is the r of strong positive correlation

A

about 0.9

71
Q

what is the r of perfect negative correlation

A

-1

72
Q

what is the r of moderate negative correlation

A

about -0.5

73
Q

what is the r of weak positive correlation

A

about 0.3

74
Q

what is the r of a curved relationship or no correlation

A

don't draw a straight line of best fit, so r is not meaningful (r only measures straight-line correlation)

75
Q

what are residuals

A

the vertical distances of the points from the line (the observed y value minus the y value on the line)

76
Q

which residuals are positive and negative

A

above the line - positive

below the line - negative

77
Q

what is the sum of all residuals

A

0 (for the least squares regression line)

78
Q

why would we square the residuals

A

so they are all positive and don't cancel each other out

79
Q

what does the sum of squared residuals show

A

how well the line fits the points

80
Q

would a good line have a low or high sum of squared residuals

A

low. the line with the lowest possible sum of squared residuals is called the least squares regression line of y on x
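A short sketch tying the residual cards together, using hypothetical data: it fits the least squares regression line of y on x (gradient Sxy/Sxx, passing through the mean point), checks that the residuals add up to 0, and compares its sum of squared residuals with another line. It uses the y = ax + b labelling from these cards; the function names are illustrative.

```python
def least_squares_y_on_x(xs, ys):
    """Gradient a and intercept b of the least squares line y = ax + b (y on x)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    a = sxy / sxx
    b = y_bar - a * x_bar          # the line passes through the mean point
    return a, b


def residuals(xs, ys, a, b):
    """Observed y minus the y on the line: positive above the line, negative below."""
    return [y - (a * x + b) for x, y in zip(xs, ys)]


xs = [1, 2, 3, 4, 5]   # hypothetical data
ys = [2, 4, 5, 4, 5]
a, b = least_squares_y_on_x(xs, ys)
res = residuals(xs, ys, a, b)
ssr = sum(r * r for r in res)

print(round(a, 10), round(b, 10))   # 0.6 2.2
print(sum(res))                     # very close to 0 (floating-point rounding)
print(round(ssr, 10))               # 2.4

# another line through the mean point (3, 4) fits worse
other = sum(r * r for r in residuals(xs, ys, 0.5, 2.5))
print(other)                        # 2.5
```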

81
Q

if you want to calculate the y values from the x values, how would you plot the line of best fit

A

use the line of y on x, which makes the vertical residuals as small as possible.

82
Q

if you want to calculate the x values from the y values, how would you plot the line of best fit

A

use the line of x on y, which makes the horizontal residuals as small as possible.

83
Q

what is the line called that has the lowest possible sum of squared residuals

A

the least squares regression line of y on x

84
Q

what are the two separate least squares regression lines

A

one for y on x

one for x on y

85
Q

what extra 3 columns should you have if you're calculating regression and correlation

A

x squared
y squared
xy

86
Q

what are the sections of the graph called

A

quadrants, formed by drawing lines through the mean point (x̄, ȳ)

87
Q

positive correlation to the quadrants

A

in the 1st and 3rd (top right and bottom left)

88
Q

negative correlation to the quadrants

A

in the 2nd and 4th quadrants (top left and bottom right)

89
Q

product moment correlation coefficient

A

r = Sxy / √(Sxx × Syy)
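The product moment correlation coefficient is r = Sxy / √(Sxx × Syy). The sketch below builds the S values from the totals of the x, y, x², y² and xy columns, using Sxx = Σx² - (Σx)²/n, Syy = Σy² - (Σy)²/n and Sxy = Σxy - (Σx)(Σy)/n; the data and the `pmcc` name are hypothetical.

```python
import math


def pmcc(xs, ys):
    """Product moment correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)               # x squared column total
    sum_y2 = sum(y * y for y in ys)               # y squared column total
    sum_xy = sum(x * y for x, y in zip(xs, ys))   # xy column total
    sxx = sum_x2 - sum_x ** 2 / n
    syy = sum_y2 - sum_y ** 2 / n
    sxy = sum_xy - sum_x * sum_y / n
    return sxy / math.sqrt(sxx * syy)


xs = [1, 2, 3, 4, 5]   # hypothetical data
ys = [2, 4, 5, 4, 5]
print(pmcc(xs, ys))                  # about 0.775: fairly strong positive
print(pmcc([1, 2, 3], [6, 4, 2]))    # -1.0: perfect negative
```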

90
Q

what is a in stats (in y = ax + b)

A

gradient

91
Q

what is b in stats (in y = ax + b)

A

y intercept

92
Q

gradient of the y on x line

A

Sxy / Sxx

93
Q

what is the product moment correlation coefficient

A

r

94
Q

if the measurements are multiplied by 10, what effect does that have on the correlation

A

no effect

95
Q

y = ax + b

what's a

A

gradient

96
Q

y = ax + b

what's b

A

y intercept

97
Q

interpolation

A

estimating a value within the range of the data

98
Q

extrapolation

A

estimating a value outside the range of the data (less reliable)

99
Q

r squared is used to

A

show how close the points are to the line. squaring removes the sign, so it no longer shows whether the data is trending up or down.

100
Q

small sd or variance means

A

the data is all close together

101
Q

high sd and variance means

A

the data is spread out

102
Q

if the values when calculating sd were all multiplied by 10, what would happen to sd

A

it would also be multiplied by 10

103
Q

if the values when calculating variance were all multiplied by 10, what would happen to variance

A

it would be multiplied by 10², i.e. 100

104
Q

if the values when calculating mean were all multiplied by 10, what would happen to the mean

A

it would also be multiplied by 10

105
Q

if the values when calculating r/correlation were all multiplied by 10, what would happen to the r/correlation

A

there would be no change

106
Q

if 10 was added to all the values when calculating sd, what would happen

A

there would be no change

107
Q

if 10 was added to all the values when calculating variance, what would happen

A

there would be no change

108
Q

if 10 was added to all the values when calculating the mean, what would happen

A

10 would be added to the mean
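These coding rules can be checked numerically. This sketch, with hypothetical data, multiplies every value by 10 and separately adds 10 to every value, then compares the mean, standard deviation and variance, plus r against a second hypothetical list. `describe` and `pmcc` are illustrative helpers; `pmcc` is written out by hand so the example does not depend on a particular Python version.

```python
import math
import statistics


def pmcc(xs, ys):
    """Product moment correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def describe(values):
    """Mean, population standard deviation and population variance."""
    return statistics.mean(values), statistics.pstdev(values), statistics.pvariance(values)


data = [2, 4, 4, 4, 5, 5, 7, 9]     # hypothetical x values
other = [1, 3, 2, 5, 4, 6, 8, 7]    # hypothetical paired y values
times_10 = [10 * v for v in data]
plus_10 = [v + 10 for v in data]

print(describe(data))        # original mean, sd, variance
print(describe(times_10))    # mean x10, sd x10, variance x100
print(describe(plus_10))     # mean +10, sd and variance unchanged
print(pmcc(data, other), pmcc(times_10, other), pmcc(plus_10, other))  # r unchanged
```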

109
Q

what is variance essentially

A

standard deviation squared

110
Q

which type of standard deviation is the sigma (σ) notation used for

A

population

111
Q

which type of standard deviation is the Sx notation used for

A

sample
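Python's `statistics` module has both versions: `pstdev` divides by n (population, σ) and `stdev` divides by n - 1 (sample). The mapping to the "Sx" label is an assumption based on common calculator notation, and the data is hypothetical.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]      # hypothetical data

sigma = statistics.pstdev(data)      # population sd (sigma): divides by n
s = statistics.stdev(data)           # sample sd (Sx on many calculators): divides by n - 1

print(sigma)                         # 2.0
print(round(s, 3))                   # 2.138
# variance is the standard deviation squared
print(statistics.pvariance(data), sigma ** 2)
```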

112
Q

to find x on y, give the order of the columns that you would enter into the calculator

A

y column and then the x column

113
Q

to find y on x, give the order of the columns that you would enter into the calculator

A

x column and then the y column

114
Q

when you are calculating x on y, what value are you finding

A

x

115
Q

when you are calculating y on x, what value are you finding

A

y

116
Q

what is the mean point

A

the point (x̄, ȳ), where x̄ and ȳ are the means of the x and y values. the regression line of y on x and the line of x on y both pass through the mean point.
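A sketch showing that both least squares lines pass through (x̄, ȳ), using hypothetical data. It assumes the y on x line has gradient Sxy/Sxx and the x on y line has gradient Sxy/Syy (written as x = a2·y + b2); `regression_lines` is a hypothetical name.

```python
def regression_lines(xs, ys):
    """Return the mean point and the least squares lines y on x and x on y.

    y on x: y = a1 x + b1 with a1 = Sxy / Sxx (use it to estimate y from x)
    x on y: x = a2 y + b2 with a2 = Sxy / Syy (use it to estimate x from y)
    """
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    a1, a2 = sxy / sxx, sxy / syy
    y_on_x = (a1, y_bar - a1 * x_bar)
    x_on_y = (a2, x_bar - a2 * y_bar)
    return (x_bar, y_bar), y_on_x, x_on_y


xs = [1, 2, 3, 4, 5]   # hypothetical data
ys = [2, 4, 5, 4, 5]
(x_bar, y_bar), (a1, b1), (a2, b2) = regression_lines(xs, ys)
print((x_bar, y_bar))      # (3.0, 4.0)
print(a1 * x_bar + b1)     # y on x at x = x_bar gives y_bar
print(a2 * y_bar + b2)     # x on y at y = y_bar gives x_bar
```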

117
Q

what is relative frequency

A

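the number of times an outcome occurs divided by the total number of trials; it is an experimental estimate of the probability and can be given as a decimal, fraction or percentage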
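Relative frequency is the number of times an outcome occurs divided by the number of trials. A tiny sketch with hypothetical counts, using `fractions.Fraction` so the exact fraction, decimal and percentage forms can all be shown.

```python
from fractions import Fraction


def relative_frequency(count, trials):
    """Experimental estimate of a probability: times the outcome happened / number of trials."""
    return Fraction(count, trials)


rf = relative_frequency(13, 50)    # hypothetical: a six came up 13 times in 50 rolls
print(rf, float(rf), f"{float(rf):.0%}")   # 13/50 0.26 26%
```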

118
Q

0 on the probability scale is

A

impossible

119
Q

1 on the probability scale is

A

absolutely certain

120
Q

when might you use r squared

A

to measure how well a line or curve fits the data

121
Q

probability P(A) =

A

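n(A) / n(U): the number of outcomes in A divided by the total number of possible outcomes (when all outcomes are equally likely)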
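For equally likely outcomes, P(A) = n(A) / n(U), and the complement A' (everything in U not in A) has P(A') = 1 - P(A). A minimal sketch using Python sets, assuming one roll of a fair die as the example; `prob` is a hypothetical helper.

```python
from fractions import Fraction

U = {1, 2, 3, 4, 5, 6}                 # sample space: one roll of a fair die (assumed example)
A = {x for x in U if x % 2 == 0}       # event A: an even number


def prob(event, space):
    """P(event) = n(event) / n(space), assuming all outcomes are equally likely."""
    return Fraction(len(event), len(space))


p_a = prob(A, U)
p_not_a = prob(U - A, U)               # complement A'
print(p_a, p_not_a, p_a + p_not_a)     # 1/2 1/2 1
```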

122
Q

complementary events are represented by an

A

apostrophe, e.g. A' is the complement of A (not A), and P(A') = 1 - P(A)

123
Q

when do you multiply probabilities

A

when they are independent events

124
Q

when do you add probabilities

A

when they are mutually exclusive

125
Q

what are independent events

A

when one event does not affect the probability of the other

126
Q

what is relative frequency

A

the number of times an outcome occurs divided by the total number of trials; it can be written as a decimal, fraction or percentage

127
Q

do you multiply or add AND

A

multiply (for independent events)

128
Q

do you multiply or add OR

A

add (for mutually exclusive events)

129
Q

what are mutually exclusive events

A

events that cannot happen at the same time. there is no intersection, so P(A∩B) = 0

130
Q

combined events

P(A∪B) =

A

P(A) + P(B) - P(A∩B)
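The addition rule, the mutually exclusive case and the multiplication rule for independent events can all be checked by counting outcomes. A sketch assuming fair dice; the events A, B and C are hypothetical examples.

```python
from fractions import Fraction

U = set(range(1, 7))     # one roll of a fair die
A = {2, 4, 6}            # even
B = {4, 5, 6}            # greater than 3
C = {1}                  # a one


def p(event):
    """Probability by counting equally likely outcomes."""
    return Fraction(len(event), len(U))


# addition rule: P(A∪B) = P(A) + P(B) - P(A∩B)
assert p(A | B) == p(A) + p(B) - p(A & B)

# mutually exclusive (no intersection): just add
assert A & C == set() and p(A | C) == p(A) + p(C)

# independent events on two rolls: P(even, then six) = P(even) x P(six)
two_rolls = [(i, j) for i in U for j in U]
even_then_six = [r for r in two_rolls if r[0] % 2 == 0 and r[1] == 6]
assert Fraction(len(even_then_six), len(two_rolls)) == p(A) * p({6})

print(p(A | B), Fraction(len(even_then_six), len(two_rolls)))   # 2/3 1/12
```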

131
Q

what is a random variable

A

an outcome of a random experiment which can be represented as a number

132
Q

what is a probability distribution

A

a table showing all the possible outcomes and their probabilities.

133
Q

probabilities add up to

A

1
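A quick check that a table is a valid probability distribution: every probability is between 0 and 1 and they add up to exactly 1. A sketch assuming a hypothetical example (the number of heads when two fair coins are tossed), using `Fraction` so the sum is exact.

```python
from fractions import Fraction

# hypothetical distribution: X = number of heads when two fair coins are tossed
distribution = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}


def is_valid_distribution(dist):
    """True if every probability is between 0 and 1 and together they add up to 1."""
    return all(0 <= p <= 1 for p in dist.values()) and sum(dist.values()) == 1


print(is_valid_distribution(distribution))   # True
```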