igd Flashcards

1
Q

what is discrete data?

A

the possible values form a set of separate, countable numbers (0, 1, 2, …)

2
Q

when would we use discrete data?

A

number of covid cases per week
number of students doing…..
number of _____

3
Q

what is continuous data?

A

INFINITE continuum of possible real number values.

4
Q

examples of continuous data

A

people's blood pressure, age, height, weight, …

5
Q

what types of data fit into the interval category?

A

discrete and continuous

6
Q

what is categorical data made up of

A

nominal, ordinal

7
Q

what is nominal data?

A

unordered categories

8
Q

examples of nominal data?

A

gender, yes/no answers, plant species

basically, when there are different categories to choose from that don't follow any particular order, unlike ORDINAL data

9
Q

what is ordinal data

A

ORDERED CATEGORIES

10
Q

name some examples of ordinal data

A

better/same/worse

strongly agree/agree/neutral/disagree/strongly disagree

11
Q

what are random errors

A

unpredictable variation that results from the experimenter's inability to take a measurement in exactly the same way, and so get the same number, each time

e.g. someone slightly changing the location at which they record coordinates; the more spread out the readings, the less precise they are.

12
Q

what are systematic errors

A

reproducible inaccuracies that push measurements consistently in the same direction

e.g. if a GPS is receiving poor signals

13
Q

what is precision

A

the degree to which repeated measurements agree with one another

14
Q

what is accuracy

A

how close a measurement is to the true value; it's higher when the amount of SYSTEMATIC ERROR IS LOW, and vice versa

15
Q

measures of central tendency

A

mode, median, mean

GET THE BEST ESTIMATE

16
Q

measures of data dispersion

A

range, SD, interquartile range

MEASURE VARIABILITY

17
Q

what is the squared mean deviate used in the standard deviation?

A

Squared Mean Deviate: the distance of an observation from the mean, squared, so it is expressed in squared units (squaring turns negative deviations into positive values)

18
Q

what does the sum of squares do

A

Sum of Squares (SS): measures the total amount of squared variation around the mean

19
Q

what is variance

A

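Variance: the average squared deviation from the mean, i.e. the sum of (each observation minus the mean) squared, divided by the sample size minus one:
$s^2 = \dfrac{\sum (X_i - \bar{X})^2}{n - 1}$
The standard deviation is the square root of the variance:
$s = \sqrt{\dfrac{\sum (X_i - \bar{X})^2}{n - 1}}$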
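
As a worked illustration of the squared deviates, sum of squares, variance and standard deviation, here is a minimal Python sketch. The data values and variable names are made up for the example and are not from the course material.

```python
import math

# Made-up observations, purely for illustration
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
mean = sum(data) / n  # 40 / 8 = 5.0

# Squared deviate of each observation from the mean
squared_deviates = [(x - mean) ** 2 for x in data]

# Sum of squares (SS): total squared variation around the mean
sum_of_squares = sum(squared_deviates)  # 32.0

# Sample variance divides SS by n - 1; the SD is its square root
sample_variance = sum_of_squares / (n - 1)
sample_sd = math.sqrt(sample_variance)

print(sum_of_squares, round(sample_variance, 3), round(sample_sd, 3))
```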

20
Q

how does the standard deviation for a sample differ from that for a population

A

the sample formula has n-1 underneath (as the divisor); the population formula just has capital N

n-1 is used because a sample tends to underestimate the spread of the population; dividing by n-1 makes the SD slightly larger to compensate
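
A small sketch of the difference, using Python's standard `statistics` module, where `pstdev` divides by N and `stdev` divides by n-1. The data are made up for illustration.

```python
import statistics

# Made-up observations, purely for illustration
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Population SD: treats the data as the whole population (divides by N)
population_sd = statistics.pstdev(data)

# Sample SD: treats the data as a sample (divides by n - 1)
sample_sd = statistics.stdev(data)

# The sample SD comes out slightly larger than the population SD
print(round(population_sd, 3), round(sample_sd, 3))
```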

21
Q

what is the interquartile range

A

the range between the 25th and 75th percentiles of the data (the middle 50%)

LOOKS AT SPREAD OF VALUES AROUND THE MEDIAN

SD EQUIVALENT FOR MEDIAN
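
A minimal sketch of finding the quartiles and the interquartile range with Python's standard `statistics.quantiles`. The data are made up, and note that different textbooks and packages use slightly different quartile conventions, so other tools may give slightly different cut points.

```python
import statistics

# Made-up observations, purely for illustration
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

# n=4 splits the data into quarters and returns the 3 cut points
q1, median, q3 = statistics.quantiles(data, n=4)

# Interquartile range: spread of the middle 50% around the median
iqr = q3 - q1
print(q1, median, q3, iqr)
```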

22
Q

why cant we always trust the mean

A

the mean and the median aren't always equal
histograms aren't always symmetrical
in a skewed distribution the mean is pulled towards the tail
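
To see the mean being pulled towards the tail, here is a small sketch with made-up numbers containing one extreme value.

```python
import statistics

# Made-up, right-skewed data: one very large value forms the tail
data = [20, 22, 23, 25, 26, 28, 200]

mean = statistics.mean(data)
median = statistics.median(data)

# The median stays with the bulk of the data; the mean is dragged upwards
print(round(mean, 2), median)
```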

23
Q

what are absolute frequencies

A

the actual number of observations

24
Q

relative frequencies

A

the proportional (%) distribution of observations, i.e. each count divided by the total number of observations
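
A short sketch of turning absolute frequencies into relative frequencies. The species list is invented for the example.

```python
from collections import Counter

# Made-up nominal observations, purely for illustration
observations = ["oak", "oak", "birch", "pine", "oak", "birch"]

# Absolute frequencies: the actual number of observations per category
absolute = Counter(observations)

# Relative frequencies: each count as a percentage of the total
total = sum(absolute.values())
relative = {category: 100 * count / total for category, count in absolute.items()}

print(dict(absolute))
print({category: round(pct, 1) for category, pct in relative.items()})
```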

25
Q

what type of data are histograms used for

A

interval

shows the distribution of data or frequency

26
Q

independent variables go on the

A

x axis

27
Q

uniform histograms look

A

fairly level

28
Q

what is kurtosis in a histogram

A

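how peaked or heavy-tailed the distribution is compared with a normal distribution

can be positive (more peaked, heavier tails), negative (flatter, lighter tails) or normal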

29
Q

scatterplots:

A

show the relationship between 2 variables visually, for data exploration

30
Q

what do line graphs show

A

show trends

31
Q

how do you calculate the probability of an event

A

P(A) = F(A)/F(E)

F = frequency of outcome A or E

E is usually the larger number, such as the population or the number in the sample

32
Q

how do we calculate the probability that multiple INDEPENDENT events all happen

A

multiplication rule -
just multiply the probabilities together

e.g. P(A and B) happening -> P(A) x P(B)
0.04 x 0.03 x 0.05 = 0.00006, OR as a percentage (x100) 0.006%
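
The arithmetic on this card can be checked with a couple of lines of Python. The three probabilities are the ones from the card; the variable names are just for illustration.

```python
import math

# Probabilities of three independent events, as given on the card
probabilities = [0.04, 0.03, 0.05]

# Multiplication rule: probability that all of them happen
p_all = math.prod(probabilities)

print(f"{p_all:.5f}")        # as a proportion
print(f"{p_all * 100:.3f}%")  # as a percentage
```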

33
Q

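how do you find the probability of EITHER of several events occurring, e.g. A or B, when they can't happen together

A

ADDITION rule -

P(N or C or M) = P(N) + P(C) + P(M)

(this simple sum only works for mutually exclusive events, which is a different property from independence; if the events can overlap, the overlap has to be subtracted)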

34
Q

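how do you work out the probability of A or B when the two events can happen together

A

P(A or B) = P(A) + P(B) - P(A and B)

(subtracting the overlap once stops it being counted twice; for exactly one of A or B but NOT both, subtract it twice: P(A) + P(B) - 2 x P(A and B))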
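
A small sketch that checks the addition rule by counting outcomes for one roll of a fair die. The events A (even number) and B (greater than 3) are invented for the example. It also shows the difference between "A or B" (which includes both happening) and "A or B but not both".

```python
from fractions import Fraction

outcomes = range(1, 7)  # one roll of a fair six-sided die (illustrative)
A = {x for x in outcomes if x % 2 == 0}  # even: {2, 4, 6}
B = {x for x in outcomes if x > 3}       # greater than 3: {4, 5, 6}

def prob(event):
    """Probability of an event as (favourable outcomes) / (all outcomes)."""
    return Fraction(len(event), 6)

# A or B (including both): subtract the overlap once
p_a_or_b = prob(A) + prob(B) - prob(A & B)

# A or B but not both: subtract the overlap twice
p_exactly_one = prob(A) + prob(B) - 2 * prob(A & B)

print(p_a_or_b, p_exactly_one)
```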

35
Q

normal distribution is a ____ curve

A

bell

36
Q

how do you work out the probability of POF

A

the area under the curve (between the values of interest)

37
Q

how do you work out standard error

A

SE = σ/√n; it's how far the sample mean is likely to be from the population mean

the larger the SD, the larger the standard error

a smaller standard error means a more reliable estimate; it comes from a smaller SD or a larger sample
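
A minimal sketch of the standard error formula. The function name `standard_error` and the numbers are made up for illustration.

```python
import math

def standard_error(sd, n):
    """Standard error of the mean: SD divided by the square root of n."""
    return sd / math.sqrt(n)

# Same SD, larger sample -> smaller standard error
print(standard_error(10, 25))   # 10 / 5
print(standard_error(10, 100))  # 10 / 10

# Same sample size, larger SD -> larger standard error
print(standard_error(20, 100))  # 20 / 10
```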

38
Q

when do we use Z score

A

when n>30

z distribution used

39
Q

what does the central limit theorem tell us

A

tells us that the means of repeated samples are approximately normally distributed around the POPULATION mean, provided the samples are large enough

40
Q

what does DF mean

A

degrees of freedom

41
Q

where would you see DF

A

in statistical tables such as the t table (the z table itself does not use DF)

42
Q

what is a 95% confidence interval equal to

A

the sample mean ± about 2 standard errors (1.96 to be exact)
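
A sketch of a 95% confidence interval for a mean, assuming a large sample so the z distribution applies. The sample mean, SD and size are made up; `NormalDist` comes from Python's standard `statistics` module.

```python
import math
from statistics import NormalDist

# Made-up sample summary, purely for illustration
sample_mean, sample_sd, n = 50.0, 10.0, 100

# z value that leaves 2.5% in each tail (about 1.96)
z = NormalDist().inv_cdf(0.975)

standard_error = sample_sd / math.sqrt(n)
lower = sample_mean - z * standard_error
upper = sample_mean + z * standard_error

print(round(z, 2), round(lower, 2), round(upper, 2))
```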

43
Q

what are inferential statistics

A

used to derive general conclusions about our data and beyond (from the sample to the population)

44
Q

what are descriptive statistics

A

summarises what our data shows

45
Q

examples of descriptive statistics

A

central tendency, dispersion (e.g. standard deviation), plots and charts to illustrate the distribution of the data

46
Q

how do you get from sample to population

A

inference

47
Q

THE CENTRAL LIMIT THEOREM WORKS BETTER IF

A

THE SAMPLE IS LARGER AND THE UNDERLYING DISTRIBUTION IS CLOSER TO NORMAL.

48
Q

what is the confidence interval

A

a range of values around a sample statistic (e.g. the sample mean) that is likely to contain the true population value

49
Q

when would you use t-tables

A

samples less than 30

50
Q

what makes a good hypothesis

A
  • parsimonious
  • generalisable
  • testable
  • plausible
  • directional
51
Q

when can you reject the null hypothesis

A

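when p < 0.05 (or whatever significance level x was chosen)
if YOUR test statistic is BIGGER than the critical value
e.g. if Z is beyond ±1.96 (two-tailed test at the 0.05 level)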
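
The decision rule can be sketched as a two-tailed z-test. All numbers here are made up for illustration, and the population SD is assumed to be known.

```python
import math
from statistics import NormalDist

# Made-up values, purely for illustration
population_mean, population_sd = 50.0, 10.0
sample_mean, n = 52.5, 100
alpha = 0.05  # significance level

# Test statistic: how many standard errors the sample mean is from the population mean
z = (sample_mean - population_mean) / (population_sd / math.sqrt(n))

# Two-tailed critical value (about 1.96) and p-value
critical = NormalDist().inv_cdf(1 - alpha / 2)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# Both rules give the same decision
reject_null = abs(z) > critical
print(z, round(critical, 2), round(p_value, 4), reject_null)
```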

52
Q

which factors impact the width of the confidence interval

A

1) confidence level: a 95% CI (z = 1.96) is narrower than a 99% CI (z = 2.58)
2) variability as measured by the SD
3) sample size

53
Q

what does a lower p value mean

A

stronger evidence against the null hypothesis (more certainty that the result isn't just down to chance)

54
Q

what is 0.05 referring to?

A

significance level

55
Q

what is a one tailed test

how does 2 tailed test differ

A

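one-tailed: tests for a relationship in one direction only and disregards the possibility that it goes in the other direction (the rejection region sits in just one tail)

two-tailed: tests both directions, so the result can fall beyond either +1.96 or -1.96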

56
Q

what does a Shapiro-Wilk test do

A

tests whether the sample was drawn from a normally distributed population.

57
Q

what would the hypotheses for a Shapiro-Wilk test look like

A

H0: sample data are NOT SIGNIFICANTLY different from a normal distribution

HA: sample data ARE SIGNIFICANTLY different from a normal distribution.

58
Q

when would you use a paired sample t-test

A

when the data are repeat measurements (e.g. over 2 years), or if the samples are paired in some other manner. It must be used in these cases because paired data violate the assumption that the samples are independent of one another

59
Q

what does a paired sample test assume

A
  • it doesn't assume normality of the data values themselves; it ASSUMES THE PAIRED DIFFERENCES ARE NORMALLY DISTRIBUTED
    (this is because the test uses the paired differences rather than the actual observations)
60
Q

how can we transform data

A

logarithms and square roots, for positive data
reciprocals, for non-zero data
histograms (to check the shape of the distribution)

61
Q

a stats test has 2 outcomes; what are they? give examples of each

A
  • test statistics (the value calculated), e.g. Z-score

  • probabilities (p-values), which depend on e.g. the mean, SD and sample size

62
Q

if p value is less than x….

A

reject the null

found using t-table

63
Q

you cant compare p values unless….

A

2 tests are exactly the same

64
Q

when do we rank data before doing any thing else with it?

A

when it's non-parametric

65
Q

what is the non-parametric equivalent of the t-test

A

Mann-Whitney U

66
Q

what is ANOVA

A

(aka analysis of variance)
- used when comparing more than 2 groups

compares the VARIATION within groups to the variation between groups

the greater the variation between groups relative to the variation within them -> the more likely we are to reject the null

67
Q

what do anova tests assume

A

1 - observations between samples are independent

2 - observations in each category are normally distributed

68
Q

how do we determine statistical significance

A

F-statistic -
a ratio of variances which helps to answer the question "is the variation due to the group greater than the residual variation?"
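
A by-hand sketch of the F-statistic for a one-way ANOVA, as the ratio of between-group variance to within-group (residual) variance. The three groups are made-up numbers.

```python
# Made-up measurements for three groups, purely for illustration
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]

all_values = [x for group in groups for x in group]
grand_mean = sum(all_values) / len(all_values)

def mean(values):
    return sum(values) / len(values)

# Variation between groups: group means around the grand mean
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)

# Residual variation within groups: observations around their own group mean
ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)

# F = mean square between / mean square within
f_stat = (ss_between / df_between) / (ss_within / df_within)
print(round(f_stat, 2), df_between, df_within)
```

The resulting F value would then be compared with the critical value from an F table for these degrees of freedom.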

69
Q

if we get a significant f - result in an ANOVA test what does it mean for the hypothesis

A

we can reject null

70
Q

what is a post hoc test

A

used to tell us which groups differ from the rest

they aren't used unless the null hypothesis was rejected for the F test

71
Q

examples of posthoc tests

A
  • Tukey's honestly significant difference (HSD) test

72
Q

in an H-table, H must be greater than or equal to the critical value in order to ____ the null

A

reject

73
Q

what does Chi-squared test for

A

tests the independence of TWO categorical variables from a single sample

the hypotheses are either HAS NO effect (the variables are independent) = H0
or HAS an effect (the variables are associated) = HA
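
A by-hand sketch of the chi-squared test of independence on a 2 x 2 table of counts. The observed counts are made up; the critical value 3.841 is the standard table value for 1 degree of freedom at the 0.05 level.

```python
# Made-up observed counts: rows and columns are the two categorical variables
observed = [[30, 10],
            [20, 40]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

chi_squared = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        # Count expected in this cell if the two variables were independent
        expected = row_totals[i] * col_totals[j] / total
        chi_squared += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)
critical_value = 3.841  # chi-squared table, df = 1, significance level 0.05

# Reject H0 (independence) if the statistic exceeds the critical value
reject_null = chi_squared > critical_value
print(round(chi_squared, 3), df, reject_null)
```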

74
Q

when is chi squared used

A

NOMINAL data in the form of frequencies or counts

75
Q

does chi squared assume normality?

A

NO it does not assume normality

76
Q

Cramér's phi is computed when?

A

after a chi squared test

measures degree of association of 2 variables

77
Q

what does Pearson's correlation measure

A

measures the strength (and direction) of the linear relationship between 2 variables

78
Q

what would a perfect negative and a perfect positive Pearson's correlation come out as

A

+1 perfect positive
-1 perfect negative
0 no correlation
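
A by-hand sketch of Pearson's r showing the +1 and -1 extremes. The helper name `pearson_r` and the data are made up for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient, computed from deviations from the means."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]    # rises perfectly in line with x
y_down = [10, 8, 6, 4, 2]  # falls perfectly in line with x

print(pearson_r(x, y_up), pearson_r(x, y_down))
```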

79
Q

what are the degrees of freedom for Pearson's correlation

A

df= n-2

80
Q

what does Spearman's rank test for

A

non-parametric correlation (based on the ranks of the data)

81
Q

what are the main assumptions of regression

A
  • relationship is CAUSAL
  • relationship is linear

82
Q

what are the assumptions of the regression MODEL

A
  • residuals are normally distributed
  • residuals have a mean of 0
  • the spread of the errors doesn't vary with x
  • residual errors are independent and don't influence one another
83
Q

what does heteroskedasticity look like

A

increasing variance on a graph (the spread of the points fans out)

84
Q

what does multiple regression assume

A
  • linear relationship
  • multivariate normality - normal distribution of errors
  • no multicollinearity
  • homoskedasticity: no clear pattern in the spread of the residuals
85
Q

what is Anscombe's quartet

A

four datasets, each consisting of 2 variables x and y, that have nearly identical summary statistics but look very different when plotted

86
Q

what is Simpson's paradox

A

when comparing 2 variables, e.g. petal width and sepal width:

- reversal of the trend when you group the data
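
A small sketch of the reversal, using invented numbers (not real petal or sepal measurements): within each group the correlation is negative, but pooling the two groups gives a positive correlation. The helper `pearson_r` is defined here only for the example.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient, computed from deviations from the means."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Made-up data for two groups; y falls as x rises inside each group
group_a_x, group_a_y = [1, 2, 3], [6, 5, 4]
group_b_x, group_b_y = [6, 7, 8], [11, 10, 9]

r_a = pearson_r(group_a_x, group_a_y)
r_b = pearson_r(group_b_x, group_b_y)

# Ignoring the grouping: group B sits higher on both x and y, so the trend flips
r_all = pearson_r(group_a_x + group_b_x, group_a_y + group_b_y)

print(round(r_a, 2), round(r_b, 2), round(r_all, 2))
```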