igd Flashcards

1
Q

what is discrete data?

A

possible values form a set of separate numbers (0,1,2…)

2
Q

when would we use discrete data?

A

number of COVID cases per week
numbers of students doing…..
number of _____

3
Q

what is continuous data?

A

INFINITE continuum of possible real number values.

4
Q

examples of continuous data

A

people's blood pressure, age, height, weight…

5
Q

what types of data fit into the interval category

A

discrete and continuous

6
Q

what is categorical data made up of

A

nominal, ordinal

7
Q

what is nominal data?

A

unordered categories

8
Q

examples of nominal data?

A

gender, yes/no answers, plant species

basically when there are different categories to choose from that don't follow any particular order, unlike ORDINAL

9
Q

what is ordinal data

A

ORDERED CATEGORIES

10
Q

name some examples of ordinal data

A

better/same/worse

strongly agree/agree/neutral/disagree/strongly disagree

11
Q

what are random errors

A

unpredictable variation that comes from the limits of the experimenter's ability to take a measurement the same way and get the same number each time

e.g. someone slightly changing the location where they record coordinates; the readings are more spread out, so less precise.

12
Q

what are systematic errors

A

reproducible inaccuracies that push measurements consistently in the same direction

e.g. a GPS that is receiving poor signals

13
Q

what is precision

A

the degree to which repeated measurements agree with one another

14
Q

what is accuracy

A

how close a measurement is to the true value; it's higher when the amount of SYSTEMATIC ERROR IS LOW and vice versa

15
Q

measures of central tendency

A

mode median mean

GET THE BEST ESTIMATE

16
Q

measures of data dispersion

A

range, SD, interquartile range

MEASURE VARIABILITY

17
Q

what is the squared mean deviate in the standard deviation calculation

A

Squared Mean Deviate: The distance of an observation from the mean, expressed in squared measurements (changes negative values to positive)

18
Q

what does the sum of squares do

A

Sum of Squares (SS): measures the total amount of squared variation around the mean

19
Q

what is variance

A

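Variance: the average squared deviation from the mean: the sum of (each observation minus the mean) squared, divided by the sample size minus 1 (for a sample)
$s^2 = \frac{\sum_i (X_i - \bar{X})^2}{n - 1}$
The standard deviation is the square root of the variance: $s = \sqrt{\frac{\sum_i (X_i - \bar{X})^2}{n - 1}}$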

20
Q

how does the standard deviation for a sample differ from that for a population

A

the sample formula divides by n-1; the population formula just divides by capital N

n-1 because a sample tends to underestimate the spread of the population; dividing by n-1 makes the SD slightly larger to compensate
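
A minimal sketch of the sum of squares, variance and standard deviation cards above, using only Python's standard library. The data list is made up for illustration.

```python
import math
import statistics

# Hypothetical sample, chosen only so the arithmetic is easy to follow.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)

# Sum of squares: total squared deviation around the mean.
ss = sum((x - mean) ** 2 for x in data)

sample_var = ss / (len(data) - 1)  # sample variance divides by n - 1
pop_var = ss / len(data)           # population variance divides by N

sample_sd = math.sqrt(sample_var)
pop_sd = math.sqrt(pop_var)

print(ss, pop_var, pop_sd)    # 32 4.0 2.0
print(sample_var, sample_sd)  # slightly larger than the population versions
```

The same numbers should come out of `statistics.variance` / `statistics.stdev` (sample) and `statistics.pvariance` / `statistics.pstdev` (population).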

21
Q

what is the interquartile range

A

the spread of the data between the 25th and 75th percentiles

LOOKS AT SPREAD OF VALUES AROUND THE MEDIAN

SD EQUIVALENT FOR MEDIAN

22
Q

why can't we always trust the mean

A

the mean and the median aren't always equal
histograms aren't always symmetrical
in a skewed distribution the mean is pulled towards the tail
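
A small sketch of why the mean can mislead on skewed data, and how the median and interquartile range behave instead. The values are made up, and the quartile method is simply the standard library's default, which is an assumption rather than anything the cards specify.

```python
import statistics

# Made-up, right-skewed data: one very large value forms the tail.
values = [20, 22, 23, 25, 26, 28, 30, 250]

mean = statistics.mean(values)      # 53, pulled towards the tail
median = statistics.median(values)  # 25.5, barely affected by the tail

# Quartiles via the standard library's default method; other methods
# (and other software) can give slightly different cut points.
q1, q2, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1
sd = statistics.stdev(values)

print(mean, median)
print(iqr, sd)  # the IQR stays small while the SD is inflated by the outlier
```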

23
Q

what are absolute frequencies

A

the actual number of observations

24
Q

relative frequencies

A

the proportional (%) distribution of observations
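
A short sketch of absolute versus relative frequencies, assuming a made-up list of yes/no answers.

```python
from collections import Counter

# Hypothetical survey answers (nominal data).
answers = ["yes", "no", "yes", "yes", "no", "yes", "yes", "no", "yes", "yes"]

# Absolute frequencies: the actual number of observations per category.
absolute = Counter(answers)

# Relative frequencies: each count as a percentage of the total.
total = sum(absolute.values())
relative = {category: 100 * count / total for category, count in absolute.items()}

print(absolute["yes"], absolute["no"])  # 7 3
print(relative["yes"], relative["no"])  # 70.0 30.0
```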

25
what type of data are histograms used for
interval data; a histogram shows the distribution (frequency) of the data
26
independent variables go on the
x axis
27
a uniform histogram looks
fairly level
28
what is kurtosis in a histogram
how peaked or heavy-tailed the distribution is | can be positive, negative or normal
29
scatterplots:
show the relationship between 2 variables visually, for data exploration
30
what do line graphs show
show trends
31
how do you calculate probabilities of event
P(A) = F(A)/F(E), where F(A) is the frequency of outcome A and F(E) is the frequency of all outcomes; E is usually the larger number, like the population or the number in the sample
32
how do we calculate the probability of multiple INDEPENDENT events all happening
multiplication rule - just multiply the probabilities together, e.g. P(A and B) = P(A) x P(B); 0.04 x 0.03 x 0.05 = 0.00006, OR as a percentage (x100) 0.006%
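
The arithmetic on this card can be checked with a few lines of Python. This is only a sketch and assumes the three probabilities belong to independent events.

```python
import math

# The three probabilities from the card, assumed to be independent events.
probabilities = [0.04, 0.03, 0.05]

# Multiplication rule: P(A and B and C) = P(A) x P(B) x P(C).
p_all = math.prod(probabilities)
p_all_percent = p_all * 100

print(f"{p_all:.5f}")           # 0.00006
print(f"{p_all_percent:.3f}%")  # 0.006%
```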
33
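how do you find the probability of EITHER of several mutually exclusive events occurring, e.g. A or B
ADDITION rule | P(N) + P(C) + P(M); this simple sum only works when the events cannot happen together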
34
how do you work out the probability of A or B but not both!
P(A) + P(B) - 2 x P(A and B); subtracting P(A and B) only once gives the probability of A or B including both
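
One way to check the addition rules is to enumerate the outcomes of a simple experiment. The sketch below uses one roll of a fair die with two made-up events that overlap; the event definitions are assumptions for illustration only.

```python
from fractions import Fraction

outcomes = range(1, 7)                   # one roll of a fair die
A = {o for o in outcomes if o % 2 == 0}  # even: {2, 4, 6}
B = {o for o in outcomes if o > 3}       # greater than 3: {4, 5, 6}

def p(event):
    """Probability of an event as (favourable outcomes) / (all outcomes)."""
    return Fraction(len(event), 6)

p_a_or_b = p(A | B)       # A or B, including both
p_exactly_one = p(A ^ B)  # A or B, but not both

print(p_a_or_b, p(A) + p(B) - p(A & B))           # 2/3 2/3
print(p_exactly_one, p(A) + p(B) - 2 * p(A & B))  # 1/3 1/3
```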
35
normal distribution is a ____ curve
bell
36
how do you work out the probability of POF
area under curve??
37
how do you work out standard error
σ/√n; it's how far the sample mean is likely to be from the population mean. The larger the SD, the larger the standard error; a smaller standard error (smaller SD or larger sample) means a more reliable estimate
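
A minimal sketch of the standard error formula. The helper name `standard_error` and the numbers are made up for illustration.

```python
import math

def standard_error(sd, n):
    """Standard error of the mean: SD divided by the square root of n."""
    return sd / math.sqrt(n)

# Same SD, larger sample: the standard error shrinks.
print(standard_error(10, 25))   # 2.0
print(standard_error(10, 100))  # 1.0

# Same sample size, larger SD: the standard error grows.
print(standard_error(20, 100))  # 2.0
```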
38
when do we use Z score
when n > 30, the z distribution is used
39
what does the central limit theorem tell us
that sample means are approximately normally distributed around the POPULATION mean when the samples are large enough
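
A rough simulation of the central limit theorem, as a sketch only. It assumes a skewed population (exponential, with mean 1 and SD 1) and arbitrary choices of sample size and number of samples.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

def sample_mean(n):
    # expovariate(1.0) draws from a skewed distribution whose
    # population mean is 1 and population SD is 1.
    return statistics.mean([random.expovariate(1.0) for _ in range(n)])

# 2000 samples of size 50, keeping only each sample's mean.
means = [sample_mean(50) for _ in range(2000)]

centre = statistics.mean(means)   # should sit close to the population mean, 1
spread = statistics.stdev(means)  # should sit close to 1 / sqrt(50), about 0.14

print(centre, spread)
```

Plotting `means` as a histogram should give a roughly bell-shaped curve even though the underlying data are skewed.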
40
what does DF mean
degrees of freedom
41
where would you see DF
in a t table (and other critical value tables); the z table does not need df
42
what is a 95% confidence interval equal to
the mean +/- roughly 2 standard errors (1.96 to be exact)
43
what are inferential statistics
used to derive general conclusions about our data and beyond
44
what are descriptive statistics
summarises what our data shows
45
examples of descriptive statistics
central tendency, dispersion (e.g. STANDARD DEVIATION), plots and charts to illustrate the distribution of data
46
how do you get from sample to population
inference
47
THE CENTRAL LIMIT THEOREM WORKS BETTER IF
THE SAMPLE IS LARGER AND THE UNDERLYING DISTRIBUTION IS MORE NORMAL.
48
what is the confidence interval
range of values around a sample estimate (e.g. the sample mean) that is likely to contain the population value
49
when would you use t-tables
samples of fewer than 30
50
what makes a good hypothesis
- parsimonious - generalisable - testable - plausible - directional
51
when can you reject the null hypothesis
when p < 0.05 (or the chosen significance level x); if YOUR test statistic is BIGGER than the critical value, e.g. if |Z| > 1.96
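
A sketch of the decision rule for a two-tailed z-test at the 0.05 level. All the numbers are made up, and the population SD is assumed to be known.

```python
import math
from statistics import NormalDist

# Made-up numbers: sample mean 52 from n = 100, tested against a
# population mean of 50 with a known population SD of 10.
sample_mean, pop_mean, pop_sd, n = 52, 50, 10, 100

z = (sample_mean - pop_mean) / (pop_sd / math.sqrt(n))

# Two-tailed p-value: probability of a z at least this extreme in either tail.
p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))

alpha = 0.05
reject_null = p_two_tailed < alpha

print(z, round(p_two_tailed, 4), reject_null)  # 2.0 0.0455 True
```

Here |z| = 2.0 is bigger than 1.96 and p is below 0.05, so both versions of the rule agree.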
52
which factors impact the width of the confidence interval
1) confidence level: a 95% CI (1.96) is narrower than a 99% CI | 2) variability, as measured by the SD | 3) sample size
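
The three factors above can be seen in a small sketch. The helper name `ci_width` is hypothetical, and it assumes a z-based interval (large sample, mean +/- z x SD/√n).

```python
import math
from statistics import NormalDist

def ci_width(sd, n, level=0.95):
    """Full width of a z-based confidence interval for the mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return 2 * z * sd / math.sqrt(n)

print(round(ci_width(10, 100, 0.95), 2))  # 3.92
print(ci_width(10, 100, 0.99))            # wider: higher confidence level
print(ci_width(20, 100, 0.95))            # wider: more variability
print(ci_width(10, 400, 0.95))            # narrower: larger sample
```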
53
what does a lower p value mean
stronger evidence against the null hypothesis (more certainty that the result is not just chance)
54
what is 0.05 referring to?
significance level
55
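what is a one-tailed test and how does a two-tailed test differ
a one-tailed test looks for a relationship in one direction only and disregards that it could go in the other direction, so the rejection region is in one tail; a two-tailed test allows either direction, so the result can fall beyond +/- 1.96 (at the 0.05 level)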
56
what does a Shapiro-Wilk test do
tests whether the sample was drawn from a normal population.
57
what would the hypotheses for a Shapiro-Wilk test look like
H0: sample data are NOT SIGNIFICANTLY different from a normal distribution | HA: sample data ARE SIGNIFICANTLY different from a normal distribution.
58
when would you use a paired sample t-test
when the data are repeat measurements (e.g. over 2 years), or if the samples are paired in some manner! It must be used then, because paired data violate the assumption that samples are independent of one another
59
what does a paired sample test assume
- it does not need the data values themselves to be normal; it ASSUMES THE PAIRED DIFFERENCES ARE NORMALLY DISTRIBUTED (this is because we are using paired differences rather than actual observations)
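
A sketch of how a paired t statistic is built from the paired differences rather than the raw observations. The before/after values are made up, and the p-value step is left to a t-table.

```python
import math
import statistics

# Made-up repeat measurements on the same five subjects.
before = [10, 12, 9, 11, 13]
after = [12, 14, 10, 13, 14]

# The test works on the paired differences, not the raw values.
differences = [a - b for a, b in zip(after, before)]

mean_diff = statistics.mean(differences)
se_diff = statistics.stdev(differences) / math.sqrt(len(differences))

t_stat = mean_diff / se_diff
df = len(differences) - 1

print(round(t_stat, 2), df)  # 6.53 4
```

With df = 4, the two-tailed critical value at the 0.05 level in a t-table is 2.776, so this made-up result would lead to rejecting the null.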
60
how can we transform data
logarithms and square roots for positive data; reciprocals for non-zero data (check with histograms)
61
there are 2 outcomes from stats test what are they? and give examples of each
- test statistics (the value calculated), e.g. Z-score | - probabilities (p-values), which depend on e.g. the mean, SD and sample size
62
if p value is less than x....
reject the null (the critical value is found using the t-table)
63
you can't compare p values unless....
the 2 tests are exactly the same
64
when do we rank data before doing any thing else with it?
when it's non-parametric
65
what is the non-parametric equivalent of the t-test
Mann-Whitney U
66
what is ANOVA
(aka analysis of variance) - used when comparing more than 2 groups; compares the VARIATION within groups to the variation between groups; the greater the between-group variation relative to the within-group variation -> we reject the null
67
what do anova tests assume
1 - observations between samples are independent | 2 - observations in each category are normally distributed
68
how do we determine statistical significance
F-statistic - a ratio of variances which helps to answer "is the variation due to a group greater than the residual variation?"
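
A sketch of the F-statistic as a ratio of between-group to within-group (residual) variance, worked by hand on three made-up groups.

```python
# Three made-up groups of equal size.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]

all_values = [x for g in groups for x in g]
grand_mean = sum(all_values) / len(all_values)
group_means = [sum(g) / len(g) for g in groups]

# Variation between groups: how far each group mean is from the grand mean.
ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
# Variation within groups (residual): how far each value is from its group mean.
ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)

f_stat = (ss_between / df_between) / (ss_within / df_within)
print(round(f_stat, 2))  # 13.0
```

The F value would then be compared with the critical value for (2, 6) degrees of freedom in an F-table.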
69
if we get a significant f - result in an ANOVA test what does it mean for the hypothesis
we can reject null
70
what is a Posthoc test
used to tell us which groups differ from the rest; they aren't used unless the null hypothesis was rejected for the F test
71
examples of posthoc tests
- Tukey's honestly significant difference (HSD) test
72
in an H-table, H must be greater than or equal to the critical value in order to ____ the null
reject
73
what does Chi-squared test for
tests the independence of TWO categorical variables from a single sample; H0 = the variables are independent (NO effect), HA = the variables are associated (HAS an effect)
74
when is chi squared used
NOMINAL data in the form of frequencies or counts
75
does chi squared assume normality?
NO it does not assume normality
76
Cramer's Phi is computed when?
after a chi-squared test; it measures the degree of association between the 2 variables
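
A sketch of the chi-squared statistic and a follow-up measure of association, worked by hand on a made-up 2x2 table of counts. The association formula used here is Cramér's V, which equals phi for a 2x2 table.

```python
import math

# Made-up 2x2 table of counts (rows: one variable, columns: the other).
observed = [[10, 20],
            [20, 10]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        # Expected count if the two variables were independent.
        expected = row_totals[i] * col_totals[j] / n
        chi2 += (obs - expected) ** 2 / expected

rows, cols = len(observed), len(observed[0])
df = (rows - 1) * (cols - 1)

# Degree of association, from 0 (none) to 1 (perfect).
cramers_v = math.sqrt(chi2 / (n * min(rows - 1, cols - 1)))

print(round(chi2, 2), df, round(cramers_v, 2))  # 6.67 1 0.33
```

With df = 1, the critical value at the 0.05 level is 3.841, so this made-up table would lead to rejecting independence.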
77
what does Pearson's correlation measure
measures the intensity (strength) of the linear relationship
78
what would a perfect negative and positive score of Pearson's correlation come out as
+1 perfect positive | -1 perfect negative | 0 no correlation
79
what are the degrees of freedom for Pearson's correlation
df= n-2
80
what does Spearman's rank test for
non-parametric correlation
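
A sketch contrasting Pearson's and Spearman's correlation on made-up data that rise steadily but not in a straight line. The helper names are hypothetical, and the ranking helper assumes there are no tied values.

```python
import math

def pearson(x, y):
    """Pearson's r: strength of the linear relationship, from -1 to +1."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank from 1 (smallest) upwards; assumes no tied values."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman(x, y):
    """Spearman's rank correlation: Pearson's r applied to the ranks."""
    return pearson(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]  # always increasing, but curved

print(round(pearson(x, y), 3))  # 0.981
print(spearman(x, y))           # 1.0
```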
81
what are the main assumptions of regression
- relationship is CAUSAL | - relationship is linear
82
what are the assumptions of the regression MODEL
- residuals are normally distributed - residuals have a mean of 0 - errors don't vary with x - residual errors are independent and don't influence each other
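
A sketch of a least-squares line and its residuals on made-up data, showing that the residuals of a fitted line average to zero. Checking the other assumptions (normality, constant variance, independence) would normally be done with plots of these residuals.

```python
# Made-up data lying close to a straight line.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Ordinary least-squares slope and intercept.
slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sum(
    (a - mean_x) ** 2 for a in x
)
intercept = mean_y - slope * mean_x

# Residuals: observed y minus the value predicted by the line.
residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
mean_residual = sum(residuals) / len(residuals)

print(round(slope, 2), round(intercept, 2))  # 1.99 0.05
print(abs(mean_residual) < 1e-9)             # True
```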
83
what does heteroskedasticity look like
variance of the residuals that changes (e.g. increases) along the x axis on a graph
84
what does multiple regression assume
- linear relationship - multivariate normality - normal distribution of errors - no multicollinearity - homoskedasticity, no clear pattern in the residuals
85
what is Anscombe's quartet
four datasets, each consisting of 2 variables x and y, with nearly identical summary statistics but very different plots
86
what is Simpson's paradox
comparing 2 variables, e.g. petal width or sepal width | - reversal of the trend when you group the data
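
A sketch of the trend reversal, using two made-up groups. Within each group the slope is positive, but pooling the groups gives a negative slope.

```python
def slope(x, y):
    """Least-squares slope of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )

# Made-up groups: y rises with x inside each group.
group_a_x, group_a_y = [1, 2, 3], [10, 11, 12]
group_b_x, group_b_y = [6, 7, 8], [1, 2, 3]

slope_a = slope(group_a_x, group_a_y)
slope_b = slope(group_b_x, group_b_y)

# Ignoring the grouping reverses the trend.
pooled_slope = slope(group_a_x + group_b_x, group_a_y + group_b_y)

print(slope_a, slope_b)        # 1.0 1.0
print(round(pooled_slope, 2))  # -1.53
```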