Research Methods Flashcards

1
Q

What is Simpson’s Paradox

A

When aggregated data shows a trend or bias but the disaggregated data does not, or shows the opposite trend.

2
Q

What is aggregate data vs disaggregate data

A
aggregate = averaged/summarised data
disaggregate = all of the individual data points (not averaged)
3
Q

what do you need to do with negative numbers in R

A

put them in parentheses

eg. -1 ^ 2

should be

(-1)^2
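For example, run in the console (a quick sketch; the numbers are arbitrary):

```
-1^2    # returns -1, because R reads this as -(1^2)
(-1)^2  # returns 1, the parentheses force -1 to be squared
```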

4
Q

what does an exclamation mark mean

A

not

5
Q

what are the three types of rounding functions

A

round( )
floor( )
ceiling( )
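A quick sketch of the three (the values are arbitrary):

```
round(2.567, digits = 2)  # 2.57 - rounds to the nearest value
floor(2.9)                # 2    - always rounds down
ceiling(2.1)              # 3    - always rounds up
```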

6
Q

differentiate ‘=’ and ‘==’

A

== is used for a comparison --> like asking a question
= is a command; inside a function call it only exists for the purpose of that function (it sets an argument)

= is comparable to the assignment arrow <-
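A minimal sketch of the difference (the variable x is just an example):

```
x = 5    # command: store 5 in x (doing the same job as x <- 5 here)
x == 5   # question: is x equal to 5? returns TRUE
x == 6   # returns FALSE
```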

7
Q

what is a numeric variable

A

a variable that stores a number, e.g. henryage

8
Q

what is a character variable

A

a variable that stores text (words), e.g. food

9
Q

what is a logical variable

A

a variable that stores TRUE/FALSE values, e.g. isFurry

10
Q

variable names can't have what?

A

spaces

11
Q

what is the hierarchy of vectors

A

character > numerical > logical
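A small sketch of the coercion hierarchy in action (the values are arbitrary):

```
c(TRUE, 1, 2)      # logical is promoted to numeric:   1 1 2
c(1, 2, "three")   # numeric is promoted to character: "1" "2" "three"
```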

12
Q

what is a package

A

a collection of R functions (plus data and documentation) bundled together

13
Q

what are theoretical constructs and how do they relate to the two other elements

A

unobservable psychological entities. A construct is operationalised into a measure - a tool designed to obtain data - and the measure is then used to produce data.

14
Q

what are the three types of scales

A

nominal --> categories with no set order, eg. eye colour --> discrete
ordinal --> ranked data, where the size of the difference between data points is not defined
interval --> numeric data with equal-sized intervals, eg. dates

15
Q

what do you use to find or make columns in a dataset

A

$

16
Q

how do you remove a data point

A

make it equal to ‘NULL’

17
Q

how do you make a vector

A

c( )

18
Q

what does the “here” package do

A

finds the root of your project based on the location of the current .Rproj file

19
Q

basic codes for how to make tables

A

table( ) or for a neater option kable( )
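A hedged sketch (mydata and species are hypothetical names; kable( ) comes from the knitr package):

```
library(knitr)
counts <- table(mydata$species)   # basic frequency table
kable(counts)                     # the same table, formatted more neatly for markdown
```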

20
Q

what does echo = FALSE do

A

ensures that we don’t see the R code (only its output) in the knitted markdown document

21
Q

what is the function to name columns

A

col.names = c("x", "y") - e.g. as the col.names argument inside kable( )

22
Q

what do pipes do and what is the symbol for them

A

passes the result of one step straight into the next, so you don’t have to make new intermediate variables each time.

symbol: %>%
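A hypothetical sketch of the same steps with and without the pipe (mydata and age are made-up names):

```
library(dplyr)

# without pipes: an intermediate variable at each step
adults <- filter(mydata, age > 18)
result <- summarise(adults, meanAge = mean(age))

# with pipes: each result is passed straight into the next function
result <- mydata %>%
  filter(age > 18) %>%
  summarise(meanAge = mean(age))
```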

23
Q

grouping allows you to

A

move beyond nominal variables

24
Q

what is the difference between group_by( ) and summarise ( )

A

group_by( ) groups variables in a dataset

summarise( ) tells us what kinds of stats to create over the groups
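A minimal sketch (mydata, species and age are hypothetical):

```
library(dplyr)
mydata %>%
  group_by(species) %>%            # define the groups
  summarise(meanAge = mean(age),   # one summary row per group
            n = n())               # n() counts the rows in each group
```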

25
what does filter( ) do
lets you select the subset(s) of the rows eg. filter(species == "bears") will only show bears in the dataset
26
what does arrange ( ) do
tells R how to sort the rows | eg. after piping from filter( ), arrange(gender) will sort the rows by gender so that the genders are clumped together
27
what does select ( ) do
lets you select which columns you want to view eg. gdata %>% select(species, age, gender)
28
what does mutate ( ) do
lets you make new columns eg. gdata %>% mutate(ageMonth = age * 12)
29
what does the '-' sign do when not using numeric values eg. ( - species)
means not, or everything but it. so if you use select(-gender) --> it will show all columns except gender (see the combined sketch below)
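A hypothetical chain combining the verbs from the last few cards (gdata and its columns are made-up names):

```
library(dplyr)
gdata %>%
  filter(species == "bears") %>%   # keep only some rows
  arrange(gender) %>%              # sort the remaining rows
  select(-gender) %>%              # drop the gender column, keep everything else
  mutate(ageMonth = age * 12)      # add a new column
```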
30
what does pivot_wider do
converts a dataset from long to wide format by increasing the number of columns and decreasing the number of rows: pivot_wider(names_from = x, values_from = y)
31
what does pivot_longer do
converts a dataset from wide to long format by increasing the number of rows and decreasing the number of columns: pivot_longer(cols, names_to = "x", values_to = "y")
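A hedged sketch of both reshaping functions (long_data, wide_data and the column names are hypothetical; both come from the tidyr package):

```
library(tidyr)
# long -> wide: one new column per value of 'condition', cells filled from 'score'
wide <- pivot_wider(long_data, names_from = condition, values_from = score)
# wide -> long: the listed columns are stacked into a name column and a value column
long <- pivot_longer(wide_data, cols = c(t1, t2), names_to = "time", values_to = "score")
```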
32
write out a ggplot function
ggplot(mapping = aes(x = year, y = rating))
33
what does the 'aes' in a ggplot stand for
aesthetic
34
what are the two places to put colours when plotting using ggplot
if you put colour inside aes( ), colour is mapped to one of the variables; if you put it outside aes( ) (e.g. in the geom), it sets one fixed colour for the whole graph
35
what does labs( ) do
edits the labels on a graph
36
what does facet_wrap ( ) do and what sign do you need to include in the brackets
divides the graph up into separate panels by a variable | eg. facet_wrap( ~ food)
37
what is the difference between fill and colour
fill = fills the inside of shapes; colour = changes the colour of lines/outlines
38
when graphing using ggplot, what does the alpha value do on a graph
changes the transparency of the plotted elements (see the combined ggplot sketch below)
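A hedged sketch pulling the last few ggplot cards together (films, year, rating and genre are made-up names):

```
library(ggplot2)
ggplot(data = films, mapping = aes(x = year, y = rating, colour = genre)) +  # colour inside aes(): mapped to a variable
  geom_point(alpha = 0.5) +                                                  # alpha outside aes(): one fixed transparency
  labs(x = "Release year", y = "Rating", title = "Ratings over time") +      # edit the labels
  facet_wrap(~ genre)                                                        # one panel per genre
```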
39
what is scale_fill_x
it is a colouring function where x is the selected palette.
40
how do you change the display dimensions of a graph
in the chunk header, e.g. {r ........., fig.width = 6, fig.height = 7}
41
what is probability theory
using a known model to make predictions about data
42
what are inferential statistics
drawing a conclusion from data | - two types, frequentists and Bayesians
43
what does a frequentist believe
they believe in long run frequency and that probability is objective
44
what does a bayesian believe
probability is a degree of belief, combining the evidence with rational judgement --> more flexible but trickier
45
what are binomial distributions
data where each observation is one of two possible outcomes
46
what are some tricks to remember when calculating binomial distributions in R
- pbinom() and qbinom() use the opposite letter in the brackets (pbinom(q = ...), qbinom(p = ...)) | - if you want the opposite (upper) tail of pbinom() do: 1 - pbinom()
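A hedged sketch (20 coin flips with prob = 0.5; the numbers are arbitrary):

```
dbinom(x = 8, size = 20, prob = 0.5)      # probability of exactly 8 successes
pbinom(q = 8, size = 20, prob = 0.5)      # probability of 8 or fewer successes
1 - pbinom(q = 8, size = 20, prob = 0.5)  # probability of more than 8 successes
qbinom(p = 0.95, size = 20, prob = 0.5)   # note the opposite letters: pbinom(q = ...), qbinom(p = ...)
```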
47
what does a normal distribution assume
that the data points are continuous
48
in a normal distribution what do we know about the mean median and mode
they are all the same
49
when calculating normal distributions in R what do you use
the same functions as the binomial distribution but with 'norm' instead of 'binom'; the brackets use the mean and the standard deviation (e.g. pnorm(x, mean, sd))
50
what is a trick with dnorm()
you cannot ask for a single specific value because the points are continuous. so if you want to find the number of people who are 185cm tall you have to use an interval, say between 184.5 and 185.5
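A minimal sketch of that trick (the mean and sd are made up):

```
# proportion of people who are "185 cm" to the nearest cm
pnorm(185.5, mean = 170, sd = 10) - pnorm(184.5, mean = 170, sd = 10)
```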
51
describe the difference between true mean and estimated mean
true mean is the actual mean of the population whereas estimated mean is the mean of the sample you have taken from the population --> it is often impossible to get the true mean as we cannot get data from the whole population
52
when using the sample or estimated mean or sd what sign do we use
the same sign as the true one but with a little hat on top of it '^' - used to indicate that these numbers are not the same as the actual numbers in the population
53
what is the sampling distribution of the mean
the distribution of sample means --> many means from many different samples of the same population
54
what is standard error of the mean (SEM)
the sd of the sampling distribution of the mean
55
what is the formula for the SEM
SEM = sd / sqrt(sample size)
56
what is central limit theorem
the sampling distribution of the mean approaches a normal distribution as the sample size grows, regardless of what the true population looks like. eg. if a population has a massive positive skew, the means of samples of 100 cases from it will still form an approximately normal distribution
57
what is a confidence interval
an interval around the sample mean that we can report with a stated level of confidence --> usually 95%, meaning that if we repeated the sampling many times, 95% of such intervals would contain the true mean
58
historically with null hypothesis testing (NHT) there were two main ideas. who held them and what were they
Fisher --> thought NHT was about falsifying a single (null) hypothesis
Neyman --> thought hypothesis testing is about choosing between two rival hypotheses
59
how are we supposed to look at our results in relation to the null hypothesis
we say "how likely are our results given that the null is true?" - rather than if the null is true or not
60
define the two types of error that can occur in stats
type 1 - rejecting a true null hypothesis
type 2 - failing to reject a false null hypothesis
if we reduce our alpha level (the significance level governing type 1 error), we increase our beta value (the probability of a type 2 error) - refer to the book if this doesn't make sense
61
what is the sign for both of the error values
type 1 = alpha sign | type 2 = beta sign
62
which error type is worse
type 1
63
how do we best control for type one errors
we set a low alpha level (significance level), thereby increasing the chance of a type 2 error rather than a type 1 error
64
what is a diagnostic test statistic
a test statistic is diagnostic if the null and alternative hypotheses predict different values of it, so it can tell the two apart
65
what is the difference between one and two tailed areas
one-tailed - the whole rejection region (the area outside the 95%) sits on one side of the distribution
two-tailed - the rejection region is spread evenly across both tails
66
what is key to remember when addressing the null hypothesis
we cannot say whether it is true or not, only that we are retaining (failing to reject) it or rejecting it.
67
when do you use a chi-square test
- eg. comparing 'x' against a theoretical predictor 'p' | - null hypothesis says x will equal p, alternative hypothesis says x will not equal p
68
what is another name for a chi square test
a goodness of fit test
69
for chi-squared tests, how do you calculate the degrees of freedom
k - 1, where k = no. of categories (levels)
70
what is the relationship between the chi square statistic and null.
the higher the chi squared statistic, the worse the null is at explaining the data
71
what does qchisq( ) do
you input the probability (e.g. 0.95) and the degrees of freedom, and it outputs the critical value that the statistic must exceed in order to reject the null eg. qchisq(0.95, df = 3)
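A hedged sketch (mydata$species is a hypothetical column of categories):

```
qchisq(0.95, df = 3)               # critical value the chi-square statistic must exceed at alpha = .05
chisq.test(table(mydata$species))  # runs the goodness-of-fit test and reports the statistic, df and p-value
```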
72
when is the only time to use the n() function in R
when using the summarise function
73
what is a chi-squared independence test
same idea as the regular chi-square test but used when you are dealing with two nominal variables
74
what is the key difference between a chi-squared test and an independence chi-square test
chi-square test (goodness of fit test) is hypothesising about the true probabilities of a variable whereas independence is hypothesising about the relationship between two variables
75
what is effect size
how large the difference between our data and the null hypothesis predictions actually were
76
what does the p-value measure
the probability of obtaining results at least as extreme as ours if the null hypothesis were true (roughly, how plausibly our results could be due to chance)
77
what is Cramer's V
an effect size measure for chi-square tests
78
what are the two assumptions of chi-squared tests and what do we do if they are not met
1) the expected frequencies are large - if not, use fisher.test(x = dataset)
2) the data are independent - if not, use mcnemar.test(x = dataset); when reporting the data you must mention the use of McNemar's chi-square test
79
what are the ranges for cramers V
0 - 0.1 = negligible
0.1 - 0.3 = weak
0.3 - 0.5 = moderate
0.5 - 1 = high
80
what makes a statistic diagnostic
a test statistic is diagnostic if it reliably grows the further our data are from what the null predicts - the statistic gets larger as our data move away from the null
81
what is the z score good for
good for standardising scores so that otherwise separate kinds of data can be compared | eg. apples and oranges
82
how do we measure effect size for t-tests
Cohen's d
83
what are the ranges of Cohen's d
0.2 = small
0.5 = moderate
0.8 = large
84
unlike the two-sided t-test which R assumes by default, why would you want to calculate a one-sided t-test
If we only care about a difference in one direction
85
when do we use an independent samples t-test
when we are trying to analyse data from two separate groups. eg. rich vs poor hunger levels over a year
86
what does a normal t-test or 'student t-test' assume
1) that both groups are normally distributed
2) that both groups have equal variance (which is almost impossible for independent samples)
87
which t-test do we need to perform if the variance of the groups is not equal as used in independence groups t-tests
a Welch t-test - this doesn't assume the variances are equal. R defaults to this type of t-test if you don't specify
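A hedged sketch (score, group and mydata are hypothetical names):

```
t.test(formula = score ~ group, data = mydata)                    # Welch test: var.equal defaults to FALSE
t.test(formula = score ~ group, data = mydata, var.equal = TRUE)  # classic student t-test
```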
88
true or false it is better to do a welch t-test most of the time
true
89
what is a paired samples t-test
comparing two means with a repeated measures design --> difference between T1 and T2
90
is there a difference when you enter long form data vs wide form data for t-tests in R
yes
91
what do all t-tests, apart from the student/normal t-test, assume
1) the population is normally distributed | 2) the data are independent --> apart from paired samples
92
what are two ways to test if a dataset is normally distributed
1) plotting - lets you look and see whether the data appear normally distributed
2) Shapiro-Wilk test (shapiro.test) - gives you actual numbers --> returns a p-value; if it is greater than 0.05, the data are consistent with a normal distribution
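A minimal sketch of both checks on a hypothetical variable:

```
hist(mydata$score)           # eyeball the shape
qqnorm(mydata$score)         # points should fall roughly on a straight line if normal
shapiro.test(mydata$score)   # p-value > 0.05 suggests no evidence against normality
```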
93
which t-test do you perform if your data is not normal distributed
Wilcoxon test - centres each group's data around that group's mean eg. how many data points were above the mean in group A and in group B
94
when do we use ANOVA's
when we want to compare more than two means / groups
95
what is the difference between one and two way ANOVA
one-way --> for outcomes defined by a single grouping variable eg. no. of cows on different land types
two-way --> for outcomes defined by two grouping variables eg. no. of cows depending on land type and food source
96
what is the difference between between groups and within groups sum of squares
between groups - how different the group means are from the other groups' means - the separation of the graphs
within groups - how large the spread is within one group - the width of the graph
97
what is the total variability defined by
SSb + SSw (the between-groups and within-groups sums of squares added together)
98
what is the key thing to remember about one-way anova's
they do not tell you which variables are more or less influential, they just tell you whether the variables are significant
99
what is a family wise type 1 error when using ANOVA's
the probability of obtaining at least one type 1 error across the multiple tests
100
how do you correct for family wise type 1 errors in R
two ways:
1) Bonferroni correction --> works ok, but is very conservative
2) Holm correction --> better; sorts all the p-values in order and applies a stepwise Bonferroni-style correction to them
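A hedged sketch of applying the corrections (the p-values and variable names are made up):

```
p.adjust(c(0.01, 0.02, 0.04), method = "bonferroni")    # very conservative
p.adjust(c(0.01, 0.02, 0.04), method = "holm")          # orders the p-values, then corrects them stepwise
pairwise.t.test(mydata$score, mydata$group, p.adjust.method = "holm")  # corrected pairwise comparisons
```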
101
what are the two assumptions of ANOVA's
1) residuals are normally distributed | 2) homogeneity of variance across all groups
102
what is an interaction effect
when the effect of one variable is dependent on another
103
when using the ~ sign, which variable goes first
the numerical (outcome) variable always goes first, before the ~
104
what is the difference between eta.sq and eta.sq.part
eta.sq --> gives you the amount of variance each effect is responsible for
eta.sq.part --> the variance attributed to each variable if the other variables equalled zero
105
what does cor.test( ) default to in R
Pearson's correlation
106
what does a spearmans rho test assume
monotonicity --> the relationship is consistently increasing or consistently decreasing
107
in the regression equation y = 2x + 5 + ei, what does the ei account for
random variation
108
reference the tree diagram on the bottom of the page in week 10 modules 1
:)
109
what are the two sources of variance in a regression
1) SSmod --> the difference between the regression line and the mean Y value
2) SSres --> the difference between the data and the regression line
110
in terms of regression lines what does the null hypothesis assume compared to the alternative hypothesis
the null assumes that 'b1' (the coefficient on x) is equal to zero; the alternative assumes that it is not
111
how does effect size for regressions (r^2) work
r^2 = 1 - SSres / SStot
r^2 is a value between 0 and 1. If r^2 = 1 then there are no residuals (a perfect model). If r^2 = 0 then the model explains none of the variability. If r^2 = 0.52, then 52% of the variance is due to x
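A minimal sketch of fitting a regression and reading off r^2 (films, rating and year are hypothetical names):

```
model <- lm(rating ~ year, data = films)
summary(model)              # coefficients, their p-values, and Multiple R-squared
summary(model)$r.squared    # just the r^2 value
```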
112
what does an r^2 value of 0.76 tell us
that 76% of the variance is due to x
113
what do the standardised coefficients from regressions tell us
which x value / predictor has a greater effect on y / the outcome
114
what are the three potential problems with regressions
1) outliers (not too bad)
2) high leverage points (not too bad)
3) high influence points (a combo of 1 and 2) - these will change the slope of your regression line
115
how do you control for high influence points
one way would be to use exclusion criteria to try and remove outliers with high leverage points
116
define validity
a test is valid if it accurately measures what it is supposed to
117
define reliability
a test is reliable if it has the property of consistency in measurement --> is consistent across tests
118
which one out of reliability/ validity is reliant on the other
a test must be reliable first, in order to be valid
119
define error
unsystematic variance on a test occasion
120
what are the two types of error in classical test theory (not statistical testing) and what are they
1) endogenous --> due to the test taker | 2) exogenous --> outside the test taker's control
121
what does classical test theory 1 say
observed score is a combination of true score plus error
122
what does classical test theory 2 say
variance of observed test results is equal to true score variance plus error variance
123
what are the four types of reliability
test-retest --> comparing T1 and T2
alternative forms --> comparing two alternative tests' scores
split-half --> the correlation between the two subsets (halves) of the test
cronbach's alpha --> the mean of all possible split-halves
124
is Cronbach's alpha the best test you can use
no, it provides a lower-bound estimate, more effective modern tests exist
125
what did Nunnally and Bernstein (1994) say about reliability scores
for important decisions, a reliability of 0.90 is minimum and 0.95 is optimum
126
what happens to the standard error of measurement when reliability is perfect
it goes to zero
127
as reliability decreases...
the more a score moves away from its true score, and towards the mean
128
if asked about predicted true scores...
bottom of page right before week 11 modules 1
129
what are three ways to increase the correlation between test scores and constructs
1) increase the relationships between constructs
2) remove sources of inconsistency in test administration and interpretation
3) increase the no. of items on the test
130
disattenuated formula is on...
week 11 modules 1 top of the page
131
spearman-brown prophecy formula is on...
below the disattenuated formula in week 11 modules 1
132
true or false, validity should be viewed as a continuum
true
133
what are the three types of validity
1) criterion 2) content 3) construct
134
talk about criterion validity
- least vague, most concrete | - has two types --> concurrent and predictive
135
in criterion validity, what is concurrent validity and give an example
test scores measured against a criterion measured at the same time. eg. a new test for AIDS compared against the current test for AIDS
136
in criterion validity, what is predictive validity and give an example
test scores evaluated against a criterion measured later. eg. ATAR score performance compared to university score performance
137
talk about content validity
the test's content reflects the full domain of the construct it is measuring
--> related concept = face validity --> does it look valid
eg. a sex drive test asking about finger size = poor content validity
138
what are two threats to content validity
1) including construct-irrelevant content | 2) construct underrepresentation (the opposite of 1)
139
talk about construct validity
- reflects the construct it is referencing | - two types convergent and discriminant validity
140
in construct validity, what is convergent validity
test scores should correlate with test scores on tests with related constructs --> correlation with related tests
141
in construct validity, what is discriminant validity
test scores should not heavily correlate with unrelated tests --> non-correlation with unrelated tests
142
define sensitivity in terms of tests and validity
a test's ability to correctly detect positive cases
143
define specificity in terms of tests and validity
a test's ability to correctly detect negative cases
144
define positive predictive power (PPP) in terms of tests and validity
probability a positive test result indicates a positive case
145
define negative predictive power (NPP) in terms of tests and validity
probability a negative test result indicates a negative case
146
when trying to determine a positive or negative case, which number is the least useful
0.5
147
define prevalence in terms of tests and validity
the probability a random case from a study is criterion positive
148
the validity that is termed the one to rule them all is
construct validity
149
what are the 5 aspects of construct validity
CRIRC:
1) content --> should the things in the test be included
2) response processes --> makes test takers think about what it should
3) internal structure
4) relations to other variables
5) consequences of testing --> does the test get the result that it should
150
what do the Campbell & Fiske 1959 Multitrait - multimethod matrices do
attempt to understand convergent/discriminant validity by evaluating two sources of variance: trait variance and method variance
151
in terms of Campbell & Fiske 1959 Multitrait - multimethod matrices what is trait variance
measures tend to share variance if they're based on the same traits
152
in terms of Campbell & Fiske 1959 Multitrait - multimethod matrices what is method variance
measures tend to share variance if they're based on the same data source
153
see the Campbell & Fiske 1959 Multitrait - multimethod matrices model
bottom of week 11 modules 2
- diff methods, diff constructs = weakest
- same methods, diff constructs = moderate?
- diff methods, same constructs = moderate?
- same methods, same constructs = strongest
154
what are the limitations of classical test theory
- struggles to generalise results to other assessments
- struggles to calibrate different items to measure a common attribute
- tends to adhere to a purely criterion-orientated view of validity
155
give a brief overview of the six sorting/grouping functions in R
group_by( ) --> groups a variable
summarise( ) --> tells us what kinds of stats to create over these groups
filter( ) --> select from rows
arrange( ) --> tells you how to sort these rows
select( ) --> select from columns
mutate( ) --> create new columns
156
out of t-tests (except one-sample), ANOVAs and regressions, which all use the test(formula, dataset) structure, which ones write 'formula =' and which ones just give the formula
ANOVAs and regressions don't write formula =, they just give the formula; t-tests write formula = (see the sketch below)
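A hedged sketch of the three call styles side by side (score, group, predictor and mydata are made-up names):

```
t.test(formula = score ~ group, data = mydata)   # t-tests name the formula argument
aov(score ~ group, data = mydata)                # ANOVAs just give the formula
lm(score ~ predictor, data = mydata)             # regressions just give the formula
```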
157
briefly explain when you would use: - a t-test - an anova - a chi-square - a regression
chi-square: when using nominal variables, comparing x against a theoretical p
t-test: when you want to compare the means of two groups, eg. apples and oranges
anova: when you have more than two means impacting on an outcome
regression: when you want to see a correlation/relationship between two variables
158
when do you use a t-test (longer definition)
A t-test is a statistical test that is used to compare the means of two groups. It is often used in hypothesis testing to determine whether a process or treatment actually has an effect on the population of interest, or whether two groups are different from one another.
159
what test do you perform if your residuals are not normally distributed when doing ANOVA's
Kruskal-Wallis test
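A minimal sketch (score, group and mydata are hypothetical names):

```
kruskal.test(score ~ group, data = mydata)   # non-parametric alternative to a one-way ANOVA
```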