Research Methods Flashcards

1
Q

What is Simpson’s Paradox

A

When aggregated data shows a trend or bias but the disaggregated data does not, or shows the opposite trend.

2
Q

What is aggregate data vs disaggregate data

A
aggregate = averaged/summarised data
disaggregate = all of the individual data points (not averaged)
3
Q

what do you need to do with negative numbers in R

A

put them in parentheses

eg. -1 ^ 2

should be

(-1)^2
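For example, run in the console (a quick sketch; the numbers are arbitrary):

```
-1^2    # returns -1, because R reads this as -(1^2)
(-1)^2  # returns 1, the parentheses force -1 to be squared
```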

4
Q

what does an exclamation mark mean

A

not

5
Q

what are the three types of rounding functions

A

round( )
floor( )
ceiling( )
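A quick sketch of the three (the values are arbitrary):

```
round(2.567, digits = 2)  # 2.57 - rounds to the nearest value
floor(2.9)                # 2    - always rounds down
ceiling(2.1)              # 3    - always rounds up
```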

6
Q

differentiate ‘=’ and ‘==’

A

== is used for a comparison --> like asking a question
= is a command; inside a function call it only exists for the purpose of that function (it sets an argument)

= is comparable to the assignment arrow <-
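A minimal sketch of the difference (the variable x is just an example):

```
x = 5    # command: store 5 in x (doing the same job as x <- 5 here)
x == 5   # question: is x equal to 5? returns TRUE
x == 6   # returns FALSE
```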

7
Q

what is a numeric variable

A

a variable that stores a number, e.g. henryage

8
Q

what is a character variable

A

a variable that stores text (words), e.g. food

9
Q

what is a logical variable

A

a variable that stores TRUE/FALSE values, e.g. isFurry

10
Q

variable names can't have what?

A

spaces

11
Q

what is the hierarchy of vectors

A

character > numerical > logical
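A small sketch of the coercion hierarchy in action (the values are arbitrary):

```
c(TRUE, 1, 2)      # logical is promoted to numeric:   1 1 2
c(1, 2, "three")   # numeric is promoted to character: "1" "2" "three"
```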

12
Q

what is a package

A

a collection of R functions (plus data and documentation) bundled together

13
Q

what are theoretical constructs and how do they relate to the two other elements

A

unobservable psychological entities. A construct is operationalised into a measure - a tool designed to obtain data - and the measure is then used to produce data.

14
Q

what are the three types of scales

A

nominal --> categories with no set order, eg. eye colour --> discrete
ordinal --> ranked data, where the size of the difference between data points is not defined
interval --> numeric data with equal-sized intervals, eg. dates

15
Q

what do you use to find or make columns in a dataset

A

$

16
Q

how do you remove a data point

A

make it equal to ‘NULL’

17
Q

how do you make a vector

A

c( )

18
Q

what does the “here” package do

A

finds the root of your project based on the location of the current .Rproj file

19
Q

basic codes for how to make tables

A

table( ) or for a neater option kable( )
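A hedged sketch (mydata and species are hypothetical names; kable( ) comes from the knitr package):

```
library(knitr)
counts <- table(mydata$species)   # basic frequency table
kable(counts)                     # the same table, formatted more neatly for markdown
```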

20
Q

what does echo = FALSE do

A

ensures that we don’t see the R code (only its output) in the knitted markdown document

21
Q

what is the function to name columns

A

col.names = c("x", "y") - e.g. as the col.names argument inside kable( )

22
Q

what do pipes do and what is the symbol for them

A

passes the result of one step straight into the next, so you don’t have to make new intermediate variables each time.

symbol: %>%
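A hypothetical sketch of the same steps with and without the pipe (mydata and age are made-up names):

```
library(dplyr)

# without pipes: an intermediate variable at each step
adults <- filter(mydata, age > 18)
result <- summarise(adults, meanAge = mean(age))

# with pipes: each result is passed straight into the next function
result <- mydata %>%
  filter(age > 18) %>%
  summarise(meanAge = mean(age))
```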

23
Q

grouping allows you to

A

move beyond nominal variables

24
Q

what is the difference between group_by( ) and summarise ( )

A

group_by( ) groups variables in a dataset

summarise( ) tells us what kinds of stats to create over the groups
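A minimal sketch (mydata, species and age are hypothetical):

```
library(dplyr)
mydata %>%
  group_by(species) %>%            # define the groups
  summarise(meanAge = mean(age),   # one summary row per group
            n = n())               # n() counts the rows in each group
```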

25
what does filter( ) do
lets you select the subset(s) of the rows eg. filter(species == "bears") will only show bears in the dataset
26
what does arrange ( ) do
tells R how to sort the rows | eg. after piping from filter( ), arrange(gender) will sort the rows by gender so that the genders are clumped together
27
what does select ( ) do
lets you select which columns you want to view eg. gdata %>% select(species, age, gender)
28
what does mutate ( ) do
lets you make new columns eg. gdata %>% mutate(ageMonth = age * 12)
29
what does the '-' sign do when not using numeric values eg. ( - species)
means not, or everything but it. so if you use select(-gender) --> it will show all columns except gender (see the combined sketch below)
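A hypothetical chain combining the verbs from the last few cards (gdata and its columns are made-up names):

```
library(dplyr)
gdata %>%
  filter(species == "bears") %>%   # keep only some rows
  arrange(gender) %>%              # sort the remaining rows
  select(-gender) %>%              # drop the gender column, keep everything else
  mutate(ageMonth = age * 12)      # add a new column
```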
30
what does pivot_wider do
converts a dataset from long to wide format by increasing the number of columns and decreasing the number of rows: pivot_wider(names_from = x, values_from = y)
31
what does pivot_longer do
converts a dataset from wide to long format by increasing the number of rows and decreasing the number of columns: pivot_longer(cols, names_to = "x", values_to = "y")
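A hedged sketch of both reshaping functions (long_data, wide_data and the column names are hypothetical; both come from the tidyr package):

```
library(tidyr)
# long -> wide: one new column per value of 'condition', cells filled from 'score'
wide <- pivot_wider(long_data, names_from = condition, values_from = score)
# wide -> long: the listed columns are stacked into a name column and a value column
long <- pivot_longer(wide_data, cols = c(t1, t2), names_to = "time", values_to = "score")
```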
32
write out a ggplot function
ggplot(mapping = aes(x = year, y = rating))
33
what does the 'aes' in a ggplot stand for
aesthetic
34
what are the two places to put colours when plotting using ggplot
if you put colour inside aes( ), colour is mapped to one of the variables; if you put it outside aes( ) (e.g. in the geom), it sets one fixed colour for the whole graph
35
what does labs( ) do
edits the labels on a graph
36
what does facet_wrap ( ) do and what sign do you need to include in the brackets
divides the graph up into separate panels by a variable | eg. facet_wrap( ~ food)
37
what is the difference between fill and colour
fill = fills the inside of shapes; colour = changes the colour of lines/outlines
38
when graphing using ggplot, what does the alpha value do on a graph
changes the transparency of the plotted elements (see the combined ggplot sketch below)
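A hedged sketch pulling the last few ggplot cards together (films, year, rating and genre are made-up names):

```
library(ggplot2)
ggplot(data = films, mapping = aes(x = year, y = rating, colour = genre)) +  # colour inside aes(): mapped to a variable
  geom_point(alpha = 0.5) +                                                  # alpha outside aes(): one fixed transparency
  labs(x = "Release year", y = "Rating", title = "Ratings over time") +      # edit the labels
  facet_wrap(~ genre)                                                        # one panel per genre
```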
39
what is scale_fill_x
it is a colouring function where x is the selected palette.
40
how do you change the display dimensions of a graph
in the chunk header, e.g. {r ........., fig.width = 6, fig.height = 7}
41
what is probability theory
using a known model to make predictions about data
42
what are inferential statistics
drawing a conclusion from data | - two types, frequentists and Bayesians
43
what does a frequentist believe
they believe in long run frequency and that probability is objective
44
what does a bayesian believe
probability is a degree of belief, combining the evidence with rational judgement --> more flexible but trickier
45
what are binomial distributions
data where each observation is one of two possible outcomes
46
what are some tricks to remember when calculating binomial distributions in R
- pbinom() and qbinom() use the opposite letter in the brackets (pbinom(q = ...), qbinom(p = ...)) | - if you want the opposite (upper) tail of pbinom() do: 1 - pbinom()
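A hedged sketch (20 coin flips with prob = 0.5; the numbers are arbitrary):

```
dbinom(x = 8, size = 20, prob = 0.5)      # probability of exactly 8 successes
pbinom(q = 8, size = 20, prob = 0.5)      # probability of 8 or fewer successes
1 - pbinom(q = 8, size = 20, prob = 0.5)  # probability of more than 8 successes
qbinom(p = 0.95, size = 20, prob = 0.5)   # note the opposite letters: pbinom(q = ...), qbinom(p = ...)
```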
47
what does a normal distribution assume
that the data points are continuous
48
in a normal distribution what do we know about the mean median and mode
they are all the same
49
when calculating normal distributions in R what do you use
the same functions as the binomial distribution but with 'norm' instead of 'binom'; the brackets use the mean and the standard deviation (e.g. pnorm(x, mean, sd))
50
what is a trick with dnorm()
you cannot ask for a single specific value because the points are continuous. so if you want to find the number of people who are 185cm tall you have to use an interval, say between 184.5 and 185.5
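A minimal sketch of that trick (the mean and sd are made up):

```
# proportion of people who are "185 cm" to the nearest cm
pnorm(185.5, mean = 170, sd = 10) - pnorm(184.5, mean = 170, sd = 10)
```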
51
describe the difference between true mean and estimated mean
true mean is the actual mean of the population whereas estimated mean is the mean of the sample you have taken from the population --> it is often impossible to get the true mean as we cannot get data from the whole population
52
when using the sample or estimated mean or sd what sign do we use
the same sign as the true one but with a little hat on top of it '^' - used to indicate that these numbers are not the same as the actual numbers in the population
53
what is the sampling distribution of the mean
the distribution of sample means --> many means from many different samples of the same population
54
what is standard error of the mean (SEM)
the sd of the sampling distribution of the mean
55
what is the formula for the SEM
SEM = sd / sqrt(sample size)
56
what is central limit theorem
the sampling distribution of the mean approaches a normal distribution as the sample size grows, regardless of what the true population looks like. eg. if a population has a massive positive skew, the means of samples of 100 cases from it will still form an approximately normal distribution
57
what is a confidence interval
an interval around the sample mean that we can report with a stated level of confidence --> usually 95%, meaning that if we repeated the sampling many times, 95% of such intervals would contain the true mean
58
historically with null hypothesis testing (NHT) there were two main ideas. who held them and what were they
Fisher --> thought NHT was about falsifying a single (null) hypothesis
Neyman --> thought hypothesis testing is about choosing between two rival hypotheses
59
how are we supposed to look at our results in relation to the null hypothesis
we say "how likely are our results given that the null is true?" - rather than if the null is true or not
60
define the two types of error that can occur in stats
type 1 - rejecting a true null hypothesis
type 2 - failing to reject a false null hypothesis
if we reduce our alpha level (the significance level governing type 1 error), we increase our beta value (the probability of a type 2 error) - refer to the book if this doesn't make sense
61
what is the sign for both of the error values
type 1 = alpha sign | type 2 = beta sign
62
which error type is worse
type 1
63
how do we best control for type one errors
we set a low alpha level (significance level), thereby increasing the chance of a type 2 error rather than a type 1 error
64
what is a diagnostic test statistic
a test statistic is diagnostic if the null and alternative hypotheses predict different values of it, so it can tell the two apart
65
what is the difference between one and two tailed areas
one-tailed - the whole rejection region (the area outside the 95%) sits on one side of the distribution
two-tailed - the rejection region is spread evenly across both tails
66
what is key to remember when addressing the null hypothesis
we cannot say whether it is true or not, only that we are retaining (failing to reject) it or rejecting it.
67
when do you use a chi-square test
- eg. comparing 'x' against a theoretical predictor 'p' | - null hypothesis says x will equal p, alternative hypothesis says x will not equal p
68
what is another name for a chi square test
a goodness of fit test
69
for chi-squared tests, how do you calculate the degrees of freedom
k - 1, where k = no. of categories (levels)
70
what is the relationship between the chi square statistic and null.
the higher the chi squared statistic, the worse the null is at explaining the data
71
what does qchisq( ) do
you input the probability (e.g. 0.95) and the degrees of freedom, and it outputs the critical value that the statistic must exceed in order to reject the null eg. qchisq(0.95, df = 3)
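A hedged sketch (mydata$species is a hypothetical column of categories):

```
qchisq(0.95, df = 3)               # critical value the chi-square statistic must exceed at alpha = .05
chisq.test(table(mydata$species))  # runs the goodness-of-fit test and reports the statistic, df and p-value
```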
72
when is the only time to use the n() function in R
when using the summarise function
73
what is a chi-squared independence test
same idea as the regular chi-square test but used when you are dealing with two nominal variables
74
what is the key difference between a chi-squared test and an independence chi-square test
chi-square test (goodness of fit test) is hypothesising about the true probabilities of a variable whereas independence is hypothesising about the relationship between two variables
75
what is effect size
how large the difference between our data and the null hypothesis predictions actually were
76
what does the p-value measure
the probability of obtaining results at least as extreme as ours if the null hypothesis were true (roughly, how plausibly our results could be due to chance)
77
what is Cramer's V
an effect size measure for chi-square tests
78
what are the two assumptions of chi-squared tests and what do we do if they are not met
1) the expected frequencies are large - if not, use fisher.test(x = dataset)
2) the data are independent - if not, use mcnemar.test(x = dataset); when reporting the data you must mention the use of McNemar's chi-square test
79
what are the ranges for cramers V
0 - 0.1 = negligible
0.1 - 0.3 = weak
0.3 - 0.5 = moderate
0.5 - 1 = high
80
what makes a statistic diagnostic
a test statistic is diagnostic if it reliably grows the further our data are from what the null predicts - the statistic gets larger as our data move away from the null
81
what is the z score good for
good for standardising scores so that otherwise separate kinds of data can be compared | eg. apples and oranges
82
how do we measure effect size for t-tests
Cohen's d
83
what are the ranges of Cohen's d
0.2 = small
0.5 = moderate
0.8 = large
84
unlike the two-sided t-test which R assumes by default, why would you want to calculate a one-sided t-test
If we only care about a difference in one direction
85
when do we use an independent samples t-test
when we are trying to analyse data from two separate groups. eg. rich vs poor hunger levels over a year
86
what does a normal t-test or 'student t-test' assume
1) that both groups are normally distributed
2) that both groups have equal variance (which is almost impossible for independent samples)
87
which t-test do we need to perform if the variance of the groups is not equal as used in independence groups t-tests
a Welch t-test - this doesn't assume the variances are equal. R defaults to this type of t-test if you don't specify
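A hedged sketch (score, group and mydata are hypothetical names):

```
t.test(formula = score ~ group, data = mydata)                    # Welch test: var.equal defaults to FALSE
t.test(formula = score ~ group, data = mydata, var.equal = TRUE)  # classic student t-test
```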
88
true or false it is better to do a welch t-test most of the time
true
89
what is a paired samples t-test
comparing two means with a repeated measures design --> difference between T1 and T2
90
is there a difference when you enter long form data vs wide form data for t-tests in R
yes
91
what do all t-tests, apart from the student/normal t-test, assume
1) the population is normally distributed | 2) the data are independent --> apart from paired samples
92
what are two ways to test if a dataset is normally distributed
1) plotting - lets you look and see whether the data appear normally distributed
2) Shapiro-Wilk test (shapiro.test) - gives you actual numbers --> returns a p-value; if it is greater than 0.05, the data are consistent with a normal distribution
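A minimal sketch of both checks on a hypothetical variable:

```
hist(mydata$score)           # eyeball the shape
qqnorm(mydata$score)         # points should fall roughly on a straight line if normal
shapiro.test(mydata$score)   # p-value > 0.05 suggests no evidence against normality
```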
93
which t-test do you perform if your data is not normal distributed
Wilcoxon test - centres each group's data around that group's mean eg. how many data points were above the mean in group A and in group B
94
when do we use ANOVA's
when we want to compare more than two means / groups
95
what is the difference between one and two way ANOVA
one-way --> for outcomes defined by a single grouping variable eg. no. of cows on different land types
two-way --> for outcomes defined by two grouping variables eg. no. of cows depending on land type and food source
96
what is the difference between between groups and within groups sum of squares
between groups - how different the group means are from the other groups' means - the separation of the graphs
within groups - how large the spread is within one group - the width of the graph
97
what is the total variability defined by
SSb + SSw (the between-groups and within-groups sums of squares added together)
98
what is the key thing to remember about one-way anova's
they do not tell you which variables are more or less influential, they just tell you whether the variables are significant
99
what is a family wise type 1 error when using ANOVA's
the probability of obtaining at least one type 1 error across the multiple tests
100
how do you correct for family wise type 1 errors in R
two ways:
1) Bonferroni correction --> works ok, but is very conservative
2) Holm correction --> better; sorts all the p-values in order and applies a stepwise Bonferroni-style correction to them
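A hedged sketch of applying the corrections (the p-values and variable names are made up):

```
p.adjust(c(0.01, 0.02, 0.04), method = "bonferroni")    # very conservative
p.adjust(c(0.01, 0.02, 0.04), method = "holm")          # orders the p-values, then corrects them stepwise
pairwise.t.test(mydata$score, mydata$group, p.adjust.method = "holm")  # corrected pairwise comparisons
```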
101
what are the two assumptions of ANOVA's
1) residuals are normally distributed | 2) homogeneity of variance across all groups
102
what is an interaction effect
when the effect of one variable is dependent on another
103
when using the ~ sign, which variable goes first
the numerical (outcome) variable always goes first, before the ~
104
what is the difference between eta.sq and eta.sq.part
eta.sq --> gives you the amount of variance each effect is responsible for
eta.sq.part --> the variance attributed to each variable if the other variables equalled zero
105
what does cor.test( ) default to in R
Pearson's correlation
106
what does a spearmans rho test assume
monotonicity --> the relationship is consistently increasing or consistently decreasing
107
in the regression equation y = 2x + 5 + ei, what does the ei account for
random variation
108
reference the tree diagram on the bottom of the page in week 10 modules 1
:)
109
what are the two sources of variance in a regression
1) SSmod --> the difference between the regression line and the mean Y value
2) SSres --> the difference between the data and the regression line
110
in terms of regression lines what does the null hypothesis assume compared to the alternative hypothesis
the null assumes that 'b1' (the coefficient on x) is equal to zero; the alternative assumes that it is not
111
how does effect size for regressions (r^2) work
r^2 = 1 - SSres / SStot
r^2 is a value between 0 and 1. If r^2 = 1 then there are no residuals (a perfect model). If r^2 = 0 then the model explains none of the variability. If r^2 = 0.52, then 52% of the variance is due to x
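A minimal sketch of fitting a regression and reading off r^2 (films, rating and year are hypothetical names):

```
model <- lm(rating ~ year, data = films)
summary(model)              # coefficients, their p-values, and Multiple R-squared
summary(model)$r.squared    # just the r^2 value
```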
112
what does an r^2 value of 0.76 tell us
that 76% of the variance is due to x
113
what do the standardised coefficients from regressions tell us
which x value / predictor has a greater effect on y / the outcome
114
what are the three potential problems with regressions
1) outliers (not too bad)
2) high leverage points (not too bad)
3) high influence points (a combo of 1 and 2) - these will change the slope of your regression line
115
how do you control for high influence points
one way would be to use exclusion criteria to try and remove outliers with high leverage points
116
define validity
a test is valid if it accurately measures what it is supposed to
117
define reliability
a test is reliable if it has the property of consistency in measurement --> is consistent across tests
118
which one out of reliability/ validity is reliant on the other
a test must be reliable first, in order to be valid
119
define error
unsystematic variance on a test occasion
120
what are the two types of error in classical test theory (not statistical testing) and what are they
1) endogenous --> due to the test taker | 2) exogenous --> outside the test taker's control
121
what does classical test theory 1 say
observed score is a combination of true score plus error
122
what does classical test theory 2 say
variance of observed test results is equal to true score variance plus error variance
123
what are the four types of reliability
test-retest --> comparing T1 and T2
alternative forms --> comparing two alternative tests' scores
split-half --> the correlation between the two subsets (halves) of the test
cronbach's alpha --> the mean of all possible split-halves
124
is Cronbach's alpha the best test you can use
no, it provides a lower-bound estimate, more effective modern tests exist
125
what did Nunnally and Bernstein (1994) say about reliability scores
for important decisions, a reliability of 0.90 is minimum and 0.95 is optimum
126
what happens to the standard error of measurement when reliability is perfect
it goes to zero
127
as reliability decreases...
the more a score moves away from its true score, and towards the mean
128
if asked about predicted true scores...
bottom of page right before week 11 modules 1
129
what are three ways to increase the correlation between test scores and constructs
1) increase the relationships between constructs
2) remove sources of inconsistency in test administration and interpretation
3) increase the no. of items on the test
130
disattenuated formula is on...
week 11 modules 1 top of the page
131
spearman-brown prophecy formula is on...
below the disattenuated formula in week 11 modules 1
132
true or false, validity should be viewed as a continuum
true
133
what are the three types of validity
1) criterion 2) content 3) construct
134
talk about criterion validity
- least vague, most concrete | - has two types --> concurrent and predictive
135
in criterion validity, what is concurrent validity and give an example
test scores measured against a criterion measured at the same time. eg. a new test for AIDS compared against the current test for AIDS
136
in criterion validity, what is predictive validity and give an example
test scores evaluated against a criterion measured later. eg. ATAR score performance compared to university score performance
137
talk about content validity
the test's content reflects the full domain of the construct it is measuring
--> related concept = face validity --> does it look valid
eg. a sex drive test asking about finger size = poor content validity
138
what are two threats to content validity
1) including construct-irrelevant content | 2) construct underrepresentation (the opposite of 1)
139
talk about construct validity
- reflects the construct it is referencing | - two types convergent and discriminant validity
140
in construct validity, what is convergent validity
test scores should correlate with test scores on tests with related constructs --> correlation with related tests
141
in construct validity, what is discriminant validity
test scores should not heavily correlate with unrelated tests --> non-correlation with unrelated tests
142
define sensitivity in terms of tests and validity
a test's ability to correctly detect positive cases
143
define specificity in terms of tests and validity
a test's ability to correctly detect negative cases
144
define positive predictive power (PPP) in terms of tests and validity
probability a positive test result indicates a positive case
145
define negative predictive power (NPP) in terms of tests and validity
probability a negative test result indicates a negative case
146
when trying to determine a positive or negative case, which number is the least useful
0.5
147
define prevalence in terms of tests and validity
the probability a random case from a study is criterion positive
148
the validity that is termed the one to rule them all is
construct validity
149
what are the 5 aspects of construct validity
CRIRC:
1) content --> should the things in the test be included
2) response processes --> makes test takers think about what it should
3) internal structure
4) relations to other variables
5) consequences of testing --> does the test get the result that it should
150
what do the Campbell & Fiske 1959 Multitrait - multimethod matrices do
attempt to understand convergent/discriminant validity by evaluating two sources of variance: trait variance and method variance
151
in terms of Campbell & Fiske 1959 Multitrait - multimethod matrices what is trait variance
measures tend to share variance if they're based on the same traits
152
in terms of Campbell & Fiske 1959 Multitrait - multimethod matrices what is method variance
measures tend to share variance if they're based on the same data source
153
see the Campbell & Fiske 1959 Multitrait - multimethod matrices model
bottom of week 11 modules 2
- diff methods, diff constructs = weakest
- same methods, diff constructs = moderate?
- diff methods, same constructs = moderate?
- same methods, same constructs = strongest
154
what are the limitations of classical test theory
- struggles to generalise results to other assessments
- struggles to calibrate different items to measure a common attribute
- tends to adhere to a purely criterion-orientated view of validity
155
give a brief overview of the six sorting/grouping functions in R
group_by( ) --> groups a variable
summarise( ) --> tells us what kinds of stats to create over these groups
filter( ) --> select from rows
arrange( ) --> tells you how to sort these rows
select( ) --> select from columns
mutate( ) --> create new columns
156
out of t-tests (except one-sample), ANOVAs and regressions, which all use the test(formula, dataset) structure, which ones write 'formula =' and which ones just give the formula
ANOVAs and regressions don't write formula =, they just give the formula; t-tests write formula = (see the sketch below)
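A hedged sketch of the three call styles side by side (score, group, predictor and mydata are made-up names):

```
t.test(formula = score ~ group, data = mydata)   # t-tests name the formula argument
aov(score ~ group, data = mydata)                # ANOVAs just give the formula
lm(score ~ predictor, data = mydata)             # regressions just give the formula
```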
157
briefly explain when you would use: - a t-test - an anova - a chi-square - a regression
chi-square: when using nominal variables, comparing x against a theoretical p
t-test: when you want to compare the means of two groups, eg. apples and oranges
anova: when you have more than two means impacting on an outcome
regression: when you want to see a correlation/relationship between two variables
158
when do you use a t-test (longer definition)
A t-test is a statistical test that is used to compare the means of two groups. It is often used in hypothesis testing to determine whether a process or treatment actually has an effect on the population of interest, or whether two groups are different from one another.
159
what test do you perform if your residuals are not normally distributed when doing ANOVA's
Kruskal-Wallis test
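A minimal sketch (score, group and mydata are hypothetical names):

```
kruskal.test(score ~ group, data = mydata)   # non-parametric alternative to a one-way ANOVA
```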