the research process Flashcards

1
Q

different types of missing values

A
  • Missing completely at random
  • Missing at random
  • Missing not at random
2
Q

MCAR

A

There is no pattern in the missing data; missingness is completely random. These cases can safely be ignored or removed.

3
Q

MNAR

A
  • Data missing not at random (MNAR) are missing for reasons related to the values themselves.
  • Typically, mostly high or mostly low scores are missing.
  • Example: some participants with low incomes avoid reporting their holiday spending because the amounts are low.
  • You will lack data from key subgroups, so the sample is not representative of your population.
4
Q

MAR

A

Missing values can be explained by other (observed) variables: missingness is predictable from other variables in the dataset.

5
Q

how to identify HOW MUCH data is missing

A

1) Frequencies function > Statistics table
2) Explore function > Case Processing table
3) Missing Value Analysis > Univariate Statistics table

6
Q

how to detect patterns

A
  • Little's MCAR test
  • t-tests on dummy (missingness indicator) variables
7
Q

Little's MCAR test

A

A multivariate test that evaluates the subgroups of the data that share the same missing data pattern. It evaluates differences between the observed and estimated means in each missing data pattern.
It is not a definitive test.
  • Provides an EM means table with the MCAR test result.
  • If the p-value is above .05 (non-significant), the data are MCAR.
  • If the p-value is below .05 (significant), the data are not MCAR.

8
Q

t-tests for missing values

A
  • t-test: evaluates whether missingness is related to any of the other variables, with alpha = .05.
  • For the t-test procedure, SPSS first separates cases with complete and missing values by creating an indicator variable for each variable that contains missing values.
  • This partitions the data into two parts: one set containing the missing values and the other containing the non-missing values.
  • After partitioning the data, use the t-test of mean differences to check whether there is any difference in the sample between the two groups.
  • MAR can be inferred if the MCAR test is statistically significant but missingness is predicted by variables other than the DV, as indicated by the separate-variance t-tests.
  • MNAR is inferred if the t-test shows that missingness is related to the DV.
9
Q

using dummy variables for missing values

A
  • Construct a dummy variable with two groups (cases with missing and non-missing values on income) and perform a test of mean differences in attitude between the groups.
  • 1 = missing
    0 = observed
  • Run t-tests and chi-square tests between this variable and the other variables in the dataset to see whether missingness on this variable is related to the values of the other variables.
  • If there are no differences, decisions about how to handle the missing data are less critical.
  • For example, if women really are less likely to tell you their weight than men, a chi-square test will show that the percentage of missing data on the weight variable is higher for women than for men.
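The dummy-variable check can be sketched in Python. This is a minimal illustration with made-up income and attitude data and an assumed 20% missingness rate; it assumes SciPy is available:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical data: income has missing values; attitude is fully observed.
income = rng.normal(50, 10, 200)
income[rng.random(200) < 0.2] = np.nan
attitude = rng.normal(3.5, 0.8, 200)

# Dummy variable: 1 = missing, 0 = observed (as on the card).
missing = np.isnan(income).astype(int)

# t-test of mean attitude between the missing and observed groups.
t, p = stats.ttest_ind(attitude[missing == 1], attitude[missing == 0])
print(f"t = {t:.2f}, p = {p:.3f}")
```

A non-significant p-value here is consistent with missingness on income being unrelated to attitude; a chi-square test against a categorical variable would follow the same pattern.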
10
Q

listwise deletion

A

Deleting all cases (participants) that have data missing for any variable in your dataset, leaving a dataset that is complete for every participant included.

Appropriate when few cases are missing.

You may end up with a smaller and possibly biased sample to work with.

11
Q

pairwise deletion

A

Keeps more of your data by removing only the specific missing data points from each analysis. It conserves more of your data because all available values from every case are included.
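The difference between the two deletion strategies can be seen with pandas (the small dataset and variable names are invented for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical dataset with scattered missing values.
df = pd.DataFrame({
    "anxiety": [2.0, 3.5, np.nan, 4.0, 2.5, 3.0],
    "stress":  [3.0, np.nan, 2.5, 4.5, 2.0, 3.5],
    "sleep":   [7.0, 6.5, 8.0, np.nan, 7.5, 6.0],
})

# Listwise deletion: drop every case with any missing value.
listwise = df.dropna()
print(len(listwise), "complete cases remain")  # 3 of 6

# Pairwise deletion: each correlation uses all cases available for that pair.
pairwise_corr = df.corr()  # pandas excludes missing values pair by pair
print(pairwise_corr.round(2))
```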

12
Q

estimating data

A
  1. mean substitution
  2. common point replacement
  3. regression
  4. expectation maximisation
  5. multiple imputation
13
Q

mean substitution

A
  • Replacing the missing value with the mean of that variable across cases.
  • However, the variance of the variable is reduced, because the mean is closer to itself than to the missing value it replaces.
  • It is best to avoid mean substitution unless the proportion of missing values is very small.
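The variance-shrinking effect of mean substitution is easy to demonstrate with pandas; the six scores below are hypothetical:

```python
import numpy as np
import pandas as pd

scores = pd.Series([4.0, 5.0, np.nan, 3.0, np.nan, 6.0])

filled = scores.fillna(scores.mean())  # mean of observed values = 4.5

# The cost: variance shrinks, because imputed points sit exactly at the mean.
print(scores.var(), filled.var())
```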
14
Q

Common point replacement

A

Replacing the missing value with the midpoint of the scale.

15
Q

regression imputation

A
  • Other variables are used as IVs to write a regression equation for the variable with missing data, which serves as the DV.
  • Cases with complete data generate the regression equation; the equation is then used to predict the missing values for incomplete cases.
  • However, the imputed scores fit together better than they should.
  • Variance is reduced, because each estimate is probably too close to the mean.
  • The IVs must be good predictors.
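A minimal sketch of regression imputation with NumPy, using simulated x and y variables (all names and parameters here are invented); the complete cases generate the equation and the equation fills the gaps:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical variables: x is complete, y has missing values.
x = rng.normal(0, 1, 100)
y = 2.0 * x + rng.normal(0, 0.5, 100)
y[rng.random(100) < 0.15] = np.nan

obs = ~np.isnan(y)

# Fit the regression equation on complete cases only (y as DV, x as IV).
slope, intercept = np.polyfit(x[obs], y[obs], 1)

# Predict the missing values from the equation.
y_imputed = y.copy()
y_imputed[~obs] = slope * x[~obs] + intercept

# Caveat from the card: imputed scores sit exactly on the line, so they
# fit together better than they should and variance shrinks.
print(np.nanvar(y), np.var(y_imputed))
```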
16
Q

expectation maximisation

A
  • Creates a missing data correlation (or covariance) matrix.
  • Assumes a normal distribution for the partially missing data and infers the likelihood of the missing value falling within that distribution.
  • Can be used with MCAR or MAR data.
  • Associated with bias and inappropriate standard errors.
    1. The E step finds the conditional expectation of the missing data, given the observed values and the current estimates of the parameters (such as the correlations). These expectations are then substituted for the missing data.
    2. The M step performs maximum likelihood estimation as if the missing data had been filled in. After convergence is achieved, the EM variance-covariance matrix may be provided and/or the filled-in data saved in the dataset.
17
Q

multiple imputation

A
  • Uses logistic regression to predict values based on other variables in your dataset.
  • Differentiates cases with and without missing data >> uses other variables in the dataset to estimate missing values >> random samples are taken from the variable's distribution to create new datasets.
  • Allows you to create 5 new datasets.
  • The most respected method; can be used when data are MNAR or MAR.
  • Used for regression, ANOVA, logistic regression and longitudinal data.
  • Difficult to implement.
  • SPSS: Analyse > Multiple Imputation > Impute Missing Data Values > insert key variables > set imputations to 5 > create a new dataset > click Constraints to set which variables are predictors or to be imputed > impute and use as a predictor for the variable that has the missing data.
18
Q

univariate outliers

A

Large standardised scores (z-scores) on a single variable.

19
Q

how to detect univariate outliers

A
  • By converting our data to z-scores, we can apply benchmarks to the dataset to search for outliers.
  • Analyse > Descriptive Statistics > Descriptives > select the variable to convert and tick "Save standardised values as variables".
  • Cases with standardised scores (z-scores) more than 3.29 from the mean (p < .001, two-tailed) are potential outliers (preferred method).
  • Histograms, via Frequencies.
  • Boxplots / IQR (Q3 - Q1) method: above Q3 + (1.5 x IQR) or below Q1 - (1.5 x IQR).
  • P-P plots / detrended P-P plots.
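The z-score and IQR benchmarks can be applied outside SPSS too; here is a NumPy sketch with two planted outliers in simulated data:

```python
import numpy as np

rng = np.random.default_rng(2)
scores = np.append(rng.normal(50, 10, 98), [120.0, -40.0])  # two planted outliers

# z-score method: |z| > 3.29 (p < .001, two-tailed) flags potential outliers.
z = (scores - scores.mean()) / scores.std(ddof=1)
z_outliers = scores[np.abs(z) > 3.29]

# IQR method: outside Q1 - 1.5*IQR or Q3 + 1.5*IQR.
q1, q3 = np.percentile(scores, [25, 75])
iqr = q3 - q1
iqr_outliers = scores[(scores < q1 - 1.5 * iqr) | (scores > q3 + 1.5 * iqr)]
print(z_outliers, iqr_outliers)
```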
20
Q

multivariate outliers

A

a case with a strange combination of scores on two or more variables

21
Q

Calculating Mahalanobis distance

A

A measure of the distance of each case from the centroid of the cases, based on a combination of scores on two or more variables; outliers are detected against the chi-square distribution.
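A hand-rolled version with NumPy and SciPy (simulated data, one planted outlier) also shows where the 13.816 cut-off for two variables comes from:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(3)
X = rng.multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]], size=100)
X[0] = [4.0, -4.0]  # planted multivariate outlier: a strange COMBINATION of scores

centroid = X.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
diff = X - centroid
d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)  # squared Mahalanobis distance

cutoff = chi2.ppf(1 - 0.001, df=2)  # ~13.816 for 2 variables (p < .001)
print(np.where(d2 > cutoff)[0])
```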

22
Q

Mahalanobis distance cut-off for 2 predictor variables

A

13.816

23
Q

leverage

A

an individual point's potential to influence the model
a case with an unusual predictor (x) value

24
Q

discrepancy

A

the extent to which a case is out of line with the others (an unusual y value given its x value)

25
Q

influence

A

combination of leverage and discrepancy

assesses a change in regression coefficients when a case is deleted.

26
Q

high leverage, low discrepancy

A

moderate influence

27
Q

high leverage, high discrepancy

A

high influence

28
Q

low leverage, high discrepancy

A

moderate influence

29
Q

Cook's distance

A
  • Used in regression analysis to find influential outliers among a set of predictor variables.
  • Cases with influence scores larger than 1.00 are suspected of being outliers.
  • If all Cook's distance values are below 1, no cases are unduly influential.
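Cook's distance can be computed by hand from the regression residuals and leverages. This NumPy sketch uses simulated data with one planted influential case; the formula D = e^2 / (p * MSE) * h / (1 - h)^2 is the standard one:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50
x = rng.normal(0, 1, n)
y = 1.5 * x + rng.normal(0, 1, n)
x[0], y[0] = 4.0, -8.0  # planted influential case: high leverage AND discrepancy

X = np.column_stack([np.ones(n), x])  # design matrix with intercept
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
p = X.shape[1]
mse = resid @ resid / (n - p)
h = np.einsum("ij,jk,ik->i", X, np.linalg.inv(X.T @ X), X)  # leverage (hat diag)

# Cook's distance combines leverage and discrepancy, as on the cards above.
cooks = resid**2 / (p * mse) * h / (1 - h) ** 2
print(np.where(cooks > 1.0)[0])  # cases suspected of being influential outliers
```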
30
Q

correcting outliers

A
  1. trimming data
  2. winsorizing
  3. robust estimation method
  4. transform the data
31
Q

trimming the data

A

Delete a certain quantity of scores from the extremes.
2 methods:
1. trimmed mean
2. M-estimator

32
Q

trimmed mean

A

a 10% trimmed mean excludes the highest 10% of values and the lowest 10%. In other words, it uses the middle 80%.
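With SciPy, a 10% trimmed mean is one call; the example scores are made up, with one extreme value to show the effect:

```python
import numpy as np
from scipy.stats import trim_mean

scores = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])

# 10% trimmed mean: drops the highest 10% and lowest 10% (keeps the middle 80%).
print(np.mean(scores), trim_mean(scores, 0.1))
```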

33
Q

M-estimator

A

A robust regression method often used as an alternative to least squares when the data have outliers. The amount of trimming is determined empirically to give a robust estimate of the mean.

34
Q

winsorizing

A
  • Substitute outliers with the next highest score that is not an outlier, or
  • Replace extreme scores with a score 3 standard deviations from the mean.
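SciPy's winsorize implements the first approach: the outlier is replaced with the next highest score that is not an outlier, rather than deleted (example data invented):

```python
import numpy as np
from scipy.stats.mstats import winsorize

scores = np.array([2.0, 3.0, 3.5, 4.0, 4.5, 5.0, 40.0])

# Winsorize the top 15%: the outlier (40.0) is replaced by the next
# highest score (5.0), so no case is lost.
w = np.asarray(winsorize(scores, limits=(0, 0.15)))
print(w)
```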
35
Q

bootstrapping

A
  • Uses bootstrapping to estimate parameters and their standard errors.
  • These tests do not rely on the assumption of normality.
  • Bootstrapping allows us to estimate the properties of the sampling distribution from the sample data.
  • The sample data are treated as a population from which smaller samples (called bootstrap samples) are drawn with replacement (each score is put back before the next is drawn).
  • The parameter of interest (e.g., the mean) is calculated in each bootstrap sample.
  • This process is repeated perhaps 2,000 times, giving 2,000 parameter estimates, one from each bootstrap sample.
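The resampling loop described above is short in NumPy (sample values simulated; 2,000 resamples as on the card):

```python
import numpy as np

rng = np.random.default_rng(5)
sample = rng.normal(100, 15, 60)

# Treat the sample as the population; resample WITH replacement 2,000 times.
boot_means = np.array([
    rng.choice(sample, size=sample.size, replace=True).mean()
    for _ in range(2000)
])

# Bootstrap standard error and a 95% percentile confidence interval.
se = boot_means.std(ddof=1)
ci = np.percentile(boot_means, [2.5, 97.5])
print(se, ci)
```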
36
Q

log transformation

A

Corrects positive skew and kurtosis, unequal variances and lack of linearity.

Squashes the right tail of the distribution, which reduces positive skew.

Can make a curvilinear relationship linear.

You can't take the log of 0 or negative numbers, so add a constant first.

37
Q

square root transformation

A

Like the log, taking the square root has a greater effect on larger scores than smaller ones; it brings larger scores towards the centre, which reduces positive skew.

Zeros are fine, but negative numbers are not.

Corrects positive skew and kurtosis, unequal variances and lack of linearity.

38
Q

reciprocal transformation

A

Dividing 1 by each score also reduces the impact of large scores: high scores become low and vice versa.

Corrects positive skew and kurtosis and unequal variances.

39
Q

reverse score transformation

A

For negative skew.

Can be used to correct negatively skewed data if you first reverse the scores: subtract each score from the highest score on the variable.
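The four transformations can be compared on a simulated positively skewed variable (NumPy/SciPy; the exponential data are illustrative only):

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(6)
x = rng.exponential(2.0, 1000)  # positively skewed scores

log_x = np.log(x + 1)    # log: add a constant because log(0) is undefined
sqrt_x = np.sqrt(x)      # square root: zeros are fine, negatives are not
recip_x = 1.0 / (x + 1)  # reciprocal: high scores become low and vice versa

# Reverse score transformation for NEGATIVE skew: subtract each score from
# the highest score, which turns negative skew into positive skew.
neg = x.max() - x            # a negatively skewed variable
reversed_neg = neg.max() - neg  # reversed: positively skewed again

print(round(skew(x), 2), round(skew(log_x), 2), round(skew(sqrt_x), 2))
```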

40
Q

epistemology

A

The study of knowledge, including its methods, validity, scope, and belief versus opinion. Sources of knowledge:
o Personal experience & common sense
o Expert opinion
o Popular messages
o Ideological beliefs
o Knowledge generated from systematic study and research

41
Q

empirical data

A

evidence or observations grounded in human sensory experience: touch, sight, hearing, smell and taste.

42
Q

overgeneralisation

A

A statement that goes far beyond what can be justified based on available data

43
Q

selective observation

A

Process of investigation that reinforces pre-conceived ideas rather than a neutral and balanced process

44
Q

premature closure

A

Ending an investigation and deciding before gathering sufficient information about the data

45
Q

halo effect

A

The reputation of a place/person/thing alters their/its evaluation in a way that is no longer neutral or equal.

46
Q

false consensus

A

Projecting your own way of thinking onto others, assuming everyone thinks the same way

47
Q

pseudoscience

A

Ideas or information that has the appearance of science (through the use of jargon) but lacks the systematic rigour and standards of the scientific method e.g. Astrology

48
Q

junk science

A

A term used to criticise research (even if conducted according to standards of the scientific method) that produces findings opposing the views of an advocacy group (tobacco lobbyists discrediting scientific evidence of the health implications of smoking)

49
Q

pop science

A

Information is presented through mass media to the general public (lay audience), who may have low scientific literacy. This information may be oversimplified (leading to misinterpretation) or be outdated but is thought to be true by most people

50
Q

a hypothesis must be

A

An empirically testable, tentative statement about a proposed relationship between concepts or variables. It should:

involve 2 or more variables
propose a causal relationship
be predictive
be linked to previous theory
be falsifiable

51
Q

tautology

A

circular reasoning: the causal factor and the result are the same
o High neuroticism predicts high stress

52
Q

Teleology

A

The causal factor does not precede the result, or the causal factor can't be measured.

Example: attraction between people is caused by human nature.

53
Q

ecological fallacies

A

Empirical data about associations for large-scale units are over-generalised to smaller-scale units.

Example: Drew comes from a low-SES background. People from low-SES backgrounds have poorer health outcomes. Therefore Drew is less healthy than his high-SES friend Steve.

54
Q

reductionist errors (reverse of the ecological fallacy)

A

Empirical data about association for small scale units are generalised to larger scale units.

55
Q

Spuriousness

A

A wrongly inferred causal relationship due to an unseen third factor; can occur when you over-interpret the meaning of a simple correlation.
o Money spent on pets (US) correlates with eBay total gross merchandise volume

56
Q

underpowered

A

too few participants

57
Q

overpowered

A

too many participants

58
Q

power

A

the probability of correctly rejecting the null hypothesis when it is false (1-β)

59
Q

A priori power analysis

A
  • Conducted before your study takes place.
  • Estimates the sample size required to detect an effect, based on the desired alpha (α) level, the desired power level (1-β) and the expected effect size.
  • Allows us to control both the Type I error probability α (the probability of incorrectly rejecting H0 when it is, in fact, true) and the Type II error probability β (the probability of incorrectly retaining H0 when it is, in fact, false).
  • Also controls the power of the test (the complement of the Type II error probability, 1-β).
60
Q

Post hoc power analysis

A
  • Typically conducted after your study takes place.
  • We already know the outcome of the analysis, but we may wonder about the power of the test to detect a truly significant difference.
  • Post hoc analysis uses the known sample size (n), the set alpha level and the specified effect size to return the power (1-β) associated with the test.
  • You can only control α, not β.
  • This means the role of a post hoc power analysis is to provide a critical evaluation of non-significant findings.
61
Q

setting alpha and beta

A
  • We usually set the desired alpha level to .05 (α = 0.05) and the beta level to .20 (β = 0.20).
  • With β = 0.20, the power of the test is 1-β = 1 - 0.20 = 0.80, i.e. an 80% probability of rejecting a false null hypothesis.
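Assuming statsmodels is available, both the a priori and post hoc calculations on these cards can be sketched for an independent-samples t-test (the effect size d = .50 and n = 30 are assumptions for illustration):

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# A priori: sample size per group for a medium effect (d = .50),
# alpha = .05 and power = .80.
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80)
print(round(n_per_group))  # ~64 per group

# Post hoc: power achieved with n = 30 per group at the same effect size and alpha.
achieved = analysis.solve_power(effect_size=0.5, alpha=0.05, nobs1=30)
print(round(achieved, 2))
```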
62
Q

G* power

A
  • The most common software for power analysis and sample size calculation is G*Power: a free, open-source and user-friendly program for power analysis and sample size calculations for t-tests, ANOVA, regression models and other study designs.
  • G*Power can underestimate the sample size required for an indirect effect, so MedPower is often preferred for estimating sample sizes for mediation models.
63
Q

steps in the research process for quantitative

A
  1. select a topic
  2. focus on the question
  3. design the study
  4. collect the data
  5. analyse the data
  6. interpret the data
  7. inform others
64
Q

research process (qualitative)

A
  1. acknowledge self and context
  2. adopt a perspective
  3. design a study
  4. inform others
65
Q

quantitative

A
  • non-experimental or correlational designs
  • fully experimental designs
  • quasi experimental designs
66
Q

non-experimental correlational design

A

Assess the interpretation of an association: its direction, its strength and its statistical significance. These three findings are interrelated.

  • Typically involve participants completing a questionnaire measuring them on a number of different scales.
67
Q

fully experimental designs

A

actively manipulate an IV (independent variable)
use randomly formed groups of participants

68
Q

quasi-experimental designs

A

use pre-existing or other non-randomly assigned groups and/or interventions
a causal inference cannot be made

69
Q

exploratory research

A
  • Research whose primary purpose is to examine a little-understood issue or phenomenon to develop preliminary ideas about it and move toward refined research questions.
  • The goal is to formulate more precise questions that can be addressed in future research.
  • Doesn't yield definitive answers.
  • It addresses the what.
  • Does not involve hypothesis testing and is qualitative in nature.
70
Q

descriptive research

A
  • Its primary purpose is to "paint a picture" using words or numbers and to present a profile, a classification of types, or an outline of steps, answering questions such as who, when, where and how.
  • You may have a well-developed idea about a social phenomenon and want to describe it.
  • Descriptive research presents a picture of the specific details of the situation, social setting, or relationship.
  • Much of the social research found in scholarly journals or used for policy-making decisions is descriptive.
  • Descriptive and exploratory blur together in practice
71
Q

explanatory study

A
  • Explanatory research: research whose primary purpose is to explain why events occur and to build, elaborate, extend or test theory.
  • When encountering an issue that is already known and described, we might wonder why things are the way they are.
  • Addressing the why is the purpose of explanatory research: it builds on exploratory and descriptive research and goes on to identify the reason something occurs.
  • An explanatory study looks for causes and reasons.
  • A descriptive study would document the number of heavy drinkers who abuse their children, whereas an explanatory study would be interested in learning why these parents abuse their children.
72
Q

within or across cases

A
  • Sometimes we carefully select or sample a smaller number of cases from a much larger pool of cases or population.
  • Studies may involve hundreds or thousands of cases.
  • In other studies (experiments), we analyse a few dozen people and manipulate conditions for them.
  • While the number of cases in a study is important, a more critical issue is whether a study primarily focuses on features within cases or across cases.
  • Case: a "definitional morass"; a case is bounded or delimited in time and space and is often called a unit or observation.
  • An individual person can be a case, as can a family, a company or an entire nation.
73
Q

case study research

A
  • Case study research: an in-depth examination of an extensive amount of information about very few units or cases, for one period or across multiple periods of time.
  • Examines many features of a few cases.
  • The cases can be individuals, groups, organisations, movements, events or geographic units.
  • The data on each case are detailed, varied and extensive.
  • It can focus on a single point in time or a duration of time.
  • Most case study research is qualitative, but it does not need to be.
  • It intensively investigates one or a small set of cases, focusing on many details within each case and the context.
  • Examines both the details of each case's internal features and the surrounding situation.
  • Enables us to link the micro level (actions of individuals) to the macro level (large-scale structures and processes).
74
Q

what to consider when doing research

A
  1. Research must be ethical
  2. Your research question must be testable
  3. Research must be within your capabilities
  4. Research must build on existing knowledge
  5. Ask yourself what you are trying to achieve with this research
  6. Good research has a clear purpose and rationale
75
Q

benefits of case studies

A

1) Conceptual validity: case studies flesh out and identify the concepts/variables of greatest interest and move toward their core or essential meaning in abstract theory.
2) Heuristic impact: case studies are highly heuristic (providing further learning, discovery or problem solving); they help with constructing new theories, developing or extending concepts, and exploring the boundaries among related concepts.
3) Causal mechanism identification.
4) Ability to capture complexity and trace processes: case studies can effectively depict highly complex, multi-factor events/situations and trace processes over time and space.
5) Calibration: case studies enable researchers to adjust measures of abstract concepts against dependable lived experiences and concrete standards.
6) Holistic elaboration: case studies can elaborate on an entire situation or process holistically and permit the incorporation of multiple perspectives or views.