Midterm 1 Flashcards
What is a hypothesis?
An assertion of one possible state of the phenomenon or relationship under investigation. In other words, a proposed explanation for a question you are asking.
When is a hypothesis falsifiable?
A scientific hypothesis is falsifiable when it is specific enough that some observable outcome could contradict it. A genuine test of a hypothesis is one that tries to refute it, not confirm it.
When is a hypothesis useless?
A hypothesis is essentially useless if it is consistent with every possible outcome. Rather, a hypothesis should be consistent with only a subset of possible empirical (observable) outcomes, and incompatible with others.
Why are testable hypotheses necessary?
Because science is constantly changing and being refined, hypotheses must be open to testing. The scientific status of a theory is based on its falsifiability, refutability, or testability.
What is the purpose of a hypothesis?
It helps us generate simple models of the physical world that allow us to predict phenomena, determine their causes, explain them, and control them.
When is a hypothesis unfalsifiable?
- When no empirical evidence is obtainable
- When predictions are so vague that they can hardly fail
- When a hypothesis is upheld even though refuted by data, by introducing additional assumptions after the fact.
What is an operational definition?
A testable hypothesis must be operationally defined. An operational definition is a description of how a concept will be measured; essentially, it turns a concept into a quantity.
What is an example of an operational definition?
Happiness can be measured by how many times someone smiles in an hour, brain activity, or a self-report survey.
What is the purpose of an operational definition?
- It allows us to quantify and measure concepts.
- It ensures variables are measured the same way throughout the study.
- It allows us to communicate ideas to others.
What makes a good operational definition? VAAPORC
V - Validity (did your operational definition measure what it actually intended to measure?)
A - Absence of bias
A - Acceptance in the scientific community
P - Practicality (something easy to measure)
O - Objective
R - Reliability
C - Cost (it is cost-effective)
Validity and bias refer to ____?
The difference between the measure and the “true” value of that variable. This difference is referred to as systematic error.
What is bias?
Bias is the average error over many measurements.
What are the differences between hypotheses and predictions?
A hypothesis is framed as a general statement, whereas a prediction is tied to the specific methodological details of a study.
A hypothesis is often phrased in the present tense, whereas predictions are often in the future tense.
A hypothesis is derived from a broader theory, whereas a prediction is quite specific.
What are the two main ways to assess operational definitions?
Reliability and validity.
Details about reliability
- Operational definition has to be based on concrete, observable behaviours.
- It must facilitate consistency/precision across measurements.
As variation, random error, and noise decrease, ______ increases.
Reliability
Details about validity
- Must be based on relevant behaviour
- Facilitates the accuracy of measurements
As systematic error and bias decrease, ______ increases.
Validity
A measurement is ____?
the true score + measurement error
Measurement error is ____?
systematic error + random error
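The two cards above can be sketched as a small simulation. This is only an illustration: the true score, bias, and noise values are made-up assumptions, and the function name `simulate_measurements` is hypothetical.

```python
import random
import statistics

def simulate_measurements(true_score, bias, noise_sd, n, seed=0):
    """Return n simulated measurements: true score + systematic error + random error."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [true_score + bias + rng.gauss(0, noise_sd) for _ in range(n)]

# Assumed values for illustration only
TRUE_SCORE = 70.0
readings = simulate_measurements(true_score=TRUE_SCORE, bias=2.0, noise_sd=0.5, n=1000)

# The average error over many measurements estimates the systematic error (bias)
mean_error = statistics.mean(readings) - TRUE_SCORE

# The spread of the readings reflects the random error
spread = statistics.stdev(readings)

print(f"average error: {mean_error:.2f}, spread: {spread:.2f}")
```

Averaging many readings cancels out most of the random error but leaves the systematic error untouched, which is why bias cannot be fixed by simply measuring more often.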
What factors contribute to measurement error?
- Precision of the operational definition (lack of detail, subjectivity, and specificity)
- Error as a result of the measurement device.
- Human error (level of training, expertise, and attention level)
The more specific the operational definition, the more ______.
Consistent the measurements.
What does the r value represent?
It represents the strength and direction of the correlation between two variables, and ranges from -1 to +1.
What are the r values?
r>0, positive correlation
r<0, negative correlation
r = 0, no correlation
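As a rough sketch of how r is computed, here is the Pearson correlation written out by hand. The data sets are invented for illustration, and `pearson_r` is a hypothetical helper name.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by the spread of each variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

hours = [1, 2, 3, 4, 5]
r_positive = pearson_r(hours, [2, 4, 6, 8, 10])   # y rises with x
r_negative = pearson_r(hours, [10, 8, 6, 4, 2])   # y falls as x rises
r_none = pearson_r(hours, [1, 2, 3, 2, 1])        # no overall linear trend

print(r_positive, r_negative, r_none)
```

Note that the third data set has a clear pattern (up, then down) yet r is about 0, because r only captures linear relationships.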
What are the types of reliability measures?
Inter-rater reliability (finding consistency between raters)
Test-retest reliability (giving the same test to the same people on more than one occasion)
Internal consistency reliability (includes split-half reliability, Cronbach’s alpha, and item-total correlations)
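For internal consistency, Cronbach's alpha can be sketched with the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The questionnaire scores below are invented, and `cronbach_alpha` is a hypothetical helper, not a library function.

```python
import statistics

def cronbach_alpha(item_scores):
    """item_scores: one list per item, each holding one score per respondent."""
    k = len(item_scores)
    sum_item_variances = sum(statistics.variance(item) for item in item_scores)
    # Total score for each respondent across all items
    totals = [sum(responses) for responses in zip(*item_scores)]
    total_variance = statistics.variance(totals)
    return (k / (k - 1)) * (1 - sum_item_variances / total_variance)

# Hypothetical data: 3 items answered by 5 respondents
items = [
    [4, 3, 5, 2, 4],
    [5, 3, 4, 2, 4],
    [4, 2, 5, 3, 5],
]
alpha = cronbach_alpha(items)
print(f"alpha = {alpha:.2f}")
```

Values closer to 1 suggest the items are measuring the same thing consistently.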
Why is test-retest sometimes difficult?
It can become biased by practice effects. If someone takes the same test over and over, they may become better at it over time.
What is construct validity?
How well a test or tool (hypothesis, operational definition, etc.) measures what it was actually intended to measure.
What are the indicators of construct validity?
- Face validity
- Content Validity
- Predictive Validity
- Concurrent Validity
- Convergent Validity
- Discriminant Validity
What is face validity?
Does the test appear to measure what it is intended to measure? As face validity is a subjective measure, it’s often considered the weakest form of validity. However, it can be useful in the initial stages of developing a method.
What is content validity?
Content validity assesses whether a test is representative of all aspects of the construct. To produce valid results, the content of a test, survey or measurement method must cover all relevant parts of the subject it aims to measure. If some aspects are missing from the measurement (or if irrelevant aspects are included), the validity is threatened.
What is an example of content validity?
A mathematics teacher develops an end-of-semester algebra test for her class. The test should cover every form of algebra that was taught in the class. If some types of algebra are left out, then the results may not be an accurate indication of students’ understanding of the subject. Similarly, if she includes questions that are not related to algebra, the results are no longer a valid measure of algebra knowledge.
What is predictive validity?
This is the degree to which a test accurately predicts a criterion that will occur in the future. For example, a prediction may be made, on the basis of a new intelligence test, that high scorers at age 12 will be more likely to obtain university degrees several years later. If the prediction is borne out, then the test has predictive validity. “Can our measure predict something in the future?” For example, can this selection test predict performance on the job?
What is concurrent validity?
This is the degree to which a test corresponds to an external criterion that is known at the time and is already valid. If the new test is validated by a comparison with a currently existing criterion, we have concurrent validity.
These are both for the same construct.
What is an example of concurrent validity?
For example, let’s say a group of nursing students take two final exams to assess their knowledge. One exam is a practical test and the second exam is a paper test. If the students who score well on the practical test also score well on the paper test, then concurrent validity has occurred.
What is convergent validity?
Convergent validity is a supporting piece of evidence for construct validity. The underlying idea of convergent validity is that tests of related constructs should be highly correlated. For example, to test the convergent validity of a measure of self-esteem, a researcher may want to show that measures of similar constructs, such as self-worth, confidence, social skills, and self-appraisal, are also related to self-esteem.
- different methods of measuring the same construct are compared, to see whether their scores are related
What is discriminant validity?
Discriminant validity tests whether concepts or measurements that are not supposed to be related are actually unrelated.
- the same method, used to measure different constructs, gives scores that are NOT correlated.
Concurrent and predictive validity are the _______, whereas convergent and discriminant validity are based on ______.
- gold standard
- other measures
What is a variable?
an event, situation, behaviour, or characteristic. Something that has a quality or quantity.
What is a quantitative variable?
A variable that measures a magnitude or quantity.
What are the types of quantitative variables?
- Interval - the distances between values are equal, but there is no true zero. Celsius or Fahrenheit is not a ratio variable because 0°C does not mean there is no temperature.
- Ratio - has a true zero, meaning zero indicates none of the quantity. Ex: elapsed time, weight, age, pulse rate, etc.
- Discrete variable - Variables that can only take on separate, countable values are called “discrete variables.” For example, you can only use whole numbers when describing how many siblings you have. You can’t have HALF a sibling.
- Continuous variable - Variables that can take on any value within a range (an infinite number of possible values) are called “continuous variables.” For example, height is continuous, as in 1.65 metres.
What is a categorical variable?
Variables that have different qualities (gender, colours, where you live etc).
What are the types of categorical variables?
Nominal - there is no obvious relationship between the levels
Ordinal - takes on an order (i.e., pain level on a scale of 1-5).
How can we distinguish between these variables?
You can usually take an average or use a subtraction test for a quantitative variable: if the average or the difference of two values is meaningful, the variable is quantitative.
Quantitative variables can be discrete or continuous. Use a midway test: take two neighbouring values and average them. If that new value has a meaning, the variable is continuous. If not (e.g., 1.5 siblings), it is discrete.
What is monotonic vs. non-monotonic?
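Monotonic means that the relationship always goes in a single direction (consistently increasing or consistently decreasing), even if it is not a straight line. Non-monotonic means that the relationship does not always go in a single direction (e.g., it rises and then falls).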
What are the key points of non-experimental research?
- No direct intervention
- Observational or correlational
- Both variables are measured
- You can record physiological responses, or observe behaviour.
- Examples include self-reports, or using existing records.
- Cannot determine causal relationships
What are the key points of experimental research?
- At least one variable is manipulated, the IV.
- One variable is measured, the DV.
- Can determine causal relationships
What is a spurious correlation?
Two variables that appear to be causally related, when they are actually not.
What is a confounding variable?
Variables that influence both the dependent and independent variable. The confound makes it hard to determine which variables are actually causing the effect.
What is the difference between a confounding variable and extraneous?
Extraneous variables are any factors that are present in the experiment but not being studied. Confounding variables are extraneous variables that are related to the independent variable and also affect the dependent variable.
What are potential confounds?
Poor operational definitions, participant factors (age, intelligence, socio-economic status), order effects (fatigue, practice), and group factors
How do we eliminate extraneous variables and even confounds?
Random assignment. On average, it spreads the effects of potential confounds evenly across the groups, so they do not systematically favour one condition.
What is internal validity?
The extent to which a study establishes a cause-and-effect relationship between a treatment and an outcome.
What is random assignment?
With random assignment, each participant has an equal chance of being placed in any of the experimental groups or control groups.
What is the difference between random sampling and random assignment?
Random sampling involves randomly selecting participants from the population, so that the sample is representative and not biased. Random assignment involves randomly placing the participants who are already in the study into the different groups.
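A minimal sketch of the two steps, assuming a made-up population of 1000 people and two groups of 10 (all names and sizes here are illustrative):

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population of interest
population = [f"person_{i}" for i in range(1000)]

# Random sampling: decide WHO is in the study
sample = rng.sample(population, 20)

# Random assignment: decide WHICH GROUP each sampled person goes into
shuffled = sample[:]
rng.shuffle(shuffled)
treatment_group = shuffled[:10]
control_group = shuffled[10:]

print(treatment_group[:3], control_group[:3])
```

Sampling happens first and affects how well the results generalize; assignment happens afterwards and affects whether group differences can be attributed to the IV.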
What are descriptive statistics?
They help us organize, summarize and describe data (usually based on samples)
What are inferential statistics?
They help us make generalizations from the sample to the population.
What information can you get from descriptive statistics?
Shape, spread (variability), outliers, and central tendency.
What kind of graphical representations can you use?
A pie chart, bar graph, histogram, or frequency table.
Categorical variables use _______, whereas quantitative variables use __________.
- Bar graph and pie chart
- Histogram
The smaller the sample, the greater the _______.
- Deviation
What does the law of large numbers tell us?
As sample size increases, the statistics become less variable, and more closely estimate the true population.
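This can be illustrated with a quick simulation. The population here (values spread evenly from 0 to 100) and the sample sizes are assumptions chosen only for the demonstration, and `spread_of_sample_means` is a hypothetical helper.

```python
import random
import statistics

def spread_of_sample_means(sample_size, n_samples=200, seed=1):
    """Draw many samples of a given size and return how much their means vary."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_samples):
        sample = [rng.uniform(0, 100) for _ in range(sample_size)]
        means.append(statistics.mean(sample))
    return statistics.stdev(means)

small_n_spread = spread_of_sample_means(sample_size=5)
large_n_spread = spread_of_sample_means(sample_size=500)

print(f"n=5: {small_n_spread:.2f}   n=500: {large_n_spread:.2f}")
```

The sample means from large samples cluster much more tightly around the population mean than those from small samples.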
What characteristics are given by frequency distributions?
Shape, central tendency, and variability.
What shapes do distributions take on?
Symmetric (each half is a mirror image of the other)
Skewed (positively skewed - tail is on the right, negatively skewed - the tail is on the left).
Details about modality
There can be unimodal, bimodal, multimodal, or uniform (no defined mode) distributions.
What does the central tendency tell us?
Where the centre of the distribution lies. It is summarized by the mean, the median (50th percentile), and the mode (most frequently occurring value).
What central tendencies can be used for quantitative variables (both ratio and interval)?
Mean, median and mode.
What central tendencies can be used for categorical variables (nominal, and ordinal)
Nominal - mode
Ordinal - median, and mode
What are the pros and cons of the mode?
Pros - it can be used for all types of data
Cons - it only tells you the most frequently occurring data point, but ignores the rest of the data
What are the pros and cons of the median?
Pros - Robust against outliers, it gives us a better summary of skewed data, and can be used with ordinal data
Cons - Limits the use of many statistical tests
What are the pros and cons of the mean?
Pros - tells us the average
Cons - cannot be used for categorical data, sensitive to outliers, and a poor measure of “central tendency” for highly skewed distributions
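The pros and cons above can be checked with a small made-up data set. The numbers are invented for illustration; only Python's standard `statistics` module is used.

```python
import statistics

# Hypothetical scores, then the same scores with one extreme outlier added
scores = [30, 32, 35, 35, 38, 40]
scores_with_outlier = scores + [500]

mean_before = statistics.mean(scores)
median_before = statistics.median(scores)
mode_before = statistics.mode(scores)

mean_after = statistics.mean(scores_with_outlier)
median_after = statistics.median(scores_with_outlier)
mode_after = statistics.mode(scores_with_outlier)

print(mean_before, median_before, mode_before)
print(round(mean_after, 1), median_after, mode_after)
```

One outlier drags the mean far from the bulk of the data, while the median and mode stay where they were.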
What is survey research?
A survey/self-report that is administered through an interview or questionnaire.
What is the purpose of survey research?
To learn about attitudes and beliefs, facts, demographics, and behaviours.
Who created the first survey?
Charles Darwin
What do surveys usually include?
- open ended or closed ended questions
- a methodology for asking people to tell us about themselves
- can be used to study relationships between variables
- can serve as an important complement to experimental research findings
What are some issues with question wording in surveys?
- unnecessary complexity - using unfamiliar technical terminology, or phrasing that overloads your working memory
- vague questions - using imprecise terms, or poor grammatical structure
- loaded/leading questions - embedding questions with misleading information, written in a way to bias the information
- double-barrelled questions - asking two things at once
- negative wording - the question should not have double negatives
- yea-saying or nay-saying - a participant agrees (or disagrees) with every item, which makes it hard to tell genuine responses from careless ones, especially if they are not actually doing the survey properly. To catch this, include a specific item saying “for this question, please select strongly disagree.”
What is an example of a leading question?
How great is our hard-working customer service team?
What is an example of a double-barrelled question?
Was the product easy to find and did you buy it?
What is an example of a negative question?
Was the facility not unclean?
What are some options for responses to questions?
- Likert scales
- Use an odd number of levels
- Rating scales
- Non-verbal scales (i.e., use of a facial expression)
What are response sets?
This is the human tendency to answer questions in ways that are the most complimentary, or flattering, to the respondent rather than telling the absolute truth. This includes demand characteristics, and social desirability bias.
How can we avoid demand characteristics and participant bias?
- Anonymity
- Deception
- Disguise the dependent measure by adding unrelated filler questions so the participant cannot guess the true purpose of the study
- Ask the participant what they thought the hypothesis was during the debriefing stage
What things should we consider when finalizing the survey?
- Attractive and professional layout
- Neatly typed, and free from errors
- Consistent point scales
- Ask interesting questions first
- Keep the survey as short as possible
How are surveys distributed?
- in person to groups or individuals
- Internet surveys
- Apps
What are the advantages and disadvantages of surveys?
Pros - less costly than interviews, and can ensure anonymity
Cons - Boredom and distraction. Participants may also not answer correctly.
Population vs. sample
Population - all individuals of interest
Sample - a subset of these individuals
What is a census?
If we study everyone in the population, it is referred to as a census.
What is a confidence interval?
A range of values that’s likely to include a population value with a certain degree of confidence.
Smaller sample sizes have _____________.
- Wider confidence intervals, and larger variability.
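A rough sketch of why this happens, using the common approximation mean ± 1.96 × (SD / √n) for a 95% interval. The population values (mean 100, SD 15) are assumptions, `ci95` is a hypothetical helper, and for very small samples a t-based critical value would normally replace 1.96.

```python
import math
import random
import statistics

def ci95(sample):
    """Approximate 95% confidence interval for the mean (normal approximation)."""
    mean = statistics.mean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(len(sample))
    margin = 1.96 * standard_error
    return mean - margin, mean + margin

rng = random.Random(3)
small_sample = [rng.gauss(100, 15) for _ in range(10)]
large_sample = [rng.gauss(100, 15) for _ in range(1000)]

small_low, small_high = ci95(small_sample)
large_low, large_high = ci95(large_sample)

print(f"n=10 width:   {small_high - small_low:.1f}")
print(f"n=1000 width: {large_high - large_low:.1f}")
```

Because the standard error divides by √n, the interval shrinks as the sample grows.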
What are the differences between non-probability sampling, and probability sampling?
In non-probability sampling, you do not need to use random sampling, but rather you can use convenience sampling. This is because the phenomenon under investigation is expected to be relatively similar across the population. Ex: limit of short-term memory.
In probability sampling, the phenomenon under investigation is expected to vary across the population. Ex: beliefs, values, political views, etc. In this case, we need a technique that gives a sample representative of the entire population.
What are the types of probability sampling?
- Simple random sampling - every member of a population has an equal probability of being selected
- Stratified random sampling - the population is divided into subgroups (strata), and random samples are taken from each stratum. Ex: All students are grouped by their major, and then 50 students are randomly chosen from each major.
- Cluster sampling - randomly selects clusters and uses all individuals belonging to those clusters. Ex: psych majors are identified at 100 schools. Then 10 of those clusters are randomly chosen. All students in each chosen cluster are sampled.
- Multistage cluster sampling - clusters are identified, and then only some individuals from each cluster are chosen
- Systematic sampling - choosing every nth person from a group
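The sampling types listed above can be sketched on a made-up student list. Everything here (120 students, three majors, ten schools, the sample sizes) is an assumption for illustration.

```python
import random

rng = random.Random(7)
majors = ["psych", "bio", "chem"]

# Hypothetical population: 120 students spread over 3 majors and 10 schools
students = [
    {"id": i, "major": majors[i % 3], "school": i // 12}
    for i in range(120)
]

# Simple random sampling: everyone has an equal chance of selection
simple = rng.sample(students, 12)

# Stratified random sampling: group by major, then sample within each stratum
stratified = []
for major in majors:
    stratum = [s for s in students if s["major"] == major]
    stratified.extend(rng.sample(stratum, 4))

# Cluster sampling: randomly pick whole schools and take everyone in them
chosen_schools = rng.sample(range(10), 2)
cluster = [s for s in students if s["school"] in chosen_schools]

# Systematic sampling: random starting point, then every 10th student
start = rng.randrange(10)
systematic = students[start::10]

print(len(simple), len(stratified), len(cluster), len(systematic))
```

Multistage cluster sampling would add one more step: sampling only some students within each chosen school instead of taking all of them.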
What are the types of non-probability sampling?
- Convenience sampling - sampling whoever is most convenient
- Purposive sampling - a sample of individuals that meet a pre-determined criterion (ex: age, gender etc.)
- Quota sampling - sampling reflects the numerical composition of various groups
Do sample sizes need to increase proportionally with the population to keep precision?
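No, this is not necessary. Precision depends mainly on the size of the sample itself, not on what fraction of the population it represents.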
What does the p-value indicate?
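The p-value is the probability of getting results at least as extreme as those observed if there were truly no effect. p > 0.05 indicates that the results are not statistically significant, whereas p < 0.05 indicates that they are.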
What is a two sample t-test used for?
It is used to test the difference between the means of two populations (two independent groups).
When is a paired t-test used?
We use this when we are interested in two measurements taken from the same subject (ex: measuring the left hand and right hand)
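To make the difference between the two t-tests concrete, here is a hand-written sketch of both test statistics. It computes only the t values, not p-values; the hand measurements are invented, the function names are hypothetical, and the two-sample version assumes equal variances (pooled).

```python
import math
import statistics

def independent_t(group_a, group_b):
    """Two-sample t statistic with pooled variance (assumes equal variances)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_variance = (
        (n_a - 1) * statistics.variance(group_a)
        + (n_b - 1) * statistics.variance(group_b)
    ) / (n_a + n_b - 2)
    standard_error = math.sqrt(pooled_variance * (1 / n_a + 1 / n_b))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / standard_error

def paired_t(first, second):
    """Paired t statistic: a one-sample test on the within-subject differences."""
    differences = [x - y for x, y in zip(first, second)]
    standard_error = statistics.stdev(differences) / math.sqrt(len(differences))
    return statistics.mean(differences) / standard_error

# Hypothetical measurements from the same five people
left_hand = [10, 12, 11, 13, 12]
right_hand = [11, 14, 12, 15, 13]

t_if_independent = independent_t(left_hand, right_hand)
t_if_paired = paired_t(left_hand, right_hand)

print(f"treated as independent groups: t = {t_if_independent:.2f}")
print(f"treated as paired:             t = {t_if_paired:.2f}")
```

With these numbers the paired statistic is much larger in magnitude, because pairing removes the differences between people from the error term.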
When do we use a one-way ANOVA?
Used to test for differences between the means of the levels of one factor (IV), with at least two levels (typically three or more) that are independent of each other.
When do we use a chi-square test?
We use this to test whether there is a significant association between two or more categorical variables.
How do we determine which test to use?
REFER TO THE SLIDE
- scale of measurement (nominal vs. ordinal, or ratio vs. interval)
- how many levels of the IV are there?
- Repeated measures vs. independent groups
- are we looking for parametric or non-parametric tests?
What are the main types of experimental design?
Independent groups, and repeated measures
What is an independent group study?
Different participants experience different levels of the IV.
What is a repeated measures study?
The same participants experience all the different levels of the IV.
When is it not possible to do a repeated measure experiment?
When the same participant cannot experience every level of the IV. For example, in animal studies we may compare transgenic vs. wild-type mice, or in humans we may want to see the difference between males and females. In addition, sometimes one treatment requires surgery, while the other does not.
What are the advantages of repeated measure?
- fewer participants needed
- greater power
- more likely to detect true differences
- reduces confounds because it accounts for individual differences, since it is all the same participants
What are the disadvantages of repeated measure?
Order effects: fatigue, practice, contrast effect (dog photo example)
Demand characteristics
How do we counteract order effects?
Counterbalance - half of the participants do treatment A and then B, and the other half do B and then A.
Why is counterbalancing often difficult?
If you have 4 conditions, there are 4 x 3 x 2 x 1 = 24 possible orders. As a result, you would need a minimum of 24 people just to have 1 person in each order.
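The count of 24 can be verified by listing every possible order. The condition labels A to D are placeholders.

```python
from itertools import permutations

conditions = ["A", "B", "C", "D"]

# Full counterbalancing uses every possible order of the conditions
all_orders = list(permutations(conditions))

print(len(all_orders))   # number of orders needed
print(all_orders[:3])    # first few orders
```

Adding a fifth condition would raise the number of orders to 120, which is why full counterbalancing quickly becomes impractical.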
How can counterbalancing be solved?
- We can use partial counterbalancing, for example with a Latin square, in which each condition appears in each position once.
- Randomize the treatment conditions
- Reverse counterbalancing ABC -> CBA
- Have a time interval between conditions
- Use independent group design
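One simple way to build a Latin square is to shift the list of conditions by one position for each new row. This is a sketch of that cyclic construction only; `latin_square` is a hypothetical helper, and this basic version does not control which condition immediately follows which.

```python
def latin_square(conditions):
    """Build a cyclic Latin square: each row is the previous row shifted by one."""
    n = len(conditions)
    return [
        [conditions[(row + col) % n] for col in range(n)]
        for row in range(n)
    ]

square = latin_square(["A", "B", "C", "D"])
for order in square:
    print(" ".join(order))
```

With 4 conditions this needs only 4 orders instead of 24, and every condition still appears once in every position.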
What are the advantages of independent groups?
- avoids order effects
- avoids demand characteristics
- suitable for treatments with relatively permanent effects
- similar to real world setting
What are the disadvantages of independent groups?
- greater risk for confounds due to individual differences
- any true differences may not be detected due to lower power
How do we prevent this from happening?
- random assignment
- matched pairs
- using spouses because the other individual is likely similar age, and living in the same environment
How do we avoid placebo effect?
- use of a placebo group to avoid confusion
- waitlist control group
What is the reactive adaptive participant bias?
As tasks get more difficult, if participants are uncertain about their answer in a task, they will give an answer they are not sure of. Sometimes they will answer incorrectly on purpose, so the questions stay at the same difficulty level.
How do we avoid this type of bias?
Do not always increase difficulty after every correct answer.
What is experimenter bias?
Experimenters who know the treatments may treat participants in different conditions differently or interpret data differently.
How do we avoid experimenter bias?
- Repeated measures design
- Single-blind or double blind
- Automated presentation of conditions and recording of data (i.e., use of a computer program)
What are complex experimental designs?
IVs with more than two levels, or designs with more than one IV.
Why do we use multiple levels of IV?
- Detect non-linear relationships between the IV and DV. For a curved and non-linear relationship we need at least 3 levels.
- Rule out alternative explanations and eliminate confounds. Refer to music example
What is a 2 x 2 factorial design?
This experiment has 2 IVs, each with 2 levels, giving 4 conditions.
What is a marginal mean?
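A marginal mean is the average score for one level of an independent variable, averaged across all levels of the other independent variable. Comparing marginal means shows the main effect of that independent variable.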
How do we determine main effects and interactions from factorial designs?
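A main effect of A or B shows up as a difference between that variable’s marginal means (averages).
An interaction shows up as a difference between the cells: the effect of one IV depends on the level of the other.
On a graph, main effects without an interaction give parallel lines.
Interactions give non-parallel lines (crossing, converging, or diverging).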
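A worked sketch for a 2 x 2 design, assuming equal numbers of participants per cell. The cell means and the level names A1, A2, B1, B2 are invented, and `marginal_mean` is a hypothetical helper.

```python
# Hypothetical cell means for a 2 x 2 design (IV A x IV B)
cell_means = {
    ("A1", "B1"): 10.0,
    ("A1", "B2"): 20.0,
    ("A2", "B1"): 30.0,
    ("A2", "B2"): 60.0,
}

def marginal_mean(cells, factor_index, level):
    """Average the cells for one level of one IV, across the other IV."""
    values = [mean for key, mean in cells.items() if key[factor_index] == level]
    return sum(values) / len(values)

# Main effects: difference between marginal means
main_effect_a = marginal_mean(cell_means, 0, "A2") - marginal_mean(cell_means, 0, "A1")
main_effect_b = marginal_mean(cell_means, 1, "B2") - marginal_mean(cell_means, 1, "B1")

# Interaction: does the effect of B differ between A1 and A2?
effect_of_b_at_a1 = cell_means[("A1", "B2")] - cell_means[("A1", "B1")]
effect_of_b_at_a2 = cell_means[("A2", "B2")] - cell_means[("A2", "B1")]
interaction = effect_of_b_at_a2 - effect_of_b_at_a1

print(main_effect_a, main_effect_b, interaction)
```

A non-zero interaction value corresponds to non-parallel lines on the graph; a value of 0 would mean the lines are parallel.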
Explain this 2x3x4 design
How many IVs? - 3
How many DVs? - 1
How many levels? - 2, 3, and 4 (2 + 3 + 4 = 9 levels in total, and 2 x 3 x 4 = 24 conditions)
How many main effects? - there can be three, one per IV
How many interactions? - four: AB, AC, BC, ABC
What are the differences between measurement error, systematic error, and random error?
Measurement error:
systematic error + random error
The extent to which a measure is unreliable
Systematic error:
Created by faulty equipment or bias (the error is always the same amount each trial)
Decreases validity
Random error:
Errors are unpredictable and cannot be reproduced.
Decreases reliability
Describe validity and reliability in terms of an operational definition
Validity:
Does the operational definition measure the concept it’s supposed to?
Reliability:
Is the operational definition based on observable, objective behaviors?