Semester 1 Recap Flashcards
What is replicability and reproducibility?
Replicability - The ability of a scientific experiment or trial to be repeated to obtain a consistent result. Research is replicable when the researcher collects new data to arrive at the same scientific findings as a previous study.
Reproducibility - When the original researcher’s data and computer code are used to regenerate the same results.
What is a theory?
An explanation of a particular behaviour or phenomenon, typically based on scientific research
What is a hypothesis?
A specific prediction about a behaviour or phenomenon that can be tested in a scientific research project
Independent variable (IV)
The variable that is manipulated by the experimenter (under the experimenter’s control). In quasi-experiments it may be naturally occurring rather than directly manipulated.
Dependent variable (DV)
The variable that is measured. An outcome variable that changes as a result of the IV.
E.g. reaction times, IQ scores, personality scores, body fat percentage, etc.
Control variables
The variables that are held constant throughout the study so that they don’t interfere with the dependent variable
Extraneous/Confounding variables
Other variables that might have an effect on the relationship between the IV and DV. We try to control these variables.
Nominal data
Categorise data by labelling them in groups, no order between the categories. No numerical properties. E.g., city of birth, gender, ethnicity, car brands, marital status.
Ordinal data
Categorise and rank data in an order, cannot say anything about the intervals between the rankings. E.g., top 5 Olympic medalists, language ability, Likert-type questions
Interval data
Ordered scale, intervals between units of measurement are all equal. No absolute zero. E.g., test scores, personality inventories, temperature in Fahrenheit or Celsius
Ratio data
Scale with equal intervals and an absolute zero. Negative values not possible. E.g., height, age, weight, temperature in Kelvin
Experimental design - Independent groups and repeated measures
Researcher manipulates independent variable(s) and measures the outcome (dependent variable). E.g., T-test, Analysis of variance (ANOVA).
Independent groups (between subjects) - pps (participants) randomly assigned to different conditions/groups
Repeated measures (within subjects) - each pp takes part in all conditions
Quasi-experimental design
Independent groups design in which pps are not randomly allocated to different conditions of the IV (non-random criteria). IVs that we cannot directly manipulate - gender, smoker, religion, genetics, etc.
Correlational design
Limitations of a correlational design
Measures the association (relationship) between variables. No independent variables. E.g., correlation, regression.
Limitations - Cannot infer causation from correlations; extraneous variables may cause changes in both measured variables.
Categorical design
Measures nominal variables (frequency). E.g., Chi-Square
Population and samples
Summary values of population and samples
Population - A group that have something in common that is of interest for the research question.
Sample - Smaller group of members of a population selected to represent that group
Summary values:
- Parameters describe populations (e.g., the mean of a population is a parameter)
- Statistics describe samples (e.g., the mean value of a sample is a statistic)
Sampling error
Occurs when the selected sample does not represent the entire population, so the results found in the sample differ from those that would be obtained from the entire population.
Can be reduced by increasing sample size and using random sampling
Sampling distribution
The distribution of sample statistics (e.g., means, proportions) obtained from multiple random samples of the same size from a population.
Provides information about the variability and characteristics of the sample statistics.
The shape of the sampling distribution depends on the population distribution and the sample size
Central limit theorem
States that, regardless of the shape of the population distribution, the sampling distribution of the sample mean approaches a normal distribution as the sample size increases.
A fundamental concept in statistics and allows for the use of normal distribution-based inference methods even when the population distribution is not normal.
Applicable when the sample size is sufficiently large (typically n ≥ 30) or when the population distribution is approximately normal.
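The theorem can be seen directly in a short simulation. The population below is hypothetical and heavily skewed (exponential), yet the distribution of its sample means is approximately normal and centred on the population mean:

```python
import random
import statistics

random.seed(42)

# Hypothetical skewed population: exponential with mean 1/lambda = 2
population = [random.expovariate(0.5) for _ in range(100_000)]

# Draw many random samples of size n = 30 and record each sample mean
n, num_samples = 30, 2_000
sample_means = [
    statistics.mean(random.sample(population, n)) for _ in range(num_samples)
]

# The sample means cluster tightly around the population mean (CLT),
# even though the population itself is heavily skewed.
print(round(statistics.mean(population), 2))
print(round(statistics.mean(sample_means), 2))
print(round(statistics.stdev(sample_means), 2))
```

The spread of the sample means is much smaller than the spread of the population, which is exactly what the standard error (next card) quantifies.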
Standard errors
Measures of the variability or uncertainty associated with a sample statistic.
Quantify the average amount of sampling error expected when estimating a population parameter.
Directly related to the standard deviation: the standard error of the mean is the sample standard deviation divided by the square root of the sample size (SE = SD/√n).
SE reflects the precision of the sample statistic estimate, while the standard deviation describes the variability of the individual data points
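As a minimal sketch of this relationship, the standard error of the mean is the sample standard deviation divided by √n (the reaction-time scores below are made up for illustration):

```python
import math
import statistics

# Hypothetical sample of reaction times (ms)
scores = [512, 480, 455, 530, 501, 468, 495, 520]

sd = statistics.stdev(scores)      # sample standard deviation
se = sd / math.sqrt(len(scores))   # standard error of the mean

print(round(sd, 2), round(se, 2))
```

Because n appears under a square root, quadrupling the sample size only halves the standard error.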
Measures of central tendency
Value that represents a typical (or central) score in a dataset.
Mean, Median, Mode
Mean
Arithmetical value. Sum of scores divided by number of scores.
Used on interval or ratio data
Advantages - uses all values, is sensitive
Disadvantages - influenced by outliers
Median
Middle score when data points are arranged from smallest to largest.
Used on interval, ratio, and ordinal data
Advantages - not affected by outliers
Disadvantages - does not use all data, not very sensitive
Mode
Most frequently occurring score
Used on nominal data
Advantages - only measure that can be used for categorical data
Disadvantages - possible to have more than one mode or no mode - not very useful
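All three measures can be computed with Python’s standard statistics module. The scores below are hypothetical; the added outlier shows why the mean is sensitive while the median is not:

```python
import statistics

scores = [4, 7, 7, 9, 12]  # hypothetical interval data

print(statistics.mean(scores))    # (4+7+7+9+12)/5 = 7.8
print(statistics.median(scores))  # middle value = 7
print(statistics.mode(scores))    # most frequent value = 7

# Add one extreme outlier: the mean is pulled upwards,
# but the median barely moves.
with_outlier = scores + [100]
print(statistics.median(with_outlier))  # 8.0
```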
Measures of dispersion
Value that describes how spread out a set of data is.
Range, Interquartile range, Variance, Standard deviation
Range
Highest score minus lowest score
Disadvantage - only based on two scores, ignores all other information
Interquartile range (IQR)
Put all the scores in order, from smallest to largest:
3, 4, 4, 5, 5, 6, 7, 7, 10, 12, 13, 15, 16, 16, 18
Divide the set of scores into four equal parts (quarters). Here the lower quartile (Q1) is 5, the median is 7, and the upper quartile (Q3) is 15.
The middle two quarters (50%) of the scores represent the IQR:
IQR = Q3 − Q1 = 15 − 5 = 10
Disadvantage - not based on all observations, first 25% and last 25% are ignored
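The hand calculation above can be reproduced with statistics.quantiles, whose default (exclusive) method matches this worked example:

```python
import statistics

scores = [3, 4, 4, 5, 5, 6, 7, 7, 10, 12, 13, 15, 16, 16, 18]

# quantiles(n=4) returns the three cut points: Q1, median, Q3
q1, median, q3 = statistics.quantiles(scores, n=4)
print(q1, median, q3)  # 5.0 7.0 15.0

iqr = q3 - q1
print(iqr)             # 10.0
```

Note that other quartile conventions (e.g. method="inclusive") can give slightly different cut points for small datasets.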
Variance
How much the values in the dataset deviate from the mean
Find the difference between each value in the dataset and the mean, square the differences to make them positive, then divide the sum of the squared deviations by N − 1 (using N − 1 rather than N gives the sample variance).
Sample variance = Sum of the squared deviations ÷ (N − 1). E.g., for 8 scores:
0.14 + 2.64 + 5.64 + 0.14 + 2.64 + 5.64 + 0.39 + 2.64 = 19.87
19.87 ÷ (8-1) = 2.84
Disadvantage - doesn’t describe the amount of variability in the same units as the original data (due to squaring the values)
Standard deviation (SD)
Square root of the variance. Most popular measure of dispersion.
Large SD - scores are spread out
Small SD - scores are close to the mean
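A minimal sketch of both calculations, using a made-up set of 8 scores (mean = 5); the hand-rolled formula agrees with the library functions:

```python
import math
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores, mean = 5

mean = statistics.mean(scores)
# Sample variance: sum of squared deviations divided by (N - 1)
variance = sum((x - mean) ** 2 for x in scores) / (len(scores) - 1)
# Standard deviation: square root of the variance,
# back in the original units of the data
sd = math.sqrt(variance)

print(round(variance, 2))
print(round(sd, 2))
```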
Graphical representations
Tools used to present data in a visual format, making it easier to understand and interpret complex information. E.g., box plots and histograms
Box plot
Summarise the distribution of a dataset using key summary statistics, such as the minimum, maximum, quartiles, and outliers.
Helpful for comparing distributions and identifying outliers.
Histogram
Depict the distribution of numerical data by dividing it into intervals or bins and showing the frequency or count of data points falling into each bin.
Make it easier to identify whether data is unimodal, bimodal, or skewed
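A rough text histogram can be built by binning scores with a Counter; the data and the bin width of 5 below are hypothetical:

```python
from collections import Counter

scores = [3, 4, 4, 5, 5, 6, 7, 7, 10, 12, 13, 15, 16, 16, 18]

bin_width = 5
# Assign each score to a bin: 0-4, 5-9, 10-14, 15-19
bins = Counter((x // bin_width) * bin_width for x in scores)

for start in sorted(bins):
    print(f"{start:2d}-{start + bin_width - 1:2d} | {'#' * bins[start]}")
```

The bar lengths make the shape of the distribution (here unimodal, with a mild positive skew) immediately visible.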
Outlier
A data point (observation) that differs significantly from other observations; an observation that lies an abnormal distance from other values in a random sample from a population
Normal distribution
Also known as Bell Shaped Curve
- Mean = mode = median
- The mean divides the data in half
- Symmetric
- Unimodal curve (i.e., one peak)
- The curve approaches, but never touches, the x-axis
Skewed distribution
Neither symmetric nor normal because the data values trail off more sharply on one side than on the other