Stats Flashcards

1
Q

Process of Research in Conducting Statistics

A
  • First determine average results
  • Then individual variations
  • Then ethical reporting - full disclosure is crucial for accurate interpretation, giving other researchers the chance to replicate the study
2
Q

The Ethical Imperative: Why Understanding Stats Matters

A
  • Transparency and accountability
  • Advancing the field
  • Ethical data practices are crucial for maintaining public trust and for avoiding misrepresentation of, and unintended bias about, psychological traits
3
Q

Implications of Misreporting

A
  1. Overgeneralisation - misleading one-size-fits-all impression of therapy effectiveness
  2. Patient harm - wasted time on ineffective treatments
  3. Research mistrust - damages credibility of psychological studies
  4. Ethical responsibility - researchers must present complete picture, including limitations
4
Q

Measures of Central Tendency

A
  • mean - the most common, although others might be more appropriate
  • median
  • mode
5
Q

Measures of Dispersion/Variability

A
  • Range
  • Variance
  • Standard deviation
  • Interquartile range
6
Q

Histograms and Bar Charts

A
  • Graphs for understanding data
  • How often the data appears - histogram
  • Compare the magnitude of different categories - bar charts
7
Q

Boxplots

A

Give median value, interquartile range and the spread of data

  • Can reveal key characteristics such as presence of skewness, extent of variability
8
Q

Scatterplots and Correlations

A
  • Relationships between two variables
  • Trends, clusters and outliers
9
Q

Importance of Data Cleaning and Preparation

A
  • Identifying errors - mistakes that might skew the data
  • Handling missing data - choosing appropriate methods to handle it
  • Standardised formats - all data is in the same format so it can be compared
  • Transforming variables - applying necessary transformations to meet statistical assumptions
10
Q

Strategies for Handling Missing Data

A
  • Imputation - replace missing values with estimate based on patterns in the existing data
  • Listwise deletion - remove any cases with missing data (this can reduce statistical power and introduce bias if the missingness is not random)
  • Multiple imputation - generate multiple plausible values for each missing data point to account for uncertainty, then pool the results
  • Analysis of missingness - investigate the patterns and mechanisms behind missing data to select the most appropriate handling method
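The first two strategies above can be sketched in a few lines of Python. This is a minimal illustration only: the records, the field names (`anxiety`, `sleep`) and the helper `impute_mean` are hypothetical, and mean substitution is just one simple way of producing an "estimate based on patterns in the existing data".

```python
import statistics

# Hypothetical participant records; None marks a missing value.
rows = [
    {"id": 1, "anxiety": 12, "sleep": 7},
    {"id": 2, "anxiety": None, "sleep": 6},
    {"id": 3, "anxiety": 18, "sleep": None},
    {"id": 4, "anxiety": 15, "sleep": 8},
]

# Listwise deletion: keep only the cases with no missing values at all.
complete = [r for r in rows if None not in r.values()]

def impute_mean(records, key):
    """Single (mean) imputation: fill missing `key` values with the observed mean."""
    observed = [r[key] for r in records if r[key] is not None]
    fill = statistics.fmean(observed)
    # Build new dicts so the original records are left untouched.
    return [{**r, key: fill if r[key] is None else r[key]} for r in records]

imputed = impute_mean(rows, "anxiety")

print([r["id"] for r in complete])      # cases that survive listwise deletion
print([r["anxiety"] for r in imputed])  # anxiety scores after mean imputation
```

Listwise deletion halves this small sample, which illustrates the loss of statistical power mentioned above.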
11
Q

Interpreting Descriptive Stats

A
  1. Visualising the data - through patterns, outliers and relationships in graphs etc
  2. Contextual interpretation - understanding real-world implications of descriptive statistics
  3. Practical significance - evaluating the magnitude of its effects
12
Q

Ethical Considerations in Data Presentation

A
  • Transparency
  • Avoiding bias
  • Context matters
  • Responsible reporting
13
Q

Avoiding Common Pitfalls in Descriptive Stats

A
  • Misinterpreting visualisations
  • Choosing inappropriate analyses
  • Data entry errors
14
Q

Practical Applications of Descriptive Stats

A
  • Research design
  • Psychological assessment
  • Intervention evaluation
  • Data visualisation
15
Q

The Data Analysis Process

A
  1. Collect
  2. Organise
  3. Analyse
  4. Interpret
16
Q
  1. Collect
A
  • Experimental measurements
  • Behavioural observations
  • Psychological test scores

  • Survey responses
  • End up with spreadsheets
  • Everything in row one is for participant 1 and so forth
  • Columns incorporate different variables
17
Q
  2. Organise
A
  • Median, mean, minimum and maximum
  • Summarise data to find averages → the first step
  • Not interested in individual data, but summative data
  • Box plots, histograms, relationships of variables (scatter plots)
18
Q
  3. Analyse
A
  • Descriptive statistics and inferential statistics - putting these into words for specific variables and making sense of the spreadsheets
19
Q
  4. Interpret
A
  • What do these numbers mean for the research, what does it suggest
  • Must be done accurately
  • “This suggests…”
20
Q

Quantitative Variables

A

measurable quantities like age, height, test scores (anywhere within a range)

21
Q

Qualitative Variables

A

descriptive categories such as gender, eye colour, mood

22
Q

Types of Data

A
  • Numerical
  • Categorical (grouping data)
  • Ordinal (ranked data like Likert scales)
  • Continuous (infinitely divisible data like reaction time)
23
Q

Nominal Scale - Identity

A
  • Used for categorical variables
  • Numbers are arbitrary, acting as labels instead of names, they indicate difference, not size or order
24
Q

Ordinal Scale - Identity + Order

A
  • Scores can be ranked/ordered
  • Indicate differences and scale
  • Nothing more than rank order
  • No objective distance between any two points on the scale
  • Not measurable
25
Q

Interval Scale - Identity + Order + Equal Unit Size

A
  • Allow us to separate objects or events into mutually exclusive categories, in an order, and with specific distances
  • Indicate differences, scale, interval length and size
26
Q

Ratio Scale

A

identity + order + equal unit size + true zero point

27
Q

Discrete Variables

A

Data are comprised of indivisible units, represented by whole numbers

  • number of children
  • errors on a true/false test
28
Q

Continuous Variables

A

Data involve numbers that can be divided into infinitely many fractional parts (e.g. reaction time)

29
Q

Measure of Variability

A

indicates the degree to which scores are either clustered or spread out in a distribution

30
Q

Range

A

difference between lowest and the highest score

31
Q

SD

A

Average movement from the middle of the distribution

  • Most commonly used measure
  • How different from the mean the individual scores may be
  • Average of these deviations
32
Q

Steps to Calculate the SD

A
  • Step 1: calculate the mean (M)
  • Step 2: find the difference between each individual score and the mean (x - M) (this is the deviation)
    The mean of the deviation scores is always zero
  • Step 3: square each deviation, (x - M)²
  • Step 4: find the mean of the squared deviations (known as the variance)
  • Step 5: take the square root of the variance
  • This is done because the variance is in squared units rather than the units of the original scores, so the standard deviation is much easier to compare with the scores themselves
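The five steps above can be traced in a short Python sketch. The scores are made up for illustration, and the sketch follows the card's wording by dividing by N in Step 4 (the population formula); many textbooks and software packages divide by N - 1 when working with a sample.

```python
import math

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical raw scores

# Step 1: calculate the mean (M).
mean = sum(scores) / len(scores)

# Step 2: deviation of each score from the mean (x - M).
deviations = [x - mean for x in scores]

# Step 3: square each deviation.
squared_deviations = [d ** 2 for d in deviations]

# Step 4: mean of the squared deviations = variance (dividing by N here).
variance = sum(squared_deviations) / len(scores)

# Step 5: square root of the variance = standard deviation.
sd = math.sqrt(variance)

print(mean, sum(deviations), variance, sd)
```

For these scores the deviations sum to zero, the variance is 4 (in squared units) and the standard deviation is 2 (in the original units).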
33
Q

How are SD and Variance Different

A
  • Both measures of variability
  • Both used in inferential statistics
  • Similar formula
  • Standard deviation: presents measure in original units
  • Variance: presents measure in squared units
34
Q

Data Collection Commandments

A
  1. Think about the type of data required to answer the question
  2. Where will you be collecting the data
  3. Make sure that the data collection form you are using is clear and easy to use
  4. Make a duplicate of the data files and keep it in a separate location
  5. Do not rely on other people to collect or transfer your data unless you have personally trained them and are confident that they understand the data collection process as well as you do
  6. Plan a detailed schedule of when and where you will be collecting your data
  7. As soon as possible cultivate possible sources of your participant pool
  8. Try to follow up on subjects who missed their testing session
  9. Never discard original data
35
Q

Self-report Measures

A
  • Administered as questionnaires or interviews

Behavioural self-report measures

  • Unreliable
  • How often they may do something

Cognitive measures

  • What people think
  • Unreliable

Affective measures

  • How people feel
  • Unreliable
36
Q

Types of Tests

A
  • Assess individual differences in various content areas

Personality tests

  • Often self-reported affective tests

Ability tests

  • Aptitude tests - measure an individual’s potential to do something
  • Achievement tests - measure an individual’s competence in an area
37
Q

Behavioural Measures

A
  • Observational measures
  • Involve some sort of coding system - a means of converting the observations to numerical data
38
Q

Descriptive Statistics

A
  • Average score (central tendency)
  • Shape of the distribution
  • Width of the distribution
  • Organise data in tables and graphs
39
Q

The Median

A
  • Mid-point or central value
  • Divides the scores in half
  • Not sensitive to outliers
  • Requires all scores to be placed in rank order
40
Q

Mode

A
  • Most frequently occurring category or score
  • Can be determined on all scales of measurement (nominal, ordinal, ratio, interval)
  • It is the only measure of central tendency that can be used for data measured on a nominal scale
41
Q

When to use the Different Measures of Central Tendency

A

Mode

  • When the data are categorical in nature and values can fit into only one class (religion, hair colour)

Median

  • When there are extreme scores that would distort the mean

Mean

  • When the data have no extreme scores and aren’t categorical
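A small Python sketch can show why the choice matters. The numbers and category labels below are invented for illustration; the calls are from Python's standard `statistics` module.

```python
import statistics

# Hypothetical numerical data with one extreme score (400).
incomes = [30, 32, 35, 36, 38, 40, 400]

mean_income = statistics.mean(incomes)      # pulled upwards by the extreme score
median_income = statistics.median(incomes)  # middle value, unaffected by it

# Hypothetical categorical (nominal) data: only the mode makes sense here.
hair_colours = ["brown", "black", "brown", "blonde"]
modal_colour = statistics.mode(hair_colours)

print(mean_income, median_income, modal_colour)
```

Here the single extreme score drags the mean far above every typical value, while the median stays with the bulk of the data.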
42
Q

What Do Central Tendencies Look Like in a Symmetrical Unimodal Distribution

A

mode=median=mean

43
Q

Positive and Negative Skews

A

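Negative skew: typically > 50% of scores above the mean and < 50% below it (mean < median)

Positive skew: typically > 50% of scores below the mean and < 50% above it (mean > median)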

44
Q

Why Care About Variable Types

A
  1. Different measurement approaches for different variables
  2. Different statistical tests most appropriate for analysis
  3. Different interpretation methods for correctly interpreting results and drawing accurate conclusions
45
Q

Nominal Variables

A

categories with no natural order

Important for

  • Understanding patient choices
  • Analysing demographic patterns
  • Cultural differences in mental health
46
Q

Ordinal Variables

A

Ordered categories

Important for

  • Better understanding of patients’ subjective experience
  • May be useful in developing individualised treatments
  • Informs decision-making and further research
47
Q

Interval Scales

A
  • Equal distances between points
  • No true zero
48
Q

Ratio Scale

A
  • has true zero
49
Q

Memory Study Example to show the use of nominal variables and ratio scales

A

Independent variable: study method (nominal)

  • Visual learning
  • Auditory learning
  • Combined method

Dependent variable: recall score (ratio)

  • Number of words remembered
  • Response time in milliseconds
50
Q

Common Mistakes to Avoid with Variables and Scales

A

Treating ordinal as interval

  • Shouldn’t write “depression increased by 2 points” on a mild/moderate scale; instead write “depression severity increased from mild to moderate”

Inappropriate averages

  • Can’t average nominal data

Misleading comparisons

  • “Twice as anxious” only works with ratio scales
51
Q

Understanding Likert Scales

A
  • Fixed-choice rating scale designed to measure attitudes, opinions (subjective measure)
  • Consists of a statement followed by varying degrees of agreement with that statement
  • Rating in number points
  • 5 point scale

Different response anchors like frequency, satisfaction, quality

  • Ordered responses
  • Balanced positive and negative options
  • Clear midpoint
  • Equal apparent intervals between options
52
Q

Part 1 of Analysis: Categorical Analysis

A
  1. Create frequency table
  2. Look at frequencies
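A frequency table like the one described above can be built with Python's `collections.Counter`. The survey responses below are hypothetical.

```python
from collections import Counter

# Hypothetical categorical responses from a survey item.
responses = ["agree", "neutral", "agree", "disagree", "agree", "neutral"]

# Step 1: create the frequency table (category -> count).
freq = Counter(responses)

# Step 2: look at the frequencies, most common category first,
# alongside each category's share of all responses.
for category, count in freq.most_common():
    print(f"{category:10s} {count:3d} {count / len(responses):.0%}")
```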
53
Q

Part 2 of Analysis: Numerical Analysis

A
  1. Calculate the mean satisfaction score
  2. Calculate the standard deviation
54
Q

Frequency Distribution Graphs

A

→ show the relationship between score and frequency

Bar graphs

  • Categorical data, nominal and ordinal scale

Histograms

  • Numerical data, interval and ratio scale
  • Bar width: continuous variables (extends to the real limits of the category) and discrete variables (extends exactly half the distance to the adjacent category)

Frequency polygons

  • Numerical data, interval and ratio scale
  • Large numbers
  • Compare sets of data with this
  • Cumulative frequency distribution
  • Changes over time
55
Q

Scatterplot

A
  • Bivariate numerical data
  • x,y pair
  • Negative, positive, no linear relationship
56
Q

Disadvantage of Stem and Leaf Plot

A

Not the best to present large data sets

  • Too many leaves for each stem
  • Create groupings that may affect clarity
57
Q

Characteristics of a Bar Graph

A
  • Categorical
  • Do not touch
  • Used to display differences in means
58
Q

Characteristics of a Histogram

A
  • numerical
  • can touch
  • frequency distribution
59
Q

Key Differences between COUNT and COUNTA functions

A
  • COUNT: counts only cells with numerical values
  • COUNTA: counts all non-empty cells, including those with text, numbers or any other data
  • COUNTA counts the column title too (if it is inside the range)
60
Q

Percentile

A
  • Location of scores relative to the rest of the scores in the distribution
  • Your percentile in the distribution represents the position of your measurement in comparison with everyone else’s
  • It gives the percentage of the population that falls below you
  • 50th percentile, 50% of population falls below you
  • (cf / n) × 100 (cumulative frequency / number of individual scores)
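The formula on this card can be sketched as a small Python function. The function name `percentile_rank` and the scores are hypothetical, and the sketch assumes the cumulative frequency (cf) counts every score at or below the value of interest; some texts count only the scores strictly below it.

```python
def percentile_rank(scores, value):
    """Percentile rank of `value`: (cf / n) * 100.

    cf is taken here as the number of scores at or below `value` (an assumption).
    """
    cf = sum(1 for s in scores if s <= value)
    return cf / len(scores) * 100

# Hypothetical distribution of ten test scores.
scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]

print(percentile_rank(scores, 75))
```

Note that a score of 75 out of 100 sits at the 50th percentile here, which shows why a percentile should not be confused with a percentage correct.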
61
Q

Percentile Rank

A

relative position of a given person in the group in reference to the trait being measured

62
Q

Percentile Score

A

score corresponding to a particular percentile rank

63
Q

Limitations of Percentile Measurement

A

equal differences in percentile do not reflect equal differences in actual scores

  • IQ 101 - IQ 100 → 52nd - 50th percentile
  • IQ 135 - IQ 128 → 99th - 97th percentile
  • distance between scores is not specified
64
Q

Z-score

A
  • A raw score or x value provides very little information about how that score compares with other values in the distribution
  • Z score transformation: the value of a z-score tells exactly where the score is located relative to all the other scores in the distribution

Transforms x score into new number so that

  1. The sign (+) or (-) tells us if the score is located above (+) or below (-) the mean, and
  2. The number tells the distance between the score and the mean in terms of the number of standard deviations
  3. Specifies the precise location of each raw score within the distribution
65
Q

z-score =

A

z = (X - M) / S

  • deviation divided by standard deviation
  • When distributions have different means and standard deviations, their raw scores can’t be compared directly
  • Z-scores fix this problem
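A minimal Python sketch of the transformation, using invented test results to show how z-scores make scores from different distributions comparable. The function name and all numbers are hypothetical.

```python
def z_score(x, mean, sd):
    """Deviation from the mean divided by the standard deviation."""
    return (x - mean) / sd

# Hypothetical results: the raw English score is higher,
# but the two tests have different means and standard deviations.
maths_z = z_score(70, mean=60, sd=10)
english_z = z_score(75, mean=70, sd=20)

print(maths_z, english_z)
```

The maths score is one full standard deviation above its mean, while the English score is only a quarter of a standard deviation above its mean, so the maths result is the stronger one relative to its own distribution.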
66
Q

Properties of Normal Distribution

A
  • Bell-shaped
  • Symmetrical
  • Mode, median and mean are the same value
  • 50% below and above the mean
  • Unimodal, one peak, one mode
  • Most of the observations are clustered around the centre of the distribution
  • When standard deviations are plotted along the x-axis, the percentage of scores falling between the mean and any point on the x-axis is the same for every normal distribution
67
Q

Kurtosis

A

how flat or peaked a distribution is; an indication of the degree of dispersion among the scores

  • Higher peak means there are more scores closer to the mean
  • Mean and standard deviation describe these peaks
68
Q

Z-scores

A
  • transforming ANY DISTRIBUTION of raw scores into Z-scores results in a distribution with a MEAN of 0 and a SD of 1
  • z-score quantifies the original score in terms of the number of standard deviations that the original raw score is from the mean of the distribution
  • a negative z-score means that the original score was below the mean. A positive z-score means that the original score was above the mean
69
Q

The Total Area Under the Curve Representing 100% of the Scores

A
  • Between z = -1 and z = +1 (within 1 SD of the mean): approx 68% of scores
  • Between z = -2 and z = +2 (within 2 SD of the mean): approx 95% of scores
  • Between z = -3 and z = +3 (within 3 SD of the mean): approx 99.7% of scores
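These areas can be checked numerically. The sketch below uses the standard relationship between the normal curve and the error function, `math.erf`, from Python's standard library; the helper name `area_within` is made up for this example.

```python
import math

def area_within(z):
    """Proportion of a normal distribution lying between -z and +z."""
    return math.erf(z / math.sqrt(2))

for z in (1, 2, 3):
    print(f"within {z} SD of the mean: {area_within(z):.1%}")
```

The printed values are roughly 68.3%, 95.4% and 99.7%, which is why this pattern is often called the 68-95-99.7 rule.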
70
Q

Common Mistakes to Avoid When Interpreting the Percentile Rank

A
  • confusing percentile with percentage correct
  • thinking percentile tells us actual score
  • misunderstanding whether higher or lower percentiles are better
  • thinking 50th percentile means “halfway to maximum”
  • assuming percentile indicates absolute rather than relative measurement
71
Q

Real World Applications of Percentile Rank

A
  • standardised test scores (NAPLAN)
  • clinical assessments (IQ)
  • medical assessments
  • growth monitoring
72
Q

Probability

A

Defined as the expected relative frequency of a particular outcome

  • by knowing the makeup of population we can determine the probability of obtaining specific samples
  • definition is accurate only for random samples
73
Q

Q1

A

25% of data falls below this point

74
Q

Q2

A

median, 50% of data falls below this point

75
Q

Q3

A

75% of data falls below this point

76
Q

IQR

A

= Q3 - Q1
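As a quick illustration, Q1, Q2, Q3 and the IQR can be computed with `statistics.quantiles` from Python's standard library (Python 3.8+). The data are hypothetical, and different software uses slightly different quartile conventions, so results can differ a little between tools.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]  # hypothetical scores

# n=4 splits the data into quarters and returns the three cut points.
q1, q2, q3 = statistics.quantiles(data, n=4)

iqr = q3 - q1  # spread of the middle 50% of the data

print(q1, q2, q3, iqr)
```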

77
Q

Interpreting Box Plot Characteristics

A
  1. Symmetry
  • the symmetry of the box plot indicates the distribution’s skewness. A symmetric box plot suggests a symmetric distribution, such as a normal distribution
  2. IQR
  • the size of the box represents the spread of the middle 50% of the data, providing insights into the data’s variability
  3. Whisker Length
  • the length of the whiskers indicates the range of the data, excluding outliers
78
Q

Comparing Data Sets Using Box Plots

A

→ side-by-side

  • allows for easy comparison of the distribution, median, and spread of multiple data sets

→ overlaid

  • Multiple overlapping on the same plot can highlight similarities and subtle differences in the data distribution

→ stacked

  • Stacked vertically can help visualise the relative positions and differences between the data sets for larger numbers of groups
79
Q

Bar Graphs - Advantages and Disadvantages

A
  • show the mean or total data
  • better for comparing categorical data or discrete counts
  • simple to understand for general audiences
  • cannot show outliers or data spread
80
Q

Boxplots - Advantages and Disadvantages

A
  • show median, quartiles (box edge), range (whiskers), outliers (individual data points)
  • better for comparing distributions
  • show data spread and skewness
  • excellent for spotting unusual patterns
  • more complex to interpret for general audiences
81
Q

Should you ever use multiple figures for the same data?

A

No, this makes it less concise and clear

82
Q

Q-Q Plot:

A
  • dotted line is the SD from the mean, where the normal range extends to
83
Q

Mixture of Normal Distributions

A
  • Multiple separate normal distributions placed together

Once these are combined, apparent outliers may not be outliers at all

  • Points that deviate from normality might not be true outliers
  • They could be valid data points from a different component of the mixture
  • E.g. points around -2 and +2 SD are not true outliers – they are the centres of their respective distributions
84
Q

Importance of Outliers

A

Impact on analysis

  • Can influence mean, SD making them unreliable

Model performance

  • Causes models to overfit or perform poorly, leading to inaccurate predictions

Data quality

  • Can help detect errors, inconsistencies, etc
85
Q

Tools for Detecting Outliers

A
  1. Visual inspection - through figures
  2. Statistical methods - z-scores, IQR etc
  3. Domain expertise - understanding the content and identifying outliers that are unrealistic or unexpected based on domain knowledge
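The two statistical methods named above can be sketched as follows. The function names, the data, and the cut-offs (1.5 × IQR beyond the quartiles, and |z| > 3) are assumptions for illustration: they are common conventions, not rules stated on this card.

```python
import statistics

def iqr_outliers(data, k=1.5):
    """Flag values more than k * IQR below Q1 or above Q3 (k=1.5 is a convention)."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]

def z_outliers(data, threshold=3.0):
    """Flag values whose z-score is beyond +/- threshold (3 is a convention)."""
    mean = statistics.fmean(data)
    sd = statistics.pstdev(data)  # population SD (divides by N)
    return [x for x in data if abs((x - mean) / sd) > threshold]

scores = [10, 12, 12, 13, 14, 15, 15, 16, 17, 18, 60]  # hypothetical scores

print(iqr_outliers(scores))
print(z_outliers(scores))
```

Both methods flag the score of 60 here, but flagged values still need the third tool, domain expertise, before anything is removed.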
86
Q

Common Causes of Outliers

A
  • Measurement errors (equipment malfunction, faulty sensors)
  • Data entry errors (incorrect formatting, typographical)
  • Unusual events (unexpected occurrences)
87
Q

Strategies for Treating Outliers

A

Removal → deleting them if they are considered to be errors

Replacement → replacing them with more representative values

Transformation → applying mathematical transformations to reduce the impact of outliers

88
Q

Clinical Responsibility

A
  • Might indicate persons needing immediate help
  • Removing data means removing important information
  • Balance statistical cleanliness with clinical reality
89
Q

Research Integrity

A
  • Document all decisions
  • Be transparent with outlier handling
  • Consider impact on conclusions
  • Report results with and without outliers
90
Q

Sampling Theory: Population and Sample

A

Sample - a portion of population that is actually measured

  • Summary properties or measures of sample values are called statistics
  • Concrete
  • Finite
  • Incomplete (set of people or entities)

Population - all items of interest

  • Summary properties or measures of population values are called parameters
  • Abstract
  • Complete (all people or entities)
91
Q

The Law of Large Numbers

A
  • Large samples generally give better information
  • More data = better information
  • Larger samples have a mean (M) closer to the true population mean
92
Q

The Central Limit Theorem:

A

If you take sufficiently large samples from a population, the samples’ means will be normally distributed, even if the population isn’t normally distributed

Ensuring that:

  1. The distribution of sample means is normal
  2. The mean of all the sample means would equal the population mean
  • the standard deviation of the sampling distribution (the sampling error) gets smaller as the sample size increases
  • the shape of the sampling distribution becomes normal as the sample size increases
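The points above can be explored with a small simulation. This is a sketch under stated assumptions: the population is an invented, strongly skewed (exponential) one with a mean of 1, and the sample sizes, number of samples and seed are arbitrary choices.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

def sample_means(sample_size, n_samples=2000):
    """Means of repeated samples drawn from a skewed population with mean 1."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(sample_size)])
        for _ in range(n_samples)
    ]

small = sample_means(sample_size=5)
large = sample_means(sample_size=50)

# The mean of the sample means stays close to the population mean (1.0),
# while their standard deviation (the standard error) shrinks as n grows.
print(statistics.fmean(small), statistics.stdev(small))
print(statistics.fmean(large), statistics.stdev(large))
```

Plotting a histogram of `large` would also show a roughly bell-shaped curve, even though the raw scores come from a skewed population.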
93
Q

Frequency Distribution of Raw Scores

A
  • Is based on a real set of data
  • Each point on the x-axis represents a raw score and the height of the line represents how frequently that score occurred
  • The shape of the distribution can be normal but is often skewed or irregular
94
Q

Frequency Distribution of Sample Means

A
  • Based on hypothetical set of sample means
  • Each point on the x-axis represents a sample mean and the height of the line represents how frequently they are expected to occur
  • The shape of the distribution tends to be normal regardless of the distribution of the raw scores
  • The standard deviation of these means is called standard error
95
Q

Sampling Error

A

Occurs when a sample that is not representative of the population being studied is selected

  • sample typically doesn’t provide a perfectly accurate representation of its population
  • there is some discrepancy (or error) between a statistic computed from a sample and the corresponding population parameter
96
Q

Standard Error

A
  • in reference to the distribution of sample means
  • provides a measure of how much difference is expected from one sample to another
  • measures how well an individual sample represents the population mean
97
Q

Small VS Large Standard Error

A

small = the sample means are close together and have similar values

large = the sample means are distributed over wider range and there are large differences from one sample mean to another

98
Q

Hypothesis Testing

A
  1. Data-Driven Decision Making
  2. Statistical Inference
  3. Evidence-based Conclusions – determining validity
99
Q

Null Hypothesis Testing

A
  • Null hypothesis (H0) is that there is no effect or difference between groups being compared
  • Something we assume to be true at the beginning of a null hypothesis test, but the goal is to provide evidence against the H0
  • If we assume that the null hypothesis is true, what is the likelihood of our data turning out the way it has?
100
Q

Alternative Hypothesis (H1 or Ha)

A
  • A statement that there is an effect or difference between the groups compared
  • Supported when H0 is rejected
  • Can’t be statistically tested directly, so testing against H0 is what matters
101
Q

Types of Errors in Hypothesis Testing

A
  • Type 1 (false positive) - rejecting the null hypothesis when it is actually true
  • Type 2 (false negative) - failing to reject the null hypothesis when it is actually false
102
Q

Decision Rule

A

Where the line is drawn in terms of there being sufficient evidence from the data to reject the H0

  • a decision rule quantifies when we can say “it is unlikely for us to obtain this data if the null hypothesis is true, therefore it would be more reasonable to assert that the null hypothesis is false”
  • the decision rule is chosen by the experimenter (but guided by convention)

Rejecting the H0 as a consequence of applying a decision rule is known as a significance test

  • the test statistic is calculated differently depending on what kind of NHST is being carried out
103
Q

The Test Statistic

A

→ takes into account differences in scores due to the manipulation or factor of interest

→ considers differences in scores due to extraneous factors, that should have nothing to do with the factor of interest

104
Q

One and Two Tailed Tests

A
  • One-tailed: only sensitive to a difference in one direction
  • Two-tailed: sensitive to differences in either direction
  • One-tailed are more limited in the question they are asking but more sensitive to the presence of a difference (more statistical power)
105
Q

P-Values and Effect Sizes

A
  • A lower p-value is desirable because it implies that a conclusion rejecting the null hypothesis is less likely to be an error
  • This is what is meant when papers refer to a difference or effect that is “highly significant”
  • It does not necessarily imply a large effect size
  • Effect size measures how big or important the difference is
106
Q

Confidence Intervals

A

Estimate the range within which the true population parameter is likely to fall. They provide a measure of uncertainty around our sample estimate

  • Set of values that range between an upper and lower limit
  • Comes with a certain level of confidence that the interval contains the population parameter of interest
  • Unlike tests, confidence intervals can tell us something about the size of the effect in the population
  • The mean might equal 71 but the confidence interval ranges from 69 to 73, the range in which the researchers are most confident the true mean lies
107
Q

Calculating Confidence Intervals

A
  • SE (standard error) = SD / Square root of sample size
  • SE tells how precise our sample mean is, i.e. how much the sample mean is expected to differ from the population mean
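A short Python sketch of the SE formula, extended to a 95% confidence interval. The scores are hypothetical, and the interval step is an assumption not spelled out on this card: it uses the common normal-approximation form, mean ± 1.96 × SE, whereas for small samples a t-based critical value is usually preferred.

```python
import math
import statistics

scores = [68, 70, 71, 72, 74]  # hypothetical sample of scores

n = len(scores)
mean = statistics.fmean(scores)
sd = statistics.stdev(scores)  # sample SD (divides by n - 1)

# SE = SD / square root of sample size
se = sd / math.sqrt(n)

# Assumed 95% interval: mean +/- 1.96 * SE (normal approximation).
z_critical = 1.96
ci_low = mean - z_critical * se
ci_high = mean + z_critical * se

print(mean, se, (ci_low, ci_high))
```

With these numbers the mean is 71 and the interval runs from about 69 to 73, mirroring the example on the Confidence Intervals card.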
108
Q

Small Sample

A
  • Less reliable estimate
  • Larger standard error
109
Q

Large Sample

A
  • More reliable estimate
  • Smaller standard error
110
Q

Difference between Confidence Intervals and p-values

A
  • P-values indicate the probability of obtaining the observed results under the null hypothesis (the likelihood of obtaining data at least as extreme as those observed if the null hypothesis were true)
  • Confidence intervals provide a range of plausible values for the population parameter, offering a more informative picture (for a 95% interval, intervals built this way would contain the true population value 95% of the time across repeated samples)
  • If the confidence interval for a difference contains 0, the difference is not significant
111
Q

Interpreting Confidence Intervals

A

If the confidence level is 95%, then about 95% of intervals constructed this way from repeated samples would hold the true population mean

  • Overlap - this suggests that you cannot confidently conclude a statistically significant difference
  • Non-overlap - this suggests that you can conclude a statistically significant difference between groups
112
Q

Factors Affecting Confidence Intervals: Sample Size

A

larger samples lead to narrower intervals, providing more precise estimates

113
Q

Factors Affecting Confidence Intervals: Confidence Level

A

higher levels result in wider confidence intervals

114
Q

Factors Affecting Confidence Intervals: Population Variability

A

higher variability leads to wider intervals, indicating greater uncertainty

115
Q

Type 1 Inferential Error

A

Rejecting null hypothesis when the null hypothesis is in fact true

116
Q

Type 2 Inferential Error

A

Retaining the null hypothesis when it is in fact false

117
Q

Imputation

A
  • Imputation - replace missing values with estimate based on patterns in the existing data
118
Q

Listwise Deletion

A
  • Listwise deletion - remove any cases with missing data (this can reduce statistical power and introduce bias if the missingness is not random)
119
Q

Multiple Imputation

A
  • Multiple imputation - generate multiple plausible values for each missing data point to account for uncertainty, then pool the results
120
Q

Analysis of Missingness

A
  • Analysis of missingness - investigate the patterns and mechanisms behind missing data to select the most appropriate handling method