Stats Flashcards
Process of Research in Conducting Statistics
- First determine average results
- Then individual variations
- Then ethical reporting - full disclosure is crucial for accurate interpretation, giving other researchers the chance to replicate the study
The Ethical Imperative: Why Understanding Stats Matters
- Transparency and accountability
- Advancing the field
- Ethical data practices - crucial for maintaining public trust and avoiding misrepresentation and unintended bias in the measurement of psychological traits
Implications of Misreporting
- Overgeneralisation - misleading one-size-fits-all impression of therapy effectiveness
- Patient harm - wasted time on ineffective treatments
- Research mistrust - damages credibility of psychological studies
- Ethical responsibility - researchers must present complete picture, including limitations
Measures of Central Tendency
- mean - the most common, though the others may sometimes be more appropriate (see the sketch after this list)
- median
- mode
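A minimal Python sketch of the three measures, using invented scores; it also shows why the median can be more appropriate than the mean when an outlier is present.

```python
# Invented scores; 21 is an outlier.
import statistics

scores = [2, 3, 3, 4, 5, 5, 5, 7, 21]

print(statistics.mean(scores))    # 6.11... - pulled upward by the outlier
print(statistics.median(scores))  # 5 - robust to the outlier
print(statistics.mode(scores))    # 5 - the most frequent score
```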
Measures of Dispersion/Variability
- Range
- Variance
- Standard deviation
- Interquartile range
Histograms and Bar Charts
- Graphs for understanding data
- How often the data appears - histogram
- Compare the magnitude of different categories - bar charts
Boxplots
Give the median value, the interquartile range and the spread of the data
- Can reveal key characteristics such as presence of skewness, extent of variability
Scatterplots and Correlations
- Relationships between two variables
- Trends, clusters and outliers
Importance of Data Cleaning and Preparation
- Identifying errors - mistakes that might skew the data
- Handling missing data - choosing appropriate methods to handle it
- Standardised formats - ensuring all data is in the same format so it can be compared
- Transforming variables - apply necessary transformations to meet stat assumptions
Strategies for Handling Missing Data
- Imputation - replace missing values with estimates based on patterns in the existing data (see the sketch after this list)
- Listwise deletion - remove any cases with missing data (this can reduce statistical power and introduce bias if the missingness is not random)
- Multiple imputation - generate multiple plausible values for each missing data point to account for uncertainty, then pool the results
- Analysis of missingness - investigate the patterns and mechanisms behind missing data to select the most appropriate handling method
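A rough sketch contrasting the first two strategies; the participant IDs and scores below are invented for illustration.

```python
import statistics

# None marks a missing score.
data = {"p1": 10, "p2": None, "p3": 14, "p4": 12, "p5": None}

# Listwise deletion: drop any case with missing data.
complete = {pid: s for pid, s in data.items() if s is not None}

# Simple (mean) imputation: replace missing values with the mean
# of the observed scores.
m = statistics.mean(complete.values())
imputed = {pid: (s if s is not None else m) for pid, s in data.items()}

print(complete)  # {'p1': 10, 'p3': 14, 'p4': 12}
print(imputed)   # p2 and p5 filled in with the mean (12)
```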
Interpreting Descriptive Stats
- Visualising the data - through patterns, outliers and relationships in graphs etc
- Contextual interpretation - understanding real-world implications of descriptive statistics
- Practical significance - evaluating the magnitude of the effects
Ethical Considerations in Data Presentation
- Transparency
- Avoiding bias
- Context matters
- Responsible reporting
Avoiding Common Pitfalls in Descriptive Stats
- Misinterpreting visualisations
- Choosing inappropriate analyses
- Data entry errors
Practical Applications of Descriptive Stats
- Research design
- Psychological assessment
- Intervention evaluation
- Data visualisation
The Data Analysis Process
- Collect
- Organise
- Analyse
- Interpret
- Collect
- Experimental measurements
- Behavioural observations
- Psychological test scores
- Survey responses
- End up with spreadsheets
- Each row is one participant (row one is participant 1, and so forth)
- Each column is a different variable
- Organise
- Median, mean, minimum and maximum
- Summarise data to find averages → the first step
- Not interested in individual data points, but in summary data
- Box plots, histograms, relationships of variables (scatter plots)
- Analyse
- Descriptive statistics and inferential statistics - putting these into words for specific variables, making sense of the spreadsheets
- Interpret
- What do these numbers mean for the research, what does it suggest
- Must be done accurately
- “This suggests…”
Quantitative Variables
measurable quantities like age, height, test scores (anywhere within a range)
Qualitative Variables
descriptive categories such as gender, eye colour, mood
Types of Data
- Numerical
- Categorical (grouping data)
- Ordinal (ranked data like Likert scales)
- Continuous (infinitely divisible data like reaction time)
Nominal Scale - Identity
- Used for categorical variables
- Numbers are arbitrary, acting as labels rather than quantities; they indicate difference, not size or order
Ordinal Scale - Identity + Order
- Scores can be ranked/ordered
- Indicate difference and order, but nothing more than rank order
- No objective distance between any two points on the scale
- The intervals between ranks are not measurable
Interval Scale - Identity + Order + Equal Unit Size
- Allow us to separate objects or events into mutually exclusive categories, in an order, and with specific distances
- Indicate differences, scale, interval length and size
Ratio Scale
identity + order + equal unit size + true zero point
Discrete Variables
Data are composed of indivisible units, represented by whole numbers
- number of children
- errors on a true/false test
Continuous Variables
Data involve numbers that can be divided
Measure of Variability
indicates the degree to which scores are either clustered or spread out in a distribution
Range
difference between lowest and the highest score
SD
Average deviation from the mean of the distribution
- Most commonly used measure
- How different from the mean the individual scores may be
- Average of these deviations
Steps to Calculate the SD
- Step 1: calculate the mean
- Step 2: find each score's deviation from the mean, (X - M); the mean of these deviations is always zero
- Step 3: square each deviation, (X - M)²
- Step 4: find the mean of the squared deviations (known as the variance)
- Step 5: take the square root of the variance
- The square root is taken because the variance is not in the same units as the scores (it is much larger), so the standard deviation compares with the scores much better (see the worked sketch below)
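A worked sketch of the five steps on four invented scores, using the population formula (dividing by n, as in Step 4 above); a sample SD would divide by n - 1 instead.

```python
scores = [2, 4, 6, 8]
n = len(scores)

mean = sum(scores) / n                   # Step 1: mean = 5.0
deviations = [x - mean for x in scores]  # Step 2: these sum to zero
squared = [d ** 2 for d in deviations]   # Step 3: squared deviations
variance = sum(squared) / n              # Step 4: variance = 5.0
sd = variance ** 0.5                     # Step 5: square root ~= 2.24

print(mean, variance, sd)
```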
How are SD and Variance Different
- Both measures of variability
- Both used in inferential statistics
- Similar formula
- Standard deviation: presents measure in original units
- Variance: presents measure in squared units
Data Collection Commandments
- Think about the type of data required to answer the question
- Where will you be collecting the data
- Make sure that the data collection form you are using is clear and easy to use
- Make a duplicate of the data files and keep it in a separate location
- Do not rely on other people to collect or transfer your data unless you have personally trained them and are confident that they understand the data collection process as well as you do
- Plan a detailed schedule of when and where you will be collecting your data
- As soon as possible cultivate possible sources of your participant pool
- Try to follow up on subjects who missed their testing session
- Never discard original data
Self-report Measures
- Administered as questionnaires or interviews
Behavioural self-report measures
- Unreliable
- How often they may do something
Cognitive measures
- What people think
- Unreliable
Affective measures
- How people feel
- Unreliable
Types of Tests
- Assess individual differences in various content areas
Personality tests
- Often self-reported affective tests
Ability tests
- Aptitude tests - measure an individual’s potential to do something
- Achievement tests - measure an individual’s competence in an area
Behavioural Measures
- Observational measures
- Involve some sort of coding system - a means of converting the observations to numerical data
Descriptive Statistics
- Average score (central tendency)
- Shape of the distribution
- Width of the distribution
- Organise data in tables and graphs
The Median
- Mid-point or central value
- Divides the scores in half
- Not sensitive to outliers
- Requires all scores to be placed in rank order
Mode
- Most frequently occurring category or score
- Can be determined on all scales of measurement (nominal, ordinal, ratio, interval)
- It is the only measure of central tendency that can be used for data measured on a nominal scale
When to use the Different Measures of Central Tendency
Mode
- When the data are categorical in nature and values can fit into only one class (religion, hair colour)
Median
- When there are extreme scores that would distort the mean
Mean
- When the data are not extreme and not categorical
What Do Central Tendencies Look Like in a Symmetrical Unimodal Distribution
mode = median = mean
Positive and Negative Skews
- Positive skew: long tail to the right; mean > median, so more than 50% of scores fall below the mean
- Negative skew: long tail to the left; mean < median, so more than 50% of scores fall above the mean
Why Care About Variable Types
- Different measurement approaches for different variables
- Different statistical tests most appropriate for analysis
- Different interpretation methods for correctly interpreting results and drawing accurate conclusions
Nominal Variables
categories with no natural order
Important for
- Understanding patient choices
- Analysing demographic patterns
- Cultural differences in mental health
Ordinal Variables
Ordered categories
Important for
- Better understanding of patients' subjective experience
- May be useful in developing individualised treatments
- Informs decision-making and further research
Interval Scales
- Equal distances between points
- No true zero
Ratio Scale
- has true zero
Memory Study Example to show the use of nominal variables and ratio scales
Independent variable: study method (nominal)
- Visual learning
- Auditory learning
- Combined method
Dependent variable: recall score (ratio)
- Number of words remembered
- Response time in milliseconds
Common Mistakes to Avoid with Variables and Scales
Treating ordinal as interval
- Don't write "depression increased by 2 points on a mild/moderate scale"; instead write "depression severity increased from mild to moderate"
Inappropriate averages
- Can’t average nominal data
Misleading comparisons
- “Twice as anxious” only works with ratio scales
Understanding Likert Scales
- Fixed-choice rating scale designed to measure attitudes, opinions (subjective measure)
- Consists of a statement followed by varying degrees of response to that statement
- Ratings given as numbered points, commonly a 5-point scale
- Different response anchors, like frequency, satisfaction or quality
- Ordered responses
- Balanced positive and negative options
- Clear midpoint
- Equal apparent intervals between options
Part 1 of Analysis: Categorical Analysis
- Create frequency table
- Look at frequencies
Part 2 of Analysis: Numerical Analysis
- Calculate the mean satisfaction score
- Calculate the standard deviation (see the sketch below, which covers both parts)
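A sketch of both analysis parts on invented 5-point Likert responses. Note that Part 2 treats ordinal ratings as numeric, a common but debated convention.

```python
import statistics
from collections import Counter

responses = [4, 5, 3, 4, 2, 5, 4, 3, 4, 1]  # invented ratings

# Part 1: frequency table of the categorical responses.
freq = Counter(responses)
for rating in sorted(freq):
    print(rating, freq[rating])

# Part 2: mean satisfaction score and standard deviation.
print(statistics.mean(responses))   # 3.5
print(statistics.stdev(responses))  # sample SD
```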
Frequency Distribution Graphs
→ show the relationship between score and frequency
Bar graphs
- Categorical data, nominal and ordinal scale
Histograms
- Numerical data, interval and ratio scale
- Bar width: for continuous variables the bar extends to the real limits of the category; for discrete variables it extends exactly half the distance to the adjacent category
Frequency polygons
- Numerical data, interval and ratio scale
- Large numbers
- Compare sets of data with this
- Cumulative frequency distribution
- Changes over time
Scatterplot
- Bivariate numerical data
- x,y pair
- Negative, positive, no linear relationship
Disadvantage of Stem and Leaf Plot
Not the best for presenting large data sets
- Too many leaves for each stem
- Create groupings that may affect clarity
Characteristics of a Bar Graph
- Categorical
- Do not touch
- Use to display differences in mean
Characteristics of a Histogram
- Numerical
- Bars can touch
- Displays a frequency distribution
Key Differences between COUNT and COUNTA functions
- COUNT: counts only cells with numerical values
- COUNTA: counts all non-empty cells, including those with text, numbers or any other data
- Note: COUNTA counts a column title too if it is included in the range (see the sketch below)
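The functions above are Excel's; the following is only a rough Python analogue of the numeric-only versus non-empty distinction, using an invented cell range that includes a column title.

```python
cells = ["Score", 10, 12, "", None, "absent", 15]  # invented cell range

count = sum(isinstance(c, (int, float)) for c in cells)  # numbers only -> 3
counta = sum(c not in (None, "") for c in cells)         # non-empty -> 5
# Note the title "Score" is included in counta, as described above.

print(count, counta)
```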
Percentile
- Location of scores relative to the rest of the scores in the distribution
- Your percentile in the distribution represents the position of your measurement in comparison with everyone else’s
- It gives the percentage of the population that falls below you
- 50th percentile, 50% of population falls below you
- Percentile = (cf / n) x 100, where cf is the cumulative frequency and n is the number of individual scores (see the sketch below)
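A small sketch of the cf/n formula on invented scores. Conventions differ on whether "below" means strictly below or at-or-below; this sketch counts scores at or below yours.

```python
scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]  # invented, sorted
your_score = 80
n = len(scores)

cf = sum(s <= your_score for s in scores)  # cumulative frequency = 6
percentile = cf / n * 100
print(percentile)  # 60.0 -> 60th percentile
```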
Percentile Rank
relative position of a given person in the group in reference to the trait being measured
Percentile Score
score corresponding to a particular percentile rank
Limitations of Percentile Measurement
equal percentile differences do not reflect equal differences in actual scores
- IQ 101 - IQ 100 → 52nd - 50th percentile
- IQ 135 - IQ 128 → 99th - 97th percentile
- distance between scores is not specified
Z-score
- A raw score or x value provides very little information about how that score compares with other values in the distribution
- Z score transformation: the value of a z-score tells exactly where the score is located relative to all the other scores in the distribution
Transforms the X score into a new number so that
- The sign (+) or (-) tells us if the score is located above (+) or below (-) the mean, and
- The number tells the distance between the score and the mean in terms of the number of standard deviations
- Specifies the precise location of each raw score within the distribution
- z = (X - M) / s
- The deviation from the mean divided by the standard deviation
- When two measures have different means and standard deviations, their raw scores can't be compared directly
- Z-scores fix this problem (see the sketch below)
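A minimal sketch of the transformation; the two tests' means and SDs are invented.

```python
def z_score(x, mean, sd):
    """Distance of x from the mean, in standard deviation units."""
    return (x - mean) / sd

# A 70 on test A beats an 80 on test B once both are standardised.
print(z_score(70, mean=60, sd=5))   #  2.0 -> 2 SDs above A's mean
print(z_score(80, mean=85, sd=10))  # -0.5 -> half an SD below B's mean
```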
Properties of Normal Distribution
- Bell-shaped
- Symmetrical
- Mode, median and mean are the same value
- 50% below and above the mean
- Unimodal, one peak, one mode
- Most of the observations are clustered around the centre of the distribution
- When scores are expressed as standard deviations along the x-axis, the percentage of scores falling between the mean and any given point is the same for every normal distribution
Kurtosis
how flat or peaked a normal distribution is; a measure of the degree of dispersion among the scores
- A higher peak means there are more scores close to the mean
- Mean and standard deviation describe these peaks
Z-scores
- transforming ANY DISTRIBUTION of raw scores into Z-scores results in a distribution with a MEAN of 0 and a SD of 1
- z-score quantifies the original score in terms of the number of standard deviations that the original raw score is from the mean of the distribution
- a negative z-score means that the original score was below the mean. A positive z-score means that the original score was above the mean
The Total Area Under the Curve Representing 100% of the Scores
- z = -1 to z = +1 (within 1 SD) covers approx 68% of scores
- z = -2 to z = +2 (within 2 SD) covers approx 95% of scores
- z = -3 to z = +3 (within 3 SD) covers approx 99.7% of scores (see the sketch below)
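These coverage figures can be checked against the standard normal CDF; a sketch using NormalDist from Python's standard library (3.8+).

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, SD 1
for k in (1, 2, 3):
    coverage = z.cdf(k) - z.cdf(-k)
    print(f"within {k} SD: {coverage:.1%}")
# within 1 SD: 68.3%
# within 2 SD: 95.4%
# within 3 SD: 99.7%
```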
Common Mistakes to Avoid When Interpreting the Percentile Rank
- confusing percentile with percentage correct
- thinking percentile tells us actual score
- misunderstanding whether higher or lower percentiles are better
- thinking 50th percentile means “halfway to maximum”
- assuming percentile indicates absolute rather than relative measurement
Real World Applications of Percentile Rank
- standardised test scores (NAPLAN)
- clinical assessments (IQ)
- medical assessments
- growth monitoring
Probability
Defined as the expected relative frequency of a particular outcome
- by knowing the makeup of a population, we can determine the probability of obtaining specific samples
- definition is accurate only for random samples
Q1
25% of data falls below this point
Q2
median, 50% of data falls below this point
Q3
75% of data falls below this point
IQR
= Q3 - Q1 (see the sketch below)
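A sketch computing the quartiles and IQR on invented data. Quantile conventions vary between packages; statistics.quantiles defaults to the "exclusive" method, so results can differ slightly from other software.

```python
import statistics

data = [1, 3, 4, 6, 7, 8, 10, 12, 15]  # invented scores
q1, q2, q3 = statistics.quantiles(data, n=4)

print(q1, q2, q3)  # q2 is the median
print(q3 - q1)     # the IQR
```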
Interpreting Box Plot Characteristics
- Symmetry
- the symmetry of the box plot indicates the distribution's skewness. A symmetric box plot suggests a symmetric (possibly normal) distribution
- IQR
- the size of the box represents the spread of the middle 50% of the data, providing insights into the data’s variability
- Whisker Length
- the length of the whiskers indicates the range of the data, excluding outliers
Comparing Data Sets Using Box Plots
→ side-by-side
- allows for easy comparison of the distribution, median, and spread of multiple data sets
→ overlaid
- Multiple overlapping on the same plot can highlight similarities and subtle differences in the data distribution
→ stacked
- Stacked vertically can help visualise the relative positions and differences between the data sets for larger numbers of groups
Bar Graphs - Advantages and Disadvantages
- show the mean or total data
- better for comparing categorical data or discrete counts
- simple to understand for general audiences
- cannot show outliers or data spread
Boxplots - Advantages and Disadvantages
- show median, quartiles (box edge), range (whiskers), outliers (individual data points)
- better for comparing distributions
- show data spread and skewness
- excellent for spotting unusual patterns
- more complex to interpret for general audiences
Should you ever use multiple figures for the same data?
No; presenting the same data in multiple figures makes the report less concise and clear
Q-Q Plot:
- the dotted lines mark SDs from the mean, showing how far the normal range extends
Mixture of Normal Distributions
- Multiple separate normal distributions placed together
- Once these are combined, the apparent outliers are no longer outliers
- Points that deviate from normality might not be true outliers
- They could be valid data points from a different component of the mixture
- E.g. points around -2 and +2 SD are not true outliers - they are the centres of their respective distributions
Importance of Outliers
Impact on analysis
- Can influence the mean and SD, making them unreliable
Model performance
- Causes models to overfit or perform poorly, leading to inaccurate predictions
Data quality
- Can help detect errors, inconsistencies, etc
Tools for Detecting Outliers
- Visual inspection - through figures
- Statistical methods - z-scores, the IQR rule, etc (see the sketch after this list)
- Domain expertise - understanding the content and identifying outliers that are unrealistic or unexpected based on domain knowledge
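A sketch of the two statistical rules on invented data. Note that the z-score rule misses the outlier here because the outlier itself inflates the SD in a small sample, which is one reason the IQR rule is often preferred.

```python
import statistics

data = [10, 12, 11, 13, 12, 11, 40]  # 40 looks suspicious

# z-score rule: flag scores more than ~3 SDs from the mean.
m, s = statistics.mean(data), statistics.stdev(data)
z_outliers = [x for x in data if abs((x - m) / s) > 3]

# IQR rule: flag scores beyond 1.5 * IQR outside the quartiles.
q1, _, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1
iqr_outliers = [x for x in data if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]

print(z_outliers)    # [] - masked by the inflated SD
print(iqr_outliers)  # [40]
```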
Common Causes of Outliers
- Measurement errors (equipment malfunction, faulty sensors)
- Data entry errors (incorrect formatting, typographical)
- Unusual events (unexpected occurrences)
Strategies for Treating Outliers
- Removal - delete outliers if they are considered to be errors
- Replacement - substitute more representative values
- Transformation - apply mathematical transformations to reduce the impact of outliers
Clinical Responsibility
- Might indicate persons needing immediate help
- Removing data means removing important information
- Balance statistical cleanliness with clinical reality
Research Integrity
- Document all decisions
- Be transparent with outlier handling
- Consider impact on conclusions
- Report results with and without outliers
Sampling Theory: Population and Sample
Sample - a portion of population that is actually measured
- Summary properties or measures of sample values are called statistics
- Concrete
- Finite
- Incomplete (set of people or entities)
Population - all items of interest
- Summary properties or measures of population values are called parameters
- Abstract
- Complete (all people or entities)
The Law of Large Numbers
- Large samples generally give better information
- More data = better information
- Larger samples have means (M) closer to the true population mean
The Central Limit Theorem:
If you take sufficiently large samples from a population, the samples' means will be normally distributed, even if the population isn't normally distributed
Ensuring that:
- The distribution of sample means is normal
- The mean of all the sample means equals the population mean
- The standard deviation of the sampling distribution (the standard error) gets smaller as the sample size increases
- The shape of the sampling distribution becomes normal as the sample size increases (see the simulation sketch below)
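A simulation sketch: sample means drawn from a skewed (exponential) population still cluster near the population mean of 1.0, with a spread (the standard error) that shrinks as n grows.

```python
import random
import statistics

random.seed(1)  # reproducible

def sample_mean(n):
    return statistics.mean(random.expovariate(1.0) for _ in range(n))

for n in (5, 50, 500):
    means = [sample_mean(n) for _ in range(2000)]
    print(n, round(statistics.mean(means), 2), round(statistics.stdev(means), 3))
# The mean of the sample means stays near 1.0, and their SD
# (the standard error) shrinks roughly as 1 / sqrt(n).
```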
Frequency Distribution of Raw Scores
- Is based on a real set of data
- Each point on the x-axis represents a raw score and the height of the line represents how frequently that score occurred
- The shape of the distribution can be normal but is often skewed or irregular
Frequency Distribution of Sample Means
- Based on hypothetical set of sample means
- Each point on the x-axis represents a sample mean and the height of the line represents how frequently they are expected to occur
- The shape of the distribution tends to be normal regardless of the distribution of the raw scores
- The standard deviation of these means is called standard error
Sampling Error
Occurs when a sample that is not representative of the population being studied is selected
- sample typically doesn’t provide a perfectly accurate representation of its population
- there is some discrepancy (or error) between a statistic computed from the sample and the corresponding population parameter
Standard Error
- in reference to the distribution of sample means
- provides a measure of how much difference is expected from one sample to another
- measures how well an individual sample mean represents the population mean
Small VS Large Standard Error
small = the sample means are close together and have similar values
large = the sample means are distributed over wider range and there are large differences from one sample mean to another
Hypothesis Testing
- Data-Driven Decision Making
- Statistical Inference
- Evidence-based Conclusions – determining validity
Null Hypothesis Testing
- Null hypothesis (H0) is that there is no effect or difference between groups being compared
- The H0 is assumed to be true at the beginning of a null hypothesis test, but the goal is to provide evidence against it
- If we assume that the null hypothesis is true, what is the likelihood of our data turning out the way it has?
Alternative Hypothesis (H1 or Ha)
- A statement that there is an effect or difference between the groups compared
- Supported when H0 is rejected
- Can't be statistically tested directly, so testing against H0 is used instead
Types of Errors in Hypothesis Testing
- Type 1 (false positive) - rejecting the null hypothesis when it is actually true
- Type 2 (false negative) - failing to reject the null hypothesis when it is actually false
Decision Rule
Where the line is drawn in terms of there being sufficient evidence from the data to reject the H0
- a decision rule quantifies when we can say “it is unlikely for us to obtain this data if the null hypothesis is true, therefore it would be more reasonable to assert that the null hypothesis is false”
- the decision rule is chosen by the experimenter (but guided by convention)
Rejecting the H0 as a consequence of applying a decision rule is known as a significance test
- the test statistic is calculated differently depending on what kind of NHST is being carried out
The Test Statistic
→ takes into account differences in scores due to the manipulation or factor of interest
→ considers differences in scores due to extraneous factors, that should have nothing to do with the factor of interest
One and Two Tailed Tests
- One-tailed: only sensitive to a difference in one direction
- Two-tailed: sensitive to differences in either direction
- One-tailed are more limited in the question they are asking but more sensitive to the presence of a difference (more statistical power)
P-Values and Effect Sizes
- A lower p-value is desirable because it implies that a conclusion rejecting the null hypothesis is less likely to be an error
- This is what is meant when papers refer to a difference or effect that is “highly significant”
- It does not necessarily imply a large effect size
- Effect size measures how big or important the difference is
Confidence Intervals
Estimate the range within which the true population parameter is likely to fall. They provide a measure of uncertainty around our sample estimate
- Set of values that range between an upper and lower limit
- We have a certain level of confidence that the interval contains the population parameter of interest
- Unlike significance tests, confidence intervals can tell us something about the size of the effect in the population
- The mean might equal 71 but the confidence interval ranges from 69-73; this is where the researchers are most confident the true mean lies
Calculating Confidence Intervals
- SE (standard error) = SD / √(sample size)
- SE tells how precise our sample mean is - how much the sample mean is expected to differ from the population mean (see the sketch below)
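A sketch of a 95% CI for a mean, using the normal approximation (1.96 SEs either side; a t-multiplier is more accurate for small samples). The scores are invented so that M = 71, matching the example above.

```python
import statistics

scores = [68, 70, 72, 69, 75, 71, 70, 73]  # invented sample, M = 71
n = len(scores)

m = statistics.mean(scores)
se = statistics.stdev(scores) / n ** 0.5  # SE = SD / sqrt(n)

lower, upper = m - 1.96 * se, m + 1.96 * se
print(f"M = {m:.1f}, 95% CI [{lower:.1f}, {upper:.1f}]")
```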
Small Sample
- Less reliable estimate
- Larger standard error
Large Sample
- More reliable estimate
- Smaller standard error
Difference between Confidence Intervals and p-values
- P-values indicate the probability of obtaining results at least as extreme as those observed, assuming the null hypothesis is true
- Confidence intervals provide a range of plausible values for the population parameter, offering a more informative picture (across repeated sampling, intervals built this way would contain the true population value 95% of the time)
- If a confidence interval for a difference contains 0, the difference is not significant
Interpreting Confidence Intervals
A 95% confidence level means that, across repeated samples, 95% of intervals constructed this way would contain the true population mean
- Overlap - suggests that you cannot confidently conclude a statistically significant difference
- Non-overlap - suggests that you can conclude a statistically significant difference between groups
Effect of sample size
larger samples lead to narrower intervals, providing more precise estimates
Effect of confidence level
higher confidence levels result in wider intervals
Effect of population variability
higher variability leads to wider intervals, indicating greater uncertainty
Type 1 Inferential Error
Rejecting null hypothesis when the null hypothesis is in fact true
Type 2 Inferential Error
Retaining the null hypothesis when it is in fact false
Imputation
- Replace missing values with estimates based on patterns in the existing data
Listwise Deletion
- Remove any cases with missing data (this can reduce statistical power and introduce bias if the missingness is not random)
Multiple Imputation
- Generate multiple plausible values for each missing data point to account for uncertainty, then pool the results
Analysis of Missingness
- Investigate the patterns and mechanisms behind missing data to select the most appropriate handling method