Descriptive Statistics Flashcards

1
Q

Nominal Data

A

Categorical data with no inherent order or ranking
e.g., gender, race, eye color.

2
Q

Ordinal Data

A

Categorical data with a natural order or ranking
e.g., a Likert scale (strongly agree, agree, neutral, disagree, strongly disagree).

Ordinal is ordered; it’s like ranks or grades.

3
Q

Interval Data

A

Numerical data where the difference between values is meaningful, but there is no true zero point
e.g. Temperature (in Celsius or Fahrenheit), IQ scores

Interval is in between; it has equal intervals but no true zero.

4
Q

Ratio Data

A

Numerical data where both the difference between values and the ratio of values are meaningful, and there is a true zero point

e.g. Height, weight, time.

Ratio is the real deal; it has equal intervals and a true zero.

5
Q

Plotting Nominal Data

A

Bar charts, pie charts.
Displaying frequencies or proportions of categories

6
Q

Plotting Ordinal Data

A

Bar charts, histograms.
Showing distributions or frequencies of ordered categories.

7
Q

Plotting Interval and Ratio Data

A

Histograms, line graphs: Visualizing distributions or trends over time.
Box plots: Suitable for displaying distributions and comparing groups.

8
Q

Summary Measures

A

Nominal: Mode (most frequent category).
Ordinal: Median (middle value).
Interval and Ratio: Mean (average), standard deviation (spread of data).
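
As a rough illustration of the mapping above, the following Python sketch computes each summary measure with the standard library's `statistics` module. The sample values (eye colors, Likert responses coded 1 to 5, heights in cm) are invented for the example and are not part of the original cards.

```python
import statistics

# Hypothetical sample data, invented for illustration only.
eye_colors = ["brown", "blue", "brown", "green", "brown"]   # nominal
likert = [1, 2, 2, 3, 4, 4, 5]                              # ordinal, coded 1-5
heights_cm = [160.0, 165.0, 170.0, 175.0, 180.0]            # ratio

nominal_mode = statistics.mode(eye_colors)     # most frequent category
ordinal_median = statistics.median(likert)     # middle value of the ordered data
ratio_mean = statistics.mean(heights_cm)       # arithmetic average
ratio_sd = statistics.stdev(heights_cm)        # sample standard deviation (n - 1)

print(nominal_mode, ordinal_median, ratio_mean, round(ratio_sd, 2))
```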

9
Q

Statistical Tests and Data

A

Nominal: Chi-square test.
Ordinal: Spearman’s rank correlation.
Interval and Ratio: t-test, ANOVA.

10
Q

Visualization of Data

A

Box plots: Useful for displaying ordinal, interval, and ratio data distributions.
Scatter plots: Suitable for exploring relationships between interval and ratio variables.

11
Q

Test-retest reliability

A

Test-retest reliability compares individuals’ scores on the same test at different times to assess the consistency of the measurement over time.

12
Q

Problem with Re-test

A

Situational changes, such as therapy or environmental factors, can influence individuals’ scores between test administrations, leading to inaccurate reliability estimates.

13
Q

How to Screen Data

A

Visual inspection: Reviewing individual data points, summary statistics, and graphical representations (e.g., histograms, box plots).
Statistical tests: Performing formal statistical tests to identify outliers or assess data distribution (e.g., tests for normality, skewness, kurtosis).

13
Q

Data Screening

A

The process of examining data for errors, outliers, or other issues before conducting statistical analysis.
Detect errors or anomalies that may affect the validity or reliability of the analysis.
Ensure data quality and integrity before proceeding with further analysis.

14
Q

Skewness

A

Measures the asymmetry of the distribution of data around the mean.

15
Q

Positive Skewness

A

The mean of the data is greater than the median.
Most values are concentrated on the left side of the distribution, with a few extreme high values pulling the mean upward.
The tail extends to the right.

16
Q

Negative Skewness

A

Tail extends to the left, with more extreme values on the left side of the distribution
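
The direction of skew can be checked numerically. The sketch below assumes the moment-based (population) skewness formula, m3 / m2^1.5, which the cards do not specify; the data sets are invented. A positive result goes with a right tail and a mean above the median, a negative result with a left tail and a mean below the median.

```python
import statistics

def skewness(values):
    """Population (moment-based) skewness: m3 / m2 ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n   # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n   # third central moment
    return m3 / m2 ** 1.5

right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]   # one extreme high value
left_skewed = [-x for x in right_skewed]   # mirror image of the same data

print(skewness(right_skewed), statistics.mean(right_skewed), statistics.median(right_skewed))
print(skewness(left_skewed), statistics.mean(left_skewed), statistics.median(left_skewed))
```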

17
Q

Kurtosis

A

Measures the heaviness of the tails of a distribution, often described as its peakedness or flatness, relative to a normal distribution

18
Q

Leptokurtic Kurtosis

A

High kurtosis, indicating a distribution with heavy tails and a sharp peak.

19
Q

Mesokurtic Kurtosis

A

Normal kurtosis, similar to a normal distribution

20
Q

Platykurtic Kurtosis

A

Low kurtosis, indicating a distribution with lighter tails and a flatter peak.
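
The three kurtosis categories can be told apart by the sign of excess kurtosis. The sketch below assumes the population formula m4 / m2^2 - 3, chosen here for simplicity and not named in the cards; both data sets are invented. Values above 0 suggest a leptokurtic shape, values near 0 a mesokurtic one, and values below 0 a platykurtic one.

```python
def excess_kurtosis(values):
    """Population excess kurtosis: m4 / m2 ** 2 - 3 (0 for a normal distribution)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n   # second central moment
    m4 = sum((x - mean) ** 4 for x in values) / n   # fourth central moment
    return m4 / m2 ** 2 - 3

flat = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]       # evenly spread values, light tails
heavy = [0, 0, 0, 0, 0, 0, 0, 0, -10, 10]    # mostly central values, two extremes

print(excess_kurtosis(flat))    # negative: platykurtic
print(excess_kurtosis(heavy))   # positive: leptokurtic
```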

21
Q

Normality

A

Refers to the assumption that data are normally distributed, meaning they follow a bell-shaped curve with a symmetrical distribution around the mean.
Many statistical tests assume normality, so it’s crucial to check if the data meet this assumption before proceeding with analysis.

22
Q

Methods to Assess Normality

A

Visual inspection: Histograms, Q-Q plots.
Statistical tests: Shapiro-Wilk test, Kolmogorov-Smirnov test, Anderson-Darling test.

23
Q

Histogram and Normality

A

A variable that is normally distributed has a histogram (or “density function”) that is bell-shaped, with only one peak, and is symmetric around the mean.

24
Q

Production of Outliers in Variables

A

Natural Variations: Random chance or genetic factors may lead to extreme values in variables.
Errors in Measurements: Mistakes or inaccuracies during data collection processes can result in outliers.
Deliberate Introduction: Intentional manipulation of data for specific outcomes, such as fraud or bias in research.

25
Q

Sampling Distribution of Mean Differences

A

Distribution of the differences between sample means from two independent groups.
Provides information about the variability of mean differences in repeated sampling.
Used to calculate the standard error of the mean difference for the t-test.
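
One way to see this distribution is to simulate it. The sketch below repeatedly draws two independent samples from the same normal population and records the difference between their means. The population mean, SD, group size, and number of repetitions are all assumptions chosen for the example. The spread of the simulated differences should come out close to the theoretical standard error, sqrt(σ²/n1 + σ²/n2).

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

N_PER_GROUP = 30      # assumed sample size in each group
SIGMA = 10.0          # assumed population SD, the same in both groups
REPETITIONS = 4000    # number of simulated studies

diffs = []
for _ in range(REPETITIONS):
    group_a = [random.gauss(50.0, SIGMA) for _ in range(N_PER_GROUP)]
    group_b = [random.gauss(50.0, SIGMA) for _ in range(N_PER_GROUP)]
    diffs.append(statistics.fmean(group_a) - statistics.fmean(group_b))

# SD of the simulated mean differences vs. the formula-based standard error.
simulated_se = statistics.stdev(diffs)
theoretical_se = math.sqrt(SIGMA**2 / N_PER_GROUP + SIGMA**2 / N_PER_GROUP)

print(round(simulated_se, 2), round(theoretical_se, 2))
```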

26
Q

Effect Size Measure: Cohen’s d

A

Small effect size: Typically around 0.2, indicating a small difference between the means.
Medium effect size: Typically around 0.5, indicating a moderate difference between the means.
Large effect size: Typically around 0.8, indicating a substantial difference between the means.
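
For reference, Cohen's d is the difference between two group means divided by a standard deviation. The sketch below assumes the common pooled-SD version for two independent groups; the scores are invented for illustration.

```python
import math
import statistics

def cohens_d(group_a, group_b):
    """Cohen's d using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)   # sample variance (n - 1)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd

# Hypothetical scores, invented for this example.
treatment = [12, 14, 16, 18, 20]
control = [10, 12, 14, 16, 18]

d = cohens_d(treatment, control)
print(round(d, 2))
```

With these made-up scores d comes out between the medium and large benchmarks.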

27
Q

t-tests

A

Statistical tests used to determine if there is a significant difference between the means of two groups.
Types:
Independent: Used to compare means of two independent groups (e.g., treatment vs. control).
Paired: Used to compare means of two related groups (e.g., pre-test vs. post-test).

If the p-value is less than the chosen significance level (usually 0.05), the difference between the means is considered statistically significant.

The test helps judge whether an observed difference reflects a real effect or just random variation.

28
Q

Symmetrical Normal Distribution:

A

Median: Equal to the mean
Mode: Equal to the median and mean
Skewness: 0 (symmetrical distribution)
Kurtosis: excess kurtosis of 0 (normal distribution; raw kurtosis is 3)

29
Q

Null Hypothesis in Hypothesis Testing

A

Assumption of no effect or no difference.
Basis for hypothesis testing; aim is to reject or fail to reject the null hypothesis based on data.

30
Q

p-value in Independent Samples t-test: what to conclude from a p-value of 0.03

A

A p-value of 0.03 is below the usual significance level of 0.05, so the difference between the groups is statistically significant.
Reject the null hypothesis; the data provide evidence of a difference in reaction times between the two groups.

31
Q

Conducting Independent Samples t-test:

A

Steps: Calculate t-statistic, degrees of freedom, and compare with critical value.
Conclusion: Based on comparison, state whether to reject or fail to reject the null hypothesis.
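
The steps above can be sketched in a few lines. This version assumes equal variances (the pooled-variance form of the test) and takes the critical value from a printed t table rather than computing it; the scores are invented.

```python
import math
import statistics

def independent_t(group_a, group_b):
    """Return (t, df) for a pooled-variance independent samples t-test."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * statistics.variance(group_a)
        + (n_b - 1) * statistics.variance(group_b)
    ) / (n_a + n_b - 2)
    # Standard error of the difference between the two means.
    se_diff = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / se_diff
    return t, n_a + n_b - 2

# Hypothetical scores, invented for this example.
treatment = [12, 14, 16, 18, 20]
control = [10, 12, 14, 16, 18]
t_stat, df = independent_t(treatment, control)

T_CRITICAL = 2.306   # two-tailed critical t for df = 8, alpha = 0.05 (from a t table)
reject_null = abs(t_stat) > T_CRITICAL
print(t_stat, df, reject_null)
```

Here the t-statistic does not exceed the critical value, so the conclusion would be to fail to reject the null hypothesis.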

32
Q

Calculation of A Priori Power

A

A way to estimate the likelihood of detecting a true effect in a study before it’s conducted.
It’s calculated based on three main factors: the significance level (often denoted as alpha), the sample size, and the effect size.

33
Q

Purpose of A priori power

A

Helps researchers determine if their planned sample size is large enough to detect the effect size they’re interested in.

If a researcher wants to study the effect of a new drug on reducing blood pressure and aims for a power of 0.80, they may need to calculate the necessary sample size based on the expected effect size and the desired level of significance.
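
A rough version of that calculation is sketched below. It assumes a two-sided comparison of two independent group means and uses the normal approximation n = 2 * ((z_alpha + z_power) / d)^2 per group. Dedicated power software based on the t distribution usually gives a slightly larger number. The function name and defaults are illustrative.

```python
import math
from statistics import NormalDist

def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-sample comparison of means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # critical z for the significance level
    z_power = NormalDist().inv_cdf(power)           # z corresponding to the desired power
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

# Medium expected effect (Cohen's d = 0.5), alpha = 0.05, power = 0.80.
n_needed = sample_size_per_group(effect_size=0.5)
print(n_needed)
```

Smaller expected effects require larger samples for the same power, which is why the expected effect size has to be stated before the study is run.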

34
Q

Descriptive Statistics

A

Provides summary statistics (mean, median, etc.) for numerical variables.
Helps understand the central tendency and dispersion of data.

35
Q

Histogram

A

A graphical representation of the distribution of data, typically showing the frequencies of values in intervals.
Helps identify outliers, check the shape of the distribution, and assess central tendency.

36
Q

68-95-99.7 rule

A

In a normal distribution, approximately 68% of the data falls within one standard deviation of the mean.

Approximately 95% falls within two standard deviations of the mean.

Approximately 99.7% falls within three standard deviations of the mean.
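
These percentages can be recovered from the cumulative distribution function of the standard normal distribution. A minimal check using Python's `statistics.NormalDist` (the helper name `share_within` is made up for this example):

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(share_within(k) * 100, 1))
```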

37
Q

What would be the probability of randomly sampling an individual with a score of 0 or higher on a standard normal curve?

A

0.5.
This is because the standard normal curve is symmetric about its mean (which is 0), and about 50% of the area under the curve lies to the right of the mean (scores greater than 0) and 50% lies to the left (scores less than 0).

38
Q

What percentage of individuals fall between -1 and +1 SD of the mean in a normal curve?

A

Approximately 68% of the data falls within one standard deviation of the mean.
Thus, 34% of the data falls between the mean and -1 standard deviation, and 34% falls between the mean and +1 standard deviation.

39
Q

The standard error of the mean (SEM)

A

A measure of the precision of the sample mean as an estimate of the population mean. It represents the standard deviation of the sampling distribution of the sample mean.
It tells us how much the sample mean is likely to vary from the true population mean.

40
Q

SEM Equation

A

SEM = standard deviation / square root of the sample size

41
Q

What is the standard error of the mean (SEM)? Will the SEM normally be smaller or larger than the standard deviation? Why, or why not?

A

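The SEM is the standard deviation of the sampling distribution of the sample mean. It will normally be smaller than the standard deviation, because the standard deviation is divided by the square root of the sample size, which is greater than 1 for any sample of more than one observation.
In larger samples the SEM will be smaller because the standard deviation is divided by a larger number. Conversely, in smaller samples the SEM will be larger because the standard deviation is divided by a smaller number.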
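
A short numerical check of the relationship between the standard deviation and the SEM, using invented scores:

```python
import math
import statistics

def sem(values):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))

scores = [4, 8, 6, 5, 7, 6, 9, 3, 6]   # hypothetical scores, n = 9
sd = statistics.stdev(scores)
se = sem(scores)

# With n = 9 the SD is divided by 3, so the SEM is one third of the SD.
print(round(sd, 3), round(se, 3))
```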

42
Q

In a normally distributed dataset with a mean of 50 and a standard deviation of 10, calculate the z-score for a data point of 60.

A

Z = (X − μ) / σ

Z = (60 − 50) / 10 = 1

43
Q

Describe how you would standardize a variable using z-scores and why this might be advantageous in statistical analysis.

A

Z-scores are used to standardise data points so that the mean is 0 and the standard deviation is 1.
This is done by subtracting the mean from each individual data point and then dividing by the standard deviation.

Data points from different variables can be easily compared on the same scale.
Standardising does not change the shape of the distribution, but it puts variables on a common scale, which makes them easier to interpret and compare.
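
A minimal sketch of standardising a variable, assuming the sample mean and sample standard deviation are used (the raw values are invented):

```python
import statistics

def standardise(values):
    """Convert raw values to z-scores using the sample mean and SD."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(x - mean) / sd for x in values]

raw = [40, 45, 50, 55, 60]   # hypothetical raw scores
z_scores = standardise(raw)

# After standardising, the mean is 0 and the standard deviation is 1.
print([round(z, 2) for z in z_scores])
print(statistics.mean(z_scores), statistics.stdev(z_scores))
```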

44
Q

Z-test statistic

A

Z = (sample mean − population mean) / (σ / square root of n)
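
A short sketch of the statistic as a function, assuming a one-sample test with a known population standard deviation. The two-tailed p-value is an addition for illustration, and the input numbers are invented.

```python
import math
from statistics import NormalDist

def z_test(sample_mean, population_mean, sigma, n):
    """One-sample z statistic and two-tailed p-value (population SD known)."""
    z = (sample_mean - population_mean) / (sigma / math.sqrt(n))
    p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_tailed

# Hypothetical study: sample mean 53, population mean 50, sigma 10, n = 25.
z, p = z_test(sample_mean=53, population_mean=50, sigma=10, n=25)
print(z, round(p, 4))
```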