Descriptive Statistics Flashcards

1
Q

What is the definition of descriptive statistics?

A

Descriptive statistics are methods for summarizing and organizing a group of scores to make them understandable.

2
Q

How do descriptive statistics differ from inferential statistics?

A

Descriptive statistics summarize data, while inferential statistics generalize findings from a sample to a larger population.

3
Q

What are the key components of descriptive statistics?

A

Sample Distributions
Measures of Central Tendency
Measures of Variability
Data Visualization

4
Q

What is Central Tendency?

A

A single value that represents the center, or typical score, of a dataset.

5
Q

What is a sample distribution?

A

A summary of the distribution of scores for a variable, showing values and their frequencies.

6
Q

What are common tools to summarize distributions?

A

Frequency tables
Bar charts
Histograms

7
Q

What are the measures of central tendency?

A

Mode
Median
Mean

8
Q

What are the advantages and disadvantages of using the mode?

A

Advantages: Unaffected by outliers, identifies the most common value, best for nominal data.
Disadvantages: Ignores most of the information in the data, not useful for small or uniform datasets where no value clearly dominates.

9
Q

What are the advantages and disadvantages of using the median?

A

Advantages: Resistant to outliers, useful for skewed distributions.
Disadvantages: Ignores the exact values of all data, may not represent all observations.

10
Q

What are the advantages and disadvantages of using the mean?

A

Advantages: Widely used, considers all data points.
Disadvantages: Affected by extreme values (outliers).
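
As a minimal sketch of the trade-offs on the last three cards, the snippet below uses Python's standard statistics module on hypothetical scores (the numbers and the summarize helper are illustrative assumptions, not course material). Adding one extreme value moves the mean a lot, the median slightly, and the mode not at all.

```python
from statistics import mean, median, mode

scores = [2, 3, 3, 4, 5]        # hypothetical scores
with_outlier = scores + [50]    # same scores plus one extreme value

def summarize(data):
    """Return (mode, median, mean) for a list of numbers."""
    return mode(data), median(data), mean(data)

print(summarize(scores))        # mode 3, median 3, mean 3.4
print(summarize(with_outlier))  # mode 3, median 3.5, mean about 11.17
```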

11
Q

Bar Charts vs. Histograms.

A

Bar Charts:
Represent categorical data.
Bars are separated with spaces.
Can be arranged in any order.
Example: Comparing different product sales.

Histograms:
Represent numerical (continuous) data.
Bars touch each other to show continuity.
Values are grouped into intervals (bins).
Example: Distribution of student test scores.
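
To illustrate how a histogram groups values into bins, here is a rough text-only sketch. The scores, the bin width of 10, and the bin_counts helper are all assumptions made for the example; the printed labels also assume whole-number scores.

```python
from collections import Counter

scores = [55, 62, 67, 71, 74, 78, 83, 85, 91]  # hypothetical test scores
BIN_WIDTH = 10                                  # assumed interval width

def bin_counts(values, width):
    """Count how many values fall into each interval [start, start + width)."""
    counts = Counter((v // width) * width for v in values)
    return dict(sorted(counts.items()))

# One row per bin; the row length is the frequency of that interval.
for start, count in bin_counts(scores, BIN_WIDTH).items():
    print(f"{start}-{start + BIN_WIDTH - 1}: {'#' * count}")
```

A bar chart of categories would skip the binning step and count each label directly.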

12
Q

What is a normal distribution?

A

A symmetrical, bell-shaped curve where most data points cluster around the mean.

13
Q

What are the key features of a normal distribution?

A

Half of the values fall above the mean and half below
Symmetry
Defined by mean and standard deviation

14
Q

What is a positively skewed distribution?

A

A distribution where the tail extends to the right, indicating more low values and a few high outliers.

15
Q

What is a negatively skewed distribution?

A

A distribution where the tail extends to the left, indicating more high values and a few low outliers.

16
Q

What is skewness?

A

A measure of the asymmetry of a distribution's shape.
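
One common way to put a number on skewness is the moment-based coefficient: the third central moment divided by the second central moment raised to the power 1.5. The sketch below assumes that definition (other textbooks use slightly different formulas) and uses made-up data; the sign is what matters here.

```python
from statistics import fmean

def skewness(data):
    """Moment-based skewness: m3 / m2**1.5 (one of several definitions)."""
    m = fmean(data)
    m2 = fmean([(x - m) ** 2 for x in data])  # second central moment
    m3 = fmean([(x - m) ** 3 for x in data])  # third central moment
    return m3 / m2 ** 1.5

right_tail = [1, 2, 2, 3, 3, 3, 10]        # hypothetical, tail to the right
left_tail = [-x for x in right_tail]       # mirror image, tail to the left
symmetric = [1, 2, 3, 4, 5]

print(skewness(right_tail))   # positive value
print(skewness(left_tail))    # negative value
print(skewness(symmetric))    # zero for perfectly symmetric data
```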

17
Q

What is kurtosis?

A

A measure of the 'tailedness' or sharpness of the peak of a distribution.

18
Q

What are the measures of variability?

A

Range
Interquartile Range (IQR)
Variance
Standard Deviation

19
Q

How is range calculated, and what are its pros and cons?

A

Formula: Range = Max value - Min value
Pros: Simple to calculate.
Cons: Sensitive to outliers.

20
Q

What is the interquartile range (IQR), and why is it useful?

A

The IQR measures the range of the middle 50% of data, reducing the impact of outliers.

21
Q

What is the formula for IQR?

A

IQR = Q3 − Q1
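
The range and IQR formulas can be compared side by side in a short sketch. The data are hypothetical, and the quartiles come from statistics.quantiles with method="inclusive"; quartile conventions differ between textbooks and software, so other tools may give slightly different Q1 and Q3 values.

```python
from statistics import quantiles

def value_range(data):
    """Range = max value - min value."""
    return max(data) - min(data)

def iqr(data):
    """IQR = Q3 - Q1, using the 'inclusive' quartile convention."""
    q1, _q2, q3 = quantiles(data, n=4, method="inclusive")
    return q3 - q1

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]   # hypothetical scores
with_outlier = data + [100]          # one extreme value added

print(value_range(data), iqr(data))                  # 8 and 4.0
print(value_range(with_outlier), iqr(with_outlier))  # 99 and 4.5
```

The outlier multiplies the range by more than ten, while the IQR barely changes.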

22
Q

What is the formula for deviation?

A

Deviation = X − X̄
Where:
X = individual data point
X̄ = mean of the dataset

23
Q

What is the formula for variance?

A

s² = Σ(X − X̄)² / N, the average squared deviation from the mean.
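
A worked example may make the formula more concrete. The sketch below computes each deviation, squares it, and averages over N, then checks the result against statistics.pvariance, which also divides by N. The scores are hypothetical.

```python
from statistics import pvariance

scores = [2, 4, 4, 4, 5, 5, 7, 9]             # hypothetical scores

mean = sum(scores) / len(scores)               # 5.0
deviations = [x - mean for x in scores]        # X minus the mean
squared = [d ** 2 for d in deviations]         # squared deviations
variance = sum(squared) / len(scores)          # divide by N

print(deviations)   # deviations always sum to zero
print(variance)     # 4.0
print(variance == pvariance(scores))
```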

24
Q

What does standard deviation represent?

A

Roughly the typical distance of data points from the mean. It is the square root of the variance: s = √(s²).

25
Q

How do you interpret standard deviation values?

A

Small SD: Data is tightly clustered around the mean.
Large SD: Data is widely spread out.
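
As an illustration, the two hypothetical datasets below share the same mean but have very different standard deviations. The sketch uses statistics.pstdev, which divides by N.

```python
from statistics import fmean, pstdev

consistent = [49, 50, 50, 51]   # hypothetical scores, tightly clustered
variable = [20, 40, 60, 80]     # hypothetical scores, widely spread

for name, data in [("consistent", consistent), ("variable", variable)]:
    print(f"{name}: mean = {fmean(data):.1f}, SD = {pstdev(data):.2f}")
```

Both means are 50, so only the SD reveals how differently the scores are spread.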

26
Q

What are the key features of box plots?

A

Median line
IQR box
Whiskers extending to the most extreme values within 1.5 × IQR of the box
Outliers shown as dots

27
Q

What are quartiles in descriptive statistics?

A

Quartiles divide data into four equal parts:
Q1 (25th percentile)
Q2 (50th percentile/median)
Q3 (75th percentile)

28
Q

How can variability be visualized?

A

Through histograms, box plots, and frequency distributions.

29
Q

What is the importance of descriptive statistics in psychology?

A

They help in summarizing psychological data and identifying patterns for further analysis.

30
Q

What is Variability?

A

It measures the spread of scores around the mean in a data set and provides insights into whether data points are tightly clustered or widely dispersed.

31
Q

What does a low variability dataset indicate?

A

That data points are closely clustered around the mean, indicating consistency.

32
Q

What does a high variability dataset indicate?

A

That data points are widely dispersed, indicating inconsistency.

33
Q

How can descriptive statistics support evidence-based practice?

A

By providing clear summaries of data trends to guide decision-making and interventions.

34
Q

What is a frequency table?

A

A table that lists data values and their corresponding frequencies (number of occurrences).

35
Q

What is a histogram, and how does it differ from a bar chart?

A

A histogram displays numerical data using adjacent bars, while a bar chart represents categorical data with separated bars.

36
Q

What is a stem-and-leaf plot?

A

A display that splits each value into a stem (leading digits) and a leaf (final digit), showing the shape of the distribution while preserving the original data values.

37
Q

What is the formula for calculating the mean?

A

X̄ = ΣX / N, where X represents individual data points and N is the number of values.

38
Q

What type of data is best summarized using the median?

A

Ordinal or skewed data.

39
Q

Why is the mean preferred for normal distributions?

A

Because it accounts for all values and provides a balanced measure of central tendency.

40
Q

What is the formula for standard deviation?

A

s = √( Σ(X − X̄)² / N ), the square root of the variance.

41
Q

What are the effects of outliers on descriptive statistics?

A

Outliers can significantly impact the mean and standard deviation but have little effect on the median and IQR.

42
Q

What are the properties of a symmetric distribution?

A

In a symmetric distribution with a single peak, the mean, median, and mode are all equal.

43
Q

How can skewness affect the interpretation of data?

A

Skewness indicates the direction of the data tail and can impact the choice of central tendency measure.

44
Q

What is the significance of the coefficient of variation (CV)?

A

It measures relative variability by expressing the standard deviation as a percentage of the mean.
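
A small sketch of the calculation, assuming CV = (standard deviation / mean) × 100 and using the population SD to match the N-based formulas on the other cards (many sources use the sample SD instead). The measurements are made up.

```python
from statistics import fmean, pstdev

def coefficient_of_variation(data):
    """Standard deviation as a percentage of the mean (mean must be non-zero)."""
    return pstdev(data) / fmean(data) * 100

lengths_cm = [8, 12, 8, 12]                 # hypothetical measurements
lengths_mm = [x * 10 for x in lengths_cm]   # same data in another unit

print(coefficient_of_variation(lengths_cm))  # about 20 (percent)
print(coefficient_of_variation(lengths_mm))  # same value: CV is unit-free
```

Because the unit cancels out, CV lets you compare variability across variables measured on different scales.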

45
Q

When should the range not be used to summarize variability?

A

When there are extreme outliers, as it can give a misleading impression of data spread.

46
Q

What are the differences between population and sample variance?

A

Population variance divides by N, while sample variance divides by N − 1 to correct for bias in estimating the population variance.
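
In Python's standard library the two versions are separate functions: statistics.pvariance divides by N and statistics.variance divides by N − 1. A quick sketch with hypothetical data:

```python
from statistics import pvariance, variance

sample = [1, 2, 3, 4, 5]   # hypothetical scores; squared deviations sum to 10

print(pvariance(sample))   # 10 / 5 = 2
print(variance(sample))    # 10 / 4 = 2.5
```

The N − 1 version is always the larger of the two, and the gap shrinks as the sample grows.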

47
Q

What does a box plot reveal about a dataset?

A

It shows the spread, median, potential outliers, and overall distribution of data.

48
Q

What are percentiles, and how are they used in descriptive statistics?

A

Percentiles indicate the position of a value relative to the entire dataset, often used in standardized testing.

49
Q

What is the difference between absolute and relative frequency?

A

Absolute frequency counts occurrences, while relative frequency expresses them as proportions or percentages of the total.
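
A minimal sketch of both kinds of frequency using collections.Counter. The grades are hypothetical.

```python
from collections import Counter

grades = ["A", "B", "B", "C", "B", "A", "C", "B"]   # hypothetical data

absolute = Counter(grades)                 # counts per category
total = sum(absolute.values())
relative = {g: n / total for g, n in absolute.items()}   # share of the total

for g in sorted(absolute):
    print(f"{g}: {absolute[g]} ({relative[g]:.0%})")
```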

50
Q

What is a cumulative frequency distribution?

A

A running total of frequencies that shows the number of values at or below a given level.
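
The running total can be sketched with itertools.accumulate. The cumulative_frequencies helper and the scores are illustrative assumptions.

```python
from collections import Counter
from itertools import accumulate

def cumulative_frequencies(values):
    """Return (level, number of values at or below that level) pairs."""
    counts = Counter(values)
    levels = sorted(counts)
    running = accumulate(counts[v] for v in levels)
    return list(zip(levels, running))

scores = [1, 2, 2, 3, 3, 3, 4]   # hypothetical scores
for level, cum in cumulative_frequencies(scores):
    print(level, cum)
```

The last running total always equals the number of observations.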

51
Q

What does a small interquartile range (IQR) suggest?

A

That the data points are closely packed around the median.

52
Q

What does it mean if the mean is greater than the median?

A

The data is typically positively skewed: a tail of high values pulls the mean above the median.

53
Q

How can descriptive statistics assist in exploratory data analysis (EDA)?

A

They help detect patterns, trends, and outliers before conducting further analysis.

54
Q

What is a common misinterpretation of the mean in skewed distributions?

A

Assuming it represents a typical value, which may not be true in asymmetrical distributions.

55
Q

What is a trimmed mean?

A

A mean calculated after removing extreme values to reduce the influence of outliers.
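
One simple way to sketch this is to sort the data and drop a fixed number of values from each end. The trimmed_mean helper and its parameter k (values removed per end) are assumptions for illustration; other definitions trim a percentage instead.

```python
from statistics import fmean

def trimmed_mean(data, k):
    """Mean after dropping the k smallest and k largest values."""
    if k < 0 or 2 * k >= len(data):
        raise ValueError("k must leave at least one value")
    ordered = sorted(data)
    return fmean(ordered[k:len(ordered) - k])

data = [1, 20, 21, 22, 23, 24, 100]   # hypothetical, one extreme at each end

print(fmean(data))            # ordinary mean, pulled up by 100
print(trimmed_mean(data, 1))  # 22.0
```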

56
Q

What is the role of descriptive statistics in hypothesis testing?

A

They provide a summary and understanding of the data before applying inferential methods.

57
Q

How does standard deviation relate to the normal distribution?

A

It determines the spread of data around the mean and helps identify proportions using the empirical rule.
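
The empirical rule says that roughly 68%, 95%, and 99.7% of a normal distribution lies within 1, 2, and 3 standard deviations of the mean. The sketch below checks those proportions with statistics.NormalDist; the share_within helper and the mean of 100 and SD of 15 are assumptions chosen for the example.

```python
from statistics import NormalDist

def share_within(k, mu=0.0, sigma=1.0):
    """Proportion of a normal distribution within k SDs of the mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

for k in (1, 2, 3):
    print(k, round(share_within(k, mu=100, sigma=15), 4))
```

The proportions do not depend on the particular mean or SD, only on the number of SDs from the mean.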

58
Q

What is the impact of sample size on descriptive statistics?

A

Larger samples provide more reliable estimates, while smaller samples can lead to greater variability.

59
Q

How can outliers be detected using descriptive statistics?

A

Through methods like box plots, Z-scores, and the IQR rule (values more than 1.5 × IQR below Q1 or above Q3).
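
A minimal sketch of the IQR rule: flag any value more than 1.5 × IQR below Q1 or above Q3. The data and the iqr_outliers helper are hypothetical, and the quartiles use the "inclusive" convention of statistics.quantiles, so other software may place the fences slightly differently.

```python
from statistics import quantiles

def iqr_outliers(data, factor=1.5):
    """Return values outside [Q1 - factor*IQR, Q3 + factor*IQR]."""
    q1, _q2, q3 = quantiles(data, n=4, method="inclusive")
    spread = q3 - q1
    lower, upper = q1 - factor * spread, q3 + factor * spread
    return [x for x in data if x < lower or x > upper]

data = [10, 12, 12, 13, 14, 15, 15, 16, 18, 40]   # hypothetical scores
print(iqr_outliers(data))   # [40]
```

These fences are the same limits a box plot uses to decide which points to draw as dots.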

60
Q

What is the difference between univariate and bivariate descriptive statistics?

A

Univariate describes a single variable, while bivariate examines relationships between two variables.

61
Q

What is the purpose of standardizing data?

A

To compare datasets with different units or scales by converting values into standard scores (Z-scores).
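
A Z-score is computed as (X − mean) / standard deviation. The sketch below uses hypothetical scores and the population SD; the z_scores helper is an illustrative name.

```python
from statistics import fmean, pstdev

def z_scores(data):
    """Convert each value to its distance from the mean in SD units."""
    m, sd = fmean(data), pstdev(data)
    return [(x - m) / sd for x in data]

scores = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical: mean 5, SD 2
print(z_scores(scores))             # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

After standardizing, any dataset has a mean of 0 and an SD of 1, which is what makes scores on different scales comparable.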

62
Q

Why is data cleaning important before descriptive analysis?

A

To ensure accuracy by removing errors, missing values, and inconsistencies in the dataset.

63
Q

What is the Pareto principle in data analysis?

A

The idea that 80% of effects come from 20% of causes, useful in identifying key trends in data.