Descriptive Statistics Flashcards
What is the definition of descriptive statistics?
Descriptive statistics are methods for summarizing and organizing a group of scores to make them understandable.
How do descriptive statistics differ from inferential statistics?
Descriptive statistics summarize data, while inferential statistics generalize findings from a sample to a larger population.
What are the key components of descriptive statistics?
Sample Distributions
Measures of Central Tendency
Measures of Variability
Data Visualization
What is central tendency?
A single value that represents the center, or typical score, of a dataset.
What is a sample distribution?
A summary of the distribution of scores for a variable, showing values and their frequencies.
What are common tools to summarize distributions?
Frequency tables
Bar charts
Histograms
What are the measures of central tendency?
Mode
Median
Mean
What are the advantages and disadvantages of using the mode?
Advantages: Unaffected by outliers, identifies the most common value, best for nominal data.
Disadvantages: Ignores most of the actual values. A dataset may have no mode or several modes, which makes it unhelpful for small or uniformly distributed datasets.
What are the advantages and disadvantages of using the median?
Advantages: Resistant to outliers, useful for skewed distributions.
Disadvantages: Depends only on the middle value(s) in rank order, so it ignores the exact values of the other data points.
What are the advantages and disadvantages of using the mean?
Advantages: Widely used, considers all data points.
Disadvantages: Affected by extreme values (outliers).
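A minimal Python sketch of how one extreme score affects each measure of central tendency. It uses the standard-library `statistics` module. The score lists are made-up illustration data.

```python
from statistics import mean, median, multimode

def summarize(data):
    """Return the mode(s), median, and mean of a list of numbers."""
    return {
        "mode": multimode(data),  # a list, because a dataset can have several modes
        "median": median(data),
        "mean": mean(data),
    }

scores = [2, 3, 3, 4, 5, 6, 7]  # hypothetical scores
with_outlier = scores + [60]    # the same scores plus one extreme value

print(summarize(scores))        # mode [3], median 4, mean about 4.29
print(summarize(with_outlier))  # mode [3], median 4.5, mean 11.25
```

Adding the single value 60 moves the mean from about 4.29 to 11.25. Over the same change, the median shifts only from 4 to 4.5 and the mode stays the same.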
Bar Charts vs. Histograms.
Bar Charts:
Represent categorical data.
Bars are separated with spaces.
Can be arranged in any order.
Example: Comparing different product sales.
Histograms:
Represent numerical (continuous) data.
Bars touch each other to show continuity.
Values are grouped into intervals (bins).
Example: Distribution of student test scores.
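The binning step behind a histogram can be sketched in Python. This is an illustration under assumed choices, not a standard:
- bins are 10 points wide and start at 0;
- each bin includes its lower edge and excludes its upper edge;
- the scores are invented.

```python
def bin_counts(values, bin_width, start=0):
    """Count values in equal-width bins [lo, lo + bin_width), keeping empty bins (assumes values is non-empty)."""
    counts = {}
    for v in values:
        idx = int((v - start) // bin_width)  # index of the bin this value falls into
        counts[idx] = counts.get(idx, 0) + 1
    # Walk every bin from the lowest to the highest occupied one so gaps show as 0
    return [
        (start + i * bin_width, start + (i + 1) * bin_width, counts.get(i, 0))
        for i in range(min(counts), max(counts) + 1)
    ]

test_scores = [55, 62, 67, 71, 74, 78, 81, 85, 88, 93]  # hypothetical scores
for lo, hi, n in bin_counts(test_scores, bin_width=10):
    print(f"[{lo}, {hi}): {'#' * n}")
```

Empty bins are kept as zero counts, so the bars stay adjacent and show continuity across the number line.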
What is a normal distribution?
A symmetrical, bell-shaped curve where most data points cluster around the mean.
What are the key features of a normal distribution?
Half of the scores fall above the mean and half below
Symmetry
Defined by mean and standard deviation
What is a positively skewed distribution?
A distribution where the tail extends to the right: most scores cluster at the low end, with a few unusually high values.
What is a negatively skewed distribution?
A distribution where the tail extends to the left: most scores cluster at the high end, with a few unusually low values.
What is skewness?
A measure of the asymmetry of a distribution's shape.
What is kurtosis?
A measure of the "tailedness" or sharpness of the peak of a distribution.
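One common way to put numbers on skewness and kurtosis uses moment coefficients. The Python sketch below assumes the population (divide-by-N) definitions: skewness = m3 / m2^1.5 and excess kurtosis = m4 / m2^2 - 3. Statistical packages often apply small-sample corrections, so their values can differ somewhat. The function names and data are hypothetical. The functions divide by zero if every value is identical.

```python
from statistics import fmean

def central_moment(data, k):
    """k-th central moment: the average of (x - mean) ** k, dividing by N."""
    m = fmean(data)
    return sum((x - m) ** k for x in data) / len(data)

def skewness(data):
    """Moment coefficient of skewness, m3 / m2 ** 1.5 (0 for symmetric data)."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def excess_kurtosis(data):
    """m4 / m2 ** 2 - 3, so a normal distribution scores about 0."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3

print(skewness([1, 2, 3, 4, 5]))         # 0.0: symmetric
print(skewness([1, 1, 1, 2, 10]))        # positive: long right tail
print(skewness([10, 10, 10, 9, 1]))      # negative: long left tail
print(excess_kurtosis([1, 2, 3, 4, 5]))  # about -1.3: flat, light-tailed
```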
What are the measures of variability?
Range
Interquartile Range (IQR)
Variance
Standard Deviation
How is range calculated, and what are its pros and cons?
Formula: Range = Max value - Min value
Pros: Simple to calculate.
Cons: Sensitive to outliers.
What is the interquartile range (IQR), and why is it useful?
The IQR measures the range of the middle 50% of data, reducing the impact of outliers.
What is the formula for IQR?
IQR = Q3 - Q1
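A short Python sketch contrasting the range with the IQR when one extreme value appears. It takes quartiles from the standard library's `statistics.quantiles`, whose default "exclusive" method is only one of several quartile conventions. Hand calculations in a textbook may therefore give slightly different Q1 and Q3 values. The data are invented.

```python
from statistics import quantiles

def value_range(data):
    """Range = max value - min value."""
    return max(data) - min(data)

def iqr(data):
    """IQR = Q3 - Q1, with quartiles from statistics.quantiles (default 'exclusive' method)."""
    q1, _median, q3 = quantiles(data, n=4)
    return q3 - q1

scores = [3, 5, 7, 8, 9, 11, 12]
with_outlier = [3, 5, 7, 8, 9, 11, 40]  # largest score replaced by an extreme value

print(value_range(scores), iqr(scores))              # 9 6.0
print(value_range(with_outlier), iqr(with_outlier))  # 37 6.0
```

Replacing 12 with 40 quadruples the range but leaves the IQR unchanged, because the IQR looks only at the middle 50% of the data.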
What is the formula for deviation?
Deviation = $X - \bar{X}$
Where:
$X$ = individual data point
$\bar{X}$ = mean of the dataset
What is the formula for variance?
$s^2 = \sum (X - \bar{X})^2 / N$ → the average squared deviation from the mean.
What does standard deviation represent?
Roughly, the typical distance of data points from the mean. It is the square root of the variance: $s = \sqrt{s^2}$.
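A minimal Python sketch of the deviation, variance, and standard deviation formulas. It follows the cards and divides by N. The function names and the data are illustrative only.

```python
import math

def deviations(data):
    """Deviation of each score from the mean: X - mean."""
    m = sum(data) / len(data)
    return [x - m for x in data]

def variance_n(data):
    """Average squared deviation from the mean, dividing by N."""
    return sum(d ** 2 for d in deviations(data)) / len(data)

def sd_n(data):
    """Standard deviation: the square root of the variance."""
    return math.sqrt(variance_n(data))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores, mean = 5
print(deviations(data))  # [-3.0, -1.0, -1.0, -1.0, 0.0, 0.0, 2.0, 4.0]
print(variance_n(data))  # 4.0
print(sd_n(data))        # 2.0
```

The raw deviations always sum to 0, so their plain average carries no information about spread. That is why the variance squares them before averaging.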
How do you interpret standard deviation values?
Small SD: Data is tightly clustered around the mean.
Large SD: Data is widely spread out.
What are the key features of box plots?
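Median line
Box spanning the IQR (Q1 to Q3)
Whiskers extending to the most extreme data points within 1.5 × IQR of the box
Outliers (points beyond the whiskers) shown as dots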
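The numbers a box plot draws can be computed directly. The Python sketch below rests on two assumptions:
- quartiles come from `statistics.quantiles` with its default "exclusive" method;
- the whiskers follow the 1.5 × IQR convention.

Plotting libraries may use other quartile methods or whisker rules. The function name and data are hypothetical.

```python
from statistics import quantiles

def box_plot_summary(data, k=1.5):
    """Compute the parts of a box plot; points beyond k * IQR from the box are outliers."""
    values = sorted(data)
    q1, med, q3 = quantiles(values, n=4)  # default 'exclusive' quartile method
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [x for x in values if low_fence <= x <= high_fence]
    return {
        "q1": q1,
        "median": med,
        "q3": q3,
        "whisker_low": min(inside),  # most extreme points still inside the fences
        "whisker_high": max(inside),
        "outliers": [x for x in values if x < low_fence or x > high_fence],
    }

print(box_plot_summary([3, 5, 7, 8, 9, 11, 40]))
```

Here Q1 = 5 and Q3 = 11, so the fences sit at -4 and 20. The whiskers stop at 3 and 11, and 40 is drawn as an outlier dot.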
What are quartiles in descriptive statistics?
Quartiles divide data into four equal parts:
Q1 (25th percentile)
Q2 (50th percentile/median)
Q3 (75th percentile)
How can variability be visualized?
Through histograms, box plots, and frequency distributions.
What is the importance of descriptive statistics in psychology?
They help in summarizing psychological data and identifying patterns for further analysis.
What is Variability?
It measures the spread of scores around the mean in a data set and provides insights into whether data points are tightly clustered or widely dispersed.
What does a low variability dataset indicate?
That data points are closely clustered around the mean, indicating consistency.
What does a high variability dataset indicate?
That data points are widely dispersed, indicating inconsistency.
How can descriptive statistics support evidence-based practice?
By providing clear summaries of data trends to guide decision-making and interventions.
What is a frequency table?
A table that lists data values and their corresponding frequencies (number of occurrences).
What is a histogram, and how does it differ from a bar chart?
A histogram displays numerical data using adjacent bars, while a bar chart represents categorical data with separated bars.
What is a stem-and-leaf plot?
A display that splits each value into a stem (leading digits) and a leaf (final digit). It shows the shape of the distribution while preserving the original data values.
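A small Python sketch that builds a stem-and-leaf plot as text. It assumes non-negative integers, with the tens digit(s) as the stem and the units digit as the leaf. The function name and data are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return stem-and-leaf lines for non-negative integers (stem = tens, leaf = units)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    lines = []
    # Include empty stems so gaps in the data stay visible
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem} | {leaves}")
    return lines

for line in stem_and_leaf([42, 47, 51, 53, 53, 68, 70]):
    print(line)
```

Every original value can be read back from the plot; for example, "5 | 133" stands for 51, 53, and 53.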
What is the formula for calculating the mean?
$\bar{X} = \sum X / N$, where $X$ represents individual data points and $N$ is the number of values.
What type of data is best summarized using the median?
Ordinal or skewed data.
Why is the mean preferred for normal distributions?
Because it accounts for all values, and in a symmetric distribution it sits exactly at the center, where it coincides with the median.
What is the formula for standard deviation?
$s = \sqrt{\sum (X - \bar{X})^2 / N}$.
What are the effects of outliers on descriptive statistics?
Outliers can significantly impact the mean and standard deviation but have little effect on the median and IQR.
What are the properties of a symmetric distribution?
The left and right halves mirror each other. In a symmetric, unimodal distribution, the mean, median, and mode are all equal.
How can skewness affect the interpretation of data?
Skewness indicates the direction of the data tail and can impact the choice of central tendency measure.
What is the significance of the coefficient of variation (CV)?
It measures relative variability by expressing the standard deviation as a percentage of the mean.
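A minimal Python sketch of the coefficient of variation, CV = (SD / mean) × 100. It assumes the divide-by-N standard deviation and a positive mean; CV is not meaningful for data that can be zero or negative on average. The function name and data are illustrative.

```python
from statistics import mean, pstdev

def coefficient_of_variation(data):
    """CV = SD / mean * 100 (divide-by-N SD); only meaningful when the mean is positive."""
    return pstdev(data) / mean(data) * 100

grams = [2, 4, 4, 4, 5, 5, 7, 9]             # hypothetical measurements
milligrams = [x * 1000 for x in grams]       # the same data in different units
print(coefficient_of_variation(grams))       # 40.0
print(coefficient_of_variation(milligrams))  # 40.0: CV does not depend on units
```

Because the SD and the mean change by the same factor when units change, the CV stays the same. This is what lets it compare variability across different scales.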
When should the range not be used to summarize variability?
When there are extreme outliers, as it can give a misleading impression of data spread.
What are the differences between population and sample variance?
Population variance divides by $N$, while sample variance divides by $n - 1$ to correct for bias when estimating the population variance from a sample.
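The two divisors can be compared in Python. In the standard-library `statistics` module, `pvariance` divides by N and `variance` divides by n - 1. The helper `variance_with` and its `correction` parameter are hypothetical names, included to show where the divisor changes.

```python
from statistics import pvariance, variance

def variance_with(data, correction):
    """Sum of squared deviations divided by (N - correction): 0 for population, 1 for sample."""
    m = sum(data) / len(data)
    ss = sum((x - m) ** 2 for x in data)
    return ss / (len(data) - correction)

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance_with(data, 0), pvariance(data))  # both 4: divide by N = 8
print(variance_with(data, 1), variance(data))   # both about 4.571: divide by n - 1 = 7
```

The sample version is always a little larger. The difference shrinks as the sample grows.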
What does a box plot reveal about a dataset?
It shows the spread, median, potential outliers, and overall distribution of data.
What are percentiles, and how are they used in descriptive statistics?
Percentiles indicate the position of a value relative to the entire dataset, often used in standardized testing.
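A simple Python sketch of a percentile rank. It assumes one common convention: the percentage of scores at or below a given score. Other conventions, such as counting ties as half, give slightly different values. The function name and scores are hypothetical.

```python
def percentile_rank(data, score):
    """Percentage of values at or below `score` (one common convention)."""
    at_or_below = sum(1 for x in data if x <= score)
    return 100 * at_or_below / len(data)

scores = [55, 60, 62, 70, 71, 75, 80, 84, 90, 95]  # hypothetical test scores
print(percentile_rank(scores, 80))  # 70.0: 7 of the 10 scores are at or below 80
```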
What is the difference between absolute and relative frequency?
Absolute frequency counts occurrences, while relative frequency expresses them as proportions or percentages of the total.
What is a cumulative frequency distribution?
A running total of frequencies that shows the number of values at or below a given level.
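A frequency table with absolute, relative, and cumulative columns can be sketched in Python with `collections.Counter`. The function name and the survey ratings are hypothetical.

```python
from collections import Counter

def frequency_table(values):
    """Rows of (value, absolute freq, relative freq, cumulative freq), sorted by value."""
    counts = Counter(values)
    total = len(values)
    rows, running = [], 0
    for value in sorted(counts):
        freq = counts[value]
        running += freq  # cumulative: count of values at or below this value
        rows.append((value, freq, freq / total, running))
    return rows

ratings = [4, 5, 3, 4, 4, 2, 5, 3, 4, 5]  # hypothetical 1-5 survey ratings
print("value  freq  rel  cum")
for value, freq, rel, cum in frequency_table(ratings):
    print(f"{value:>5} {freq:>5} {rel:>4.0%} {cum:>4}")
```

The relative frequencies add up to 100%, and the last cumulative frequency equals the total number of values.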
What does a small interquartile range (IQR) suggest?
That the data points are closely packed around the median.
What does it mean if the mean is greater than the median?
This usually indicates positively skewed data: the long right tail pulls the mean above the median.
How can descriptive statistics assist in exploratory data analysis (EDA)?
They help detect patterns, trends, and outliers before conducting further analysis.
What is a common misinterpretation of the mean in skewed distributions?
Assuming it represents a typical value, which may not be true in asymmetrical distributions.
What is a trimmed mean?
A mean calculated after removing a fixed percentage of the highest and lowest values, which reduces the influence of outliers.
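A minimal Python sketch of a trimmed mean. It assumes 10% is cut from each end, a common but arbitrary choice; the number of values cut is rounded down. The function name and data are illustrative.

```python
from statistics import mean

def trimmed_mean(data, proportion=0.1):
    """Drop the lowest and highest `proportion` of values, then average what remains."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    values = sorted(data)
    k = int(len(values) * proportion)  # number of values cut from EACH end
    return mean(values[k:len(values) - k])

data = [1, 3, 4, 5, 5, 6, 7, 8, 9, 100]  # hypothetical scores with one extreme value
print(mean(data))               # 14.8
print(trimmed_mean(data, 0.1))  # 5.875 (drops 1 and 100)
```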
What is the role of descriptive statistics in hypothesis testing?
They provide a summary and understanding of the data before applying inferential methods.
How does standard deviation relate to the normal distribution?
It determines the spread of data around the mean. Under the empirical rule, about 68%, 95%, and 99.7% of values fall within 1, 2, and 3 standard deviations of the mean.
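The empirical-rule percentages can be checked with the standard library's `statistics.NormalDist`. The helper name `share_within` and the mean-100, SD-15 scale are illustrative assumptions.

```python
from statistics import NormalDist

def share_within(k, dist=NormalDist(mu=0, sigma=1)):
    """Proportion of a normal distribution lying within k SDs of its mean."""
    lo, hi = dist.mean - k * dist.stdev, dist.mean + k * dist.stdev
    return dist.cdf(hi) - dist.cdf(lo)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")  # about 68.3%, 95.4%, 99.7%

iq_like = NormalDist(mu=100, sigma=15)  # hypothetical scale with mean 100, SD 15
print(f"85 to 115: {share_within(1, iq_like):.1%}")
```

The proportions are identical for any normal distribution; only the raw cut-off scores (here 85 and 115) change.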
What is the impact of sample size on descriptive statistics?
Larger samples provide more reliable estimates, while smaller samples can lead to greater variability.
How can outliers be detected using descriptive statistics?
Through methods like box plots, Z-scores, and the IQR rule (1.5 × IQR).
What is the difference between univariate and bivariate descriptive statistics?
Univariate describes a single variable, while bivariate examines relationships between two variables.
What is the purpose of standardizing data?
To compare datasets with different units or scales by converting values into standard scores (Z-scores).
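A short Python sketch of standardizing with z = (X - mean) / SD. It assumes the divide-by-N standard deviation. The two-test comparison uses invented means and SDs.

```python
from statistics import mean, pstdev

def z_score(x, mean_value, sd):
    """Standard score: how many SDs a value lies above (+) or below (-) the mean."""
    return (x - mean_value) / sd

def z_scores(data):
    """Standardize a whole dataset using its own mean and divide-by-N SD."""
    m, s = mean(data), pstdev(data)
    return [z_score(x, m, s) for x in data]

# Hypothetical example: the same student on two tests with different scales
print(z_score(70, mean_value=60, sd=5))   # 2.0 on test A
print(z_score(80, mean_value=75, sd=10))  # 0.5 on test B
print(z_scores([2, 4, 4, 4, 5, 5, 7, 9]))  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

The raw score is higher on test B, but the z-scores show the student stood out more on test A.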
Why is data cleaning important before descriptive analysis?
To ensure accuracy by correcting or removing errors, handling missing values, and resolving inconsistencies in the dataset.
What is the Pareto principle in data analysis?
The idea that 80% of effects come from 20% of causes, useful in identifying key trends in data.