Descriptive Stats - Chapter 2 - Theory Flashcards
What is the definition of statistics as presented in the material, and what are its key processes?
Statistics is the science of handling data to make decisions. Key processes: collecting, organizing, analyzing, interpreting, and presenting data.
Explain the difference between a population and a sample, providing an example from the document.
Population is the entire group of interest; a sample is the subset of it that is actually observed.
What is a random variable, and how does it relate to data collection in statistics?
A random variable is a measurable characteristic whose value varies from one observation to the next (e.g., height). Its observed values are what is collected as data.
Define a population parameter and a sample statistic, and give an example of each.
Parameter describes the population (e.g., mean of all scores). Statistic describes a sample (e.g., mean of 10 scores).
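A minimal sketch of the parameter/statistic distinction, using made-up scores from 1 to 100 as the population and a hypothetical helper `sample_mean` that draws 10 of them at random. The seed is only there to make the draw repeatable.

```python
import random
import statistics

def sample_mean(population, k, seed=0):
    """Draw k values without replacement and return (sample, mean of the sample)."""
    rng = random.Random(seed)  # fixed seed so the illustration is repeatable
    sample = rng.sample(population, k)
    return sample, statistics.mean(sample)

# Hypothetical population: every score from 1 to 100.
population = list(range(1, 101))

# Parameter: computed from the whole population.
population_mean = statistics.mean(population)

# Statistic: computed from a sample of 10 scores.
sample, statistic = sample_mean(population, k=10)

print("parameter (population mean):", population_mean)
print("statistic (sample mean):", statistic)
```

The statistic usually lands near the parameter but rarely equals it, which is why inferential methods are needed to reason from one to the other.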
What are the three major components of statistics mentioned in the document, and what is the purpose of each?
Descriptive (summarizes data), Inferential (draws conclusions about a population from samples), Visualization (shows data graphically).
How does descriptive statistics differ from inferential statistics in terms of their objectives?
Descriptive summarizes the data at hand; inferential uses a sample to draw conclusions about the wider population.
What is the purpose of data visualization in descriptive statistics, and name three types of visualizations mentioned?
Purpose: Show patterns clearly. Types: Bar charts, pie charts, histograms.
Explain the difference between qualitative and quantitative data types with examples.
Qualitative: Categories (e.g., gender). Quantitative: Numbers (e.g., height).
What is the role of a bar chart, and how is it constructed according to the document?
Role: Compare frequencies. Construction: Categories on x-axis, frequencies on y-axis, bars with gaps.
Describe how a pie chart represents data and what condition must its segments satisfy?
Pie chart shows proportions as slices. Condition: Slices total 100%.
What is a histogram, and how does it differ from a bar chart in displaying data?
Histogram shows the frequency of continuous data grouped into intervals, with no gaps between bars. Bar chart has gaps between bars because its categories are discrete.
What does a boxplot visually represent, and what are its key components?
Boxplot shows spread and outliers. Components: Box (IQR), median line, whiskers, outliers.
Define outliers in a dataset, and explain why they are significant in data analysis.
Outliers are extreme values. Significant because they can signal errors or special cases.
What is the interquartile range (IQR), and how is it used to detect outliers?
IQR is Q3 - Q1. Outliers are below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR.
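The 1.5 × IQR rule can be sketched in a few lines of Python. The function name `iqr_outliers` and the sample data are illustrative assumptions, and the quartiles here use the "inclusive" method of `statistics.quantiles`; other quartile conventions can give slightly different fences.

```python
import statistics

def iqr_outliers(values):
    """Return the values that fall outside the 1.5 * IQR fences."""
    # quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]

# Made-up data with one value far from the rest.
data = [2, 4, 5, 6, 7, 8, 9, 30]
print(iqr_outliers(data))  # [30]
```

For this data Q1 = 4.75 and Q3 = 8.25, so IQR = 3.5 and the fences sit at -0.5 and 13.5; only 30 lies outside them.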
What are measures of central tendency, and list the three types mentioned in the document?
Central tendency shows the dataset’s center. Types: Mean, median, mode.
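As a small illustration, the three measures can be computed with Python's standard `statistics` module. The helper `center_summary` and the data are made up for this sketch.

```python
import statistics

def center_summary(values):
    """Return the mean, median, and mode of a list of numbers."""
    return {
        "mean": statistics.mean(values),      # arithmetic average
        "median": statistics.median(values),  # middle value of the sorted data
        "mode": statistics.mode(values),      # most frequent value
    }

# Made-up data: one large value (12) sits far from the others.
data = [2, 3, 3, 5, 12]
summary = center_summary(data)
print(summary)
```

Here the mean is 5 while the median and mode are both 3: the single large value pulls the mean upward but leaves the other two unchanged.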
What are measures of dispersion, and why are they important in describing data?
Dispersion measures how spread out the data is (e.g., range, IQR, variance, standard deviation). Important because it shows how much the values vary around the center.
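A brief sketch of common dispersion measures, again with the standard `statistics` module. The helper `spread_summary` and the data are illustrative assumptions; the variance and standard deviation shown are the population versions (dividing by n), whereas `statistics.variance` and `statistics.stdev` would give the sample versions (dividing by n - 1).

```python
import statistics

def spread_summary(values):
    """Return the range, population variance, and population standard deviation."""
    return {
        "range": max(values) - min(values),
        "variance": statistics.pvariance(values),  # mean squared deviation from the mean
        "std_dev": statistics.pstdev(values),      # square root of the variance
    }

# Made-up data with a mean of 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]
spread = spread_summary(data)
print(spread)
```

For this data the range is 7, the population variance is 4, and the population standard deviation is 2.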
Explain the concept of covariance and how it relates to the relationship between two variables.
Covariance shows whether two variables move together: positive when they tend to rise and fall together, negative when one tends to rise as the other falls.
What are the ground rules for engaging in a statistics session as outlined in the document, and how do they facilitate learning?
Rules: Be curious, ask questions, practice, collaborate. They encourage active learning.
What is correlation, and how does it differ from covariance in terms of interpretation?
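Correlation measures the strength and direction of a linear relationship on a fixed scale from -1 to 1. Covariance isn’t standardized (its size depends on the units of the variables), so it’s harder to interpret.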
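The following sketch shows why standardization matters, assuming sample covariance (dividing by n - 1) and Pearson correlation. The function names and the study-hours data are hypothetical. Changing the unit of one variable from hours to minutes rescales the covariance but leaves the correlation unchanged.

```python
import statistics

def sample_covariance(xs, ys):
    """Sample covariance: sum of products of deviations, divided by n - 1."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    products = [(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)]
    return sum(products) / (len(xs) - 1)

def correlation(xs, ys):
    """Pearson correlation: covariance divided by both sample standard deviations."""
    return sample_covariance(xs, ys) / (statistics.stdev(xs) * statistics.stdev(ys))

# Made-up data: study time and test scores for five students.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]
minutes = [h * 60 for h in hours]  # same study time, different unit

print("covariance:", sample_covariance(hours, scores), sample_covariance(minutes, scores))
print("correlation:", correlation(hours, scores), correlation(minutes, scores))
```

The covariance grows by a factor of 60 when hours become minutes, while the correlation stays the same, which is what makes correlation comparable across datasets.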
Why is it important to understand both the spread and central tendency of a dataset?
Central tendency gives a typical value; spread shows how much the data varies around it. Both are needed for a full picture, since two datasets can share the same center yet differ greatly in variability.