Intro to Statistics Flashcards
Numerical Data
Also known as quantitative data. Consists of numbers and can be divided into two types: discrete and continuous.
Discrete data
Type of numerical or quantitative data. Includes countable values, often integers.
Examples: Number of patients in a hospital, number of defective products in a batch, number of books on a shelf.
Characteristics: Discrete data often arise from counting.
Continuous Data
Includes measurable values that can take any value within a given range.
Examples: Height, weight, temperature, time, distance.
Characteristics: Continuous data often arise from measurements and can have an infinite number of possible values within a range.
Categorical Data
Also known as qualitative data. Consists of categories or groups. It can be divided into two types: nominal and ordinal.
Nominal Data
Represents categories that do not have a specific order or ranking.
Examples: Gender, blood type, types of animals.
Characteristics: Nominal data are purely labels and do not imply any sort of order.
Ordinal Data
Represents categories that have a specific order or ranking.
Examples: Education level, satisfaction rating, severity of pain.
Characteristics: Ordinal data indicate a meaningful order among categories, but the intervals between categories are not necessarily equal.
Key Differences of Numerical and Categorical Data: Nature of Data
Numerical: Involves numbers and quantifiable data
Categorical: Involves categories and qualitative distinctions
Key Differences of Numerical and Categorical Data: Subtypes
Numerical: Discrete and Continuous.
Categorical: Nominal and Ordinal.
Key Differences of Numerical and Categorical Data: Operations
Numerical: Can perform mathematical operations (addition, subtraction, etc.)
Categorical: Typically summarized by counts and proportions.
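The difference in operations can be sketched in a few lines of Python. This is only an illustration: the values in `ages` and `blood_types` are made up, and the variable names are assumptions, not part of the flashcards.

```python
from collections import Counter
from statistics import mean

# Hypothetical data, invented for illustration only.
ages = [34, 45, 29, 52, 45]                    # numerical: arithmetic is meaningful
blood_types = ["A", "O", "O", "B", "A", "O"]   # categorical (nominal): labels only

# Numerical data support arithmetic such as sums and means.
average_age = mean(ages)

# Categorical data are summarized by counts and proportions.
counts = Counter(blood_types)
proportions = {label: n / len(blood_types) for label, n in counts.items()}

print(average_age)
print(counts)
print(proportions)
```

Averaging the blood types would be meaningless, which is why the categorical variable is summarized by how often each label appears.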
Population
The complete set of all possible observations or data points.
Examples: All patients in a hospital, every student in a school, every product produced by a factory.
Use: Populations are used when the goal is to understand or make statements about the entire group.
Parameters: Characteristics of a population (such as mean, standard deviation).
Sample
A portion of the population selected for analysis.
Examples: A group of 100 patients from the hospital, 50 students from the school, 200 products from the factory.
Use: Samples are used to make estimates or test hypotheses about the population.
Statistics: Characteristics of a sample (such as a sample mean, sample standard deviation)
Key Differences of Population vs. Sample: Scope
Population: Includes all members of a specified group.
Sample: Includes a part of the population.
Key Differences of Population vs. Sample: Size
Population: Generally large or infinite.
Sample: Manageable or finite.
Key Differences of Population vs. Sample: Notation
Population: Parameters often denoted by Greek letters (e.g., μ for mean, σ for standard deviation).
Sample: Statistics often denoted by Latin letters (e.g., x̄ for mean, s for standard deviation).
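A minimal sketch of the parameter versus statistic distinction, assuming a simulated population of blood pressure readings (the numbers 120 and 15 and all variable names are hypothetical). Python's `statistics.pstdev` divides by N, as for a population, while `statistics.stdev` divides by n - 1, as for a sample.

```python
import random
from statistics import mean, pstdev, stdev

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 1,000 simulated systolic blood pressure readings.
population = [random.gauss(120, 15) for _ in range(1000)]

# Parameters describe the whole population (mu, sigma).
mu = mean(population)
sigma = pstdev(population)   # population standard deviation, divides by N

# Statistics describe a sample drawn from that population (x-bar, s).
sample = random.sample(population, 50)
x_bar = mean(sample)
s = stdev(sample)            # sample standard deviation, divides by n - 1

print(f"mu={mu:.1f}, sigma={sigma:.1f}")
print(f"x_bar={x_bar:.1f}, s={s:.1f}")
```

The sample values should land close to the population values without matching them exactly, which is the gap that inferential statistics tries to quantify.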
Why use Samples?
Practicality: Studying an entire population can be time-consuming, expensive, and logistically challenging.
Feasibility: Sometimes it’s impossible to access every member of a population.
Efficiency: Properly selected samples can provide accurate and reliable insights about the population.
Sampling Methods
Random Sampling: Every member of the population has an equal chance of being selected. This helps ensure the sample is representative of the population.
Stratified Sampling: The population is divided into subgroups (strata) based on specific characteristics, and samples are taken from each stratum.
Systematic Sampling: Every nth member of the population is selected.
Convenience Sampling: Samples are selected based on ease of access. This method is less reliable but often used for exploratory research.
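The four methods above can be sketched with Python's standard library. The population of 100 patient records, the `ward` field used as the stratum, and the sample size of 10 are all assumptions made up for this illustration. Proportional allocation is one common choice for stratified sampling, not the only one.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: 100 patients, each tagged with a ward (the stratum).
population = [{"id": i, "ward": "A" if i < 60 else "B"} for i in range(100)]

# Random sampling: every member has an equal chance of being selected.
simple = random.sample(population, 10)

# Stratified sampling: draw from each stratum, here in proportion to its size.
stratified = []
for ward in ("A", "B"):
    stratum = [p for p in population if p["ward"] == ward]
    k = round(10 * len(stratum) / len(population))
    stratified.extend(random.sample(stratum, k))

# Systematic sampling: every nth member, starting from a random offset.
n = 10
start = random.randrange(n)
systematic = population[start::n]

# Convenience sampling: whoever is easiest to reach, here simply the first 10.
convenience = population[:10]

print([p["id"] for p in systematic])
```

Note that the convenience sample contains only ward A patients, a small demonstration of why that method is less reliable.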
Importance of Sampling in Research
Using a sample to infer about a population allows researchers to draw conclusions and make predictions without needing to collect data from every member of the population. However, it is crucial that the sample is representative of the population to ensure the validity and reliability of the inferences made.
Descriptive Statistics
Summarize and organize data to make it easily understandable. These statistics provide simple summaries about the sample and measures. They form the basis of virtually every quantitative analysis of data.
Example: In a study of patients’ blood pressure readings, this type of statistics might report the average (mean) blood pressure, the most common (mode) reading, and the range of readings.
Summary: Focus on summarizing and describing the features of a dataset. Useful for getting a clear picture of the data at hand.
Inferential Statistics
Make inferences about populations using a sample of data drawn from that population. Instead of summarizing the data itself, inferential statistics help make predictions or generalizations about a population based on a sample of data.
Example: In a clinical trial, this type of statistics might be used to determine whether a new medication significantly lowers blood pressure compared to a placebo, using data from a sample of patients.
Summary: Focus on making generalizations from a sample to a population. Useful for hypothesis testing and estimating population parameters.
Predictive Statistics (aka Predictive Analytics)
Use statistical models and machine learning techniques to predict future events or outcomes based on historical data. These statistics are often used in data mining, business forecasting, and machine learning applications.
Example: In healthcare, predictive statistics might be used to predict the likelihood of a patient developing a certain disease based on their medical history and other risk factors.
Summary: Focus on using models to predict future outcomes based on past data. Useful for forecasting and making informed decisions.
Characteristics of Descriptive Statistics: Purpose
To describe and summarize the main features of a dataset.
Characteristics of Inferential Statistics: Purpose
To infer conclusions about a population based on a sample.
Characteristics of Predictive Statistics: Purpose
To make predictions about future outcomes based on patterns in historical data.
Characteristics of Descriptive Statistics: Methods
Include measures of central tendency and measures of variability or spread.
Characteristics of Inferential Statistics: Methods
Use probability theory to estimate population parameters, test hypotheses, and make predictions.
Characteristics of Predictive Statistics: Methods
Utilize statistical models and algorithms to forecast future events.
Characteristics of Descriptive Statistics: Common Techniques
Central Tendency: Mean, median, mode
Dispersion: Range, variance, standard deviation, interquartile range
Data Visualization: Charts, graphs, tables
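The central tendency and dispersion measures listed above can be computed with Python's `statistics` module. This is a sketch on made-up blood pressure readings; the data and the dictionary names are assumptions. Note that `quantiles` uses one particular interpolation rule by default, so other tools may report slightly different quartiles.

```python
from statistics import mean, median, mode, quantiles, stdev, variance

# Hypothetical systolic blood pressure readings, invented for illustration.
readings = [118, 122, 122, 130, 135, 141, 122, 127]

# Central tendency
central = {
    "mean": mean(readings),
    "median": median(readings),
    "mode": mode(readings),
}

# Dispersion (variance and stdev here are the sample versions, dividing by n - 1)
q1, q2, q3 = quantiles(readings, n=4)   # the three quartile cut points
spread = {
    "range": max(readings) - min(readings),
    "variance": variance(readings),
    "std_dev": stdev(readings),
    "iqr": q3 - q1,
}

print(central)
print(spread)
```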
Characteristics of Inferential Statistics: Common Techniques
Hypothesis Testing: t-tests, chi-square tests, ANOVA, regression analysis.
Confidence Intervals: Range within which a population parameter is expected to lie.
Sampling Methods: Random sampling, stratified sampling, etc.
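As an illustration of the confidence interval entry above, here is a rough sketch that builds a 95% interval for a population mean from a sample. The readings are made up, and the normal (z) critical value is an assumption chosen to keep the example within the standard library. With a sample this small, a t-distribution critical value would usually be preferred and would give a somewhat wider interval.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Hypothetical sample of systolic blood pressure readings.
sample = [128, 131, 119, 125, 140, 122, 135, 127, 130, 124, 133, 126]

n = len(sample)
x_bar = mean(sample)                 # sample mean, the point estimate
std_err = stdev(sample) / sqrt(n)    # estimated standard error of the mean

# Critical value for 95% confidence under a normal approximation (about 1.96).
z = NormalDist().inv_cdf(0.975)

ci_low = x_bar - z * std_err
ci_high = x_bar + z * std_err

print(f"mean = {x_bar:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```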
Characteristics of Predictive Statistics: Common Techniques
Regression Analysis: Linear regression, logistic regression
Machine Learning Models: Decision trees, neural networks, support vector machines
Time Series Analysis: ARIMA, exponential smoothing.
Example of Descriptive Statistics
In a study of patients’ blood pressure readings, descriptive statistics might report the average (mean) blood pressure, the most common (mode) reading, and the range of readings.
Example of Predictive Statistics
In healthcare, predictive statistics might be used to predict the likelihood of a patient developing a certain disease based on their medical history and other risk factors.
Example of Inferential Statistics
In a clinical trial, inferential statistics might be used to determine whether a new medication significantly lowers blood pressure compared to a placebo, using data from a sample of patients.