Flashcards: Continuous Distributions through ANOVA (p-values)
What is the Normal Distribution?
The normal distribution, also known as the Gaussian distribution, is a symmetric probability distribution frequently encountered in nature. It is defined by its mean and standard deviation and forms a characteristic bell-shaped curve. Example: IQ scores in a population often follow a normal distribution. For instance, if the mean IQ score is 100 and the standard deviation is 15, about 68% of scores fall between 85 and 115 (within one standard deviation of the mean).
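A minimal sketch of the IQ example, using Python's standard-library `statistics.NormalDist` with the mean (100) and standard deviation (15) from the card. The cut-offs 85, 115, and 130 are chosen here only for illustration.

```python
from statistics import NormalDist

# IQ model from the card: mean 100, standard deviation 15
iq = NormalDist(mu=100, sigma=15)

# Share of the population within one standard deviation of the mean
within_one_sd = iq.cdf(115) - iq.cdf(85)

# Share scoring above 130 (two standard deviations above the mean)
above_130 = 1 - iq.cdf(130)

print(round(within_one_sd, 4))
print(round(above_130, 4))
```

The first value is the familiar "about 68%" rule; the second shows how quickly the tails thin out.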
What is the Uniform Distribution?
The uniform distribution assigns equal probability to all outcomes within a specified interval. It models situations where each outcome has the same likelihood. Example: Rolling a fair six-sided die illustrates the discrete case of a uniform distribution, where each number (1 to 6) has an equal probability of 1/6.
What is the Exponential Distribution?
The exponential distribution models the time between events in a Poisson process. It is often used for modeling waiting times or lifetimes and possesses a memoryless property: the time already waited does not change the distribution of the remaining wait. Example: Consider the time between arrivals of customers at a fast-food drive-thru where one customer arrives every 3 minutes on average. The exponential distribution can model the time between successive arrivals.
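A small sketch of the drive-thru example, assuming the card's "3 minutes" is the mean gap between arrivals (so the rate is 1/3 arrivals per minute). The helper `survival` and the times 2 and 5 minutes are illustrative choices, not part of the original card.

```python
import math

mean_gap = 3.0          # assumed mean minutes between arrivals
rate = 1.0 / mean_gap   # arrivals per minute

def survival(t):
    """P(T > t) for an exponential waiting time with the rate above."""
    return math.exp(-rate * t)

# Probability that the next customer takes more than 5 minutes to arrive
p_wait_over_5 = survival(5.0)

# Memoryless property: P(T > s + t | T > s) equals P(T > t)
s, t = 2.0, 5.0
conditional = survival(s + t) / survival(s)

print(round(p_wait_over_5, 4))
print(round(conditional, 4))
```

Both printed values agree, which is the memoryless property in action: having already waited 2 minutes does not change the chance of waiting 5 more.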
What is the Gamma Distribution?
The gamma distribution is a versatile distribution used to model positive continuous random variables, such as the total waiting time until several events have occurred. It encompasses the exponential distribution as a special case (shape parameter equal to 1). Example: To model the time it takes for a machine to produce a certain number of parts, the gamma distribution can be employed. If the machine produces an average of 100 parts per hour, the distribution describes how long a batch of a given size takes.
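A simulation sketch of the machine example. It assumes parts are completed as a Poisson process at 100 per hour, so the time to finish a batch of k parts is gamma distributed with shape k and scale 1/100 hours. The batch size of 50 and the sample count are hypothetical values picked for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

rate_per_hour = 100.0   # from the card: 100 parts per hour on average
parts_needed = 50       # hypothetical batch size

# random.gammavariate(alpha, beta) takes shape alpha and scale beta
batch_times = [
    random.gammavariate(parts_needed, 1.0 / rate_per_hour)
    for _ in range(20_000)
]
mean_hours = statistics.fmean(batch_times)
theory_mean = parts_needed / rate_per_hour   # shape * scale

# Shape 1 is the exponential special case: time to finish a single part
single_part = [
    random.gammavariate(1.0, 1.0 / rate_per_hour) for _ in range(20_000)
]
mean_single = statistics.fmean(single_part)

print(round(mean_hours, 3), theory_mean)
print(round(mean_single, 4))
```

The simulated batch mean should sit close to 0.5 hours, and the single-part mean close to 0.01 hours.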
What is the Beta Distribution?
The beta distribution models probabilities or proportions and exhibits diverse shapes. It is commonly used in Bayesian analysis and quality control. Example: In a clinical trial, the proportion of patients responding positively to a new drug can be represented using a beta distribution, aiding in estimating a range of possible response rates.
What is the Chi-Square Distribution?
The chi-square distribution is commonly used in statistical hypothesis testing and arises when summing the squares of independent standard normal random variables. Example: When testing the independence of two categorical variables, such as smoking habits and lung cancer incidence, the chi-square distribution is utilized to assess the significance of the association.
What is the Student’s t-Distribution?
The t-distribution is employed when estimating the mean of a normally distributed population from a small sample whose population standard deviation is unknown and must be estimated from the data. Example: Suppose you wish to estimate the average time spent on a task from a small sample of 12 observations. The t-distribution (with 11 degrees of freedom) is used to construct a confidence interval for the population mean.
What is the Log-Normal Distribution?
The log-normal distribution models data that are positively skewed and cannot take negative values. It is often used in financial modeling and describes multiplicative growth. Example: The distribution of housing prices in a city can often be described using a log-normal distribution, accounting for positive skewness and preventing negative prices.
What is the Weibull Distribution?
The Weibull distribution models the distribution of lifetimes or failure times of objects. It can take different shapes to describe various failure patterns. Example: The lifetime of electronic components, such as light bulbs, can be modeled using a Weibull distribution. Different shapes of the distribution correspond to different failure patterns.
What is the Cauchy Distribution?
The Cauchy distribution is characterized by its heavy tails; its mean and variance are undefined. It is used to describe certain types of distributions in physics, engineering, and other fields. Example: In a physics experiment involving interference patterns, the distribution of phase differences between waves can be modeled using a Cauchy distribution.
What is the Pareto Distribution?
The Pareto distribution is used to model distributions where a small number of observations account for the majority of occurrences. It is often used in economics and finance. Example: In economics, the distribution of income or wealth often follows a Pareto distribution, where a small percentage of individuals hold the majority of resources.
What is the Exponential Power Distribution?
The exponential power distribution is a flexible distribution capable of modeling a wide range of shapes and tail behaviors. It is used in economics, finance, and engineering to handle diverse datasets. Example: The distribution of rainfall intensity during heavy storms can be modeled using an exponential power distribution to capture different patterns of intensity variation.
What is Bayes’ Theorem, and how does it relate to machine learning?
Bayes’ Theorem is a fundamental concept in probability theory and statistics. It provides a way to update the probability of a hypothesis based on new evidence: P(A|B) = P(B|A) × P(A) / P(B). In machine learning, it’s used for classification tasks like spam detection or medical diagnosis.
Can you provide an example of Bayes’ Theorem in spam email detection?
Certainly! Consider a scenario where you’re building a spam filter. Given prior probabilities and keyword occurrence probabilities, Bayes’ Theorem helps calculate the chance an email is spam based on keywords.
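A hedged sketch of the spam calculation for a single keyword. All three input probabilities are made-up numbers for illustration; a real filter would estimate them from labeled mail and combine many keywords.

```python
# Hypothetical inputs (not from the card)
p_spam = 0.4                 # prior: share of all mail that is spam
p_word_given_spam = 0.6      # likelihood: keyword appears in a spam email
p_word_given_ham = 0.05      # keyword appears in a legitimate email

# Evidence: overall probability of seeing the keyword
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

# Bayes' Theorem: P(spam | word) = P(word | spam) * P(spam) / P(word)
p_spam_given_word = p_word_given_spam * p_spam / p_word

print(round(p_spam_given_word, 4))
```

With these assumed numbers, seeing the keyword raises the spam probability from 0.4 to roughly 0.89.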
How does Bayes’ Theorem enhance decision-making in machine learning?
Bayes’ Theorem improves decision-making by incorporating prior knowledge and new evidence. It adjusts probabilities to update beliefs, leading to more accurate classifications and informed decisions.
What is Prior Probability (Prior)?
Prior Probability: The initial belief or probability of an event occurring before considering new evidence. Example: In a medical test for a rare disease, the prior probability of a person having the disease might be 0.001 (0.1%).
What is Posterior Probability (Posterior)?
Posterior Probability: The updated probability of an event occurring after considering new evidence using Bayes’ Theorem. Example: After a positive test result, the posterior probability of a person having the disease is recalculated from the prior and the test’s accuracy.
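A sketch of how a prior becomes a posterior, using the rare-disease prior of 0.001 from the Prior Probability card. The test's sensitivity (0.99) and false-positive rate (0.05) are assumed values, and the second update additionally assumes the two test results are independent given the person's true status.

```python
def update(prior, sensitivity, false_positive_rate):
    """Posterior P(disease | positive test) via Bayes' Theorem."""
    evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / evidence

prior = 0.001                 # from the card: 0.1% of people have the disease
sensitivity = 0.99            # assumed P(positive | disease)
false_positive_rate = 0.05    # assumed P(positive | no disease)

# After one positive test
after_first = update(prior, sensitivity, false_positive_rate)

# The first posterior serves as the prior for a second positive test
after_second = update(after_first, sensitivity, false_positive_rate)

print(round(after_first, 4))
print(round(after_second, 4))
```

Because the disease is rare, one positive result still leaves the posterior near 2%; a second positive result moves it to roughly 28%.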
What is Likelihood?
Likelihood: The probability of observing the evidence (data) given a specific hypothesis or event. Example: In a coin flip, the likelihood of getting heads given that the coin is fair is 0.5.
What is Evidence (Data)?
Evidence (Data): The observed information that is used to update probabilities. Example: In spam email detection, the evidence could be the presence of specific keywords in an email.
What is Marginal Probability?
Marginal Probability: The probability of a single event occurring, disregarding any other events. Example: The probability of rolling a 4 on a fair six-sided die is a marginal probability.
What is Conditional Probability?
Conditional Probability: The probability of one event occurring given that another event has already occurred. Example: The probability of a patient having a disease given that they exhibit certain symptoms.
What is Joint Probability?
Joint Probability: The probability of two or more events occurring together. Example: The joint probability of rolling a 3 on a fair die and flipping heads on a fair coin. Because the two events are independent, it equals 1/6 × 1/2 = 1/12.
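A brief sketch that checks a joint probability by enumeration, reading the example as one roll of a fair six-sided die together with one flip of a fair coin (an assumption about the intended setup).

```python
from fractions import Fraction
from itertools import product

die = range(1, 7)
coin = ("H", "T")

# All 12 equally likely (die, coin) pairs
outcomes = list(product(die, coin))

# Pairs where the die shows 3 and the coin shows heads
favourable = [o for o in outcomes if o == (3, "H")]
joint = Fraction(len(favourable), len(outcomes))

# For independent events the joint probability is the product of the marginals
p_three = Fraction(1, 6)
p_heads = Fraction(1, 2)

print(joint, p_three * p_heads)
```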
What is the Law of Total Probability?
Law of Total Probability: A formula that computes the probability of an event by summing over all the mutually exclusive ways it can occur, weighting each by its own probability. Example: Calculating the probability of a student passing a course by combining the probability of passing given each level of study time with the probability of each study-time level.
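A small sketch of the passing-a-course example. The three study-time levels and every probability below are hypothetical values chosen for illustration.

```python
# Hypothetical share of students at each study-time level (must sum to 1)
p_study = {"low": 0.2, "medium": 0.5, "high": 0.3}

# Hypothetical probability of passing given each study-time level
p_pass_given = {"low": 0.4, "medium": 0.7, "high": 0.9}

# Law of Total Probability: P(pass) = sum over levels of P(pass | level) * P(level)
p_pass = sum(p_pass_given[level] * p_study[level] for level in p_study)

print(round(p_pass, 2))
```

Each term is one "way" of passing, so the overall pass probability is a weighted average of the conditional ones.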
What is a Bayes Factor?
Bayes Factor: A measure of the strength of evidence for one hypothesis compared to another, obtained as the ratio of the probability of the observed data under each hypothesis. Example: Comparing the hypothesis that a medical treatment is effective versus the hypothesis that it is not, based on patient outcomes.
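A sketch of a Bayes factor for two simple (point) hypotheses about a treatment. The hypothesized success rates (0.7 versus 0.5) and the data (14 recoveries among 20 patients) are invented for illustration; comparing composite hypotheses would require integrating over a prior instead.

```python
import math

def binomial_likelihood(successes, trials, p):
    """P(data | success probability p) under a binomial model."""
    return math.comb(trials, successes) * p**successes * (1 - p) ** (trials - successes)

successes, trials = 14, 20   # hypothetical patient outcomes

# H1: treatment is effective (success rate 0.7); H0: no effect (success rate 0.5)
likelihood_h1 = binomial_likelihood(successes, trials, 0.7)
likelihood_h0 = binomial_likelihood(successes, trials, 0.5)

bayes_factor = likelihood_h1 / likelihood_h0

print(round(bayes_factor, 2))
```

A value above 1 means the data are more probable under H1 than under H0; here the assumed data favor H1 by a factor of about 5.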
What is Prior Distribution?
Prior Distribution: The probability distribution representing our uncertainty about a parameter before observing data. Example: In Bayesian statistics, the initial distribution representing our beliefs about the success rate of a new drug.
What is Posterior Distribution?
Posterior Distribution: The updated probability distribution of a parameter after observing data. Example: The distribution of possible values for a patient’s blood pressure after incorporating measurements and prior knowledge.
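A sketch of going from a prior distribution to a posterior distribution, using the drug success-rate setting from the Prior Distribution card. It assumes a uniform Beta(1, 1) prior and binomial data, for which the posterior is again a beta distribution. The trial counts (14 responders out of 20) are hypothetical.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

# Prior: Beta(1, 1), i.e. every success rate equally plausible (an assumption)
prior_a, prior_b = 1.0, 1.0

# Hypothetical trial data
responders, patients = 14, 20

# Conjugate update: add successes to a, failures to b
post_a = prior_a + responders
post_b = prior_b + (patients - responders)
post_mean = post_a / (post_a + post_b)

# Approximate 95% credible interval from posterior draws
draws = sorted(random.betavariate(post_a, post_b) for _ in range(20_000))
lower = draws[int(0.025 * len(draws))]
upper = draws[int(0.975 * len(draws))]

print(round(post_mean, 3), round(lower, 2), round(upper, 2))
```

The posterior concentrates around the observed response rate while still reflecting the uncertainty from a small trial.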
What is Probability Density Estimation (PDE)?
Probability Density Estimation (PDE) is a statistical technique used to estimate the probability distribution of a continuous random variable.
How does Probability Density Estimation work?
PDE involves creating a smooth curve, called a probability density function, that approximates the underlying pattern in the data.
Can you provide a simple example of PDE?
Certainly! For instance, PDE can help us understand the distribution of ages in a town by creating a curve showing how likely different ages are.
What’s the benefit of using Probability Density Estimation?
PDE helps us see common trends and variations in data, allowing us to make informed decisions about the overall pattern.
In what fields is Probability Density Estimation applied?
PDE is used in finance, biology, and machine learning, among others, to analyze data distributions and make predictions based on patterns.
How is the probability density function (PDF) created in PDE?
The PDF is created by smoothing out data points using mathematical techniques, providing insights into the likelihood of different values.
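One common smoothing technique is kernel density estimation, sketched below with a Gaussian kernel. The card does not name a specific method, so this is one possible choice; the ages, the bandwidth of 5 years, and the helper `gaussian_kde` are all hypothetical.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a density function that places a Gaussian bump on each sample."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density

# Hypothetical ages of people in a town
ages = [23, 25, 31, 34, 35, 38, 41, 42, 47, 52, 58, 63, 67, 71]
pdf = gaussian_kde(ages, bandwidth=5.0)

# A valid density integrates to about 1 (Riemann sum over -20..140)
step = 0.5
grid = [x * step for x in range(-40, 281)]
area = sum(pdf(x) * step for x in grid)

print(round(pdf(40), 4), round(pdf(100), 6), round(area, 3))
```

The estimated density is high near 40, where many sample ages cluster, and close to zero at 100, where there are none. A smaller bandwidth gives a bumpier curve; a larger one gives a smoother curve.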
Is Probability Density Estimation useful only for large datasets?
PDE is useful for both large and small datasets, helping us understand data patterns regardless of the data’s size.
What’s the main goal of Probability Density Estimation?
The main goal of PDE is to provide a representation of the underlying probability distribution, allowing us to understand data likelihoods.
What is Hypothesis Testing?
Hypothesis testing is a statistical method used to make decisions about population parameters based on sample data. It involves formulating competing hypotheses and assessing evidence.
Can you provide an example scenario for Hypothesis Testing?
Certainly! Imagine a company claims a new marketing campaign increased daily website visitors. Hypothesis testing helps us systematically assess whether this claim is supported by data.
What are the steps in Hypothesis Testing?
The steps include: Formulating Hypotheses, Choosing Significance Level (α), Collecting and Analyzing Data, Calculating Test Statistic, Determining Critical Region/Critical Value, Making a Decision, Drawing a Conclusion.
How do you formulate hypotheses in Hypothesis Testing?
Formulate a Null Hypothesis (H0) and an Alternative Hypothesis (H1 or Ha) that represent the default assumption and the statement being tested.
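The steps above can be sketched with a one-sided, one-sample z-test for the website-visitors scenario. Every number is hypothetical: a historical mean of 1000 daily visitors, a known standard deviation of 50, and a sample mean of 1020 over 30 days after the campaign. The z-test itself is an assumption made to keep the sketch within the standard library; with an unknown standard deviation a t-test would be the usual choice.

```python
import math
from statistics import NormalDist

# Step 1: hypotheses. H0: mean = 1000 visitors; H1: mean > 1000 visitors
mu_0 = 1000.0

# Step 2: significance level
alpha = 0.05

# Step 3: data summary (hypothetical)
sample_mean = 1020.0
sigma = 50.0          # assumed known population standard deviation
n = 30

# Step 4: test statistic
z = (sample_mean - mu_0) / (sigma / math.sqrt(n))

# Step 5: critical value for a one-sided test, plus the p-value
std_normal = NormalDist()
z_critical = std_normal.inv_cdf(1 - alpha)
p_value = 1 - std_normal.cdf(z)

# Step 6: decision
reject_h0 = z > z_critical

print(round(z, 3), round(z_critical, 3), round(p_value, 4), reject_h0)
```

Step 7 is the conclusion in words: with these assumed numbers the statistic exceeds the critical value, so the data support the claim that daily visitors increased, at the 5% significance level.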