Introduction Flashcards

1
Q

algorithmic and inferential aspects of statistical analysis.

A

“Here averaging (1.1) is the algorithm, while the standard error provides an inference of the algorithm’s accuracy.

The point is that the algorithm comes first and the inference follows at a second level of statistical consideration.

In practice this means that algorithmic invention is a more free-wheeling and adventurous enterprise, with inference playing catch-up as it strives to assess the accuracy, good or bad, of some hot new algorithmic methodology.

The past few decades have been a golden age of statistical methodology. It hasn’t been, quite, a golden age for statistical inference,”
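
Illustrative sketch (my own Python illustration, not from the book; the sample values are hypothetical): averaging is the algorithm, and the estimated standard error is the second-level inference about its accuracy.

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=10.0, scale=3.0, size=50)  # hypothetical observed sample

x_bar = x.mean()                              # the algorithm: averaging
se_hat = x.std(ddof=1) / np.sqrt(len(x))      # the inference: estimated standard error of the mean
print(f"estimate {x_bar:.3f} +/- {se_hat:.3f}")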

2
Q

Frequentist inference

A

“Statistical inference usually begins with the assumption that some probability model has produced the observed data x.

Bias and variance are familiar examples of frequentist inference.

Frequentism is often defined with respect to “an infinite sequence of future trials.””
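
A small simulation sketch (my own illustration; the model and numbers are assumptions): bias and variance are frequentist properties, defined by how the same algorithm behaves over repeated draws from the assumed probability model.

import numpy as np

rng = np.random.default_rng(1)
mu_true, sigma = 5.0, 2.0            # hypothetical true parameters
n, n_trials = 20, 10_000             # sample size and number of repeated trials

# Apply the same algorithm (the sample mean) to many fresh samples
estimates = rng.normal(mu_true, sigma, size=(n_trials, n)).mean(axis=1)

bias = estimates.mean() - mu_true    # frequentist bias (close to 0 for the sample mean)
variance = estimates.var()           # frequentist variance (close to sigma**2 / n)
print(bias, variance)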

3
Q

Frequentism in practice (devices to get around not having the true function)

A

The plug-in principle: the frequentist accuracy estimate for x̂ is itself estimated from the observed data!

Taylor-series approximations (the delta method)

Parametric families and maximum likelihood theory

Simulation and the bootstrap

Pivotal statistics
A pivotal statistic is one whose distribution does not depend upon the underlying probability distribution F.
The classic example concerns Student’s two-sample t-test:”
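
A minimal sketch of the plug-in/bootstrap idea in Python (my own illustration; the data and helper name are hypothetical): the unknown distribution F is replaced by the observed sample itself, and the accuracy estimate is read off from resamples.

import numpy as np

def bootstrap_se(x, stat, n_boot=2000, seed=0):
    # Resample the observed data in place of the unknown F (plug-in principle)
    rng = np.random.default_rng(seed)
    reps = [stat(rng.choice(x, size=len(x), replace=True)) for _ in range(n_boot)]
    return np.std(reps, ddof=1)

x = np.array([2.1, 3.4, 1.9, 4.2, 2.8, 3.1, 2.2, 5.0])  # hypothetical data
print(bootstrap_se(x, np.median))    # estimated standard error of the sample median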

4
Q

Frequentist optimality

A

“relatively modest mathematical modeling assumptions: only a probability model F (more exactly a family of probabilities, Chapter 3) and an algorithm of choice t(x).
The principle of frequentist correctness doesn’t help with the choice of algorithm.

The first of these was Fisher’s theory of maximum likelihood estimation and the Fisher information bound: in parametric probability models of the type discussed in Chapter 4, the MLE is the optimum estimate in terms of minimum (asymptotic) standard error.

Frequentism cannot claim to be a seamless philosophy of statistical inference. That being said, frequentist methods have a natural appeal to working scientists, an impressive history of successful application, and, as our list of five “devices” suggests, the capacity to encourage clever methodology.”
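
A minimal sketch of maximum likelihood and the Fisher information bound for one simple case (my own illustration, assuming a Poisson model; not the book's code):

import numpy as np

rng = np.random.default_rng(2)
x = rng.poisson(lam=3.0, size=200)        # hypothetical Poisson sample

lam_hat = x.mean()                        # MLE of lambda in the Poisson model
fisher_info = len(x) / lam_hat            # estimated Fisher information I(lambda) = n / lambda
se_asymptotic = 1 / np.sqrt(fisher_info)  # asymptotic standard error bound, sqrt(lambda / n)
print(lam_hat, se_asymptotic)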

5
Q

Uniform distribution

A

“Definition: A probability distribution where all outcomes in a specified range are equally likely.

Key Highlights:

Types:
    Discrete Uniform: Finite set of equally likely outcomes (e.g., rolling a die).
    Continuous Uniform: Equal likelihood across an interval [a, b].
Used as a baseline model when no additional information about probabilities is available.

Assumptions:

Equal probability across the range.
Independent observations.

Limitations:

Rarely occurs in real-world data.
Over-simplifies situations where probabilities differ within the range.

Example: Randomly selecting a point on a line segment of length 10. Each point is equally likely.”
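
A short sketch (my own illustration; the interval endpoints are hypothetical) checking the textbook mean and variance of a continuous uniform by simulation:

import numpy as np

rng = np.random.default_rng(3)
a, b = 0.0, 10.0                            # hypothetical interval endpoints

samples = rng.uniform(a, b, size=100_000)   # continuous uniform on [a, b]
mean, var = (a + b) / 2, (b - a) ** 2 / 12  # theoretical mean and variance
print(samples.mean(), mean)
print(samples.var(), var)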

6
Q

Gamma distribution

A

“Definition: A continuous probability distribution used to model waiting times or lifetimes of events that occur in a Poisson process.

Key Highlights:

Flexible shape controlled by two parameters:
    Shape parameter (k): Determines the skewness.
    Scale parameter (θ), or equivalently the rate β = 1/θ: Determines the spread.
Includes the exponential and chi-square distributions as special cases.

Assumptions:

Data is continuous and positive.
Events are independent and occur at a constant average rate.

Limitations:

Requires careful parameter estimation.
Not suitable if the underlying process does not follow Poisson-like assumptions.

Example: Modeling the total time until k failures in a system with exponentially distributed times between failures.”
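
A short simulation sketch (my own illustration; the shape and rate are hypothetical): the total time until k exponentially distributed failures matches a gamma distribution with shape k.

import numpy as np

rng = np.random.default_rng(4)
k, rate = 3, 0.5                        # hypothetical shape and rate (scale = 1/rate)

# Sum of k independent exponential waiting times
waits = rng.exponential(scale=1 / rate, size=(100_000, k)).sum(axis=1)
direct = rng.gamma(shape=k, scale=1 / rate, size=100_000)
print(waits.mean(), direct.mean())      # both close to k / rate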

7
Q

Geometric distribution

A

“Definition: A discrete probability distribution that models the number of trials needed to get the first success in a series of independent Bernoulli trials.

Key Highlights:

Models ""waiting time"" for a success.
Defined by a single parameter pp, the probability of success on each trial.

Assumptions:

Trials are independent.
Probability of success (p) is constant across trials.
Each trial has only two possible outcomes: success or failure.

Limitations:

Assumes constant p; not suitable if probabilities vary.
Sensitive to deviations from independence.

Example: Modeling the number of coin flips needed to get the first heads in a fair coin toss (p = 0.5).”
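
A quick sketch of the coin-flip example in Python (my own illustration):

import numpy as np

rng = np.random.default_rng(5)
p = 0.5                              # probability of heads on each flip

# Number of flips needed to see the first heads (the successful flip is counted)
flips = rng.geometric(p, size=100_000)
print(flips.mean())                  # close to 1 / p = 2 expected flips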

8
Q

Poisson distribution

A

“Definition: A probability distribution that models the number of events occurring in a fixed interval of time or space when events happen independently and at a constant average rate.

Key Highlights:

Discrete distribution.
Defined by a single parameter, λ (average rate of occurrence).
Probabilities decline as events deviate significantly from λ.

Assumptions:

Events are independent.
Events occur at a constant rate (no clusters).
No two events can occur simultaneously.

Limitations:

Assumes homogeneity; not suitable for over-dispersed or under-dispersed data.
Sensitive to deviations from independence or constancy.

Example: Modeling the number of emails received in an hour.”
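
A quick sketch of the email example (my own illustration; the rate is hypothetical), showing that the mean and variance both sit near λ:

import numpy as np

rng = np.random.default_rng(6)
lam = 4.0                                # hypothetical average emails per hour

counts = rng.poisson(lam, size=100_000)  # emails received in each simulated hour
print(counts.mean(), counts.var())       # both close to lambda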

9
Q

Normal distribution

A

“Definition: A continuous probability distribution that describes data clustered around a mean (μ) with a symmetric bell-shaped curve.

Key Highlights:

Defined by two parameters:
    μ: Mean (center of the distribution).
    σ²: Variance (spread or dispersion).
Symmetrical around the mean.
Total area under the curve equals 1.

Properties:

Approximately 68% of data lies within 1 standard deviation (μ ± σ), 95% within 2, and 99.7% within 3 (μ ± 3σ).
Basis for many statistical methods due to the Central Limit Theorem (CLT).

Assumptions:

Data is continuous.
Symmetry and unimodality.
No extreme outliers (if modeling real-world phenomena).

Limitations:

Not suitable for heavily skewed data or data with multiple modes.
Real-world data often only approximates normality.

Example: Heights of adults in a population often follow a normal distribution.”
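
A quick sketch of the 68–95–99.7 rule by simulation (my own illustration; the mean and standard deviation are hypothetical):

import numpy as np

rng = np.random.default_rng(7)
mu, sigma = 170.0, 8.0                    # hypothetical mean height (cm) and SD

heights = rng.normal(mu, sigma, size=100_000)
print(np.mean(np.abs(heights - mu) <= sigma))      # close to 0.68
print(np.mean(np.abs(heights - mu) <= 2 * sigma))  # close to 0.95
print(np.mean(np.abs(heights - mu) <= 3 * sigma))  # close to 0.997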

10
Q

Exponential distribution

A

“Definition: A continuous probability distribution that models the time between independent events occurring at a constant average rate (Poisson process).

Key Highlights:

Memoryless property: The probability of an event occurring in the future is independent of past events.
Defined by a single parameter: λ (rate), or alternatively θ = 1/λ (mean).

Assumptions:

Events occur independently and at a constant rate.
The process follows the Poisson framework.

Limitations:

Memoryless property may not hold in many real-world scenarios.
Not suitable for data with varying rates over time.

Example: Modeling the time between arrivals at a service counter or the lifespan of electronic components.”
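
A small sketch of the memoryless property by simulation (my own illustration; the rate and thresholds are hypothetical):

import numpy as np

rng = np.random.default_rng(8)
lam = 2.0                                              # hypothetical arrivals per minute

gaps = rng.exponential(scale=1 / lam, size=1_000_000)  # times between arrivals
s, t = 0.5, 1.0
cond = np.mean(gaps[gaps > s] > s + t)                 # P(T > s + t | T > s)
uncond = np.mean(gaps > t)                             # P(T > t)
print(cond, uncond)                                    # the two proportions are close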

11
Q

Binomial distribution

A

“Definition: A discrete probability distribution that models the number of successes in n independent Bernoulli trials, each with a constant probability of success p.

Key Highlights:

Two possible outcomes per trial: success or failure.
Defined by two parameters:
    n: Number of trials.
    p: Probability of success on each trial.

Assumptions:

Fixed number of trials (n).
Trials are independent.
Constant probability of success (p) for each trial.

Limitations:

Assumes independence and constant p; not suitable for dependent trials or varying probabilities.

Example: Tossing a coin 10 times and counting the number of heads (p = 0.5).”
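
A quick sketch of the coin-toss example (my own illustration), checking the textbook mean np and variance np(1 - p):

import numpy as np

rng = np.random.default_rng(9)
n, p = 10, 0.5                               # 10 coin flips, fair coin

heads = rng.binomial(n, p, size=100_000)     # number of heads in each experiment
print(heads.mean(), n * p)                   # close to np
print(heads.var(), n * p * (1 - p))          # close to np(1 - p)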
