Statistics Flashcards

Question

What does S_xx represent? What is the variance in terms of S_xx?

Answer 1

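S_xx = Σ(x_i - mean)², the sum of the squared deviations from the mean. Variance = S_xx / n (divide by n - 1 instead for the unbiased sample variance).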
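As a rough illustration of this card, the sketch below computes S_xx and the variance for a small data set. The data values are made up for the example and are not from the cards.

```python
# Hypothetical data values, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)
mean = sum(data) / n

# S_xx: sum of squared deviations from the mean.
s_xx = sum((x - mean) ** 2 for x in data)

# Equivalent shortcut form: sum of x^2 minus (sum of x)^2 / n.
s_xx_shortcut = sum(x * x for x in data) - sum(data) ** 2 / n

variance = s_xx / n               # divisor n
sample_variance = s_xx / (n - 1)  # divisor n - 1 (unbiased estimate)

print(s_xx, variance, sample_variance)
```

Both forms of S_xx should agree up to floating-point rounding; the shortcut form is usually quicker by hand.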

Answer 2

when the population is split into groups based on similar characteristics

Answer 3

A random variable that is a function of the sample and contains no unknown quantities or parameters.

Answer 4

methods of making decisions and predictions about a population based on a sample selected from the population.

Answer 5

A _sample_ provides a set of data values of a random variable, drawn from all such possible values. A sample is a subset of the target population.

Answer 6

A numerical summary of the population. Examples are the population mean and the population standard deviation. Population parameters are denoted using letters of the Greek alphabet.

Answer 7

The frequencies for each group in the sample are often proportional to the frequencies for each group in the population
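A minimal sketch of proportional allocation, assuming two hypothetical groups and a sample of 20. The group names and sizes are invented for illustration.

```python
# Hypothetical group sizes in the population (illustrative only).
population_sizes = {"Year 12": 120, "Year 13": 80}
sample_size = 20

total = sum(population_sizes.values())

# Each group's share of the sample matches its share of the population.
allocation = {
    group: round(sample_size * size / total)
    for group, size in population_sizes.items()
}

print(allocation)
```

With less convenient numbers, rounding can make the allocations sum to slightly more or less than the intended sample size, so the totals are worth checking.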

Answer 8

all possible values of a statistic together with their associated probabilities
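To make this definition concrete, the sketch below lists every possible sample of size 2 (with replacement) from a hypothetical population of equally likely values 1, 2, 3, and tabulates the sample mean with its probability. The population and sample size are assumptions for the example.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Hypothetical population: three equally likely values.
values = [1, 2, 3]

# Every ordered sample of size 2, drawn with replacement.
samples = list(product(values, repeat=2))

# The statistic here is the sample mean.
counts = Counter(Fraction(a + b, 2) for a, b in samples)

# Each value of the statistic paired with its probability.
distribution = {
    mean: Fraction(count, len(samples))
    for mean, count in sorted(counts.items())
}

for mean, prob in distribution.items():
    print(mean, prob)
```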

Answer 9

Residual = observed value - predicted value
Data point above the line = positive residual
Data point below the line = negative residual
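A short sketch of the residual calculation, assuming a hypothetical regression line y = 1 + 2x and three made-up data points.

```python
# Hypothetical regression line y = a + b*x (illustrative coefficients).
a, b = 1.0, 2.0

# Hypothetical observed (x, y) points.
points = [(1, 3.5), (2, 4.0), (3, 7.0)]

# Residual = observed y minus the y predicted by the line.
residuals = [y - (a + b * x) for x, y in points]

for (x, y), r in zip(points, residuals):
    position = "above" if r > 0 else "below" if r < 0 else "on"
    print(f"x={x}: residual {r:+.1f} ({position} the line)")
```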

Answer 10

r (the correlation coefficient): perfect positive correlation = 1, perfect negative correlation = -1, no correlation = 0
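The three cases on this card can be checked with a small sketch, assuming the card refers to the product moment correlation coefficient r = S_xy / sqrt(S_xx * S_yy). The function name and data sets are hypothetical.

```python
from math import sqrt

def correlation(xs, ys):
    """Product moment correlation coefficient r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return s_xy / sqrt(s_xx * s_yy)

# Hypothetical data sets, one for each case on the card.
xs = [1, 2, 3, 4]
r_positive = correlation(xs, [2, 4, 6, 8])  # points on a rising line
r_negative = correlation(xs, [8, 6, 4, 2])  # points on a falling line
r_none = correlation(xs, [1, 2, 2, 1])      # no linear trend

print(r_positive, r_negative, r_none)
```

Real data rarely gives exactly 1, -1, or 0; values in between indicate weaker or stronger linear correlation.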

Answer 11

data which has pairs of values for two variables represented on scatter diagrams

Answer 12

Explanatory variable = independent variable
Response variable = dependent variable

Answer 13

When there is:
* a fixed number of trials
* exactly two possible outcomes (success and failure)
* a fixed probability of success
* independence between trials
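When these conditions hold, probabilities can be computed from the binomial formula P(X = k) = C(n, k) p^k (1 - p)^(n - k). The sketch below assumes hypothetical values n = 10 and p = 0.5; the function name is chosen for illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p): k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Hypothetical example: 10 trials, probability of success 0.5.
n, p = 10, 0.5
prob_three = binomial_pmf(3, n, p)

# Probabilities over all possible counts of successes sum to 1.
total = sum(binomial_pmf(k, n, p) for k in range(n + 1))

print(prob_three, total)
```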

Answer 14

The probability of the test statistic falling in the critical region, assuming the null hypothesis is true.
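If this card describes the probability, under the null hypothesis, of landing in the critical region, it can be computed directly for a discrete test. The sketch below assumes a hypothetical one-tailed test with X ~ B(10, 0.5) under the null hypothesis and critical region X <= 1; none of these numbers come from the cards.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Hypothetical test set-up (illustrative values only).
n, p_null = 10, 0.5
critical_region = [0, 1]  # reject the null hypothesis if X is 0 or 1

# Probability of the critical region when the null hypothesis is true.
prob_critical = sum(binomial_pmf(k, n, p_null) for k in critical_region)

print(prob_critical)
```

For a discrete distribution this probability is usually a little below the nominal significance level, because the critical region can only contain whole values.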

Answer 15

The spread of data values inside each class is evenly distributed around the midpoint.
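This assumption is what justifies using class midpoints to estimate the mean of grouped data. A minimal sketch, with a hypothetical grouped frequency table:

```python
# Hypothetical grouped data: (lower bound, upper bound, frequency).
classes = [(0, 10, 5), (10, 20, 15), (20, 30, 10)]

total_frequency = sum(f for _, _, f in classes)

# Each class is represented by its midpoint, weighted by its frequency.
weighted_sum = sum(((low + high) / 2) * f for low, high, f in classes)

estimated_mean = weighted_sum / total_frequency

print(estimated_mean)
```

The result is only an estimate, since the actual values inside each class are unknown.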

Answer 16

Used for large sets of quantitative data.
Symmetry: The distribution is symmetric about its mean, so the mean, median, and mode are all equal and located at the center of the distribution.
Bell-shaped Curve: The distribution has a bell-shaped curve, meaning that it has a single peak and tails that extend indefinitely in both directions.
Independence: The observations or measurements are assumed to be independent of each other, so that one value does not influence the probability of another.
Continuous Data: The normal distribution is appropriate for continuous data, where the values can take any real number. It may not be suitable for discrete or categorical data.
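The symmetry property can be checked numerically with Python's `statistics.NormalDist`. The mean of 170 and standard deviation of 8 below are hypothetical values picked for the sketch.

```python
from statistics import NormalDist

# Hypothetical normal model (illustrative parameters).
model = NormalDist(mu=170, sigma=8)

# Half of the distribution lies below the mean.
below_mean = model.cdf(170)

# Tails the same distance either side of the mean have equal probability.
lower_tail = model.cdf(170 - 10)
upper_tail = 1 - model.cdf(170 + 10)

print(below_mean, lower_tail, upper_tail)
```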

Answer 17

Skewed Data: If your data is significantly skewed, meaning it is asymmetric with a long tail on one side, the normal distribution may not accurately represent the underlying distribution.
Outliers: When your data contains outliers (extreme values that are significantly different from the majority of the observations), a normal model may fit poorly, because outliers strongly influence the mean and standard deviation.
Categorical or Discrete Data: The normal distribution is suited for continuous data, where values can take any real number. If your data is categorical (e.g., yes/no, red/blue/green) or discrete (e.g., counts or whole numbers), the normal distribution is not applicable.
Small Sample Sizes: When you have a small sample size, the assumption of normality may be difficult to verify, and the distribution of your data may deviate from the normal distribution.
Non-Linear Relationships

Answer 18

Multiple Outcomes: There are more than two possible outcomes, instead of purely success or failure.
Dependent Trials: The binomial distribution assumes that each trial is independent of the others.
Continuous Data: The binomial distribution is designed for discrete data, where the outcomes are counted or represented as whole numbers.
Sample Size Too Large: When the sample size is very large, the assumptions of the binomial distribution may not hold. One of the key assumptions of the binomial distribution is that the trials are independent and identically distributed. With an extremely large sample, it is possible for the independence assumption to be violated, as individual trials may become correlated. In such cases, alternative distributions like the normal distribution or the Poisson distribution might be more appropriate approximations.
Sample Size Too Small: When the sample size is small, it can lead to unstable estimates and imprecise results. The binomial distribution assumes a fixed number of independent trials, and with a small sample size, there may not be enough data to accurately estimate the underlying probability of success (p) for each trial.