chapter 6.1 Flashcards
descriptive statistics
allows scientists to summarize and represent data sets in meaningful ways. We've seen how to do so visually, with charts, plots, and graphs, and also with numbers, including means, standard deviations and correlation coefficients.
inferential statistics
an important form of inductive reasoning that extends the reach of descriptive statistics by using probability theory to draw conclusions that go beyond the observed data (for example, from a sample to a population)
frequency distributions
are lists that include every possible value of a variable and the number of times each value of that variable appears in the data set, often organized in tables
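A minimal Python sketch of building a frequency distribution, assuming a small hypothetical data set (the variable `siblings` and its values are invented for illustration). `collections.Counter` tallies how many times each value appears:

```python
from collections import Counter

# Hypothetical data set: number of siblings reported by 12 people.
siblings = [0, 1, 1, 2, 0, 3, 1, 2, 1, 0, 2, 1]

# Counter maps each observed value to the number of times it appears.
frequency = Counter(siblings)

# Print the distribution as a small table, ordered by value.
print("value  count")
for value in sorted(frequency):
    print(f"{value:>5}  {frequency[value]:>5}")
```

`Counter` only stores values that actually occur; asking for a possible value that never appears, such as `frequency[4]`, returns 0 rather than raising an error.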
Relative frequency distributions
are frequency distributions that record the proportion of occurrences of each value of a variable instead of the absolute number of occurrences. By using relative frequency distributions, we record how often different values occur for the variable under consideration, relative to the total number of values in the data set
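Turning counts into proportions only takes one division per value. This sketch reuses a hypothetical data set (the name `relative_frequency` is illustrative); the proportions always add up to 1:

```python
from collections import Counter

# Hypothetical data set: number of siblings reported by 12 people.
siblings = [0, 1, 1, 2, 0, 3, 1, 2, 1, 0, 2, 1]
counts = Counter(siblings)
total = len(siblings)

# Divide each count by the total number of values to get a proportion.
relative_frequency = {value: counts[value] / total for value in sorted(counts)}

for value, proportion in relative_frequency.items():
    print(f"{value}: {proportion:.3f}")
```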
Probability distribution
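describes how probable it is for each possible value of a variable to occur in general (for example, across a whole population), rather than how often it occurred in one particular data set
Gaussian distribution
another name for the normal distribution, whose graph is a symmetric, bell-shaped curve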
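To see where the bell shape comes from, here is a small sketch of the normal (Gaussian) density function using only the standard library. The function name `normal_pdf` is a hypothetical helper, not part of any library:

```python
import math

def normal_pdf(x: float, mean: float = 0.0, sd: float = 1.0) -> float:
    """Height of the normal (Gaussian) curve at x, for the given mean and sd."""
    z = (x - mean) / sd  # distance from the mean in standard deviations
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

# The curve peaks at the mean and falls off symmetrically on either side.
for x in [-3, -2, -1, 0, 1, 2, 3]:
    print(f"{x:>2}  {normal_pdf(x):.4f}")
```

Changing `mean` slides the curve left or right; changing `sd` makes it wider and flatter or narrower and taller.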
Central limit theorem
A theorem stating that when samples of a large enough size are drawn from a population, the sample means will approximate the population mean, and the distribution of those sample means will be approximately normal (a bell curve), whatever the shape of the population's own distribution. The spread of the sample means shrinks as the sample size grows. What varies from one variable to another is the central tendency and variability of this normal distribution, which, as we saw in chapter 5, can be described with the mean and standard deviation
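The theorem can be checked by simulation. This sketch assumes a hypothetical, strongly skewed population (an exponential distribution with mean 1 and standard deviation 1), draws many samples of size 50, and looks at the distribution of their means. The sample sizes and seed are arbitrary choices:

```python
import random
import statistics

random.seed(1)

def draw_sample(n: int) -> list[float]:
    """Draw n values from an exponential distribution with mean 1 and sd 1.
    This hypothetical population is strongly skewed, not bell-shaped."""
    return [random.expovariate(1.0) for _ in range(n)]

sample_size = 50
number_of_samples = 2000

# Take many samples and keep only the mean of each one.
sample_means = [statistics.mean(draw_sample(sample_size)) for _ in range(number_of_samples)]

# CLT prediction: the means cluster around the population mean (1) with
# standard deviation sigma / sqrt(n) = 1 / sqrt(50), about 0.141.
predicted_sd = 1 / sample_size ** 0.5
fraction_within_2sd = sum(abs(m - 1.0) < 2 * predicted_sd for m in sample_means) / number_of_samples

print("mean of sample means:", round(statistics.mean(sample_means), 3))
print("sd of sample means:  ", round(statistics.stdev(sample_means), 3))
print("predicted sd:        ", round(predicted_sd, 3))
print("share within 2 sd:   ", round(fraction_within_2sd, 3))  # near 0.95 if roughly normal
```

Even though individual values are far from bell-shaped, the sample means behave like a normal distribution centered on the population mean.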
One important use of inferential statistics is to
predict or estimate the value of a feature or parameter of interest in a population on the basis of a set of observed data concerning a sample of the population
the basic idea behind generalizing from a sample to a population is
using the observed frequency distribution for some feature of the individuals in a sample as the basis for estimating the probability distribution for the range of values of that feature in the general population
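A sketch of this idea, assuming a hypothetical population whose true probability distribution is known (rolls of a fair die, each face with probability 1/6), so the estimate from a sample can be compared with the truth:

```python
import random
from collections import Counter

random.seed(42)

# Hypothetical population: rolls of a fair six-sided die, where each face
# has a true probability of 1/6 (about 0.167).
sample = [random.randint(1, 6) for _ in range(6000)]

counts = Counter(sample)
# The observed relative frequency of each face is our estimate of its probability.
estimated_probability = {face: counts[face] / len(sample) for face in range(1, 7)}

for face, p in estimated_probability.items():
    print(f"face {face}: estimated {p:.3f}  (true {1 / 6:.3f})")
```

With a sample this large, each estimated probability lands close to 1/6; with a much smaller sample the estimates would scatter more widely.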
the sample mean
the average value of the feature in the sample, which serves as our best estimate of the average value of the feature in the population; in other words, the sample mean is the estimate of the population mean
standard deviation formula
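$\mathrm{sd} = \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}}$, where $x_i$ are the individual values, $\bar{x}$ is their mean, and $n$ is the number of values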
sample standard deviation formula
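$s = \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$, where $\bar{x}$ is the sample mean and $n$ is the sample size; dividing by $n - 1$ rather than $n$ corrects for a sample's tendency to underestimate the variability of the population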
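Both formulas can be computed by hand and checked against Python's `statistics` module, where `pstdev` divides by $n$ and `stdev` divides by $n - 1$. The data values are hypothetical:

```python
import math
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
n = len(values)
mean = sum(values) / n

# Sum of squared deviations from the mean.
squared_deviations = sum((x - mean) ** 2 for x in values)

sd = math.sqrt(squared_deviations / n)        # divide by n
s = math.sqrt(squared_deviations / (n - 1))   # divide by n - 1

print(sd, statistics.pstdev(values))  # both 2.0
print(s, statistics.stdev(values))    # both about 2.138
```

Because $n - 1 < n$, the sample standard deviation $s$ is always a little larger than the value obtained by dividing by $n$; the difference matters most for small samples.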
A helpful rule of thumb for getting a rough probability estimate of a characteristic of interest is called the 68-95-99.7 rule.
This rule can be used to remember the percentages of values expected to lie within a certain range around the mean in a normal distribution. It says that about 68%, 95% and 99.7% of the values lie within one, two and three standard deviations of the mean, respectively
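The three percentages come straight from the normal distribution: the probability of landing within $k$ standard deviations of the mean is $\operatorname{erf}(k/\sqrt{2})$. This sketch uses the standard library's `math.erf`; the helper name `within_k_sd` is hypothetical:

```python
import math

def within_k_sd(k: float) -> float:
    """Probability that a normally distributed value lies within k standard
    deviations of the mean: P(|Z| < k) = erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} sd: {within_k_sd(k):.1%}")  # 68.3%, 95.4%, 99.7%
```

As a usage example with made-up numbers: if a trait is normally distributed with mean 170 and standard deviation 10, the rule says roughly 95% of individuals fall between 150 and 190.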
standard error + formula
$\mathrm{SE} = \dfrac{s}{\sqrt{n}}$, where $s$ is the sample standard deviation and $n$ is the sample size
The standard error is a measure of the precision of the sample mean, that is, of the uncertainty about the estimate of the mean of a population.
The standard error, and hence the uncertainty about the sample mean, decreases as the sample size increases. Because the sample size sits under a square root in the formula, quadrupling the sample size halves the standard error. This is because a larger sample helps control for chance variation in the traits of the individuals sampled.
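A sketch of the standard error calculation, assuming hypothetical sample data; `standard_error` is an illustrative helper, not a library function. The loop holds $s$ fixed at 10 to isolate the effect of sample size:

```python
import math
import statistics

def standard_error(s: float, n: int) -> float:
    """SE = s / sqrt(n), where s is the sample standard deviation."""
    return s / math.sqrt(n)

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements
s = statistics.stdev(sample)
print("sample mean:", statistics.mean(sample))
print("standard error:", round(standard_error(s, len(sample)), 3))

# Holding s fixed, each quadrupling of the sample size halves the standard error.
for n in (25, 100, 400):
    print(n, standard_error(10, n))  # 2.0, then 1.0, then 0.5
```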
Representative sample
the sample should accurately reflect the target features in the general population.
Samples chosen in ways that make some individuals in a population more or less likely to be included than others will introduce bias into the inferences made about the population based on the sample
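A simulation can make sampling bias concrete. This sketch assumes a hypothetical population of 10,000 individuals with trait values 1 to 10,000, and compares a simple random sample with a biased one in which an individual's chance of selection is proportional to its trait value (note that `random.choices` samples with replacement):

```python
import random
import statistics

random.seed(7)

# Hypothetical population of 10,000 individuals with trait values 1..10,000.
population = list(range(1, 10_001))
print("population mean:", statistics.mean(population))

# Simple random sample: every individual is equally likely to be chosen.
fair_sample = random.sample(population, 500)

# Biased sample: selection probability is proportional to the trait value,
# so individuals with high values are over-represented.
biased_sample = random.choices(population, weights=population, k=500)

print("random sample mean:", round(statistics.mean(fair_sample)))
print("biased sample mean:", round(statistics.mean(biased_sample)))
```

The random sample's mean lands near the population mean of 5000.5, while the biased sample's mean is pulled well above it; collecting a larger biased sample would not fix this, because the error comes from how individuals are chosen, not from chance.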