Chapter 11 Statistical Sampling Flashcards
Mean
Sum of the numbers in a data set divided by the total number of values in the data set. Also called the average. Best used for data sets whose values are close together.
Median
The middle value of a data set once the values are arranged in ascending or descending order. With an even number of values, it is the average of the two middle values. Better than the mean for data sets with outliers.
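The Python sketch below shows the mean/median contrast on a made-up data set. The `scores` list and the `median` helper are hypothetical and exist only for illustration. The standard library's `statistics.mean` and `statistics.median` serve only as a cross-check.

```python
import statistics

# Hypothetical data set: values close together plus one outlier (90)
scores = [10, 12, 11, 13, 12, 90]

# Mean: sum of the values divided by how many values there are
mean = sum(scores) / len(scores)

def median(values):
    """Sort the values and take the middle one (average the two middle values if the count is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(mean)            # about 24.67, pulled upward by the outlier
print(median(scores))  # 12.0, stays with the bulk of the data
print(statistics.mean(scores), statistics.median(scores))  # standard-library check
```

The single outlier drags the mean far from the typical values, while the median stays put. This is why the median is the better summary when outliers are present.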
Random Variable
A variable that assigns a number to each possible outcome of a random process. For example, if X represents a coin flip, then X = 1 when it lands heads and X = 0 when it lands tails.
Discrete
The total number of possible outcomes is countable. An example is heads or tails.
Continuous
The total number of possible outcomes is uncountable. An example is time measurements.
Probability Density Function
A function that describes a continuous probability distribution. For any measurement x₁, there is a corresponding value f(x₁). Probabilities come from the area under f over an interval, not from the value f(x₁) itself.
Empirical Probabilities
Probabilities estimated from observed data: the number of times an outcome occurred divided by the total number of trials.
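Here is a small sketch of estimating empirical probabilities by counting. It assumes a hypothetical record of ten coin flips; the `flips` data is invented for illustration.

```python
from collections import Counter

# Hypothetical record of 10 observed coin flips (H = heads, T = tails)
flips = ["H", "T", "H", "H", "T", "H", "T", "H", "H", "T"]

# Empirical probability = times the outcome occurred / total number of trials
counts = Counter(flips)
empirical = {outcome: count / len(flips) for outcome, count in counts.items()}

print(empirical)  # {'H': 0.6, 'T': 0.4}
```

These numbers describe this particular data. The theoretical probability of heads for a fair coin is still 0.5.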
Expected Value
Also known as the mean or average of a probability distribution. It can be thought of as the outcome to expect on average over many repetitions.
E(X) = Σ x · P(x), summed over every possible value x
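The formula E(X) = Σ x · P(x) can be checked directly for a discrete random variable. This sketch reuses the coin-flip coding from the Random Variable card. It adds a fair six-sided die as an extra illustration, which is an assumption and not part of the cards. `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

def expected_value(distribution):
    """E(X) = sum of x * P(x) over every possible value x of a discrete random variable."""
    return sum(x * p for x, p in distribution.items())

# Coin flip: X = 1 for heads, X = 0 for tails, each with probability 1/2
coin = {1: Fraction(1, 2), 0: Fraction(1, 2)}

# A fair six-sided die (added illustration): each face has probability 1/6
die = {face: Fraction(1, 6) for face in range(1, 7)}

print(expected_value(coin))  # 1/2
print(expected_value(die))   # 7/2, i.e. 3.5
```

An expected value of 3.5 for a die is never actually rolled. It is the long-run average outcome.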
Random Sampling
A method of choosing a representative subset from a larger population by chance. Types include simple random samples, stratified random samples, cluster samples, and systematic random samples.
Sampling
Selecting a part of a population (a sample) to describe the whole group.
Population
All members of a specified group.
Simple Random Sampling
A type of random sampling where every member of the population has an equal, unsystematic chance of selection. Best used when the researcher does not know much about the demographics of the population.
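The following is a minimal sketch of a simple random sample, assuming a hypothetical population of 100 numbered members. `random.sample` draws without replacement, so every member has the same chance of being chosen. The fixed seed is there only to make the example repeatable.

```python
import random

# Hypothetical population: 100 numbered members
population = list(range(1, 101))

rng = random.Random(42)  # fixed seed only so the example is repeatable

# Draw 10 members without replacement; each member is equally likely to be picked
sample = rng.sample(population, k=10)
print(sample)
```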
Stratified Random Sampling
Divide members of a population into ‘strata’, or homogeneous subgroups. This method differs in that you separate the population into groups first. Strata cannot overlap, and together they must include all members of the population.
Example: splitting a high school into freshman, sophomore, junior, and senior students, then deciding how many students from each group are needed for the sample. Best used when the researcher is familiar with the demographics. No more than four to six strata are recommended, but you can have as many as you want.
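This sketch applies the high-school example. It assumes hypothetical grade sizes and takes a simple random sample inside each stratum. The cards do not say how many to take from each group, so the sketch assumes proportional allocation, where each stratum's share of the sample matches its share of the population.

```python
import random

# Hypothetical school roster split into non-overlapping strata (grade levels)
strata = {
    "freshman": [f"F{i}" for i in range(400)],
    "sophomore": [f"S{i}" for i in range(300)],
    "junior": [f"J{i}" for i in range(200)],
    "senior": [f"R{i}" for i in range(100)],
}

def stratified_sample(strata, total_size, rng):
    """Simple random sample inside each stratum, sized in proportion to that stratum."""
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Proportional allocation (an assumption; other allocation rules exist)
        k = round(total_size * len(members) / population_size)
        sample[name] = rng.sample(members, k)
    return sample

picked = stratified_sample(strata, total_size=50, rng=random.Random(0))
print({name: len(group) for name, group in picked.items()})
# {'freshman': 20, 'sophomore': 15, 'junior': 10, 'senior': 5}
```

With less convenient group sizes, rounding can make the per-stratum counts add up to slightly more or less than the requested total.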
Cluster Random Samples
The sampling method where whole groups (clusters) within a population are used as the sample. Clusters cannot overlap, and together they must include all members of the population. Unlike stratified sampling, cluster sampling does not need an equal selection from each group, but the clusters should be as close to the same size as possible. Use this when the entire population is unclear or unknown.
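Here is a sketch of one-stage cluster sampling. Whole clusters are chosen at random, and every member of each chosen cluster enters the sample. The room names and cluster contents are hypothetical. Some designs also subsample within the chosen clusters, which this sketch does not do.

```python
import random

# Hypothetical population already divided into non-overlapping clusters of equal size;
# every member belongs to exactly one cluster
clusters = {
    "room_a": ["a1", "a2", "a3", "a4"],
    "room_b": ["b1", "b2", "b3", "b4"],
    "room_c": ["c1", "c2", "c3", "c4"],
    "room_d": ["d1", "d2", "d3", "d4"],
    "room_e": ["e1", "e2", "e3", "e4"],
}

def cluster_sample(clusters, num_clusters, rng):
    """Randomly choose whole clusters and include every member of each chosen cluster."""
    chosen = rng.sample(sorted(clusters), num_clusters)
    members = [member for name in chosen for member in clusters[name]]
    return chosen, members

chosen, sample = cluster_sample(clusters, num_clusters=2, rng=random.Random(1))
print(chosen, sample)
```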
Systematic Random Sampling
Requires selecting samples at fixed intervals through a population. For example, selecting every 4th customer entering a movie theater. Only appropriate when the population is homogeneous and the list is in random order.
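This is a minimal sketch of the "every 4th customer" rule. It assumes a hypothetical list of 20 customers already in random order. A common convention, assumed here, is to pick a random starting point within the first interval and then take every 4th member after it.

```python
import random

def systematic_sample(ordered_list, interval, rng):
    """Random start within the first interval, then every `interval`-th member."""
    start = rng.randrange(interval)  # random start in 0 .. interval - 1
    return ordered_list[start::interval]

# Hypothetical list of 20 customers, assumed to be in random order
customers = [f"customer_{i}" for i in range(1, 21)]
sample = systematic_sample(customers, interval=4, rng=random.Random(7))
print(sample)  # 5 customers, each 4 places apart in the list
```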
Law of Large Numbers
Theorem stating that the larger the sample size, the closer the sample mean tends to be to the population mean.
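The law of large numbers can be seen in a quick simulation of fair coin flips coded as X = 1 for heads and X = 0 for tails, so the population mean is 0.5. The sample sizes and the seed are arbitrary choices for illustration.

```python
import random

rng = random.Random(2024)  # fixed seed so the run is repeatable

def sample_mean_of_flips(n, rng):
    """Mean of n simulated fair coin flips coded as 1 (heads) or 0 (tails)."""
    return sum(rng.randint(0, 1) for _ in range(n)) / n

# Population mean is 0.5; larger samples tend to land closer to it
for n in (10, 100, 10_000, 100_000):
    mean = sample_mean_of_flips(n, rng)
    print(n, mean, abs(mean - 0.5))
```

Small samples can land far from 0.5. The gap shrinks as n grows, though any single run can still wobble.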
Normal Distribution
A symmetric, bell-shaped distribution centered on its mean. Roughly bell-shaped data like this occur over and over throughout populations and samples.
Central Limit Theorem
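If you take many random samples of a large enough size and compute the mean of each, those sample means will follow an approximately normal distribution, even when the population itself is not normal. The result only holds if the samples are drawn at random (for example, students chosen by random sampling from the whole student body).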
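The simulation below illustrates the central limit theorem. It assumes a skewed, non-normal population: an exponential distribution with mean 1 and standard deviation 1. It draws many samples of size 30 and looks at their means. Sample size, number of trials, and seed are arbitrary choices. If the sample means are roughly normal, about 68% of them should fall within one standard deviation of their center.

```python
import random
import statistics

rng = random.Random(11)  # fixed seed so the run is repeatable

def draw_sample(n):
    """n draws from an exponential population (mean 1, standard deviation 1)."""
    return [rng.expovariate(1.0) for _ in range(n)]

n = 30          # size of each random sample
trials = 5_000  # number of samples taken
sample_means = [statistics.fmean(draw_sample(n)) for _ in range(trials)]

center = statistics.fmean(sample_means)  # near the population mean, 1
spread = statistics.stdev(sample_means)  # near 1 / sqrt(30), about 0.18
# Normal distributions put about 68% of values within one standard deviation of the mean
within_one_sd = sum(abs(m - center) < spread for m in sample_means) / trials
print(round(center, 3), round(spread, 3), round(within_one_sd, 3))
```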
Sample Mean
Average value of all individuals in the sample.
Standard Error
A measure of how accurately a sample mean estimates the true population mean. The smaller the standard error, the closer sample means tend to be to the population mean.
To find the standard error
SE = s / √n, where s is the standard deviation and n is the sample size
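Here is a direct translation of SE = s / √n. The card does not say which standard deviation to use. This sketch assumes the sample standard deviation from `statistics.stdev`, which divides by n − 1. The four measurements are hypothetical.

```python
import math
import statistics

def standard_error(sample):
    """SE = s / sqrt(n), using the sample standard deviation (n - 1 divisor)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

# Hypothetical sample of 4 measurements
sample = [2, 4, 4, 6]
print(standard_error(sample))  # about 0.816
```

Because n sits under a square root, quadrupling the sample size only halves the standard error.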
Regression Line
A straight line that attempts to predict the relationship between two variables. Also called a trend line or line of best fit.
Simple Linear Regression
A prediction of a dependent variable Y from a second, independent variable X, based on the regression equation fitted to a given set of data.
Scatterplot
A graph of ordered pairs showing a relationship between two sets of data.
Correlation
The relationship between two variables, used to describe or predict information.
Regression Analysis
Study of two variables in an attempt to find a relationship, or correlation.
Independent Variable
A condition or piece of data in an experiment that can be controlled or changed.
Dependent Variable
A condition or piece of data in an experiment that is controlled or influenced by an outside factor, most often the independent variable.
Positive Correlation
The dependent and independent variables in the data set increase or decrease together.
Negative Correlation
The dependent and independent variables in a data set move in opposite directions: as one increases, the other decreases.
Causation
An observed event or action appears to have caused a second event or action.
Chi-square
A statistical test used to compare expected data with what we collected.
Null hypothesis
The prediction that there is no relationship between the variables. If the difference between the observed and expected results is large enough, we can say something significant happened and reject the null hypothesis.
Chi-square definition and formula
A statistical test used to compare expected data with what we collected. χ² = Σ (O − E)² / E, summed over all categories, where O is the observed count and E is the expected count. The p-value needs to be under 0.05 for the result to be considered statistically significant.
Degrees of Freedom
Degrees of freedom = (number of categories) − 1
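This sketch computes the chi-square statistic Σ (O − E)² / E and the degrees of freedom (categories − 1). It assumes a hypothetical experiment: 60 rolls of a die expected to be fair, so 10 rolls are expected per face. The observed counts are made up for illustration.

```python
def chi_square_statistic(observed, expected):
    """Chi-square = sum over categories of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need one count per category")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def degrees_of_freedom(observed):
    """Number of categories minus 1."""
    return len(observed) - 1

# Hypothetical example: 60 rolls of a die expected to be fair (10 per face)
observed = [8, 12, 9, 11, 6, 14]
expected = [10] * 6

stat = chi_square_statistic(observed, expected)
df = degrees_of_freedom(observed)
print(stat, df)  # chi-square ≈ 4.2 with 5 degrees of freedom
```

To decide significance, compare the statistic with a chi-square table. At 5 degrees of freedom and the 0.05 level, the critical value is about 11.07, so 4.2 would not reject the null hypothesis that the die is fair. SciPy's `scipy.stats.chisquare(f_obs, f_exp)` computes both the statistic and the p-value, if that library is available.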
Formula for the y = ax + b regression line
a = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²)
b = (1/n)(Σy − aΣx)
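The slope and intercept formulas above translate directly into code. The function name `fit_line` and the data points are hypothetical. The test data lie exactly on y = 2x + 1, so a correct fit should return a = 2 and b = 1.

```python
def fit_line(xs, ys):
    """Least-squares slope a and intercept b for y = ax + b, using the card's formulas."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("x values must not all be equal")
    a = (n * sum_xy - sum_x * sum_y) / denominator
    b = (sum_y - a * sum_x) / n
    return a, b

# Hypothetical data lying exactly on y = 2x + 1
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
a, b = fit_line(xs, ys)
print(a, b)  # 2.0 1.0
```

Once a and b are known, a prediction for a new x is simply a * x + b.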