Chapter 11: Statistical Sampling Flashcards

1
Q

Mean

A

The sum of the values in a data set divided by the number of values in the data set. Also called the average. Best used when the values in the data set are close together.

2
Q

Median

A

The middle value of a data set once the values are arranged in ascending or descending order. With an even number of values, it is the average of the two middle values. Better than the mean for a data set with outliers.
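To see why the median handles outliers better than the mean, here is a minimal sketch using Python's standard `statistics` module. The salary figures are made up for illustration.

```python
from statistics import mean, median

# Hypothetical salaries (in thousands); the last value is an outlier.
salaries = [40, 42, 45, 47, 50, 400]

print(mean(salaries))    # 104: pulled far upward by the single outlier
print(median(salaries))  # 46.0: average of the two middle values, 45 and 47
```

Five of the six values sit between 40 and 50, so the median describes the typical value far better than the mean does here.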

3
Q

Random Variable

A

A variable that assigns a number to each possible outcome of a random process. For example, if X represents a coin flip, then X=1 when it is heads and X=0 when it is tails.

4
Q

Discrete

A

The total number of possible outcomes is countable. An example is heads or tails.

5
Q

Continuous

A

The total number of possible outcomes is uncountable. An example is time measurements.

6
Q

Probability Density Function

A

The function f(x) that describes a continuous probability distribution. For any measurement x_1, there is a corresponding value f(x_1). The probability of landing in a range of values is the area under f over that range.

7
Q

Empirical Probabilities

A

Probabilities estimated from observed data, such as the fraction of trials in which an outcome actually occurred.

8
Q

Expected Value

A

Also known as the mean or average of the probability distribution. Can be thought of as the outcome we should expect on average.

E(X) = Sum(x * P(x))
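As a worked example of the formula, the sketch below computes the expected value of a fair die and of a coin flip coded as 1 for heads and 0 for tails. The helper name `expected_value` is made up for illustration, and exact fractions are used so the results are not affected by rounding.

```python
from fractions import Fraction

def expected_value(distribution):
    """Return E(X) = Sum(x * P(x)) for a dict mapping outcome -> probability."""
    return sum(x * p for x, p in distribution.items())

# Fair six-sided die: each face has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}
# Coin flip coded as X=1 for heads, X=0 for tails.
coin = {1: Fraction(1, 2), 0: Fraction(1, 2)}

print(expected_value(die))   # 7/2
print(expected_value(coin))  # 1/2
```

Note that 7/2 = 3.5 is not a face of the die: the expected value is a long-run average, not necessarily an outcome that can occur.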

9
Q

Random Sampling

A

A method of choosing a representative subset from a larger population by chance. The main types are simple random sampling, stratified random sampling, cluster sampling, and systematic random sampling.

10
Q

Sampling

A

Selecting a part of a population (a sample) and using it to describe the whole group.

11
Q

Population

A

All members of a specified group.

12
Q

Simple Random Sampling

A

A type of random sampling where every member of the population has an equal, and unsystematic, chance of selection. Best used when a researcher does not know a lot about the demographics in the population.
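A minimal sketch of a simple random sample, using `random.sample` from Python's standard library. The roster of 100 students is hypothetical, and the seed is fixed only so the sketch gives the same result each run.

```python
import random

# Hypothetical roster of 100 students.
population = [f"student_{i}" for i in range(1, 101)]

rng = random.Random(42)  # fixed seed only so the sketch is repeatable
# Draw 10 members without replacement; every member is equally likely.
sample = rng.sample(population, k=10)

print(sample)
```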

13
Q

Stratified Random Sampling

A

Divide the members of a population into ‘strata’, or homogeneous subgroups, then draw a random sample from each one. It differs from simple random sampling in that you separate the population into groups first. Strata cannot overlap (no member belongs to two strata), and together the strata must include all members of the population.
Example: splitting a high school into freshman, sophomore, junior, and senior students, then deciding how many students from each group are needed for the sample. Best used when the researcher is familiar with the demographics of the population. No more than four to six strata are recommended, but you can have as many as you want.
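The high school example can be sketched as follows. This assumes proportional allocation (each grade contributes in proportion to its size), which is one common choice and not the only one. The function name `stratified_sample` and the enrollment numbers are made up for illustration.

```python
import random

def stratified_sample(strata, total_size, seed=0):
    """Draw a simple random sample from each stratum, sized in proportion
    to the stratum's share of the population."""
    rng = random.Random(seed)
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        k = round(total_size * len(members) / population_size)
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical school of 1,000 students split into four non-overlapping strata.
school = {
    "freshman": [f"fr_{i}" for i in range(300)],
    "sophomore": [f"so_{i}" for i in range(250)],
    "junior": [f"jr_{i}" for i in range(250)],
    "senior": [f"sr_{i}" for i in range(200)],
}
sample = stratified_sample(school, total_size=100)
print({grade: len(chosen) for grade, chosen in sample.items()})
# {'freshman': 30, 'sophomore': 25, 'junior': 25, 'senior': 20}
```

With less convenient numbers, rounding can make the per-stratum counts add up to slightly more or less than `total_size`.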

14
Q

Cluster Random Samples

A

The sampling method where the population is divided into groups (clusters) and whole groups, chosen at random, are used as the sample. Clusters cannot overlap, and together they must include all members of the population. Unlike stratified sampling, cluster sampling does not require an equal selection from each group, but the clusters should be as close to the same size as possible. Use this when the entire population is unclear or unknown.
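One common form of cluster sampling, sketched below, picks a few whole clusters at random and surveys everyone in them. The city, the neighborhood names, and the function `cluster_sample` are all hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=0):
    """Pick whole clusters at random and return every member of the chosen ones."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    members = [m for name in chosen for m in clusters[name]]
    return chosen, members

# Hypothetical city split into 8 non-overlapping neighborhoods of 25 households each.
city = {
    f"neighborhood_{c}": [f"n{c}_house_{h}" for h in range(25)]
    for c in range(8)
}
chosen, households = cluster_sample(city, n_clusters=2)
print(chosen, len(households))
```

Only the list of clusters is needed up front, which is why this approach suits populations that cannot be listed member by member.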

15
Q

Systematic Random Sampling

A

Selects samples at a fixed interval from a list of the population. For example, selecting every 4th customer in a movie theater. This only works if the population is homogeneous and the list is in random order.
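The "every 4th customer" idea can be sketched with a list slice. Starting at a random position within the first interval is a common convention assumed here. The queue of 40 customers and the function name `systematic_sample` are made up for illustration.

```python
import random

def systematic_sample(items, k, seed=0):
    """Take every k-th item, starting from a random position among the first k."""
    start = random.Random(seed).randrange(k)
    return items[start::k]

# Hypothetical queue of 40 customers.
customers = [f"customer_{i}" for i in range(1, 41)]
picked = systematic_sample(customers, k=4)
print(len(picked))  # 10
```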

16
Q

Law of Large Numbers

A

Theorem stating that as the sample size grows, the sample mean tends to get closer to the mean of the population.
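A small simulation can illustrate the idea. The sketch below rolls a simulated fair die, whose true mean is 3.5, and prints the sample mean for increasing sample sizes. The seed is fixed only for repeatability, and the exact numbers printed depend on it.

```python
import random

rng = random.Random(1)  # fixed seed only for repeatability

def mean_of_rolls(n):
    """Average of n simulated rolls of a fair six-sided die (true mean 3.5)."""
    return sum(rng.randint(1, 6) for _ in range(n)) / n

for n in (10, 1_000, 100_000):
    print(n, mean_of_rolls(n))
```

The averages for small n can land well away from 3.5, while the average for the largest n should sit very close to it.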

17
Q

Normal Distribution

A

A roughly bell-shaped, symmetric distribution that occurs over and over throughout populations and samples.

18
Q

Central Limit Theorem

A

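If you take enough random samples from a population and compute the mean of each one, those sample means will approximately follow a normal distribution, even when the population itself does not. This only holds if each sample is drawn at random; for example, a sample of students must come from a random sampling of students.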
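The theorem can be checked with a simulation. The sketch below assumes a uniform(0, 1) population, which is flat rather than bell shaped, and looks at the means of many samples of size 30. The sample size, number of samples, and seed are arbitrary choices for illustration.

```python
import random
from statistics import mean, stdev

rng = random.Random(7)  # fixed seed only for repeatability

def sample_mean(n):
    """Mean of n draws from a uniform(0, 1) population."""
    return mean(rng.random() for _ in range(n))

n = 30
means = [sample_mean(n) for _ in range(5_000)]

center, spread = mean(means), stdev(means)
print(center)  # should be close to the population mean, 0.5
print(spread)  # should be close to sigma / sqrt(n), about 0.053

# Share of sample means within one standard deviation of their center;
# for a normal distribution this share is about 68%.
within = sum(abs(m - center) <= spread for m in means) / len(means)
print(within)
```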

19
Q

Mean

A

Average value of all individuals in the sample.

20
Q

Standard Error

A

How accurately your sample mean estimates the true mean of the population.
To find the standard error:
Standard error = (standard deviation) / sqrt(sample size)
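A minimal sketch of the formula in Python. It assumes the standard deviation is the sample standard deviation (dividing by n - 1), which is what `statistics.stdev` computes. The scores are made up.

```python
from math import sqrt
from statistics import stdev

def standard_error(sample):
    """Standard error = (sample standard deviation) / sqrt(sample size)."""
    return stdev(sample) / sqrt(len(sample))

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample of 8 scores
print(standard_error(scores))      # about 0.756
```

Because the sample size is under a square root, quadrupling the sample size only halves the standard error.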

21
Q

Regression Line

A

A straight line that attempts to describe and predict the relationship between two variables. Also called a trend line or line of best fit.

22
Q

Simple Linear Regression

A

A method for predicting a variable Y that depends on a second variable X, using the regression equation fitted to a given set of data.

23
Q

Scatterplot

A

A graph of ordered pairs showing a relationship between two sets of data.

24
Q

Correlation

A

The relationship between two sets of variables used to describe or predict information.

25
Q

Regression Analysis

A

Study of two variables in an attempt to find a relationship, or correlation.

26
Q

Independent Variable

A

A condition or piece of data in an experiment that can be controlled or changed.

27
Q

Dependent Variable

A

A condition or piece of data in an experiment that is controlled or influenced by an outside factor, most often the independent variable.

28
Q

Positive Correlation

A

The dependent variables and the independent variables in the data set increase or decrease together.

29
Q

Negative Correlation

A

The dependent variables and independent variables in a data set move in opposite directions: as one increases, the other decreases.
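The direction of a correlation is often summarized with Pearson's correlation coefficient r. This measure is not defined in these cards and is brought in here only as an illustration: r is positive when two variables rise together and negative when they move in opposite directions. The data below are made up.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient: between -1 and 1, sign gives the direction."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))

hours_studied = [1, 2, 3, 4, 5]
test_score = [52, 60, 71, 80, 88]  # rises with hours studied
hours_of_tv = [9, 7, 6, 4, 2]      # falls as hours studied rises

print(pearson_r(hours_studied, test_score) > 0)   # True: positive correlation
print(pearson_r(hours_studied, hours_of_tv) < 0)  # True: negative correlation
```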

30
Q

Causation

A

An observed event or action appears to have caused a second event or action.

31
Q

Chi-square

A

A statistical test used to compare expected data with what we collected.

32
Q

Null hypothesis

A

The prediction that there is no relationship between the variables. If there is a big enough difference between the observed and expected scores, then we can say something significant happened, which means rejecting the null hypothesis.

33
Q

Chi squared definition and formula

A

Chi-squared = Sum((O-E)^2/E), where O is the observed data and E is what you expected, added up over every category. The p-value needs to be under .05 for the result to be considered significant. A statistical test used to compare expected data with what we collected.
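As a worked example, the sketch below adds up (O-E)^2/E over every category for a made-up coin experiment and compares the result with 3.841, the standard table value for p = .05 with 1 degree of freedom. The function name `chi_squared` and the counts are hypothetical.

```python
def chi_squared(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical experiment: 100 coin flips, 60 heads and 40 tails observed,
# 50 of each expected if the coin is fair (the null hypothesis).
observed = [60, 40]
expected = [50, 50]

statistic = chi_squared(observed, expected)
degrees_of_freedom = len(observed) - 1  # number of categories - 1

print(statistic)           # 4.0
print(degrees_of_freedom)  # 1

# 3.841 is the standard table value for p = .05 with 1 degree of freedom.
CRITICAL_VALUE_DF1 = 3.841
print(statistic > CRITICAL_VALUE_DF1)  # True: reject the null hypothesis
```

A statistic above the table value corresponds to a p-value under .05, so in this example the data are unlikely to have come from a fair coin.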

34
Q

Degrees of Freedom

A

Number of categories - 1 = degrees of freedom

35
Q

Formula for the y = ax + b regression line

A

a = (n*Sum(xy) - Sum(x)*Sum(y)) / (n*Sum(x^2) - (Sum(x))^2)

b = (1/n) * (Sum(y) - a*Sum(x))
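The two formulas translate directly into code. In this minimal sketch, the function name `fit_line` and the data points are made up, and the points were chosen to lie exactly on y = 2x + 1 so the result is easy to check by hand.

```python
def fit_line(xs, ys):
    """Least-squares slope a and intercept b for the line y = a*x + b."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    a = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = (sum_y - a * sum_x) / n
    return a, b

xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]  # lies exactly on y = 2x + 1
a, b = fit_line(xs, ys)
print(a, b)  # 2.0 1.0
```

By hand: n = 4, Sum(x) = 10, Sum(y) = 24, Sum(xy) = 70, Sum(x^2) = 30, so a = (280 - 240) / (120 - 100) = 2 and b = (24 - 2*10) / 4 = 1.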