unit 1 - chapter 1 - sampling data Flashcards
descriptive statistics (observed data)
Characteristics
Tables
Graphs
Measurements
Observed data
inferential statistics (unobserved data)
Statistical modeling
Hypothesis testing
Confidence intervals
Predictive analytics
Unobserved data
statistic (number describing a sample)
count:
mean:
sd:
variance:
correlation coefficient:
error:
count/size: n
Mean: x bar (x̄)
Standard deviation: s
Variance: s^2
Correlation coefficient: r
Error: e
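A minimal sketch of computing these sample statistics in Python, to tie each symbol to a calculation. The data values are made up for illustration, and `pearson_r` is a hypothetical helper written out by hand, not a library function.

```python
import math
import statistics

# Hypothetical paired sample (made-up values, for illustration only)
x = [2.0, 4.0, 6.0, 8.0]
y = [1.0, 3.0, 5.0, 7.0]

n = len(x)                      # sample size: n
x_bar = statistics.mean(x)      # sample mean: x bar
s = statistics.stdev(x)         # sample standard deviation: s (divides by n - 1)
s_sq = statistics.variance(x)   # sample variance: s^2 (divides by n - 1)


def pearson_r(xs, ys):
    """Sample correlation coefficient r between two equal-length lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(x, y)
print(n, x_bar, s, s_sq, r)
```

Each value here describes only the sample. The matching parameter describes the whole population and is usually unknown.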
parameter (number describing a population)
count:
mean:
sd:
variance:
correlation coefficient:
error:
Count/size: N
Mean: Mu (μ)
Standard deviation: Sigma (σ)
Variance: Sigma squared (σ^2)
Correlation coefficient: Rho (ρ)
Error: Epsilon (ε)
the goal of sampling
- The sample represents the population, so the statistic reflects its corresponding parameter
–> Think of it like: x bar taking a picture of mu (the statistic/sample taking a picture of the parameter/population)
–> Our sample should reflect what our population is like
- To increase the likelihood that the sample represents the population, use a systematic, random sampling method
Examples: a sample survey
Example: asking the first 5 people to take a survey (not random: this is a convenience sample)
Example: picking names out of a hat to choose who gets surveyed (random)
on exam - various methods of sampling - simple random sample
Each method has pros and cons. The easiest method to describe is called a simple random sample.
Any group of n individuals is equally likely to be chosen as any other group of n individuals if the simple random sampling technique is used. In other words, each sample of the same size has an equal chance of being selected.
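A minimal sketch of a simple random sample in Python, assuming the population is a numbered list. The function name `simple_random_sample`, the population of 100, and the `seed` argument are illustrative choices, not part of the card.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Pick n distinct members so every group of size n is equally likely."""
    rng = random.Random(seed)  # a seed only makes the draw repeatable
    return rng.sample(population, n)


# Hypothetical population: individuals numbered 1 to 100
population = list(range(1, 101))
sample = simple_random_sample(population, 10, seed=1)
print(sample)
```

`random.sample` draws without replacement, so nobody is picked twice.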
on exam - various methods of sampling - stratified sample
To choose a stratified sample, divide the population into groups called strata and then take a proportionate number from each stratum.
For example, you could stratify (group) your college population by department and then choose a proportionate simple random sample from each stratum (each department) to get a stratified random sample.
To choose a simple random sample from each department, number each member of the first department, number each member of the second department, and do the same for the remaining departments.
Then use simple random sampling to choose proportionate numbers from the first department and do the same for each of the remaining departments.
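The department example could be sketched like this, assuming a 10% sample from each stratum. The department names, their sizes, and the helper `stratified_sample` are made up for illustration.

```python
import random


def stratified_sample(strata, fraction, seed=None):
    """Take a simple random sample of the same fraction from every stratum."""
    rng = random.Random(seed)
    sample = {}
    for name, members in strata.items():
        k = round(len(members) * fraction)  # proportionate number for this stratum
        sample[name] = rng.sample(members, k)
    return sample


# Hypothetical college population grouped by department (the strata)
strata = {
    "Math": [f"M{i}" for i in range(40)],
    "Biology": [f"B{i}" for i in range(100)],
    "History": [f"H{i}" for i in range(60)],
}
sample = stratified_sample(strata, 0.10, seed=1)
print({name: len(chosen) for name, chosen in sample.items()})
```

Every department shows up in the sample, and bigger departments contribute more people.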
on exam - various methods of sampling - cluster sample
To choose a cluster sample, divide the population into clusters (groups) and then randomly select some of the clusters.
All the members from these clusters are in the cluster sample. For example, if you randomly sample four departments from your college population, the four departments make up the cluster sample.
Divide your college faculty by department.
The departments are the clusters. Number each department, and then choose four different numbers using simple random sampling. All members of the four departments with those numbers are the cluster sample.
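A minimal sketch of the cluster example, assuming six departments with three faculty members each. The department names, the member labels, and the helper `cluster_sample` are hypothetical.

```python
import random


def cluster_sample(clusters, num_clusters, seed=None):
    """Randomly choose whole clusters, then keep every member of each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), num_clusters)  # SRS of cluster names
    members = [m for name in chosen for m in clusters[name]]
    return chosen, members


# Hypothetical faculty grouped by department (the clusters)
clusters = {
    dept: [f"{dept}-{i}" for i in range(3)]
    for dept in ["Art", "Bio", "Chem", "Econ", "Math", "Phys"]
}
chosen, members = cluster_sample(clusters, 4, seed=1)
print(chosen, len(members))
```

Compare with the stratified sample: here some departments are left out completely, but the chosen ones are included in full.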
on exam - various methods of sampling - systematic sample
To choose a systematic sample, randomly select a starting point and take every nth piece of data from a listing of the population.
For example, suppose you have to do a phone survey. Your phone book contains 20,000 residence listings. You must choose 400 names for the sample. Number the population 1–20,000 and then use a simple random sample to pick a number that represents the first name in the sample.
Then choose every fiftieth name thereafter until you have a total of 400 names (you might have to go back to the beginning of your phone list). Systematic sampling is frequently chosen because it is a simple method.
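The phone book example could be sketched as follows. The step is 20,000 / 400 = 50, which is where "every fiftieth name" comes from. The helper `systematic_sample` and the `seed` argument are illustrative assumptions.

```python
import random


def systematic_sample(population_size, sample_size, seed=None):
    """Random start, then every k-th listing, wrapping back to the top of the list."""
    rng = random.Random(seed)
    step = population_size // sample_size      # 20,000 // 400 = 50
    start = rng.randint(1, population_size)    # random starting listing number
    return [
        ((start - 1 + i * step) % population_size) + 1  # wrap past the last listing
        for i in range(sample_size)
    ]


picks = systematic_sample(20_000, 400, seed=1)
print(picks[:5])
```

The modulo handles the "go back to the beginning of your phone list" case when the start is not near the top.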
on exam - various methods of sampling - NON-RANDOM - convenience sample
A type of sampling that is non-random is convenience sampling.
Convenience sampling involves using results that are readily available. For example, a computer software store conducts a marketing study by interviewing potential customers who happen to be in the store browsing through the available software. The results of convenience sampling may be very good in some cases and highly biased (favor certain outcomes) in others.
the good, the bad and the ugly of sampling
Pros of sampling: save money, save time, increase practicality (population too large, population changing, collecting data destroys product), reduce monotony, increase accuracy
Cons of sampling (errors): confirmation bias, skewing of data to get a specific result, doesn’t represent the whole population, drawing good data, sampling biases, misuse of data, security/privacy issues, reliability of data, training/understanding
the ripple effect of sampling
Samples affect…
The test type
The test form
Power
Confidence
Critical values
qualitative data
Qualitative data are the result of categorizing or describing attributes of a population. Qualitative data are also often called categorical data.
Hair color, blood type, ethnic group, the car a person drives, and the street a person lives on are examples of qualitative (categorical) data.
Qualitative (categorical) data are generally described by words or letters.
For instance, hair color might be black, dark brown, light brown, blonde, gray, or red. Blood type might be AB+, O-, or B+. Researchers often prefer to use quantitative data over qualitative (categorical) data because quantitative data lend themselves more easily to mathematical analysis.
quantitative data
Quantitative data are always numbers. Quantitative data are the result of counting or measuring attributes of a population.
Amount of money, pulse rate, weight, number of people living in your town, and number of students who take statistics are examples of quantitative data.
Quantitative data may be either discrete or continuous.
attributes of data: qualitative data
responses:
result:
format:
tallied:
processed:
Responses: descriptive or categorical place holders
Result: frequency counts or proportions
Format: discrete (discrete: value falls on exact point)
Tallied: counted
Processed: descriptive analysis and/or coarse statistical analysis
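A minimal sketch of tallying qualitative data into frequency counts and proportions, assuming a small made-up list of blood type responses.

```python
from collections import Counter

# Hypothetical categorical responses (made-up data)
blood_types = ["O-", "AB+", "B+", "O-", "O-", "B+"]

counts = Counter(blood_types)  # tallied: how many times each category appears
proportions = {
    category: count / len(blood_types)  # share of all responses
    for category, count in counts.items()
}
print(counts)
print(proportions)
```

The responses are labels, so the only arithmetic is counting them and dividing by the total.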