Ch. 1: Statistics Flashcards
Data
Facts, especially numerical facts, collected together for reference or information.
Statistics
- Aggregated data, summed up into one or a few numbers or an image, are statistics
- A way to turn data into useful information
- A tool for creating new understanding from a set of numbers. Ideally, the data will help us tell new stories.
Information
Knowledge communicated concerning some particular fact.
Population
- The group of all items of interest to a statistics practitioner.
- Frequently large, sometimes infinite
Parameter
A descriptive measure of a population
Sample
- A set of data drawn from a population
- Potentially very large, but smaller than the population.
- If we could afford it, we’d directly look at the population, but populations tend to be big.
- A statistic is a descriptive measure of a sample
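A small sketch of the parameter vs. statistic distinction, assuming a made-up population of 10,000 salaries. The numbers, the seed, and the variable names are illustrative only and are not from the course.

```python
import random
import statistics

# Hypothetical population: 10,000 made-up salaries (illustrative data only).
rng = random.Random(0)  # fixed seed only so the sketch is repeatable
population = [rng.gauss(50_000, 10_000) for _ in range(10_000)]

# Parameter: a descriptive measure of the whole population.
population_mean = statistics.mean(population)

# Statistic: the same measure, computed on a sample drawn from the population.
sample = rng.sample(population, 100)
sample_mean = statistics.mean(sample)

print(f"parameter (population mean): {population_mean:.0f}")
print(f"statistic (sample mean):     {sample_mean:.0f}")
```

The two printed numbers should be close but not identical: the statistic is an estimate of the parameter.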
Cross-sectional
A survey done in many places at the same time.
Ex: Tracking average temperatures this July in ten places on the US East Coast including Long Island, Baltimore, and Ocean County NJ
Time-series
Done in the same place more than once
Ex: Counting the tons of tuna captured each year over a 20-year span
Panel data
Is both time-series and cross-sectional. If I interview all of you now and every 5 years for the next 50 years, that would make a panel data set.
Descriptive statistics
- Are methods of organizing, summarizing, and presenting data in a convenient and informative way.
- Describe the data set that’s being analyzed, but don’t allow us to draw any conclusions or make any inferences about the full set of data or the population as a whole.
Inferential statistics
- Also a set of methods, but one used to draw conclusions or inferences about characteristics of populations based on data from a sample.
- Statistics that are useful not just to describe a sample but to draw conclusions about a larger population
Statistical inference
Is the process of making an estimate, prediction, or decision about a population based on a sample.
Observational study
- Doesn’t involve messing with things: it’s just watching to see what’s already going on.
- Hard to identify cause & effect because confounding variables can make it look like there’s a relationship when there’s not
Experiments
Involve researchers manipulating “explanatory” variables to check for effects on “outcome” variables.
- Ex: Stanley Milgram’s work on obedience
Simple Random Sample
- Is chosen using a method that ensures that each different possible sample of the desired size has an equal chance of being chosen.
- Example: make a list of all possible choices and choose from them using random.org.
- Note that it is the selection process, and not the final sample, which determines whether the sample is a simple random sample.
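A minimal sketch of a simple random sample, using Python's random module in place of random.org. The frame of 32 ID numbers, the sample size of 10, and the seed are assumptions made for illustration.

```python
import random

# Hypothetical sampling frame: 32 ID numbers (made up for illustration).
frame = list(range(1, 33))

rng = random.Random(42)  # fixed seed only so the sketch is repeatable

# random.sample draws without replacement, so every possible
# set of 10 IDs has the same chance of being chosen.
srs = rng.sample(frame, 10)

print(sorted(srs))
```

What makes this a simple random sample is the selection process (rng.sample), not which 10 IDs happen to come out.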
Stratified sampling
- Split the target population into groups, then randomly sample from within each group
- Each group is called a “stratum” (plural “strata”)
- Ex: In a class of 16 boys and 16 girls, I could choose a sample of 10 by randomly selecting 5 boys and 5 girls
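The class example can be sketched roughly as follows. The roster labels (boy_1, girl_1, and so on) and the seed are hypothetical stand-ins.

```python
import random

rng = random.Random(1)  # fixed seed only so the sketch is repeatable

# Hypothetical roster: each stratum is one group of the population.
strata = {
    "boys": [f"boy_{i}" for i in range(1, 17)],
    "girls": [f"girl_{i}" for i in range(1, 17)],
}

per_stratum = 5  # 5 from each stratum gives a sample of 10

# Randomly sample within every stratum, then pool the results.
stratified_sample = []
for members in strata.values():
    stratified_sample.extend(rng.sample(members, per_stratum))

print(stratified_sample)
```

Every stratum is guaranteed to show up in the sample, which a simple random sample of 10 would not guarantee.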
Cluster sampling
- Split the target population into groups, randomly choose a few groups, and sample everyone in them
- Ex: Split the entire country into districts, and within two districts, administer proficiency tests to all Jiffy Lube auto technicians.
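A rough sketch of cluster sampling, assuming six made-up districts with four technicians each. The district and technician labels are hypothetical.

```python
import random

rng = random.Random(7)  # fixed seed only so the sketch is repeatable

# Hypothetical clusters: 6 districts, each with 4 technicians.
districts = {
    f"district_{d}": [f"tech_{d}_{i}" for i in range(1, 5)]
    for d in range(1, 7)
}

# Step 1: randomly choose a few whole clusters (here, 2 districts).
chosen = rng.sample(sorted(districts), 2)

# Step 2: take everyone inside the chosen clusters.
cluster_sample = [tech for d in chosen for tech in districts[d]]

print(chosen)
print(cluster_sample)
```

Contrast with stratified sampling: there, every group is sampled in part; here, a few groups are sampled in full.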
Systematic sampling
- Get a list of all possible observations you could sample. For example, say you have a list of all 10000 Dunkin Donuts stores.
- Put the list in some set order. (Say by zip code.)
- Choose the size of sample you want. (Say 50.)
- Calculate the constant “k,” which is the total number of possible observations (10000 in our example) divided by sample size (50). In the example, 10000 / 50 = 200.
- To get the first observation to sample, choose randomly (using random.org?) from the first k observations. In this example, say random.org tells us to choose the 16th observation.
- Now choose every kth observation after that. So, we’d choose the 216th, the 416th, etc.
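The steps above can be sketched like this. The function name systematic_positions is hypothetical, and Python's random module stands in for random.org.

```python
import random


def systematic_positions(total, sample_size, start):
    """Return 1-based list positions: start, start + k, start + 2k, ..."""
    k = total // sample_size  # the constant k from the steps above
    return list(range(start, total + 1, k))


# Worked example from the card: 10000 stores, sample of 50, start at the 16th.
positions = systematic_positions(10_000, 50, 16)
print(positions[:3])  # [16, 216, 416]

# In practice the starting point is random, somewhere in the first k positions.
k = 10_000 // 50
random_start = random.Random(3).randint(1, k)
random_positions = systematic_positions(10_000, 50, random_start)
```

Only the starting point is random; once it is fixed, every other choice is determined by k.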
Convenience sampling
Yields results that are about as good as convenience store food. Basically a “convenience” sample is just some people you grabbed because they walked by, or something like that: it’s not careful at all.
Selection bias
- Leaving out some part of the population of interest.
- Ex: In 1948 a phone survey systematically left out those without a phone.
Variable
- the set of everyone’s answers to one question
- columns
- Ex: name, address, phone number
Observation
- set of responses from one respondent
- rows
- Ex: James, Hagerstown.
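One way to picture rows and columns, as a sketch with made-up respondents (the second row is invented for illustration):

```python
# Each dict is one observation (a row): one respondent's answers.
observations = [
    {"name": "James", "address": "Hagerstown"},
    {"name": "Maria", "address": "Baltimore"},  # hypothetical respondent
]

# A variable (a column) is everyone's answer to one question.
address_variable = [row["address"] for row in observations]

print(address_variable)  # ['Hagerstown', 'Baltimore']
```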
Quantitative
Data that are numbers you can do math with, like salaries or a GPA.
Qualitative
Data that are not numbers you can do math with, like a list of people’s hometowns, or phone numbers.
Nominal
- “naming” data
- Ex: Type of car owned by a household
Ordinal
- “order” data
- Ex: Crash-test safety information on cars sold. Score is out of 5 stars
Interval
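- Data with meaningful numbers, where the differences between values mean something. When zero also means “none of it,” as with mileage, the scale is more precisely called ratio.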
- Ex: Mileage on a car