Unit 1: Exploring One-Variable Data Flashcards
Statistics
The science of reasoning from data (Dealing with Data)
Population
The entire group of people/things that you want to investigate.
Sample
A smaller portion of the population from which you actually gather information
Data
Actual values of the variables you find from sampling (Ex. the number of people who like certain sports the best)
Parameter
A numerical characteristic of a population
Statistic
A numerical characteristic of a sample
Variable
Element, feature, or factor that’s liable to vary or change
Convenience Sampling
A type of sampling that is NOT random and involves using members of the population that are READILY AVAILABLE
Quota Sampling
Using a sample that is selected to match the population with respect to some specific characteristic(s)
Cluster Sampling
Dividing the population into groups, then randomly selecting some of the groups
Systematic Sampling
Randomly selecting a starting point in a listing of the population and then taking every nth unit from that listing (a special subset of cluster samples)
Stratified Sampling
Dividing the population into groups, and then randomly sampling a proportionate number from each group
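Example (Python): a minimal sketch of cluster, systematic, and stratified sampling as described above. The population of students, the grade labels, and the helper function names are all hypothetical and exist only for illustration.

```python
import random

def cluster_sample(clusters, k, rng):
    """Randomly pick k whole groups and keep every member of each chosen group."""
    chosen = rng.sample(sorted(clusters), k)
    return [unit for name in chosen for unit in clusters[name]]

def systematic_sample(units, step, rng):
    """Pick a random start among the first `step` units, then take every step-th unit."""
    start = rng.randrange(step)
    return units[start::step]

def stratified_sample(strata, fraction, rng):
    """Randomly sample the same fraction of units from every group."""
    sample = []
    for name in sorted(strata):
        members = strata[name]
        size = round(len(members) * fraction)
        sample.extend(rng.sample(members, size))
    return sample

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 3 grades of 20 students each
grades = {g: [f"{g}-{i}" for i in range(20)] for g in ("9th", "10th", "11th")}
roster = [s for g in sorted(grades) for s in grades[g]]

cluster = cluster_sample(grades, 1, rng)          # one whole grade
systematic = systematic_sample(roster, 10, rng)   # every 10th student on the roster
stratified = stratified_sample(grades, 0.25, rng) # 25% of each grade
```

Here the cluster sample contains every student from one randomly chosen grade, while the stratified sample contains a few students from every grade.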
Observational Units
The person or thing to which the number or category is assigned
Datum
One piece of data
Variability
The phenomenon of a variable taking on different values or categories from observational unit to observational unit
Quantitative Variable
Measures a numerical characteristic such as height
Categorical Variable
Records a group designation such as gender
Binary Variables
Categorical variables with only two possible categories, for example, male and female
Bar Graph
Displays the distribution of a categorical variable
Distribution
The distribution of a variable refers to its pattern of variation. With a categorical variable, distribution means the variable’s possible categories and the proportion of responses in each.
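Example (Python): a small sketch of the distribution of a categorical variable, meaning its categories and the proportion of responses in each. The survey responses are made up for illustration.

```python
from collections import Counter

# Hypothetical survey responses for one categorical variable (favorite sport)
responses = ["soccer", "tennis", "soccer", "golf", "soccer", "tennis", "golf", "soccer"]

counts = Counter(responses)  # how many responses fall in each category
distribution = {category: count / len(responses) for category, count in counts.items()}
```

The proportions in `distribution` are the heights you would draw in a bar graph of this variable.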
Statistical Tendency
Refers to observational units in one group being more likely to be in a certain category, or to have higher values, than those in another group
Consistency
Refers to how variable, or how spread out, the values in a dataset are for a quantitative variable
Representative
If the sample is selected carefully, it has characteristics similar to those of the population
Sample Size
The number of observational units studied in a sample
Sampling Bias
When a sampling procedure tends systematically to over-represent certain segments of the population and under-represent others
Voluntary Response Sample
Refers to samples collected in such a way that members of the population decide for themselves whether or not to participate in the study
Nonresponse
When people selected for the sample do not reply to the survey
Sampling Frame
The list of population members from which the sample is selected
Explanatory Variable
The variable whose effect you want to study; X value, input
Response Variable
The variable that you suspect is affected by the other variable; Y value, Output
Observational Study
Can establish an association (relationship) between the explanatory and response variables, but does not allow a cause-and-effect conclusion between them
Lurking Variables
Variables that are unaccounted for that could affect the response variable
Confounding Variable
Lurking variable whose effects on the response variable are indistinguishable from the effects of the explanatory variable
Ordinal Data
Categorical data whose categories have a natural order; often recorded as numbers (Ex. a rating from 1 to 5)
Categorical Graphs
Bar graph, segmented bar graph, and pie chart
Extrapolating
Making a prediction for values that fall outside the range of your data set
Shape
Normal (bell), skewed left/right, variability (uniform)
Center
Mean (normal), median (skewed), mode
Spread
Range, IQR (Q3 - Q1, the 75th percentile minus the 25th), standard deviation
Causal
Only 1 variable changes
Casual
Association
Prospective & Retrospective
2 types of observational studies
Prospective Study
Subjects are chosen first and then followed forward in time; the data have not been recorded yet
Retrospective Study
Data has already been collected
Simple Random Sampling (SRS)
Give every member of the population the same chance of being selected for the sample. It must ensure that every possible sample has an equal chance of being the sample ultimately selected
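Example (Python): a minimal sketch of drawing a simple random sample and comparing a statistic with the parameter it estimates. The population of ID numbers 1 to 100 is hypothetical, and the seed is fixed only so the sketch is repeatable.

```python
import random

rng = random.Random(1)            # fixed seed so the sketch is repeatable
population = list(range(1, 101))  # hypothetical population: ID numbers 1 to 100

# sample() draws without replacement, so every possible set of 10 IDs is equally likely
srs = rng.sample(population, 10)

parameter = sum(population) / len(population)  # population mean (a parameter)
statistic = sum(srs) / len(srs)                # sample mean (a statistic)
```

The statistic will usually be close to, but not exactly equal to, the parameter.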
Table of Random Digits
Table is constructed so that each position is equally likely to be occupied by any one of the digits, and so that the value in any one position has no impact on the value in any other position
Unbiased
A statistic is unbiased if its values from different random samples are centered at the actual parameter value
Sampling Variability
Refers to the fact that the values of sample statistics vary from sample to sample
Precision
Refers to how much the values vary from sample to sample.
(Precision is related to sample size: sample statistics from larger samples are more precise and closer together than those from smaller samples. Statistics from larger random samples, therefore, provide a more accurate estimate of the corresponding population parameter.)
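Example (Python): a small simulation sketch of sampling variability and precision. The population values, sample sizes, and number of repeated samples are arbitrary assumptions chosen for illustration.

```python
import random
import statistics

rng = random.Random(2)  # fixed seed so the sketch is repeatable
population = [rng.uniform(0, 100) for _ in range(10_000)]  # hypothetical population values

def spread_of_sample_means(n, repeats=500):
    """Draw `repeats` simple random samples of size n; return the standard deviation of their means."""
    means = [statistics.mean(rng.sample(population, n)) for _ in range(repeats)]
    return statistics.stdev(means)

spread_small = spread_of_sample_means(10)   # sample means from small samples
spread_large = spread_of_sample_means(100)  # sample means from larger samples
```

The sample means from samples of size 100 vary less from sample to sample than those from samples of size 10, so `spread_large` comes out smaller than `spread_small`.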
Anecdotal Evidence
Refers to situations that come to mind easily and are of little value in scientific research.
Third Quartile
The median of the data values that are to the right of the median in the ordered list
Quartiles
The quartiles divide the ordered data set into four groups having roughly the same number of values. Arrange the data from smallest to largest to find them
First Quartile
The median of the data values that are to the left of the median in the ordered list
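Example (Python): a sketch of finding the quartiles with the median-of-each-half method described in these cards. The function name and the data values are hypothetical; software packages often use other conventions and may give slightly different quartiles.

```python
import statistics

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-each-half method."""
    data = sorted(values)        # arrange from smallest to largest
    n = len(data)
    half = n // 2
    lower = data[:half]          # values to the left of the median
    upper = data[half + n % 2:]  # values to the right (skips the median itself when n is odd)
    return statistics.median(lower), statistics.median(data), statistics.median(upper)

q1, med, q3 = quartiles([7, 1, 3, 12, 5, 9, 10])
iqr = q3 - q1  # interquartile range
```

For this data set the ordered list is 1, 3, 5, 7, 9, 10, 12, so Q1 = 3, the median = 7, Q3 = 10, and the IQR = 7.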
Resistant
A statistical measure is resistant if it isn’t sensitive to extreme values
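Example (Python): a short sketch comparing how far the mean and the median move when one extreme value is added. The data values are made up for illustration.

```python
import statistics

values = [2, 3, 4, 5, 6]
with_outlier = values + [100]  # one hypothetical extreme value added

# How far each measure of center moves once the extreme value is included
mean_shift = statistics.mean(with_outlier) - statistics.mean(values)
median_shift = statistics.median(with_outlier) - statistics.median(values)
```

The mean moves from 4 to 20 (a shift of 16), while the median moves only from 4 to 4.5, which is why the median is called resistant and the mean is not.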
Discrete Variable
A quantitative variable that takes a fixed set of possible values with gaps between them
Continuous Variable
A quantitative variable that can take any value in an interval on the number line