Exam 1 Flashcards
Data file
the format in which statistical data is organized, typically in spreadsheet form. Rows contain the measurements for a particular subject; columns contain the measurements for a particular characteristic
Simulation
use of a computer to mimic what would actually happen if you selected a sample and used statistics in real life. These are done when it is not practical to physically perform an experiment. Probability sampling is used in designing simulations
Response variable
variable we are interested in measuring
component
what you are simulating through use of a random device
trial
One repetition of a simulation/experiment
Steps for building simulations
- Identify component to be repeated/simulated
- Explain how you will model the component’s outcome
- State response variable clearly
- Explain how to combine the components into a trial to model the response variable
- Run several trials
- Collect and summarize the results of the trials
- State your conclusion
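The steps above can be sketched in Python. This is only an illustration under assumed details: the scenario (rolling a fair die until a six appears), the function names `run_trial` and `simulate`, the seed, and the number of trials are all hypothetical and not part of the course material.

```python
import random

def run_trial(rng):
    """One trial: roll a fair die until a six appears.

    Component: a single die roll, modeled with a random integer 1-6.
    Response variable: the number of rolls needed to get the first six.
    """
    rolls = 0
    while True:
        rolls += 1
        if rng.randint(1, 6) == 6:
            return rolls

def simulate(num_trials, seed=0):
    """Run several trials and summarize them with the average response."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    results = [run_trial(rng) for _ in range(num_trials)]
    return sum(results) / len(results)

average_rolls = simulate(10_000)
print(f"Average number of rolls to get a six: {average_rolls:.2f}")
```

The conclusion step would then be a sentence in context, such as stating roughly how many rolls it takes on average to see a six.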
3 reasons for studying stats
- being informed
- making good decisions
- evaluate decisions that affect you
Definition of statistics
The science of learning from data in the presence of variability. Variability is everywhere
Statistical problem solving process
- formulate a statistical research question
- collect data
- analyze data
- interpret results
Main components of statistics
- design: plan on how to obtain data to answer the question
- description: summarize and analyze the data
- probability: determine how sample differs from population
- Inference: make decisions and predictions
Variable
any characteristic observed in a study
data
the values of a variable for one or more people or things
Observation
(subject) an individual piece of data
data set
the collection of all observations for a particular variable
Categorical variable
(qualitative) Non-numerical variable whose values fall into different categories. The values can still be written as numbers when those numbers only label categories (for example, zip codes)
Quantitative variable (and types)
a numerical variable
Types
1. Discrete: values form a set of separate numbers. Typically something we count
2. Continuous: values form a continuum, with an infinite number of possible values. Typically something we measure
Reasons for identifying different data types
1. Choose appropriate graphical display
2. Choose correct statistical method for inferential procedures
W’s and H for data
How, What, Where, When, Why, Who
Frequency distribution
A listing of distinct categories and their frequencies
Relative frequency distribution
A listing of distinct values and their relative frequencies (proportions or percentages). Used to compare samples of unequal size
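A short Python sketch shows why relative frequencies help when sample sizes differ. The helper name `relative_frequencies` and the two small samples are made up for illustration.

```python
from collections import Counter

def relative_frequencies(values):
    """Return each distinct value's share of the sample (count / sample size)."""
    counts = Counter(values)
    n = len(values)
    return {category: count / n for category, count in counts.items()}

# Two hypothetical samples of unequal size (4 and 8 observations)
sample_a = ["red", "blue", "red", "green"]
sample_b = ["red", "blue", "blue", "blue", "green", "red", "red", "red"]

rel_a = relative_frequencies(sample_a)
rel_b = relative_frequencies(sample_b)
print(rel_a)
print(rel_b)
```

The raw frequency of "red" is 2 in one sample and 4 in the other, but the relative frequency is the same in both, which is the comparison that matters.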
Joint event
Event with two or more characteristics
How to tell if there is an association or not?
Association: relative frequencies differ
No association: relative frequencies are similar
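One way to check this is to compute the relative frequencies within each group and compare them. The sketch below assumes a two-way table stored as nested dictionaries; the group names, counts, and the function name `conditional_relative_frequencies` are hypothetical.

```python
def conditional_relative_frequencies(table):
    """For each group, convert category counts into proportions of that group."""
    result = {}
    for group, counts in table.items():
        total = sum(counts.values())
        result[group] = {category: count / total for category, count in counts.items()}
    return result

# Hypothetical counts of a yes/no response in two groups
table = {
    "group A": {"yes": 30, "no": 70},
    "group B": {"yes": 60, "no": 40},
}

cond = conditional_relative_frequencies(table)
for group, proportions in cond.items():
    print(group, proportions)
```

Here the proportion of "yes" differs noticeably between the groups, which suggests an association. If the proportions were about the same in both groups, that would suggest no association.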
Dot plots
- easy to make
- useful for comparing 2 or more data sets
- display individual values of data set
- good for smaller data sets
- shows raw data
Stem plots
- not useful with large data sets
- Usually displays more info than histograms
- include raw data
- useful for comparing 2 or more data sets
- Have a “stem” (which can have more than one digit) and a “leaf” (which cannot have more than one digit)
- arranged in ascending order
- must have a key
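A text stem plot can be built with a few lines of Python. This sketch assumes non-negative whole numbers, with the tens part as the stem and the ones digit as the leaf; the function name `stem_plot` and the data are made up for illustration.

```python
def stem_plot(values):
    """Return the lines of a stem plot for non-negative whole numbers.

    Stem = the value with its last digit removed; leaf = the last digit.
    Leaves are listed in ascending order, and empty stems are kept.
    """
    ordered = sorted(values)
    stems = {stem: [] for stem in range(ordered[0] // 10, ordered[-1] // 10 + 1)}
    for value in ordered:
        stems[value // 10].append(value % 10)
    return [
        f"{stem} | {' '.join(str(leaf) for leaf in leaves)}"
        for stem, leaves in stems.items()
    ]

data = [23, 12, 40, 15, 38, 21, 23]
plot_lines = stem_plot(data)
print("\n".join(plot_lines))
print("Key: 1 | 2 means 12")
```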
Histograms
- analogous to bar charts
- horizontal axis has classes of quantitative data
- vertical axis shows frequency, relative frequency, or percent
- bars touch
- good for larger data sets
- good if you need more flexibility
Time plots
- show changes over time
- vertical axis shows the value of each observation
- horizontal axis shows the time when the observation was measured
- trends can be seen by connecting points
what does “n” usually indicate?
sample size
Which measures of center are resistant to outliers and which aren't?
- Resistant: Median
- Not resistant: Mean
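A small Python sketch makes the difference concrete. The data values are made up; `statistics.mean` and `statistics.median` come from the Python standard library.

```python
import statistics

data = [10, 12, 13, 14, 16]
with_outlier = data + [100]  # add one extreme value

mean_before = statistics.mean(data)            # 13
median_before = statistics.median(data)        # 13
mean_after = statistics.mean(with_outlier)     # 27.5
median_after = statistics.median(with_outlier) # 13.5

print("mean:  ", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
```

A single outlier more than doubles the mean, while the median barely moves.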
Which measures of center are useful with quantitative data and which are useful with qualitative/categorical data?
Mean and median can only be used with quantitative data. Mode can be used with both
What can you know about the distribution if the mean is greater than the median? What about if the mean is less than the median?
Mean is greater than median: right skewed
Mean is less than median: left skewed
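This rule of thumb can be checked on small examples in Python. Both data sets below are invented: one has a long tail of large values, the other a long tail of small values.

```python
import statistics

right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]   # one very large value pulls the mean up
left_skewed = [1, 17, 18, 18, 19, 19, 20]  # one very small value pulls the mean down

right_mean = statistics.mean(right_skewed)     # 4.75
right_median = statistics.median(right_skewed) # 3.0
left_mean = statistics.mean(left_skewed)       # 16
left_median = statistics.median(left_skewed)   # 18

print("right skewed: mean", right_mean, "> median", right_median)
print("left skewed:  mean", left_mean, "< median", left_median)
```

The mean is pulled toward the long tail, so it lands on the tail side of the median.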
Measures of variation (purpose and types)
Indicate amount of spread in a distribution
Types
1. Range: difference between the largest and smallest observations, not resistant to outliers
2. Standard deviation: accounts for all observations, indicates how far on average observations lie from the mean, not resistant to outliers
3. Interquartile range (IQR): difference between the third and first quartiles (Q3 - Q1), used with boxplots
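The three measures can be computed with Python's standard `statistics` module. This is a sketch with made-up data. Note that textbooks and software use slightly different conventions for locating quartiles, so the IQR from `statistics.quantiles` (default "exclusive" method) may not match a hand calculation exactly.

```python
import statistics

def spread_summary(values):
    """Return the range, sample standard deviation, and IQR of the values."""
    data_range = max(values) - min(values)
    std_dev = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    q1, _q2, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    return {"range": data_range, "std_dev": std_dev, "iqr": q3 - q1}

data = [2, 4, 4, 5, 7, 9, 10, 12]
summary = spread_summary(data)
print(summary)
```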
which types of graphical displays are for quantitative data?
- dot plots
- stem and leaf plots
- histograms
- time plots
Graphical displays for categorical data
- Frequency distribution
- Relative frequency distributions
- Pie charts: use relative frequencies, aka circle graph, difficult to construct by hand, best for data sets with few categories
- Bar charts: easiest way to graph, horizontal axis is distinct values of categorical data, vertical axis is frequencies or relative frequencies
- Pareto charts: bar graph with bars from tallest to shortest
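The ordering behind a Pareto chart is just a frequency distribution sorted from most to least frequent. A minimal Python sketch, with invented category data and a hypothetical helper name `pareto_order`:

```python
from collections import Counter

def pareto_order(values):
    """Return (category, frequency) pairs sorted from most to least frequent."""
    return Counter(values).most_common()

complaints = ["late", "damaged", "late", "wrong item", "late", "damaged"]
ordered = pareto_order(complaints)

# Print a simple text version of the chart, tallest bar first
for category, frequency in ordered:
    print(f"{category:<12} {'#' * frequency}")
```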
Response variable
measured to make comparisons between groups
Explanatory variable
(predictor) explains the values of the response variable
Association
relationship between 2 variables