Exam #1 - Chapters 1 - 4 and Chapter 6 Flashcards
What is statistics?
Study of how to collect, organize, analyze, and interpret numerical information from data
What is an “Individual”?
People or objects included in the study
What is a “Variable”?
features/characteristics of the individual being studied
What is “Population Data”?
data from every individual of interest (the entire population)
What is “Sample Data”?
data from only some of the individuals of interest; used to compute a sample statistic
What is a “Population Parameter”?
numerical measure that describes an aspect of the entire population (for example, the population average)
What is a “Sample Statistic”?
numerical measure that describes an aspect of a sample (for example, the sample average)
What is a “Quantitative Variable”?
values & numerical measurements
a. example: height, age, weight
What is a “Qualitative Variable”?
categories & groups
a. gender, nationality
What is a “Nominal Variable”?
names, labels, and categories that CANNOT be ordered
a. gender, nationality, political party, student names
What is an “Ordinal Variable”?
data that can be ordered but the difference between data values CANNOT be defined
a. letter grades (A, B, C, D, etc.) and opinion categories (agree, neutral, disagree)
What is an “Interval Variable”?
data can be ordered and the differences between values are meaningful, but there is NO true zero
a. specific time, temperature, dates
What is a “Ratio Variable”?
data can be ordered and the differences between values are meaningful, and there IS a true zero
a. amount of time, weight, age, height
What is “Simple Random Sampling”?
- Definition: selecting participants from a specific population so that every member has an equal chance of being chosen
- Steps:
a. number every member of the population sequentially
b. generate random numbers
c. pull the participants with those numbers to be a part of the sample
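The three steps above can be sketched in Python. This is only an illustration: the `students` list, the sample size, and the `seed` argument (used so a demo run can be repeated) are assumptions, not part of the course material.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n members so that every member has the same chance of selection."""
    rng = random.Random(seed)  # seed is only here to make a demo repeatable
    # Step a: number the population sequentially (0, 1, ..., N-1).
    numbered = list(enumerate(population))
    # Step b: generate n distinct random numbers in that range.
    chosen_numbers = rng.sample(range(len(numbered)), n)
    # Step c: pull the members that carry those numbers.
    return [numbered[i][1] for i in chosen_numbers]

students = [f"student_{i}" for i in range(1, 21)]  # hypothetical population
sample = simple_random_sample(students, 5, seed=1)
print(sample)
```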
What is “Stratified Sampling”?
- Definition: separating the population into strata (groups that share a characteristic) and choosing participants from each stratum
- Steps:
a. divide into strata
b. conduct simple random sampling within each stratum
What is “Cluster Sampling”?
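- Definition: population is divided into pre-existing sections (clusters), often by demographic or geographic area, and whole clusters are randomly selected
- Steps:
a. divide the demographic areas into individual sections/clusters
b. randomly select sections/clusters
c. include every member of each selected cluster in the sample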
What is “Systematic Sampling”?
- Definition: the population is arranged in some natural order and every n^th element is selected
- Steps:
a. arrange in some natural order
b. pick a random place within the order to start
c. select every n^th element
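A minimal Python sketch of these steps follows. It assumes the random starting place is chosen among the first k elements, which is a common convention but is not stated on the card; the function name and the example list are hypothetical.

```python
import random

def systematic_sample(ordered_population, k, seed=None):
    """Pick a random start, then take every k-th element after it."""
    rng = random.Random(seed)  # seed is only here to make a demo repeatable
    # Step a is assumed done: ordered_population is already in its natural order.
    # Step b: pick a random starting place (assumed: within the first k elements).
    start = rng.randrange(k)
    # Step c: select every k-th element from that starting place.
    return ordered_population[start::k]

assembly_line = list(range(1, 101))  # hypothetical items numbered 1..100
picked = systematic_sample(assembly_line, 10, seed=0)
print(picked)
```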
What is a “Nonsampling Error”?
A MISTAKE!!
- error in results
a. possible causes: poor sample, sloppy data collection, bias, etc.
What is a “Sampling Error”?
NOT A MISTAKE!!
- difference between measurements from a sample and the corresponding measurements from the population
a. basically percent error, without the percent
b. caused because each sample does not perfectly represent the entire population
What is a “Sample”?
measurements or observations from part of the population
What is a “Simulation”?
a numerical representation (imitation) of a real-world phenomenon
What is an “Observational Study”?
individuals are observed and measured for specific outcomes without any treatment being imposed on them
What is an “Experimental Study”?
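- Definition: a treatment (intervention) is deliberately imposed on individuals and the effects on their responses are studied
- Two Types of Experimental Studies
a. completely randomized experiment: individuals are randomly assigned to treatments, looking for different responses to different treatments
b. randomized block experiment: individuals are first separated into blocks by a factor that may affect results (age, gender, etc.), then randomly assigned to treatments within each block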
What are the steps to make a Frequency Table?
- Choose the desired number of classes
- Calculate the class width
- Determine the data range for each class
- Find the frequency for each class
- Find the class mark of each class
- Determine the class boundaries
- Calculate the relative frequency of each class
For a Frequency Table, how do you determine how many classes you should have?
Fewer than 5 classes is too few & more than 15 is too many
- somewhere between 5 and 15 classes
For a Frequency Table, how do you determine the class width?
[(Largest Value) - (Smallest Value)] / # of classes
- make sure to increase the calculated class width to the next whole number (always round up)
For a Frequency Table, how do you determine each class's range?
- Make sure the first class starts at the lowest given value (its lower limit)
- Add the class width to that lower limit; this gives the lower limit of the next class
- The upper limit of a class is one unit below the lower limit of the next class (for whole-number data), so each class contains the same number of values as the class width
For a Frequency Table, how do you determine the frequency for each class?
count the number of data values that fall within each class
For a Frequency Table, how do you determine the class mark for each class?
[(lower class limit) + (upper class limit)] / 2
For a Frequency Table, how do you determine the class boundaries for each class?
Take the average of the upper limit of class #1 and the lower limit of class #2
- for the next class you would find the average of the upper limit of class #2 and the lower limit of class #3
make sure to determine both the upper and the lower boundaries
the first lower boundary will be lower than the smallest given value, and the last upper boundary will be higher than the largest class limit by the same amount (0.5 for whole-number data)
For a Frequency Table, how do you determine the final relative frequency?
(class frequency) / (total of all frequencies)
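The frequency-table cards above can be pulled together in one Python sketch. It assumes whole-number data (so class boundaries sit 0.5 beyond the class limits) and always increases the class width to the next whole number; the function name, dictionary keys, and data set are made up for illustration.

```python
def frequency_table(data, num_classes):
    """Build a frequency table for whole-number data (an assumption of this sketch)."""
    low, high = min(data), max(data)
    # Class width: (largest - smallest) / number of classes,
    # increased to the next whole number so every value fits in a class.
    width = (high - low) // num_classes + 1
    total = len(data)
    table = []
    for i in range(num_classes):
        lower = low + i * width      # lower class limit
        upper = lower + width - 1    # upper class limit
        frequency = sum(1 for x in data if lower <= x <= upper)
        table.append({
            "limits": (lower, upper),
            "boundaries": (lower - 0.5, upper + 0.5),
            "class_mark": (lower + upper) / 2,
            "frequency": frequency,
            "relative_frequency": frequency / total,
        })
    return table

data = [1, 3, 4, 7, 8, 8, 10, 12, 15, 16, 18, 20]  # hypothetical data set
table = frequency_table(data, 5)
for row in table:
    print(row)
```

With this data the range is 19, so the class width becomes 4 and the classes are 1-4, 5-8, 9-12, 13-16, and 17-20.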
How do you graph a Relative-Frequency Histogram?
- Class boundaries go on the x-axis
- Relative frequency goes on the y-axis
- Graph like a bar chart, but with the bars touching each other
What are the 5 types of Histograms?
- Mound-Shaped
a. symmetrical, highest frequencies in the middle
- Uniform
a. equal frequencies
- Skewed
a. right: higher frequencies on the left, longer tail to the right
b. left: higher frequencies on the right, longer tail to the left
- Bimodal
a. the two tallest bars are separated by at least one shorter bar; may indicate two populations
b. has two or more higher bars with the rest being shorter
- Outlier
a. multiple empty spaces between a majority of the data and the outlier
What are the axis labels for a Bar Graph?
X - Axis: categories
Y - Axis: frequency
What are the characteristics of a Pie/Circle Chart?
- Only use when less than 10 classes are present
- Must label each piece with the proper class
- Relative frequency must be present
What are the axis labels for a Time-Series Graph?
X - Axis: time
Y - Axis: variable of interest
What are the characteristics of a Stem-and-Leaf Graph?
- Always have a key to indicate what the graph is showing
- “Stem” is the tens/hundreds values
- “Leaves” are the ones values
- “Stem” and “Leaf” portions are always in numerical order and contain zeros
What is “Range”?
- Difference between the largest and smallest values
What is “Sample Variance” and how do you calculate it?
- Definition: measure of the spread of data around a particular value, usually the mean
- Calculation Steps:
a. calculate the mean of the sample
b. find how far each data value is from the mean
c. square the values from step b
d. (sum of the squared values from step c) / (“number of data values” - 1)
What is “Sample Standard Deviation” and how do you calculate it?
- Definition: measure of the spread of data around the mean, in the same units as the data; it is the square root of the sample variance
- Calculation Steps:
a. calculate the mean of the sample
b. find how far each data value is from the mean
c. square the values from step b
d. (sum of the squared values from step c) / (“number of data values” - 1)
e. take the square root of the value found in step d
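The calculation steps for sample variance and sample standard deviation can be followed one by one in Python. The data set below is invented for illustration; the result is compared against the standard library's `statistics.stdev` as a sanity check.

```python
import math
import statistics

def sample_variance(values):
    # Step a: calculate the mean of the sample.
    mean = sum(values) / len(values)
    # Step b: find how far each data value is from the mean.
    deviations = [x - mean for x in values]
    # Step c: square those distances.
    squared = [d ** 2 for d in deviations]
    # Step d: add the squares and divide by (number of data values - 1).
    return sum(squared) / (len(values) - 1)

def sample_std_dev(values):
    # Step e: take the square root of the sample variance.
    return math.sqrt(sample_variance(values))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
print(sample_variance(data))
print(sample_std_dev(data))
print(statistics.stdev(data))    # library value for comparison
```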
What does Chebyshev’s Theorem state?
“For any set of data and for any constant, k, that is greater than 1, the proportion of the data that must lie within k standard deviations on either side of the mean is at least 1 - 1/k^2.”
- At least 1 - ( 1 / k^2 ) of the data lies between μ - kσ and μ + kσ
a. example: when k = 2, at least 75% of the data values are within μ - 2σ and μ + 2σ
- Calculation steps
a. 1 - ( 1 / k^2 )
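As a quick check of the calculation, here is a small Python sketch of the 1 - 1/k^2 bound. The function name is hypothetical; the value is the minimum proportion of data within k standard deviations of the mean.

```python
def chebyshev_min_proportion(k):
    """Minimum proportion of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k greater than 1")
    return 1 - 1 / k ** 2

for k in (2, 3, 4):
    print(k, chebyshev_min_proportion(k))
```

For k = 2 this gives 0.75, matching the 75% example on the card.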
How do you determine percentile?
- Using the median
a. 50% of the values will be greater than or equal to the median
b. 50% of the values will be less than or equal to the median
- pth percentile
a. p% of the values fall at or below it
What is included in the five-number summary?
- Lowest value from the data set
- Q1: median of the lower half of the data (25th percentile)
- Q2: the median, which is also a measure of center
- Q3: median of the upper half of the data (75th percentile)
- Highest value from the data set
Q = quartile
How do you calculate the Interquartile Range?
Definition: measure of the spread of the middle half of the data
IQR = Q3 - Q1
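A short Python sketch of the five-number summary and IQR follows. It assumes the "median of each half" convention for Q1 and Q3 (the overall median is left out of both halves when the number of values is odd); other quartile conventions exist and can give slightly different values. The data set is hypothetical.

```python
import statistics

def five_number_summary(values):
    """Return (lowest, Q1, median, Q3, highest) using medians of the halves."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower_half = data[:half]
    # Skip the middle value when n is odd so it belongs to neither half.
    upper_half = data[half + 1:] if n % 2 else data[half:]
    return (
        data[0],
        statistics.median(lower_half),
        statistics.median(data),
        statistics.median(upper_half),
        data[-1],
    )

data = [2, 4, 4, 5, 7, 9, 10, 12, 15]  # hypothetical data set
lowest, q1, q2, q3, highest = five_number_summary(data)
iqr = q3 - q1
print(lowest, q1, q2, q3, highest, iqr)
```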
What is “Probability”?
Definition: measure between 0 and 1 that describes the likelihood an event will occur
a. example: weather forecast
How do you show the probability of an event?
A: event of interest
P(A): probability of A occurring
P(A) = 1: event A is certain to occur
P(A) = 0: event A will never occur
When determining probability, what is the “Sample Space”?
Definition: set of all possible outcomes resulting from an experiment
Example: Flip a coin
Ω = {heads, tails}
a. flip a coin 2 times
b. sample space: Ω = {HH, HT, TH, TT}
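The coin-flip sample space can be listed mechanically. A minimal sketch using Python's `itertools.product`; the function name is hypothetical.

```python
from itertools import product

def coin_sample_space(flips):
    """List every possible outcome of flipping a coin `flips` times."""
    return ["".join(outcome) for outcome in product("HT", repeat=flips)]

print(coin_sample_space(1))  # ['H', 'T']
print(coin_sample_space(2))  # ['HH', 'HT', 'TH', 'TT']
```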
How do you determine frequency of occurrence in a sample?
P(A) = ƒ/n, where ƒ is the number of times A occurred and n is the sample size
a. can be used to assign probabilities
What is the “Law of Large Numbers”?
In the long run, as the experiment is repeated more and more times, the experimental (relative frequency) value gets closer and closer to the theoretical probability.
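A small simulation can illustrate this. The sketch below flips a simulated fair coin and reports the relative frequency ƒ/n of heads for increasing n; the fixed seed is an assumption made only so that repeated runs match, and the exact values printed depend on it.

```python
import random

def heads_relative_frequency(num_flips, seed=0):
    """Relative frequency f/n of heads in num_flips simulated fair coin flips."""
    rng = random.Random(seed)  # fixed seed so the demo is repeatable
    heads = sum(1 for _ in range(num_flips) if rng.random() < 0.5)
    return heads / num_flips

# The theoretical probability of heads is 0.5; f/n tends to settle near it
# as the number of flips grows.
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, heads_relative_frequency(n))
```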
What are “Independent Events”?
Two events for which the occurrence or nonoccurrence of one event does not influence the probability that the other event occurs
How do you find the probability of two or more events if they are all independent of each other?
Multiply the probabilities of each event occurring: P(A and B) = P(A) ⋅ P(B)
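As a sketch of the multiplication rule, the Python below multiplies the individual probabilities. It is only valid when the events really are independent; the function name and the coin and die examples are illustrative.

```python
from math import prod

def probability_all_occur(probabilities):
    """Probability that every event occurs, assuming the events are independent."""
    return prod(probabilities)

# Two fair coin flips both landing heads.
print(probability_all_occur([0.5, 0.5]))
# Rolling a six on each of two fair dice.
print(probability_all_occur([1 / 6, 1 / 6]))
```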
What is the total area under a normal distribution curve?
Area = 1
How do you judge the variance when looking at a normal distribution curve?
When the curve is wider and flatter (the values have a greater horizontal spread), the standard deviation and variance are larger.
What is the difference between a positive and negative Z-Score?
Definition: number of standard deviations between the raw score and the mean
a. positive: values to the right of the mean
b. negative: values to the left of the mean
What is the calculation for a Z-Score and a Raw Score?
Z-Score:
z = (𝒳 - μ) / σ
Raw Score:
𝒳 = (z ⋅ σ) + μ
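Both formulas translate directly into Python. In this sketch the mean of 100 and standard deviation of 15 are made-up example values.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations between the raw score x and the mean."""
    return (x - mu) / sigma

def raw_score(z, mu, sigma):
    """Convert a z-score back to a raw score."""
    return z * sigma + mu

mu, sigma = 100, 15              # hypothetical mean and standard deviation
print(z_score(130, mu, sigma))   # positive: 130 is to the right of the mean
print(z_score(85, mu, sigma))    # negative: 85 is to the left of the mean
print(raw_score(2.0, mu, sigma))
```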
As Z values increase, what happens to the area to the left of Z?
Increases
If the Z value is negative, what is the area to the left of Z?
Less than 0.5
If the Z value is positive, what is the area to the right of Z?
Less than 0.5
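The last three cards can be checked numerically instead of with a printed z-table. This sketch uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper names are hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def area_left(z):
    """Area under the standard normal curve to the left of z."""
    return standard_normal.cdf(z)

def area_right(z):
    """Area to the right of z; the total area under the curve is 1."""
    return 1 - area_left(z)

for z in (-2, -1, 0, 1, 2):
    print(z, round(area_left(z), 4), round(area_right(z), 4))
```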