Exam 1 Flashcards
define population
everyone or everything that could be examined
example of population
an entire classroom of students
define sample
the individuals or things that are actually looked at
example of sample
a subset of students in the classroom that will be looked at
define experimental unit (EU)
one individual or thing you take a measurement on
example of an experimental unit
an individual in the classroom
define a variable
what is measured on each experimental unit
example of a variable
age
what is a data value
one observation
example
the age recorded for one particular person
what is data
all observations
example of data
list of ages of the entire group
define statistic
sample summary information (sample mean)
example of statistic
average age in the sample
define parameter
true population summary information (population mean)
example of parameter
average age of population
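A minimal sketch of the statistic vs. parameter cards in Python, assuming a made-up classroom of eight ages (the numbers and variable names are illustrative, not course data). The parameter summarizes everyone; the statistic summarizes only the students actually looked at.

```python
from statistics import mean

# Hypothetical population: the age of every student in the classroom.
population_ages = [18, 19, 19, 20, 21, 22, 22, 23]

# Hypothetical sample: the subset of students actually looked at.
# (Hand-picked here so the result is repeatable; a good sample is random.)
sample_ages = [19, 21, 23]

parameter = mean(population_ages)  # population mean (mu)
statistic = mean(sample_ages)      # sample mean (x-bar)

print("parameter:", parameter)
print("statistic:", statistic)
```

The two values differ (20.5 vs. 21) because the sample is only part of the population.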
define descriptive statistics
information/ description about the subset of individuals examined
define inferential statistics
make inference to everyone using the description of the subset (taking descriptive statistics one step further)
define qualitative ordinal
a qualitative variable whose categories have a natural order; e.g., good, better, best
define quantitative discrete
a variable that takes countable values, usually whole numbers; e.g., # of people
define quantitative continuous
a variable that can take any value in a range, so recorded values are cut off to a certain number of decimal places; e.g., length, height, time
define qualitative nominal
a variable whose values are names or labels with no natural order; e.g., SSN or colors
what makes a ‘good’ sample
must be:
- random: selected with some element of chance
- represent entire population: everyone could have been sampled
- must not have bias: bias is a systematic tendency, in one direction, to leave some individuals out of the sample
- independence: every EU is independent of other EUs
what is an experiment
a study in which a variable must be manipulated
what is a survey
a study in which data is simply collected from people
- Which of the following is not considered an aspect of a “good” sample?
a. Random
b. Represents entire population
c. Large sample
d. Independence
C. Large sample
- In an experiment to determine how the weight of a rat correlates to its likelihood of carrying a disease, what does the weight of an individual rat represent?
a. Variable
b. Experimental unit
c. Data value
d. Statistic
C. Data value
What is a judgment sample
a sample that can’t meet the four aspects of a ‘good’ sample
Which variable is a discrete quantitative variable?
A. The weight of all the students from University of Maryland
B. The number of siblings of the students in BIOM 301 course
C. The eye color of students aged 18-22 in Maryland
D. The phone number of the student’s parents
B. The number of siblings of the students in BIOM 301 course
Which one is NOT a characteristic of a good sample? A. Random B. Independent C. Specific D. Represents entire population
C. Specific
What are the two main areas of statistics?
descriptive statistics and inferential statistics
Does a big sample necessarily mean a good sample?
No.
is a small sample a bad sample?
no
1) Which of the following is NOT a qualitative summary graph?
a) Circle graph
b) Stem and Leaf plot
c) Bar graph
B. Stem and Leaf Plot
2) Which of the following is NOT one of the 4 measures of central tendency?
a) mean
b) mode
c) sample variance
d) midrange
C. Sample Variance
List the three indicative properties of a normal curve
always symmetrical, unimodal, and bell-shaped
What value is r when there is no linear relationship?
zero
what are 4 correlation terms?
linked, associated, connected, and tied to
Describe 3 reasons why r can equal zero.
r can equal zero when:
- there is no relationship
- y changes but x does not and vice versa
- when the relationship is not linear
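The third reason above (a relationship that is not linear) can be sketched in Python. This is an illustration under assumptions: the data are made up, and `pearson_r` is a hand-written helper, not a course-provided function. Here y depends perfectly on x, yet r comes out as zero because the pattern is curved.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Linear correlation coefficient r for paired data.

    Assumes neither variable is constant (otherwise the denominator is 0).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

x = [-2, -1, 0, 1, 2]
y = [v ** 2 for v in x]  # perfect but curved (quadratic) relationship

r = pearson_r(x, y)
print("r =", r)
```

So r = 0 says only that there is no linear relationship, not that there is no relationship at all.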
T/F correlation analysis is a method of obtaining the equation that represents the relationship between two variables
False; regression
t/f the linear correlation coefficient is used to determine the equation that represents the relationship between two variables
false, direction and tightness
t/f a correlation coefficient of positive or negative 1 means that the two variables are perfectly correlated
true
t/f whenever the slope of the regression line is zero, the correlation coefficient will also be zero
true
t/f when r is positive b(1) will be negative
false, positive
t/f the slope of the regression line represents the amount of change expected to take place in y when x increases by 1 unit
true
t/f correlation coefficients range between 0 and -1
false, -1 and +1
t/f the value being predicted is called the input variable
false, output variable
t/f the line of best fit is used to predict the average value of y that can be expected to occur at a given value of x
true
define bivariate data
data containing values of 2 variables measured on each experimental unit
correlation coefficient
a statistic, r, representing the direction and tightness of a linear relationship between two variables. Values range from -1 to +1
outlier
a data point that falls outside the range of the bulk of the data set; it can have a huge impact on statistical results
covariance
when variables vary together in some relationship. E.g. both X and Y values move from low to high together.
• if X increases and Y decreases, this is also a pattern of co-variance
lurking variable
a third unmeasured variable that has a relationship to 2 variables and makes it appear that the measured variables are related to each other when actually they are related to the unmeasured variable.
regression
generates a relationship that explains how Y changes as a function of X
dependent or output variable
in terms of regression, the Y variable is the result of X (input variable)
independent or input variable
in terms of regression, the X variable that results in a certain outcome or Y variable
best fit line
a line in regression that minimizes the deviation from data points to itself in the vertical direction
intercept b0
the statistic in the line of best fit that gives the predicted value of y when x = 0
slope b1
the statistic in the line of best fit that describes the direction of the relationship
R^2
a value from regression: how much variability in the y variable is explained by the x variable. Value ranges from 0-1
prediction
a result of an observational or survey study analyzed using regression
causation
a result of a controlled or experimental study analyzed with regression
what graphs can be used to examine quantitative variables
box and whisker diagrams, stem and leaf diagrams and frequency histograms
what graphs can be used to examine qualitative variables
circle graphs and bar graphs
what must be present in a graph for it to be ‘good’
- a title
- labeled axes w/ units if available
- a key if available
is there a space in a bar graph of qualitative data
yes
do frequency histograms have spaces between the bars?
no
in grouped frequencies, what does n equal?
the sample size
what is used to measure central tendency
- mean
- median
- mode
- midrange
what is the mean
the average (sum of the observations divided by the number of observations)
what is the median
the middle number when the data are put in order
what is the mode
the most frequent observation
what is the midrange
the halfway point through the data (max + min)/2
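A short sketch of the four measures of central tendency in Python, assuming a small made-up data set (the values and variable names are illustrative only).

```python
from statistics import mean, median, mode

data = [2, 3, 3, 5, 7]  # made-up observations

data_mean = mean(data)                       # sum / count
data_median = median(data)                   # middle value of the ordered data
data_mode = mode(data)                       # most frequent observation
data_midrange = (max(data) + min(data)) / 2  # (max + min) / 2

print("mean:", data_mean)
print("median:", data_median)
print("mode:", data_mode)
print("midrange:", data_midrange)
```

For this data the four measures do not agree (4, 3, 3, 4.5), which is expected when the data are not symmetric.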
if a graph is symmetric and unimodal, are the mean, median, and midrange the same? the Mode?
yes, yes
if a graph is symmetric and bimodal, are the mean, median, and midrange the same? the Mode?
yes, no
what is the mean sensitive to?
outliers
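A quick illustration in Python of how one outlier moves the mean but barely moves the median. The incomes below are made-up numbers chosen for the example.

```python
from statistics import mean, median

incomes = [30, 32, 35, 38, 40]          # made-up values, in $1000s
incomes_with_outlier = incomes + [400]  # one extreme value added

mean_before = mean(incomes)
median_before = median(incomes)
mean_after = mean(incomes_with_outlier)
median_after = median(incomes_with_outlier)

print("mean:  ", mean_before, "->", round(mean_after, 2))
print("median:", median_before, "->", median_after)
```

The mean jumps from 35 to about 95.83, while the median only moves from 35 to 36.5.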
what is sample variance
the average squared difference between each observation and the sample mean (the sum is divided by n - 1), in sq. units
what is sample standard deviation
the square root of the sample variance
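A minimal sketch of sample variance and sample standard deviation in Python, assuming made-up data. The hand calculation divides by n - 1 and is cross-checked against the standard library's `statistics.variance` and `statistics.stdev`.

```python
from math import sqrt
from statistics import variance, stdev

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up observations

n = len(data)
x_bar = sum(data) / n  # sample mean

# Sum of squared deviations from the mean, divided by n - 1.
sample_variance = sum((x - x_bar) ** 2 for x in data) / (n - 1)
sample_sd = sqrt(sample_variance)

print("sample variance:", sample_variance, "(library:", variance(data), ")")
print("sample sd:      ", sample_sd, "(library:", stdev(data), ")")
```

The variance is in squared units of the data; taking the square root returns the standard deviation to the original units.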
in a density curve, the area under the curve represents what
100% of the data provided
what is the rounding rule
when you calculate a statistic, take the answer 1 decimal place further than the original data
what is a good way to hide the impact of a few large or small numbers? how can this be corrected?
report only the mean, which a few extreme values can skew. correct this by reporting the median
how can graphs be confusing
- not being drawn to scale
- using pictures or figures instead of bars
- using 3-d graphs
- misrepresentation
does correlation mean causation
NO NO NO
what variable can two seemingly correlated variables actually be correlated to?
a lurking variable instead of each other
what are we asking for when we use a scatter plot
- is there a pattern?
- how can we interpret that pattern?
what two ways can we read scatter plots
correlation and regression
what can r tell us?
the direction and tightness of the relationship between x and y
in correlation, does flipping the axes influence r?
no
what can affect r?
outliers
what are some things to think about in regards to correlation
- flipping axes does not influence r
- changing one axis by a constant does not change r
- outliers can influence r
- lurking variables may be the cause of correlation
- be sure you have the full range
- do not draw conclusions outside of the range given
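The first two points in the list above can be checked with a small Python sketch. The data are made up and `pearson_r` is a hand-written helper for illustration, not a course-provided function.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Linear correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]  # made-up paired data
y = [2, 4, 5, 4, 5]

r_xy = pearson_r(x, y)
r_yx = pearson_r(y, x)                            # axes flipped
r_shifted = pearson_r([v + 10 for v in x], y)     # constant added to x

print("r(x, y)      =", round(r_xy, 4))
print("r(y, x)      =", round(r_yx, 4))
print("r(x + 10, y) =", round(r_shifted, 4))
```

All three values match (about 0.7746), since r depends only on how the two variables vary together, not on which axis each is drawn on or where the axis starts.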
what is the goal of linear regression
to generate a relationship that explains how Y changes as a function of X
what does a best fit line do?
it minimizes deviations from data points to line in the VERTICAL DIRECTION
what is b0
the intercept
what is b1
the slope
what is R^2
how much variability in the Y variable is explained by x variable, ranges from 0-1. represents tightness, but does not explain direction
what explains direction in regression
the slope (b1)
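A minimal least-squares sketch in Python tying together b0, b1, and R^2, assuming made-up data. The helper names `fit_line` and `r_squared` are hypothetical, written only for this example.

```python
def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line y = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx               # slope: expected change in y per 1-unit change in x
    b0 = mean_y - b1 * mean_x    # intercept: predicted y when x = 0
    return b0, b1

def r_squared(xs, ys, b0, b1):
    """Share of the variability in y explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))  # vertical deviations
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

x = [1, 2, 3, 4, 5]  # made-up paired data
y = [2, 4, 5, 4, 5]

b0, b1 = fit_line(x, y)
r2 = r_squared(x, y, b0, b1)

print("b0 =", round(b0, 4))
print("b1 =", round(b1, 4))
print("R^2 =", round(r2, 4))
```

Here b1 = 0.6 is positive, so the direction is upward, and R^2 = 0.6 says the line explains 60% of the variability in y. Note that R^2 alone would be the same if the slope were negative.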
what is an experiment (in probability)
a process that gives 1 result
what is an outcome
one possible result of an experiment (the set of all possible results is the sample space)
what is an event
the outcome (or set of outcomes) of interest
what is probability
the likelihood of an event
what are three ways to find probability
- theoretically
- empirically
- subjectively
what is the law of large numbers
with repetition, empirical results will approach the expected theoretical probability
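A small simulation sketch of the law of large numbers in Python, assuming a fair coin with theoretical P(heads) = 0.5. The seed and flip counts are arbitrary choices so the run is repeatable.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

def heads_proportion(n_flips):
    """Empirical probability of heads in n_flips simulated fair-coin flips."""
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

# Empirical probability for increasing numbers of repetitions.
for n in (10, 100, 10_000):
    print(n, "flips:", heads_proportion(n))

large_run = heads_proportion(100_000)
print(100_000, "flips:", large_run)
```

With few flips the empirical proportion can sit well away from 0.5; with many flips it settles close to the theoretical value.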
what 4 tools are given to think about probability
- tree diagrams (can’t directly calc. prob.)
- venn diagrams (can)
- contingency tables (can)
- sample spaces (can)
survey or experiment
A researcher watched 100 people purchase soda at a vending machine and recorded whether they chose regular or diet soda.
survey
survey or experiment
Emergency room visitors complaining of stomach pain were randomly assigned to either a new drug treatment or a placebo.
experiment
survey or experiment
A researcher compares the medical records for 100 people that live near high-power electric lines to 100 people that don’t live near such lines.
survey
survey or experiment
A researcher identified 20 students that got vigorous exercise at recess and then compared the grades of these students to a separate group of 20 who did not get vigorous exercise.
survey
what is one thing the frequency histograms show that relative frequency histograms do not?
sample size
The law of large numbers is used to calculate what?
empirical probability
Parameter of sample size
N
statistic of sample size
n
parameter of mean
μ (mu)
statistic of mean
x̄ (x with a bar on top)
parameter of standard deviation
σ (sigma)
statistic of standard deviation
s
list the 4 aspects of a good sample
- random
- independent
- no bias
- covers entire population
T/F a normal curve is always unimodal
true
T/F if P(A) = P(Ā), then P(A) = 0.5
true
t/f If two events are mutually exclusive, they are also independent
False
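A worked check of this card in Python, assuming one roll of a fair die and two made-up events A and B. Mutually exclusive means P(A and B) = 0; independent means P(A and B) = P(A) * P(B).

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # one roll of a fair die
A = {1, 2}                         # hypothetical event: roll a 1 or 2
B = {3, 4}                         # hypothetical event: roll a 3 or 4

def prob(event):
    """Theoretical probability with equally likely outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))

p_a_and_b = prob(A & B)            # A and B share no outcomes
p_a_times_p_b = prob(A) * prob(B)

mutually_exclusive = p_a_and_b == 0
independent = p_a_and_b == p_a_times_p_b

print("P(A and B)  =", p_a_and_b)
print("P(A) * P(B) =", p_a_times_p_b)
print("mutually exclusive:", mutually_exclusive)
print("independent:", independent)
```

Since P(A and B) = 0 but P(A) * P(B) = 1/9, these events are mutually exclusive and not independent: knowing A happened tells you B did not.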
a scatter diagram is an appropriate display of bivariate data when both variables are quantitative
true
if the data points form a straight horizontal or vertical line, there is a strong correlation between the 2 variables.
False
Which of the following would not be appropriate when considering 2 qualitative variables?
2 histograms
2 bar graphs
2 circle graphs
2 histograms
T/F the value of the linear regression slope estimate will vary between -1 and +1
false
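A brief counterexample in Python, assuming made-up data where y is exactly 10 times x. Only r is confined to the range -1 to +1; the slope takes whatever value the units of x and y require.

```python
from math import sqrt

x = [1, 2, 3, 4]      # made-up data
y = [10, 20, 30, 40]  # y = 10 * x exactly

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
sxx = sum((a - mean_x) ** 2 for a in x)
syy = sum((b - mean_y) ** 2 for b in y)

slope = sxy / sxx           # regression slope estimate b1
r = sxy / sqrt(sxx * syy)   # linear correlation coefficient

print("slope b1 =", slope)
print("r =", r)
```

The slope is 10 while r is 1, so the slope estimate is clearly not limited to values between -1 and +1.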
t/f the data is the list of observations recorded for each of the experimental units in your study
true