Test 1 Flashcards
Two types of observations are always discrete: __ variables and __ variables.
nominal, ordinal
Nominal variables are used for observations that have __ or __ as their values.
You can’t __.
ex: stats data
u__ categories; group m__
(ex: __ of cookie)
categories, names
You can’t order
For example, when entering data into a statistical computer program, a researcher might code male participants with the number 1 and female participants with the number 2. In this case, the numbers only identify the gender category for each participant. They do not imply any other meaning
unordered, membership, type
Ordinal variables are used for observations that have __ (i.e., __,__,__) as their values.
You can __
examples:
(ex: __ of favorite cookie)
rankings
1st, 2nd, 3rd
order
top 3 favorite cookies, class rank, 3rd place in a race, hair length (short, medium, long)
ranking
Two types of observations can be continuous: __ variables and __ variables.
Continuous means it can have __ __.
interval, ratio
decimal points
Interval variables are used for observations that have __ as their values; the distance (or interval) between pairs of consecutive numbers is assumed to be __.
Interval variables have no __ __.
Ex:
numbers, equal
meaningful zero
Temperature in Celsius, IQ score, RATING of cookie taste
A ratio variable is a variable that meets the criteria for an interval variable but also has a __ __ point.
ex:
have a “true z__”: the a__ of what is being m__.
interval, meaningful zero
mass of cookie, minutes to finish homework
zero, absence, measured
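A quick sketch of why the meaningful zero matters, using temperature. The two temperatures are made-up values for illustration, and `celsius_to_kelvin` is a hypothetical helper name; the conversion itself (add 273.15) is standard. Differences work on both scales, but a ratio such as "twice as much" only makes sense on the ratio scale.

```python
# Celsius is an interval scale: 0 °C does not mean "no temperature".
# Kelvin is a ratio scale: 0 K is absolute zero, a true zero point.

def celsius_to_kelvin(celsius):
    return celsius + 273.15

cold_c, warm_c = 10.0, 20.0  # illustrative values, not from the flashcards

# Differences (intervals) are meaningful on both scales.
diff_c = warm_c - cold_c
diff_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cold_c)

# Ratios are only meaningful on the ratio scale.
naive_ratio = warm_c / cold_c  # 2.0, but 20 °C is not "twice as hot"
true_ratio = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cold_c)

print(diff_c, round(diff_k, 2))
print(naive_ratio, round(true_ratio, 3))
```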
A scale variable is a variable that meets the criteria for an __ variable or a __ variable.
interval, ratio
An independent variable has at least __ levels that we either m___ or o___ to determine its effects on the ___ variable.
two
manipulate, observe
dependent
The dependent variable is the o___ variable that we hypothesize to be related to or caused by changes in the __ variable.
outcome
independent
A confounding variable is any variable that __ varies with the __ variable so that we cannot logically determine which variable is at work; also called a confound.
systematically
independent
A reliable measure is __.
ex: bathroom scale
consistent
If you were to weigh yourself on your bathroom scale now, and then again in an hour, you would expect your weight to be almost exactly the same. If your weight, as shown on the scale, remains the same when you haven’t done anything to change it, then your bathroom scale is reliable
Descriptive statistics organize, summarize, and communicate a group of numerical observations. Descriptive statistics describe large amounts of data in a s__ n__ or in just a f__ n__.
ex: shark
single number, few numbers
National Geographic reported on their Web site that the average adult great white shark is 15 feet long, or 4.6 meters (2015). And that’s just the average. Some great white sharks grow to 20 feet, or 6 meters long! Most dorm rooms aren’t even that big. The average length of great white sharks is a descriptive statistic because it describes the lengths of many sharks in just one number
Inferential statistics use __ data to make estimates about the larger __.
used to generalize beyond our s__ to the p__; infer c__.
sample
population
sample, population, characteristics
A frequency table organizes the number of times each __ has occurred.
List steps: 4 total
value
1) Find highest and lowest score
2) create 2 columns: variable and frequency
3) list full range of values (include zero frequency values)
4) count and list number of scores at each value
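The four steps above can be sketched in a few lines. This assumes whole-number scores; `frequency_table` is a hypothetical helper name, the sample scores are made up, and listing values from highest to lowest is a layout choice rather than something the card specifies.

```python
from collections import Counter

def frequency_table(scores):
    """Return (value, frequency) rows from the highest value to the lowest."""
    counts = Counter(scores)
    lowest, highest = min(scores), max(scores)  # step 1
    # Steps 2-4: one row per value in the full range, zero frequencies included.
    return [(value, counts[value]) for value in range(highest, lowest - 1, -1)]

scores = [3, 5, 5, 2, 7, 5, 3]  # hypothetical whole-number scores
print("value frequency")
for value, freq in frequency_table(scores):
    print(f"{value:>5} {freq:>9}")
```

Note that 6 and 4 appear in the table with a frequency of 0, as step 3 requires.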
A __ __ describes the pattern of a set of numbers by displaying a count or proportion for each possible value of a variable.
frequency distribution
A __ ___ __ allows researchers to depict data visually by reporting the frequencies within a given interval rather than the frequencies for a specific value.
Use this if there are many __ __ or a __ range of data
grouped frequency table
decimal places
large
Steps to make a grouped frequency table:
1) find __ and __ scores.
2) get __ range of data.
- round numbers __, s__ to get the difference, and add _.
3) Take the given interval and __ the above number by the interval to get how many intervals to use (round __)
4) Make an i__ and f__ column
highest, lowest
full
down, subtract, 1
divide, up
interval, frequency
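A minimal sketch of the grouped frequency table steps. The function name, the sample scores, and the interval width of 10 are all hypothetical. One assumption goes beyond the card: the first interval starts at the rounded-down lowest score, and each interval includes its lower bound but not its upper bound.

```python
import math

def grouped_frequency_table(scores, width):
    """Return (interval_start, interval_end, frequency) rows."""
    lowest, highest = min(scores), max(scores)                 # step 1
    full_range = math.floor(highest) - math.floor(lowest) + 1  # step 2
    n_intervals = math.ceil(full_range / width)                # step 3
    bottom = math.floor(lowest)  # assumption: first interval starts here
    rows = []
    for i in range(n_intervals):                               # step 4
        start = bottom + i * width
        end = start + width
        freq = sum(1 for s in scores if start <= s < end)
        rows.append((start, end, freq))
    return rows

scores = [12.3, 15.8, 21.0, 29.9, 33.4, 47.9]  # hypothetical scores
print("interval   frequency")
for start, end, freq in grouped_frequency_table(scores, 10):
    print(f"{start}-{end}      {freq}")
```

Here the full range is 47 - 12 + 1 = 36, and 36 / 10 rounds up to 4 intervals.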
A histogram is a graph that looks like a bar graph but depicts just __ variable, usually based on __ data, with the values of the variable on the _-__ (horizontal) and the frequencies on the _-__ (vertical).
(x is along the __, y is along the __)
one
scale
x-axis, y-axis
bottom, side
In a frequency polygon, a dot is placed __ each score or interval to indicate the frequency, and the dots are __.
above
connected
Stem and leaf plots allow for a __ and __ display of all data.
They allow you to regenerate __ data by looking.
You can see the __ of the data.
You can easily __ groups.
numerical, visual
raw
shape
compare
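A small sketch of a stem and leaf plot, assuming non-negative whole-number scores where the stem is everything but the last digit and the leaf is the last digit. `stem_and_leaf` is a hypothetical helper and the scores are made up. For simplicity, stems with no scores are skipped.

```python
from collections import defaultdict

def stem_and_leaf(scores):
    """Map each stem (tens and up) to its sorted leaves (ones digits)."""
    plot = defaultdict(list)
    for score in sorted(scores):
        stem, leaf = divmod(score, 10)
        plot[stem].append(leaf)
    return dict(plot)

scores = [52, 47, 61, 58, 47, 73, 65, 50]  # hypothetical whole-number scores
plot = stem_and_leaf(scores)
for stem in sorted(plot):
    leaves = " ".join(str(leaf) for leaf in plot[stem])
    print(f"{stem} | {leaves}")
```

Every raw score can be read back from the plot (stem 4 with leaves 7 7 is 47, 47), and the row lengths show the shape of the data.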
A normal distribution is a specific frequency distribution that is a __-shaped, s__, __ curve
bell-shaped, symmetric, unimodal
Skewed distributions are distributions in which one of the __ of the distribution is pulled away from the center.
tails
When the tail goes to the right, there is a __ skew.
May represent __ effects.
positive
floor
When the tail goes to the left, there is a __ skew.
May represent __ effects.
negative
ceiling
__ __, a situation in which a constraint prevents a variable from taking values below a certain point.
floor effect
___ __, a situation in which a constraint prevents a variable from taking on values above a given number.
ceiling effect
Graphs have positive and negative uses:
- they can a__ and s__ present information
- they can r__ and c__ complicated data
accurately, succinctly
reveal, conceal
the __ __ lie
- scaling to skew results
ex: word use
biased scale
using 3 positive words out of 5 options
the __ __ lie
-participants are pre-selected or self-selected
sneaky sample
the ___ lie
-assumes a value between 2 data points follows the same pattern.
interpolation
the ___ lie
-assumes knowledge outside of study
extrapolation
the __ __ lie
-uses scaling to distort portions of data
inaccurate values
A scatterplot is a graph that depicts the relation between two __ variables.
Label the horizontal x-axis with the name of the ___ variable and its possible values, starting with 0 if practical.
Label the vertical y-axis with the name of the __ variable and its possible values, starting with 0 if practical.
scale
independent
dependent
a linear relation between variables means that the relation between variables is best described by a __ line.
When the linear relation is __, the pattern of data points flows upward and to the right. When the linear relation is __, the pattern of data points flows downward and to the right
straight
positive
negative
A nonlinear relation between variables means that the relation between variables is best described by a line that __ or __ in some way.
breaks, curves
A line graph is used to illustrate the relation between two __ variables.
The first type of line graph, based on a scatterplot, is especially useful because the line of best fit minimizes the distances between the data points and that line. That allows us to use the x value to predict the y value, making a prediction based on only one piece of information.
A time plot, or time series plot, is a graph that plots a scale variable on the y-axis as it changes over an increment of time (for example, hour, day, century) labeled on the x-axis.
scale
We commit a Type _ error when we reject the null hypothesis but the null hypothesis is correct.
A Type _ error is like a false positive in a medical test.
1
1
We commit a Type _ error when we fail to reject the null hypothesis but the null hypothesis is false. A Type _ error is like a false negative in medical testing.
2
2
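The two error types can be laid out as a small decision table. This is only a sketch: `classify_decision` is a hypothetical helper, and in real research we never know for certain whether the null hypothesis is true.

```python
def classify_decision(null_is_true, rejected_null):
    """Label the outcome of a hypothesis test decision."""
    if rejected_null and null_is_true:
        return "Type I error (false positive)"
    if not rejected_null and not null_is_true:
        return "Type II error (false negative)"
    return "correct decision"

# Walk through all four combinations of truth and decision.
for null_is_true in (True, False):
    for rejected_null in (True, False):
        print(null_is_true, rejected_null, classify_decision(null_is_true, rejected_null))
```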
If you know that the probability a student is wearing another sport’s shirt is 0.05, how many students does that represent?
0.05 * 159 = 7.95 -> 8 students
If there are 40,000 people who attend your school, estimate the total number of people who would be wearing a college football shirt.
40000 x 0.62 = 24800
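Both shirt cards multiply a probability by a group size. A minimal sketch, where `expected_count` is a hypothetical helper name and the inputs (0.05 of 159 students, 0.62 of 40,000 people) come from the cards:

```python
def expected_count(probability, group_size):
    """Estimate how many members of a group an outcome applies to."""
    # Round to the nearest whole person, as the first card does (7.95 -> 8).
    return round(probability * group_size)

print(expected_count(0.05, 159))    # 8
print(expected_count(0.62, 40000))  # 24800
```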
Bar graph - when to use?
1 nominal and 1 scale variable
A bar graph is a visual depiction of data in which the __ variable is nominal or ordinal and the __ variable is scale.
The height of each bar typically represents the average value of the dependent variable for each category. The independent variable on the _-__ could be either __ (such as gender) or __ (such as Olympic medal winners who won gold, silver, or bronze medals).
independent
dependent
x-axis
nominal, ordinal
__ chart, a type of bar graph in which the categories along the x-axis are ordered from highest bar on the __ to lowest bar on the __.
This ordering allows easier comparisons and easier identification of the most common and least common categories.
pareto
left, right
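The Pareto ordering is just a sort by frequency, highest first. A short sketch with made-up cookie data; `pareto_order` is a hypothetical helper built on the standard library's `Counter.most_common`.

```python
from collections import Counter

def pareto_order(observations):
    """Return (category, count) pairs from most common to least common."""
    return Counter(observations).most_common()

# Hypothetical nominal data: favorite type of cookie.
cookies = ["chip", "oatmeal", "chip", "sugar", "chip", "oatmeal"]
for category, count in pareto_order(cookies):
    print(f"{category:<8} {'#' * count}")
```

The first row printed corresponds to the leftmost (tallest) bar of the chart.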
A pictorial graph is a visual depiction of data typically used for an independent variable with very __ levels (categories) and a __ dependent variable. Each level uses a picture or symbol to represent its value on the scale dependent variable
few
scale
A pie chart is a graph in the shape of a circle, with a slice for every __ (category) of the independent variable. The size of each slice represents the __ (or percentage) of each category.
level
proportion
Put the __ __ on the x-axis (p__)
Put the __ __ on the y-axis (p__)
independent variable (predictor)
dependent variable (predicted)
scatterplots are good for observing __ data point
line graphs show __
both are for __ __ variables
every
trends
two scale.
Creating a perfect graph general guidelines:
- use same __ as in body of text
- tell entire story without forcing reader to go back to __
- use clear __ and label __
- include _ or d__ s__ marks
terms
text
title, axes
0, double slash
When is the mode useful?
- if one score __
- if 2 are adjacent: a__
- if 2 are non-adjacent: __-___
- n__ data
- can also be useful because it's an actually __ score
dominates
average
bi-modal
nominal
occurring
Why use the median?
- it’s unaffected by __ scores
- can use for __, __, or __ data
downsides:
- can’t use easily in __
- not very __ from sample to sample.
extreme
ordinal, interval, ratio
equations
stable
Measures of central tendency tell us about the __ of __
center of distribution
What about the mean?
-most c__ used
disadvantages:
- influenced by __ scores
- value not an actually __ score
- interpretation requires __ or __ level data
advantages:
- can be manipulated __
- more __ across samples, better estimate of __ mean
commonly
extreme
occurring
interval, ratio
algebraically
stable, population
Where the mode, median, and mean are located relative to each other depends on the shape of the frequency distribution.
bimodal: describe
negatively skewed describe:
positively skewed:
unimodal:
bimodal: mode in center of each bell shape and median and mean directly in middle.
negatively skewed: tail to the left and from left to right mean, median, and mode
positively skewed: tail to the right and from left to right mode, median, mean
unimodal (symmetric, like the normal curve): mean=median=mode
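The skewed orderings can be checked on a small made-up data set with Python's `statistics` module. The scores are hypothetical; the second list is simply the first one mirrored so that its tail points left.

```python
import statistics

# Hypothetical positively skewed scores: long tail to the right.
right_tail = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]
mode = statistics.mode(right_tail)      # 2
median = statistics.median(right_tail)  # 3.0
mean = statistics.mean(right_tail)      # 4.6
print(mode, median, mean)               # mode < median < mean

# Mirror the scores to get a negatively skewed set: tail to the left.
left_tail = [16 - score for score in right_tail]
print(statistics.mean(left_tail),
      statistics.median(left_tail),
      statistics.mode(left_tail))       # mean < median < mode
```

In both cases the mean is pulled furthest toward the tail, because it is the measure most influenced by extreme scores.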
numbers that represent population= __
-population mean=__ this is g__
numbers that represent sample=__
-sample mean= __ or __ this is l__
parameters
μ (mu, the u-shaped Greek letter); Greek
statistics
M or x̄ (x with a line over the top); Latin
If .212 of the M&M’s were blue, how many blue were there?
x/2561=.212
.212(2561) = 542.932, or about 543
___=proportion expected over the long run
-decimal value
probability
The _____, in probability and statistics, states that as a sample size grows, its mean gets closer to the average of the whole population.
law of large numbers
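A simulation sketch of the law of large numbers, assuming a fair six-sided die (population mean 3.5). The die example, the helper name `mean_of_rolls`, and the sample sizes are illustrative choices, not part of the card.

```python
import random

def mean_of_rolls(n_rolls, rng):
    """Average of n_rolls simulated rolls of a fair six-sided die."""
    return sum(rng.randint(1, 6) for _ in range(n_rolls)) / n_rolls

rng = random.Random(0)  # fixed seed so repeated runs give the same numbers
for n in (10, 1_000, 100_000):
    # Larger samples should land closer to the population mean of 3.5.
    print(n, round(mean_of_rolls(n, rng), 3))
```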
___=number of successes divided by the number of trials
-decimal value
proportion
__=probability or proportion multiplied by 100
percentage
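The proportion and percentage definitions translate directly into arithmetic. A minimal sketch with hypothetical helper names and a made-up example of 3 successes in 12 trials:

```python
def proportion(successes, trials):
    """Number of successes divided by the number of trials (a decimal)."""
    return successes / trials

def percentage(prop):
    """A probability or proportion multiplied by 100."""
    return prop * 100

p = proportion(3, 12)    # hypothetical: 3 successes in 12 trials
print(p, percentage(p))  # 0.25 25.0
```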
Discrete Variables:
variables that can only take on s__ values.
(ex: w__ numbers)
no other values can exist b__ these numbers.
specific, whole
between
Continuous Variables:
variables that can take on a full r__ of v__.
(e.g., numbers out to several d__ places); an i__ number of potential values exists
range, values
decimal, infinite
Experiments: studies in which participants are r__ assigned to a c__ or l__ of one or more i__ variables. The d__ variable is measured.
Experiments are the gold standard of h__ testing because they are the best way to control c__ variables. Controlling confounding variables allows researchers to infer a c__–e__ relation between variables, rather than merely a systematic a__ between variables
randomly, condition, level, independent
dependent
hypothesis, confounding, cause-effect, association
Correlational Studies: variables are simply m__; usually still have c__ variables that make it inappropriate to make a c__ statement.
measured, confounding, causal
HOW TO INTERPRET A CORRELATION COEFFICIENT R:
Exactly –_. A perfect downhill (negative) linear relationship
–_. A strong downhill (negative) linear relationship
–_. A moderate downhill (negative) relationship
–_. A weak downhill (negative) linear relationship
_. No linear relationship
+_. A weak uphill (positive) linear relationship
+_. A moderate uphill (positive) relationship
+_. A strong uphill (positive) linear relationship
Exactly +_. A perfect uphill (positive) linear relationship
–1
–0.70
–0.50
–0.30
0
+0.30
+0.50
+0.70
+1
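A sketch that computes Pearson's r by hand and attaches the card's labels. Two things here are assumptions rather than part of the card: the benchmark values (0.30, 0.50, 0.70) are treated as lower cutoffs for each label, and anything below 0.30 in size is called "very weak or no linear relationship". The helper names and the data are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

def describe_r(r):
    """Label r, using the card's benchmarks as cutoffs (an assumption)."""
    size = abs(r)
    if size == 0:
        return "no linear relationship"
    direction = "uphill (positive)" if r > 0 else "downhill (negative)"
    if math.isclose(size, 1.0):
        strength = "perfect"
    elif size >= 0.70:
        strength = "strong"
    elif size >= 0.50:
        strength = "moderate"
    elif size >= 0.30:
        strength = "weak"
    else:
        return "very weak or no linear relationship"  # assumed label
    return f"{strength} {direction}"

xs = [1, 2, 3, 4, 5]  # hypothetical scale variable
ys = [5, 3, 4, 1, 2]  # hypothetical scale variable
r = pearson_r(xs, ys)
print(round(r, 2), describe_r(r))
```

For these made-up numbers r works out to -0.8, which the cutoffs label as a strong downhill (negative) relationship.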