Definitions Flashcards
measurement
- Assigning numbers or codes to aspects of objects or events according to rules
- Positioning observations along a numerical continuum
- Classifying observations into categories
Observation
Unit upon which measurement is made
Variable
Measurable characteristic that varies among persons, places, or objects
Nominal measurements
Observation variable that has two or more categories, with no intrinsic ordering to the categories. Nonparametric.
Examples: sex, blood type
aka. Categorical variable, attribute variable, qualitative variables
Ordinal measurements
Observation variable that has categories that can be put into rank order. Differs from interval b/c the spacing b/w values is not equal. Nonparametric.
Examples: stage of cancer (on a point scale); economic status (low, med, high)
Quantitative measurements
Observation variables are along meaningful numeric scale.
- Interval = equal spacing b/w values, but no absolute zero (i.e. Fahrenheit, Celsius)
- Ratio = has an absolute zero, so values can be meaningfully added (i.e. age, body weight, Kelvin)
aka, ratio/interval measurement, numeric variable, scale variable, continuous variable.
Surveys
Type of study used to quantify population characteristics. Relies on the “sampling” rule of statistics b/c data for the entire population are rarely available.
Simple Random Sample (SRS)
Randomly sample the population to collect data so that:
1) each population member has the same probability of being selected into the sample
2) the selection of any individual into the sample does not bias the selection of any other individual
aka. sampling independence
Cautions
Samples that tend to over- or under-represent certain segments of the population can bias survey results.
Undercoverage
Type of sample caution. Occurs when some groups in the source population are left out or underrepresented; undermines equal selection probabilities.
Volunteer Bias
Type of sample caution. Occurs b/c self-selected participants in a survey are atypical of the population. Ex: web survey volunteers may hold a particular viewpoint that causes them to participate.
Nonresponse Bias
Type of sample caution. Occurs when a large % of individuals refuse to participate in the survey; nonresponders differ from responders, which skews survey results.
Probability Sample
Each member of pop has known probability of being selected. Include SRS, stratified random samples, cluster samples, and multistage sampling
Stratified random sample
Draws independent SRSs from homogeneous “groups” or “strata.” Ex: divide the population into age groups.
Cluster samples
Randomly selects large units (clusters) consisting of smaller subunits. Ex: a list of household addresses, then study all individuals in each cluster.
Comparative study
Learns the relationship b/w an explanatory variable and a response variable. Compares a group exposed vs. not exposed to the explanatory factor.
- two types: Experimental and Non-Experimental (observational)
Experimental studies
Investigator assigns exposure to one group and not the other
Nonexperimental Studies
Investigator classifies groups as exposed or nonexposed w/o intervention. aka. observational studies
Explanatory Variable (IV)
Treatment or exposure that explains or predicts change in the response variable.
aka. independent variable (IV)
Response Variable (DV)
Outcome or response being investigated.
aka. dependent variable (DV)
Lurking variables
Extraneous factors
Confounding Variables
Distortion of the association b/w the explanatory variable and the response variable by the influence of extraneous factors.
Factors
Explanatory variables in experiments
Treatment
Specific set of factors applied to subject
Interaction
Factors in combination produce effects that could not be predicted by looking at the effect of the factors separately.
Trials
Experiments involving human subjects. Two types: Controlled and Randomized Controlled
Randomized control trial
Treatment assignment is based on chance. Helps sort out the effects of treatment from those of lurking variables.
Equipoise
Balanced doubt about benefits and risks
Discrete variable
Finite number of values b/w any 2 points
Continuous variable
infinite number of values b/w 2 points
Shape (graph)
Configuration of data points as they appear on a graph. Described in terms of:
- skewness: asymmetry (departure from mirror-image symmetry)
- modality: number of peaks
- kurtosis: “peakedness” of the distribution
Location (graph)
Distribution summarized by its center (Central tendency)
- Mean: center of distribution; the “arithmetic avg.” is the distribution’s balancing point
- Median
- Mode
Depth of data Point
Corresponds to its rank from either the top or bottom of an ordered list of values.
Spread (graph)
Refers to distribution/variability of data points.
Measures of Spread
- Range
- Quartiles
- Stnd. Dev.
- variance
Class intervals
Group data into intervals with equal or unequal spacing before tallying frequencies.
Endpoint convention: ensures each observation falls in exactly one interval; either
- include the left boundary and exclude the right, or
- include the right boundary and exclude the left
Relative Frequency
Proportion equation: frequency count / total.
Expressed in %
Cumulative Frequency
Proportion that falls in or below a certain level.
Equation: running sum of the relative frequencies up to that level.
Expressed in %
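A minimal Python sketch of the two frequency calculations above, using hypothetical counts for four class intervals:

```python
# Relative and cumulative frequency from raw frequency counts.
# Hypothetical counts for four class intervals.
freqs = [5, 10, 20, 15]
total = sum(freqs)

# Relative frequency: each count divided by the total, expressed as %.
rel_freq = [100 * f / total for f in freqs]

# Cumulative frequency: running sum of the relative frequencies.
cum_freq = []
running = 0.0
for rf in rel_freq:
    running += rf
    cum_freq.append(running)

print(rel_freq)   # [10.0, 20.0, 40.0, 30.0]
print(cum_freq)   # [10.0, 30.0, 70.0, 100.0]
```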
Bar Chart
Displays frequencies with bars whose heights correspond to the frequencies.
Best for categorical variables
Histogram
Bar chart for quantitative data with contiguous bars (no gaps); connecting the bar tops with a line gives a frequency polygon.
Best for Quantitative variables
Descriptive Statistics
Set of observations that describe the characteristics of a sample.
ex: Central tendency (mean, median, mode), Variability (Std. Dev., variance, range, quartiles)
Inferential Statistics
Set of statistical techniques that provide predictions about the population based on information in a sample from that population.
Univariate Statistics
Involve one variable at a time (i.e. age, height, weight)
Bivariate statistics
Involve two variables of the sample examined simultaneously (pre/post test)
Multivariate Statistics
Involve 2 or more variables in the same analysis
Stemplot
graphical technique that organizes data in a histogram-like display
mean
Arithmetic average of the data VALUES. Balancing point of a set. Highly susceptible to outliers and skew.
Formula:
- Sample: x̅ = (Σx)/n; Population: µ = (Σx)/N
Functions: 1) predict an individual value drawn at random from the sample, 2) predict a value drawn at random from the population
* Best to pair with the Std. Dev. for symmetrical distributions
Median
Midpoint of a distribution in CASES. More ROBUST (resilient to outliers and skew).
Formula: put values in order, calculate (n+1)/2, and count that many places to the midpoint.
* Best to pair with the IQR for asymmetrical distributions. Always Q2, the 50th percentile.
Mode
Most frequently occurring value in data set.
Useful only in large sets with repeating values.
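The three measures of central tendency above, checked with the standard-library `statistics` module on a hypothetical data set (note how the outlier pulls the mean but not the median):

```python
import statistics

data = [2, 3, 3, 5, 7, 8, 30]            # hypothetical set; 30 is an outlier

mean = statistics.mean(data)             # balancing point; pulled toward 30
median = statistics.median(data)         # midpoint; robust to the outlier
mode = statistics.mode(data)             # most frequently occurring value

print(mean, median, mode)
```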
Variability
Measure of spread. A fundamental interest of behavioral scientists.
Range
Measures the spread of a distribution. Simplest measure of variability: Max − Min. Limitations: known to be biased and highly unstable; increases w/ sample size. *Should always be supplemented with another measure of spread.
Quartile
Intuitive way to describe variability by dividing the data set into 4 segments:
- Q0 (min) = 0%
- Q1 (lower hinge) = 25%
- Q2 (median) = 50%
- Q3 (upper hinge) = 75%
- Q4 (max) = 100%
Find the MEDIAN to identify quartiles.
Hinges
The ordered array “folds” upon itself; the fold points are the hinges (roughly Q1 and Q3).
Interquartile Range
Summary measure of spread that captures the middle 50% of the data points in a set.
- 5-point summary (Q0 - Q4)
IQR = Q3 − Q1 (where Q1 is the median b/w Q0 and Q2, Q3 is the median b/w Q2 and Q4, and Q2 is the overall median)
Not sensitive to extreme values.
Box-and-Whiskers plot
Displays five-point summaries and “potential outliers” in graphical form.
aka. box plot.
box: spans IQR
Fences
Lower fence = Q1 − (1.5)IQR; Upper fence = Q3 + (1.5)IQR
- Values below the lower fence are “lower outside values”
- Values above the upper fence are “upper outside values”
- The smallest value inside the lower fence is the “lower inside value”
- The largest value inside the upper fence is the “upper inside value”
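A short Python sketch of hinge quartiles, IQR, and fences on hypothetical data (quartiles computed as medians of the lower and upper halves, per the IQR card):

```python
import statistics

data = sorted([5, 7, 9, 12, 13, 14, 21, 100])   # hypothetical; 100 is suspect
n = len(data)
half = n // 2

q2 = statistics.median(data)                    # overall median
q1 = statistics.median(data[:half])             # median of lower half
q3 = statistics.median(data[half + n % 2:])     # median of upper half

iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outside = [x for x in data if x < lower_fence or x > upper_fence]

print(q1, q2, q3, iqr, outside)
```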
Variance
Common measure of spread.
Population: σ^2 = SS/N; Sample: s^2 = SS/(n−1)*
SS = Sum of Squared deviations
*Subtract 1 from n to force a larger variance and SD (makes it an unbiased estimate)
Variability
- Always present average with variability as to not misrepresent data.
- 2 data sets can have the same average but different variability.
Standard Deviation
Common measure of spread. Unbiased estimate for samples (good scientists are CONSERVATIVE!)
Formula: Square root of variance
- Sensitive to outliers and skews
- Useful for making comparisons
- the smaller the SD, the more HOMOGENEOUS the set
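Variance and standard deviation worked by hand in Python on hypothetical scores, dividing SS by (n − 1) as described above:

```python
import math

data = [4, 8, 6, 5, 3, 7]                 # hypothetical scores
n = len(data)
mean = sum(data) / n

ss = sum((x - mean) ** 2 for x in data)   # sum of squared deviations
sample_var = ss / (n - 1)                 # s^2, unbiased estimate
sample_sd = math.sqrt(sample_var)         # s

print(mean, ss, sample_var, sample_sd)
```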
Chebyshev’s Rule
For data sets: at least 3/4 of the data points lie within two std. devs. of the mean.
Normal Rule
For data sets: applies only to distributions with a particular NORMAL shape.
- 68.3% of data points lie within mean ± 1 std. dev.
- 95.4% of data points lie within mean ± 2 std. devs.
- 99.7% of data points lie within mean ± 3 std. devs.
aka. 68-95-99.7 rule
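The 68-95-99.7 figures can be verified numerically: for a standard Normal, Pr(|Z| ≤ k) = erf(k/√2). A quick Python check:

```python
import math

def within_k_sd(k):
    """Proportion of a Normal distribution within k std. devs. of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(100 * within_k_sd(k), 1))   # prints 68.3, 95.4, 99.7
```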
Properties of the Normal Curve:
- Symmetrical
- unimodal
- bell-shaped
- mean, median, and mode are equal
Symmetrical vs. Asymmetrical Distribution
Symmetrical: Mean = Median
Asymmetrical: Mean ≠ Median
- Positive skew: Mean > Median
- Negative skew: Mean < Median
Sum of Squares
Each data point’s deviation from the data-set mean, squared, then all summed. aka. SS = Σ(X − Xbar)^2
Computational formula: SS = ΣX^2 − (ΣX)^2/n
1) Sum the data points, square the total, then divide by n
2) Square each data point, then sum the squares
3) SS = (step 2) − (step 1)
*Mathematically the same as above; needed for SPSS.
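Both SS formulas give the same answer, which is easy to confirm in Python on a hypothetical set:

```python
data = [3, 5, 7, 9]                      # hypothetical data points
n = len(data)
mean = sum(data) / n

# Definitional formula: sum of squared deviations from the mean.
ss_definitional = sum((x - mean) ** 2 for x in data)

# Computational formula: SS = sum(x^2) - (sum(x))^2 / n.
ss_computational = sum(x ** 2 for x in data) - sum(data) ** 2 / n

print(ss_definitional, ss_computational)   # both 20.0
```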
Probability
proportion of times an event is expected to occur.
Between 0 (never) and 1 (always)
Founded on relative frequencies.
Probability: random variable
Numerical quantity that takes on different values depending on chance
Probability: population
set of all possible outcomes for a random variable
Probability: Event
An outcome or set of outcomes for a random variable
Probability: Discrete random variables
Countable set of possible outcomes; fractional units are not possible. Ex: the # of leukemia cases in the US in 1995; the # of successes in n independent treatments.
Probability: Continuous Random variable
Outcome quantities with an unbroken continuum of possible values. Ex: the amount of time it takes to complete a task; the weight or height of a newborn.
4 Properties of probability functions
1) Range of probabilities: individual probabilities are never less than 0 and never more than 1. 0 ≤ Pr(A) ≤ 1
2) Total probability: probabilities in the sample space must sum to 1. Pr(S) = 1
3) Complements: the probability of a complement equals 1 minus the probability of the event. Pr(Ā) = 1 − Pr(A)
4) Disjoint events: events are disjoint if they cannot occur concurrently; then Pr(A or B) = Pr(A) + Pr(B)
Z score
States the number of std. devs. by which the original score lies above or below the mean of a normal curve.
Formula: z = (x − x̅)/s
- The z distribution is aka. the standard Normal curve: mean = 0, s = 1
- Method to interpret a raw score; takes into account the mean value and variability of the set of raw scores.
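A small Python example of converting raw scores to z-scores (hypothetical test scores):

```python
import statistics

data = [70, 80, 90, 100, 110]            # hypothetical raw scores
mean = statistics.mean(data)
s = statistics.stdev(data)               # sample standard deviation

z_scores = [(x - mean) / s for x in data]
print(z_scores)                          # z-scores always average to 0
```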
Types of scores
- Raw score (x): individual observed scores on measured variables
- Deviation score (x − x̅)
- Standard score (z)
Normal Curve
- Bell-shaped, symmetrical, unimodal
- Same Mean, Median, and Mode
- Precise relationship b/w area under the curve and Std. Dev.
Law of Probability
Statistical framework that allows researchers to determine how likely it is that research findings based on sample data are VALID. The proportion of times an event is expected to occur in the population; probability ranges from 0 to 1.
Inference
Act of using data in a sample to make generalizations about its population.
Goals:
- hypothesis testing
- estimate value of population parameters
Statistical Population
Entire collection of values that conclusions are drawn from.
Hypothetical Population
Infinitely large population of potential values that could ensue from a study.
Parameters vs. statistics
Parameter: numerical characteristic of a statistical population (population level)
Statistic: value calculated in a sample (sample level)
- use different symbols (i.e. µ, σ vs. x̅, s for the mean and SD)
Statistic –> statistical inference –> Parameter –> Random selection –> Statistic
Sampling distribution of a mean
The hypothetical distribution of means from all possible samples of size n taken from the same population.
Characteristics:
- follows central limit theorem
- unbiased estimator of population mean.
- Sample means are less variable than the distribution of individual values (square root law)
Central Limit Theorem
Sampling distribution of x̅ tends toward Normality even when the underlying population is not Normal
Note: the sampling distribution also gets narrower as the sample size increases (square root law)
Standard error of the mean (SE)
Standard Deviation of x̅
Formula: SE(x̅) = σ/√n
Law of large numbers: As an SRS gets larger and larger, its sample mean x̅ gets closer and closer to the true value of pop. mean.
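The square root law behind the standard error is easy to see numerically; σ = 15 below is just a hypothetical population SD:

```python
import math

sigma = 15.0                       # hypothetical population SD

for n in (25, 100, 400):
    se = sigma / math.sqrt(n)      # SE shrinks as n grows
    print(n, se)                   # quadrupling n halves the SE
```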
Null hypothesis
Statement of NO difference. H₀: µ = “some number”
- Reject H₀ when H₀ is true –> Type I error (α); when H₀ is false –> correct decision
- Fail to reject H₀ when H₀ is true –> correct decision; when H₀ is false –> Type II error (ß)
Alpha:
- Probability of Type I error
- Chance you are willing to take of mistakenly rejecting a true null hypothesis
Beta:
- Probability of Type II error
- Chance you are willing to take of mistakenly accepting a false null hypothesis
Alternative hypothesis
Statement that claims a difference from null hypothesis.
Ha: µ < or µ > the null value –> one-sided z-test
Ha: µ ≠ the null value –> two-sided z-test
Zstat
Statistical distance of the sample mean x̅ from the hypothesized value µ0; this provides the weight of evidence for or against H₀. Zstat = (x̅ − µ0)/SE(x̅)
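A one-sample z-statistic worked end to end in Python (all numbers hypothetical); the two-sided P-value uses the Normal CDF via `math.erf`:

```python
import math

x_bar, mu_0, sigma, n = 103.0, 100.0, 15.0, 36   # hypothetical inputs

se = sigma / math.sqrt(n)                # standard error of the mean
z_stat = (x_bar - mu_0) / se             # distance of x_bar from mu_0 in SEs

def normal_cdf(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

p_two_sided = 2 * (1 - normal_cdf(abs(z_stat)))
print(z_stat, p_two_sided)
```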
Point Estimation
- Provides a single estimate of the parameter
- No info regarding probability of accuracy; best “guesstimate”
Central Limit Theorem
If the population is not Normal, the distribution of sample means approaches a Normal distribution as the sample size gets larger.
Hypothesis Testing Steps
- Define hypotheses: H₀ and Ha
- Test statistic: calculate the SE and the z/t-stat
- Determine the P-value for the z/t-stat
- Decide significance: compare the P-value to the chosen significance level (α). Statistically significant or not?
- State the conclusion
Interval Estimation
Provides a range of values (CI) that seeks to capture the parameter
- Confidence Interval between two limit values.
t-Test
Testing statistical hypotheses about µ when
1) σ is unknown
2) sample size is small (n < 30)
Degrees of Freedom (df)
Value indicating the # of independent pieces of info a sample can provide for purposes of statistical inference.
Determining CI for µ
x̅ ± t(df, 1 − α/2) × SE
The mean difference should fall between the upper and lower bounds.
Ex: 90% CI –> α = .10 –> α/2 = .05 –> 1 − .05 = .95
Look up in the t-table: df and P(.95)
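A sketch of the CI calculation in Python on hypothetical data; the critical value 2.262 is t(df = 9, .975) read from a t-table:

```python
import math
import statistics

data = [12, 14, 11, 13, 15, 12, 14, 13, 11, 15]   # hypothetical, n = 10
n = len(data)
x_bar = statistics.mean(data)
se = statistics.stdev(data) / math.sqrt(n)        # s / sqrt(n)

t_crit = 2.262                 # t for df = n - 1 = 9, two-sided alpha = .05
ci = (x_bar - t_crit * se, x_bar + t_crit * se)
print(x_bar, ci)
```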
Single Sample
Reflects the experience of a single group. NO control group, but results are compared to norms or expected values.
Paired Sample
Uses data from two samples in which each data point in the first sample is matched to a data point in the 2nd sample.
Ex. Pre- and Post-sample from same subject
Independent Samples t-Test
Use when comparing two samples in order to draw inferences about group differences in the population.
- Two levels of a nominal-level variable; the dependent variable approximates interval-scale characteristics. i.e. DV = # TV hrs; IV = males vs. females
- Assumption of equal variances
- The Std. Dev. of the sampling distribution of the difference is the standard error of the difference.
Independent Samples
Uses two samples from separate populations. Data points are unrelated.
Ex: Experimental study with treatment and control groups
ANOVA
One-way analysis of variance
- compares 3 or more groups defined by one factor.
- variation in the response is analyzed to understand group differences; used in place of multiple independent t-tests.
- Ho: µ1= µ2= … = µk
EX: patients assigned to three treatment groups and measured on stress score (DV) in reaction to treatment (IV)
Mean Square Between (MSB)
(ANOVA)
Quantifies variance of group means around the grand mean.
MSB = SSB/ dfB
SSB = n(group Xbar − grand Xbar)^2 + … –> (group mean − grand mean)^2 × group n, summed over groups
- measures variability between the groups compared to the grand mean.
Mean Square Within (MSW)
ANOVA
Quantifies variability of data points in a group around its mean.
MSW = SSW/ dfW
SSW = (x − group Xbar)^2 + … –> (individual point − group mean)^2, summed within each group, then all groups’ SS summed together
- Measures variability within each data group.
F-statistic
(ANOVA)
- Ratio of MSB and MSW.
- A large F-stat suggests the observed mean differences are NOT merely due to random noise.
- Fstat = MSB/MSW
- When converting the F-stat to a P-value, the DF are: numerator dfB / denominator dfW
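MSB, MSW, and F computed by hand in Python for three hypothetical groups:

```python
import statistics

groups = [[2, 3, 4], [5, 6, 7], [8, 9, 10]]       # hypothetical groups
k = len(groups)
all_points = [x for g in groups for x in g]
grand_mean = statistics.mean(all_points)

# Between groups: group-size-weighted squared distances of group means
# from the grand mean.
ssb = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
msb = ssb / (k - 1)                               # dfB = k - 1

# Within groups: squared distances of each point from its own group mean.
ssw = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
msw = ssw / (len(all_points) - k)                 # dfW = N - k

f_stat = msb / msw
print(msb, msw, f_stat)
```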
Levene Test
Tests the assumption of equal variances. Use when comparing two or more groups (samples).
H₀: σ1^2 = σ2^2 = σ3^2
Retain (fail to reject) the null when the p-value is greater than α.
Correlation Coefficient (r)
Strength of a linear relationship.
−1 ≤ r ≤ 1
Strength:
- Close to ±1: all points fall near a line (+1 upward slope, −1 downward slope)
- Close to 0: lack of linear correlation
Direction:
- Upward slope = positive number
- Downward slope = negative number
3 r’s:
- metric…
Coefficient of determination (r2)
Statistic that quantifies the proportion of variance in Y explained by X.
Expressed by converting r2 to a %: x% of the variance of Y is explained by X
Single Regression Line
Expresses the functional relationship b/w X and Y by fitting a line to observed data.
- Observed y = predicted y + residual
- Residual = observed y - predicted y
Least Squares Regression Line: drawn to minimize the sum of squared residuals
Formula: ŷ = a + bx; ŷ = predicted y, a = y-intercept of the regression line, b = slope coefficient
b = r(sy/sx)
a = Ybar − b(Xbar)
Notes:
- Not robust
- b shows the relationship b/w X and Y in the same units as measured; r is a unit-free measure of strength
- X must be IV; Y must be DV
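The slope and intercept formulas above, worked in Python on hypothetical (x, y) pairs:

```python
import math

x = [1, 2, 3, 4, 5]                       # hypothetical explanatory values
y = [2, 4, 5, 4, 5]                       # hypothetical responses
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)

r = sxy / math.sqrt(sxx * syy)            # correlation coefficient
b = r * math.sqrt(syy / sxx)              # slope: b = r * (sy / sx)
a = y_bar - b * x_bar                     # intercept: a = y_bar - b * x_bar

print(r, b, a)                            # b is also sxy / sxx
```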
Confidence Interval for Population Slope
Hypothesis:
- Ho: B = 0
- Ha: B ≠ 0
t-stat = b/ SEb
CI formula: b ± t(n−2, 1 − α/2) × SEb
- If “0” is captured in the CI for the population slope, the slope is NOT sig.
Multiple Regression
Addresses multiple explanatory variables (IVs) in relation to a response variable (DV).
IMPROVES prediction by using two or more variables to predict a dependent variable.
Formula: Y’ = a + b1X1+ b2X2 ….
Kurtosis
Refers to the “peakedness” of a distribution.
- Leptokurtic: narrow peak
- Platykurtic: flat peak (plateau)
Chi-Squared Test
- Measure of association b/w 2 nominal variables
- magnitude of Pearson Chi-Square reflects the amount of discrepancy between observed frequencies and expected frequencies.
- does not make any assumptions about the shape of the distribution nor about the homogeneity of variances.
Formula: χ2 = Σ (Observed − Expected)^2 / Expected
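The chi-square statistic computed in Python for a hypothetical goodness-of-fit problem with four categories:

```python
observed = [30, 20, 10, 40]     # hypothetical observed counts
expected = [25, 25, 25, 25]     # expected counts under the null

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(chi_sq)                   # 20.0
```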
PARAMETRIC VERSUS NONPARAMETRIC STATISTICS
- Use nonparametric stats when:
- the parametric assumptions cannot be justified: normal distribution, equal variances, etc.
- the data as gathered are measured on nominal or ordinal scales
Properties of Sampling distribution
- mean of a sampling distribution of means will be the same as the mean of scores in the population (µ).
- Central Limit Theorem
- Allows us to determine the probability that the particular sample obtained will be unrepresentative.
One-Sample Z test
- Used to compare a sample mean to a (hypothesized) population mean and determine how likely (chance) it is that the sample came from that population.
- Compare the probability associated with statistical results (i.e. probability of chance) with a predetermined alpha level.