vocab definitions Flashcards

1
Q

sample of convenience

A

a collection of individuals that happen to be available at the time

2
Q

variable

A

a measured characteristic on individuals from a population under study

3
Q

data

A

measurements of one or more variables made on a collection of individuals

4
Q

explanatory variable

A

a variable we use to predict or explain a response variable

5
Q

response variable

A

a variable that is predicted or explained from an explanatory variable

6
Q

populations

A

a group of all individuals or groups that you want to study

7
Q

sample

A

a subset ideally randomly chosen from a population you wish to study

8
Q

parameters

A

quantities describing a population (for example its mean or a proportion); the things we want to know about the population

9
Q

estimates

A

quantities calculated from a sample to help understand parameters

10
Q

bias

A

a systematic discrepancy between estimates and the true population characteristic

11
Q

volunteer bias

A

volunteers for a study are likely to be different on average from the population

12
Q

sampling error

A

the chance difference between an estimate and the truth (the population value)

13
Q

precision

A

the spread of estimates resulting from sampling error
-gives a similar answer repeatedly

14
Q

accurate or unbiased

A

the average of estimates that are obtained is on the true population value
-accuracy (on average gets the correct answer)

15
Q

random sample

A

in a random sample, each member of a population has an equal and independent chance of being selected

16
Q

categorical variables (attribute or qualitative variables)

A

describe membership in a category or group

17
Q

numerical variable

A

a variable whose measurements on individuals are quantitative and have magnitude (numbers)

18
Q

continuous

A

numerical data that can take on any real-number value within some range. Between any two values of a continuous variable, an infinite number of other values are possible.

19
Q

discrete

A

numerical data that come in indivisible units. Example: number of amino acids in a protein and numerical rating of a statistics professor in a student evaluation are discrete numerical measurements

20
Q

frequency

A

the number of observations having a particular value of the measurement

21
Q

frequency distribution

A

describes the number of times each value of a variable occurs in a sample

22
Q

independence

A

two events are independent if the occurrence of one gives no information about whether the second will occur

23
Q

multiplication principle

A

if two events A and B are independent, then Pr[A and B] = Pr[A] x Pr[B]

24
Q

The addition principle

A

If two events A and B are mutually exclusive, then Pr[A or B]= Pr[A] + Pr[B]
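
The multiplication and addition principles can be checked by counting outcomes. Below is a minimal Python sketch; the fair six-sided die, the events, and the helper name `pr` are illustrative assumptions, not part of the cards.

```python
from fractions import Fraction

# Sample space: one roll of a fair six-sided die (an illustrative choice)
outcomes = range(1, 7)

def pr(event):
    """Probability of an event, given as a true/false rule on one outcome."""
    favourable = sum(1 for o in outcomes if event(o))
    return Fraction(favourable, len(outcomes))

# Independent events: A = "roll is even", B = "roll is 1 or 2"
pr_a = pr(lambda o: o % 2 == 0)
pr_b = pr(lambda o: o <= 2)
pr_a_and_b = pr(lambda o: o % 2 == 0 and o <= 2)
print(pr_a_and_b == pr_a * pr_b)   # multiplication principle

# Mutually exclusive events: C = "roll is 1", D = "roll is 2"
pr_c = pr(lambda o: o == 1)
pr_d = pr(lambda o: o == 2)
pr_c_or_d = pr(lambda o: o == 1 or o == 2)
print(pr_c_or_d == pr_c + pr_d)    # addition principle
```

Exact fractions are used so the equalities are not blurred by floating-point rounding.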

25
Probability distribution
A probability distribution describes the true relative frequency of all possible values of a random variable
26
Mutually exclusive
if two events are mutually exclusive, they cannot both be true: Pr[A and B] = 0
27
probability
The probability of an event is its true relative frequency: the proportion of times the event would occur if we repeated the same process over and over
28
pseudoreplication
the error that occurs when samples are not independent, but they are treated as though they are
29
standard error
the standard error of an estimate is the standard deviation of its sampling distribution. It predicts the sampling error of the estimate
30
standard error of an estimate
the standard deviation of its sampling distribution. It predicts the sampling error of the estimate
31
conditional probability
the conditional probability of an event is the probability of that event occurring given that a condition is met. Pr[X|Y]
32
confidence interval
the 95% confidence interval provides a plausible range for a parameter. All values for the parameter lying within the interval are plausible, given the data, whereas those outside are unlikely
33
The 2SE rule of thumb
the interval from Ybar - 2 SE_Ybar to Ybar + 2 SE_Ybar provides a rough estimate of the 95% confidence interval for the mean
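
A minimal Python sketch of the 2SE rule of thumb. The measurements and variable names are made up for illustration; the standard error is estimated as s / sqrt(n).

```python
import math
import statistics

# Hypothetical sample measurements (illustrative values only)
sample = [4.1, 5.0, 4.7, 5.3, 4.9, 5.6, 4.4, 5.0]

n = len(sample)
y_bar = statistics.mean(sample)   # sample mean
s = statistics.stdev(sample)      # sample standard deviation (divides by n - 1)
se = s / math.sqrt(n)             # estimated standard error of the mean

# Rough 95% confidence interval from the 2SE rule of thumb
lower, upper = y_bar - 2 * se, y_bar + 2 * se
print(f"mean = {y_bar:.3f}, SE = {se:.3f}, rough 95% CI = ({lower:.3f}, {upper:.3f})")
```

This is only the rough rule; an exact interval for a small sample would use a critical value somewhat larger than 2.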
34
What does a x^2 goodness-of-fit test do?
compares count data to a model of the expected frequencies of a set of categories
-it is an approximation (don't use it when there is only a small amount of data)
-H0: the data come from a specified probability distribution
-x^2 = sum over all classes of (observed - expected)^2 / expected
35
Degrees of freedom
the number of degrees of freedom of a test specifies which of a family of distributions to use for x^2
df = number of categories - number of parameters estimated from the data - 1
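
As a rough worked example of the x^2 statistic and its degrees of freedom, here is a minimal Python sketch. The observed counts and the equal-proportions null hypothesis are made up for illustration; they do not come from the cards.

```python
# Hypothetical observed counts in four categories (illustrative values only)
observed = [30, 20, 25, 25]
# H0 assumed for this sketch: all four categories are equally likely
expected_proportions = [0.25, 0.25, 0.25, 0.25]

n = sum(observed)
expected = [p * n for p in expected_proportions]

# x^2 = sum over all classes of (observed - expected)^2 / expected
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# No parameters were estimated from the data here, so df = categories - 0 - 1
params_estimated = 0
df = len(observed) - params_estimated - 1
print(f"x^2 = {chi2:.2f}, df = {df}")
```

The statistic would then be compared with the x^2 distribution with `df` degrees of freedom to get a P-value, for example from a table or a statistics package.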
36
Critical value
the value of the test statistic where P= alpha
37
What are test statistics?
A test statistic is a number calculated from the data and the null hypothesis that can be compared to a standard distribution to find the P-value of the test
38
What are the assumptions of the x^2 test?
-the data are a random sample
-no more than 20% of categories have expected < 5
-no category has expected <= 1
when these conditions are not met, the approximations behind the x^2 test do not work
39
what is a discrete distribution?
a probability distribution describing a discrete numerical random variable
examples:
-number of heads from 10 flips of a coin
-number of flowers in a square meter
-number of disease outbreaks in a year
40
Poisson distribution
a mathematical probability distribution
-describes the probability that a certain number of events occur in a block of time or space, when those events happen independently of each other and occur with equal probability at every point in time or space
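
The card does not state the formula. The standard Poisson probability of X events is Pr[X] = e^(-mu) mu^X / X!, where mu is the mean number of events per block of time or space. A minimal Python sketch, with mu = 2 chosen only for illustration:

```python
import math

def poisson_pmf(x, mu):
    """Pr[X = x] for a Poisson distribution with mean mu."""
    return math.exp(-mu) * mu ** x / math.factorial(x)

mu = 2.0  # hypothetical mean number of events per block (illustrative)
for x in range(6):
    print(f"Pr[{x}] = {poisson_pmf(x, mu):.4f}")
```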
41
x^2 contingency analysis
tests the independence of two or more categorical variables
42
Fisher's exact test
for 2x2 contingency analysis
-does not make assumptions about the size of expectations
-use when you can't do x^2 contingency analysis
-don't need to do by hand
43
odds
the probability of success divided by the probability of failure
44
Odds ratio
the odds of success in one group divided by the odds of success in another group
-OR < 1 means the odds of the bad thing happening are lower in the first group
-OR > 1 means the odds of the bad thing happening are higher in the first group
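
A minimal Python sketch of odds and the odds ratio. The two groups and their counts are hypothetical, chosen only to show the arithmetic.

```python
def odds(p):
    """Odds = probability of success divided by probability of failure."""
    return p / (1 - p)

# Hypothetical data: 10 of 100 "successes" in group 1, 20 of 100 in group 2
p1 = 10 / 100
p2 = 20 / 100

odds_ratio = odds(p1) / odds(p2)
print(f"odds 1 = {odds(p1):.3f}, odds 2 = {odds(p2):.3f}, OR = {odds_ratio:.3f}")
```

Here the odds ratio is below 1, so the event is less likely in group 1 than in group 2.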
45
properties of a good sample
-independent selection of individuals
-random selection of individuals
-sufficiently large
46
sampling error
The difference between the estimate and average value of the estimate
47
larger samples on average will have ____ sampling error
smaller
48
best way to graph a numerical variable frequency
histogram
49
cumulative frequency distribution
The cumulative frequency of a value is the proportion of individuals equal to or less than that value
-when graphed, it goes from 0 to 1 on the y axis and never decreases
50
best way to show association between two categorical variables
contingency table, grouped bar graph, mosaic plot
51
best way to show association between categorical and numerical variable
multiple histograms
52
best way to show association between two numerical variables
scatter plot
53
how to calculate mean
Ybar = (sum of Yi) / n
n = sample size
54
median
The median is the middle measurement in a set of ordered data
55
Mode
the mode is the most frequent measurement
56
Range
the maximum minus the minimum
57
Small samples tend to give ___ estimates of the range than large samples. So the sample range is a _______ of the true range of the population
Small samples tend to give _lower_ estimates of the range than large samples. So the sample range is a _biased estimator_ of the true range of the population
58
Variance in a population
sigma^2 = sum of (Yi - mu)^2 / N
N = number of individuals in the population
mu = true mean of the population
59
Sample variance
s^2 = sum of (Yi - Ybar)^2 / (n - 1)
n = sample size
Ybar = sample mean
60
Standard deviation (SD)
the positive square root of the variance
-sigma is the true standard deviation
-s is the sample standard deviation
s = square root of s^2 = sqrt(sum of (Yi - Ybar)^2 / (n - 1))
61
coefficient of variation (CV)
CV = 100% x s / Ybar
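
The sample variance, standard deviation, and coefficient of variation can be computed step by step. A minimal Python sketch with made-up data (the values are illustrative only):

```python
import math

# Hypothetical sample (illustrative values only)
sample = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(sample)
y_bar = sum(sample) / n                                 # sample mean
s2 = sum((y - y_bar) ** 2 for y in sample) / (n - 1)    # sample variance
s = math.sqrt(s2)                                       # sample standard deviation
cv = 100 * s / y_bar                                    # coefficient of variation, in percent

print(f"mean = {y_bar:.2f}, s^2 = {s2:.3f}, s = {s:.3f}, CV = {cv:.1f}%")
```

Note the division by n - 1 rather than n: this is what distinguishes the sample variance from the population formula.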
62
skew
a measurement of asymmetry
-skew refers to the pointy tail of a distribution
63
Standard error of the mean
sigma_Ybar = sigma / sqrt(n)
64
Estimate of the standard error of the mean
SE_Ybar = s / sqrt(n)
gives us some knowledge of the likely difference between our sample mean and the true population mean
65
law of total probability
Pr[X] = sum over all values of Y of Pr[X|Y] Pr[Y]
66
probability of a positive result using the law of total probability (example if the events are not independent)
Pr[positive result] = Pr[positive result|X] Pr[X] + Pr[positive result|Y] Pr[Y]
67
Bayes theorem
Pr[A|B] = Pr[B|A] Pr[A] / Pr[B]
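
The law of total probability and Bayes' theorem are often used together, for example with a diagnostic test. A minimal Python sketch; the prevalence and test error rates are hypothetical numbers chosen only for illustration.

```python
# Hypothetical inputs (illustrative values only)
pr_disease = 0.01              # Pr[disease]
pr_pos_given_disease = 0.90    # Pr[positive | disease]
pr_pos_given_healthy = 0.05    # Pr[positive | no disease]

pr_healthy = 1 - pr_disease

# Law of total probability: sum Pr[positive | condition] Pr[condition] over both conditions
pr_pos = pr_pos_given_disease * pr_disease + pr_pos_given_healthy * pr_healthy

# Bayes' theorem: Pr[disease | positive] = Pr[positive | disease] Pr[disease] / Pr[positive]
pr_disease_given_pos = pr_pos_given_disease * pr_disease / pr_pos

print(f"Pr[positive] = {pr_pos:.4f}")
print(f"Pr[disease | positive] = {pr_disease_given_pos:.4f}")
```

With these made-up numbers, most positive results come from healthy individuals, because the condition is rare.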
68
what does hypothesis testing do
hypothesis testing asks how unusual it is to get data that differ from the null hypothesis. If the data would be quite unlikely under H0, we reject H0
69
hypotheses are about populations but are tested from? with the assumption?
samples, with the assumption that the sample is random
70
Null hypothesis
a specific statement about a population parameter made for the purposes of argument. usually the simplest statement
71
Alternative hypothesis
represents all other possible parameter values except that stated in the null hypothesis. Usually the statement of greatest interest
72
A good null hypothesis
would be interesting if proven wrong
73
What is the P-value?
the probability of getting the data, or something as or more unusual, if the null hypothesis were true
74
How do you find the P-value?
simulation, parametric tests, permutation
75
Statistical significance
The significance level, alpha, is a probability used as a criterion for rejecting the null hypothesis. If the P-value for a test is less than or equal to alpha, then the null hypothesis is rejected. Alpha is often 0.05
76
A large sample will tend to give an estimate with a ____ confidence interval. A larger sample will give ____ a false null hypothesis
A large sample will tend to give an estimate with a _smaller_ confidence interval. A larger sample will give _more power to reject_ a false null hypothesis
77
Type I error
Rejecting a true null hypothesis
-the probability of a Type I error is alpha (the significance level)
78
Type II error
Not rejecting a false null hypothesis
-the probability of a Type II error is beta
-the smaller beta is, the more power a test has
79
Power
The ability of a test to reject a false null hypothesis
Power = 1 - beta
80
Most tests are ___ tailed tests which means...
most tests are two-tailed tests, which means that a deviation in either direction would reject the null hypothesis. Normally alpha is divided into alpha/2 on one side and alpha/2 on the other
81
One-tailed tests are
only used when the other tail is nonsensical
example: comparing grades on a multiple choice test to those expected by random guessing
82
Critical value
The value of a test statistic beyond which the null hypothesis can be rejected
83
If a hypothesis test rejects a null hypothesis (P < 0.05), the value proposed by the null hypothesis is
outside the 95% confidence interval
84
confounding variable
an unmeasured variable that may be the cause of both X and Y
85
a proportion
a fraction of individuals having a particular attribute
86
binomial distribution
describes the probability of a given number of successes from a fixed number of independent trials
Pr[X] = (n choose X) p^X (1-p)^(n-X)
n = number of trials; p = probability of success
87
properties of the binomial distribution
mean and variance of the number of successes:
mu = np
sigma^2 = np(1-p)
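
The binomial formula and its mean and variance can be checked numerically. A minimal Python sketch, with n = 10 and p = 0.5 chosen only for illustration:

```python
import math

def binomial_pmf(x, n, p):
    """Pr[X = x]: probability of x successes in n independent trials."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 10, 0.5  # hypothetical number of trials and success probability
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]

# Mean and variance computed directly from the distribution
mean = sum(x * pr for x, pr in enumerate(probs))
variance = sum((x - mean) ** 2 * pr for x, pr in enumerate(probs))

print(f"mean = {mean:.3f} (np = {n * p})")
print(f"variance = {variance:.3f} (np(1-p) = {n * p * (1 - p)})")
```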
88
proportion of successes in a sample
phat = X/n
the hat (^) shows that this is an estimate of p
89
properties of the sample proportion (phat)
mean: p
variance: p(1-p)/n
90
Agresti-Coull confidence interval
p' - 1.96 sqrt(p'(1-p')/(n+4)) <= p <= p' + 1.96 sqrt(p'(1-p')/(n+4))
where p' = (X+2)/(n+4)
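
A minimal Python sketch of the Agresti-Coull 95% interval as written on the card. The function name `agresti_coull_95` and the example counts (X = 2 successes in n = 18 trials) are illustrative assumptions.

```python
import math

def agresti_coull_95(x, n):
    """Approximate 95% confidence interval for a proportion (Agresti-Coull)."""
    p_prime = (x + 2) / (n + 4)
    half_width = 1.96 * math.sqrt(p_prime * (1 - p_prime) / (n + 4))
    return p_prime - half_width, p_prime + half_width

# Hypothetical data: 2 successes in 18 trials
lower, upper = agresti_coull_95(2, 18)
print(f"{lower:.3f} <= p <= {upper:.3f}")
```

The interval is centered on p' rather than on phat = X/n, which is why it is not symmetric about the raw sample proportion.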
91
The binomial test
uses data to test whether a population proportion p matches a null expectation for the proportion
example: H0: dog food is chosen at best 20% of the time (p0 = 0.2)
n = 18, p0 = 0.2, X = 2
P-value = 2(Pr[2] + Pr[1] + Pr[0]) = 0.543 > 0.05
therefore we cannot reject the null hypothesis (it is plausible that people do not prefer pate over dog food)
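
The P-value in this example can be reproduced with a short calculation. The sketch below follows the card's approach of doubling the lower-tail probability; the helper name `binomial_pmf` is an assumption for illustration, and other two-tailed conventions exist.

```python
import math

def binomial_pmf(x, n, p):
    """Pr[X = x]: probability of x successes in n independent trials."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p0, x_obs = 18, 0.2, 2  # values from the card's example

# Probability of a result as or more extreme in the observed (lower) direction
lower_tail = sum(binomial_pmf(x, n, p0) for x in range(x_obs + 1))

# Two-tailed P-value by doubling one tail (capped at 1)
p_value = min(1.0, 2 * lower_tail)
print(f"P = {p_value:.3f}")
```

The result rounds to the 0.543 given on the card, which is greater than 0.05, so the null hypothesis is not rejected.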