vocab definitions Flashcards

1
Q

sample of convenience

A

a collection of individuals that happen to be available at the time

2
Q

variable

A

a measured characteristic on individuals from a population under study

3
Q

data

A

measurements of one or more variables made on a collection of individuals

4
Q

explanatory variable

A

a variable we use to predict or explain a response variable

5
Q

response variable

A

a variable that is predicted or explained by an explanatory variable

6
Q

population

A

the entire collection of individuals or units that you want to study

7
Q

sample

A

a subset ideally randomly chosen from a population you wish to study

8
Q

parameters

A

quantities describing the population that we want to know

9
Q

estimates

A

quantities calculated from a sample to estimate parameters

10
Q

bias

A

a systematic discrepancy between estimates and the true population characteristic

11
Q

volunteer bias

A

volunteers for a study are likely to be different, on average, from the population

12
Q

sampling error

A

the chance difference between an estimate and the true population value

13
Q

precision

A

the spread of estimates resulting from sampling error
-a precise method gives a similar answer repeatedly

14
Q

accurate or unbiased

A

the average of the estimates obtained equals the true population value
-accuracy (on average, it gets the correct answer)

15
Q

random sample

A

in a random sample, each member of a population has an equal and independent chance of being selected

16
Q

categorical variables (attribute or qualitative variables)

A

describe membership in a category or group

17
Q

numerical variable

A

measurements of individuals are quantitative and have magnitude; they are numbers

18
Q

continuous

A

numerical data that can take on any real-number value within some range. Between any two values of a continuous variable, an infinite number of other values are possible.

19
Q

discrete

A

numerical data that come in indivisible units. Example: number of amino acids in a protein and numerical rating of a statistics professor in a student evaluation are discrete numerical measurements

20
Q

frequency

A

the number of observations having a particular value of the measurement

21
Q

frequency distribution

A

describes how often (the number of times) each value of the variable occurs in the sample
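As a minimal sketch (Python, with made-up nest data), a frequency distribution can be tabulated with `collections.Counter`, which counts how many observations share each value:

```python
from collections import Counter

# Hypothetical sample: number of eggs in each of nine nests (made-up data)
eggs = [3, 4, 4, 2, 3, 4, 5, 3, 4]

# Frequency distribution: number of observations having each value
freq = Counter(eggs)

# Relative frequency: proportion of observations having each value
rel_freq = {value: count / len(eggs) for value, count in freq.items()}

for value in sorted(freq):
    print(value, freq[value], round(rel_freq[value], 3))
```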

22
Q

independence

A

two events are independent if the occurrence of one gives no information about whether the second will occur

23
Q

multiplication principle

A

if two events A and B are independent, then Pr[A and B] = Pr[A] × Pr[B]

24
Q

The addition principle

A

If two events A and B are mutually exclusive, then Pr[A or B]= Pr[A] + Pr[B]
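The multiplication and addition principles can be checked on a fair six-sided die. This Python sketch (the die is an illustration, not part of the card) computes each probability with exact fractions and confirms it by counting equally likely outcomes:

```python
from fractions import Fraction

faces = range(1, 7)
p_face = Fraction(1, 6)  # fair six-sided die: each face equally likely

# Addition principle: "roll a 1" and "roll a 2" are mutually exclusive
p_1_or_2 = p_face + p_face

# Multiplication principle: two separate rolls are independent
p_two_sixes = p_face * p_face

# Check both by counting equally likely outcomes
one_roll = [f for f in faces if f in (1, 2)]
two_rolls = [(a, b) for a in faces for b in faces]
p_1_or_2_count = Fraction(len(one_roll), 6)
p_two_sixes_count = Fraction(sum(1 for a, b in two_rolls if a == b == 6), len(two_rolls))

print(p_1_or_2, p_two_sixes)  # 1/3 1/36
```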

25
Q

Probability distribution

A

A probability distribution describes the true relative frequency of all possible values of a random variable

26
Q

Mutually exclusive

A

if two events are mutually exclusive, they cannot both occur
Pr[A and B] = 0

27
Q

probability

A

The probability of an event is its true relative frequency: the proportion of times the event would occur if we repeated the same process over and over

28
Q

pseudoreplication

A

the error that occurs when samples are not independent but are treated as though they are

29
Q

standard error

A

the standard error of an estimate is the standard deviation of its sampling distribution. It predicts the sampling error of the estimate

30
Q

standard error of an estimate

A

the standard deviation of its sampling distribution
It predicts the sampling error of the estimate

31
Q

conditional probability

A

the conditional probability of an event is the probability of that event occurring given that a condition is met
Pr[X|Y] is the probability of X given Y

32
Q

confidence interval

A

the 95% confidence interval provides a plausible range for a parameter. All values for the parameter lying within the interval are plausible, given the data, whereas those outside are unlikely

33
Q

The 2SE rule of thumb

A

the interval from Ybar - 2 SE_Ybar to Ybar + 2 SE_Ybar provides a rough approximation of the 95% confidence interval for the mean
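A rough sketch of the 2SE rule in Python, using made-up measurements; `statistics.stdev` uses the n - 1 denominator, so `se` here is the estimated SE_Ybar = s / sqrt(n):

```python
import math
import statistics

# Hypothetical sample of eight measurements (made-up values)
y = [12.1, 9.8, 11.4, 10.2, 10.9, 11.7, 9.5, 10.4]

ybar = statistics.mean(y)                     # sample mean
se = statistics.stdev(y) / math.sqrt(len(y))  # SE_Ybar = s / sqrt(n)

# 2SE rule of thumb: rough 95% confidence interval for the population mean
lower, upper = ybar - 2 * se, ybar + 2 * se
print(f"Ybar = {ybar:.2f}, rough 95% CI: {lower:.2f} to {upper:.2f}")
```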

34
Q

what does a x^2 goodness-of-fit test do?

A

compares count data to a model of the expected frequencies of a set of categories

-it is an approximation (don't use it when there is little data, i.e. when expected counts are small)

H0: the data come from a specified probability distribution

x^2 = sum over all classes of (Observed - Expected)^2 / Expected
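The x^2 statistic can be computed directly from this formula. In this Python sketch the four category counts are hypothetical, and the null model (all categories equally likely) is an assumption chosen for illustration:

```python
def chi_square_gof(observed, expected):
    """Goodness-of-fit statistic: sum over all classes of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of classes")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts in four categories (made-up data)
observed = [28, 22, 31, 19]
n = sum(observed)

# Assumed null model: all four categories equally likely, so E = n / 4 each
expected = [n / len(observed)] * len(observed)

stat = chi_square_gof(observed, expected)

# df = (number of categories) - (parameters estimated from the data) - 1;
# this null model estimates no parameters from the data
df = len(observed) - 0 - 1

print(round(stat, 3), df)
```

The P-value then comes from the x^2 distribution with df = 3, for example from a statistical table or, if SciPy is available, `scipy.stats.chi2.sf(stat, df)`. Every expected count here is 25, so small expected counts are not a concern.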

35
Q

Degrees of freedom

A

the number of degrees of freedom of a test specifies which of a family of distributions to use
for the x^2 goodness-of-fit test, df = (number of categories) - (number of parameters estimated from the data) - 1

36
Q

Critical value

A

the value of the test statistic where P= alpha

37
Q

What is a test statistic?

A

A test statistic is a number calculated from the data and the null hypothesis that can be compared to a standard distribution to find the P-value of the test

38
Q

What are the assumptions of the x^2 test?

A

- the data are a random sample
- no more than 20% of categories have expected frequencies < 5
- no category has an expected frequency < 1

when these conditions are not met, the approximations that make the x^2 test work break down

39
Q

what is a discrete distribution?

A

a probability distribution describing a discrete numerical random variable
examples:
number of heads from 10 flips of a coin
number of flowers in a square meter
number of disease outbreaks in a year

40
Q

Poisson distribution

A

a discrete mathematical probability distribution
- describes the probability that a certain number of events occur in a block of time or space, when those events happen independently of each other and occur with equal probability at every point in time or space
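The card does not state the formula; the standard Poisson probability of X events is Pr[X] = e^(-mu) mu^X / X!, where mu is the mean number of events per block. A minimal Python sketch, with an assumed mu of 2.5:

```python
import math

def poisson_pmf(x, mu):
    """Pr[X = x] = e^(-mu) mu^x / x!, where mu is the mean number of events per block."""
    return math.exp(-mu) * mu ** x / math.factorial(x)

# Assumed mean of 2.5 events per block of time or space (illustrative value)
mu = 2.5
for x in range(6):
    print(x, round(poisson_pmf(x, mu), 4))
```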

41
Q

x^2 contingency analysis

A

tests the independence of two or more categorical variables

42
Q

Fisher's exact test

A

used for 2x2 contingency analysis

makes no assumptions about the size of the expected frequencies

use it when the assumptions of x^2 contingency analysis are not met

*you don't need to calculate it by hand

43
Q

odds

A

the probability of success divided by the probability of failure

44
Q

Odds ratio

A

the odds of success in one group divided by the odds of success in another group
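OR < 1 means the odds of the outcome (e.g. a bad outcome) are lower in the first group
OR > 1 means the odds of the outcome are higher in the first group
OR = 1 means the odds are equal in both groups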
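A minimal Python sketch of odds and the odds ratio; the two group proportions are made-up values:

```python
def odds(p):
    """Odds of success: Pr[success] / Pr[failure]."""
    return p / (1 - p)

def odds_ratio(p1, p2):
    """Odds of success in group 1 divided by the odds of success in group 2."""
    return odds(p1) / odds(p2)

# Hypothetical proportions with the outcome in two groups (made-up values)
p_group1 = 0.2
p_group2 = 0.5

print(odds(p_group1))                  # 0.25
print(odds_ratio(p_group1, p_group2))  # 0.25, i.e. OR < 1: lower odds in group 1
```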

45
Q

properties of a good sample

A

-independent selection of individuals
-random selection of individuals
-sufficiently large

46
Q

sampling error

A

The difference between an estimate and the average value of that estimate

47
Q

larger samples on average will have ____ sampling error

A

smaller

48
Q

best way to graph the frequency distribution of a numerical variable

A

histogram

49
Q

cumulative frequency distribution

A

The cumulative frequency of a value is the proportion of individuals equal to or less than the value

when graphed, it rises from 0 to 1 on the y-axis and never decreases
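A small Python sketch (made-up data) that computes the cumulative frequency of each distinct value as the proportion of observations less than or equal to it; the proportions never decrease and end at 1:

```python
def cumulative_frequency(values):
    """Return (value, proportion of observations <= value) for each distinct value."""
    n = len(values)
    return [(v, sum(1 for y in values if y <= v) / n) for v in sorted(set(values))]

# Hypothetical sample (made-up values)
data = [4, 1, 3, 3, 2, 4, 4, 5]

cdf = cumulative_frequency(data)
for value, proportion in cdf:
    print(value, proportion)
```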

50
Q

best way to show association between two categorical variables

A

contingency table
grouped bar graph
mosaic plot

51
Q

best way to show association between categorical and numerical variable

A

multiple histograms

52
Q

best way to show association between two numerical variables

A

scatter plot

53
Q

how to calculate mean

A

Ybar = (sum of Yi) / n, where n = sample size

54
Q

median

A

The median is the middle measurement in a set of ordered data

55
Q

Mode

A

the mode is the most frequent measurement
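The mean, median, and mode cards can be checked on a small hypothetical sample with Python's standard `statistics` module; note that `statistics.median` averages the two middle values when n is even:

```python
import statistics

# Hypothetical sample (made-up values)
y = [2, 7, 4, 7, 5, 9, 3]

mean = sum(y) / len(y)          # Ybar = (sum of Yi) / n
median = statistics.median(y)   # middle value of the ordered data
mode = statistics.mode(y)       # most frequent value

print(round(mean, 3), median, mode)
```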

56
Q

Range

A

the maximum minus the minimum

57
Q

Small samples tend to give ___ estimates of the range than large samples.
So the sample range is a _______ of the true range of the population

A

Small samples tend to give lower estimates of the range than large samples.
So the sample range is a biased estimator of the true range of the population

58
Q

Variance in a population

A

sigma^2= sum of (Yi- u)^2/N

N is the number of individuals in population
u= true mean of the population

59
Q

Sample variance

A

s^2 = sum of (Yi - Ybar)^2 / (n - 1)
n=sample size
Ybar= sample mean

60
Q

Standard deviation (SD)

A

positive square root of the variance
sigma is the true (population) standard deviation
s is the sample standard deviation
s = square root of s^2 = sqrt(sum of (Yi - Ybar)^2 / (n - 1))
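A Python sketch of the population variance, sample variance, and standard deviation formulas, written out by hand on made-up values so each term matches the cards; the only difference between the two variances is dividing by N versus n - 1:

```python
import math

def population_variance(y):
    """sigma^2 = sum of (Yi - u)^2 / N, treating y as the whole population."""
    N = len(y)
    u = sum(y) / N
    return sum((yi - u) ** 2 for yi in y) / N

def sample_variance(y):
    """s^2 = sum of (Yi - Ybar)^2 / (n - 1), treating y as a sample."""
    n = len(y)
    ybar = sum(y) / n
    return sum((yi - ybar) ** 2 for yi in y) / (n - 1)

# Hypothetical measurements (made-up values)
y = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

s2 = sample_variance(y)
s = math.sqrt(s2)  # sample standard deviation: positive square root of s^2

print(population_variance(y), round(s2, 3), round(s, 3))
```

Python's `statistics.pvariance` and `statistics.variance` implement the same two formulas.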

61
Q

coefficient of variation (CV)

A

CV = 100% × s / Ybar
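A short Python sketch of the CV, using `statistics.stdev` (n - 1 denominator) and made-up body masses:

```python
import statistics

def coefficient_of_variation(y):
    """CV = 100% x s / Ybar, with s the sample standard deviation."""
    return 100 * statistics.stdev(y) / statistics.mean(y)

# Hypothetical body masses in grams (made-up values)
masses = [20.0, 22.0, 24.0]
print(round(coefficient_of_variation(masses), 2))
```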

62
Q

skew

A

a measure of asymmetry in a distribution
the direction of skew refers to the longer, pointy tail of the distribution

63
Q

Standard error of the mean

A

standard error of the mean:
sigma_Ybar = sigma / sqrt(n)

64
Q

Estimate of the standard error of the mean

A

SE_Ybar = s / sqrt(n)
gives us some knowledge of the likely difference between our sample mean and the true population mean
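A Python simulation sketch, assuming a normal population with mean 10 and SD 3, that contrasts the true standard error sigma / sqrt(n), the standard deviation of many simulated sample means, and the estimate s / sqrt(n) from a single sample:

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the simulation is reproducible

# Assumed population for illustration: normal with mean 10 and SD sigma = 3
mu, sigma, n = 10.0, 3.0, 25

# Standard error of the mean from the population SD: sigma_Ybar = sigma / sqrt(n)
se_true = sigma / math.sqrt(n)

# Approximate the sampling distribution of Ybar by drawing 5000 samples of size n
sample_means = []
for _ in range(5000):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    sample_means.append(sum(sample) / n)
se_simulated = statistics.stdev(sample_means)  # SD of the sampling distribution

# In practice we have one sample and estimate it: SE_Ybar = s / sqrt(n)
one_sample = [random.gauss(mu, sigma) for _ in range(n)]
se_estimate = statistics.stdev(one_sample) / math.sqrt(n)

print(round(se_true, 3), round(se_simulated, 3), round(se_estimate, 3))
```

The first two values should be close; the single-sample estimate varies from sample to sample.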

65
Q

law of total probability

A

Pr[X] = sum over all values of Y of Pr[X|Y] Pr[Y]

66
Q

probability of a positive result using the law of total probability (example if the events are not independent)

A

Pr[positive result] = Pr[positive result | X] Pr[X] + Pr[positive result | Y] Pr[Y], where X and Y are mutually exclusive and together cover all possibilities

67
Q

Bayes theorem

A

Pr[A|B]= Pr[B|A]Pr[A]/ Pr[B]
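The law of total probability and Bayes' theorem work together for a diagnostic test. The prevalence and test accuracies in this Python sketch are hypothetical, and exact fractions are used so the arithmetic is easy to follow:

```python
from fractions import Fraction

# Hypothetical diagnostic-test numbers (made up for illustration)
p_disease = Fraction(1, 100)            # Pr[disease]
p_pos_given_disease = Fraction(9, 10)   # Pr[positive | disease]
p_pos_given_healthy = Fraction(1, 20)   # Pr[positive | no disease]

# Law of total probability: "disease" and "no disease" are mutually
# exclusive and together cover every case
p_healthy = 1 - p_disease
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * p_healthy

# Bayes' theorem: Pr[disease | positive] = Pr[positive | disease] Pr[disease] / Pr[positive]
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos

print(p_pos, float(p_disease_given_pos))
```

Here Pr[disease | positive] = 2/13, about 0.15: most positives come from the much larger healthy group.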

68
Q

what does hypothesis testing do

A

hypothesis testing asks how unusual it is to get data that differ from the null hypothesis

If the data would be quite unlikely under H0 we reject H0

69
Q

Hypotheses are about populations, but what are they tested with, and under what assumption?

A

samples, with the assumption that they are random samples

70
Q

Null hypothesis

A

a specific statement about a population parameter made for the purposes of argument.

usually the simplest statement

71
Q

Alternative hypothesis

A

represents all other possible parameter values except the one stated in the null hypothesis

usually the statement of greatest interest

72
Q

A good null hypothesis

A

would be interesting if proven wrong

73
Q

What is a P-value?

A

the probability of getting the data, or something as or more unusual, if the null hypothesis were true

74
Q

How do you find the P-value?

A

Simulation
Parametric tests
Permutation

75
Q

Statistical significance

A

The significance level, alpha, is a probability used as a criterion for rejecting the null hypothesis

If the P-value for a test is less than or equal to alpha then the null hypothesis is rejected

often 0.05

76
Q

A large sample will tend to give an estimate with a ____ confidence interval

A larger sample will give ____ a false null hypothesis

A

A large sample will tend to give an estimate with a narrower confidence interval

A larger sample will give more power to reject a false null hypothesis

77
Q

Type I error

A

Rejecting a true null hypothesis
Probability of Type I error is alpha (the significance level)

78
Q

Type II error

A

Not rejecting a false null hypothesis
The probability of a Type II error is beta
The smaller beta, the more power a test has

79
Q

Power

A

The ability of a test to reject a false null hypothesis
Power = 1- beta

80
Q

Most tests are ___ tailed tests which means…

A

most tests are two-tailed tests and this means that a deviation in either direction would reject the null hypothesis
normally alpha is divided into alpha/2 on one side and alpha/2 on the other

81
Q

One-tailed tests are

A

are only used when the other tail is nonsensical
example: comparing grades on a multiple choice test to that expected by random guessing

82
Q

Critical value

A

The value of a test statistic beyond which the null hypothesis can be rejected

83
Q

If a hypothesis test rejects the null hypothesis (P < 0.05), the value proposed by the null hypothesis is

A

outside the 95% confidence interval

84
Q

confounding variable

A

an unmeasured variable that may be the cause of both X and Y

85
Q

a proportion

A

a fraction of individuals having a particular attribute

86
Q

binomial distribution

A

describes the probability of a given number of successes from a fixed number of independent trials

Pr[X] = (n choose X) p^X (1 - p)^(n - X)
n trials; p = probability of success
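A Python sketch of the binomial formula using `math.comb` (Python 3.8+), applied to the number of heads in 10 flips of a fair coin; it also checks that the mean and variance of the distribution match np and np(1 - p):

```python
from math import comb

def binomial_pmf(x, n, p):
    """Pr[X = x] = (n choose x) p^x (1 - p)^(n - x) for n independent trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# Number of heads in 10 flips of a fair coin
n, p = 10, 0.5
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]

# Mean and variance computed from the distribution should match np and np(1 - p)
mean = sum(x * px for x, px in enumerate(probs))
variance = sum((x - mean) ** 2 * px for x, px in enumerate(probs))

print(round(probs[5], 4), mean, variance)
```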

87
Q

properties of the binomial distribution

A

mean and variance of the number of successes
u=np
sigma^2= np(1-p)

88
Q

proportion of successes in a sample

A

phat= X/n
the hat (^) shows that this is an estimate of p

89
Q

properties of the sample proportion (phat)

A

mean: p
variance: p(1-p)/n

90
Q

Agresti-Coull confidence interval

A

p' - 1.96 sqrt(p'(1 - p')/(n + 4)) <= p <= p' + 1.96 sqrt(p'(1 - p')/(n + 4))

p' = (X + 2)/(n + 4)
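A minimal Python sketch of the Agresti-Coull interval, using z = 1.96 for 95% confidence and, as an illustration, the X = 2 and n = 18 from the binomial test card:

```python
import math

def agresti_coull_ci(x, n, z=1.96):
    """Approximate 95% confidence interval for a proportion (Agresti-Coull)."""
    p_prime = (x + 2) / (n + 4)  # p' = (X + 2) / (n + 4)
    half_width = z * math.sqrt(p_prime * (1 - p_prime) / (n + 4))
    return p_prime - half_width, p_prime + half_width

# X = 2 successes in n = 18 trials
lower, upper = agresti_coull_ci(2, 18)
print(round(lower, 3), round(upper, 3))
```

The resulting interval (roughly 0.02 to 0.34) contains 0.2, which agrees with the binomial test on that card not rejecting H0.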

91
Q

The binomial test

A

uses data to test whether a population proportion p matches a null expectation for the proportion

example:
H0: dog food is chosen as best 20% of the time (p = p0 = 0.2)
n = 18, p0 = 0.2, X = 2
P-value = 2(Pr[2] + Pr[1] + Pr[0])
P = 0.543 > 0.05, therefore we cannot reject the null hypothesis (it is plausible that people do not prefer pate over dog food)
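The P-value in this example can be reproduced with a short Python sketch. It follows the card's approach of doubling the lower-tail probability (other conventions for two-sided binomial P-values exist):

```python
from math import comb

def binomial_pmf(x, n, p):
    """Pr[X = x] = (n choose x) p^x (1 - p)^(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p0, x = 18, 0.2, 2  # values from the example on this card

# X = 2 is below the null expectation n * p0 = 3.6, so take the lower tail: Pr[0] + Pr[1] + Pr[2]
lower_tail = sum(binomial_pmf(k, n, p0) for k in range(x + 1))

# Two-sided P-value as on the card: double the tail probability (capped at 1)
p_value = min(1.0, 2 * lower_tail)

print(round(p_value, 3))  # 0.543
```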