final Flashcards

Question 1

Q

statistics

Answer

A

a field of mathematics that develops and studies methods to collect, analyze, interpret, and present empirical evidence

Question 2

Q

empirical vs anecdotal evidence

Answer

A

empirical - information received from the observation or measurements of patterns using experimentation
anecdotal - evidence collected in a casual or informal manner that relies heavily on personal testimony or conclusions (not statistical data collection)

Question 3

Q

data

Answer

A

a collection of numerical facts or information from which conclusions can be drawn

Question 4

Q

raw data

Answer

A

unformatted data (numerical measurements, instrument readings, text) that has not been processed or analyzed

Question 5

Q

replicates

Answer

A

parallel measurements of a phenomenon to estimate variability in your sample (the number of replicates = n)

Question 6

Q

sampling effort

Answer

A

how much data do we need?

Question 7

Q

precision vs accuracy

Answer

A

precision - how fine the divisions on a scale of measurement are
accuracy - how close to the truth our measurement is
(accuracy is the priority)

Question 8

Q

descriptive statistics

Answer

A

quantitative description of observations sampled from a population (mathematically summarizing patterns, data centers, and variability without making conclusions about overall meaning of data)

Question 9

Q

data distribution (histogram)

Answer

A

sampled populations arranged by rank order and graphically presented

Question 10

Q

normal distribution

Answer

A

an arrangement of data in which most values cluster in the middle of the range and the rest taper off symmetrically toward either extreme

Question 11

Q

central tendency

Answer

A

numeric value describing a central position in a dataset
mean, median, mode are all valid measures

Question 12

Q

skew

Answer

A

positive vs negative
positive - long tail extends to the right: /\_
negative - long tail extends to the left: _/\

Question 13

Q

central limit theorem

Answer

A

if a population with finite variance is sufficiently sampled, the mean of all the sample means will be approximately equal to the mean of the population, AND the distribution of the sample means will approach a normal distribution
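
A small simulation can make the theorem concrete. This is only a sketch: the skewed (exponential) population with a true mean of 1, the sample size of 30, and the 2000 repeated samples are all illustrative assumptions, not part of the card.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

def sample_mean(n):
    """Mean of n draws from an exponential population (true mean = 1.0)."""
    return statistics.mean(rng.expovariate(1.0) for _ in range(n))

# Take many samples from the skewed population and keep each sample's mean.
sample_means = [sample_mean(30) for _ in range(2000)]

grand_mean = statistics.mean(sample_means)   # expected to be close to 1.0
spread = statistics.stdev(sample_means)      # expected to be close to 1/sqrt(30)

print(f"mean of sample means: {grand_mean:.3f}")
print(f"sd of sample means:   {spread:.3f}")
```

Plotting a histogram of `sample_means` would show a roughly bell-shaped curve even though the population itself is strongly skewed.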

Question 14

Q

steps of scientific method

Answer

A

planning - what are you going to do? learn the system, develop ideas about how the system works (maybe do a pilot study), decide hypothesis, figure out what data you will need
recording - collect and properly record data; data can take many forms and must be recorded extremely carefully
analyzing - interrogate data to test hypothesis, analysis cannot be successful if data gathering was not designed with analysis in mind, should allow you to reject or fail to reject the null
reporting - disseminating methods and media will depend on the type of work and audience, statistical results must be reported using proper conventions, graphs must be properly labelled

Question 15

Q

continuous data

Answer

A

data that can take any value (usually measured)

Question 16

Q

discrete data

Answer

A

numerical data that can take a limited number of values (often counted)

Question 17

Q

ordinal data

Answer

A

data in categories that can be placed in order but the magnitude of difference between categories is not fixed

Question 18

Q

categorical data

Answer

A

data in categories that can’t be usefully ordered

Question 19

Q

null and alternate hypothesis

Answer

A

what do we test when we use them?
we test the null and decide whether the observed data are statistically probable if the null is true

Question 20

Q

random sampling

Answer

A

best choice; every unit has an equal chance of being selected

Question 21

Q

systematic sampling

Answer

A

transects (sampling on a created line)

Question 22

Q

mixed sampling

Answer

A

stratified random sampling

Question 23

Q

haphazard sampling

Answer

A

when you are unable to randomly sample because of practicality

Question 24

Q

mean, median, mode

Answer

A

mean - average
median - middle value, less affected by skew than the mean
mode - most frequent

Question 25

Q

quartiles

Answer

A

rank data from smallest to largest
smallest is first number, largest is 5th
median is third
middle of first and third is 2nd, middle of fifth and third is 4th
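
The ranking steps above can be sketched in code. This sketch assumes the "median of each half" convention for the 2nd and 4th numbers (other quartile conventions exist and give slightly different values), and `five_number_summary` is a hypothetical helper name.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method."""
    ranked = sorted(values)          # rank data from smallest to largest
    half = len(ranked) // 2
    lower = ranked[:half]            # values below the median position
    upper = ranked[-half:]           # values above the median position
    return (
        ranked[0],                   # 1st number: smallest
        statistics.median(lower),    # 2nd number: middle of the lower half
        statistics.median(ranked),   # 3rd number: median
        statistics.median(upper),    # 4th number: middle of the upper half
        ranked[-1],                  # 5th number: largest
    )

print(five_number_summary([7, 1, 5, 3, 9, 11, 13]))  # (1, 3, 7, 11, 13)
```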

Question 26

Q

why divide by n-1 when calculating variance

Answer

A

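penalty for having a small number of replicates (dividing by n-1 instead of n makes the estimate slightly larger, which matters most when n is small)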

Question 27

Q

shapiro-wilk test for normality

Answer

A

takes a data distribution and determines whether it is significantly different from normal
p-value of less than .05 = not normal, reject Ho

Question 28

Q

SEM (standard error of the mean)

Answer

A

$SEM = \frac{s_x}{\sqrt{n}}$
estimate of how close the sample mean is to the true population mean
standard deviation of resampled means

Question 29

Q

descriptive projects

Question 30

Q

difference projects

Answer

A

is A different from B? bar charts and box and whisker plots; used when there is a categorical variable and you want to know whether the response variable differs between categories

Question 31

Q

correlation/regression projects

Answer

A

links between variables; usually both the independent and the dependent variables are quantitative

Question 32

Q

association projects

Answer

A

similar to correlations but with categorical data

Question 33

Q

how to calculate mean

Answer

A

$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$

Question 34

Q

how to calculate median

Answer

A

the middle value

Question 35

Q

how to calculate mode

Answer

A

the most frequently occurring value

Question 36

Q

how to calculate range

Answer

A

rank-order the observations, then subtract: highest - lowest
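
As a quick check on the four calculations on the preceding cards, here is a minimal sketch using Python's standard `statistics` module. The `observations` list is made-up example data.

```python
import statistics

observations = [4, 8, 6, 5, 3, 8, 9]   # made-up example data

mean = statistics.mean(observations)        # sum of the values divided by n
median = statistics.median(observations)    # middle value after ranking
mode = statistics.mode(observations)        # most frequently occurring value
data_range = max(observations) - min(observations)  # highest - lowest

print(mean, median, mode, data_range)
```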

Question 37

Q

how to calculate variance

Answer

A

$s_x^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$ OR $s_x^2 = \frac{SS}{n-1}$

Question 38

Q

how to calculate standard deviation

Answer

A

$s_x = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$ OR $s_x = \sqrt{\frac{SS}{n-1}}$

Question 39

Q

how to calculate standard error

Answer

A

$SE = \frac{s_x}{\sqrt{n}}$
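
The variance, standard deviation, and standard error cards build on each other, which is easier to see worked through step by step. The sketch below uses made-up data and cross-checks the hand calculation against the standard `statistics` module.

```python
import math
import statistics

observations = [4, 8, 6, 5, 3, 8, 9]   # made-up example data
n = len(observations)
mean = sum(observations) / n

# Sum of squares (SS): squared deviations from the sample mean.
ss = sum((x - mean) ** 2 for x in observations)

variance = ss / (n - 1)          # sample variance: SS / (n - 1)
std_dev = math.sqrt(variance)    # standard deviation: square root of variance
sem = std_dev / math.sqrt(n)     # standard error: standard deviation / sqrt(n)

# Cross-check against the library's sample variance (also divides by n - 1).
print(math.isclose(variance, statistics.variance(observations)))
print(variance, std_dev, sem)
```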

Question 40

Q

outcomes of hypothesis testing

Answer

A

test null
p-value is the probability of getting data at least as extreme as the data gathered if the null hypothesis is true

Question 41

Q

what project uses histograms

Answer

A

descriptive projects

Question 42

Q

what projects use box plots

Answer

A

descriptive, difference (side by side)

Question 43

Q

what projects use scatterplots

Answer

A

correlation and regression

Question 44

Q

what projects use line plots

Answer

A

correlation and regression

Question 45

Q

what projects use pie charts

Answer

A

association

Question 46

Q

what probability is used as a threshold for hypotheses

Question 47

Q

set up for a t-test

Answer

A

create hypothesis, collect data, data must be normally distributed, each point must be independent

Question 48

Q

what happens to t when mean difference, standard deviation, and n change

Answer

A

t increases when the mean difference increases
t decreases when the standard deviation increases
t increases when n increases
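
These three relationships can be checked numerically. The sketch below assumes the simplest two-sample case, with equal group sizes and a shared standard deviation, so the standard error of the difference is sd * sqrt(2/n). `t_statistic` is a hypothetical helper and the input values are made up.

```python
import math

def t_statistic(mean_diff, sd, n):
    """Two-sample t for equal group sizes n and a shared standard deviation sd."""
    standard_error = sd * math.sqrt(2 / n)   # standard error of the difference
    return mean_diff / standard_error

baseline = t_statistic(mean_diff=2.0, sd=3.0, n=10)
bigger_difference = t_statistic(mean_diff=4.0, sd=3.0, n=10)  # larger t
noisier_data = t_statistic(mean_diff=2.0, sd=6.0, n=10)       # smaller t
more_replicates = t_statistic(mean_diff=2.0, sd=3.0, n=40)    # larger t

print(baseline, bigger_difference, noisier_data, more_replicates)
```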

Question 49

Q

what test is needed to decide if data is appropriate for a t-test

Answer

A

find if the data are normal (boxplot and shapiro-wilk test)
p-value greater than .05 = no evidence that the data differ from normal, so a t-test can be done

Question 50

Q

one tailed vs two tailed t-test

Answer

A

one tailed - more power to detect directional effect (greater than or less than)
two tailed - tests whether the means differ in either direction, without specifying which

Question 51

Q

paired t-test

Answer

A

repeated observations collected for a single variable with 2 levels (differences between sample point 1 and sample point 2 are compared for the same sample unit)
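
A paired t statistic is calculated on the per-unit differences rather than on the two groups separately. The sketch below uses made-up `before` and `after` measurements for five hypothetical sample units.

```python
import math
import statistics

before = [12.1, 11.4, 13.0, 12.7, 11.9]   # made-up first measurements
after = [12.9, 11.8, 13.6, 13.1, 12.8]    # same sample units measured again

# One difference per sample unit (sample point 2 minus sample point 1).
differences = [a - b for a, b in zip(after, before)]

mean_diff = statistics.mean(differences)
sem_diff = statistics.stdev(differences) / math.sqrt(len(differences))

# t is the mean difference in units of its standard error (df = n - 1).
t = mean_diff / sem_diff
print(mean_diff, t)
```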

Question 52

Q

how do non-parametric tests work

Answer

A

rank the data from smallest to largest, then compare the ranks instead of the raw values
Mann-Whitney (two sample) and Wilcoxon (paired) tests
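
To show what "compare the ranks" means in practice, here is a minimal sketch of the Mann-Whitney U statistic computed by hand. The two groups are made-up data and `average_ranks` is a hypothetical helper. A p-value would still need a U table or a statistics package.

```python
def average_ranks(values):
    """Rank values from smallest (rank 1) to largest; ties share the mean rank."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]

group_a = [3.1, 4.5, 2.8, 5.0]   # made-up data
group_b = [6.2, 5.9, 4.9, 7.1]

# Rank all observations together, then sum the ranks that belong to group A.
ranks = average_ranks(group_a + group_b)
rank_sum_a = sum(ranks[:len(group_a)])

n_a, n_b = len(group_a), len(group_b)
u_a = rank_sum_a - n_a * (n_a + 1) / 2   # U for group A
u_b = n_a * n_b - u_a                    # U for group B
u = min(u_a, u_b)                        # the smaller U is the test statistic

print(rank_sum_a, u)  # 11.0 1.0
```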

Question 53

Q

when do we have independent replicates

Answer

A

when the replicates are not connected to each other

Question 54

Q

simple pseudoreplication

Answer

A

only a single replicate per treatment and subsamples are collected from each area

Question 55

Q

sacrificial pseudoreplication

Answer

A

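experimental units are replicated, but the replicates are pooled before analysis or subsamples from each unit are treated as independent replicates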

Question 56

Q

temporal pseudoreplication

Answer

A

only a single replicate per treatment and subsamples are collected from it repeatedly over time

Question 57

Q

phylogenetic pseudoreplication

Answer

A

closely related individuals are the units being sampled (seeds, tadpoles, insect larvae)

Question 58

Q

technical pseudoreplication

Answer

A

different observers or instruments are used for different parts of the experiment

Question 59

Q

true positive

Answer

A

Ho is false and we reject it

Question 60

Q

true negative

Answer

A

Ho is true and we fail to reject it

Question 61

Q

false positive

Answer

A

Ho is true and we reject it
type 1

Question 62

Q

false negative

Answer

A

Ho is false and we fail to reject it
type 2

Question 63

Q

what is linear regression used for

Answer

A

to look for a relationship between a quantitative independent variable and a continuous dependent variable

Question 64

Q

linear regression assumptions

Answer

A

data are independent and randomly selected
data can be reasonably described by a linear relationship
residuals are normally distributed
residuals have constant variance regardless of x-value
no extreme outliers
(assumptions 3 and 4 are less important)

Answer 63

A

y=mx+b
y - dependent variable
m - slope of the line
x - independent variable
b - the y-intercept (the point where the line crosses the y-axis)
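
For reference, the slope and intercept can be estimated with ordinary least squares. This is a minimal sketch, not necessarily the method used in the course. `fit_line` is a hypothetical helper and the data are made up so that they fall exactly on y = 2x + 1.

```python
def fit_line(xs, ys):
    """Least-squares estimates of slope m and intercept b for y = m*x + b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope: how x and y vary together, divided by the variation in x.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx
    b = y_bar - m * x_bar   # the fitted line passes through (x_bar, y_bar)
    return m, b

xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]       # made-up data lying exactly on y = 2x + 1

m, b = fit_line(xs, ys)
print(m, b)  # 2.0 1.0
```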

Answer 64

A

m=0
no relationship between x and y
p-value of more than .05 = no relationship, fail to reject
p-value of less than or equal to .05 = reject

Answer 65

A

tells us if there is a significant slope

Answer 66

A

how much of the variation in our dependent variable is explained by the regression
$r^2$ = explained variation / total variation
values range between 0 (none of the variation is explained by regression) and 1 (all of the variation is explained by regression)
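
The ratio can be worked through by hand. The sketch below fits a least-squares line to made-up data, then splits the total variation in y into a residual part and an explained part. `r_squared` is a hypothetical helper.

```python
def r_squared(xs, ys):
    """Explained variation / total variation for a least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    m = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    b = y_bar - m * x_bar
    total = sum((y - y_bar) ** 2 for y in ys)                       # total variation
    residual = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))  # unexplained
    return (total - residual) / total                               # explained share

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]   # made-up data

r2 = r_squared(xs, ys)
print(round(r2, 3))  # 0.6
```

Here 60% of the variation in y is explained by the regression and 40% is left in the residuals.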

Answer 67

A

when data do not meet assumptions
Spearman’s Rank
does not give a slope or intercept
tells if the null hypothesis should be rejected
cannot assume the relation is linear
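
One common way to compute Spearman's rank correlation is to rank each variable and then calculate an ordinary correlation on the ranks. The sketch below takes that approach. The helper names and the data are illustrative assumptions, and the example is deliberately monotonic but not linear.

```python
import math

def average_ranks(values):
    """Rank values from smallest (rank 1) to largest; ties share the mean rank."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def spearman(xs, ys):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

xs = [1, 2, 3, 4, 5]
ys = [1, 8, 27, 64, 125]   # y = x cubed: curved, but always increasing

rho = spearman(xs, ys)
print(rho)  # 1.0
```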

Answer 68

A

difference between 3 or more levels of a categorical variable
looks at variance in the dependent variable responses for each group
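
The variance comparison can be sketched as an F statistic: variation between group means divided by variation within groups. The three groups below are made-up data, and the p-value lookup (F distribution) is left out.

```python
groups = {                      # made-up data for three levels
    "A": [4, 5, 6],
    "B": [6, 7, 8],
    "C": [9, 10, 11],
}

all_values = [v for g in groups.values() for v in g]
grand_mean = sum(all_values) / len(all_values)

# Between-group sum of squares: group means around the grand mean.
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                 for g in groups.values())
# Within-group sum of squares: observations around their own group mean.
ss_within = sum((v - sum(g) / len(g)) ** 2
                for g in groups.values() for v in g)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)

f_stat = (ss_between / df_between) / (ss_within / df_within)
print(round(f_stat, 3))  # 19.0
```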

Answer 69

A

data are independent and randomly selected
residuals are normally distributed around group means
each within-group residual variance is equal
no extreme outliers

Answer 70

A

null - V1=V2=V3=Vt
alternative - V1=V2=V3<Vt
p-value of greater than .05 = fail to reject
p-value of less than or equal to .05 = reject the null hypothesis, at least one group has a different mean -> use Tukey’s HSD test

Answer 71

A

used after getting a significant result for an anova test
looks for pairwise differences
controls the type 1 error rate - gives p-values for all pairwise differences

Answer 72

A

if assumptions are not met - kruskal-wallis test
based on ranks
p-value of greater than .05 = fail to reject null hypothesis
p-value of less than or equal to .05 = reject the null hypothesis and conclude that at least one group has a different mean rank from at least one other group

Answer 73

A

used to compare two datasets that are categorical
compares the observed data to what would be expected if the values for each variable did not depend on the values for the other
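
The observed-versus-expected comparison can be sketched as a chi-square statistic on a contingency table. The 2x2 table of counts below is made up, and the p-value lookup (chi-square distribution) is left out.

```python
observed = [[30, 10],           # made-up counts: rows are the levels of one
            [20, 40]]           # variable, columns the levels of the other

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Expected count if the variables are independent:
# (row total * column total) / grand total.
expected = [[r * c / total for c in col_totals] for r in row_totals]

chi_square = sum((o - e) ** 2 / e
                 for obs_row, exp_row in zip(observed, expected)
                 for o, e in zip(obs_row, exp_row))
df = (len(row_totals) - 1) * (len(col_totals) - 1)

print(expected)                  # [[20.0, 20.0], [30.0, 30.0]]
print(round(chi_square, 3), df)  # 16.667 1
```

The cells with the largest (observed - expected)^2 / expected terms are where the observed data depart most from independence.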

Answer 74

A

null - no association between the variables
alternative - association between the two variables

Answer 75

A

p-value of greater than .05 = fail to reject
p-value of less than or equal to .05 = reject the null and conclude there is an association between the variables, can look at our observed data to determine where the largest differences are between observed and expected