Lets have it lad Flashcards
Add together
=SUM()
What symbol in the edit bar means AutoSum?
Σ
Multiply
*
Divide
/
Subtract
-
Mean
=Average()
Median for even numbers
Middle two, add together divide by two
Median
Sort and filter icon, smallest to largest to put them in order
Mode
=mode()
Range
=MAX() minus =MIN()
Interquartile range
Put in numeric order
Split into 4 quarters
Subtract first value of 2nd quarter from last value of 3rd
Standard deviation
=STDEV()
Variance type
=VAR()
What is variance
Standard deviation without the square root (standard deviation squared)
How do you calculate stdev
Calculate the mean (x with a line above, x̄, means mean)
Take the difference between each x value and the mean, then square it
Take the sum of all the squares
Then divide by the number of values minus 1
Square root
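The steps above can be sketched in Python as a cross-check on =STDEV(). This is only an illustration: the function name and the data values are made up, and it assumes the sample (n - 1) version described on the card.

```python
import math

def sample_stdev(values):
    """Sample standard deviation, following the flashcard steps."""
    n = len(values)
    mean = sum(values) / n                             # step 1: mean (x-bar)
    squared_diffs = [(x - mean) ** 2 for x in values]  # step 2: square each difference
    total = sum(squared_diffs)                         # step 3: sum of the squares
    variance = total / (n - 1)                         # step 4: divide by n - 1
    return math.sqrt(variance)                         # step 5: square root

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up example values
print(sample_stdev(data))
```

Stopping before the final square root gives the variance, which is why variance is in squared units.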
Square data
data^2
What are the two types of statistical analysis
Descriptive stats
Inferential stats
What is descriptive stats?
Statistical analysis to summarise main points/characteristics of data
What is inferential stats
Statistical analysis to infer something about a whole from a sample
Nominal data
Placed in categories, labelled options
Ordinal data
Preferences shown and then preferences ranked, e.g. a scale of 1-5
Interval data
Any values that have a consistent interval, e.g. how hot
Ratio
Has a defined 0 point, distance travelled
What is the mean very sensitive to?
extreme values
What is the median compared to the mean?
Much less sensitive, more robust
For nominal data, which descriptive statistic should be used?
Mode
What should you use for quantitative data?
Mean or median
When should you use median over mean?
When there are extreme values, because you do not want a distorted average
When should you use mean over median
When there are no extreme values
What are the 4 measures of spread
Range
Standard dev - most common
Interquartile range
Variance
What is the standard deviation used for?
Measure the variability in data
What does a high standard dev mean?
High variability in the data
What does a low standard dev mean?
Low variability in the data
Example of standard dev?
LAI values of rainfall
What is variance
Calculates variation which is not in the same units as the data (squared units)- less common
What are histograms?
Graphic depiction of the shape of the distribution of data- most common
What is the issue with too many intervals in a histogram?
Too complex
What is the issue with too few intervals in a histogram?
Detail is lost
What is the ideal number of intervals in a histogram?
Ideal=10 to 20
4 measures to describe the similarities/differences of frequency distributions
Central tendency
Spread
Skewness
Kurtosis
Skewness
Measure of asymmetry in distribution
Skewness, low numbers
Positively skewed distribution
Skewness High numbers
Negatively skewed distribution
No skew
0
Skew =
Mean - median
+ve value skew
+ve skew
-ve value skew
-ve skew
Kurtosis
Measure of how flat or peaked a distribution is
Kurtosis, Platykurtic
Relatively flat distribution w no obvious peak
Kurtosis, Leptokurtic
strongly pronounced peak in data
Cumulative frequency graphs, steep slope
Intervals with many data points
Cumulative frequency graphs, shallow slope
Intervals with few data points
What are cumulative frequency graphs used for (example)?
grain size statistics
What is used to graph 2 variables?
Scatter plots
Nominal data is
- Categorical data
What is nominal data frequently expressed as?
Pie charts, good at showing proportion
Can histograms still be used for nominal data?
-Yes, intervals=categories and frequencies = number of x in each category
How is nominal data expressed?
Histograms and pie charts
What do frequency distribution histograms show?
Visually describe the distribution and identify skewness and kurtosis
How do you decide size of intervals on histogram
Total (highest value) divided by the number of intervals you want
How do you calculate the number of individuals in each class interval
=FREQUENCY(data cells, class interval cells); highlight the range of cells to put the answer into first
How do you produce a frequency distribution histogram?
Insert tab, Charts, Column
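A rough Python sketch of the same counting idea, to show what the formula returns. It assumes the class interval cells hold the upper limit of each class in ascending order, and that values above the last limit go into one extra count. The function name and numbers are made up for illustration.

```python
def frequency(data, upper_limits):
    """Count how many values fall in each class interval.

    A value is counted in the first class whose upper limit it does not
    exceed; values above every limit go in a final extra count.
    """
    counts = [0] * (len(upper_limits) + 1)
    for value in data:
        for i, upper in enumerate(upper_limits):
            if value <= upper:
                counts[i] += 1
                break
        else:
            counts[-1] += 1  # larger than every upper limit
    return counts

data = [3, 7, 12, 15, 18, 21, 25, 29, 34]  # made-up values
upper_limits = [10, 20, 30]                # made-up class upper limits
counts = frequency(data, upper_limits)
print(counts)
```

The list of counts is what would then be plotted as the column chart.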
Skew function
=skew()
What is the skew of a normal distribution?
0
Why are sampling and inferential stats important
- Rare to be able to sample a whole population
- Use characteristics of sample to infer
What is random sampling
Selecting individuals with no bias
What is systematic sampling
Individuals selected in a regular way
What is spatial sampling
individuals are selected at regular spatial intervals
Criteria for truly random samples
- Every individual has an equal chance of inclusion throughout the procedure
- Selection of any individual should not affect the chance of selection for another
Positive/negatives of systematic sampling
+allows fair/even coverage of range of individuals
-Not fair and equal chance of being chosen, can produce bunching of sampled individuals
Two key assumptions which underpin most inferential stats
- random sampling
- population has a known distribution
What is a parameter
number that describes data from a population
What is a statistic
A number that describes data from a sample
How to use random generator on excel
=RANDBETWEEN(1,200), pull down for the cells below, then re-paste the answers using Paste Special
Where to find look up feature
Lecture week 4
What is a hypothesis?
Proposed explanation for a narrow phenomenon, based on a range of things, e.g. background scientific knowledge, preliminary investigations, logic, etc.
What is a theory
structure conceived by human imagination to explain how/why patterns occur in observed data
- often broader and can integrate many hypotheses
- new or very well tested
- can be used to generate hypotheses
What must hypotheses be to be scientific?
testable
Why can descriptive statistics be used to make hypotheses?
We can make hypotheses based on observed patterns
Hypotheses can be formalised for what?
Statistical testing
Null hypotheses, symbol
H0
What is a null hypothesis
An accepted fact which is nullifiable (can be invalidated)
Research hypotheses
H1, what we want to find the answer to
What is a two-tailed test?
Test to look into a difference of means; the difference could go either way
What is a one-tailed test?
Only one way it could happen
What happens if there is evidence of a statistically significant difference between the sample and the overall mean?
We reject the null and accept H1
What happens when we cannot reject null
If there is no statistical evidence against the null hypothesis, then it has to be accepted
What does an alternative hypothesis do?
Reject null
What is the only purpose of inferential stats?
Answer the question
What is simulation modelling?
Testing all outcomes that might occur due to random variation and sampling - it is very slow and inelegant for well-known processes, but had to do it to prove a point
Null hypotheses is what we?
Test to reject, from which we may accept H1
What must we do to apply statistical tests?
We need to generate full hypotheses
What do inferential stats allow us to do?
Use sample statistics to comment on a population's parameters
Mode function
=Mode()
Function for m, slope
=slope()
Function for y intercept of line, c
=intercept()
Function for correlation between populations, r, Pearsons
=pearson()
r, correlation function
=correl()
Variance function
=VAR()
What are sample figures called? (mean, mode and median)
Statistics
What are the true mean and median of a population called?
Parameters
What is used to estimate a parameter?
Statistics
What do statistics describe?
Sample
What do parameters describe?
population
How likely is it that a sample will give exact estimates of the population characteristics?
unlikely
How are different samples of the same population likely to compare?
Unlikely to have the same estimates of the population
What is sampling distribution?
Distribution of a large number of sample statistics
What is confidence interval?
Used to assess the accuracy of parameter estimates… and this is what allows us to test hypotheses
What is standard error?
Standard deviation/square root of sample size
Confidence interval
Mean of x, plus or minus the chosen Z value times the standard error (the standard deviation over the root of the sample size)
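A minimal Python sketch of that calculation, assuming a Z value of 1.96 for a 95% confidence level. The function name and sample values are made up for illustration.

```python
import math
import statistics

def confidence_interval(sample, z=1.96):
    """Return (lower, upper) as mean +/- z * standard error."""
    mean = statistics.mean(sample)
    # Standard error = sample standard deviation / square root of sample size
    std_error = statistics.stdev(sample) / math.sqrt(len(sample))
    margin = z * std_error
    return mean - margin, mean + margin

sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7, 12.0]  # made-up values
lower, upper = confidence_interval(sample)
print(lower, upper)
```

A larger sample shrinks the standard error, so the interval gets narrower for the same Z value.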
What is standard deviation of the means called?
Standard error
What does standard deviation quantify?
The variation within a set of measurements
What does standard error quantify?
The variation in the means of multiple sets of measurements; can still be estimated from a single set of measurements
Why statistical tests are needed?
With sample data, each mean value is to some degree in error (it differs from the actual or population mean value of that site)
Why can difference in mean values from same population occur?
Sampling chance
What is statistical significance?
Level of risk (α value); the risk of not being correct if we accept or reject the null hypothesis
What does lowering the statistical significance level mean?
Higher confidence
What do parametric statistical tests involve?
Assumptions of populations
- Populations should have the same variance, independent data, and hypotheses usually concern the population mean
What are non-parametric tests in comparison?
- Have fewer assumptions and are more robust, handle non-classical distributions, e.g. chi squared
What stats test to use for nominal data?
Test for proportion, difference of two proportions, chi squared independence
What stats test to use for interval/ratio data?
Most commonly test for mean, difference of two means and regression analysis
What is ordinal data?
Between both nominal and interval data
If there is one sample…
test for proportion and test for mean
If there is two samples..
Difference of two proportions and difference of two means
What test if there is one sample with two measures?
Chi squared, regression and difference of two means
What if you are testing for a value?
test for proportion, test for mean and difference of two means
What if you are comparing 2 statistics?
Difference of two proportions/means
Working out a relationship?
Chi squared, regression analysis
What is the T - Test for?
Individual samples
What does t mean?
Difference between means/standard error of that difference
What is critical value calculated based on?
Degrees of freedom and significance level
What is the rejection region?
Part of the probability distribution beyond the critical value of the test statistic
What is regression?
Nature of relationship between two variables
What is measured by a correlation coefficient?
An association between two variables
- positive or negative
- linear or non-linear
- visualised on a scatter plot
Positive correlation
/
Negative correlation
\
Linear correlation
Close together and ordered
+1 correlation
Perfect correlation
0 correlation
no association
-1 correlation
Perfect negative association
0.00-0.19 correlation
very weak
0.2-0.39 correlation
weak
0.4-0.69 correlation
modest
0.7-0.89
strong
0.9-1.0
very strong
What are the different estimates of correlation coefficient?
Parametric and non-parametric
What is the parametric correlation estimate?
Pearson's product-moment correlation coefficient - assumes a normal distribution, needs values
What is the non-parametric correlation estimate?
Spearman's rank correlation coefficient - more effective, only needs ranks
Pearson's correlation
- Requires interval scale (continuous data)
- Parametric, assumes variables are normally distributed
- Relationship between the two variables tested is linear
- Should be in a consistent direction
Spearman's rank
Ordinal or interval data
Assumes direction of relationship consistent
only 91% of power?
Spearmans rank equation
Look online
Pearson's equation
Look online
Steps of a stats test
- State the null hypothesis
- Set the level of risk, the statistical significance (alpha value)
- Select the appropriate statistical test and compute the test statistic value
- Find the critical value from published tables
- Accept or reject the hypothesis
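The steps above can be walked through with a small Python sketch. It assumes a one-sample t test of a mean as the chosen test; the function name, the hypothesised mean of 50 and the sample values are all made up, and the critical value 2.262 is the usual table entry for 9 degrees of freedom at a two-tailed alpha of 0.05.

```python
import math
import statistics

def one_sample_t(sample, hypothesised_mean):
    """t = (sample mean - hypothesised mean) / standard error of the mean."""
    std_error = statistics.stdev(sample) / math.sqrt(len(sample))
    return (statistics.mean(sample) - hypothesised_mean) / std_error

# Step 1: null hypothesis, the population mean is 50 (made-up value)
hypothesised_mean = 50
# Step 2: level of risk, alpha = 0.05 (two-tailed)
# Step 3: compute the test statistic for the chosen test
sample = [52, 55, 49, 58, 54, 51, 57, 53, 56, 55]  # made-up values
t_value = one_sample_t(sample, hypothesised_mean)
# Step 4: critical value from a published t table, n - 1 = 9 degrees of freedom
t_critical = 2.262
# Step 5: reject the null if the test statistic falls in the rejection region
reject_null = abs(t_value) > t_critical
print(t_value, reject_null)
```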
What is a correlation coefficient?
Numerical index that reflects the linear relationship between two variables, which ranges from -1 to 1
Two types of correlation coefficient used
Spearmans rank and Pearsons
How can the significance of r and rs be tested?
Using the T test
How do you produce a scatter plot
Highlight the data, Insert, select the X-Y scatter chart type with only dots. Right click to alter, select chart layout tab to do the
What should be done for sample data to get an unbiased estimation of rs values?
Coefficient should be multiplied by n-1/m
How do you work out Spearman's rank
Find the difference between each pair of ranks, copy the formula down the column
- Square the rank difference, pull down the column
- Add the squared differences
- Tied data = complex
- Non-tied data = simple
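A short Python sketch of the simple (non-tied) case. It assumes the standard no-ties formula rs = 1 - 6 × (sum of squared rank differences) / (n(n² - 1)), which is not written out on the cards; the function names and data are made up for illustration.

```python
def ranks(values):
    """Rank values 1..n from smallest to largest (assumes no tied values)."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman_rank(xs, ys):
    """Spearman's rank correlation for data with no ties."""
    n = len(xs)
    rank_x, rank_y = ranks(xs), ranks(ys)
    # Difference between each pair of ranks, squared, then added up
    sum_d_squared = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - (6 * sum_d_squared) / (n * (n ** 2 - 1))

xs = [10, 20, 30, 40, 50]  # made-up values
ys = [1, 3, 2, 5, 4]       # made-up values
print(spearman_rank(xs, ys))
```

With tied data the ranks have to be averaged and this shortcut formula no longer applies exactly, which is why the tied case is the complex one.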
How do you work out Pearson's r using a scatter graph
Scatter graph, add a trend line (Chart Elements, Add). Double click the trend line and click show more. Take the square root of the r² value (coefficient of determination) to get Pearson's r
How to test significance for correlation data
Put the r value into t = r × √((n - 2) / (1 - r²))
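The same formula as a small Python sketch; the function name and the example values r = 0.8 and n = 10 are made up for illustration.

```python
import math

def t_from_r(r, n):
    """t statistic for a correlation coefficient r from n pairs of values."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

t_value = t_from_r(0.8, 10)  # made-up r and n
print(t_value)
```

The result is then compared with the critical t value for n - 2 degrees of freedom at the chosen significance level.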
Regression
A parametric stats technique for identifying the relationship between a dependent variable and one or more independent variables
What does a correlation tell us?
If 2 variables vary together
What does a regression describe
Functional relationship between 2 variables
What is the regression coefficient?
y = a + bx, where y = dependent variable, a = intercept on y axis, b = slope (gradient), x = independent variable
What does the regression coefficient show?
Provides diagnostic details that indicate the quality of the model fit
-used to provide a simplified relation between the two variables to evaluate the strength of the relation and correlation of the model based on sample data
Dependent variable
Depends on the values of other variables
Best fit line
Sum of the squared residuals is minimised - this determines b (slope) and a (intercept) for the best fit line
Slope = b, equation
b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)², i.e. the sum of (x - mean x) × (y - mean y) over the sum of (x - mean x) squared
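A minimal Python sketch of the slope equation, as a cross-check on =SLOPE() and =INTERCEPT(). It assumes the usual least-squares intercept a = mean y - b × mean x, which the cards do not spell out; the function name and data are made up for illustration.

```python
def best_fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: sum of (x - mean x)(y - mean y) over sum of (x - mean x) squared
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    b = numerator / denominator
    a = mean_y - b * mean_x  # intercept: the line passes through the two means
    return a, b

xs = [1, 2, 3, 4, 5]   # made-up values
ys = [3, 5, 7, 9, 11]  # made-up values lying exactly on y = 1 + 2x
a, b = best_fit_line(xs, ys)
print(a, b)
```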
What is the coefficient of determination?
r² (r squared), between 0 and 1.0
- The proportion of total variance in the dependent variable that is accounted for by the regression model; the square of Pearson's r
What do higher values of the coefficient of determination mean?
Less scatter along the line
What must be apparent for a relation to exist in the coefficient of determination?
b must be different from 0 for a relation to exist
What is the t test used for with regard to the slope?
To see whether it is significant or not
The t test can be used for both the slope (b) and the intercept
If there is no relationship between X and Y what would the coefficient be expected to be?
0
How can you also test the slope?
Convert the calculated value for b into units of t to compare the t statistic with the t critical value - the larger the t value, the less likely that the slope coefficient for the sample arose from random sampling of variables that are not related
Standard error of the estimate
Measures the amount of variability in the points around the regression line
Limitations of linear regression?
- Interval scale (continuous data) required
- Data should be approx. normally distributed
- Equation should not be used to predict values beyond the limits of the original data
- Relationship is assumed linear, whereas a non-linear fit may provide a better fit to the same data sets
- Residuals of regression should be approx. normally distributed with a mean of 0
- Variance of y about the regression line does not vary markedly over the range of x
How do you add linear regression line to the plot?
Left click on the data series to select it, then Layout tab, Analysis group, click Add Trendline, More Trendline Options, Linear; check the boxes to display the line's equation and R squared
What is the slope function to obtain b?
=SLOPE(known y's, known x's)
What is the function to obtain intercept data, a?
=INTERCEPT(known y's, known x's)
Pearsons function
=Pearson()
How do you calculate regression using the tool?
Select Regression from the Data Analysis tool (Data tab, Data Analysis group); if it is not there, add it via File, Options, Add-ins
What test for the slope coefficient hypothesis?
T test
The larger the value of t…
the less likely that the slope coefficient for the sample arose from random sampling of variables that are not linearly related
What is confidence interval related to?
Standard error of the estimates
What is the standard error of the estimates (S.E.)?
Average amount of difference between sample and population characteristics; in regression it can be calculated in Excel
What can SE be used to calculate?
Confidence interval given a confidence level
What is the confidence interval for slope coefficient?
(b - 2 s.e., b + 2 s.e.)
What is Anova?
Analysis of variance- test for multiple samples from same n
Why are degrees of freedom useful?
In finding the critical value in the table
Degrees of freedom, regression
K-1
Degrees of freedom, residual
N-K
Degrees of freedom, total
N-1
What is the significance level
P value
How do you find the T stat for the intercept?
a/s.e
How do you find the t stat for the x variable?
b/s.e.
What does a p value of 0.004 mean?
0.4% chance the relationship occurred randomly
What does a p value of 0.04 mean?
4% chance the relationship occurred randomly
Typically what does a p value less than 5 or 10% mean?
It is significant
What are the limitations in regression analysis?
- Interval data required, i.e. continuous
- Data should be approx. normally distributed
- Regression equation should not be used to predict values beyond the limits of the original data
- The relationship is assumed linear, whereas a non-linear best fit line may provide a better fit
- Homoscedasticity, variance of y about the regression line does not vary markedly
How should residuals of regression be approx distributed?
Normally, with a mean of 0
What should residuals not show?
Trend, slope of regression residuals on x should = 0