Initial analysis of the data Flashcards
What are all R commands?
Functions
how do you get data into R
read.table()
read.csv()
how do you get data out of R
write.csv()
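A minimal sketch of a round trip out of R and back in. The data frame and its column names (id, temp_c) are made up for illustration, and a temporary file is used so nothing in the working directory is touched.

```r
# Hypothetical example data; the column names are invented for this sketch.
df <- data.frame(id = 1:3, temp_c = c(21.5, 19.0, 23.2))

# Write to a temporary file rather than the working directory.
path <- tempfile(fileext = ".csv")
write.csv(df, path, row.names = FALSE)

# read.csv() is read.table() with defaults suited to comma-separated files,
# so both calls below should give the same data frame.
df_csv   <- read.csv(path)
df_table <- read.table(path, header = TRUE, sep = ",")

print(df_csv)
```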
what is nominal data?
names of things
what is ordinal data?
ordered names
what is interval data?
numeric with no true zero (Celsius)
what is ratio data?
numeric with true zero (kelvin)
which 2 data classifications are categorical or discrete?
nominal and ordinal
which 2 data classifications are continuous variables?
interval and ratio
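A small sketch of the four classifications in R. The example values (colours, sizes, temperatures) are assumptions chosen for illustration; the point is that ordinal data supports ordering and only ratio data supports meaningful ratios.

```r
# Nominal: names with no order.
colour <- factor(c("red", "blue", "red"))

# Ordinal: names with an order, stored as an ordered factor.
size <- factor(c("small", "large", "medium"),
               levels = c("small", "medium", "large"),
               ordered = TRUE)
size[1] < size[2]   # comparisons are allowed on ordered factors

# Interval vs ratio: the same two temperatures in Celsius and kelvin.
celsius <- c(10, 20)
kelvin  <- celsius + 273.15

celsius[2] / celsius[1]  # computes, but "twice as hot" is not meaningful
kelvin[2] / kelvin[1]    # meaningful, because kelvin has a true zero
```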
what is a number?
can have decimals
what is an integer?
whole number
what is a character?
text (a string of characters), not a number
what is a vector?
set of values of the same data type (built with the combine function c())
what is a list?
collection of different vectors or other data structures
what is a factor?
categorical variable
fixed set of values
what are arrays?
n-dimensional homogeneous data types
what are matrices?
2D arrays of a single data type (usually numeric)
what is a data frame?
a list in which all component vectors are the same length
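The data structures above can be sketched in a few lines. All variable names and values here are illustrative assumptions.

```r
v <- c(1.5, 2, 3.25)                        # numeric vector: one data type throughout
i <- c(1L, 2L, 3L)                          # integer vector (the L suffix marks integers)
s <- c("low", "high", "low")                # character vector
f <- factor(s, levels = c("low", "high"))   # factor: categorical, fixed set of levels
m <- matrix(1:6, nrow = 2)                  # matrix: 2D, one data type
a <- array(1:24, dim = c(2, 3, 4))          # array: n-dimensional, one data type
l <- list(v, s, m)                          # list: components may differ in type and length
df <- data.frame(value = v, level = f)      # data frame: columns of equal length

str(df)   # compact view of the structure
```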
what is the R code for viewing the data?
head()
tail()
what is the R code for viewing a summary of the data?
summary()
what is the R code for computing basic statistics?
sd()
var()
range()
IQR()
What is the R code for the correlation?
cor()
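A short sketch of the viewing, summary and statistics functions in use. It assumes R's built-in mtcars data set as a stand-in for your own data.

```r
x <- mtcars$mpg   # miles per gallon
y <- mtcars$wt    # weight

head(mtcars, 3)   # first rows
tail(mtcars, 3)   # last rows
summary(x)        # min, quartiles, median, mean, max

sd(x)             # standard deviation
var(x)            # variance (the square of sd)
range(x)          # minimum and maximum
IQR(x)            # interquartile range

r <- cor(x, y)    # Pearson correlation by default
r
```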
what does visualisation give you?
more holistic picture of the data
what are summary statistics?
mean vs median
standard dev
quartiles
correlations
what is Anscombe’s Quartet?
4 sets of data with nearly identical standard statistics that look very different when plotted
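R ships the quartet as the built-in data frame anscombe (columns x1 to x4 and y1 to y4), so the point can be checked directly. This is a sketch; the helper variable names are assumptions.

```r
# Mean of each y column: nearly the same in all four sets.
means_y <- sapply(anscombe[, c("y1", "y2", "y3", "y4")], mean)

# Correlation of each x/y pair: also nearly the same.
cors <- sapply(1:4, function(k) {
  cor(anscombe[[paste0("x", k)]], anscombe[[paste0("y", k)]])
})

round(means_y, 2)
round(cors, 3)
```

Plotting each pair, for example plot(anscombe$x2, anscombe$y2), shows how different the four sets are despite the matching numbers.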
what does hist() do in R?
plots a histogram
what do missing values suggest?
dirty data
what is the best first visualisation of 2 variables?
scatter plots
what is a box and whisker plot?
a plot whose box covers the central 50% of the data (the interquartile range), with a line at the median and whiskers out towards the extremes
why use a pairwise plot?
visually represent data relationships
examines relationship quickly
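The plots on these cards can be sketched with base R graphics. The example assumes the built-in mtcars data set and draws to a null PDF device so it runs without opening a window; drop the pdf(NULL) and dev.off() lines to see the plots interactively.

```r
pdf(NULL)                                   # null device: nothing is written to disk

hist(mtcars$mpg)                            # histogram of one variable
plot(mtcars$wt, mtcars$mpg)                 # scatter plot of two variables
b <- boxplot(mtcars$mpg)                    # box and whisker plot; stats are returned invisibly
pairs(mtcars[, c("mpg", "wt", "hp")])       # pairwise plot of several variables

invisible(dev.off())

b$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker
```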
What does time series analysis have to have?
the same time period
what is the null hypothesis
no difference
what is the alternate hypothesis
there is a difference
what is the difference of means?
a comparison of the means of 2 data sets, judged by how much the two distributions overlap
what is the p value?
the area under the tails of the curve
if the p value is less than 0.05 what do you do?
reject the null hypothesis
Student's t-test assumes both populations are -
a) normally distributed
b) not normally distributed
a) normally distributed
what do you use if the data is not normally distributed?
Wilcoxon rank-sum test
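Both tests are available in base R. A minimal sketch, assuming the built-in mtcars data set and comparing fuel economy between automatic and manual cars (the group names are illustrative):

```r
auto   <- mtcars$mpg[mtcars$am == 0]
manual <- mtcars$mpg[mtcars$am == 1]

# var.equal = TRUE gives the classic Student's t-test;
# the default (var.equal = FALSE) is Welch's variant.
tt <- t.test(auto, manual, var.equal = TRUE)

# Rank-based alternative when normality is doubtful.
# exact = FALSE uses the normal approximation, since the data contain ties.
wt <- wilcox.test(auto, manual, exact = FALSE)

tt$p.value
wt$p.value
```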
what are the steps in hypothesis testing?(3)
calculate test statistic
calculate p value
if p value less than 0.05 then reject
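The three steps can be worked by hand for a pooled-variance (Student's) t-test. This is a sketch on the built-in mtcars data set, with a conventional threshold of 0.05 assumed; the variable names are illustrative.

```r
x <- mtcars$mpg[mtcars$am == 0]
y <- mtcars$mpg[mtcars$am == 1]
n1 <- length(x)
n2 <- length(y)

# Step 1: test statistic (difference of means over its standard error).
pooled_var <- ((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2)
t_stat <- (mean(x) - mean(y)) / sqrt(pooled_var * (1 / n1 + 1 / n2))

# Step 2: p value = area under both tails of the t distribution.
dof <- n1 + n2 - 2
p_value <- 2 * pt(-abs(t_stat), dof)

# Step 3: compare with the threshold.
reject_null <- p_value < 0.05

# Cross-check against the built-in test.
check <- t.test(x, y, var.equal = TRUE)
```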
what is a type 1 error (false positive)
rejecting the null hypothesis when the null hypothesis is true
what is a type 2 error (false negative)
failing to reject (accepting) the null hypothesis when the null hypothesis is false
what is significance?
the probability of a false positive
what is power?
the probability of a true positive
what is effect size?
the actual magnitude of the result
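Significance, power and effect size are linked, and base R's power.t.test() shows the link. In this sketch Cohen's d is used as one common effect-size measure (an assumption, since the cards do not name one), and the values n = 20, delta = 1 and sd = 1 are illustrative.

```r
auto   <- mtcars$mpg[mtcars$am == 0]
manual <- mtcars$mpg[mtcars$am == 1]
n1 <- length(auto)
n2 <- length(manual)

# Effect size: difference of means in units of the pooled standard deviation.
pooled_sd <- sqrt(((n1 - 1) * var(auto) + (n2 - 1) * var(manual)) / (n1 + n2 - 2))
cohens_d  <- (mean(manual) - mean(auto)) / pooled_sd

# Power of a two-sample t-test with 20 observations per group.
pw <- power.t.test(n = 20, delta = 1, sd = 1, sig.level = 0.05)
pw$power

# Turned around: group size needed for 90% power at the same effect.
need <- power.t.test(power = 0.90, delta = 1, sd = 1, sig.level = 0.05)
ceiling(need$n)
```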
what does ANOVA stand for?
analysis of variance
what is ANOVA?
Generalisation of the difference of means
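A sketch of ANOVA with aov(), assuming the built-in mtcars data set. The second half checks the "generalisation" claim: with only two groups, the ANOVA F statistic equals the square of the pooled-variance t statistic.

```r
# Three groups: does mean mpg differ by number of cylinders?
fit3 <- aov(mpg ~ factor(cyl), data = mtcars)
p3   <- summary(fit3)[[1]][["Pr(>F)"]][1]
p3

# Two groups: ANOVA and Student's t-test agree.
fit2   <- aov(mpg ~ factor(am), data = mtcars)
f_stat <- summary(fit2)[[1]][["F value"]][1]
tt     <- t.test(mpg ~ am, data = mtcars, var.equal = TRUE)
t_stat <- unname(tt$statistic)

c(F = f_stat, t_squared = t_stat^2)
```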
what percentage of confidence interval do most people use?
95%
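A 95% confidence interval for a mean can be read from t.test() or computed by hand. A minimal sketch, assuming the built-in mtcars data set:

```r
x <- mtcars$mpg
n <- length(x)

# Built-in: the interval is returned alongside the test result.
ci <- t.test(x, conf.level = 0.95)$conf.int

# By hand: mean plus or minus the t quantile times the standard error.
half_width <- qt(0.975, df = n - 1) * sd(x) / sqrt(n)
ci_manual  <- c(mean(x) - half_width, mean(x) + half_width)

ci
ci_manual
```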
would you visualise before or after model building?
before