03. Review of Basic Data Analytic Methods Using R Flashcards
What is a factor
A factor is a vector whose values are limited to a fixed set of categories, called levels (much like a cell with a data-validation drop-down list). When you first create a factor, R takes the distinct values present as the levels; after that, every new value is checked against those levels, and a value outside them is stored as NA with a warning.
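A minimal sketch of this behaviour, using made-up size labels (the variable names `sizes` and `sizes2` are illustrative only):

```r
# Create a factor; the levels are inferred from the distinct values present.
sizes <- factor(c("small", "large", "small", "medium"))
levels(sizes)   # the inferred levels, sorted alphabetically

# Assigning a value that is not one of the levels is rejected:
# R stores NA in that position and emits a warning.
sizes[2] <- "huge"
is.na(sizes[2])

# The allowed levels can also be declared up front, even if some are unused.
sizes2 <- factor(c("small", "large"), levels = c("small", "medium", "large"))
nlevels(sizes2)
```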
What is a list
A list is a generic vector whose elements can be of different data types (including other vectors and lists)
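As a small illustration, assuming a made-up `person` record, a list can hold a string, a number, a numeric vector and a logical side by side:

```r
# Each element of the list has a different type.
person <- list(name = "Ada", age = 36, scores = c(90, 85), is_member = TRUE)

str(person)             # compact overview of the structure
person$name             # access an element by name
person[["scores"]]      # double brackets return the element itself
sapply(person, class)   # the class of each element
```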
What is an array
An array can only contain one type of data. Arrays can be multi-dimensional, i.e. rows, then columns, then sheets, then workbooks, and so on.
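A short sketch of a three-dimensional array (the "sheets" wording follows the spreadsheet analogy; the variable names are illustrative):

```r
# 24 values arranged as 2 rows x 3 columns x 4 "sheets".
arr <- array(1:24, dim = c(2, 3, 4))
dim(arr)
arr[1, 2, 3]    # row 1, column 2, sheet 3

# Mixing types does not fail; R coerces everything to a single type.
mixed <- array(c(1, "a", TRUE), dim = 3)
typeof(mixed)   # all three values are now character strings
```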
What is a dataframe
A dataframe is a table whose columns are vectors or factors, all of the same length. Every value within a column has the same data type, but different columns can have different data types.
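For example, a small data frame with made-up columns might be built like this (`stringsAsFactors = FALSE` is set explicitly so the result does not depend on the R version):

```r
# Three columns of equal length, each with its own type.
df <- data.frame(
  id    = 1:3,
  name  = c("a", "b", "c"),
  group = factor(c("x", "y", "x")),
  stringsAsFactors = FALSE
)

str(df)             # one line per column, showing its type
sapply(df, class)   # integer, character, factor
df$name             # a single column
df[2, ]             # a single row
```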
What is a record
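A single row of a table or dataframe (one observation, with a value for each column), as opposed to a single cell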
What is a pairs plot
It plots every variable against every other variable in a grid of scatterplots. Also known as a SPLOM (scatterplot matrix).
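A quick way to try this is base R's `pairs()` function on the built-in `iris` data set; the choice of data set and title here is only for illustration:

```r
# Keep the four numeric measurement columns of the built-in iris data.
measurements <- iris[, 1:4]

# Draw one scatterplot for every pair of variables.
pairs(measurements, main = "Pairs plot of the iris measurements")
```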
What do T-Tests use
Samples of the population (not the full population)
What are T-Tests used for
Testing samples of the population (not the full population) against a null hypothesis, i.e. checking whether an observed difference in means is statistically significant
What can an ANOVA test be used for
ANOVA (analysis of variance) is used in hypothesis testing when you want to compare the means of more than two sample populations
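A minimal sketch of a one-way ANOVA with base R's `aov()`, assuming the built-in `PlantGrowth` data set (plant weights for three groups) as example data:

```r
# One-way ANOVA: does mean weight differ across the three groups?
fit <- aov(weight ~ group, data = PlantGrowth)
summary(fit)

# The p-value for the group effect sits in the first row of the summary table.
p_value <- summary(fit)[[1]][["Pr(>F)"]][1]
p_value
```

A small p-value suggests that at least one group mean differs from the others; it does not say which one.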
What is hypothesis testing
Using sample data to decide between the null hypothesis (no effect or no difference) and the alternative hypothesis
What is statistical power
Statistical power is the probability that a test correctly rejects the null hypothesis when the alternative hypothesis is true
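For a t-test, power can be explored with `power.t.test()` from the base `stats` package. The sample size, effect size and standard deviation below are made-up values chosen only for illustration:

```r
# Power of a two-sample t-test with 20 observations per group,
# a true difference in means of 1 and a standard deviation of 1.
res <- power.t.test(n = 20, delta = 1, sd = 1, sig.level = 0.05)
res$power

# Leave n out instead to solve for the sample size per group
# needed to reach 80% power under the same assumptions.
n_needed <- power.t.test(power = 0.8, delta = 1, sd = 1, sig.level = 0.05)$n
n_needed
```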
What is a parametric distribution
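The data is assumed to follow a distribution that is fully described by a fixed set of parameters; the most common case is the normal distribution (mean and standard deviation)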
How is the standard deviation calculated
The standard deviation is the square root of the variance
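This can be checked in R, where `var()` and `sd()` both use the sample (n - 1) form; the numbers below are arbitrary example data:

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

var(x)   # sample variance
sd(x)    # sample standard deviation

# sd() agrees with the square root of var().
all.equal(sd(x), sqrt(var(x)))

# The same value computed by hand from the definition.
manual_sd <- sqrt(sum((x - mean(x))^2) / (length(x) - 1))
manual_sd
```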
The T-Test
Parametric. Can be one-sample or two-sample.
“Student's t-test” commonly refers to the two-sample t-test that assumes both samples have the same variance.
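A sketch of both forms with base R's `t.test()`, using simulated data (the means, sample sizes and seed are arbitrary assumptions):

```r
set.seed(42)
a <- rnorm(30, mean = 5.0, sd = 1)
b <- rnorm(30, mean = 5.5, sd = 1)

# One-sample test: is the mean of a different from 5?
one_sample <- t.test(a, mu = 5)

# Two-sample test assuming equal variances (pooled variance).
two_sample <- t.test(a, b, var.equal = TRUE)

one_sample$p.value
two_sample$p.value
two_sample$parameter   # degrees of freedom: n1 + n2 - 2
```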
Welch’s test
Parametric, but it copes with the two samples having different standard deviations (unequal variances), so it is always a two-sample test
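In R, `t.test()` runs Welch's test by default for two samples, because `var.equal` defaults to `FALSE`. A small sketch with simulated data whose spreads deliberately differ (all values are arbitrary assumptions):

```r
set.seed(1)
x <- rnorm(25, mean = 10, sd = 1)   # narrow spread
y <- rnorm(25, mean = 10, sd = 4)   # wide spread

# No var.equal argument, so the Welch version is used.
welch <- t.test(x, y)

welch$method      # names the test that was run
welch$parameter   # Welch degrees of freedom, usually not a whole number
welch$p.value
```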