Initial analysis of the data Flashcards

1
Q

What are all R commands?

A

Functions

2
Q

how do you get data into R?

A

read.table()

read.csv()

3
Q

how do you get data out of R?

A

write.csv()
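
A minimal sketch of the round trip, assuming a small made-up data frame (`scores`) and a temporary file path; neither comes from the cards themselves:

```r
# Hypothetical example data, written to a temporary file
scores <- data.frame(id = 1:3, score = c(4.5, 3.2, 5.0))
path <- tempfile(fileext = ".csv")

write.csv(scores, path, row.names = FALSE)  # data out of R
scores_in <- read.csv(path)                 # data back into R

# read.table() is the general reader; read.csv() presets header and separator
scores_tab <- read.table(path, header = TRUE, sep = ",")
```

Setting `row.names = FALSE` keeps R from writing an extra column of row numbers.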

4
Q

what is nominal data?

A

names of things

5
Q

what is ordinal data?

A

ordered names

6
Q

what is interval data?

A

numeric with no true zero (Celsius)

7
Q

what is ratio data?

A

numeric with true zero (kelvin)

8
Q

which 2 data classifications are categorical or discrete?

A

nominal and ordinal

9
Q

which 2 data classifications are continuous variables?

A

interval and ratio

10
Q

what is a number?

A

a numeric value that can have decimals

11
Q

what is an integer?

A

whole number

12
Q

what is a character?

A

text (a string), not a number

13
Q

what is a vector?

A

set of values of the same data type (created with the combine function c())
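
A short sketch of the number, integer and character types from the cards above, each stored in a vector built with c(). The values are made up for illustration:

```r
x <- c(1.5, 2, 3.25)    # numeric: can have decimals
n <- c(1L, 2L, 3L)      # integer: the L suffix marks whole numbers
s <- c("a", "b", "c")   # character: text, not a number
class(x); class(n); class(s)

# A vector holds one type only, so mixing types coerces to a common type
mixed <- c(1, "a")
class(mixed)            # "character"
```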

14
Q

what is a list?

A

collection of different vectors or other data structures

15
Q

what is a factor?

A

categorical variable

fixed set of values
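
As an illustration, a factor with a fixed set of levels can be sketched like this. The `size` values are hypothetical, and `ordered = TRUE` is one way to represent ordinal data:

```r
size <- factor(c("small", "large", "medium", "small"),
               levels = c("small", "medium", "large"))
levels(size)   # the fixed set of allowed values
table(size)    # counts per category

# An ordered factor represents ordinal data, so comparisons make sense
size_ord <- factor(size, levels = levels(size), ordered = TRUE)
size_ord[1] < size_ord[2]   # TRUE: "small" comes before "large"
```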

16
Q

what are arrays?

A

n-dimensional homogeneous data structures (every element has the same type)

17
Q

what are matrices?

A

2D arrays whose elements all share one type (typically numeric)

18
Q

what is a data frame?

A

a list in which all component vectors are the same length
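
The structures on the last few cards can be compared side by side. This is a sketch with made-up values, not data from the course:

```r
# A list can hold components of different types and lengths
lst <- list(name = "trial", values = c(1, 2, 3), flag = TRUE)

# A data frame is a list whose columns all have the same length
df <- data.frame(id = 1:3, group = c("a", "b", "a"))
is.list(df)          # TRUE
sapply(df, length)   # every column has length 3

# A matrix is a 2D array; an array can have more dimensions
m <- matrix(1:6, nrow = 2)
a <- array(1:24, dim = c(2, 3, 4))
dim(m); dim(a)
```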

19
Q

what is the R code for viewing the data?

A

head()

tail()

20
Q

what is the R code for viewing a summary of the data?

A

summary()
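
For example, using R's built-in `mtcars` data set (chosen here only for illustration), the viewing and summary functions look like this:

```r
first_rows <- head(mtcars, 3)     # first 3 rows
last_rows  <- tail(mtcars, 3)     # last 3 rows
mpg_summary <- summary(mtcars$mpg)  # min, quartiles, median, mean, max

first_rows
last_rows
mpg_summary
```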

21
Q

what is the R code for computing basic statistics?

A

sd()
var()
range()
IQR()

22
Q

What is the R code for the correlation?

A

cor()
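
A small sketch of the basic statistics and correlation functions on two made-up vectors (`x` and `y` are illustrative only):

```r
x <- c(2, 4, 4, 5, 7, 9)
y <- c(1, 3, 2, 5, 6, 8)

sd(x)       # standard deviation
var(x)      # variance (the square of sd)
range(x)    # smallest and largest value
IQR(x)      # interquartile range
r <- cor(x, y)   # Pearson correlation by default
r
```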

23
Q

what does visualisation give you?

A

more holistic picture of the data

24
Q

what are summary statistics?

A

mean vs median
standard deviation
quartiles
correlations

25
Q

what is Anscombe’s Quartet?

A

4 data sets with nearly identical summary statistics (mean, variance, correlation) that look very different when plotted
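
R ships the quartet as the built-in `anscombe` data frame (columns x1 to x4 and y1 to y4), so the point can be checked directly. A sketch:

```r
# Summary statistics are almost identical across the four sets
x_means <- sapply(1:4, function(i) mean(anscombe[[paste0("x", i)]]))
cors <- sapply(1:4, function(i) {
  cor(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]])
})
x_means
round(cors, 3)

# Plotting shows how different the four sets really are
op <- par(mfrow = c(2, 2))
for (i in 1:4) {
  plot(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]],
       xlab = paste0("x", i), ylab = paste0("y", i))
}
par(op)
```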

26
Q

what does hist() do in R?

A

Plot a histogram

27
Q

what do missing values suggest?

A

dirty data

28
Q

what is the best first visualisation of 2 variables?

A

scatter plots

29
Q

what is a box and whisker plot?

A

a plot whose box spans the middle 50% of the data (the interquartile range), with a line at the median and whiskers extending toward the extremes

30
Q

why use a pairwise plot?

A

visually represent data relationships

examines relationship quickly
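
The plot types from the cards above can be sketched with base R graphics. The built-in `mtcars` data set and the chosen columns are assumptions made for illustration:

```r
hist(mtcars$mpg, main = "Histogram of mpg", xlab = "mpg")       # one variable
plot(mtcars$wt, mtcars$mpg, xlab = "wt", ylab = "mpg")          # scatter plot of 2 variables
boxplot(mpg ~ cyl, data = mtcars, xlab = "cyl", ylab = "mpg")   # box and whisker per group
pairs(mtcars[, c("mpg", "wt", "hp")])                           # pairwise plot
```

pairs() draws a scatter plot for every pair of columns, which is why it gives a quick overview of relationships.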

31
Q

What does time series analysis have to have?

A

the same time period

32
Q

what is the null hypothesis?

A

no difference

33
Q

what is the alternate hypothesis?

A

there is a difference

34
Q

what is the difference of means?

A

a comparison of the means of 2 data sets; the more their distributions overlap, the weaker the evidence of a real difference

35
Q

what is the p value?

A

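the area under the tails of the curve: the probability, assuming the null hypothesis is true, of a result at least as extreme as the one observed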

36
Q

if the p value is less than 0.05, what do you do?

A

reject the null hypothesis

37
Q

Student's t-test assumes both populations are:

a) normally distributed
b) not normally distributed

A

a) normally distributed

38
Q

what do you use if the data is not normally distributed?

A

Wilcoxon rank sum test
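
Both tests are available in base R. A sketch on simulated data (the samples `a` and `b` are made up, and the seed is only there for reproducibility):

```r
set.seed(1)
a <- rnorm(30, mean = 10, sd = 2)
b <- rnorm(30, mean = 12, sd = 2)

# var.equal = TRUE gives the classic Student's t-test;
# the default (var.equal = FALSE) is the Welch variant
tt <- t.test(a, b, var.equal = TRUE)

# Wilcoxon rank sum test: no normality assumption
wt <- wilcox.test(a, b)

tt$p.value
wt$p.value
```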

39
Q

what are the steps in hypothesis testing? (3)

A

calculate the test statistic
calculate the p value
if the p value is less than 0.05, reject the null hypothesis
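
The three steps can be sketched as follows, assuming the conventional 0.05 threshold and using R's built-in `sleep` data set purely as an example:

```r
alpha <- 0.05                                  # conventional threshold
result <- t.test(extra ~ group, data = sleep)  # Welch two-sample t-test

t_stat <- unname(result$statistic)   # step 1: test statistic
p_val <- result$p.value              # step 2: p value
reject_null <- p_val < alpha         # step 3: reject only if p < alpha

t_stat
p_val
reject_null
```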

40
Q

what is a type 1 error (false positive)?

A

rejecting the null hypothesis when the null hypothesis is true

41
Q

what is a type 2 error (false negative)?

A

failing to reject (accepting) the null hypothesis when the null hypothesis is false

42
Q

what is significance?

A

the probability of a false positive (type 1 error)

43
Q

what is power?

A

the probability of a true positive

44
Q

what is effect size?

A

the actual magnitude of the result
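
Significance, power and effect size are linked, and base R's power.t.test() shows the link for a two-sample t-test. A sketch with assumed values (20 per group, a true difference of 1, a standard deviation of 1):

```r
# delta is the assumed true difference in means (the effect size in raw units)
pw <- power.t.test(n = 20, delta = 1, sd = 1, sig.level = 0.05)
pw$power   # probability of detecting the difference

# Leave n out and supply power instead to solve for the sample size per group
need <- power.t.test(power = 0.90, delta = 1, sd = 1, sig.level = 0.05)
ceiling(need$n)
```

A smaller effect size or a stricter significance level needs a larger sample to reach the same power.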

45
Q

what does ANOVA stand for?

A

analysis of variance

46
Q

what is ANOVA?

A

Generalisation of the difference of means
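
A one-way ANOVA compares the means of more than two groups at once. A sketch using R's built-in `PlantGrowth` data set (three groups), chosen only as an example:

```r
fit <- aov(weight ~ group, data = PlantGrowth)
tab <- summary(fit)
tab   # F statistic and p value for "no difference between group means"

p_anova <- tab[[1]][["Pr(>F)"]][1]
p_anova
```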

47
Q

what percentage of confidence interval do most people use?

A

95%
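
As an illustration, t.test() reports a 95% confidence interval for a mean by default. The built-in `mtcars` data set is used here only as an example:

```r
ci <- t.test(mtcars$mpg)$conf.int   # conf.level defaults to 0.95
ci
attr(ci, "conf.level")

# A different level can be requested explicitly
ci99 <- t.test(mtcars$mpg, conf.level = 0.99)$conf.int
```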

48
Q

would you visualise before or after model building?

A

before