Book - Chapter 3: Basic Analytics in R Flashcards

1
Q

What is the function to import data

A

read.csv()

2
Q

What does the head function do

A

Displays the first few rows (six by default) of the imported dataset so you can examine it

3
Q

What does the summary function do

A

Provides descriptive statistics, such as the mean and median, for each data column

4
Q

When referring to a column in a dataset what symbol should you use

A

The $ operator, for example dataset$column

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

How would you plot linear regression

A

Fit the model with lm(), then add the fitted line to a scatterplot with abline()
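
A minimal sketch tying the first five cards together. The data and the column names (`ad_spend`, `revenue`) are made up for illustration, and a temporary CSV is written first so the example runs on its own.

```r
# Hypothetical data written to a temporary CSV so the example is self-contained
csv_path <- tempfile(fileext = ".csv")
writeLines(c("ad_spend,revenue",
             "10,25", "20,41", "30,58", "40,83", "50,99"), csv_path)

sales <- read.csv(csv_path)   # import the data into a data frame
head(sales)                   # first rows (six by default)
summary(sales)                # min, quartiles, median, mean, max per column

mean(sales$revenue)           # `$` selects one column by name

# Fit a linear regression, then plot the points and the fitted line
fit <- lm(revenue ~ ad_spend, data = sales)
plot(sales$ad_spend, sales$revenue)
abline(fit)
```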

6
Q

What kind of interface does the R software use

A

A command-line interface

7
Q

How do you set a working directory

A

setwd()

8
Q

What are the categorical/qualitative attribute types

A

Nominal and ordinal

9
Q

What are the numeric/quantitative attribute types

A

Interval and ratio

10
Q

What are nominal data types

A

Labels with no inherent order, for example
ZIP codes
nationality
street names
gender
employee ID numbers
true or false

11
Q

What is an ordinal data type

A

Ordered categories, for example
quality of diamonds
academic grades
magnitude of earthquakes

12
Q

What is the interval data type

A

Numeric with no true zero, for example temperature in Celsius or Fahrenheit

13
Q

What is the ratio data type

A

Numeric with a true zero, for example age or temperature in Kelvin
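
A small worked example of why the true zero matters: ratios are only meaningful on a ratio scale. The two temperatures are arbitrary values chosen for illustration.

```r
celsius <- c(10, 20)          # interval scale: zero is an arbitrary point
kelvin  <- celsius + 273.15   # ratio scale: zero is a true zero

celsius[2] / celsius[1]       # 2, but 20 degrees C is not "twice as hot" as 10
kelvin[2] / kelvin[1]         # about 1.035, the physically meaningful ratio
```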

14
Q

What is a vector

A

Set of values of the same data type.

15
Q

What can you use to create vectors

A

The combine function, c()

16
Q

What dimension is a vector

A

They are dimensionless: dim() returns NULL for a vector
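
A short sketch of the three vector cards, using arbitrary values.

```r
v <- c(1.5, 2, 3.25)     # c() combines values into a vector
length(v)                # 3
dim(v)                   # NULL: a vector has no dim attribute
class(c(1, "a", TRUE))   # "character": mixed values are coerced to one type
```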

17
Q

What is a two-dimensional array called

A

A matrix

18
Q

What is an array

A

An n-dimensional collection of values that all share the same (homogeneous) data type

19
Q

What do nrow and ncol do

A

As arguments to matrix() they define the number of rows and columns; as functions, nrow() and ncol() return those numbers
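
A minimal sketch of a matrix and a higher-dimensional array, with arbitrary values and shapes.

```r
m <- matrix(1:6, nrow = 2, ncol = 3)   # nrow/ncol arguments set the shape
nrow(m)                                # 2
ncol(m)                                # 3
dim(m)                                 # 2 3

a <- array(1:24, dim = c(2, 3, 4))     # three-dimensional array
dim(a)                                 # 2 3 4
```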

20
Q

What is a data frame

A

A spreadsheet-like structure: a list of columns that all have the same length

21
Q

Can data frames store different data types

A

Yes, each column can hold a different data type

22
Q

What is a list

A

A list is a collection of objects, such as vectors, that can be of different types and lengths

23
Q

What is a factor

A

A categorical variable.

It has a fixed set of values (levels) and uses integer codes to represent them
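
A sketch contrasting data frames, lists, and factors. All names and values (`staff`, `mixed`, `grade`) are invented for illustration.

```r
# Data frame: columns of different types, all the same length
staff <- data.frame(
  name   = c("Ana", "Ben", "Cy"),
  age    = c(34, 28, 45),
  remote = c(TRUE, FALSE, TRUE)
)
sapply(staff, class)     # one class per column

# List: components may differ in type and length
mixed <- list(ids = 1:5, label = "batch A", flags = c(TRUE, FALSE))
lengths(mixed)           # 5 1 2

# Factor: fixed set of levels stored as integer codes
grade <- factor(c("good", "fair", "good", "ideal"),
                levels = c("fair", "good", "ideal"), ordered = TRUE)
levels(grade)            # "fair" "good" "ideal"
as.integer(grade)        # 2 1 2 3
```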

24
Q

What does variance mean

A

The average squared distance of the values from the mean

25
Q

What does standard deviation mean

A

The square root of variance

26
Q

What does range mean

A

Minimum to maximum

27
Q

What is interquartile range

A

The spread between the 25th and 75th percentiles of the sorted data
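
A worked example of the four dispersion measures on a small made-up sample. Note that R's var() divides by n - 1 (the sample variance), and IQR() uses R's default quantile method.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)   # made-up sample, mean is 5

var(x)      # sample variance: sum((x - mean(x))^2) / (length(x) - 1) = 32 / 7
sd(x)       # standard deviation: sqrt(var(x))
range(x)    # minimum and maximum: 2 9
IQR(x)      # 75th percentile minus 25th percentile: 1.5
quantile(x, c(0.25, 0.75))
```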

28
Q

Why do we visualise

A

To get a sense of the data

29
Q

What should we visualise

A

Mean versus median. Standard deviation. Quantiles. Correlations between variables

30
Q

What does Anscombe's quartet do

A

Illustrates the importance of visualising data. It uses four data sets with nearly identical summary statistics. Each data set is plotted as a scatterplot and then fitted with the line that results from applying linear regression
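
A sketch using the `anscombe` data frame that ships with R. The loop and helper names are illustrative choices, not taken from the book.

```r
# `anscombe` is built in (datasets package): columns x1..x4 and y1..y4
sapply(anscombe[, c("y1", "y2", "y3", "y4")], mean)   # nearly identical means

# Fit y_i ~ x_i for each of the four sets and collect the coefficients
coefs <- sapply(1:4, function(i) {
  coef(lm(anscombe[[paste0("y", i)]] ~ anscombe[[paste0("x", i)]]))
})
round(coefs, 2)   # intercepts near 3, slopes near 0.5 in every set

# Only the plots reveal how different the four sets are
op <- par(mfrow = c(2, 2))
for (i in 1:4) {
  plot(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]],
       xlab = paste0("x", i), ylab = paste0("y", i))
  abline(coefs[1, i], coefs[2, i])   # intercept, slope
}
par(op)
```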

31
Q

What should you do if the data is skewed

A

Log it: apply a log transformation to the data
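
A minimal sketch of the effect of a log transformation on right-skewed data. The values are simulated from a log-normal distribution, and the variable name `income` is only illustrative.

```r
set.seed(42)                       # arbitrary seed, for reproducibility only
income <- rlnorm(1000, meanlog = 10, sdlog = 1)   # simulated right-skewed values

mean(income) > median(income)      # TRUE: the long right tail pulls the mean up
log_income <- log(income)
mean(log_income) - median(log_income)   # close to 0 after the transform

hist(income)                       # heavily skewed
hist(log_income)                   # roughly symmetric
```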

32
Q

What does bimodal mean

A

The distribution has two peaks (modes)

33
Q

What is data cleansing

A

Eliminating dirty data

34
Q

What does the plot function produce

A

Scatter plot where x is the index and y is the value

35
Q

What does the barplot function produce

A

A bar plot with vertical or horizontal bars

36
Q

What is a dot chart

A

Cleveland dot plot

37
Q

What is plot(density(data))

A

Density plot. A continuous histogram

38
Q

What does the stem function produce

A

A stem-and-leaf plot
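
The single-variable plotting functions from the cards above, sketched on the built-in `mtcars` data set (chosen here only as convenient sample data).

```r
mpg <- mtcars$mpg                      # one numeric variable
names(mpg) <- rownames(mtcars)         # labels for the bar and dot charts

plot(mpg)                              # scatterplot: x = index, y = value
barplot(mpg)                           # vertical bars
barplot(mpg, horiz = TRUE)             # horizontal bars
dotchart(mpg)                          # Cleveland dot plot
hist(mpg)                              # histogram
plot(density(mpg))                     # density plot
stem(mpg)                              # stem-and-leaf plot, printed as text
```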

39
Q

How many variables can a scatter plot have

A

Up to 5 (x-axis, y-axis, size, color, and shape)

40
Q

What does a loess line do?

A

Fits a smooth nonlinear curve to the data using local regression
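
A short sketch of adding a loess curve to a scatterplot, using the built-in `cars` data set as sample data and loess() with its default settings.

```r
fit <- loess(dist ~ speed, data = cars)   # local regression on built-in data

plot(cars$speed, cars$dist)
# `cars` is already sorted by speed, so the fitted values can be joined directly
lines(cars$speed, predict(fit), col = "red")
```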

41
Q

What charts can be used to visualise multiple variables.

A

Barplot and dotchart

42
Q

When would you use a hexbinplot

A

When dealing with large data sets

43
Q

What is a pairwise plot

A

A scatter plot matrix

44
Q

What is the seasonality effect?

A

A peak or fall that recurs at the same time every year in a time series

45
Q

What is the basic concept for hypothesis testing?

A

To form an assertion and test it with data

46
Q

What is the null hypothesis

A

No difference

47
Q

What does it mean if the regression coefficient is zero

A

This is the null hypothesis for regression: the variable does not affect the outcome

48
Q

What is the basic testing approach

A

To compare the observed sample means

49
Q

What does a large observed difference between the sample means indicate

A

That the null hypothesis should be rejected

50
Q

For the difference in means how can this be tested

A

Student's t-test or Welch's t-test

51
Q

What is Student's t-test

A

A test for a difference in means that assumes the distributions of the two populations have equal but unknown variances

52
Q

In Student's t-test, if each population is normally distributed with the same mean and the same variance, what distribution does the T statistic follow

A

The T statistic follows a t-distribution with n1 + n2 - 2 degrees of freedom

53
Q

If the observed value of t is far enough from zero what should you do

A

Reject the null hypothesis
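
A sketch of both t-tests on simulated data. The sample sizes, means, and standard deviations are arbitrary assumptions for illustration.

```r
set.seed(1)                              # arbitrary seed
x <- rnorm(20, mean = 100, sd = 5)       # simulated sample 1
y <- rnorm(25, mean = 105, sd = 5)       # simulated sample 2

student <- t.test(x, y, var.equal = TRUE)   # Student's t-test (pooled variance)
welch   <- t.test(x, y)                     # Welch's t-test (R's default)

student$statistic    # observed t
student$parameter    # degrees of freedom: 20 + 25 - 2 = 43
welch$parameter      # Welch's approximate degrees of freedom

# Reject the null hypothesis when |t| exceeds the two-sided critical value
abs(student$statistic) > qt(1 - 0.05 / 2, df = student$parameter)
```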

54
Q

What is a significance level

A

A small probability, chosen in advance, that serves as the threshold for rejecting the null hypothesis

55
Q

What is the significance level of the test

A

The probability of rejecting the null hypothesis when the null hypothesis is actually true

56
Q

What is the normal significance level

A

0.05

57
Q

What is different in a two-sided hypothesis test

A

The probabilities in both tails of the t-distribution must sum to the significance level

58
Q

What is the P value

A

The area under the tail (or tails) of the distribution beyond the observed test statistic

59
Q

What is a confidence interval

A

An interval estimate of a population parameter or characteristic based on sample data

60
Q

How is the confidence interval used

A

It is used to indicate the uncertainty of a point estimate
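
A sketch connecting the p-value, the significance level, and the confidence interval, again on simulated data with arbitrary parameters.

```r
set.seed(2)                          # arbitrary seed
x <- rnorm(30, mean = 10, sd = 2)    # simulated samples
y <- rnorm(30, mean = 11, sd = 2)
res <- t.test(x, y, var.equal = TRUE)

# Two-sided p-value: area in both tails beyond the observed t
p_manual <- 2 * pt(-abs(res$statistic), df = res$parameter)
all.equal(unname(p_manual), res$p.value)

res$p.value < 0.05    # compare with the significance level
res$conf.int          # 95% confidence interval for the difference in means
```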

61
Q

What is the Wilcoxon rank-sum test

A

A test that makes no assumption about the distributions of the populations. It is a robust alternative to the t-test for a difference between two populations
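
A minimal sketch of the rank-sum test in R via wilcox.test(), using simulated non-normal samples with arbitrary rates.

```r
set.seed(3)                       # arbitrary seed
x <- rexp(25, rate = 1)           # simulated, clearly non-normal samples
y <- rexp(25, rate = 0.5)

res <- wilcox.test(x, y)          # two-sample Wilcoxon rank-sum test
res$statistic                     # W, computed from ranks
res$p.value
```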

62
Q

What is the type one error

A

Rejection of the null hypothesis when the null hypothesis is true

63
Q

What is the Type II error

A

Acceptance of the null hypothesis when the null hypothesis is false

64
Q

What is significance

A

Probability of a false positive

65
Q

What is power

A

Probability of a true positive

66
Q

What is effect size

A

The size of the observed difference
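
A sketch of how significance, power, effect size, and sample size relate, using power.t.test() from the stats package. The effect size (`delta = 1`), standard deviation, and group sizes are assumed values.

```r
# Power of a two-sample t-test for assumed values of n, effect size and sd
p20 <- power.t.test(n = 20, delta = 1, sd = 1, sig.level = 0.05)
p40 <- power.t.test(n = 40, delta = 1, sd = 1, sig.level = 0.05)
p20$power          # probability of detecting the effect with 20 per group
p40$power          # larger samples give more power

# Sample size per group needed for 90% power at the same effect size
power.t.test(power = 0.9, delta = 1, sd = 1, sig.level = 0.05)$n
```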

67
Q

What do you use if there are more than two populations

A

ANOVA

68
Q

What does ANOVA stand for

A

Analysis of variance

69
Q

What is the F statistic in ANOVA

A

A measure of how different the group means are relative to the variability within each group
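
A minimal one-way ANOVA sketch on the built-in `PlantGrowth` data set (three groups), chosen here only as sample data. It also checks that F is the ratio of the between-group to the within-group mean square.

```r
fit <- aov(weight ~ group, data = PlantGrowth)   # built-in data, three groups
tab <- summary(fit)[[1]]                         # the ANOVA table
tab

# F is the between-group mean square divided by the within-group mean square
f_manual <- tab[["Mean Sq"]][1] / tab[["Mean Sq"]][2]
all.equal(f_manual, tab[["F value"]][1])
```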