Book - Chapter 3 Basic Analytics in R Flashcards

(69 cards)

1
Q

What is the function to import data

A

read.csv()

2
Q

What does the head function do

A

Displays the first few rows (six by default) of the imported dataset so it can be examined

3
Q

What does the summary function do

A

Provides some descriptive statistics, such as mean and median, for each data column

4
Q

When referring to a column in a dataset what symbol should you use

A

The $ operator, for example dataset$column
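
The import and inspection functions from these first cards can be tried together. This is a minimal sketch: the file, column names, and values are made up for illustration, and the CSV is written to a temporary location so the example is self-contained.

```r
# Hypothetical data, written to a temporary CSV so there is something to import
path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:4, sales = c(10.5, 20, 15.5, 30)),
          path, row.names = FALSE)

sales_data <- read.csv(path)          # import the data
head(sales_data)                      # show the first rows (six by default)
summary(sales_data)                   # min, quartiles, median, mean, max per column
mean_sales <- mean(sales_data$sales)  # $ refers to a single column
```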

5
Q

How would you plot linear regression

A

lm() fits the linear model; the fitted line can then be added to a plot with abline()
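
A minimal sketch of fitting and drawing a regression line. The data are made up, and the null graphics device is only used so the example runs without opening a window.

```r
# Made-up data that lies close to a straight line
x <- c(1, 2, 3, 4, 5)
y <- c(3.1, 4.9, 7.2, 9.0, 10.8)

fit <- lm(y ~ x)        # fit the linear model
pdf(NULL)               # null device: nothing is written to disk
plot(x, y)              # scatter plot of the data
abline(fit)             # add the fitted regression line
invisible(dev.off())
coef(fit)               # intercept and slope
```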

6
Q

What does R software use

A

Command-line interface

7
Q

How do you set a working directory

A

setwd()

8
Q

What are the categorical/qualitative attribute types

A

Nominal and ordinal

9
Q

What are the numeric/quantitative attribute types

A

Interval and ratio

10
Q

What are nominal data types

A
Unordered labels, for example
ZIP codes
nationality
street names
gender
employee ID number
true or false
11
Q

What is an ordinal data type

A

Ordered names for example
quality of diamonds
academic grades
magnitude of earthquakes

12
Q

What is an interval data type

A

Numeric with no true zero, for example temperature in Celsius or Fahrenheit

13
Q

What is a ratio data type

A

Numeric with a true zero, for example age or temperature in Kelvin

14
Q

What is a vector

A

Set of values of the same data type.

15
Q

What can you use to create vectors

A

The combine function, c()

16
Q

What dimension is a vector

A

They are dimensionless
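
The vector cards can be checked directly in R. A short sketch with arbitrary values:

```r
v <- c(2, 4, 6)        # c() combines values into a vector
length(v)              # number of elements
dim(v)                 # NULL: a plain vector has no dim attribute
mixed <- c(1, "a")     # mixing types coerces all values to a single type
class(mixed)           # "character"
```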

17
Q

What is a two-dimensional array

A

Matrix

18
Q

What is an array

A

An n-dimensional set of values of a homogeneous data type

19
Q

What do nrow and ncol do

A

As arguments to matrix(), they define the number of rows and columns; as functions, nrow() and ncol() return those numbers
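
A small sketch of a matrix and a higher-dimensional array, using arbitrary integer values, showing both uses of nrow and ncol:

```r
m <- matrix(1:6, nrow = 2, ncol = 3)   # 2 x 3 matrix, filled column by column
nrow(m)                                # 2
ncol(m)                                # 3
a <- array(1:24, dim = c(2, 3, 4))     # three-dimensional array
dim(a)                                 # 2 3 4
```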

20
Q

What is a data frame

A

Like a spreadsheet: a list of columns that must all be the same length

21
Q

Can data frames store different data types

A

Yes, each column can have its own data type

22
Q

What is a list

A

A list is a collection of objects, such as vectors, that can be of different types and different lengths
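
The difference between a data frame and a list can be seen in a short sketch. The names and values here are invented for illustration.

```r
# Data frame: columns of different types, all the same length
df <- data.frame(name = c("Ann", "Bo"), age = c(31, 45),
                 stringsAsFactors = FALSE)
sapply(df, class)    # one class per column

# List: elements may differ in both type and length
lst <- list(numbers = 1:5, label = "x", flags = c(TRUE, FALSE))
lengths(lst)         # 5 1 2
is.list(df)          # TRUE: a data frame is itself a list of columns
```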

23
Q

What is a factor

A

A vector of categorical values.

It has a fixed set of values (levels) and uses integer codes to represent the different values
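
A minimal sketch of the integer coding behind a factor, with made-up category labels:

```r
sizes <- factor(c("small", "large", "small", "medium"),
                levels = c("small", "medium", "large"))
levels(sizes)        # the fixed set of allowed values
as.integer(sizes)    # integer codes: position of each value in levels()
```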

24
Q

What does variance mean

A

The average squared distance of the values from the mean

25
What does standard deviation mean
The square root of variance
26
What does range mean
Minimum to maximum
27
What is interquartile range
The spread from the 25th to the 75th percentile of the size-ordered data
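
The spread measures on these cards map onto base R functions. A short sketch on arbitrary values; note that var() computes the sample variance, which divides by n - 1 rather than n.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)   # arbitrary values
var(x)     # sample variance (sum of squared deviations divided by n - 1)
sd(x)      # square root of the variance
range(x)   # minimum and maximum
IQR(x)     # 75th percentile minus 25th percentile
```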
28
Why do we visualise
To get a sense of the data
29
What should we visualise
Mean versus median. Standard deviation. Quantiles. Correlations between variables
30
What does Anscombe's quartet do
Illustrates the importance of visualising data. Uses four data sets. Each data set is plotted as a scatterplot and then fitted with the line that results from applying linear regression
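
R ships with the quartet as the built-in data frame anscombe (columns x1 to x4 and y1 to y4). The sketch below fits a regression to each of the four data sets to show that the fitted lines are nearly identical, which is why plotting is needed to tell the data sets apart.

```r
# Fit y_i ~ x_i for each of the four data sets
fits <- lapply(1:4, function(i) {
  lm(anscombe[[paste0("y", i)]] ~ anscombe[[paste0("x", i)]])
})
coefs <- sapply(fits, coef)   # one column per data set: intercept, then slope
round(coefs, 2)               # intercepts near 3 and slopes near 0.5 in all four
```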
31
What should you do if the data is skewed
Log it, that is, take the logarithm of the data
32
What does bimodal mean
It has two peaks
33
What is data cleansing
Eliminating dirty data
34
What is the plot(data) function
Scatter plot where x is the index and y is the value
35
What is a barplot function
Barplot with vertical or horizontal bars
36
What is a dot chart
Cleveland dot plot
37
What is plot(density(data))
Density plot. A continuous histogram
38
What is a stem function
Stem-and-leaf plot
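
The single-variable plotting functions on these cards can be compared side by side. This is a sketch on a small made-up vector; the null graphics device is an assumption made so the example runs without a display.

```r
data <- c(3, 7, 2, 9, 4, 6, 8, 5)     # made-up values
names(data) <- letters[1:8]           # labels for the bar and dot charts

pdf(NULL)                             # null device: nothing is written to disk
plot(data)                            # scatter plot: x is the index, y is the value
barplot(data)                         # vertical bars
barplot(data, horiz = TRUE)           # horizontal bars
dotchart(data)                        # Cleveland dot plot
dens <- density(data)                 # kernel density estimate
plot(dens)                            # density plot
invisible(dev.off())

stem(data)                            # stem-and-leaf plot, printed as text
```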
39
How many variables can a scatter plot have
Up to 5, using the x-axis, y-axis, size, colour, and shape
40
What does a loess line do
Fits a smooth, nonlinear line to the data
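
A minimal sketch of adding a loess line to a scatter plot. The data are simulated from a sine curve with noise, and the seed and noise level are arbitrary choices.

```r
set.seed(7)                              # arbitrary seed, for reproducibility
x <- seq(0, 10, length.out = 100)
y <- sin(x) + rnorm(100, sd = 0.2)       # simulated nonlinear relationship

fit <- loess(y ~ x)                      # local regression with default settings
smooth <- predict(fit)                   # fitted values at the observed x

pdf(NULL)                                # null device: nothing is written to disk
plot(x, y)
lines(x, smooth)                         # draw the loess line over the points
invisible(dev.off())
```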
41
What charts can be used to visualise multiple variables.
Barplot and dotchart
42
When would you use a hexbinplot
When dealing with large data sets
43
What is a pairwise plot
A scatter plot matrix
44
What is the seasonality effect?
When a small peak or fall recurs at the same time every year in a time series
45
What is the basic concept for hypothesis testing?
To form an assertion and test it with data
46
What is the null hypothesis
The assertion that there is no difference, for example between two population means
47
What does it mean if the regression coefficient is zero
That the null hypothesis holds: the variable does not affect the outcome
48
What is the basic testing approach
To compare the observed sample means
49
What does a large observed difference between the sample means indicate
That the null hypothesis should be rejected
50
How can a difference in means be tested
Student's t-test or Welch's t-test
51
What is Student's t-test
A test for a difference in means that assumes the distributions of the two populations have equal but unknown variances
52
In Student's t-test, if each population is normally distributed with the same mean and the same variance, what follows
The T-statistic follows a t-distribution with n1 + n2 - 2 degrees of freedom
53
If the observed value of t is far enough from zero what should you do
Reject the null hypothesis
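
Both t-tests are available through t.test() in base R. The sketch below uses simulated samples; the seed, sample sizes, means, and standard deviations are arbitrary assumptions.

```r
set.seed(1)                          # arbitrary seed, for reproducibility
x <- rnorm(20, mean = 100, sd = 5)   # simulated sample 1
y <- rnorm(20, mean = 105, sd = 5)   # simulated sample 2

student <- t.test(x, y, var.equal = TRUE)  # Student's t-test (pooled variance)
welch   <- t.test(x, y)                    # Welch's t-test (the default in R)

student$statistic    # observed t value
student$parameter    # degrees of freedom: 20 + 20 - 2
student$p.value      # compare with the significance level, e.g. 0.05
```

Welch's version does not assume equal variances, so its degrees of freedom are estimated from the data and are never larger than n1 + n2 - 2.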
54
What is a significance level
A small probability, chosen in advance, used as the threshold for rejecting the null hypothesis
55
What is the significance level of the test
The probability of rejecting the null hypothesis when the null hypothesis is actually true
56
What is the commonly used significance level
0.05
57
What is different in a two-sided hypothesis test
The probabilities under both tails of the t-distribution must sum to the significance level
58
What is the P value
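The area under the tail of the distribution beyond the observed test statistic, that is, the probability of a result at least as extreme if the null hypothesis is true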
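
The tail area can be computed from the t-distribution with pt(). In this sketch the observed t value and the degrees of freedom are hypothetical numbers chosen for illustration.

```r
t_obs <- 2.1    # hypothetical observed t statistic
df    <- 38     # hypothetical degrees of freedom

p_one_sided <- pt(t_obs, df, lower.tail = FALSE)           # area in the upper tail
p_two_sided <- 2 * pt(abs(t_obs), df, lower.tail = FALSE)  # area in both tails
p_two_sided < 0.05   # reject the null hypothesis at the 0.05 level?
```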
59
What is a confidence interval
An interval estimate of a population parameter or characteristic, based on sample data
60
How is the confidence interval used
It is used to indicate the uncertainty of a point estimate
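
A sketch of a 95% confidence interval for a mean, computed by hand from the t-distribution and compared with the interval reported by t.test(). The sample values are made up.

```r
x  <- c(4.8, 5.1, 5.0, 5.3, 4.9, 5.2, 5.0, 4.7)   # made-up sample
n  <- length(x)
se <- sd(x) / sqrt(n)                              # standard error of the mean

manual  <- mean(x) + c(-1, 1) * qt(0.975, df = n - 1) * se  # point estimate +/- margin
builtin <- t.test(x)$conf.int                               # 95% interval from t.test()
```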
61
What is the Wilcoxon rank-sum test
A nonparametric test that makes no assumption about the distributions of the populations. A robust test for a difference between two populations
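
In base R this test is run with wilcox.test(). The sketch uses two small made-up samples with no ties, chosen so that every value in the second sample is larger than every value in the first.

```r
x <- c(1.1, 2.3, 3.8, 4.0, 5.6)     # made-up sample 1
y <- c(6.2, 7.9, 8.4, 9.1, 10.5)    # made-up sample 2, clearly larger

res <- wilcox.test(x, y)            # two-sided rank-sum test
res$statistic                       # W counts the pairs where an x value exceeds a y value
res$p.value
```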
62
What is a Type I error
Rejection of the null hypothesis when the null hypothesis is true
63
What is a Type II error
Acceptance of the null hypothesis when the null hypothesis is false
64
What is significance
Probability of a false positive
65
What is power
Probability of a true positive
66
What is effect size
The size of the observed difference
67
What do you use if there are more than two populations
ANOVA
68
What does ANOVA stand for
Analysis of variance
69
What is the F-statistic in ANOVA
A measure of how different the group means are relative to the variability within each group
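
A minimal one-way ANOVA sketch using aov() from base R. The three groups and their measurements are invented for illustration.

```r
# Made-up measurements for three groups
values <- c(5.1, 4.9, 5.3,  6.8, 7.1, 7.0,  9.2, 8.9, 9.0)
group  <- factor(rep(c("a", "b", "c"), each = 3))

fit <- aov(values ~ group)
tab <- summary(fit)[[1]]             # ANOVA table as a data frame
f_stat <- tab[["F value"]][1]        # between-group vs within-group variability
p_val  <- tab[["Pr(>F)"]][1]         # p-value for the F-statistic
```

Here the group means are far apart compared with the spread inside each group, so the F-statistic is large and the p-value is small.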