INFERENTIAL STATISTICS Flashcards

1
Q

inferential stats?

A

methods for reaching conclusions that extend beyond the immediate data set (from a sample to the population)

2
Q

Bernoulli distribution?

A

important special case of a discrete random variable: binary, with only 2 possible outcomes (0 or 1), where P(X=1)=p and P(X=0)=1-p

3
Q

population parameter?

A

fixed feature of a particular population, e.g. population mean, population variance

4
Q

sample stats?

A

quantity that varies from one sample to another; used to estimate a population parameter through random sampling, because surveying the entire population is not practical

5
Q

Law of large numbers?

A

as sample size n increases, the sample mean gets closer to population mean
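
A small simulation can make this concrete. The sketch below assumes a Bernoulli population with mean 0.3 (a made-up value) and tracks the running sample mean; the seed and the sample sizes are arbitrary choices for illustration.

```r
# Law of large numbers, illustrated with simulated Bernoulli(0.3) draws
set.seed(1)                                # arbitrary seed, for reproducibility only
p <- 0.3                                   # assumed population mean
x <- rbinom(100000, size = 1, prob = p)    # 100000 draws of 0 or 1
running_mean <- cumsum(x) / seq_along(x)   # sample mean after each draw
# Distance from the population mean at increasing sample sizes
abs(running_mean[c(10, 1000, 100000)] - p)
```

The distance is not guaranteed to shrink at every single step, but it tends towards 0 as n grows.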

6
Q

Central limit theorem?

A

when the sample size is large (rule of thumb: n >= 30), the sampling distribution of the sample mean is approximately normal, regardless of the distribution we started out with
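
As a rough illustration, the sketch below draws many samples of size 30 from a skewed exponential distribution and looks at their means. The choice of Exp(1), 5000 replications and the seed are assumptions made only for this example.

```r
# Central limit theorem: means of samples from a skewed distribution
set.seed(1)                                # arbitrary seed
n <- 30                                    # size of each sample
# Exp(1) has mean 1 and SD 1, and is clearly not normal
sample_means <- replicate(5000, mean(rexp(n, rate = 1)))
mean(sample_means)                         # should be close to 1
sd(sample_means)                           # should be close to 1 / sqrt(n)
hist(sample_means, breaks = 40, main = "", xlab = "sample mean (n = 30)")
```

The histogram should look roughly bell-shaped even though the original data are strongly skewed.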

7
Q

hypothesis testing

A

tells us how extreme our sample outcome is, assuming the null hypothesis is true. It defines a rejection region: a sample that falls in it is too extreme for us to maintain that the null hypothesis is true

8
Q

standardisation

A

Z = (X - mean)/SD; if X ~ N(mean, SD^2), then Z ~ N(0,1)
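
A minimal sketch of standardising a sample in R, using made-up data and the sample mean and SD in place of the population values:

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)     # made-up data
z <- (x - mean(x)) / sd(x)         # subtract the mean, divide by the SD
mean(z)                            # 0, up to rounding error
sd(z)                              # 1, up to rounding error
# scale() performs the same centring and scaling
all.equal(z, as.numeric(scale(x)))
```

Standardising changes the mean to 0 and the SD to 1, but it does not by itself make the data normal.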

9
Q

test stat Z

A

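Z = (observed p - hypothesised p)/standard error, where standard error = sqrt(p(1-p)/n) and p is the hypothesised proportion (reject H0 at the 5% level, two-sided, if |Z| > 1.96)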
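
A minimal sketch of a one-sample z test for a proportion, assuming the statistic is standardised by the standard error sqrt(p0(1-p0)/n) under H0. The values of p0, n and p_hat are made up for illustration.

```r
p0    <- 0.5      # hypothesised proportion under H0 (made up)
n     <- 100      # sample size (made up)
p_hat <- 0.6      # observed sample proportion (made up)

se <- sqrt(p0 * (1 - p0) / n)       # standard error under H0: 0.05
z  <- (p_hat - p0) / se             # about 2
p_value <- 2 * pnorm(-abs(z))       # two-sided p-value
reject  <- abs(z) > qnorm(0.975)    # qnorm(0.975) is about 1.96
c(z = z, p_value = p_value)
reject
```

Here |Z| exceeds 1.96 and the two-sided p-value is below 0.05, so both rules lead to rejecting H0.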

10
Q

reject Ho?

A

p-value < 0.05 (at the 5% significance level)

11
Q

95% Confidence Interval

A

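(estimate - 1.96 SE, estimate + 1.96 SE), where SE is the standard error of the estimate; reject H0 if the hypothesised value is not in the range. As a test, this corresponds to checking whether the observed value falls outside (hypothesised value - 1.96 SE, hypothesised value + 1.96 SE)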
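
A minimal sketch of an approximate 95% interval for a proportion, assuming the normal approximation and an estimated standard error sqrt(p_hat(1-p_hat)/n). All numbers are made up for illustration.

```r
n     <- 100      # sample size (made up)
p_hat <- 0.6      # observed sample proportion (made up)
p0    <- 0.5      # hypothesised proportion (made up)

se <- sqrt(p_hat * (1 - p_hat) / n)           # estimated standard error
ci <- p_hat + c(-1, 1) * qnorm(0.975) * se    # approximate 95% interval
ci                                            # roughly (0.504, 0.696)
p0 < ci[1] || p0 > ci[2]                      # TRUE: p0 lies outside the interval
```

Because the hypothesised value 0.5 falls just outside the interval, this sample would lead to rejecting H0 at the 5% level.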

12
Q

import data from file?

A

Auto=read.csv('link',header=TRUE,na.strings='?')

13
Q

class of Auto?

A

class(Auto) returns 'data.frame'

14
Q

structure of data?

A

str(Auto)

15
Q

first rows of data (with column headers)?

A

head(Auto)

16
Q

names of variables?

A

names(Auto)

17
Q

number of observations and variables?

A

dim(Auto)
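
The import and inspection commands can be tried without the real file. The sketch below writes a tiny made-up CSV to a temporary file and reads it back; the data frame name toy, the column names and the values are all hypothetical stand-ins for Auto.

```r
# Write a small made-up CSV in which '?' marks a missing value
path <- tempfile(fileext = ".csv")
writeLines(c("mpg,horsepower", "18,130", "25,?"), path)

toy <- read.csv(path, header = TRUE, na.strings = "?")
class(toy)     # "data.frame"
str(toy)       # type and first values of each column
head(toy)      # first rows (up to 6)
names(toy)     # "mpg" "horsepower"
dim(toy)       # 2 rows, 2 columns
```

Because '?' is declared in na.strings, the horsepower column is read as numeric with an NA rather than as text.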

18
Q

frequency of each value of the 'origin' variable?

A

table(Auto$origin)

19
Q

recoding data for 'origin'? (check using table(Auto$originf))

A

Auto$originf = factor(Auto$origin,
                      labels = c("USA", "Europe", "Japan"))
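
A minimal sketch of the recoding on a made-up data frame (toy is a hypothetical stand-in for Auto), assuming origin is coded 1, 2, 3 so that the labels are assigned in that order:

```r
toy <- data.frame(origin = c(1, 3, 2, 1, 1, 3))    # made-up codes
toy$originf <- factor(toy$origin,
                      labels = c("USA", "Europe", "Japan"))
table(toy$origin)     # counts per numeric code
table(toy$originf)    # same counts: USA 3, Europe 1, Japan 2
```

Labels are matched to the sorted distinct values of origin, so the order of the labels vector matters.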

20
Q

create new data.frame without variable ‘origin’?

A

new_data=subset(Auto,select=c(-origin))

21
Q

identify number of rows with missing values (NA)?

A

sum(!complete.cases(Auto)) (number of rows with at least one NA)
sum(is.na(Auto)) (total number of missing entries)

22
Q

locate entries (which row and column) with missing values

A

which(is.na(Auto),arr.ind=TRUE)

23
Q

remove rows with missing values?

A

Auto=na.omit(Auto)
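
The three missing-value steps can be checked on a small made-up data frame (toy and its columns are hypothetical). Note the difference between counting missing entries and counting rows that contain them:

```r
toy <- data.frame(mpg        = c(18, NA, 25, 30),
                  horsepower = c(130, NA, 95, NA))

sum(is.na(toy))                      # 3 missing entries in total
sum(!complete.cases(toy))            # 2 rows contain at least one NA
which(is.na(toy), arr.ind = TRUE)    # row and column index of each NA
clean <- na.omit(toy)                # keep only the complete rows
nrow(clean)                          # 2
```

Row 2 has two missing entries but counts as one incomplete row, which is why the two sums differ.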

24
Q

summarising data for a variable?

A

mean(Auto$variable)
median(Auto$variable) (same as quantile(Auto$variable,0.5))
max(Auto$variable), min(Auto$variable) (max minus min gives the range)
var(Auto$variable)
sd(Auto$variable)
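
A short sketch of these summaries on a made-up vector with one extreme value, which shows how the mean and the median can differ:

```r
x <- c(1, 2, 3, 4, 100)    # made-up values with one extreme point
mean(x)                    # 22
median(x)                  # 3
quantile(x, 0.5)           # same value as the median
max(x) - min(x)            # range: 99
diff(range(x))             # the same range, computed another way
var(x)                     # sample variance (divides by n - 1)
sd(x)                      # square root of var(x)
```

If the variable contains NA values, these functions return NA unless na.rm = TRUE is passed.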

25
Q

5 number summary?

A

quantile(Auto$variable) OR summary(Auto$variable) (summary also reports the mean)

26
Q

interquartile range?

A

IQR(Auto$variable)
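
A quick check of the quartiles and the interquartile range on made-up data, assuming R's default quantile method:

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)    # made-up data
quantile(x)                          # 0%: 1, 25%: 3, 50%: 5, 75%: 7, 100%: 9
summary(x)                           # the same five numbers plus the mean
IQR(x)                               # Q3 - Q1 = 7 - 3 = 4
```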

27
Q

covariance and correlation of two variables?

A

attach(Auto)
cov(var1,var2)
cor(var1,var2)
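
A small sketch with made-up vectors, where y is an exact linear function of x, so the correlation should be 1:

```r
x <- c(1, 2, 3, 4, 5)            # made-up data
y <- 2 * x + 1                   # exact linear function of x
cov(x, y)                        # 5, i.e. 2 * var(x)
cor(x, y)                        # 1
cov(x, y) / (sd(x) * sd(y))      # correlation = covariance / product of SDs
```

Without attach(), the same calls can be written with explicit columns, for example cov(Auto$var1, Auto$var2).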

28
Q

barplot of variable?

A

barplot(table(Auto$variable), xlab='label', ylab='frequency', col='wheat')

29
Q

histogram of variable?

A

hist(Auto$variable, breaks=20, xlab='variable (#bin=20)', ylab='frequency', main='', col='wheat')

30
Q

side by side graphs with 1 row and 2 columns?

A

par(mfrow=c(1,2))
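
A minimal sketch of the layout in use, with simulated data standing in for a real variable (the seed and the data are made up). Saving the old settings and restoring them afterwards is a common habit rather than a requirement.

```r
set.seed(1)                                     # arbitrary seed
x <- rnorm(200)                                 # made-up data
old_par <- par(mfrow = c(1, 2))                 # 1 row, 2 columns
hist(x, main = "", col = "wheat")               # left panel
boxplot(x, col = "wheat", horizontal = TRUE)    # right panel
par(old_par)                                    # restore the previous layout
```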

31
Q

box plot of variable?

A

boxplot(Auto$variable, col='wheat', main='title', horizontal=TRUE)

32
Q

Detect outliers based on IQR (values outside [Q1 - 1.5IQR, Q3 + 1.5IQR])?

A

boxplot.stats(Auto$variable)$out

33
Q

Locate the outliers in the dataset

A

outlier= boxplot.stats(Auto$variable)$out
outlier_row=which(Auto$variable %in% outlier)
Auto[outlier_row, ]
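
The same steps on a made-up data frame with one obvious outlier (toy is a hypothetical stand-in for Auto). This sketch matches outlier values back to row numbers with %in%:

```r
toy <- data.frame(variable = c(10, 12, 11, 13, 12, 11, 50))    # made-up data
outlier <- boxplot.stats(toy$variable)$out          # values beyond the 1.5 IQR fences
outlier                                             # 50
outlier_row <- which(toy$variable %in% outlier)     # rows holding those values
outlier_row                                         # 7
toy[outlier_row, , drop = FALSE]                    # the outlying rows
```

boxplot.stats() builds its fences from the hinges, which can differ slightly from the quartiles returned by quantile().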

34
Q

Detect outliers based on percentile: 2.5% - 97.5%

A

lower=quantile(Auto$variable,0.025)
upper=quantile(Auto$variable,0.975)
outlier_row=which(Auto$variable>upper|Auto$variable<lower)
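
A sketch of the percentile rule on made-up data (the values 1 to 1000 in a hypothetical data frame toy), flagging anything above the 97.5th or below the 2.5th percentile:

```r
toy <- data.frame(variable = 1:1000)            # made-up data
lower <- quantile(toy$variable, 0.025)          # 2.5th percentile
upper <- quantile(toy$variable, 0.975)          # 97.5th percentile
outlier_row <- which(toy$variable > upper | toy$variable < lower)
length(outlier_row)                             # 50 of 1000 rows, i.e. 5%
```

By construction this rule flags about 5% of the rows, whether or not they are unusual, so it works more as a trimming rule than as an outlier test.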

35
Q

scatterplot?

A

plot(Auto$var1,Auto$var2, xlab='var1', ylab='var2')