1. Data and Models Flashcards

Question 1

Q

Population

Definition

Answer

A

-a collection of individuals/items of interest

Question 2

Q

Sample

Definition

Answer

A

-the subset of the population for which observations are available

Question 3

Q

Variable/Variate

Definition

Answer

A

-a quantity or attribute whose value varies between individuals

Question 4

Q

Observation

Definition

Answer

A

-a recorded value of a variate for an individual

Question 5

Q

Data

Definition

Answer

A

-a collection of observations

Question 6

Q

Statistic

Definition

Answer

A

-a function of the data

Question 7

Q

Summarising Numerical Data

min and max

Answer

A

-the minimum and maximum values of the data

Question 8

Q

Summarising Numerical Data

Measures of Location

Answer

A

summary statistics which try to capture the location of the centre of the sample
1) sample mean
2) mode
3) median

Question 9

Q

Summarising Numerical Data

Sample Mean

Answer

A

-the sample mean or sample average of x1,…,xn∈R is given by:
x̄ = (1/n) Σ_{i=1}^{n} xi
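
A minimal R sketch of the formula above, assuming a small made-up sample `x` (the values are illustrative, not from the source). It computes the mean by hand and compares it with R's built-in `mean()`:

```r
# Illustrative sample (made-up values)
x <- c(2, 4, 4, 5, 10)

# Sample mean from the definition: (1/n) * sum of the observations
n <- length(x)
manual_mean <- sum(x) / n

# R's built-in function should give the same value
builtin_mean <- mean(x)

print(manual_mean)
print(builtin_mean)
```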

Question 10

Q

Summarising Numerical Data

Mode

Answer

A

the mode of a sample x1,…,xn is the value of the variate which occurs most frequently
in cases where different values occur with the same frequency the mode may not be unique
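
Base R has no built-in function for the statistical mode (`mode()` in R reports the storage type of an object instead), so one possible sketch counts values with `table()`. The sample `x` is made up and chosen so that the mode is not unique:

```r
# Illustrative sample (made-up values); 2 and 3 both occur twice
x <- c(1, 2, 2, 3, 3, 4)

# Count how often each distinct value occurs
counts <- table(x)

# Keep every value whose count equals the largest count
modes <- as.numeric(names(counts)[as.vector(counts) == max(counts)])

print(modes)
```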

Question 11

Q

Summarising Numerical Data

Median

Answer

A

-a median of x1,…,xn∈R is any number m∈R such that:
a) at least half of the observations are less than or equal to m
AND
b) at least half of the observations are greater than or equal to m
-if the number of samples is odd, there is a unique median
-if the number of samples is even, the median can fall anywhere in the interval between the middle two values; we usually choose the midpoint
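
A short R sketch of the two cases, using made-up samples. The helper `is_median()` is a hypothetical name introduced here to check conditions a) and b) directly; R's `median()` returns the midpoint in the even case:

```r
# Hypothetical helper: does m satisfy conditions a) and b) for the sample x?
is_median <- function(m, x) {
  mean(x <= m) >= 0.5 && mean(x >= m) >= 0.5
}

x_odd  <- c(7, 1, 3)      # odd number of observations
x_even <- c(7, 1, 3, 5)   # even number of observations

m_odd  <- median(x_odd)   # the middle value
m_even <- median(x_even)  # midpoint of the two middle values

print(m_odd)
print(m_even)

# In the even case, other points between the middle two values also qualify
print(is_median(3.5, x_even))
```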

Question 12

Q

Summarising Numerical Data

Measures of Spread

Answer

A

statistics which characterise the spread of the sample
1) range
2) sample variance
3) sample standard deviation
4) interquartile and semi-interquartile range

Question 13

Q

Summarising Numerical Data

Range

Answer

A

-the range of a sample of numeric observations x1,…,xn∈R is the interval:
[min xi , max xi]
-i.e. the smallest interval which contains all the data
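
In R, `range()` returns the two endpoints of this interval. A minimal sketch with a made-up sample:

```r
# Illustrative sample (made-up values)
x <- c(3, 8, 1, 6)

# Endpoints of the smallest interval containing all the data
r <- range(x)

# The same endpoints from min() and max()
endpoints <- c(min(x), max(x))

print(r)
```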

Question 14

Q

Summarising Numerical Data

Sample Variance

Answer

A

-the sample variance of x1,…,xn∈R is given by:
sx² = 1/(n-1) Σ_{i=1}^{n} (xi - x̄)²

where x̄ is the sample mean
the sample variance is nearly the average squared distance between the observations and the sample mean; the only difference is that the denominator is n-1 instead of n
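
A minimal R sketch of the formula, assuming a made-up sample `x`. It computes the sample variance by hand with the n-1 denominator and compares it with R's `var()`, which uses the same denominator:

```r
# Illustrative sample (made-up values)
x <- c(2, 4, 4, 5, 10)
n <- length(x)

# Sum of squared distances from the sample mean, divided by n - 1
manual_var <- sum((x - mean(x))^2) / (n - 1)

# Dividing by n instead gives a slightly smaller value
divide_by_n <- sum((x - mean(x))^2) / n

print(manual_var)
print(var(x))
print(divide_by_n)
```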

Question 15

Q

Summarising Numerical Data

Sample Standard Deviation

Answer

A

sample standard deviation is the square root of the sample variance
large values of sx indicate that the samples are spread out, while small values of sx indicate that the samples are concentrated around the sample mean
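
To illustrate, here is a small R sketch with two made-up samples that share the same sample mean but differ in spread. `sd()` is R's sample standard deviation:

```r
# Two illustrative samples (made-up values) with the same mean of 10
tight <- c(9, 10, 11)   # concentrated around the mean
wide  <- c(0, 10, 20)   # spread out

sd_tight <- sd(tight)
sd_wide  <- sd(wide)

print(sd_tight)
print(sd_wide)

# sd() is the square root of var()
print(sqrt(var(wide)))
```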

Question 16

Q

Summarising Numerical Data

α-quantiles

Answer

A

the idea of α-quantiles is to split the samples into two groups such that roughly αn samples are smaller than qα and roughly (1-α)n samples are larger than qα
any value of qα that leads to such a split is an α-quantile; depending on n, α and the data x1,…,xn, the α-quantile may or may not be unique
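
A hedged R sketch using `quantile()` on a made-up sample. Because α-quantiles need not be unique, R offers several conventions through the `type` argument; the default (type 7) interpolates between observations and may not match the convention used in a given course:

```r
# Illustrative sample (made-up values)
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

# 0.25-, 0.5- and 0.75-quantiles with R's default convention (type 7)
q <- unname(quantile(x, probs = c(0.25, 0.5, 0.75)))
print(q)

# Fraction of observations at or below the 0.25-quantile
print(mean(x <= q[1]))
```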

Question 17

Q

Summarising Numerical Data

first and third quartiles

Answer

A

using the definition of the α-quantile, qα:
the value q1/4 is called the first quartile
q3/4 is called the third quartile

Question 18

Q

Summarising Numerical Data

interquartile and semi-interquartile range

Answer

A

-the difference q3/4-q1/4 is called the interquartile range
-and:
(q3/4-q1/4)/2 is called the semi-interquartile range
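
In R, `IQR()` computes q3/4 - q1/4 using the same default quantile convention as `quantile()`. A minimal sketch with a made-up sample:

```r
# Illustrative sample (made-up values)
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

# First and third quartiles
quartiles <- unname(quantile(x, probs = c(0.25, 0.75)))

# Interquartile range, by hand and with the built-in function
iqr_manual  <- quartiles[2] - quartiles[1]
iqr_builtin <- IQR(x)

# Semi-interquartile range
semi_iqr <- iqr_builtin / 2

print(iqr_builtin)
print(semi_iqr)
```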

Question 19

Q

Semi-Interquartile Range vs Sample Standard Deviation

Answer

A

the semi-interquartile range can be used as an alternative to the sample standard deviation
its definition is slightly more complicated but the semi-interquartile range is less affected by outliers than the sample standard deviation
i.e. the semi-interquartile range is a robust measure of the spread of a sample
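
The robustness claim can be illustrated with a small R sketch. Both samples are made up; the second replaces the largest observation with an extreme value:

```r
# Illustrative sample and a copy with one outlier (made-up values)
x     <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
x_out <- c(1, 2, 3, 4, 5, 6, 7, 8, 900)

# Semi-interquartile range: unchanged here, since the quartiles do not move
semi_iqr     <- IQR(x) / 2
semi_iqr_out <- IQR(x_out) / 2

# Sample standard deviation: strongly affected by the single outlier
sd_plain <- sd(x)
sd_out   <- sd(x_out)

print(c(semi_iqr, semi_iqr_out))
print(c(sd_plain, sd_out))
```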

Question 20

Q

Summarising Attribute Data

Answer

A

since the observations of attribute data do not consist of numbers, the mode is the only one of the summary statistics from the previous section which can be computed for attribute data
often the best way to summarise attribute data is to consider tables which show how often each of the possible values occurs
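
A minimal R sketch of such a frequency table, assuming a made-up attribute sample `colours`. `table()` counts the occurrences of each value, and the mode can be read off the table:

```r
# Illustrative attribute data (made-up values)
colours <- c("red", "blue", "red", "green", "red", "blue")

# Frequency table: how often each value occurs
counts <- table(colours)
print(counts)

# Relative frequencies
print(prop.table(counts))

# The mode is the value with the highest count
mode_value <- names(counts)[which.max(counts)]
print(mode_value)
```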

Question 21

Q

Statistical Model

Definition

Answer

A

-a statistical model for a sample x1,…,xn consists of random variables X1,…,Xn chosen such that the data x1,…,xn ‘look like’ a random sample of X1,…,Xn

Question 22

Q

Fitting a Model

Answer

A

one of the main concerns in statistics is to ‘fit a model’ to given data
i.e. to find a distribution for the random variables X1,…,Xn such that the data could plausibly be a random sample from the model

Question 23

Q

Questions about the relation between data and models

Answer

A

1) what are the best parameter values to use in the model -> parameter estimation
2) which parameter values in the model are compatible with the data -> confidence intervals
3) could the data have been produced by a given model with given parameter values -> hypothesis tests
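
As a rough illustration of the three questions, the R sketch below assumes a normal model and uses simulated data; the model choice, the parameter values and the use of `t.test()` are assumptions made for this example, not taken from the source:

```r
# Simulated data (assumption: a normal model with mean 5 and sd 2)
set.seed(1)
x <- rnorm(30, mean = 5, sd = 2)

# 1) Parameter estimation: estimate the model mean by the sample mean
mu_hat <- mean(x)

# 2) Confidence interval: values of the mean compatible with the data
ci <- t.test(x, conf.level = 0.95)$conf.int

# 3) Hypothesis test: could the data come from a model with mean 5?
p_value <- t.test(x, mu = 5)$p.value

print(mu_hat)
print(ci)
print(p_value)
```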

Question 24

Q

Models in R

r

Answer

A

-generates n random numbers from the distribution

Question 25

Q

Models in R

d

Answer

A

-densities (weights for the discrete case)

Question 26

Q

Models in R

p

Answer

A

-cumulative distribution functions

Question 27

Q

Models in R

q

Answer

A

-quantiles (the inverse of the cumulative distribution function)

Question 28

Q

Models in R

Distributions

Answer

A

binomial: binom
chi-squared: chisq
exponential: exp
gamma: gamma
normal: norm
Poisson: pois
uniform: unif
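
The prefixes r, d, p and q combine with these distribution names to give function names such as `rnorm` or `dbinom`. A short sketch, with parameter values chosen only for illustration:

```r
set.seed(1)  # only to make the random draws reproducible

# r: five random numbers from the standard normal distribution
draws <- rnorm(5, mean = 0, sd = 1)

# d: density of the standard normal distribution at 0
dens <- dnorm(0, mean = 0, sd = 1)

# p: cumulative distribution function, P(X <= 0)
cdf <- pnorm(0, mean = 0, sd = 1)

# q: quantile function, here the 0.5-quantile
quant <- qnorm(0.5, mean = 0, sd = 1)

# d in the discrete case: P(X = 2) for a binomial with 4 trials and p = 0.5
weight <- dbinom(2, size = 4, prob = 0.5)

print(draws)
print(c(dens, cdf, quant, weight))
```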

Question 29

Q

Sampling Attribute Data

Answer

A

-to generate independent, random samples from a model for an attribute value, the command:
sample(values,n,replace=TRUE,prob=p)
-can be used
-values must be a vector of the possible values of the attribute, and p must be a vector of the same length as values giving the corresponding probabilities of each value
-if all possible values have the same probability, the argument prob=… can be omitted
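
A minimal usage sketch of the command; the attribute values and probabilities are made up for illustration:

```r
set.seed(42)  # only to make the draws reproducible

# Possible attribute values and their probabilities (made-up example)
values <- c("red", "green", "blue")
p      <- c(0.5, 0.3, 0.2)

# 1000 independent draws from the model
draws <- sample(values, 1000, replace = TRUE, prob = p)

# Relative frequencies should be roughly close to p
print(table(draws) / length(draws))

# With equal probabilities, prob can be omitted
equal_draws <- sample(values, 10, replace = TRUE)
print(equal_draws)
```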

Summarising numerical data, summarising attribute data, fitting a model