Basics Flashcards

1
Q

What is a variable?

A
  • A characteristic or measurement that can be determined for each member of the population (e.g., age)
2
Q

A numeric variable
- Continuous variable
- Discrete variable

A

Numeric: Variables that are expressed in numbers

Continuous variable (we measure it): any value within a range (height in cm, weight in kg, income in kr.)

Discrete variable (we count it): values are whole numbers (number of cars sold, number of steps)

3
Q

A Categorical variable

A
  • Qualitative data sorted into categories (education level, yes or no to a term deposit)
4
Q

What is a population?

A
  • The entire group of individuals or items of interest
5
Q

What is a sample/sampling?

A
  • A subset of individuals or observations from a population, which we use to make inferences about the population
6
Q

What are inferences?

A
  • Conclusions about a population drawn from sample data. Goal: make predictions or understand patterns
7
Q

What is a statistic?

A
  • Numbers that describe the sample (sample mean, sample variance)
8
Q

What is a parameter?

A
  • Numbers that describe the entire population (population mean, population variance)
9
Q

Variance

A

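Variability around the mean. Measures how far the values in a dataset deviate, on average, from their mean in squared units; it describes the dataset's spread. Note: variance does not shrink systematically as the sample grows; it is the standard error of the mean that decreases with larger sample sizes.
Example: average height 170 cm, 2 observations: 160 cm and 180 cm.
Squared deviations: (160-170)^2 = 100, (180-170)^2 = 100
Variance (dividing by n, the population formula): (100+100) / 2 = 100. The sample variance divides by n-1 instead: (100+100) / 1 = 200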
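
The arithmetic on this card can be checked with a short Python sketch. It uses only the two heights from the example; the variable names are illustrative. The card divides by n, which corresponds to the population variance, so the sample variance (dividing by n-1) is shown next to it for comparison.

```python
import statistics

heights = [160, 180]  # the two observations from the card, in cm

# Population variance: summed squared deviations divided by n.
pop_var = statistics.pvariance(heights)    # (100 + 100) / 2

# Sample variance: divide by n - 1 instead (Bessel's correction).
sample_var = statistics.variance(heights)  # (100 + 100) / 1

print(pop_var, sample_var)
```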

10
Q

Standard deviation

A

Measures how far the values typically lie from the mean (square root of the variance). Example (as before): square root of 100 = 10. The observations deviate by about 10 units from the mean.
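
As a quick check of the square-root step, here is a minimal Python sketch using the same two heights (160 cm and 180 cm). The names are illustrative, and the population formula is used because the card's variance of 100 comes from dividing by n.

```python
import math
import statistics

heights = [160, 180]  # same observations as in the variance card, in cm

# Population standard deviation: square root of the population variance.
pop_sd = statistics.pstdev(heights)

# The same value computed by hand from the variance.
by_hand = math.sqrt(statistics.pvariance(heights))

print(pop_sd, by_hand)
```

Unlike the variance, the result is in the original unit (cm), which is why the standard deviation is usually easier to interpret.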

11
Q

Why samples

A

Costs are reduced, and a sample is simpler to analyze than the whole population

12
Q

Simple random sampling

A

Every member of the population has an equal chance of being selected, and members are selected independently of each other.
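
A minimal sketch of simple random sampling in Python, assuming a made-up population of 100 member IDs. `random.sample` draws without replacement, so every member is equally likely to end up in the sample; for draws that are strictly independent of each other, `random.choices` (with replacement) would be the closer match.

```python
import random

population = list(range(1, 101))  # hypothetical population: member IDs 1..100
rng = random.Random(42)           # fixed seed so the draw is reproducible

# Draw 10 distinct members; each member has the same chance of being picked.
srs = rng.sample(population, k=10)

print(srs)
```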

13
Q

Stratified random sampling

A

The population is divided into subgroups (strata), and a random sample is then drawn from each subgroup
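
The idea can be sketched in Python as follows. The strata (education levels), the member IDs, the 10% sampling fraction, and the helper name `stratified_sample` are all assumptions made for illustration.

```python
import random

def stratified_sample(strata, fraction, rng):
    """Draw the same fraction, at random, from every stratum."""
    return {
        name: rng.sample(members, max(1, round(len(members) * fraction)))
        for name, members in strata.items()
    }

# Hypothetical population split by education level (IDs are made up).
strata = {
    "primary": [f"P{i}" for i in range(60)],
    "secondary": [f"S{i}" for i in range(30)],
    "tertiary": [f"T{i}" for i in range(10)],
}

rng = random.Random(7)
sample = stratified_sample(strata, fraction=0.1, rng=rng)
sizes = {name: len(members) for name, members in sample.items()}

print(sizes)
```

Every subgroup is represented in the sample, in proportion to its size in the population.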

14
Q

Cluster sampling

A

The population is divided into clusters, and we then randomly select whole clusters; the members of the selected clusters form the sample
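
A minimal Python sketch of one-stage cluster sampling, assuming a hypothetical population of 10 schools with 20 students each. The randomness is applied to the clusters, not to individual members.

```python
import random

# Hypothetical clusters: 10 schools, 20 students in each (names are made up).
clusters = {
    f"school_{i}": [f"school_{i}_student_{j}" for j in range(20)]
    for i in range(10)
}

rng = random.Random(1)

# Randomly select 3 whole clusters.
chosen = rng.sample(sorted(clusters), k=3)

# Everyone in a selected cluster is included in the sample.
cluster_sample = [member for name in chosen for member in clusters[name]]

print(chosen, len(cluster_sample))
```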

15
Q

Central limit theorem

A

If the sample is sufficiently large, the sampling distribution of the sample mean is approximately normal, regardless of the shape of the population's distribution (provided the population variance is finite)
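
A small simulation can make the theorem concrete. The sketch below assumes an exponential population with mean 1 and standard deviation 1 (clearly not normal), a sample size of 50, and 2000 repeated samples; these values are chosen only for illustration.

```python
import math
import random
import statistics

rng = random.Random(0)
n, reps = 50, 2000

def sample_mean(size):
    # Exponential population: strongly right-skewed, mean 1, standard deviation 1.
    return statistics.fmean([rng.expovariate(1.0) for _ in range(size)])

means = [sample_mean(n) for _ in range(reps)]

center = statistics.fmean(means)  # should be close to the population mean, 1
spread = statistics.stdev(means)  # should be close to 1 / sqrt(n)

# Under a normal distribution, about 95% of sample means fall within 1.96 standard errors.
se = 1 / math.sqrt(n)
share_within = sum(abs(m - 1) <= 1.96 * se for m in means) / reps

print(center, spread, share_within)
```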

16
Q

Standard error

A

Measures how much a sample statistic, such as the sample mean, is expected to vary from the true population value due to sampling variability. Note: larger sample sizes yield smaller standard errors, so inferences are more precise
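
For the sample mean, the standard error is usually estimated as s / sqrt(n), where s is the sample standard deviation. The sketch below applies that formula to made-up height data; the helper name `standard_error` and both datasets are assumptions for illustration.

```python
import math
import statistics

def standard_error(sample):
    """Estimated standard error of the sample mean: s / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

small = [160, 165, 170, 175, 180]  # hypothetical heights in cm, n = 5
large = small * 4                  # same values repeated, n = 20

se_small = standard_error(small)
se_large = standard_error(large)

print(se_small, se_large)
```

The spread of the values is similar in both datasets, but the larger sample gives the smaller standard error.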

17
Q

Point estimator

A

A rule for calculating an estimate of an unknown population parameter from sample data.
Sample mean: x̄ is a point estimator of the population mean μ
Sample proportion: p̂ is a point estimator of the population proportion p
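
Both estimators can be computed in a few lines of Python. The five heights and the smoking indicators below are made-up sample data, and the names `x_bar` and `p_hat` simply mirror the symbols on the card.

```python
import statistics

heights_cm = [160, 172, 168, 181, 175]      # hypothetical sample of heights
smokes = [True, False, False, True, False]  # hypothetical yes/no answers

# Sample mean: point estimate of the population mean.
x_bar = statistics.fmean(heights_cm)

# Sample proportion: share of the sample with the characteristic.
p_hat = sum(smokes) / len(smokes)

print(x_bar, p_hat)
```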

18
Q

Sample proportion

A

The share of individuals in a sample that possess a certain characteristic (e.g., 40% of the sample smoke)

19
Q

Unbiasedness (estimator):

A

The expected value of an estimator equals the true value of the population parameter it is estimating, so there is no systematic overestimation or underestimation. Examples of biased estimators: the sample maximum as an estimator of the population maximum, and the sample standard deviation (s) as an estimator of the population standard deviation
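
The bias of s can be made visible with a simulation. The sketch below assumes a normal population with standard deviation 2, samples of size 5, and 10000 repetitions; all of these values are chosen for illustration only. On average the sample variance lands near the true variance of 4, while the sample standard deviation lands below the true value of 2.

```python
import math
import random
import statistics

rng = random.Random(11)
sigma, n, reps = 2.0, 5, 10000  # assumed population sd, sample size, repetitions

s_values, s2_values = [], []
for _ in range(reps):
    sample = [rng.gauss(0, sigma) for _ in range(n)]
    s2 = statistics.variance(sample)  # sample variance (divides by n - 1)
    s2_values.append(s2)
    s_values.append(math.sqrt(s2))    # sample standard deviation

avg_s2 = statistics.fmean(s2_values)  # close to sigma**2: unbiased
avg_s = statistics.fmean(s_values)    # systematically below sigma: biased

print(avg_s2, avg_s)
```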

20
Q

Efficiency

A

Refers to how well an estimator uses the data to estimate a population parameter, relative to other estimators. If there are several unbiased estimators of a parameter, the unbiased estimator with the smallest variance is called the most efficient estimator.
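
As an illustration, the sample mean and the sample median both estimate the center of a normal population without bias, but the mean varies less from sample to sample. The simulation below is a sketch under assumed settings (standard normal population, samples of size 25, 4000 repetitions).

```python
import random
import statistics

rng = random.Random(3)
n, reps = 25, 4000  # assumed sample size and number of repetitions

means, medians = [], []
for _ in range(reps):
    sample = [rng.gauss(0, 1) for _ in range(n)]
    means.append(statistics.fmean(sample))
    medians.append(statistics.median(sample))

# Variance of each estimator across the repeated samples.
var_of_mean = statistics.pvariance(means)
var_of_median = statistics.pvariance(medians)

print(var_of_mean, var_of_median)
```

The estimator with the smaller variance across repeated samples (here the mean) is the more efficient one for this population.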

21
Q

Sampling distribution of the sample variance

A

Allows us to make inferences about the population variance. Essential for quality control and for understanding process variability
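
The standard result behind this card, stated here under the added assumption that the population is normally distributed, relates the sample variance s^2 (computed from n observations) to the population variance σ^2 through a chi-square distribution with n-1 degrees of freedom:

```latex
\frac{(n-1)\,s^{2}}{\sigma^{2}} \sim \chi^{2}_{n-1}
```

This is the relationship typically used to build confidence intervals and tests for the population variance.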