Statistics Flashcards

1
Q

What does a negative growth rate mean?

A

For example, the working-age population shrinks because fewer children are born, while the non-working population (aged 60 and over) grows because people live longer.

2
Q

What are the four basic steps in statistics?

A
  1. Gathering data
  2. Understanding data
  3. Modeling of data
  4. Conclusions from data
3
Q

What does population mean?

A

The complete set of all elements that have something in common.

Ex: All students in a city.

4
Q

What is a sample?

A

A collection of elements drawn from the population.

A sample is a smaller, manageable subset of a larger group.

5
Q

What is a sampling frame?

A

A list of all the elements of the population, or the source material or device from which the sample is drawn.
It lists everyone within a population who can be sampled, and may include individuals, households or institutions.

Ideally, it is a complete list of everyone or everything you want to study.

6
Q

What is the difference between a population and a sampling frame?

A

The population is general and the frame is specific.
For example, the POPULATION could be “People who live in Jacksonville, Florida.”
The SAMPLING FRAME would name ALL of those people, from Adrian Abba to Felicity Zappa.

7
Q

What does it mean that the sampling frame has overcoverage?

A

That the sampling frame contains elements that are not part of the population.

8
Q

What does it mean that the sampling frame has undercoverage?

A

That there are elements in the population that are not included in the sampling frame.

Ex: There are 100 homeless people, but only 90 of them are registered.

9
Q

What are the 4 levels of data?

A
  1. Nominal data
  2. Ordinal data
  3. Interval data
  4. Ratio data
10
Q

What defines Nominal data?

A

Nominal data is the lowest level of data.

It can be classified into categories that have no natural order.

Ex: gender (male/female), eye color etc.

MODE!

11
Q

What defines ordinal data?

A

The second lowest level of data.

It can be classified into categories with natural order and the order is significant.

It is used to measure non-numeric concepts like satisfaction, level of happiness etc.

Ex: a scale ordered from "very satisfied" to "not at all satisfied".

MODE & MEDIAN!

12
Q

What defines interval data?

A

The second highest level of data.

The data is numerical but lacks an absolute zero, i.e. a zero point at which there is nothing at all of what is being measured.

The interval scale tells us about the order and also about the distance between each item.

Ex: Temperature in Celsius, time of day. (A value of 0 does not mean "nothing".)

MODE, MEAN & MEDIAN!

13
Q

What defines ratio data?

A

The highest level of data.

Has an absolute zero.

Ex: Length, weight, temperature on the Kelvin scale.

MODE, MEAN & MEDIAN!

14
Q

Which measure of central tendency is best to use when the data has outliers or extreme values?

A

Median.

15
Q

The mean can be affected by outliers/extreme values, true or false?

A

True. Therefore the mean is not a good choice when the data has outliers or extreme values.
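
A small sketch of why the median is preferred here. The income figures are made up for illustration; the point is only how far a single extreme value moves the mean compared with the median.

```python
from statistics import mean, median

# Hypothetical monthly incomes (in thousands), made up for this example.
incomes = [28, 30, 31, 33, 35]
# Same data with one extreme value added.
incomes_with_outlier = incomes + [400]

mean_before, median_before = mean(incomes), median(incomes)
mean_after, median_after = mean(incomes_with_outlier), median(incomes_with_outlier)

print("without outlier:", mean_before, median_before)
print("with outlier:   ", round(mean_after, 1), median_after)
```

The mean jumps from 31.4 to roughly 92.8, while the median only moves from 31 to 32.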

16
Q

What does it mean if the modal percentage is close to 100?

A

That the spread is small.

17
Q

What defines the upper quartile (Q3)?

A

That it has 25% of the observations above it and 75% below.

18
Q

What is the IQR, the interquartile range?

A

The difference between the upper and lower quartiles: IQR = Q3 - Q1.

19
Q

What is an outlier?

A

An observation that is distant from the other observations.
Outliers can arise because of variability in the measurement or because of an experimental error (data error).
They are sometimes excluded from the data set.

A data point counts as an outlier if it is below Q1 - (1.5 x IQR)
OR above Q3 + (1.5 x IQR).

20
Q

What is an extreme value?

A

When a data point is below Q1 - (3 x IQR)

OR above Q3 + (3 x IQR).
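
The 1.5 x IQR and 3 x IQR rules can be sketched in a few lines of Python. The data set and the `classify` helper are made up for illustration, and the quartile convention (`method="inclusive"`) is an assumption: textbooks and software compute quartiles in slightly different ways, so the fences can shift a little.

```python
from statistics import quantiles

def classify(data):
    """Label each value using the 1.5 x IQR and 3 x IQR rules."""
    # Quartile conventions differ between textbooks and tools;
    # method="inclusive" is an assumption made for this sketch.
    q1, _q2, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    labels = []
    for x in data:
        if x < q1 - 3 * iqr or x > q3 + 3 * iqr:
            labels.append((x, "extreme value"))
        elif x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr:
            labels.append((x, "outlier"))
        else:
            labels.append((x, "regular"))
    return q1, q3, iqr, labels

# Hypothetical data set, made up for illustration.
data = [10, 12, 13, 14, 15, 16, 18, 30, 60]
q1, q3, iqr, labels = classify(data)
print("Q1:", q1, "Q3:", q3, "IQR:", iqr)
for value, label in labels:
    if label != "regular":
        print(value, "->", label)
```

With this data Q1 = 13, Q3 = 18 and IQR = 5, so the outlier limits are 5.5 and 25.5 and the extreme-value limits are -2 and 33. The value 30 is an outlier and 60 is an extreme value.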

21
Q

What does cause and effect mean?

A

That a change in one variable (the cause) produces a change in another (the effect). Possible relationships:
x -> y: x determines y, e.g. outdoor temperature determines indoor temperature.
x <-> y: they affect each other, e.g. income and consumption.
Common cause: one variable affects two others.
x, y: no relationship between them.

22
Q

What are the important points in correlation?

A

It is a numerical measure of the relationship between two variables.

The sign of the correlation matches the sign of the slope of the best-fitting line through x and y.

Correlation does not imply causality.

23
Q

What does "r" measure?

Between which values does it lie?

A

The strength and direction of the LINEAR relationship between two variables.

-1 <= r <= 1

24
Q

What is interpolation?

A

Interpolation is the estimation of a value between two known values in a sequence of values.

25
Q

What is Extrapolation?

A

Extrapolation is the process of estimating a value beyond the original observation range.
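
A minimal sketch of the difference between interpolation and extrapolation, assuming a straight line through two known points. The `linear_estimate` helper and the sales figures are hypothetical, chosen only for illustration.

```python
def linear_estimate(x, x0, y0, x1, y1):
    """Estimate y at x from the straight line through (x0, y0) and (x1, y1)."""
    slope = (y1 - y0) / (x1 - x0)
    return y0 + slope * (x - x0)

# Hypothetical observations: 100 units sold in year 1, 140 units in year 5.
# Interpolation: year 3 lies inside the observed range [1, 5].
inside = linear_estimate(3, 1, 100, 5, 140)
# Extrapolation: year 8 lies outside the observed range [1, 5].
outside = linear_estimate(8, 1, 100, 5, 140)

print(inside, outside)
```

The interpolated value (120.0) is supported by data on both sides. The extrapolated value (170.0) relies on the assumption that the same trend continues outside the observed range, which makes it less reliable.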

26
Q

What is a quartile? How is it constructed in the data set?

A

The data is sorted and divided into four equally sized parts, which are used to measure variation and spread.

The quartiles are the three values that split a data set into four equal parts.

27
Q

How do we use the quartile to measure spread?

A

Calculate Q3 - Q1 (the IQR), which shows how spread out the middle 50% of the data is.

28
Q

What defines a negatively skewed distribution?

A

That it has a tail to the left.

Mode>Median>Mean

29
Q

What does standard error mean (in regression)?

A

It's the AVERAGE distance between an observation and the estimated regression line. It measures the average size of a residual.

30
Q

What are the different "steps" of standard deviation?

A

For normally distributed data:
Within 1 standard deviation of the mean = about 68% of the data
Within 2 standard deviations = about 95%
Within 3 standard deviations = about 99.7%
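
These percentages hold for normally distributed data and can be checked with Python's standard library. The sketch below uses a standard normal distribution (mean 0, standard deviation 1), which is an assumption made for convenience; the shares are the same for any normal distribution.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Share of a normal distribution lying within k standard deviations of the mean.
shares = {}
for k in (1, 2, 3):
    shares[k] = standard_normal.cdf(k) - standard_normal.cdf(-k)
    print(f"within {k} standard deviation(s): {shares[k]:.1%}")
```

This prints roughly 68.3%, 95.4% and 99.7%.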

31
Q

What does standard deviation measure?

A

It measures the amount of variation in the data values.

The further the values lie from the mean, the higher the standard deviation.

32
Q

What does the coefficient of correlation (r) measure? What does r = 0 indicate?

A

The strength of the linear relationship between two variables.
r = 0 indicates that there is no LINEAR relationship, but there can still be a non-linear relationship.
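
A short sketch of this point: r can be 0 even when y is completely determined by x. The `pearson_r` helper and the data are written for illustration only.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / sqrt(variation_x * variation_y)

xs = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * x + 1 for x in xs]   # perfect linear relationship
quadratic = [x ** 2 for x in xs]   # perfect, but non-linear, relationship

r_linear = pearson_r(xs, linear)
r_quadratic = pearson_r(xs, quadratic)
print(r_linear, r_quadratic)
```

The linear data gives r = 1, while the quadratic data gives r = 0 although y depends entirely on x.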

33
Q

If a histogram is right (or positive) skewed, which is bigger, the mean or the median? Explain why!

A

The mean. The tail stretches towards the right, i.e. towards higher values. These high values pull the mean towards the right, while the median is barely affected, so the mean ends up larger than the median.

34
Q

Give an example of how undercoverage can bias (systematically change) the result of a survey.

A

Say that you want to examine students at University West. The sampling frame that you use does not include students from the engineering department. When asking questions about attitudes towards technology you would get biased answers, since engineering students are likely to be fonder of technology than other students.

35
Q

For adult Swedish males aged 50, the average weight is 84 kg and the standard deviation is 12 kg. How should the standard deviation be interpreted in this setting?

A

On AVERAGE, the weight of a man deviates from 84 kg by 12 kg. This means that some weigh 15 kg more, some 5 kg less and so on, but on AVERAGE the deviation is 12 kg.

36
Q

Explain how the coefficients a and b in simple linear regression are estimated using the method of least squares. Explain the basics of the method of least squares.

A

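The method of least squares finds the line of best fit for a data set, i.e. the line that is as close as possible to the data. It chooses a and b so that the sum of the squared residuals (the vertical distances between the observations and the line) is as small as possible.
The line describes the potential relationship between a dependent variable and one independent variable.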
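
A minimal sketch of the least squares estimates, assuming the line is written y = a + b*x (a as intercept and b as slope is an assumption; the card does not say which is which). The `least_squares` helper and the data are made up for illustration.

```python
def least_squares(xs, ys):
    """Return (a, b) for the line y = a + b*x that minimizes the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariation of x and y divided by the variation of x.
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    # Intercept: the fitted line always passes through (mean_x, mean_y).
    a = mean_y - b * mean_x
    return a, b

# Hypothetical data: hours spent revising and exam score.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

a, b = least_squares(hours, scores)
print(f"score = {a:.1f} + {b:.1f} * hours")
```

For this data the fitted line is approximately score = 47.7 + 4.1 * hours.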

37
Q

The correlation between "Exam score" and "hours spent revising" is 0.82. What does this mean? What would a correlation of 0 mean?

A

This means that there seems to be a fairly strong linear relationship between the two variables. Since the correlation is positive, when one of the variables is big the other tends to be big as well.

If the correlation was 0, this would mean that there is no linear relationship between the variables.

38
Q

Explain how the residuals for a regression model are calculated and give at least one possible source of variation in the residuals.

A

The residuals are calculated by subtracting the value estimated by the linear model from the actual observation.
Two possible sources of variation are random measurement errors and model error, i.e. using a linear model when, for instance, the true relationship is quadratic.
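
The sketch below shows the model-error case: a straight line is fitted to data that is actually quadratic, and the residuals (observed minus estimated) are computed. The `fit_line` helper, the form y = a + b*x and the data are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Least squares estimates (a, b) for the line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return mean_y - b * mean_x, b

# Hypothetical data where the true relationship is quadratic (y = x^2).
xs = [0, 1, 2, 3, 4]
ys = [x ** 2 for x in xs]

a, b = fit_line(xs, ys)
# Residual = actual observation minus the value estimated by the linear model.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
print(residuals)
```

The residuals are 2, -1, -2, -1, 2: positive at both ends and negative in the middle. A systematic pattern like this, rather than random scatter, suggests that the linear model is the wrong model.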

39
Q

Explain the mode and the mode proportion.

A

The mode is a measure of location for nominal or ordinal data with few categories. The mode is the most commonly occurring category, i.e. the mode is a category.
The mode proportion is a measure of spread used for the same data types as the mode. It is the proportion of the answers that the mode has, i.e. the mode proportion is a number. The higher the mode proportion, the less spread in the data, and the lower the mode proportion, the higher the spread.
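
A small sketch of both measures using Python's standard library. The `mode_and_proportion` helper and the survey answers are made up for illustration.

```python
from collections import Counter

def mode_and_proportion(answers):
    """Return the most common category and the share of answers it accounts for."""
    counts = Counter(answers)
    mode, count = counts.most_common(1)[0]
    return mode, count / len(answers)

# Hypothetical survey answers: how do you travel to campus?
answers = ["bus", "car", "bus", "bike", "bus", "car", "bus", "bus"]

mode, proportion = mode_and_proportion(answers)
print(mode, proportion)
```

Here the mode is the category "bus" and the mode proportion is 5/8 = 0.625.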

40
Q

When is the variable income statistically significant?

A

When the value in the "sig" column (the p-value) is lower than the chosen significance level, typically 0.05 (5%).