Chapters 1-3 Flashcards

1
Q

elements

A

The entities on which data are collected.

2
Q

variable

A

A characteristic of interest for the elements. It can take different values, which gives different outcomes from element to element.

3
Q

nominal scale

A

A scale of measurement in which the data for a variable consist of labels or names that identify an attribute of the element. The data can also be numeric when the numbers stand for labels or names.

4
Q

ordinal scale

A

A scale of measurement with the properties of the nominal scale, plus a meaningful order or rank of the data. The data may be non-numeric, or numeric when the numbers stand for ordered labels or names.

5
Q

interval scale

A

A scale of measurement with the properties of the ordinal scale, plus intervals between values that are expressed in terms of a fixed unit of measure. Interval data are always numeric, and the difference between two values has meaning.

6
Q

ratio scale

A

A scale of measurement with the properties of the interval scale, plus meaningful ratios of two values. Examples: distance, height, weight, and time. The scale requires a true zero: a value of zero means that none of the quantity is present.

7
Q

quantitative data

A

Numeric values that indicate how much or how many.

8
Q

cross-sectional data

A

data collected at the same or approximately the same time

9
Q

time series data

A

data collected over several time periods

10
Q

descriptive statistics

A

Summaries of data in tabular, graphical, or numerical form.

11
Q

population

A

the set of all elements of interest in a particular study

12
Q

sample

A

A subset of the population.

13
Q

frequency distribution

A

A tabular summary of data showing the frequency (number of items) in each of several non-overlapping classes.

14
Q

relative frequency distribution

A

A tabular summary showing the relative frequency for each class. Relative frequency of a class = frequency of the class / n, where n is the total number of observations.
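
As a rough illustration of the formula above, the sketch below builds a frequency distribution and derives the relative and percent frequencies from it. The observations and the variable names are hypothetical, chosen only for this example.

```python
from collections import Counter

# Hypothetical sample: one categorical observation per element.
observations = ["A", "B", "A", "C", "A", "B"]

n = len(observations)
frequency = Counter(observations)  # frequency distribution

# relative frequency of a class = frequency of the class / n
relative_frequency = {cls: count / n for cls, count in frequency.items()}

# percent frequency = relative frequency x 100
percent_frequency = {cls: 100 * rel for cls, rel in relative_frequency.items()}

for cls in sorted(frequency):
    print(cls, frequency[cls],
          round(relative_frequency[cls], 3),
          round(percent_frequency[cls], 1))
```

The relative frequencies always sum to 1, which is a quick sanity check on any such table.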

15
Q

percentage frequency distribution

A

A tabular summary showing the percent frequency for each class. Percent frequency of a class = relative frequency of the class x 100.

16
Q

bar chart

A

A graphical representation of a frequency, relative frequency, or percent frequency distribution, with two axes: one for the class labels and one for the frequency.

17
Q

pie charts

A

A circle representing a frequency, relative frequency, or percent frequency distribution. The circle is divided into sectors whose sizes correspond to the relative frequency of each class.

18
Q

histogram

A

A chart showing quantitative data that have been summarized in a frequency distribution.

19
Q

cumulative frequency distributions

A

Shows, for each class, the number of items with values less than or equal to the upper class limit of that class.
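
A minimal sketch of the idea, assuming a small made-up frequency distribution given as (upper class limit, frequency) pairs. The cumulative frequency is simply a running total of the class frequencies.

```python
from itertools import accumulate

# Hypothetical frequency distribution: (upper class limit, frequency).
classes = [(14, 4), (19, 8), (24, 5), (29, 2), (34, 1)]

upper_limits = [upper for upper, _ in classes]
frequencies = [freq for _, freq in classes]

# Running total: number of items less than or equal to each upper class limit.
cumulative = list(accumulate(frequencies))

for upper, total in zip(upper_limits, cumulative):
    print(f"<= {upper}: {total}")
```

The last cumulative frequency equals the total number of observations.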

20
Q

dot plot

A

A graphical summary of data: each data value is shown as a dot above a horizontal axis, so the number of dots at a value equals its frequency.

21
Q

stem-and-leaf display

A

Shows both the rank order and the shape of a data set.

22
Q

cross tabulations

A

a tabular summary of data for two variables

23
Q

Simpson's paradox

A

A problem that can occur when data from two or more cross tabulations are aggregated: the relationship between two variables in the aggregated data can be the reverse of the relationship seen in the unaggregated data.
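
The reversal is easiest to see with numbers. The sketch below uses invented counts (not taken from the course material) for two hypothetical treatments, A and B, in two groups: A has the higher success rate within each group, yet B looks better once the groups are combined.

```python
# Hypothetical counts: (successes, trials) for two treatments in two groups.
data = {
    "group 1": {"A": (8, 10), "B": (70, 100)},
    "group 2": {"A": (30, 100), "B": (2, 10)},
}


def rate(successes, trials):
    """Proportion of successes."""
    return successes / trials


# Within each group, treatment A has the higher success rate.
for group, results in data.items():
    print(group, {t: round(rate(*counts), 2) for t, counts in results.items()})

# Aggregating over the groups reverses the comparison.
aggregated = {}
for treatment in ("A", "B"):
    successes = sum(data[g][treatment][0] for g in data)
    trials = sum(data[g][treatment][1] for g in data)
    aggregated[treatment] = rate(successes, trials)

print("aggregated", {t: round(r, 2) for t, r in aggregated.items()})
```

The reversal arises because the two treatments are spread very unevenly across the groups, so the aggregated rates are dominated by different groups.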

24
Q

clustered bar charts

A

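Shows the joint distribution of two categorical variables, with the bars for the categories of one variable placed side by side within each category of the other.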

25
Q

stacked bar charts

A

Shows the joint distribution of two categorical variables, with each bar broken into segments, one segment per category of the second variable.

26
Q

scatter diagram

A

A graphical representation of the relationship between two quantitative variables. A trend line provides an approximation of the relationship.

27
Q

percentile

A

Provides information about how the data are spread over the interval from the smallest to the largest value.
“The pth percentile is a value such that at least p percent of the observations are less than or equal to this value and at least (100 - p) percent of the observations are greater than or equal to this value.”

28
Q

quartile

A

Division points:
First quartile - 25th percentile
Second quartile - 50th percentile (also the median)
Third quartile - 75th percentile
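
The three division points can be computed with Python's standard library, as in the sketch below. The data set is hypothetical, and the interpolation rule used by `statistics.quantiles` is only one of several conventions, so a textbook or other software may give slightly different quartiles for small data sets.

```python
import statistics

# Hypothetical data set.
data = [1, 2, 3, 4, 5, 6, 7]

# With n=4, statistics.quantiles returns the three quartile cut points.
# Its default "exclusive" method is one convention among several.
q1, q2, q3 = statistics.quantiles(data, n=4)

print("first quartile:", q1)
print("second quartile:", q2)
print("third quartile:", q3)

# The second quartile coincides with the median.
print("median:", statistics.median(data))
```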

29
Q

range

A

Range = largest value - smallest value
Highly affected by extremely high and low values.

30
Q

interquartile range

A

IQR = third quartile - first quartile

31
Q

variance

A

is a measure of variability that uses all data values and is based on the difference between each data value and the mean

32
Q

standard deviation

A

is defined as the positive square root of the variance

33
Q

coefficient of variation

A

Describes how large the standard deviation is relative to the mean: (standard deviation / mean) x 100, expressed as a percentage.
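
The variance, standard deviation, and coefficient of variation build on each other, which the following sketch tries to make explicit. It assumes the data are a sample, so the sum of squared deviations is divided by n - 1; for a full population the divisor would be n. The data values are hypothetical.

```python
import math
import statistics

# Hypothetical sample.
data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)
mean = sum(data) / n

# Sample variance (assumption: the data are a sample, hence n - 1).
variance = sum((x - mean) ** 2 for x in data) / (n - 1)

# Standard deviation: positive square root of the variance.
std_dev = math.sqrt(variance)

# Coefficient of variation: standard deviation as a percentage of the mean.
coeff_of_variation = std_dev / mean * 100

print(round(variance, 3), round(std_dev, 3), round(coeff_of_variation, 1))

# The standard library gives the same sample statistics.
print(round(statistics.variance(data), 3), round(statistics.stdev(data), 3))
```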

34
Q

Distributional shape

A

A histogram of the relative frequency distribution shows the distributional shape of the data, including its skewness.

35
Q

negative, positive, zero skewness

A

negative skewness - skewed to the left
positive skewness - skewed to the right
zero skewness - symmetrical

36
Q

z-score

A

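The number of standard deviations a data value x_i lies from the sample mean: z_i = (x_i - sample mean) / s, where s is the sample standard deviation. Z-scores are also called standardized values.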
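
A short sketch of the standardization step, using a hypothetical sample and the sample standard deviation from the standard library. After standardizing, the values have mean 0 and standard deviation 1.

```python
import statistics

# Hypothetical sample.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)
s = statistics.stdev(data)  # sample standard deviation

# z-score: how many standard deviations each value lies from the mean.
z_scores = [(x - mean) / s for x in data]

for x, z in zip(data, z_scores):
    print(x, round(z, 2))
```

A negative z-score marks a value below the mean, a positive one a value above it.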

37
Q

Chebyshev's theorem

A

Enables us to make statements about the proportion of data values that lie within a specified number of standard deviations of the mean. “At least (1 - 1/z^2) x 100 percent of the data values must be within z standard deviations of the mean, where z is any value greater than 1.”

38
Q

Chebyshev's theorem - percent

A

  • at least 75% of the data values lie within z=2 standard deviations of the mean
  • at least 89% of the data values lie within z=3 standard deviations of the mean
  • at least 94% of the data values lie within z=4 standard deviations of the mean
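
The three percentages follow directly from the bound (1 - 1/z^2) x 100. A small sketch, with a hypothetical helper name, that reproduces them before rounding:

```python
def chebyshev_min_percent(z):
    """Minimum percent of data within z standard deviations of the mean."""
    if z <= 1:
        raise ValueError("z must be greater than 1")
    return (1 - 1 / z ** 2) * 100


for z in (2, 3, 4):
    print(z, round(chebyshev_min_percent(z), 2))
```

The bound holds for any distribution shape, which is why it is weaker than the percentages given by the empirical rule for bell-shaped data.
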
39
Q

empirical rule

A

When a distribution is approximately bell-shaped, the empirical rule can be used to determine the approximate percentage of data values that lie within a specified number of standard deviations of the mean.

40
Q

Empirical rule - percent

A

For data with a bell-shaped distribution:
- approximately 68% of the data values lie within 1 standard deviation of the mean
- approximately 95% of the data values lie within 2 standard deviations of the mean
- almost all data values lie within 3 standard deviations of the mean

41
Q

outliers

A

Extreme values in a data set.
Z-scores (standardized values) can be used to identify outliers. For bell-shaped distributions, the empirical rule says that almost all data lie within 3 standard deviations of the mean, so any value with a z-score less than -3 or greater than +3 can be treated as an outlier.

42
Q

Five-number summary

A

Used to summarize data:
1. Smallest value
2. First quartile
3. Median
4. Third quartile
5. Largest value
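
As an illustration, the five numbers can be collected as follows. The data are hypothetical, and the quartiles use the default convention of `statistics.quantiles`, which may differ slightly from a given textbook's rule.

```python
import statistics

# Hypothetical data set.
data = [12, 15, 17, 20, 21, 23, 26, 30]

# Quartile cut points; the middle one is the median.
q1, median, q3 = statistics.quantiles(data, n=4)

five_number_summary = (min(data), q1, median, q3, max(data))

labels = ("smallest", "first quartile", "median", "third quartile", "largest")
for label, value in zip(labels, five_number_summary):
    print(f"{label}: {value}")
```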

43
Q

Box plot

A

Graphical version of the five-number summary.
1. A box is drawn with the box ends located at first and third quartile
2. A line is drawn across the box at the location of the median
3. By using the IQR, limits are located. The limits are 1.5(IQR) below the first quartile and 1.5(IQR) above the third quartile. Data outside these limits are considered outliers.
4. Whiskers (dashed lines) are drawn from the ends of the box to the smallest and largest values inside the limits.
5. The outliers are shown with *
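
The limit calculation in step 3 can be sketched as below. The data set is hypothetical (with one deliberately large value), and the quartiles come from `statistics.quantiles`, whose default convention is an assumption here and may not match every textbook.

```python
import statistics

# Hypothetical data set; 58 is unusually large.
data = [12, 15, 17, 20, 21, 23, 26, 30, 58]

q1, _, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Limits: 1.5(IQR) below the first quartile and above the third quartile.
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# Values outside the limits are flagged as outliers.
outliers = [x for x in data if x < lower_limit or x > upper_limit]

print("IQR:", iqr)
print("limits:", lower_limit, upper_limit)
print("outliers:", outliers)
```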

44
Q

Covariance

A

A descriptive measure of the linear association between two variables.
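
A minimal sketch of the calculation, assuming the data are a sample (divisor n - 1) and using made-up paired observations. A positive result indicates a positive linear association, a negative result a negative one.

```python
# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sample covariance: average product of deviations, with divisor n - 1.
sample_covariance = sum(
    (xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)
) / (n - 1)

print(sample_covariance)
```

The size of the covariance depends on the units of measurement, so its sign is easier to interpret than its magnitude.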