Final summary Flashcards
population
consists of all the items or individuals about which you want to draw a conclusion
sample
is the portion of the population selected for analysis
measuring
means linking numerical values to research objects
Observation unit, i.e. statistical unit
is a single research object
observation
is the measured result (value) that is related to one research object
variable
is a characteristic of an item or an individual
Discrete variables
arise from a counting process
Continuous variables
arise from a measuring process
Sampling methods can be categorized
- Probability samples: the elements being included have a known chance of being selected
- Non-probability samples: participants are selected in a purposeful way
Probability sampling methods
- Simple random sampling
- Systematic random sampling
- Stratified sampling
- Cluster sampling
- Sequence sampling
Probability samples are samples in which the elements being included have a known chance of being selected.
Non-probability sampling methods
- Judgement sampling
- Quota sampling
Non-probability samples are ones in which participants are selected in a purposeful way.
Systematic random sampling
The sampling units are chosen from the sampling frame at a uniform interval, i.e. at a specified rate
- Sampling interval k = N/n (N = size of the population, n = sample size)
- The starting point is selected at random from the first interval, and then every kth element is selected
- For example: N = 200, n = 10 → k = 200/10 = 20
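The steps above can be sketched in Python. This is a minimal illustration, not a reference implementation: the function name `systematic_sample` and the `seed` parameter are assumptions made for the example, and N is assumed to be divisible by n.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Pick every k-th element, starting at a random point in the first interval."""
    N = len(frame)
    k = N // n                    # sampling interval k = N/n (assumes N divides evenly)
    rng = random.Random(seed)
    start = rng.randrange(k)      # random start index inside the first interval: 0..k-1
    return [frame[start + i * k] for i in range(n)]

population = list(range(1, 201))  # N = 200 numbered elements
sample = systematic_sample(population, n=10, seed=1)
print(sample)                     # 10 elements, each 20 apart
```

With N = 200 and n = 10 the interval is k = 20, so the sample always contains one element from each block of 20.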
Simple random sampling
the elements of the whole population are numbered and then selected by using random numbers
Stratified sampling
the population is divided into mutually exclusive strata/groups (based on nationality, profession, gender, ...)
- Each element can be included in only one stratum
- A sample is drawn randomly from each stratum/group
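A small Python sketch of the idea, under stated assumptions: the helper `stratified_sample`, its parameters, and the toy data (people tagged with a profession) are hypothetical and only illustrate grouping elements into strata and drawing randomly from each one.

```python
import random
from collections import defaultdict

def stratified_sample(elements, stratum_of, per_stratum, seed=None):
    """Group elements into strata, then draw a random sample from each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for el in elements:
        strata[stratum_of(el)].append(el)   # each element lands in exactly one stratum
    return {name: rng.sample(members, per_stratum)
            for name, members in strata.items()}

# Toy population: 10 nurses and 20 teachers (invented data)
people = [("p%d" % i, "nurse" if i % 3 == 0 else "teacher") for i in range(30)]
picked = stratified_sample(people, stratum_of=lambda p: p[1], per_stratum=4, seed=0)
print(picked)
```

Here the same number of elements is drawn from every stratum; drawing in proportion to stratum size is an equally valid design choice.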
Cluster sampling
- each cluster represents the whole population
- random clusters are selected for the sample
- selected clusters are included in full, or random samples are drawn from them
Sequence sampling
elements are picked sequentially until the results no longer change
Judgement sampling
Relies on judgement or expertise
Quota sampling
The first step is to estimate the sizes of the various subclasses or strata in the population
- Sampling continues until each quota is full
- Even quota
- Proportional quota
- Optimal quota
Even quota
the same number of elements is picked from each stratum (e.g. 100 male and 100 female)
Proportional quota
if the population is 60% male and 40% female, the sample is drawn in the same proportions: 60 male + 40 female = total sample size 100
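The even and proportional quotas can be computed with a few lines of Python. The function names below are assumptions made for this sketch. Rounding can make proportional quotas add up to slightly more or less than the target sample size when the shares do not divide evenly.

```python
def even_quotas(groups, per_group):
    """Even quota: the same number of elements for every group."""
    return {group: per_group for group in groups}

def proportional_quotas(shares, sample_size):
    """Proportional quota: each group's quota follows its share of the population."""
    return {group: round(share * sample_size) for group, share in shares.items()}

even = even_quotas(["male", "female"], per_group=100)
quotas = proportional_quotas({"male": 0.60, "female": 0.40}, sample_size=100)
print(even)
print(quotas)
```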
Convenience sampling
the participants are self-selecting; there is no sample design
Sampling frame
a list of the population
Sample size
is affected by the desired accuracy of the results
p (in confidence of percentage)
percentage in the sample. The maximum margin of error (e) is reached with p = 50%
S
standard deviation; measures the average scatter around the mean
e
margin of error
Z
critical value
Confidence of mean
method of calculating a sample size, based on the margin of error of the population mean
Confidence of percentage
method of calculating a sample size, based on percentage margin of error
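The cards above name the symbols (Z, S, e, p) but not the formulas. A sketch using the standard textbook formulas for large populations, n = (Z·S/e)² for a mean and n = Z²·p(1−p)/e² for a percentage, is given below. Whether these are exactly the formulas intended here is an assumption. The function names and the input values (S = 15, e = 2) are invented for illustration, and no finite-population correction is applied.

```python
import math

def sample_size_for_mean(z, s, e):
    """n = (Z * S / e)^2, rounded up to a whole observation."""
    return math.ceil((z * s / e) ** 2)

def sample_size_for_percentage(z, p, e):
    """n = Z^2 * p * (1 - p) / e^2, with p and e given as proportions (0.5 = 50%)."""
    return math.ceil(z ** 2 * p * (1 - p) / e ** 2)

# Z = 1.96 is the critical value for a 95% confidence level
n_mean = sample_size_for_mean(z=1.96, s=15, e=2)
n_pct = sample_size_for_percentage(z=1.96, p=0.5, e=0.05)
print(n_mean, n_pct)
```

Because p(1 − p) is largest at p = 0.5, using p = 50% gives the most cautious (largest) sample size for a given margin of error.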
Margin of error tables
- present the effect of the sample size on the margin of error of percentages at a 95% confidence level
- present the sample size based on the margin of error and the population size
Cumulative percentage distribution
the cumulative frequency divided by the number of observations, F%
Frequency
number of observations, f: the number of times each value of the variable occurs in the sample
Cross tabulation
presents the results of two (or more) variables
The table contains frequencies in each cell at the intersection of rows and columns
Cumulative frequency
is the running total for the data, F
Relative frequency
is the frequency in each class divided by the total number of observations, f%
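The four quantities f, f%, F and F% can be built in one pass over the sorted values. The sketch below uses invented grade data, and the helper name `frequency_table` is an assumption made for the example.

```python
from collections import Counter

def frequency_table(values):
    """Return one row per distinct value with f, f%, F and F%."""
    counts = Counter(values)
    n = len(values)
    rows, running = [], 0
    for value in sorted(counts):
        f = counts[value]
        running += f                          # cumulative frequency F is a running total
        rows.append({
            "value": value,
            "f": f,                           # frequency
            "f%": 100 * f / n,                # relative frequency
            "F": running,                     # cumulative frequency
            "F%": 100 * running / n,          # cumulative percentage
        })
    return rows

grades = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]       # invented data
table = frequency_table(grades)
for row in table:
    print(row)
```

The last row always has F equal to the number of observations and F% equal to 100.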
Numerical descriptive measures are classified
as measures of central tendency and measures of variation and shape
Measures of central tendency
mode, median, mean, quartiles and fractiles
Mode
The value in a set of data that appears most frequently
Median
● The middle value in a set of data that has been ranked from smallest to largest
● Half the values are smaller than or equal to the median and half the values are larger than or equal to the median
● If there is an even number of values, the median is either of the two middle values, or the mean of the two middle values
Arithmetic mean
The arithmetic mean (often simply called the “mean” or “average”) is a measure of central tendency that represents the sum of all values in a data set divided by the number of values.
X̄
is the arithmetic mean.
Xi
represents each individual value in the data set
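Written out, the arithmetic mean is X̄ = (X1 + X2 + ... + Xn) / n, where n (a symbol assumed here) is the number of values. The short Python sketch below computes the mode, median and mean for an invented data set using the standard `statistics` module.

```python
import statistics

values = [2, 3, 3, 5, 7, 10]          # invented data, already ranked
n = len(values)

mean = sum(values) / n                # X̄ = (X1 + ... + Xn) / n
median = statistics.median(values)    # even n: mean of the two middle values (3 and 5)
mode = statistics.mode(values)        # most frequent value

print(mode, median, mean)
```

Note that `statistics.median` uses the "mean of the two middle values" convention when the number of values is even.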
Quartiles
Arrange the data in ascending order.
Find Q2 (the median):
If the number of data points is odd, the median is the middle number.
If the number of data points is even, the median is the average of the two middle numbers.
Find Q1 (first quartile):
The first quartile is the median of the lower half of the data (excluding the overall median if the number of data points is odd).
Find Q3 (third quartile):
The third quartile is the median of the upper half of the data (excluding the overall median if the number of data points is odd).
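The procedure above translates directly into Python. This is a sketch of that specific "median of each half" method. The function name `quartiles` is an assumption, and the data needs at least two values. Statistical software often uses interpolation-based conventions instead, so its quartiles may differ slightly.

```python
import statistics

def quartiles(values):
    """Q1, Q2, Q3 by the median-of-halves method (overall median excluded when n is odd)."""
    data = sorted(values)                    # arrange in ascending order
    n = len(data)
    half = n // 2
    lower = data[:half]                      # lower half
    upper = data[half + 1:] if n % 2 else data[half:]   # skip the median if n is odd
    return statistics.median(lower), statistics.median(data), statistics.median(upper)

q1, q2, q3 = quartiles([7, 1, 3, 9, 5, 11, 13])          # odd number of values
print(q1, q2, q3)
print(quartiles([1, 2, 3, 4, 5, 6, 7, 8]))               # even number of values
```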
Fractiles
Any other division of the data. The data must be such that it can be arranged in ascending or descending order
Range
Largest value minus the smallest value
Interquartile range
● Interquartile Range = Q3 - Q1
● Extreme values do not affect it
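A quick way to see why extreme values affect the range but not the interquartile range is to compare two invented data sets that differ only in their largest value. The helper names are assumptions, and the quartiles are taken as medians of the lower and upper halves.

```python
import statistics

def value_range(values):
    """Largest value minus the smallest value."""
    return max(values) - min(values)

def iqr(values):
    """Interquartile range Q3 - Q1, with quartiles as medians of the two halves."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[half + 1:] if len(data) % 2 else data[half:]
    return statistics.median(upper) - statistics.median(lower)

typical = [1, 3, 5, 7, 9, 11, 13]
with_outlier = [1, 3, 5, 7, 9, 11, 130]      # same data, one extreme value

print(value_range(typical), iqr(typical))
print(value_range(with_outlier), iqr(with_outlier))
```

The range jumps from 12 to 129, while the interquartile range stays at 8 in both cases.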
Normal distribution
Normality is tested by the Kolmogorov-Smirnov and Shapiro-Wilk tests
- If the sample size is less than 50, the Shapiro-Wilk test is used; if it is over 50, the Kolmogorov-Smirnov test is used
- If sig. > 0.05 → the variable can be treated as normally distributed
statistical testing
asks whether a phenomenon that is present in the sample is also present in the population. Statistical testing tells which of the hypotheses is supported
Hypothesis
is a theory about a particular parameter of the population
Null hypothesis
is always formed as “no difference” or “no correlation” H0: σ1 = σ2
Parametric tests assume
- Data at the interval or ratio level of measurement
- Normal distribution of the population (the test variable is normally distributed)
p-value
is the probability of getting a test statistic equal to or more extreme than the sample result, given that the null hypothesis is true.
The p-value is often referred to as the observed level of significance
If p is less than 0.05 → H0 is rejected and H1 is supported
correlation
relationships between variables
Knowing the value of X, we can say something additional about Y
Variation
measures the spread of values in a data set
The shape of a data set
represents a pattern of all values, from lowest to highest value
If sig (p-value) < 0.05
H0 is rejected and H1 is supported
methods to analyse statistical correlation
- Cross tabulation
o Chi-square (χ2)
- Statistical measures
o Pearson’s correlation coefficient
o Spearman’s rank-order correlation (non-parametric)
o Partial correlation
Scatter plot
From the scatter plot you can see approximately
1) existence of correlation
2) character of the correlation
3) Extreme values (outliers)
Reporting the result of the Chi-square test
There is a statistically significant correlation between gender and choice of department (because Chi-square p = 0.013 < 0.05). Male students most often study Business administration, while female students are divided more evenly between Business administration and International business
The credibility of the Chi-square test
for the test to be credible, the expected value in each cell should not be less than 1, and the expected value can be under 5 in at most 20% of the cells
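The credibility rule can be checked by computing the expected value of each cell as (row total × column total) / grand total. The sketch below uses an invented 2×2 cross tabulation, and the function names are assumptions. It also computes the Chi-square statistic, but not the p-value, which needs the Chi-square distribution from a statistics package.

```python
def expected_counts(observed):
    """Expected value of each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]

def chi_square_is_credible(observed):
    """No expected value below 1, and at most 20% of cells with an expected value below 5."""
    cells = [e for row in expected_counts(observed) for e in row]
    share_below_5 = sum(e < 5 for e in cells) / len(cells)
    return min(cells) >= 1 and share_below_5 <= 0.20

def chi_square_statistic(observed):
    """Sum over all cells of (observed - expected)^2 / expected."""
    expected = expected_counts(observed)
    return sum((o - e) ** 2 / e
               for o_row, e_row in zip(observed, expected)
               for o, e in zip(o_row, e_row))

# Invented counts: rows = gender, columns = department
observed = [[30, 10],
            [20, 20]]
expected = expected_counts(observed)
credible = chi_square_is_credible(observed)
chi2 = chi_square_statistic(observed)
print(expected, credible, round(chi2, 3))
```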
Pearson’s correlation coefficient
| r | > 0.7 strong linear correlation
0.3 ≤ | r | ≤ 0.7 average linear correlation
| r | < 0.3 weak linear correlation
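As a sketch, Pearson's r can be computed from the deviations around the two means and then classified with the thresholds above. The function names `pearson_r` and `strength` are assumptions made for the example, and the data is invented. The significance test (p-value) is not included.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient of two equally long lists of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def strength(r):
    """Classify |r|: > 0.7 strong, 0.3 to 0.7 average, < 0.3 weak."""
    a = abs(r)
    if a > 0.7:
        return "strong"
    if a >= 0.3:
        return "average"
    return "weak"

r = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])    # perfectly linear, invented data
print(r, strength(r))
print(strength(0.363))
```

The sign of r gives the direction of the correlation; the classification uses only its absolute value.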
if p < 0.05
there is a statistically significant correlation between the variables
Practical interpretation of the results of Pearson’s correlation
There is a positive (r = 0.363), average (0.3 < | r | < 0.7) linear correlation between gender and participation in lessons. On average, women participate more than men.