Sampling Flashcards

1
Q

Secondary data

A

Data that already exists. Statistics, registers and databases produced and maintained by public organisations and authorities are available for research.

2
Q

Primary data

A

Data collected by the researcher, usually through surveys, interviews or observation

3
Q

Survey research

A

Self-administered questionnaires

4
Q

Interviews (structured)

A

The respondent and the interviewer talk face-to-face, by telephone or online

5
Q

Observation

A

It records actions as they occur by monitoring people, actions or situations

6
Q

Experimentation

A

the researcher manipulates selected independent variables and
measures the effects of these manipulations on the dependent variables

7
Q

Sampling methods

A

Probability and non-probability

8
Q

Probability samples

A

are samples in which every element of the population has a known, non-zero chance of being selected:
- Simple random sampling
- Systematic random sampling
- Stratified sampling
- Cluster sampling
- Sequence sampling

9
Q

Non-probability samples

A

are ones in which participants are selected in a purposeful way
- Judgement sampling
- Quota sampling

10
Q

Simple random sampling

A

every element of the population has the same probability of being selected for the sample
- elements in the whole population are numbered and selected by using random numbers
- suitable for homogeneous populations
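A minimal sketch of the numbering-and-random-draw procedure, using Python's standard `random` module. The helper name `simple_random_sample`, the population of 200 numbered elements and the seed are illustrative assumptions.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n elements without replacement; each element has the same chance."""
    rng = random.Random(seed)  # seeded generator so the example is reproducible
    return rng.sample(population, n)

# Illustrative population: elements numbered 1..200
population = list(range(1, 201))
sample = simple_random_sample(population, 10, seed=42)
print(sample)
```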

11
Q

N

A

population size

12
Q

n

A

sample size

13
Q

Systematic random sampling

A

Sampling frame: a list of the population
- The sampling units are chosen from the sampling frame at a uniform interval, i.e. at a specified rate
- Sampling interval k = N/n (N = size of the population, n = sample size)
- The starting point is selected randomly from the first interval, and then every kth element is selected
- For example: N = 200, n = 10 → k = 200/10 = 20
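A small sketch of systematic sampling with the example's numbers (N = 200, n = 10, k = 20). The function name and the seed are illustrative, and rounding k down when N/n is not a whole number is an assumption.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Pick every k-th unit from the sampling frame after a random start."""
    N = len(frame)
    k = N // n                  # sampling interval k = N / n, rounded down
    rng = random.Random(seed)
    start = rng.randrange(k)    # random start within the first interval (0..k-1)
    return [frame[start + i * k] for i in range(n)]

frame = list(range(1, 201))                     # N = 200
sample = systematic_sample(frame, 10, seed=1)   # n = 10, so k = 20
print(sample)
```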

14
Q

Stratified sampling

A

dividing the population into mutually exclusive strata/groups (based on
nationality, profession, gender…)
- Each element can be included in only one stratum
- A sample is drawn randomly from each stratum/group
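A sketch of drawing a random sample inside each stratum. The notes do not say how many elements to take per stratum, so the per-stratum sizes are passed in explicitly. The strata (120 women, 80 men) and the helper name are hypothetical.

```python
import random

def stratified_sample(strata, sizes, seed=None):
    """Draw a simple random sample from each stratum.

    strata: dict mapping stratum name -> list of elements (each element in one stratum only)
    sizes:  dict mapping stratum name -> number of elements to draw from it
    """
    rng = random.Random(seed)
    return {name: rng.sample(members, sizes[name]) for name, members in strata.items()}

# Hypothetical strata
strata = {
    "female": [f"F{i}" for i in range(1, 121)],  # 120 women
    "male": [f"M{i}" for i in range(1, 81)],     # 80 men
}
sample = stratified_sample(strata, {"female": 6, "male": 4}, seed=7)
print(sample)
```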

15
Q

Cluster sampling

A

population consists of mutually exclusive groups called clusters (e.g. municipalities, towns, postal code areas…)
- each cluster represents the whole population
- clusters are selected randomly for the sample
- the selected clusters are included fully, or random samples are drawn from those
clusters
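A sketch covering both variants in the card: whole clusters included fully, or a random sample taken inside each chosen cluster. The towns, their sizes and the helper name are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, per_cluster=None, seed=None):
    """Randomly select clusters, then keep them whole or sub-sample them.

    clusters:    dict mapping cluster name -> list of elements
    per_cluster: None -> include the selected clusters fully
                 int  -> draw that many elements at random from each selected cluster
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # random clusters
    if per_cluster is None:
        return {c: list(clusters[c]) for c in chosen}
    return {c: rng.sample(clusters[c], per_cluster) for c in chosen}

# Hypothetical population: 10 towns with 50 residents each
towns = {f"town{t}": [f"t{t}-p{i}" for i in range(1, 51)] for t in range(1, 11)}
whole = cluster_sample(towns, 3, seed=3)                      # everyone in 3 towns
two_stage = cluster_sample(towns, 3, per_cluster=5, seed=3)   # 5 people in each of 3 towns
print(list(whole), two_stage)
```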

16
Q

Sequence sampling

A

elements are picked sequentially until the results no longer change
- used in quality control

17
Q

Judgement sampling

A

Relies on sound judgement or expertise.
- It depends on selecting elements that are believed to be typical or representative
of the population
- Requires knowledge of the topic and population
- Results should be interpreted with careful consideration
- Typically used in preliminary investigations, questionnaire testing, or to generate
ideas and points of view or to develop hypotheses

18
Q

Quota sampling

A

The first step is to estimate the sizes of the various subclasses or strata in the population.
- The strata relevant to the study have to be specified
- E.g. based on demographics such as age, gender, family status, socioeconomic
group…
- Sampling continues until each quota is full
Quota types:
Even quota
Proportional quota
Optimal quota

19
Q

Even quota:

A

the same number of elements is picked from each stratum (e.g. 100
men and 100 women)

20
Q

Proportional quota

A

the sample is drawn in the same proportions as the strata in the population (e.g. if the
population is 60% male and 40% female, a sample of 100 contains 60 males and 40 females)
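A sketch that turns population shares into quota sizes, using the card's 60/40 example. Rounding each quota to the nearest whole element is an assumption. With many strata the rounded quotas may not add up exactly to the total and would need a manual adjustment.

```python
def proportional_quotas(shares, total):
    """Split a total sample size across strata in their population proportions.

    shares: dict mapping stratum -> population share (fractions summing to 1)
    """
    return {name: round(share * total) for name, share in shares.items()}

print(proportional_quotas({"male": 0.60, "female": 0.40}, 100))
# {'male': 60, 'female': 40}
```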

21
Q

Optimal quota:

A
  • strata with a large size or large variation -> more elements
  • strata with high sampling costs -> fewer elements
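The card gives only the direction of the rule. One standard formula with exactly these properties is optimal (cost-weighted) allocation, n_h ∝ N_h · S_h / √c_h, where N_h is the stratum size, S_h its standard deviation and c_h the cost per element. The sketch below assumes that formula is the intended one. The strata values are hypothetical.

```python
import math

def optimal_quotas(strata, total):
    """Optimal allocation: n_h proportional to N_h * S_h / sqrt(c_h).

    strata: dict mapping stratum -> (N_h size, S_h standard deviation, c_h cost per element)
    Larger or more variable strata get more elements; costlier strata get fewer.
    """
    weights = {h: N * S / math.sqrt(c) for h, (N, S, c) in strata.items()}
    total_weight = sum(weights.values())
    return {h: round(total * w / total_weight) for h, w in weights.items()}

# Hypothetical strata of equal size: B is twice as variable as A,
# C is as variable as B but four times as costly per element
quotas = optimal_quotas({"A": (1000, 5.0, 1.0), "B": (1000, 10.0, 1.0), "C": (1000, 10.0, 4.0)}, 100)
print(quotas)  # {'A': 25, 'B': 50, 'C': 25}
```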
22
Q

Convenience sampling

A
  • there is no sample design
  • what is convenient or easy from the point of view of the researcher
  • the researcher does not draw the sample; the participants self-select
  • For example in a student research:
    o meeting students on campus
    o leaving questionnaires in the lobby
    o posting a questionnaire link to the student web site
  • Risk: biased sample which does not represent the whole population
23
Q

SAMPLE SIZE

A
  • Sample size is affected by the desired accuracy of the results, time and money
  • Sometimes the sample size is increased until the results are accurate enough: sequence sampling
  • Populations of under 300 are fully investigated
  • In national studies, sample sizes are often 1,000-2,000; in local studies, 150-300
  • Every group to be analyzed should include at least 30 observations
  • Outliers or extreme cases distort the results if the sample size is small
  • The accuracy of the results increases in proportion to the square root of the sample
    size
24
Q

Confidence of mean

A

This formula is used to calculate the sample size needed to reach the desired confidence level.

The critical value (Zα/2) is chosen according to how confident we want to be in the accuracy of the result.

The more confident we want to be, the larger the sample must be.
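The formula itself is not reproduced in these notes. The sketch below assumes it is the standard sample-size formula for estimating a mean, n = (Zα/2 · Sx / e)², with Sx the estimated standard deviation and e the margin of error. The helper name and the example numbers are hypothetical.

```python
import math

def sample_size_mean(z, s, e):
    """n = (z * s / e)^2, rounded up to the next whole element."""
    return math.ceil((z * s / e) ** 2)

# Hypothetical: estimated s = 10, margin of error e = 2
print(sample_size_mean(1.96, 10, 2))  # 95%: 9.8^2 = 96.04, rounded up -> 97
print(sample_size_mean(2.58, 10, 2))  # 99%: 12.9^2 = 166.41, rounded up -> 167
```

Raising the confidence level from 95% to 99% raises the critical value and therefore the required sample size.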

25
Independent Variable
Can be measured without relying on other variables. Example: A person's weight.
26
Dependent Variable:
Requires information about independent variables. Example: BMI, which depends on both weight and height.
27
Constant Variables
Constant: A characteristic that does not change across individuals in the study. Example: In a study on students, their status as "students" is a constant.
28
Primary Data:
Collected for a specific research project. Example: Using surveys to gather student feedback.
29
Sampling techniques
Used to select representative groups from a population for research purposes.
30
Secondary Data:
Pre-existing data collected for other purposes. Example: Data from Statistics Finland.
31
X̄
is the sample mean
32
Sx
standard deviation
33
Zα/2
= critical value. Unless a specific confidence level is required, 95% is usually used: 90% = 1.64, 95% = 1.96, 99% = 2.58, 99.9% = 3.30
34
Explain the formula and use it to calculate the sample size: confidence of mean
See notebook
35
Explain the formula and use it to calculate the sample size: confidence of percentage
See notebook
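The notes refer to a notebook for the formula. The sketch below assumes the standard sample-size formula for a percentage, n = Zα/2² · p · (100 − p) / e², with p and e in percentage points. Choosing p = 50% is the most cautious option when p is unknown, because it makes p · (100 − p) as large as possible. The helper name and the example values are hypothetical.

```python
import math

def sample_size_percentage(z, p, e):
    """n = z^2 * p * (100 - p) / e^2 (p and e in percentage points), rounded up."""
    return math.ceil(z ** 2 * p * (100 - p) / e ** 2)

print(sample_size_percentage(1.96, 50, 5))  # 384.16 rounded up -> 385
print(sample_size_percentage(1.96, 20, 5))  # 245.86 rounded up -> 246
```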
36
p in the sample size formula
percentage in the sample
37
e
margin of error (the allowed deviation)
38
Margin of error tables
The following table presents the effect of the sample size on the margin of error of percentages at a 95% confidence level
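The table itself is not included in these notes. The sketch below assumes it is built with the standard formula e = Zα/2 · √(p · (100 − p) / n) and regenerates values of that kind. The chosen sample sizes and percentages are illustrative.

```python
import math

def margin_of_error(n, p, z=1.96):
    """Margin of error (percentage points) of a percentage p from a sample of n."""
    return z * math.sqrt(p * (100 - p) / n)

sizes = [100, 200, 500, 1000, 2000]
percentages = [10, 30, 50]
print("n".rjust(6) + "".join(f"{p:>8}%" for p in percentages))
for n in sizes:
    print(f"{n:>6}" + "".join(f"{margin_of_error(n, p):>9.1f}" for p in percentages))
```

Quadrupling the sample size only halves the margin of error, which matches the square-root rule on the sample size card.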
39
What is the table on p. 20? Calculate the answer to the exercise below the table.
The following table presents the sample size based on margin of error and population size.
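The notes do not show how this table is computed. One standard way to make the sample size depend on population size is the finite population correction n = n₀ / (1 + (n₀ − 1) / N), where n₀ is the sample size for a very large population. The sketch assumes that correction, with p = 50% and a 95% confidence level as defaults. The helper name is hypothetical.

```python
import math

def sample_size_finite(N, z=1.96, p=50, e=5):
    """Sample size for a percentage, corrected for population size N."""
    n0 = z ** 2 * p * (100 - p) / e ** 2        # size for a very large population
    return math.ceil(n0 / (1 + (n0 - 1) / N))   # finite population correction

for N in [100, 1000, 10_000, 10_000_000]:
    print(N, sample_size_finite(N))
```

The required sample grows with the population at first and then levels off. This is why a sample of about 400 is enough at ±5% even for very large populations.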
40
Closed ended questions
Every respondent must find a suitable option among the given answers. Examples: yes/no answers, scales
41
Open ended questions
Example: What do you think about this?
42
Mixed questions
- the answer options include an “other” option
43
RESEARCH DATA
The research data is saved in table format in the analysis software. One row of the table contains the information on one research unit (e.g. the responses of one respondent). One column contains the values of one variable (e.g. the respondent’s age). The first row contains the names of the variables, and the first column contains the number of each statistical unit (e.g. the number of the respondent or questionnaire). The values of the variables are typically saved in numeric format. If a variable is not numeric by nature (e.g. gender), the researcher assigns a number code to each value (e.g. 1 = male, 2 = female).
44
Describing the data
is the first step in data analysis, even if the aim of the study is explanatory. The purpose is to summarise the information in a more easily interpretable format. This is done by presenting the data in tables, charts and numerical measures.
45
Relative frequency
is the frequency in each class divided by the total number of observations. Tables usually present it as a percentage distribution (f%).
46
Frequency
= number of observations, count (f): the number of times a particular value of the variable appears in the sample.
47
Cumulative frequency (F)
presents the number of observations that are less than or equal to a given value (a running total): the sum of the frequencies of all previous values up to and including the current value.
48
Cumulative percentage distribution
(F%) presents, for each value, the percentage of all observations that are less than or equal to that value. It accumulates as the values in the sample increase.
49
Frequency tables
is a tabular representation of data that shows how often (with what frequency) different values occur in a data set. A frequency table helps us understand the distribution of the data, identify the most frequent values (modes) and spot patterns. A table usually consists of two main columns: ● Value: the unique values or categories of the data ● Frequency: the number of times each value appears in the data set ● In Excel, frequency tables are created as PivotTables ● The number of observations in a sample is marked with a capital N ● Part of the sample is marked with a small n
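A sketch that builds the frequency (f), percentage (f%), cumulative frequency (F) and cumulative percentage (F%) columns with Python's `collections.Counter`. The answers on a 1-5 scale are hypothetical.

```python
from collections import Counter

def frequency_table(values):
    """Rows of (value, f, f%, F, F%) sorted by value."""
    counts = Counter(values)
    total = len(values)
    rows, cumulative = [], 0
    for value in sorted(counts):
        f = counts[value]
        cumulative += f  # F: running total of the frequencies up to this value
        rows.append((value, f, 100 * f / total, cumulative, 100 * cumulative / total))
    return rows

answers = [3, 4, 4, 5, 2, 4, 3, 5, 4, 1]  # hypothetical answers on a 1-5 scale
print(f"{'value':>5} {'f':>3} {'f%':>6} {'F':>3} {'F%':>6}")
for value, f, fp, F, Fp in frequency_table(answers):
    print(f"{value:>5} {f:>3} {fp:>6.1f} {F:>3} {Fp:>6.1f}")
```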
50
Cross tabulation (or contingency table)
presents the results of two (or more) categorical variables. Key elements of cross tabulation: ● Variables: cross tabulation involves at least two variables; one is represented by the rows and the other by the columns ● Cells: each cell shows the frequency (count) or percentage of occurrences at the intersection of the two variables
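A sketch of a cross tabulation of two categorical variables, counting each (row, column) pair with `collections.Counter`. The gender and workplace data are hypothetical.

```python
from collections import Counter

def cross_tab(row_var, col_var):
    """Count each (row category, column category) pair: the cells of the table."""
    cells = Counter(zip(row_var, col_var))
    return sorted(set(row_var)), sorted(set(col_var)), cells

# Hypothetical respondents: gender and whether they work in business or at home
gender = ["F", "M", "F", "F", "M", "M", "F"]
place = ["business", "home", "home", "business", "business", "business", "home"]
rows, cols, cells = cross_tab(gender, place)

print("".ljust(4) + "".join(c.rjust(10) for c in cols))
for r in rows:
    print(r.ljust(4) + "".join(str(cells[(r, c)]).rjust(10) for c in cols))
```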
51
Numerical descriptive measures
are classified as measures of central tendency and measures of variation and shape.
52
Measures of central tendency
The central tendency is the extent to which all the data values group around a typical or central value. The measures of central tendency are mode, median, mean, quartiles and fractiles.
53
Mode
● The value in a set of data that appears most frequently ● Multiple modes can exist on a data set
54
Median
● The middle value in a set of data that has been ranked from smallest to largest ● Half the values are smaller than or equal to the median and half the values are larger than or equal to the median ● Data has to be measured on an ordinal, interval or ratio scale ● If there is an even number of values, the median is ● either of the two values in the middle, or ● the mean of the two middle values
55
Arithmetic mean
The arithmetic mean (often simply called the "mean" or "average") is a measure of central tendency: the sum of all values in a data set divided by the number of values. It gives a general idea of the "typical" value in the data set. It is sensitive to outliers: extreme values can significantly affect the mean. X̄ is the arithmetic mean.
56
X̄
is the arithmetic mean
57
Xi
represents each individual value in the data set
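The mean is X̄ = ΣXi / n. The sketch below computes it, together with the median and the mode(s), using Python's `statistics` module. The grades are hypothetical, and the outlier 30 is included to show how it pulls the mean away from the median.

```python
import statistics

data = [7, 8, 8, 9, 10, 12, 30]     # hypothetical grades; 30 is an outlier

mean = sum(data) / len(data)        # X̄ = ΣXi / n
median = statistics.median(data)    # middle value of the ranked data
modes = statistics.multimode(data)  # every most frequent value (there can be several)

print(mean, median, modes)  # 12.0 9 [8]
```

For an even number of values, `statistics.median` returns the mean of the two middle values (e.g. 2.5 for `[1, 2, 3, 4]`).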
58
Find Q1, Q2 and Q3 for: 5, 7, 8, 12, 13, 14, 18, 21, 23, 25
Q1 = 8 Q2 = 13.5 (the median) Q3 = 21
59
Quartiles
are statistical measures that divide a data set into four equal parts, each containing 25% of the data. The three quartiles are the first quartile (Q1), the second quartile (Q2) and the third quartile (Q3). To find them:
- Arrange the data in ascending order.
- Find Q2 (the median): if the number of data points is odd, the median is the middle number; if it is even, the median is the average of the two middle numbers.
- Find Q1 (first quartile): the median of the lower half of the data (excluding the overall median if the number of data points is odd).
- Find Q3 (third quartile): the median of the upper half of the data (excluding the overall median if the number of data points is odd).
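A sketch of the median-of-halves method, checked against the exercise data (Q1 = 8, Q2 = 13.5, Q3 = 21). Spreadsheets and statistics packages often use other interpolation rules, so their quartiles can differ slightly.

```python
def median(values):
    """Median of an already sorted list."""
    n = len(values)
    mid = n // 2
    return values[mid] if n % 2 else (values[mid - 1] + values[mid]) / 2

def quartiles(data):
    """Q1, Q2, Q3 by the median-of-halves method."""
    x = sorted(data)              # 1. arrange in ascending order
    n = len(x)
    q2 = median(x)                # 2. overall median
    lower = x[: n // 2]           # lower half, without the middle value when n is odd
    upper = x[(n + 1) // 2 :]     # upper half, without the middle value when n is odd
    return median(lower), q2, median(upper)

print(quartiles([5, 7, 8, 12, 13, 14, 18, 21, 23, 25]))  # (8, 13.5, 21)
```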
60
Fractiles
Any other division of the data (e.g. deciles, percentiles). The data must be arrangeable in ascending or descending order; otherwise fractiles cannot be found. The data therefore has to be measured on an ordinal, interval or ratio scale.
61
Range
Largest value minus the smallest value
62
Interquartile range
● Interquartile range = Q3 - Q1 ● Not affected by extreme values
63
Standard deviation
● Measures the average scatter around the mean (symbol s) ● Square root of the variance
64
Variance
● Standard deviation squared (s²) ● Used mainly in theoretical statistical analysis
65
Coefficient of variation
Symbol V ● Standard deviation divided by the mean ● Always presented as a percentage ● A relative measure used for comparison
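A sketch computing the range, variance, standard deviation and coefficient of variation for the quartile exercise data with Python's `statistics` module. It assumes the sample (n − 1) versions of the variance and standard deviation, which is what statistics software normally reports.

```python
import statistics

data = [5, 7, 8, 12, 13, 14, 18, 21, 23, 25]  # data from the quartile exercise

data_range = max(data) - min(data)    # largest value minus smallest value
s2 = statistics.variance(data)        # sample variance s² (divides by n - 1)
s = statistics.stdev(data)            # standard deviation s, the square root of s²
cv = s / statistics.mean(data) * 100  # coefficient of variation V, in percent

print(data_range, round(s2, 2), round(s, 2), round(cv, 1))
```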
66
Skewness
● Symbol g1
67
Kurtosis
● Symbol g2
68
Normal distribution
Normality is tested with the Kolmogorov-Smirnov and Shapiro-Wilk tests ● If the sample size is less than 50, the Shapiro-Wilk test is used; if it is over 50, the Kolmogorov-Smirnov test is used ● If sig. > 0.05, normality is not rejected and the variable can be treated as normally distributed
69
Clustered bars
Separate bars placed side by side
70
Stacked bars
Vertical or horizontal bars in which each bar contains two values at once. For example, with women and men, the women's bar is divided into women in business and women at home, and the men's bar likewise.
71
Histogram
The data are divided into bins (intervals) and plotted along the axes, so the resulting chart shows the shape and trend of the distribution.
72
Explain p. 58
Line charts
73
Pie chart
A circle divided into slices
74
Box-plot
How to read it: the top of the plot shows the highest value of the data if there are no extreme values; here the plot shows the maximum. The minimum is about 7.3. However, our example contains an extreme value: it is so much lower than the rest of the data that it is flagged as extreme, so the minimum is 6.3. The number 36 shows the row in which this answer was given. Next, describe the box. The first quartile is at 8.0, which means that 25% of the students have a grade below 8.0. The top of the box is the third quartile, which means that 25% of the students have a grade above 9.0. So half of the students have a grade between 8 and 9, i.e. between the first and the third quartile. The x inside the box marks the mean value (the calculated average). The median is the horizontal line inside the box: half of the students have a grade below 8.6 and half above 8.6. If we exclude the extreme case, we use only the minimum (about 7.3 in our example). Read p. 60.
75
Scatter plot
A cloud of points. See p. 62
76
Stem-and-leaf display
Read p. 62