Sampling Flashcards

1
Q

Secondary data

A

Data that already exists. Statistics, registers and databases produced and maintained by public organisations and authorities are available for research.

2
Q

Primary data

A

Data collected by the researcher, usually through surveys, interviews or observations

3
Q

Survey research

A

Self-administered questionnaires

4
Q

Interviews (structured)

A

The respondent and interviewer talk face-to-face, by telephone or online

5
Q

Observation

A

Records behaviour as it occurs by monitoring people, actions or situations

6
Q

Experimentation

A

The researcher manipulates selected independent variables and measures the effects of these manipulations on the dependent variables

7
Q

Sampling methods

A

Probability and non-probability

8
Q

Probability samples

A

are samples in which every element has a known chance of being selected
- Simple random sampling
- Systematic random sampling
- Stratified sampling
- Cluster sampling
- Sequence sampling

9
Q

Non-probability samples

A

are ones in which participants are selected in a purposeful way
- Judgment sampling
- Quota sampling

10
Q

Simple random sampling

A

Every element of the population has the same probability of being selected for the sample
- Elements in the whole population are numbered and selected using random numbers
- Suitable for homogeneous populations
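A minimal Python sketch of the procedure above, assuming the population elements are simply numbered 1..N. `simple_random_sample` is a hypothetical helper, and Python's `random.sample` plays the role of the random number table.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Number the elements 1..N and draw n distinct numbers at random."""
    rng = random.Random(seed)
    # every element has the same chance n/N of ending up in the sample
    return sorted(rng.sample(range(1, population_size + 1), sample_size))

# N = 200 numbered elements, n = 10
sample = simple_random_sample(200, 10, seed=1)
print(sample)
```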

11
Q

N

A

Population size

12
Q

n

A

Sample size

13
Q

Systematic random sampling

A

Sampling frame: a list of the population
- The sampling units are chosen from the sampling frame at a uniform interval, i.e. at a specified rate
- Sampling interval k = N/n (N = size of the population, n = sample size)
- The starting point is selected randomly from the first interval, and every kth element after it is selected
- For example: N = 200, n = 10 → k = 200/10 = 20
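A minimal sketch of systematic sampling in Python, using the N = 200, n = 10 example. `systematic_sample` is a hypothetical helper; it assumes N is a multiple of n (otherwise `N // n` rounds the interval down).

```python
import random

def systematic_sample(population_size, sample_size, seed=None):
    k = population_size // sample_size      # sampling interval k = N/n
    rng = random.Random(seed)
    start = rng.randint(1, k)               # random start within the first interval
    # then every k-th element: start, start + k, start + 2k, ...
    return [start + i * k for i in range(sample_size)]

sample = systematic_sample(200, 10, seed=3)
print(sample)
```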

14
Q

Stratified sampling

A

The population is divided into mutually exclusive strata/groups (based on nationality, profession, gender…)
- Each element can be included in only one stratum
- A sample is drawn randomly from each stratum/group
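A sketch of stratified sampling, assuming each stratum is given as a list of its members and the number to draw from each stratum is already decided. The helper name and the strata data are hypothetical.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """strata: dict stratum name -> list of members (mutually exclusive)."""
    rng = random.Random(seed)
    # draw a separate simple random sample inside each stratum
    return {name: rng.sample(members, n_per_stratum[name])
            for name, members in strata.items()}

strata = {
    "Finnish": [f"F{i}" for i in range(1, 61)],
    "other": [f"O{i}" for i in range(1, 41)],
}
sample = stratified_sample(strata, {"Finnish": 6, "other": 4}, seed=7)
print(sample)
```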

15
Q

Cluster sampling

A

The population consists of mutually exclusive groups called clusters (e.g. municipalities, towns, postal code areas…)
- Each cluster represents the whole population
- Random clusters are selected for the sample
- The selected clusters are included fully, or random samples are drawn from those clusters
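A sketch of one-stage (whole clusters) and two-stage (random sample within the selected clusters) cluster sampling. The `cluster_sample` helper and the town data are hypothetical illustrations.

```python
import random

def cluster_sample(clusters, n_clusters, per_cluster=None, seed=None):
    """clusters: dict cluster name -> list of elements.
    per_cluster=None: selected clusters are included fully (one stage);
    otherwise draw per_cluster elements from each selected cluster (two stages)."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)   # random clusters
    if per_cluster is None:
        return {c: list(clusters[c]) for c in chosen}
    return {c: rng.sample(clusters[c], per_cluster) for c in chosen}

clusters = {f"town{i}": [f"town{i}-person{j}" for j in range(1, 21)]
            for i in range(1, 11)}
one_stage = cluster_sample(clusters, 3, seed=2)
two_stage = cluster_sample(clusters, 3, per_cluster=5, seed=2)
print(sorted(one_stage), sorted(two_stage))
```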

16
Q

Sequence sampling

A

Elements are picked sequentially until the results no longer change.
Used in quality control.

17
Q

Judgement sampling

A

Relies on sound judgement or expertise.
- It depends on selecting elements that are believed to be typical or representative
of the population
- Requires knowledge of the topic and population
- Results should be interpreted with careful consideration
- Typically used in preliminary investigations, questionnaire testing, or to generate ideas and points of view or develop hypotheses

18
Q

Quota sampling

A

The first step is to estimate the sizes of the various subclasses or strata in the population.
- The strata relevant to the study have to be specified
- E.g. based on demographics like age, gender, family status, socioeconomic group…
- Sampling continues until each quota is full
Quota types:
- Even quota
- Proportional quota
- Optimal quota

19
Q

Even quota:

A

The same number of elements is picked from each stratum (e.g. 100 male and 100 female)

20
Q

Proportional quota

A

The sample is drawn in the same proportions as the strata in the population (e.g. if the population is 60% male and 40% female, the sample is 60 male + 40 female = total sample size 100)
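Even and proportional quotas can be computed as below. A minimal sketch; the helper names are hypothetical and the population shares are assumed to be known in advance.

```python
def even_quotas(strata, sample_size):
    """Same number of elements from every stratum."""
    per_stratum = sample_size // len(strata)
    return {name: per_stratum for name in strata}

def proportional_quotas(shares, sample_size):
    """shares: dict stratum -> population share (summing to 1)."""
    # rounding can make the quotas add up to slightly more or less than sample_size
    return {name: round(share * sample_size) for name, share in shares.items()}

print(even_quotas(["male", "female"], 200))                      # {'male': 100, 'female': 100}
print(proportional_quotas({"male": 0.60, "female": 0.40}, 100))  # {'male': 60, 'female': 40}
```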

21
Q

Optimal quota:

A
  • strata with a large size or large variation → more elements
  • strata with high sampling costs → fewer elements
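The card does not give a formula for optimal quotas. One standard way to formalise these two rules is optimum allocation from stratified sampling theory, where the number drawn from stratum h is proportional to N_h · S_h / √c_h (stratum size × standard deviation / square root of the cost per element). Treat this as an assumption about what "optimal" means here; the helper and the numbers are hypothetical.

```python
import math

def optimal_quotas(strata, sample_size):
    """strata: dict name -> (N_h stratum size, S_h standard deviation, c_h cost per element)."""
    # larger or more variable strata get more elements, costlier strata fewer
    weights = {h: N * S / math.sqrt(c) for h, (N, S, c) in strata.items()}
    total = sum(weights.values())
    return {h: round(sample_size * w / total) for h, w in weights.items()}

strata = {
    "A": (600, 10.0, 1.0),   # large and cheap to sample
    "B": (400, 20.0, 4.0),   # smaller, more variable, but costlier
}
print(optimal_quotas(strata, 100))   # {'A': 60, 'B': 40}
```

In the example, stratum B has twice the variation but four times the cost per element, so it ends up with the smaller quota.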
22
Q

Convenience sampling

A
  • there is no sample design
  • the sample is whatever is convenient or easy from the researcher's point of view
  • the researcher is not drawing the sample; the participants are self-selecting
  • For example in a student research:
    o meeting students on campus
    o leaving questionnaires in the lobby
    o posting a questionnaire link to the student web site
  • Risk: biased sample which does not represent the whole population
23
Q

SAMPLE SIZE

A
  • Sample size is affected by the desired accuracy of the results, time and money
  • Sometimes the sample size is increased until the results are accurate enough: sequence sampling
  • Populations under 300 are fully investigated
  • National research sample sizes are often 1000-2000, local ones 150-300
  • Every group to be analyzed should include at least 30 observations
  • Outliers or extreme cases distort the results if the sample size is small
  • Accuracy of the results increases in proportion to the square root of the sample size (to halve the margin of error, the sample must be four times larger)
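The square-root rule follows from the standard error of the mean, s / √n. A minimal sketch (the standard deviation value is hypothetical):

```python
import math

def standard_error(s, n):
    """Standard error of the mean: s / sqrt(n)."""
    return s / math.sqrt(n)

s = 10.0
for n in (100, 400, 1600):
    print(n, standard_error(s, n))
```

Going from 100 to 400 observations halves the standard error, and going from 400 to 1600 halves it again.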
24
Q

Confidence of mean

A

Using this formula, we can calculate how large a sample we need to reach the desired confidence level.

The critical value (Zα/2) is chosen depending on how confident we want to be in the accuracy of the result.

The more confident we want to be, the larger the sample must be.
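The formula itself is not reproduced in these cards. Assuming the usual textbook version, n = (Zα/2 · Sx / e)², where Sx is an estimate of the standard deviation (e.g. from earlier studies) and e is the accepted margin of error in the same units, the calculation is short. The helper name and the example numbers are hypothetical.

```python
import math

def sample_size_mean(z, s, e):
    """n = (z * s / e)^2, rounded up to the next whole element."""
    return math.ceil((z * s / e) ** 2)

# 95 % confidence (z = 1.96), estimated standard deviation 15, margin of error ±2
print(sample_size_mean(1.96, 15, 2))
```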

25
Q

Independent Variable

A

Can be measured without relying on other variables.

Example: A person’s weight.

26
Q

Dependent Variable:

A

Requires information about independent variables.

Example: BMI, which depends on both weight and height.

27
Q

Constant Variables

A

Constant: A characteristic that does not change across individuals in the study.

Example: In a study on students, their status as “students” is a constant.

28
Q

Primary Data:

A

Collected for a specific research project.

Example: Using surveys to gather student feedback.

29
Q

sampling techniques

A

used to select representative groups from a population for research purposes.

30
Q

Secondary Data:

A

Pre-existing data collected for other purposes.

Example: Data from Statistics Finland.

31
Q

X̄

A

is the sample mean

32
Q

Sx

A

Standard deviation

33
Q

Zα/2

A

Critical value. Unless a specific confidence level is required, 95% is usually used.
90% = 1.64
95% = 1.96
99% = 2.58
99.9% = 3.29

34
Q

Explain the formula and use it to calculate the sample size: confidence of mean

A

See notebook.

35
Q

Explain the formula and use it to calculate the sample size: confidence of percentage

A

See notebook.
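Assuming the standard formula for a percentage, n = Zα/2² · p · (100 − p) / e², with p and e in percentage points, a minimal sketch looks like this. When p is unknown, p = 50% is the safe choice because it gives the largest n. The helper name is hypothetical.

```python
import math

def sample_size_percentage(z, p, e):
    """n = z^2 * p * (100 - p) / e^2, with p and e in percentage points."""
    return math.ceil(z ** 2 * p * (100 - p) / e ** 2)

# worst case p = 50 %, 95 % confidence, margin of error ±5 percentage points
print(sample_size_percentage(1.96, 50, 5))
```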

36
Q

p in the sample size formula

A

percentage in the sample

37
Q

e

A

Margin of error, i.e. the accepted deviation from the true value

38
Q

Margin of error tables

A

The following table presents the effect of the sample size on the margin of error of percentages at a 95% confidence level
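The table is not reproduced here. Values of this kind can be regenerated from e = Zα/2 · √(p(100 − p) / n); the sketch assumes the worst case p = 50% and may differ slightly from the printed table because of rounding.

```python
import math

def margin_of_error(z, p, n):
    """Margin of error (percentage points) for a percentage p estimated from n observations."""
    return z * math.sqrt(p * (100 - p) / n)

for n in (100, 400, 1000, 2000):
    print(n, round(margin_of_error(1.96, 50, n), 1))
```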

39
Q

What is the table on p. 20? Calculate the answer to the exercise below the table.

A

The following table presents the sample size based on the margin of error and population size.
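For a finite population, one common approach (an assumption here, since the cards do not show the formula) is to compute the sample size for an infinite population and then apply the finite population correction n = n₀ / (1 + (n₀ − 1) / N). The helper name and inputs are hypothetical.

```python
import math

def sample_size_finite(z, p, e, population_size):
    n0 = z ** 2 * p * (100 - p) / e ** 2          # sample size for an infinite population
    n = n0 / (1 + (n0 - 1) / population_size)      # finite population correction
    return math.ceil(n)

for N in (500, 1000, 10_000, 1_000_000):
    print(N, sample_size_finite(1.96, 50, 5, N))
```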

40
Q

Closed ended questions

A

Respondents choose from the given answer options, so everyone must be able to find a suitable option
Yes/no answers
Scales

41
Q

Open ended questions

A

Respondents answer in their own words, e.g. "What do you think about this?"

42
Q

Mixed questions

A
  • the answer options include an “other” option that the respondent can fill in freely
43
Q

RESEARCH DATA

A

The research data is saved in table format in the analysis software.
One row in the data contains information on one research unit (e.g. the responses of one respondent). The first row of the table contains the names of the variables.
One column of the table contains the values of one variable (e.g. the respondent’s age). The first column contains the number of each statistical unit (e.g. the number of the respondent or questionnaire).
The values of the variables are typically saved in number format. If a variable is not numeric by nature (e.g. gender), the researcher assigns a number code to each value (e.g. 1 = male, 2 = female).
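A minimal sketch of this layout in plain Python: one list per row, a header row holding the variable names, and the respondent number in the first column. The responses and the coding scheme are hypothetical.

```python
# hypothetical coding scheme for a non-numeric variable
GENDER_CODES = {"male": 1, "female": 2}

raw_responses = [
    {"age": 23, "gender": "female"},
    {"age": 31, "gender": "male"},
    {"age": 27, "gender": "female"},
]

# first row: variable names; first column: number of the respondent
header = ["respondent", "age", "gender"]
rows = [
    [i, r["age"], GENDER_CODES[r["gender"]]]
    for i, r in enumerate(raw_responses, start=1)
]

print(header)
for row in rows:
    print(row)
```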

44
Q

Describing the data

A

is the first step in data analysis even if the aim of the study is explanatory. The purpose is to summarise the information in a more easily interpretable format.
This is done by presenting the data in tables, charts and numerical measures.

45
Q

Relative frequency

A

is the frequency in each class divided by the total number of
observations. Usually in the tables, percentage distribution (f%) is presented.

46
Q

Frequency

A

= number of observations, count (f)
The number of times a particular value of the variable appears in the sample (data set).

47
Q

Cumulative frequency (F)

A

presents the number of observations that are less than or equal to a given value (a running total)

It is the sum of the frequencies of all previous values up to and including the current value, which shows how many values are less than or equal to the current value.

48
Q

Cumulative percentage distribution

A

(F%) presents the cumulative frequency as a percentage of all observations.
It shows how the percentage of values accumulates as the values in the sample increase, i.e. what percentage of all values is below or equal to each particular value.

49
Q

Frequency tables

A

is a tabular representation of data that shows how often (with what frequency) different values occur in a data set. A frequency table helps us understand the distribution of the data, identify the most frequent values (modes) and spot patterns.

A table usually consists of two main columns:
Value: the unique values or categories of the data.
Frequency: the number of times each value appears in the data set.

In Excel, frequency tables are created as PivotTables.
The number of observations in the whole sample is marked with a capital letter N.
The number of observations in a part of the sample is marked with a small letter n.
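The frequency measures from the previous cards (f, f%, F and F%) can be computed together. A sketch with hypothetical answers on a 1-5 scale; `frequency_table` is a hypothetical helper, and values are sorted in ascending order so that F counts the values less than or equal to each value.

```python
from collections import Counter

def frequency_table(values):
    """Rows of (value, f, f%, F, F%) for the values in ascending order."""
    counts = Counter(values)
    total = len(values)
    rows, cumulative = [], 0
    for value in sorted(counts):
        f = counts[value]
        cumulative += f                      # F: running total up to this value
        rows.append((value, f, 100 * f / total, cumulative, 100 * cumulative / total))
    return rows

answers = [3, 4, 4, 5, 2, 4, 3, 5, 4, 1]
for value, f, pct, F, F_pct in frequency_table(answers):
    print(f"{value}  f={f}  f%={pct:.0f}  F={F}  F%={F_pct:.0f}")
```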

50
Q

Cross tabulation (or contingency table)

A

presents the results of two (or more) categorical variables
Key Elements of Cross Tabulation:
Variables: Cross tabulation involves at least two variables. One variable is represented by the rows and the other by the columns.
Cells: Each cell in the table shows the frequency (count) or percentage of occurrences for the intersection of two variables.
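A minimal cross tabulation using `collections.Counter`, counting each (row category, column category) pair. The variables and category names are hypothetical.

```python
from collections import Counter

def crosstab(row_var, col_var):
    """Count how often each (row category, column category) pair occurs."""
    counts = Counter(zip(row_var, col_var))
    row_cats = sorted(set(row_var))
    col_cats = sorted(set(col_var))
    return row_cats, col_cats, [[counts[(r, c)] for c in col_cats] for r in row_cats]

gender = ["female", "male", "female", "female", "male", "male", "female"]
programme = ["Business", "Business", "Tourism", "Business", "Tourism", "Business", "Tourism"]
rows, cols, table = crosstab(gender, programme)
print("", *cols, sep="\t")
for r, counts in zip(rows, table):
    print(r, *counts, sep="\t")
```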

51
Q

Numerical descriptive measures

A

are classified as measures of central tendency and measures of variation and shape.

52
Q

Measures of central tendency

A

The central tendency is the extent to which all the data values group around a typical or
central value.
The measures of central tendency are mode, median, mean, quartiles and fractiles.

53
Q

Mode

A

● The value that appears most frequently in a set of data
● A data set can have multiple modes
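Python's standard `statistics` module covers both points: `mode` returns a single mode, while `multimode` returns every mode. The data are hypothetical.

```python
import statistics

data = [2, 3, 3, 5, 7, 7, 9]
print(statistics.mode(data))       # 3 (the first mode found, Python 3.8+)
print(statistics.multimode(data))  # [3, 7]
```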

54
Q

Median

A

● The middle value in a set of data that has been ranked from smallest to largest
● Half the values are smaller than or equal to the median and half are larger than or equal to it
● Data has to be measured on an ordinal, interval or ratio scale
● If there is an even number of values, the median is either of the two middle values or the mean of the two middle values
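The two conventions for an even number of values map directly onto functions in Python's `statistics` module: `median` takes the mean of the two middle values, while `median_low` and `median_high` return one of them. The data are hypothetical.

```python
import statistics

odd = [5, 7, 8, 12, 13]
even = [5, 7, 8, 12, 13, 14]
print(statistics.median(odd))        # 8
print(statistics.median(even))       # 10.0, the mean of 8 and 12
print(statistics.median_low(even))   # 8
print(statistics.median_high(even))  # 12
```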

55
Q

Arithmetic mean

A

The arithmetic mean (often simply called the “mean” or “average”) is a measure of central tendency that represents the sum of all values in a data set divided by the number of values. It provides a general idea of the “typical” value in the dataset.
Sensitive to outliers: Extreme values can significantly affect the mean

X̄ is the arithmetic mean.
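A short sketch of the mean (X̄ = ΣXi / n) and its sensitivity to outliers, using hypothetical grades: adding one extreme value pulls the mean down sharply while the median barely moves.

```python
import statistics

grades = [8.0, 8.5, 9.0, 8.6, 8.2]
print(sum(grades) / len(grades))    # sum of the values divided by their number
print(statistics.mean(grades))      # same result from the standard library

with_outlier = grades + [1.0]
print(statistics.mean(with_outlier), statistics.median(with_outlier))
```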

56
Q

X̄

A

is the arithmetic mean

57
Q

Xi

A

represents each individual value in the data set

58
Q

Find Q1, Q2 and Q3 for: 5, 7, 8, 12, 13, 14, 18, 21, 23, 25

A

Q1 = 8
Q2 = 13.5 (the median)
Q3 = 21

59
Q

Quartiles

A

are statistical measures that divide a data set into four equal parts, each representing 25% of the data. Quartiles help identify the points where data is split into quarters. The three quartiles are typically referred to as the first quartile (Q1), second quartile (Q2), and third quartile (Q3).
Arrange the data in ascending order.
Find Q2 (the median):
If the number of data points is odd, the median is the middle number.
If the number of data points is even, the median is the average of the two middle numbers.
Find Q1 (first quartile):
The first quartile is the median of the lower half of the data (excluding the overall median if the number of data points is odd).
Find Q3 (third quartile):
The third quartile is the median of the upper half of the data (excluding the overall median if the number of data points is odd).
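The steps above can be implemented directly; applied to the exercise they reproduce Q1 = 8, Q2 = 13.5 and Q3 = 21. This follows the median-of-halves method described on this card; software such as Excel, SPSS or `statistics.quantiles` uses other interpolation rules and may give slightly different values.

```python
def median(sorted_values):
    n = len(sorted_values)
    mid = n // 2
    if n % 2:                                   # odd: the middle value
        return sorted_values[mid]
    # even: mean of the two middle values
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

def quartiles(values):
    data = sorted(values)                       # step 1: ascending order
    n = len(data)
    lower = data[: n // 2]                      # lower half, excludes the median if n is odd
    upper = data[(n + 1) // 2 :]                # upper half, excludes the median if n is odd
    return median(lower), median(data), median(upper)

print(quartiles([5, 7, 8, 12, 13, 14, 18, 21, 23, 25]))  # (8, 13.5, 21)
```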

60
Q

Fractiles

A

Fractiles are any other division of the ordered data into equal parts (e.g. deciles, percentiles). The data must be arrangeable in ascending or descending order, otherwise fractiles cannot be computed, so it has to be measured on an ordinal, interval or ratio scale.
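Python's `statistics.quantiles` divides ordered data into any number of equal parts; the example uses hypothetical values 1-100. It uses its own interpolation rule, so the cut points may differ slightly from other software.

```python
import statistics

data = list(range(1, 101))                        # hypothetical values 1..100
deciles = statistics.quantiles(data, n=10)        # 9 cut points -> 10 equal parts
percentiles = statistics.quantiles(data, n=100)   # 99 cut points -> 100 equal parts
print(deciles)
```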

61
Q

Range

A

Largest value minus the smallest value

62
Q

Interquartile range

A

● Interquartile range = Q3 − Q1
● Not affected by extreme values
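A sketch showing that the range reacts to an extreme value while the interquartile range does not. Q1 and Q3 come from `statistics.quantiles`, whose default interpolation rule differs slightly from the median-of-halves method on the quartiles card (here IQR = 13.75 instead of 21 − 8 = 13).

```python
import statistics

def value_range(values):
    return max(values) - min(values)

def interquartile_range(values):
    # statistics.quantiles uses its own ("exclusive") interpolation rule by default
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

data = [5, 7, 8, 12, 13, 14, 18, 21, 23, 25]
extreme = data[:-1] + [250]          # replace the largest value with an extreme one
print(value_range(data), value_range(extreme))
print(interquartile_range(data), interquartile_range(extreme))
```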

63
Q

Standard deviation

A

● Measures the average spread of the values around the mean; symbol s
● Square root of the variance

64
Q

Variance

A

● The standard deviation squared, s²
● Used mainly in theoretical statistical analysis
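A sketch of the variance and standard deviation computed by hand and with the `statistics` module, using a small hypothetical data set. It uses the sample formulas (dividing by n − 1); `statistics.pvariance` and `statistics.pstdev` divide by n instead.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]
mean = sum(data) / len(data)
# sample variance: squared deviations from the mean divided by n - 1
s2 = sum((x - mean) ** 2 for x in data) / (len(data) - 1)
s = math.sqrt(s2)                    # standard deviation = square root of the variance
print(s2, s)
print(statistics.variance(data), statistics.stdev(data))
```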

65
Q

Coefficient of variation

A

● Symbol V = s / X̄ × 100%
● Always presented as a percentage
● A relative measure, used to compare the variation of variables measured in different units
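A sketch of V = s / X̄ × 100% comparing two variables measured in different units; the data and the helper name are hypothetical.

```python
import statistics

def coefficient_of_variation(values):
    """V = s / mean * 100 %, a unit-free measure of spread."""
    return statistics.stdev(values) / statistics.mean(values) * 100

heights_cm = [160, 170, 180]
weights_kg = [55, 70, 85]
print(round(coefficient_of_variation(heights_cm), 1),
      round(coefficient_of_variation(weights_kg), 1))
```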

66
Q

Skewness

A

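● Symbol g1
● Describes the asymmetry of the distribution (0 = symmetric, positive = long right tail, negative = long left tail)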

67
Q

Kurtosis

A

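● Symbol g2
● Describes how heavy the tails (the peakedness) of the distribution are compared with a normal distribution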
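A sketch of the simple moment-based definitions g1 = m3 / m2^(3/2) and g2 = m4 / m2² − 3, where mk is the k-th central moment. SPSS and Excel report small-sample adjusted versions, so their values will differ slightly; the data are hypothetical.

```python
def central_moment(values, k):
    m = sum(values) / len(values)
    return sum((x - m) ** k for x in values) / len(values)

def skewness(values):
    """g1 = m3 / m2^(3/2): 0 for symmetric data, > 0 for a long right tail."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def kurtosis(values):
    """g2 = m4 / m2^2 - 3 (excess kurtosis): about 0 for a normal distribution."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]
print(skewness(symmetric), skewness(right_skewed))
print(kurtosis(symmetric))
```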

68
Q

Normal distribution

A

Normality is tested with the Kolmogorov-Smirnov and Shapiro-Wilk tests.
If the sample size is less than 50, the Shapiro-Wilk test is used; if it is over 50, the Kolmogorov-Smirnov test is used.
If sig. > 0.05, normality is not rejected and the variable can be treated as normally distributed.

69
Q

Clustered bars

A

Separate bars placed side by side for each subgroup

70
Q

Stacked bars

A

Vertical or horizontal bars in which one bar holds two values at once. For example, women and men: each bar is divided into women in business and women in HoMa, and likewise for men.

71
Q

Histogram

A

The data are divided into bins (intervals) and plotted along the axes, so the resulting chart shows the shape and tendency of the distribution.
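A minimal sketch of the binning step, counting hypothetical grades into equal-width bins and printing a text histogram. `histogram_counts` is a hypothetical helper; bins include the lower edge and exclude the upper one.

```python
def histogram_counts(values, start, width, n_bins):
    """Count values in equal-width bins [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_bins
    for x in values:
        i = int((x - start) // width)
        if 0 <= i < n_bins:          # values outside all bins are ignored
            counts[i] += 1
    return counts

grades = [6.3, 7.4, 7.9, 8.0, 8.2, 8.5, 8.6, 8.7, 9.0, 9.3, 9.6]
counts = histogram_counts(grades, start=6.0, width=1.0, n_bins=4)
for i, c in enumerate(counts):
    lo = 6.0 + i
    print(f"{lo:.0f}-{lo + 1:.0f}: {'#' * c}")
```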

72
Q

Explain p. 58

A

Line charts

73
Q

Pie chart

A

A circle divided into slices, each showing one category's share of the whole

74
Q

Box-plot

A

How to read it: the top of the upper whisker shows the highest value in the data if there are no extreme values. The chart shows the maximum.

The lower whisker shows a minimum of about 7.3. But in our example there is an extreme value: it is so much smaller than the rest of the data that it is classified as extreme, so the actual minimum is 6.3. The number 36 shows the row in which this answer was given.

Next, describe the box. The first quartile is at 8.0. This means that 25% of the students have a grade below 8.0.

The top of the box is the third quartile. This means that 25% of the students have a grade above 9.0.

So we know that half of the students have a grade between 8 and 9, i.e. between the first and the third quartile.

Inside the box there is an x marking the mean value, i.e. the calculated average.

Median: half of the students have a grade below 8.6 and half above 8.6. It is the horizontal line inside the box.

If we exclude the extreme case, we use only the whisker minimum (in our example).

Read p. 60.
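A sketch of the numbers behind a box plot, assuming the convention used by SPSS: values more than 1.5 box lengths (IQR) from the box are outliers, values more than 3 box lengths away are extreme, and the whiskers end at the most distant values that are not outliers. The grades are hypothetical, and the quartiles come from Python's default rule, which may differ slightly from SPSS.

```python
import statistics

def box_plot_summary(values):
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4)   # Python's default ("exclusive") rule
    box = q3 - q1                                       # box length = interquartile range

    def beyond(limit):
        # values more than `limit` box lengths below Q1 or above Q3
        return [x for x in data if x < q1 - limit * box or x > q3 + limit * box]

    extremes = beyond(3.0)
    outliers = [x for x in beyond(1.5) if x not in extremes]
    inside = [x for x in data if q1 - 1.5 * box <= x <= q3 + 1.5 * box]
    return {
        "whisker_min": min(inside), "q1": q1, "median": median,
        "q3": q3, "whisker_max": max(inside),
        "mean": statistics.mean(data),
        "outliers": outliers, "extremes": extremes,
    }

grades = [6.3, 7.3, 8.0, 8.0, 8.2, 8.5, 8.6, 8.7, 9.0, 9.0, 9.0, 9.4]
print(box_plot_summary(grades))
```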

75
Q

Scatter plot

A

A cloud of points showing the relationship between two numerical variables. See p. 62.

76
Q

Stem-and-leaf display

A

Read p. 62.
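A minimal stem-and-leaf sketch, assuming non-negative integers with the tens digit as the stem and the units digit as the leaf, applied to the data from the quartile exercise. `stem_and_leaf` is a hypothetical helper.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Stem = tens digit, leaf = units digit (non-negative integers only)."""
    stems = defaultdict(list)
    for x in sorted(values):
        stems[x // 10].append(x % 10)
    lines = []
    for stem in range(min(stems), max(stems) + 1):   # also show empty stems
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem} | {leaves}")
    return lines

for line in stem_and_leaf([5, 7, 8, 12, 13, 14, 18, 21, 23, 25]):
    print(line)
```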