Data Collection, Sampling and Descriptive Statistics Flashcards

1
Q

Data Collection Techniques (5)

A

Observations
Tests and assessments
Surveys
Document analysis (e.g. published articles)
Interviews
The techniques can be combined (mixed methods).

2
Q

Types of Data (2)

A

Primary: data that you collected
Secondary: data that someone else collected

3
Q

Secondary Data Disadvantages (6)

A

May be out of date (limited by time)
May not have been collected long enough to detect trends.
May be missing info on some observations
May be incomplete
No control over data quality
Some of the data may be estimated rather than measured

4
Q

Secondary Data Advantages (4)

A

Saves time
Saves money
Easily accessible
Facilitates collaboration, e.g. multicenter studies of rare diseases.

5
Q

Primary Data Disadvantages (4)

A

Can be expensive to collect
Requires selecting the population or sample
Difficulty recruiting participants
Requires pretesting the instrument to check for measurement bias

6
Q

Probabilistic Sampling Methods (4)

A

Simple random
Stratified random
Systematic random
Clustered random
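The four probabilistic methods differ only in how the random draw is organized. The sketch below is a minimal illustration, assuming the population is a Python list and that strata and clusters are given as dictionaries; the function names, the ward/clinic labels and the ID numbers are all hypothetical.

```python
import random

def simple_random(population, n, rng):
    # Every member has the same chance of selection (sampling without replacement).
    return rng.sample(population, n)

def stratified_random(strata, n_per_stratum, rng):
    # strata: dict of stratum name -> list of members.
    # A simple random sample is drawn separately inside each stratum.
    return {name: rng.sample(members, n_per_stratum) for name, members in strata.items()}

def systematic_random(population, n, rng):
    # Interval k = N // n, a random start within the first interval,
    # then every k-th member of the ordered list. Assumes 1 <= n <= N.
    k = len(population) // n
    start = rng.randrange(k)
    return population[start::k][:n]

def cluster_random(clusters, n_clusters, rng):
    # clusters: dict of cluster name -> list of members.
    # Whole clusters are chosen at random; every member of a chosen cluster is kept.
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]

# Usage with a hypothetical population of 100 IDs and a fixed seed for repeatability.
rng = random.Random(42)
ids = list(range(1, 101))
print(simple_random(ids, 5, rng))
print(systematic_random(ids, 10, rng))
print(stratified_random({"ward_a": ids[:50], "ward_b": ids[50:]}, 3, rng))
print(cluster_random({"clinic_1": [1, 2, 3], "clinic_2": [4, 5], "clinic_3": [6, 7, 8]}, 2, rng))
```

Passing an explicit `random.Random` instance keeps the draws reproducible, which is convenient when a sampling plan has to be documented or repeated.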

7
Q

Non-Probabilistic Sampling Methods (3)

A

Convenience
Purposive
Snowball

8
Q

What are Descriptive Statistics used for? (3)

A

To summarize data, describe data and present data.

9
Q

Types of Descriptive Statistics (4)

A

1. Measures of frequency: count, percent and frequency (how often an observation occurs).
2. Measures of central tendency: mean, median and mode (locate the middle, or center, of the distribution).
3. Measures of dispersion or variability: range, variance, standard deviation (based on the differences between the observed scores and the mean) and interquartile range.
4. Measures of position and rank: percentile ranks and quartiles.

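Measures of frequency are the simplest of the four types: a count per value and that count as a percent of all observations. A minimal sketch using the standard-library `collections.Counter`; the function name `frequency_table` and the yes/no answers are hypothetical.

```python
from collections import Counter

def frequency_table(values):
    # Count how often each value occurs and express it as a percent of all observations.
    counts = Counter(values)
    n = len(values)
    return {value: (count, 100 * count / n) for value, count in sorted(counts.items())}

answers = ["yes", "no", "yes", "yes", "no"]  # hypothetical responses
for value, (count, pct) in frequency_table(answers).items():
    print(f"{value}: count={count}, percent={pct:.1f}%")
```

For these answers it prints `no: count=2, percent=40.0%` and `yes: count=3, percent=60.0%`.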
10
Q

Mean

A

Average
Mean = (Y1+Y2+…+Yn)/n
Y: variable
Y1: 1st observation of variable Y
Yn: last observation of variable Y
n: number of observations in sample
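The mean is sensitive to outliers: a few extreme values can pull it away from the center, which makes it a poor measure of central tendency for data with outliers.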
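The formula translates directly into code. This sketch (the function name `mean` and the scores are hypothetical) also shows how a single outlier shifts the result.

```python
def mean(ys):
    # Mean = (Y1 + Y2 + ... + Yn) / n
    if not ys:
        raise ValueError("mean requires at least one observation")
    return sum(ys) / len(ys)

scores = [4, 5, 5, 6, 100]  # hypothetical data; 100 is an outlier
print(mean(scores))         # 24.0
print(mean(scores[:-1]))    # 5.0 once the outlier is left out
```

Python's standard library offers the same calculation as `statistics.mean`.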

11
Q

Median

A

Put all values in rank order. The median is the value that splits the ordered data set into two equal halves; it is the same as the 50th percentile.
If there is an even number of observations, the median is the average of the two middle values.
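A minimal sketch of the two cases (odd and even number of observations); the function name `median` and the example values are hypothetical.

```python
def median(values):
    if not values:
        raise ValueError("median requires at least one observation")
    ordered = sorted(values)  # put the values in rank order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]   # odd n: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average of the two middle values

print(median([7, 1, 3]))      # 3
print(median([7, 1, 3, 10]))  # 5.0
```

Unlike the mean, the median of `[4, 5, 5, 6, 100]` is still 5, because the outlier only occupies one rank position.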

12
Q

Mode

A

The value that occurs with the highest frequency.
Can have more than one mode: Bimodal (2 modes).
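Because a data set can have several modes, a sketch that returns all of them is more faithful than one returning a single value. The function name `modes` and the data are hypothetical.

```python
from collections import Counter

def modes(values):
    # Return every value that has the highest frequency (more than one means multimodal).
    if not values:
        raise ValueError("mode requires at least one observation")
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

print(modes([1, 2, 2, 3]))     # [2]
print(modes([1, 1, 2, 3, 3]))  # [1, 3] (bimodal)
```

Python 3.8+ provides `statistics.multimode`, which likewise returns every most-frequent value.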

13
Q

Finding the mean when you have a bar chart with class intervals

A

You cannot find the exact mean from class intervals, because the individual values are unknown. You can estimate it by treating every observation in an interval as if it were at the interval's midpoint.
Estimated mean = Σ(frequency × midpoint) / Σ(frequency)
For each class interval, multiply its frequency by its midpoint, add all of these products together, and divide the sum by the total frequency.
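A minimal sketch of the estimate, assuming each class is given as a hypothetical `(lower, upper, frequency)` tuple and that the midpoint is `(lower + upper) / 2`; the function name and the class data are illustrative only.

```python
def grouped_mean(intervals):
    # intervals: list of (lower, upper, frequency) tuples, one per class.
    total_freq = sum(f for _, _, f in intervals)
    weighted = sum(f * (lower + upper) / 2 for lower, upper, f in intervals)
    return weighted / total_freq  # Σ(frequency × midpoint) / Σ(frequency)

# Hypothetical classes: 0-10 (frequency 2), 10-20 (frequency 5), 20-30 (frequency 3)
print(grouped_mean([(0, 10, 2), (10, 20, 5), (20, 30, 3)]))  # 16.0
```

Here the midpoints are 5, 15 and 25, so the estimate is (2×5 + 5×15 + 3×25) / 10 = 160 / 10 = 16.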

14
Q

Range

A

Difference between the lowest value and the highest value in a dataset.
Range = maximum value - minimum value
Strongly affected by outliers, because it depends only on the two most extreme values.
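The range needs only the maximum and minimum. The sketch below uses the hypothetical name `value_range` (to avoid shadowing Python's built-in `range`) and hypothetical data.

```python
def value_range(values):
    # Range = maximum value - minimum value
    return max(values) - min(values)

print(value_range([4, 5, 5, 6]))       # 2
print(value_range([4, 5, 5, 6, 100]))  # 96: one outlier changes it a lot
```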

15
Q

Percentile

A

Percentile rank = (C + 0.5 × f) / N × 100%
C: count of all observations lower than the observation of interest.
f: frequency of the observation of interest.
N: number of all observations.
If the observation of interest occurs more than once, all of its tied copies go into f; C counts only the observations strictly lower than it.
Percentile ranks lie between 0 and 100: the higher the rank, the higher the score relative to the others. A percentile rank is not the same as a percentage.
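The formula can be checked with a short sketch that uses the card's C, f and N directly; it assumes ties are handled by putting all tied copies into f. The function name `percentile_rank` and the scores are hypothetical.

```python
def percentile_rank(values, x):
    # C: number of observations strictly lower than x
    # f: number of observations equal to x (all tied copies)
    # N: total number of observations
    c = sum(1 for v in values if v < x)
    f = sum(1 for v in values if v == x)
    n = len(values)
    return (c + 0.5 * f) / n * 100

scores = [2, 4, 4, 5, 7, 8, 9, 10]  # hypothetical
print(percentile_rank(scores, 4))   # 25.0  -> (1 + 0.5 × 2) / 8 × 100
print(percentile_rank(scores, 10))  # 93.75 -> (7 + 0.5 × 1) / 8 × 100
```

Note that with this formula even the highest score gets a rank below 100, because half of its own frequency is counted.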

16
Q

Interquartile Range

A

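Q1: the value at the 1/4 position of the ordered values (25th percentile).
Q3: the value at the 3/4 position of the ordered values (75th percentile).
IQR = Q3 - Q1
If the number of observations is even, the median (Q2) is the average of the two middle values: the lower middle value belongs to the lower half (used to find Q1) and the upper middle value to the upper half (used to find Q3).
If the number of observations is odd, the median is a single observation and is left out of both halves when finding Q1 and Q3.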
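A minimal sketch of one common textbook convention: split the ordered data into a lower and an upper half (leaving out the middle value when n is odd) and take the median of each half. The function names are hypothetical, and other conventions, such as the interpolation methods of Python's `statistics.quantiles`, can give slightly different quartiles.

```python
def _median(sorted_vals):
    # Median of an already-sorted list.
    n = len(sorted_vals)
    mid = n // 2
    if n % 2 == 1:
        return sorted_vals[mid]
    return (sorted_vals[mid - 1] + sorted_vals[mid]) / 2

def quartiles(values):
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("quartiles need at least two observations")
    half = n // 2
    lower = ordered[:half]          # lower half of the ordered values
    upper = ordered[half + n % 2:]  # upper half; skips the middle value when n is odd
    return _median(lower), _median(ordered), _median(upper)

def iqr(values):
    q1, _, q3 = quartiles(values)
    return q3 - q1  # IQR = Q3 - Q1

print(quartiles([1, 2, 3, 4, 5, 6, 7]))     # (2, 4, 6)
print(quartiles([1, 2, 3, 4, 5, 6, 7, 8]))  # (2.5, 4.5, 6.5)
```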

17
Q

Variance

A

Measure of how close together or far apart the values in a dataset are.
The larger the variance, the further the individual values are from the mean.
The smaller the variance, the closer the individual values are to the mean.
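The card does not give a formula, so this sketch assumes the sample variance, s² = Σ(y - mean)² / (n - 1); the population variance divides by n instead. The function name `sample_variance` and both data sets are hypothetical; they share the same mean (10) but differ in spread.

```python
def sample_variance(values):
    # s² = Σ(y - mean)² / (n - 1): squared distances from the mean, averaged with n - 1.
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    m = sum(values) / n
    return sum((y - m) ** 2 for y in values) / (n - 1)

close_together = [9, 10, 10, 11]  # values near the mean
far_apart = [1, 5, 15, 19]        # values far from the mean
print(sample_variance(close_together))  # small (2/3)
print(sample_variance(far_apart))       # large (212/3)
```

Python's `statistics.variance` computes the same sample variance.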

18
Q

Standard Deviation and Variance

A

s = standard deviation
s² = variance
therefore s = √(s²)
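The relation s = √(s²) can be checked against Python's `statistics` module. The helper name `sd_from_variance` and the data are hypothetical; the sample versions (`variance`, `stdev`) divide by n - 1, the population versions (`pvariance`, `pstdev`) by n.

```python
import math
import statistics

def sd_from_variance(variance):
    # s = √(s²): the standard deviation is the square root of the variance.
    return math.sqrt(variance)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations
s2 = statistics.variance(data)   # sample variance (n - 1 in the denominator)
s = sd_from_variance(s2)
print(s, statistics.stdev(data))  # agree up to floating-point rounding
print(sd_from_variance(statistics.pvariance(data)), statistics.pstdev(data))
```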

19
Q

Empirical Rules of Normal Distribution

A

In a normal distribution:
About 68% of values are within 1 SD of the mean.
About 95% of values are within 2 SDs of the mean.
About 99.7% of values are within 3 SDs of the mean.
Values more than 3 SDs from the mean are commonly treated as outliers.
Mean = Median = Mode, because the normal distribution is unimodal and symmetric.
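The rule's percentages come from the normal cumulative distribution function. This sketch uses the standard-library `statistics.NormalDist` (Python 3.8+); the helper name `share_within` is hypothetical.

```python
from statistics import NormalDist

def share_within(k, dist=NormalDist(mu=0, sigma=1)):
    # Probability that a value lies within k SDs of the mean:
    # P(mean - k·SD < X < mean + k·SD) = cdf(mean + k·SD) - cdf(mean - k·SD)
    return dist.cdf(dist.mean + k * dist.stdev) - dist.cdf(dist.mean - k * dist.stdev)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
```

This prints roughly 68.3%, 95.4% and 99.7%, which the rule rounds to 68%, 95% and 99.7%. The shares are the same for any normal distribution, e.g. `share_within(1, NormalDist(100, 15))`.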

20
Q

Asymmetrical Distribution (2 Types)

A

Positively skewed (right-tailed): skewness > 0; the long tail stretches to the right, and the mean is usually greater than the median.
Negatively skewed (left-tailed): skewness < 0; the long tail stretches to the left, and the mean is usually less than the median.
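The card does not say which skewness formula is meant, so this sketch assumes the moment coefficient g1 = m3 / m2^(3/2), where m2 and m3 are the second and third central moments (dividing by n). Statistical software often reports an adjusted version, but its sign is the same. The function name and data are hypothetical.

```python
def skewness(values):
    # Moment coefficient of skewness: g1 = m3 / m2 ** 1.5
    n = len(values)
    m = sum(values) / n
    m2 = sum((y - m) ** 2 for y in values) / n
    m3 = sum((y - m) ** 3 for y in values) / n
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return m3 / m2 ** 1.5

right_tailed = [1, 2, 2, 3, 3, 3, 10]      # long tail to the right
left_tailed = [-v for v in right_tailed]   # mirror image: long tail to the left
print(skewness(right_tailed) > 0)  # True
print(skewness(left_tailed) < 0)   # True
print(skewness([1, 2, 3]))         # 0.0 for symmetric data
```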

21
Q

Describing what you see in relation to the mean example

A

To relate the mean to the symmetry or asymmetry of the distribution, you could say: out of the 40 observations, 23 (57.5%) have IQ scores greater than or equal to the mean, so most people score at or above the mean, while fewer people (n = 17, or 42.5% of the sample) have IQ scores below the mean.
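Counts like these can be produced with a short sketch. The function name `split_at_mean` and the eight IQ scores are hypothetical (the card's 40 observations are not given); here one low score pulls the mean down, so most scores sit at or above it.

```python
def split_at_mean(scores):
    # Count observations at/above and below the mean, with percentages of the sample.
    n = len(scores)
    m = sum(scores) / n
    at_or_above = sum(1 for s in scores if s >= m)
    below = n - at_or_above
    return m, at_or_above, below, 100 * at_or_above / n, 100 * below / n

iq = [70, 98, 100, 102, 104, 106, 110, 110]  # hypothetical IQ scores
m, above, below, pct_above, pct_below = split_at_mean(iq)
print(f"mean = {m}; {above} ({pct_above}%) at or above, {below} ({pct_below}%) below")
```

For these scores it prints `mean = 100.0; 6 (75.0%) at or above, 2 (25.0%) below`.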