large data set Flashcards

1
Q

what is our large data set

A

data about individuals who took part in the American National Health and Nutrition Examination Survey (NHANES) in 2003-4.

2
Q

what is the sample size

A

random sample of 200 people from the 5000 who took the survey
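The cards do not say how the 200 were chosen beyond "random", so the following is only a sketch of simple random sampling without replacement. The ID numbers 1 to 5000 and the seed are hypothetical, chosen purely for illustration.

```python
import random

# Hypothetical respondent IDs for the 5000 people surveyed (an assumption,
# not the real NHANES identifiers).
population_ids = list(range(1, 5001))

# A fixed seed makes the illustration repeatable; a real draw would not need one.
rng = random.Random(2003)

# Simple random sample: every person is equally likely to be picked,
# and nobody can be picked twice.
sample_ids = rng.sample(population_ids, k=200)

print(len(sample_ids))       # 200
print(len(set(sample_ids)))  # 200, so there are no repeats
```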

3
Q

what is the age of the people in the sample / large data set

A

aged 16 and over

4
Q

how is this data collected

A

combination of interview and physical examination to assess the health and nutritional status of adults and children in the United States

5
Q

are there people under 16 among the 5000 who took the actual survey

A

likely yes - but their data was not included in the LDS

is it really random then?

6
Q

What does N/A mean

A

data was not available for that data point

7
Q

Why is N/A used and not leaving it blank

A

prevents some software from reading it as 0.

zero is a recorded value and cannot be used to represent no data being collected

8
Q

Do we include data points with N/A in calculations - standard deviation / mean

A

No - we exclude that data point, which reduces the value of n
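A minimal sketch of what excluding N/A does to n, the mean and the standard deviation. The weights are made up for illustration and are not taken from the LDS. Note that Python's `statistics.stdev` divides by n - 1 and `statistics.pstdev` divides by n, so check which one your course expects.

```python
from statistics import mean, pstdev, stdev

# Hypothetical column of weights in kg, read in as text.
raw_weights = ["70.2", "N/A", "85.0", "64.8", "N/A", "92.0"]

# Drop the N/A entries before doing any arithmetic.
weights = [float(v) for v in raw_weights if v != "N/A"]

n = len(weights)            # 4, not 6: the two N/A rows no longer count
average = mean(weights)     # mean of the four recorded values
sd_sample = stdev(weights)  # divisor n - 1
sd_pop = pstdev(weights)    # divisor n

print(n, average, sd_sample, sd_pop)
```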

9
Q

What does tr mean as a recorded value

A

Tr = trace amounts

Data is recorded but numerical value is so close to 0 it is negligible

10
Q

Do we include data points with tr in calculations

A

Yes - we treat it as 0 in calculations
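As a sketch of the difference between the two codes: tr becomes 0 and stays in the calculation, while N/A is dropped. The helper name `to_number` and the values are hypothetical.

```python
from statistics import mean

def to_number(value):
    """Convert one recorded entry to a float, or None if it is N/A."""
    text = value.strip().lower()
    if text == "n/a":
        return None   # no data: exclude from calculations
    if text == "tr":
        return 0.0    # trace amount: treat as 0
    return float(text)

# Hypothetical recorded values for one variable.
recorded = ["0.4", "tr", "N/A", "1.2", "Tr"]

values = [x for x in (to_number(v) for v in recorded) if x is not None]

n = len(values)         # 4: only the N/A entry is excluded
average = mean(values)  # the two tr entries count as 0

print(n, average)
```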

11
Q

What is cleaning data

A

Fixing / removing incorrect, corrupted, incorrectly formatted, duplicate or incomplete data

12
Q

if you have to extrapolate the data at any point…

A

not reliable

13
Q

Use your knowledge of the large data set to suggest two reasons why the sample data in the table may not be representative of the population

A

think about what's in the data set

both males + females
wider age range in population

14
Q

Should outliers be removed

A
  • Think…
  • Is this BMI possible?
  • Is this pulse rate possible? etc.
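One way to find candidates to think about is to flag values more than 2 standard deviations from the mean. That rule is a common convention and an assumption here, not something stated on the card, and the pulse values are invented. Flagging a value is not the same as removing it: the point is to then ask whether it is possible.

```python
from statistics import mean, stdev

# Hypothetical pulse readings in beats per minute.
pulses = [44, 60, 64, 68, 72, 72, 76, 80, 84, 128]

centre = mean(pulses)
spread = stdev(pulses)

# Assumed rule: more than 2 standard deviations from the mean is an outlier.
low, high = centre - 2 * spread, centre + 2 * spread
flagged = [p for p in pulses if p < low or p > high]

print(flagged)  # [128]

# A flagged value is only a candidate. A pulse of 128 is unusual but possible,
# so it may well be kept; an impossible value would point to an error.
```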
15
Q

If PMCC is close to 1…

A

strong positive linear correlation - the data can be modelled as a straight line
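For reference, a short sketch of how the PMCC is calculated from paired data. The function name `pmcc` and the height and upper leg length values are hypothetical and are not taken from the LDS.

```python
from math import sqrt

def pmcc(xs, ys):
    """Product moment correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical paired measurements in cm.
heights = [150.0, 160.0, 170.0, 180.0, 190.0]
leg_lengths = [34.0, 36.5, 38.0, 41.0, 43.5]

r = pmcc(heights, leg_lengths)
print(round(r, 3))  # close to 1, so a straight-line model is reasonable
```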

16
Q

what are the categories in the LDS

A

sex
age
marital status
weight
height
BMI
upper leg length
upper arm length
waist circumference
food in last 30 minutes
pulse readings

17
Q

lowest and highest weight

A

41.4 kg

193.1 kg

18
Q

lowest and highest height

A

140.9 cm

193.8 cm

19
Q

oldest and youngest

A

youngest 17, oldest 85

20
Q

which arm was used for blood pressure measurements

A

everyone was either right or n/a

21
Q

highest and lowest pulse - beats in 60 seconds

A

lowest 44, highest 128

22
Q

highest and lowest bmi

A

lowest 16.54, highest 62.77

23
Q

what is systolic blood pressure and diastolic blood pressure

A

The systolic blood pressure is the pressure at the time when the heart beats.

The diastolic blood pressure is the pressure between heart beats.

24
Q

how many measurements of blood pressure were taken

A

up to 4

25
Q

systolic + diastolic averages - how are they taken

A

The first reading is ignored when taking the average, unless there is only one reading
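The rule above can be sketched as a small function. The name `average_reading` is hypothetical, the readings are invented, and the function assumes it is given only the readings that were actually taken (between 1 and 4 of them, in order).

```python
from statistics import mean

def average_reading(readings):
    """Average blood pressure, ignoring the first reading unless it is the only one."""
    if not readings:
        return None          # nothing was recorded
    if len(readings) == 1:
        return readings[0]   # only one reading, so it has to be used
    return mean(readings[1:])  # drop the first, average the rest

# Hypothetical systolic readings in mmHg.
print(average_reading([120, 110, 114]))  # average of 110 and 114 = 112
print(average_reading([118]))            # 118
```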

26
Q

0 diastolic pressure

A

outlier - random error

27
Q

N/A vs could not obtain for arm

A

N/A for arm - then all of the corresponding pulse readings were N/A

could not obtain for arm - then all of the corresponding pulse readings were recorded

n/a for arm - did not do that part of the exam at all

could not obtain for arm - did do that part of the exam but unsure which arm

28
Q

why has a larger sample, more than 200, not been used

A

to ensure that dealing with missing data and copying into other software is not too time consuming