Organizing, Displaying, and Describing Data Flashcards

1
Q

What is a variable

A
  • Any characteristic that can & does assume different values for different people, objects, or events being studied
2
Q

What are the four measurement scales for variables

A
  • Nominal
  • Ordinal
  • Interval
  • Ratio
3
Q

Describe nominal

A
  • Numbers are simply used as a code to represent characteristics.
  • There is no order to the categories.
  • The assignment of numbers to categories is arbitrary
  • Ex: gender or ethnicity
4
Q

Describe ordinal

A
  • Numbers represent categories that can be placed in a meaningful numerical order (e.g., from lowest to highest).
  • There is no information regarding the size of the interval between the different values.
  • The size of the interval may be different between the different categories.
  • There is no “true” zero.
  • Ex: pain scale 1 = no pain, 2 = a little pain, 3 = some pain, 4 = a lot of pain
5
Q

Describe interval

A
  • Numbers can be placed in meaningful order.
  • The intervals between the numbers are equal.
  • It is possible to add and subtract across an interval scale.
  • There is no true zero, so ratios cannot be calculated.
  • Ex: Fahrenheit temp., SAT, or GRE
6
Q

Describe ratio

A
  • Numbers can be placed in meaningful order.
  • The intervals between the numbers are equal.
  • There is a “true” zero, determined by nature, which represents the absence of the phenomenon.
  • Almost all biomedical measures (weight, pulse rate, and cholesterol level) are of ratio scale.
  • Ex: weight, age, # of min. spent exercising, cholesterol level, or # of wks pregnant
7
Q

What is the goal of displaying data

A
  • To get a feeling for the distribution of the data
8
Q

Define the parts of displaying data

A
  • Central tendency: the typical or central values around which the data cluster
  • Dispersion: how the values are spread out
  • Shape and skewness: symmetry or asymmetry of the distribution of the values
  • Outliers: unusual values that do not fit the pattern of the data
9
Q

Describe frequency distributions

A
  • A table that shows classes or intervals of data with a count of the number in each class. The frequency (f) of a class is the number of data points in the class.
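A minimal Python sketch of a frequency distribution table. The ages, the starting lower limit of 20, and the class width of 10 are hypothetical values chosen for illustration; classes with no entries are simply omitted.

```python
from collections import Counter

def frequency_distribution(data, lower, width):
    """Count integer entries per class; class k covers lower + k*width up to the next lower limit."""
    counts = Counter((x - lower) // width for x in data)
    return {
        (lower + k * width, lower + (k + 1) * width - 1): counts[k]
        for k in sorted(counts)
    }

ages = [21, 25, 33, 38, 39, 42, 47, 55]  # hypothetical sample
table = frequency_distribution(ages, lower=20, width=10)
for (low, high), f in table.items():
    print(f"{low}-{high}: f = {f}")
```

Each key is a (lower limit, upper limit) pair, and each value is the frequency (f) of that class.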
10
Q

Define class width

A
  • The distance b/w lower (or upper) limits of consecutive classes
11
Q

Define range

A
  • The difference b/w the max and min data entries
12
Q

Describe histograms

A
  • A way of organizing the data in visual form
  • Data have to be at least ordinal in scale
13
Q

What are the rules for histogram construction

A
  • The values of the variable being graphed are on the x-axis
  • Class intervals are used (mutually exclusive, exhaustive, & even widths)
  • The bars of the histogram touch
14
Q

Describe a stem and leaf plot

A
  • Each number is separated into a stem (usually the entry’s leftmost digits) and a leaf (usually the rightmost digit)
  • Allows us to see the shape of the data as well as the actual values
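A small Python sketch of a stem and leaf plot, assuming non-negative whole numbers where the last digit is the leaf and the remaining digits are the stem. The scores are made up for illustration.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Group non-negative integers by stem (leading digits) and leaf (last digit)."""
    plot = defaultdict(list)
    for x in sorted(data):
        stem, leaf = divmod(x, 10)
        plot[stem].append(leaf)
    return dict(plot)

scores = [67, 72, 75, 78, 81, 84, 84, 93]  # hypothetical sample
plot = stem_and_leaf(scores)
for stem, leaves in plot.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

The printed rows show the shape of the data while every original value can still be read back from its stem and leaf.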
15
Q

What is the advantage and disadvantage of using a graphical method for describing data

A
  • Advantage: Its visual representation
  • Disadvantage: Its unsuitability for making inferences (our main goal)
16
Q

What are some graphical methods for describing data

A
  • Frequency distribution table
  • Histograms
  • Stem and leaf plot
  • Pie chart
  • Scatter plot
  • Time series chart
17
Q

Describe the differences between mode, median, and mean

A
  • Mode: most frequently recurring value (appropriate for nominal, ordinal, interval, & ratio data); if no entry is repeated then there is no mode
  • Median: the value that is in the middle of the distribution (appropriate for ordinal, interval, & ratio data); the middle entry when all entries are put in order, & if there is an even # of entries, take the mean of the 2 middle values
  • Mean: the arithmetic average of the distribution (appropriate for interval & ratio data); sum of all values divided by total entries
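A short Python sketch of the three measures. The pulse rates are hypothetical, and `modes` is a helper written for this example so that it returns an empty list when no entry is repeated, matching the "no mode" rule on this card.

```python
from collections import Counter
from statistics import mean, median

def modes(data):
    """Return the most frequent value(s); empty list if no entry is repeated."""
    counts = Counter(data)
    top = max(counts.values())
    if top == 1:
        return []
    return [value for value, count in counts.items() if count == top]

pulse = [60, 72, 72, 80, 96]  # hypothetical sample
print("mode:", modes(pulse))
print("median:", median(pulse))
print("mean:", mean(pulse))
print("median of an even # of entries:", median([1, 2, 3, 4]))
```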
18
Q

Define an outlier

A
  • A data entry that is far removed from the other entries in the data set
19
Q

Comparing mean, median, and mode, which are affected by an outlier

A
  • Mean is affected, while median and mode are largely not influenced by extreme values
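A quick Python illustration with made-up pulse rates: the second list is the same as the first except that the largest entry is replaced by an extreme value.

```python
from statistics import mean, median, multimode

typical = [60, 72, 72, 80, 96]        # hypothetical sample
with_outlier = [60, 72, 72, 80, 300]  # largest entry replaced by an extreme value

for label, data in (("typical", typical), ("with outlier", with_outlier)):
    print(label, "mean:", mean(data), "median:", median(data), "mode:", multimode(data))
```

In this example the mean is pulled upward by the outlier, while the median and mode stay the same.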
20
Q

Define midrange

A
  • The average of the highest and lowest value in the data set
  • Very easy to find but highly affected by the extreme values
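A minimal Python sketch of the midrange, using hypothetical values to show how a single extreme entry moves it.

```python
def midrange(data):
    """Average of the lowest and highest value in the data set."""
    return (min(data) + max(data)) / 2

print(midrange([60, 72, 72, 80, 96]))
print(midrange([60, 72, 72, 80, 300]))  # one extreme value shifts the result a lot
```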
21
Q

Describe a weighted mean

A
  • It’s the mean of a data set whose entries have varying weights
  • Ex: homework is 30%, exams are 50%, and projects are 20% of your final grade
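A short Python sketch of the weighted mean for the grading example. The weights come from the card; the scores of 90, 80, and 70 are assumptions added for illustration.

```python
def weighted_mean(values, weights):
    """Sum of value * weight divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

scores = [90, 80, 70]   # hypothetical homework, exam, and project scores
weights = [30, 50, 20]  # percent of the final grade
final_grade = weighted_mean(scores, weights)
print(final_grade)
```

Dividing by the sum of the weights means the weights do not have to add up to exactly 100.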
22
Q

What are the measures of dispersion and their goal

A
  • Goal is to get a feeling for the spread of the data
  • Range: difference b/w the highest & lowest value in a data set (appropriate for ordinal, interval, & ratio data)
  • Interquartile range: difference b/w the third & first quartiles, i.e., the spread of the middle 50% of the data (appropriate for ordinal, interval, & ratio data)
  • Standard deviation: average distance of each point from the mean (appropriate for interval & ratio data)
23
Q

Describe symmetrical distributions

A
  • Data are evenly distributed about the center
  • There is the same amount of data on the right & left side of the distribution
  • Not all symmetrical distributions are “normal”
24
Q

Describe skewed distributions

A
  • Data are not evenly distributed about the center
  • Can be “right skewed” or “left skewed”
25
Q

Define deviation

A
  • Difference b/w the entry & the mean of the data set
26
Q

Guidelines for finding the sample standard deviation

A

1) Find the mean of the sample data set
2) Find the deviation of each entry
3) Square each deviation
4) Add to get the sum of squares
5) Divide by n-1 to get the sample variance
6) Find the square root of the variance to get the sample standard deviation
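
The six steps above can be sketched in Python as follows. The sample values are made up for illustration, and the variable names are this example's own.

```python
from math import sqrt

def sample_std(data):
    n = len(data)
    mean = sum(data) / n                       # 1) mean of the sample
    deviations = [x - mean for x in data]      # 2) deviation of each entry
    squared = [d ** 2 for d in deviations]     # 3) square each deviation
    sum_of_squares = sum(squared)              # 4) sum of squares
    variance = sum_of_squares / (n - 1)        # 5) sample variance
    return sqrt(variance)                      # 6) sample standard deviation

sample = [4, 8, 6, 5, 7]  # hypothetical sample
sd = sample_std(sample)
print(round(sd, 3))
```

For this sample the mean is 6, the sum of squares is 10, and the sample variance is 10 / 4 = 2.5.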

27
Q

Describe the empirical rule for standard deviation

A
  • For data with a (symmetric) bell-shaped distribution the standard deviation has the following characteristics
    1) ~68% of the data lie within 1 standard deviation of the mean
    2) ~95% of the data lie within 2 standard deviations of the mean
    3) ~99.7% of the data lie within 3 standard deviations of the mean
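As a rough check, the percentages can be recovered from a theoretical normal (bell-shaped) curve. This Python sketch assumes an exactly normal distribution, which real data only approximate.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
```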
28
Q

Describe standard error

A
  • The values of a specific variable from a sample are an estimate of the entire population of individuals who might have been eligible for the study
  • A measure of the precision of a sample in estimating the population parameter
  • Dependent on sample size: larger the sample, the smaller the standard error
29
Q

Standard error of the mean equation

A
  • Standard deviation ÷ square root of (sample size)
  • if sample greater than 60
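A minimal Python sketch of the equation. The standard deviation of 12 and the two sample sizes are hypothetical; they are chosen to show that a larger sample gives a smaller standard error.

```python
from math import sqrt

def standard_error(sd, n):
    """Standard error of the mean: standard deviation / sqrt(sample size)."""
    return sd / sqrt(n)

print(standard_error(12, 64))
print(standard_error(12, 256))  # 4x the sample size halves the standard error
```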
30
Q

Describe confidence intervals

A
  • Range of values which we can be confident includes the true value
  • Defines the “inner zone” about the central index (mean, proportion or ratio)
  • Describes variability in the sample from the mean or center
  • Will find CI used in describing the difference b/w means or proportions when doing comparisons b/w groups
  • Ex: 95% CI indicates that we are 95% confident that the population mean will fall within the range described
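One common way to compute a CI for a mean is mean ± z × standard error. The Python sketch below assumes a large sample and a normal approximation (small samples would normally use a t value instead), and the mean of 120, SD of 12, and n of 64 are made-up numbers.

```python
from math import sqrt
from statistics import NormalDist

def mean_confidence_interval(sample_mean, sd, n, confidence=0.95):
    """Normal-approximation CI for a mean: mean +/- z * (sd / sqrt(n))."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    margin = z * sd / sqrt(n)
    return sample_mean - margin, sample_mean + margin

low, high = mean_confidence_interval(sample_mean=120, sd=12, n=64)
print(f"95% CI: {low:.2f} to {high:.2f}")
```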
31
Q

Describe quartiles & percentiles

A
  • Useful for comparing scores within one data set
  • Ex: if a score is in the 80th percentile (P80) it means that 80% of all the scores fall at or below this score in the distribution & 20% of all the scores fall above this value
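A small Python sketch of the "at or below" idea. The ten scores are hypothetical, and `percentile_rank` is a helper named for this example only.

```python
def percentile_rank(data, score):
    """Percent of entries that fall at or below the given score."""
    at_or_below = sum(1 for x in data if x <= score)
    return 100 * at_or_below / len(data)

scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]  # hypothetical sample
rank = percentile_rank(scores, 90)
print(rank)
```

Here 8 of the 10 scores are at or below 90, so a score of 90 sits at the 80th percentile.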
32
Q

Describe quartiles

A
  • The 3 quartiles, Q1, Q2, and Q3 approximately divide an ordered data set into four equal parts
  • Q1 is the median of the data below Q2
  • Q2 is the median
  • Q3 is the median of the data above Q2
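A Python sketch of the median-of-halves approach described on this card. It assumes at least two entries, and other textbooks and software use slightly different quartile conventions, so results may differ a little elsewhere.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q2, Q3) using the median of the data below and above Q2."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # For an odd number of entries, the median itself is left out of both halves.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)

print(quartiles([1, 3, 5, 7, 9, 11, 13]))
print(quartiles([1, 2, 3, 4, 5, 6, 7, 8]))
```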
33
Q

Describe the interquartile range (IQR)

A
  • The difference b/w the third and first quartiles
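A minimal Python sketch of the IQR using the standard library's `statistics.quantiles` (Python 3.8+), which returns the three quartile cut points when `n=4`. The data are made up, and the library's default interpolation method is only one of several quartile conventions.

```python
from statistics import quantiles

data = [1, 3, 5, 7, 9, 11, 13]  # hypothetical sample
q1, q2, q3 = quantiles(data, n=4)
iqr = q3 - q1
print("Q1:", q1, "Q3:", q3, "IQR:", iqr)
```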
34
Q

Define fractiles, percentiles, and deciles

A
  • Fractiles: numbers that partition, or divide, an ordered data set
  • Percentiles: they divide an ordered data set into 100 parts (there are 99 percentiles)
  • Deciles: they divide an ordered data set into 10 parts (there are 9 deciles)
35
Q

Define hypothesis

A
  • Statement about a population, where a certain parameter takes a particular numerical value or falls in a certain range of values.
36
Q

Define null hypothesis (H0)

A
  • “Innocent until proven guilty”
  • Usually states that no difference b/w test groups really exists
  • A fundamental concept in research is “rejecting” or “conceding” (failing to reject) the H0
37
Q

Limitations of significance tests

A
  • Statistical significance does not mean practical significance
  • Significance tests don’t tell us about the size of the effect (like a CI does)
  • Some tests may be “statistically significant” just by chance
  • Be skeptical when you hear reports of new medical advances
  • There may be no actual effect
  • If an effect does exist, we may be seeing a sample outcome in right-hand tail of sampling distribution of possible sample effects, and the actual effect may be much weaker than reported.
38
Q

Difference between confidence interval and P-value

A
  • CI will give information about the size of the difference & the strength of the evidence
  • P-value will tell you whether or not there is a statistically significant difference
39
Q

Describe clinical importance

A
  • A medical judgement, not a statistical one
  • Clinicians should change practice only if they believe the study has definitively demonstrated a treatment difference and that the treatment difference is large enough to be clinically important.
40
Q

Clinical importance depends on your knowledge of

A
  • A range of possible treatments
  • Their costs
  • Their side effects