Introduction To Data Flashcards

0
Q

3 components of statistics

A

Collect
Analyze
Infer

1
Q

Study of how best to collect, analyze and draw conclusions from data

A

Statistics

2
Q

In a study, the group that serves as the reference point for comparison with the treatment group is the

A

Control group

3
Q

Single number summarizing a large amount of data

A

Summary statistic

4
Q

The first step in most analyses

A

Effective presentation and description of data

5
Q

Each row in a data table represents a

A

Case

6
Q

Each column in a data table represents a

A

Variable

7
Q

Rows (cases) and columns (variables) arranged together form a

A

Data matrix

8
Q

Another term for case

A

Unit of observation or an observational unit

9
Q

A variable with values that can be added, subtracted or averaged is

A

Numerical

10
Q

A numerical variable that can only take distinct, separated values (for example counts: 0, 1, 2, ...) is

A

Discrete

11
Q

A variable whose values denote classifications (categories) is

A

Categorical

12
Q

The possible values of a categorical variable are called its

A

Levels

13
Q

A categorical variable whose levels have a natural ordering is

A

Ordinal

14
Q

When two variables show some connection with one another, they are called ___________________ or _____________________ variables.

A

Associated; dependent

15
Q

If one variable tends to increase as the other decreases, there is a

A

Negative association

16
Q

If one variable tends to increase as the other increases, there is a

A

Positive association
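
One rough way to check the direction of an association between two numerical variables is the sign of their sample covariance. The sketch below uses made-up data and a hypothetical helper named `covariance`; it only captures linear tendencies, so treat it as an illustration rather than a general test of association.

```python
def covariance(xs, ys):
    """Sample covariance of two equal-length lists (divides by n - 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

# Illustrative values only: hours studied, test score, and errors made.
hours = [1, 2, 3, 4, 5]
score = [52, 60, 61, 70, 75]   # tends to rise with hours
errors = [9, 7, 6, 4, 1]       # tends to fall with hours

cov_score = covariance(hours, score)    # positive sign: positive association
cov_errors = covariance(hours, errors)  # negative sign: negative association
print(cov_score, cov_errors)
```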

17
Q

If two variables are not associated, they are said to be

A

Independent

18
Q

Can a pair of variables be associated and independent at the same time?

A

No

19
Q

Each research question refers to a target

A

Population

20
Q

A subset of cases which is a small fraction of the population is known as

A

Sample

21
Q

Data collected in a haphazard fashion is

A

Anecdotal evidence

22
Q

If someone was permitted to pick and choose exactly the included subjects in a sample, this introduces _____________ into a sample.

A

Bias

23
Q

Most basic random sample is

A

Simple random sample

24
Q

In a simple random sample, each case in the population has a/an __________ chance of being included

A

Equal

25
Q

Bias can crop up. If only 30% of people randomly sampled actually responded, it is unclear whether the results are __________________ of the entire population. The _____________ bias can skew results.

A

Representative; non-response

26
Q

When individuals who are easily accessible are more likely to be included in the sample, the result is a _____________________.

A

Convenience sample

27
Q

An explanatory variable might affect the

A

Response variable

28
Q

Association implies causation. True or false?

A

False. Association alone does not imply causation.

29
Q

Two primary types of data collection

A

Observational studies

Experiments

30
Q

Collecting data in a way that does not directly interfere with how the data arise is

A

Observational study

31
Q

When researchers want to investigate the possibility of a causal connection, they conduct a/an

A

Experiment

32
Q

When individuals are randomly assigned to a group, the experiment is called a

A

Randomized experiment

33
Q

In a two group experiment, the fake treatment is called a

A

Placebo

34
Q

Causation can only be inferred from a ______________.

A

Randomized experiment

35
Q

A variable correlated with both the explanatory and response variables

A

Confounding variable

36
Q

Two forms of observational studies

A

Prospective

Retrospective

37
Q

Which type of observational study identifies individuals and collects information as events unfold?

A

Prospective

38
Q

Which type of observational study collects data after events have taken place, e.g., researchers reviewing past events in medical records?

A

Retrospective

39
Q

Three random sampling techniques

A

Simple
Stratified
Cluster

40
Q

Most intuitive form of random sampling

A

Simple random sampling

41
Q

Drawing names from a fishbowl is an example of which sampling method?

A

Simple random

42
Q

Divide and conquer sampling strategy

A

Stratified sampling

43
Q

When similar cases are grouped together, then simple random sampling is employed in each group, this is

A

Stratified sampling

44
Q

A two-stage simple random sample is

A

A cluster sample

45
Q

This is similar to stratified sampling, except that there is no requirement to sample from every group

A

Cluster
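
The three random sampling techniques in the cards above can be sketched side by side. This is a minimal illustration, assuming a made-up population of 12 cases with invented group labels; the cluster step follows the "two-stage" description used in this deck, and all names (`population`, `groups`, the sample sizes) are hypothetical.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 12 cases, each tagged with one of 3 groups.
population = [(case_id, f"group_{case_id % 3}") for case_id in range(12)]

# Simple random sample: every case has an equal chance of inclusion.
simple = rng.sample(population, 4)

# Collect the cases belonging to each group.
groups = {}
for case in population:
    groups.setdefault(case[1], []).append(case)

# Stratified sample: take a simple random sample from EVERY group (stratum).
stratified = [case
              for members in groups.values()
              for case in rng.sample(members, 2)]

# Cluster sample (two-stage): randomly pick SOME groups (clusters),
# then take a simple random sample within each chosen cluster.
chosen_clusters = rng.sample(sorted(groups), 2)
cluster = [case
           for name in chosen_clusters
           for case in rng.sample(groups[name], 2)]

print(simple, stratified, cluster, sep="\n")
```

The contrast to notice is that the stratified sample touches every group, while the cluster sample deliberately leaves some groups out.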

46
Q

Studies where researchers assign treatments to cases are called

A

Experiments

47
Q

Four principles of experimental design

A

Controlling
Randomization
Replication
Blocking

48
Q

Asking all patients to drink a 12-ounce glass of water with the pill demonstrates

A

Control

49
Q

To even out differences and prevent accidental bias, what is done?

A

Randomization

50
Q

Observing more cases to estimate an effect more accurately, or repeating an entire study to verify an earlier finding, is

A

Replication

51
Q

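If a variable other than the treatment is known or suspected to influence the response, first group the cases into categories based on that variable, then randomize the cases within each category to the treatment groups. This is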

A

Blocking
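
Blocking combined with randomization can be sketched as follows. This is an illustrative example only: the patient IDs and the "risk" blocking variable are invented, and an even split within each block is assumed for simplicity.

```python
import random

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical patients, each with a risk level that may influence the response.
patients = [("p1", "low"), ("p2", "low"), ("p3", "low"), ("p4", "low"),
            ("p5", "high"), ("p6", "high"), ("p7", "high"), ("p8", "high")]

# Step 1 (blocking): group patients by the suspected influencing variable.
blocks = {}
for patient_id, risk in patients:
    blocks.setdefault(risk, []).append(patient_id)

# Step 2 (randomization): within each block, shuffle and split in half.
assignment = {}
for risk, members in blocks.items():
    shuffled = members[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    for patient_id in shuffled[:half]:
        assignment[patient_id] = "treatment"
    for patient_id in shuffled[half:]:
        assignment[patient_id] = "control"

print(assignment)
```

Because the split happens inside each block, both treatment groups end up with the same number of low-risk and high-risk patients.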

52
Q

The gold standard in data collection is

A

Randomized experiments

53
Q

When researchers keep the patients uninformed about their treatment, the study is said to be

A

Blind

54
Q

Fake treatment

A

Placebo

55
Q

If a fake treatment results in a slight but real improvement in patients, this is

A

Placebo effect

56
Q

If doctors and researchers, like patients, are unaware of who is or is not receiving treatment, this is

A

Double blind

57
Q

Provides a case by case view of data for two numerical variables

A

Scatterplot

58
Q

Scatterplot helps spot

A

Associations

59
Q

One-variable scatterplot

A

Dot plot

60
Q

Common way to measure the center of a distribution of data

A

Mean

61
Q

Sample mean

A

$\bar{x}$ (x with a bar above): the sum of all observed values divided by the number of observations

62
Q

What is the sample size in $\bar{x} = (x_1 + x_2 + \cdots + x_n) / n$?

A

n
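
As a small worked sketch of the sample mean formula, the observations are summed and divided by the sample size n. The data values here are made up for illustration.

```python
# Hypothetical sample of five observations (illustrative values only).
observations = [2.0, 4.0, 4.0, 5.0, 10.0]

n = len(observations)                # sample size
sample_mean = sum(observations) / n  # x-bar = (x1 + x2 + ... + xn) / n

print(n, sample_mean)
```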

63
Q

The average of all observations in a population is denoted by ___________; a subscript on this symbol represents ___________

A

$\mu$ (mu); the variable the population mean refers to

64
Q

Sample mean may provide a reasonable estimate of _____________. Although not perfect, this provides a _____________.

A

$\mu_x$ (mu with subscript x), where $\mu$ is the average of ALL observations in the population and x is the variable; rough estimate

65
Q

Provides a view of the data density

A

Histogram

66
Q

Useful when individual values are of interest

A

Dot plot

67
Q

Useful for highlighting outliers, median and interquartile range

A

Box plot

68
Q

What determines the direction of skew?

A

The side on which the long tail lies

69
Q

Useful for highlighting spatial distribution

A

Intensity map

70
Q

4 features to evaluate in the relationship between two variables

A

Direction
Shape
Strength
Outliers

71
Q

3 forms of skewness

A

Left
Symmetric
Right

72
Q

4 modalities of a distribution

A

Unimodal
Bimodal
Uniform
Multimodal

73
Q

2 measures of variability

A

Variance

Standard deviation

74
Q

Which one is easier to understand? Variance or standard deviation?

A

Standard deviation

75
Q

Distance of an observation from the mean is

A

Deviation

76
Q

What is the symbol for sample variance?

A

$s^2$ (lowercase s with superscript 2)

77
Q

Formula for sample variance?

A

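Sum of the squared deviations from the mean, all over n - 1: $s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}$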

78
Q

The square root of the variance is

A

Standard deviation

79
Q

Standard deviation is the

A

Square root of the variance

80
Q

Sum of squared deviations from the mean / (n - 1) =

A

Sample Variance

81
Q

What is variance?

A

Roughly the average squared distance from the mean

82
Q

Square root of the variance

A

Standard deviation

83
Q

The Greek letter used for the population standard deviation (and, squared, for the population variance)

A

Sigma

84
Q

What is the difference between sample variance and population variance?

A

Sample variance uses n-1 and population variance uses n
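
The variance and standard deviation cards above can be tied together in a short sketch. The data are made-up values; the hand-computed results are cross-checked against Python's standard `statistics` module.

```python
import math
import statistics

data = [2.0, 4.0, 4.0, 5.0, 10.0]  # illustrative values only
n = len(data)
mean = sum(data) / n

# Deviation: distance of each observation from the mean, then squared.
squared_deviations = [(x - mean) ** 2 for x in data]

sample_variance = sum(squared_deviations) / (n - 1)  # divides by n - 1
population_variance = sum(squared_deviations) / n    # divides by n
sample_sd = math.sqrt(sample_variance)               # square root of the variance

# The standard library offers the same calculations.
assert math.isclose(sample_variance, statistics.variance(data))
assert math.isclose(population_variance, statistics.pvariance(data))

print(sample_variance, population_variance, sample_sd)
```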

85
Q

Summarizes a data set using five statistics while plotting unusual observations

A

Box plot

86
Q

The first step in building a box plot is denoting the

A

Median

87
Q

To find the median, arrange the observations from

A

Smallest to largest

88
Q

The second step in building a box plot is

A

Drawing a rectangle to represent the middle 50% of the data

89
Q

The total length of the box in a box plot is the

A

Interquartile range (IQR)

90
Q

The two boundaries of the box are called

A

First quartile and third quartile

91
Q

The more variable the data, the _____________ the standard deviation

A

Larger

92
Q

25% of the data fall below this value

A

Q1

93
Q

25% of the data fall above this value (the top of the box in a vertical box plot)

A

Q3

94
Q

What is the formula for IQR?

A

IQR = Q3-Q1

95
Q

In a box plot, the ____________ attempt to capture the data outside of the box

A

Whiskers

96
Q

The whiskers are never allowed to extend farther from the box than

A

1.5 x IQR (no lower than Q1 - 1.5 x IQR and no higher than Q3 + 1.5 x IQR)

97
Q

Observations beyond the whiskers, i.e. unusually distant observations, are called

A

Outliers
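
The box plot quantities from the preceding cards (Q1, Q3, IQR, whisker limits, outliers) can be computed in a few lines. This sketch uses made-up data and assumes the "inclusive" quartile method of `statistics.quantiles`; textbooks and software differ on how quartiles are interpolated, so exact values may vary slightly by convention.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]  # illustrative values; 30 looks extreme

# Quartiles (assumption: "inclusive" interpolation method).
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1  # length of the box

# Whiskers may reach at most 1.5 x IQR beyond the box.
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# Each whisker ends at the most extreme observation still inside the limits.
inside = [x for x in data if lower_limit <= x <= upper_limit]
whisker_low, whisker_high = min(inside), max(inside)

# Observations beyond the whiskers are plotted individually as outliers.
outliers = [x for x in data if x < lower_limit or x > upper_limit]

print(q1, median, q3, iqr, whisker_low, whisker_high, outliers)
```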

98
Q

An observation that appears extreme relative to the rest of the data

A

Outlier

99
Q

Why is it important to look for outliers?

A

Gives insight into interesting properties of the data
Flags possible data entry or collection errors so they can be re-examined
Helps identify strong skew in the distribution

100
Q

Extreme observations have little effect on the

A

Median and IQR

101
Q

Median and IQR are called ______________ estimates

A

Robust

102
Q

Why are median and IQR robust estimates?

A

They are only sensitive to the numbers near Q1, the median and Q3.
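
A quick sketch of robustness: replacing the largest value with an extreme one leaves the median and IQR unchanged while the mean and standard deviation shift substantially. The data are invented, and the "inclusive" quartile method is an assumption of this example.

```python
import statistics

def iqr(values):
    """Interquartile range using the 'inclusive' quartile method (an assumption)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

original = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
with_extreme = original[:-1] + [1000]  # swap the largest value for an extreme one

# Robust estimates: unaffected by the extreme observation here.
median_before, median_after = statistics.median(original), statistics.median(with_extreme)
iqr_before, iqr_after = iqr(original), iqr(with_extreme)

# Non-robust estimates: pulled strongly by the extreme observation.
mean_before, mean_after = statistics.mean(original), statistics.mean(with_extreme)
sd_before, sd_after = statistics.stdev(original), statistics.stdev(with_extreme)

print(median_before, median_after, iqr_before, iqr_after)
print(mean_before, mean_after, sd_before, sd_after)
```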

103
Q

A table that summarizes data for two categorical variables is called a

A

Contingency table

104
Q

Provides total counts across each row

A

Row totals

105
Q

Provides total counts down each column

A

Column totals

106
Q

A table for a single variable is

A

Frequency table

107
Q

A frequency table in which the counts are replaced with percentages or proportions is called a

A

Relative frequency table

108
Q

Common way to display a single categorical variable

A

Bar plot

109
Q

Counts divided by their row totals

A

Row proportions

110
Q

Count divided by column totals

A

Column proportion
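
The contingency table terms above (row totals, column totals, row and column proportions) can be illustrated with a tiny made-up table. The group and response labels below are hypothetical.

```python
# Hypothetical 2 x 2 contingency table: counts for two categorical variables.
table = {
    "group_a": {"yes": 30, "no": 10},
    "group_b": {"yes": 20, "no": 40},
}

# Row totals: total counts across each row.
row_totals = {row: sum(cells.values()) for row, cells in table.items()}

# Column totals: total counts down each column.
column_totals = {}
for cells in table.values():
    for column, count in cells.items():
        column_totals[column] = column_totals.get(column, 0) + count

# Row proportions: counts divided by their row totals.
row_proportions = {
    row: {column: count / row_totals[row] for column, count in cells.items()}
    for row, cells in table.items()
}

# Column proportions: counts divided by their column totals.
column_proportions = {
    row: {column: count / column_totals[column] for column, count in cells.items()}
    for row, cells in table.items()
}

print(row_totals, column_totals)
print(row_proportions["group_a"]["yes"], column_proportions["group_a"]["yes"])
```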

119
Q

When do you use barplots? Histograms?

A

Bar plot: categorical variable
Histogram: numerical variable

120
Q

X axis on histogram is

A

Numerical

121
Q

X axis on barplot

A

Category

122
Q

Rescaling of the data using a function

A

Transformation

123
Q

A transformation that is useful when much of the data cluster near zero relative to the larger values in the data set

A

Natural log transformation

124
Q

Why transform the variables in a scatterplot?

A

Make the relationship between variables more linear

125
Q

Goals of transformation

A

See data structure differently
Skew reduction to assist in modeling
Straighten a nonlinear relationship in a scatterplot
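
A natural log transformation can be sketched on a small, made-up, strongly right-skewed data set. The comparison of the maximum to the median is only a rough indicator of how far the long tail stretches, chosen here for illustration; note that the natural log requires positive values.

```python
import math
import statistics

# Illustrative right-skewed values: most cluster near zero, a few are very large.
values = [1, 2, 3, 5, 8, 20, 100, 1000]

# Natural log transformation (all values must be positive).
logged = [math.log(v) for v in values]

# Rough indicator of tail length: how far the maximum sits from the median.
ratio_before = max(values) / statistics.median(values)
ratio_after = max(logged) / statistics.median(logged)

print(ratio_before, ratio_after)
```

The ordering of the observations is preserved, but the large values are pulled in toward the rest of the data.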

126
Q

To visualize two categorical variables

A

Segmented bar plot

127
Q

Useful for visualizing conditional frequency distributions

A

Segmented bar plot

128
Q

To explore relationships between variables in a segmented bar plot, we need to compare

A

Relative frequencies

129
Q

Segmented bar plot that uses proportion is

A

Relative frequency segmented bar plot

130
Q

It displays marginal distribution, by using the width of a bar

A

Mosaic plot

131
Q

A mosaic plot is only used for

A

Categorical variables