Summer Assignment Flashcards

1
Q

individuals

A

objects described by a set of data

2
Q

variable

A

any characteristic of an individual

3
Q

categorical variable

A

places an individual into one of several groups or categories

4
Q

quantitative variable

A

takes numerical values where it makes sense to take an average

5
Q

distribution

A

tells us what values the variable takes and how often it takes those values

6
Q

discrete variables

A

a variable that can take only separate, countable values, not every value between its minimum and maximum. Like counting heads when flipping coins: you can't get 2.5 heads, only 2 or 3 heads

7
Q

Continuous variables

A

the opposite of a discrete variable: a variable that can take on any value between its minimum and maximum value, such as height or time

8
Q

Univariate Data

A

when you conduct a study that only looks at one variable; data that only contains one variable

9
Q

Bivariate Data

A

data that contains two variables and examines the relationship between them. For example, height and weight.

10
Q

Census

A

a study that obtains data from every member of a population. In most studies, a census is not practical, because of the cost and/or time required

11
Q

Boxplot

A

Also known as a box and whisker plot. A boxplot displays the five-number summary, splitting the data set into quartiles.

12
Q

Bar Graph

A

A bar chart or bar graph is a chart or graph that presents categorical data with rectangular bars with heights or lengths proportional to the values that they represent.

13
Q

Conditional

A

The conditional probability of an event B is the probability that the event will occur given the knowledge that an event A has already occurred.

14
Q

Sample

A

a set of observations drawn from a population.

15
Q

Population

A

population refers to the total set of observations that can be made

16
Q

Inference

A

Statistical inference is the process of using data analysis to deduce properties of an underlying distribution of probability

17
Q

Frequency Table

A

When a table shows frequency counts for a categorical variable, it is called a frequency table

18
Q

Relative Frequency

A

To compute relative frequency, one obtains a frequency count for the total population and a frequency count for a subgroup of the population. The relative frequency for the subgroup is:

Relative frequency = Subgroup count / Total count
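
As a rough illustration of the formula above, here is a minimal Python sketch. The survey responses and the variable names (`responses`, `counts`, `relative`) are made up for the example and do not come from the assignment.

```python
from collections import Counter

# Hypothetical categorical data: favorite colors from a made-up survey.
responses = ["red", "blue", "blue", "green", "red", "blue", "red", "blue"]

# Frequency table: each category mapped to its count.
counts = Counter(responses)

# Total count across all categories.
total = sum(counts.values())

# Relative frequency = subgroup count / total count.
relative = {category: count / total for category, count in counts.items()}

for category, share in relative.items():
    print(category, counts[category], share)
```

The relative frequencies of all subgroups should add up to 1, which is a quick way to check the arithmetic.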

19
Q

Table

A

an arrangement of data in rows and columns, or possibly in a more complex structure.

20
Q

Interquartile Range

A

a measure of variability based on dividing a data set into quartiles: the difference between the upper and lower quartiles (IQR = Q3 - Q1).

21
Q

Five-Number

A

A five-number summary consists of five values: the most extreme values in the data set (the maximum and minimum values), the lower and upper quartiles, and the median
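
A minimal Python sketch of a five-number summary, with the interquartile range computed from it. It assumes one common textbook convention for quartiles (the median of each half, leaving out the overall median when the count is odd); other software may use slightly different quartile rules. The function name `five_number_summary` and the `scores` data are made up for the example.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for at least two values.

    Assumption: quartiles are the medians of the lower and upper halves,
    excluding the middle value when the number of values is odd.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]              # values below the median position
    upper = data[half + (n % 2):]    # values above the median position
    return (data[0], median(lower), median(data), median(upper), data[-1])

scores = [7, 1, 3, 9, 5, 11, 13]     # hypothetical data
summary = five_number_summary(scores)
iqr = summary[3] - summary[1]        # interquartile range = Q3 - Q1

print(summary)  # (1, 3, 7, 11, 13)
print(iqr)      # 8
```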

22
Q

summary

A

The information that gives a quick and simple description of the data

23
Q

Standard Deviation

A

The standard deviation is the square root of the variance. Its symbol is the Greek letter sigma (σ) for a population; s is used for a sample

24
Q

Variance

A

The variance is the average of the squared differences from the mean. It is a numerical value used to indicate how widely individuals in a group vary. If individual observations vary greatly from the group mean, the variance is large; if they stay close to the mean, it is small.
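
The definition above can be sketched step by step in Python. This is the population version, which divides by the number of values; the sample version divides by one less than that. The data set is made up for the example.

```python
from math import sqrt

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data

# Step 1: the mean.
mean = sum(data) / len(data)

# Step 2: squared difference of each value from the mean.
squared_diffs = [(x - mean) ** 2 for x in data]

# Step 3: variance = average of the squared differences (population form).
variance = sum(squared_diffs) / len(data)

# Step 4: standard deviation = square root of the variance.
std_dev = sqrt(variance)

print(mean, variance, std_dev)  # 5.0 4.0 2.0
```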

25
Q

Roundoff Error

A

the difference between a rounded number and the actual value.

26
Q

Pie Chart

A

A pie chart (or a circle chart) is a circular statistical graphic, which is divided into slices to illustrate numerical proportion. In a pie chart, the arc length of each slice (and consequently its central angle and area), is proportional to the quantity it represents.

27
Q

Two-Way Table

A

A two-way table of counts organizes data about two categorical variables. Values of the row variable label the rows that run across the table, and values of the column variable label the columns that run down the table

28
Q

Marginal Distribution

A

This is the distribution of one of the variables in a two-way table on its own. These are the counts or percentages found in the margins of the table (the total row or total column).
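
A small Python sketch of how the margins of a two-way table are computed, with one conditional distribution shown for contrast. The table, its categories, and all variable names are hypothetical.

```python
# Hypothetical two-way table of counts:
# rows = grade level, columns = preferred sport.
table = {
    "junior": {"soccer": 12, "tennis": 8},
    "senior": {"soccer": 6, "tennis": 14},
}

# Row totals (right-hand margin).
row_totals = {row: sum(cols.values()) for row, cols in table.items()}

# Column totals (bottom margin).
column_totals = {}
for cols in table.values():
    for col, count in cols.items():
        column_totals[col] = column_totals.get(col, 0) + count

grand_total = sum(row_totals.values())

# Marginal distribution of sport: column totals as proportions of everyone.
marginal_sport = {col: n / grand_total for col, n in column_totals.items()}

# Conditional distribution of sport among juniors only:
# each junior count divided by the junior row total.
conditional_junior = {
    col: n / row_totals["junior"] for col, n in table["junior"].items()
}

print(marginal_sport)      # soccer 0.45, tennis 0.55
print(conditional_junior)  # soccer 0.6, tennis 0.4
```

The marginal distribution ignores grade level entirely, while the conditional distribution looks only at one row.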

29
Q

Distribution

A

is a listing or function showing all the possible values (or intervals) of the data and how often they occur

30
Q

Segmented Bar Graph

A

In this graph each bar represents a whole and is divided into segments in proportion to the conditional distribution of a second variable. (2 variables: one for the bars and one for the segments within each bar)

31
Q

Side-by-side Bar

A

like the segmented bar graph but the segments are placed next to each other instead of on top of each other.

32
Q

graph

A

A graph is a picture that represents data in an organized manner.

33
Q

Association

A

any relationship between two measured quantities that renders them statistically dependent.

34
Q

Simpson’s Paradox

A

an effect that occurs when the marginal association between two categorical variables is qualitatively different from the partial association between the same two variables after controlling for one or more other variables.
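
A minimal Python sketch of the effect, using illustrative counts chosen for this example (the treatment and group labels are placeholders). Treatment A has the higher success rate within each group, yet treatment B has the higher rate once the groups are combined.

```python
# Illustrative counts as (successes, patients) for each treatment and group.
results = {
    "treatment A": {"small": (81, 87), "large": (192, 263)},
    "treatment B": {"small": (234, 270), "large": (55, 80)},
}

def rate(successes, total):
    """Success rate as a proportion."""
    return successes / total

# Partial association: success rates within each group.
partial = {
    treatment: {group: rate(*counts) for group, counts in groups.items()}
    for treatment, groups in results.items()
}

# Marginal association: success rates after pooling the groups.
overall = {}
for treatment, groups in results.items():
    successes = sum(counts[0] for counts in groups.values())
    patients = sum(counts[1] for counts in groups.values())
    overall[treatment] = rate(successes, patients)

print(partial)
print(overall)
```

The reversal happens because the two treatments were applied to very different mixes of small and large cases.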

35
Q

Dotplot

A

a type of graphic display that shows each data value as a dot above its position on a number line; stacked dots make it easy to compare frequency counts.

36
Q

Shape

A

The shape of a distribution is described by its number of peaks and by its possession of symmetry, its tendency to skew, or its uniformity

37
Q

Mode

A

The mode of a set of data values is the value that appears most often.

38
Q

Center

A

The center of data is a single number that summarizes the entire data set. It is important to use the correct method for finding the center of data so you can accurately summarize the data set. You can do this by using either the mean or the median.

39
Q

spread

A

dispersion (also called variability, scatter, or spread) is the extent to which a distribution is stretched or squeezed. Common examples of measures of statistical dispersion are the variance, standard deviation, and interquartile range.

40
Q

Range

A

the difference between the highest value and the lowest value (maximum minus minimum)

41
Q

Outlier

A

a data point that differs significantly from other observations.

42
Q

Symmetric

A

a distribution whose left and right halves are roughly mirror images when split down the center

43
Q

Skewed Right

A

the tail of the distribution stretches toward the larger values (to the right); the mean is typically greater than the median

44
Q

Skewed Left

A

the tail of the distribution stretches toward the smaller values (to the left); the mean is typically smaller than the median

45
Q

Unimodal

A

a distribution with one clear peak or most frequent value

46
Q

Bimodal

A

a probability distribution with two different modes. These appear as distinct peaks

47
Q

Multimodal

A

a distribution with several distinct modes or peaks (typically more than two)

48
Q

Stemplot

A

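also known as a stem and leaf plot. a way of displaying and comparing data that splits each value into a stem (the leading digits) and a leaf (the final digit).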

49
Q

Splitting Stems

A

when the leaves on a stem and leaf plot get too crowded, you can split each stem into two: one for leaves 0-4 and one for leaves 5-9, instead of a single stem for leaves 0-9.

50
Q

Back-to-back Stem

A

compares 2 populations by putting a shared stem in the middle, with the leaves of one population on its left and the leaves of the other on its right, so the two are easy to compare

51
Q

Plots

A

how people display their data so it is easy to compare. EX: box plots, stem and leaf plots, scatterplots, etc.

52
Q

Histogram

A

bar graph-like representation of quantitative data that buckets a range of outcomes into columns (bins) along the x-axis; the height of each column shows how many values fall in that bin

53
Q

Mean

A

the average. add all the values together, then divide by the number of data points to get the mean

54
Q

median

A

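the middle number of the data set once the values are put in order; with an even number of values, it is the average of the two middle numbers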
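
A short Python sketch comparing the mean and the median on two made-up data sets, using the standard library's `statistics` module. It illustrates why the mean tends to exceed the median when a distribution is skewed right: one large value pulls the mean upward but leaves the middle value unchanged.

```python
from statistics import mean, median

symmetric = [1, 2, 3, 4, 5]       # hypothetical, roughly symmetric
skewed_right = [1, 2, 3, 4, 50]   # same data with one large value

print(mean(symmetric), median(symmetric))        # 3 3
print(mean(skewed_right), median(skewed_right))  # 12 3

# With an even number of values, the median is the average
# of the two middle values once the data are sorted.
even_count = [4, 1, 3, 2]
print(median(even_count))  # 2.5
```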