Basic Terminology Flashcards

1
Q

Quantitative Data v. Qualitative Data

A

quan. - numerical data; data measured or identified on a numerical scale
qual. - categorical data; data that can be classified in a group

context will determine if quan. or qual.

2
Q

Discrete Data v. Continuous Data

A

disc. - data that can be listed or placed in order; usually finite but may be “countably” infinite (can identify a first or second term but not a last)

cont. - data that can be measured or take on values in an interval

3
Q

Descriptive Statistics (EDA) v. Inferential Statistics

A

desc. - exploratory data analysis (analytical, graphical); examines data
inf. - uses data to make inferences about the population from which the sample is drawn (commonly random sample)

4
Q

Parameters v. Statistics

A

stats. - values that describe a sample

para. - values that describe a population

5
Q

Collecting Data: Surveys v. Studies

A

surveys - used to collect data in order to make generalizations about a population

studies - experiments or observational studies; involve collecting comparative data on groups (treatment and control)

6
Q

Random Variable and Distributions

A

random variable - can be thought of as the numerical outcome of a random phenomenon or an experiment; random variables give rise to probability distributions (a way of matching outcomes with their probabilities), which lead to probabilistic statements about sampling distributions (distributions of sample statistics such as means and proportions)
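
As a minimal sketch of matching outcomes with their probabilities, the block below builds the distribution of the number of heads in two fair coin flips. The coin example and the variable names are illustrative assumptions, not part of the card.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All equally likely outcomes of two fair coin flips (assumed example).
outcomes = list(product("HT", repeat=2))

# Random variable: number of heads in each outcome.
counts = Counter(flips.count("H") for flips in outcomes)

# Probability distribution: each value matched with its probability.
distribution = {
    heads: Fraction(count, len(outcomes))
    for heads, count in sorted(counts.items())
}

for heads, prob in distribution.items():
    print(heads, prob)
```

The probabilities sum to 1, which is a quick sanity check for any discrete distribution.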

7
Q

Graphical Analysis: Shape

A

symmetric - has symmetry around some axis (does not need to be perfectly symmetrical)

mound-shaped - bell-shaped

skewed - data are skewed to the left if the tail is to the left; to the right if the tail is to the right

bimodal - has two distinct locations (peaks) with many scores

uniform - frequencies of the various values are more-or-less constant

8
Q

Graphical Analysis: Dotplot v. Stemplot

A

dot - very simple type of graph that involves plotting the data values, with dots, above the corresponding values on a number line

stem - no rules to what constitutes the stem and what constitutes the leaf; data will suggest the stem and leaves

9
Q

Graphical Analysis: Bargraph v. Histogram

A

bar - used to illustrate qualitative data; horizontal axis is categories

hist - used to illustrate quantitative data; horizontal axis is numerical values

vertical axis of both is frequencies or relative frequencies

10
Q

Measures of Center: Mean

A

defined as the sum of the x’s divided by n

not a resistant statistic (can be strongly affected by extreme values)

the indices can be dropped from the summation notation: x̄ = Σx / n

11
Q

Measures of Center: Median

A

“middle” value in the ordered set; if the number of values is even, add the middlemost two and divide by 2

resistant statistic (one whose numerical value is not dramatically affected by extreme values)
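
A small sketch of resistance, using made-up scores: adding one extreme value moves the mean a lot and the median only slightly. The data and variable names are assumptions for illustration.

```python
from statistics import mean, median

scores = [70, 72, 75, 78, 80]      # hypothetical data
with_outlier = scores + [300]      # same data plus one extreme value

mean_before, median_before = mean(scores), median(scores)
mean_after, median_after = mean(with_outlier), median(with_outlier)

print("before:", mean_before, median_before)
print("after: ", mean_after, median_after)
```

Here the mean jumps from 75 to 112.5 while the median only moves from 75 to 76.5.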

12
Q

Measures of Center: Mode

A

tells where the most frequent values occur, more than it describes the center

generally not used

13
Q

Measures of Spread: Variance

A

average squared deviation from the mean

14
Q

Measures of Spread: Standard Deviation

A

square root of the variance; used to match the units of the original data
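
A short sketch of both measures on made-up data. It assumes the population versions (divide by n), which match "average squared deviation"; the sample versions divide by n – 1 and are shown for comparison.

```python
from statistics import pstdev, pvariance, stdev, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]    # hypothetical data, mean = 5

# Population variance computed by hand: average squared deviation from the mean.
mu = sum(data) / len(data)
by_hand = sum((x - mu) ** 2 for x in data) / len(data)

pop_var, pop_sd = pvariance(data), pstdev(data)    # divide by n
samp_var, samp_sd = variance(data), stdev(data)    # divide by n - 1

print(by_hand, pop_var, pop_sd)
print(samp_var, samp_sd)
```

The standard deviation comes out in the same units as the data, while the variance is in squared units.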

15
Q

Measures of Spread: Interquartile Range (IQR)

A

measure of spread that works well when a mean-based measure is not appropriate

median is Q2 (50th percentile) and divides the distribution

lower quartile (Q1, 25th percentile) is median of lower half

upper quartile (Q3, 75th percentile) is median of upper half

IQR = Q3 – Q1; the range of the middle 50% of the distribution
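
A sketch of the median-of-halves method described above, on made-up data. The helper name `quartiles` is hypothetical, and the sketch assumes the median itself is left out of both halves when the number of values is odd; software packages often use other quartile conventions.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # For odd n, skip the middle value so it belongs to neither half.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)

data = [1, 3, 4, 6, 7, 8, 10, 12, 15]    # hypothetical data
q1, q2, q3 = quartiles(data)
iqr = q3 - q1
print(q1, q2, q3, iqr)
```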

16
Q

Measures of Spread: Range

A

difference between the maximum and minimum scores in the distribution

generally not used

17
Q

Measures of Spread: Outliers and the 1.5(IQR) Rule

A

outlier - a value far removed from the others

1.5(IQR) Rule - a value is an outlier if it lies more than 1.5 IQRs below Q1 or above Q3

extreme outlier - lies more than 3 IQRs beyond Q1 or Q3
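
A sketch of flagging outliers with fences built from the quartiles, on made-up data. The helper name `fences` is hypothetical, and quartiles are computed as medians of the lower and upper halves (an assumed convention).

```python
from statistics import median

def fences(values, multiplier=1.5):
    """Return (lower fence, upper fence) at `multiplier` IQRs beyond Q1 and Q3."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    q1 = median(ordered[:half])
    q3 = median(ordered[half + 1:] if n % 2 else ordered[half:])
    iqr = q3 - q1
    return q1 - multiplier * iqr, q3 + multiplier * iqr

data = [1, 3, 4, 6, 7, 8, 10, 12, 15, 30]    # hypothetical data

low, high = fences(data)          # 1.5(IQR) fences
x_low, x_high = fences(data, 3)   # 3(IQR) fences for extreme outliers

outliers = [v for v in data if v < low or v > high]
extreme = [v for v in data if v < x_low or v > x_high]
print(outliers, extreme)
```

With Q1 = 4 and Q3 = 12, the value 30 is beyond the 1.5(IQR) fence at 24 but inside the 3(IQR) fence at 36, so it is an outlier but not an extreme one.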

18
Q

Position of a Term in a Distribution: 5-Number Summary

A

the 5-number summary of a dataset is composed of the minimum value, the lower quartile, the median, the upper quartile, and the maximum value

19
Q

Position of a Term in a Distribution: Boxplot

A

simply a graphical version of the 5-number summary; contains the middle 50%

“whiskers” extend from the lines at the ends of the box to the minimum and maximum values of the data, disregarding outliers

line inside box marks median

20
Q

Position of a Term in a Distribution: Percentile Rank of a Term

A

equals the percentage of terms in the distribution less than the term; 100th percentile is max

e.g., a term at the 75th percentile is larger than 75% of the terms in the distribution
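
A minimal sketch of the "percentage of terms less than the term" definition, on made-up scores. The function name `percentile_rank` is hypothetical, and other definitions (for example, counting ties) give slightly different values.

```python
def percentile_rank(values, term):
    """Percentage of values strictly less than `term`."""
    below = sum(1 for v in values if v < term)
    return 100 * below / len(values)

scores = [55, 60, 65, 70, 75, 80, 85, 90]    # hypothetical data
rank = percentile_rank(scores, 85)
print(rank)
```

Six of the eight scores are below 85, so 85 sits at the 75th percentile under this definition.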

21
Q

Position of a Term in a Distribution: z Score

A

notes how many standard deviations the term is above or below the mean

positive when x is above the mean and negative when it is below the mean
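
A minimal sketch of the calculation, assuming the usual formula z = (x – mean) / standard deviation; the numbers are made up.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Hypothetical distribution with mean 70 and standard deviation 10.
z_above = z_score(85, 70, 10)
z_below = z_score(55, 70, 10)
print(z_above, z_below)
```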

22
Q

Normal Distribution and the Empirical Rule

A

graph of ND - a continuous curve that “describes” the shape of the distribution for very large samples; normal curve is defined completely in terms of its mean and standard deviation

23
Q

Standard Normal Distribution

A

X is a variable that has a normal distribution

mean µ

standard deviation σ

24
Q

Empirical Rule v. Chebyshev’s Rule

A

empirical - 68-95-99.7 rule; states that approximately 68% of the terms in a normal distribution are within one standard deviation of the mean, 95% are within two, and 99.7% are within three

chebyshev’s - k is the number of standard deviations; for any k > 1, at least 1 – 1/k² of the data, i.e. 100(1 – 1/k²)%, lies within k standard deviations of the mean (holds for any distribution, not just normal ones)
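
A short sketch comparing the two rules. The function name `chebyshev_lower_bound` is hypothetical; the empirical-rule figures are the approximate values quoted on the card.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's rule is only informative for k > 1")
    return 1 - 1 / k ** 2

empirical = {2: 0.95, 3: 0.997}    # approximate, normal distributions only

bound_2 = chebyshev_lower_bound(2)
bound_3 = chebyshev_lower_bound(3)

for k, bound in ((2, bound_2), (3, bound_3)):
    print(k, round(bound, 3), empirical[k])
```

Chebyshev's bound is weaker (at least 75% within two standard deviations versus about 95%) because it makes no assumption about the shape of the distribution.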

25
Q

Scatterplot

A

two-dimensional graph of ordered pairs that focuses on bivariate (two variable) problems

one variable (explanatory variable) on the horizontal axis and the other (response variable) on the vertical

associations - positive if one of them increases as the other increases; negative if one of them decreases as the other increases

26
Q

Correlation: Coefficient r

A

gives us information about the strength of the linear relationship between two variables (how well a line fits the data) as well as the direction of the linear relationship (whether the variables are positively associated, r > 0, or negatively associated, r < 0)

correlation is not causation
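
A sketch of computing r directly from deviations about the means, on made-up data. The function name `correlation` and the data are assumptions for illustration.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

hours = [1, 2, 3, 4, 5]               # hypothetical explanatory values
score_up = [52, 55, 61, 64, 70]       # rises with hours
score_down = [70, 64, 61, 55, 52]     # falls with hours

r_up = correlation(hours, score_up)
r_down = correlation(hours, score_down)
print(round(r_up, 3), round(r_down, 3))
```

Both values are close to 1 in size, indicating a strong linear relationship; only the sign differs.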

27
Q

Lines of Best Fit and Least-Squares Regression Line (LSRL)

A

regression line, line of best fit, LSRL - a line that can be used for predicting response values from explanatory values; a line that minimizes the sum of squared errors
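
A sketch of fitting the line ŷ = a + bx, assuming the standard least-squares formulas b = Σ(x – x̄)(y – ȳ) / Σ(x – x̄)² and a = ȳ – b·x̄. The data and the function name `least_squares_line` are made up.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least-squares regression line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

hours = [1, 2, 3, 4, 5]             # hypothetical explanatory values
scores = [52, 55, 61, 64, 70]       # hypothetical response values

intercept, slope = least_squares_line(hours, scores)
predicted_at_3 = intercept + slope * 3
print(round(intercept, 2), round(slope, 2), round(predicted_at_3, 2))
```

The fitted line always passes through the point (x̄, ȳ), here (3, 60.4).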

28
Q

Residuals

A

y – ŷ (the actual value – the predicted value)

a positive residual means that the prediction was too small and a negative residual means that the prediction was too large

the sum of squared residuals, Σ(y – ŷ)², is small when the linear fit is good and large when it is not
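
A sketch of computing residuals and their sum of squares. The data and the fitted line ŷ = 46.9 + 4.5x are assumed for illustration.

```python
hours = [1, 2, 3, 4, 5]             # hypothetical explanatory values
scores = [52, 55, 61, 64, 70]       # hypothetical actual responses

intercept, slope = 46.9, 4.5        # assumed least-squares line for this data

predicted = [intercept + slope * x for x in hours]
residuals = [y - y_hat for y, y_hat in zip(scores, predicted)]   # actual - predicted
sse = sum(e ** 2 for e in residuals)                             # sum of squared residuals

print([round(e, 2) for e in residuals], round(sse, 2))
```

The first residual is positive (0.6), so the line under-predicts there; the second is negative (–0.9), so it over-predicts.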

29
Q

Residuals: Interpolation v. Extrapolation

A

inter. - if we are trying to predict a value of y from an x-value within the range of x-values

extra. - if we are predicting from a value of x outside of the range of x-values; generally not used
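
A tiny sketch of the distinction: a prediction counts as interpolation only when the new x lies inside the observed range. The function name `prediction_kind` and the data are hypothetical.

```python
def prediction_kind(x_new, xs):
    """Classify a prediction by whether x_new lies within the observed x-values."""
    if min(xs) <= x_new <= max(xs):
        return "interpolation"
    return "extrapolation"

hours = [1, 2, 3, 4, 5]    # hypothetical observed x-values
inside = prediction_kind(3.5, hours)
outside = prediction_kind(12, hours)
print(inside, outside)
```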

30
Q

Coefficient of Determination

A
31
Q

Outliers and Influential Observation

A

outlier - lies outside of the general pattern of the data

influential observation - one whose removal markedly changes the regression line; often an outlier in the x-direction