Basic Terminology Flashcards by Angelica Q

Quantitative Data v. Qualitative Data

quan. - numerical data; data measured or identified on a numerical scale
qual. - categorical data; data that can be classified in a group

context will determine if quan. or qual.

How well did you know this?

Not at all

Perfectly

Discrete Data v. Continuous Data

disc. - data that can be listed or placed in order; usually finite but may be “countably” infinite (identify first or second term but not last)

cont. - data that can be measured or take on values in an integral

How well did you know this?

Not at all

Perfectly

Descriptive Statistics (EDA) v. Inferential Statistics

desc. - exploratory data analysis (analytical, graphical); examines data
inf. - uses data to make inferences about the population from which the sample is drawn (commonly random sample)

How well did you know this?

Not at all

Perfectly

Parameters v. Statistics

stats. - values that describe a sample

para. - values that describe a population

How well did you know this?

Not at all

Perfectly

Collecting Data: Surveys v. Studies

used to collect data in order to make generalizations about a population

experiments or observational studies involve collecting comparative data on groups (treatment and control)

How well did you know this?

Not at all

Perfectly

Random Variable and Distributions

can be thought of as a numerical outcome of a random phenomenon or an experiment; give rise to probability distrubutions (a way of matching outcomes with their probabilities of success) which leads to probabilistic statements about sampling distributions (distributions of sample statistics such as means and proportions)

How well did you know this?

Not at all

Perfectly

Graphical Analysis: Shape

symmetric - has symmetry around some axis (does not need to be perfectly symmetrical)

mound-shaped - bell-shaped

skewed - data are skewed to the left if the tail is to the left; to the right if the tail is to the right

bimodal - has more than one location with many scores

uniform - frequencies of the various values are more-or-less constant

How well did you know this?

Not at all

Perfectly

Graphical Analysis: Dotplot v. Stemplot

dot - very simple type of graph that involves plotting the data values, with dots, above the corresponding values on a number line

stem - no rules to what constitutes the stem and what constitutes the leaf; data will suggest the stem and leaves

How well did you know this?

Not at all

Perfectly

Graphical Analysis: Bargraph v. Histogram

bar - used to illustrate qualitative data; horizontal axis is categories

hist - used to illustrate quantitative data; horizontal axis is numerical values

vertical axis of both is frequencies or relative frequencies

How well did you know this?

Not at all

Perfectly

Measures of Center: Mean

defined as the sum of the x’s divided by n

not resistant statistic (can be affected)

can remove indices and leave summation notation

How well did you know this?

Not at all

Perfectly

Measures of Center: Median

“middle” value in the set; if even, add middlemost two and divide by 2

resistant statistic (one whose numerical value is not dramatically affected by extreme values)

How well did you know this?

Not at all

Perfectly

Measures of Center: Mode

tells where the most frequent values occur, more than it describes the center

generally not used

How well did you know this?

Not at all

Perfectly

Measures of Spread: Variance

average squared deviation from the mean

How well did you know this?

Not at all

Perfectly

Measures of Spread: Standard Deviation

square root of the variance; used to match the units of the original data

How well did you know this?

Not at all

Perfectly

Measures of Spread: Interquartile Range (IQR)

measure of spread that works well when a mean-based measure is not appropriate

median is Q2 (50th percentile) and divides the distribution

lower quartile (Q1, 25th percentile) is median of lower half

upper quartile (Q3, 75th percentile) is median of upper half

IQR = Q3–Q1 = middle 50% of distribution

How well did you know this?

Not at all

Perfectly

Measures of Spread: Range

Study These Flashcards

difference between the maximum and minimum scores in the distribution

generally not used

Measures of Spread: Outliers and the 1.5(IQR) Rule

Study These Flashcards

outliers - value far removed from the others

1.5(IQR) Rule, pictured

extreme outlier - lies more than 3 IQRs beyond Q1 or Q3

Position of a Term in a Distribution: 5-Number Summary

Study These Flashcards

a dataset is composed of the minimum value, the lower quartile, the median, the upper quartile, and the maximum value

Position of a Term in a Distribution: Boxplot

Study These Flashcards

simply a graphical version of the 5-number summary; contains the middle 50%

“whiskers” extend from the lines at the ends of the box to the minimum and maximum values of the data, disregarding outliers

line inside box marks median

Position of a Term in a Distribution: Percentile Rank of a Term

Study These Flashcards

equals the proportion of terms in the distribution less than the term; 100th percentile is max

eg. term that is at the 75th percentile is larger than 75% of the terms in a distribution

Position of a Term in a Distribution: z Score

Study These Flashcards

notes how many standard deviations the term is above or below the mean

positive when x is above the mean and negative when it is below the mean

Normal Distribution and the Empirical Rule

Study These Flashcards

graph of ND - a continuous curve that “describes” the shape of the distribution for very large samples; normal curve is defined completely in terms of its mean and standard deviation

Standard Normal Deviation

Study These Flashcards

X is a variable that has a normal distribution

mean µ

standard deviation s

Empirical Rule v. Chebyshev’s Rule

Study These Flashcards

empirical - 68-95-99.7 rule; states that approximately 68% of the terms in a normal distribution are within one standard deviation of the mean, 95% are within two , and 99.7% within three

chebyshev’s - k is number of standard deviations; for k > 1, at least (1 - k^(-2))% of the data lies within k standard deviations of the mean

Scatterplot

two-dimensional graph of ordered pairs that focuses on bivariate (two variable) problems one variable (explanatory variable) on the horizontal axis and the other (response variable) on the vertical _associations_ - positive if one of them increases as the other increases; negative if one of them decreases as the other increases

Correlation: Coefficient *r*

gives us information about *strength* of the linear relationship between two variables (how well a line fits the data) as well as the *direction* of the linear relationship (whether the variables are positively \* r \> 0 \*or negatively associated \* r \< 0 \*) correlation is not causation

Lines of Best Fit and Least-Squares Regression Line (LSRL)

_regression line, line of best fit, LSRL_ - a line that can be used for predicting response values from explanatory values; a line that minimizes the sum of squared errors

Residuals

y – yˆ (the actual value – the predicted value) a positive residual means that the prediction was too small and a negative residual means that the prediction was too large That is, Σ (y−yˆ)² is small when the linear fit is good and large when it is not

Residuals: Interpolation v. Extrapolation

_inter_. - if we are trying to predict a value of y from an x-value within the range of x-values _extra_. - if we are predicting from a value of x outside of the x-values \* generally not used

Coefficient of Determination

Outliers and Influential Observation

outlier - lies outside of the general pattern of the data influential observation - often an outlier in the x-direction

Basic Terminology Flashcards

(31 cards)