Exploring Data Flashcards by Deleted Deleted

Descriptive Methods

Ways that data are organized and summarized

Categorical/Discrete Variable

A variable that can take on only a limited, countable set of distinct values

Continuous Variable

A variable that can take on any value within an interval

Univariate Data

Data consisting of measurements on a single variable

Bivariate Data

Data consisting of measurements on two variables for each individual

Frequency (f)

The number of times an observation occurs

Relative Frequency (rf)

The ratio of a frequency to the total number of observations (n) (rf = f/n)

Cumulative Frequency (cf)

Gives the number of observations less than or equal to a specified value

Frequency Distribution Table

A table giving all possible values of a variable and their frequencies
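
As a minimal sketch of how frequency (f), relative frequency (rf = f/n), and cumulative frequency (cf) fit together in one table, the following builds all three columns from raw observations. The function name `frequency_table` and the sibling counts are hypothetical, chosen only for illustration.

```python
from collections import Counter

def frequency_table(observations):
    """Return rows of (value, f, rf, cf), sorted by value."""
    counts = Counter(observations)
    n = len(observations)
    rows = []
    cumulative = 0
    for value in sorted(counts):
        f = counts[value]          # frequency of this value
        cumulative += f            # observations <= this value
        rows.append((value, f, f / n, cumulative))
    return rows

# Hypothetical data: number of siblings reported by 10 students.
siblings = [0, 1, 1, 2, 2, 2, 3, 1, 0, 2]
for value, f, rf, cf in frequency_table(siblings):
    print(f"{value}: f={f}, rf={rf:.2f}, cf={cf}")
```

The relative frequencies sum to 1, and the last cumulative frequency equals n.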

Center

Describes the point around which the data points are spread

Spread

Describes how the data points are spread (Broadness/Narrowness of the Distribution)

Shape

Describes what the distribution looks like

Symmetric Distribution

The left half of the distribution looks the same as the right half

Left-Skewed Distribution

The left tail of the distribution extends further from the center than the right tail

Right-Skewed Distribution

The right tail of the distribution extends further from the center than the left tail

Clusters/Gaps

Describe whether the data contain gaps or tend to cluster around one or more values in the distribution

Outliers

An observation that is surprisingly different from the rest of the data

Population

The entire group of individuals or things that we are interested in

Sample

The part of the population that is being studied

Mean (mu or X bar)

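The arithmetic average of all values in a data set: their sum divided by the number of values. Written mu for a population and X bar for a sample (Is affected by outliers)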

Median (Q2 or M)

The point that divides the ordered measurements in half (Resistant to outliers)
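
To illustrate why the mean is affected by outliers while the median is not, here is a small sketch using Python's standard `statistics` module. The salary figures are hypothetical.

```python
from statistics import mean, median

# Hypothetical salaries in thousands of dollars.
salaries = [30, 32, 35, 36, 40]
with_outlier = salaries + [400]   # add one extreme value

print("mean:  ", mean(salaries), "->", mean(with_outlier))
print("median:", median(salaries), "->", median(with_outlier))
```

Adding the single extreme value pulls the mean far to the right, while the median barely moves.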

Range

The difference between the largest and smallest measurements in a data set (Is affected by outliers)

Interquartile Range (IQR)

The range of the middle 50% of the data (IQR = Q3 - Q1). It is used along with the median when describing distributions
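
The range and IQR can be compared with a short sketch. It assumes the "median of each half" convention for quartiles (the overall median is left out of both halves when n is odd); other textbooks and software use slightly different rules, so results may differ by convention. The data and the helper name `quartiles` are hypothetical.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves convention."""
    s = sorted(data)
    n = len(s)
    lower = s[: n // 2]           # values below the median position
    upper = s[(n + 1) // 2 :]     # values above the median position
    return median(lower), median(s), median(upper)

data = [2, 4, 4, 5, 7, 8, 9, 11, 30]   # 30 is an unusually large value
q1, q2, q3 = quartiles(data)
iqr = q3 - q1
data_range = max(data) - min(data)
print(f"Q1={q1}, Q2={q2}, Q3={q3}, IQR={iqr}, range={data_range}")
```

Here the single large value inflates the range but leaves the IQR unchanged, which is why the IQR is paired with the median.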

Standard Deviation (Sigma or S)

Measures the typical distance of the data points from the mean (Is also affected by outliers). It is used with the mean when describing distributions

Variance (Sigma Squared or S Squared)

The square of the standard deviation
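
As a rough illustration of the relation between standard deviation and variance, the sketch below uses the standard `statistics` module. It assumes the usual distinction between population versions (sigma, dividing by n) and sample versions (S, dividing by n - 1); the data are hypothetical.

```python
from statistics import pstdev, pvariance, stdev, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical measurements, mean = 5

# Population versions (sigma and sigma squared): divide by n.
print("population:", pstdev(data), pvariance(data))

# Sample versions (S and S squared): divide by n - 1.
print("sample:    ", stdev(data), variance(data))
```

In both cases the variance is the square of the corresponding standard deviation.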

Quartiles

Divide a data set into four equal parts (Q1, Q2, Q3)

Percentiles

Divide a set of values into 100 equal parts

Standardized Scores (z-scores)

Tell how many standard deviations away from the mean a specific data point is (z = (measurement - mean) / standard deviation)
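
A z-score is the measurement minus the mean, all divided by the standard deviation. The sketch below standardizes a small hypothetical data set, assuming the population standard deviation is the intended divisor; `z_scores` is an illustrative helper name.

```python
from statistics import mean, pstdev

def z_scores(data):
    """Standardize each value: (value - mean) / standard deviation."""
    m = mean(data)
    s = pstdev(data)   # assumption: population standard deviation
    return [(x - m) / s for x in data]

data = [2, 4, 4, 4, 5, 5, 7, 9]   # mean 5, standard deviation 2
print(z_scores(data))
```

A value equal to the mean gets z = 0, and negative z-scores mark values below the mean.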

Pearson's Correlation Coefficient

A numeric measure of the degree and direction of the linear relation between two quantitative variables
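
One common way to compute Pearson's correlation coefficient is r = Sxy / sqrt(Sxx * Syy), where each S term is a sum of products of deviations from the means. A minimal sketch, with hypothetical data and a hypothetical helper name `pearson_r`:

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r from sums of squared and cross deviations."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(round(pearson_r(xs, ys), 4))
```

The result always lies between -1 and 1; the sign gives the direction of the linear relation and the magnitude gives its strength.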

Linear Regression Model/Equation

An equation that gives a straight line relationship between two variables (Y = Beta 0 + Beta 1(X) + e)

Least Squares Regression Line

The line that minimizes the sum of the squared residuals (the error sum of squares)
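
For a single predictor, the least squares estimates have a closed form: slope = Sxy / Sxx and intercept = (mean of y) - slope * (mean of x). The sketch below applies these formulas to hypothetical data; `least_squares_line` is an illustrative name.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least squares regression line."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx   # the line passes through (mx, my)
    return intercept, slope

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1 = least_squares_line(xs, ys)
print(f"predicted y = {b0:.2f} + {b1:.2f} x")
```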

Coefficient of Determination

Measures the proportion of the variation in the y values that is explained by their linear relation with the x values (r squared)

Influential Observation

An observation that strongly affects a statistic

Residual Plot

A plot of residuals versus the predicted values of y (used to assess the fit of a model)
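
As a sketch of what goes into a residual plot, the code below fits a least squares line to hypothetical data, lists each predicted value with its residual (observed minus predicted), and computes the coefficient of determination as 1 - SSE/SST. Drawing the actual plot is left out; the (predicted, residual) pairs are the points it would show.

```python
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Fit the least squares line from the closed-form formulas.
n = len(xs)
mx = sum(xs) / n
my = sum(ys) / n
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
    (x - mx) ** 2 for x in xs
)
intercept = my - slope * mx

predicted = [intercept + slope * x for x in xs]
residuals = [y - p for y, p in zip(ys, predicted)]

sse = sum(e ** 2 for e in residuals)        # unexplained variation
sst = sum((y - my) ** 2 for y in ys)        # total variation in y
r_squared = 1 - sse / sst

for p, e in zip(predicted, residuals):
    print(f"predicted={p:.1f}, residual={e:+.1f}")
print(f"r squared = {r_squared:.2f}")
```

Residuals from a least squares line sum to zero, and a plot of them with no visible pattern suggests that a straight line is a reasonable model.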

Transformation

Used to achieve linearity

Log Transformation

Used to linearize the regression model when the relationship between Y and X suggests a model with a consistently increasing slope (Z = ln(Y))
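
To see why Z = ln(Y) can linearize a curve whose slope keeps increasing, consider hypothetical data that doubles at every step. After the log transformation, the values rise by the same amount each time, which is what a straight line does.

```python
import math

xs = [0, 1, 2, 3, 4]
ys = [5 * 2 ** x for x in xs]        # hypothetical exponential growth
zs = [math.log(y) for y in ys]       # Z = ln(Y)

# Step sizes before and after the transformation.
raw_steps = [ys[i + 1] - ys[i] for i in range(len(ys) - 1)]
log_steps = [zs[i + 1] - zs[i] for i in range(len(zs) - 1)]
print("steps in Y:    ", raw_steps)
print("steps in ln(Y):", [round(s, 4) for s in log_steps])
```

The steps in Y keep growing, while every step in ln(Y) equals ln(2).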

Square Root Transformation

Used when the spread of observations increases with the mean (Z = Square Root of Y = Y^(1/2))

Reciprocal Transformation

Used to minimize the effect of large values of X (Z = 1/Y)

Square Transformation

Used when the slope of the relation consistently decreases as the independent variable increases (Z = Y^2)

Power Transformation

Used if the relation between dependent and independent variables is modeled by Y = aX^b. Taking the natural log of both variables gives the linear form ln(Y) = ln(a) + b ln(X)
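
If Y = aX^b, then ln(Y) = ln(a) + b ln(X), so a straight-line fit on the logged variables recovers b as the slope and ln(a) as the intercept. The sketch below checks this on hypothetical, noise-free data generated with a = 3 and b = 2; `fit_line` is an illustrative helper.

```python
import math

def fit_line(xs, ys):
    """Return (intercept, slope) of the least squares line."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return my - slope * mx, slope

xs = [1, 2, 3, 4, 5]
ys = [3 * x ** 2 for x in xs]            # hypothetical data: Y = 3 X^2

log_x = [math.log(x) for x in xs]
log_y = [math.log(y) for y in ys]
intercept, b = fit_line(log_x, log_y)    # fit on the transformed scale
a = math.exp(intercept)                  # undo the log on the intercept
print(f"a = {a:.3f}, b = {b:.3f}")
```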

Conditional Relative Frequency

The relative frequency of one category given that the other category has occurred
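
A conditional relative frequency divides one cell of a two-way table by the total of the row (or column) being conditioned on. A minimal sketch with a hypothetical table of class year versus whether a student owns a car:

```python
# Hypothetical two-way table: rows are class year, columns are car ownership.
table = {
    "freshman": {"yes": 30, "no": 20},
    "senior": {"yes": 10, "no": 40},
}

def conditional_relative_frequency(table, row, column):
    """Relative frequency of `column` given that `row` has occurred."""
    row_total = sum(table[row].values())
    return table[row][column] / row_total

for year in table:
    rf = conditional_relative_frequency(table, year, "yes")
    print(f"rf(yes | {year}) = {rf:.2f}")
```

When the conditional relative frequencies differ noticeably between rows, as they do here, that suggests an association between the two categorical variables.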

Association

Measures the degree of relation between two categorical variables

Exploring Data Flashcards

(42 cards)