Exploring Data Flashcards

1
Q

Descriptive Methods

A

Ways that data are organized and summarized

2
Q

Categorical/Discrete Variable

A

A variable that can take on only a fixed, countable set of values

3
Q

Continuous Variable

A

A variable that can take on any value within an interval

4
Q

Univariate Data

A

Data in which a single variable is measured on each individual

5
Q

Bivariate Data

A

Data in which two variables are measured on each individual

6
Q

Frequency (f)

A

The number of times an observation occurs

7
Q

Relative Frequency (rf)

A

The ratio of a frequency to the total number of observations (n) (rf = f/n)

8
Q

Cumulative Frequency (cf)

A

Gives the number of observations less than or equal to a specified value

9
Q

Frequency Distribution Table

A

A table giving all possible values of a variable and their frequencies
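
As a rough illustration of the frequency cards above, the sketch below builds a small frequency distribution table with f, rf, and cf columns. The data values are hypothetical and chosen only for the example.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical sample of a discrete variable (e.g. number of siblings).
data = [0, 1, 1, 2, 2, 2, 3, 1, 0, 2]

n = len(data)                        # total number of observations
counts = Counter(data)
values = sorted(counts)              # distinct values, ascending
f = [counts[v] for v in values]      # frequency of each value
rf = [fi / n for fi in f]            # relative frequency: rf = f / n
cf = list(accumulate(f))             # cumulative frequency: observations <= value

print("value   f    rf   cf")
for v, fi, rfi, cfi in zip(values, f, rf, cf):
    print(f"{v:>5} {fi:>3} {rfi:>5.2f} {cfi:>4}")
```

The last cumulative frequency equals n, and the relative frequencies sum to 1.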

10
Q

Center

A

Describes the point around which the data points are spread

11
Q

Spread

A

Describes how the data points are spread (Broadness/Narrowness of the Distribution)

12
Q

Shape

A

Describes what the distribution looks like

13
Q

Symmetric Distribution

A

The left half of the distribution looks the same as the right half

14
Q

Left-Skewed Distribution

A

The left tail of the distribution extends farther from the center than the right tail

15
Q

Right-Skewed Distribution

A

The right tail of the distribution extends farther from the center than the left tail

16
Q

Clusters/Gaps

A

Describes whether there are gaps in the data and whether the data tend to cluster around particular points in the distribution

17
Q

Outliers

A

An observation that is surprisingly different from the rest of the data

18
Q

Population

A

The entire group of individuals or things that we are interested in

19
Q

Sample

A

The part of the population that is being studied

20
Q

Mean (mu or X bar)

A

The average of all data in a given set (Is affected by outliers)

21
Q

Median (Q2 or M)

A

The point that divides the ordered measurements in half (Resistant to outliers)
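
To see why the mean is affected by outliers while the median barely moves, here is a minimal sketch using Python's standard `statistics` module. The data and the outlier value 100 are hypothetical.

```python
import statistics

# Hypothetical measurements; 100 is appended as an outlier.
data = [2, 3, 4, 5, 6]
with_outlier = data + [100]

mean_before = statistics.mean(data)
median_before = statistics.median(data)
mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

print("mean:  ", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
```

With these values the mean jumps from 4 to 20, while the median only shifts from 4 to 4.5.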

22
Q

Range

A

The difference between the largest and smallest measurements in a data set (Is affected by outliers)

23
Q

Interquartile Range (IQR)

A

The range of the middle 50% of the data (IQR = Q3 - Q1); used along with the median when describing distributions

24
Q

Standard Deviation (Sigma or S)

A

Measures the typical distance of the data points from the mean (Is also affected by outliers); used along with the mean when describing distributions

25
Q

Variance (Sigma Squared or S Squared)

A

The square of the standard deviation
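
A small sketch of the standard deviation and variance cards, assuming the usual convention that sigma refers to a population (divide by n) and S to a sample (divide by n - 1). The data are hypothetical.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data with mean 5

# Population versions (sigma squared, sigma): divide by n.
pop_var = statistics.pvariance(data)
pop_sd = statistics.pstdev(data)

# Sample versions (S squared, S): divide by n - 1.
samp_var = statistics.variance(data)
samp_sd = statistics.stdev(data)

print(pop_var, pop_sd)
print(samp_var, samp_sd)
```

In both cases the variance is the square of the corresponding standard deviation.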

26
Q

Quartiles

A

Divide a data set into four equal parts (Q1, Q2, Q3)
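
The quartiles, the IQR, and the range can be computed together, as sketched below. Quartile conventions differ between textbooks and tools, so the cut points from `statistics.quantiles` (which uses its "exclusive" method by default) are one reasonable choice, not the only one. The data are hypothetical.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7]  # hypothetical measurements

# n=4 asks for the three cut points that split the data into four parts.
q1, q2, q3 = statistics.quantiles(data, n=4)

iqr = q3 - q1                        # IQR = Q3 - Q1
data_range = max(data) - min(data)   # largest minus smallest

print(q1, q2, q3, iqr, data_range)
```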

27
Q

Percentiles

A

Divide a set of values into 100 equal parts

28
Q

Standardized Scores (z-scores)

A

Tell how many standard deviations a specific data point lies from the mean (z = (measurement - mean) / standard deviation)
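
A minimal sketch of the z-score calculation: subtract the mean first, then divide by the standard deviation. The helper name `z_score` and the data are hypothetical, and the population standard deviation is used here as an assumption.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data
mean = statistics.mean(data)
sd = statistics.pstdev(data)      # assumption: population standard deviation

def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

z = z_score(9, mean, sd)
print(z)
```

A positive z-score means the point is above the mean, a negative one means it is below.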

29
Q

Pearson’s Correlation Coefficient

A

A numeric measure of the degree and direction of the linear relation between two quantitative variables
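
One common way to compute the correlation coefficient is from the sums of squared and cross deviations, as sketched below. The function name `pearson_r` and the data are hypothetical.

```python
import math

x = [1, 2, 3, 4, 5]   # hypothetical paired measurements
y = [2, 4, 5, 4, 5]

def pearson_r(x, y):
    """Pearson's r = Sxy / sqrt(Sxx * Syy)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(x, y)
print(round(r, 4))
```

The result always lies between -1 and 1; the sign gives the direction and the magnitude gives the strength of the linear relation.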

30
Q

Linear Regression Model/Equation

A

An equation that gives a straight line relationship between two variables (Y = Beta 0 + Beta 1(X) + e)

31
Q

Least Squares Regression Line

A

The line that minimizes the sum of the squared residuals (the error sum of squares)
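
For a single predictor, the least squares slope and intercept have closed-form expressions, sketched below. The function name `least_squares` and the data are hypothetical.

```python
x = [1, 2, 3, 4, 5]   # hypothetical paired measurements
y = [2, 4, 5, 4, 5]

def least_squares(x, y):
    """Return (b0, b1) for the fitted line y_hat = b0 + b1 * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    b1 = sxy / sxx                # slope
    b0 = mean_y - b1 * mean_x     # intercept
    return b0, b1

def sse(b0, b1):
    """Error sum of squares for the line b0 + b1 * x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

b0, b1 = least_squares(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print(b0, b1, sse(b0, b1))
```

Any other line, for example one with a slightly different slope, gives a larger error sum of squares on the same data.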

32
Q

Coefficient of Determination

A

Measures the percent of the variation in y that is explained by its linear relation with x (r^2)
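
A sketch of the coefficient of determination as 1 - SSE/SST, where SSE is the error sum of squares around the fitted line and SST is the total sum of squares around the mean of y. The data and variable names are hypothetical.

```python
x = [1, 2, 3, 4, 5]   # hypothetical paired measurements
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# Least squares fit of y on x.
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
sxx = sum((a - mean_x) ** 2 for a in x)
b1 = sxy / sxx
b0 = mean_y - b1 * mean_x

sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # unexplained
sst = sum((yi - mean_y) ** 2 for yi in y)                      # total

r_squared = 1 - sse / sst
print(round(r_squared, 4))
```

For simple linear regression this equals the square of Pearson's correlation coefficient.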

33
Q

Influential Observation

A

An observation that strongly affects a statistic

34
Q

Residual Plot

A

A plot of residuals versus the predicted values of y (used to assess the fit of a model)

35
Q

Transformation

A

Used to achieve linearity

36
Q

Log Transformation

A

Used to linearize the regression model when the relationship between Y and X suggests a model with a consistently increasing slope (Z = ln(Y))

37
Q

Square Root Transformation

A

Used when the spread of observations increases with the mean (Z = Square Root of Y = Y^(1/2))

38
Q

Reciprocal Transformation

A

Used to minimize the effect of large values of X (Z = 1/Y = Y^(-1))

39
Q

Square Transformation

A

Used when the slope of the relation consistently decreases as the independent variable increases (Z = Y^2)

40
Q

Power Transformation

A

Used if the relation between dependent and independent variables is modeled by Y = aX^b; taking logs of both variables gives the linear form ln(Y) = ln(a) + b ln(X)
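
The sketch below illustrates the power transformation: data generated from Y = aX^b become a straight line after taking ln of both variables, so a least squares fit on the logged values recovers a and b. The values a = 2 and b = 1.5 and all names are hypothetical.

```python
import math

# Hypothetical noise-free data following Y = a * X^b.
a_true, b_true = 2.0, 1.5
x = [1, 2, 3, 4, 5]
y = [a_true * xi ** b_true for xi in x]

# Transform both variables: ln(Y) = ln(a) + b * ln(X).
log_x = [math.log(v) for v in x]
log_y = [math.log(v) for v in y]

# Least squares line through the transformed points.
n = len(x)
mean_lx, mean_ly = sum(log_x) / n, sum(log_y) / n
sxy = sum((u - mean_lx) * (v - mean_ly) for u, v in zip(log_x, log_y))
sxx = sum((u - mean_lx) ** 2 for u in log_x)

b_hat = sxy / sxx                              # slope estimates b
a_hat = math.exp(mean_ly - b_hat * mean_lx)    # exp(intercept) estimates a

print(round(a_hat, 4), round(b_hat, 4))
```

With real, noisy data the recovered values would only approximate a and b.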

41
Q

Conditional Relative Frequency

A

The relative frequency of one category given that the other category has occurred
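
A conditional relative frequency can be read off a two-way table by dividing a cell count by the total of the row being conditioned on. The table, its category labels, and the function name `conditional_rf` below are hypothetical.

```python
# Hypothetical two-way table: rows are class year, columns are survey answer.
table = {
    "first_year": {"yes": 30, "no": 20},
    "senior": {"yes": 45, "no": 5},
}

def conditional_rf(table, given_row, column):
    """Relative frequency of `column` among observations in `given_row`."""
    row_total = sum(table[given_row].values())
    return table[given_row][column] / row_total

rf_yes_first_year = conditional_rf(table, "first_year", "yes")
rf_yes_senior = conditional_rf(table, "senior", "yes")

print(rf_yes_first_year, rf_yes_senior)
```

If the conditional relative frequencies differ noticeably between rows, as they do here, that suggests an association between the two categorical variables.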

42
Q

Association

A

Measures the degree of relation between two categorical variables