Examining numerical data Flashcards

1
Q

Scatterplot

A

A graphical representation of data for two numerical variables, where each point represents a single case. It helps visualize relationships between variables.

2
Q

Dot Plot

A

A simple graphical display of a single numerical variable where each data point is represented by a dot, often stacked to show frequency.

3
Q

Mean

A

The mean, or average, is a measure of the center of a data distribution. It is calculated by summing all observations and dividing by the number of observations. The sample mean is denoted x̄.
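
As a small illustration of the calculation described on this card, the sketch below computes a sample mean by hand and checks it against Python's standard library. The five observations are made up for the example.

```python
from statistics import mean

# Hypothetical sample of five observations (not from the card).
observations = [2, 4, 4, 5, 10]

# Sum all observations and divide by the number of observations.
x_bar = sum(observations) / len(observations)

# The standard library's mean() computes the same quantity.
assert x_bar == mean(observations)
print(x_bar)
```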

4
Q

Histogram

A

A histogram is a graphical representation of data where observations are grouped into bins, and the frequency of observations in each bin is represented by the height of a bar. It provides an overview of the distribution of numerical data, especially useful for large datasets.
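
The binning step can be sketched in a few lines of Python. This is only an illustration: the helper name `histogram_counts`, the bin width of 5, and the data are assumptions made for the example, and each bin is taken to include its left edge and exclude its right edge.

```python
from collections import Counter

def histogram_counts(values, bin_width, start=0):
    """Count observations per bin; each bin is [left, left + bin_width)."""
    counts = Counter()
    for v in values:
        # Left edge of the bin that contains v.
        left = start + ((v - start) // bin_width) * bin_width
        counts[left] += 1
    return dict(sorted(counts.items()))

# Hypothetical observations (not from the card).
values = [1, 2, 2, 3, 7, 8, 12]
counts = histogram_counts(values, bin_width=5)

# Text version of the histogram: one '#' per observation in the bin.
for left, n in counts.items():
    print(f"[{left:>2}, {left + 5:>2}): " + "#" * n)
```

The longest row of `#` marks the bin where the data are most concentrated.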

5
Q

Data Density

A

The concentration of data in different regions of a histogram, where higher bars indicate where data points are more common.

5
Q

Right Skewed

A

A data distribution with a longer tail on the right side, meaning most values are concentrated on the left.

6
Q

Left Skewed

A

A data distribution with a longer tail on the left side, meaning most values are concentrated on the right.

7
Q

Symmetric

A

A data distribution where the left and right sides are approximately mirror images, with no long tail on either side.

8
Q

Mode

A

A prominent peak in a distribution, representing the most frequent value or range of values in a data set.

9
Q

Unimodal

A

A distribution with a single prominent peak.

10
Q

Bimodal

A

A distribution with two prominent peaks.

11
Q

Multimodal

A

A distribution with more than two prominent peaks.

12
Q

Deviation

A

The distance of an observation from the mean.

13
Q

Standard Deviation

A

A measure of variability that describes how far the typical observation is from the mean, calculated as the square root of the variance.
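
To make the chain from deviations to variance to standard deviation concrete, here is a minimal sketch. It assumes the sample convention of dividing by n - 1, which the card does not state, and the data are made up.

```python
import math
from statistics import stdev

# Hypothetical sample (not from the card).
observations = [2, 4, 4, 5, 10]

x_bar = sum(observations) / len(observations)

# Deviation: distance of each observation from the mean (signed).
deviations = [x - x_bar for x in observations]

# Sample variance: sum of squared deviations divided by n - 1 (assumed convention).
sample_variance = sum(d ** 2 for d in deviations) / (len(observations) - 1)

# Standard deviation: square root of the variance.
sample_sd = math.sqrt(sample_variance)

# statistics.stdev uses the same n - 1 convention.
assert math.isclose(sample_sd, stdev(observations))
print(deviations, sample_variance, sample_sd)
```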

14
Q

Box Plot

A

A graphical summary of a data set based on five statistics (minimum, first quartile, median, third quartile, and maximum), with unusual observations plotted as individual points.

15
Q

Median

A

The median is the value that splits the data in half, with 50% of the observations falling below and 50% falling above it. If the number of observations is even, the median is the average of the two middle values. If the number of observations is odd, the median is the middle value itself.
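
The odd and even cases can be sketched as follows. The helper name `median_by_hand` and both samples are made up for the example; the result is checked against `statistics.median`.

```python
from statistics import median

def median_by_hand(values):
    """Middle value of the sorted data, or the average of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single middle value.
        return ordered[mid]
    # Even count: average of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2

# Hypothetical samples (not from the card).
odd_sample = [7, 1, 5]
even_sample = [7, 1, 5, 3]

odd_median = median_by_hand(odd_sample)
even_median = median_by_hand(even_sample)

assert odd_median == median(odd_sample)
assert even_median == median(even_sample)
print(odd_median, even_median)
```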

16
Q

Interquartile Range (IQR)

A

The interquartile range (IQR) is the distance between the first quartile (Q1) and the third quartile (Q3), that is, IQR = Q3 − Q1. It spans the middle 50% of the data and measures the variability in the central portion of the data set.

17
Q

First Quartile (Q1)

A

The first quartile (Q1) is the 25th percentile of the data, meaning that 25% of the data points fall below this value.

18
Q

Third Quartile (Q3)

A

The third quartile (Q3) is the 75th percentile of the data, meaning that 75% of the data points fall below this value.
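
A short sketch of computing the quartiles and the interquartile range in Python. Several conventions exist for interpolating quartiles, and the `"inclusive"` method used here is just one assumed choice; other software may give slightly different values. The data are made up.

```python
from statistics import quantiles

# Hypothetical sample of nine observations (not from the cards).
data = [2, 4, 6, 8, 10, 12, 14, 16, 18]

# n=4 splits the data into quarters and returns the three cut points.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")

# Interquartile range: distance between the third and first quartiles.
iqr = q3 - q1
print(q1, q2, q3, iqr)
```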

19
Q

Whiskers

A

Whiskers extend out from the box and attempt to capture the data outside it, but their reach is limited to 1.5 × IQR beyond the first and third quartiles. Each whisker ends at the most extreme data point within that limit. They help show the spread of the data, and any data points beyond the whiskers are considered outliers.

20
Q

Outliers

A

Outliers are data points that lie beyond the whiskers of a box plot, meaning they are unusually distant from the rest of the data. They are often identified as points that fall more than 1.5 × IQR below the first quartile or above the third quartile.
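
The 1.5 × IQR rule for whiskers and outliers can be sketched as below. The function name `box_plot_summary`, the data, and the `"inclusive"` quartile method are assumptions for the example; plotting libraries may compute quartiles slightly differently.

```python
from statistics import quantiles

def box_plot_summary(values):
    """Whisker ends and outliers under the 1.5 x IQR rule."""
    ordered = sorted(values)
    q1, med, q3 = quantiles(ordered, n=4, method="inclusive")
    iqr = q3 - q1
    # Farthest points the whiskers are allowed to reach.
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr
    inside = [v for v in ordered if lower_limit <= v <= upper_limit]
    outliers = [v for v in ordered if v < lower_limit or v > upper_limit]
    return {
        "q1": q1, "median": med, "q3": q3, "iqr": iqr,
        # Each whisker stops at the most extreme point within its limit.
        "whiskers": (inside[0], inside[-1]),
        "outliers": outliers,
    }

# Hypothetical sample with one unusually large value (not from the cards).
summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 30])
print(summary)
```

Here the upper whisker stops at 8, the largest observation within the limit, rather than at the limit itself.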

21
Q

Robust Statistics

A

Robust statistics, such as the median and interquartile range (IQR), are resistant to the influence of extreme observations. These statistics are less affected by outliers or unusual data points, making them more stable in the presence of extreme values compared to the mean and standard deviation.
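
A quick way to see this resistance is to change one observation into an extreme value and compare how each statistic reacts. The two samples below are made up for the illustration.

```python
from statistics import mean, median

# Hypothetical sample, and the same sample with its largest value made extreme.
original = [1, 2, 3, 4, 5]
with_extreme = [1, 2, 3, 4, 500]

mean_shift = mean(with_extreme) - mean(original)
median_shift = median(with_extreme) - median(original)

# The mean moves a great deal; the median does not move at all.
print(mean_shift, median_shift)
```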

22
Q

Transformation

A

A transformation is a rescaling of data using a function, such as taking the logarithm, square root, or inverse of a variable. This technique is particularly useful for strongly skewed data: it can reduce the influence of extreme values, make the distribution more symmetric, and straighten nonlinear relationships in a scatterplot. These changes can reveal patterns that were hard to see in the original scale and make it easier to build statistical models.
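
As a rough illustration, the sketch below applies a base-10 logarithm to a strongly right-skewed sample and compares the mean and median before and after. The data are made up, and a log transformation assumes all values are positive.

```python
import math
from statistics import mean, median

# Hypothetical right-skewed sample (not from the card); all values positive.
raw = [1, 10, 100, 1000, 10000]

# Logarithmic transformation: rescale each value with log10.
logged = [math.log10(x) for x in raw]

# Before: the mean is pulled far above the median by the long right tail.
print(mean(raw), median(raw))

# After: mean and median agree, as expected for a symmetric distribution.
print(mean(logged), median(logged))
```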

23
Q

Intensity Map

A

An intensity map is a geographic visualization where colors represent varying values of a numerical variable across different locations. It is used to display trends and patterns in data, particularly for variables that have spatial characteristics, like poverty rate, unemployment rate, or homeownership rate. Intensity maps help identify geographic trends and generate research hypotheses, though they are not ideal for pinpointing precise values for specific locations.