Examining numerical data Flashcards

Question 1

Q

Scatterplot

Answer

A

A graphical representation of data for two numerical variables, where each point represents a single case. It helps visualize relationships between variables.

Question 2

Q

Dot Plot

Answer

A

A simple graphical display of a single numerical variable where each data point is represented by a dot, often stacked to show frequency.

Question 3

Q

Mean

Answer

A

The mean, or average, is a measure of the center of a data distribution. It is calculated by summing all observations and dividing by the number of observations. The sample mean is denoted x̄.
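
As a quick illustration of the calculation, here is a minimal Python sketch. The function name `mean` and the data in `sample` are made up for this example.

```python
def mean(observations):
    """Sum all observations and divide by the number of observations."""
    if not observations:
        raise ValueError("mean requires at least one observation")
    return sum(observations) / len(observations)


sample = [2, 4, 4, 5, 10]  # hypothetical data
x_bar = mean(sample)       # 25 / 5
print(x_bar)
```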

Question 4

Q

Histogram

Answer

A

A histogram is a graphical representation of data where observations are grouped into bins, and the frequency of observations in each bin is represented by the height of a bar. It provides an overview of the distribution of numerical data, especially useful for large datasets.
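
The binning step can be sketched in Python as follows. The helper `bin_counts`, the bin width, and the data are assumptions for illustration, as is the convention that each bin includes its left edge and excludes its right edge.

```python
def bin_counts(observations, bin_width, start=0):
    """Count observations per bin [left, left + bin_width)."""
    counts = {}
    for x in observations:
        k = int((x - start) // bin_width)   # index of the bin containing x
        left = start + k * bin_width        # left edge of that bin
        counts[left] = counts.get(left, 0) + 1
    return dict(sorted(counts.items()))


values = [1, 2, 2, 3, 7, 8, 12]  # hypothetical data
counts = bin_counts(values, bin_width=5)

# Text version of the bars: one '#' per observation in the bin.
for left, count in counts.items():
    print(f"[{left}, {left + 5}): {'#' * count}")
```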

Question 5

Q

Data Density

Answer

A

The concentration of data in different regions of a histogram, where higher bars indicate where data points are more common.

Question 6

Q

Right Skewed

Answer

A

A data distribution with a longer tail on the right side, meaning most values are concentrated on the left.

Question 7

Q

Left Skewed

Answer

A

A data distribution with a longer tail on the left side, meaning most values are concentrated on the right.

Question 8

Q

Symmetric

Answer

A

A data distribution where the left and right sides are approximately mirror images, with no long tail on either side.

Question 9

Q

Mode

Answer

A

A prominent peak in a distribution, representing the most frequent value or range of values in a data set.

Question 10

Q

Unimodal

Answer

A

A distribution with a single prominent peak.

Question 11

Q

Bimodal

Answer

A

A distribution with two prominent peaks.

Question 12

Q

Multimodal

Answer

A

A distribution with more than two prominent peaks.

Question 13

Q

Deviation

Answer

A

The distance of an observation from the mean, computed as the observation minus the mean.

Question 14

Q

Standard Deviation

Answer

A

A measure of variability that describes how far the typical observation is from the mean, calculated as the square root of the variance.
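
A minimal Python sketch of this calculation follows. It assumes the sample variance, which divides the sum of squared deviations by n - 1; the card itself does not state which divisor is meant. The function name and data are hypothetical.

```python
import math


def sample_std(observations):
    """Square root of the sample variance (divisor n - 1, an assumption)."""
    n = len(observations)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = sum(observations) / n
    deviations = [x - x_bar for x in observations]       # distance from the mean
    variance = sum(d ** 2 for d in deviations) / (n - 1)
    return math.sqrt(variance)


sample = [2, 4, 4, 5, 10]  # hypothetical data, mean 5
s = sample_std(sample)     # squared deviations sum to 36, variance is 9
print(s)
```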

Question 15

Q

Box Plot

Answer

A

A graphical summary of a data set using five statistics (minimum, first quartile, median, third quartile, and maximum) while also plotting unusual observations.

Question 16

Q

Median

Answer

A

The median is the value that splits the data in half, with 50% of the observations falling below and 50% falling above it. If the number of observations is even, the median is the average of the two middle values. If the number of observations is odd, the median is the middle value itself.
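
The odd and even cases can be sketched in Python like this. The function name and the example data are made up for illustration.

```python
def median(observations):
    """Middle value of the sorted data, or the average of the two middle values."""
    ordered = sorted(observations)
    n = len(ordered)
    if n == 0:
        raise ValueError("median requires at least one observation")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd count: the middle value
    return (ordered[mid - 1] + ordered[mid]) / 2   # even count: average of two


odd_case = median([5, 1, 3])      # sorted: 1, 3, 5
even_case = median([4, 1, 3, 2])  # sorted: 1, 2, 3, 4
print(odd_case, even_case)
```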

Question 17

Q

Interquartile Range (IQR)

Answer

A

The interquartile range (IQR) is the distance between the first quartile (Q1) and the third quartile (Q3), that is, IQR = Q3 − Q1. It spans the middle 50% of the data and measures the variability in the central portion of the data set.
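
A short Python sketch of the calculation, using the standard library's `statistics.quantiles`. Several conventions exist for computing quartiles; the `"inclusive"` method here is an assumption, and other methods or textbooks may give slightly different values. The data are hypothetical.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical data

# n=4 returns the three cut points that split the data into quarters.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

print(q1, q3, iqr)
```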

Question 18

Q

First Quartile (Q1)

Answer

A

The first quartile (Q1) is the 25th percentile of the data, meaning that 25% of the data points fall below this value.

Question 19

Q

Third Quartile (Q3)

Answer

A

The third quartile (Q3) is the 75th percentile of the data, meaning that 75% of the data points fall below this value.

Question 20

Q

Whiskers

Answer

A

Whiskers extend out from the box of a box plot to capture the data outside the interquartile range (IQR), but each reaches no farther than 1.5 × IQR beyond its quartile and ends at the most extreme observation within that limit. They help show the spread of the data; any data points beyond the whiskers are plotted individually and considered outliers.

Question 21

Q

Outliers

Answer

A

Outliers are data points that lie beyond the whiskers of a box plot, meaning they are unusually distant from the rest of the data. They are commonly identified as points more than 1.5 × IQR below the first quartile or more than 1.5 × IQR above the third quartile.
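
The 1.5 × IQR rule can be sketched in Python as follows. The helper `box_plot_limits` and the data are hypothetical, and the quartiles use the `"inclusive"` method of `statistics.quantiles`, which is only one of several conventions.

```python
import statistics


def box_plot_limits(observations):
    """Return the whisker ends and the points flagged by the 1.5 x IQR rule."""
    ordered = sorted(observations)
    q1, _, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    iqr = q3 - q1
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr
    inside = [x for x in ordered if lower_limit <= x <= upper_limit]
    outliers = [x for x in ordered if x < lower_limit or x > upper_limit]
    # Each whisker stops at the most extreme observation inside the limits.
    return {"whiskers": (inside[0], inside[-1]), "outliers": outliers}


result = box_plot_limits([1, 2, 3, 4, 5, 6, 7, 8, 30])  # hypothetical data
print(result)
```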

Question 22

Q

Robust Statistics

Answer

A

Robust statistics, such as the median and interquartile range (IQR), are resistant to the influence of extreme observations. These statistics are less affected by outliers or unusual data points, making them more stable in the presence of extreme values compared to the mean and standard deviation.
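
A small Python sketch of this contrast, using made-up data in which a single value is replaced by an extreme one:

```python
import statistics

original = [1, 2, 3, 4, 5]        # hypothetical data
with_extreme = [1, 2, 3, 4, 100]  # same data, largest value replaced

mean_shift = statistics.mean(with_extreme) - statistics.mean(original)
median_shift = statistics.median(with_extreme) - statistics.median(original)

# The mean moves a long way; the median does not move at all.
print(mean_shift, median_shift)
```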

Question 23

Q

Transformation

Answer

A

A transformation is a rescaling of data using a function, such as taking the logarithm, square root, or inverse of a variable. This technique is particularly useful for adjusting strongly skewed data, reducing the impact of outliers, and making it easier to build statistical models. For example, applying a logarithmic transformation can make data more symmetric and reveal hidden patterns, like relationships in a scatterplot. Transformations help in visualizing data differently, straightening nonlinear relationships, or improving the accuracy of statistical models.
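
As a rough illustration of a log transformation on strongly right-skewed data, here is a minimal Python sketch. The data are made up, and the base-10 logarithm is an arbitrary choice for this example. The gap between mean and median serves only as an informal sign of skew.

```python
import math
import statistics

values = [1, 10, 100, 1000, 10000]        # hypothetical right-skewed data
logged = [math.log10(v) for v in values]  # roughly 0, 1, 2, 3, 4

raw_gap = statistics.mean(values) - statistics.median(values)
log_gap = statistics.mean(logged) - statistics.median(logged)

# The mean sits far above the median before the transformation
# and essentially on top of it afterwards.
print(raw_gap, log_gap)
```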

Question 24

Q

Intensity Map

Answer

A

An intensity map is a geographic visualization where colors represent varying values of a numerical variable across different locations. It is used to display trends and patterns in data, particularly for variables that have spatial characteristics, like poverty rate, unemployment rate, or homeownership rate. Intensity maps help identify geographic trends and generate research hypotheses, though they are not ideal for pinpointing precise values for specific locations.