Data Analysis Flashcards

1
Q

Methods of presenting Data

A

Data can be organized using tables, graphical methods, and numerical methods. A variable represents a characteristic that varies within a population and can be:

Quantitative (numerical): e.g., height, age
Categorical (nonnumerical): e.g., eye color, political preference.

The distribution of a variable describes how frequently different values occur.

Frequency: The count of a specific value in the dataset.

Relative Frequency: The proportion of a value in relation to the total dataset (expressed as a percentage, fraction, or decimal).

Frequency Distributions and Relative Frequency Distributions use tables or graphs to summarize data effectively.

2
Q

Tables

A

Tables help organize and present data clearly. They are often used for frequency distributions (showing how often values occur) and relative frequency distributions (showing the proportion of each value).

A frequency distribution table lists categories or numerical values in one column and their frequencies in another.

A relative frequency table follows the same format but shows proportions (percentages, fractions, or decimals) instead of counts.

If there are many unique values, data can be grouped into ranges to simplify the table.

  1. Understand the Structure of Tables
    Rows and columns organize data clearly. Always check the labels to understand what’s being measured.
    Identify categories (qualitative) vs. numerical values (quantitative).
  2. Frequency vs. Relative Frequency
    Frequency = The number of times a value appears.
    Relative Frequency = Frequency ÷ Total, expressed as a fraction, decimal, or percentage.
    Make sure relative frequencies sum to 1 (or 100%).
  3. Recognize Grouped Data
    If there are too many unique values, data is often grouped into ranges.
    Pay attention to range boundaries (e.g., “71-80” includes both 71 and 80).
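
A minimal sketch of building a frequency and relative frequency table, assuming Python and a made-up sample of eye colors (the function name frequency_table and the data are illustrative, not from the cards):

```python
from collections import Counter

def frequency_table(values):
    """Return {value: (frequency, relative_frequency)} for a list of values."""
    counts = Counter(values)
    total = len(values)
    return {value: (count, count / total) for value, count in sorted(counts.items())}

# Hypothetical sample data: eye colors of 8 people.
eye_colors = ["brown", "blue", "brown", "green", "brown", "blue", "brown", "hazel"]
table = frequency_table(eye_colors)

for value, (freq, rel) in table.items():
    print(value, freq, rel)

# The relative frequencies should add up to 1 (up to floating-point rounding).
total_relative = sum(rel for _, rel in table.values())
print(total_relative)
```

Checking that the relative frequencies sum to 1 is a quick way to catch a miscounted row.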
3
Q

Mean (Average)

A

Mean (Average)

Mean = ∑x / n

Sum up all the numbers (∑x)

Divide by the total number of numbers (n)
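
As a quick illustration, the two steps can be sketched in Python (the function name mean and the sample numbers are chosen here for illustration):

```python
def mean(values):
    """Sum all the numbers, then divide by how many there are."""
    return sum(values) / len(values)

# 4 + 4 + 6 + 7 + 10 = 31, and 31 / 5 = 6.2
average = mean([4, 4, 6, 7, 10])
print(average)
```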

4
Q

Median

A

Median
Step 1: Arrange numbers in order from smallest to largest.

Step 2: If there’s an odd number of values:

Median = middle value

If there’s an even number of values:

Median = (middle value 1 + middle value 2) / 2

Example (odd number of values):
Numbers: 4, 4, 6, 7, 10 → Middle number is 6

Median = 6

Example (even number of values):
Numbers: 1, 2, 3, 4 → Middle numbers are 2 and 3

Median = (2 + 3) / 2 = 2.5
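
The two steps can be sketched in Python roughly as follows (the function name median is illustrative; the sketch assumes a non-empty list of numbers):

```python
def median(values):
    """Sort the values, then take the middle one (or average the two middle ones)."""
    ordered = sorted(values)          # Step 1: smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd count: a single middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # even count: average the two

print(median([4, 4, 6, 7, 10]))  # 6
print(median([1, 2, 3, 4]))      # 2.5
```

Sorting inside the function means the input does not have to be ordered already.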

5
Q

Mode

A

The number that appears most often
If one number appears the most → it’s the mode
If multiple numbers appear the most → multiple modes
If no number repeats → no mode
Example:
Numbers: 1, 2, 3, 3, 4, 4, 4, 5 → Mode = 4 (because it appears the most)
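
A small sketch of the three cases above, assuming Python and a non-empty list (the function name modes is illustrative). It returns a list so that "multiple modes" and "no mode" can both be represented:

```python
from collections import Counter

def modes(values):
    """Return every value tied for the highest count, or [] if nothing repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []                     # no number repeats: no mode
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([1, 2, 3, 3, 4, 4, 4, 5]))  # [4]
print(modes([1, 1, 2, 2, 3]))           # [1, 2]
print(modes([1, 2, 3]))                 # []
```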

6
Q

Quartiles (Splitting into 4 Parts)

A

Quartiles (Splitting into 4 Parts)
Think of quartiles as cutting the list into 4 equal chunks:

Q1 (First Quartile): The number that splits off the first 25% of the data.

Q2 (Second Quartile or Median): The middle of the whole list (50% mark).

Q3 (Third Quartile): The number that splits off 75% of the data.

So if we lined up 16 numbers in order, we’d:

Find the median (Q2)—the middle of the list.
Take the first half of the numbers and find the middle of that → That’s Q1.
Take the second half of the numbers and find the middle of that → That’s Q3.
For example, if we have this list:
2, 4, 4, 5, 7, 7, 7, 7, 7, 7, 8, 8, 9, 9, 9, 9

Q2 (Median) = 7 (average of the two middle numbers, 7 and 7)
Q1 = 6 (middle of first half: average of 5 and 7)
Q3 = 8.5 (middle of second half: average of 8 and 9)
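
The "median of each half" method can be sketched in Python as below. The function names are illustrative, and one assumption is made that the card does not state: for an odd number of values, the overall median is left out of both halves. Other quartile conventions exist and can give slightly different answers.

```python
def median(ordered):
    """Median of an already sorted list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # Assumption: with an odd count, skip the middle value itself.
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return median(lower), median(ordered), median(upper)

data = [2, 4, 4, 5, 7, 7, 7, 7, 7, 7, 8, 8, 9, 9, 9, 9]
q1, q2, q3 = quartiles(data)
print(q1, q2, q3)  # 6.0 7.0 8.5
```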

7
Q

Percentiles (Splitting into 100 Parts)

A

Now, if we want even smaller sections, we use percentiles, which break the list into 100 tiny pieces.

Q1 is the 25th percentile (25% of data falls below it).

Q2 (median) is the 50th percentile (halfway point).

Q3 is the 75th percentile (75% of data falls below it).

Percentiles are useful when dealing with big lists, like test scores or income levels.

8
Q

Range

A

Range – The difference between the biggest and smallest number.

Range = Maximum Value − Minimum Value

9
Q

Interquartile Range (IQR)

A

Interquartile Range (IQR) – The spread of the middle 50% of your data.

This ignores the extreme numbers (outliers) and focuses on the “core” data.

Example: If your numbers are 2, 4, 5, 7, 8, 9, the middle half (quartiles) might go from 4 to 8, so IQR = 8 - 4 = 4.

Formula: IQR = Q3 − Q1

Where:

Q1 (First Quartile) = 25th percentile (middle of the lower half)

Q3 (Third Quartile) = 75th percentile (middle of the upper half)
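
To see why the IQR "ignores" extreme numbers, here is a rough Python sketch comparing it with the plain range (maximum minus minimum). The function names and the second data set with an outlier are made up for illustration; quartiles are taken as the medians of the lower and upper halves, with the middle value skipped when the count is odd.

```python
from statistics import median

def iqr(values):
    """Q3 - Q1, with Q1 and Q3 taken as medians of the lower and upper halves."""
    ordered = sorted(values)
    n = len(ordered)
    lower = ordered[: n // 2]
    upper = ordered[(n + 1) // 2 :]   # skips the middle value when n is odd
    return median(upper) - median(lower)

def value_range(values):
    """Maximum value minus minimum value."""
    return max(values) - min(values)

data = [2, 4, 5, 7, 8, 9]
with_outlier = [2, 4, 5, 7, 8, 100]   # hypothetical: 9 replaced by an extreme value

print(value_range(data), iqr(data))                  # 7 4
print(value_range(with_outlier), iqr(with_outlier))  # 98 4
```

The range jumps from 7 to 98 when the outlier appears, while the IQR stays at 4.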

10
Q

Standard Deviation (σ)

A

Standard Deviation (SD) – Measures how far the numbers typically are from the average.

If numbers are super close to the average, SD is small. If they’re spread out, SD is big.

Formula:

σ = √( ∑(xi − x̄)² / n )

Where:

xi = each data value
x̄ = mean (average) of the data
n = number of data points

Steps:

  1. Find the mean: ∑xi / n
  2. Subtract the mean from each data point
  3. Square each result
  4. Find the average of those squared differences
  5. Take the square root of that average

Example for numbers 0, 7, 8, 10, 10:

Mean = (0 + 7 + 8 + 10 + 10) / 5 = 35 / 5 = 7

Squared differences:

(0−7)², (7−7)², (8−7)², (10−7)², (10−7)² → 49, 0, 1, 9, 9

Average = (49 + 0 + 1 + 9 + 9) / 5 = 68 / 5 = 13.6

Standard deviation = √13.6 ≈ 3.7
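
The five steps translate into Python roughly as follows (the function name population_sd is illustrative). Note that this divides by n, as the card's formula does; the sample standard deviation, which divides by n − 1, would give a different number.

```python
import math

def population_sd(values):
    """Standard deviation following the five steps, dividing by n."""
    n = len(values)
    mean = sum(values) / n                              # step 1
    squared_diffs = [(x - mean) ** 2 for x in values]   # steps 2 and 3
    variance = sum(squared_diffs) / n                   # step 4
    return math.sqrt(variance)                          # step 5

sd = population_sd([0, 7, 8, 10, 10])
print(round(sd, 1))  # 3.7
```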

11
Q

Standard Score (Z-Score)

A

The z-score tells how many standard deviations a value lies above or below the mean.

Z = (x − x̄) / σ

Where:

x = the data value
x̄ = mean
σ = standard deviation

12
Q

Sets

A

A set is a collection of objects (called elements or members) that share a common property.

Example: The set of even digits: {0, 2, 4, 6, 8}.
Types of Sets

Finite Set: A set whose elements can be counted, with the count ending at some whole number. Example: {1, 2, 3, 4}.

Infinite Set: A set with infinitely many elements, so the count never ends. Example: The set of all integers.

Empty Set (∅): A set with no elements. Example: The set of all real numbers greater than 5 and less than 5.

Nonempty Set: Any set that has at least one element.
Subset (⊆): A set A is a subset of set B if all elements of A are also in B. Example: {2, 8} ⊆ {0, 2, 4, 6, 8}.

Universal Set (U): The set containing all elements under discussion.

Notation

Number of Elements in a Set: The number of elements in a set S is denoted as ∣S∣

Example: If S={6.2,−9,π,0.01,0}, then ∣S∣=5.
For the empty set: ∣∅∣=0.
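
For readers who think in code, Python's built-in set type mirrors this notation fairly closely. The sketch below is only an illustration; the variable names are chosen here:

```python
import math

even_digits = {0, 2, 4, 6, 8}
S = {6.2, -9, math.pi, 0.01, 0}
empty = set()                       # note: {} would create a dict, not a set

print(len(S))                       # |S| = 5
print(len(empty))                   # |∅| = 0
print({2, 8} <= even_digits)        # subset test: True
print(len({1, 1, 2}))               # repeated elements count once: 2
```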

13
Q

Lists

A

A list is a collection of objects in a specific order, where repetitions matter.

Example: The lists (1, 2, 3, 2) and (1, 2, 2, 3) are different.

A list and a set are different because in a list, order matters and repetitions are counted.
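
A short Python illustration of the difference, using tuples to stand in for ordered lists (the variable names are illustrative):

```python
list_a = (1, 2, 3, 2)
list_b = (1, 2, 2, 3)

# As ordered lists they differ, because the order of the entries differs.
print(list_a == list_b)             # False

# As sets they are equal: order and repetition are ignored.
print(set(list_a) == set(list_b))   # True

# The list has 4 entries, but only 3 distinct elements.
print(len(list_a), len(set(list_a)))  # 4 3
```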

14
Q

Intersection ( ∩ )

A

The intersection of two sets S and T is the set of elements that are in both S and T.

Formula: S ∩ T = {x ∣ x ∈ S and x ∈ T}

Example:

If S={1,2,3} and T={2,3,4}, then

S∩T={2,3}.

15
Q

Union ( ∪ )

A

The union of two sets S and T is the set of elements that are in either S, T, or both.

Formula:
S ∪ T = {x ∣ x ∈ S or x ∈ T}

Example: If S={1,2,3} and T={2,3,4}, then

S∪T={1,2,3,4}.

16
Q

Disjoint Sets

A

Two sets are disjoint if they have no elements in common.

Formula:
S∩T=∅.

Example: If A={1,2,3} and B={4,5,6}, then

A∩B=∅.
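
Intersection, union, and the disjoint test each have a direct counterpart on Python sets. A brief sketch using the same example sets as the cards:

```python
S = {1, 2, 3}
T = {2, 3, 4}
A = {1, 2, 3}
B = {4, 5, 6}

print(S & T)             # intersection: {2, 3}
print(S | T)             # union: {1, 2, 3, 4}
print(A & B == set())    # empty intersection: True
print(A.isdisjoint(B))   # same check, built in: True
```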

17
Q

Venn Diagrams

A

A Venn diagram is a visual representation of sets and their relationships using overlapping circles.

Each circle represents a set.
Overlapping regions represent intersections.
The universal set (U) is often represented by a rectangle containing all sets.

18
Q

Inclusion-Exclusion Principle

A

A formula used to count elements in the union of two finite sets while avoiding double-counting elements in their intersection.

Formula

For two sets A and B:

∣A∪B∣=∣A∣+∣B∣−∣A∩B∣

For two disjoint sets B and C:

∣B∪C∣=∣B∣+∣C∣

(B∩C=∅, so no need to subtract the intersection).

Example

If:

∣A∣=30
∣B∣=25
∣A∩B∣=10

Then: ∣A∪B∣=30+25−10=45
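
The formula can be checked on concrete sets. The sketch below builds two hypothetical sets with the same sizes as the example (30, 25, and an overlap of 10); the particular numbers inside them are arbitrary:

```python
# Hypothetical sets chosen so that |A| = 30, |B| = 25, |A ∩ B| = 10.
A = set(range(1, 31))    # 1..30
B = set(range(21, 46))   # 21..45

by_formula = len(A) + len(B) - len(A & B)
by_counting = len(A | B)

print(len(A), len(B), len(A & B))   # 30 25 10
print(by_formula, by_counting)      # 45 45
```

Adding the two sizes counts the 10 shared elements twice, which is why they are subtracted once.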

19
Q

Multiplication Principle (Fundamental Counting Principle)

A

The Multiplication Principle states that if one event can occur in m ways and a second event can occur in n ways, then the total number of ways both events can occur together is:

Total Ways = m × n

General Formula

If a sequence of k independent events occurs in n1, n2, …, nk ways respectively, then the total number of ways all events can happen is:

n1 × n2 × … × nk

Example

A restaurant offers 4 appetizers, 3 main courses, and 5 desserts.
The number of different meals (one appetizer, one main course, and one dessert) is:

4×3×5=60
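
One way to convince yourself of the count is to list every meal. A minimal Python sketch, with placeholder dish names invented for illustration:

```python
from itertools import product

# Placeholder names; only the counts (4, 3, 5) come from the example.
appetizers = [f"appetizer {i}" for i in range(1, 5)]
mains = [f"main {i}" for i in range(1, 4)]
desserts = [f"dessert {i}" for i in range(1, 6)]

# Every (appetizer, main, dessert) combination, one of each.
meals = list(product(appetizers, mains, desserts))

print(len(meals))                                    # 60
print(len(appetizers) * len(mains) * len(desserts))  # 60
```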

20
Q

Permutations

A

A permutation is just a fancy word for arranging things in a specific order. Order matters in permutations.

Formula:
P(n,k)= n! / (n−k)!

n = total objects
k = number of objects you’re picking

Example:
If you want to pick 5 digits from 7 and arrange them:

P(7,5) = 7! / (7−5)! = 7! / 2! = (7×6×5×4×3×2×1) / (2×1)

Cancel out 2 × 1 (because it appears in both the top and bottom):

7×6×5×4×3=2,520

So, there are 2,520 ways to make a 5-digit number with no repeated digits using the digits 1 to 7.
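
The result can be double-checked in Python, both with the formula and by listing the arrangements. This sketch assumes Python 3.8 or later, where math.perm is available:

```python
import math
from itertools import permutations

digits = range(1, 8)   # the digits 1 to 7

by_formula = math.factorial(7) // math.factorial(7 - 5)
by_builtin = math.perm(7, 5)
by_listing = sum(1 for _ in permutations(digits, 5))

print(by_formula, by_builtin, by_listing)  # 2520 2520 2520
```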

21
Q

Factorial (!)

A

A factorial is the product of a whole number and every positive whole number below it. It’s written as n! (read as “n factorial”).

formula: n!=n×(n−1)×(n−2)×…×3×2×1

Examples:

3! = 3 × 2 × 1 = 6
4! = 4 × 3 × 2 × 1 = 24
5! = 5 × 4 × 3 × 2 × 1 = 120
0! is defined as 1 (a convention that keeps counting formulas consistent).
Factorials help us quickly count how many ways we can arrange things.
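
A minimal Python sketch of the definition (the function name factorial is illustrative; the standard library also offers math.factorial):

```python
def factorial(n):
    """Multiply n by every positive whole number below it; 0! comes out as 1."""
    if n < 0:
        raise ValueError("factorial is not defined for negative numbers")
    result = 1
    for k in range(2, n + 1):   # the loop body never runs for n = 0 or n = 1
        result *= k
    return result

print(factorial(3), factorial(4), factorial(5))  # 6 24 120
print(factorial(0))                              # 1
```

Starting from 1 and multiplying by nothing gives 0! = 1 without a special case.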

22
Q

Combinations

A

Combinations are used when you want to choose items from a set, but the order doesn’t matter.

Formula for Combinations (n choose k):

C(n,k) = n! / (k! (n−k)!)

Where:

n = total number of items
k = number of items you’re choosing
! = factorial

Example:
How many ways can you pick 3 students from a group of 5?

C(5,3) = 5! / (3! (5−3)!) = 5! / (3! × 2!) = (5×4×3!) / (3! × 2×1)
= (5×4) / 2 = 10

So there are 10 ways to choose 3 students from 5.
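
The same answer can be checked in Python by formula and by listing the groups. The student names are placeholders, and math.comb and math.perm assume Python 3.8 or later:

```python
import math
from itertools import combinations

students = ["Ana", "Ben", "Cam", "Dee", "Eli"]   # hypothetical names

by_builtin = math.comb(5, 3)
groups = list(combinations(students, 3))

print(by_builtin, len(groups))   # 10 10

# Each group of 3 can be ordered in 3! = 6 ways, so C(5,3) = P(5,3) / 3!.
print(math.perm(5, 3) // math.factorial(3))   # 10
```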

23
Q

Basics of Probability

A

Definition: Probability is a numerical way to describe uncertainty.

General Probability Rules

Rule 1 (Certain & Impossible Events):

P(certain event) = 1
P(impossible event) = 0

Rule 2 (Complement Rule):
The probability that an event does not occur:

P(not E) = 1 − P(E)

Rule 3 (Sum of All Probabilities):
The sum of the probabilities of all possible outcomes is 1.

24
Q

Probability formula

A

If all outcomes are equally likely, the probability of an event E is:

P(E) = (Number of outcomes in E) / (Total number of outcomes in the sample space)

Example 1: Probability of rolling a 4 on a fair six-sided die:
P(4) = 1/6
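
A small Python sketch of the equally-likely formula, assuming a fair six-sided die (the function name probability is illustrative). Fractions are used so the answers stay exact:

```python
from fractions import Fraction

def probability(event, sample_space):
    """Outcomes in the event divided by all outcomes (equally likely case only)."""
    outcomes = list(sample_space)
    favorable = [o for o in outcomes if o in event]
    return Fraction(len(favorable), len(outcomes))

die = range(1, 7)

p_four = probability({4}, die)
p_not_four = 1 - p_four            # complement rule: P(not E) = 1 - P(E)

print(p_four, p_not_four)          # 1/6 5/6
```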

25
Q

Mutually Exclusive Events (Cannot Happen Together)

A

Two events are mutually exclusive if they cannot happen at the same time.

Example: Rolling an odd number and rolling an even number on a single roll of a die.

Formula:

P(E or F) = P(E) + P(F)

26
Q

Independent Events (One Does Not Affect the Other)

A

Two events do not influence each other.

Example: Rolling a die twice → The result of the first roll does not affect the second roll.

Formula:

P(E and F) = P(E) × P(F)
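
The product rule can be checked by listing all 36 outcomes of two rolls of a fair die. In this sketch the two events (first roll even, second roll a 6) are chosen purely for illustration:

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))   # all 36 (first, second) pairs

def probability(event):
    """Fraction of the 36 equally likely pairs for which event(pair) is true."""
    favorable = [r for r in rolls if event(r)]
    return Fraction(len(favorable), len(rolls))

p_first_even = probability(lambda r: r[0] % 2 == 0)
p_second_six = probability(lambda r: r[1] == 6)
p_both = probability(lambda r: r[0] % 2 == 0 and r[1] == 6)

print(p_first_even, p_second_six, p_both)      # 1/2 1/6 1/12
print(p_both == p_first_even * p_second_six)   # True
```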

27
Q

Data Distribution

A

How numerical data is spread out or organized.

28
Q

Relative Frequency

A

The proportion of times a value appears compared to the total data set.

Formula: frequency of value / total number of data points

29
Q

Standard Deviation (SD)

A

Measures how spread out the data is around the mean (average).

d = √( ∑(xi − m)² / n )

Standard Deviation Ranges (for approximately normally distributed data)

1 SD from mean: m±d → Includes ~68% of data
2 SD from mean: m±2d → Includes ~95% of data
3 SD from mean: m±3d → Includes ~99.7% of data

30
Q

Total Area Under Distribution Curve

A

TotalArea=1

The total area under a probability distribution curve (or the bars of a relative frequency histogram) always equals 1 (or 100% of the data).

31
Q

random variable

A

is just a way to represent uncertain outcomes with numbers. Instead of saying “something random happens,” we assign a number to each possible outcome.

32
Q

mean (expected value)

A

is the average outcome you’d expect if you repeated the experiment many times.

This is used when you have probabilities associated with different values.

The formula is:

E(X) = ∑ X · P(X)

where:

X = each possible value of the random variable

P(X) = the probability of that value
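
As an illustration of the formula, here is a Python sketch for one roll of a fair six-sided die, an example chosen here rather than taken from the card. The distribution is stored as a value-to-probability mapping:

```python
from fractions import Fraction

def expected_value(distribution):
    """Sum of value × probability over every possible value."""
    return sum(x * p for x, p in distribution.items())

# Assumed example: each face 1..6 has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}

ev = expected_value(die)
print(sum(die.values()))   # probabilities add up to 1
print(ev)                  # 7/2
print(float(ev))           # 3.5
```

The expected value 3.5 is not a face of the die; it is the long-run average over many rolls.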

33
Q

normal distribution (bell curve)

A

is a bell-shaped distribution that approximates many kinds of naturally occurring data.

The bell-shaped curve appears when you graph the normal distribution on a coordinate plane.

34
Q

Mean (m), Standard Deviation (d), and the Bell Curve

A

Mean (m): The center of the data, also called the average. The bell curve reaches its highest point at the mean.

Standard Deviation (d): A measure of how spread out the data is. A small d means the data is tightly packed near the mean, while a large d means the data is more spread out.

The bell curve is symmetric, meaning the left and right sides look the same.

35
Q

normal distribution properties

A
  1. Mean = Median = Mode → The most common value is also the average, so everything is centered.
  2. Symmetry → The left and right sides of the curve are mirror images.
  3. Most values (around 68%) aren’t too far from the average.
    One standard deviation (σ) tells us how spread out the data is. If data is normally distributed, about 68% of values fall between:
    Mean−σ to Mean+σ
    Example: If test scores have a mean of 75 and a standard deviation of 5, about 68% of students scored between 70 and 80 (75 ± 5).
  4. If we go a bit farther from the mean (2 standard deviations), we cover almost all values (about 95%).
    If data is normally distributed, about 95% of values fall between:
    Mean−2σ to Mean+2σ
    Example: Using the same test score example (mean = 75, standard deviation = 5), about 95% of students scored between 65 and 85 (75 ± 10).
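
The 68% and 95% figures can be reproduced with Python's statistics.NormalDist (available from Python 3.8). This is only a sketch for the test score example, assuming the scores are exactly normally distributed:

```python
from statistics import NormalDist

scores = NormalDist(mu=75, sigma=5)

# Area under the curve between two scores = cdf(upper) - cdf(lower).
within_1_sd = scores.cdf(80) - scores.cdf(70)
within_2_sd = scores.cdf(85) - scores.cdf(65)

print(round(within_1_sd, 3))   # about 0.683
print(round(within_2_sd, 3))   # about 0.954
```

The rounded textbook values 68% and 95% are approximations of these areas.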

36
Q

Probability and the Normal Curve

A

The total area under the curve represents 100% of the data.
If you randomly pick a person, the chance of them falling in a certain range is the area under that section of the curve.
Example:

The probability of someone being taller than the average is 50% (since the curve is symmetrical).
The probability of someone having an IQ between 85 and 115 is about 68% (because, with a mean of 100 and a standard deviation of 15, that’s within 1 standard deviation).

37
Q

Standard Normal Distribution

A

This is just a special normal distribution where the mean is 0 and the standard deviation is 1.

Any data point can be converted into this form using the formula:

Z= (X−m) / d

Where:

X = the data value

m = the mean

d = the standard deviation

Example: If your test score is 85, the average is 70, and the standard deviation is 10:

Z = (85 − 70) / 10 = 15 / 10 = 1.5
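
The conversion can be sketched in Python as below. The function name z_score is illustrative, and the last step (looking up the share of scores below z = 1.5) assumes the scores are normally distributed and uses statistics.NormalDist from Python 3.8 or later:

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

z = z_score(85, 70, 10)
print(z)   # 1.5

# On the standard normal curve (mean 0, SD 1), the area to the left of z
# is the share of values expected to fall below that score.
share_below = NormalDist().cdf(z)
print(round(share_below, 3))   # about 0.933
```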