midterm Flashcards by Jerry Benn

Define Logistic Regression

Logistic regression is a way to predict yes-or-no answers from data. It takes numbers as input, combines them into a weighted sum, and passes that sum through the logistic (sigmoid) function to give a result between 0 and 1. That result is read as a probability and helps decide between two choices (like pass/fail or spam/not spam).
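
A minimal sketch of how inputs become a value between 0 and 1: a weighted sum passed through the logistic (sigmoid) function. The feature (hours studied), the weight, and the bias are made-up values for illustration, not fitted from real data.

```python
import math


def sigmoid(z: float) -> float:
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_pass(hours_studied: float, weight: float = 1.5, bias: float = -6.0):
    """Return (probability of passing, yes/no decision at a 0.5 threshold).

    weight and bias are hypothetical; a real model would learn them from data.
    """
    probability = sigmoid(weight * hours_studied + bias)
    return probability, probability >= 0.5


print(predict_pass(2.0))  # low probability, decision is False (fail)
print(predict_pass(6.0))  # high probability, decision is True (pass)
```

The 0.5 cutoff is a common default; a different threshold can be chosen when one kind of mistake is more costly than the other.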

Define Sample

A small group chosen from a larger group (population) to study.
Example: Surveying 100 students from a school of 1,000 students.

Define Population

The entire group we are interested in studying.
Example: All the students in the school.

Independent Variable

A factor that we change or control in an experiment to see its effect.

Dependent Variable

The outcome that we measure, which depends on the independent variable.

Augmentation

Making more data by slightly modifying existing data to improve a model’s performance.

Oversampling

Creating more copies of data from underrepresented groups to balance the dataset.
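
One simple way to do this is random oversampling: duplicate randomly chosen rows from the smaller groups until every group is as large as the biggest one. The sketch below assumes a tiny made-up email dataset; the function name and data are hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, label_of, seed=0):
    """Duplicate random rows from smaller classes until all classes match the largest."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    groups = {}
    for row in rows:
        groups.setdefault(label_of(row), []).append(row)
    target = max(len(group) for group in groups.values())
    balanced = []
    for group in groups.values():
        balanced.extend(group)  # keep every original row
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced


# Made-up data: 1 spam message and 3 non-spam messages.
emails = [("msg1", "spam"), ("msg2", "not spam"), ("msg3", "not spam"), ("msg4", "not spam")]
oversampled = random_oversample(emails, label_of=lambda row: row[1])
print(Counter(label for _, label in oversampled))
```

After the call, both classes contain three rows; the extra spam rows are exact copies of the original one.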

Undersampling

Reducing the amount of data from overrepresented groups to balance the dataset.
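
A minimal sketch of random undersampling: keep only as many rows from each group as the smallest group has, chosen at random. The dataset and function name are made up for illustration.

```python
import random
from collections import Counter


def random_undersample(rows, label_of, seed=0):
    """Randomly keep only as many rows per class as the smallest class contains."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    groups = {}
    for row in rows:
        groups.setdefault(label_of(row), []).append(row)
    target = min(len(group) for group in groups.values())
    balanced = []
    for group in groups.values():
        balanced.extend(rng.sample(group, target))  # sample without replacement
    return balanced


# Made-up data: 1 spam message and 3 non-spam messages.
emails = [("msg1", "spam"), ("msg2", "not spam"), ("msg3", "not spam"), ("msg4", "not spam")]
undersampled = random_undersample(emails, label_of=lambda row: row[1])
print(Counter(label for _, label in undersampled))
```

The trade-off is that undersampling throws away real data, so it suits cases where the overrepresented group is large enough to spare some rows.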

Nominal Data

Categories with no specific order.

Ordinal Data

Categories with a meaningful order, but the difference between them is not measurable.

Interval Data

Numbers with equal spacing between values, but no true zero point.

Ratio Data

Like interval data, but with a meaningful zero point, so you can compare ratios.

Qualitative Data

Qualitative data refers to non-numerical information that describes characteristics, attributes, or properties of something. It is used to capture insights, opinions, emotions, behaviors, and descriptions that cannot be easily measured or counted.

Probability Distribution

A probability distribution shows how likely different outcomes are. It assigns a probability (between 0 and 1) to each possible outcome, and the probabilities of all outcomes add up to 1.

Example: The probability of rolling a fair six-sided die and getting a 3 is 1/6 ≈ 0.1667.
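
The die example can be written out as a small sketch, assuming a fair six-sided die so that each face gets the same probability:

```python
from fractions import Fraction

# Each of the six faces of a fair die has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}

# A valid probability distribution sums to exactly 1.
assert sum(die.values()) == 1

print(die[3])                    # 1/6
print(round(float(die[3]), 4))   # 0.1667
```

Using `Fraction` keeps the arithmetic exact, so the check that the probabilities sum to 1 is not affected by floating-point rounding.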

Frequency Distribution

A frequency distribution shows how many times each value appears in a dataset.

Example: If you survey 10 people about their favorite fruit:
🍎 Apple - 4
🍌 Banana - 3
🍇 Grapes - 2
🍊 Orange - 1
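
The fruit survey above can be tallied with a short sketch. The list of individual answers is reconstructed from the counts in the example:

```python
from collections import Counter

# Ten survey answers matching the example counts.
answers = ["Apple"] * 4 + ["Banana"] * 3 + ["Grapes"] * 2 + ["Orange"]

# Counter builds the frequency distribution: value -> number of occurrences.
frequency = Counter(answers)

for fruit, count in frequency.most_common():
    print(fruit, count)
# Apple 4
# Banana 3
# Grapes 2
# Orange 1
```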

Cumulative Distribution

A cumulative distribution shows the running total of counts or percentages as you move through the values in a dataset.

Example (Cumulative Frequency)
Apple: 4
Apple + Banana: 4 + 3 = 7
Apple + Banana + Grapes: 7 + 2 = 9
Apple + Banana + Grapes + Orange: 9 + 1 = 10
Each step adds the next count to the running total, so the final value equals the total number of observations (10).
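
The running totals in this example can be reproduced with a short sketch using the same fruit counts:

```python
from itertools import accumulate

frequency = {"Apple": 4, "Banana": 3, "Grapes": 2, "Orange": 1}

# accumulate() yields running sums: 4, 4+3, 4+3+2, 4+3+2+1.
cumulative = dict(zip(frequency, accumulate(frequency.values())))

print(cumulative)
# {'Apple': 4, 'Banana': 7, 'Grapes': 9, 'Orange': 10}
```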

How to Convert a Frequency Distribution to a Probability Distribution (%)

To convert a frequency table into a probability distribution, follow these steps:

Find the total frequency (sum of all occurrences).
Divide each frequency by the total to get probability.
Multiply by 100 to get percentage.
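
The three steps can be sketched as follows, reusing the fruit counts from the Frequency Distribution card as sample input:

```python
frequency = {"Apple": 4, "Banana": 3, "Grapes": 2, "Orange": 1}

# Step 1: total frequency.
total = sum(frequency.values())

# Step 2: divide each frequency by the total to get a probability.
probability = {fruit: count / total for fruit, count in frequency.items()}

# Step 3: express as a percentage. 100 * count / total is the same as
# probability * 100 but avoids a small floating-point rounding error.
percentage = {fruit: 100 * count / total for fruit, count in frequency.items()}

print(probability)  # {'Apple': 0.4, 'Banana': 0.3, 'Grapes': 0.2, 'Orange': 0.1}
print(percentage)   # {'Apple': 40.0, 'Banana': 30.0, 'Grapes': 20.0, 'Orange': 10.0}
```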

Cumulative Percentages

This is the cumulative sum of percentages as we move down the list.
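
As a small illustration, here is the running sum of percentages for the fruit counts used in the Frequency Distribution card:

```python
from itertools import accumulate

frequency = {"Apple": 4, "Banana": 3, "Grapes": 2, "Orange": 1}
total = sum(frequency.values())

# Percentage for each category, then a running sum down the list.
percentages = [100 * count / total for count in frequency.values()]
cumulative_percentage = dict(zip(frequency, accumulate(percentages)))

print(cumulative_percentage)
# {'Apple': 40.0, 'Banana': 70.0, 'Grapes': 90.0, 'Orange': 100.0}
```

The last cumulative percentage is always 100, because by then every observation has been counted.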

Relative Percentages

A relative percentage shows how each category compares to the total. It's the same as the percentage in the probability distribution.

Population Mean (μ) vs. Sample Mean (x̄)

Population mean (μ): The average of all values in an entire population.

Sample mean (x̄): The average of values in a selected sample from the population.

Example
Population Mean: If we measure the height of all students in a school, we get the exact population mean (μ).
Sample Mean: If we measure the height of just 50 students, we estimate the mean using the sample mean (x̄).
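
A rough sketch of the same idea with simulated data. The heights are randomly generated stand-ins (not real measurements), and the school size of 1,000 is an assumption for illustration:

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the run is repeatable

# Simulated heights (cm) for every student in a hypothetical school of 1,000.
population_heights = [rng.gauss(165, 10) for _ in range(1000)]

# Population mean: uses every value, so it is exact for this population.
mu = mean(population_heights)

# Sample mean: uses only 50 randomly chosen students, so it is an estimate.
sample = rng.sample(population_heights, 50)
x_bar = mean(sample)

print(f"population mean = {mu:.1f} cm, sample mean = {x_bar:.1f} cm")
```

The two numbers are usually close but not identical; a different random sample would give a slightly different x̄, while μ stays fixed.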

The Importance of Data Cleaning in Analysis

Data cleaning is the process of fixing or removing incorrect, incomplete, duplicate, or irrelevant data before analysis. It plays a vital role because poor-quality data leads to misleading results and bad decisions.

Improves Accuracy – Dirty data (errors, missing values, duplicates) can lead to wrong conclusions. Cleaning ensures the analysis is based on reliable data.
Enhances Efficiency – Clean data reduces processing time and improves the performance of machine learning models and statistical calculations.
Avoids Bias & Misinterpretation – Inconsistent or incomplete data can skew results, leading to biased decisions.
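
A minimal sketch of what cleaning can look like in practice, assuming a small made-up table of test scores on a 0 to 100 scale. The records, field names, and the `clean` helper are hypothetical:

```python
from statistics import mean

# Made-up records containing the usual problems.
raw_records = [
    {"student": "A", "score": 82},
    {"student": "B", "score": None},  # missing value
    {"student": "A", "score": 82},    # exact duplicate
    {"student": "C", "score": 910},   # outside the 0-100 range, likely a typo
    {"student": "D", "score": 74},
]


def clean(records, low=0, high=100):
    """Drop rows with missing scores, out-of-range scores, or exact duplicates."""
    seen = set()
    cleaned = []
    for record in records:
        score = record["score"]
        if score is None or not low <= score <= high:
            continue
        key = (record["student"], score)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(record)
    return cleaned


cleaned_records = clean(raw_records)
print(cleaned_records)
print(mean(r["score"] for r in cleaned_records))  # average based on valid rows only
```

Here rows are simply dropped; depending on the analysis, missing or suspicious values might instead be corrected or filled in.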

Which type of mean is used to describe a portion of individuals in a given population?
A. sample mean
B. population mean

Ans: A

When would it be appropriate to calculate a population mean?
A. When data are measured for a portion of individuals from a population.
B. When the sample mean is not available.
C. When data are measured for all members of a population.
D. When it is not possible to measure all data in a population.

Ans: C

The ______ is the sum of all scores (in a sample or population) divided by the number of scores summed.
A. mode
B. median
C. mean
D. range

Ans: C

Which terms describe the following measures, respectively: mean, median, and mode?
A. middle, most, and average
B. average, middle, and most
C. average, most, and middle
D. most, average, and middle

Ans: B

A researcher records the following data for the number of different food items chosen by seven participants in a buffet-style setting: 1, 6, 2, 5, 4, 3, and 7. Is the mean equal to the median in this example?
A. Yes.
B. No; the median is larger than the mean.
C. No; the mean is larger than the median.
D. There is not enough information to answer this question.

Ans: A

A researcher records the following data for the number of complaints filed (per day) following a controversial policy change at a local business: 3, 8, 5, 0, 4, 6, 2, 1, 1, 4, 2, and 0. Is the mean equal to the median in this example?
A. Yes.
B. No; the median is larger than the mean.
C. No; the mean is larger than the median.
D. There is not enough information to answer this question.

Ans: C
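
The two mean-versus-median answers (food items and complaints) can be checked with a short sketch using the data from the questions:

```python
from statistics import mean, median

food_items = [1, 6, 2, 5, 4, 3, 7]
complaints = [3, 8, 5, 0, 4, 6, 2, 1, 1, 4, 2, 0]

for name, scores in [("food items", food_items), ("complaints", complaints)]:
    m, med = mean(scores), median(scores)
    if m == med:
        verdict = "mean equals median"
    elif m > med:
        verdict = "mean is larger"
    else:
        verdict = "median is larger"
    print(f"{name}: mean={m:.2f}, median={med:.2f}, {verdict}")
# food items: mean=4.00, median=4.00, mean equals median
# complaints: mean=3.00, median=2.50, mean is larger
```

For the complaints data there are 12 scores, so the median is the average of the 6th and 7th scores in numeric order (2 and 3).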

A staff member records the number of victories (per 10-game season) for a new football coach during his first three seasons with a team. The coach won 1, 3, and 8 games, respectively, over the three seasons. Which conclusion is appropriate?
A. The football coach won an average of four games per season.
B. The football coach won a median of three games per season.
C. all of these

The ______ is the middle value in a distribution of scores that are listed in numeric order.
A. mean
B. median
C. mode
D. range

The ______ is the value that occurs most often or at the highest frequency in a distribution.
A. mean
B. median
C. mode
D. range

A researcher records 17 scores. What is the median position of these scores?
A. the 9th score in numeric order
B. the average of the 9th and 10th scores in numeric order
C. the average of the 8th and 9th scores in numeric order
D. It's not possible to know this without the raw scores.

A researcher records the following data for how participants rated the likability (on a scale from 1 = not liked at all to 7 = very likable) of an individual who blushed after making a mistake: 5, 4, 7, 6, 6, 4, 6, 7, 2, 5, and 6. Is the mode equal to the median in this example?
A. Yes.
B. No; the median is larger than the mode.
C. No; the mode is larger than the median.
D. There is not enough information to answer this question.

Which of the following will decrease the value of the mean?
A. deleting a score below the mean
B. adding a score below the mean
C. adding a score exactly equal to the mean
D. none of these

Which of the following is NOT a characteristic of the mean?
A. Add a score above the mean and the mean will increase.
B. Add a score below the mean and the mean will decrease.
C. Delete a score below the mean and the mean will increase.
D. Delete a score above the mean and the mean will increase.

A researcher asks participants to estimate the height (in inches) of a statue that was in a waiting area. The researcher records the following estimates: 40, 46, 30, 50, and 34. If the researcher removes the estimate of 40 (say, due to an experimenter error), then the value of the mean will ______.
A. decrease
B. increase
C. remain the same
D. become negative

Quantitative Data

Quantitative data refers to numerical data that can be measured, counted, and analyzed using statistical methods. It represents quantities, amounts, or numerical values that provide objective, measurable information.

midterm Flashcards

(36 cards)