Topic 1 Flashcards

Introduction to Statistics and Data Analysis

1
Q

Why study Statistics?

A
  1. To critically evaluate and interpret numerical data presented in reports, articles, and everyday life.
  2. To conduct and understand statistical analyses in your field of work, helping you make data-driven decisions.
  3. To draw meaningful conclusions about a population based on insights gathered from a sample, improving the accuracy of predictions and decision-making processes.
2
Q

A field of study concerned with the collection, analysis, and interpretation of data to make decisions, solve problems, and design products and processes in the presence of variability or uncertainty

A

Statistics

3
Q

Simply the science of data or the art of learning from data

A

Statistics

4
Q

Information, either numerical or categorical, collected through observation or experimentation for the purpose of analysis.

A

Data

5
Q

Classification of Data

A
  1. Quantitative (Numerical)
  2. Qualitative (Categorical)
6
Q

2 Types of Quantitative

A

Discrete - or counted
Continuous - or measured

7
Q

has distinct intervals (meaningful differences) between values, but there is no true zero

A

Interval

8
Q

has distinct intervals (meaningful differences) between values, and there is a true zero

A

Ratio

9
Q

It refers to a point where there is no presence or amount of the variable being measured, meaning the quantity is completely absent.

A

True zero

10
Q

labels, names, or classification and does not imply order or ranking

A

Nominal

11
Q

labels, names, or classification and does imply order or ranking

A

Ordinal

12
Q

The basic idea behind statistical methods of data analysis

A

Make inferences about a population by studying a relatively small sample

13
Q

Engineering/Scientific Method

A
  1. Develop a clear description of the problem.
  2. Identify the important factors that affect the problem or that may play a role in its solution.
  3. Propose or refine a model using engineering knowledge of the phenomenon being studied.
  4. Conduct appropriate experiments and collect data to test or validate the tentative model.
  5. Refine the model on the basis of the observed data.
  6. Manipulate the model to assist in developing a solution to the problem.
  7. Conduct an appropriate experiment to confirm that the proposed solution to the problem is both effective and efficient.
  8. Draw conclusions and make recommendations.
13
Q

Role of Statistics in Engineering

A

Statistics provides tools and methods to analyze data, make informed decisions, and improve processes

14
Q

An inherent characteristic of data

A

Variability

15
Q

Repeated observations of a system or phenomenon do not yield identical results

A

Variability

16
Q

developed to address and manage variability in data. It provides a framework for describing this variability and for learning about which potential sources of variability are the most important or which have the greatest impact.

A

Statistical analysis tools

17
Q

What if there is no variability?

A

- Simplified Analysis: statistical analysis would be straightforward, relying only on simple descriptive statistics like the mean
- A single observation would tell us everything about the entire population
- Statistics would be reduced to basic arithmetic

18
Q

Types of Statistics

A
  1. Descriptive Statistics
  2. Inferential Statistics
19
Q

- Summarizes and Presents Data: focuses on organizing and displaying data in a meaningful way, providing insights into the central tendency, variability, and overall distribution of observations within the sample
- Visual Representation: uses graphs such as histograms, stem-and-leaf plots, scatter plots, dot plots, and box plots to visually capture the characteristics and “footprint” of the sample
- Key Measures: includes essential calculations such as means, medians, and standard deviations to describe the data’s central location and spread

A

Descriptive Statistics

20
Q

- Draws Conclusions: uses techniques that enable us to make inferences about a larger population based on the analysis of a smaller sample
- Prediction: leverages sample data to predict outcomes for the broader population
- Key Methods: includes hypothesis testing and confidence intervals to assess claims and estimate population parameters

A

Inferential Statistics

21
Q

The first step in statistical analysis is collecting data. To make accurate inferences about a population, the sample must represent the population

A

Collecting Data

22
Q

a collection of all elements that possess a characteristic of interest

A

Population

23
Q

a portion of a population selected for study

A

Sample

24
Key Concepts in Collecting Data
1. Population 2. Sample
25
Types of Reasoning
- deductive reasoning - inductive reasoning
26
Statistical Inference: From a Sample to Population
inductive reasoning
27
From Physical Laws to Product Design
deductive reasoning
28
refers to a process of drawing conclusions or making judgments based on available evidence, data, or reasoning rather than direct observation.
Statistical Inference
29
Three Basic Methods of Collecting Data
1. Retrospective Study 2. Observational Study 3. Designed Experiment
30
use existing records of the population
Retrospective Study
31
some crucial information may be unavailable or missing, transcription or recording errors may result in outliers, and data on other important factors may not have been collected and archived
Retrospective Study
32
refers to a data point that differs significantly from other observations in a data set or it is an unusual value. It can appear either much higher or much lower (extreme values) than the rest of the data.
Outlier
33
What causes an Outlier?
- Human error in transcription or recording
- Variability in data
- Unusual occurrences (by chance)
34
collect data by observing the population with as little interference as possible
Observational Study
35
conducted for a relatively short period of time
Observational Study
36
information about the population for some conditions of interest may be unavailable, and some observations may be contaminated by extraneous variables
Observational Study
37
the findings gain scientific rigor through deliberate control of extraneous variables
Designed Experiment
38
collect data by observing the population while controlling conditions according to the experiment plan
Designed Experiment
39
makes an inference or decision about which variables are responsible for the observed changes in output performance
Designed Experiment
40
variables that are not of interest but could affect the outcome of a study
Extraneous variables
41
Types of Studies Defining the Use of Sample in Statistical Inference
1. Enumerative Study 2. Analytic Study
42
Makes an inference to the well-defined population from which the sample is selected
Enumerative Study
43
Example: A sample of three semiconductor wafers selected from a lot (a batch or collection of items) to evaluate the lot’s quality based on the sample.
Enumerative Study
44
Makes an inference to a future (conceptual) population
Analytic Study
45
Example: Sample data from the current production line of semiconductor wafers is used to evaluate the quality of wafers yet to be produced.
Analytic Study
46
Explaining the relationship between variables
Models
47
Models
1. Mechanistic Model 2. Empirical Model
48
established based on the underlying theory, principle, or law of a physical mechanism
Mechanistic Model
49
theoretical model
Mechanistic Model
50
Example: Measuring the current flow in a thin copper wire: the model will be Ohm’s Law because it relates the variables of current (I), voltage (E), and resistance (R) in a thin copper wire.
Mechanistic Model
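A mechanistic model can be written down directly from the physical law. The sketch below assumes Ohm's law in the form I = E / R; the function name and the example values are illustrative and not part of the original card.

```python
def current_from_ohms_law(voltage_e: float, resistance_r: float) -> float:
    """Return the current I (amperes) predicted by Ohm's law, I = E / R."""
    if resistance_r <= 0:
        raise ValueError("resistance must be positive")
    return voltage_e / resistance_r


# Hypothetical values: 12 V applied across a wire with 4 ohms of resistance
print(current_from_ohms_law(12.0, 4.0))  # 3.0
```

Real measurements would scatter around this prediction, which is where variability and the empirical approach come in.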
51
established based on the experience, observation, or experiment of a system (population) under study.
Empirical Model
52
actual, considering factors or variables not considered in the theory
Empirical Model
53
Example: If there is no underlying physical mechanism to explain a phenomenon, an empirical model will be utilized to explain the relationship between the variables of interest (regression model)
Empirical Model
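One common empirical model is a straight line fitted by least squares. The sketch below is a minimal illustration, assuming a simple linear relationship; the data are made up and `fit_line` is a hypothetical helper name.

```python
from statistics import mean


def fit_line(x, y):
    """Least-squares slope and intercept for y ~ intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of cross-products and sum of squares about the means
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Made-up observations of an explanatory variable x and a response y
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept = fit_line(x, y)
print(f"y = {intercept:.2f} + {slope:.2f} x")
```

The fitted coefficients come from the observed data alone, not from a physical law, which is what distinguishes this from a mechanistic model.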
54
Any characteristic, number, or quantity that can be measured, counted, or observed for record
Variable
55
Types of Variable
1. Response or Dependent Variable 2. Explanatory or Independent Variable 3. Confounding Variable
56
outcome or variable of interest that is being measured or observed in an experiment or study
Response or Dependent Variable
57
serve to explain changes in the response
Explanatory or Independent Variable
58
predictor variable
Explanatory or Independent Variable
59
variable that is manipulated or observed to determine its effect on the response variable
Explanatory or Independent Variable
60
an extraneous variable that is related to other variables, thus having an effect on the relationship between these variables
Confounding Variable
61
An important aspect of quantitative data that gives an estimate of a typical value
Measures of Central Tendency
62
average value of data
Mean
63
the mean computed after eliminating a certain percentage of both the largest and smallest values; less sensitive to outliers than the mean, but not as insensitive as the median
Trimmed mean
64
middle value of ordered data
Median
65
the value that occurs most often in the data
Mode
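The four measures above can be compared on one small data set. This is a sketch using Python's standard `statistics` module; the data are made up, and `trimmed_mean` is a hypothetical helper that drops the same percentage of values from each end.

```python
from statistics import mean, median, mode


def trimmed_mean(values, percent):
    """Mean after dropping `percent` % of the values from each end of the sorted data."""
    ordered = sorted(values)
    k = int(len(ordered) * percent / 100)  # number of values dropped per tail
    return mean(ordered[k:len(ordered) - k])


data = [2, 3, 3, 4, 5, 6, 7, 8, 9, 53]  # made-up sample; 53 is an extreme value

print(mean(data))              # 10
print(median(data))            # 5.5
print(mode(data))              # 3
print(trimmed_mean(data, 10))  # 5.625
```

The single extreme value pulls the mean up to 10, while the 10% trimmed mean and the median stay close to the bulk of the data.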
66
Effects of Outlier
- Mean is sensitive to outliers, meaning extreme values can significantly affect its result.
- Median is resistant to outliers, meaning its result is not significantly affected by extreme values.
67
It helps determine the appropriate measure of central tendency
Data Distribution Shape
68
Data distribution shapes
1. Symmetric (Bell Curve or Normally Distributed) 2. Left Skewed or Negatively Skewed 3. Right Skewed or Positively Skewed
68
mean, median, and mode are all the same
Symmetric
69
mean < median; long tail on the left
Left Skewed or Negatively Skewed
70
mean > median; long tail on the right
Right Skewed or Positively Skewed
71
If the data distribution is skewed
it is better to use the median as a measure of central tendency compared to the mean, as the median is resistant to outliers.
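A small illustration of why the median is preferred for skewed data: a single extreme value on the right drags the mean upward but leaves the median where it was. The numbers are made up for this sketch.

```python
from statistics import mean, median

clean = [10, 11, 12, 13, 14]
# Hypothetical recording error: the last value is entered as 140 instead of 14
with_outlier = clean[:-1] + [140]

print(mean(clean), median(clean))                # 12 12
print(mean(with_outlier), median(with_outlier))  # 37.2 12
```

With the extreme value present the data have a long right tail, and the mean (37.2) exceeds the median (12), matching the right-skewed pattern.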
72
Analysis of Variance (ANOVA) and the t-Test, which will be discussed in more detail in later topics.
Parametric tests
73
Common Sense Notion of Sampling
1. Simple Random Sampling 2. Stratified Random Sampling 3. Systematic Random Sampling 4. Cluster Random Sampling
74
every member (sample) of the population has an equal and independent chance of being selected
Simple Random Sampling
75
the population is divided into subgroups (strata), and random samples are taken from each group
Stratified Random Sampling
76
a random starting point is selected, and then every nth member of the population is chosen
Systematic Random Sampling
77
the population is divided into groups (clusters), and entire clusters are randomly selected.
Cluster Random Sampling
78
the entire production line is selected, and every pipe in the selected line is inspected, regardless of size
Cluster Random Sampling
79
the engineer ensures that each size category is proportionally represented in the sample by selecting pipes from all size groups
Stratified Random Sampling
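The four sampling schemes can be sketched with the standard `random` module. Everything below is an assumption for illustration: a made-up population of 60 pipes, each tagged with a size category and a production line, and hypothetical helper names.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 60 pipes with a size category and a production line
sizes = ["small", "medium", "large"]
lines = ["A", "B", "C", "D"]
population = [
    {"id": i, "size": sizes[i % 3], "line": lines[i % 4]}
    for i in range(60)
]


def simple_random(pop, n):
    """Every unit has the same chance of selection."""
    return rng.sample(pop, n)


def stratified(pop, key, per_stratum):
    """Split into strata by `key`, then sample randomly within each stratum."""
    strata = {}
    for unit in pop:
        strata.setdefault(unit[key], []).append(unit)
    return [u for group in strata.values() for u in rng.sample(group, per_stratum)]


def systematic(pop, step):
    """Random starting point, then every `step`-th unit."""
    start = rng.randrange(step)
    return pop[start::step]


def cluster(pop, key, n_clusters):
    """Randomly pick whole clusters and keep every unit inside them."""
    chosen = set(rng.sample(sorted({u[key] for u in pop}), n_clusters))
    return [u for u in pop if u[key] in chosen]


print(len(simple_random(population, 12)))    # 12 pipes
print(len(stratified(population, "size", 4)))  # 4 pipes from each size group
print(len(systematic(population, 5)))        # every 5th pipe
print(len(cluster(population, "line", 1)))   # all pipes from one line
```

Because the three size groups here are equally large, taking the same number from each stratum is also proportional; with unequal strata the per-stratum counts would need to be scaled.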
80
Why assign experimental units randomly?
Failing to assign experimental units to treatments at random can lead to biased results and could “camouflage” the true results.
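One simple way to randomize is to shuffle the units and deal them out to the treatments in turn. This is a sketch under assumptions: the unit labels, treatment names, and the `assign_randomly` helper are all hypothetical.

```python
import random


def assign_randomly(units, treatments, seed=None):
    """Shuffle the units, then deal them to treatments in turn (balanced groups)."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, unit in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(unit)
    return groups


# Twelve hypothetical experimental units split between two treatments
groups = assign_randomly(range(1, 13), ["control", "treated"], seed=1)
for name, members in groups.items():
    print(name, sorted(members))
```

Since chance alone decides which unit gets which treatment, no characteristic of the units can systematically line up with a treatment.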
81
the smallest entity in an experiment that can receive a treatment or be observed independently
Experimental Unit
82
Measures of Variability
1. Sample Range 2. Sample Standard Deviation 3. Sample Variance
83
difference between the largest and smallest value in the dataset
Sample Range
84
a measure of the amount of variation or dispersion from the mean
Sample Standard Deviation
85
a measure of the average squared deviation from the mean
Sample Variance
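The three measures can be computed with the standard `statistics` module. Note that the sample variance divides the sum of squared deviations by n - 1 rather than n, which is what `statistics.variance` does. The data are made up for this sketch.

```python
from statistics import stdev, variance

data = [4, 8, 6, 5, 7]  # made-up sample, mean = 6

sample_range = max(data) - min(data)  # largest minus smallest
s2 = variance(data)                   # sum of squared deviations / (n - 1)
s = stdev(data)                       # square root of the sample variance

print(sample_range)  # 4
print(s2)            # 2.5
print(round(s, 3))   # 1.581
```

The standard deviation is in the same units as the data, which usually makes it easier to interpret than the variance.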
86
Parameter: a numerical value that describes a characteristic of an entire population; fixed and constant. Population mean: μ; population variance: σ^2; population standard deviation: σ; population proportion: p
Statistic: a numerical value that describes a characteristic of a sample; varies from sample to sample. Sample mean: x̄; sample variance: s^2; sample standard deviation: s; sample proportion: p̂
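The contrast can be seen by drawing several samples from one population: the parameter stays fixed while the statistic changes with each sample. The population of the integers 1 to 100 and the sample size of 10 are assumptions chosen for this sketch.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the sketch is repeatable

population = list(range(1, 101))  # hypothetical population: 1, 2, ..., 100
mu = mean(population)             # parameter: the population mean, fixed at 50.5

# Statistic: the sample mean, recomputed for five different random samples
sample_means = [mean(rng.sample(population, 10)) for _ in range(5)]

print(mu)
print(sample_means)
```

Each sample mean is an estimate of the same fixed μ, and the spread among them is the sample-to-sample variability that inferential methods account for.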
87
Describing Data Graphically
1. type of data (quantitative or qualitative) 2. size of data (n is small or large) 3. main characteristic of the data that you would like to show (skewness, spread)
88
represent each observation by a dot on a single numerical axis
Dot Plot
89
used for smaller data sets, n < 50
Dot Plot
90
a graphical way to display the frequency of data points within a particular data set
Histogram
91
used for larger data sets, n > 50
Histogram
92
condenses large data sets into manageable and readable graphs
Histogram
93
displays digits from the actual data values to denote the frequency of each value
Stem-and-Leaf Plot
94
used for medium-sized data sets, n = 50
Stem-and-Leaf Plot
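A text version of a stem-and-leaf plot is short enough to sketch. The helper below assumes non-negative integers, with the tens part as the stem and the units digit as the leaf; the function name and data are made up.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return the rows of a stem-and-leaf display (stem = tens, leaf = units digit)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    rows = []
    # Include empty stems so gaps in the data stay visible
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        rows.append(f"{stem} | {leaves}")
    return rows


data = [12, 15, 21, 23, 23, 28, 34, 41, 41, 47]  # made-up observations
print("\n".join(stem_and_leaf(data)))
# 1 | 25
# 2 | 1338
# 3 | 4
# 4 | 117
```

Every original value can be read back from the display, which is the main difference from a histogram.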
95
graphical display of the five-number summary displaying the shape, center, spread, and extreme points of a data set
Boxplot
96
observations that are considered to be unusually far from the bulk of the data
Outlier
96
Five-number summary
1. minimum 2. first quartile (Q1) – 25% 3. median or second quartile (Q2) – 50% 4. third quartile (Q3) – 75% 5. maximum
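The five-number summary can be computed with `statistics.quantiles`. Quartile conventions differ between textbooks and software; this sketch assumes the "inclusive" method. The 1.5 × IQR fences used to flag outliers are a common boxplot convention and are an assumption here, not something stated on the cards. The data are made up.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)


def outliers_by_iqr(values):
    """Flag values beyond 1.5 * IQR from the quartiles (a common convention)."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]


data = [1, 3, 5, 7, 9, 11, 13, 15, 40]  # made-up sample; 40 is far from the rest

print(five_number_summary(data))  # (1, 5.0, 9.0, 13.0, 40)
print(outliers_by_iqr(data))      # [40]
```

A boxplot draws the box from Q1 to Q3 with a line at the median, so these five numbers are all that is needed to sketch one by hand.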
97
a type of graph used to display and examine the relationship between two quantitative variables
Scatter Plot