Data Science and Statistics Vocab Flashcards
What are the three modeling types? Describe each.
Here is a brief description of the three modeling types:
Continuous - values are numeric and are used directly in an analysis.
Ordinal - values are category labels, but their order is meaningful.
Nominal - values are treated as unordered, categorical names of levels.
The ordinal and nominal modeling types are treated the same in most analyses, and are often referred to collectively as categorical.
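As a rough illustration of how the modeling type drives the analysis, the sketch below picks a summary based on a type label. The `summarize` function, the string labels, and the sample data are all hypothetical and are not part of any particular software package.

```python
from collections import Counter
from statistics import mean, stdev


def summarize(values, modeling_type):
    """Summarize a column according to its modeling type (hypothetical helper)."""
    if modeling_type == "continuous":
        # Numeric values are used directly in the calculation.
        return {"n": len(values), "mean": mean(values), "std_dev": stdev(values)}
    if modeling_type in ("ordinal", "nominal"):
        # Both categorical types are summarized the same way here: level counts.
        return {"n": len(values), "counts": dict(Counter(values))}
    raise ValueError(f"unknown modeling type: {modeling_type!r}")


heights = [61.0, 64.5, 66.0, 70.5]             # made-up continuous data
sizes = ["small", "large", "small", "medium"]  # made-up ordinal data

continuous_summary = summarize(heights, "continuous")
categorical_summary = summarize(sizes, "ordinal")
print(continuous_summary)
print(categorical_summary)
```

Treating ordinal and nominal identically in this sketch mirrors the note above that the two are often handled together as categorical.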
Distribution
Provides a histogram for continuous data and a bar chart for nominal or ordinal data, along with
relevant summary statistics. Presents options for many one-sample analyses, based on modeling
type.
Fit Y by X
Shows plots that describe the relationship between any two variables. Provides two-sample
analyses based on the modeling types of the two variables, such as bivariate, oneway, logistic,
and contingency analysis.
Matched Pairs
Analyzes two continuous variables that are measurements on the same experimental unit or
subject.
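One common matched-pairs analysis is the paired t test, which works on the within-subject differences. The sketch below computes only the test statistic and degrees of freedom. The function name and the before/after measurements are made up for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(first, second):
    """Return (t, degrees of freedom) for paired measurements on the same units."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    # Each subject contributes one difference, so pairing removes
    # subject-to-subject variation from the comparison.
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    standard_error = stdev(diffs) / sqrt(n)
    return mean(diffs) / standard_error, n - 1


before = [10, 12, 9, 11]   # hypothetical first measurement per subject
after = [12, 15, 10, 13]   # hypothetical second measurement per subject

t_stat, df = paired_t_statistic(before, after)
print(f"t = {t_stat:.3f}, df = {df}")
```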
Tabulate
Constructs tables of descriptive statistics using an interactive interface.
Fit Model
Fits models involving one or more Y variables and multiple X variables. Techniques include
standard least squares, stepwise, generalized regression, mixed models, MANOVA, loglinear
variance, logistic, proportional hazards, parametric survival, generalized linear models, partial
least squares, and response screening.
Modeling
Offers various modeling techniques: nonlinear, neural, Gaussian process, partition analysis, time
series, and model comparison. Screening is for designs with many effects. Response screening
is for a larger number of effects across groups.
Multivariate Methods
Offers techniques for exploring relationships among multiple variables: multivariate fitting,
clustering, principal components, discriminant analysis, and partial least squares.
Quality and Process
Offers techniques for evaluating quality-related issues in processes or products: control charts
(including an interactive control chart builder), measurement systems analysis, variability and
attribute gauge charts, capability charts on multiple responses, Pareto plots, and fishbone
(Ishikawa Cause and Effect) diagrams.
Reliability and Survival
Offers techniques for fitting survival and reliability data: life distribution, fit life by x, recurrence
analysis, degradation, reliability growth and forecasting, product-limit survival fit, parametric
survival distributions, and proportional hazards modeling.
Consumer Research
Provides methods for studying consumer preferences. Options include categorical response
survey analysis, factor analysis, choice models, item analysis, and uplift models for identifying
the positive effects of marketing actions.
t test
A t test is most commonly applied when the test statistic would follow a normal distribution if the value of a scaling term in the test statistic were known. When the scaling term is unknown and is replaced by an estimate based on the data, the test statistic (under certain conditions) follows a Student's t distribution.
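For a one-sample test, the scaling term is the population standard deviation, and the estimate that replaces it is the sample standard deviation. A minimal sketch, assuming a made-up sample and a hypothesized mean `mu0` (both illustrative):

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t_statistic(sample, mu0):
    """Return (t, degrees of freedom) for testing whether the mean equals mu0."""
    n = len(sample)
    # The unknown population standard deviation (the scaling term) is
    # replaced by the sample standard deviation estimated from the data.
    standard_error = stdev(sample) / sqrt(n)
    return (mean(sample) - mu0) / standard_error, n - 1


sample = [5.1, 4.9, 5.3, 5.0, 5.2]  # hypothetical measurements
t_value, dof = one_sample_t_statistic(sample, mu0=5.0)
print(f"t = {t_value:.4f}, df = {dof}")
```

The resulting statistic would be compared against a Student's t distribution with `dof` degrees of freedom. That lookup is left out here to keep the sketch within the standard library.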
X axis
The x axis is the horizontal number line in a Cartesian coordinate system.
Y axis
The y axis is the vertical number line in a Cartesian coordinate system.
Box Plot
Displays the response distribution at different combinations of factor levels. Box plots can reveal differences in the response mean at different levels, suggesting main effects. Box plots can also reveal whether the response variation is homogeneous across factor levels, an assumption made in ANOVA.
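The numbers behind a box plot can be sketched without drawing anything. The example below computes the quartiles and interquartile range (IQR) for each factor level. The level names and data are hypothetical, and the box itself marks the median, which serves here as a stand-in for the center of each group.

```python
from statistics import quantiles


def box_stats(values):
    """Return the quartiles and interquartile range that define the box."""
    # quantiles(..., n=4) returns the three cut points: Q1, median, Q3.
    q1, median, q3 = quantiles(values, n=4)
    return {"q1": q1, "median": median, "q3": q3, "iqr": q3 - q1}


# Hypothetical responses grouped by the level of a single factor.
response_by_level = {
    "low": [1, 2, 3, 4, 5, 6, 7],
    "high": [4, 6, 8, 10, 12, 14, 16],
}

summary = {level: box_stats(values) for level, values in response_by_level.items()}
for level, stats in summary.items():
    print(level, stats)
```

In this made-up data, the "high" level has both a higher center and a wider IQR than the "low" level. The first hints at a main effect, and the second at unequal variation across levels.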