Tidy Modeling Flashcards by Shaquawnna Unknown

Models are mathematical tools that can describe a system and capture

relationships in the data given to them

Predicting future events, determining between-group differences, map-based visualizations, and pattern discovery are all

Purposes for which models can be used

The utility of a model hinges on its ability to be

Reductive (reduce complex relationships to simpler terms)

Purpose of a descriptive model

Describe or illustrate characteristics of some data

Descriptive models need not have a purpose other than visually emphasizing an artifact in the data (T/F)

True

Producing a decision for a research question or to explore a particular hypothesis is the goal of

Inferential models

An inferential model starts with a predefined hypothesis about a population and produces a

Statistical conclusion (rejection of hypothesis, interval estimate, etc.)

Inferential modeling techniques typically produce a __________ output

Probabilistic (p-value, CI, posterior probability)
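
As one illustration of a probabilistic output, the sketch below computes an interval estimate for a mean. It assumes a normal approximation (for a sample this small a t-based interval would usually be preferred); the function name `normal_approx_ci` and the sample values are hypothetical and not part of the flashcards.

```python
import math
from statistics import NormalDist, mean, stdev

def normal_approx_ci(sample, level=0.95):
    """Return a (low, high) interval for the mean, assuming approximate normality."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    center = mean(sample)
    # Standard error of the mean, using the sample standard deviation.
    standard_error = stdev(sample) / math.sqrt(n)
    # Two-sided critical value from the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return center - z * standard_error, center + z * standard_error

# Hypothetical measurements, invented for the example.
sample = [4.8, 5.1, 5.0, 4.9, 5.2]
low, high = normal_approx_ci(sample)
```

The interval is only as trustworthy as the normality assumption behind it, which is the point the next card makes about pre-defined assumptions.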

To compute probabilistic outputs, probabilistic assumptions must be made about the data and the underlying processes that generated the data because

The quality of statistical modeling is highly dependent on the pre-defined assumptions and how well the data fit them

The primary goal of predictive models is that the predicted values have

The highest possible fidelity to the true value of the new data

The problem type resolved by predictive models is

Estimation (rather than inference)

In predictive models, more interest is vested in the predicted value than in

Why the predicted value is what it is

Predictive models can include measures of uncertainty (T/F)

True

Most important factor affecting predictive models…

How the model was developed

Predictive mechanistic models produce a model equation that

Depends on assumptions

In predictive mechanistic models, data are used to estimate…

Unknown parameters of the model equation to generate predictions

In predictive mechanistic models, differential equations are set based on

The model’s assumptions

Unlike inferential models, predictive mechanistic models allow for data-driven statements on how well the model performs based on

How well it predicts the existing data

Empirically-driven models are created with _____ assumptions

Vague

Empirically-driven modeling is most associated with _______ learning

Machine

K-nearest neighbors (KNN) modeling is an

Empirically-driven predictive model

How does KNN work?

Given reference data, a new sample is predicted by using the values of the K most similar data points in the reference set
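
The KNN idea can be sketched in a few lines of plain Python. This is a minimal illustration for a numeric outcome, not a definitive implementation: the function name `knn_predict`, the use of Euclidean distance, and averaging the neighbors' outcomes are all assumptions made for the example, and the reference data are invented.

```python
import math

def knn_predict(reference, new_point, k=3):
    """Predict a numeric outcome for new_point from its k nearest reference rows.

    reference: list of (features, outcome) pairs, where features is a tuple of numbers.
    Assumes Euclidean distance and a plain average of the neighbors' outcomes.
    """
    if not 1 <= k <= len(reference):
        raise ValueError("k must be between 1 and the number of reference rows")
    # Sort reference rows by distance to the new sample and keep the k closest.
    nearest = sorted(reference, key=lambda row: math.dist(row[0], new_point))[:k]
    # The prediction is the mean outcome of those neighbors.
    return sum(outcome for _, outcome in nearest) / k

# Hypothetical reference set: (features, outcome)
reference = [
    ((1.0, 1.0), 10.0),
    ((2.0, 2.0), 20.0),
    ((3.0, 3.0), 30.0),
    ((10.0, 10.0), 100.0),
]
prediction = knn_predict(reference, (2.1, 2.1), k=3)
```

Note that nothing is estimated ahead of time: the prediction is defined entirely by the reference data and the distance measure, which is why KNN counts as empirically driven.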

In predictive models, if the structure of the model is good, then

The predictions would be close to the actual values

Three types of models

Descriptive, inferential, and predictive

Two types of predictive models

Mechanistic and empirically-driven

An ordinary linear regression (OLR) model is descriptive when

Restricted smoothing splines (similar to LOESS) are used to describe trends in data using OLR with specialized terms

OLR is inferential when

Statistical results (p-values, for example) are used for inference

OLR is predictive when

A simple linear regression produces accurate predictions
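
To make the predictive use concrete, here is a minimal sketch of fitting a simple linear regression by least squares and using it to predict a new value. The function name `fit_simple_linear_regression` and the toy data are assumptions for illustration only.

```python
def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) minimizing squared error for y ~ intercept + slope * x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squares of x and cross-products of x and y, centered on the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("x must not be constant")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Toy data lying exactly on y = 1 + 2x (invented for the example).
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]
intercept, slope = fit_simple_linear_regression(x, y)

# Predictive use: apply the fitted equation to a new predictor value.
new_prediction = intercept + slope * 5.0
```

The same fitted coefficients could serve inference instead if their standard errors and p-values were the object of interest; the model is unchanged, only its use differs.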

KNN should not be used for inference because

Its nonparametric nature makes the math required for inference intractable

The predictive capacities of descriptive and inferential models should not be ignored because they model

How variables relate to the probability of outcomes

Predictive performance relates to how close the model's

Fitted values are to the observed data

Whether a model is appropriate cannot be determined by ______ alone

Statistical significance

Unsupervised models learn patterns, clusters, or other characteristics of data (understand relationships between variables) but lack

An outcome (dependent variable). Examples: principal component analysis (PCA), clustering, and autoencoders

Supervised models have an outcome variable. Examples are...

Linear regression, neural networks, etc.

Two sub-categories of supervised models

Regression (predicts a numeric outcome)
Classification (predicts an outcome that is an ordered or unordered set of qualitative values)

Outcomes (what is being predicted) are also known as...

Labels, endpoints, or dependent variables

Independent variables (used to make predictions) also known as...

Predictors, features, or covariates

Exploratory data analysis shows

How variables are related to each other (distributions, typical ranges, etc.)

During EDA, two main questions should be answered, which are ...

How did I come by these data? Are the data relevant to the problem?

Performance metrics should be identified prior to

The analysis process

Phases of Modeling

EDA (iterate between numerical analysis and visualization)
Feature engineering (use existing variables to create new variables)
Model tuning and selection (specify or optimize the structural parameters of models)
Model evaluation (assess model performance, examine residual plots)

A main effect is a model term that contains a

Single predictor variable

Root mean squared error (RMSE) is used in regression models and is calculated from the differences (residuals) between the

Observed and predicted values
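
The RMSE calculation can be sketched as follows: square each residual, average the squares, and take the square root. The function name `rmse` and the example values are hypothetical.

```python
import math

def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    # Residual = observed - predicted; square each one.
    squared_residuals = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    # Average the squared residuals, then take the square root.
    return math.sqrt(sum(squared_residuals) / len(observed))

# Invented values: residuals are 1, 0, and -2.
observed = [3.0, 5.0, 7.0]
predicted = [2.0, 5.0, 9.0]
error = rmse(observed, predicted)
```

Because the residuals are squared before averaging, large misses count more heavily than small ones, and the result is in the same units as the outcome.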

The primary approach for empirical model validation is to split the existing pool of data into two distinct sets, which are

Training set: the majority of the data; used to build the model
Test set: used to determine whether the model is successful (should be looked at only once, or it becomes part of the modeling process)

Simple random sampling is the most common method used to

Split data into training and test sets
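
A simple random split can be sketched with the standard library alone. The function name `train_test_split`, the 80/20 proportion, and the fixed seed are assumptions for the example; the flashcards say only that the training set holds the majority of the data.

```python
import random

def train_test_split(rows, train_fraction=0.8, seed=0):
    """Randomly split rows into (training, test) lists by simple random sampling."""
    if not 0 < train_fraction < 1:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    # A seeded generator makes the split reproducible.
    rng = random.Random(seed)
    shuffled = list(rows)  # copy so the caller's data is left untouched
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical data: ten row identifiers.
rows = list(range(10))
train, test = train_test_split(rows, train_fraction=0.8, seed=42)
```

Every row lands in exactly one of the two sets, so the test set stays untouched while the model is built on the training set.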

(45 cards)