Week 3 & 4 Flashcards
Explain the difference between R-squared and Adjusted R-squared as measures of goodness of fit.
●
R-squared: Represents the proportion of variance in the dependent variable that is explained by the independent variables. Higher R-squared values generally indicate a better fit.
●
Adjusted R-squared: Similar to R-squared, but it adjusts for the number of explanatory variables in the model. It penalizes added variables that do not improve the fit enough to justify the extra complexity.
●
Adjusted R-squared is generally considered the more reliable measure when comparing models with different numbers of variables, because plain R-squared never decreases when a variable is added.
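To make the penalty concrete, here is a minimal sketch of both measures using the standard textbook formulas. The function names and the example numbers (R-squared of 0.80, 30 observations, 2 versus 10 variables) are made up for illustration.

```python
def r_squared(y, y_hat):
    """Proportion of the variance in y explained by the fitted values y_hat."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained variation
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)              # total variation
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n, k):
    """Adjust R-squared for k explanatory variables fitted on n observations."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Hypothetical comparison: same R-squared, different numbers of variables.
r2 = 0.80
adj_small = adjusted_r_squared(r2, n=30, k=2)   # few variables, small penalty
adj_large = adjusted_r_squared(r2, n=30, k=10)  # many variables, larger penalty
print(round(adj_small, 3), round(adj_large, 3))
```

With the same raw R-squared, the model with more variables ends up with the lower adjusted value.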
What is bootstrapping and how does it help assess coefficient stability?
Bootstrapping is a resampling technique that repeatedly re-estimates the model on random samples drawn from the data with replacement, each typically the same size as the original dataset.
●
By analyzing the distribution of coefficient values across these resamples, we can determine whether the coefficients are stable (consistently similar) or fluctuate widely.
●
This helps us understand the robustness of the model and the reliability of the estimated coefficients.
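As a rough illustration, the sketch below bootstraps the slope of a one-variable linear regression using only the Python standard library. The data are synthetic (y = 2x plus noise), and the function names, the 500 resamples, and the percentile interval are assumptions chosen for the example, not part of the course material.

```python
import random


def ols_slope(xs, ys):
    """Ordinary least squares slope for a single explanatory variable."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def bootstrap_slopes(xs, ys, n_boot=500, seed=0):
    """Re-estimate the slope on n_boot resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample row indices with replacement
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        if len(set(bx)) < 2:  # slope is undefined if every x is identical
            continue
        slopes.append(ols_slope(bx, by))
    return slopes


# Synthetic data for illustration only: true slope is 2.
data_rng = random.Random(1)
xs = [float(i) for i in range(40)]
ys = [2.0 * x + data_rng.gauss(0.0, 3.0) for x in xs]

slopes = sorted(bootstrap_slopes(xs, ys))
lower = slopes[int(0.025 * len(slopes))]      # approximate 2.5th percentile
upper = slopes[int(0.975 * len(slopes)) - 1]  # approximate 97.5th percentile
print(lower, upper)
```

A narrow interval suggests a stable coefficient. A wide interval, or one that spans zero, suggests the estimate depends heavily on which observations happen to be in the sample.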
Explain the gravity model and its key assumptions.
The gravity model is a widely used method for trip distribution that draws an analogy from Newton’s law of gravity.
●
Key Assumptions:
○
The number of trips between two zones is proportional to:
■
The trip production of the origin zone
■
The trip attraction of the destination zone
○
The number of trips is inversely proportional to the distance or travel impedance between the two zones.
●
The model reflects the idea that people are more likely to travel to closer and more attractive destinations.
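A minimal sketch of these assumptions, in a production-constrained form: each origin's trips are split across destinations in proportion to attraction times a decreasing function of travel cost. The exponential deterrence term, the `beta` value, and the two-zone numbers are hypothetical choices for illustration.

```python
import math


def gravity_trips(productions, attractions, cost, beta=0.1):
    """Production-constrained gravity model with exponential deterrence.

    trips[i][j] is proportional to attractions[j] * exp(-beta * cost[i][j]),
    scaled so that each row i sums to productions[i].
    """
    trips = []
    for i, p_i in enumerate(productions):
        weights = [a_j * math.exp(-beta * cost[i][j])
                   for j, a_j in enumerate(attractions)]
        total = sum(weights)
        trips.append([p_i * w / total for w in weights])
    return trips


# Hypothetical two-zone example (costs in minutes).
productions = [100.0, 200.0]
attractions = [150.0, 150.0]
cost = [[5.0, 20.0],
        [20.0, 5.0]]

trips = gravity_trips(productions, attractions, cost)
for row in trips:
    print([round(t, 1) for t in row])
```

Because both destinations are equally attractive here, the difference in predicted trips comes entirely from travel cost: each origin sends most of its trips to the closer zone.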
What is the deterrence function and how is it used in the gravity model?
●
The deterrence function captures the impact of travel impedance (e.g., travel time, distance, cost) on the likelihood of trips between zones.
●
It is a decreasing function, meaning that as travel impedance increases, the number of trips is expected to decrease.
●
Common Functional Forms:
○
Exponential: Provides a smooth decline, suitable for capturing that trips are still possible even at longer distances.
○
Polynomial (power): May lead to unrealistic results (with an inverse power of distance, predicted trips grow without bound as distance approaches zero) and is generally less preferred.
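The contrast between the two forms can be checked numerically. The sketch below assumes the polynomial form means an inverse power of cost, c^(-n); the parameter values `beta=0.1` and `n=2` are hypothetical.

```python
import math


def exponential_deterrence(cost, beta=0.1):
    """exp(-beta * cost): equals 1 at zero cost and declines smoothly."""
    return math.exp(-beta * cost)


def power_deterrence(cost, n=2.0):
    """cost ** (-n): grows without bound as cost approaches zero."""
    return cost ** (-n)


costs = [0.01, 1.0, 10.0, 30.0]
exp_values = [exponential_deterrence(c) for c in costs]
pow_values = [power_deterrence(c) for c in costs]

for c, e, p in zip(costs, exp_values, pow_values):
    print(f"cost={c:>5}: exponential={e:.4f}  power={p:.4f}")
```

Both functions decrease with cost, but only the exponential form stays bounded for very short trips.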
Trip Generation Model Caveats
* Our assumption is that model coefficients are stable (over time) and transferable (over space)
* Trip generation models ignore the impact of transport supply performance (congestion, delays, waiting times, and accessibility) on trip making
* Trip chaining is ignored; tour-based and activity-based models address this
* Trip production models are more reliable than attraction models
* Scale attraction results to match total productions
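The last caveat amounts to a single scaling factor. A minimal sketch, with hypothetical zone totals:

```python
def balance_attractions(productions, attractions):
    """Scale attractions so their total equals total productions."""
    factor = sum(productions) / sum(attractions)
    return [a * factor for a in attractions]


# Hypothetical zone totals: productions sum to 300, attractions to 360.
productions = [120.0, 80.0, 100.0]
attractions = [90.0, 150.0, 120.0]

balanced = balance_attractions(productions, attractions)
print(balanced)
```

Productions are held fixed and attractions are adjusted, consistent with the point that production models are the more reliable of the two.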
What is choice modeling and how does it differ from linear regression in transport modeling?
●
Choice modeling focuses on predicting the probability of individuals choosing a specific alternative from a set of discrete options (e.g., choosing between driving, public transport, cycling, walking). Discrete choice models are used to understand and predict a decision maker’s choice of one discrete alternative from a choice set of alternatives.
●
Linear regression typically models continuous variables (e.g., predicting the total number of trips).
What is Random Utility Theory and how does it relate to choice modeling?
●
Random Utility Theory states that every individual associates a utility (satisfaction or benefit) with each alternative and chooses the alternative with the highest perceived utility. The role of a choice modeller is to predict this utility for a population through a function, which can be customized for specific demographics such as age or employment status.
●
Key Points:
○
Utility is not directly observable; it is a latent variable that needs to be estimated.
○
Utility is composed of both deterministic components (measurable attributes of alternatives) and random components (unobserved factors and individual preferences).
●
Choice models aim to approximate these utility functions and predict the probabilities of choosing different alternatives based on their estimated utilities.
●
Why Not Zero Probability for Lower Utility?
The multinomial logit (MNL) model assigns non-zero probabilities to all alternatives because human choices aren’t perfectly predictable. Even if an alternative seems objectively worse based on measurable attributes, unobserved factors (individual preferences, situational context, etc.) could lead someone to choose it.
MNL assumptions
The MNL is a widely used discrete choice model that relies on a few key assumptions:
●
Gumbel Distribution: The error terms are assumed to follow a Gumbel distribution. This assumption allows for a mathematically convenient formula to calculate the probability of choosing each alternative.
●
The Probability Equation: The probability of choosing an alternative is determined by the exponential of its utility divided by the sum of the exponentials of the utilities of all alternatives. This equation captures the idea that even alternatives with lower systematic utilities have a non-zero probability of being chosen due to the unobserved factors included in the error term.
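The probability equation described above can be sketched in a few lines. The mode names and utility values are hypothetical. Subtracting the largest utility before exponentiating is a common numerical safeguard and does not change the resulting probabilities.

```python
import math


def mnl_probabilities(utilities):
    """P(i) = exp(V_i) / sum_j exp(V_j) for a dict of systematic utilities."""
    v_max = max(utilities.values())
    # Shifting every utility by the same constant leaves the ratios unchanged.
    exp_v = {alt: math.exp(v - v_max) for alt, v in utilities.items()}
    total = sum(exp_v.values())
    return {alt: e / total for alt, e in exp_v.items()}


# Hypothetical systematic utilities for three modes.
utilities = {"car": -1.0, "public_transport": -1.5, "bike": -3.0}
probs = mnl_probabilities(utilities)
for alt, p in probs.items():
    print(f"{alt}: {p:.3f}")
```

Even the alternative with the lowest utility receives a small positive probability.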
What are deterministic choices?
Deterministic choices are easy to predict because they involve maximizing or minimizing one single quantity. Examples include: buying the cheapest option, choosing the shortest path, or accepting the highest offer in an auction.
What are probabilistic choices?
Probabilistic choices are more complex and harder to predict because they are multidimensional and influenced by factors such as the diversity of preferences among decision-makers, unknown features and measurement errors.
What is the difference between perceived utility and systematic utility?
Perceived utility (Ui) is the full utility an individual associates with an alternative, while systematic utility (Vi) is the part that can be predicted from measurable attributes. The two differ by the error term: Ui = Vi + εi.
What is the error term in choice models?
The error term (ε) represents the unknown portion of utility, accounting for taste variations, unobserved attributes, and measurement errors. In logit models, the error term is assumed to be Gumbel distributed.
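One way to see why the Gumbel assumption matters is to simulate it. The sketch below adds an independent standard Gumbel draw to each systematic utility, picks the alternative with the highest perceived utility, and compares the simulated share with the logit formula. The utilities, the number of draws, and the function names are assumptions for illustration.

```python
import math
import random


def gumbel_draw(rng):
    """Standard Gumbel draw via inverse transform: -ln(-ln(u)), u in (0, 1)."""
    u = rng.random()
    while u == 0.0:  # random() can return 0.0, which would break the log
        u = rng.random()
    return -math.log(-math.log(u))


def simulate_shares(systematic, n_draws=20000, seed=0):
    """Share of draws in which each alternative has the highest U = V + epsilon."""
    rng = random.Random(seed)
    counts = {alt: 0 for alt in systematic}
    for _ in range(n_draws):
        perceived = {alt: v + gumbel_draw(rng) for alt, v in systematic.items()}
        chosen = max(perceived, key=perceived.get)
        counts[chosen] += 1
    return {alt: c / n_draws for alt, c in counts.items()}


# Hypothetical systematic utilities for two modes.
systematic = {"car": 0.5, "bus": 0.0}
shares = simulate_shares(systematic)
logit_car = math.exp(0.5) / (math.exp(0.5) + math.exp(0.0))
print(shares["car"], logit_car)
```

The simulated share for car should land close to the closed-form logit probability, and bus is still chosen in a sizeable share of draws despite its lower systematic utility.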
What is the limitation of VISTA data and how can this be addressed?
VISTA data usually only includes information on the attributes of the chosen mode, not on the attributes of alternatives not chosen. To address this, external data sources like Google Maps API and TripGo can be used to estimate travel times and other attributes for all possible modes for each trip, providing a more comprehensive picture of the choices faced by travelers.
What is Biogeme?
Biogeme is a software package designed to estimate the parameters of discrete choice models using maximum likelihood estimation. It takes the data and the specified utility functions as input and estimates the coefficients that best explain the observed choices.
Explain Maximum Likelihood Estimation (MLE).
- The method of MLE can be used to estimate the coefficients of a model (utility function).
- MLE executes a search among all possible values of coefficients and finds the values that best explain the observed choices (highest likelihood).
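The search idea can be illustrated with a toy binary logit that has a single travel-time coefficient. The observations below are invented, and the coarse grid search is only a stand-in for the numerical optimizers that estimation software actually uses; it is not how Biogeme works internally.

```python
import math

# Hypothetical observations: (car time, PT time in minutes, chosen mode).
observations = [
    (20.0, 35.0, "car"),
    (30.0, 25.0, "pt"),
    (15.0, 40.0, "car"),
    (45.0, 30.0, "pt"),
    (25.0, 30.0, "pt"),   # chose the slower mode
    (35.0, 30.0, "car"),  # chose the slower mode
    (20.0, 45.0, "car"),
    (40.0, 20.0, "pt"),
]


def log_likelihood(beta_time, data):
    """Sum of log-probabilities of the observed choices under a binary logit."""
    total = 0.0
    for t_car, t_pt, chosen in data:
        v_car = beta_time * t_car
        v_pt = beta_time * t_pt
        p_car = 1.0 / (1.0 + math.exp(v_pt - v_car))  # logit probability of car
        total += math.log(p_car if chosen == "car" else 1.0 - p_car)
    return total


# Grid of candidate coefficients from -0.5 to 0.5 in steps of 0.001.
candidates = [-0.5 + 0.001 * i for i in range(1001)]
best_beta = max(candidates, key=lambda b: log_likelihood(b, observations))
print(best_beta, log_likelihood(best_beta, observations))
```

The estimated coefficient comes out negative, meaning longer travel time lowers utility. Its log-likelihood is higher than that of a coefficient of zero, which would predict a 50/50 split for every trip.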