Data Analysis Flashcards
Where data is compiled from
Data used in transport modelling is compiled from samples of the population
Sampling Methods
- Simple Random Sampling
- Stratified Random Sampling
Simple Random Sampling
Involves associating an identifier (number) with each unit in the population, then selecting numbers at random to obtain the sample
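A minimal sketch of the procedure described in this card, assuming the population fits in a list and the identifier is simply each unit's position in that list. The function name and the household list are hypothetical, made up for illustration.

```python
import random

def simple_random_sample(population, sample_size, seed=None):
    """Give each unit an identifier, then pick identifiers at random."""
    rng = random.Random(seed)
    # Identifier = position of the unit in the list (0, 1, 2, ...)
    identifiers = range(len(population))
    # Draw identifiers without replacement, so no unit is picked twice
    chosen_ids = rng.sample(identifiers, sample_size)
    return [population[i] for i in chosen_ids]

# Hypothetical population of 100 households
households = [f"household_{i}" for i in range(1, 101)]
sample = simple_random_sample(households, 10, seed=1)
```

The seed is only there to make the draw repeatable when checking results.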
Stratified Random Sampling
Population subdivided into homogeneous strata and then random samples taken from each of these groups
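A rough sketch of the idea in this card, assuming each unit can be assigned to a stratum by a simple function and that the same number of units is drawn from every stratum (other allocation rules exist). All names and the commuter data are hypothetical.

```python
import random
from collections import defaultdict

def stratified_random_sample(units, stratum_of, per_stratum, seed=None):
    """Group units into strata, then take a random sample from each group."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = {}
    for name, members in strata.items():
        # Never ask for more units than the stratum contains
        k = min(per_stratum, len(members))
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical population: 95 car commuters and 5 cyclists
people = [("car", i) for i in range(95)] + [("cycle", i) for i in range(5)]
strat_sample = stratified_random_sample(people, lambda p: p[0], 3, seed=1)
```

Here the small cyclist group is guaranteed three respondents, which a simple random sample of the same total size would not guarantee.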
Problem with simple random sampling
Too large a sample would be required to ensure sufficient data is collected on minority groups
Types of errors that can be introduced in sampling
- Sampling Error
- Sampling Bias
Sampling Error
Error that arises because the sample is only a proportion of the population
Sampling Bias
Caused by mistakes made either
- when defining population of interest
- when selecting sample method
Equations
In lecture slide 6
Type of errors
- Errors in modelling and forecasting
- measurement errors
- sampling errors
- specification errors
- transfer errors
- aggregation errors
Errors in modelling and forecasting
ideal req is to find combo of model complexity and data accuracy which best fits required forecasting precision + study budget
measurement errors
survey questions badly interpreted, answered badly, coding errors, etc, can cause these
sampling errors
due to representation of population by finite data sets
equation in lecture 6
specification errors
arise where phenomenon being modeled is not well understood, eg. irrelevant variable included in model or relevant variable is omitted
transfer errors
arise if a model developed for one area is applied to another
aggregation errors
typically in models, forecasting done for groups of individuals but data is compiled on basis of responses of individuals
type of info required by surveys
- infrastructure eg. road network, public transport network
- land use inventory eg. residential zones
- O-D travel surveys eg. traffic counts
- Socio-economic info eg. income, car ownership
questionnaire design
- keep qs simple + direct
- divide into several sections
roadside interviews
- better method of estimating trip matrices than home interviews, as larger samples are available
cordon surveys
provide useful info about external-external and external-internal trips
screen-line surveys
divide area into large natural zones eg. at both sides of a river or motorway
travel diary surveys
- require similar information to an O-D survey, but in more detail
- diaries distributed to members in a HH and each asked to complete diaries for all travel during day
stated preference surveys
where travelers evaluate and rank set of hypothetical options
longitudinal/time series collection methods
- repeated cross sectional survey
- similar measurements conducted on samples at diff times
- individuals may be included in more than one survey
panel survey
similar measurements made on same sample at diff times
cohort survey
some individuals included for only proportion of survey
problems
- panel surveys become unrepresentative as individuals age
- may omit phenomena eg. children leaving home
- typically higher rate of non-response
Accuracy
Overall estimate of errors present in measurements, including systematic effects. A set of observations is considered accurate if the mean of the observations is close to the true value
Precision
represents repeatability of a measurement + is concerned only with random errors. Good precision is obtained from a set of observations closely grouped together with small deviations from the mean of the observations. A set of observations spread out widely has poor precision
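A small sketch contrasting the two cards above, assuming the true value is known (in practice it rarely is). Accuracy is shown as the gap between the mean of the observations and the true value, and precision as the spread of the observations. The function name and both data sets are hypothetical.

```python
import statistics

def accuracy_and_precision(observations, true_value):
    """Return the bias of the mean (accuracy) and the spread (precision)."""
    mean_obs = statistics.mean(observations)
    return {
        "bias": mean_obs - true_value,             # near 0 means accurate
        "spread": statistics.stdev(observations),  # small means precise
    }

true_flow = 100  # hypothetical true value, eg. vehicles per hour

accurate_but_imprecise = [80, 120, 95, 105, 100]   # mean on target, wide spread
precise_but_inaccurate = [110, 111, 109, 110, 110] # tight grouping, off target

a = accuracy_and_precision(accurate_but_imprecise, true_flow)
p = accuracy_and_precision(precise_but_inaccurate, true_flow)
```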
Mean
Sum of all data points divided by number of data points
Standard deviation
measure of spread or dispersion of set of measurements. If small, measurements have good precision.
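A short worked example of the mean and standard deviation, assuming the sample form of the standard deviation (dividing by n - 1). The lecture slides may use the population form (dividing by n), so treat this as a sketch. The travel times are hypothetical.

```python
import statistics

# Hypothetical travel times in minutes
travel_times = [12, 15, 11, 14, 13]

# Mean: sum of all data points divided by the number of data points
mean_time = sum(travel_times) / len(travel_times)

# Sample standard deviation (n - 1 in the denominator)
sd_time = statistics.stdev(travel_times)

# Population standard deviation (n in the denominator), for comparison
sd_time_population = statistics.pstdev(travel_times)
```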
Standard deviation equation
lecture slides 4&5
Standard error of mean (SEM)
Standard deviation of mean.
Estimates variability between samples whereas standard deviation measures variability within a single sample
Differences between standard deviation and standard error of mean
- SD quantifies scatter: how much values vary from one another
- SEM quantifies how precisely you know true mean of population
- SEM is always smaller than SD for samples of more than one observation, since SEM = SD / √n
- SEM gets smaller as samples get larger, as mean of large sample is likely to be closer to true population mean than mean of small sample
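A minimal sketch of the points above, assuming the usual relationship SEM = SD / sqrt(n) with the sample standard deviation. The function name and data are hypothetical.

```python
import math
import statistics

def standard_error_of_mean(values):
    """SEM = sample standard deviation divided by the square root of n."""
    return statistics.stdev(values) / math.sqrt(len(values))

small_sample = [12, 15, 11, 14, 13]
# Same values repeated four times: similar scatter, more observations
large_sample = small_sample * 4

sd_small = statistics.stdev(small_sample)
sem_small = standard_error_of_mean(small_sample)
sem_large = standard_error_of_mean(large_sample)
```

The scatter of the values barely changes between the two samples, but the SEM of the larger one is clearly smaller.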
Range
difference between lowest and highest values in dataset
Quartiles
where dataset is segmented into four equal segments
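A quick sketch of range and quartiles on a hypothetical data set. Several conventions exist for interpolating quartiles, so other tools may give slightly different cut points; this uses the default method of Python's statistics.quantiles.

```python
import statistics

# Hypothetical trip lengths in km
trip_lengths = [3, 5, 7, 8, 12, 13, 14, 18, 21]

# Range: difference between the highest and lowest values
data_range = max(trip_lengths) - min(trip_lengths)

# Three cut points split the data into four equal-sized segments
q1, q2, q3 = statistics.quantiles(trip_lengths, n=4)
```

The middle cut point q2 is the median of the data set.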
outliers
data point in data set that is much larger or smaller than all of the other data points in data set
what outliers can do
- skew mean, standard deviation, standard error
- can provide incorrect result
- can indicate incorrect data and point to a problem in data collection process
methods for checking for outliers
- plot data
- descriptive analysis (average, range, standard error, quartiles)
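One possible way to turn the quartile check into a rule, sketched below. The 1.5 x interquartile-range fence is a common convention and an assumption here, not something taken from these notes. The function name and journey times are hypothetical.

```python
import statistics

def flag_outliers(values, k=1.5):
    """Flag values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical journey times in minutes; 120 looks like a recording error
journey_times = [22, 25, 24, 23, 26, 25, 24, 120]

suspect = flag_outliers(journey_times)
mean_with = statistics.mean(journey_times)
mean_without = statistics.mean(v for v in journey_times if v not in suspect)
```

Comparing mean_with and mean_without shows how a single outlier can skew the mean.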
importance of transport planning
- crucial in planning sustainable developments + ensuring accessibility for all individuals
- design phase of all major public amenities requires significant transport planning
- at planning stage of following amenities it is important: sporting venues (stadiums), retail parks, shopping centers, residential areas, industrial parks/commercial centers.
Transport Planning
- justify funding
- obtain planning permission
- environmental considerations
justify funding
a detailed study of how the road/service will impact the population is needed to justify expenditure on a new road/public transport service
obtain planning permissions
traffic impact assessment and transportation plan for new site important when large development being planned. These plans included in application for planning permission
environmental considerations
environmental considerations should be taken into account
Sustainable development
a socio-ecological process characterized by fulfillment of human needs while maintaining quality of natural environment indefinitely
key element in sustainable transport planning
- minimize the distance individuals have to travel and, where longer-distance travel is necessary, ensure good public transport links are provided
CO2 emissions statistics
- Road transport accounts for 21% of Irish CO2 emissions
- Road traffic rising 2% per year
- Global aviation growing at 5% per year
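A back-of-envelope sketch of what these growth rates imply, assuming each rate stays constant and compounds annually (a simplification). The function names are hypothetical.

```python
import math

def growth_factor(annual_rate, years):
    """Multiplier on today's volume after compounding for a number of years."""
    return (1 + annual_rate) ** years

def doubling_time(annual_rate):
    """Years needed for the volume to double at a constant annual rate."""
    return math.log(2) / math.log(1 + annual_rate)

road_rate = 0.02      # road traffic, 2% per year
aviation_rate = 0.05  # global aviation, 5% per year

road_after_10 = growth_factor(road_rate, 10)
road_doubling = doubling_time(road_rate)
aviation_doubling = doubling_time(aviation_rate)
```

Under that assumption, road traffic would double in roughly 35 years and aviation in roughly 14.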
methods of transport planning
- transport impact assessment (TIA)
- traffic forecasting
transport impact analysis/assessment
study which assesses effects a particular development’s traffic will have on transportation network in community
traffic impact studies help communities to
- forecast additional traffic associated w/ new development
- determine improvements necessary to accommodate new dev
- assist in land use decision making
- assist allocating scarce resources to areas which need improvements
- identify potential problems w/ proposed development which may influence developer’s decision to pursue it
- allow community to assess impacts proposed development may have
why traffic forecasting is important
- plan future transport needs
- plan for congestion
- measure maintenance needed on road network
- plan for new large developments
what is traffic forecasting estimated on?
- population + job forecasts
- car ownership forecasts
- travel demand forecasts
- goods vehicle forecasts
capacity of a road
max flow of vehicles, per hour or per day, for a road
types of data
- large scale data
- in-depth behaviour data
large scale data
lots of observations, but little info for each
eg. census travel to work/education, Irish Rail Census data
in-depth behaviour data
fewer observations, more detail for each
eg. trips Trinity students make during a college week
transport survey constraints
- can’t collect all the data in all the detail you want
- travel behavior tends to be complex
- data costs money, more in-depth data costs more money
- privacy and data protection issues
3 types of transport surveys
- travel diaries
- detection apps
- surveys
travel diary considerations
- sample considerations, who should take part
- can’t get everyone in college to take part, unrealistic, need diff students to get a good reflection of all students
- should try to be as representative as possible of overall population
- need a lot of info over prolonged timescale
- need easy way to capture, store, analyse data
travel diary
diary where people record what trips they took, how they traveled, how long it took, why they traveled, what mode they took, etc
travel diary advantages
- tend to be simple + easy to interpret
- don't require large amounts of digital literacy
travel diary disadvantages
- participants may forget to input info or put it in later
- estimates may not be accurate (travel time, distance, etc)
- not able to gain more complex data (routes taken, modes available, etc)
detection apps
smartphone applications that automatically record trips
gps apps advantages
- huge data collection potential
- automatic detection
- graphic + route specific outputs (maps etc)
gps apps disadvantages
- not everyone has smartphone (65+ etc)
- issues such as battery use + urban canyon effects (GPS signal blocked by tall buildings)
transport surveys
widely used to gain info about how people act/will act
transport surveys advantages
- can get large no. of responses
- can present hypothetical scenarios
- relatively cheap to do
- can ask large no. of questions + get large amount of info
transport surveys disadvantages
- non-representative samples can bias results
- have to assume respondents are reading all questions + answering honestly
- have to make sure they understand what you are asking them