Population
Creario provides a population with a set of initial plans. The agents in the population are assigned several attributes. Currently, these are
- home location
- age
- gender
- land-use type of the home location
- public transport quality class (if a schedule is available)
- distance to the next train station
- vehicle ownership
These attributes are subsequently used to classify the agents into groups with homogeneous transport patterns. The initial plans, derived from survey data, are then assigned depending on these person groups.
Scope
The MATSim population generated by Creario is a representation of the residents of the study area defined by the user. The agents can choose activity locations outside the study area (source traffic), but agents living outside the study area and travelling into it (destination traffic) are currently not included; neither is through traffic. In order to obtain more realistic travel diaries, the study area is surrounded by a buffer zone where agents can perform activities. The buffer is currently set to 35 km. This value was selected because over 90% of trips in the surveys used in the generation of plans are shorter than 35 km.
Input data
Analogously to the network, the default input data for Creario has to be publicly available worldwide. However, since users often have access to additional data, such as population statistics or age distributions, Creario allows them to upload this data, which is then used to generate a more accurate population. Creario can also be used without any input data from the user.
Provided input data
The population generation is based on two main data sources:
- the Global Human Settlement Layer (GHSL)
- the population data provided by Worldpop
Global Human Settlement Layer (GHSL)
The GHSL is published by the European Commission and offers different types of layers. For the population generation, the layer GHS-POP for the base year 2020 is used. The population data is provided in 100m raster cells and is based on the raw global census data harmonised by CIESIN for the Gridded Population of the World, version 4.11 at polygon level. This data was disaggregated using the GHS-BUILT-S layer (see Section Location Class). The dataset provides the number of inhabitants per raster cell but no further differentiation by age, gender etc.
Worldpop
The population data by Worldpop is also provided for the base year 2020 and as raster cell data. It has a resolution of 3 arc seconds and is based on several population censuses, the Built-Settlement Growth Model (BSGM) and Geospatial covariates representing factors related to population distribution. This population data is differentiated by age and gender.
A comparison of both datasets with official population statistics showed that the GHSL data is more accurate regarding the number of persons per municipality and better at depicting the spatial distribution within each municipality. But since Worldpop offers a differentiation by age and gender, the two datasets are combined:
- the number of persons per raster cell is taken from GHSL
- for each GHSL raster cell with residents (because there are also empty ones) the (combined) age and gender distribution is calculated from Worldpop
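The combination of the two datasets can be illustrated with a minimal sketch. It assumes that the Worldpop counts overlapping a GHSL raster cell have already been collected into a dictionary; the function name `combine_cell` and the data layout are hypothetical and not part of Creario.

```python
def combine_cell(ghsl_residents, worldpop_counts):
    """Return the expected residents per (gender, age group) for one GHSL cell.

    ghsl_residents: number of residents of the cell according to GHSL.
    worldpop_counts: dict mapping (gender, age_group) -> Worldpop count.
    """
    total = sum(worldpop_counts.values())
    # Assumption: empty cells, or cells without Worldpop data, yield no distribution.
    if ghsl_residents <= 0 or total <= 0:
        return {}
    # The cell total comes from GHSL, only the shares come from Worldpop.
    return {group: ghsl_residents * count / total
            for group, count in worldpop_counts.items()}
```

In this sketch the Worldpop counts only act as weights, so their absolute level does not affect the number of residents in the cell.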
However, it has to be noted that the GHS-POP dataset has problems with coverage accuracy. An extreme example of this can be seen in Figure 1 for the eastern area of Lake Zurich. As a result, the overall number of agents for the affected municipalities is correct, but the spatial distribution of those agents is skewed. We are planning to correct this in the future by taking additional data into account, but currently this is something the user has to be aware of.
Figure 1: Example for Coverage Issues in GHS-POP
User input
As stated above, if the user has more accurate population data for the study area, this can be integrated in the population generation provided by Creario. In that case, the user has to provide at least one shape file indicating the boundaries of the statistical units for which the additional data is available. The population statistics can be provided in the same shape file or in a separate csv file (see the subsections below for more specific requirements). If additional data regarding the number of residents per statistical unit is provided by the user, the GHSL raster data is fitted to match these numbers. If additional data regarding age and/or gender distribution is provided, this is used instead of the distributions derived from Worldpop.
The user can provide several datasets, e.g. for different parts of the study area or to provide smaller-scale data for a certain part of the study area. Each dataset has to be given a unique id. If more than one dataset is provided, the user has to assign priorities with regard to the total number of residents, the age distribution and the gender distribution, respectively. Different datasets can have different priorities per aspect, as demonstrated in the following example.
Example for user data prioritisation
Dataset 1:
- coverage: entire study area incl. main city
- resolution: municipality
- population data with age and gender distribution
Dataset 2:
- coverage: main city within the study area
- resolution: neighbourhood
- population data with age distribution that is more detailed than in dataset 1 but without gender
User priorities
Attribute | Priority dataset 1 | Priority dataset 2 |
---|---|---|
Total number of residents | 2 | 1 |
Age | 2 | 1 |
Gender | 1 | 99 |
Result
The overall number of residents is fitted to dataset 2 for the main city (where both datasets provide data) and to dataset 1 for all other municipalities. Analogously, the age distribution for the main city is taken from dataset 2 and for all other municipalities from dataset 1. However, the gender distribution of dataset 1 is used for the entire study area.
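The prioritisation in this example can be sketched as a simple lookup. The sketch assumes that a lower number means a higher priority, which is consistent with the result described above; the names `PRIORITIES` and `pick_dataset` are hypothetical.

```python
# Priorities from the example table (assumption: 1 is the highest priority).
PRIORITIES = {
    "residents": {"dataset1": 2, "dataset2": 1},
    "age": {"dataset1": 2, "dataset2": 1},
    "gender": {"dataset1": 1, "dataset2": 99},
}

def pick_dataset(attribute, covering):
    """Return the id of the dataset to use for an attribute at one location.

    covering: ids of the datasets whose spatial entities cover the location.
    Returns None if no user dataset covers the location, in which case the
    default data (GHSL, Worldpop) would be used.
    """
    ranked = sorted(covering, key=lambda d: PRIORITIES[attribute][d])
    return ranked[0] if ranked else None
```

For a location in the main city, `pick_dataset("age", ["dataset1", "dataset2"])` selects dataset 2, while `pick_dataset("gender", ["dataset1", "dataset2"])` selects dataset 1.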
Requirements for age and gender information
- Age and gender information has to be provided as number of residents per person group.
- Age and gender information can be provided separately or combined (see example tables).
Example for separate age and gender information
Spatial entity | Residents | Female | Male | 0To19 | 20To39 | 40To59 | 60To100 |
---|---|---|---|---|---|---|---|
Municipality A | 380 | 198 | 182 | 76 | 61 | 133 | 110 |
Municipality B | 763 | 366 | 397 | 145 | 160 | 149 | 309 |
Municipality C | 1500 | 765 | 735 | 330 | 420 | 300 | 450 |
Example for combined age and gender information
Spatial entity | Residents | Female_0To19 | Female_20To39 | Female_40To59 | Female_60To100 | Male_0To19 | Male_20To39 | Male_40To59 | Male_60To100 |
---|---|---|---|---|---|---|---|---|---|
Municipality A | 380 | 40 | 32 | 69 | 57 | 36 | 29 | 64 | 53 |
Municipality B | 763 | 70 | 77 | 71 | 148 | 75 | 83 | 78 | 161 |
Municipality C | 1500 | 168 | 214 | 153 | 230 | 162 | 206 | 133 | 220 |
Requirements for the shape file
- Per dataset only one shape file can be used.
- It has to contain the boundaries of all spatial entities (e.g. municipalities or neighbourhoods).
- It has to contain an id for each spatial entity and this id has to be consistent with additional csv data. The user has to specify which attribute contains the id.
- One spatial entity can include more than one polygon; associated polygons must have the same id.
- The name of the spatial entity is optional. If there is a name, the user has to specify which attribute contains the name.
- If the shape file contains the total number of residents, the user has to specify which attribute contains this information.
- If the shape file contains age and/or gender information, the user has to specify which data is provided and which attributes contain the respective information.
Requirements for a csv file
- The data has to be provided in a csv or similar text file. The delimiter/field separator can be specified by the user.
- The user has to specify which column contains the id.
- The ids of the spatial entities in the csv file must be consistent with the ids in the shape file.
- If the csv file contains the total number of residents, the user has to specify which column contains the number of residents.
- If the csv file contains age and/or gender information, the user has to specify which data is provided and which columns contain the respective information.
Overview Processing
The generation and attribution of agents is done in four main phases.
- Initialisation
- Iterative Fitting
- Assign agent attributes
- Assign initial plans
Initialisation
During the initialisation phase the input data is read and an initial set of agents is generated. This includes the following steps:
- If available, read polygons and population statistics data and create a SpatialEntity object for each spatial entity.
- Read raster data from GHSL, assign raster data to spatial entities. Per dataset, each raster cell is assigned to one spatial entity based on the center point of the raster cell. However, a raster cell can be assigned to several spatial entities from overlapping datasets.
- Based on the priorities given by the user, age and gender distributions from user input or Worldpop data are assigned to each raster cell.
- Generate an initial set of agents without any attributes besides home location, with the number of agents per raster cell corresponding to the number of residents in the GHSL dataset.
Iterative fitting
The goal of the iterative fitting phase is to correct the number of agents per raster cell based on the statistical data provided by the user, taking into account that the number of agents for a certain area can differ between datasets.
Differences between datasets are balanced through the iterative fitting. For this, the algorithm iterates through all datasets (starting with the dataset with the lowest priority) and through all spatial entities in each dataset, and compares the number of agents per spatial entity with the number of residents according to statistics.
If there is a difference, the ratio between the number of agents currently assigned to the spatial entity and the number of residents according to statistics is calculated.
Based on this ratio and the number of agents currently assigned to a raster cell, a new target number of agents per raster cell is calculated and agents are added or deleted (randomly) until this target number is reached.
Iterations stop when, for all spatial entities, the relative difference between the number of agents and the number of residents according to statistics is at most 1%, or when 25 iterations have been completed.
Note: If different datasets are used, the user is advised to check if they are overall consistent. The iterative fitting can balance smaller differences in the number of residents for the same raster cell, but might oscillate and not converge if the differences are too big.
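The fitting loop described above can be sketched as follows. This is a simplified illustration, not the Creario implementation: it works on agent counts per raster cell and rounds the scaled counts, whereas Creario adds or deletes individual agents randomly. The function name `fit` and the data layout are assumptions.

```python
def fit(cells, datasets, tolerance=0.01, max_iterations=25):
    """Scale agent counts per raster cell to match the entity statistics.

    cells: dict cell_id -> number of agents (modified in place).
    datasets: list ordered from lowest to highest priority; each dataset is
        a list of (cell_ids, residents) tuples, one per spatial entity.
    Returns the number of iterations that were run.
    """
    for iteration in range(1, max_iterations + 1):
        for entities in datasets:
            for cell_ids, residents in entities:
                agents = sum(cells[c] for c in cell_ids)
                if agents == 0 or agents == residents:
                    continue  # nothing to scale (simplification)
                factor = residents / agents
                for c in cell_ids:
                    cells[c] = round(cells[c] * factor)  # new target per cell
        # Stop once every entity is within the tolerance of its statistics.
        converged = all(
            abs(sum(cells[c] for c in cell_ids) - residents) <= tolerance * residents
            for entities in datasets
            for cell_ids, residents in entities
        )
        if converged:
            return iteration
    return max_iterations
```

Because the dataset with the highest priority is processed last within each iteration, its totals are the ones matched most closely when datasets disagree.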
Assign agent attributes
Currently, each agent is attributed with
- home location
- age
- gender
- land-use type of the home location
- public transport quality class
- distance to public transport stops
- vehicle ownership.
Further attributes will be added in the future such as income or behaviour type.
Home location
The home location is assigned during the agent generation. If the scenario includes facilities, each agent is assigned a home-facility within its raster cell taking into account the weights for home ActivityOptions (see Section Weight for home ActivityOptions). If no home-facilities are associated with the raster cell, a new home-facility is generated for each agent with a weight of 1. If the scenario does not include facilities, the home location is represented by an x,y-coordinate that is chosen randomly within the raster cell.
Age
Each agent is assigned a specific age. The age is randomly drawn from the age distribution with the highest priority available for the agent’s raster cell. Within an age group, the age is randomly assigned.
Gender
Currently, gender only differentiates between male and female since this is what most datasets provide. If a combined age and gender distribution is used, the gender is drawn dependent on age, otherwise gender is independently drawn from the gender distribution for the raster cell.
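The draw from a combined age and gender distribution can be sketched as a two-step random draw: first the person group, then a specific age within the age group. The function name and the representation of age groups as `(min_age, max_age)` tuples are assumptions made for this illustration.

```python
import random

def draw_age_gender(combined_counts, rng):
    """Draw gender and age for one agent from a combined distribution.

    combined_counts: dict (gender, (min_age, max_age)) -> number of residents.
    rng: a random.Random instance.
    """
    groups = list(combined_counts)
    weights = [combined_counts[g] for g in groups]
    # Step 1: draw the person group, so gender depends on the age group.
    gender, (min_age, max_age) = rng.choices(groups, weights=weights, k=1)[0]
    # Step 2: draw a specific age uniformly within the age group.
    return gender, rng.randint(min_age, max_age)
```

With separate distributions, the same idea would be applied twice, once for the age group and once, independently, for the gender.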
Land-use type
The land-use type of the agent’s home location will later on be used to differentiate between different patterns in travel behaviour. In general, the classification here follows Eurostat’s methodology to calculate the Degree of Urbanisation (DEGURBA) with a few modifications. The reason for these modifications is that Eurostat uses raster cells of 1 km² while the raster cells in Creario comprise an area of only 0.01 km². In addition, it is important to note that only the population within the study area is used for the classification. This might lead to some border effects, e.g. when a small town or a village belongs to an agglomeration that is just outside the study area. The user is advised to take this into account when choosing the study area.
Processing
The classification procedure comprises the following three steps:
- Classify raster cells according to population density
- Determine population density clusters
- Classify land-use type of agent home location
Step 1: Classify raster cells according to population density
For each raster cell, the population density is calculated. The raster cell is then classified into one of three classes according to the population density thresholds outlined in Table 1.
Density class | Population density [Inh/km²] |
---|---|
1 (high density) | ≥ 1500 |
2 (medium density) | 300-1499 |
3 (low density) | < 300 |
Table 1: Population Density Thresholds
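As a minimal sketch, the classification of a single raster cell could look as follows, assuming 100 m raster cells (0.01 km², i.e. 100 cells per km²) and the thresholds of Table 1. The function name is hypothetical.

```python
CELLS_PER_KM2 = 100  # a 100 m x 100 m raster cell covers 0.01 km²

def density_class(residents):
    """Return the density class (1 = high, 2 = medium, 3 = low) of a cell."""
    density = residents * CELLS_PER_KM2  # inhabitants per km²
    if density >= 1500:
        return 1
    if density >= 300:
        return 2
    return 3
```

With these thresholds, a cell with 15 or more residents falls into class 1 and a cell with 3 to 14 residents into class 2.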
Step 2: Determine population density clusters
The population density classes are then used to find two types of clusters: urban centres and suburban clusters. Urban centres are determined using raster cells with density class 1 (high density). Raster cells that are non-diagonally contiguous (i.e. excluding raster cells that only touch at the corner) are grouped into a cluster. Gaps within a cluster are filled by iteratively adding raster cells to the cluster if the raster cells are surrounded by 5 or more raster cells belonging to the same cluster. If a cluster has a population of at least 50’000 inhabitants after gap filling it is considered an urban centre.
For suburban clusters, raster cells with density class 1 (high density) that are not part of an urban centre and raster cells with density class 2 (medium density) are eligible. Raster cells that share a common border, including raster cells that only touch diagonally at corners, are grouped into a cluster. If a cluster has a population of at least 5’000 inhabitants after gap filling, it is considered a suburban cluster. Note that in the Eurostat methodology, the gap filling is only done for urban centre clusters and not for suburban clusters, but due to the higher resolution, it was decided that in Creario, the gaps in suburban clusters are filled in as well.
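The grouping and gap filling can be sketched with a flood fill on grid coordinates. The sketch below is illustrative only: raster cells are represented as `(row, column)` tuples, the function names are hypothetical, and the population threshold check (50’000 or 5’000 inhabitants) is omitted.

```python
def find_clusters(cells, diagonal):
    """Group contiguous cells. diagonal=False for urban centres (shared edges
    only), diagonal=True for suburban clusters (corners count as well)."""
    steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if diagonal:
        steps += [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    remaining, clusters = set(cells), []
    while remaining:
        stack = [remaining.pop()]
        cluster = set(stack)
        while stack:
            row, col = stack.pop()
            for d_row, d_col in steps:
                neighbour = (row + d_row, col + d_col)
                if neighbour in remaining:
                    remaining.remove(neighbour)
                    cluster.add(neighbour)
                    stack.append(neighbour)
        clusters.append(cluster)
    return clusters

def fill_gaps(cluster, min_neighbours=5):
    """Iteratively add cells surrounded by at least min_neighbours cluster cells."""
    around = [(r, c) for r in (-1, 0, 1) for c in (-1, 0, 1) if (r, c) != (0, 0)]
    cluster = set(cluster)
    while True:
        candidates = {(r + dr, c + dc) for r, c in cluster for dr, dc in around}
        candidates -= cluster
        added = {
            (r, c) for r, c in candidates
            if sum((r + dr, c + dc) in cluster for dr, dc in around) >= min_neighbours
        }
        if not added:
            return cluster
        cluster |= added
```

After gap filling, the population of each cluster would be summed and compared with the respective threshold.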
The result of the clustering for the Canton of Zurich can be seen in Figure 2. Raster cells belonging to urban centres are marked in red and raster cells belonging to suburban clusters in green. There are two distinct urban clusters. The first one to the east mainly consists of the densely populated parts of the city of Winterthur. The second one is much larger and not only contains the city of Zurich, but a lot of neighbouring municipalities where the settlement areas have grown together.
Figure 2: Population Clusters Canton of Zurich
Step 3: Classify land-use type of agent home location
Finally, the agents are assigned a land-use type based on the raster cells of their home location as outlined in Table 2.
Land-use class home location | Rule |
---|---|
1: urban | Home location raster cell belongs to an urban centre cluster. |
2: suburban | Home location raster cell belongs to a suburban cluster. |
3: rural | Home location raster cell does not belong to a cluster. |
Table 2: Land-Use Classes of the Home Location
Public transport quality class (PTQC)
The accessibility by public transport can be measured in a variety of ways. An indicator that has proven itself in practice in Switzerland is the so-called public transport quality class (PTQC) (or ÖV-Güteklasse). The calculation method for the PTQC indicator in this application largely follows the methodology described in ARE (2022). First, all public transport stops are categorised according to the transport modes serving the stop and the course interval per transport mode. Then, the public transport quality class can be calculated for any location of interest taking into account the distance to the next public transport stops and the aforementioned stop category. In this application, the PTQC is calculated for each agent individually based on the coordinate of their home location.
Input data
For the calculation of the PTQC a full public transport schedule in the GTFS (General Transit Feed Specification) format is required. If no schedule is available for the study area, this indicator cannot be calculated. Instead, the user is advised to use the distance to public transport stops indicator described in the next section. The GTFS schedule is converted into a MATSim transit schedule for the region including all lines that have at least one stop in the study area as described in Section Public transport. The subsequent calculations are done on the level of StopAreas.
Calculate StopArea categories
Each StopArea is assigned a StopArea category based on transport modes and course intervals. Transport modes are categorised into the three following groups:
Mode Group | Included transport modes |
---|---|
A | Rail |
B | Tram, subway, bus, ferry |
C | Funicular, cable car |
Table 3: Transport Mode Groups for Public Transport Quality Class Calculation
For the calculation of the course interval a work day outside school holidays and high tourist season is used. For each mode group, all departures for lines of that mode group from the StopArea between 06:00 and 20:00 are counted. Then, except for stops at the end of a line and stops with departures in just one direction, the departures are divided by two to obtain the number of departures per direction. The course interval per mode group is subsequently calculated by dividing 840 minutes by the number of departures.
The StopArea category is then assigned according to the following table. If more than one StopArea category is applicable, the highest ranking one is chosen.
Course interval | Mode group A | Mode group B | Mode group C |
---|---|---|---|
< 5 min | I | II | V |
5 - 10 min | II | III | V |
10 - 20 min | II | IV | V |
20 - 40 min | III | V | V |
40 - 60 min | IV | V | V |
≥ 60 min | V | V | V |
Table 4: StopArea Categories
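The course interval calculation and the lookup in Table 4 can be sketched as follows. The sketch assumes that each interval row includes its lower bound and excludes its upper bound, which the table does not state explicitly; the function names are hypothetical.

```python
UPPER_BOUNDS = [5, 10, 20, 40, 60]  # minutes; the last table row is >= 60 min
CATEGORIES = {  # StopArea category (1 = I ... 5 = V) per interval row
    "A": [1, 2, 2, 3, 4, 5],
    "B": [2, 3, 4, 5, 5, 5],
    "C": [5, 5, 5, 5, 5, 5],
}
ROMAN = {1: "I", 2: "II", 3: "III", 4: "IV", 5: "V"}

def course_interval(departures, both_directions=True):
    """Course interval in minutes for departures between 06:00 and 20:00."""
    per_direction = departures / 2 if both_directions else departures
    return 840 / per_direction  # 14 hours = 840 minutes

def stop_area_category(departures_by_group, both_directions=True):
    """departures_by_group: dict mode group ('A', 'B', 'C') -> departures."""
    best = None
    for group, departures in departures_by_group.items():
        if departures <= 0:
            continue
        interval = course_interval(departures, both_directions)
        row = sum(interval >= bound for bound in UPPER_BOUNDS)
        rank = CATEGORIES[group][row]
        # If several categories apply, the highest ranking one is chosen.
        best = rank if best is None else min(best, rank)
    return ROMAN.get(best)
```

For example, a StopArea with 56 rail departures in both directions has a course interval of 30 minutes for mode group A, which corresponds to category III.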
Determine public transport quality class (PTQC)
The PTQC is calculated for each agent according to the rules in the following table.
StopArea category | < 300m | 300 - 500m | 500-750m | 750-1000m |
---|---|---|---|---|
I | A | A | B | C |
II | A | B | C | D |
III | B | C | D | none |
IV | C | D | none | none |
V | D | none | none | none |
Table 5: Public Transport Quality Class (PTQC)
The distance is calculated as the crow-fly distance between the home location of the agent and the geographical center point of the StopArea.
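The lookup in Table 5 can be sketched as shown below. Two assumptions are made that are not stated in the text: if several StopAreas are within reach, the best resulting class is used, and each distance column includes its lower bound and excludes its upper bound. Coordinates are assumed to be in a projected system with metres as unit.

```python
import math

DISTANCE_BOUNDS = [300, 500, 750, 1000]  # metres
PTQC_TABLE = {
    "I": ["A", "A", "B", "C"],
    "II": ["A", "B", "C", "D"],
    "III": ["B", "C", "D", None],
    "IV": ["C", "D", None, None],
    "V": ["D", None, None, None],
}

def ptqc(home, stop_areas):
    """home: (x, y); stop_areas: list of ((x, y), StopArea category).

    Returns 'A' to 'D', or None if no quality class applies.
    """
    best = None
    for (x, y), category in stop_areas:
        distance = math.hypot(x - home[0], y - home[1])  # crow-fly distance
        column = sum(distance >= bound for bound in DISTANCE_BOUNDS)
        if column >= len(DISTANCE_BOUNDS):
            continue  # 1000 m or more: outside all distance classes
        quality = PTQC_TABLE[category][column]
        if quality is not None and (best is None or quality < best):
            best = quality
    return best
```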
The result for the public transport quality class for a section of the Canton of Zurich can be seen in Figure 3. As expected, PTQC A is found in the major cities and around the train stations including S-Bahn stations with PTQC areas B and C radiating out from these centers. In addition, there are some smaller clusters with a PTQC B in the centre. Overall, the PTQC level is high as can be expected in the Canton of Zurich.
Figure 3: Public Transport Quality Classes in the Canton of Zurich
Distance to public transport stops
Another indicator for accessibility by public transport is the distance to public transport stops for different modes. It is less comprehensive but also requires less input data, i.e. it can also be calculated when no public transport schedule is available. Currently, the distances to four types of public transport stops are determined:
- train station
- metro station
- tram stop
- bus stop
Input data
The calculation of the distance to the next public transport stop is either based on GTFS data as described in Section Public transport quality class (PTQC) or based on OSM stops, which are more widely available. The indicator takes into account the location of each stop in OSM as well as the type of stop as indicated in the “railway” and “highway” tags.
Processing
The distance to public transport stops is calculated for each agent based on the crow-fly distance from their home location. For each of the distance classes outlined in Table 6, it is determined which type of public transport stop is available within the distance interval. Then, for each type of stop, the minimum distance class is assigned.
Distance interval | Assigned value |
---|---|
0 - 1000m | 0.5 |
1000 - 2000m | 1.5 |
≥ 2000m | 3.5 |
Table 6: Distance Coding for Public Transport Stops
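A minimal sketch of the distance coding is given below. It assumes projected coordinates in metres and assigns the value of the last class when no stop of a given type exists; both the function names and this fallback are assumptions.

```python
import math

def distance_code(distance_m):
    """Map a crow-fly distance in metres to the assigned value of Table 6."""
    if distance_m < 1000:
        return 0.5
    if distance_m < 2000:
        return 1.5
    return 3.5

def stop_distance_codes(home, stops, stop_types=("train", "metro", "tram", "bus")):
    """home: (x, y); stops: list of ((x, y), stop_type).

    Returns a dict stop_type -> coded distance to the nearest stop of that type.
    """
    nearest = {stop_type: math.inf for stop_type in stop_types}
    for (x, y), stop_type in stops:
        if stop_type in nearest:
            distance = math.hypot(x - home[0], y - home[1])
            nearest[stop_type] = min(nearest[stop_type], distance)
    return {stop_type: distance_code(d) for stop_type, d in nearest.items()}
```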
In Figure 4 the results for train stations, tram and bus stops are shown for the same section of the Canton of Zurich as in Figure 3. The combination of the three indicators shows the structure of the public transport network in the area. While trams only serve the city of Zurich and a few neighbouring municipalities, bus stops can be found in pretty much every village within the canton. Train stations, on the other hand, provide only selective coverage.
Figure 4: Distance to Train Stations (left), Tram Stops (middle) and Bus Stops (right) Canton of Zurich
Vehicle ownership
In order to determine the agents’ vehicle ownership, a two-step approach is used. First, the vehicle ownership rate (number of vehicles per inhabitant) is determined for each raster cell. Then, the vehicles are randomly assigned to the agents old enough to own a car in a way that the sum of vehicles meets the vehicle ownership rate. A linear regression model for vehicle rates was estimated based on UK census data and additional spatial characteristics.
Input data
The data used to estimate the vehicle rate model originates from UK census maps for England and Wales. These maps provide data for a large variety of topics. In this application the following datasets are used:
- number of vehicles
- number of inhabitants
- gender
- age distribution
All datasets except the age distribution are provided on the level of so-called output areas; the age distribution is provided based on middle layer super output areas.
The model is estimated on the level of raster cells.
Therefore, in a first step all variables on output area and middle layer super output area level are assigned to the raster cells within.
The vehicle rate is calculated by dividing the number of vehicles in a spatial area by the number of inhabitants.
Then, the following spatial characteristics are calculated using the algorithms described above:
- land-use type
- distance to train
- distance to tram
Vehicle rate model
Different linear regression models for vehicle rates were tested and evaluated according to model fit and - more importantly - prediction quality. The parameters of the best model are presented in Table 7.
Variable | Parameter | std. error | t-value |
---|---|---|---|
Constant | -1.2681 | 0.0076 | -167 |
PopDensityClass | -0.0486 | 0.0002 | -223 |
LandUseType | 0.0309 | 0.0003 | 94 |
ShareFemales | 0.5109 | 0.0034 | 149 |
DistToTrain | 0.0149 | 0.0002 | 88 |
DistToTram | 0.0475 | 0.0004 | 111 |
AverageAge | 0.0377 | 0.0002 | 202 |
ShareUnder18 | 0.4462 | 0.0070 | 64 |
Share70Up | -1.3237 | 0.0111 | -119 |
Table 7: Model Estimation Results Vehicle Rate Model
The population density is included in the model using the classes outlined in Table 8.
Population density [p/km²] | PopDensityClass |
---|---|
< 25 | 1 |
25 - 300 | 2 |
300 - 1500 | 3 |
≥ 1500 | 4 |
Table 8: Population Density Classes for Vehicle Rate Model
Three age variables are derived from the age distribution provided in the census maps: the average age, the share of minors (i.e. under 18 year olds) and the share of seniors (here 70 years or older).
All model parameters have the expected signs and are significant. Factors for decreasing vehicle rates are increasing population density and an increasing share of seniors. Factors for increasing vehicle rates are a higher average age, a higher share of females or under 18 year olds, more rural areas and larger distances to tram stops or train stations. The constant is context-specific and should be adapted to account for different base levels of vehicle rates in different countries / study areas.
Application to scenario population
First, the vehicle rate model is used to calculate the vehicle rate for each raster cell in the study area. Then, vehicles are assigned randomly to agents that are at least of minimum driving age as specified by the user (default set to 18 years) until the required number of vehicles per raster cell is met.
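The two steps can be sketched as follows, using the parameters from Table 7. Several details are assumptions made for this illustration and are not confirmed by the text: the coding of the input variables (shares as fractions, land-use type as 1 to 3, distances as the coded values of Table 6), the clipping of negative rates to zero, and the limit of one vehicle per agent.

```python
import random

COEFFICIENTS = {  # parameters from Table 7
    "Constant": -1.2681, "PopDensityClass": -0.0486, "LandUseType": 0.0309,
    "ShareFemales": 0.5109, "DistToTrain": 0.0149, "DistToTram": 0.0475,
    "AverageAge": 0.0377, "ShareUnder18": 0.4462, "Share70Up": -1.3237,
}

def vehicle_rate(features):
    """Predicted vehicles per inhabitant for one raster cell."""
    rate = COEFFICIENTS["Constant"] + sum(
        COEFFICIENTS[name] * value for name, value in features.items()
    )
    return max(rate, 0.0)  # assumption: negative predictions are clipped

def assign_vehicles(ages, rate, rng, min_driving_age=18):
    """Randomly assign vehicles to the eligible agents of one raster cell.

    Returns a list of booleans, one per agent, in the order of `ages`.
    """
    eligible = [i for i, age in enumerate(ages) if age >= min_driving_age]
    vehicles = min(len(eligible), round(rate * len(ages)))
    owners = set(rng.sample(eligible, vehicles))
    return [i in owners for i in range(len(ages))]
```

Since the constant is context-specific, `COEFFICIENTS["Constant"]` would be the value to adjust when applying the sketch to another country or study area.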
Assign initial plans
Finally, each agent is assigned an initial plan to start the MATSim optimisation process. These initial plans are based on travel diaries derived from a corresponding survey. The survey data contains a travel diary for a day including arrival and departure times at activities, activity types, trip modes, trip distances etc. Based on personal and spatial characteristics, the study participants are classified into different groups. For each group, all travel diaries in the survey are collected. Each agent is assigned one of the travel diaries based on the group they belong to. The travel diary is then transformed into a MATSim plan. The number and order of activities, including activity type (home, work, shopping, etc.), and the trips, including modes, are directly taken from the travel diary. Arrival and departure times are randomised in an interval around the original times. Then, locations are assigned to each activity, first for primary activities (home, work, education) and then for secondary activities.
Input data
The travel diaries are taken from household travel surveys. Currently, the user can choose between the following travel surveys:
Additional travel surveys will be added in the future. On request, it is also possible to integrate a travel survey provided by the user. Please contact us for more information.
In addition to the diaries, the datasets include personal and household characteristics of the participant such as age, gender, vehicle ownership etc. The travel diaries are filtered to only include diaries that
- occur on weekdays and outside of school holidays
- start and end at home
- have between three and seven activities
- are completed within 24 hours
- do not contain trips of less than 60 seconds
Activity types taken into account are
- home
- work
- education
- shopping
- leisure
Modes taken into account are
- car
- public transport
- bike
- walk
- other
Harmonise sub-tour modes
One relevant concept in MATSim is the so-called sub-tour. A sub-tour is a sequence of activities and trips that starts and ends at the same location, usually the home or work location. For example, the daily plan home-work-home-leisure-home contains two home-based sub-tours: 1) home-work-home and 2) home-leisure-home. A sub-tour can also be nested within another sub-tour, e.g. the daily plan home-work-leisure-work-home also contains two sub-tours: 1) home-work-[…]-work-home and 2) work-leisure-work. Depending on which modules are used in the MATSim scenario, the replanning assumes that the modes within a sub-tour are consistent. This pertains especially to sub-tours including car trips. Therefore, the daily plans derived from the survey data are divided into sub-tours and if a sub-tour contains at least one car trip, all modes within the sub-tour are set to car.
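The harmonisation can be sketched as shown below. The sketch assumes that the trips of a nested sub-tour belong only to the inner sub-tour and not to the enclosing one, as the notation home-work-[…]-work-home suggests; the function name and the plan representation are hypothetical.

```python
def harmonise_sub_tour_modes(locations, modes):
    """Set all modes of a sub-tour to car if it contains at least one car trip.

    locations: location ids of the activities in order (n + 1 entries).
    modes: modes of the trips between consecutive activities (n entries).
    """
    modes = list(modes)
    open_locations = [locations[0]]  # locations of sub-tours not yet closed
    open_trips = []                  # trips not yet assigned to a sub-tour
    for trip, location in enumerate(locations[1:]):
        open_trips.append(trip)
        if location in open_locations:
            # Returning to a known location closes a sub-tour.
            start = open_locations.index(location)
            count = len(open_locations) - start
            sub_tour = open_trips[-count:]
            del open_trips[-count:]
            del open_locations[start + 1:]
            if any(modes[t] == "car" for t in sub_tour):
                for t in sub_tour:
                    modes[t] = "car"
        else:
            open_locations.append(location)
    return modes
```

For the plan home-work-leisure-work-home with the modes car, walk, walk, public transport, the inner sub-tour work-leisure-work keeps its walk trips, while the last trip of the outer sub-tour is set to car.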
Classify agents
The survey participants are classified taking into account the attributes detailed in Table 9.
Attribute | Classes |
---|---|
Gender | male, female |
Age | ≤ 16, 17-29, ≥30 |
Vehicle | yes, no |
Land-use type home location | urban, suburban, rural |
Distance to train | <1km, 1-2km, ≥2km |
Table 9: Attributes for the Classification of Travel Survey Participants
Since only a small share of rural locations are closer than 2km to the next train station, the differentiation regarding the attribute “Distance to train” is only done for urban and suburban participants.
Assign plan
For each agent, a travel diary is randomly selected from the set of travel diaries assigned to the corresponding agent class. Then, a MATSim plan is created adopting the following characteristics from the travel diary:
- number and order of activities incl. activity types
- trips incl. modes and distances
- arrival and departure times
- distance to school / work
Randomise departure time of first trip
Since the number of agents is usually much larger than the number of participants in a household travel survey, adopting all arrival and departure times unchanged would lead to too little variation in departure times potentially resulting in unrealistic peaks in the departure time distribution. Therefore, the departure time of the first trip of a plan is adapted by randomly drawing a departure time in a ± 10 minutes interval around the original departure time in the travel diary. All subsequent arrival and departure times are shifted accordingly, keeping trip and activity durations unchanged, except for the duration of the first and last home activities, that are adapted to accommodate the shift.
Randomise activity durations
In addition to the departure time of the first trip, the durations of the subsequent activities are randomised as well. This is done by randomly drawing a new duration in a ± 10 minutes interval of the original activity duration. Exempt from this are activities with a duration of 15 minutes or less and the first and last activity. Once an activity duration is adapted, the corresponding shift is carried over to the departure and arrival time of the next trip and activity. The duration of the last activity is then adjusted to accommodate the sum of all duration shifts in the plan.
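Both randomisation steps can be sketched together. The sketch assumes times in seconds and a uniform draw within the ± 10 minutes interval (the distribution is not specified in the text); the function name and the plan representation are hypothetical.

```python
import random

def randomise_times(first_departure, activity_durations, rng,
                    window=600, exempt_up_to=900):
    """Randomise the first departure time and the activity durations.

    first_departure: departure time of the first trip, seconds after midnight.
    activity_durations: durations in seconds of the activities between the
        first and the last activity, both of which are excluded.
    Returns the new first departure time and the new durations.
    """
    new_departure = max(0.0, first_departure + rng.uniform(-window, window))
    new_durations = []
    for duration in activity_durations:
        if duration <= exempt_up_to:
            new_durations.append(duration)  # 15 minutes or less: unchanged
        else:
            new_durations.append(duration + rng.uniform(-window, window))
    return new_departure, new_durations
```

The arrival and departure times of the plan would then be rebuilt from the new departure time, the unchanged trip durations and the new activity durations, with the last home activity absorbing the accumulated shift.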
Assign primary locations
The next step is to assign locations, i.e. coordinates and - if applicable - facilities to the activities.
A different approach is used for primary and secondary activities.
Activities are considered primary activities, if their location is determined by long term decisions and cannot be chosen on a daily basis.
In this context, home, work and education are considered primary activities and shopping, leisure and other are considered secondary activities.
The home location of an agent was already determined in the population generation. The location for work and education activities is obtained by a weighted random draw from raster cells within a certain distance around the home location. The weight is based on the attractiveness of the raster cells. The distance is obtained from the survey data with a buffer of ± 1km.
If the scenario includes facilities, the activity is randomly assigned a facility with the corresponding ActivityOption within the raster cell, taking into account the weights described in Sections Weight for home ActivityOptions and Weight other ActivityOptions. If the scenario does not include facilities, the activity location is represented by an x,y-coordinate that is chosen randomly within the raster cell.
Distance to work or school
In some household travel surveys, the crow-fly distance to the participant’s work place and/or school is provided with the survey data. If this information is not available and the first school or work trip starts from home, the crow-fly distance of this trip is used.
If neither is available, the distance is drawn from the distance distributions for work and education trips in the travel survey using the following two-step procedure. First, the distance interval is chosen according to the probabilities provided in Table 10 or Table 11. Then, a random value within the distance interval is drawn.
Distance [km] | Share [%] |
---|---|
< 5 | 33 |
5-10 | 28 |
10-50 | 35 |
≥ 50 | 4 |
Table 10: Distance Distribution Work Trips
Distance [km] | Share [%] |
---|---|
< 1 | 38 |
1-2 | 16 |
2-6 | 23 |
≥ 6 | 23 |
Table 11: Distance Distribution Education Trips
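The two-step draw can be sketched in a few lines. The interval shares are taken from Table 10 and Table 11; the upper bounds of the open-ended classes (100 km for work, 20 km for education), the uniform draw within an interval, and the names `WORK_INTERVALS`, `EDUCATION_INTERVALS` and `draw_distance` are assumptions made only for this illustration.

```python
import random

# (lower bound km, upper bound km, share %)
# The caps on the last, open-ended classes are assumptions for this sketch.
WORK_INTERVALS = [(0, 5, 33), (5, 10, 28), (10, 50, 35), (50, 100, 4)]
EDUCATION_INTERVALS = [(0, 1, 38), (1, 2, 16), (2, 6, 23), (6, 20, 23)]


def draw_distance(intervals, rng):
    """Step 1: pick an interval by its share. Step 2: draw a value within it."""
    shares = [share for _, _, share in intervals]
    lower, upper, _ = rng.choices(intervals, weights=shares, k=1)[0]
    return rng.uniform(lower, upper)


rng = random.Random(42)
print(draw_distance(WORK_INTERVALS, rng))
print(draw_distance(EDUCATION_INTERVALS, rng))
```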
Attractiveness raster cells for primary activities
The attractiveness of a location is calculated individually for each activity type. It is assigned to the raster cells in the study area.
The attractiveness of a raster cell regarding work locations is a combination of population density and the existence of non-residential buildings as specified in the following equation:
For the variable PopDensityClass, the population density is classified as specified in Table 12.
Population density [p/km²] | PopDensityClass |
---|---|
< 2’000 | 1 |
2’000-4’000 | 2 |
4’000-6’000 | 3 |
6’000-8’000 | 4 |
8’000-10’000 | 5 |
≥ 10’000 | 6 |
Table 12: Population Density Classes for Workplace Attractiveness Calculation
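The classification in Table 12 amounts to a simple threshold lookup, sketched below. The function name `pop_density_class` is hypothetical, and assigning a value that lies exactly on a class boundary to the higher class is an assumption, since the table only states this explicitly for the last class.

```python
from bisect import bisect_right

# Upper class boundaries in persons per km² (Table 12)
DENSITY_BOUNDS = [2000, 4000, 6000, 8000, 10000]


def pop_density_class(density):
    """Map a population density [p/km²] to its class (1 to 6)."""
    # Assumption: a value exactly on a boundary falls into the higher class
    return bisect_right(DENSITY_BOUNDS, density) + 1


print(pop_density_class(3500))
```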
The variable NonResBuildingClass is derived from the GHSL layer GHS_BUILT_C. The layer contains settlement characteristics in terms of the morphology of the built environment and the functional use. It differentiates between residential and non-residential buildings of different heights. For work place attractiveness, non-residential buildings are the focus as detailed in Table 13.
Building height [m] | GHS_BUILT_C code | NonResBuildingClass |
---|---|---|
≤ 3 | 21 | 1 |
3-6 | 22 | 2 |
6-15 | 23 | 3 |
15-30 | 24 | 4 |
≥ 30 | 25 | 5 |
Table 13: Non-Residential Building Classes for Workplace Attractiveness Calculation
The attractiveness for education activities is currently based solely on (unclassified) population density class as defined in Table 12. This will be improved once more suitable data can be integrated.
Assign secondary locations
The location of secondary activities is determined based on time geography as described in [Horni A., D.M. Scott, M. Balmer and K.W. Axhausen (2009), Location Choice Modeling for Shopping and Leisure Activities with MATSim - Combining Microsimulation and Time Geography, Transportation Research Records (TRR), No. 2135, pp. 87–95.]. First, the activity chain of a plan is divided into sub-chains that start and end with a primary activity and have only secondary activities in between. For example, the activity chain home-work-leisure-work-shopping-leisure-home is divided into three sub-chains: 1) home-work, 2) work-leisure-work and 3) work-shopping-leisure-home. The sub-chains 2) and 3) contain secondary activities, for which the location has to be determined.
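The division into sub-chains can be illustrated with a short sketch that reproduces the example above. The function name `split_sub_chains` is hypothetical, and the sketch assumes that every chain starts and ends with a primary activity.

```python
PRIMARY = {"home", "work", "education"}


def split_sub_chains(chain):
    """Split an activity chain at every primary activity.

    Each sub-chain starts and ends with a primary activity; the closing
    activity of one sub-chain is the opening activity of the next.
    Assumption: the chain itself starts and ends with a primary activity.
    """
    sub_chains = []
    start = 0
    for i in range(1, len(chain)):
        if chain[i] in PRIMARY:
            sub_chains.append(chain[start:i + 1])
            start = i
    return sub_chains


chain = ["home", "work", "leisure", "work", "shopping", "leisure", "home"]
for sub_chain in split_sub_chains(chain):
    print("-".join(sub_chain))
```

Only sub-chains with more than two activities contain secondary activities and need a location search.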
Figure 5: Illustration of the Time Geography Approach
The time geography approach is illustrated in Figure 5. It is used to constrain the search space to suitable locations for the secondary activity. To do this, the primary activities at the beginning and the end of the sub-chain are defined as anchor points with a fixed location and departure and arrival times. Then, a centre point (green dot) between the two anchor points (red dots) is determined and a circle (dashed green circle) is drawn around the centre point. The area in the dashed green circle is called the potential path area (PPA) and the location of the secondary activity lies within this PPA. The size of the PPA depends on the available travel time budget. The travel time budget is calculated as the difference between the arrival time at the second primary activity and the departure time of the first primary activity minus the duration of all secondary activities in between.
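As a small worked example of the travel time budget, the following sketch uses times in seconds after midnight. The function name `travel_time_budget` and the example values are assumptions for illustration only.

```python
def travel_time_budget(departure_first, arrival_second, secondary_durations):
    """Time left for travelling within a sub-chain, in seconds.

    departure_first: departure time from the first primary activity
    arrival_second: arrival time at the second primary activity
    secondary_durations: durations of all secondary activities in between
    """
    return arrival_second - departure_first - sum(secondary_durations)


# Leave work at 12:00, return at 13:00, with 30 minutes of leisure in between
print(travel_time_budget(12 * 3600, 13 * 3600, [30 * 60]))
```

In this example, 30 minutes remain for the two trips of the sub-chain work-leisure-work.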
Two key elements in implementing the time geography approach are how to determine the centre point and the radius of the PPA. In the implementation described by [Horni A., D.M. Scott, M. Balmer and K.W. Axhausen (2009), Location Choice Modeling for Shopping and Leisure Activities with MATSim - Combining Microsimulation and Time Geography, Transportation Research Records (TRR), No. 2135, pp. 87–95.], the centre point is determined as the geographical centre between the two anchor points and the radius of the PPA is calculated by multiplying half of the travel time budget with the typical speed in the study area, here set to 25.3 km/h.
Using the geographical centre only works well if the scenario is set in an area without large separating geographic features such as a lake, strong elevation differences or large (transport) infrastructure that can only be crossed at select places. Moreover, it does not account for the fact that locations for secondary activities are often chosen in the vicinity of the primary activities and not halfway between their locations. Thus, the centre point in Creario is chosen randomly along the least-cost path between the anchor points. The geographical centre is only used if at least one of the anchor points is outside the study area and the least-cost path cannot be calculated completely.
Calculating the radius based on the travel time budget and a typical speed can lead to issues if the plans contain several modes, even more so if the sub-chains contain mixed modes. Thus, in this implementation, the radius is based on a distance budget. The distance budget is the sum of the reported crow-fly distances of the trips within the sub-chain, and the radius is calculated by halving this distance budget.
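A minimal sketch of the two Creario-specific choices follows. It assumes that the least-cost path is available as a polyline of projected x,y-coordinates and that "randomly" means uniformly by path length; both are assumptions, as are the function names `ppa_radius` and `random_point_on_path`.

```python
import math
import random


def ppa_radius(crow_fly_distances):
    """Half of the distance budget, i.e. of the summed crow-fly trip distances."""
    return sum(crow_fly_distances) / 2.0


def random_point_on_path(path, rng):
    """Draw a point uniformly by length along a polyline of (x, y) points."""
    segments = list(zip(path, path[1:]))
    lengths = [math.dist(a, b) for a, b in segments]
    target = rng.uniform(0.0, sum(lengths))
    for (a, b), length in zip(segments, lengths):
        if length > 0 and target <= length:
            t = target / length  # relative position on this segment
            return (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
        target -= length
    return path[-1]  # fallback for rounding at the very end of the path


rng = random.Random(7)
path = [(0.0, 0.0), (1000.0, 0.0), (1000.0, 1000.0)]
print(random_point_on_path(path, rng))
print(ppa_radius([3000.0, 5000.0]))
```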
Figure 6: Time Geography - Location Selection and Routing
Then, a raster cell within the PPA is selected randomly (pink dot in Figure 6) taking into account the attractiveness of the raster cell for the activity type of the secondary activity. If the scenario does not include facilities, the activity location within the raster cell is chosen randomly. If the scenario includes facilities, the facility is selected taking into account the weights described in Sections Weight for home ActivityOptions and Weight other ActivityOptions. To ensure that the resulting trip is realistic, the nearest nodes in the car network are determined for the secondary activity and the two anchor points and then the least-cost route is calculated from the first anchor point to the secondary activity (blue route) and from the secondary activity to the second anchor point (orange route). If the sum of travel times for both routes is smaller than or equal to the travel time budget, the secondary activity location is considered valid.
Figure 7: Time Geography - Recursive Search
If there is more than one secondary activity in the sub-chain, the process is repeated for the next secondary activity with the newly fixed secondary activity (pink dot) as one of the two anchor points, as illustrated in Figure 7. The size of the PPA reflects the remaining distance budget and the location of the next secondary activity (purple dot) is drawn within the new smaller PPA. The validity of the locations is, however, checked by comparing the travel times along all trips (blue, yellow and orange) with the overall travel time budget. If the sum of trip travel times exceeds the travel time budget, the locations of all secondary activities within the sub-chain are recalculated. If the travel time budget is exceeded repeatedly, the centre point of the PPA is shifted randomly along the route between the two anchor points. In order to ensure that the shifted centre point is not too close to already tested ones, the least-cost path is segmented into ten length intervals. A new centre point can only be selected from a length interval that has not been tested yet. If the travel time exceeds the travel time budget even after testing all length intervals, the travel time budget is increased. This is iteratively repeated until a location is found.
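The selection of a new centre point from an untested length interval can be sketched as below. The function name `draw_untested_position` and the representation of a centre point as a relative position between 0 and 1 along the least-cost path are assumptions for this illustration.

```python
import random

N_INTERVALS = 10  # the least-cost path is segmented into ten length intervals


def draw_untested_position(tested, rng, n_intervals=N_INTERVALS):
    """Pick a relative path position from an interval not tested so far.

    `tested` is a set of interval indices and is updated in place.
    Returns (interval index, relative position), or None once all intervals
    have been tested and the travel time budget has to be increased.
    """
    untested = [i for i in range(n_intervals) if i not in tested]
    if not untested:
        return None
    index = rng.choice(untested)
    tested.add(index)
    position = (index + rng.random()) / n_intervals
    return index, position


rng = random.Random(3)
tested = set()
print(draw_untested_position(tested, rng))
print(sorted(tested))
```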
There are a few special cases where the approach described above has to be adapted.
- The travel time budget is insufficient for the least-cost path between the primary activities.
- At least one of the primary activity locations is outside the network area.
- The two primary locations of a sub-chain are the same (e.g. home-shopping-home), i.e. the sub-chain is a round trip.
Insufficient travel time budget
If the duration of the least-cost path between the two primary locations is longer than the travel time budget derived from activity arrival and departure times, there can be no valid secondary location choice. If this is the case, the travel time budget is increased by 25% until it is larger than the duration of the least-cost path between the two primary locations.
This will inevitably lead to delays in the plan when it is executed in the MATSim simulation.
However, there are very few cases where this occurs and the MATSim simulation will correct these over the iterations.
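The budget increase can be written as a short loop. Whether the 25% steps compound, as in this sketch, or are always relative to the original budget is not stated above, so the compounding is an assumption, as is the function name `extend_budget`.

```python
def extend_budget(budget, least_cost_time, factor=1.25):
    """Increase the travel time budget in 25% steps until it exceeds
    the duration of the least-cost path between the primary locations."""
    if budget <= 0:
        raise ValueError("travel time budget must be positive")
    while budget <= least_cost_time:
        budget *= factor  # assumption: the increases compound
    return budget


# 600 s budget, 1000 s least-cost path: 600 -> 750 -> 937.5 -> 1171.875
print(extend_budget(600.0, 1000.0))
```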
Locations outside network area
Since the network is only generated for a 35km buffer zone around the study area and the survey diaries contain trips longer than 35km, some primary activity locations may lie outside the network area.
This is realistic and often called “external travel” in transportation modelling.
For trips to these activity locations no least-cost path can be calculated.
Therefore, the validation against the travel time budget is simplified.
The travel time of the affected trips is estimated by dividing the crow-fly distance between the locations by the “typical travel speed”.
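The simplified estimate is a plain distance-over-speed calculation. The sketch assumes projected coordinates in metres and reuses the 25.3 km/h quoted earlier as the typical speed; whether Creario uses exactly this value here is an assumption, and the function name `estimate_travel_time` is hypothetical.

```python
import math

TYPICAL_SPEED_KMH = 25.3  # assumption: same typical speed as quoted earlier


def estimate_travel_time(origin, destination, speed_kmh=TYPICAL_SPEED_KMH):
    """Estimate the travel time in seconds from the crow-fly distance.

    origin, destination: (x, y) coordinates in metres
    """
    distance_m = math.dist(origin, destination)
    return distance_m / (speed_kmh / 3.6)  # km/h -> m/s


# A crow-fly distance of 25.3 km takes about one hour at the typical speed
print(estimate_travel_time((0.0, 0.0), (25300.0, 0.0)))
```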
Round trip
If the two anchor points of a sub-chain are the same, the time-geography approach cannot be used.
In this case, the location of the first secondary activity in the sub-chain is determined using the approach for primary activities as described above.
If there is more than one secondary activity, the location thus determined for the first secondary activity is then the new anchor point and the time geography approach is used for all subsequent secondary activities.
Attractiveness raster cells for secondary activities
The approach for calculating the attractiveness of a raster cell for shopping and leisure activities depends on whether the scenario includes facilities or not.
If the scenario includes facilities, the attractiveness of a raster cell for either activity type equals the sum of the ActivityOption weights (see Section Weight other ActivityOptions) of the facilities within the raster cell accommodating that activity type.
If the scenario does not include facilities, the attractiveness for shopping and leisure activities is calculated using the following equations:
Since there are so few facilities for other activities, the attractiveness is calculated regardless of facilities with the following equation.
Assign link
The last step in the generation of initial plans is assigning a network link to each activity. Since MATSim currently requires all activities to occur on a car-accessible link, only the car network is used for this. Another stipulation of the MATSim simulation is that activity links cannot be sinks or sources, since agents have to be able to arrive at the location and depart again. Due to the network filtering and cleaning, the network might contain some sinks or sources. Therefore, connector links are added to the car-only network with the same process as described in Section Network cleaning, simplification and thinning, but with a minimum speed of 50 km/h. Since, as discussed above, some activity locations will lie outside the study area and thus outside the network boundaries, different approaches are used for locations within the study area and locations outside the study area.
Within the study area
The first step in assigning a link to activities within the study area is to filter out all network links that are not suitable as activity locations. These are:
- motorways, motorway access roads (motorway_link) and connector roads
- trunk roads, trunk access roads (trunk_link) and connector roads
- tunnels
- bridges
Then, the link closest to the activity location on the filtered network is determined and assigned as the activity link.
Outside the study area
The basic approach for activities outside the study area is similar, but with a few alterations because trips from outside the study area usually enter the network on main roads such as motorways, trunks or primary roads. Therefore, the filter conditions for suitable network links are different. Filtered out are lower level road types such as residential roads, service roads or living streets. In addition, to avoid assigning external activities to the wrong side of a motorway and generating long detours, motorways are also removed from the network of suitable links. Motorway connector links as described in Network cleaning, simplification and thinning remain for trips that enter the network via motorway.
The second step is then again to find the link closest to the activity location. However, to further emphasise that external traffic is more likely to enter the network on higher priority roads, the distance to the location is not the only attribute considered but also the road type. Therefore, all links connected to nodes that are within 500 m of the node closest to the activity location are collected and a link score is calculated for each of these links using the following equation:
$$Score_i = \Delta Dist_i \cdot w_{Dist} + RoadTypeWeight_i$$
where
$Score_i$: score of link $i$
$\Delta Dist_i$: difference between the distance from the activity location to link $i$ and the distance from the activity location to the nearest link
$w_{Dist}$: weight for difference in distance, here 0.01
$RoadTypeWeight_i$: road type weight of link $i$
The $RoadTypeWeight_i$ is assigned as specified in Table 14. The link with the lowest score is then selected.
Road type | RoadTypeWeight |
---|---|
motorway | 1 |
motorway_link | 2 |
connectorMW | 1 |
trunk | 2 |
trunk_link | 3 |
connector | 4 |
primary | 4 |
primary_link | 5 |
secondary | 5 |
secondary_link | 5 |
tertiary | 5 |
other | 10 |
Table 14: Road Type Weights for Link Assignment
Figure 8 illustrates the procedure. The red dot represents the activity location outside the study area, the orange links are motorways, the pink links motorway connectors and the green links primary roads. Link a) is the closest link, but it is a primary road. The score of link a) is: 0 m * 0.01 + 4 = 4. The difference in distance between link a) and link b) is 200 m. Thus, the score of link b) is 200 m * 0.01 + 1 = 3. Consequently, connector link b) is selected over primary road link a).
Figure 8: Illustration of Nearest Link Selection Weighted by Road Type
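The worked example of Figure 8 can be reproduced with a short sketch. The weights come from Table 14 and the distance weight of 0.01 from the text; the function names `link_score` and `select_link`, the tuple layout of the candidates and the absolute distances of 50 m and 250 m are assumptions chosen so that the difference is 200 m.

```python
ROAD_TYPE_WEIGHT = {
    "motorway": 1, "motorway_link": 2, "connectorMW": 1,
    "trunk": 2, "trunk_link": 3, "connector": 4,
    "primary": 4, "primary_link": 5,
    "secondary": 5, "secondary_link": 5, "tertiary": 5,
    "other": 10,
}
DISTANCE_WEIGHT = 0.01


def link_score(distance_m, nearest_distance_m, road_type):
    """Distance difference to the nearest link times 0.01, plus road type weight."""
    delta = distance_m - nearest_distance_m
    weight = ROAD_TYPE_WEIGHT.get(road_type, ROAD_TYPE_WEIGHT["other"])
    return delta * DISTANCE_WEIGHT + weight


def select_link(candidates):
    """candidates: list of (link id, distance to activity in m, road type).

    Returns the id of the link with the lowest score.
    """
    nearest = min(distance for _, distance, _ in candidates)
    best = min(candidates, key=lambda c: link_score(c[1], nearest, c[2]))
    return best[0]


candidates = [("a", 50.0, "primary"), ("b", 250.0, "connectorMW")]
print(select_link(candidates))
```

With these inputs, link a) scores 4 and link b) scores 3, so link b) is returned, as in the figure.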