Population

Creario provides a population with a set of initial plans. The agents in the population are assigned several attributes. Currently, these are

  • home location
  • age
  • gender
  • land-use type of the home location
  • public transport quality class (if a schedule is available)
  • distance to the next train station
  • vehicle ownership

These attributes are subsequently used to classify the agents into groups with homogeneous transport patterns. The initial plans, derived from survey data, are then assigned according to these person groups.

Scope

The MATSim population generated by Creario represents the residents of the study area defined by the user. Agents can choose activity locations outside the study area (source traffic), but agents living outside the study area and travelling into it (destination traffic) are currently not included, and neither is through traffic. To obtain more realistic travel diaries, the study area is surrounded by a buffer zone in which agents can perform activities. The buffer is currently set to 35 km because over 90% of the trips in the surveys used to generate the plans are shorter than 35 km.

Input data

As with the network, the default input data for Creario has to be publicly available worldwide. However, since users may well have access to additional data, such as population statistics or age distributions, Creario lets users upload such data and uses it to generate a more accurate population. Creario can also be used without any input data from the user.

Provided input data

The population generation is based on two main data sources:

  • the Global Human Settlement Layer (GHSL)
  • the population data provided by Worldpop

Global Human Settlement Layer (GHSL)

The GHSL is published by the European Commission and offers different types of layers. For the population generation, the layer GHS-POP for the base year 2020 is used. The population data is provided in 100m raster cells and is based on the raw global census data harmonised by CIESIN for the Gridded Population of the World, version 4.11 at polygon level. This data was disaggregated using the GHS-BUILT-S layer (see Section Location Class). The dataset provides the number of inhabitants per raster cell but no further differentiation by age, gender etc.

Worldpop

The population data by Worldpop is also provided for the base year 2020 as raster cell data. It has a resolution of 3 arc seconds and is based on several population censuses, the Built-Settlement Growth Model (BSGM) and geospatial covariates representing factors related to population distribution. This population data is differentiated by age and gender.

A comparison of both datasets with official population statistics showed that the GHSL data is more accurate regarding the number of persons per municipality and better at depicting the spatial distribution within each municipality. However, since Worldpop offers a differentiation by age and gender, the two datasets are combined:

  • the number of persons per raster cell is taken from GHSL
  • for each GHSL raster cell with residents (some cells are empty), the combined age and gender distribution is calculated from Worldpop
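
As an illustration of this combination for a single raster cell, the following sketch assumes that the Worldpop counts per age-gender group have already been aggregated to the 100 m GHSL cell they fall into; the function name and group labels are hypothetical. How Creario handles an inhabited GHSL cell for which Worldpop reports no residents is not described here, so the sketch simply raises an error in that case.

```python
from typing import Dict, Optional

def combine_cell(ghsl_residents: float,
                 worldpop_counts: Dict[str, float]) -> Optional[Dict[str, float]]:
    """Expected residents per age-gender group in one GHSL raster cell.

    The total comes from GHSL; Worldpop only contributes the relative shares.
    Empty GHSL cells get no distribution (None).
    """
    if ghsl_residents <= 0:
        return None
    worldpop_total = sum(worldpop_counts.values())
    if worldpop_total <= 0:
        # Handling of this case is not described; the sketch just refuses it.
        raise ValueError("no Worldpop residents for an inhabited GHSL cell")
    # Normalise the Worldpop counts to shares and scale them to the GHSL total.
    return {group: ghsl_residents * count / worldpop_total
            for group, count in worldpop_counts.items()}

cell = combine_cell(40, {"Female_0To19": 3, "Male_0To19": 3,
                         "Female_20To39": 2, "Male_20To39": 2})
# cell == {"Female_0To19": 12.0, "Male_0To19": 12.0,
#          "Female_20To39": 8.0, "Male_20To39": 8.0}
```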

However, the GHSL-POP dataset has problems with coverage accuracy. An extreme example can be seen in Figure 1 for the eastern area of Lake Zurich. As a result, the overall number of agents for the affected municipalities is correct, but the spatial distribution of those agents is skewed. We plan to correct this in the future using additional data; for now, users should be aware of this limitation.

Figure 1: Example for Coverage Issues in GHSL-POP

User input

As stated above, if the user has more accurate population data for the study area, this can be integrated into the population generation provided by Creario. In that case, the user has to provide at least one shape file indicating the boundaries of the statistical units for which the additional data is available. The population statistics can be provided in the same shape file or in a separate csv file (see the subsections below for more specific requirements). If additional data regarding the number of residents per statistical unit is provided, the GHSL raster data is fitted to match these numbers. If additional data regarding the age and/or gender distribution is provided, it is used instead of the distributions derived from Worldpop.

The user can provide several datasets, e.g. for different parts of the study area or to provide smaller-scale data for a certain part of the study area. Each dataset has to be given a unique id. If more than one dataset is provided, the user has to assign priorities with regard to the total number of residents, the age distribution and the gender distribution, respectively, where a lower number means a higher priority. Different datasets can have different priorities per aspect, as demonstrated in the following example.

Example for user data prioritisation

Dataset 1:

  • coverage: entire study area incl. main city
  • resolution: municipality
  • population data with age and gender distribution

Dataset 2:

  • coverage: main city within the study area
  • resolution: neighbourhood
  • population data with age distribution that is more detailed than in dataset 1 but without gender

User priorities

Attribute                  Priority dataset 1  Priority dataset 2
Total number of residents  2                   1
Age                        2                   1
Gender                     1                   99

Result

The overall number of residents is fitted to dataset 2 for the main city (where both datasets provide data) and to dataset 1 for all other municipalities. Analogously, the age distribution for the main city is taken from dataset 2 and for all other municipalities from dataset 1. However, the gender distribution of dataset 1 is used for the entire study area.
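
Read as a rule, the example picks, for every raster cell and attribute, the dataset with the numerically lowest priority among the datasets covering that cell. The sketch below reproduces the example under that reading; function and dataset names are hypothetical. For the number of residents, the priorities actually take effect through the processing order of the iterative fitting (see Section Iterative fitting), which in the end yields the same assignment.

```python
from typing import Dict, List, Optional

def pick_dataset(covering: List[str],
                 priorities: Dict[str, Dict[str, int]],
                 attribute: str) -> Optional[str]:
    """Return the dataset to use for `attribute` in one raster cell.

    `covering` lists the ids of the datasets whose spatial entities contain the
    cell; `priorities[dataset][attribute]` is the priority assigned by the user,
    where a lower number wins. None means: fall back to GHSL / Worldpop.
    """
    candidates = [d for d in covering if attribute in priorities[d]]
    if not candidates:
        return None
    return min(candidates, key=lambda d: priorities[d][attribute])

# Priorities from the example above.
priorities = {
    "dataset1": {"residents": 2, "age": 2, "gender": 1},
    "dataset2": {"residents": 1, "age": 1, "gender": 99},
}
main_city = ["dataset1", "dataset2"]   # covered by both datasets
other_municipality = ["dataset1"]      # covered by dataset 1 only

for attribute in ("residents", "age", "gender"):
    print(attribute,
          pick_dataset(main_city, priorities, attribute),
          pick_dataset(other_municipality, priorities, attribute))
# residents dataset2 dataset1
# age dataset2 dataset1
# gender dataset1 dataset1
```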

Requirements age and gender information

  • Age and gender information has to be provided as number of residents per person group.
  • Age and gender information can be provided separately or combined (see example tables).

Example for separate age and gender information

Spatial entity  Residents  Female  Male  0To19  20To39  40To59  60To100
Municipality A  380        198     182   76     61      133     110
Municipality B  763        366     397   145    160     149     309
Municipality C  1500       765     735   330    420     300     450

Example for combined age and gender information

Spatial entity  Residents  Female_0To19  Female_20To39  Female_40To59  Female_60To100  Male_0To19  Male_20To39  Male_40To59  Male_60To100
Municipality A  380        40            32             69             57              36          29           64           53
Municipality B  763        70            77             71             148             75          83           78           161
Municipality C  1500       168           214            153            230             162         206          147          220

Requirements for the shape file

  • Per dataset only one shape file can be used.
  • It has to contain the boundaries of all spatial entities (e.g. municipalities or neighbourhoods).
  • It has to contain an id for each spatial entity, and this id has to be consistent with the additional csv data. The user has to specify which attribute contains the id.
  • One spatial entity can include more than one polygon; associated polygons must have the same id.
  • The name of the spatial entity is optional. If there is a name, the user has to specify which attribute contains the name.
  • If the shape file contains the total number of residents, the user has to specify which attribute contains this information.
  • If the shape file contains age and/or gender information, the user has to specify which data is provided and which attributes contain the respective information.

Requirements for a csv file

  • The data has to be provided in a csv or similar text file. The delimiter/field separator can be specified by the user.
  • The user has to specify which column contains the id.
  • The ids of the spatial entities in the csv file must be consistent with the ids in the shape file.
  • If the csv file contains the total number of residents, the user has to specify which column contains the number of residents.
  • If the csv file contains age and/or gender information, the user has to specify which data is provided and which columns contain the respective information.

Overview Processing

The generation and attribution of agents is done in four main phases.

  1. Initialisation
  2. Iterative Fitting
  3. Assign agent attributes
  4. Assign initial plans

Initialisation

During the initialisation phase the input data is read and an initial set of agents is generated. This includes the following steps:

  • If available, read polygons and population statistics data and create a SpatialEntity object for each spatial entity.
  • Read raster data from GHSL, assign raster data to spatial entities. Per dataset, each raster cell is assigned to one spatial entity based on the center point of the raster cell. However, a raster cell can be assigned to several spatial entities from overlapping datasets.
  • Based on the priorities given by the user, age and gender distributions from user input or Worldpop data are assigned to each raster cell.
  • Generate an initial set of agents without any attributes besides home location, with the number of agents per raster cell corresponding to the number of residents in the GHSL dataset.
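
The centre-point rule can be illustrated with a plain ray-casting (even-odd) point-in-polygon test. This is a sketch under simplifying assumptions (projected coordinates in metres, polygons without holes) rather than Creario's implementation, which would typically rely on a GIS library; all names are hypothetical. Since the function is applied per dataset, a cell can end up with one entity per overlapping dataset, as described above.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, float]

def point_in_ring(p: Point, ring: Sequence[Point]) -> bool:
    """Even-odd rule: count crossings of a ray from p towards +x."""
    x, y = p
    inside = False
    for i in range(len(ring)):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % len(ring)]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y coordinate
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def assign_cell(centre: Point,
                entities: Dict[str, List[Sequence[Point]]]) -> Optional[str]:
    """Id of the spatial entity (of one dataset) containing the cell centre.

    An entity may consist of several polygons sharing one id; holes are ignored.
    """
    for entity_id, polygons in entities.items():
        if any(point_in_ring(centre, ring) for ring in polygons):
            return entity_id
    return None

# One dataset with two entities; "B" consists of two separate polygons.
entities = {
    "A": [[(0, 0), (100, 0), (100, 100), (0, 100)]],
    "B": [[(100, 0), (200, 0), (200, 100), (100, 100)],
          [(0, 200), (100, 200), (100, 300), (0, 300)]],
}
```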

Iterative fitting

The goal of the iterative fitting phase is to correct the number of agents per raster cell based on the statistical data provided by the user, taking into account that the number of residents for a certain area can differ between datasets. These differences are balanced through the iterative fitting. For this, the algorithm iterates through all datasets (starting with the dataset with the lowest priority) and through all spatial entities in each dataset, and compares the number of agents per spatial entity with the number of residents according to the statistics. If there is a difference, the ratio between the number of agents currently assigned to the spatial entity and the number of residents according to the statistics is calculated. Based on this ratio and the number of agents currently assigned to a raster cell, a new target number of agents is calculated for each raster cell, and agents are randomly added or deleted until this target number is reached.
The iterations stop when, for all spatial entities, the relative difference between the number of agents and the number of residents according to the statistics is at most 1%, or when 25 iterations have been completed.
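
The fitting loop can be sketched as follows, tracking agent counts per cell instead of individual agents (at this stage agents carry nothing but their home cell). The data layout, the rounding of per-cell targets and the decision to skip entities that currently contain no agents are assumptions; only the processing order, the 1% tolerance and the 25-iteration limit come from the description above.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]
# A dataset: {"priority": int, "entities": {entity_id: (residents, [cells])}}
Dataset = dict

def relative_difference(agents: int, residents: float) -> float:
    if residents > 0:
        return abs(agents - residents) / residents
    return 0.0 if agents == 0 else 1.0

def iterative_fitting(cell_agents: Dict[Cell, int], datasets: List[Dataset],
                      tolerance: float = 0.01, max_iterations: int = 25) -> int:
    """Adjust agent counts per cell in place; return the iterations used."""
    # Lowest priority (largest number) first, so the most trusted dataset
    # is applied last in every pass.
    ordered = sorted(datasets, key=lambda d: d["priority"], reverse=True)
    for iteration in range(1, max_iterations + 1):
        for dataset in ordered:
            for residents, cells in dataset["entities"].values():
                agents = sum(cell_agents[c] for c in cells)
                if agents == 0 or agents == residents:
                    continue  # nothing to scale (empty entities are skipped)
                ratio = residents / agents
                for c in cells:
                    # New target per cell; the full process adds or deletes
                    # randomly chosen agents of the cell to reach it.
                    cell_agents[c] = round(cell_agents[c] * ratio)
        if all(relative_difference(sum(cell_agents[c] for c in cells), residents) <= tolerance
               for dataset in datasets
               for residents, cells in dataset["entities"].values()):
            return iteration
    return max_iterations

cells = {(0, 0): 100, (0, 1): 100, (1, 0): 50, (1, 1): 50}
municipalities = {"priority": 2, "entities": {"M1": (330, list(cells))}}
neighbourhoods = {"priority": 1, "entities": {"city": (240, [(0, 0), (0, 1)])}}
iterations = iterative_fitting(cells, [municipalities, neighbourhoods])
```

In this toy example, the neighbourhood dataset (priority 1) assigns 240 residents to the two city cells while the municipality dataset (priority 2) assigns 330 to all four cells. Because the more trusted dataset is applied last in each pass, the city ends up at exactly 240 and the municipality total within the 1% tolerance after several iterations.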

Note: If several datasets are used, the user is advised to check that they are broadly consistent. The iterative fitting can balance smaller differences in the number of residents for the same raster cell, but it might oscillate and fail to converge if the differences are too large.

Assign agent attributes

Currently, each agent is attributed with

  • home location
  • age
  • gender
  • land-use type of the home location
  • public transport quality class
  • distance to public transport stops
  • vehicle ownership

Further attributes will be added in the future such as income or behaviour type.

Home location

The home location is assigned during the agent generation. If the scenario includes facilities, each agent is assigned a home-facility within its raster cell taking into account the weights for home ActivityOptions (see Section Weight for home ActivityOptions). If no home-facilities are associated with the raster cell, a new home-facility is generated for each agent with a weight of 1. If the scenario does not include facilities, the home location is represented by an x,y-coordinate that is chosen randomly within the raster cell.
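
A minimal sketch of this decision, assuming cells are addressed by their lower-left corner in a metric coordinate system and facilities by hypothetical string ids. Placing a newly generated home facility at a random point in the cell is an assumption; the text above only states that it is created with a weight of 1.

```python
import random
from typing import List, Optional, Tuple

CELL_SIZE_M = 100.0  # edge length of a GHSL raster cell

def random_point_in_cell(origin: Tuple[float, float],
                         rng: random.Random) -> Tuple[float, float]:
    """Uniform random coordinate inside the cell whose lower-left corner is `origin`."""
    x0, y0 = origin
    return (x0 + rng.uniform(0, CELL_SIZE_M), y0 + rng.uniform(0, CELL_SIZE_M))

def assign_home(origin: Tuple[float, float],
                home_options: Optional[List[Tuple[str, float]]],
                rng: random.Random):
    """Return ("facility", id), ("new_facility", xy) or ("coord", xy).

    `home_options` holds (facility_id, weight) pairs of home ActivityOptions in
    the agent's cell, or None if the scenario has no facilities.
    """
    if home_options is None:
        return ("coord", random_point_in_cell(origin, rng))
    if not home_options:
        # No home facility in this cell: a new one (weight 1) is created per
        # agent; its random position within the cell is an assumption.
        return ("new_facility", random_point_in_cell(origin, rng))
    ids = [fid for fid, _ in home_options]
    weights = [w for _, w in home_options]
    return ("facility", rng.choices(ids, weights=weights)[0])
```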

Age

Each agent is assigned a specific age. The age is randomly drawn from the age distribution with the highest priority available for the agent’s raster cell. Within an age group, the age is randomly assigned.

Gender

Currently, gender only differentiates between male and female since this is what most datasets provide. If a combined age and gender distribution is used, the gender is drawn dependent on age, otherwise gender is independently drawn from the gender distribution for the raster cell.
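
The two draws can be sketched as below, using the column labels of the example tables in Section User input. Drawing the exact age uniformly within the chosen group is an assumption (the text only says it is assigned randomly), and the function names are hypothetical.

```python
import random
import re
from typing import Dict, Tuple

def parse_group(label: str) -> Tuple[int, int]:
    """'20To39' -> (20, 39), both bounds inclusive."""
    match = re.fullmatch(r"(\d+)To(\d+)", label)
    if match is None:
        raise ValueError(f"unexpected age-group label: {label!r}")
    return int(match.group(1)), int(match.group(2))

def draw_age(age_counts: Dict[str, float], rng: random.Random) -> int:
    """Draw an age group weighted by residents, then a uniform age within it."""
    groups = list(age_counts)
    group = rng.choices(groups, weights=[age_counts[g] for g in groups])[0]
    low, high = parse_group(group)
    return rng.randint(low, high)

def draw_gender(age: int, counts: Dict[str, float], rng: random.Random) -> str:
    """Draw 'Female' or 'Male' for an agent of the given age.

    Separate tables provide 'Female'/'Male' columns and gender is drawn
    independently of age; combined tables ('Female_20To39', ...) are
    restricted to the age group that contains `age`.
    """
    if "Female" in counts and "Male" in counts:
        weights = {"Female": counts["Female"], "Male": counts["Male"]}
    else:
        weights = {}
        for column, n in counts.items():
            gender, group = column.split("_", 1)
            low, high = parse_group(group)
            if low <= age <= high:
                weights[gender] = n
    genders = list(weights)
    return rng.choices(genders, weights=[weights[g] for g in genders])[0]

# Municipality A from the example tables.
rng = random.Random(1)
age = draw_age({"0To19": 76, "20To39": 61, "40To59": 133, "60To100": 110}, rng)
gender = draw_gender(age, {"Female_0To19": 40, "Female_20To39": 32,
                           "Female_40To59": 69, "Female_60To100": 57,
                           "Male_0To19": 36, "Male_20To39": 29,
                           "Male_40To59": 64, "Male_60To100": 53}, rng)
```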

Land-use type

The land-use type of the agent’s home location is later used to differentiate between patterns in travel behaviour. In general, the classification follows Eurostat’s methodology for calculating the Degree of Urbanisation (DEGURBA), with a few modifications. These modifications are needed because Eurostat uses raster cells of 1 km², while the raster cells in Creario cover an area of only 0.01 km². It is also important to note that only the population within the study area is used for the classification. This might lead to border effects, e.g. when a small town or a village belongs to an agglomeration just outside the study area. The user is advised to take this into account when choosing the study area.

Processing

The classification procedure comprises the following three steps:

  1. Classify raster cells according to population density
  2. Determine population density clusters
  3. Classify land-use type of agent home location

Step 1: Classify raster cells according to population density
For each raster cell, the population density is calculated. The raster cell is then classified into one of three classes according to the population density thresholds outlined in Table 1.

Density class       Population density [Inh/km²]
1 (high density)    ≥ 1500
2 (medium density)  300-1499
3 (low density)     < 300

Table 1: Population Density Thresholds
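
Because each raster cell covers 0.01 km², its density is simply 100 times its number of residents, so the thresholds of Table 1 correspond to 15 and 3 residents per cell. A minimal sketch (the function name is hypothetical):

```python
CELLS_PER_KM2 = 100  # a 100 m x 100 m cell covers 0.01 km²

def density_class(residents: float) -> int:
    """Density class of Table 1 for one raster cell."""
    density = residents * CELLS_PER_KM2  # inhabitants per km²
    if density >= 1500:
        return 1  # high density
    if density >= 300:
        return 2  # medium density
    return 3      # low density
```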

Step 2: Determine population density clusters

The population density classes are then used to find two types of clusters: urban centres and suburban clusters. Urban centres are determined using raster cells with density class 1 (high density). Raster cells that are non-diagonally contiguous (i.e. excluding raster cells that only touch at the corner) are grouped into a cluster. Gaps within a cluster are filled by iteratively adding raster cells to the cluster if they are surrounded by 5 or more raster cells (out of their 8 neighbours) belonging to the same cluster. If a cluster has a population of at least 50’000 inhabitants after gap filling, it is considered an urban centre.

For suburban clusters, raster cells with density class 1 (high density) that are not part of an urban centre and raster cells with density class 2 (medium density) are eligible. Raster cells that share a common border, including raster cells that only touch diagonally at corners, are grouped into a cluster. If a cluster has a population of at least 5’000 inhabitants after gap filling, it is considered a suburban cluster. Note that in the Eurostat methodology, gap filling is only done for urban centre clusters and not for suburban clusters, but due to the higher resolution, Creario fills the gaps in suburban clusters as well.
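
The two clustering passes, plus the final lookup of Step 3 (Table 2), can be sketched on a grid of (row, column) indices. This is an illustration under stated assumptions rather than Creario's code: gap filling adds all qualifying cells of one round at once, gap-filled cells contribute their population to the cluster total, and cells already in an urban centre are excluded from suburban clusters. Function names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Cell = Tuple[int, int]  # (row, column) index of a raster cell
ROOK: List[Cell] = [(-1, 0), (1, 0), (0, -1), (0, 1)]            # shared edge
QUEEN: List[Cell] = ROOK + [(-1, -1), (-1, 1), (1, -1), (1, 1)]  # edge or corner

def find_clusters(cells: Set[Cell], steps: List[Cell]) -> List[Set[Cell]]:
    """Connected components of `cells` (breadth-first search)."""
    unvisited = set(cells)
    clusters = []
    while unvisited:
        start = unvisited.pop()
        cluster, queue = {start}, deque([start])
        while queue:
            r, c = queue.popleft()
            for dr, dc in steps:
                nb = (r + dr, c + dc)
                if nb in unvisited:
                    unvisited.remove(nb)
                    cluster.add(nb)
                    queue.append(nb)
        clusters.append(cluster)
    return clusters

def fill_gaps(cluster: Set[Cell], min_neighbours: int = 5) -> Set[Cell]:
    """Repeatedly add cells with >= min_neighbours of their 8 neighbours in the cluster."""
    cluster = set(cluster)
    while True:
        border = {(r + dr, c + dc) for r, c in cluster for dr, dc in QUEEN} - cluster
        added = {cell for cell in border
                 if sum((cell[0] + dr, cell[1] + dc) in cluster
                        for dr, dc in QUEEN) >= min_neighbours}
        if not added:
            return cluster
        cluster |= added

def classify(pop: Dict[Cell, float], dens: Dict[Cell, int]) -> Tuple[Set[Cell], Set[Cell]]:
    """Return (urban centre cells, suburban cluster cells)."""
    urban: Set[Cell] = set()
    high = {cell for cell, k in dens.items() if k == 1}
    for cluster in find_clusters(high, ROOK):
        cluster = fill_gaps(cluster)
        if sum(pop.get(cell, 0) for cell in cluster) >= 50_000:
            urban |= cluster
    suburban: Set[Cell] = set()
    eligible = {cell for cell, k in dens.items() if k in (1, 2)} - urban
    for cluster in find_clusters(eligible, QUEEN):
        cluster = fill_gaps(cluster) - urban
        if sum(pop.get(cell, 0) for cell in cluster) >= 5_000:
            suburban |= cluster
    return urban, suburban

def land_use_type(cell: Cell, urban: Set[Cell], suburban: Set[Cell]) -> int:
    """Table 2: 1 = urban, 2 = suburban, 3 = rural."""
    if cell in urban:
        return 1
    return 2 if cell in suburban else 3
```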

The result of the clustering for the Canton of Zurich can be seen in Figure 2. Raster cells belonging to urban centres are marked in red and raster cells belonging to suburban clusters in green. There are two distinct urban clusters. The first one to the east mainly consists of the densely populated parts of the city of Winterthur. The second one is much larger and not only contains the city of Zurich, but a lot of neighbouring municipalities where the settlement areas have grown together.

Figure 2: Population Clusters Canton of Zurich

Step 3: Classify land-use type of agent home location
Finally, the agents are assigned a land-use type based on the raster cells of their home location as outlined in Table 2.

Land-use class home location  Rule
1: urban                      Home location raster cell belongs to an urban centre cluster.
2: suburban                   Home location raster cell belongs to a suburban cluster.
3: rural                      Home location raster cell does not belong to a cluster.

Table 2: Land-Use Classification of Home Locations

Public transport quality class (PTQC)

Accessibility by public transport can be measured in a variety of ways. An indicator that has proven itself in practice in Switzerland is the so-called public transport quality class (PTQC, German: ÖV-Güteklasse). The calculation of the PTQC indicator in this application largely follows the methodology described in ARE (2022). First, all public transport stops are categorised according to the transport modes serving the stop and the course interval (headway) per transport mode. Then, the public transport quality class can be calculated for any location of interest, taking into account the distance to nearby public transport stops and their stop category. In this application, the PTQC is calculated for each agent individually based on the coordinate of their home location.

Input data

For the calculation of the PTQC, a full public transport schedule in GTFS (General Transit Feed Specification) format is required. If no schedule is available for the study area, this indicator cannot be calculated; instead, the user is advised to use the distance to public transport stops indicators described in the next section. The GTFS schedule is converted into a MATSim transit schedule for the region, including all lines that have at least one stop in the study area, as described in Section Public transport. The subsequent calculations are done on the level of StopAreas.

Calculate StopArea categories

Each StopArea is assigned a StopArea category based on transport modes and course intervals. Transport modes are categorised into the three following groups:

Mode group  Included transport modes
A           Rail
B           Tram, subway, bus, ferry
C           Funicular, cable car

Table 3: Transport Mode Groups for Public Transport Quality Class Calculation

For the calculation of the course interval, a working day outside school holidays and the high tourist season is used. For each mode group, all departures of lines of that mode group from the StopArea between 06:00 and 20:00 are counted. Then, except for stops at the end of a line and stops with departures in only one direction, the number of departures is divided by two to obtain the number of departures per direction. The course interval per mode group is subsequently calculated by dividing 840 minutes (the 14 hours between 06:00 and 20:00) by the number of departures per direction.

The StopArea category is then assigned according to the following table. If more than one StopArea category is applicable, the highest ranking one is chosen.

Course interval  Mode group A  Mode group B  Mode group C
< 5 min          I             II            V
5-10 min         II            III           V
10-20 min        II            IV            V
20-40 min        III           V             V
40-60 min        IV            V             V
≥ 60 min         V             V             V

Table 4: StopArea Categories
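
The course-interval calculation and the lookup in Table 4 can be sketched as follows. Treating each row as including its lower and excluding its upper bound (so exactly 5 min falls into the 5-10 min row) is an assumption, as are the function names.

```python
from typing import Dict, Optional

SERVICE_WINDOW_MIN = 840  # 06:00-20:00 = 14 h

def course_interval(departures: int, one_direction_only: bool) -> Optional[float]:
    """Course interval (minutes) of one mode group at one StopArea.

    `one_direction_only` is True for terminal stops and stops served in a
    single direction, whose departures are not halved.
    """
    per_direction = departures if one_direction_only else departures / 2
    if per_direction == 0:
        return None  # the mode group does not serve this StopArea
    return SERVICE_WINDOW_MIN / per_direction

# Table 4: (exclusive upper bound in minutes, categories for groups A, B, C)
TABLE_4 = [
    (5, ("I", "II", "V")),
    (10, ("II", "III", "V")),
    (20, ("II", "IV", "V")),
    (40, ("III", "V", "V")),
    (60, ("IV", "V", "V")),
    (float("inf"), ("V", "V", "V")),
]
RANK = {"I": 1, "II": 2, "III": 3, "IV": 4, "V": 5}

def stop_area_category(intervals: Dict[str, Optional[float]]) -> Optional[str]:
    """Highest-ranking category over the mode groups 'A', 'B' and 'C'."""
    best = None
    for group, interval in intervals.items():
        if interval is None:
            continue
        column = "ABC".index(group)
        category = next(cats[column] for upper, cats in TABLE_4 if interval < upper)
        if best is None or RANK[category] < RANK[best]:
            best = category
    return best
```

For example, a station with 56 rail departures in both directions (28 per direction, i.e. one every 30 minutes) and 400 bus departures (one every 4.2 minutes per direction) gets category III from rail and II from bus, and therefore category II.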

Determine public transport quality class (PTQC)

The PTQC is calculated for each agent according to the rules in the following table.

StopArea category  < 300 m  300-500 m  500-750 m  750-1000 m
I                  A        A          B          C
II                 A        B          C          D
III                B        C          D          none
IV                 C        D          none       none
V                  D        none       none       none

Table 5: Public Transport Quality Class (PTQC)

The distance is calculated as the crow-fly distance between the home location of the agent and the geographical center point of the StopArea.
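
A sketch of the lookup in Table 5 follows. Two points are assumptions rather than statements from the text: when several StopAreas lie within 1000 m, the best resulting class is kept, and each distance band includes its lower bound. Coordinates are assumed to be projected in metres, and the function name is hypothetical.

```python
import math
from typing import Iterable, Optional, Tuple

BAND_UPPER_M = [300, 500, 750, 1000]  # distance bands of Table 5 (exclusive upper bounds)
TABLE_5 = {
    "I":   ["A", "A", "B", "C"],
    "II":  ["A", "B", "C", "D"],
    "III": ["B", "C", "D", None],
    "IV":  ["C", "D", None, None],
    "V":   ["D", None, None, None],
}

def ptqc(home: Tuple[float, float],
         stop_areas: Iterable[Tuple[Tuple[float, float], str]]) -> Optional[str]:
    """PTQC ('A'-'D') of a home location, or None if no class applies.

    `stop_areas` yields (centre coordinate, StopArea category) pairs.
    """
    best = None
    for centre, category in stop_areas:
        distance = math.dist(home, centre)  # crow-fly distance in metres
        band = next((i for i, upper in enumerate(BAND_UPPER_M) if distance < upper), None)
        if band is None:
            continue  # farther than 1000 m
        quality = TABLE_5[category][band]
        # 'A' < 'B' < 'C' < 'D', so the alphabetically smallest class is the best.
        if quality is not None and (best is None or quality < best):
            best = quality
    return best
```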

The result for the public transport quality class for a section of the Canton of Zurich can be seen in Figure 3. As expected, PTQC A is found in the major cities and around the train stations, including S-Bahn stations, with PTQC B and C areas radiating out from these centres. In addition, there are some smaller clusters with a PTQC B in the centre. Overall, the PTQC level is high, as can be expected in the Canton of Zurich.

Figure 3: Public Transport Quality Classes in the Canton of Zurich

Distance to public transport stops

Another indicator for accessibility by public transport is the distance to public transport stops of different modes. It is less comprehensive but also requires less input data, i.e. it can also be calculated when no public transport schedule is available. Currently, the distance to four types of public transport stops is determined:

  • train station
  • metro station
  • tram stop
  • bus stop

Input data

The calculation of the distance to the next public transport stop is based either on GTFS data, as described in Section Public transport quality class (PTQC), or on OSM stops, which are more widely available. The indicator takes into account the location of each stop in OSM as well as the type of stop as indicated by the “railway” and “highway” tags.

Processing

The distance to public transport stops is calculated for each agent as the crow-fly distance from their home location. For each of the distance intervals outlined in Table 6, it is determined which types of public transport stops are available within that interval. Each type of stop is then assigned the value of the shortest distance interval in which a stop of that type is found.

Distance interval  Assigned value
0-1000 m           0.5
1000-2000 m        1.5
≥ 2000 m           3.5

Table 6: Distance Coding for Public Transport Stops
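
The coding can be sketched as follows; assigning the value of the nearest stop of each type is equivalent to taking the minimum distance class. Stops are assumed to be already typed as train, metro, tram or bus (the mapping from GTFS or OSM tags is not shown), a type with no stop at all is coded 3.5 in this sketch, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

STOP_TYPES = ("train", "metro", "tram", "bus")

def distance_code(distance_m: float) -> float:
    """Coded value of Table 6 for a crow-fly distance in metres."""
    if distance_m < 1000:
        return 0.5
    if distance_m < 2000:
        return 1.5
    return 3.5

def stop_distance_codes(home: Tuple[float, float],
                        stops: List[Tuple[Tuple[float, float], str]]) -> Dict[str, float]:
    """Coded distance from a home location to the nearest stop of each type."""
    nearest = {t: math.inf for t in STOP_TYPES}
    for xy, stop_type in stops:
        nearest[stop_type] = min(nearest[stop_type], math.dist(home, xy))
    return {t: distance_code(d) for t, d in nearest.items()}
```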

In Figure 4, the results for train stations, tram stops and bus stops are shown for the same section of the Canton of Zurich as in Figure 3. Together, the three indicators show the structure of the public transport network in the area. While trams only serve the city of Zurich and a few neighbouring municipalities, bus stops can be found in almost every village within the canton. Train stations, on the other hand, provide only selective coverage.

Figure 4: Distance to Train Stations (left), Tram Stops (middle) and Bus Stops (right) Canton of Zurich

Vehicle ownership

In order to determine the agents’ vehicle ownership, a two-step approach is used. First, the vehicle ownership rate (number of vehicles per inhabitant) is determined for each raster cell. Then, vehicles are randomly assigned to agents old enough to own a car such that the total number of vehicles matches the vehicle ownership rate. The vehicle rates are predicted by a linear regression model that was estimated based on UK census data and additional spatial characteristics.

Input data

The data used to estimate the vehicle rate model originates from UK census maps for England and Wales. These maps provide data for a large variety of topics. In this application the following datasets are used:

  • number of vehicles
  • number of inhabitants
  • gender
  • age distribution

All datasets except the age distribution are provided at the level of so-called output areas; the age distribution is provided at the level of middle layer super output areas. The model is estimated at the level of raster cells. Therefore, in a first step, all variables at output area and middle layer super output area level are assigned to the raster cells within these areas. The vehicle rate is calculated by dividing the number of vehicles in a spatial area by the number of inhabitants.
Then, the following spatial characteristics are calculated using the algorithms described above:

  • land-use type
  • distance to train
  • distance to tram

Vehicle rate model

Different linear regression models for vehicle rates were tested and evaluated according to model fit and - more importantly - prediction quality. The parameters of the best model are presented in Table 7.

Variable         Parameter  Std. error  t-value
Constant         -1.2681    0.0076      -167
PopDensityClass  -0.0486    0.0002      -223
LandUseType      0.0309     0.0003      94
ShareFemales     0.5109     0.0034      149
DistToTrain      0.0149     0.0002      88
DistToTram       0.0475     0.0004      111
AverageAge       0.0377     0.0002      202
ShareUnder18     0.4462     0.0070      64
Share70Up        -1.3237    0.0111      -119

Table 7: Model Estimation Results Vehicle Rate Model

The population density is included in the model using the classes outlined in Table 8.

Population density [p/km²]  PopDensityClass
< 25                        1
25-300                      2
300-1500                    3
≥ 1500                      4

Table 8: Population Density Classes for Vehicle Rate Model

Three age variables are derived from the age distribution provided in the census maps: the average age, the share of minors (i.e. under-18-year-olds) and the share of seniors (here 70 years or older).

All model parameters have the expected signs and are significant. Vehicle rates decrease with increasing population density and an increasing share of seniors. They increase with a higher average age, a higher share of females or of under-18-year-olds, more rural land-use types and greater distances to tram stops or train stations. The constant is context-specific and should be adapted to account for different base levels of vehicle rates in different countries or study areas.
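
Applying Table 7 reduces to a linear predictor. The sketch below assumes that the shares are fractions between 0 and 1, that DistToTrain and DistToTram use the coded values of Table 6, and that LandUseType uses the codes of Table 2 (1 urban, 2 suburban, 3 rural); these encodings are not stated explicitly above. The constant is a parameter because, as noted, it should be recalibrated for other study areas.

```python
from typing import Dict

# Table 7 (constant listed separately because it is context-specific).
DEFAULT_CONSTANT = -1.2681
PARAMETERS: Dict[str, float] = {
    "PopDensityClass": -0.0486,
    "LandUseType": 0.0309,
    "ShareFemales": 0.5109,
    "DistToTrain": 0.0149,
    "DistToTram": 0.0475,
    "AverageAge": 0.0377,
    "ShareUnder18": 0.4462,
    "Share70Up": -1.3237,
}

def pop_density_class(density: float) -> int:
    """Table 8 (inhabitants per km²), lower bounds assumed inclusive."""
    if density < 25:
        return 1
    if density < 300:
        return 2
    if density < 1500:
        return 3
    return 4

def vehicle_rate(x: Dict[str, float], constant: float = DEFAULT_CONSTANT) -> float:
    """Predicted vehicles per inhabitant for one raster cell."""
    return constant + sum(beta * x[name] for name, beta in PARAMETERS.items())

# Hypothetical rural cell.
cell = {"PopDensityClass": pop_density_class(120), "LandUseType": 3,
        "ShareFemales": 0.5, "DistToTrain": 3.5, "DistToTram": 3.5,
        "AverageAge": 42, "ShareUnder18": 0.2, "Share70Up": 0.12}
rate = vehicle_rate(cell)  # ≈ 0.715 vehicles per inhabitant
```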

Application to scenario population

First, the vehicle rate model is used to calculate the vehicle rate for each raster cell in the study area. Then, vehicles are assigned randomly to agents that are at least of minimum driving age as specified by the user (default set to 18 years) until the required number of vehicles per raster cell is met.
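
Per raster cell, the allocation can be sketched as below. Rounding the target to whole vehicles, capping it at the number of eligible agents and treating negative predicted rates as zero are assumptions not stated in the text; the minimum age of 18 is the default mentioned above, and the function name is hypothetical.

```python
import random
from typing import List

def assign_vehicles(ages: List[int], rate: float, rng: random.Random,
                    min_driving_age: int = 18) -> List[bool]:
    """Flag randomly chosen vehicle owners among the agents of one raster cell."""
    target = round(max(rate, 0.0) * len(ages))  # vehicles per inhabitant x residents
    eligible = [i for i, age in enumerate(ages) if age >= min_driving_age]
    owners = set(rng.sample(eligible, min(target, len(eligible))))
    return [i in owners for i in range(len(ages))]
```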

Assign initial plans

Finally, each agent is assigned an initial plan to start the MATSim optimisation process. These initial plans are based on travel diaries derived from a corresponding survey. The survey data contains a travel diary for one day, including arrival and departure times at activities, activity types, trip modes, trip distances etc. Based on personal and spatial characteristics, the survey participants are classified into different groups, and the travel diaries in the survey are collected per group. Each agent is assigned one of the travel diaries of the group they belong to. The travel diary is then transformed into a MATSim plan. The number and order of activities, including activity types (home, work, shopping, etc.), and the trips, including modes, are taken directly from the travel diary. Arrival and departure times are randomised in an interval around the original times. Then, locations are assigned to each activity, first for primary activities (home, work, education) and then for secondary activities.

Input data

The travel diaries are taken from household travel surveys. Currently, the user can choose between the following travel surveys:

Additional travel surveys will be added in the future. On request, it is also possible to integrate a travel survey provided by the user. Please contact us for more information.

In addition to the diaries, the datasets include personal and household characteristics of the participants, such as age, gender, vehicle ownership etc. The travel diaries are filtered to only include diaries that

  • occur on weekdays and outside of school holidays
  • start and end at home
  • have between three and seven activities
  • are completed within 24 hours
  • do not contain trips of less than 60 seconds
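
A sketch of these filters on a hypothetical diary structure (times in seconds after midnight of the survey day). Interpreting “completed within 24 hours” as the span from the first departure to the last arrival is an assumption, and all class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Trip:
    departure_s: int  # seconds after midnight of the survey day
    arrival_s: int
    mode: str

@dataclass
class Diary:
    weekday: bool
    school_holiday: bool
    activities: List[str]  # activity types in order
    trips: List[Trip]

def keep_diary(d: Diary) -> bool:
    """True if the diary passes the filters listed above."""
    if not d.weekday or d.school_holiday:
        return False
    if not d.activities or d.activities[0] != "home" or d.activities[-1] != "home":
        return False
    if not 3 <= len(d.activities) <= 7:
        return False
    if not d.trips or d.trips[-1].arrival_s - d.trips[0].departure_s > 24 * 3600:
        return False
    return all(t.arrival_s - t.departure_s >= 60 for t in d.trips)
```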

Activity types taken into account are

  • home
  • work
  • education
  • shopping
  • leisure

Modes taken into account are

  • car
  • public transport
  • bike
  • walk
  • other

Harmonise sub-tour modes

One relevant concept in MATSim is the so-called sub-tour. A sub-tour is a sequence of activities and trips that starts and ends at the same location, usually the home or work location. For example, the daily plan home-work-home-leisure-home contains two home-based sub-tours: 1) home-work-home and 2) home-leisure-home. A sub-tour can also be nested within another sub-tour; e.g. the daily plan home-work-leisure-work-home also contains two sub-tours: 1) home-work-[…]-work-home and 2) work-leisure-work. Depending on which modules are used in the MATSim scenario, the replanning assumes that the modes within a sub-tour are consistent. This applies especially to sub-tours including car trips. Therefore, the daily plans derived from the survey data are divided into sub-tours, and if a sub-tour contains at least one car trip, all modes within the sub-tour are set to car.
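
The sub-tour detection and mode harmonisation can be sketched as below, with activity locations given as hypothetical location ids (in the examples, activity types double as ids because each type occurs at a single location). A sub-tour closes whenever the plan returns to a location that is still open on a stack. Following the definition above literally, a sub-tour includes any sub-tours nested inside it, so a car trip in the inner work-leisure-work loop also turns the enclosing home-work-[…]-work-home trips into car trips. This is a sketch, not MATSim's own sub-tour utilities.

```python
from typing import List, Tuple

def find_subtours(locations: List[str]) -> List[Tuple[int, int]]:
    """Return sub-tours as (start, end) pairs covering trips start..end-1.

    Trip i leads from locations[i] to locations[i + 1].
    """
    subtours = []
    stack: List[Tuple[str, int]] = []  # open locations with their activity index
    for j, loc in enumerate(locations):
        for pos in range(len(stack) - 1, -1, -1):
            if stack[pos][0] == loc:
                subtours.append((stack[pos][1], j))  # back at an open location
                del stack[pos + 1:]                  # close nested locations
                stack[pos] = (loc, j)                # a new sub-tour may start here
                break
        else:
            stack.append((loc, j))
    return subtours

def harmonise_modes(locations: List[str], modes: List[str]) -> List[str]:
    """Set all trips of a sub-tour to car if it contains at least one car trip."""
    modes = list(modes)
    for start, end in find_subtours(locations):  # inner sub-tours come first
        if "car" in modes[start:end]:
            modes[start:end] = ["car"] * (end - start)
    return modes
```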

Classify agents

The survey participants are classified taking into account the attributes detailed in Table 9.

Attribute                    Classes
Gender                       male, female
Age                          ≤ 16, 17-29, ≥ 30
Vehicle                      yes, no
Land-use type home location  urban, suburban, rural
Distance to train            < 1 km, 1-2 km, ≥ 2 km

Table 9: Attributes for the Classification of Travel Survey Participants

Since only a small share of rural locations lie within 2 km of the next train station, the attribute “Distance to train” is only used to differentiate urban and suburban participants.
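
The class key and the random selection of a diary from that class can be sketched as follows; labels, diary ids and function names are hypothetical, and the sketch assumes every class contains at least one diary.

```python
import random
from typing import Dict, List, Optional, Tuple

ClassKey = Tuple[str, str, bool, str, Optional[str]]

def age_class(age: int) -> str:
    if age <= 16:
        return "<=16"
    return "17-29" if age <= 29 else ">=30"

def train_class(distance_km: float) -> str:
    if distance_km < 1:
        return "<1km"
    return "1-2km" if distance_km < 2 else ">=2km"

def class_key(gender: str, age: int, has_vehicle: bool,
              land_use: str, dist_train_km: float) -> ClassKey:
    """Table 9 class; distance to train is only used for urban and suburban."""
    train = train_class(dist_train_km) if land_use in ("urban", "suburban") else None
    return (gender, age_class(age), has_vehicle, land_use, train)

def pick_diary(pool: Dict[ClassKey, List[str]], key: ClassKey,
               rng: random.Random) -> str:
    """Random diary id from the agent's class (the class must not be empty)."""
    return rng.choice(pool[key])
```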

Assign plan

For each agent, a travel diary is randomly selected from the set of travel diaries assigned to the corresponding agent class. Then, a MATSim plan is created, adopting the following characteristics from the travel diary:

  • number and order of activities incl. activity types
  • trips incl. modes and distances
  • arrival and departure times
  • distance to school / work

Randomise departure time of first trip

Since the number of agents is usually much larger than the number of participants in a household travel survey, adopting all arrival and departure times unchanged would lead to too little variation in departure times, potentially resulting in unrealistic peaks in the departure time distribution. Therefore, the departure time of the first trip of a plan is adapted by randomly drawing a departure time within ±10 minutes of the original departure time in the travel diary. All subsequent arrival and departure times are shifted accordingly, keeping trip and activity durations unchanged, except for the durations of the first and last home activities, which are adapted to accommodate the shift.
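
Because only the end of the first activity and the start of the last activity are open-ended, the shift can be sketched as moving every defined time by one random offset. The plan representation (minutes after midnight, None for the open ends) is a hypothetical simplification of a MATSim plan, and a uniform draw is assumed.

```python
import random
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Activity:
    type: str
    start: Optional[float]  # minutes after midnight; None for the first activity
    end: Optional[float]    # None for the last activity

def shift_first_departure(plan: List[Activity], rng: random.Random,
                          window: float = 10.0) -> float:
    """Move all defined times by one offset drawn from [-window, +window] minutes.

    Trip and inner activity durations are unchanged; only the first activity
    ends and the last activity starts earlier or later.
    """
    delta = rng.uniform(-window, window)
    for act in plan:
        if act.start is not None:
            act.start += delta
        if act.end is not None:
            act.end += delta
    return delta
```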

Randomise activity durations

In addition to the departure time of the first trip, the durations of the subsequent activities are randomised as well. This is done by randomly drawing a new duration within ±10 minutes of the original activity duration. Activities with a duration of 15 minutes or less, as well as the first and last activity, are exempt from this. Once an activity duration is adapted, the corresponding shift is carried over to the departure and arrival times of the following trips and activities. The duration of the last activity is then adjusted to accommodate the sum of all duration shifts in the plan.
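
A companion sketch for the duration step, using the same hypothetical plan representation (minutes after midnight, None for the open ends) and a uniform draw within ±10 minutes:

```python
import random
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Activity:
    type: str
    start: Optional[float]  # minutes after midnight; None for the first activity
    end: Optional[float]    # None for the last activity

def randomise_durations(plan: List[Activity], rng: random.Random,
                        window: float = 10.0, min_duration: float = 15.0) -> float:
    """Perturb inner activity durations; return the total shift in minutes.

    The first and last activities and activities of at most `min_duration`
    minutes keep their duration. Each change is carried over to all later
    times, so trip durations stay unchanged; the last activity absorbs the sum.
    """
    shift = 0.0
    for act in plan[1:-1]:
        original = act.end - act.start
        act.start += shift  # carry over earlier duration changes
        new = original + rng.uniform(-window, window) if original > min_duration else original
        act.end = act.start + new
        shift += new - original
    plan[-1].start += shift
    return shift
```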

Assign primary locations

The next step is to assign locations, i.e. coordinates and, if applicable, facilities, to the activities. Different approaches are used for primary and secondary activities. Activities are considered primary if their location is determined by long-term decisions and cannot be chosen on a daily basis.
In this context, home, work and education are considered primary activities, while shopping, leisure and other are considered secondary activities.

The home location of an agent was already determined in the population generation. The location for work and education activities is obtained by a weighted random draw from raster cells within a certain distance around the home location. The weight is based on the attractiveness of the raster cells. The distance is obtained from the survey data with a buffer of ± 1km.

If the scenario includes facilities, the activity is randomly assigned a facility with the corresponding ActivityOption within the raster cell, taking into account the weights described in Sections Weight for home ActivityOptions and Weight other ActivityOptions. If the scenario does not include facilities, the activity location is represented by an x,y-coordinate that is chosen randomly within the raster cell.
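
The weighted draw of a work or education cell can be sketched as below, assuming cells are keyed by their centre coordinates in metres and attractiveness values are precomputed. What happens when no cell lies within the distance band is not described here, so the sketch returns None; the function name is hypothetical.

```python
import math
import random
from typing import Dict, Optional, Tuple

Coord = Tuple[float, float]

def draw_primary_cell(home: Coord, survey_distance_km: float,
                      attractiveness: Dict[Coord, float], rng: random.Random,
                      buffer_km: float = 1.0) -> Optional[Coord]:
    """Weighted random draw of a cell centre within survey distance ± buffer."""
    low_m = max(survey_distance_km - buffer_km, 0.0) * 1000
    high_m = (survey_distance_km + buffer_km) * 1000
    candidates = [cell for cell, weight in attractiveness.items()
                  if weight > 0 and low_m <= math.dist(home, cell) <= high_m]
    if not candidates:
        return None
    return rng.choices(candidates, weights=[attractiveness[c] for c in candidates])[0]
```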

Distance to work or school

In some household travel surveys, the crow-fly distance to the participant’s work place and/or school is provided with the survey data. If this information is not available and the first school or work trip starts from home, the crow-fly distance of this trip is used.

If neither is available, the distance is drawn from the distance distributions for work and education trips in the travel survey using the following two-step procedure. First, the distance interval is chosen according to the probabilities provided in Table 10 or Table 11. Then, a random value within the distance interval is drawn.

Distance [km]  Share [%]
< 5            33
5-10           28
10-50          35
≥ 50           4

Table 10: Distance Distribution Work Trips

Distance [km] Share [%]
< 1 38
1-2 16
2-6 23
≥ 6 23

Table 11: Distance Distribution Education Trips
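
The two-step draw can be sketched as follows. The upper bounds of the open-ended classes (≥ 50 km and ≥ 6 km) are not given, so the caps of 100 km and 20 km used here are purely hypothetical, and the uniform draw within an interval is an assumption.

```python
import random
from typing import List, Tuple

# (lower km, upper km, share %) from Tables 10 and 11. The upper bounds of the
# open-ended classes (100 km and 20 km) are hypothetical caps, not survey values.
WORK_DISTANCES: List[Tuple[float, float, float]] = [
    (0, 5, 33), (5, 10, 28), (10, 50, 35), (50, 100, 4)]
EDUCATION_DISTANCES: List[Tuple[float, float, float]] = [
    (0, 1, 38), (1, 2, 16), (2, 6, 23), (6, 20, 23)]

def draw_distance(table: List[Tuple[float, float, float]], rng: random.Random) -> float:
    """Step 1: pick an interval by its share; step 2: uniform value inside it."""
    low, high, _ = rng.choices(table, weights=[share for _, _, share in table])[0]
    return rng.uniform(low, high)
```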

Attractiveness raster cells for primary activities

The attractiveness of a location is calculated individually for each activity type. It is assigned to the raster cells in the study area.

The attractiveness of a raster cell regarding work locations is a combination of population density and the existence of non-residential buildings as specified in the following equation:

$$\mathit{attractiveness}_{work} = \mathit{popDensityClass} \cdot 0.3 + \mathit{nonResBuildingClass} \cdot 0.7$$

For the variable $\mathit{popDensityClass}$, the population density is classified as specified in Table 12.

Population density [p/km²] PopDensityClass
< 2’000 1
2’000-4’000 2
4’000-6’000 3
6’000-8’000 4
8’000-10’000 5
≥ 10’000 6

Table 12: Population Density Classes for Workplace Attractiveness Calculation
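Table 12 can be expressed as a lookup on sorted class boundaries, sketched below in Python. The table does not say which class a density exactly on a shared boundary (e.g. 4’000) belongs to. Consistent with the "< 2’000" and "≥ 10’000" rows, this sketch assigns boundary values to the higher class. That choice is an assumption.

```python
import bisect

# Upper bounds (exclusive) of classes 1..5 from Table 12, in persons per km².
_POP_DENSITY_BOUNDS = [2000, 4000, 6000, 8000, 10000]


def pop_density_class(density_per_km2: float) -> int:
    """Map a population density to PopDensityClass 1..6 (Table 12).

    Values exactly on a boundary go to the higher class (an assumption).
    """
    # bisect_right counts how many bounds are <= the density.
    return bisect.bisect_right(_POP_DENSITY_BOUNDS, density_per_km2) + 1
```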

The variable $\mathit{nonResBuildingClass}$ is derived from the GHSL layer GHS_BUILT_C. This layer describes settlement characteristics in terms of the morphology of the built environment and its functional use. It differentiates between residential and non-residential buildings of different heights. For workplace attractiveness, the non-residential building classes detailed in Table 13 are used.

Building height [m] GHS_BUILT_C code NonResBuildingClass
≤ 3 21 1
3-6 22 2
6-15 23 3
15-30 24 4
≥ 30 25 5

Table 13: Non-Residential Building Classes for Workplace Attractiveness Calculation
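Combining Table 13 with the workplace attractiveness equation gives the short Python sketch below. The function names are hypothetical. Returning class 0 for GHS_BUILT_C codes outside 21 to 25 is an assumption, because Table 13 only lists the non-residential codes. The population density class is expected as an input (1 to 6, Table 12).

```python
# GHS_BUILT_C codes for non-residential buildings -> NonResBuildingClass (Table 13).
NON_RES_CLASS_BY_CODE = {21: 1, 22: 2, 23: 3, 24: 4, 25: 5}


def non_res_building_class(ghs_built_c_code: int) -> int:
    """NonResBuildingClass 1..5; 0 for any other code (assumption)."""
    return NON_RES_CLASS_BY_CODE.get(ghs_built_c_code, 0)


def attractiveness_work(pop_density_cls: int, ghs_built_c_code: int) -> float:
    """attractiveness_work = popDensityClass * 0.3 + nonResBuildingClass * 0.7"""
    return pop_density_cls * 0.3 + non_res_building_class(ghs_built_c_code) * 0.7
```

For example, a cell in density class 2 with non-residential buildings of 6 to 15 m height (code 23) scores 2 * 0.3 + 3 * 0.7 = 2.7.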

The attractiveness for education activities is currently based solely on (unclassified) population density class as defined in Table 12. This will be improved once more suitable data can be integrated.

Assign secondary locations

The location of secondary activities is determined based on time geography as described in [Horni A., D.M. Scott, M. Balmer and K.W. Axhausen (2009), Location Choice Modeling for Shopping and Leisure Activities with MATSim - Combining Microsimulation and Time Geography, Transportation Research Record (TRR), No. 2135, pp. 87–95.]. First, the activity chain of a plan is divided into sub-chains that start and end with a primary activity and have only secondary activities in between. For example, the activity chain home-work-leisure-work-shopping-leisure-home is divided into three sub-chains: 1) home-work, 2) work-leisure-work and 3) work-shopping-leisure-home. The sub-chains 2) and 3) contain secondary activities whose locations have to be determined.
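Splitting an activity chain at its primary activities reduces to pairing consecutive primary positions. The Python sketch below reproduces the example from the text. The function name is hypothetical, and the sketch assumes that the chain starts and ends with a primary activity, as in the example.

```python
from typing import List

PRIMARY = {"home", "work", "education"}


def split_into_subchains(chain: List[str]) -> List[List[str]]:
    """Cut the chain at every primary activity; consecutive primaries are the
    anchors of one sub-chain. Assumes the chain starts and ends with a primary."""
    idx = [i for i, act in enumerate(chain) if act in PRIMARY]
    return [chain[a:b + 1] for a, b in zip(idx, idx[1:])]


plan = ["home", "work", "leisure", "work", "shopping", "leisure", "home"]
subchains = split_into_subchains(plan)
# Only sub-chains with activities between the two anchors need a location choice.
with_secondary = [s for s in subchains if len(s) > 2]
```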

Figure 5: Illustration of the Time Geography Approach

The time geography approach is illustrated in Figure 5. It is used to constrain the search space to suitable locations for the secondary activity. To do this, the primary activities at the beginning and the end of the sub-chain are defined as anchor points with a fixed location and departure and arrival times. Then, a centre point (green dot) between the two anchor points (red dots) is determined and a circle (dashed green circle) is drawn around the centre point. The area in the dashed green circle is called the potential path area (PPA) and the location of the secondary activity lies within this PPA. The size of the PPA depends on the available travel time budget. The travel time budget is calculated as the difference between the arrival time at the second primary activity and the departure time of the first primary activity minus the duration of all secondary activities in between.
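The travel time budget of a sub-chain is simple arithmetic, shown here as a Python sketch. The function name is hypothetical, and times are assumed to be given in seconds since midnight.

```python
from typing import Iterable


def travel_time_budget_s(dep_first_primary_s: float,
                         arr_second_primary_s: float,
                         secondary_durations_s: Iterable[float]) -> float:
    """Time between leaving the first anchor and arriving at the second,
    minus the time spent at the secondary activities in between."""
    return (arr_second_primary_s - dep_first_primary_s) - sum(secondary_durations_s)


# Leave work at 17:00, arrive home at 19:00, with 30 min shopping and 45 min
# leisure in between: 120 min - 75 min = 45 min of travel time budget.
budget = travel_time_budget_s(17 * 3600, 19 * 3600, [30 * 60, 45 * 60])
```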

Two key elements in implementing the time geography approach are how to determine the centre point and the radius of the PPA. In the implementation described by [Horni A., D.M. Scott, M. Balmer and K.W. Axhausen (2009), Location Choice Modeling for Shopping and Leisure Activities with MATSim - Combining Microsimulation and Time Geography, Transportation Research Record (TRR), No. 2135, pp. 87–95.], the centre point is the geographical centre between the two anchor points, and the radius of the PPA is calculated by multiplying half of the travel time budget by the typical speed in the study area, here set to 25.3 km/h. Using the geographical centre only works well if the scenario is set in an area without large separating geographic features such as a lake, strong elevation differences or large (transport) infrastructure that can only be crossed at a few places. Moreover, it does not account for the fact that locations for secondary activities are often chosen in the vicinity of the primary activities rather than halfway between them. Therefore, the centre point in Creario is chosen randomly along the least-cost path between the anchor points. The geographical centre is used only if at least one of the anchor points is outside the study area and the least-cost path cannot be calculated completely.

Calculating the radius from the travel time budget and a typical speed can lead to issues if the plans contain several modes, even more so if the sub-chains contain mixed modes. Therefore, in this implementation, the radius is based on a distance budget instead: the distance budget is the sum of the reported crow-fly distances of the trips within the sub-chain, and the radius is half of this distance budget.
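The choice of the PPA centre and radius can be sketched in Python as follows. The least-cost path is represented as a polyline of metric coordinates, and `path=None` signals that it could not be computed completely. Drawing the centre uniformly by path length is an assumption, since the text only says "randomly along the least-cost path". The function names are hypothetical.

```python
import math
import random
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]


def point_along_path(path: Sequence[Point], fraction: float) -> Point:
    """Point at the given fraction (0..1) of the polyline's total length."""
    seg_lengths = [math.dist(p, q) for p, q in zip(path, path[1:])]
    remaining = fraction * sum(seg_lengths)
    for (p, q), length in zip(zip(path, path[1:]), seg_lengths):
        if length > 0 and remaining <= length:
            t = remaining / length  # position within this segment
            return (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))
        remaining -= length
    return tuple(path[-1])  # guards against rounding past the end


def ppa(anchor_a: Point, anchor_b: Point, path: Optional[Sequence[Point]],
        trip_crowfly_dists_m: Sequence[float], rng: random.Random):
    """Return (centre, radius) of the potential path area."""
    if path is None:
        # Fallback: geographical centre between the two anchors.
        centre = ((anchor_a[0] + anchor_b[0]) / 2, (anchor_a[1] + anchor_b[1]) / 2)
    else:
        centre = point_along_path(path, rng.random())
    radius = sum(trip_crowfly_dists_m) / 2  # half of the distance budget
    return centre, radius
```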

Figure 6: Time Geography - Location Selection and Routing

Then, a raster cell within the PPA is selected randomly (pink dot in Figure 6), taking into account the attractiveness of the raster cell for the type of the secondary activity. If the scenario does not include facilities, the activity location within the raster cell is chosen randomly. If the scenario includes facilities, the facility is selected taking into account the weights described in Sections Weight for home ActivityOptions and Weight other ActivityOptions. To ensure that the resulting trips are realistic, the nearest nodes in the car network are determined for the secondary activity and the two anchor points. Then, the least-cost route is calculated from the first anchor point to the secondary activity (blue route) and from the secondary activity to the second anchor point (orange route). If the sum of the travel times of both routes is less than or equal to the travel time budget, the secondary activity location is considered valid.
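The cell draw within the PPA and the validity check can be sketched in Python as below. The `(x, y, attractiveness)` tuple layout is hypothetical. The `travel_time` callable stands in for the least-cost routing on the car network, which is not reproduced here. Written over a list of stops, the same check also covers sub-chains with several secondary activities.

```python
import math
import random
from typing import Callable, Hashable, List, Optional, Sequence, Tuple

Cell = Tuple[float, float, float]  # (x, y, attractiveness), hypothetical layout


def draw_cell_in_ppa(cells: List[Cell], centre: Tuple[float, float],
                     radius_m: float, rng: random.Random) -> Optional[Cell]:
    """Attractiveness-weighted random draw among cells whose centroid lies in the PPA."""
    inside = [c for c in cells
              if c[2] > 0 and math.dist((c[0], c[1]), centre) <= radius_m]
    if not inside:
        return None
    return rng.choices(inside, weights=[c[2] for c in inside], k=1)[0]


def is_valid(stops: Sequence[Hashable],
             travel_time: Callable[[Hashable, Hashable], float],
             budget_s: float) -> bool:
    """stops = [anchor_1, secondary_1, ..., anchor_2] as network nodes.
    travel_time is a hypothetical stand-in for the least-cost car router.
    Valid if the summed travel time of all consecutive trips fits the budget."""
    total = sum(travel_time(a, b) for a, b in zip(stops, stops[1:]))
    return total <= budget_s
```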

Figure 7: Time Geography - Recursive Search

If there is more than one secondary activity in the sub-chain, the process is repeated for the next secondary activity, with the newly fixed secondary activity (pink dot) as one of the two anchor points, as illustrated in Figure 7. The size of the PPA reflects the remaining distance budget, and the location of the next secondary activity (purple dot) is drawn within the new, smaller PPA. The validity of the locations, however, is checked by comparing the sum of the travel times of all trips (blue, yellow and orange) with the overall travel time budget. If this sum exceeds the travel time budget, the locations of all secondary activities within the sub-chain are recalculated. If the travel time budget is exceeded repeatedly, the centre point of the PPA is shifted randomly along the route between the two anchor points. To ensure that the shifted centre point is not too close to previously tested ones, the least-cost path is segmented into ten length intervals, and a new centre point can only be selected from an interval that has not been tested yet. If the travel time still exceeds the travel time budget after all intervals have been tested, the travel time budget is increased. This is repeated iteratively until a valid location is found.
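The bookkeeping of tested length intervals and the outer budget loop can be sketched in Python as follows. The sketch omits the repeated redraws around a single centre and focuses on interval selection. `try_centre` is a hypothetical callable that places all secondary activities around a centre and returns `None` if the budget is exceeded. The source does not give the budget increase for this case, so the growth factor of 1.25 is an assumption borrowed from the insufficient-budget case. `max_rounds` is an added safety guard that the text does not mention.

```python
import random
from typing import Callable, Optional, Tuple


class CentreIntervalSampler:
    """Draw PPA centre positions (as fractions of the least-cost path length)
    so that each of the n length intervals is tested at most once."""

    def __init__(self, rng: random.Random, n_intervals: int = 10):
        self.rng = rng
        self.n = n_intervals
        self.untested = list(range(n_intervals))

    def next_fraction(self) -> Optional[float]:
        if not self.untested:
            return None  # all intervals tested: the caller enlarges the budget
        k = self.untested.pop(self.rng.randrange(len(self.untested)))
        return (k + self.rng.random()) / self.n  # random position in interval k


def search_locations(
    try_centre: Callable[[float, float], Optional[object]],
    budget_s: float,
    rng: random.Random,
    growth: float = 1.25,   # assumption
    max_rounds: int = 50,   # safety guard, not part of the described method
) -> Tuple[object, float]:
    for _ in range(max_rounds):
        sampler = CentreIntervalSampler(rng)
        fraction = sampler.next_fraction()
        while fraction is not None:
            result = try_centre(fraction, budget_s)
            if result is not None:
                return result, budget_s
            fraction = sampler.next_fraction()
        budget_s *= growth  # every interval failed: enlarge the budget
    raise RuntimeError("no valid location found")
```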

There are a few special cases where the approach described above has to be adapted:

  • The travel time budget is insufficient for the least-cost path between the primary activities.
  • At least one of the primary activity locations is outside the network area.
  • The two primary locations of a sub-chain are the same (e.g. home-shopping-home), i.e. the sub-chain is a round trip.

Insufficient travel time budget
If the duration of the least-cost path between the two primary locations is longer than the travel time budget derived from the activity arrival and departure times, no valid secondary location can exist. In this case, the travel time budget is increased in steps of 25% until it is larger than the duration of the least-cost path between the two primary locations. This inevitably leads to delays in the plan when it is executed in the MATSim simulation. However, this occurs only in very few cases, and the MATSim simulation corrects these delays over the iterations.
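The budget adjustment can be sketched as a short loop in Python. The text says "increased by 25%" without specifying whether the steps compound. This sketch multiplies the current budget by 1.25 in each step, which is an assumption. Rejecting a non-positive budget is also an assumption, because such a budget could never grow by multiplication.

```python
def ensure_budget(budget_s: float, least_cost_tt_s: float, step: float = 1.25) -> float:
    """Raise the travel time budget in 25 % steps until it is larger than the
    least-cost travel time between the two primary locations."""
    if budget_s <= 0:
        # Not covered by the text; a non-positive budget cannot grow this way.
        raise ValueError("travel time budget must be positive")
    while budget_s <= least_cost_tt_s:
        budget_s *= step  # compounding 25 % increase (assumption)
    return budget_s
```

For example, a budget of 1000 s against a least-cost travel time of 1200 s becomes 1250 s after one step.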

Locations outside network area
Since the network is only generated for a 35 km buffer zone around the study area and the survey diaries contain trips longer than 35 km, some primary activity locations may lie outside the network area. This is realistic and is often called “external travel” in transportation modelling. For trips to these activity locations, no least-cost path can be calculated. Therefore, the validation against the travel time budget is simplified: the travel time of the affected trips is estimated by dividing the crow-fly distance between the locations by the “typical travel speed”.
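The simplified travel time estimate for external trips amounts to a single division, sketched below in Python. The coordinates are assumed to be in a metric CRS. The typical travel speed is passed as a parameter because its value for this step is not stated here.

```python
import math
from typing import Tuple


def estimated_travel_time_s(a_xy: Tuple[float, float], b_xy: Tuple[float, float],
                            typical_speed_kmh: float) -> float:
    """Crow-fly distance in metres divided by the typical speed (km/h -> m/s)."""
    return math.dist(a_xy, b_xy) / (typical_speed_kmh / 3.6)
```

At an illustrative 25 km/h, a 50 km crow-fly distance corresponds to about two hours (7200 s).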

Round trip
If the two anchor points of a sub-chain are the same, the time-geography approach cannot be used. In this case, the location of the first secondary activity in the sub-chain is determined using the approach for primary activities as described above. If there is more than one secondary activity, the location thus determined for the first secondary activity is then the new anchor point and the time geography approach is used for all subsequent secondary activities.

Attractiveness raster cells for secondary activities

The approach for calculating the attractiveness of a raster cell for shopping and leisure activities depends on whether or not the scenario includes facilities.

If the scenario includes facilities, the attractiveness of a raster cell for either activity type equals the sum of the ActivityOption weights (see Section Weight other ActivityOptions) of the facilities within the raster cell accommodating that activity type.
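With facilities, the cell attractiveness is an aggregation of ActivityOption weights per cell. The Python sketch below shows this. The input layout `(cell_id, {activity_type: weight})`, with one entry per facility, is a hypothetical representation chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def cell_attractiveness(
    facilities: Iterable[Tuple[Hashable, Dict[str, float]]], act_type: str
) -> Dict[Hashable, float]:
    """Sum the ActivityOption weights for act_type of all facilities per cell."""
    totals: Dict[Hashable, float] = defaultdict(float)
    for cell_id, options in facilities:
        if act_type in options:  # only facilities accommodating this activity type
            totals[cell_id] += options[act_type]
    return dict(totals)


facilities = [
    ("c1", {"shopping": 2.0}),
    ("c1", {"shopping": 1.5, "leisure": 1.0}),
    ("c2", {"leisure": 3.0}),
]
shopping = cell_attractiveness(facilities, "shopping")
```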

If the scenario does not include facilities, the attractiveness for shopping and leisure activities is calculated using the following equations:

$$\mathit{attractiveness}_{shopping} = \mathit{popDensityClass} \cdot 0.0003 + \mathit{nonResBuildingClass} \cdot 0.6825$$

$$\mathit{attractiveness}_{leisure} = \mathit{popDensityClass} \cdot 0.6584 + \mathit{nonResBuildingClass} \cdot 0.7468$$

Since there are very few facilities for other activities, their attractiveness is always calculated with the following equation, regardless of whether the scenario includes facilities:

$$\mathit{attractiveness}_{other} = \mathit{popDensityClass} \cdot 0.3512 + \mathit{nonResBuildingClass} \cdot 0.6054$$
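The three equations for scenarios without facilities share one linear form, so a coefficient table keeps them together. The Python sketch below shows this. The dictionary and function names are hypothetical, and the class inputs are the values defined in Tables 12 and 13.

```python
# (popDensityClass coefficient, nonResBuildingClass coefficient) per activity type.
SECONDARY_COEFFS = {
    "shopping": (0.0003, 0.6825),
    "leisure": (0.6584, 0.7468),
    "other": (0.3512, 0.6054),  # used with and without facilities
}


def attractiveness_secondary(act_type: str, pop_density_cls: int,
                             non_res_building_cls: int) -> float:
    a, b = SECONDARY_COEFFS[act_type]
    return pop_density_cls * a + non_res_building_cls * b
```

The near-zero density coefficient for shopping means that shopping attractiveness is driven almost entirely by non-residential buildings.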

The last step in the generation of initial plans is assigning a network link to each activity. Since MATSim currently requires all activities to be located on a car-accessible link, only the car network is used for this. Another requirement of the MATSim simulation is that activity links cannot be sinks or sources, since agents have to be able to both arrive at the location and depart from it again. Due to the network filtering and cleaning, the network might contain some sinks or sources. Therefore, connector links are added to the car-only network with the same process as described in Section Network cleaning, simplification and thinning, but with a minimum speed of 50 km/h. Since, as discussed above, some activity locations lie outside the study area and some even outside the network boundaries, different approaches are used for locations within and outside the study area.

Within the study area

The first step in assigning a link to activities within the study area is to filter out all network links that are not suitable as activity locations. These are:

  • motorways, motorway access roads (motorway_link) and connector roads
  • trunk roads, trunk access roads (trunk_link) and connector roads
  • tunnels
  • bridges

Then, the link closest to the activity location on the filtered network is determined and assigned as the activity link.
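The filter-then-nearest step can be sketched in Python as follows. Links are treated as straight segments, and the `Link` record is hypothetical. The type name `connectorMW` appears in Table 14, whereas `connectorTrunk` is a hypothetical name for trunk connector roads. A production implementation would use a spatial index instead of a linear scan.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]

# Link types excluded inside the study area ("connectorTrunk" is hypothetical).
EXCLUDED_INSIDE = {"motorway", "motorway_link", "connectorMW",
                   "trunk", "trunk_link", "connectorTrunk"}


@dataclass
class Link:
    link_id: str
    road_type: str
    start: Point
    end: Point
    tunnel: bool = False
    bridge: bool = False


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from point p to the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:
        return math.dist(p, a)
    # Projection of p onto the segment, clamped to its end points.
    t = max(0.0, min(1.0, ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len2))
    return math.dist(p, (a[0] + t * dx, a[1] + t * dy))


def activity_link_inside(links: List[Link], act_xy: Point) -> Optional[str]:
    suitable = [lk for lk in links
                if lk.road_type not in EXCLUDED_INSIDE and not lk.tunnel and not lk.bridge]
    if not suitable:
        return None
    nearest = min(suitable, key=lambda lk: point_segment_distance(act_xy, lk.start, lk.end))
    return nearest.link_id


links = [
    Link("m1", "motorway", (0, 10), (100, 10)),
    Link("t1", "residential", (0, 5), (100, 5), tunnel=True),
    Link("r1", "residential", (0, 50), (100, 50)),
]
```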

Outside the study area

The basic approach for activities outside the study area is similar, but with a few alterations, because trips from outside the study area usually enter the network on main roads such as motorways, trunk roads or primary roads. Therefore, the filter conditions for suitable network links are different: lower-level road types such as residential roads, service roads or living streets are filtered out. In addition, motorways are removed from the set of suitable links to avoid assigning external activities to the wrong side of a motorway and thereby generating long detours. Motorway connector links as described in Section Network cleaning, simplification and thinning remain for trips that enter the network via a motorway.

The second step is again to find the link closest to the activity location. However, to further emphasise that external traffic is more likely to enter the network on higher-priority roads, not only the distance to the location but also the road type is considered. Therefore, all links connected to nodes within 500 m of the node closest to the activity location are collected, and a link score is calculated for each of these links using the following equation:

$$\mathit{score}_i = \mathit{diffDist}_i \cdot \mathit{distParam} + \mathit{roadTypeWeight}_i$$

where
$\mathit{score}_i$: score of link $i$
$\mathit{diffDist}_i$: difference between the distance from the activity location to link $i$ and the distance from the activity location to the nearest link $j$
$\mathit{distParam}$: weight for the difference in distance, here 0.01
$\mathit{roadTypeWeight}_i$: road type weight of link $i$

The $\mathit{roadTypeWeight}$ is assigned as specified in Table 14. The link with the lowest score is then selected.

Road type RoadTypeWeight
motorway 1
motorway_link 2
connectorMW 1
trunk 2
trunk_link 3
connector 4
primary 4
primary_link 5
secondary 5
secondary_link 5
tertiary 5
other 10

Table 14: Road Type Weights for Link Assignment

Figure 8 illustrates the procedure. The red dot represents the activity location outside the study area, the orange links are motorways, the pink links motorway connectors and the green links primary roads. Link a) is the closest link, but it is a primary road, so its score is 0 m * 0.01 + 4 = 4. The difference in distance between link a) and link b) is 200 m, so the score of link b) is 200 m * 0.01 + 1 = 3. Since link b) has the lower score, connector link b) is selected over primary road link a).
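The filter and scoring rule for activities outside the study area can be checked against the example above with the following Python sketch. It assumes that the candidate links (those attached to nodes within 500 m of the node nearest the activity) and their distances to the activity location have already been collected. The `(link_id, distance_m, road_type)` tuple layout is hypothetical. The excluded set contains only the road types named in the text, which lists them as examples ("such as"), so the real filter may be broader. The 150 m absolute distance of link a) is illustrative; only the 200 m difference comes from the example.

```python
from typing import List, Tuple

# Road types named in the text as filtered out for external activities.
EXCLUDED_OUTSIDE = {"residential", "service", "living_street", "motorway"}

ROAD_TYPE_WEIGHT = {  # Table 14; every other type gets OTHER_WEIGHT
    "motorway": 1, "motorway_link": 2, "connectorMW": 1, "trunk": 2,
    "trunk_link": 3, "connector": 4, "primary": 4, "primary_link": 5,
    "secondary": 5, "secondary_link": 5, "tertiary": 5,
}
OTHER_WEIGHT = 10
DIST_PARAM = 0.01  # distParam


def link_score(dist_m: float, nearest_dist_m: float, road_type: str) -> float:
    """score_i = diffDist_i * distParam + roadTypeWeight_i"""
    diff_dist = dist_m - nearest_dist_m
    return diff_dist * DIST_PARAM + ROAD_TYPE_WEIGHT.get(road_type, OTHER_WEIGHT)


def select_external_link(candidates: List[Tuple[str, float, str]]) -> str:
    """Filter unsuitable types, then pick the candidate with the lowest score."""
    suitable = [c for c in candidates if c[2] not in EXCLUDED_OUTSIDE]
    nearest = min(d for _, d, _ in suitable)  # distance to the nearest link j
    return min(suitable, key=lambda c: link_score(c[1], nearest, c[2]))[0]


# Figure 8: a) primary road (closest), b) motorway connector 200 m further away.
example = [("a", 150.0, "primary"), ("b", 350.0, "connectorMW")]
```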

Figure 8: Illustration of Nearest Link Selection Weighted by Road Type