Models

epr.DensityEPR([name, rho, gamma, beta, …])

Density-EPR model.

epr.SpatialEPR([name, rho, gamma, beta, …])

Spatial-EPR model.

epr.Ditras(diary_generator[, name, rho, gamma])

Ditras modelling framework.

markov_diary_generator.MarkovDiaryGenerator([name])

Markov Diary Learner and Generator.

gravity.Gravity([deterrence_func_type, …])

Gravity model.

radiation.Radiation([name])

Radiation model.

EPR

class skmob.models.epr.DensityEPR(name='Density EPR model', rho=0.6, gamma=0.21, beta=0.8, tau=17, min_wait_time_minutes=20)

Density-EPR model.

The d-EPR model of individual human mobility consists of the following mechanisms [PSRPGB2015] [PSR2016]:

Waiting time choice. The waiting time \(\Delta t\) between two movements of the agent is chosen randomly from the distribution \(P(\Delta t) \sim \Delta t^{-1-\beta} \exp(-\Delta t/\tau)\). Parameters \(\beta\) and \(\tau\) correspond to arguments beta and tau of the constructor, respectively.

Action selection. With probability \(P_{new}=\rho S^{-\gamma}\), where \(S\) is the number of distinct locations previously visited by the agent, the agent visits a new location (Exploration phase), otherwise it returns to a previously visited location (Return phase). Parameters \(\rho\) and \(\gamma\) correspond to arguments rho and gamma of the constructor, respectively.

Exploration phase. If the agent that is currently in location \(i\) explores a new location, then the new location \(j \neq i\) is selected according to the gravity model with probability \(p_{ij} = \frac{1}{N} \frac{n_i n_j}{r_{ij}^2}\), where \(n_{i(j)}\) is the relevance of location \(i\) (\(j\)), that is, the probability that the population visits that location, \(r_{ij}\) is the geographic distance between \(i\) and \(j\), and \(N = \sum_{i, j \neq i} \frac{n_i n_j}{r_{ij}^2}\) is a normalization constant. The number of distinct locations visited, \(S\), is increased by 1.

Return phase. If the individual returns to a previously visited location, such a location \(i\) is chosen with probability proportional to the number of times the agent visited \(i\), i.e., \(\Pi_i = f_i\), where \(f_i\) is the visitation frequency of location \(i\).
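The four mechanisms above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the skmob implementation: the helper names (`sample_waiting_time`, `p_new`, `exploration_probs`, `return_probs`, `depr_step`) are hypothetical, the waiting time is drawn by rejection sampling with a lower cut-off `t_min` standing in for min_wait_time_minutes (converted to hours), distances come from a precomputed matrix, and exploration is restricted to locations not yet visited so that \(S\) grows by exactly one. Because the current location \(i\) is fixed, the factor \(n_i\) and the constant \(N\) cancel, leaving \(p(j \mid i) \propto n_j / r_{ij}^2\).

```python
import numpy as np


def sample_waiting_time(beta, tau, t_min, rng):
    """Draw dt >= t_min from P(dt) ~ dt**(-1 - beta) * exp(-dt / tau) by rejection sampling."""
    while True:
        # Proposal: Pareto tail with density proportional to dt**(-1 - beta) on [t_min, inf),
        # sampled by inverting its CDF; 1 - rng.random() lies in (0, 1], so no division by zero.
        dt = t_min * (1.0 - rng.random()) ** (-1.0 / beta)
        # Accept with probability exp(-(dt - t_min) / tau), which is at most 1.
        if rng.random() < np.exp(-(dt - t_min) / tau):
            return dt


def p_new(rho, gamma, S):
    """Probability of exploring a new location after S distinct visited locations."""
    return rho * S ** (-gamma)


def exploration_probs(i, relevance, dist, visited):
    """p(j | i) proportional to n_j / r_ij**2 over the locations not visited yet."""
    candidates = np.ones(len(relevance), dtype=bool)
    candidates[visited] = False  # visited contains i, so j != i
    weights = np.zeros(len(relevance))
    weights[candidates] = relevance[candidates] / dist[i, candidates] ** 2
    return weights / weights.sum()


def return_probs(visits):
    """Preferential return: Pi_i proportional to the visitation frequency f_i."""
    return visits / visits.sum()


def depr_step(current, visits, relevance, dist, rho, gamma, rng):
    """Perform one move; visits[k] counts the visits to location k (current must be visited)."""
    visited = np.flatnonzero(visits)
    S = len(visited)
    if S < len(visits) and rng.random() < p_new(rho, gamma, S):
        probs = exploration_probs(current, relevance, dist, visited)  # exploration phase
    else:
        probs = return_probs(visits)  # return phase
    nxt = int(rng.choice(len(visits), p=probs))
    visits[nxt] += 1
    return nxt
```

For example, with `rng = np.random.default_rng(0)`, `sample_waiting_time(0.8, 17.0, 20 / 60, rng)` draws a waiting time in hours using the constructor defaults; alternating such draws with calls to `depr_step` yields a single-agent trajectory.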

Parameters
  • name (str, optional) – the name of the instantiation of the d-EPR model. The default value is “Density EPR model”.

  • rho (float, optional) – it corresponds to the parameter \(\rho \in (0, 1]\) in the Action selection mechanism \(P_{new} = \rho S^{-\gamma}\) and controls the agent’s tendency to explore a new location during the next move versus returning to a previously visited location. The default value is \(\rho = 0.6\) [SKWB2010].

  • gamma (float, optional) – it corresponds to the parameter \(\gamma\) (\(\gamma \geq 0\)) in the Action selection mechanism \(P_{new} = \rho S^{-\gamma}\) and controls the agent’s tendency to explore a new location during the next move versus returning to a previously visited location. The default value is \(\gamma=0.21\) [SKWB2010].

  • beta (float, optional) – it corresponds to the parameter \(\beta\) of the waiting time distribution in the Waiting time choice mechanism. The default value is \(\beta=0.8\) [SKWB2010].

  • tau (int, optional) – it corresponds to the parameter \(\tau\) of the waiting time distribution in the Waiting time choice mechanism. The default value is \(\tau = 17\), expressed in hours [SKWB2010].

  • min_wait_time_minutes (int, optional) – the minimum waiting time between two movements, in minutes. The default value is 20.

Variables
  • name (str) – the name of the instantiation of the model.

  • rho (float) – the input parameter \(\rho\).

  • gamma (float) – the input parameter \(\gamma\).

  • beta (float) – the input parameter \(\beta\).

  • tau (int) – the input parameter \(\tau\).

  • min_wait_time_minutes (int) – the input parameter min_wait_time_minutes.

Examples

>>> import skmob
>>> import pandas as pd
>>> import geopandas as gpd
>>> from skmob.models.epr import DensityEPR
>>> url = skmob.utils.constants.NY_COUNTIES_2011
>>> tessellation = gpd.read_file(url)
>>> start_time = pd.to_datetime('2019/01/01 08:00:00')
>>> end_time = pd.to_datetime('2019/01/14 08:00:00')
>>> depr = DensityEPR()
>>> tdf = depr.generate(start_time, end_time, tessellation, relevance_column='population', n_agents=100, show_progress=True)
>>> print(tdf.head())
   uid                   datetime        lat        lng
0    1 2019-01-01 08:00:00.000000  42.780819 -76.823724
1    1 2019-01-01 09:45:58.388540  42.728060 -77.775510
2    1 2019-01-01 10:16:09.406408  42.780819 -76.823724
3    1 2019-01-01 17:13:39.999037  42.852827 -77.299810
4    1 2019-01-01 19:24:27.353379  42.728060 -77.775510
>>> print(tdf.parameters)
{'model': {'class': <function DensityEPR.__init__ at 0x7f548a49cf28>, 'generate': {'start_date': Timestamp('2019-01-01 08:00:00'), 'end_date': Timestamp('2019-01-14 08:00:00'), 'gravity_singly': {}, 'n_agents': 100, 'relevance_column': 'population', 'random_state': None, 'show_progress': True}}}

References

PSRPGB2015

Pappalardo, L., Simini, F., Rinzivillo, S., Pedreschi, D., Giannotti, F. & Barabasi, A. L. (2015) Returners and Explorers dichotomy in human mobility. Nature Communications 6, https://www.nature.com/articles/ncomms9166

PSR2016

Pappalardo, L., Simini, F. & Rinzivillo, S. (2016) Human Mobility Modelling: exploration and preferential return meet the gravity model. Procedia Computer Science 83, https://www.sciencedirect.com/science/article/pii/S1877050916302216

SKWB2010

Song, C., Koren, T., Wang, P. & Barabasi, A.L. (2010) Modelling the scaling properties of human mobility. Nature Physics 6, 818-823, https://www.nature.com/articles/nphys1760

See also

EPR, SpatialEPR, Ditras

generate(start_date, end_date, spatial_tessellation, gravity_singly={}, n_agents=1, starting_locations=None, od_matrix=None, relevance_column='relevance', random_state=None, log_file=None, show_progress=False)

Simulate a set of agents from time start_date until time end_date.

Parameters
  • start_date (datetime) – the starting date of the simulation, in “YYYY/mm/dd HH:MM:SS” format.

  • end_date (datetime) – the ending date of the simulation, in “YYYY/mm/dd HH:MM:SS” format.

  • spatial_tessellation (geopandas GeoDataFrame) – the spatial tessellation, i.e., a division of the territory into locations.

  • gravity_singly ({} or Gravity, optional) – the singly constrained gravity model used to compute the probability of moving between two locations (used by DensityEPR). The default is “{}”.

  • n_agents (int, optional) – the number of agents to generate. The default is 1.

  • relevance_column (str, optional) – the name of the column in spatial_tessellation to use as relevance variable. The default is “relevance”.

  • starting_locations (list or None, optional) – a list of integers, each identifying the location from which to start the simulation of each agent. Note that, if starting_locations is not None, its length must be equal to the value of n_agents, i.e., you must specify one starting location per agent. The default is None.

  • od_matrix (numpy array or None, optional) – the origin-destination matrix used to decide the movements of the agents (element [i,j] is the probability of a trip from the location with tessellation index i to the location with index j, normalized by origin location). If None, it is computed “on the fly” during the simulation. The default is None.

  • random_state (int or None, optional) – if int, it is the seed used by the random number generator; if None, the random number generator is the RandomState instance used by np.random and random.random. The default is None.

  • log_file (str or None, optional) – the name of the file to which a log of the model execution is written. The log file contains all the decisions (returns or explorations) made by the model. The default is None.

  • show_progress (boolean, optional) – if True, show a progress bar. The default is False.
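To make “normalized by origin location” concrete, here is a small NumPy sketch of a matrix in the shape the od_matrix parameter describes, built from made-up trip counts. The counts and variable names are illustrative assumptions, and how generate consumes the matrix internally is not shown. Each row is divided by its sum, so row i is a probability distribution over destinations for an agent currently at location i.

```python
import numpy as np

# Made-up trip counts between three locations (rows = origins, columns = destinations).
counts = np.array([[0.0, 6.0, 2.0],
                   [3.0, 0.0, 1.0],
                   [5.0, 5.0, 0.0]])

# Normalize by origin: od_matrix[i, j] = P(trip goes to j | trip starts at i).
od_matrix = counts / counts.sum(axis=1, keepdims=True)

# Sample the next location of an agent currently at location 0.
rng = np.random.default_rng(42)
next_location = int(rng.choice(len(od_matrix), p=od_matrix[0]))
```

An origin with no recorded trips would produce a row of zeros and therefore NaN after normalization, so such rows need separate handling.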

Returns

the synthetic trajectories generated by the model

Return type

TrajDataFrame

class skmob.models.epr.Ditras(diary_generator, name='Ditras model', rho=0.3, gamma=0.21)

Ditras modelling framework.

DITRAS (DIary-based TRAjectory Simulator) is a modelling framework that simulates the spatio-temporal patterns of human mobility [PS2018]. DITRAS consists of two phases:

Mobility Diary Generation. In the first phase, DITRAS generates a mobility diary which captures the temporal patterns of human mobility.

Trajectory Generation. In the second phase, DITRAS transforms the mobility diary into a mobility trajectory which captures the spatial patterns of human movements.

https://raw.githubusercontent.com/jonpappalord/DITRAS/master/DITRAS_schema.png

Outline of the DITRAS framework. DITRAS combines two probabilistic models: a diary generator (e.g., \(MD(t)\)) and a trajectory generator (e.g., d-EPR). The diary generator produces a mobility diary \(D\). The mobility diary \(D\) is the input of the trajectory generator, together with a weighted spatial tessellation \(L\) of the territory. From \(D\) and \(L\), the trajectory generator produces a synthetic mobility trajectory \(S\).

Parameters
  • diary_generator (MarkovDiaryGenerator) – the diary generator to use for generating the diary.

  • name (str, optional) – the name of the instantiation of the Ditras model. The default value is “Ditras model”.

  • rho (float, optional) – it corresponds to the parameter \(\rho \in (0, 1]\) in the Action selection mechanism of the DensityEPR model \(P_{new} = \rho S^{-\gamma}\) and controls the agent’s tendency to explore a new location during the next move versus returning to a previously visited location. The default value is \(\rho = 0.3\) (the value reported in [SKWB2010] is \(\rho = 0.6\)).

  • gamma (float, optional) – it corresponds to the parameter \(\gamma\) (\(\gamma \geq 0\)) in the Action selection mechanism of the DensityEPR model \(P_{new} = \rho S^{-\gamma}\) and controls the agent’s tendency to explore a new location during the next move versus returning to a previously visited location. The default value is \(\gamma=0.21\) [SKWB2010].

Variables
  • diary_generator (MarkovDiaryGenerator) – the diary generator to use for generating the diary [PS2018].

  • name (str) – the name of the instantiation of the model.

  • rho (float) – the input parameter \(\rho\).

  • gamma (float) – the input parameter \(\gamma\).

Examples

>>> import skmob
>>> import pandas as pd
>>> import geopandas as gpd
>>> from skmob.models.epr import Ditras
>>> from skmob.models.markov_diary_generator import MarkovDiaryGenerator
>>> from skmob.preprocessing import filtering, compression, detection, clustering
>>>
>>> # load and preprocess data to train the MarkovDiaryGenerator
>>> url = skmob.utils.constants.GEOLIFE_SAMPLE
>>> df = pd.read_csv(url, sep=',', compression='gzip')
>>> tdf = skmob.TrajDataFrame(df, latitude='lat', longitude='lon', user_id='user', datetime='datetime')
>>> ctdf = compression.compress(tdf)
>>> stdf = detection.stops(ctdf)
>>> cstdf = clustering.cluster(stdf)
>>>
>>> # instantiate and train the MarkovDiaryGenerator
>>> mdg = MarkovDiaryGenerator()
>>> mdg.fit(cstdf, 2, lid='cluster')
>>>
>>> # set start time, end time and tessellation for the simulation
>>> start_time = pd.to_datetime('2019/01/01 08:00:00')
>>> end_time = pd.to_datetime('2019/01/14 08:00:00')
>>> tessellation = gpd.read_file(skmob.utils.constants.NY_COUNTIES_2011)
>>>
>>> # instantiate the model
>>> ditras = Ditras(mdg)
>>>
>>> # run the model
>>> ditras_tdf = ditras.generate(start_time, end_time, tessellation, relevance_column='population',
...                              n_agents=3, od_matrix=None, show_progress=True)
>>> print(ditras_tdf.head())
   uid            datetime        lat        lng
0    1 2019-01-01 08:00:00  43.382528 -78.230656
1    1 2019-01-02 03:00:00  43.309133 -77.680414
2    1 2019-01-02 23:00:00  43.382528 -78.230656
3    1 2019-01-03 10:00:00  43.382528 -78.230656
4    1 2019-01-03 21:00:00  43.309133 -77.680414
>>> print(ditras_tdf.parameters)
{'model': {'class': <function Ditras.__init__ at 0x7f0cf0b7e158>, 'generate': {'start_date': Timestamp('2019-01-01 08:00:00'), 'end_date': Timestamp('2019-01-14 08:00:00'), 'gravity_singly': {}, 'n_agents': 3, 'relevance_column': 'population', 'random_state': None, 'show_progress': True}}}

References

PS2018

Pappalardo, L. & Simini, F. (2018) Data-driven generation of spatio-temporal routines in human mobility. Data Mining and Knowledge Discovery 32, 787-829, https://link.springer.com/article/10.1007/s10618-017-0548-4

See also

DensityEPR, MarkovDiaryGenerator

generate(start_date, end_date, spatial_tessellation, gravity_singly={}, n_agents=1, starting_locations=None, od_matrix=None, relevance_column='relevance', random_state=None, log_file=None, show_progress=False)

Simulate a set of agents from time start_date until time end_date.

Parameters
  • start_date (datetime) – the starting date of the simulation, in “YYYY/mm/dd HH:MM:SS” format.

  • end_date (datetime) – the ending date of the simulation, in “YYYY/mm/dd HH:MM:SS” format.

  • spatial_tessellation (geopandas GeoDataFrame) – the spatial tessellation, i.e., a division of the territory into locations.

  • gravity_singly ({} or Gravity, optional) – the (singly constrained) gravity model to use when generating the probability to move between two locations. The default is “{}”.

  • n_agents (int, optional) – the number of agents to generate. The default is 1.

  • starting_locations (list or None, optional) – a list of integers, each identifying the location from which to start the simulation of each agent. Note that, if starting_locations is not None, its length must be equal to the value of n_agents, i.e., you must specify one starting location per agent. The default is None.

  • od_matrix (numpy array or None, optional) – the origin-destination matrix used to decide the movements of the agents (element [i,j] is the probability of a trip from the location with tessellation index i to the location with index j, normalized by origin location). If None, it is computed “on the fly” during the simulation. The default is None.

  • relevance_column (str, optional) – the name of the column in spatial_tessellation to use as relevance variable. The default is “relevance”.

  • random_state (int or None, optional) – if int, it is the seed used by the random number generator; if None, the random number generator is the RandomState instance used by np.random and random.random. The default is None.

  • log_file (str or None, optional) – the name of the file to which a log of the model execution is written. The log file contains all the decisions (returns or explorations) made by the model. The default is None.

  • show_progress (boolean, optional) – if True, show a progress bar. The default is False.

Returns

the synthetic trajectories generated by the model

Return type

TrajDataFrame

class skmob.models.epr.SpatialEPR(name='Spatial EPR model', rho=0.6, gamma=0.21, beta=0.8, tau=17, min_wait_time_minutes=20)

Spatial-EPR model.

The s-EPR model of individual human mobility consists of the following mechanisms [PSRPGB2015] [PSR2016] [SKWB2010]:

Waiting time choice. The waiting time \(\Delta t\) between two movements of the agent is chosen randomly from the distribution \(P(\Delta t) \sim \Delta t^{-1-\beta} \exp(-\Delta t/\tau)\). Parameters \(\beta\) and \(\tau\) correspond to arguments beta and tau of the constructor, respectively.

Action selection. With probability \(P_{new}=\rho S^{-\gamma}\), where \(S\) is the number of distinct locations previously visited by the agent, the agent visits a new location (Exploration phase), otherwise it returns to a previously visited location (Return phase). Parameters \(\rho\) and \(\gamma\) correspond to arguments rho and gamma of the constructor, respectively.

Exploration phase. If the agent that is currently in location \(i\) explores a new location, then the new location \(j \neq i\) is selected according to its distance from the current location, with probability \(p_{ij} \propto \frac{1}{r_{ij}^2}\), where \(r_{ij}\) is the geographic distance between \(i\) and \(j\). The number of distinct locations visited, \(S\), is increased by 1.

Return phase. If the individual returns to a previously visited location, such a location \(i\) is chosen with probability proportional to the number of times the agent visited \(i\), i.e., \(\Pi_i = f_i\), where \(f_i\) is the visitation frequency of location \(i\).
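Compared with the d-EPR, only the exploration rule changes: the relevance of the destination plays no role. A minimal NumPy sketch of that rule follows; the function name, the use of planar coordinates, and the exclusion of already visited locations are assumptions of the sketch rather than details taken from skmob.

```python
import numpy as np


def sepr_exploration_probs(i, coords, visited):
    """s-EPR exploration: p(j | i) proportional to 1 / r_ij**2 over locations not visited yet."""
    # Planar Euclidean distances between projected coordinates (an assumption of this sketch;
    # geographic distances between tile centroids would typically be used instead).
    r = np.linalg.norm(coords - coords[i], axis=1)
    candidates = np.ones(len(coords), dtype=bool)
    candidates[visited] = False  # visited contains i, so j != i
    weights = np.zeros(len(coords))
    weights[candidates] = 1.0 / r[candidates] ** 2
    return weights / weights.sum()


coords = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
probs = sepr_exploration_probs(0, coords, np.array([0]))  # approximately [0.0, 0.8, 0.2]
```

The location at distance 1 is four times as likely as the one at distance 2, as the \(1/r_{ij}^2\) rule prescribes.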

https://media.springernature.com/full/springer-static/image/art%3A10.1038%2Fnphys1760/MediaObjects/41567_2010_Article_BFnphys1760_Fig2_HTML.jpg?as=webp

Starting at time \(t\) from the configuration shown in the left panel, where the user has previously visited \(S=4\) locations with frequencies \(f_i\) proportional to the size of the circles drawn at each location, at time \(t + \Delta t\) (with \(\Delta t\) drawn from the fat-tailed distribution \(P(\Delta t)\)) the user can either visit a new location at distance \(\Delta r\) from his/her present location, with probability \(P_{new}=\rho S^{-\gamma}\), or return to a previously visited location with probability \(P_{ret}=1-\rho S^{-\gamma}\), where the next location will be chosen with probability \(\Pi_i=f_i\) (preferential return; lower panel). Figure from [SKWB2010].

Parameters
  • name (str, optional) – the name of the instantiation of the s-EPR model. The default value is “Spatial EPR model”.

  • rho (float, optional) – it corresponds to the parameter \(\rho \in (0, 1]\) in the Action selection mechanism \(P_{new} = \rho S^{-\gamma}\) and controls the agent’s tendency to explore a new location during the next move versus returning to a previously visited location. The default value is \(\rho = 0.6\) [SKWB2010].

  • gamma (float, optional) – it corresponds to the parameter \(\gamma\) (\(\gamma \geq 0\)) in the Action selection mechanism \(P_{new} = \rho S^{-\gamma}\) and controls the agent’s tendency to explore a new location during the next move versus returning to a previously visited location. The default value is \(\gamma=0.21\) [SKWB2010].

  • beta (float, optional) – it corresponds to the parameter \(\beta\) of the waiting time distribution in the Waiting time choice mechanism. The default value is \(\beta=0.8\) [SKWB2010].

  • tau (int, optional) – it corresponds to the parameter \(\tau\) of the waiting time distribution in the Waiting time choice mechanism. The default value is \(\tau = 17\), expressed in hours [SKWB2010].

  • min_wait_time_minutes (int, optional) – the minimum waiting time between two movements, in minutes. The default value is 20.

Variables
  • name (str) – the name of the instantiation of the model.

  • rho (float) – the input parameter \(\rho\).

  • gamma (float) – the input parameter \(\gamma\).

  • beta (float) – the input parameter \(\beta\).

  • tau (int) – the input parameter \(\tau\).

  • min_wait_time_minutes (int) – the input parameter min_wait_time_minutes.

Examples

>>> import skmob
>>> import pandas as pd
>>> import geopandas as gpd
>>> from skmob.models.epr import SpatialEPR
>>> url = skmob.utils.constants.NY_COUNTIES_2011
>>> tessellation = gpd.read_file(url)
>>> start_time = pd.to_datetime('2019/01/01 08:00:00')
>>> end_time = pd.to_datetime('2019/01/14 08:00:00')
>>> sepr = SpatialEPR()
>>> tdf = sepr.generate(start_time, end_time, tessellation, n_agents=100, show_progress=True)
>>> print(tdf.head())
   uid                   datetime        lat        lng
0    1 2019-01-01 08:00:00.000000  42.267915 -77.383591
1    1 2019-01-01 13:06:13.973868  42.633510 -77.105324
2    1 2019-01-01 14:17:41.188668  42.452018 -76.473618
3    1 2019-01-01 14:49:30.896248  42.633510 -77.105324
4    1 2019-01-01 15:10:54.133150  43.382528 -78.230656
>>> print(tdf.parameters)
{'model': {'class': <function SpatialEPR.__init__ at 0x7f548a49e048>, 'generate': {'start_date': Timestamp('2019-01-01 08:00:00'), 'end_date': Timestamp('2019-01-14 08:00:00'), 'gravity_singly': {}, 'n_agents': 100, 'relevance_column': None, 'random_state': None, 'show_progress': True}}}

See also

EPR, DensityEPR, Ditras

generate(start_date, end_date, spatial_tessellation, gravity_singly={}, n_agents=1, starting_locations=None, od_matrix=None, random_state=None, log_file=None, show_progress=False)

Simulate a set of agents from time start_date until time end_date.

Parameters
  • start_date (datetime) – the starting date of the simulation, in “YYYY/mm/dd HH:MM:SS” format.

  • end_date (datetime) – the ending date of the simulation, in “YYYY/mm/dd HH:MM:SS” format.

  • spatial_tessellation (geopandas GeoDataFrame) – the spatial tessellation, i.e., a division of the territory into locations.

  • gravity_singly ({} or Gravity, optional) – the singly constrained gravity model used to compute the probability of moving between two locations (used by DensityEPR). The default is “{}”.

  • n_agents (int, optional) – the number of agents to generate. The default is 1.

  • starting_locations (list or None, optional) – a list of integers, each identifying the location from which to start the simulation of each agent. Note that, if starting_locations is not None, its length must be equal to the value of n_agents, i.e., you must specify one starting location per agent. The default is None.

  • od_matrix (numpy array or None, optional) – the origin-destination matrix used to decide the movements of the agents (element [i,j] is the probability of a trip from the location with tessellation index i to the location with index j, normalized by origin location). If None, it is computed “on the fly” during the simulation. The default is None.

  • random_state (int or None, optional) – if int, it is the seed used by the random number generator; if None, the random number generator is the RandomState instance used by np.random and random.random. The default is None.

  • log_file (str or None, optional) – the name of the file to which a log of the model execution is written. The log file contains all the decisions (returns or explorations) made by the model. The default is None.

  • show_progress (boolean, optional) – if True, show a progress bar. The default is False.

Returns

the synthetic trajectories generated by the model

Return type

TrajDataFrame

Markov Diary Generator

class skmob.models.markov_diary_generator.MarkovDiaryGenerator(name='Markov diary')

Markov Diary Learner and Generator.

A diary generator \(G\) produces a mobility diary, \(D(t)\), containing the sequence of trips made by an agent during a time period divided into time slots of \(t\) seconds. For example, \(G(3600)\) and \(G(60)\) produce mobility diaries with temporal resolutions of one hour and one minute, respectively [PS2018].

A Mobility Diary Learner (MDL) is a data-driven algorithm to compute a mobility diary \(MD\) from the mobility trajectories of a set of real individuals. We use a Markov model to describe the probability that an individual follows her routine and visits a typical location at the usual time, or she breaks the routine and visits another location. First, MDL translates mobility trajectory data of real individuals into abstract mobility trajectories. Second, it uses the obtained abstract trajectory data to compute the transition probabilities of the Markov model \(MD(t)\) [PS2018].
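The core idea of the second step, computing transition probabilities from abstract trajectories, can be illustrated with a plain first-order Markov chain estimated by counting. This is a simplified sketch: the state definition used by \(MD(t)\) in [PS2018] and by MarkovDiaryGenerator.fit is not reproduced here, and the function name and input sequences are hypothetical.

```python
from collections import Counter, defaultdict


def estimate_transitions(sequences):
    """Estimate first-order transition probabilities P(next state | current state) by counting."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    # Normalize the outgoing counts of each state so that they sum to 1.
    return {
        state: {nxt: c / sum(successors.values()) for nxt, c in successors.items()}
        for state, successors in counts.items()
    }


# Two hypothetical hourly sequences of abstract location ids.
sequences = [[0, 0, 1, 1, 0], [0, 1, 0, 0, 0]]
transitions = estimate_transitions(sequences)
# transitions[0] == {0: 0.6, 1: 0.4}
```

State 0 is followed by itself three times and by state 1 twice across both sequences, which gives the probabilities 0.6 and 0.4.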

Parameters

name (str, optional) – name of the instantiation of the class. The default is “Markov diary”.

Variables
  • name (str) – name of the instantiation of the class.

  • markov_chain_ (dict) – the trained Markov chain.

  • time_slot_length (str) – length of the time slot (1h).

Examples

>>> import skmob
>>> import pandas as pd
>>> import geopandas as gpd
>>> from skmob.models.epr import Ditras
>>> from skmob.models.markov_diary_generator import MarkovDiaryGenerator
>>> from skmob.preprocessing import filtering, compression, detection, clustering
>>> url = skmob.utils.constants.GEOLIFE_SAMPLE
>>>
>>> df = pd.read_csv(url, sep=',', compression='gzip')
>>> tdf = skmob.TrajDataFrame(df, latitude='lat', longitude='lon', user_id='user', datetime='datetime')
>>>
>>> ctdf = compression.compress(tdf)
>>> stdf = detection.stops(ctdf)
>>> cstdf = clustering.cluster(stdf)
>>>
>>> mdg = MarkovDiaryGenerator()
>>> mdg.fit(cstdf, 2, lid='cluster')
>>>
>>> start_time = pd.to_datetime('2019/01/01 08:00:00')
>>> diary = mdg.generate(100, start_time)
>>> print(diary)
             datetime  abstract_location
0 2019-01-01 08:00:00                  0
1 2019-01-02 19:00:00                  1
2 2019-01-02 20:00:00                  0
3 2019-01-03 17:00:00                  1
4 2019-01-03 18:00:00                  2
5 2019-01-04 08:00:00                  0
6 2019-01-05 03:00:00                  1

References

PS2018

Pappalardo, L. & Simini, F. (2018) Data-driven generation of spatio-temporal routines in human mobility. Data Mining and Knowledge Discovery 32, 787-829, https://link.springer.com/article/10.1007/s10618-017-0548-4

See also

Ditras

fit(traj, n_individuals, lid='location')

Train the Markov diary generator on real trajectories.

Parameters
  • traj (TrajDataFrame) – the trajectories of the individuals.

  • n_individuals (int) – the number of individuals in the TrajDataFrame to consider.

  • lid (string, optional) – the name of the column containing the identifier of the location. The default is “location”.

generate(diary_length, start_date, random_state=None)

Generate a mobility diary.

Parameters
  • diary_length (int) – the length of the diary in hours.

  • start_date (datetime) – the starting date of the generation.

Returns

the generated mobility diary.

Return type

pandas DataFrame

Gravity

class skmob.models.gravity.Gravity(deterrence_func_type='power_law', deterrence_func_args=[-2.0], origin_exp=1.0, destination_exp=1.0, gravity_type='singly constrained', name='Gravity model')

Gravity model.

The Gravity model of human migration. In its original formulation, the probability \(T_{ij}\) of moving from a location \(i\) to location \(j\) is defined as [Z1946]:

\[T_{ij} \propto \frac{P_i P_j}{r_{ij}}\]

where \(P_i\) and \(P_j\) are the populations of locations \(i\) and \(j\), and \(r_{ij}\) is the distance between \(i\) and \(j\). The basic assumptions of this model are that the number of trips leaving location \(i\) is proportional to its population \(P_i\), that the attractiveness of location \(j\) is also proportional to its population \(P_j\), and, finally, that there is a cost effect in terms of distance traveled. These ideas can be generalized by assuming a relation of the type [BBGJLLMRST2018]:

\[T_{ij} = K m_i m_j f(r_{ij})\]

where \(K\) is a constant, the masses \(m_i\) and \(m_j\) relate to the number of trips leaving location \(i\) or the ones attracted by location \(j\), and \(f(r_{ij})\), called the deterrence function, is a decreasing function of distance. The deterrence function \(f(r_{ij})\) is commonly modeled with a power-law or an exponential form.
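The generalized form above can be sketched with NumPy, leaving out the constant \(K\) since it only rescales all flows. The helper names are hypothetical; the two deterrence functions are the usual power-law and exponential forms written with our own parameterization, which may not match how Gravity interprets deterrence_func_args. Raising the relevance to origin_exp and destination_exp mirrors the constructor parameters of the same name, but that mapping is an assumption of the sketch.

```python
import numpy as np


def power_law(r, exponent=-2.0):
    """Power-law deterrence f(r) = r**exponent (exponent < 0)."""
    return r ** exponent


def exponential(r, scale=1.0):
    """Exponential deterrence f(r) = exp(-r / scale)."""
    return np.exp(-r / scale)


def gravity_scores(relevance, dist, deterrence, origin_exp=1.0, destination_exp=1.0):
    """Unnormalized scores m_i * m_j * f(r_ij), with m = relevance**exp and self-flows set to 0."""
    m_orig = relevance ** origin_exp
    m_dest = relevance ** destination_exp
    off_diag = ~np.eye(len(relevance), dtype=bool)
    f = np.zeros(dist.shape)
    f[off_diag] = deterrence(dist[off_diag])  # avoid evaluating f at r = 0
    return np.outer(m_orig, m_dest) * f


relevance = np.array([1.0, 2.0, 4.0])  # e.g., populations
dist = np.array([[0.0, 1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 1.0, 0.0]])
scores = gravity_scores(relevance, dist, power_law)
# scores[0, 1] == 2.0 and scores[0, 2] == 1.0
```

Passing `lambda r: power_law(r, -1.0)` as the deterrence function gives Zipf's original form \(P_i P_j / r_{ij}\), up to the constant.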

Constrained gravity models. Some of the limitations of the gravity model can be resolved via constrained versions. For example, one may hold the number of people originating from a location \(i\) to be a known quantity \(O_i\), and the gravity model is then used to estimate the destination, constituting a so-called singly constrained gravity model of the form:

\[T_{ij} = K_i O_i m_j f(r_{ij}) = O_i \frac{m_j f(r_{ij})}{\sum_k m_k f(r_{ik})}.\]

In this formulation, the proportionality constants \(K_i\) depend on the location of the origin and its distance to the other places considered. We can also fix the total number of travelers arriving at a destination \(j\) as \(D_j = \sum_i T_{ij}\), leading to a doubly constrained gravity model. For each origin-destination pair, the flow is calculated as:

\[T_{ij} = K_i O_i L_j D_j f(r_{ij})\]

where there are now two sets of proportionality constants (balancing factors):

\[K_i = \frac{1}{\sum_j L_j D_j f(r_{ij})}, L_j = \frac{1}{\sum_i K_i O_i f(r_{ij})}.\]
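Both constrained variants can be computed with dense NumPy arrays. The following is a minimal sketch, assuming a precomputed deterrence matrix `f` and hypothetical helper names; it is not the Gravity class implementation. The singly constrained flows follow the closed form directly, while the doubly constrained balancing factors are obtained by alternately applying the two equations for \(K_i\) and \(L_j\) until they settle (the same idea as iterative proportional fitting). The iteration assumes that the totals agree, \(\sum_i O_i = \sum_j D_j\), and a fixed number of iterations stands in for a proper convergence check.

```python
import numpy as np


def singly_constrained(O, m, f):
    """T_ij = O_i * m_j f_ij / sum_k m_k f_ik, so row i sums to O_i."""
    w = m[None, :] * f
    return O[:, None] * w / w.sum(axis=1, keepdims=True)


def doubly_constrained(O, D, f, n_iter=200):
    """T_ij = K_i O_i L_j D_j f_ij; K and L are found by alternating their defining equations."""
    L = np.ones(len(D))
    for _ in range(n_iter):
        K = 1.0 / (f @ (L * D))    # K_i = 1 / sum_j L_j D_j f_ij
        L = 1.0 / (f.T @ (K * O))  # L_j = 1 / sum_i K_i O_i f_ij
    return (K * O)[:, None] * (L * D)[None, :] * f


dist = np.array([[0.0, 1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 1.0, 0.0]])
f = np.exp(-dist)                  # deterrence matrix f(r_ij)
O = np.array([10.0, 20.0, 30.0])   # trips leaving each location
D = np.array([25.0, 15.0, 20.0])   # trips arriving at each location (same total as O)
T_single = singly_constrained(O, np.array([1.0, 2.0, 3.0]), f)
T_double = doubly_constrained(O, D, f)
```

Each row of `T_single` sums to the corresponding \(O_i\), while `T_double` reproduces both the outflows \(O_i\) (row sums) and the inflows \(D_j\) (column sums) up to numerical tolerance.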

Parameters
  • deterrence_func_type (str, optional) – the deterrence function to use. The possible deterrence functions are “power_law” and “exponential”. The default is “power_law”.

  • deterrence_func_args (list, optional) – the arguments of the deterrence function. The default is [-2.0].

  • origin_exp (float, optional) – the exponent of the origin’s relevance (only relevant to globally-constrained model). The default is 1.0.

  • destination_exp (float, optional) – the exponent of the destination’s relevance. The default is 1.0.

  • gravity_type (str, optional) – the type of gravity model. Possible values are “singly constrained” and “globally constrained”. The default is “singly constrained”.

  • name (str, optional) – the name of the instantiation of the Gravity model. The default is “Gravity model”.

Variables
  • deterrence_func_type (str) – the deterrence function to use. The possible deterrence functions are “power_law” and “exponential”.

  • deterrence_func_args (list) – the arguments of the deterrence function.

  • origin_exp (float) – the exponent of the origin’s relevance (only relevant to globally-constrained model).

  • destination_exp (float) – the exponent of the destination’s relevance.

  • gravity_type (str) – the type of gravity model. Possible values are “singly constrained” and “globally constrained”.

  • name (str) – the name of the instantiation of the Gravity model.

Examples

>>> import skmob
>>> from skmob.utils import utils, constants
>>> import pandas as pd
>>> import geopandas as gpd
>>> import numpy as np
>>> from skmob.models import Gravity
>>> # load a spatial tessellation
>>> url_tess = skmob.utils.constants.NY_COUNTIES_2011
>>> tessellation = gpd.read_file(url_tess).rename(columns={'tile_id': 'tile_ID'})
>>> print(tessellation.head())
  tile_ID  population                                           geometry
0   36019       81716  POLYGON ((-74.006668 44.886017, -74.027389 44....
1   36101       99145  POLYGON ((-77.099754 42.274215, -77.0996569999...
2   36107       50872  POLYGON ((-76.25014899999999 42.296676, -76.24...
3   36059     1346176  POLYGON ((-73.707662 40.727831, -73.700272 40....
4   36011       79693  POLYGON ((-76.279067 42.785866, -76.2753479999...
>>> # load real flows into a FlowDataFrame
>>> fdf = skmob.FlowDataFrame.from_file(skmob.utils.constants.NY_FLOWS_2011,
...                                     tessellation=tessellation,
...                                     tile_id='tile_ID',
...                                     sep=",")
>>> print(fdf.head())
     flow origin destination
0  121606  36001       36001
1       5  36001       36005
2      29  36001       36007
3      11  36001       36017
4      30  36001       36019
>>> # compute the total outflows from each location of the tessellation (excluding self loops)
>>> tot_outflows = fdf[fdf['origin'] != fdf['destination']].groupby(by='origin')[['flow']].sum().fillna(0)
>>> tessellation = tessellation.merge(tot_outflows, left_on='tile_ID', right_on='origin').rename(columns={'flow': constants.TOT_OUTFLOW})
>>> print(tessellation.head())
  tile_ID  population                                           geometry
0   36019       81716  POLYGON ((-74.006668 44.886017, -74.027389 44....
1   36101       99145  POLYGON ((-77.099754 42.274215, -77.0996569999...
2   36107       50872  POLYGON ((-76.25014899999999 42.296676, -76.24...
3   36059     1346176  POLYGON ((-73.707662 40.727831, -73.700272 40....
4   36011       79693  POLYGON ((-76.279067 42.785866, -76.2753479999...
   tot_outflow
0        29981
1         5319
2       295916
3         8665
4         8871
>>> # instantiate a singly constrained Gravity model
>>> gravity_singly = Gravity(gravity_type='singly constrained')
>>> print(gravity_singly)
Gravity(name="Gravity model", deterrence_func_type="power_law", deterrence_func_args=[-2.0], origin_exp=1.0, destination_exp=1.0, gravity_type="singly constrained")
>>> np.random.seed(0)
>>> synth_fdf = gravity_singly.generate(tessellation,
...                                     tile_id_column='tile_ID',
...                                     tot_outflows_column='tot_outflow',
...                                     relevance_column='population',
...                                     out_format='flows')
>>> print(synth_fdf.head())
  origin destination  flow
0  36019       36101   101
1  36019       36107    66
2  36019       36059  1041
3  36019       36011   151
4  36019       36123    33
>>> # fit the parameters of the Gravity model from real fluxes
>>> gravity_singly_fitted = Gravity(gravity_type='singly constrained')
>>> print(gravity_singly_fitted)
Gravity(name="Gravity model", deterrence_func_type="power_law", deterrence_func_args=[-2.0], origin_exp=1.0, destination_exp=1.0, gravity_type="singly constrained")
>>> gravity_singly_fitted.fit(fdf, relevance_column='population')
>>> print(gravity_singly_fitted)
Gravity(name="Gravity model", deterrence_func_type="power_law", deterrence_func_args=[-1.9947152031914186], origin_exp=1.0, destination_exp=0.6471759552223144, gravity_type="singly constrained")
>>> np.random.seed(0)
>>> synth_fdf_fitted = gravity_singly_fitted.generate(tessellation,
                                                        tile_id_column='tile_ID',
                                                        tot_outflows_column='tot_outflow',
                                                        relevance_column='population',
                                                        out_format='flows')
>>> print(synth_fdf_fitted.head())
  origin destination  flow
0  36019       36101   102
1  36019       36107    66
2  36019       36059  1044
3  36019       36011   152
4  36019       36123    33

References

Z1946

Zipf, G. K. (1946) The P1 P2/D hypothesis: on the intercity movement of persons. American Sociological Review 11(6), 677-686, https://www.jstor.org/stable/2087063?seq=1#metadata_info_tab_contents

W1971

Wilson, A. G. (1971) A family of spatial interaction models, and associated developments. Environment and Planning A 3(1), 1-32, https://econpapers.repec.org/article/pioenvira/v_3a3_3ay_3a1971_3ai_3a1_3ap_3a1-32.htm

BBGJLLMRST2018

Barbosa, H., Barthelemy, M., Ghoshal, G., James, C. R., Lenormand, M., Louail, T., Menezes, R., Ramasco, J. J., Simini, F. & Tomasini, M. (2018) Human mobility: Models and applications. Physics Reports 734, 1-74, https://www.sciencedirect.com/science/article/abs/pii/S037015731830022X

See also

Radiation

fit(flow_df, relevance_column='relevance')

Fit the parameters of the Gravity model to the input flows using a Generalized Linear Model (GLM) with Poisson regression [FM1982].

Parameters
  • flow_df (FlowDataFrame) – the real flows on the spatial tessellation.

  • relevance_column (str, optional) – the column in the spatial tessellation associated with flow_df that contains the relevance of each location. The default is constants.RELEVANCE.

References

FM1982

Flowerdew, R. & Murray, A. (1982) A method of fitting the gravity model based on the Poisson distribution. Journal of Regional Science 22(2), 191-202, https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1467-9787.1982.tb00744.x
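The Poisson-regression fit can be illustrated without scikit-mobility. The snippet below is a minimal sketch, not the code that `fit` runs internally: it assumes a singly constrained model with a power-law deterrence function, so that \(\log E[T_{ij}] = \alpha_i + \beta \log m_j + \gamma \log r_{ij}\) with one fixed effect \(\alpha_i\) per origin, and it solves the Poisson GLM by iteratively reweighted least squares. The helper `poisson_irls`, the synthetic centroids, relevances and outflows, and the "true" exponents are all hypothetical.

```python
import numpy as np


def poisson_irls(X, y, max_iter=100, tol=1e-10):
    """Fit a Poisson GLM with log link by iteratively reweighted least squares."""
    mu = y + 0.1                       # starting means (avoids log(0) for empty flows)
    eta = np.log(mu)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        z = eta + (y - mu) / mu        # working response for the log link
        XtW = X.T * mu                 # working weights equal the current means
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        eta = X @ beta_new
        mu = np.exp(eta)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta


rng = np.random.default_rng(0)
n = 12
xy = rng.uniform(0, 100, size=(n, 2))            # hypothetical tile centroids
mass = rng.integers(1_000, 100_000, size=n)      # hypothetical relevance m_j
out = rng.integers(5_000, 20_000, size=n)        # hypothetical total outflows O_i
true_dest_exp, true_det_exp = 0.65, -2.0

# expected flows T_ij = O_i * m_j^b * r_ij^c / sum_k m_k^b * r_ik^c, for all i != j
rows, cols = np.where(~np.eye(n, dtype=bool))
r = np.linalg.norm(xy[rows] - xy[cols], axis=1)
attract = mass[cols] ** true_dest_exp * r ** true_det_exp
norm = np.bincount(rows, weights=attract, minlength=n)
expected = out[rows] * attract / norm[rows]
flows = rng.poisson(expected)                    # synthetic "observed" flows

# design matrix: one dummy column per origin, then log m_j and log r_ij
X = np.column_stack([np.eye(n)[rows], np.log(mass[cols]), np.log(r)])
coef = poisson_irls(X, flows.astype(float))
estimates = {"destination_exp": coef[n], "deterrence_exp": coef[n + 1]}
print(estimates)
```

Because the origin fixed effects absorb the per-origin normalization, the fitted model reproduces each origin's total outflow, and the two remaining coefficients play roles analogous to `destination_exp` and `deterrence_func_args` in the printed `Gravity` objects of the example.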

generate(spatial_tessellation, tile_id_column='tile_ID', tot_outflows_column='tot_outflow', relevance_column='relevance', out_format='flows')

Start the simulation of the Gravity model.

Parameters
  • spatial_tessellation (GeoDataFrame) – the spatial tessellation on which to run the simulation.

  • tile_id_column (str, optional) – the column in spatial_tessellation of the location identifier. The default is constants.TILE_ID.

  • tot_outflows_column (str, optional) – the column in spatial_tessellation with the outflow of the location. The default is constants.TOT_OUTFLOW.

  • relevance_column (str, optional) – the column in spatial_tessellation with the relevance of the location. The default is constants.RELEVANCE.

  • out_format (str, optional) – the format of the generated flows. Possible values are “flows” (average flow between two locations), “flows_sample” (random sample of flows), and “probabilities” (probability of a unit flow between two locations). The default is “flows”.

Returns

the flows generated by the Gravity model.

Return type

FlowDataFrame
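The three values of `out_format` can be illustrated on a small probability matrix. This is a minimal sketch, assuming per-origin normalized probabilities as in a singly constrained model with deterrence \(r_{ij}^{-2}\); the helpers `gravity_probabilities` and `to_out_format` and the toy data are hypothetical, and whether `generate` samples in exactly this way is an assumption.

```python
import numpy as np


def gravity_probabilities(mass, dist, dest_exp=1.0, det_exp=-2.0):
    """p_ij proportional to m_j^dest_exp * r_ij^det_exp, normalized per origin, p_ii = 0."""
    dist = dist.astype(float).copy()
    np.fill_diagonal(dist, np.inf)     # inf ** det_exp == 0 for a negative det_exp: no self loops
    weights = mass[np.newaxis, :] ** dest_exp * dist ** det_exp
    return weights / weights.sum(axis=1, keepdims=True)


def to_out_format(probs, tot_outflows, out_format="flows", rng=None):
    """Turn per-origin probabilities into one of the three documented output formats."""
    if out_format == "probabilities":
        return probs                                  # probability of a unit flow i -> j
    if out_format == "flows":
        return probs * tot_outflows[:, np.newaxis]    # average flow O_i * p_ij
    if out_format == "flows_sample":
        if rng is None:
            rng = np.random.default_rng()
        # one multinomial draw of O_i trips over the destinations of each origin
        return np.array([rng.multinomial(o, p) for o, p in zip(tot_outflows, probs)])
    raise ValueError(f"unknown out_format: {out_format!r}")


xy = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]])  # hypothetical centroids
dist = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
mass = np.array([100.0, 50.0, 200.0, 80.0])                      # hypothetical relevance
tot_out = np.array([1000, 400, 1500, 600])                       # hypothetical O_i

probs = to_out_format(gravity_probabilities(mass, dist), tot_out, "probabilities")
avg = to_out_format(probs, tot_out, "flows")
sample = to_out_format(probs, tot_out, "flows_sample", rng=np.random.default_rng(0))
print(avg.round(1))
print(sample)
```

With "flows" the values are real-valued expectations, while with "flows_sample" they are integer counts that vary with the random seed; in both cases each row sums to the origin's total outflow.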

Radiation

class skmob.models.radiation.Radiation(name='Radiation model')

Radiation model.

The radiation model for human migration. The radiation model assumes that the choice of a traveler’s destination consists of two steps. First, each opportunity in every location is assigned a fitness represented by a number \(z\), chosen from some distribution \(P(z)\) whose value represents the quality of the opportunity for the traveler. Second, the traveler ranks all opportunities according to their distances from the origin location and chooses the closest opportunity with a fitness higher than the traveler’s fitness threshold, which is another random number extracted from the fitness distribution \(P(z)\). As a result, the average number of travelers from location \(i\) to location \(j\) takes the form [SGMB2012]:

\[T_{ij} = O_i \frac{1}{1 - \frac{m_i}{M}}\frac{m_i m_j}{(m_i + s_{ij})(m_i + m_j + s_{ij})}.\]

The destinations of the \(O_i\) trips originating in \(i\) are sampled from the distribution of probabilities that a trip originating in \(i\) ends in location \(j\). This probability depends on the number of opportunities at the origin, \(m_i\), at the destination, \(m_j\), and on the number of opportunities \(s_{ij}\) in a circle of radius \(r_{ij}\) centered on \(i\) (excluding the source and destination). This conditional probability must be normalized so that the probability that a trip originating in the region of interest ends in this region is equal to 1. For a finite system, one can show that the unnormalized probabilities sum to \(1 - \frac{m_i}{M}\), where \(M=\sum_i m_i\) is the total number of opportunities. In the original version of the radiation model, the number of opportunities is approximated by the population, but the total inflows \(D_j\) to each destination can also be used.
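The formula for \(T_{ij}\) can be evaluated directly from a distance matrix. The sketch below assumes Euclidean distances between hypothetical tile centroids and takes the opportunities \(m_i\) and outflows \(O_i\) as given; `radiation_flows` is a hypothetical helper, not the implementation behind `Radiation.generate`.

```python
import numpy as np


def radiation_flows(mass, dist, tot_outflows):
    """Average radiation flows T_ij, including the finite-size factor 1 / (1 - m_i / M)."""
    n = len(mass)
    total = mass.sum()                       # M, the total number of opportunities
    flows = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # s_ij: opportunities strictly inside the circle of radius r_ij centered on i,
            # excluding the origin i and the destination j
            inside = dist[i] < dist[i, j]
            inside[[i, j]] = False
            s_ij = mass[inside].sum()
            prob = mass[i] * mass[j] / ((mass[i] + s_ij) * (mass[i] + mass[j] + s_ij))
            flows[i, j] = tot_outflows[i] * prob / (1.0 - mass[i] / total)
    return flows


xy = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.5], [3.0, 1.0], [-2.0, -4.0]])  # hypothetical
dist = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
mass = np.array([800.0, 1000.0, 500.0, 13000.0, 750.0])    # hypothetical m_i
tot_out = np.array([300.0, 50.0, 2900.0, 90.0, 80.0])      # hypothetical O_i

T = radiation_flows(mass, dist, tot_out)
print(T.round(1))
```

Since \(s_{ij}\) accumulates the opportunities of the locations closer to \(i\) than \(j\), the unnormalized terms telescope to \(1 - m_i/M\); dividing by that factor makes each row of `T` sum to \(O_i\), provided no two destinations are equidistant from \(i\).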


(a) To demonstrate the limitations of the gravity law we highlight two pairs of counties, one in Utah (UT) and the other in Alabama (AL), with similar origin (\(m\), blue) and destination (\(n\), green) populations and comparable distance \(r\) between them (see bottom left table). The US census 2000 reports a flux that is an order of magnitude greater between the Utah counties, a difference correctly captured by the radiation model (b, c). (b) The definition of the radiation model: an individual (for example, living in Saratoga County, New York) applies for jobs in all counties and collects potential employment offers. The number of job opportunities in each county (\(j\)) is \(n_j / n_{jobs}\), chosen to be proportional to the resident population \(n_j\). Each offer’s attractiveness (benefit) is represented by a random variable with distribution \(P(z)\), the numbers placed in each county representing the best offer among the \(n_j / n_{jobs}\) trials in that area. Each county is marked in green (red) if its best offer is better (lower) than the best offer in the home county (here \(z = 10\)). (c) An individual accepts the closest job that offers better benefits than his home county. In the shown configuration the individual will commute to Oneida County, New York, the closest county whose benefit \(z = 13\) exceeds the home county benefit \(z = 10\). This process is repeated for each potential commuter, choosing new benefit variables \(z\) in each case. Figure from [SGMB2012].
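The job-search process in panels (b) and (c) can be simulated directly. In this minimal sketch each location holds a hypothetical integer number of opportunities, every opportunity receives an independent fitness \(z \sim U(0, 1)\) (the uniform \(P(z)\) is an assumption; any continuous distribution gives the same result because only the ranking of offers matters), and the traveler moves to the closest location whose best offer beats the best offer at home. The helpers `simulate_choices` and `analytic_probs` are hypothetical; the empirical frequencies are compared with \(m_i m_j / ((m_i + s_{ij})(m_i + m_j + s_{ij}))\), the unnormalized probability in the formula for \(T_{ij}\).

```python
import numpy as np


def simulate_choices(mass, dist_from_home, home, n_trials, rng):
    """Return the chosen destination per trial, or -1 if nothing beats the home offer."""
    # location k owns mass[k] consecutive columns of fitness draws
    starts = np.concatenate(([0], np.cumsum(mass)[:-1]))
    z = rng.random((n_trials, int(mass.sum())))
    best = np.maximum.reduceat(z, starts, axis=1)   # best offer in each location
    threshold = best[:, home]                       # best offer in the home location
    order = np.array([k for k in np.argsort(dist_from_home) if k != home])
    better = best[:, order] > threshold[:, None]    # destinations sorted closest first
    first = better.argmax(axis=1)                   # closest destination beating home
    return np.where(better.any(axis=1), order[first], -1)


def analytic_probs(mass, dist_from_home, home):
    """Unnormalized probabilities m_i m_j / ((m_i + s_ij)(m_i + m_j + s_ij))."""
    probs = np.zeros(len(mass))
    m_i, s = mass[home], 0
    for k in np.argsort(dist_from_home):
        if k == home:
            continue
        probs[k] = m_i * mass[k] / ((m_i + s) * (m_i + mass[k] + s))
        s += mass[k]          # k lies inside the circle of every farther destination
    return probs


mass = np.array([10, 5, 20, 8, 15])                               # hypothetical opportunities
xy = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.5], [3.0, 1.0], [-2.0, -4.0]])
home = 0
dist_from_home = np.linalg.norm(xy - xy[home], axis=1)

rng = np.random.default_rng(1)
choices = simulate_choices(mass, dist_from_home, home, 20_000, rng)
empirical = np.array([(choices == k).mean() for k in range(len(mass))])
analytic = analytic_probs(mass, dist_from_home, home)
stay_rate = (choices == -1).mean()
print(np.round(empirical, 3), np.round(analytic, 3), round(stay_rate, 3))
```

The fraction of trials in which no location beats the home offer approaches \(m_i/M\), which is exactly the probability mass removed by the finite-size normalization \(1 - m_i/M\).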

Parameters

name (str, optional) – the name of the instantiation of the radiation model. The default is ‘Radiation model’.

Variables

name (str) – the name of the instantiation of the model.

Examples

>>> import skmob
>>> from skmob.utils import utils, constants
>>> import pandas as pd
>>> import geopandas as gpd
>>> import numpy as np
>>> from skmob.models import Radiation
>>> # load a spatial tessellation
>>> url_tess = skmob.utils.constants.NY_COUNTIES_2011
>>> tessellation = gpd.read_file(url_tess).rename(columns={'tile_id': 'tile_ID'})
>>> print(tessellation.head())
  tile_ID  population                                           geometry
0   36019       81716  POLYGON ((-74.006668 44.886017, -74.027389 44....
1   36101       99145  POLYGON ((-77.099754 42.274215, -77.0996569999...
2   36107       50872  POLYGON ((-76.25014899999999 42.296676, -76.24...
3   36059     1346176  POLYGON ((-73.707662 40.727831, -73.700272 40....
4   36011       79693  POLYGON ((-76.279067 42.785866, -76.2753479999...
>>> # load real flows into a FlowDataFrame
>>> fdf = skmob.FlowDataFrame.from_file(skmob.utils.constants.NY_FLOWS_2011,
                                        tessellation=tessellation,
                                        tile_id='tile_ID',
                                        sep=",")
>>> print(fdf.head())
     flow origin destination
0  121606  36001       36001
1       5  36001       36005
2      29  36001       36007
3      11  36001       36017
4      30  36001       36019
>>> # compute the total outflows from each location of the tessellation (excluding self loops)
>>> tot_outflows = fdf[fdf['origin'] != fdf['destination']].groupby(by='origin')[['flow']].sum().fillna(0)
>>> tessellation = tessellation.merge(tot_outflows, left_on='tile_ID', right_on='origin').rename(columns={'flow': constants.TOT_OUTFLOW})
>>> print(tessellation.head())
  tile_ID  population                                           geometry
0   36019       81716  POLYGON ((-74.006668 44.886017, -74.027389 44....
1   36101       99145  POLYGON ((-77.099754 42.274215, -77.0996569999...
2   36107       50872  POLYGON ((-76.25014899999999 42.296676, -76.24...
3   36059     1346176  POLYGON ((-73.707662 40.727831, -73.700272 40....
4   36011       79693  POLYGON ((-76.279067 42.785866, -76.2753479999...
   tot_outflow
0        29981
1         5319
2       295916
3         8665
4         8871
>>> np.random.seed(0)
>>> radiation = Radiation()
>>> rad_flows = radiation.generate(tessellation, tile_id_column='tile_ID', tot_outflows_column='tot_outflow', relevance_column='population', out_format='flows_sample')
>>> print(rad_flows.head())
  origin destination   flow
0  36019       36033  11648
1  36019       36031   4232
2  36019       36089   5598
3  36019       36113   1596
4  36019       36041    117

References

SGMB2012

Simini, F., González, M. C., Maritan, A. & Barabási, A.-L. (2012) A universal model for mobility and migration patterns. Nature 484(7392), 96-100, https://www.nature.com/articles/nature10856

generate(spatial_tessellation, tile_id_column='tile_ID', tot_outflows_column='tot_outflow', relevance_column='relevance', out_format='flows')

Start the simulation of the Radiation model.

Parameters
  • spatial_tessellation (GeoDataFrame) – the spatial tessellation on which to perform the simulation.

  • tile_id_column (str, optional) – the column in spatial_tessellation of the location identifier. The default is constants.TILE_ID.

  • tot_outflows_column (str, optional) – the column in spatial_tessellation with the outflow of the location. The default is constants.TOT_OUTFLOW.

  • relevance_column (str, optional) – the column in spatial_tessellation with the relevance of the location. The default is constants.RELEVANCE.

  • out_format (str, optional) – the format of the generated flows. Possible values are: “flows” (average flow between two locations), “flows_sample” (random sample of flows), and “probabilities” (probability of a unit flow between two locations). The default is “flows”.

Returns

the flows generated by the Radiation model.

Return type

FlowDataFrame