Collective measures

random_location_entropy(traj[, show_progress])

Random location entropy.

uncorrelated_location_entropy(traj[, ...])

Temporal-uncorrelated entropy.

mean_square_displacement(traj[, days, ...])

Mean Square Displacement.

visits_per_location(traj)

Visits per location.

homes_per_location(traj[, start_night, ...])

Homes per location.

visits_per_time_unit(traj[, time_unit])

Visits per time unit.

skmob.measures.collective.homes_per_location(traj, start_night='22:00', end_night='07:00')

Homes per location.

Compute, for each location, the number of individuals whose home location is that location. The number of homes in a location \(j\) is computed as [PRS2016]:

\[N_{homes}(j) = |\{h_u | h_u = j, u \in U \}|\]

where \(h_u\) indicates the home location of an individual \(u\) and \(U\) is the set of individuals.

Parameters
  • traj (TrajDataFrame) – the trajectories of the individuals.

  • start_night (str, optional) – the starting time of the night (format HH:MM). The default is ‘22:00’.

  • end_night (str, optional) – the ending time of the night (format HH:MM). The default is ‘07:00’.
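To show how start_night and end_night define a window that can wrap past midnight, here is a minimal stdlib-only sketch of the count \(N_{homes}(j)\). It is not scikit-mobility's implementation. It assumes, as a simplification, that an individual's home is the location they visit most often inside the night window (both endpoints included), skips individuals with no night-time records, and takes hypothetical (user, datetime, location) tuples instead of a TrajDataFrame. The names in_night and homes_per_location_sketch are hypothetical.

```python
from collections import Counter
from datetime import datetime, time


def in_night(ts, start_night="22:00", end_night="07:00"):
    """Return True if the datetime ts falls in the night window.

    The window may wrap past midnight (e.g. 22:00-07:00); both endpoints
    are treated as inclusive, which is an assumption of this sketch.
    """
    start = time.fromisoformat(start_night)
    end = time.fromisoformat(end_night)
    t = ts.time()
    if start <= end:  # window within a single day, e.g. 01:00-05:00
        return start <= t <= end
    return t >= start or t <= end  # window crossing midnight


def homes_per_location_sketch(records, start_night="22:00", end_night="07:00"):
    """Count N_homes(j) from hypothetical (user, datetime, location) tuples."""
    night_visits = {}  # user -> Counter of night-time visits per location
    for user, ts, location in records:
        if in_night(ts, start_night, end_night):
            night_visits.setdefault(user, Counter())[location] += 1
    # Assumed home: each user's most visited night-time location;
    # users without night-time records are skipped.
    homes = {user: c.most_common(1)[0][0] for user, c in night_visits.items()}
    # N_homes(j): number of users whose home is location j
    return Counter(homes.values())


records = [
    ("u1", datetime(2008, 4, 1, 23, 10), "A"),
    ("u1", datetime(2008, 4, 2, 12, 0), "B"),
    ("u2", datetime(2008, 4, 1, 6, 30), "A"),
]
print(homes_per_location_sketch(records))  # Counter({'A': 2})
```

Note that u1's midday visit to B does not count towards any home, because it falls outside the night window.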

Returns

the number of homes per location.

Return type

pandas DataFrame

Examples

>>> import pandas as pd
>>> import skmob
>>> from skmob.measures.collective import homes_per_location
>>> url = "https://snap.stanford.edu/data/loc-brightkite_totalCheckins.txt.gz"
>>> df = pd.read_csv(url, sep='\t', header=0, nrows=100000, names=['user', 'check-in_time', 'latitude', 'longitude', 'location id'])
>>> tdf = skmob.TrajDataFrame(df, latitude='latitude', longitude='longitude', datetime='check-in_time', user_id='user').sort_values(by='datetime')
>>> hl_df = homes_per_location(tdf).sort_values(by='n_homes', ascending=False)
>>> print(hl_df.head())
         lat         lng  n_homes
0  39.739154 -104.984703       15
1  37.584103 -122.366083        6
2  40.014986 -105.270546        5
3  37.580304 -122.343679        5
4  37.774929 -122.419415        4

References

PRS2016

Pappalardo, L., Rinzivillo, S. & Simini, F. (2016) Human Mobility Modelling: exploration and preferential return meet the gravity model. Procedia Computer Science 83, 934-939, http://dx.doi.org/10.1016/j.procs.2016.04.188

skmob.measures.collective.mean_square_displacement(traj, days=0, hours=1, minutes=0, show_progress=True)

Mean Square Displacement.

Compute the mean square displacement across the individuals in a TrajDataFrame. The mean square displacement is a measure of the deviation of the position of an object with respect to a reference position over time [BHG2006] [SKWB2010]. It is defined as:

\[MSD = \langle |r(t) - r(0)|^2 \rangle = \frac{1}{N} \sum_{i = 1}^N |r^{(i)}(t) - r^{(i)}(0)|^2\]

where \(N\) is the number of individuals to be averaged, vector \(r^{(i)}(0)\) is the reference position of the \(i\)-th individual, and vector \(r^{(i)}(t)\) is the position of the \(i\)-th individual at time \(t\) [FS2002].

Parameters
  • traj (TrajDataFrame) – the trajectories of the individuals.

  • days (int, optional) – the number of days in the time offset \(t\) after the starting time. The default is 0.

  • hours (int, optional) – the number of hours added to the time offset \(t\). The default is 1.

  • minutes (int, optional) – the number of minutes added to the time offset \(t\). The default is 0. With the defaults, \(t\) is one hour after the starting time.

  • show_progress (boolean, optional) – if True, show a progress bar. The default is True.
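The definition above can be sketched in plain Python. This is a hedged illustration, not the library code. It assumes that \(r^{(i)}(0)\) is each individual's first recorded position, that \(r^{(i)}(t)\) is the last position recorded at or before the first timestamp plus the offset built from days, hours and minutes, and that distances are great-circle (haversine) distances in kilometres on a sphere of radius 6371 km. The input format (a dict mapping each user to a time-sorted list of (datetime, lat, lng) tuples) and the function names are hypothetical.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lng) points in degrees."""
    lat1, lng1, lat2, lng2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lng2 - lng1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def mean_square_displacement_sketch(trajs, days=0, hours=1, minutes=0):
    """trajs: user -> list of (datetime, lat, lng) tuples sorted by datetime."""
    offset = timedelta(days=days, hours=hours, minutes=minutes)
    squares = []
    for points in trajs.values():
        t0, lat0, lng0 = points[0]  # reference position r(0)
        # r(t): last position recorded at or before t0 + offset
        # (never empty for a non-negative offset, since points[0] qualifies)
        _, lat_t, lng_t = [p for p in points if p[0] <= t0 + offset][-1]
        squares.append(haversine_km((lat0, lng0), (lat_t, lng_t)) ** 2)
    return sum(squares) / len(squares)  # average over the N individuals


t0 = datetime(2008, 4, 1, 8, 0)
trajs = {
    "u1": [(t0, 0.0, 0.0), (t0 + timedelta(minutes=30), 0.0, 1.0)],
    "u2": [(t0, 45.0, 9.0)],
}
print(mean_square_displacement_sketch(trajs))
```

Relying on time-sorted input in this sketch mirrors the requirement that the TrajDataFrame be sorted by datetime.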

Returns

the mean square displacement.

Return type

float

Warning

The input TrajDataFrame must be sorted in ascending order by datetime.

Examples

>>> import pandas as pd
>>> import skmob
>>> from skmob.measures.collective import mean_square_displacement
>>> url = "https://snap.stanford.edu/data/loc-brightkite_totalCheckins.txt.gz"
>>> df = pd.read_csv(url, sep='\t', header=0, nrows=100000, names=['user', 'check-in_time', 'latitude', 'longitude', 'location id'])
>>> tdf = skmob.TrajDataFrame(df, latitude='latitude', longitude='longitude', datetime='check-in_time', user_id='user').sort_values(by='datetime')
>>> msd = mean_square_displacement(tdf, days=0, hours=1, minutes=0)
>>> print(msd)
534672.3361996822

References

FS2002

Frenkel, D. & Smit, B. (2002) Understanding molecular simulation: From algorithms to applications. Academic Press, 196 (2nd Ed.), https://www.sciencedirect.com/book/9780122673511/understanding-molecular-simulation.

BHG2006

Brockmann, D., Hufnagel, L. & Geisel, T. (2006) The scaling laws of human travel. Nature 439, 462-465, https://www.nature.com/articles/nature04292

SKWB2010

Song, C., Koren, T., Wang, P. & Barabasi, A.L. (2010) Modelling the scaling properties of human mobility. Nature Physics 6, 818-823, https://www.nature.com/articles/nphys1760

skmob.measures.collective.random_location_entropy(traj, show_progress=True)

Random location entropy.

Compute the random location entropy of the locations in a TrajDataFrame. The random location entropy of a location \(j\) captures the degree of predictability of \(j\) if each individual visits it with equal probability, and it is defined as:

\[LE_{rand}(j) = \log_2(N_j)\]

where \(N_j\) is the number of distinct individuals that visited location \(j\).

Parameters
  • traj (TrajDataFrame) – the trajectories of the individuals.

  • show_progress (boolean, optional) – if True, show a progress bar. The default is True.
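Because \(LE_{rand}(j)\) depends only on how many distinct individuals visited \(j\), it can be computed with one set of visitors per location. The sketch below is a minimal stdlib-only illustration on hypothetical (user, location) pairs, not the TrajDataFrame-based implementation; the function name is hypothetical.

```python
import math
from collections import defaultdict


def random_location_entropy_sketch(visits):
    """visits: iterable of hypothetical (user, location) pairs."""
    visitors = defaultdict(set)  # location -> set of distinct users
    for user, location in visits:
        visitors[location].add(user)
    # LE_rand(j) = log2(N_j), N_j = number of distinct visitors of j
    return {loc: math.log2(len(users)) for loc, users in visitors.items()}


visits = [("u1", "A"), ("u2", "A"), ("u1", "A"), ("u3", "A"), ("u4", "A"), ("u1", "B")]
print(random_location_entropy_sketch(visits))  # {'A': 2.0, 'B': 0.0}
```

Repeated visits by the same individual do not change the value, which is why u1's second visit to A is ignored. Read the same way, the top value 6.129283 in the example output equals \(\log_2(70)\) to six decimals, i.e. 70 distinct visitors.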

Returns

the random location entropy of the locations.

Return type

pandas DataFrame

Examples

>>> import pandas as pd
>>> import skmob
>>> from skmob.measures.collective import random_location_entropy
>>> url = "https://snap.stanford.edu/data/loc-brightkite_totalCheckins.txt.gz"
>>> df = pd.read_csv(url, sep='\t', header=0, nrows=100000,
...                  names=['user', 'check-in_time', 'latitude', 'longitude', 'location id'])
>>> tdf = skmob.TrajDataFrame(df, latitude='latitude', longitude='longitude', datetime='check-in_time', user_id='user')
>>> rle_df = random_location_entropy(tdf, show_progress=True).sort_values(by='random_location_entropy', ascending=False)
>>> print(rle_df.head())
             lat         lng  random_location_entropy
10286  39.739154 -104.984703                 6.129283
49      0.000000    0.000000                 5.643856
5991   37.774929 -122.419415                 5.523562
12504  39.878664 -104.682105                 5.491853
5377   37.615223 -122.389979                 5.247928

skmob.measures.collective.uncorrelated_location_entropy(traj, normalize=False, show_progress=True)

Temporal-uncorrelated entropy.

Compute the temporal-uncorrelated location entropy of the locations in a TrajDataFrame. The temporal-uncorrelated location entropy \(LE_{unc}(j)\) of a location \(j\) measures how evenly the visits to \(j\) are spread across the individuals who visit it: it is the entropy of the historical probability that a visit to \(j\) is made by each individual \(u\). Formally, it is defined as [CML2011]:

\[LE_{unc}(j) = -\sum_{u=1}^{N_j} p_{u,j} \log_2(p_{u,j})\]

where \(N_j\) is the number of distinct individuals that visited \(j\) and \(p_{u,j}\) is the historical probability that a visit to location \(j\) is made by individual \(u\), i.e. the fraction of all visits to \(j\) that are made by \(u\).

Parameters
  • traj (TrajDataFrame) – the trajectories of the individuals.

  • normalize (boolean, optional) – if True, normalize the location entropy by dividing it by \(\log_2(N_j)\), where \(N_j\) is the number of distinct individuals that visited location \(j\). The default is False.

  • show_progress (boolean, optional) – if True, show a progress bar. The default is True.
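A minimal stdlib-only sketch of \(LE_{unc}(j)\) and of the normalize option, estimating each individual's probability as the fraction of the visits to \(j\) that they made. The input format and the function name are hypothetical. Returning 0.0 for the normalized entropy of a location with a single visitor (instead of the undefined 0/0) is an assumption of this sketch, not documented library behaviour.

```python
import math
from collections import Counter, defaultdict


def uncorrelated_location_entropy_sketch(visits, normalize=False):
    """visits: iterable of hypothetical (user, location) pairs."""
    counts = defaultdict(Counter)  # location -> visits per user
    for user, location in visits:
        counts[location][user] += 1
    result = {}
    for location, per_user in counts.items():
        total = sum(per_user.values())
        # -sum p*log2(p), written as sum p*log2(1/p) to avoid a -0.0 result
        entropy = sum((n / total) * math.log2(total / n) for n in per_user.values())
        if normalize:
            n_users = len(per_user)
            # Assumption: one visitor gives 0.0 rather than the undefined 0/0
            entropy = entropy / math.log2(n_users) if n_users > 1 else 0.0
        result[location] = entropy
    return result


visits = [("u1", "A"), ("u2", "A"), ("u3", "A"), ("u3", "A"), ("u1", "B")]
print(uncorrelated_location_entropy_sketch(visits))  # {'A': 1.5, 'B': 0.0}
```

With normalize=True, location A becomes \(1.5 / \log_2(3) \approx 0.946\): it has three visitors, but u3 accounts for half of its visits, so it is less evenly visited than the maximum allows.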

Returns

the temporal-uncorrelated location entropies of the locations.

Return type

pandas DataFrame

Examples

>>> import pandas as pd
>>> import skmob
>>> from skmob.measures.collective import uncorrelated_location_entropy
>>> url = "https://snap.stanford.edu/data/loc-brightkite_totalCheckins.txt.gz"
>>> df = pd.read_csv(url, sep='\t', header=0, nrows=100000,
...                  names=['user', 'check-in_time', 'latitude', 'longitude', 'location id'])
>>> tdf = skmob.TrajDataFrame(df, latitude='latitude', longitude='longitude', datetime='check-in_time', user_id='user')
>>> ule_df = uncorrelated_location_entropy(tdf, show_progress=True).sort_values(by='uncorrelated_location_entropy', ascending=False)
>>> print(ule_df.head())
             lat         lng  uncorrelated_location_entropy
12504  39.878664 -104.682105                       3.415713
5377   37.615223 -122.389979                       3.176950
10286  39.739154 -104.984703                       3.118656
12435  39.861656 -104.673177                       2.918413
12361  39.848233 -104.675031                       2.899175

References

CML2011

Cho, E., Myers, S. A. & Leskovec, J. (2011) Friendship and mobility: user movement in location-based social networks. In Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, 1082-1090, https://dl.acm.org/citation.cfm?id=2020579

skmob.measures.collective.visits_per_location(traj)

Visits per location.

Compute the number of visits to each location in a TrajDataFrame [PF2018].

Parameters

traj (TrajDataFrame) – the trajectories of the individuals.
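Conceptually, this measure is a frequency count over locations. A minimal stdlib-only sketch, assuming each visit is represented by a hypothetical (lat, lng) tuple, that returns (lat, lng, n_visits) rows ordered by decreasing number of visits; the function name is hypothetical.

```python
from collections import Counter


def visits_per_location_sketch(points):
    """points: hypothetical (lat, lng) tuples, one per recorded visit.

    Returns (lat, lng, n_visits) rows sorted by decreasing n_visits.
    """
    counts = Counter(points)
    return [(lat, lng, n) for (lat, lng), n in counts.most_common()]


points = [(39.74, -104.98), (37.58, -122.34), (39.74, -104.98)]
print(visits_per_location_sketch(points))
# [(39.74, -104.98, 2), (37.58, -122.34, 1)]
```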

Returns

the number of visits per location.

Return type

pandas DataFrame

Examples

>>> import pandas as pd
>>> import skmob
>>> from skmob.measures.collective import visits_per_location
>>> url = "https://snap.stanford.edu/data/loc-brightkite_totalCheckins.txt.gz"
>>> df = pd.read_csv(url, sep='\t', header=0, nrows=100000, names=['user', 'check-in_time', 'latitude', 'longitude', 'location id'])
>>> tdf = skmob.TrajDataFrame(df, latitude='latitude', longitude='longitude', datetime='check-in_time', user_id='user').sort_values(by='datetime')
>>> vl_df = visits_per_location(tdf)
>>> print(vl_df.head())
         lat         lng  n_visits
0  39.739154 -104.984703      3392
1  37.580304 -122.343679      2248
2  39.099275  -76.848306      1715
3  39.762146 -104.982480      1442
4  40.014986 -105.270546      1310

References

PF2018

Pappalardo, L. & Simini, F. (2018) Data-driven generation of spatio-temporal routines in human mobility. Data Mining and Knowledge Discovery 32, 787-829, https://link.springer.com/article/10.1007/s10618-017-0548-4

skmob.measures.collective.visits_per_time_unit(traj, time_unit='1h')

Visits per time unit.

Compute the number of data points per time unit in the TrajDataFrame [PRS2016].
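The example output includes hours with zero visits, so the count is taken over a regular grid of time slots rather than only over the slots that contain data. A stdlib-only sketch of that idea, assuming time zone aware UTC datetimes and a time unit expressed as a datetime.timedelta (instead of the time_unit string in the signature), with slots aligned to the Unix epoch; the function name is hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def visits_per_time_unit_sketch(timestamps, time_unit=timedelta(hours=1)):
    """Count UTC-aware timestamps per slot, keeping empty slots as 0."""
    def slot_start(ts):
        # Floor ts to the start of its slot, with slots aligned to the epoch
        return EPOCH + ((ts - EPOCH) // time_unit) * time_unit

    counts = Counter(slot_start(ts) for ts in timestamps)
    if not counts:
        return []
    rows, current, last = [], min(counts), max(counts)
    while current <= last:  # walk the regular grid, including empty slots
        rows.append((current, counts[current]))
        current += time_unit
    return rows


stamps = [datetime(2008, 3, 22, h, m, tzinfo=timezone.utc)
          for h, m in [(5, 10), (5, 50), (6, 5), (8, 40)]]
for start, n in visits_per_time_unit_sketch(stamps):
    print(start.isoformat(), n)
```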

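Parameters
  • traj (TrajDataFrame) – the trajectories of the individuals.

  • time_unit (str, optional) – the length of the time slots used to group the visits, given as a pandas offset alias (e.g., ‘1h’ for one hour, ‘1D’ for one day). The default is ‘1h’.

Returns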

the number of visits per time unit.

Return type

pandas DataFrame

Examples

>>> import pandas as pd
>>> import skmob
>>> from skmob.measures.collective import visits_per_time_unit
>>> url = "https://snap.stanford.edu/data/loc-brightkite_totalCheckins.txt.gz"
>>> df = pd.read_csv(url, sep='\t', header=0, nrows=100000, names=['user', 'check-in_time', 'latitude', 'longitude', 'location id'])
>>> tdf = skmob.TrajDataFrame(df, latitude='latitude', longitude='longitude', datetime='check-in_time', user_id='user').sort_values(by='datetime')
>>> vtu_df = visits_per_time_unit(tdf)
>>> print(vtu_df.head())
                           n_visits
datetime
2008-03-22 05:00:00+00:00         2
2008-03-22 06:00:00+00:00         2
2008-03-22 07:00:00+00:00         0
2008-03-22 08:00:00+00:00         0
2008-03-22 09:00:00+00:00         0