Data

The data module of scikit-mobility provides an easy way to:

  1) download ready-to-use mobility data (e.g., trajectories, flows, spatial tessellations, and auxiliary data);

  2) load the downloaded datasets and transform them into standard skmob structures (TrajDataFrame, GeoDataFrame, FlowDataFrame, DataFrame);

  3) let developers and contributors add new datasets to the library.

skmob.data.load.list_datasets([details, ...])

List datasets

skmob.data.load.load_dataset(name[, ...])

Load dataset

skmob.data.load.get_dataset_info(name)

Get dataset info

skmob.data.load.get_dataset_info(name)

Get dataset info

Return the information stored in the JSON file associated with the dataset.

Parameters

name (str) – the name of the dataset whose information to return (e.g., foursquare_nyc)

Returns

the information stored in the JSON file associated with the dataset

Return type

dict

Examples

>>> import skmob
>>> from skmob.data.load import get_dataset_info
>>>
>>> get_dataset_info("foursquare_nyc")
{'name': 'Foursquare_NYC',
 'description': 'Dataset containing the Foursquare checkins of individuals moving in New York City',
 'url': 'http://www-public.it-sudparis.eu/~zhang_da/pub/dataset_tsmc2014.zip',
 'hash': 'cbe3fdab373d24b09b5fc53509c8958c77ff72b6c1a68589ce337d4f9a80235b',
 'auth': 'no',
 'data_type': 'trajectory',
 'download_format': 'zip',
 'sep': '   ',
 'encoding': 'ISO-8859-1'}
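
The 'hash' field can be used to check that a downloaded archive arrived intact. The sketch below assumes that the value is a hex-encoded SHA-256 digest of the downloaded file. This assumption rests only on its length of 64 hexadecimal characters and is not confirmed by the documentation. The helpers sha256_of_file and matches_dataset_hash are hypothetical and are not part of skmob.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Fixed-size chunks keep memory use flat for large archives.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_dataset_hash(path, info):
    """Check a downloaded file against the 'hash' entry of a dataset info dict.

    Assumption: 'hash' is a hex-encoded SHA-256 digest of the file.
    """
    return sha256_of_file(path) == info["hash"].lower()


# Usage with a throwaway file standing in for a downloaded archive.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example payload")
info = {"hash": hashlib.sha256(b"example payload").hexdigest()}
print(matches_dataset_hash(tmp.name, info))  # True
os.remove(tmp.name)
```

Reading in chunks means the same check works for multi-gigabyte archives without loading them fully into memory.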

skmob.data.load.list_datasets(details=False, data_types=None)

List datasets

List all the names of the datasets available in the data module of scikit-mobility.

Parameters
  • details (boolean) – whether to return the full details of each dataset instead of its name only. The default is False.

  • data_types (list of string, optional) – the dataset types to list (e.g., 'trajectory'). The default is None, which lists datasets of every type.

Returns

an object listing the available datasets

Examples

>>> import skmob
>>> from skmob.data.load import list_datasets
>>>
>>> list_datasets()
['flow_foursquare_nyc',
 'foursquare_nyc',
 'nyc_boundaries',
 'parking_san_francisco',
 'taxi_san_francisco']
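
The data_types filter can be pictured as a selection over the metadata of each dataset, keyed on the 'data_type' field shown by get_dataset_info. The following is a minimal sketch and not the skmob implementation. filter_dataset_names is a hypothetical helper, and the catalog is built by hand. Only the 'trajectory' type of foursquare_nyc comes from the example above; the other type labels are assumed for illustration.

```python
def filter_dataset_names(catalog, data_types=None):
    """Return sorted dataset names, optionally restricted to some data types.

    `catalog` maps dataset names to info dicts shaped like the output of
    get_dataset_info; only the 'data_type' key is read here.
    data_types=None keeps every dataset, mirroring the documented default.
    """
    names = []
    for name, info in catalog.items():
        if data_types is None or info.get("data_type") in data_types:
            names.append(name)
    return sorted(names)


# Hypothetical catalog: 'flow' and 'tessellation' are assumed labels,
# not values confirmed by the scikit-mobility documentation.
catalog = {
    "foursquare_nyc": {"data_type": "trajectory"},
    "flow_foursquare_nyc": {"data_type": "flow"},
    "nyc_boundaries": {"data_type": "tessellation"},
}
print(filter_dataset_names(catalog))
# ['flow_foursquare_nyc', 'foursquare_nyc', 'nyc_boundaries']
print(filter_dataset_names(catalog, ["trajectory"]))
# ['foursquare_nyc']
```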

skmob.data.load.load_dataset(name, drop_columns=False, auth=None, show_progress=False)

Load dataset

Load one of the datasets that are present in the repository of scikit-mobility.

Parameters
  • name (str) – the name of the dataset to load (e.g., foursquare_nyc)

  • drop_columns (boolean, optional) – whether to drop the additional columns, keeping only the standard ones, when returning a TrajDataFrame or FlowDataFrame object. The default is False.

  • auth ((str, str), optional) – pair of strings (user, password) used when loading the dataset requires authentication. The default is None.

  • show_progress (boolean, optional) – if True, show a progress bar. The default is False.

Returns

an object containing the downloaded dataset

Return type

TrajDataFrame/FlowDataFrame/GeoDataFrame/DataFrame

Examples

>>> import skmob
>>> from skmob.data.load import load_dataset
>>>
>>> tdf_nyc = load_dataset("foursquare_nyc", drop_columns=True)
>>> print(tdf_nyc.head())
   uid        lat        lng                  datetime
0  470  40.719810 -74.002581 2012-04-03 18:00:09+00:00
1  979  40.606800 -74.044170 2012-04-03 18:00:25+00:00
2   69  40.716162 -73.883070 2012-04-03 18:02:24+00:00
3  395  40.745164 -73.982519 2012-04-03 18:02:41+00:00
4   87  40.740104 -73.989658 2012-04-03 18:03:00+00:00
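
The sketch below illustrates what drop_columns=True is documented to do. It keeps only the four columns shown in the output above and discards everything else. It works on plain dictionaries rather than on a TrajDataFrame. The helper drop_extra_columns and the venue_category column are hypothetical and not part of skmob.

```python
# Column names taken from the TrajDataFrame printed above.
CORE_COLUMNS = ("uid", "lat", "lng", "datetime")


def drop_extra_columns(records, core=CORE_COLUMNS):
    """Keep only the core columns of each record (a list of dicts).

    Hypothetical helper mimicking drop_columns=True on plain dicts;
    columns missing from a record are simply skipped.
    """
    return [{col: rec[col] for col in core if col in rec} for rec in records]


raw = [
    {"uid": 470, "lat": 40.719810, "lng": -74.002581,
     "datetime": "2012-04-03 18:00:09+00:00",
     "venue_category": "Bar"},  # hypothetical extra column
]
print(drop_extra_columns(raw))
```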