Data
The data module of scikit-mobility provides an easy way to: 1) download ready-to-use mobility data (e.g., trajectories, flows, spatial tessellations, and auxiliary data); 2) load and transform the downloaded dataset into standard skmob structures (TrajDataFrame, GeoDataFrame, FlowDataFrame, DataFrame); 3) add new datasets to the library, for developers and contributors.
| list_datasets | List datasets |
| load_dataset | Load dataset |
| get_dataset_info | Get dataset info |
- skmob.data.load.get_dataset_info(name)
Get dataset info
It returns the dataset information stored in the JSON file associated with the dataset.
- Parameters
name (str) – the name of the dataset whose information is returned
- Returns
the information stored in the JSON file associated with the dataset
- Return type
dict
Examples
>>> import skmob
>>> from skmob.data.load import get_dataset_info
>>>
>>> get_dataset_info("foursquare_nyc")
{'name': 'Foursquare_NYC',
 'description': 'Dataset containing the Foursquare checkins of individuals moving in New York City',
 'url': 'http://www-public.it-sudparis.eu/~zhang_da/pub/dataset_tsmc2014.zip',
 'hash': 'cbe3fdab373d24b09b5fc53509c8958c77ff72b6c1a68589ce337d4f9a80235b',
 'auth': 'no',
 'data_type': 'trajectory',
 'download_format': 'zip',
 'sep': ' ',
 'encoding': 'ISO-8859-1'}
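The `hash` entry in the output above is 64 hexadecimal characters long, which is consistent with a SHA-256 digest. As a minimal sketch, assuming that it is the SHA-256 of the downloaded archive (this page does not say so), a local copy of the file could be checked against the metadata as follows. `sha256_of_file` and `matches_info_hash` are hypothetical helpers, not part of scikit-mobility.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives are not loaded at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_info_hash(path, info):
    """Compare a local file with the 'hash' entry of a dataset info dict.

    Assumption: 'hash' holds the SHA-256 hex digest of the downloaded file.
    """
    return sha256_of_file(path) == info["hash"].lower()
```

A call such as `matches_info_hash("dataset_tsmc2014.zip", get_dataset_info("foursquare_nyc"))` would then return `True` only if the local archive matches the stored digest, under the assumption stated above.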
- skmob.data.load.list_datasets(details=False, data_types=None)
List datasets
List all the names of the datasets available in the data module of scikit-mobility.
- Parameters
details (boolean) – whether to return the full details of each dataset instead of its name only. The default is False.
data_types (list of string, optional) – specify which dataset types to show. The default is None.
- Returns
an object listing the available datasets
Examples
>>> import skmob
>>> from skmob.data.load import list_datasets
>>>
>>> list_datasets()
['flow_foursquare_nyc', 'foursquare_nyc', 'nyc_boundaries', 'parking_san_francisco', 'taxi_san_francisco']
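To illustrate how `details` and `data_types` are described to interact, the sketch below mimics that behaviour on a small hand-written catalogue, without calling scikit-mobility. The catalogue contents, the `'flow'` type label, the function name `filter_catalogue`, and the choice of a dict for the detailed result are assumptions made for illustration. Only the `'trajectory'` label is confirmed by the `get_dataset_info` example on this page.

```python
# Hypothetical catalogue: the keys come from the list_datasets() output,
# the 'data_type' labels are assumed (only 'trajectory' is confirmed).
CATALOGUE = {
    "foursquare_nyc": {"data_type": "trajectory"},
    "flow_foursquare_nyc": {"data_type": "flow"},
}


def filter_catalogue(catalogue, details=False, data_types=None):
    """Select datasets by type; return names only, or the full info dicts."""
    selected = {
        name: info
        for name, info in catalogue.items()
        # data_types=None means "no filtering", matching the documented default.
        if data_types is None or info["data_type"] in data_types
    }
    return selected if details else sorted(selected)


names = filter_catalogue(CATALOGUE, data_types=["trajectory"])
full = filter_catalogue(CATALOGUE, details=True, data_types=["flow"])
```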
- skmob.data.load.load_dataset(name, drop_columns=False, auth=None, show_progress=False)
Load dataset
Load one of the datasets that are present in the repository of scikit-mobility.
- Parameters
name (str) – the name of the dataset to load (e.g., foursquare_nyc)
drop_columns (boolean, optional) – whether to drop additional columns when returning a TrajDataFrame or FlowDataFrame object. The default is False.
auth ((str, str), optional) – pair of strings (user, psw) used when loading the dataset requires authentication. The default is None.
show_progress (boolean, optional) – if True, show a progress bar. The default is False.
- Returns
an object containing the downloaded dataset
- Return type
TrajDataFrame/FlowDataFrame/GeoDataFrame/DataFrame
Examples
>>> import skmob
>>> from skmob.data.load import load_dataset, list_datasets
>>>
>>> tdf_nyc = load_dataset("foursquare_nyc", drop_columns=True)
>>> print(tdf_nyc.head())
   uid        lat        lng                  datetime
0  470  40.719810 -74.002581 2012-04-03 18:00:09+00:00
1  979  40.606800 -74.044170 2012-04-03 18:00:25+00:00
2   69  40.716162 -73.883070 2012-04-03 18:02:24+00:00
3  395  40.745164 -73.982519 2012-04-03 18:02:41+00:00
4   87  40.740104 -73.989658 2012-04-03 18:03:00+00:00
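When a dataset requires authentication, `load_dataset` expects a `(user, psw)` pair in `auth`. A minimal sketch of checking this before starting a download is given below. It assumes that the `'auth'` entry returned by `get_dataset_info` is `'no'` for open datasets (as in the `foursquare_nyc` example) and some other value otherwise, which this page does not confirm. `load_with_credentials` is a hypothetical wrapper, and the two library functions are passed in as arguments so that the sketch runs without scikit-mobility installed.

```python
def load_with_credentials(name, get_info, load, credentials=None):
    """Load a dataset, failing early if credentials are needed but missing.

    `get_info` and `load` stand in for get_dataset_info and load_dataset.
    Assumption: info['auth'] == 'no' means no authentication is required.
    """
    info = get_info(name)
    requires_auth = str(info.get("auth", "no")).lower() != "no"
    if requires_auth and credentials is None:
        raise ValueError(f"dataset {name!r} needs a (user, psw) pair")
    # Forward credentials only when required; otherwise keep the default None.
    return load(name, auth=credentials if requires_auth else None)
```

With scikit-mobility installed, the call would look like `load_with_credentials("foursquare_nyc", get_dataset_info, load_dataset)`.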