Preprocessing
- Trajectory filtering.
- Trajectory compression.
- Stops detection.
- skmob.preprocessing.filtering.filter(tdf, max_speed_kmh=500.0, include_loops=False, speed_kmh=5.0, max_loop=6, ratio_max=0.25)
Trajectory filtering.
For each individual in a TrajDataFrame, filter out the trajectory points that are considered noise or outliers [Z2015].
- Parameters
tdf (TrajDataFrame) – the trajectories of the individuals.
max_speed_kmh (float, optional) – delete a trajectory point if the speed (in km/h) from the previous point is higher than max_speed_kmh. The default is 500.0.
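include_loops (boolean, optional) – if True, trajectory points belonging to short and fast “loops” are removed. Specifically, points are removed if, within the next max_loop points, the individual has come back to within a distance of ratio_max * the maximum distance reached, AND the average speed (in km/h) is higher than speed_kmh. The default is False.
speed_kmh (float, optional) – the average speed (in km/h) above which a loop is considered fast. Used only if include_loops is True. The default is 5.0 km/h (walking speed).
max_loop (int, optional) – the number of subsequent points inspected when looking for a loop. Used only if include_loops is True. The default is 6.
ratio_max (float, optional) – the fraction of the maximum distance reached that defines how close the individual must come back to the starting point of a loop. Used only if include_loops is True. The default is 0.25.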
- Returns
the TrajDataFrame without the trajectory points that have been filtered out.
- Return type
TrajDataFrame
Warning
If include_loops is True, the filter is very slow. Use it only if the raw data is really noisy.
Examples
>>> import skmob
>>> import pandas as pd
>>> from skmob.preprocessing import filtering
>>> # read the trajectory data (GeoLife)
>>> url = skmob.utils.constants.GEOLIFE_SAMPLE
>>> df = pd.read_csv(url, sep=',', compression='gzip')
>>> tdf = skmob.TrajDataFrame(df, latitude='lat', longitude='lon', user_id='user', datetime='datetime')
>>> print(tdf.head())
         lat         lng            datetime  uid
0  39.984094  116.319236 2008-10-23 05:53:05    1
1  39.984198  116.319322 2008-10-23 05:53:06    1
2  39.984224  116.319402 2008-10-23 05:53:11    1
3  39.984211  116.319389 2008-10-23 05:53:16    1
4  39.984217  116.319422 2008-10-23 05:53:21    1
>>> # filter out all points whose speed from the previous point is higher than 500 km/h
>>> ftdf = filtering.filter(tdf, max_speed_kmh=500.)
>>> print(ftdf.parameters)
{'filter': {'function': 'filter', 'max_speed_kmh': 500.0, 'include_loops': False, 'speed_kmh': 5.0, 'max_loop': 6, 'ratio_max': 0.25}}
>>> n_deleted_points = len(tdf) - len(ftdf)  # number of deleted points
>>> print(n_deleted_points)
54
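To make the max_speed_kmh rule concrete, the following is a minimal standalone sketch of a speed-based filter on plain Python tuples. It is not the library's implementation: the names haversine_km and speed_filter are hypothetical, and two details are assumptions made for illustration, namely that speed is measured from the last kept point and that points with a non-increasing timestamp are dropped.

```python
from datetime import datetime
from math import asin, cos, radians, sin, sqrt


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in km between two (lat, lng) points."""
    lat1, lng1, lat2, lng2 = map(radians, (lat1, lng1, lat2, lng2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lng2 - lng1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def speed_filter(points, max_speed_kmh=500.0):
    """Keep a point only if its speed from the last kept point is <= max_speed_kmh.

    points: list of (lat, lng, datetime) tuples for one individual, sorted by time.
    """
    if not points:
        return []
    kept = [points[0]]
    for lat, lng, t in points[1:]:
        prev_lat, prev_lng, prev_t = kept[-1]
        hours = (t - prev_t).total_seconds() / 3600.0
        if hours <= 0:
            continue  # assumption: drop duplicate or out-of-order timestamps
        if haversine_km(prev_lat, prev_lng, lat, lng) / hours <= max_speed_kmh:
            kept.append((lat, lng, t))
    return kept


track = [
    (39.984094, 116.319236, datetime(2008, 10, 23, 5, 53, 5)),
    (39.984198, 116.319322, datetime(2008, 10, 23, 5, 53, 6)),
    (41.000000, 118.000000, datetime(2008, 10, 23, 5, 53, 11)),  # implausible GPS jump
    (39.984211, 116.319389, datetime(2008, 10, 23, 5, 53, 16)),
]
filtered = speed_filter(track)
n_deleted_points = len(track) - len(filtered)  # number of deleted points
print(n_deleted_points)
```

Only the third point is discarded here, because reaching it from the second point would require a speed far above 500 km/h.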
References
- Z2015
Zheng, Y. (2015) Trajectory data mining: an overview. ACM Transactions on Intelligent Systems and Technology 6(3), https://dl.acm.org/citation.cfm?id=2743025
- skmob.preprocessing.compression.compress(tdf, spatial_radius_km=0.2)
Trajectory compression.
Reduce the number of points in a trajectory for each individual in a TrajDataFrame. All points within a radius of spatial_radius_km kilometers from a given initial point are compressed into a single point that has the median coordinates of all points and the time of the initial point [Z2015].
- Parameters
tdf (TrajDataFrame) – the input trajectories of the individuals.
spatial_radius_km (float, optional) – the minimum distance (in km) between consecutive points of the compressed trajectory. The default is 0.2.
- Returns
the compressed TrajDataFrame.
- Return type
TrajDataFrame
Examples
>>> import skmob
>>> import pandas as pd
>>> from skmob.preprocessing import compression
>>> # read the trajectory data (GeoLife)
>>> url = skmob.utils.constants.GEOLIFE_SAMPLE
>>> df = pd.read_csv(url, sep=',', compression='gzip')
>>> tdf = skmob.TrajDataFrame(df, latitude='lat', longitude='lon', user_id='user', datetime='datetime')
>>> print(tdf.head())
         lat         lng            datetime  uid
0  39.984094  116.319236 2008-10-23 05:53:05    1
1  39.984198  116.319322 2008-10-23 05:53:06    1
2  39.984224  116.319402 2008-10-23 05:53:11    1
3  39.984211  116.319389 2008-10-23 05:53:16    1
4  39.984217  116.319422 2008-10-23 05:53:21    1
>>> # compress the trajectory using a spatial radius of 0.2 km
>>> ctdf = compression.compress(tdf, spatial_radius_km=0.2)
>>> print('Points of the original trajectory:\t%s'%len(tdf))
Points of the original trajectory: 217653
>>> print('Points of the compressed trajectory:\t%s'%len(ctdf))
Points of the compressed trajectory: 6281
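The compression rule described above (median coordinates of all points within the radius of an initial point, stamped with the time of that initial point) can be sketched on plain tuples as follows. This is an illustrative sketch rather than the library's code: haversine_km and compress_sketch are hypothetical names, and the sketch assumes that a group ends at the first point falling outside the radius.

```python
from math import asin, cos, radians, sin, sqrt
from statistics import median


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in km between two (lat, lng) points."""
    lat1, lng1, lat2, lng2 = map(radians, (lat1, lng1, lat2, lng2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lng2 - lng1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def compress_sketch(points, spatial_radius_km=0.2):
    """Compress one individual's trajectory.

    points: list of (lat, lng, t) tuples sorted by time.
    Returns one (median_lat, median_lng, t_initial) tuple per group.
    """
    compressed = []
    i = 0
    while i < len(points):
        lat0, lng0, t0 = points[i]
        j = i + 1
        # assumption: the group ends at the first point outside the radius
        while j < len(points) and haversine_km(lat0, lng0, points[j][0], points[j][1]) <= spatial_radius_km:
            j += 1
        group = points[i:j]
        compressed.append((median(p[0] for p in group), median(p[1] for p in group), t0))
        i = j  # the first point outside the radius starts the next group
    return compressed


# times are given as seconds from the start of the track, for brevity
track = [
    (39.9840, 116.3190, 0),
    (39.9842, 116.3192, 5),
    (39.9841, 116.3194, 10),
    (39.9900, 116.3300, 600),
    (39.9901, 116.3301, 605),
]
compressed = compress_sketch(track)
print(len(track), len(compressed))
```

In this toy track the first three points collapse into one point and the last two into another, so five points are reduced to two.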
References
- Z2015
Zheng, Y. (2015) Trajectory data mining: an overview. ACM Transactions on Intelligent Systems and Technology 6(3), https://dl.acm.org/citation.cfm?id=2743025
- skmob.preprocessing.detection.stay_locations(tdf, stop_radius_factor=0.5, minutes_for_a_stop=20.0, spatial_radius_km=0.2, leaving_time=True, no_data_for_minutes=1000000000000.0, min_speed_kmh=None)
Stops detection.
Detect the stay locations (or stops) for each individual in a TrajDataFrame. A stop is detected when the individual spends at least minutes_for_a_stop minutes within a distance of stop_radius_factor * spatial_radius_km kilometers from a given trajectory point. The stop’s coordinates are the median latitude and longitude values of the points found within the specified distance [RT2004] [Z2015].
- Parameters
tdf (TrajDataFrame) – the input trajectories of the individuals.
stop_radius_factor (float, optional) – if the argument spatial_radius_km is None, the radius used is the “spatial_radius_km” value stored in the TrajDataFrame properties (assigned by a preprocessing.compression function) multiplied by stop_radius_factor. The default is 0.5.
minutes_for_a_stop (float, optional) – the minimum stop duration, in minutes. The default is 20.0.
spatial_radius_km (float or None, optional) – the radius of the ball enclosing all trajectory points within the stop location. The default is 0.2.
leaving_time (boolean, optional) – if True, a new column ‘leaving_datetime’ is added with the departure time from the stop location. The default is True.
no_data_for_minutes (float, optional) – if the number of minutes between two consecutive points is larger than no_data_for_minutes, then this is interpreted as missing data and does not count as a stop. The default is 1e12.
min_speed_kmh (float or None, optional) – if not None, remove the points at the end of a stop if their speed is larger than min_speed_kmh km/h. The default is None.
- Returns
a TrajDataFrame with the coordinates (latitude, longitude) of the stop locations.
- Return type
TrajDataFrame
Examples
>>> import skmob
>>> import pandas as pd
>>> from skmob.preprocessing import detection
>>> # read the trajectory data (GeoLife)
>>> url = skmob.utils.constants.GEOLIFE_SAMPLE
>>> df = pd.read_csv(url, sep=',', compression='gzip')
>>> tdf = skmob.TrajDataFrame(df, latitude='lat', longitude='lon', user_id='user', datetime='datetime')
>>> print(tdf.head())
         lat         lng            datetime  uid
0  39.984094  116.319236 2008-10-23 05:53:05    1
1  39.984198  116.319322 2008-10-23 05:53:06    1
2  39.984224  116.319402 2008-10-23 05:53:11    1
3  39.984211  116.319389 2008-10-23 05:53:16    1
4  39.984217  116.319422 2008-10-23 05:53:21    1
>>> stdf = detection.stay_locations(tdf, stop_radius_factor=0.5, minutes_for_a_stop=20.0, spatial_radius_km=0.2, leaving_time=True)
>>> print(stdf.head())
         lat         lng            datetime  uid    leaving_datetime
0  39.978030  116.327481 2008-10-23 06:01:37    1 2008-10-23 10:32:53
1  40.013820  116.306532 2008-10-23 11:10:19    1 2008-10-23 23:45:27
2  39.978419  116.326870 2008-10-24 00:21:52    1 2008-10-24 01:47:30
3  39.981166  116.308475 2008-10-24 02:02:31    1 2008-10-24 02:30:29
4  39.981431  116.309902 2008-10-24 02:30:29    1 2008-10-24 03:16:35
>>> print(stdf.parameters)
{'detect': {'function': 'stay_locations', 'stop_radius_factor': 0.5, 'minutes_for_a_stop': 20.0, 'spatial_radius_km': 0.2, 'leaving_time': True, 'no_data_for_minutes': 1000000000000.0, 'min_speed_kmh': None}}
>>> print('Points of the original trajectory:\t%s'%len(tdf))
Points of the original trajectory: 217653
>>> print('Points of stops:\t\t\t%s'%len(stdf))
Points of stops: 391
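The stop definition given in the description (at least minutes_for_a_stop minutes spent within a fixed distance of a trajectory point, with the stop placed at the median coordinates) can be sketched on plain tuples as below. This is a simplified illustration, not the library's implementation: haversine_km, stay_locations_sketch and the single stop_radius_km parameter are hypothetical, and the leaving time is assumed to be the timestamp of the first point outside the radius. Handling of no_data_for_minutes and min_speed_kmh is omitted.

```python
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt
from statistics import median


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in km between two (lat, lng) points."""
    lat1, lng1, lat2, lng2 = map(radians, (lat1, lng1, lat2, lng2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lng2 - lng1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def stay_locations_sketch(points, stop_radius_km=0.1, minutes_for_a_stop=20.0):
    """Detect stops in one individual's trajectory.

    points: list of (lat, lng, datetime) tuples sorted by time.
    Returns (median_lat, median_lng, arrival, leaving) tuples.
    """
    stops = []
    i = 0
    while i < len(points):
        lat0, lng0, t0 = points[i]
        j = i + 1
        while j < len(points) and haversine_km(lat0, lng0, points[j][0], points[j][1]) <= stop_radius_km:
            j += 1
        # assumption: leaving time is the first point outside the radius,
        # or the last point of the group when the track ends inside it
        leaving = points[j][2] if j < len(points) else points[j - 1][2]
        if (leaving - t0).total_seconds() / 60.0 >= minutes_for_a_stop:
            group = points[i:j]
            stops.append((median(p[0] for p in group), median(p[1] for p in group), t0, leaving))
            i = j
        else:
            i += 1  # too short to be a stop: try the next point as anchor
    return stops


start = datetime(2008, 10, 23, 6, 0)
track = [
    (39.9780, 116.3274, start),
    (39.9781, 116.3275, start + timedelta(minutes=10)),
    (39.9780, 116.3276, start + timedelta(minutes=25)),
    (40.0138, 116.3065, start + timedelta(minutes=60)),
    (40.0139, 116.3066, start + timedelta(minutes=65)),
]
stops = stay_locations_sketch(track)
print(len(stops))
```

The default stop_radius_km=0.1 corresponds to stop_radius_factor * spatial_radius_km with the documented defaults (0.5 * 0.2). In the toy track only the first three points form a stop; the last two stay together for only five minutes.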
References
- RT2004
Hariharan, R. & Toyama, K. (2004) Project Lachesis: parsing and modeling location histories. In International Conference on Geographic Information Science, 106-124, http://kentarotoyama.com/papers/Hariharan_2004_Project_Lachesis.pdf
- Z2015
Zheng, Y. (2015) Trajectory data mining: an overview. ACM Transactions on Intelligent Systems and Technology 6(3), https://dl.acm.org/citation.cfm?id=2743025
- skmob.preprocessing.clustering.cluster(tdf, cluster_radius_km=0.1, min_samples=1)