Privacy

LocationAttack(knowledge_length)

Location Attack

LocationTimeAttack(knowledge_length[, …])

Location Time Attack

UniqueLocationAttack(knowledge_length)

Unique Location Attack

LocationSequenceAttack(knowledge_length)

Location Sequence Attack

LocationFrequencyAttack(knowledge_length[, …])

Location Frequency Attack

LocationProbabilityAttack(knowledge_length)

Location Probability Attack

LocationProportionAttack(knowledge_length[, …])

Location Proportion Attack

HomeWorkAttack([knowledge_length])

Home And Work Attack

class skmob.privacy.attacks.HomeWorkAttack(knowledge_length=1)

Home And Work Attack

In a home and work attack the adversary knows the coordinates of the two locations most frequently visited by an individual, and matches them against frequency vectors. A frequency vector is an aggregation of trajectory data showing the unique locations visited by an individual and the frequency with which they were visited. This attack does not require generating combinations to build the possible instances of background knowledge.
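As a rough illustration of the concepts above (a minimal pandas sketch with made-up coordinates, not skmob's internal implementation), a frequency vector can be built by counting visits per unique location, and the home/work candidates are then the two most visited locations of each user:

```python
import pandas as pd

# Toy trajectory data: one row per visit (hypothetical values)
traj = pd.DataFrame({
    "uid": [1, 1, 1, 1, 2, 2],
    "lat": [43.54, 43.54, 43.70, 43.54, 43.70, 43.84],
    "lng": [10.32, 10.32, 10.40, 10.32, 10.40, 10.50],
})

# Frequency vector: unique locations per user with their visit counts
freq = (traj.groupby(["uid", "lat", "lng"])
            .size()
            .reset_index(name="frequency"))

# Home/work candidates: the two most visited locations of each user
home_work = (freq.sort_values(["uid", "frequency"], ascending=[True, False])
                 .groupby("uid")
                 .head(2))
print(home_work)
```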

Parameters

knowledge_length (int) – the length of the background knowledge that we want to simulate. The length of the background knowledge specifies the amount of knowledge that the adversary will use for her attack. For each individual all the combinations of points of length k will be evaluated.

Variables

knowledge_length (int) – the length of the background knowledge that we want to simulate.

Examples

>>> import skmob
>>> from skmob.privacy import attacks
>>> from skmob.core.trajectorydataframe import TrajDataFrame
>>> # load data
>>> url_priv_ex = "https://github.com/scikit-mobility/tutorials/blob/master/AMLD%202020/data/privacy_toy.csv"
>>> trjdat = TrajDataFrame.from_file(filename=url_priv_ex)
>>> # create a home and work attack and assess risk
>>> at = attacks.HomeWorkAttack()
>>> r = at.assess_risk(trjdat)
>>> print(r)
   uid  risk
0    1  0.25
1    2  0.25
2    3  0.25
3    4  0.25
4    5  1.00
5    6  1.00
6    7  1.00
>>> # limit privacy assessment to some target uids
>>> r = at.assess_risk(trjdat, targets=[1,2])
>>> print(r)
   uid  risk
0    1  0.25
1    2  0.25
>>> # inspect probability of reidentification for each background knowledge instance
>>> r = at.assess_risk(trjdat, targets=[1,2], force_instances=True)
>>> print(r)
   lat       lng  datetime  uid  instance  instance_elem  prob
0  1.0  43.54427  10.32615  1.0         1              1  0.25
1  1.0  43.70853  10.40360  1.0         1              2  0.25
2  2.0  43.54427  10.32615  1.0         1              1  0.25
3  2.0  43.70853  10.40360  1.0         1              2  0.25

References

TIST2018

Roberto Pellungrini, Luca Pappalardo, Francesca Pratesi, and Anna Monreale. 2017. A Data Mining Approach to Assess Privacy Risk in Human Mobility Data. ACM Trans. Intell. Syst. Technol. 9, 3, Article 31 (December 2017), 27 pages. DOI: https://doi.org/10.1145/3106774

MOB2018

Roberto Pellungrini, Luca Pappalardo, Francesca Pratesi, Anna Monreale: Analyzing Privacy Risk in Human Mobility Data. STAF Workshops 2018: 114-129

assess_risk(traj, targets=None, force_instances=False, show_progress=False)

Assess privacy risk for a TrajectoryDataFrame. An attack must implement an assessing strategy. This could involve some preprocessing, for example transforming the original data, and calls to the risk function. If the risk does not need to be computed for the entire dataset, the targets parameter can be used to restrict the assessment to a subset of users.

Parameters
  • traj (TrajectoryDataFrame) – the dataframe on which to assess privacy risk.

  • targets (TrajectoryDataFrame or list, optional) – the user ids targeted by the attack. They must be compatible with the trajectory data. The default is None, in which case risk is computed for all users in traj.

  • force_instances (boolean, optional) – if True, returns all possible instances of background knowledge with their respective probability of reidentification. The default is False.

  • show_progress (boolean, optional) – if True, shows the progress of the computation. The default is False.

Returns

a DataFrame with the privacy risk for each user, in the form (user_id, risk).

Return type

DataFrame

class skmob.privacy.attacks.LocationAttack(knowledge_length)

Location Attack

In a location attack the adversary knows the coordinates of the locations visited by an individual and matches them against trajectories.
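The "combinations of points of length k" that form the background-knowledge instances can be sketched with `itertools` (hypothetical coordinates; this illustrates the enumeration only, not skmob's internal code):

```python
from itertools import combinations

# Locations visited by one individual (hypothetical coordinates)
points = [(43.84, 10.51), (43.54, 10.33), (43.71, 10.40), (43.78, 11.25)]

k = 2  # knowledge_length: how many points the adversary knows
instances = list(combinations(points, k))

# With 4 visited points and k = 2 there are C(4, 2) = 6 instances
print(len(instances))  # 6
```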

Parameters

knowledge_length (int) – the length of the background knowledge that we want to simulate. The length of the background knowledge specifies the amount of knowledge that the adversary will use for her attack. For each individual all the combinations of points of length k will be evaluated.

Variables

knowledge_length (int) – the length of the background knowledge that we want to simulate.

Examples

>>> import skmob
>>> from skmob.privacy import attacks
>>> from skmob.core.trajectorydataframe import TrajDataFrame
>>> # load data
>>> url_priv_ex = "https://github.com/scikit-mobility/tutorials/blob/master/AMLD%202020/data/privacy_toy.csv"
>>> trjdat = TrajDataFrame.from_file(filename=url_priv_ex)
>>> # create a location attack and assess risk
>>> at = attacks.LocationAttack(knowledge_length=2)
>>> r = at.assess_risk(trjdat)
>>> print(r)
   uid      risk
0    1  0.333333
1    2  0.500000
2    3  0.333333
3    4  0.333333
4    5  0.250000
5    6  0.250000
6    7  0.500000
>>> # change the length of the background knowledge and reassess risk
>>> at.knowledge_length = 3
>>> r = at.assess_risk(trjdat)
>>> print(r)
   uid      risk
0    1  0.500000
1    2  1.000000
2    3  0.500000
3    4  0.333333
4    5  0.333333
5    6  0.250000
6    7  1.000000
>>> # limit privacy assessment to some target uids
>>> r = at.assess_risk(trjdat, targets=[1,2])
>>> print(r)
   uid  risk
0    1   0.5
1    2   1.0
>>> # inspect probability of reidentification for each background knowledge instance
>>> r = at.assess_risk(trjdat, targets=[1,2], force_instances=True)
>>> print(r)
          lat        lng            datetime  uid  instance  instance_elem      prob
0   43.843014  10.507994 2011-02-03 08:34:04    1         1              1  0.333333
1   43.544270  10.326150 2011-02-03 09:34:04    1         1              2  0.333333
2   43.708530  10.403600 2011-02-03 10:34:04    1         1              3  0.333333
3   43.843014  10.507994 2011-02-03 08:34:04    1         2              1  0.500000
4   43.544270  10.326150 2011-02-03 09:34:04    1         2              2  0.500000
5   43.779250  11.246260 2011-02-04 10:34:04    1         2              3  0.500000
6   43.843014  10.507994 2011-02-03 08:34:04    1         3              1  0.333333
7   43.708530  10.403600 2011-02-03 10:34:04    1         3              2  0.333333
8   43.779250  11.246260 2011-02-04 10:34:04    1         3              3  0.333333
9   43.544270  10.326150 2011-02-03 09:34:04    1         4              1  0.333333
10  43.708530  10.403600 2011-02-03 10:34:04    1         4              2  0.333333
11  43.779250  11.246260 2011-02-04 10:34:04    1         4              3  0.333333
12  43.843014  10.507994 2011-02-03 08:34:04    2         1              1  1.000000
13  43.708530  10.403600 2011-02-03 09:34:04    2         1              2  1.000000
14  43.843014  10.507994 2011-02-04 10:34:04    2         1              3  1.000000
15  43.843014  10.507994 2011-02-03 08:34:04    2         2              1  0.333333
16  43.708530  10.403600 2011-02-03 09:34:04    2         2              2  0.333333
17  43.544270  10.326150 2011-02-04 11:34:04    2         2              3  0.333333
18  43.843014  10.507994 2011-02-03 08:34:04    2         3              1  1.000000
19  43.843014  10.507994 2011-02-04 10:34:04    2         3              2  1.000000
20  43.544270  10.326150 2011-02-04 11:34:04    2         3              3  1.000000
21  43.708530  10.403600 2011-02-03 09:34:04    2         4              1  0.333333
22  43.843014  10.507994 2011-02-04 10:34:04    2         4              2  0.333333
23  43.544270  10.326150 2011-02-04 11:34:04    2         4              3  0.333333

References

TIST2018

Roberto Pellungrini, Luca Pappalardo, Francesca Pratesi, and Anna Monreale. 2017. A Data Mining Approach to Assess Privacy Risk in Human Mobility Data. ACM Trans. Intell. Syst. Technol. 9, 3, Article 31 (December 2017), 27 pages. DOI: https://doi.org/10.1145/3106774

MOB2018

Roberto Pellungrini, Luca Pappalardo, Francesca Pratesi, Anna Monreale: Analyzing Privacy Risk in Human Mobility Data. STAF Workshops 2018: 114-129

assess_risk(traj, targets=None, force_instances=False, show_progress=False)

Assess privacy risk for a TrajectoryDataFrame. An attack must implement an assessing strategy. This could involve some preprocessing, for example transforming the original data, and calls to the risk function. If the risk does not need to be computed for the entire dataset, the targets parameter can be used to restrict the assessment to a subset of users.

Parameters
  • traj (TrajectoryDataFrame) – the dataframe on which to assess privacy risk.

  • targets (TrajectoryDataFrame or list, optional) – the user ids targeted by the attack. They must be compatible with the trajectory data. The default is None, in which case risk is computed for all users in traj.

  • force_instances (boolean, optional) – if True, returns all possible instances of background knowledge with their respective probability of reidentification. The default is False.

  • show_progress (boolean, optional) – if True, shows the progress of the computation. The default is False.

Returns

a DataFrame with the privacy risk for each user, in the form (user_id, risk).

Return type

DataFrame

class skmob.privacy.attacks.LocationFrequencyAttack(knowledge_length, tolerance=0.0)

Location Frequency Attack

In a location frequency attack the adversary knows the coordinates of the unique locations visited by an individual and the frequency with which they were visited, and matches them against frequency vectors. A frequency vector is an aggregation of trajectory data showing the unique locations visited by an individual and the frequency with which they were visited. It is possible to specify a tolerance level for the matching of the frequency.
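One plausible reading of tolerance-based frequency matching (a sketch under that assumption, not necessarily skmob's exact matching rule) is that a candidate frequency matches when it deviates from the adversary's known frequency by at most the tolerance, taken as a relative fraction:

```python
def frequency_matches(known_freq, candidate_freq, tolerance=0.0):
    """Return True when the candidate frequency is within `tolerance`
    (a relative fraction) of the frequency known to the adversary.
    This is an illustrative reading, not skmob's exact code."""
    return abs(candidate_freq - known_freq) <= tolerance * known_freq

print(frequency_matches(10, 10))        # True: exact match at tolerance 0
print(frequency_matches(10, 12, 0.5))   # True: within 50% of 10
print(frequency_matches(10, 16, 0.5))   # False: 60% away from 10
```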

Parameters
  • knowledge_length (int) – the length of the background knowledge that we want to simulate. The length of the background knowledge specifies the amount of knowledge that the adversary will use for her attack. For each individual all the combinations of points of length k will be evaluated.

  • tolerance (float, optional) – the tolerance with which to match the frequency. It can assume values between 0 and 1. The default is 0.

Variables
  • knowledge_length (int) – the length of the background knowledge that we want to simulate.

  • tolerance (float) – the tolerance with which to match the frequency.

Examples

>>> import skmob
>>> from skmob.privacy import attacks
>>> from skmob.core.trajectorydataframe import TrajDataFrame
>>> # load data
>>> url_priv_ex = "https://github.com/scikit-mobility/tutorials/blob/master/AMLD%202020/data/privacy_toy.csv"
>>> trjdat = TrajDataFrame.from_file(filename=url_priv_ex)
>>> # create a location frequency attack and assess risk
>>> at = attacks.LocationFrequencyAttack(knowledge_length=2)
>>> r = at.assess_risk(trjdat)
>>> print(r)
   uid      risk
0    1  0.333333
1    2  1.000000
2    3  0.333333
3    4  0.333333
4    5  0.333333
5    6  0.333333
6    7  1.000000
>>> # change the tolerance with which the frequency is matched
>>> at.tolerance = 0.5
>>> r = at.assess_risk(trjdat)
>>> print(r)
   uid      risk
0    1  0.333333
1    2  1.000000
2    3  0.333333
3    4  0.333333
4    5  0.250000
5    6  0.250000
6    7  1.000000
>>> # change the length of the background knowledge and reassess risk
>>> at.knowledge_length = 3
>>> r = at.assess_risk(trjdat)
>>> print(r)
   uid      risk
0    1  0.500000
1    2  1.000000
2    3  0.500000
3    4  0.333333
4    5  0.333333
5    6  0.250000
6    7  1.000000
>>> # limit privacy assessment to some target uids
>>> r = at.assess_risk(trjdat, targets=[1,2])
>>> print(r)
   uid  risk
0    1   0.5
1    2   1.0
>>> # inspect probability of reidentification for each background knowledge instance
>>> r = at.assess_risk(trjdat, targets=[1,2], force_instances=True)
>>> print(r)
    lat        lng   datetime  uid  instance  instance_elem      prob
0   1.0  43.544270  10.326150  1.0         1              1  0.333333
1   1.0  43.708530  10.403600  1.0         1              2  0.333333
2   1.0  43.779250  11.246260  1.0         1              3  0.333333
3   1.0  43.544270  10.326150  1.0         2              1  0.333333
4   1.0  43.708530  10.403600  1.0         2              2  0.333333
5   1.0  43.843014  10.507994  1.0         2              3  0.333333
6   1.0  43.544270  10.326150  1.0         3              1  0.500000
7   1.0  43.779250  11.246260  1.0         3              2  0.500000
8   1.0  43.843014  10.507994  1.0         3              3  0.500000
9   1.0  43.708530  10.403600  1.0         4              1  0.333333
10  1.0  43.779250  11.246260  1.0         4              2  0.333333
11  1.0  43.843014  10.507994  1.0         4              3  0.333333
12  2.0  43.544270  10.326150  1.0         1              1  1.000000
13  2.0  43.708530  10.403600  1.0         1              2  1.000000
14  2.0  43.843014  10.507994  2.0         1              3  1.000000

References

TIST2018

Roberto Pellungrini, Luca Pappalardo, Francesca Pratesi, and Anna Monreale. 2017. A Data Mining Approach to Assess Privacy Risk in Human Mobility Data. ACM Trans. Intell. Syst. Technol. 9, 3, Article 31 (December 2017), 27 pages. DOI: https://doi.org/10.1145/3106774

MOB2018

Roberto Pellungrini, Luca Pappalardo, Francesca Pratesi, Anna Monreale: Analyzing Privacy Risk in Human Mobility Data. STAF Workshops 2018: 114-129

assess_risk(traj, targets=None, force_instances=False, show_progress=False)

Assess privacy risk for a TrajectoryDataFrame. An attack must implement an assessing strategy. This could involve some preprocessing, for example transforming the original data, and calls to the risk function. If the risk does not need to be computed for the entire dataset, the targets parameter can be used to restrict the assessment to a subset of users.

Parameters
  • traj (TrajectoryDataFrame) – the dataframe on which to assess privacy risk.

  • targets (TrajectoryDataFrame or list, optional) – the user ids targeted by the attack. They must be compatible with the trajectory data. The default is None, in which case risk is computed for all users in traj.

  • force_instances (boolean, optional) – if True, returns all possible instances of background knowledge with their respective probability of reidentification. The default is False.

  • show_progress (boolean, optional) – if True, shows the progress of the computation. The default is False.

Returns

a DataFrame with the privacy risk for each user, in the form (user_id, risk).

Return type

DataFrame

class skmob.privacy.attacks.LocationProbabilityAttack(knowledge_length, tolerance=0.0)

Location Probability Attack

In a location probability attack the adversary knows the coordinates of the unique locations visited by an individual and the probability with which they were visited, and matches them against probability vectors. A probability vector is an aggregation of trajectory data showing the unique locations visited by an individual and the probability with which they were visited. It is possible to specify a tolerance level for the matching of the probability.
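A probability vector can be obtained from a frequency vector by normalising each user's visit counts so they sum to one. A minimal pandas sketch with made-up coordinates (not skmob's internal implementation):

```python
import pandas as pd

# Toy visits for a single user (hypothetical values)
traj = pd.DataFrame({
    "uid": [1, 1, 1, 1],
    "lat": [43.54, 43.54, 43.70, 43.84],
    "lng": [10.32, 10.32, 10.40, 10.50],
})

# Frequency vector, then normalise per user to get visit probabilities
prob = traj.groupby(["uid", "lat", "lng"]).size().reset_index(name="freq")
prob["prob"] = prob.groupby("uid")["freq"].transform(lambda s: s / s.sum())
print(prob)
```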

Parameters
  • knowledge_length (int) – the length of the background knowledge that we want to simulate. The length of the background knowledge specifies the amount of knowledge that the adversary will use for her attack. For each individual all the combinations of points of length k will be evaluated.

  • tolerance (float, optional) – the tolerance with which to match the probability. It can assume values between 0 and 1. The default is 0.

Variables
  • knowledge_length (int) – the length of the background knowledge that we want to simulate.

  • tolerance (float) – the tolerance with which to match the probability.

Examples

>>> import skmob
>>> from skmob.privacy import attacks
>>> from skmob.core.trajectorydataframe import TrajDataFrame
>>> # load data
>>> url_priv_ex = "https://github.com/scikit-mobility/tutorials/blob/master/AMLD%202020/data/privacy_toy.csv"
>>> trjdat = TrajDataFrame.from_file(filename=url_priv_ex)
>>> # create a location probability attack and assess risk
>>> at = attacks.LocationProbabilityAttack(knowledge_length=2)
>>> r = at.assess_risk(trjdat)
>>> print(r)
   uid  risk
0    1   0.5
1    2   1.0
2    3   0.5
3    4   1.0
4    5   1.0
5    6   1.0
6    7   1.0
>>> # change the tolerance with which the probability is matched
>>> at.tolerance = 0.5
>>> r = at.assess_risk(trjdat)
>>> print(r)
   uid      risk
0    1  0.333333
1    2  0.500000
2    3  0.333333
3    4  0.333333
4    5  0.250000
5    6  1.000000
6    7  1.000000
>>> # change the length of the background knowledge and reassess risk
>>> at.knowledge_length = 3
>>> r = at.assess_risk(trjdat)
>>> print(r)
   uid      risk
0    1  0.500000
1    2  1.000000
2    3  0.500000
3    4  0.333333
4    5  0.333333
5    6  1.000000
6    7  1.000000
>>> # limit privacy assessment to some target uids
>>> r = at.assess_risk(trjdat, targets=[1,2])
>>> print(r)
   uid  risk
0    1   0.5
1    2   1.0
>>> # inspect probability of reidentification for each background knowledge instance
>>> r = at.assess_risk(trjdat, targets=[1,2], force_instances=True)
>>> print(r)
    lat        lng   datetime   uid  instance  instance_elem      prob
0   1.0  43.544270  10.326150  0.25         1              1  0.333333
1   1.0  43.708530  10.403600  0.25         1              2  0.333333
2   1.0  43.779250  11.246260  0.25         1              3  0.333333
3   1.0  43.544270  10.326150  0.25         2              1  0.333333
4   1.0  43.708530  10.403600  0.25         2              2  0.333333
5   1.0  43.843014  10.507994  0.25         2              3  0.333333
6   1.0  43.544270  10.326150  0.25         3              1  0.500000
7   1.0  43.779250  11.246260  0.25         3              2  0.500000
8   1.0  43.843014  10.507994  0.25         3              3  0.500000
9   1.0  43.708530  10.403600  0.25         4              1  0.333333
10  1.0  43.779250  11.246260  0.25         4              2  0.333333
11  1.0  43.843014  10.507994  0.25         4              3  0.333333
12  2.0  43.544270  10.326150  0.25         1              1  1.000000
13  2.0  43.708530  10.403600  0.25         1              2  1.000000
14  2.0  43.843014  10.507994  0.50         1              3  1.000000

References

TIST2018

Roberto Pellungrini, Luca Pappalardo, Francesca Pratesi, and Anna Monreale. 2017. A Data Mining Approach to Assess Privacy Risk in Human Mobility Data. ACM Trans. Intell. Syst. Technol. 9, 3, Article 31 (December 2017), 27 pages. DOI: https://doi.org/10.1145/3106774

MOB2018

Roberto Pellungrini, Luca Pappalardo, Francesca Pratesi, Anna Monreale: Analyzing Privacy Risk in Human Mobility Data. STAF Workshops 2018: 114-129

assess_risk(traj, targets=None, force_instances=False, show_progress=False)

Assess privacy risk for a TrajectoryDataFrame. An attack must implement an assessing strategy. This could involve some preprocessing, for example transforming the original data, and calls to the risk function. If the risk does not need to be computed for the entire dataset, the targets parameter can be used to restrict the assessment to a subset of users.

Parameters
  • traj (TrajectoryDataFrame) – the dataframe on which to assess privacy risk.

  • targets (TrajectoryDataFrame or list, optional) – the user ids targeted by the attack. They must be compatible with the trajectory data. The default is None, in which case risk is computed for all users in traj.

  • force_instances (boolean, optional) – if True, returns all possible instances of background knowledge with their respective probability of reidentification. The default is False.

  • show_progress (boolean, optional) – if True, shows the progress of the computation. The default is False.

Returns

a DataFrame with the privacy risk for each user, in the form (user_id, risk).

Return type

DataFrame

class skmob.privacy.attacks.LocationProportionAttack(knowledge_length, tolerance=0.0)

Location Proportion Attack

In a location proportion attack the adversary knows the coordinates of the unique locations visited by an individual and the relative proportions between their visit frequencies, and matches them against frequency vectors. A frequency vector is an aggregation of trajectory data showing the unique locations visited by an individual and the frequency with which they were visited. It is possible to specify a tolerance level for the matching of the proportion.
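One plausible formulation of the "relative proportions" (an illustrative assumption, not necessarily skmob's exact definition) divides each location's visit count by that of the user's most visited location, so proportions are robust to scaling of absolute frequencies:

```python
# Visit counts per location for one user (hypothetical labels and values)
freqs = {"A": 8, "B": 4, "C": 2}

# Proportion of each frequency relative to the most visited location
max_freq = max(freqs.values())
proportions = {loc: f / max_freq for loc, f in freqs.items()}
print(proportions)  # {'A': 1.0, 'B': 0.5, 'C': 0.25}
```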

Parameters
  • knowledge_length (int) – the length of the background knowledge that we want to simulate. The length of the background knowledge specifies the amount of knowledge that the adversary will use for her attack. For each individual all the combinations of points of length k will be evaluated.

  • tolerance (float, optional) – the tolerance with which to match the proportion. It can assume values between 0 and 1. The default is 0.

Variables
  • knowledge_length (int) – the length of the background knowledge that we want to simulate.

  • tolerance (float) – the tolerance with which to match the proportion.

Examples

>>> import skmob
>>> from skmob.privacy import attacks
>>> from skmob.core.trajectorydataframe import TrajDataFrame
>>> # load data
>>> url_priv_ex = "https://github.com/scikit-mobility/tutorials/blob/master/AMLD%202020/data/privacy_toy.csv"
>>> trjdat = TrajDataFrame.from_file(filename=url_priv_ex)
>>> # create a location proportion attack and assess risk
>>> at = attacks.LocationProportionAttack(knowledge_length=2)
>>> r = at.assess_risk(trjdat)
>>> print(r)
   uid      risk
0    1  0.333333
1    2  1.000000
2    3  0.333333
3    4  0.333333
4    5  0.333333
5    6  0.333333
6    7  1.000000
>>> # change the tolerance with which the proportion is matched
>>> at.tolerance = 0.5
>>> r = at.assess_risk(trjdat)
>>> print(r)
   uid      risk
0    1  0.333333
1    2  0.250000
2    3  0.333333
3    4  0.333333
4    5  0.333333
5    6  0.333333
6    7  0.250000
>>> # change the length of the background knowledge and reassess risk
>>> at.knowledge_length = 3
>>> r = at.assess_risk(trjdat)
>>> print(r)
   uid      risk
0    1  0.500000
1    2  0.333333
2    3  0.500000
3    4  0.333333
4    5  0.333333
5    6  0.333333
6    7  0.250000
>>> # limit privacy assessment to some target uids
>>> r = at.assess_risk(trjdat, targets=[1,2])
>>> print(r)
   uid      risk
0    1  0.500000
1    2  0.333333
>>> # inspect probability of reidentification for each background knowledge instance
>>> r = at.assess_risk(trjdat, targets=[1,2], force_instances=True)
>>> print(r)
    lat        lng   datetime  uid  instance  instance_elem      prob
0   1.0  43.544270  10.326150  1.0         1              1  0.333333
1   1.0  43.708530  10.403600  1.0         1              2  0.333333
2   1.0  43.779250  11.246260  1.0         1              3  0.333333
3   1.0  43.544270  10.326150  1.0         2              1  0.500000
4   1.0  43.708530  10.403600  1.0         2              2  0.500000
5   1.0  43.843014  10.507994  1.0         2              3  0.500000
6   1.0  43.544270  10.326150  1.0         3              1  0.500000
7   1.0  43.779250  11.246260  1.0         3              2  0.500000
8   1.0  43.843014  10.507994  1.0         3              3  0.500000
9   1.0  43.708530  10.403600  1.0         4              1  0.333333
10  1.0  43.779250  11.246260  1.0         4              2  0.333333
11  1.0  43.843014  10.507994  1.0         4              3  0.333333
12  2.0  43.544270  10.326150  1.0         1              1  0.333333
13  2.0  43.708530  10.403600  1.0         1              2  0.333333
14  2.0  43.843014  10.507994  2.0         1              3  0.333333

References

TIST2018

Roberto Pellungrini, Luca Pappalardo, Francesca Pratesi, and Anna Monreale. 2017. A Data Mining Approach to Assess Privacy Risk in Human Mobility Data. ACM Trans. Intell. Syst. Technol. 9, 3, Article 31 (December 2017), 27 pages. DOI: https://doi.org/10.1145/3106774

MOB2018

Roberto Pellungrini, Luca Pappalardo, Francesca Pratesi, Anna Monreale: Analyzing Privacy Risk in Human Mobility Data. STAF Workshops 2018: 114-129

assess_risk(traj, targets=None, force_instances=False, show_progress=False)

Assess privacy risk for a TrajectoryDataFrame. An attack must implement an assessing strategy. This could involve some preprocessing, for example transforming the original data, and calls to the risk function. If the risk does not need to be computed for the entire dataset, the targets parameter can be used to restrict the assessment to a subset of users.

Parameters
  • traj (TrajectoryDataFrame) – the dataframe on which to assess privacy risk.

  • targets (TrajectoryDataFrame or list, optional) – the user ids targeted by the attack. They must be compatible with the trajectory data. The default is None, in which case risk is computed for all users in traj.

  • force_instances (boolean, optional) – if True, returns all possible instances of background knowledge with their respective probability of reidentification. The default is False.

  • show_progress (boolean, optional) – if True, shows the progress of the computation. The default is False.

Returns

a DataFrame with the privacy risk for each user, in the form (user_id, risk).

Return type

DataFrame

class skmob.privacy.attacks.LocationSequenceAttack(knowledge_length)

Location Sequence Attack

In a location sequence attack the adversary knows the coordinates of the locations visited by an individual and the order in which they were visited, and matches them against trajectories.

Parameters

knowledge_length (int) – the length of the background knowledge that we want to simulate. The length of the background knowledge specifies the amount of knowledge that the adversary will use for her attack. For each individual all the combinations of points of length k will be evaluated.

Variables

knowledge_length (int) – the length of the background knowledge that we want to simulate.

Examples

>>> import skmob
>>> from skmob.privacy import attacks
>>> from skmob.core.trajectorydataframe import TrajDataFrame
>>> # load data
>>> url_priv_ex = "https://github.com/scikit-mobility/tutorials/blob/master/AMLD%202020/data/privacy_toy.csv"
>>> trjdat = TrajDataFrame.from_file(filename=url_priv_ex)
>>> # create a location sequence attack and assess risk
>>> at = attacks.LocationSequenceAttack(knowledge_length=2)
>>> r = at.assess_risk(trjdat)
>>> print(r)
   uid      risk
0    1  0.500000
1    2  0.500000
2    3  1.000000
3    4  0.500000
4    5  1.000000
5    6  0.333333
6    7  0.500000
>>> # change the length of the background knowledge and reassess risk
>>> at.knowledge_length = 3
>>> r = at.assess_risk(trjdat)
>>> print(r)
   uid      risk
0    1  1.000000
1    2  1.000000
2    3  1.000000
3    4  1.000000
4    5  1.000000
5    6  0.333333
6    7  1.000000
>>> # limit privacy assessment to some target uids
>>> r = at.assess_risk(trjdat, targets=[1,2])
>>> print(r)
   uid  risk
0    1   1.0
1    2   1.0
>>> # inspect probability of reidentification for each background knowledge instance
>>> r = at.assess_risk(trjdat, targets=[1,2], force_instances=True)
>>> print(r)
          lat        lng            datetime  uid  instance  instance_elem  prob
0   43.843014  10.507994 2011-02-03 08:34:04    1         1              1   1.0
1   43.544270  10.326150 2011-02-03 09:34:04    1         1              2   1.0
2   43.708530  10.403600 2011-02-03 10:34:04    1         1              3   1.0
3   43.843014  10.507994 2011-02-03 08:34:04    1         2              1   1.0
4   43.544270  10.326150 2011-02-03 09:34:04    1         2              2   1.0
5   43.779250  11.246260 2011-02-04 10:34:04    1         2              3   1.0
6   43.843014  10.507994 2011-02-03 08:34:04    1         3              1   1.0
7   43.708530  10.403600 2011-02-03 10:34:04    1         3              2   1.0
8   43.779250  11.246260 2011-02-04 10:34:04    1         3              3   1.0
9   43.544270  10.326150 2011-02-03 09:34:04    1         4              1   0.5
10  43.708530  10.403600 2011-02-03 10:34:04    1         4              2   0.5
11  43.779250  11.246260 2011-02-04 10:34:04    1         4              3   0.5
12  43.843014  10.507994 2011-02-03 08:34:04    2         1              1   1.0
13  43.708530  10.403600 2011-02-03 09:34:04    2         1              2   1.0
14  43.843014  10.507994 2011-02-04 10:34:04    2         1              3   1.0
15  43.843014  10.507994 2011-02-03 08:34:04    2         2              1   1.0
16  43.708530  10.403600 2011-02-03 09:34:04    2         2              2   1.0
17  43.544270  10.326150 2011-02-04 11:34:04    2         2              3   1.0
18  43.843014  10.507994 2011-02-03 08:34:04    2         3              1   1.0
19  43.843014  10.507994 2011-02-04 10:34:04    2         3              2   1.0
20  43.544270  10.326150 2011-02-04 11:34:04    2         3              3   1.0
21  43.708530  10.403600 2011-02-03 09:34:04    2         4              1   1.0
22  43.843014  10.507994 2011-02-04 10:34:04    2         4              2   1.0
23  43.544270  10.326150 2011-02-04 11:34:04    2         4              3   1.0

References

TIST2018

Roberto Pellungrini, Luca Pappalardo, Francesca Pratesi, and Anna Monreale. 2017. A Data Mining Approach to Assess Privacy Risk in Human Mobility Data. ACM Trans. Intell. Syst. Technol. 9, 3, Article 31 (December 2017), 27 pages. DOI: https://doi.org/10.1145/3106774

MOB2018

Roberto Pellungrini, Luca Pappalardo, Francesca Pratesi, Anna Monreale: Analyzing Privacy Risk in Human Mobility Data. STAF Workshops 2018: 114-129

assess_risk(traj, targets=None, force_instances=False, show_progress=False)

Assess privacy risk for a TrajectoryDataFrame. An attack must implement an assessing strategy. This could involve some preprocessing, for example transforming the original data, and calls to the risk function. If the risk does not need to be computed for the entire dataset, the targets parameter can be used to restrict the assessment to a subset of users.

Parameters
  • traj (TrajectoryDataFrame) – the dataframe on which to assess privacy risk.

  • targets (TrajectoryDataFrame or list, optional) – the user ids targeted by the attack. They must be compatible with the trajectory data. The default is None, in which case risk is computed for all users in traj.

  • force_instances (boolean, optional) – if True, returns all possible instances of background knowledge with their respective probability of reidentification. The default is False.

  • show_progress (boolean, optional) – if True, shows the progress of the computation. The default is False.

Returns

a DataFrame with the privacy risk for each user, in the form (user_id, risk).

Return type

DataFrame

class skmob.privacy.attacks.LocationTimeAttack(knowledge_length, time_precision='Hour')

Location Time Attack

In a location time attack the adversary knows the coordinates of the locations visited by an individual and the times at which they were visited, and matches them against trajectories. The precision at which to consider the temporal information can also be specified.
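The effect of a coarser time_precision can be sketched by truncating timestamps before comparison (a pandas sketch with made-up timestamps; skmob's internal handling may differ):

```python
import pandas as pd

ts = pd.Series(pd.to_datetime([
    "2011-02-03 08:34:04",
    "2011-02-03 08:59:59",
    "2011-02-03 09:00:00",
]))

# Truncating to the chosen precision makes visits within the same hour
# compare equal: the first two timestamps collapse to 08:00.
hour_truncated = ts.dt.floor("h")
print(hour_truncated.nunique())  # 2
```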

Parameters
  • knowledge_length (int) – the length of the background knowledge that we want to simulate. The length of the background knowledge specifies the amount of knowledge that the adversary will use for her attack. For each individual all the combinations of points of length k will be evaluated.

  • time_precision (string, optional) – the precision at which to consider the timestamps for the visits. The possible precisions are: Year, Month, Day, Hour, Minute, Second. The default is Hour.

Variables
  • knowledge_length (int) – the length of the background knowledge that we want to simulate.

  • time_precision (string) – the precision at which to consider the timestamps for the visits.

Examples

>>> import skmob
>>> from skmob.privacy import attacks
>>> from skmob.core.trajectorydataframe import TrajDataFrame
>>> # load data
>>> url_priv_ex = "https://github.com/scikit-mobility/tutorials/blob/master/AMLD%202020/data/privacy_toy.csv"
>>> trjdat = TrajDataFrame.from_file(filename=url_priv_ex)
>>> # create a location attack and assess risk
>>> at = attacks.LocationTimeAttack(knowledge_length=2)
>>> r = at.assess_risk(trjdat)
>>> print(r)
   uid  risk
0    1   1.0
1    2   1.0
2    3   1.0
3    4   1.0
4    5   1.0
5    6   0.5
6    7   1.0
>>> # change the time granularity of the attack
>>> at.time_precision = "Month"
>>> r = at.assess_risk(trjdat)
>>> print(r)
   uid      risk
0    1  0.333333
1    2  0.500000
2    3  0.333333
3    4  0.333333
4    5  0.250000
5    6  0.250000
6    7  0.500000
>>> # change the length of the background knowledge and reassess risk
>>> at.knowledge_length = 3
>>> r = at.assess_risk(trjdat)
>>> print(r)
   uid      risk
0    1  0.500000
1    2  1.000000
2    3  0.500000
3    4  0.333333
4    5  0.333333
5    6  0.250000
6    7  1.000000
>>> # limit privacy assessment to some target uids
>>> r = at.assess_risk(trjdat, targets=[1,2])
>>> print(r)
   uid      risk
0    1  0.500000
1    2  1.000000
>>> # inspect probability of reidentification for each background knowledge instance
>>> r = at.assess_risk(trjdat, targets=[1,2], force_instances=True)
>>> print(r)
          lat        lng            datetime  uid  instance  instance_elem      prob
0   43.843014  10.507994 2011-02-03 08:34:04    1         1              1  0.333333
1   43.544270  10.326150 2011-02-03 09:34:04    1         1              2  0.333333
2   43.708530  10.403600 2011-02-03 10:34:04    1         1              3  0.333333
3   43.843014  10.507994 2011-02-03 08:34:04    1         2              1  0.500000
4   43.544270  10.326150 2011-02-03 09:34:04    1         2              2  0.500000
5   43.779250  11.246260 2011-02-04 10:34:04    1         2              3  0.500000
6   43.843014  10.507994 2011-02-03 08:34:04    1         3              1  0.333333
7   43.708530  10.403600 2011-02-03 10:34:04    1         3              2  0.333333
8   43.779250  11.246260 2011-02-04 10:34:04    1         3              3  0.333333
9   43.544270  10.326150 2011-02-03 09:34:04    1         4              1  0.333333
10  43.708530  10.403600 2011-02-03 10:34:04    1         4              2  0.333333
11  43.779250  11.246260 2011-02-04 10:34:04    1         4              3  0.333333
12  43.843014  10.507994 2011-02-03 08:34:04    2         1              1  1.000000
13  43.708530  10.403600 2011-02-03 09:34:04    2         1              2  1.000000
14  43.843014  10.507994 2011-02-04 10:34:04    2         1              3  1.000000
15  43.843014  10.507994 2011-02-03 08:34:04    2         2              1  0.333333
16  43.708530  10.403600 2011-02-03 09:34:04    2         2              2  0.333333
17  43.544270  10.326150 2011-02-04 11:34:04    2         2              3  0.333333
18  43.843014  10.507994 2011-02-03 08:34:04    2         3              1  1.000000
19  43.843014  10.507994 2011-02-04 10:34:04    2         3              2  1.000000
20  43.544270  10.326150 2011-02-04 11:34:04    2         3              3  1.000000
21  43.708530  10.403600 2011-02-03 09:34:04    2         4              1  0.333333
22  43.843014  10.507994 2011-02-04 10:34:04    2         4              2  0.333333
23  43.544270  10.326150 2011-02-04 11:34:04    2         4              3  0.333333

References

TIST2018

Roberto Pellungrini, Luca Pappalardo, Francesca Pratesi, and Anna Monreale. 2017. A Data Mining Approach to Assess Privacy Risk in Human Mobility Data. ACM Trans. Intell. Syst. Technol. 9, 3, Article 31 (December 2017), 27 pages. DOI: https://doi.org/10.1145/3106774

MOB2018

Roberto Pellungrini, Luca Pappalardo, Francesca Pratesi, Anna Monreale: Analyzing Privacy Risk in Human Mobility Data. STAF Workshops 2018: 114-129

assess_risk(traj, targets=None, force_instances=False, show_progress=False)

Assess privacy risk for a TrajectoryDataFrame. An attack must implement an assessing strategy. This could involve some preprocessing, for example transforming the original data, and calls to the risk function. If computing the risk for the entire dataset is not required, the targets parameter can be used to restrict the assessment to a subset of users.

Parameters
  • traj (TrajectoryDataFrame) – the dataframe on which to assess privacy risk.

  • targets (TrajectoryDataFrame or list, optional) – the user ids targeted by the attack. They must be compatible with the trajectory data. The default is None, in which case risk is computed on all users in traj.

  • force_instances (boolean, optional) – if True, returns all possible instances of background knowledge with their respective probability of reidentification. The default is False.

  • show_progress (boolean, optional) – if True, shows the progress of the computation. The default is False.

Returns

a DataFrame with the privacy risk for each user, in the form (user_id, risk).

Return type

DataFrame

class skmob.privacy.attacks.UniqueLocationAttack(knowledge_length)

Unique Location Attack

In a unique location attack the adversary knows the coordinates of unique locations visited by an individual, and matches them against frequency vectors. A frequency vector is an aggregation of trajectory data showing the unique locations visited by an individual and the frequency with which they were visited.

Parameters

knowledge_length (int) – the length of the background knowledge that we want to simulate. The length of the background knowledge specifies the amount of knowledge that the adversary will use for her attack. For each individual all the combinations of points of length k will be evaluated.

Variables

knowledge_length (int) – the length of the background knowledge that we want to simulate.
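
The combination-based matching described above can be sketched in plain Python. This is a conceptual illustration with string labels standing in for coordinates; the `unique_location_risk` helper is hypothetical and not the skmob implementation.

```python
from collections import Counter
from itertools import combinations

def unique_location_risk(visits_by_user, target_uid, k):
    # Frequency vectors: each user's unique locations with visit counts.
    freq = {uid: Counter(v) for uid, v in visits_by_user.items()}
    # Each combination of k unique locations of the target is one
    # instance of background knowledge; risk is the worst case over
    # all instances of 1 / (number of users matching the instance).
    risk = 0.0
    for instance in combinations(sorted(freq[target_uid]), k):
        matches = sum(1 for fv in freq.values()
                      if all(loc in fv for loc in instance))
        risk = max(risk, 1.0 / matches)
    return risk

visits = {
    1: ["A", "B", "B", "C"],  # unique locations {A, B, C}
    2: ["A", "B"],            # unique locations {A, B}
    3: ["A", "C", "D", "D"],  # unique locations {A, C, D}
}
print(unique_location_risk(visits, 1, 2))  # 1.0: the pair (B, C) is unique to user 1
print(unique_location_risk(visits, 2, 2))  # 0.5: (A, B) also matches user 1
```

Note the worst-case semantics: a user is at maximum risk as soon as any single instance of length k singles them out, even if most instances are ambiguous.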

Examples

>>> import skmob
>>> from skmob.privacy import attacks
>>> from skmob.core.trajectorydataframe import TrajDataFrame
>>> # load data
>>> url_priv_ex = "https://github.com/scikit-mobility/tutorials/blob/master/AMLD%202020/data/privacy_toy.csv"
>>> trjdat = TrajDataFrame.from_file(filename=url_priv_ex)
>>> # create a location attack and assess risk
>>> at = attacks.UniqueLocationAttack(knowledge_length=2)
>>> r = at.assess_risk(trjdat)
>>> print(r)
   uid      risk
0    1  0.333333
1    2  0.250000
2    3  0.333333
3    4  0.333333
4    5  0.250000
5    6  0.250000
6    7  0.250000
>>> # change the length of the background knowledge and reassess risk
>>> at.knowledge_length = 3
>>> r = at.assess_risk(trjdat)
>>> print(r)
   uid      risk
0    1  0.500000
1    2  0.333333
2    3  0.500000
3    4  0.333333
4    5  0.333333
5    6  0.250000
6    7  0.250000
>>> # limit privacy assessment to some target uids
>>> r = at.assess_risk(trjdat, targets=[1,2])
>>> print(r)
   uid      risk
0    1  0.500000
1    2  0.333333
>>> # inspect probability of reidentification for each background knowledge instance
>>> r = at.assess_risk(trjdat, targets=[1,2], force_instances=True)
>>> print(r)
    uid        lat        lng  frequency  instance  instance_elem      prob
0   1.0  43.544270  10.326150  1.0         1              1  0.333333
1   1.0  43.708530  10.403600  1.0         1              2  0.333333
2   1.0  43.779250  11.246260  1.0         1              3  0.333333
3   1.0  43.544270  10.326150  1.0         2              1  0.333333
4   1.0  43.708530  10.403600  1.0         2              2  0.333333
5   1.0  43.843014  10.507994  1.0         2              3  0.333333
6   1.0  43.544270  10.326150  1.0         3              1  0.500000
7   1.0  43.779250  11.246260  1.0         3              2  0.500000
8   1.0  43.843014  10.507994  1.0         3              3  0.500000
9   1.0  43.708530  10.403600  1.0         4              1  0.333333
10  1.0  43.779250  11.246260  1.0         4              2  0.333333
11  1.0  43.843014  10.507994  1.0         4              3  0.333333
12  2.0  43.544270  10.326150  1.0         1              1  0.333333
13  2.0  43.708530  10.403600  1.0         1              2  0.333333
14  2.0  43.843014  10.507994  2.0         1              3  0.333333

References

TIST2018

Roberto Pellungrini, Luca Pappalardo, Francesca Pratesi, and Anna Monreale. 2017. A Data Mining Approach to Assess Privacy Risk in Human Mobility Data. ACM Trans. Intell. Syst. Technol. 9, 3, Article 31 (December 2017), 27 pages. DOI: https://doi.org/10.1145/3106774

MOB2018

Roberto Pellungrini, Luca Pappalardo, Francesca Pratesi, Anna Monreale: Analyzing Privacy Risk in Human Mobility Data. STAF Workshops 2018: 114-129

assess_risk(traj, targets=None, force_instances=False, show_progress=False)

Assess privacy risk for a TrajectoryDataFrame. An attack must implement an assessing strategy. This could involve some preprocessing, for example transforming the original data, and calls to the risk function. If computing the risk for the entire dataset is not required, the targets parameter can be used to restrict the assessment to a subset of users.

Parameters
  • traj (TrajectoryDataFrame) – the dataframe on which to assess privacy risk.

  • targets (TrajectoryDataFrame or list, optional) – the user ids targeted by the attack. They must be compatible with the trajectory data. The default is None, in which case risk is computed on all users in traj.

  • force_instances (boolean, optional) – if True, returns all possible instances of background knowledge with their respective probability of reidentification. The default is False.

  • show_progress (boolean, optional) – if True, shows the progress of the computation. The default is False.

Returns

a DataFrame with the privacy risk for each user, in the form (user_id, risk).

Return type

DataFrame