Privacy

The attack models documented in this section are:

- Location Attack
- Location Time Attack
- Unique Location Attack
- Location Sequence Attack
- Location Frequency Attack
- Location Probability Attack
- Location Proportion Attack
- Home And Work Attack
- class skmob.privacy.attacks.HomeWorkAttack(knowledge_length=1)
Home And Work Attack
In a home and work attack the adversary knows the coordinates of the two locations most frequently visited by an individual, and matches them against frequency vectors. A frequency vector is an aggregation of trajectory data that lists the unique locations visited by an individual together with the frequency with which they were visited. This attack does not require the generation of combinations to build the possible instances of background knowledge.
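For intuition, the sketch below derives a frequency vector and the two most visited locations of each user with plain pandas. It is only an illustration of the concept, not the code used internally by scikit-mobility; the column names (uid, lat, lng, datetime) follow the TrajDataFrame defaults.

import pandas as pd

# Toy trajectory data with TrajDataFrame-style columns.
traj = pd.DataFrame({
    "uid": [1, 1, 1, 1, 2, 2, 2],
    "lat": [43.54, 43.54, 43.70, 43.54, 43.84, 43.70, 43.84],
    "lng": [10.32, 10.32, 10.40, 10.32, 10.50, 10.40, 10.50],
    "datetime": pd.date_range("2011-02-03 08:00", periods=7, freq="h"),
})

# Frequency vector: unique locations per user with their visit counts.
freq_vector = (
    traj.groupby(["uid", "lat", "lng"])
        .size()
        .reset_index(name="frequency")
)

# Home/work background knowledge: the two most visited locations of each user.
home_work = (
    freq_vector.sort_values(["uid", "frequency"], ascending=[True, False])
               .groupby("uid")
               .head(2)
)
print(home_work)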
- Parameters
knowledge_length (int) – the length of the background knowledge that we want to simulate. The length of the background knowledge specifies the amount of knowledge that the adversary will use for her attack. For each individual all the combinations of points of length k will be evaluated.
- Variables
knowledge_length (int) – the length of the background knowledge that we want to simulate.
Examples
>>> import skmob
>>> from skmob.privacy import attacks
>>> from skmob.core.trajectorydataframe import TrajDataFrame
>>> # load data
>>> url_priv_ex = "https://github.com/scikit-mobility/tutorials/blob/master/AMLD%202020/data/privacy_toy.csv"
>>> trjdat = TrajDataFrame.from_file(filename=url_priv_ex)
>>> # create a home and work attack and assess risk
>>> at = attacks.HomeWorkAttack()
>>> r = at.assess_risk(trjdat)
>>> print(r)
   uid  risk
0    1  0.25
1    2  0.25
2    3  0.25
3    4  0.25
4    5  1.00
5    6  1.00
6    7  1.00
>>> # limit privacy assessment to some target uids
>>> r = at.assess_risk(trjdat, targets=[1,2])
>>> print(r)
   uid  risk
0    1  0.25
1    2  0.25
>>> # inspect probability of reidentification for each background knowledge instance
>>> r = at.assess_risk(trjdat, targets=[1,2], force_instances=True)
>>> print(r)
    lat       lng   datetime  uid  instance  instance_elem  prob
0   1.0  43.54427   10.32615  1.0         1              1  0.25
1   1.0  43.70853   10.40360  1.0         1              2  0.25
2   2.0  43.54427   10.32615  1.0         1              1  0.25
3   2.0  43.70853   10.40360  1.0         1              2  0.25
References
- TIST2018
Roberto Pellungrini, Luca Pappalardo, Francesca Pratesi, and Anna Monreale. 2017. A Data Mining Approach to Assess Privacy Risk in Human Mobility Data. ACM Trans. Intell. Syst. Technol. 9, 3, Article 31 (December 2017), 27 pages. DOI: https://doi.org/10.1145/3106774
- MOB2018
Roberto Pellungrini, Luca Pappalardo, Francesca Pratesi, Anna Monreale: Analyzing Privacy Risk in Human Mobility Data. STAF Workshops 2018: 114-129
- assess_risk(traj, targets=None, force_instances=False, show_progress=False)
Assess privacy risk for a TrajectoryDataFrame. An attack must implement an assessment strategy, which may involve some preprocessing, for example transforming the original data, and calls to the risk function. If the risk does not need to be computed for the entire dataset, the targets parameter can be used to restrict the assessment to a subset of users.
- Parameters
traj (TrajectoryDataFrame) – the dataframe on which to assess privacy risk.
targets (TrajectoryDataFrame or list, optional) – the user ids targeted by the attack. They must be compatible with the trajectory data. If None, the risk is computed for all users in traj. The default is None.
force_instances (boolean, optional) – if True, returns all possible instances of background knowledge together with their respective probability of reidentification. The default is False.
show_progress (boolean, optional) – if True, shows the progress of the computation. The default is False.
- Returns
a DataFrame with the privacy risk for each user, in the form (user_id, risk).
- Return type
DataFrame
- class skmob.privacy.attacks.LocationAttack(knowledge_length)
Location Attack
In a location attack the adversary knows the coordinates of the locations visited by an individual and matches them against trajectories.
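As a conceptual sketch of how such an attack can be simulated (an illustration, not scikit-mobility's implementation): every combination of k points known for the target is treated as a background-knowledge instance, each instance is matched against the locations visited by every individual, and the risk is the maximum over instances of 1 divided by the number of matching individuals. Matching on raw coordinates is an assumption made here to keep the example short.

from itertools import combinations

# Locations visited by each user: uid -> set of (lat, lng) points.
visits = {
    1: {(43.84, 10.51), (43.54, 10.33), (43.71, 10.40)},
    2: {(43.84, 10.51), (43.71, 10.40)},
    3: {(43.54, 10.33), (43.71, 10.40)},
}

def location_attack_risk(target_uid, k):
    # Risk = max over k-point instances of 1 / (number of users matching the instance).
    risk = 0.0
    for instance in combinations(sorted(visits[target_uid]), k):
        matching = sum(1 for locs in visits.values() if set(instance) <= locs)
        risk = max(risk, 1.0 / matching)
    return risk

print(location_attack_risk(target_uid=1, k=2))  # 1.0: one pair of locations is unique to user 1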
- Parameters
knowledge_length (int) – the length of the background knowledge that we want to simulate. The length of the background knowledge specifies the amount of knowledge that the adversary will use for her attack. For each individual all the combinations of points of length k will be evaluated.
- Variables
knowledge_length (int) – the length of the background knowledge that we want to simulate.
Examples
>>> import skmob
>>> from skmob.privacy import attacks
>>> from skmob.core.trajectorydataframe import TrajDataFrame
>>> # load data
>>> url_priv_ex = "https://github.com/scikit-mobility/tutorials/blob/master/AMLD%202020/data/privacy_toy.csv"
>>> trjdat = TrajDataFrame.from_file(filename=url_priv_ex)
>>> # create a location attack and assess risk
>>> at = attacks.LocationAttack(knowledge_length=2)
>>> r = at.assess_risk(trjdat)
>>> print(r)
   uid      risk
0    1  0.333333
1    2  0.500000
2    3  0.333333
3    4  0.333333
4    5  0.250000
5    6  0.250000
6    7  0.500000
>>> # change the length of the background knowledge and reassess risk
>>> at.knowledge_length = 3
>>> r = at.assess_risk(trjdat)
>>> print(r)
   uid      risk
0    1  0.500000
1    2  1.000000
2    3  0.500000
3    4  0.333333
4    5  0.333333
5    6  0.250000
6    7  1.000000
>>> # limit privacy assessment to some target uids
>>> r = at.assess_risk(trjdat, targets=[1,2])
>>> print(r)
   uid  risk
0    1   0.5
1    2   1.0
>>> # inspect probability of reidentification for each background knowledge instance
>>> r = at.assess_risk(trjdat, targets=[1,2], force_instances=True)
>>> print(r)
          lat        lng             datetime  uid  instance  instance_elem      prob
0   43.843014  10.507994  2011-02-03 08:34:04    1         1              1  0.333333
1   43.544270  10.326150  2011-02-03 09:34:04    1         1              2  0.333333
2   43.708530  10.403600  2011-02-03 10:34:04    1         1              3  0.333333
3   43.843014  10.507994  2011-02-03 08:34:04    1         2              1  0.500000
4   43.544270  10.326150  2011-02-03 09:34:04    1         2              2  0.500000
5   43.779250  11.246260  2011-02-04 10:34:04    1         2              3  0.500000
6   43.843014  10.507994  2011-02-03 08:34:04    1         3              1  0.333333
7   43.708530  10.403600  2011-02-03 10:34:04    1         3              2  0.333333
8   43.779250  11.246260  2011-02-04 10:34:04    1         3              3  0.333333
9   43.544270  10.326150  2011-02-03 09:34:04    1         4              1  0.333333
10  43.708530  10.403600  2011-02-03 10:34:04    1         4              2  0.333333
11  43.779250  11.246260  2011-02-04 10:34:04    1         4              3  0.333333
12  43.843014  10.507994  2011-02-03 08:34:04    2         1              1  1.000000
13  43.708530  10.403600  2011-02-03 09:34:04    2         1              2  1.000000
14  43.843014  10.507994  2011-02-04 10:34:04    2         1              3  1.000000
15  43.843014  10.507994  2011-02-03 08:34:04    2         2              1  0.333333
16  43.708530  10.403600  2011-02-03 09:34:04    2         2              2  0.333333
17  43.544270  10.326150  2011-02-04 11:34:04    2         2              3  0.333333
18  43.843014  10.507994  2011-02-03 08:34:04    2         3              1  1.000000
19  43.843014  10.507994  2011-02-04 10:34:04    2         3              2  1.000000
20  43.544270  10.326150  2011-02-04 11:34:04    2         3              3  1.000000
21  43.708530  10.403600  2011-02-03 09:34:04    2         4              1  0.333333
22  43.843014  10.507994  2011-02-04 10:34:04    2         4              2  0.333333
23  43.544270  10.326150  2011-02-04 11:34:04    2         4              3  0.333333
References
- TIST2018
Roberto Pellungrini, Luca Pappalardo, Francesca Pratesi, and Anna Monreale. 2017. A Data Mining Approach to Assess Privacy Risk in Human Mobility Data. ACM Trans. Intell. Syst. Technol. 9, 3, Article 31 (December 2017), 27 pages. DOI: https://doi.org/10.1145/3106774
- MOB2018
Roberto Pellungrini, Luca Pappalardo, Francesca Pratesi, Anna Monreale: Analyzing Privacy Risk in Human Mobility Data. STAF Workshops 2018: 114-129
- assess_risk(traj, targets=None, force_instances=False, show_progress=False)
Assess privacy risk for a TrajectoryDataFrame. An attack must implement an assessment strategy, which may involve some preprocessing, for example transforming the original data, and calls to the risk function. If the risk does not need to be computed for the entire dataset, the targets parameter can be used to restrict the assessment to a subset of users.
- Parameters
traj (TrajectoryDataFrame) – the dataframe on which to assess privacy risk.
targets (TrajectoryDataFrame or list, optional) – the user ids targeted by the attack. They must be compatible with the trajectory data. If None, the risk is computed for all users in traj. The default is None.
force_instances (boolean, optional) – if True, returns all possible instances of background knowledge together with their respective probability of reidentification. The default is False.
show_progress (boolean, optional) – if True, shows the progress of the computation. The default is False.
- Returns
a DataFrame with the privacy risk for each user, in the form (user_id, risk).
- Return type
DataFrame
- class skmob.privacy.attacks.LocationFrequencyAttack(knowledge_length, tolerance=0.0)
Location Frequency Attack
In a location frequency attack the adversary knows the coordinates of the unique locations visited by an individual and the frequency with which they were visited, and matches them against frequency vectors. A frequency vector is an aggregation of trajectory data that lists the unique locations visited by an individual together with the frequency with which they were visited. It is possible to specify a tolerance level for the matching of the frequency.
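To make the role of the tolerance concrete, the sketch below tests whether a known (location, frequency) pair is compatible with a user's frequency vector. The matching rule used here — the observed frequency must lie within a relative band around the known one — is an assumption chosen for illustration and is not necessarily the exact rule implemented by scikit-mobility.

def frequency_matches(known_freq, observed_freq, tolerance=0.0):
    # Assumed rule: |observed - known| <= tolerance * known.
    return abs(observed_freq - known_freq) <= tolerance * known_freq

# Frequency vector of one user: (lat, lng) -> number of visits.
freq_vector = {(43.54, 10.33): 4, (43.71, 10.40): 2}

# Background knowledge: the adversary believes the user visited this place 3 times.
known_location, known_freq = (43.54, 10.33), 3
observed_freq = freq_vector.get(known_location, 0)

print(frequency_matches(known_freq, observed_freq, tolerance=0.0))  # False: 4 != 3
print(frequency_matches(known_freq, observed_freq, tolerance=0.5))  # True: |4 - 3| <= 1.5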
- Parameters
knowledge_length (int) – the length of the background knowledge that we want to simulate. The length of the background knowledge specifies the amount of knowledge that the adversary will use for her attack. For each individual all the combinations of points of length k will be evaluated.
tolerance (float, optional) – the tolerance with which to match the frequency. It can assume values between 0 and 1. The default is 0.
- Variables
knowledge_length (int) – the length of the background knowledge that we want to simulate.
tolerance (float) – the tolerance with which to match the frequency.
Examples
>>> import skmob
>>> from skmob.privacy import attacks
>>> from skmob.core.trajectorydataframe import TrajDataFrame
>>> # load data
>>> url_priv_ex = "https://github.com/scikit-mobility/tutorials/blob/master/AMLD%202020/data/privacy_toy.csv"
>>> trjdat = TrajDataFrame.from_file(filename=url_priv_ex)
>>> # create a location frequency attack and assess risk
>>> at = attacks.LocationFrequencyAttack(knowledge_length=2)
>>> r = at.assess_risk(trjdat)
>>> print(r)
   uid      risk
0    1  0.333333
1    2  1.000000
2    3  0.333333
3    4  0.333333
4    5  0.333333
5    6  0.333333
6    7  1.000000
>>> # change the tolerance with which the frequency is matched
>>> at.tolerance = 0.5
>>> r = at.assess_risk(trjdat)
>>> print(r)
   uid      risk
0    1  0.333333
1    2  1.000000
2    3  0.333333
3    4  0.333333
4    5  0.250000
5    6  0.250000
6    7  1.000000
>>> # change the length of the background knowledge and reassess risk
>>> at.knowledge_length = 3
>>> r = at.assess_risk(trjdat)
>>> print(r)
   uid      risk
0    1  0.500000
1    2  1.000000
2    3  0.500000
3    4  0.333333
4    5  0.333333
5    6  0.250000
6    7  1.000000
>>> # limit privacy assessment to some target uids
>>> r = at.assess_risk(trjdat, targets=[1,2])
>>> print(r)
   uid  risk
0    1   0.5
1    2   1.0
>>> # inspect probability of reidentification for each background knowledge instance
>>> r = at.assess_risk(trjdat, targets=[1,2], force_instances=True)
>>> print(r)
    lat        lng   datetime  uid  instance  instance_elem      prob
0   1.0  43.544270  10.326150  1.0         1              1  0.333333
1   1.0  43.708530  10.403600  1.0         1              2  0.333333
2   1.0  43.779250  11.246260  1.0         1              3  0.333333
3   1.0  43.544270  10.326150  1.0         2              1  0.333333
4   1.0  43.708530  10.403600  1.0         2              2  0.333333
5   1.0  43.843014  10.507994  1.0         2              3  0.333333
6   1.0  43.544270  10.326150  1.0         3              1  0.500000
7   1.0  43.779250  11.246260  1.0         3              2  0.500000
8   1.0  43.843014  10.507994  1.0         3              3  0.500000
9   1.0  43.708530  10.403600  1.0         4              1  0.333333
10  1.0  43.779250  11.246260  1.0         4              2  0.333333
11  1.0  43.843014  10.507994  1.0         4              3  0.333333
12  2.0  43.544270  10.326150  1.0         1              1  1.000000
13  2.0  43.708530  10.403600  1.0         1              2  1.000000
14  2.0  43.843014  10.507994  2.0         1              3  1.000000
References
- TIST2018
Roberto Pellungrini, Luca Pappalardo, Francesca Pratesi, and Anna Monreale. 2017. A Data Mining Approach to Assess Privacy Risk in Human Mobility Data. ACM Trans. Intell. Syst. Technol. 9, 3, Article 31 (December 2017), 27 pages. DOI: https://doi.org/10.1145/3106774
- MOB2018
Roberto Pellungrini, Luca Pappalardo, Francesca Pratesi, Anna Monreale: Analyzing Privacy Risk in Human Mobility Data. STAF Workshops 2018: 114-129
- assess_risk(traj, targets=None, force_instances=False, show_progress=False)
Assess privacy risk for a TrajectoryDataFrame. An attack must implement an assessment strategy, which may involve some preprocessing, for example transforming the original data, and calls to the risk function. If the risk does not need to be computed for the entire dataset, the targets parameter can be used to restrict the assessment to a subset of users.
- Parameters
traj (TrajectoryDataFrame) – the dataframe on which to assess privacy risk.
targets (TrajectoryDataFrame or list, optional) – the user ids targeted by the attack. They must be compatible with the trajectory data. If None, the risk is computed for all users in traj. The default is None.
force_instances (boolean, optional) – if True, returns all possible instances of background knowledge together with their respective probability of reidentification. The default is False.
show_progress (boolean, optional) – if True, shows the progress of the computation. The default is False.
- Returns
a DataFrame with the privacy risk for each user, in the form (user_id, risk).
- Return type
DataFrame
- class skmob.privacy.attacks.LocationProbabilityAttack(knowledge_length, tolerance=0.0)
Location Probability Attack
In a location probability attack the adversary knows the coordinates of the unique locations visited by an individual and the probability with which they were visited, and matches them against probability vectors. A probability vector is an aggregation of trajectory data that lists the unique locations visited by an individual together with the probability with which they were visited. It is possible to specify a tolerance level for the matching of the probability.
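As a small sketch of what a probability vector is (an illustration only, not a call into scikit-mobility's internals), it can be obtained by normalizing a user's visit counts so that they sum to one.

import pandas as pd

# Frequency vector of one user: unique locations and their visit counts.
freq_vector = pd.DataFrame({
    "lat": [43.54, 43.71, 43.84],
    "lng": [10.33, 10.40, 10.51],
    "frequency": [4, 2, 2],
})

# Probability vector: visit counts normalized to sum to one.
prob_vector = freq_vector.assign(
    probability=freq_vector["frequency"] / freq_vector["frequency"].sum()
).drop(columns="frequency")
print(prob_vector)  # probabilities 0.50, 0.25, 0.25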
- Parameters
knowledge_length (int) – the length of the background knowledge that we want to simulate. The length of the background knowledge specifies the amount of knowledge that the adversary will use for her attack. For each individual all the combinations of points of length k will be evaluated.
tolerance (float, optional) – the tolerance with which to match the probability. It can assume values between 0 and 1. The default is 0.
- Variables
knowledge_length (int) – the length of the background knowledge that we want to simulate.
tolerance (float) – the tolerance with which to match the probability.
Examples
>>> import skmob
>>> from skmob.privacy import attacks
>>> from skmob.core.trajectorydataframe import TrajDataFrame
>>> # load data
>>> url_priv_ex = "https://github.com/scikit-mobility/tutorials/blob/master/AMLD%202020/data/privacy_toy.csv"
>>> trjdat = TrajDataFrame.from_file(filename=url_priv_ex)
>>> # create a location probability attack and assess risk
>>> at = attacks.LocationProbabilityAttack(knowledge_length=2)
>>> r = at.assess_risk(trjdat)
>>> print(r)
   uid  risk
0    1   0.5
1    2   1.0
2    3   0.5
3    4   1.0
4    5   1.0
5    6   1.0
6    7   1.0
>>> # change the tolerance with which the probability is matched
>>> at.tolerance = 0.5
>>> r = at.assess_risk(trjdat)
>>> print(r)
   uid      risk
0    1  0.333333
1    2  0.500000
2    3  0.333333
3    4  0.333333
4    5  0.250000
5    6  1.000000
6    7  1.000000
>>> # change the length of the background knowledge and reassess risk
>>> at.knowledge_length = 3
>>> r = at.assess_risk(trjdat)
>>> print(r)
   uid      risk
0    1  0.500000
1    2  1.000000
2    3  0.500000
3    4  0.333333
4    5  0.333333
5    6  1.000000
6    7  1.000000
>>> # limit privacy assessment to some target uids
>>> r = at.assess_risk(trjdat, targets=[1,2])
>>> print(r)
   uid  risk
0    1   0.5
1    2   1.0
>>> # inspect probability of reidentification for each background knowledge instance
>>> r = at.assess_risk(trjdat, targets=[1,2], force_instances=True)
>>> print(r)
    lat        lng   datetime   uid  instance  instance_elem      prob
0   1.0  43.544270  10.326150  0.25         1              1  0.333333
1   1.0  43.708530  10.403600  0.25         1              2  0.333333
2   1.0  43.779250  11.246260  0.25         1              3  0.333333
3   1.0  43.544270  10.326150  0.25         2              1  0.333333
4   1.0  43.708530  10.403600  0.25         2              2  0.333333
5   1.0  43.843014  10.507994  0.25         2              3  0.333333
6   1.0  43.544270  10.326150  0.25         3              1  0.500000
7   1.0  43.779250  11.246260  0.25         3              2  0.500000
8   1.0  43.843014  10.507994  0.25         3              3  0.500000
9   1.0  43.708530  10.403600  0.25         4              1  0.333333
10  1.0  43.779250  11.246260  0.25         4              2  0.333333
11  1.0  43.843014  10.507994  0.25         4              3  0.333333
12  2.0  43.544270  10.326150  0.25         1              1  1.000000
13  2.0  43.708530  10.403600  0.25         1              2  1.000000
14  2.0  43.843014  10.507994  0.50         1              3  1.000000
References
- TIST2018
Roberto Pellungrini, Luca Pappalardo, Francesca Pratesi, and Anna Monreale. 2017. A Data Mining Approach to Assess Privacy Risk in Human Mobility Data. ACM Trans. Intell. Syst. Technol. 9, 3, Article 31 (December 2017), 27 pages. DOI: https://doi.org/10.1145/3106774
- MOB2018
Roberto Pellungrini, Luca Pappalardo, Francesca Pratesi, Anna Monreale: Analyzing Privacy Risk in Human Mobility Data. STAF Workshops 2018: 114-129
- assess_risk(traj, targets=None, force_instances=False, show_progress=False)
Assess privacy risk for a TrajectoryDataFrame. An attack must implement an assessment strategy, which may involve some preprocessing, for example transforming the original data, and calls to the risk function. If the risk does not need to be computed for the entire dataset, the targets parameter can be used to restrict the assessment to a subset of users.
- Parameters
traj (TrajectoryDataFrame) – the dataframe on which to assess privacy risk.
targets (TrajectoryDataFrame or list, optional) – the user ids targeted by the attack. They must be compatible with the trajectory data. If None, the risk is computed for all users in traj. The default is None.
force_instances (boolean, optional) – if True, returns all possible instances of background knowledge together with their respective probability of reidentification. The default is False.
show_progress (boolean, optional) – if True, shows the progress of the computation. The default is False.
- Returns
a DataFrame with the privacy risk for each user, in the form (user_id, risk).
- Return type
DataFrame
- class skmob.privacy.attacks.LocationProportionAttack(knowledge_length, tolerance=0.0)
Location Proportion Attack
In a location proportion attack the adversary knows the coordinates of the unique locations visited by an individual and the relative proportions between their visit frequencies, and matches them against frequency vectors. A frequency vector is an aggregation of trajectory data that lists the unique locations visited by an individual together with the frequency with which they were visited. It is possible to specify a tolerance level for the matching of the proportion.
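A minimal sketch of the proportion idea follows, under the assumption (made only for illustration) that proportions are expressed relative to the most visited location in the background knowledge and compared within the given tolerance; this is not necessarily scikit-mobility's exact matching rule.

def proportions(freqs):
    # Frequencies rescaled by the largest one, e.g. [4, 2] -> [1.0, 0.5].
    top = max(freqs)
    return [f / top for f in freqs]

def proportion_matches(known_freqs, observed_freqs, tolerance=0.0):
    # Assumed rule: every pair of corresponding proportions differs by at most the tolerance.
    return all(
        abs(k - o) <= tolerance
        for k, o in zip(proportions(known_freqs), proportions(observed_freqs))
    )

# Adversary knows two locations visited in a 4:2 ratio; the data shows 6:3 and 6:2.
print(proportion_matches([4, 2], [6, 3], tolerance=0.0))  # True: both follow a 1.0 : 0.5 pattern
print(proportion_matches([4, 2], [6, 2], tolerance=0.0))  # False: 0.5 vs 0.33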
- Parameters
knowledge_length (int) – the length of the background knowledge that we want to simulate. The length of the background knowledge specifies the amount of knowledge that the adversary will use for her attack. For each individual all the combinations of points of length k will be evaluated.
tolerance (float, optional) – the tolerance with which to match the proportion. It can assume values between 0 and 1. The default is 0.
- Variables
knowledge_length (int) – the length of the background knowledge that we want to simulate.
tolerance (float) – the tolerance with which to match the proportion.
Examples
>>> import skmob
>>> from skmob.privacy import attacks
>>> from skmob.core.trajectorydataframe import TrajDataFrame
>>> # load data
>>> url_priv_ex = "https://github.com/scikit-mobility/tutorials/blob/master/AMLD%202020/data/privacy_toy.csv"
>>> trjdat = TrajDataFrame.from_file(filename=url_priv_ex)
>>> # create a location proportion attack and assess risk
>>> at = attacks.LocationProportionAttack(knowledge_length=2)
>>> r = at.assess_risk(trjdat)
>>> print(r)
   uid      risk
0    1  0.333333
1    2  1.000000
2    3  0.333333
3    4  0.333333
4    5  0.333333
5    6  0.333333
6    7  1.000000
>>> # change the tolerance with which the proportion is matched
>>> at.tolerance = 0.5
>>> r = at.assess_risk(trjdat)
>>> print(r)
   uid      risk
0    1  0.333333
1    2  0.250000
2    3  0.333333
3    4  0.333333
4    5  0.333333
5    6  0.333333
6    7  0.250000
>>> # change the length of the background knowledge and reassess risk
>>> at.knowledge_length = 3
>>> r = at.assess_risk(trjdat)
>>> print(r)
   uid      risk
0    1  0.500000
1    2  0.333333
2    3  0.500000
3    4  0.333333
4    5  0.333333
5    6  0.333333
6    7  0.250000
>>> # limit privacy assessment to some target uids
>>> r = at.assess_risk(trjdat, targets=[1,2])
>>> print(r)
   uid      risk
0    1  0.500000
1    2  0.333333
>>> # inspect probability of reidentification for each background knowledge instance
>>> r = at.assess_risk(trjdat, targets=[1,2], force_instances=True)
>>> print(r)
    lat        lng   datetime  uid  instance  instance_elem      prob
0   1.0  43.544270  10.326150  1.0         1              1  0.333333
1   1.0  43.708530  10.403600  1.0         1              2  0.333333
2   1.0  43.779250  11.246260  1.0         1              3  0.333333
3   1.0  43.544270  10.326150  1.0         2              1  0.500000
4   1.0  43.708530  10.403600  1.0         2              2  0.500000
5   1.0  43.843014  10.507994  1.0         2              3  0.500000
6   1.0  43.544270  10.326150  1.0         3              1  0.500000
7   1.0  43.779250  11.246260  1.0         3              2  0.500000
8   1.0  43.843014  10.507994  1.0         3              3  0.500000
9   1.0  43.708530  10.403600  1.0         4              1  0.333333
10  1.0  43.779250  11.246260  1.0         4              2  0.333333
11  1.0  43.843014  10.507994  1.0         4              3  0.333333
12  2.0  43.544270  10.326150  1.0         1              1  0.333333
13  2.0  43.708530  10.403600  1.0         1              2  0.333333
14  2.0  43.843014  10.507994  2.0         1              3  0.333333
References
- TIST2018
Roberto Pellungrini, Luca Pappalardo, Francesca Pratesi, and Anna Monreale. 2017. A Data Mining Approach to Assess Privacy Risk in Human Mobility Data. ACM Trans. Intell. Syst. Technol. 9, 3, Article 31 (December 2017), 27 pages. DOI: https://doi.org/10.1145/3106774
- MOB2018
Roberto Pellungrini, Luca Pappalardo, Francesca Pratesi, Anna Monreale: Analyzing Privacy Risk in Human Mobility Data. STAF Workshops 2018: 114-129
- assess_risk(traj, targets=None, force_instances=False, show_progress=False)
Assess privacy risk for a TrajectoryDataFrame. An attack must implement an assessment strategy, which may involve some preprocessing, for example transforming the original data, and calls to the risk function. If the risk does not need to be computed for the entire dataset, the targets parameter can be used to restrict the assessment to a subset of users.
- Parameters
traj (TrajectoryDataFrame) – the dataframe on which to assess privacy risk.
targets (TrajectoryDataFrame or list, optional) – the user ids targeted by the attack. They must be compatible with the trajectory data. If None, the risk is computed for all users in traj. The default is None.
force_instances (boolean, optional) – if True, returns all possible instances of background knowledge together with their respective probability of reidentification. The default is False.
show_progress (boolean, optional) – if True, shows the progress of the computation. The default is False.
- Returns
a DataFrame with the privacy risk for each user, in the form (user_id, risk).
- Return type
DataFrame
- class skmob.privacy.attacks.LocationSequenceAttack(knowledge_length)
Location Sequence Attack
In a location sequence attack the adversary knows the coordinates of the locations visited by an individual and the order in which they were visited, and matches them against trajectories.
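Conceptually, matching an ordered instance against a trajectory is a subsequence test: the known locations must appear in the trajectory in the same relative order, though not necessarily consecutively. The helper below is an illustrative sketch of that test, not scikit-mobility's implementation.

def is_ordered_subsequence(instance, trajectory):
    # True if the points of `instance` occur in `trajectory` in the same relative order.
    it = iter(trajectory)
    return all(point in it for point in instance)

# Trajectory of one user as an ordered list of (lat, lng) points.
trajectory = [(43.84, 10.51), (43.54, 10.33), (43.71, 10.40), (43.78, 11.25)]

print(is_ordered_subsequence([(43.54, 10.33), (43.78, 11.25)], trajectory))  # True
print(is_ordered_subsequence([(43.78, 11.25), (43.54, 10.33)], trajectory))  # False: wrong order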
- Parameters
knowledge_length (int) – the length of the background knowledge that we want to simulate. The length of the background knowledge specifies the amount of knowledge that the adversary will use for her attack. For each individual all the combinations of points of length k will be evaluated.
- Variables
knowledge_length (int) – the length of the background knowledge that we want to simulate.
Examples
>>> import skmob
>>> from skmob.privacy import attacks
>>> from skmob.core.trajectorydataframe import TrajDataFrame
>>> # load data
>>> url_priv_ex = "https://github.com/scikit-mobility/tutorials/blob/master/AMLD%202020/data/privacy_toy.csv"
>>> trjdat = TrajDataFrame.from_file(filename=url_priv_ex)
>>> # create a location sequence attack and assess risk
>>> at = attacks.LocationSequenceAttack(knowledge_length=2)
>>> r = at.assess_risk(trjdat)
>>> print(r)
   uid      risk
0    1  0.500000
1    2  0.500000
2    3  1.000000
3    4  0.500000
4    5  1.000000
5    6  0.333333
6    7  0.500000
>>> # change the length of the background knowledge and reassess risk
>>> at.knowledge_length = 3
>>> r = at.assess_risk(trjdat)
>>> print(r)
   uid      risk
0    1  1.000000
1    2  1.000000
2    3  1.000000
3    4  1.000000
4    5  1.000000
5    6  0.333333
6    7  1.000000
>>> # limit privacy assessment to some target uids
>>> r = at.assess_risk(trjdat, targets=[1,2])
>>> print(r)
   uid  risk
0    1   1.0
1    2   1.0
>>> # inspect probability of reidentification for each background knowledge instance
>>> r = at.assess_risk(trjdat, targets=[1,2], force_instances=True)
>>> print(r)
          lat        lng             datetime  uid  instance  instance_elem  prob
0   43.843014  10.507994  2011-02-03 08:34:04    1         1              1   1.0
1   43.544270  10.326150  2011-02-03 09:34:04    1         1              2   1.0
2   43.708530  10.403600  2011-02-03 10:34:04    1         1              3   1.0
3   43.843014  10.507994  2011-02-03 08:34:04    1         2              1   1.0
4   43.544270  10.326150  2011-02-03 09:34:04    1         2              2   1.0
5   43.779250  11.246260  2011-02-04 10:34:04    1         2              3   1.0
6   43.843014  10.507994  2011-02-03 08:34:04    1         3              1   1.0
7   43.708530  10.403600  2011-02-03 10:34:04    1         3              2   1.0
8   43.779250  11.246260  2011-02-04 10:34:04    1         3              3   1.0
9   43.544270  10.326150  2011-02-03 09:34:04    1         4              1   0.5
10  43.708530  10.403600  2011-02-03 10:34:04    1         4              2   0.5
11  43.779250  11.246260  2011-02-04 10:34:04    1         4              3   0.5
12  43.843014  10.507994  2011-02-03 08:34:04    2         1              1   1.0
13  43.708530  10.403600  2011-02-03 09:34:04    2         1              2   1.0
14  43.843014  10.507994  2011-02-04 10:34:04    2         1              3   1.0
15  43.843014  10.507994  2011-02-03 08:34:04    2         2              1   1.0
16  43.708530  10.403600  2011-02-03 09:34:04    2         2              2   1.0
17  43.544270  10.326150  2011-02-04 11:34:04    2         2              3   1.0
18  43.843014  10.507994  2011-02-03 08:34:04    2         3              1   1.0
19  43.843014  10.507994  2011-02-04 10:34:04    2         3              2   1.0
20  43.544270  10.326150  2011-02-04 11:34:04    2         3              3   1.0
21  43.708530  10.403600  2011-02-03 09:34:04    2         4              1   1.0
22  43.843014  10.507994  2011-02-04 10:34:04    2         4              2   1.0
23  43.544270  10.326150  2011-02-04 11:34:04    2         4              3   1.0
References
- TIST2018
Roberto Pellungrini, Luca Pappalardo, Francesca Pratesi, and Anna Monreale. 2017. A Data Mining Approach to Assess Privacy Risk in Human Mobility Data. ACM Trans. Intell. Syst. Technol. 9, 3, Article 31 (December 2017), 27 pages. DOI: https://doi.org/10.1145/3106774
- MOB2018
Roberto Pellungrini, Luca Pappalardo, Francesca Pratesi, Anna Monreale: Analyzing Privacy Risk in Human Mobility Data. STAF Workshops 2018: 114-129
- assess_risk(traj, targets=None, force_instances=False, show_progress=False)
Assess privacy risk for a TrajectoryDataFrame. An attack must implement an assessment strategy, which may involve some preprocessing, for example transforming the original data, and calls to the risk function. If the risk does not need to be computed for the entire dataset, the targets parameter can be used to restrict the assessment to a subset of users.
- Parameters
traj (TrajectoryDataFrame) – the dataframe on which to assess privacy risk.
targets (TrajectoryDataFrame or list, optional) – the user ids targeted by the attack. They must be compatible with the trajectory data. If None, the risk is computed for all users in traj. The default is None.
force_instances (boolean, optional) – if True, returns all possible instances of background knowledge together with their respective probability of reidentification. The default is False.
show_progress (boolean, optional) – if True, shows the progress of the computation. The default is False.
- Returns
a DataFrame with the privacy risk for each user, in the form (user_id, risk).
- Return type
DataFrame
- class skmob.privacy.attacks.LocationTimeAttack(knowledge_length, time_precision='Hour')
Location Time Attack
In a location time attack the adversary knows the coordinates of the locations visited by an individual and the times at which they were visited, and matches them against trajectories. The precision at which to consider the temporal information can also be specified.
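A small sketch of what the time precision means in practice (an illustration, not scikit-mobility's code): timestamps can be rendered at the chosen granularity before matching, so two visits count as the same point of background knowledge only if they coincide at that precision. The format mapping below is an assumption made for this example.

import pandas as pd

# Assumed truncation formats for the supported precisions (for illustration only).
PRECISION_FORMAT = {
    "Year": "%Y", "Month": "%Y-%m", "Day": "%Y-%m-%d",
    "Hour": "%Y-%m-%d %H", "Minute": "%Y-%m-%d %H:%M", "Second": "%Y-%m-%d %H:%M:%S",
}

def truncate(timestamp, precision="Hour"):
    # Render a timestamp at the requested precision before comparing visits.
    return pd.Timestamp(timestamp).strftime(PRECISION_FORMAT[precision])

t1, t2 = "2011-02-03 08:34:04", "2011-02-03 08:59:59"
print(truncate(t1, "Hour") == truncate(t2, "Hour"))      # True: same hour
print(truncate(t1, "Minute") == truncate(t2, "Minute"))  # False: different minutes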
- Parameters
knowledge_length (int) – the length of the background knowledge that we want to simulate. The length of the background knowledge specifies the amount of knowledge that the adversary will use for her attack. For each individual all the combinations of points of length k will be evaluated.
time_precision (string, optional) – the precision at which to consider the timestamps for the visits. The possible precisions are: Year, Month, Day, Hour, Minute, Second. The default is Hour.
- Variables
knowledge_length (int) – the length of the background knowledge that we want to simulate.
time_precision (string) – the precision at which to consider the timestamps for the visits.
Examples
>>> import skmob
>>> from skmob.privacy import attacks
>>> from skmob.core.trajectorydataframe import TrajDataFrame
>>> # load data
>>> url_priv_ex = "https://github.com/scikit-mobility/tutorials/blob/master/AMLD%202020/data/privacy_toy.csv"
>>> trjdat = TrajDataFrame.from_file(filename=url_priv_ex)
>>> # create a location time attack and assess risk
>>> at = attacks.LocationTimeAttack(knowledge_length=2)
>>> r = at.assess_risk(trjdat)
>>> print(r)
   uid  risk
0    1   1.0
1    2   1.0
2    3   1.0
3    4   1.0
4    5   1.0
5    6   0.5
6    7   1.0
>>> # change the time granularity of the attack
>>> at.time_precision = "Month"
>>> r = at.assess_risk(trjdat)
>>> print(r)
   uid      risk
0    1  0.333333
1    2  0.500000
2    3  0.333333
3    4  0.333333
4    5  0.250000
5    6  0.250000
6    7  0.500000
>>> # change the length of the background knowledge and reassess risk
>>> at.knowledge_length = 3
>>> r = at.assess_risk(trjdat)
>>> print(r)
   uid      risk
0    1  0.500000
1    2  1.000000
2    3  0.500000
3    4  0.333333
4    5  0.333333
5    6  0.250000
6    7  1.000000
>>> # limit privacy assessment to some target uids
>>> r = at.assess_risk(trjdat, targets=[1,2])
>>> print(r)
   uid      risk
0    1  0.500000
1    2  1.000000
>>> # inspect probability of reidentification for each background knowledge instance
>>> r = at.assess_risk(trjdat, targets=[1,2], force_instances=True)
>>> print(r)
          lat        lng             datetime  uid  instance  instance_elem      prob
0   43.843014  10.507994  2011-02-03 08:34:04    1         1              1  0.333333
1   43.544270  10.326150  2011-02-03 09:34:04    1         1              2  0.333333
2   43.708530  10.403600  2011-02-03 10:34:04    1         1              3  0.333333
3   43.843014  10.507994  2011-02-03 08:34:04    1         2              1  0.500000
4   43.544270  10.326150  2011-02-03 09:34:04    1         2              2  0.500000
5   43.779250  11.246260  2011-02-04 10:34:04    1         2              3  0.500000
6   43.843014  10.507994  2011-02-03 08:34:04    1         3              1  0.333333
7   43.708530  10.403600  2011-02-03 10:34:04    1         3              2  0.333333
8   43.779250  11.246260  2011-02-04 10:34:04    1         3              3  0.333333
9   43.544270  10.326150  2011-02-03 09:34:04    1         4              1  0.333333
10  43.708530  10.403600  2011-02-03 10:34:04    1         4              2  0.333333
11  43.779250  11.246260  2011-02-04 10:34:04    1         4              3  0.333333
12  43.843014  10.507994  2011-02-03 08:34:04    2         1              1  1.000000
13  43.708530  10.403600  2011-02-03 09:34:04    2         1              2  1.000000
14  43.843014  10.507994  2011-02-04 10:34:04    2         1              3  1.000000
15  43.843014  10.507994  2011-02-03 08:34:04    2         2              1  0.333333
16  43.708530  10.403600  2011-02-03 09:34:04    2         2              2  0.333333
17  43.544270  10.326150  2011-02-04 11:34:04    2         2              3  0.333333
18  43.843014  10.507994  2011-02-03 08:34:04    2         3              1  1.000000
19  43.843014  10.507994  2011-02-04 10:34:04    2         3              2  1.000000
20  43.544270  10.326150  2011-02-04 11:34:04    2         3              3  1.000000
21  43.708530  10.403600  2011-02-03 09:34:04    2         4              1  0.333333
22  43.843014  10.507994  2011-02-04 10:34:04    2         4              2  0.333333
23  43.544270  10.326150  2011-02-04 11:34:04    2         4              3  0.333333
References
- TIST2018
Roberto Pellungrini, Luca Pappalardo, Francesca Pratesi, and Anna Monreale. 2017. A Data Mining Approach to Assess Privacy Risk in Human Mobility Data. ACM Trans. Intell. Syst. Technol. 9, 3, Article 31 (December 2017), 27 pages. DOI: https://doi.org/10.1145/3106774
- MOB2018
Roberto Pellungrini, Luca Pappalardo, Francesca Pratesi, Anna Monreale: Analyzing Privacy Risk in Human Mobility Data. STAF Workshops 2018: 114-129
- assess_risk(traj, targets=None, force_instances=False, show_progress=False)
Assess privacy risk for a TrajectoryDataFrame. An attack must implement an assessment strategy, which may involve some preprocessing, for example transforming the original data, and calls to the risk function. If the risk does not need to be computed for the entire dataset, the targets parameter can be used to restrict the assessment to a subset of users.
- Parameters
traj (TrajectoryDataFrame) – the dataframe on which to assess privacy risk.
targets (TrajectoryDataFrame or list, optional) – the user ids targeted by the attack. They must be compatible with the trajectory data. If None, the risk is computed for all users in traj. The default is None.
force_instances (boolean, optional) – if True, returns all possible instances of background knowledge together with their respective probability of reidentification. The default is False.
show_progress (boolean, optional) – if True, shows the progress of the computation. The default is False.
- Returns
a DataFrame with the privacy risk for each user, in the form (user_id, risk).
- Return type
DataFrame
- class skmob.privacy.attacks.UniqueLocationAttack(knowledge_length)
Unique Location Attack
In a unique location attack the adversary knows the coordinates of the unique locations visited by an individual, and matches them against frequency vectors. A frequency vector is an aggregation of trajectory data that lists the unique locations visited by an individual together with the frequency with which they were visited.
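As a quick sketch of the matching idea (illustrative only, not scikit-mobility's code): since only the set of unique locations matters here, an instance matches a frequency vector exactly when all of its locations appear in that vector, regardless of the visit counts.

# Unique locations per user, taken from their frequency vectors (visit counts ignored).
unique_locations = {
    1: {(43.54, 10.33), (43.71, 10.40), (43.78, 11.25)},
    2: {(43.54, 10.33), (43.71, 10.40), (43.84, 10.51)},
}

# Background knowledge: two locations the adversary knows the target visited.
instance = {(43.54, 10.33), (43.71, 10.40)}

matching_users = [uid for uid, locs in unique_locations.items() if instance <= locs]
print(matching_users)           # [1, 2]: both users visited these two places
print(1 / len(matching_users))  # 0.5: probability of re-identification for this instance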
- Parameters
knowledge_length (int) – the length of the background knowledge that we want to simulate. The length of the background knowledge specifies the amount of knowledge that the adversary will use for her attack. For each individual all the combinations of points of length k will be evaluated.
- Variables
knowledge_length (int) – the length of the background knowledge that we want to simulate.
Examples
>>> import skmob
>>> from skmob.privacy import attacks
>>> from skmob.core.trajectorydataframe import TrajDataFrame
>>> # load data
>>> url_priv_ex = "https://github.com/scikit-mobility/tutorials/blob/master/AMLD%202020/data/privacy_toy.csv"
>>> trjdat = TrajDataFrame.from_file(filename=url_priv_ex)
>>> # create a unique location attack and assess risk
>>> at = attacks.UniqueLocationAttack(knowledge_length=2)
>>> r = at.assess_risk(trjdat)
>>> print(r)
   uid      risk
0    1  0.333333
1    2  0.250000
2    3  0.333333
3    4  0.333333
4    5  0.250000
5    6  0.250000
6    7  0.250000
>>> # change the length of the background knowledge and reassess risk
>>> at.knowledge_length = 3
>>> r = at.assess_risk(trjdat)
>>> print(r)
   uid      risk
0    1  0.500000
1    2  0.333333
2    3  0.500000
3    4  0.333333
4    5  0.333333
5    6  0.250000
6    7  0.250000
>>> # limit privacy assessment to some target uids
>>> r = at.assess_risk(trjdat, targets=[1,2])
>>> print(r)
   uid      risk
0    1  0.500000
1    2  0.333333
>>> # inspect probability of reidentification for each background knowledge instance
>>> r = at.assess_risk(trjdat, targets=[1,2], force_instances=True)
>>> print(r)
    lat        lng   datetime  uid  instance  instance_elem      prob
0   1.0  43.544270  10.326150  1.0         1              1  0.333333
1   1.0  43.708530  10.403600  1.0         1              2  0.333333
2   1.0  43.779250  11.246260  1.0         1              3  0.333333
3   1.0  43.544270  10.326150  1.0         2              1  0.333333
4   1.0  43.708530  10.403600  1.0         2              2  0.333333
5   1.0  43.843014  10.507994  1.0         2              3  0.333333
6   1.0  43.544270  10.326150  1.0         3              1  0.500000
7   1.0  43.779250  11.246260  1.0         3              2  0.500000
8   1.0  43.843014  10.507994  1.0         3              3  0.500000
9   1.0  43.708530  10.403600  1.0         4              1  0.333333
10  1.0  43.779250  11.246260  1.0         4              2  0.333333
11  1.0  43.843014  10.507994  1.0         4              3  0.333333
12  2.0  43.544270  10.326150  1.0         1              1  0.333333
13  2.0  43.708530  10.403600  1.0         1              2  0.333333
14  2.0  43.843014  10.507994  2.0         1              3  0.333333
References
- TIST2018
Roberto Pellungrini, Luca Pappalardo, Francesca Pratesi, and Anna Monreale. 2017. A Data Mining Approach to Assess Privacy Risk in Human Mobility Data. ACM Trans. Intell. Syst. Technol. 9, 3, Article 31 (December 2017), 27 pages. DOI: https://doi.org/10.1145/3106774
- MOB2018
Roberto Pellungrini, Luca Pappalardo, Francesca Pratesi, Anna Monreale: Analyzing Privacy Risk in Human Mobility Data. STAF Workshops 2018: 114-129
- assess_risk(traj, targets=None, force_instances=False, show_progress=False)
Assess privacy risk for a TrajectoryDataFrame. An attack must implement an assessment strategy, which may involve some preprocessing, for example transforming the original data, and calls to the risk function. If the risk does not need to be computed for the entire dataset, the targets parameter can be used to restrict the assessment to a subset of users.
- Parameters
traj (TrajectoryDataFrame) – the dataframe on which to assess privacy risk.
targets (TrajectoryDataFrame or list, optional) – the user ids targeted by the attack. They must be compatible with the trajectory data. If None, the risk is computed for all users in traj. The default is None.
force_instances (boolean, optional) – if True, returns all possible instances of background knowledge together with their respective probability of reidentification. The default is False.
show_progress (boolean, optional) – if True, shows the progress of the computation. The default is False.
- Returns
a DataFrame with the privacy risk for each user, in the form (user_id, risk).
- Return type
DataFrame