Track privacy guarantees in a federated learning process with DP-query
I am fairly new to TFF. I looked through GitHub and, following the EMNIST example, trained a differentially private federated model with the DP-FedAvg algorithm. This is done mainly by attaching a dp-query to the aggregation_process and then training the federated model.
I have one question:
1. Given that attaching a dp-query to the aggregation process yields participant-level central DP, how can I track the privacy guarantees (eps, delta) during training?
Below is a code snippet that sets up a differentially private federated model with 100 participants, which is why expected_total_weight and expected_clients_per_round are both set to 100:
def model_fn():
  keras_model = create_keras_model()
  return tff.learning.from_keras_model(
      keras_model=keras_model,
      loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
      input_spec=preprocessed_first_client_dataset.element_spec,
      metrics=[tf.keras.metrics.SparseCategoricalAccuracy()])

dp_query = tff.utils.build_dp_query(
    clip=1.0,
    noise_multiplier=0.3,
    expected_total_weight=100,
    adaptive_clip_learning_rate=0,
    target_unclipped_quantile=0.5,
    clipped_count_budget_allocation=0.1,
    expected_clients_per_round=100)

weights_type = tff.learning.framework.weights_type_from_model(model_fn)

aggregation_process = tff.utils.build_dp_aggregate_process(weights_type.trainable, dp_query)

iterative_process = tff.learning.build_federated_averaging_process(
    model_fn=model_fn,
    client_optimizer_fn=lambda: tf.keras.optimizers.SGD(0.1),
    server_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=1.0),
    aggregation_process=aggregation_process)
I came across several ways of computing epsilon and delta in TF-Privacy, but they seem designed to track the privacy guarantees of the traditional DP-SGD algorithm and expect arguments such as steps, n, and batch_size.
Many thanks.
There are a couple of ways to perform this computation; we will discuss two options below.
Repurposing the DP-SGD analysis tooling
You're right that these tools accept arguments named for the DP-SGD setting; however, their arguments can be remapped to the federated setting in a fairly straightforward manner.
Suppose we have the symbol apply_dp_sgd_analysis from TFP's analysis library. We can write a simple function that essentially adapts the body of compute_dp_sgd_privacy to the federated setting:
# Assumes apply_dp_sgd_analysis has been imported from TF Privacy's analysis
# library (e.g. tensorflow_privacy.privacy.analysis.compute_dp_sgd_privacy_lib;
# the exact module path may vary with your TF Privacy version).
def compute_fl_privacy(num_rounds, noise_multiplier, num_users, users_per_round):
  # This actually assumes Poisson subsampling, which may not be *quite*
  # right in your setting, but the approximation should be close either way.
  q = users_per_round / num_users  # q - the sampling ratio.

  # These orders are inlined from the body of compute_dp_sgd_privacy.
  orders = ([1.25, 1.5, 1.75, 2., 2.25, 2.5, 3., 3.5, 4., 4.5] +
            list(range(5, 64)) + [128, 256, 512])

  # Depending on whether your TFF code by default uses adaptive clipping or not,
  # you may need to rescale your noise_multiplier argument.
  return apply_dp_sgd_analysis(
      q, sigma=noise_multiplier, steps=num_rounds, orders=orders, delta=num_users ** (-1))
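As a purely illustrative usage with the numbers from your snippet (100 total clients, all 100 participating each round, noise_multiplier=0.3), you could call it as below. Note that with users_per_round equal to num_users the sampling ratio q is 1, so there is no privacy amplification from subsampling:
# Hypothetical call; 50 rounds is just an example value -- use however many
# rounds you actually trained for.
eps, opt_order = compute_fl_privacy(
    num_rounds=50, noise_multiplier=0.3, num_users=100, users_per_round=100)
print('Participant-level guarantee: eps = {:.3f} at delta = {}'.format(eps, 1 / 100))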
Using the TFP PrivacyLedger
If you are using the relatively recent tff.aggregators.DifferentiallyPrivateFactory (which I would suggest over the DP process used above), you can pass an already-constructed DPQuery, which can be decorated with a PrivacyLedger. This ledger can then be passed directly into a function like compute_rdp_from_ledger, which should track the privacy spent by each query call.
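To make that concrete, here is a rough sketch of what the wiring could look like. It assumes the PrivacyLedger, QueryWithLedger, GaussianSumQuery, compute_rdp_from_ledger and get_privacy_spent symbols exported by TensorFlow Privacy, and the DifferentiallyPrivateFactory constructor that accepts a pre-built DPQuery; exact module paths and argument names may shift between releases, so treat it as a starting point rather than a drop-in recipe.
import tensorflow_federated as tff
import tensorflow_privacy as tfp

num_users = 100          # total client population, matching your snippet
users_per_round = 100    # clients sampled per round

# Record every query call in a ledger so we can account for it later.
ledger = tfp.PrivacyLedger(
    population_size=num_users,
    selection_probability=users_per_round / num_users)

# Fixed-clip Gaussian sum query; stddev = noise_multiplier * clip = 0.3 * 1.0.
query = tfp.QueryWithLedger(
    tfp.GaussianSumQuery(l2_norm_clip=1.0, stddev=0.3),
    ledger=ledger)

# Hand the decorated query to the new-style aggregator factory.
dp_aggregator = tff.aggregators.DifferentiallyPrivateFactory(query)

# ... pass dp_aggregator when building the training process (in recent TFF
# versions via the model_update_aggregation_factory argument of
# tff.learning.build_federated_averaging_process), then train as usual ...

# After (or during) training, compute the guarantee from the ledger's record.
orders = [1.25, 1.5, 1.75, 2., 2.5, 3., 4., 8., 16., 32., 64.]
rdp = tfp.compute_rdp_from_ledger(ledger.get_formatted_ledger_eager(), orders)
eps, _, _ = tfp.get_privacy_spent(orders, rdp, target_delta=1 / num_users)
print('Participant-level guarantee so far: eps = {:.3f}'.format(eps))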