Federated learning process with Poisson subsampling of participants
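I have been running some experiments with TFF. In this one, I want to sample the participating clients in each training round using Poisson subsampling, where each client is sampled with probability p = users_per_round / num_users.
In each round, Poisson subsampling is repeated until the list sampled_ids is filled with unique IDs, equal in number to users_per_round.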
import numpy as np

total_rounds = 100
num_users = 500
users_per_round = 150
lambda_value = np.random.rand()

for round_num in range(total_rounds):
    sampled_ids = []
    # Keep drawing until exactly users_per_round unique IDs are collected.
    while len(sampled_ids) < users_per_round:
        subsampling = np.random.poisson(lambda_value, num_users)
        whether = subsampling > 1 - users_per_round / num_users
        for i in np.arange(num_users):
            if whether[i] and len(sampled_ids) < users_per_round and i not in sampled_ids:
                sampled_ids.append(i)
    sampled_clients = [train_data.client_ids[i] for i in sampled_ids]
    sampled_train_data = [
        train_data.create_tf_dataset_for_client(client)
        for client in sampled_clients
    ]
    server_state, train_metrics = iterative_process.next(server_state, sampled_train_data)
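Is there a better way to perform Poisson subsampling, especially when the subsampling is applied in differentially private FL, so that the RDP accountant yields an accurate privacy analysis?
What is the best strategy for setting the lambda value, rather than using a random value?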
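Poisson subsampling means that each user is included with probability q. If q is small, the number of users you get per round from this process is approximately Poisson distributed. If you want to sample so that in expectation you have users_per_round users in a round, you can do the following: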
users_this_round = np.random.poisson(users_per_round)
sampled_ids = np.random.choice(num_users, size=users_this_round, replace=False)
If you want to select exactly users_per_round users (which is technically not Poisson subsampling), you can do this:
sampled_ids = np.random.choice(num_users, size=users_per_round, replace=False)
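The definition at the start of this answer (each user is included independently with probability q) can also be implemented directly, without the Poisson approximation of the round size. The following is a minimal sketch under assumptions: the helper name poisson_subsample and the fixed seed are hypothetical and only for illustration, and q is taken to be users_per_round / num_users as in the question. The round size then varies from round to round, with mean num_users * q.

```python
import numpy as np


def poisson_subsample(num_users, q, rng):
    """Hypothetical helper: include each user independently with probability q."""
    # One uniform draw per user; a user is selected when the draw falls below q.
    mask = rng.random(num_users) < q
    # Indices of the selected users, sorted and unique by construction.
    return np.flatnonzero(mask)


num_users = 500
users_per_round = 150
q = users_per_round / num_users  # per-user sampling probability (assumed, as in the question)

rng = np.random.default_rng(0)  # seed fixed only to make the sketch reproducible
sampled_ids = poisson_subsample(num_users, q, rng)

# The number of sampled users is random; averaged over many rounds
# it should be close to num_users * q.
round_sizes = [len(poisson_subsample(num_users, q, rng)) for _ in range(2000)]
mean_round_size = float(np.mean(round_sizes))
```

The resulting sampled_ids can be used in place of the list built in the question's loop. Because the round size is not fixed, a round may occasionally contain far fewer or more users than users_per_round, so downstream code should not assume a constant batch of clients.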