Using create_tf_dataset_for_client() to define the training examples in the dataset

Question

I am preparing a dataset for a federated setting. In the code below I have multiple CSV files, and each file is treated as one client.

import tensorflow as tf
import tensorflow_federated as tff

dataset_paths = {
  'client_0': '/content/drive/ds1.csv',
  'client_1': '/content/drive/ds2.csv',
  'client_2': '/content/drive/ds3.csv',
  'client_3': '/content/drive/ds4.csv',
  'client_4': '/content/drive/ds5.csv',
}
# Define the dtype of each column in the datasets
record_defaults = [int(), int(), int(), int(), float(),float(),float(),float(),float(),float(), int(), int()]

@tf.function
def create_tf_dataset_for_client_fn(dataset_path):
   return tf.data.experimental.CsvDataset(
     dataset_path, record_defaults=record_defaults, header=True )

source = tff.simulation.datasets.FilePerUserClientData(
  dataset_paths, create_tf_dataset_for_client_fn)

I want to access the data so that I can identify the feature and label columns. So I run:

for x in source.create_tf_dataset_for_client('client_1'):
  print(x)  
>>> (<tf.Tensor: shape=(), dtype=int32, numpy=-2145209674>, <tf.Tensor: shape=(), dtype=int32, numpy=1>, <tf.Tensor: shape=(), dtype=int32, numpy=0>, <tf.Tensor: shape=(), dtype=int32, numpy=14>, <tf.Tensor: shape=(), dtype=float32, numpy=64.17>, <tf.Tensor: shape=(), dtype=float32, numpy=18.0>, <tf.Tensor: shape=(), dtype=float32, numpy=70.0>, <tf.Tensor: shape=(), dtype=float32, numpy=80.0>, <tf.Tensor: shape=(), dtype=float32, numpy=30.0>, <tf.Tensor: shape=(), dtype=float32, numpy=270.14>, <tf.Tensor: shape=(), dtype=int32, numpy=7>, <tf.Tensor: shape=(), dtype=int32, numpy=2>)
(<tf.Tensor: shape=(), dtype=int32, numpy=-2143677297>, <tf.Tensor: shape=(), dtype=int32, numpy=0>, <tf.Tensor: shape=(), dtype=int32, numpy=1>, <tf.Tensor: shape=(), dtype=int32, numpy=9>, <tf.Tensor: shape=(), dtype=float32, numpy=60.83>, <tf.Tensor: shape=(), dtype=float32, numpy=14.89>, <tf.Tensor: shape=(), dtype=float32, numpy=65.0>, <tf.Tensor: shape=(), dtype=float32, numpy=75.0>, <tf.Tensor: shape=(), dtype=float32, numpy=42.5>, <tf.Tensor: shape=(), dtype=float32, numpy=184.72>, <tf.Tensor: shape=(), dtype=int32, numpy=8>, <tf.Tensor: shape=(), dtype=int32, numpy=2>)
(<tf.Tensor: shape=(), dtype=int32, numpy=-2138537298>, <tf.Tensor: shape=(), dtype=int32, numpy=1>, <tf.Tensor: shape=(), dtype=int32, numpy=0>, <tf.Tensor: shape=(), dtype=int32, numpy=11>, <tf.Tensor: shape=(), dtype=float32, numpy=65.83>, <tf.Tensor: shape=(), dtype=float32, numpy=18.82>, <tf.Tensor: shape=(), dtype=float32, numpy=70.0>, <tf.Tensor: shape=(), dtype=float32, numpy=85.0>, <tf.Tensor: shape=(), dtype=float32, numpy=30.0>, <tf.Tensor: shape=(), dtype=float32, numpy=295.14>, <tf.Tensor: shape=(), dtype=int32, numpy=7>, <tf.Tensor: shape=(), dtype=int32, numpy=2>)
(<tf.Tensor: shape=(), dtype=int32, numpy=-2103817421>, <tf.Tensor: shape=(), dtype=int32, numpy=1>, <tf.Tensor: shape=(), dtype=int32, numpy=0>, <tf.Tensor: shape=(), dtype=int32, numpy=9>, <tf.Tensor: shape=(), dtype=float32, numpy=77.5>, <tf.Tensor: shape=(), dtype=float32, numpy=8.8>, <tf.Tensor: shape=(), dtype=float32, numpy=75.0>, <tf.Tensor: shape=(), dtype=float32, numpy=90.0>, <tf.Tensor: shape=(), dtype=float32, numpy=65.0>, <tf.Tensor: shape=(), dtype=float32, numpy=64.58>, <tf.Tensor: shape=(), dtype=int32, numpy=6>, <tf.Tensor: shape=(), dtype=int32, numpy=1>)
(<tf.Tensor: shape=(), dtype=int32, numpy=-2081702335>, <tf.Tensor: shape=(), dtype=int32, numpy=0>, <tf.Tensor: shape=(), dtype=int32, numpy=0>, <tf.Tensor: shape=(), dtype=int32, numpy=10>, <tf.Tensor: shape=(), dtype=float32, numpy=75.83>, <tf.Tensor: shape=(), dtype=float32, numpy=9.7>, <tf.Tensor: shape=(), dtype=float32, numpy=77.5>, <tf.Tensor: shape=(), dtype=float32, numpy=90.0>, <tf.Tensor: shape=(), dtype=float32, numpy=65.0>, <tf.Tensor: shape=(), dtype=float32, numpy=78.47>, <tf.Tensor: shape=(), dtype=int32, numpy=6>, <tf.Tensor: shape=(), dtype=int32, numpy=1>)
(<tf.Tensor: shape=(), dtype=int32, numpy=-2067936920>, <tf.Tensor: shape=(), dtype=int32, numpy=1>, <tf.Tensor: shape=(), dtype=int32, numpy=0>, <tf.Tensor: shape=(), dtype=int32, numpy=11>, <tf.Tensor: shape=(), dtype=float32, numpy=80.0>, <tf.Tensor: shape=(), dtype=float32, numpy=10.95>, <tf.Tensor: shape=(), dtype=float32, numpy=77.5>, <tf.Tensor: shape=(), dtype=float32, numpy=95.0>, <tf.Tensor: shape=(), dtype=float32, numpy=65.0>, <tf.Tensor: shape=(), dtype=float32, numpy=100.0>, <tf.Tensor: shape=(), dtype=int32, numpy=6>, <tf.Tensor: shape=(), dtype=int32, numpy=2>)
(<tf.Tensor: shape=(), dtype=int32, numpy=-2065922700>, <tf.Tensor: shape=(), dtype=int32, numpy=0>, <tf.Tensor: shape=(), dtype=int32, numpy=0>, <tf.Tensor: shape=(), dtype=int32, numpy=11>, <tf.Tensor: shape=(), dtype=float32, numpy=65.83>, <tf.Tensor: shape=(), dtype=float32, numpy=3.76>, <tf.Tensor: shape=(), dtype=float32, numpy=65.0>, <tf.Tensor: shape=(), dtype=float32, numpy=70.0>, <tf.Tensor: shape=(), dtype=float32, numpy=60.0>, <tf.Tensor: shape=(), dtype=float32, numpy=11.81>, <tf.Tensor: shape=(), dtype=int32, numpy=6>, <tf.Tensor: shape=(), dtype=int32, numpy=3>)

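Since my dataset is large, there are many more rows than shown here. I can access the data, but the elements are tensor objects.

Question 1: How do I express the equivalent of DataFrame.iloc[1:-1] #Features and DataFrame.iloc[:-1] #Label?

Question 2: How do I split each file into a training set and a test set to start the training process?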

Answer 1

You can try something like this:

import tensorflow as tf

# Create dummy data
samples = 5
data = (tf.random.uniform((samples,), maxval=50, dtype=tf.int32),
        tf.random.uniform((samples,), maxval=50, dtype=tf.int32),
        tf.random.uniform((samples,), maxval=50, dtype=tf.int32),
        tf.random.uniform((samples,), maxval=50, dtype=tf.int32),
        tf.random.normal((samples,)),
        tf.random.normal((samples,)),
        tf.random.normal((samples,)),
        tf.random.normal((samples,)),
        tf.random.normal((samples,)),
        tf.random.normal((samples,)),
        tf.random.uniform((samples,), maxval=50, dtype=tf.int32),
        tf.random.uniform((samples,), maxval=50, dtype=tf.int32))

client1_dataset = tf.data.Dataset.from_tensor_slices(data)

# Mirror the slices from the question: features = x[1:-1], label = x[:-1]
client1_dataset = client1_dataset.map(lambda *x: (x[1:-1], x[:-1]))

for x in client1_dataset:
  print(x)

((<tf.Tensor: id=2291, shape=(), dtype=int32, numpy=43>, <tf.Tensor: id=2292, shape=(), dtype=int32, numpy=47>, <tf.Tensor: id=2293, shape=(), dtype=int32, numpy=5>, <tf.Tensor: id=2294, shape=(), dtype=float32, numpy=0.6790141>, <tf.Tensor: id=2295, shape=(), dtype=float32, numpy=-0.996265>, <tf.Tensor: id=2296, shape=(), dtype=float32, numpy=-0.13631395>, <tf.Tensor: id=2297, shape=(), dtype=float32, numpy=-0.25907364>, <tf.Tensor: id=2298, shape=(), dtype=float32, numpy=-0.0063462467>, <tf.Tensor: id=2299, shape=(), dtype=float32, numpy=-0.6242705>, <tf.Tensor: id=2300, shape=(), dtype=int32, numpy=20>), (<tf.Tensor: id=2301, shape=(), dtype=int32, numpy=29>, <tf.Tensor: id=2302, shape=(), dtype=int32, numpy=43>, <tf.Tensor: id=2303, shape=(), dtype=int32, numpy=47>, <tf.Tensor: id=2304, shape=(), dtype=int32, numpy=5>, <tf.Tensor: id=2305, shape=(), dtype=float32, numpy=0.6790141>, <tf.Tensor: id=2306, shape=(), dtype=float32, numpy=-0.996265>, <tf.Tensor: id=2307, shape=(), dtype=float32, numpy=-0.13631395>, <tf.Tensor: id=2308, shape=(), dtype=float32, numpy=-0.25907364>, <tf.Tensor: id=2309, shape=(), dtype=float32, numpy=-0.0063462467>, <tf.Tensor: id=2310, shape=(), dtype=float32, numpy=-0.6242705>, <tf.Tensor: id=2311, shape=(), dtype=int32, numpy=20>))
((<tf.Tensor: id=2312, shape=(), dtype=int32, numpy=5>, <tf.Tensor: id=2313, shape=(), dtype=int32, numpy=29>, <tf.Tensor: id=2314, shape=(), dtype=int32, numpy=7>, <tf.Tensor: id=2315, shape=(), dtype=float32, numpy=-3.1088789>, <tf.Tensor: id=2316, shape=(), dtype=float32, numpy=1.1138679>, <tf.Tensor: id=2317, shape=(), dtype=float32, numpy=0.60722053>, <tf.Tensor: id=2318, shape=(), dtype=float32, numpy=0.22470044>, <tf.Tensor: id=2319, shape=(), dtype=float32, numpy=-0.9214293>, <tf.Tensor: id=2320, shape=(), dtype=float32, numpy=-0.40438855>, <tf.Tensor: id=2321, shape=(), dtype=int32, numpy=32>), (<tf.Tensor: id=2322, shape=(), dtype=int32, numpy=16>, <tf.Tensor: id=2323, shape=(), dtype=int32, numpy=5>, <tf.Tensor: id=2324, shape=(), dtype=int32, numpy=29>, <tf.Tensor: id=2325, shape=(), dtype=int32, numpy=7>, <tf.Tensor: id=2326, shape=(), dtype=float32, numpy=-3.1088789>, <tf.Tensor: id=2327, shape=(), dtype=float32, numpy=1.1138679>, <tf.Tensor: id=2328, shape=(), dtype=float32, numpy=0.60722053>, <tf.Tensor: id=2329, shape=(), dtype=float32, numpy=0.22470044>, <tf.Tensor: id=2330, shape=(), dtype=float32, numpy=-0.9214293>, <tf.Tensor: id=2331, shape=(), dtype=float32, numpy=-0.40438855>, <tf.Tensor: id=2332, shape=(), dtype=int32, numpy=32>))
((<tf.Tensor: id=2333, shape=(), dtype=int32, numpy=43>, <tf.Tensor: id=2334, shape=(), dtype=int32, numpy=17>, <tf.Tensor: id=2335, shape=(), dtype=int32, numpy=1>, <tf.Tensor: id=2336, shape=(), dtype=float32, numpy=0.26826212>, <tf.Tensor: id=2337, shape=(), dtype=float32, numpy=-0.2259336>, <tf.Tensor: id=2338, shape=(), dtype=float32, numpy=-1.5942549>, <tf.Tensor: id=2339, shape=(), dtype=float32, numpy=-0.8693648>, <tf.Tensor: id=2340, shape=(), dtype=float32, numpy=0.71869636>, <tf.Tensor: id=2341, shape=(), dtype=float32, numpy=-1.5996522>, <tf.Tensor: id=2342, shape=(), dtype=int32, numpy=16>), (<tf.Tensor: id=2343, shape=(), dtype=int32, numpy=6>, <tf.Tensor: id=2344, shape=(), dtype=int32, numpy=43>, <tf.Tensor: id=2345, shape=(), dtype=int32, numpy=17>, <tf.Tensor: id=2346, shape=(), dtype=int32, numpy=1>, <tf.Tensor: id=2347, shape=(), dtype=float32, numpy=0.26826212>, <tf.Tensor: id=2348, shape=(), dtype=float32, numpy=-0.2259336>, <tf.Tensor: id=2349, shape=(), dtype=float32, numpy=-1.5942549>, <tf.Tensor: id=2350, shape=(), dtype=float32, numpy=-0.8693648>, <tf.Tensor: id=2351, shape=(), dtype=float32, numpy=0.71869636>, <tf.Tensor: id=2352, shape=(), dtype=float32, numpy=-1.5996522>, <tf.Tensor: id=2353, shape=(), dtype=int32, numpy=16>))
((<tf.Tensor: id=2354, shape=(), dtype=int32, numpy=18>, <tf.Tensor: id=2355, shape=(), dtype=int32, numpy=35>, <tf.Tensor: id=2356, shape=(), dtype=int32, numpy=29>, <tf.Tensor: id=2357, shape=(), dtype=float32, numpy=-0.9065403>, <tf.Tensor: id=2358, shape=(), dtype=float32, numpy=0.52284646>, <tf.Tensor: id=2359, shape=(), dtype=float32, numpy=1.3090674>, <tf.Tensor: id=2360, shape=(), dtype=float32, numpy=0.98598105>, <tf.Tensor: id=2361, shape=(), dtype=float32, numpy=1.0676131>, <tf.Tensor: id=2362, shape=(), dtype=float32, numpy=-0.11418144>, <tf.Tensor: id=2363, shape=(), dtype=int32, numpy=46>), (<tf.Tensor: id=2364, shape=(), dtype=int32, numpy=45>, <tf.Tensor: id=2365, shape=(), dtype=int32, numpy=18>, <tf.Tensor: id=2366, shape=(), dtype=int32, numpy=35>, <tf.Tensor: id=2367, shape=(), dtype=int32, numpy=29>, <tf.Tensor: id=2368, shape=(), dtype=float32, numpy=-0.9065403>, <tf.Tensor: id=2369, shape=(), dtype=float32, numpy=0.52284646>, <tf.Tensor: id=2370, shape=(), dtype=float32, numpy=1.3090674>, <tf.Tensor: id=2371, shape=(), dtype=float32, numpy=0.98598105>, <tf.Tensor: id=2372, shape=(), dtype=float32, numpy=1.0676131>, <tf.Tensor: id=2373, shape=(), dtype=float32, numpy=-0.11418144>, <tf.Tensor: id=2374, shape=(), dtype=int32, numpy=46>))
((<tf.Tensor: id=2375, shape=(), dtype=int32, numpy=48>, <tf.Tensor: id=2376, shape=(), dtype=int32, numpy=23>, <tf.Tensor: id=2377, shape=(), dtype=int32, numpy=35>, <tf.Tensor: id=2378, shape=(), dtype=float32, numpy=-0.67218304>, <tf.Tensor: id=2379, shape=(), dtype=float32, numpy=2.060095>, <tf.Tensor: id=2380, shape=(), dtype=float32, numpy=0.33271575>, <tf.Tensor: id=2381, shape=(), dtype=float32, numpy=-0.073634386>, <tf.Tensor: id=2382, shape=(), dtype=float32, numpy=-0.7267375>, <tf.Tensor: id=2383, shape=(), dtype=float32, numpy=1.6494459>, <tf.Tensor: id=2384, shape=(), dtype=int32, numpy=13>), (<tf.Tensor: id=2385, shape=(), dtype=int32, numpy=36>, <tf.Tensor: id=2386, shape=(), dtype=int32, numpy=48>, <tf.Tensor: id=2387, shape=(), dtype=int32, numpy=23>, <tf.Tensor: id=2388, shape=(), dtype=int32, numpy=35>, <tf.Tensor: id=2389, shape=(), dtype=float32, numpy=-0.67218304>, <tf.Tensor: id=2390, shape=(), dtype=float32, numpy=2.060095>, <tf.Tensor: id=2391, shape=(), dtype=float32, numpy=0.33271575>, <tf.Tensor: id=2392, shape=(), dtype=float32, numpy=-0.073634386>, <tf.Tensor: id=2393, shape=(), dtype=float32, numpy=-0.7267375>, <tf.Tensor: id=2394, shape=(), dtype=float32, numpy=1.6494459>, <tf.Tensor: id=2395, shape=(), dtype=int32, numpy=13>))
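Note that x[:-1] keeps every column except the last, so the "label" printed above is an 11-element tuple. If the label is meant to be only the last column (the pandas equivalent of iloc[:, -1]) and the first column is an ID to drop, a variant might look like the sketch below. That column layout is an assumption, not something the question states, and the helper name split_features_label is hypothetical.

```python
import tensorflow as tf

# Dummy data with the same 12-column layout as the question
# (4 int columns, 6 float columns, 2 int columns).
samples = 5
columns = (
    [tf.random.uniform((samples,), maxval=50, dtype=tf.int32) for _ in range(4)]
    + [tf.random.normal((samples,)) for _ in range(6)]
    + [tf.random.uniform((samples,), maxval=50, dtype=tf.int32) for _ in range(2)]
)
dataset = tf.data.Dataset.from_tensor_slices(tuple(columns))


def split_features_label(*row):
    # Hypothetical helper: assumes column 0 is an ID to drop and the
    # last column is the label. Features are cast to float32 and stacked
    # into a single vector so a model can consume them directly.
    features = tf.stack([tf.cast(col, tf.float32) for col in row[1:-1]])
    label = row[-1]
    return features, label


labeled = dataset.map(split_features_label)
features, label = next(iter(labeled))
print(features.shape, label.dtype)
```

Stacking the feature columns into one tensor is a design choice; keeping them as a tuple, as in the answer above, also works if the model expects separate inputs.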

To create test and training subsets, simply use take and skip:

test = client1_dataset.take(2)
train = client1_dataset.skip(2)
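One caveat: take and skip split by position, so the test set is always the first rows in file order. If a random split is wanted, a possible sketch is to shuffle once with reshuffle_each_iteration=False so that the two subsets stay disjoint when they are iterated separately. The 20% test fraction and the assumption that the row count is known (and fits in the shuffle buffer) are illustrative only.

```python
import tensorflow as tf

# Stand-in for one client's dataset: 10 rows identified by their index.
client_dataset = tf.data.Dataset.range(10)

num_rows = 10                    # assumed known; otherwise count by iterating once
test_size = int(0.2 * num_rows)  # illustrative 20% test fraction

# Shuffle once and keep that order fixed. Without
# reshuffle_each_iteration=False, take() and skip() would see a different
# order on each iteration and the two subsets could overlap.
shuffled = client_dataset.shuffle(
    buffer_size=num_rows, seed=0, reshuffle_each_iteration=False)

test = shuffled.take(test_size)
train = shuffled.skip(test_size)

test_rows = {int(x) for x in test}
train_rows = {int(x) for x in train}
print(len(test_rows), len(train_rows), test_rows & train_rows)
```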

If you want to split each CSV file into test and training datasets, you should do that before creating the tf dataset.
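Splitting each file before building the tf dataset could be done in plain Python, for example by writing each client's rows into separate train and test CSV files and then passing the two sets of paths to two FilePerUserClientData objects. The sketch below uses only the standard library; the function name split_csv, the output file names, and the 80/20 ratio are illustrative assumptions.

```python
import csv
import os
import random
import tempfile


def split_csv(path, train_path, test_path, test_fraction=0.2, seed=0):
    """Write a shuffled train/test split of `path`, repeating the header in both files."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = list(reader)

    # Shuffle with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_fraction)

    for out_path, subset in ((test_path, rows[:n_test]), (train_path, rows[n_test:])):
        with open(out_path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            writer.writerows(subset)
    return len(rows) - n_test, n_test


# Usage on a small temporary file standing in for one client's CSV.
tmp_dir = tempfile.mkdtemp()
source_path = os.path.join(tmp_dir, "client_0.csv")
with open(source_path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "feature", "label"])
    writer.writerows([[i, i * 0.5, i % 2] for i in range(10)])

train_path = os.path.join(tmp_dir, "client_0_train.csv")
test_path = os.path.join(tmp_dir, "client_0_test.csv")
n_train, n_test = split_csv(source_path, train_path, test_path)
print(n_train, n_test)
```

Because both output files keep the header row, they can be read with the same CsvDataset settings (header=True) as the original files.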

Tags: python, tensorflow, tensorflow-datasets, tensorflow-federated, federated-learning