TensorFlow Dataset - what is the equivalent of pandas DataFrame.info?

A pandas DataFrame has an info method that lets us inspect its schema.

import pandas as pd

df = pd.read_csv(titanic_file)  # titanic_file: path to the Titanic CSV
df.info()
---
RangeIndex: 627 entries, 0 to 626
Data columns (total 10 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   survived            627 non-null    int64  
 1   sex                 627 non-null    object 
 2   age                 627 non-null    float64
 3   n_siblings_spouses  627 non-null    int64  
 4   parch               627 non-null    int64  
 5   fare                627 non-null    float64
 6   class               627 non-null    object 
 7   deck                627 non-null    object 
 8   embark_town         627 non-null    object 
 9   alone               627 non-null    object 
dtypes: float64(2), int64(3), object(5)
memory usage: 49.1+ KB

What is the equivalent for a TensorFlow Dataset, other than checking it column by column?

titanic = tf.data.experimental.make_csv_dataset(
    titanic_file,
    label_name="survived",
    batch_size=1,   # To compare with the head of the CSV
    shuffle=False,  # To compare with the head of the CSV
    header=True,
)

for row in titanic.take(1):  # Take the first batch
    features = row[0]        # Dictionary of column name -> tensor
    label = row[1]

    for feature, value in features.items():
        print(f"{feature:20s}: {value.dtype}")

    print(f"label/survived      : {label.dtype}")
---
sex                 : <dtype: 'string'>
age                 : <dtype: 'float32'>
n_siblings_spouses  : <dtype: 'int32'>
parch               : <dtype: 'int32'>
fare                : <dtype: 'float32'>
class               : <dtype: 'string'>
deck                : <dtype: 'string'>
embark_town         : <dtype: 'string'>
alone               : <dtype: 'string'>
label/survived      : <dtype: 'int32'>

The closest thing I can think of is tf.data.experimental.get_structure

import tensorflow as tf
import tensorflow_datasets as tfds

# Construct a tf.data.Dataset
ds = tfds.load('mnist', split='train', shuffle_files=True)
tf.data.experimental.get_structure(ds)

Output:

{'image': TensorSpec(shape=(28, 28, 1), dtype=tf.uint8, name=None),
 'label': TensorSpec(shape=(), dtype=tf.int64, name=None)}

For the titanic dataset (the columns may differ slightly depending on the source):

(OrderedDict([('PassengerId',
               TensorSpec(shape=(1,), dtype=tf.int32, name=None)),
              ('Pclass', TensorSpec(shape=(1,), dtype=tf.int32, name=None)),
              ('Name', TensorSpec(shape=(1,), dtype=tf.string, name=None)),
              ('Sex', TensorSpec(shape=(1,), dtype=tf.string, name=None)),
              ('Age', TensorSpec(shape=(1,), dtype=tf.float32, name=None)),
              ('SibSp', TensorSpec(shape=(1,), dtype=tf.int32, name=None)),
              ('Parch', TensorSpec(shape=(1,), dtype=tf.int32, name=None)),
              ('Ticket', TensorSpec(shape=(1,), dtype=tf.string, name=None)),
              ('Fare', TensorSpec(shape=(1,), dtype=tf.float32, name=None)),
              ('Cabin', TensorSpec(shape=(1,), dtype=tf.string, name=None)),
              ('Embarked',
               TensorSpec(shape=(1,), dtype=tf.string, name=None))]),
 TensorSpec(shape=(1,), dtype=tf.int32, name=None))
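
In TF 2.x the same structure is also available as the dataset's element_spec property, which is what get_structure returns. To get something closer to the one-line-per-column layout of df.info(), the spec can be walked recursively. The following is only a minimal sketch: describe_spec is a hypothetical helper (not part of TensorFlow), it assumes every leaf has shape and dtype attributes as TensorSpec does, and it uses a small in-memory dataset as a stand-in for the Titanic CSV.

```python
import tensorflow as tf


def describe_spec(spec, name="element", lines=None):
    """Collect one 'name shape dtype' line per leaf of a nested element spec.

    Hypothetical helper for illustration; assumes leaves expose
    .shape and .dtype (true for tf.TensorSpec).
    """
    if lines is None:
        lines = []
    if isinstance(spec, dict):
        # Feature dictionaries: use the key as the column name.
        for key, sub in spec.items():
            describe_spec(sub, str(key), lines)
    elif isinstance(spec, (tuple, list)):
        # E.g. a (features, label) pair: index each component.
        for i, sub in enumerate(spec):
            describe_spec(sub, f"{name}[{i}]", lines)
    else:
        lines.append(f"{name:20s} {str(spec.shape):10s} {spec.dtype.name}")
    return lines


# Toy stand-in for the CSV dataset: (features dict, label), batch size 1.
features = {"sex": ["male", "female"], "age": [22.0, 38.0]}
labels = [0, 1]
ds = tf.data.Dataset.from_tensor_slices((features, labels)).batch(1)

for line in describe_spec(ds.element_spec):
    print(line)
```

Unlike df.info(), this reports only names, shapes and dtypes. A tf.data.Dataset is evaluated lazily, so row counts and non-null counts are not part of the spec and would require iterating over the data.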