Normalize multiple columns of list/tuple data
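I have a DataFrame with multiple columns of tuple data. I'm trying to normalize the data within the tuple for each row, per column. This is an example with lists, but it should be the same concept for tuples: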
import numpy as np
import pandas as pd
from sklearn import preprocessing
df = pd.DataFrame(np.random.randn(5, 10), columns=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'])
df['arr1'] = df[['a', 'b', 'c', 'd', 'e']].values.tolist()
df['arr2'] = df[['f', 'g', 'h', 'i', 'j']].values.tolist()
If I want to normalize each list row for a few columns, I would do this:
df['arr1'] = [preprocessing.scale(row) for row in df['arr1']]
df['arr2'] = [preprocessing.scale(row) for row in df['arr2']]
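However, my original dataset has about 100 such columns, so I obviously don't want to normalize each column by hand. How can I loop over all the columns?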
You can iterate over the columns of the DataFrame like this to process each one:
for col in df.columns:
    df[col] = [preprocessing.scale(row) for row in df[col]]
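Of course, this only works if you want to process all of the columns in the DataFrame, which means every column must hold lists or tuples. In the example above, columns a through j hold plain floats, so they have to be left out. If you only want a subset, you can build a list of columns first, or you can drop the other columns.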
# Here's an example where you manually specify the columns
cols_to_process = ["arr1", "arr2"]
for col in cols_to_process:
    df[col] = [preprocessing.scale(row) for row in df[col]]
# Here's an example where you drop the unwanted columns first
# (list every column that does not hold lists/tuples)
cols_to_drop = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]
df = df.drop(columns=cols_to_drop)
for col in df.columns:
    df[col] = [preprocessing.scale(row) for row in df[col]]
# Or, if you didn't want to actually drop the columns
# from the original DataFrame you could do it like this:
cols_to_drop = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]
for col in df.drop(columns=cols_to_drop).columns:
    df[col] = [preprocessing.scale(row) for row in df[col]]
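With around 100 columns, typing out either list by hand may be tedious, so the list-like columns could also be picked out by inspecting their values. The following is only a minimal sketch and not part of the original answer: the helper names `list_like_columns` and `scale_list_columns` are hypothetical, and it assumes that a column counts as list-like only when every one of its values is a list, tuple, or NumPy array.

```python
import numpy as np
import pandas as pd
from sklearn import preprocessing


def list_like_columns(df):
    """Return the names of columns whose every value is a list, tuple, or ndarray."""
    return [
        col for col in df.columns
        if df[col].map(lambda v: isinstance(v, (list, tuple, np.ndarray))).all()
    ]


def scale_list_columns(df, cols=None):
    """Return a copy of df in which each row of the chosen columns is standardized."""
    out = df.copy()
    # Fall back to auto-detection when no explicit column list is given
    for col in (list_like_columns(df) if cols is None else cols):
        # scale() standardizes one row's values to zero mean and unit variance
        out[col] = [preprocessing.scale(row) for row in df[col]]
    return out


# Same layout as the example in the question, with a seeded generator
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((5, 10)), columns=list("abcdefghij"))
df["arr1"] = df[list("abcde")].values.tolist()
df["arr2"] = df[list("fghij")].values.tolist()

detected = list_like_columns(df)  # scalar columns a..j are skipped
scaled = scale_list_columns(df)
```

Because the helper works on a copy, the original `df` keeps its unscaled lists. Passing `cols=["arr1"]` would restrict the scaling to an explicit subset instead of relying on detection.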