How to flatten array types inside a pandas DataFrame with an sklearn transformer?
I have a pandas DataFrame that contains both scalar columns and array columns, for example:
df = pd.DataFrame({
    "scalar": [1, 2, 3, 4],
    "array": [[10, 20], [30, 40], [50, 60], [70, 80]],
})
I want to write an sklearn transformer that flattens it, so that:
transformer = ???
transformer.fit_transform(df)
===>
[[1 10 20]
 [2 30 40]
 [3 50 60]
 [4 70 80]]
How can I do this?
Since this is a stateless transformation, you can use FunctionTransformer to build a transformer from a plain function.
import pandas as pd
import numpy as np
from sklearn.preprocessing import FunctionTransformer

df = pd.DataFrame({
    "scalar": [1, 2, 3, 4],
    "array": [[10, 20], [30, 40], [50, 60], [70, 80]],
})

def flatten_df_rows(df):
    def flatten(row):
        # flatten lists recursively
        for val in row:
            if isinstance(val, list):
                yield from flatten(val)
            else:
                yield val

    # flatten each row of the df recursively
    return np.array([list(flatten(row)) for row in df.values.tolist()])

transform = FunctionTransformer(flatten_df_rows)
out = transform.fit_transform(df)
Output:
>>> out
array([[ 1, 10, 20],
       [ 2, 30, 40],
       [ 3, 50, 60],
       [ 4, 70, 80]])
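If every array in a column has the same length, a column-wise variant may be worth considering as an alternative to the recursive row flattening. The following is only a minimal sketch under that assumption. The helper name flatten_array_columns is hypothetical (it is not part of sklearn or pandas), and the StandardScaler step is there only to illustrate that the resulting transformer can be chained with other steps in a pipeline.

```python
import numpy as np
import pandas as pd
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler


def flatten_array_columns(df):
    """Hypothetical helper: expand each column into a 2-D block and stack the blocks.

    Assumes every cell in a given column holds either a scalar or a
    sequence of the same length as the other cells in that column.
    """
    blocks = []
    for name in df.columns:
        # A scalar column becomes shape (n_rows,), an array column (n_rows, k).
        block = np.array(df[name].tolist())
        # Force both cases into 2-D so they can be stacked side by side.
        blocks.append(block.reshape(len(df), -1))
    return np.hstack(blocks)


df = pd.DataFrame({
    "scalar": [1, 2, 3, 4],
    "array": [[10, 20], [30, 40], [50, 60], [70, 80]],
})

# Standalone use, same call pattern as in the answer above.
flat = FunctionTransformer(flatten_array_columns).fit_transform(df)

# Chained use: flatten first, then scale the three resulting numeric columns.
pipeline = make_pipeline(FunctionTransformer(flatten_array_columns), StandardScaler())
scaled = pipeline.fit_transform(df)
```

Here flat holds the same 4x3 integer array as out above. Unlike the recursive version, this sketch does not handle ragged or nested lists, so the row-wise flatten_df_rows is still the safer choice when array lengths vary.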