Match DataFrame columns that hold list values and append the matching rows
I read two different CSV files into two DataFrames. I want to match the column df1['building_type'] against df2['model'] and append the corresponding df2 row to df1.
DataFrame 1:
data = [{'length': '34', 'width': '58.5', 'height': '60.2', 'building_type': ['concrete','wood','steel','laminate']},
{'length': '42', 'width': '33', 'height': '23', 'building_type': ['concrete_double','wood_double','steel_double']}]
df1 = pd.DataFrame(data)
print(df1)
DataFrame 2:
data2 = [{'type': 'A1', 'floor': '2', 'model': ['wood','laminate','concrete','steel']},
{'type': 'B3', 'floor': '4', 'model': ['wood_double','concrete_double','steel_double']}]
df2=pd.DataFrame(data2)
print(df2)
Desired final DataFrame:
length width height building_type type floor
0 34 58.5 60.2 [concrete, wood, steel, laminate] A1 2
1 42 33 23 [concrete_double, wood_double, steel_double] B3 4
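pd.merge looks like the right tool, but it needs a hashable key, and a list is mutable and therefore unhashable, so it cannot be used as a join key. We can convert each list to a tuple or a frozenset, both of which are immutable and hashable and can be used for the join. Since the sample output shows that element order does not matter, I chose frozenset.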
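To make the choice between tuple and frozenset concrete, here is a minimal sketch using the first row of each DataFrame from the question. The variable names are only for illustration. It shows that a list cannot be hashed, that tuples compare element by element in order, and that frozensets compare by membership only.

```python
building_type = ['concrete', 'wood', 'steel', 'laminate']
model = ['wood', 'laminate', 'concrete', 'steel']

# A list is mutable, so Python refuses to hash it.
try:
    hash(building_type)
    list_is_hashable = True
except TypeError:
    list_is_hashable = False

# Tuples are hashable but order-sensitive; frozensets ignore order.
same_as_tuple = tuple(building_type) == tuple(model)
same_as_frozenset = frozenset(building_type) == frozenset(model)

print(list_is_hashable, same_as_tuple, same_as_frozenset)
# prints: False False True
```

Note that frozenset also discards duplicate elements, so two lists that differ only in how often a value repeats would be treated as equal.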
The code is as follows:
import pandas as pd
data = [{'length': '34', 'width': '58.5', 'height': '60.2', 'building_type': ['concrete','wood','steel','laminate']},
{'length': '42', 'width': '33', 'height': '23', 'building_type': ['concrete_double','wood_double','steel_double']}]
df1 = pd.DataFrame(data)
print(df1)
data2 = [{'type': 'A1', 'floor': '2', 'model': ['wood','laminate','concrete','steel']},
{'type': 'B3', 'floor': '4', 'model': ['wood_double','concrete_double','steel_double']}]
df2=pd.DataFrame(data2)
print(df2)
# Note: Merge fails on mutable dtype
# pd.merge(df1, df2, left_on='building_type', right_on='model')
# Produces `TypeError: unhashable type: 'list'`
# Convert mutable type to immutable type and merge.
# `tuple` is best if order matters for you. I am assuming that the
# order doesn't matter based on the sample output, so `frozenset` is more
# appropriate.
df1['building_type'] = df1['building_type'].apply(frozenset)
df2['model'] = df2['model'].apply(frozenset)
# Now, merge. Note that since column names are different both
# 'building_type' and 'model' would be retained. You can remove one of them.
final_df = pd.merge(df1, df2, left_on='building_type', right_on='model')
final_df = final_df.drop(['model'], axis=1)
print(final_df)
Output on my machine:
length width height building_type
0 34 58.5 60.2 [concrete, wood, steel, laminate]
1 42 33 23 [concrete_double, wood_double, steel_double]
type floor model
0 A1 2 [wood, laminate, concrete, steel]
1 B3 4 [wood_double, concrete_double, steel_double]
length width height building_type type floor
0 34 58.5 60.2 (laminate, wood, steel, concrete) A1 2
1 42 33 23 (concrete_double, steel_double, wood_double) B3 4
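The output above shows building_type as a frozenset, whereas the desired result in the question keeps the original lists. One possible variation, sketched here under the assumption that you want the list column left untouched, is to put the frozenset into a temporary key column and drop it after the merge. The column name `_match_key` is hypothetical and not part of the original answer.

```python
import pandas as pd

df1 = pd.DataFrame([
    {'length': '34', 'width': '58.5', 'height': '60.2',
     'building_type': ['concrete', 'wood', 'steel', 'laminate']},
    {'length': '42', 'width': '33', 'height': '23',
     'building_type': ['concrete_double', 'wood_double', 'steel_double']},
])
df2 = pd.DataFrame([
    {'type': 'A1', 'floor': '2',
     'model': ['wood', 'laminate', 'concrete', 'steel']},
    {'type': 'B3', 'floor': '4',
     'model': ['wood_double', 'concrete_double', 'steel_double']},
])

KEY = '_match_key'  # hypothetical temporary column name

# Build an order-insensitive, hashable key on copies of both frames,
# leaving the original list columns as they are.
left = df1.copy()
left[KEY] = left['building_type'].apply(frozenset)
right = df2.drop(columns='model')
right[KEY] = df2['model'].apply(frozenset)

# Inner merge on the temporary key, then discard the key.
final_df = left.merge(right, on=KEY).drop(columns=KEY)
print(final_df)
```

As in the code above, this is an inner merge, so df1 rows without a match in df2 are dropped. Passing how='left' to merge would keep them, with NaN in type and floor.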