Python, how to fill an empty dataframe with lists
I am trying to write code that stores the elements shared between several lists in a matrix.
Example:
Dataframe with all the lists:
ID | elements of the ID |
---|---|
G1 | P1,P2,P3,P4 |
G2 | P3,P5 |
G3 | P1,P3,P5 |
G4 | P6 |
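I start from an empty matrix with G1, G2, G3, G4 as column and row names and with every cell filled with nan. The result I would like to obtain is the following: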
X | G1 | G2 | G3 | G4 |
---|---|---|---|---|
G1 | P1,P2,P3,P4 | P3 | P1 | None |
G2 | P3 | P3,P5 | P3,P5 | None |
G3 | P1,P5 | P3,P5 | P1,P3,P5 | None |
G4 | None | None | None | P6 |
This is my code:

    import sys
    import pandas as pd

    def intersection(lst1, lst2):
        return [value for value in lst1 if value in lst2]

    data = pd.read_csv(sys.argv[1], sep="\t")
    p_mat = pd.read_csv(sys.argv[2], sep="\t", index_col=0)
    c_mat = pd.read_csv(sys.argv[3], sep="\t", index_col=0)

    #I need this since the elements of the second column once imported are seen as a single string instead of being lists
    for i in range(0,len(data)):
        data['MP term list'][i] = data['MP term list'][i].split(",")

    for i in p_mat:
        for j in p_mat.columns:
            r = intersection(data[data['MGI id'] == i]['MP term list'].values.tolist()[0],data[data['MGI id'] == j]['MP term list'].values.tolist()[0])
            if len(r)!=0:
                p_mat.at[i,j] = r
            else:
                p_mat.at[i, j] = None
            del(r)
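At the moment I can only fill the first cell correctly. Then, at the first non-empty result that I try to store in a cell, I get this error:
ValueError: Must have equal len keys and value when setting with an iterable
How can I solve this? Thanks to everyone for the help.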
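On the error itself: a likely cause (an assumption, since the input files are not shown) is that the nan-filled matrix has float64 columns, and pandas will not place a list inside a single float cell. Below is a minimal sketch of one possible fix, which casts the matrix to object dtype before filling it. The `terms` dictionary and the in-memory `p_mat` are hypothetical stand-ins for the files read from sys.argv.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the 'MGI id' / 'MP term list' data.
terms = {
    "G1": ["P1", "P2", "P3", "P4"],
    "G2": ["P3", "P5"],
    "G3": ["P1", "P3", "P5"],
    "G4": ["P6"],
}
ids = list(terms)

# Empty matrix filled with NaN, then cast to object so a cell can hold a list.
p_mat = pd.DataFrame(np.nan, index=ids, columns=ids).astype(object)

for i in p_mat.index:
    for j in p_mat.columns:
        # Keep the order of the first list, as in the original intersection().
        common = [t for t in terms[i] if t in terms[j]]
        p_mat.at[i, j] = common if common else None

print(p_mat)
```

Looking the lists up in a dictionary also avoids filtering the dataframe twice for every cell.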
Try using a cross merge, set intersection and pivot:
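    # Turn each comma-separated string into a set of elements
    df["elements"] = df["elements of the ID"].str.split(",").map(set)
    # Pair every ID with every ID (how="cross" needs pandas 1.2 or later)
    cross = df[["ID", "elements"]].merge(df[["ID", "elements"]], how="cross")
    cross["intersection"] = (cross.apply(lambda row: row["elements_x"].intersection(row["elements_y"]), axis=1)
                                  .map(",".join)
                                  .replace("",None)
                             )
    # pivot takes keyword arguments only since pandas 2.0
    output = cross.pivot(index="ID_x", columns="ID_y", values="intersection").rename_axis(None, axis=1).rename_axis(None)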
    >>> output
                 G1     G2        G3    G4
    G1  P2,P1,P3,P4     P3     P1,P3  None
    G2           P3  P3,P5     P3,P5  None
    G3        P1,P3  P3,P5  P1,P3,P5  None
    G4         None   None      None    P6
Input df:

    df = pd.DataFrame({"ID": [f"G{i+1}" for i in range(4)],
                       "elements of the ID": ["P1,P2,P3,P4", "P3,P5", "P1,P3,P5", "P6"]})
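Because sets are unordered, the order of elements inside a cell can vary (note P2,P1,P3,P4 in the output above). Here is a self-contained sketch of the same cross merge idea that sorts each intersection before joining, assuming pandas 1.2 or later and assuming plain lexicographic order is acceptable. The name `pairs` is illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"ID": ["G1", "G2", "G3", "G4"],
                   "elements of the ID": ["P1,P2,P3,P4", "P3,P5", "P1,P3,P5", "P6"]})

# One set of elements per ID.
df["elements"] = df["elements of the ID"].str.split(",").map(set)

# Every (ID_x, ID_y) pair; overlapping column names get _x / _y suffixes.
pairs = df[["ID", "elements"]].merge(df[["ID", "elements"]], how="cross")

# Sorted, comma-joined intersection; an empty intersection becomes None.
pairs["intersection"] = [
    ",".join(sorted(a & b)) or None
    for a, b in zip(pairs["elements_x"], pairs["elements_y"])
]

output = (pairs.pivot(index="ID_x", columns="ID_y", values="intersection")
               .rename_axis(None, axis=1)
               .rename_axis(None))
print(output)
```

Lexicographic sorting would put P10 before P2, so a custom sort key may be needed if the element names carry longer numbers.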
    import pandas as pd

    ID = ["G1","G2","G3","G4"]
    Elements = [["P1","P2","P3","P4"],
                ["P3","P5"],
                ["P1","P3","P5"],
                ["P6"]]
    df = pd.DataFrame(zip(ID,Elements),columns = ["ID","Elements"])
    df1 = pd.DataFrame(columns = ID)
    df1["ID"] = ID
    for i in ID:
        for j in ID:
            if i == j:
                df1.loc[df1.ID == i,j] = df.loc[df.ID == i,"Elements"]
            else:
                df1 = df1.astype("object")
                df1.loc[df1.ID == i,j] = df1.loc[df1.ID == i,j].apply(
                    lambda x : list(set(list(df.loc[df.ID == i,"Elements"])[0]) & set(list(df.loc[df.ID == j,"Elements"])[0])))
Output:

    df1
    Out[38]:
                     G1        G2            G3    G4  ID
    0  [P1, P2, P3, P4]      [P3]      [P1, P3]    []  G1
    1              [P3]  [P3, P5]      [P5, P3]    []  G2
    2          [P1, P3]  [P5, P3]  [P1, P3, P5]    []  G3
    3                []        []            []  [P6]  G4