Rearrange rows based on a column value
I have a DataFrame in which I would like to rearrange the data of a given column.
What I have:
    text                                                KEYWORD
0   Fetch.ai will transform economies, healthcare,...   supplies chain issues
1                                                       self
2                                                       secured key partnership
3                                                       real world challenge
4                                                       autonomous economic agent
5                                                       learning traffic signal
6                                                       autonomous machine learning
7                                                       disruptive ai tech
8                                                       parking issues
9                                                       traffic reduction
10
11
12  The two most popular cryptocurrencies on the p...   bitcoin
13                                                      limited supplies
14                                                      ethereum
What I want:
    text                                                KEYWORD
0   Fetch.ai will transform economies, healthcare,...   supplies chain issues, self, secured key partnership, real world challenge, autonomous economic agent, learning traffic signal, autonomous machine learning, disruptive ai tech, parking issues, traffic reduction
1   The two most popular cryptocurrencies on the p...   bitcoin, limited supplies, ethereum
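Each text appears in the "text" column. The "text" column has been analyzed, and the keywords extracted from it are shown in the "KEYWORD" column. The annoying part is that if 10 keywords are extracted from a text, the software creates 10 rows with one keyword each. I would like to join all of those keywords into a single row (the one holding the corresponding text).
Unfortunately, I have no access to the keyword extraction process, which is done by the software.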
Try groupby:
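import numpy as np

# replace blank (empty or whitespace-only) cells with NaN
df = df.replace(r"^\s*$", np.nan, regex=True)
# drop rows that are entirely NaN, then forward fill so every keyword row carries its text
df = df.dropna(how="all").ffill()
# group by text and join each group's keywords into one comma-separated string
output = df.groupby("text", as_index=False)["KEYWORD"].agg(", ".join)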
>>> output
text KEYWORD
0 Fetch.ai will transform economies, healthcare,... supplies chain issues, self, secured key partn...
1 The two most popular cryptocurrencies on the p... bitcoin, limited supplies, ethereum
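For anyone who wants to try the steps above end to end, here is a minimal, self-contained sketch. The sample frame is a hypothetical stand-in for the real data ("Article A" and "Article B" are placeholder texts, and the blank cells are assumed to be empty strings, as the regex in the answer suggests), so adjust it to whatever the software actually produces.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data shaped like the question's frame: the text appears
# only on the first row of each group, and a fully blank row separates groups.
df = pd.DataFrame({
    "text": ["Article A", "", "", "", "Article B", ""],
    "KEYWORD": ["supplies chain issues", "self", "parking issues", "",
                "bitcoin", "ethereum"],
})

# Turn empty or whitespace-only cells into NaN
df = df.replace(r"^\s*$", np.nan, regex=True)
# Drop the fully blank rows, then copy each text down onto its keyword rows
df = df.dropna(how="all").ffill()
# One row per text, with its keywords joined by ", "
output = df.groupby("text", as_index=False)["KEYWORD"].agg(", ".join)

print(output)
```

Note that groupby sorts the group keys by default, so the rows come back in alphabetical order of the text. If the original order of appearance matters, passing sort=False to groupby should keep it.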