Pandas: Fill NaN Values row by row by group ID
I'm trying to fill NaN values row by row based on a group ID.
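I have tried fillna with the forward-fill and backward-fill options, but fillna does not fill the DataFrame row by row. I also want to make sure the company matches before a NaN value is filled. In this example, a plain forward fill would fill company "Pear" with data from company "Banana".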
appended = appended.sort_values(by=['Company', 'Intro'], na_position='last')
appended = appended.reset_index(drop=True)
for i in appended.index:
    if i == 0:
        pass
    else:
        if appended.at[i, 'Company'] == appended.at[i-1, 'Company']:
            # this forward-fills the whole frame, not just row i,
            # so values also flow from Banana into Pear
            appended.fillna(method='ffill', inplace=True)
        else:
            pass
The appended DataFrame:
Company  Intro  Categories            Headquarters  Founded Date  Funding Stage
Apple    xyz    Healthcare, Big Data  New York      2018          Series A
Apple    NaN    NaN                   NaN           NaN           NaN
Apple    NaN    NaN                   NaN           NaN           NaN
Banana   Lier   Government            Europe        2010          Series B
Pear     NaN    NaN                   NaN           NaN           NaN
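A minimal reproduction of the problem described above, assuming the table is typed in by hand as a dict (the original data source is not shown): an ungrouped forward fill ignores company boundaries, so the Pear row picks up Banana's values.

```python
import numpy as np
import pandas as pd

# Hand-typed copy of the table above (illustrative; not the asker's data source)
appended = pd.DataFrame({
    "Company": ["Apple", "Apple", "Apple", "Banana", "Pear"],
    "Intro": ["xyz", np.nan, np.nan, "Lier", np.nan],
    "Categories": ["Healthcare, Big Data", np.nan, np.nan, "Government", np.nan],
    "Headquarters": ["New York", np.nan, np.nan, "Europe", np.nan],
    "Founded Date": [2018, np.nan, np.nan, 2010, np.nan],
    "Funding Stage": ["Series A", np.nan, np.nan, "Series B", np.nan],
})

# An ungrouped forward fill does not look at Company at all
leaky = appended.ffill()

# The Pear row (index 4) now carries Banana's values
print(leaky.loc[4].tolist())
```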
This is the expected result I am hoping to achieve:
Company  Intro  Categories            Headquarters  Founded Date  Funding Stage
Apple    xyz    Healthcare, Big Data  New York      2018          Series A
Apple    xyz    Healthcare, Big Data  New York      2018          Series A
Apple    xyz    Healthcare, Big Data  New York      2018          Series A
Banana   Lier   Government            Europe        2010          Series B
Pear     NaN    NaN                   NaN           NaN           NaN
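If an explicit row-by-row loop is still wanted, the fill has to touch only the cells of row i instead of calling fillna on the whole frame. A sketch, assuming the same hand-typed sample data; the name value_cols and the loop layout are illustrative, not taken from the question:

```python
import numpy as np
import pandas as pd

# Hand-typed copy of the sample table (illustrative)
appended = pd.DataFrame({
    "Company": ["Apple", "Apple", "Apple", "Banana", "Pear"],
    "Intro": ["xyz", np.nan, np.nan, "Lier", np.nan],
    "Categories": ["Healthcare, Big Data", np.nan, np.nan, "Government", np.nan],
    "Headquarters": ["New York", np.nan, np.nan, "Europe", np.nan],
    "Founded Date": [2018, np.nan, np.nan, 2010, np.nan],
    "Funding Stage": ["Series A", np.nan, np.nan, "Series B", np.nan],
})

# Put each company's populated row before its NaN rows
appended = appended.sort_values(by=["Company", "Intro"], na_position="last")
appended = appended.reset_index(drop=True)

value_cols = [c for c in appended.columns if c != "Company"]

for i in range(1, len(appended)):
    # Only borrow from the previous row when it belongs to the same company
    if appended.at[i, "Company"] != appended.at[i - 1, "Company"]:
        continue
    for col in value_cols:
        # Fill this single cell; the rest of the frame is left alone
        if pd.isna(appended.at[i, col]):
            appended.at[i, col] = appended.at[i - 1, col]

print(appended)
```

A Python-level loop like this gets slow on large frames; a grouped fill does the same work in a single vectorized call.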
df.groupby(['Company']).ffill()
   Company  Intro  Categories            Headquarters  Founded Date  Funding Stage
0  Apple    xyz    Healthcare, Big Data  New York      2018.0        Series A
1  Apple    xyz    Healthcare, Big Data  New York      2018.0        Series A
2  Apple    xyz    Healthcare, Big Data  New York      2018.0        Series A
3  Banana   Lier   Government            Europe        2010.0        Series B
4  Pear     NaN    NaN                   NaN           NaN           NaN
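The one-liner above returns a new frame and leaves df unchanged. A sketch of writing the grouped fill back, assuming the populated row already comes first within each company (as the question's sort_values call arranges). Selecting the value columns explicitly keeps the result the same whether or not a given pandas version includes the grouping column in the groupby output.

```python
import numpy as np
import pandas as pd

# Hand-typed copy of the sample table (illustrative)
df = pd.DataFrame({
    "Company": ["Apple", "Apple", "Apple", "Banana", "Pear"],
    "Intro": ["xyz", np.nan, np.nan, "Lier", np.nan],
    "Categories": ["Healthcare, Big Data", np.nan, np.nan, "Government", np.nan],
    "Headquarters": ["New York", np.nan, np.nan, "Europe", np.nan],
    "Founded Date": [2018, np.nan, np.nan, 2010, np.nan],
    "Funding Stage": ["Series A", np.nan, np.nan, "Series B", np.nan],
})

value_cols = [c for c in df.columns if c != "Company"]

# Forward-fill inside each company only, then write the result back into df
df[value_cols] = df.groupby("Company")[value_cols].ffill()

print(df)
```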
import pandas as pd
from io import StringIO
# sample data: columns are separated by two or more spaces, so values such as
# "Healthcare, Big Data" and "New York" stay in a single column
df = pd.read_csv(StringIO("""
Company  Intro  Categories            Headquarters  Founded_Date  Funding_Stage
Apple    xyz    Healthcare, Big Data  New York      2018          Series A
Apple    NaN    NaN                   NaN           NaN           NaN
Apple    NaN    NaN                   NaN           NaN           NaN
Banana   Lier   Government            Europe        2010          Series B
Pear     NaN    NaN                   NaN           NaN           NaN"""), sep=r"\s{2,}", engine="python")
# Create the summary level: the first row of each company
# (assumes the populated row comes first within each company)
df_summary = df.groupby("Company").head(1)
# Join the summary back onto every row of the original frame
df_result = df[["Company"]].merge(df_summary, on="Company")
#    Company  Intro  Categories            Headquarters  Founded_Date  Funding_Stage
# 0  Apple    xyz    Healthcare, Big Data  New York      2018.0        Series A
# 1  Apple    xyz    Healthcare, Big Data  New York      2018.0        Series A
# 2  Apple    xyz    Healthcare, Big Data  New York      2018.0        Series A
# 3  Banana   Lier   Government            Europe        2010.0        Series B
# 4  Pear     NaN    NaN                   NaN           NaN           NaN
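The head(1) approach depends on row order: if a company's first row were one of its NaN rows, that empty row would be copied to all of the company's rows. A variant sketch swaps in groupby(...).first(), which takes the first non-null value of each column within each group, so the populated row can sit anywhere in the company's block. The sample below deliberately puts Apple's populated row last; column names follow the underscored ones used above.

```python
import numpy as np
import pandas as pd

# Hand-typed sample where Apple's populated row is NOT the first Apple row
df = pd.DataFrame({
    "Company": ["Apple", "Apple", "Apple", "Banana", "Pear"],
    "Intro": [np.nan, np.nan, "xyz", "Lier", np.nan],
    "Categories": [np.nan, np.nan, "Healthcare, Big Data", "Government", np.nan],
    "Headquarters": [np.nan, np.nan, "New York", "Europe", np.nan],
    "Founded_Date": [np.nan, np.nan, 2018, 2010, np.nan],
    "Funding_Stage": [np.nan, np.nan, "Series A", "Series B", np.nan],
})

# first() picks the first non-null entry of each column within each company
df_summary = df.groupby("Company", as_index=False).first()

# A left merge yields one output row per input row, in the original row order
df_result = df[["Company"]].merge(df_summary, on="Company", how="left")

print(df_result)
```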