Iterate over dataframe with NaN condition
I have a table, and I want to extract the data from Table A and reshape it into Table B:
Table A:
Day | City A | City B | City C |
---|---|---|---|
Mon | NaN | Mike | NaN |
Tue | NaN | NaN | Joe |
Wed | Jack | Charlie | NaN |
Table B:
Day | Name | City |
---|---|---|
Mon | Mike | City B |
Tue | Joe | City C |
Wed | Jack | City A |
Wed | Charlie | City B |
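I extracted this information from an Excel sheet and am using Python for the task. My idea is to load the data into a dataframe, iterate over the rows to find the entries that are not NaN, and store their positions and associated data in a new dataframe.
Unfortunately, I'm stuck on setting the condition that ignores the NaN entries. I'm trying to test it step by step, and this is how far I've gotten: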
import pandas as pd
df = pd.read_excel('./csvtasks/rosta.xlsx',sheet_name='Sheet2')
#open new excel to write to with new variable df2
#determine whether null or not
dg=df.notnull()
#loop over rows
for i,j in dg.iteritems():
    if dg.bool==FALSE:
        print('skipped something') #i want this to skip but using this print to see if it's actually skipped anything
    else:
        print (i,j)
        #this will be replace by some command that uses the df.iloc[something] and writes to df2, printing for now so i can see what it does
#loop to end
#close file
All this does is give me the whole dataframe as booleans, like this:
Day | City A | City B | City C |
---|---|---|---|
True | False | True | False |
True | False | False | True |
True | True | True | False |
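If an explicit loop is really what you want, here is a minimal sketch of it. It rebuilds Table A inline (an assumption, so that it runs without `rosta.xlsx`) and tests each cell with `pd.isna`; `records` and the inline frame are illustrative names, not part of the original code. Note that `DataFrame.iteritems()` walks columns rather than rows and was removed in pandas 2.0, and that Python spells the constant `False`, not `FALSE`.

```python
import numpy as np
import pandas as pd

# Table A rebuilt inline (assumption) so the sketch runs without the Excel file.
df = pd.DataFrame({
    "Day": ["Mon", "Tue", "Wed"],
    "City A": [np.nan, np.nan, "Jack"],
    "City B": ["Mike", np.nan, "Charlie"],
    "City C": [np.nan, "Joe", np.nan],
})

records = []
# iterrows() yields (index, row) pairs; each row is a Series keyed by column name.
for _, row in df.iterrows():
    for city in df.columns.drop("Day"):
        name = row[city]
        if pd.isna(name):
            continue  # skip empty cells
        records.append({"Day": row["Day"], "Name": name, "City": city})

# Collect the kept cells into the new dataframe, in Table B's column order.
df2 = pd.DataFrame(records, columns=["Day", "Name", "City"])
print(df2)
```

For anything larger than a small sheet, a vectorized reshape is usually preferable to a Python-level loop like this one.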
Try stack:
s = df.set_index('Day').stack().reset_index()
s.columns = ['Day','City','Name']
s
Out[43]:
Day City Name
0 Mon City B Mike
1 Tue City C Joe
2 Wed City A Jack
3 Wed City B Charlie
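For anyone who wants to run this without the Excel file, a self-contained sketch follows. The inline frame stands in for Table A (an assumption). The explicit `dropna()` is a guard added here because newer pandas `stack` implementations may keep missing values, and the final column reorder is only there to match Table B.

```python
import numpy as np
import pandas as pd

# Table A rebuilt inline (assumption) in place of the Excel sheet.
df = pd.DataFrame({
    "Day": ["Mon", "Tue", "Wed"],
    "City A": [np.nan, np.nan, "Jack"],
    "City B": ["Mike", np.nan, "Charlie"],
    "City C": [np.nan, "Joe", np.nan],
})

s = (
    df.set_index("Day")
      .stack()                        # Series indexed by (Day, column label)
      .dropna()                       # guard: drop NaN if stack kept them
      .rename_axis(["Day", "City"])   # name both index levels
      .reset_index(name="Name")       # back to a flat dataframe
)
s = s[["Day", "Name", "City"]]        # column order of Table B
print(s)
```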
You can try melt, then dropna:
out = (df.melt(id_vars='Day', var_name='City', value_name='Name')
         .dropna())
print(out)
Day City Name
2 Wed City A Jack
3 Mon City B Mike
5 Wed City B Charlie
7 Tue City C Joe
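A self-contained sketch of the melt route, again with Table A rebuilt inline as an assumption. `melt` walks the frame column by column, so rows come out grouped by city. Passing `ignore_index=False` keeps the original row labels, and a stable sort on them restores the Mon/Tue/Wed order of Table B. The sorting step is an addition for illustration, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Table A rebuilt inline (assumption) in place of the Excel sheet.
df = pd.DataFrame({
    "Day": ["Mon", "Tue", "Wed"],
    "City A": [np.nan, np.nan, "Jack"],
    "City B": ["Mike", np.nan, "Charlie"],
    "City C": [np.nan, "Joe", np.nan],
})

out = (
    df.melt(id_vars="Day", var_name="City", value_name="Name",
            ignore_index=False)        # keep the original row labels 0, 1, 2
      .dropna(subset=["Name"])         # drop the empty cells
      .sort_index(kind="mergesort")    # stable sort: back to the sheet's row order
      .reset_index(drop=True)
)
out = out[["Day", "Name", "City"]]     # column order of Table B
print(out)
```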