Pandas Series: issue removing duplicates
I have a Series with duplicates and I am trying to get rid of them:
0 RWAY001
1 RWAY001
2 RWAY002
3 RWAY002
...
112 RWAY057
113 RWAY057
114 RWAY058
115 RWAY058
Length: 116
drop_duplicates() seems to reduce the length to 58, but the index still seems to run from 0 to 115, just skipping the duplicates:
0 RWAY001
2 RWAY002
...
112 RWAY057
114 RWAY058
Length: 58
So it seems the rows in between are still there with NaN values. I tried dropna() but it had no effect on the data.
Here is my code:
import pandas as pd

df = pd.read_csv(path + flnm)  # path and flnm are not shown in the question
fields = df.file
fields = fields.drop_duplicates()
print(fields)
Any help is much appreciated. Thanks.
I think you need reset_index with the parameter drop=True:
fields.reset_index(inplace=True, drop=True)
Or:
fields = fields.reset_index(drop=True)
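As a small sketch of why drop=True matters here (the two-row Series below is toy data made up for illustration, not the asker's file): without it, reset_index on a Series moves the old index into a column and returns a DataFrame instead of a Series.

```python
import pandas as pd

# Toy data for illustration: a Series whose index has a gap, as after drop_duplicates()
s = pd.Series(['RWAY001', 'RWAY002'], index=[0, 2], name='file')

as_frame = s.reset_index()            # old index is kept as a column named 'index'
as_series = s.reset_index(drop=True)  # old index is discarded, result stays a Series

print(type(as_frame).__name__, list(as_frame.columns))
print(type(as_series).__name__, as_series.index.tolist())
```

With drop=True the result keeps only the values, renumbered from 0.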
Sample:
import pandas as pd
df = pd.DataFrame({'file': {0: 'RWAY001', 1: 'RWAY001', 2: 'RWAY002', 3: 'RWAY002', 115: 'RWAY058', 113: 'RWAY057', 112: 'RWAY057', 114: 'RWAY058'}})
print (df)
file
0 RWAY001
1 RWAY001
2 RWAY002
3 RWAY002
112 RWAY057
113 RWAY057
114 RWAY058
115 RWAY058
print (df.file.drop_duplicates())
0 RWAY001
2 RWAY002
112 RWAY057
114 RWAY058
Name: file, dtype: object
print (df.file.drop_duplicates().reset_index(drop=True))
0 RWAY001
1 RWAY002
2 RWAY057
3 RWAY058
Name: file, dtype: object
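On why dropna() had no effect: as far as I can tell, drop_duplicates() does not leave NaN rows behind. It removes the duplicate rows and keeps the original index labels of the rows that survive, so the gaps are only in the labels. A minimal check, assuming a four-row toy frame in place of the real CSV:

```python
import pandas as pd

# Toy stand-in for the CSV column (assumed data, not the asker's file)
df = pd.DataFrame({'file': ['RWAY001', 'RWAY001', 'RWAY002', 'RWAY002']})
fields = df['file'].drop_duplicates()

print(fields.index.tolist())     # labels of the surviving rows: [0, 2]
print(len(fields))               # 2 rows are actually stored
print(int(fields.isna().sum()))  # 0 missing values, so dropna() has nothing to remove

# Renumber the index 0..n-1 once the gaps are no longer wanted
fields = fields.reset_index(drop=True)
print(fields.index.tolist())     # [0, 1]
```

If the old labels are still needed later (for example to look up the original rows in df), skip the reset and keep the gapped index.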