Loc filter and exclude null values

1. vat.loc[(vat['Sum of VAT'].isin([np.nan, 0])) &
2.        (vat['Comment'] == "Transactions 0DKK") &
3.        (vat['Type'].isin(['Bill', 'Bill Credit'])) &
4.        (vat['Maximum of Linked Invoice'].notnull()), 'Comment'] = 'Linked invoice'
5. vat[vat["Comment"] == "Linked invoice"]

Hi everyone,

I have a problem with this line:

(vat['Maximum of Linked Invoice'].notnull())

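When I try to exclude all the null values in that column, it doesn't seem to work. Instead of being excluded, the null values are still included in the DataFrame output. The rest of the syntax works perfectly. I have tried different syntax, but the null values in the 'Maximum of Linked Invoice' column are still included. I don't understand why it isn't working.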

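Hello,

I did some more research. On import, the CSV file appears to have 62107 non-null values in the 'Maximum of Linked Invoice' column, but that is not correct: when I open the CSV file and check, that column really does have thousands of blank rows. Why aren't they read as null values on import? Have you seen anything like this before?

Please see the DataFrame info below: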

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 62108 entries, 0 to 62107
Data columns (total 35 columns):
 #   Column                               Non-Null Count  Dtype  
---  ------                               --------------  -----  
 0   External ID                          62108 non-null  object 
 1   Document Number                      62107 non-null  object 
 2   Transaction Number                   62107 non-null  object 
 3   Maximum of Linked Invoice            62107 non-null  object 
 4   Type                                 62107 non-null  object 
 5   Date                                 62107 non-null  object 
 6   Period                               62107 non-null  object 
 7   Terms                                62107 non-null  object 
 8   Maximum of Due Date/Receive By       50885 non-null  object 
 9   Company Name                         62107 non-null  object 
 10  Customer VAT Registration Number     62107 non-null  object 
 11  Bill to City                         62107 non-null  object 
 12  Bill to State                        62107 non-null  object 
 13  Bill to Country                      62107 non-null  object 
 14  Bill to Zip                          62107 non-null  object 
 15  Source System                        62107 non-null  object 
 16  Source System Identifier             62107 non-null  object 
 17  City                                 62107 non-null  object 
 18  State/Province                       62107 non-null  object 
 19  Country                              62107 non-null  object 
 20  Zip                                  62107 non-null  object 
 21  Currency                             62107 non-null  object 
 22  Memo (Main)                          62107 non-null  object 
 23  Maximum of GMAX Tax Code             24189 non-null  object 
 24  Maximum of NetSuite Tax Item         59815 non-null  object 
 25  Maximum of Coupa Tax Code            0 non-null      float64
 26  Maximum of External System Tax Code  0 non-null      float64
 27  Maximum of Tax Code (Consolidated)   59815 non-null  object 
 28  FOP Type                             62107 non-null  object 
 29  Sum of Assets                        60680 non-null  float64
 30  Sum of Accounts Payable              3741 non-null   float64
 31  Sum of Other Liabilities             57066 non-null  float64
 32  Sum of Income                        60290 non-null  float64
 33  Sum of Expense                       300 non-null    float64
 34  Sum of VAT                           56269 non-null  float64
dtypes: float64(8), object(27)
memory usage: 16.6+ MB

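If anyone is reading this, I have found the answer. There was nothing wrong with my syntax; the problem was in the CSV file itself. pandas read the 'Maximum of Linked Invoice' column as 62107 non-null because a single space was embedded in the rows of that column that appeared blank. All I saw at first were blank rows, but that was inaccurate. So I strongly recommend inspecting your CSV file to avoid losing time on this kind of tricky problem.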
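To illustrate the point, here is a minimal sketch using a made-up three-row CSV (the column names `id` and `invoice` and the data are hypothetical, not taken from the original file). It shows that `read_csv` turns a truly empty field into NaN but keeps a field holding only a space as a string, and one possible way to count such cells:

```python
import io

import pandas as pd

# Hypothetical sample data: row 2 has a truly empty field,
# row 3 has a field containing a single space.
csv_text = "id,invoice\n1,INV-001\n2,\n3, \n"
df = pd.read_csv(io.StringIO(csv_text))

# The empty field becomes NaN, but the single space is kept as a string,
# so notnull() counts it as a real value.
non_null_count = int(df["invoice"].notnull().sum())

# Count cells that are strings made up only of whitespace.
whitespace_only = df["invoice"].str.strip().eq("")
whitespace_count = int(whitespace_only.sum())

print(non_null_count, whitespace_count)
```

A non-zero whitespace count on a column that looks blank in a spreadsheet is a hint that the "empty" cells are not actually empty.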

Here is the fix for line 4 of the code:

(~vat['Maximum of Linked Invoice'].isin([np.nan, ' ']))
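For completeness, here is a sketch of the whole assignment on a small made-up DataFrame (the sample rows are hypothetical; only the column names come from the post). Instead of listing ' ' explicitly, it takes an alternative approach: it first converts whitespace-only strings to NaN with `replace` and a regular expression, so the original `notnull()` condition works as intended even if a cell holds several spaces:

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; only the column names come from the original post.
vat = pd.DataFrame({
    "Sum of VAT": [0.0, np.nan, 0.0, 25.0],
    "Comment": ["Transactions 0DKK"] * 4,
    "Type": ["Bill", "Bill Credit", "Bill", "Bill"],
    "Maximum of Linked Invoice": ["INV-1", " ", np.nan, "INV-4"],
})

# Turn empty or whitespace-only strings into NaN so notnull() behaves as expected.
col = "Maximum of Linked Invoice"
vat[col] = vat[col].replace(r"^\s*$", np.nan, regex=True)

mask = (
    vat["Sum of VAT"].isin([np.nan, 0])
    & (vat["Comment"] == "Transactions 0DKK")
    & vat["Type"].isin(["Bill", "Bill Credit"])
    & vat[col].notnull()
)
vat.loc[mask, "Comment"] = "Linked invoice"

linked = vat[vat["Comment"] == "Linked invoice"]
print(linked)
```

Only the first sample row should be relabeled here: the row with a single space and the row with a real NaN are both excluded, and the last row fails the 'Sum of VAT' condition.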