Loc filter and exclude null values
1. vat.loc[(vat['Sum of VAT'].isin([np.nan, 0])) &
2. (vat['Comment'] == "Transactions 0DKK") &
3. (vat['Type'].isin(['Bill', 'Bill Credit'])) &
4. (vat['Maximum of Linked Invoice'].notnull()), 'Comment'] = 'Linked invoice'
5. vat[vat["Comment"] == "Linked invoice"]
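Hi everyone,
I have a problem with this line of my code:
(vat['Maximum of Linked Invoice'].notnull())
When I try to exclude all rows with null values, it does not seem to work. Instead of being excluded, the nulls still show up in the dataframe output. The rest of the syntax works perfectly. I have tried different syntax, but the null values are still included in the column 'Maximum of Linked Invoice'. I don't understand why it isn't working.
Hello,
I did some more research. On import, the CSV file appears to have 62107 non-null values in the 'Maximum of Linked Invoice' column, but that is not correct: when I open the CSV file and check, there are thousands of blanks in that column. Why are they not read as null values on import? Have you seen anything like this before?
Please see the info below: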
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 62108 entries, 0 to 62107
Data columns (total 35 columns):
 #   Column                               Non-Null Count  Dtype
---  ------                               --------------  -----
 0   External ID                          62108 non-null  object
 1   Document Number                      62107 non-null  object
 2   Transaction Number                   62107 non-null  object
 3   Maximum of Linked Invoice            62107 non-null  object
 4   Type                                 62107 non-null  object
 5   Date                                 62107 non-null  object
 6   Period                               62107 non-null  object
 7   Terms                                62107 non-null  object
 8   Maximum of Due Date/Receive By       50885 non-null  object
 9   Company Name                         62107 non-null  object
 10  Customer VAT Registration Number     62107 non-null  object
 11  Bill to City                         62107 non-null  object
 12  Bill to State                        62107 non-null  object
 13  Bill to Country                      62107 non-null  object
 14  Bill to Zip                          62107 non-null  object
 15  Source System                        62107 non-null  object
 16  Source System Identifier             62107 non-null  object
 17  City                                 62107 non-null  object
 18  State/Province                       62107 non-null  object
 19  Country                              62107 non-null  object
 20  Zip                                  62107 non-null  object
 21  Currency                             62107 non-null  object
 22  Memo (Main)                          62107 non-null  object
 23  Maximum of GMAX Tax Code             24189 non-null  object
 24  Maximum of NetSuite Tax Item         59815 non-null  object
 25  Maximum of Coupa Tax Code            0 non-null      float64
 26  Maximum of External System Tax Code  0 non-null      float64
 27  Maximum of Tax Code (Consolidated)   59815 non-null  object
 28  FOP Type                             62107 non-null  object
 29  Sum of Assets                        60680 non-null  float64
 30  Sum of Accounts Payable              3741 non-null   float64
 31  Sum of Other Liabilities             57066 non-null  float64
 32  Sum of Income                        60290 non-null  float64
 33  Sum of Expense                       300 non-null    float64
 34  Sum of VAT                           56269 non-null  float64
dtypes: float64(8), object(27)
memory usage: 16.6+ MB
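If anyone is reading this, I have found the answer. There was nothing wrong with my syntax; the problem was in the CSV file itself. pandas reported 62107 non-null values for the column 'Maximum of Linked Invoice' because a single space was embedded in each row of that column. At first all I saw were blank rows, but that was not accurate: a cell holding a space is a string, not a null. So I strongly recommend checking your CSV file first, to avoid losing time on this kind of tricky problem.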
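To make the cause concrete, here is a minimal sketch of the same behaviour. The three-row CSV text and the names `csv_text`, `sample`, `non_null` and `whitespace_only` are made up for illustration; only the column names come from the post. It assumes default `read_csv` settings, under which an empty field becomes NaN but a field containing one space stays a string.

```python
import io

import pandas as pd

# Hypothetical CSV: row 2 holds a single space in the second column,
# row 3 holds a truly empty field.
csv_text = "Type,Maximum of Linked Invoice\nBill,INV-1\nBill, \nBill,\n"
sample = pd.read_csv(io.StringIO(csv_text))

col = sample["Maximum of Linked Invoice"]

# notnull() only looks for real missing values, so the " " cell counts as non-null.
non_null = col.notnull().tolist()

# A regex full match flags cells made up of whitespace only; NaN is treated as False.
whitespace_only = col.str.fullmatch(r"\s+", na=False).tolist()

print(non_null)         # [True, True, False]
print(whitespace_only)  # [False, True, False]
```

Running a check like `whitespace_only` on a suspicious column is a quick way to tell whether "blank" cells are actually strings before changing any filter logic.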
Here is the fix for line 4 of the code:
(~vat['Maximum of Linked Invoice'].isin([np.nan, ' ']))
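The fix above matches exactly one space. As a possible alternative (not what the original poster used), whitespace-only cells can be converted to real NaN values first, after which the original `notnull()` condition behaves as expected. The frame `vat_demo` below is a small hypothetical stand-in for the real `vat` data, and `mask` and `linked` are illustrative names.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real `vat` frame from the post.
vat_demo = pd.DataFrame({
    "Sum of VAT": [0.0, np.nan, 0.0, 12.5],
    "Comment": ["Transactions 0DKK"] * 4,
    "Type": ["Bill", "Bill Credit", "Bill", "Bill"],
    "Maximum of Linked Invoice": ["INV-1", " ", np.nan, "INV-4"],
})

# Turn empty or whitespace-only strings in this column into real NaN values.
vat_demo["Maximum of Linked Invoice"] = vat_demo["Maximum of Linked Invoice"].replace(
    r"^\s*$", np.nan, regex=True
)

# Same four conditions as in the post; notnull() now excludes the former " " cell.
mask = (
    vat_demo["Sum of VAT"].isin([np.nan, 0])
    & (vat_demo["Comment"] == "Transactions 0DKK")
    & vat_demo["Type"].isin(["Bill", "Bill Credit"])
    & vat_demo["Maximum of Linked Invoice"].notnull()
)
vat_demo.loc[mask, "Comment"] = "Linked invoice"

linked = vat_demo[vat_demo["Comment"] == "Linked invoice"]
print(linked)
```

In this sketch only the first row ends up labelled 'Linked invoice': the second and third rows have no invoice once the space is treated as missing, and the fourth has a non-zero VAT amount. Normalising once also covers cells with several spaces or tabs, which an exact match on ' ' would miss.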