Pandas pivot_table gives KeyError

Question

(I'm new to Python and new to Pandas.)

I have software usage data in a tab-delimited txt file that looks like this:

IP_Addr Date    Col2    Version Col4    Col5    Lang    Country
160.86.229.29   2021-11-01  00:00:14.919    9.6 337722669   3   ja  JPN
154.28.188.105  2021-11-01  00:00:19.774    9.7 480113424   3   de  DEU
154.6.16.129    2021-11-01  00:00:52.460    9.0 3278201755  2   en  USA
218.45.244.124  2021-11-01  00:01:33.853    9.7 1961440872  2   ja  JPN
178.248.141.33  2021-11-01  00:01:51.114    9.5 2795265301  2   en  EST

The DataFrame imports correctly, and groupby calls like this one work fine:

df.IP_Addr.groupby(df.Country).nunique()

However, when I try to create a pivot table with this line:

country_and_lang = df.pivot_table(index=df.Country, columns=df.Lang, values=df.IP_Addr, aggfunc=df.IP_Addr.count)

I get

KeyError: '160.86.229.29'

where the "key" is the first IP value, which should not be used as a key at all.

What am I doing wrong?

Answer 1

Use column names instead of values:

country_and_lang = df.pivot_table(index='Country', columns='Lang', 
                                  values='IP_Addr', aggfunc='count')
print(country_and_lang)

# Output
Lang      de   en   ja
Country               
DEU      1.0  NaN  NaN
EST      NaN  1.0  NaN
JPN      NaN  NaN  2.0
USA      NaN  1.0  NaN
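
For context on why the original call fails: pivot_table treats a list-like values argument as a list of column labels, so passing df.IP_Addr makes it look for columns named after each IP address, and the first one ends up in the KeyError. The sketch below rebuilds the five sample rows from the question (the raw string and the io.StringIO loading are a stand-in for the asker's file, not their actual import code). It also adds fill_value=0, an optional extra that is not part of the answer above, to replace the NaN cells with zeros.

```python
import io

import pandas as pd

# Sample rows from the question, rebuilt as tab-separated text.
# This stands in for the asker's txt file, whose name is not given.
raw = "\n".join([
    "IP_Addr\tDate\tCol2\tVersion\tCol4\tCol5\tLang\tCountry",
    "160.86.229.29\t2021-11-01\t00:00:14.919\t9.6\t337722669\t3\tja\tJPN",
    "154.28.188.105\t2021-11-01\t00:00:19.774\t9.7\t480113424\t3\tde\tDEU",
    "154.6.16.129\t2021-11-01\t00:00:52.460\t9.0\t3278201755\t2\ten\tUSA",
    "218.45.244.124\t2021-11-01\t00:01:33.853\t9.7\t1961440872\t2\tja\tJPN",
    "178.248.141.33\t2021-11-01\t00:01:51.114\t9.5\t2795265301\t2\ten\tEST",
])
df = pd.read_csv(io.StringIO(raw), sep="\t")

# Passing the Series itself as `values` makes pandas treat every IP
# address as a column label to look up, which is what fails.
try:
    df.pivot_table(index=df.Country, columns=df.Lang,
                   values=df.IP_Addr, aggfunc="count")
except KeyError as err:
    print("KeyError:", err)

# Passing column names works; fill_value=0 replaces missing
# Country/Lang combinations with 0 instead of NaN.
country_and_lang = df.pivot_table(index="Country", columns="Lang",
                                  values="IP_Addr", aggfunc="count",
                                  fill_value=0)
print(country_and_lang)
```

Whether the filled table comes back as integers or floats can depend on the pandas version, so cast with .astype(int) if the dtype matters.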

Or use pd.crosstab:

country_and_lang = pd.crosstab(df['Country'], df['Lang'], 
                               df['IP_Addr'], aggfunc='count')
print(country_and_lang)

# Output
Lang      de   en   ja
Country               
DEU      1.0  NaN  NaN
EST      NaN  1.0  NaN
JPN      NaN  NaN  2.0
USA      NaN  1.0  NaN
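
Two variations may also be useful, shown here as a sketch rather than as part of the answer above. Called without values and aggfunc, pd.crosstab simply counts rows per Country/Lang pair and returns integers with 0 for empty cells; this matches the count of IP_Addr only when no IP_Addr values are missing. Since the question's groupby used nunique, the second call assumes distinct IPs are what is ultimately wanted and passes aggfunc='nunique' to pivot_table. The small DataFrame is rebuilt by hand from the question's sample rows.

```python
import pandas as pd

# Only the three columns involved, copied from the question's sample rows.
df = pd.DataFrame({
    "IP_Addr": ["160.86.229.29", "154.28.188.105", "154.6.16.129",
                "218.45.244.124", "178.248.141.33"],
    "Lang": ["ja", "de", "en", "ja", "en"],
    "Country": ["JPN", "DEU", "USA", "JPN", "EST"],
})

# Plain frequency table: counts rows for each Country/Lang combination.
counts = pd.crosstab(df["Country"], df["Lang"])
print(counts)

# Distinct IP addresses per combination (assumption: this is the metric
# the asker wants, mirroring the nunique() in their groupby).
unique_ips = df.pivot_table(index="Country", columns="Lang",
                            values="IP_Addr", aggfunc="nunique",
                            fill_value=0)
print(unique_ips)
```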

