pandas categories displayed incorrectly in matplotlib

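I am trying to plot categories in matplotlib. For some reason the category labels overlap on the x-axis, and some categories are missing even though their y-axis values are present. I marked this with red arrows in the picture at the bottom of the question.

The data is in the file sales.csv, which looks like this: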

date,first name,last name,city,cost,rooms,bathrooms,type,status
2018-03-04 12:13:21,Linda,Evangelista,Balm Beach,333000,2,2,townhouse,sold
2018-02-01 07:20:20,Rita,Ford,Balm Beach,818000,2,2,detached,sold
2018-03-08 07:13:00,Ali,Hassan,Bowmanville,413000,2,2,bungalow,forsale
2018-05-08 21:00:00,Rashid,Forani,Bowmanville,467000,2,2,townhouse,sold
2018-02-07 16:43:00,Kumar,Yoshi,Bowmanville,613000,3,3,bungalow,sold
2018-01-05 13:43:00,Srini,Santinaram,Bowmanville,723000,2,2,bungalow,forsale
2018-01-03 14:19:00,Maria,Dugall,Brampton,900000,4,3,semidetached,forsale
2018-05-04 19:22:00,Zina,Evangel,Burlington,221000,1,1,townhouse,forsale
2018-05-01 19:44:00,Pierre,Merci,Gatineau,3199000,14,14,bungalow,forsale
2018-05-31 18:10:00,Istvan,Kerekes,Kingston,1110000,4,5,bungalow,sold
2018-03-25 08:22:00,Dumitru,Plamada,Kingston,1650000,5,5,bungalow,forsale
2018-01-01 11:54:00,John,Smith,Markham,1200000,3,3,bungalow,sold
2018-05-07 15:30:00,Arturo,Gonzales,Mississauga,187000,3,3,bungalow,forsale
2018-03-07 22:20:00,Lei,Zhang,North York,122000,1,1,townhouse,forsale
2018-05-04 20:04:00,William,King,Oaks,,3,3,bungalow,sold
2018-03-04 13:05:00,Jeffrey,Kong,Oakville,,2,2,townhouse,forsale
2018-01-04 17:23:00,Abdul,Karrem,Orillia,883000,3,4,townhouse,sold
2018-03-01 13:09:00,Jean,Paumier,Ottawa,1520000,4,4,townhouse,sold
2018-02-01 10:00:00,Ken,Beaufort,Ottawa,3440000,5,5,bungalow,forsale
2018-02-15 11:33:00,Gheorghe,Ionescu,Richmond Hill,1630000,4,3,bungalow,forsale
2018-01-05 10:32:00,Ion,Popescu,Scarborough,1420000,5,3,semidetached,sold
2018-02-07 11:44:00,Xu,Yang,Toronto,422000,2,2,townhouse,forsale
2018-05-29 00:33:00,Giovanni,Gianparello,Toronto,1917000,4,4,bungalow,forsale
2018-03-25 08:27:00,John,Saint-Claire,Toronto,3337000,5,4,bungalow,forsale
2018-01-06 14:06:00,Ann,Murdoch Pyrell,Toronto,1427000,5,4,bungalow,forsale
2018-02-15 13:12:00,Claire,Coldwell,Toronto,3777000,5,4,bungalow,forsale
2018-01-02 09:37:00,Kyle,MCDonald,Toronto,,2,2,townhouse,forsale
2018-02-01 21:22:00,Miriam,Berg,Toronto,,4,4,townhouse,forsale

The code that loads the data and displays the chart is:

import pandas as pd
import matplotlib.pyplot as plt
# Load data
sales_brute = pd.read_csv('sales.csv', parse_dates=True, index_col='date')

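# Fix the column names by stripping extra spaces
sales_brute = sales_brute.rename(columns=lambda x: x.strip())

# Replace missing values in the cost column with the column mean
# (assigning back avoids relying on inplace=True on a column selection)
sales_brute['cost'] = sales_brute['cost'].fillna(sales_brute['cost'].mean())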

# Draw a scatter plot of cost by city, with red markers
plt.scatter(sales_brute['city'], sales_brute['cost'], color='red')

# Rotate the tick labels by 70 degrees
plt.xticks(sales_brute['city'], rotation=70)

plt.tight_layout()
# Add grid
plt.grid()

plt.show()

The result looks odd:

[Image: Incorrect display of categories]

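plt.scatter seems happy to take strings as x-coordinates and places them in alphabetical order. plt.xticks, however, wants a list that matches the number of ticks and is in the same order.

If you change:

plt.xticks(sales_brute['city'], rotation=70)

to:

plt.xticks(sales_brute['city'].sort_values().unique(), rotation=70)

you will get the effect you want.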
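As a minimal, self-contained sketch of this fix: the snippet below uses a small hand-made DataFrame (a few rows copied from the question's data, standing in for sales.csv) and the non-interactive Agg backend, both of which are assumptions made so it runs anywhere. It also assumes a matplotlib version that accepts strings as x values.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# Small stand-in for sales.csv (rows taken from the question's data)
sales = pd.DataFrame({
    "city": ["Balm Beach", "Balm Beach", "Bowmanville", "Toronto", "Toronto"],
    "cost": [333000, 818000, 413000, 422000, 1917000],
})

# One tick per distinct city, in sorted order
tick_labels = sales["city"].sort_values().unique()

fig, ax = plt.subplots()
ax.scatter(sales["city"], sales["cost"], color="red")

# Pass the de-duplicated, sorted city names instead of the full column
plt.xticks(tick_labels, rotation=70)

ax.grid()
fig.tight_layout()
# Call plt.show() or fig.savefig(...) here to view the result
```

With five rows but only three distinct cities, the axis ends up with three ticks rather than five.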

Maybe we have different versions of matplotlib, but I could not use sales_brute['city'] as the first argument to plt.scatter at all:

ValueError: could not convert string to float: 'Toronto'

Instead, I created a new x-axis:

x = range(len(sales_brute))
plt.scatter(x=x, y=sales_brute['cost'], color='red')
plt.xticks(x, sales_brute['city'], rotation=70)
plt.show()

This results in:

(The window needs to be stretched to see the full names.)
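A possible variation on this approach, offered as a sketch rather than a definitive fix: instead of giving every row its own x position, map each distinct city to one integer so that repeated cities share a tick. The example uses pd.factorize for the mapping. The small DataFrame (rows copied from the question's data) and the Agg backend are assumptions made to keep it self-contained.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# Small stand-in for sales.csv (rows taken from the question's data)
sales = pd.DataFrame({
    "city": ["Balm Beach", "Balm Beach", "Bowmanville", "Toronto", "Toronto"],
    "cost": [333000, 818000, 413000, 422000, 1917000],
})

# factorize returns one integer code per row and the distinct cities;
# sort=True numbers the cities in alphabetical order
codes, cities = pd.factorize(sales["city"], sort=True)

fig, ax = plt.subplots()
ax.scatter(codes, sales["cost"], color="red")

# One tick per distinct city, labelled with the city name
ax.set_xticks(range(len(cities)))
ax.set_xticklabels(cities, rotation=70)

ax.grid()
fig.tight_layout()
# Call plt.show() or fig.savefig(...) here to view the result
```

Because the x values are plain integers, this does not depend on matplotlib accepting strings as coordinates.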