How to add outliers as separate colored markers to a line plot
val time
5.6 2021-11-18 03:00:00
2.034 2021-11-18 05:00:00
1.171 2021-11-18 07:00:00
3.023 2021-11-18 09:00:00
4.202 2021-11-18 16:00:00
1.202 2021-11-18 17:00:00
5.202 2021-11-18 18:00:00
7.202 2021-11-18 19:00:00
2.202 2021-11-18 20:00:00
12.202 2021-11-18 21:00:00
1.202 2021-11-18 21:00:00
Above is my dataframe. I want to plot it (x=time, y=val) and draw the points where val > 5 in red.
plt.plot(ab['time'], ab['value'], '-gD', markevery=marks, label='line with select markers')
where marks = [7.202, 12.202]
is a list I created by hand. But this doesn't work:
Error: markevery is iterable but not a valid numpy fancy index
I found an approach here, but it becomes time-consuming when there are many points.
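markevery selects markers by position. It accepts an int, a slice, a list of integer indices, or a list of booleans with one entry per data point. It does not accept the y-values themselves, which is why [7.202, 12.202] raises the fancy-index error. Below is a minimal sketch that keeps the single-plot approach. It assumes the DataFrame is named df (the question's ab) and has the columns val and time shown above. It builds a boolean list from the condition and passes that to markevery. All markers on one line share a single style. Overriding markerfacecolor and markeredgecolor to red therefore colors only the selected points, and the line itself stays green.

```python
import pandas as pd
import matplotlib.pyplot as plt

# sample data from the question
data = {'val': [5.6, 2.034, 1.171, 3.023, 4.202, 1.202, 5.202, 7.202, 2.202, 12.202, 1.202],
        'time': ['2021-11-18 03:00:00', '2021-11-18 05:00:00', '2021-11-18 07:00:00',
                 '2021-11-18 09:00:00', '2021-11-18 16:00:00', '2021-11-18 17:00:00',
                 '2021-11-18 18:00:00', '2021-11-18 19:00:00', '2021-11-18 20:00:00',
                 '2021-11-18 21:00:00', '2021-11-18 21:00:00']}
df = pd.DataFrame(data)
df['time'] = pd.to_datetime(df['time'])

# markevery picks points by position, so build one bool per row
# (True = draw a marker here) instead of listing the y-values
marks = df['val'].gt(5).tolist()

fig, ax = plt.subplots(figsize=(15, 5))
# green line ('-g'); diamond markers ('D') drawn only where marks is True,
# with the marker face and edge overridden to red
line, = ax.plot(df['time'], df['val'], '-gD', markevery=marks,
                markerfacecolor='red', markeredgecolor='red',
                label='line with select markers')
ax.legend()
plt.show()
```

Note that val > 5 matches four rows (5.6, 5.202, 7.202 and 12.202), not only the two values in the hand-made list.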
- The simplest solution is to use Boolean indexing to create a separate dataframe for the values greater than 5, and then plot them as a scatter plot with pandas.DataFrame.plot.
- The x-axis is formatted automatically as %m-%d %H. The format changes when there is more data. Other answers discuss how to format a pandas datetime axis.
import pandas as pd
import matplotlib.pyplot as plt
# sample data
data = {'val': [5.6, 2.034, 1.171, 3.023, 4.202, 1.202, 5.202, 7.202, 2.202, 12.202, 1.202], 'time': ['2021-11-18 03:00:00', '2021-11-18 05:00:00', '2021-11-18 07:00:00', '2021-11-18 09:00:00', '2021-11-18 16:00:00', '2021-11-18 17:00:00', '2021-11-18 18:00:00', '2021-11-18 19:00:00', '2021-11-18 20:00:00', '2021-11-18 21:00:00', '2021-11-18 21:00:00']}
df = pd.DataFrame(data)
# convert the time column to a datetime dtype
df.time = pd.to_datetime(df.time)
# get the values greater than 5
masked = df[df.val.gt(5)]
# plot the line plot
ax = df.plot(x='time', marker='o', figsize=(15, 5), zorder=0)
# plot those greater than 5
masked.plot(kind='scatter', x='time', y='val', color='red', ax=ax, s=30, label='outliers')
plt.show()
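You can fix the tick format instead of relying on the automatic one. Here is a sketch that uses matplotlib.dates.DateFormatter. It applies the same Boolean-indexing idea through plain matplotlib calls (ax.plot and ax.scatter) rather than DataFrame.plot, so the x-axis is guaranteed to use matplotlib's own date units. The '%m-%d %H' format string and the threshold of 5 are just the values from this question. Adjust them as needed.

```python
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# sample data from the question
data = {'val': [5.6, 2.034, 1.171, 3.023, 4.202, 1.202, 5.202, 7.202, 2.202, 12.202, 1.202],
        'time': ['2021-11-18 03:00:00', '2021-11-18 05:00:00', '2021-11-18 07:00:00',
                 '2021-11-18 09:00:00', '2021-11-18 16:00:00', '2021-11-18 17:00:00',
                 '2021-11-18 18:00:00', '2021-11-18 19:00:00', '2021-11-18 20:00:00',
                 '2021-11-18 21:00:00', '2021-11-18 21:00:00']}
df = pd.DataFrame(data)
df['time'] = pd.to_datetime(df['time'])

# Boolean indexing: keep only the rows whose value exceeds the threshold
masked = df[df['val'].gt(5)]

fig, ax = plt.subplots(figsize=(15, 5))
# full series as a line with markers, drawn underneath
ax.plot(df['time'], df['val'], marker='o', zorder=1, label='val')
# values > 5 as red scatter points drawn on top
ax.scatter(masked['time'], masked['val'], color='red', s=30, zorder=2, label='outliers')

# pin the tick labels to month-day hour instead of the automatic choice
ax.xaxis.set_major_formatter(mdates.DateFormatter('%m-%d %H'))
ax.legend()
plt.show()
```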