Plotting a CDF from a multiclass pandas dataframe
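I understand that the empiricaldist package provides a CDF function, according to its documentation.
However, I find it tricky to plot my dataframe, given that the column has multiple values.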
df.head()
+------+---------+---------------+-------------+----------+----------+-------+--------------+-----------+-----------+-----------+-----------+------------+
| | trip_id | seconds_start | seconds_end | duration | distance | speed | acceleration | lat_start | lon_start | lat_end | lon_end | travelmode |
+------+---------+---------------+-------------+----------+----------+-------+--------------+-----------+-----------+-----------+-----------+------------+
| 0 | 318410 | 1461743310 | 1461745298 | 1988 | 5121.49 | 2.58 | 0.00130 | 41.162687 | -8.615425 | 41.177888 | -8.597549 | car |
| 1 | 318411 | 1461749359 | 1461750290 | 931 | 1520.71 | 1.63 | 0.00175 | 41.177949 | -8.597074 | 41.177839 | -8.597574 | bus |
| 2 | 318421 | 1461806871 | 1461806941 | 70 | 508.15 | 7.26 | 0.10370 | 37.091240 | -8.211239 | 37.092322 | -8.206681 | foot |
| 3 | 318422 | 1461837354 | 1461838024 | 670 | 1207.39 | 1.80 | 0.00269 | 37.092082 | -8.205060 | 37.091659 | -8.206462 | car |
| 4 | 318425 | 1461852790 | 1461853845 | 1055 | 1470.49 | 1.39 | 0.00132 | 37.091628 | -8.202143 | 37.092095 | -8.205070 | foot |
+------+---------+---------------+-------------+----------+----------+-------+--------------+-----------+-----------+-----------+-----------+------------+
I would like to plot a CDF of each column for every travel mode in the travelmode column.
groups = df.groupby('travelmode')
However, I don't really understand from the documentation how to do this.
You can plot them in a loop like this:
import matplotlib.pyplot as plt
from empiricaldist import Cdf

def decorate_plot(title):
    ''' Adds labels to plot '''
    plt.xlabel('Outcome')
    plt.ylabel('CDF')
    plt.title(title)

for tm in df['travelmode'].unique():
    # Keep only the rows that belong to this travel mode
    subset = df[df['travelmode'] == tm]
    for col in df.columns:
        if col != 'travelmode':
            # Create a new figure for each plot
            fig, ax = plt.subplots()
            d4 = Cdf.from_seq(subset[col])
            d4.plot()
            decorate_plot(f"{tm} - {col}")

plt.show()
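If one figure per column and travel mode is too many, the per-mode CDFs of a single column can also be overlaid on one axes. The following is a minimal sketch of that idea, not the empiricaldist approach: it assumes only pandas and NumPy and computes the empirical CDF by hand. The helper names `ecdf_by_group` and `plot_ecdf_curves` are hypothetical, and the small `trips` frame just reuses the five `duration` values shown in the question.

```python
import numpy as np
import pandas as pd


def ecdf_by_group(df, value_col, group_col):
    """Return {group: (sorted values, cumulative probabilities)}.

    Hypothetical helper: the empirical CDF at the i-th smallest
    value (1-based) of a group with n values is i / n.
    """
    curves = {}
    for name, group in df.groupby(group_col):
        values = np.sort(group[value_col].dropna().to_numpy())
        probs = np.arange(1, len(values) + 1) / len(values)
        curves[name] = (values, probs)
    return curves


def plot_ecdf_curves(ax, curves, xlabel):
    """Draw one step curve per group on an existing matplotlib Axes."""
    for name, (values, probs) in curves.items():
        ax.step(values, probs, where="post", label=str(name))
    ax.set_xlabel(xlabel)
    ax.set_ylabel("CDF")
    ax.legend(title="travelmode")


# Toy data: the duration and travelmode columns from df.head() above
trips = pd.DataFrame(
    {
        "duration": [1988, 931, 70, 670, 1055],
        "travelmode": ["car", "bus", "foot", "car", "foot"],
    }
)

curves = ecdf_by_group(trips, "duration", "travelmode")
```

To draw the result, create an axes with `fig, ax = plt.subplots()`, call `plot_ecdf_curves(ax, curves, "duration")`, and finish with `plt.show()`. Passing the axes in keeps the helper free of global pyplot state, so the same function can fill a grid of subplots, one per column.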