Python order dataset weather

Question

I have a historical dataset of temperature/weather by city, like this:

{"city": "Barcelona", "date": "2016-10-16", "temperature": "13", "weather": "cloudy"}
{"city": "Berlin", "date": "2016-10-16", "temperature": "-1", "weather": "sunny"}
{"city": "Pekin", "date": "2016-10-16", "temperature": "19", "weather": "cloudy"}
{"city": "Paris", "date": "2016-10-16", "temperature": "-8", "weather": "sunny"}

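I want to build a top 5 of cities, ordered by the best average temperature. In this result I would also like to know the number of days for each weather type (sunny, cloudy, rainy).

Example: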

Rank - City -      Average Temperature - Cloudy days - Sunny days - Rainy Days
1 -    Barcelona -           20 -           93 -        298 -       29

How can I do this in Python?

Thanks

Matt

Answer 1

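I believe you need pandas:

First, read the JSON into a DataFrame with read_json.
Get the top 5 cities with groupby, aggregating by mean, then nlargest.
Filter the rows for those cities by boolean indexing with isin.
Count the days per weather type with size, then reshape with unstack.
Use reindex with the index of s to restore the ranking order.
Add the average temperature column with insert and map.
Finally, add the rank column with insert and range.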

import pandas as pd

df = pd.read_json('a.json', lines=True)
print (df)
        city       date  temperature weather
0  Barcelona 2016-10-16           13  cloudy
1     Berlin 2016-10-16           -1   sunny
2      Pekin 2016-10-16           19  cloudy
3      Paris 2016-10-16           -8   sunny

s = df.groupby(['city'])['temperature'].mean().nlargest(5)
print (s)
city
Pekin        19
Barcelona    13
Berlin       -1
Paris        -8
Name: temperature, dtype: int64

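# keep only the top cities, count rows per (city, weather),
# pivot weather types into columns and restore the ranking order of s
df2 = (df[df['city'].isin(s.index)]
               .groupby(['city', 'weather'])['temperature']
               .size()
               .unstack(fill_value=0)
               .add_suffix(' days')
               .reindex(s.index)
               .reset_index()
               .rename_axis(None, axis=1))

# add the average temperature after the city column, then the rank in front
df2.insert(1, 'temp avg', df2['city'].map(s))
df2.insert(0, 'rank', range(1, len(df2) + 1))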
print (df2)
   rank       city  temp avg  cloudy days  sunny days
0     1      Pekin        19            1           0
1     2  Barcelona        13            1           0
2     3     Berlin        -1            0           1
3     4      Paris        -8            0           1
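
If you want to try this without a file on disk, here is a minimal sketch of the same steps wrapped in a helper. It swaps groupby/size/unstack for pd.crosstab, which should give the same per-weather counts. The sample rows and the name top_cities are made up for illustration and are not from the question's dataset.

```python
import io

import pandas as pd

# Hypothetical sample rows, invented for this sketch only.
SAMPLE = """\
{"city": "Barcelona", "date": "2016-10-16", "temperature": "13", "weather": "cloudy"}
{"city": "Barcelona", "date": "2016-10-17", "temperature": "17", "weather": "sunny"}
{"city": "Berlin", "date": "2016-10-16", "temperature": "-1", "weather": "sunny"}
{"city": "Berlin", "date": "2016-10-17", "temperature": "3", "weather": "rainy"}
{"city": "Paris", "date": "2016-10-16", "temperature": "-8", "weather": "sunny"}
"""


def top_cities(df, n=5):
    """Rank cities by mean temperature and count days per weather type."""
    # make sure temperature is numeric even if it was stored as text
    df = df.assign(temperature=pd.to_numeric(df["temperature"]))
    # top n cities by average temperature, already sorted descending
    avg = df.groupby("city")["temperature"].mean().nlargest(n)
    # one row per city, one column per weather type, values are day counts
    counts = pd.crosstab(df["city"], df["weather"]).add_suffix(" days")
    # keep only the top cities, in ranking order
    out = counts.reindex(avg.index).reset_index().rename_axis(None, axis=1)
    out.insert(1, "temp avg", out["city"].map(avg))
    out.insert(0, "rank", range(1, len(out) + 1))
    return out


df = pd.read_json(io.StringIO(SAMPLE), lines=True)
result = top_cities(df)
print(result)
```

A weather type that never occurs for a city shows up as 0, and a type that never occurs anywhere in the data (for example rainy in the four-row sample above) gets no column at all.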
