Python: order a weather dataset by average temperature
I have a historical dataset of temperature/weather by city, like this:
{"city": "Barcelona", "date": "2016-10-16", "temperature": "13", "weather": "cloudy"}
{"city": "Berlin", "date": "2016-10-16", "temperature": "-1", "weather": "sunny"}
{"city": "Pekin", "date": "2016-10-16", "temperature": "19", "weather": "cloudy"}
{"city": "Paris", "date": "2016-10-16", "temperature": "-8", "weather": "sunny"}
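I want to create a top 5, ordered by the best average temperature. In this result I would also like to know the number of days for each weather type (sunny, cloudy, rainy).
Example: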
Rank - City - Average Temperature - Cloudy days - Sunny days - Rainy Days
1 - Barcelona - 20 - 93 - 298 - 29
How can I do this in Python?
Thanks,
Matt
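I believe you need pandas:
- first, read_json to build a DataFrame from the JSON
- get the top 5 cities with groupby, aggregating by mean, then nlargest
- filter the rows for those cities with boolean indexing
- aggregate by count (the code below uses size) and reshape with unstack
- reindex by the index of s for the correct ordering
- add the new column with insert and map
- finally, add the rank with range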
import pandas as pd
# a.json holds one JSON record per line, hence lines=True
df = pd.read_json('a.json', lines=True)
print(df)
        city        date  temperature weather
0  Barcelona  2016-10-16           13  cloudy
1     Berlin  2016-10-16           -1   sunny
2      Pekin  2016-10-16           19  cloudy
3      Paris  2016-10-16           -8   sunny
# Mean temperature per city, keeping the 5 highest
s = df.groupby(['city'])['temperature'].mean().nlargest(5)
print(s)
city
Pekin        19
Barcelona    13
Berlin       -1
Paris        -8
Name: temperature, dtype: int64
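# Keep only the top cities, count rows per (city, weather),
# then pivot the weather types into columns
df2 = (df[df['city'].isin(s.index)]
         .groupby(['city', 'weather'])['temperature']
         .size()
         .unstack(fill_value=0)
         .add_suffix(' days')
         .reindex(s.index)        # same order as the ranking in s
         .reset_index()
         .rename_axis(None, axis=1))
# Put the average temperature right after the city column, then the rank first
df2.insert(1, 'temp avg', df2['city'].map(s))
df2.insert(0, 'rank', range(1, len(df2) + 1))
print(df2)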
   rank       city  temp avg  cloudy days  sunny days
0     1      Pekin        19            1           0
1     2  Barcelona        13            1           0
2     3     Berlin        -1            0           1
3     4      Paris        -8            0           1
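Note that the sample has no rainy rows, so no "rainy days" column shows up in the result. If you want every weather type to appear even when its count is zero, something like the following might work. This is only a sketch under a few assumptions: the function name top_cities and the WEATHER_TYPES list are made up for illustration, the records are read from an in-memory string instead of a.json so the snippet runs on its own, and pd.crosstab is swapped in for the groupby/size/unstack chain.

```python
import io

import pandas as pd

# Sample records copied from the question (one JSON object per line).
RAW = """\
{"city": "Barcelona", "date": "2016-10-16", "temperature": "13", "weather": "cloudy"}
{"city": "Berlin", "date": "2016-10-16", "temperature": "-1", "weather": "sunny"}
{"city": "Pekin", "date": "2016-10-16", "temperature": "19", "weather": "cloudy"}
{"city": "Paris", "date": "2016-10-16", "temperature": "-8", "weather": "sunny"}
"""

# Assumption: these are the only weather labels we want as columns.
WEATHER_TYPES = ["cloudy", "sunny", "rainy"]


def top_cities(df, n=5):
    """Rank cities by mean temperature and count days per weather type."""
    df = df.copy()
    # Temperatures are quoted in the JSON, so convert explicitly to be safe.
    df["temperature"] = pd.to_numeric(df["temperature"])
    avg = df.groupby("city")["temperature"].mean().nlargest(n)
    top = df[df["city"].isin(avg.index)]
    counts = (
        pd.crosstab(top["city"], top["weather"])
        # Order rows like the ranking and force all weather columns to exist.
        .reindex(index=avg.index, columns=WEATHER_TYPES, fill_value=0)
        .add_suffix(" days")
    )
    counts.insert(0, "temp avg", avg)
    out = counts.reset_index().rename_axis(None, axis=1)
    out.insert(0, "rank", range(1, len(out) + 1))
    return out


ranking = top_cities(pd.read_json(io.StringIO(RAW), lines=True))
print(ranking)
```

With the real file you would pass pd.read_json('a.json', lines=True) instead of the in-memory string.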