How do I use df.fillna with per-category median values?
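I have a large dataset of about 1 million rows, roughly 5,000 of which have missing coordinates. I want to fill them with the median for each "city" category, but fillna isn't working. How can I do this?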
import pandas as pd

city = ['London', 'Paris', 'Vienna', 'Milan', 'London', 'Paris', 'Vienna', 'Milan']
lat = [51.510843900000005, 48.8671391, 48.204465500000005, 45.4787357, 51.510843900000005, 48.8671391, None, None]
lng = [-0.1424476, 2.328075, 16.3686397, 9.1961308, -0.14244, 2.329, None, None]
data = pd.DataFrame(list(zip(city, lat, lng)), columns=['city', 'lat', 'lng'])
print(data['lat'].isna().sum())  # 2
print(data['lng'].isna().sum())  # 2
# Attempt: fill each city's missing values with that city's median
for city_name in set(data['city']):
    data[data['city'] == city_name]['lat'].fillna(data[data['city'] == city_name]['lat'].median())
    data[data['city'] == city_name]['lng'].fillna(data[data['city'] == city_name]['lng'].median())
    print(city_name, data[data['city'] == city_name]['lat'].median(), data[data['city'] == city_name]['lng'].median())
print(data['lat'].isna().sum())  # still 2
print(data['lng'].isna().sum())  # still 2
You can do it like this:
data.groupby("city").transform(lambda x: x.fillna(x.median()))
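First group by city, then use transform with fillna so that each group's missing values are filled with that group's median. (You can substitute any other aggregation, such as the mean.)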
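Note that the expression above returns a new DataFrame containing only the non-grouping columns, so the result has to be assigned back to take effect. A minimal sketch using the question's sample data; the names `cols` and `missing_after` are introduced here purely for illustration:

```python
import pandas as pd

city = ['London', 'Paris', 'Vienna', 'Milan', 'London', 'Paris', 'Vienna', 'Milan']
lat = [51.5108439, 48.8671391, 48.2044655, 45.4787357, 51.5108439, 48.8671391, None, None]
lng = [-0.1424476, 2.328075, 16.3686397, 9.1961308, -0.14244, 2.329, None, None]
data = pd.DataFrame({'city': city, 'lat': lat, 'lng': lng})

# Columns to fill (hypothetical helper name, not from the original post)
cols = ['lat', 'lng']

# Within each city, replace NaN with that city's median, then write the
# result back into the original frame
data[cols] = data.groupby('city')[cols].transform(lambda x: x.fillna(x.median()))

# Count the missing values that remain
missing_after = int(data[cols].isna().sum().sum())
print(missing_after)
```

Selecting the columns explicitly before transform keeps the operation limited to the coordinates even if the real dataset has other columns.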
You can also call fillna directly on the DataFrame:
data.fillna(data.groupby("city").transform("median"))
     city        lat        lng
0  London  51.510844  -0.142448
1   Paris  48.867139   2.328075
2  Vienna  48.204466  16.368640
3   Milan  45.478736   9.196131
4  London  51.510844  -0.142440
5   Paris  48.867139   2.329000
6  Vienna  48.204466  16.368640
7   Milan  45.478736   9.196131
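The loop in the question most likely had no effect for two reasons: fillna returns a new object rather than modifying in place, and `data[mask]['lat']` is a chained selection, so the discarded result never touches `data`. The sketch below shows the same behaviour with the DataFrame-level fillna: the original frame stays unchanged until the result is assigned. The names `medians` and `filled` are illustrative only.

```python
import pandas as pd

city = ['London', 'Paris', 'Vienna', 'Milan', 'London', 'Paris', 'Vienna', 'Milan']
lat = [51.5108439, 48.8671391, 48.2044655, 45.4787357, 51.5108439, 48.8671391, None, None]
lng = [-0.1424476, 2.328075, 16.3686397, 9.1961308, -0.14244, 2.329, None, None]
data = pd.DataFrame({'city': city, 'lat': lat, 'lng': lng})

# Per-row table of city medians, with the same index as `data`
medians = data.groupby('city')[['lat', 'lng']].transform('median')

# fillna aligns on index and column labels; only NaN cells are replaced
filled = data.fillna(medians)

print(int(data.isna().sum().sum()))    # original frame is untouched
print(int(filled.isna().sum().sum()))  # result has no missing coordinates
```

If a city has no known coordinates at all, its median is NaN and those rows stay missing, so it is worth checking the remaining NaN count on the full dataset.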