Merging two dataframes - conditional rows

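I'm currently working with some data: I have a CSV file of Indian Food Prices from Kaggle, which I've converted into a dataframe where one column, 'Market', holds categorical location data. For an ML problem I need latitude and longitude data, which I managed to get through the OpenStreetMap API. Because the original dataframe has more than 131,000 rows, I created a separate dataframe containing only one instance of each location; this new dataframe has 165 rows.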
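A minimal sketch of the one-row-per-location step, assuming pandas: `drop_duplicates` on the 'Market' column keeps the first row for each market together with its original index label (consistent with the 0, 4, 8, … index shown for the small frame further down). The question does not show the OpenStreetMap code, so `geocode` here is a hypothetical stand-in backed by a fixed dictionary.

```python
import pandas as pd

# Toy stand-in for the large frame 'ndf' (the real one has far more rows)
ndf = pd.DataFrame({
    "Market": ["Delhi", "Delhi", "Delhi", "Ahmedabad", "Kharagpur", "Kharagpur"],
    "Commodity": ["Rice", "Wheat", "Sugar", "Rice", "Onions", "Tomatoes"],
})

# One row per distinct market; the first occurrence keeps its original row label
df = ndf[["Market"]].drop_duplicates()


def geocode(name):
    """Hypothetical stand-in for the OpenStreetMap lookup (not shown in the question)."""
    known = {
        "Delhi": (28.6517178, 77.2219388),
        "Ahmedabad": (23.0216238, 72.5797068),
    }
    # -1 mirrors the value the question shows for a market that failed to geocode
    return known.get(name, -1)


# Geocode only the distinct markets, not every price row
df["geocoded"] = df["Market"].apply(geocode)
print(df)
```

Geocoding the distinct markets instead of every price row is what keeps the number of API calls down to one per location.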

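I now need to merge these two dataframes. I'm trying to come up with a loop that fills every row of the original 131,000+ row dataframe with the latitude and longitude from the smaller 165-row dataframe, so that each row gets the coordinates matching the location in its 'Market' column.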

Any suggestions would be greatly appreciated.

Here is my attempt at achieving the above:

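import numpy as np
def using_where(ndf):
    # Only handles Delhi; every other market would need its own condition
    ndf['Lat-Long'] = np.where(ndf['Market'] == 'Delhi', '28.6517178, 77.2219388', None)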
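The `np.where` approach needs one condition per market. A sketch of a loop-free alternative, assuming both frames are pandas DataFrames shaped like the ones in this question: turn the small frame into a Series indexed by 'Market' and let `Series.map` look up every row of the large frame at once.

```python
import pandas as pd

# Small lookup frame: one row per market (coordinates copied from the question)
df = pd.DataFrame({
    "Market": ["Delhi", "Ahmedabad", "Shimla"],
    "geocoded": [(28.6517178, 77.2219388), (23.0216238, 72.5797068),
                 (31.1041526, 77.1709729)],
})

# Large frame: many rows per market
ndf = pd.DataFrame({"Market": ["Delhi", "Delhi", "Ahmedabad", "Delhi", "Shimla"]})

# Market -> coordinates Series; map() looks up every row of ndf in one call
lookup = df.set_index("Market")["geocoded"]
ndf["geocoded"] = ndf["Market"].map(lookup)
print(ndf)
```

Markets missing from the lookup come back as NaN. Mapping through a Series raises an error if the lookup index contains duplicates, which is not a problem here because the small frame has exactly one row per market.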

Here is the head of my large dataframe 'ndf':

<bound method NDFrame.head of         Unnamed: 0        Date     Market               Category  \
0                1  1994-01-15      Delhi     cereals and tubers   
1                2  1994-01-15      Delhi     cereals and tubers   
2                3  1994-01-15      Delhi     miscellaneous food   
3                4  1994-01-15      Delhi           oil and fats   
4                5  1994-01-15  Ahmedabad     cereals and tubers   
...            ...         ...        ...                    ...   
139529      139530  2021-09-15  Kharagpur        pulses and nuts   
139530      139531  2021-09-15  Kharagpur        pulses and nuts   
139531      139532  2021-09-15  Kharagpur        pulses and nuts   
139532      139533  2021-09-15  Kharagpur  vegetables and fruits   
139533      139534  2021-09-15  Kharagpur  vegetables and fruits   

              Commodity Unit PriceFlag PriceType Currency  Price  USD_Price  
0                  Rice   KG    actual    Retail      INR    8.0     0.2545  
1                 Wheat   KG    actual    Retail      INR    5.0     0.1590  
2                 Sugar   KG    actual    Retail      INR   13.5     0.4294  
3         Oil (mustard)   KG    actual    Retail      INR   31.0     0.9860  
4                  Rice   KG    actual    Retail      INR    6.8     0.2163  
...                 ...  ...       ...       ...      ...    ...        ...  
139529  Lentils (masur)   KG    actual    Retail      INR  110.0     1.4972  
139530  Lentils (moong)   KG    actual    Retail      INR  120.0     1.6333  
139531   Lentils (urad)   KG    actual    Retail      INR  115.0     1.5653  
139532           Onions   KG    actual    Retail      INR   30.0     0.4083  
139533         Tomatoes   KG    actual    Retail      INR   40.0     0.5444  

And here is the head of my small dataframe 'df':

    <bound method NDFrame.head of             Market                         geocoded
0            Delhi         (28.6517178, 77.2219388)
4        Ahmedabad         (23.0216238, 72.5797068)
8           Shimla         (31.1041526, 77.1709729)
11       Bengaluru          (12.9767936, 77.590082)
14          Bhopal          (23.2584857, 77.401989)
...            ...                              ...
136823   Dantewada  (18.8640648, 81.38339468738648)
136970     Selamba                               -1
137053      Bodeli         (22.2748105, 73.7166363)
137326     Dhanbad         (23.7952809, 86.4309638)
137389  Jamshedpur         (22.8015194, 86.2029579)

[165 rows x 2 columns]>
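The small frame has `-1` for Selamba where a coordinate pair should be. Before merging, it may be worth splitting the pairs into two numeric columns (ML models generally want numeric features, not tuples) and marking the failed lookup as missing. This sketch assumes the 'geocoded' column holds real Python tuples, as in the answer's test frame; if the frame was saved to and reloaded from CSV, the values would be strings such as '(28.6517178, 77.2219388)' and would need parsing first.

```python
import pandas as pd

# A few rows shaped like the small frame, including Selamba's -1
df = pd.DataFrame({
    "Market": ["Delhi", "Selamba", "Bodeli"],
    "geocoded": [(28.6517178, 77.2219388), -1, (22.2748105, 73.7166363)],
})

# Rows whose value is an actual (lat, lon) tuple; anything else counts as missing
is_pair = df["geocoded"].apply(lambda v: isinstance(v, tuple))

# Build a two-column float frame from the valid pairs, keeping their row labels
coords = pd.DataFrame(
    df.loc[is_pair, "geocoded"].tolist(),
    index=df.index[is_pair],
    columns=["Latitude", "Longitude"],
)

# Join on the index: rows without a valid pair get NaN in both new columns
df = df.join(coords)
print(df)
```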

I think you can just use merge(), unless I'm missing something:

ndf = pd.merge(ndf, df, how='inner', on='Market')
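With `how='inner'`, any price row whose market has no match in the small frame is silently dropped. A slightly more defensive variant, sketched here with toy data: `how='left'` keeps every row, `validate='many_to_one'` makes pandas raise a `MergeError` if a market appears more than once in the lookup frame, and `indicator=True` adds a `_merge` column showing which rows found a match.

```python
import pandas as pd

ndf = pd.DataFrame({"Market": ["Delhi", "Delhi", "Ahmedabad", "Kharagpur"],
                    "Price": [8.0, 5.0, 6.8, 30.0]})
# Lookup frame deliberately missing Kharagpur to show an unmatched market
df = pd.DataFrame({"Market": ["Delhi", "Ahmedabad"],
                   "geocoded": [(28.6517178, 77.2219388), (23.0216238, 72.5797068)]})

# Keep all rows of ndf, insist on unique keys in df, and record match status
merged = pd.merge(ndf, df, how="left", on="Market",
                  validate="many_to_one", indicator=True)
print(merged)

# Markets that were never geocoded
print(merged.loc[merged["_merge"] == "left_only", "Market"].unique())
```

Kharagpur comes back with NaN and `left_only`; an inner merge of the same two frames would return 3 rows instead of 4.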

Here is a complete code example with a test case:

import pandas as pd
ndf = pd.DataFrame({'Date':['1994-01-15']*5 + ['2021-09-15']*5, 'Market':'Delhi,Delhi,Delhi,Delhi,Ahmedabad,Kharagpur,Kharagpur,Kharagpur,Kharagpur,Kharagpur'.split(','), 
    'Category':'cereals and tubers,cereals and tubers,miscellaneous food,oil and fats,cereals and tubers,pulses and nuts,pulses and nuts,pulses and nuts,vegetables and fruits,vegetables and fruits'.split(',')})

df = pd.DataFrame({'Market':'Delhi,Ahmedabad,Shimla,Bengaluru,Bhopal,Kharagpur'.split(','), 
    'geocoded':[(28.6517178, 77.2219388),(23.0216238, 72.5797068),(31.1041526, 77.1709729),(12.9767936, 77.590082),(23.2584857, 77.401989),(22.22, 73.73)]})

ndf = pd.merge(ndf, df, how='inner', on='Market')
print(ndf)

Output:

         Date     Market               Category                  geocoded
0  1994-01-15      Delhi     cereals and tubers  (28.6517178, 77.2219388)
1  1994-01-15      Delhi     cereals and tubers  (28.6517178, 77.2219388)
2  1994-01-15      Delhi     miscellaneous food  (28.6517178, 77.2219388)
3  1994-01-15      Delhi           oil and fats  (28.6517178, 77.2219388)
4  1994-01-15  Ahmedabad     cereals and tubers  (23.0216238, 72.5797068)
5  2021-09-15  Kharagpur        pulses and nuts            (22.22, 73.73)
6  2021-09-15  Kharagpur        pulses and nuts            (22.22, 73.73)
7  2021-09-15  Kharagpur        pulses and nuts            (22.22, 73.73)
8  2021-09-15  Kharagpur  vegetables and fruits            (22.22, 73.73)
9  2021-09-15  Kharagpur  vegetables and fruits            (22.22, 73.73)