How to set values in a dict from a dataframe in Python
I have a dataframe like the one shown below.
Code to create the df:
import pandas as pd
dd = {'name': ["HARDIE'S MOBILE HOME PARK", 'CRESTVIEW RV PARK',
'HOMESTEAD TRAILER PARK', 'HOUSTON PARK MOBILE HOME PARK',
'HUDSON MOBILE HOME PARK', 'BEACH DRIVE MOBILE HOME PARK',
'EVANS TRAILER PARK'],
'country': ['USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA'],
'coordinates': ['30.44126118, -86.6240656099999',
'30.7190163500001, -86.5716222299999',
'30.5115772500001, -86.4628417499999',
'30.4424195300001, -86.64733076',
'30.7629176200001, -86.5928893399999', '30.44417349, -86.59951996',
'30.4427800300001, -86.62941091'],
'status':['OPEN', 'CLOSED', 'OPEN', 'OPEN', 'OPEN', 'OPEN', 'OPEN']}
df1 = pd.DataFrame(data=dd)
What I want to do is create a dictionary with the following structure:
{'destination1': 'CRESTVIEW RV PARK; 30.7190163500001, -86.5716222299999',
'destination2': 'HOMESTEAD TRAILER PARK; 30.5115772500001, -86.4628417499999',
'destination3': 'HOUSTON PARK MOBILE HOME PARK; 30.4424195300001, -86.64733076',
'destination4': 'HUDSON MOBILE HOME PARK; 30.7629176200001, -86.5928893399999',
'destination5': 'BEACH DRIVE MOBILE HOME PARK ; 30.44417349, -86.59951996'}
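As you can see, each value must contain the name and the coordinates, separated by "; ", taken from the second row through the last row. I am using the following code to do this: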
d1 = {f"destination{k}":v + "; " + i for k in range(1, len(df1)-1) for v,i in zip(df1.name, df1.coordinates)}
However, this is the output I get:
{'destination1': 'EVANS TRAILER PARK; 30.4427800300001, -86.62941091',
'destination2': 'EVANS TRAILER PARK; 30.4427800300001, -86.62941091',
'destination3': 'EVANS TRAILER PARK; 30.4427800300001, -86.62941091',
'destination4': 'EVANS TRAILER PARK; 30.4427800300001, -86.62941091',
'destination5': 'EVANS TRAILER PARK; 30.4427800300001, -86.62941091'}
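It only reads the last row of the dataframe, so every key gets the same value. What I want is for each key to take its value from a different row of the dataframe, from the second row through the last.
If anyone knows how to do this, I would greatly appreciate your help.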
You can enumerate the zip like this:
import pandas as pd
dd = {'name': ["HARDIE'S MOBILE HOME PARK", 'CRESTVIEW RV PARK',
'HOMESTEAD TRAILER PARK', 'HOUSTON PARK MOBILE HOME PARK',
'HUDSON MOBILE HOME PARK', 'BEACH DRIVE MOBILE HOME PARK',
'EVANS TRAILER PARK'],
'country': ['USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA'],
'coordinates': ['30.44126118, -86.6240656099999',
'30.7190163500001, -86.5716222299999',
'30.5115772500001, -86.4628417499999',
'30.4424195300001, -86.64733076',
'30.7629176200001, -86.5928893399999', '30.44417349, -86.59951996',
'30.4427800300001, -86.62941091'],
'status':['OPEN', 'CLOSED', 'OPEN', 'OPEN', 'OPEN', 'OPEN', 'OPEN']}
df1 = pd.DataFrame(data=dd)
d_out = {
f"destination{idx+1}":'; '.join(v) for idx, v in enumerate(zip(df1.name[1:], df1.coordinates[1:]))
}
d_out
{'destination1': 'CRESTVIEW RV PARK; 30.7190163500001, -86.5716222299999',
'destination2': 'HOMESTEAD TRAILER PARK; 30.5115772500001, -86.4628417499999',
'destination3': 'HOUSTON PARK MOBILE HOME PARK; 30.4424195300001, -86.64733076',
'destination4': 'HUDSON MOBILE HOME PARK; 30.7629176200001, -86.5928893399999',
'destination5': 'BEACH DRIVE MOBILE HOME PARK; 30.44417349, -86.59951996',
'destination6': 'EVANS TRAILER PARK; 30.4427800300001, -86.62941091'}
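Note that the expected output in the question stops at destination5, while the comprehension above also includes the final row. If the intent is to skip both the first and the last row, one possible variation is sketched below. It is only a sketch: the four-row frame is a shortened stand-in for the question's data, the `inner` name is made up for illustration, and `enumerate(..., start=1)` replaces the `idx+1` arithmetic.

```python
import pandas as pd

# Shortened stand-in for the question's dataframe (hypothetical values).
df1 = pd.DataFrame({
    "name": ["A PARK", "B PARK", "C PARK", "D PARK"],
    "coordinates": ["1.0, -1.0", "2.0, -2.0", "3.0, -3.0", "4.0, -4.0"],
})

# Assumption: drop the first and the last row, as the expected output suggests.
inner = df1.iloc[1:-1]

# start=1 makes the keys begin at destination1 without adding 1 by hand.
d_out = {
    f"destination{idx}": f"{name}; {coords}"
    for idx, (name, coords) in enumerate(
        zip(inner["name"], inner["coordinates"]), start=1
    )
}
print(d_out)
```

Replacing `iloc[1:-1]` with `iloc[1:]` keeps the last row and gives the same shape of result as the answer above.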
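You don't need a dictionary comprehension to get this result; you can also build it by creating a couple of columns in the pandas dataframe, like this: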
df1['destination'] = [f"destination{k}" for k in range(len(df1))]
df1['value'] = df1['name'] + "; " + df1['coordinates']
df1[['destination', 'value']][1:].set_index("destination").to_dict()['value']
{'destination1': 'CRESTVIEW RV PARK; 30.7190163500001, -86.5716222299999',
'destination2': 'HOMESTEAD TRAILER PARK; 30.5115772500001, -86.4628417499999',
'destination3': 'HOUSTON PARK MOBILE HOME PARK; 30.4424195300001, -86.64733076',
'destination4': 'HUDSON MOBILE HOME PARK; 30.7629176200001, -86.5928893399999',
'destination5': 'BEACH DRIVE MOBILE HOME PARK; 30.44417349, -86.59951996',
'destination6': 'EVANS TRAILER PARK; 30.4427800300001, -86.62941091'}
The dictionary comprehension in your example has two for loops:
d1 = {
f"destination{k}":v + "; " + i
for k in range(1, len(df1)-1)
for v,i in zip(df1.name, df1.coordinates)
}
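In these loops, k iterates independently of v and i. For every k, the second loop runs over all rows and writes to the same key each time, so only the value from the last row survives. To see this for yourself, step through df1.name, df1.coordinates, and zip(df1.name, df1.coordinates) and look at what each one produces. Note also that attribute access such as df1.name only works when the column name does not clash with an existing DataFrame attribute or method, so df1['name'] is the safer way to refer to the column.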
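The overwriting behaviour of the nested loops can be reproduced without pandas at all. The following is a minimal sketch using two short, made-up lists in place of the dataframe columns; the names `nested` and `paired` are purely illustrative.

```python
names = ["A", "B", "C"]
coords = ["1, 1", "2, 2", "3, 3"]

# Two independent loops: for each k, the inner loop assigns the same key
# three times, so the last (name, coords) pair always wins.
nested = {
    f"destination{k}": v + "; " + i
    for k in range(1, 3)
    for v, i in zip(names, coords)
}
print(nested)  # {'destination1': 'C; 3, 3', 'destination2': 'C; 3, 3'}

# One loop: k selects the matching element from both lists.
paired = {
    f"destination{k}": names[k] + "; " + coords[k]
    for k in range(1, 3)
}
print(paired)  # {'destination1': 'B; 2, 2', 'destination2': 'C; 3, 3'}
```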
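What you really want is to access several elements of df1 for each row. To do that, keep only the first loop, and look up the elements you need from the dataframe when building the value: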
d1 = {
f"destination{k}": (df1.loc[k, 'name'] + "; " + df1.loc[k, 'coordinates'])
for k in range(1, len(df1)-1)
}
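One caveat: `.loc[k, ...]` looks rows up by index label, so the comprehension above relies on df1 having the default 0, 1, 2, ... index. If the frame has been filtered or re-indexed, positional access with `.iloc` may be the safer choice. A minimal sketch, assuming a made-up three-row frame with a non-default index and a range that runs through the last row:

```python
import pandas as pd

# Hypothetical frame whose index labels are not 0, 1, 2.
df1 = pd.DataFrame(
    {
        "name": ["A PARK", "B PARK", "C PARK"],
        "coordinates": ["1.0, -1.0", "2.0, -2.0", "3.0, -3.0"],
    },
    index=[10, 20, 30],
)

# .iloc needs integer column positions, so resolve them once up front.
name_col = df1.columns.get_loc("name")
coord_col = df1.columns.get_loc("coordinates")

d1 = {
    f"destination{k}": df1.iloc[k, name_col] + "; " + df1.iloc[k, coord_col]
    for k in range(1, len(df1))  # positions 1 .. last row
}
print(d1)
```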
See the section on comprehensions in this FullStack Python guide for more information.
Alternatively (and preferably), use pandas!
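d1 = pd.Series(
# .to_numpy() drops the old index so the values are not re-aligned to the new labels
(df1['name'] + '; ' + df1['coordinates']).to_numpy(),
index=('destination' + df1.index.astype(str)),
)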
If you really do need a dictionary at this point, you can convert the Series with d1 = d1.to_dict().
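Putting the pandas route together end to end might look like the sketch below. It assumes a shortened stand-in frame, skips the first row as the question asks, and builds the keys with a list comprehension rather than from the index. The values are passed as a plain array because handing `pd.Series` an existing Series together with a different index makes pandas re-align the data to the new labels, which would yield NaN here.

```python
import pandas as pd

# Shortened stand-in for the question's dataframe (hypothetical values).
df1 = pd.DataFrame({
    "name": ["A PARK", "B PARK", "C PARK"],
    "coordinates": ["1.0, -1.0", "2.0, -2.0", "3.0, -3.0"],
})

rows = df1.iloc[1:]  # skip the first row

# Vectorised string concatenation, then strip the old index.
values = (rows["name"] + "; " + rows["coordinates"]).to_numpy()
keys = [f"destination{k}" for k in range(1, len(rows) + 1)]

d1 = pd.Series(values, index=keys).to_dict()
print(d1)
```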