Combining two files using Python?
I am using PyCharm version 2.7.9 and pandas version 0.19.2.
I have a CSV file named 'data' that contains station-name data with latitude and longitude.
df = pd.read_csv("data.csv", sep="^", dtype=object)
df["train_board_station"] = ['Tokyo','LA','Paris','New_York','Delhi']
df["train_off_station"] = ['Phoenix','London','Sydney','Berlin','Shanghai']
I took another CSV file, from which latitude and longitude data should be added wherever 'train_board_station' and 'train_off_station' are the same as 'station'.
ref = pd.read_csv("ref.csv", sep="^", header=0, dtype=str)
ref["station"] = ['Tokyo','London','Paris','New_York','Shanghai','LA','Sydney','Berlin','Phoenix','Delhi']
ref["latitude"] = ['-34.54','56.789','-78,98','45.62','111.67','23.78','-98.40','-76.89','23.98','23.89']
ref["longitude"] = ['34.89','-78.55','78.89','34.12','56.56','23.23','-78.65','34.76','23.67','21.645']
I want to merge 'latitude' and 'longitude' from 'ref.csv' into data.csv wherever 'train_board_station' or 'train_off_station' matches 'station'.
for x in ["board", "off"]:
    df["station"] = df["train_" + x + "_station"]
    df = pd.concat([df, ref], axis=1, join_axes=[df.index])
    df[x + "_latitude"] = df["latitude"]
    df[x + "_longitude"] = df["longitude"]
When I try to run the code, I get this error:
KeyError: 'train_board_station'
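A KeyError like this generally means the DataFrame has no column with exactly that name, so a first step is to look at the column names pandas actually parsed. The sketch below shows one possible cause, and it is only an assumption since the real files are not shown: the file uses a different separator than the one passed to `read_csv`. The `sample` text is made up for illustration.

```python
import io
import pandas as pd

# Hypothetical file contents; the real data.csv may look different.
sample = "train_board_station,train_off_station\nTokyo,Phoenix\nLA,London\n"

# A separator that never occurs in the file yields a single wide column,
# whose name is the whole header line.
wrong = pd.read_csv(io.StringIO(sample), sep="^", dtype=object)
print(list(wrong.columns))

# The separator the file actually uses yields the expected column names.
right = pd.read_csv(io.StringIO(sample), sep=",", dtype=object)
print(list(right.columns))
```

If `df.columns` on the real data shows the expected names, the cause lies elsewhere and this check simply rules the separator out.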
Here is an example using the .join function. First, a minimal setup.
import pandas as pd
df = pd.DataFrame({
    "station": ["Tokyo", "Amsterdam"],
    "size": [100, 200]
})
ref = pd.DataFrame({
    "train_station": ["Amsterdam", "Tokyo"],
    "lon": [23, 12],
    "lat": [34, 12]
})
The key to joining is to set the index on both data frames.
df.set_index("station").join(ref.set_index("train_station"))
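To see what that join produces, here is a self-contained version of the same setup. The variable name `joined` is introduced only for this illustration. Rows are matched by index label, so the different row order in `ref` does not matter.

```python
import pandas as pd

df = pd.DataFrame({"station": ["Tokyo", "Amsterdam"], "size": [100, 200]})
ref = pd.DataFrame({
    "train_station": ["Amsterdam", "Tokyo"],
    "lon": [23, 12],
    "lat": [34, 12],
})

# join() aligns on the index, so both frames are indexed by station name first.
joined = df.set_index("station").join(ref.set_index("train_station"))

# Each station picks up the lon/lat stored under the same name in ref.
print(joined.loc["Tokyo", "lon"], joined.loc["Amsterdam", "lat"])
```

The result keeps the row order of `df`, with the station name as its index.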
df = pd.DataFrame()
df["train_board_station"] = ['Tokyo','LA','Paris','New_York','Delhi']
df["train_off_station"] = ['Phoenix','London','Sydney','Berlin','Shanghai']
# the station coordinates need to go in a separate DataFrame, ref
ref = pd.DataFrame()
ref["station"] = ['Tokyo','London','Paris','New_York','Shanghai','LA','Sydney','Berlin','Phoenix','Delhi']
ref["latitude"] = ['-34.54','56.789','-78,98','45.62','111.67','23.78','-98.40','-76.89','23.98','23.89']
ref["longitude"] = ['34.89','-78.55','78.89','34.12','56.56','23.23','-78.65','34.76','23.67','21.645']
I believe you can use join in a loop:
for x in ["board", "off"]:
    val = "train_" + x + "_station"
    df1 = df[[val]].join(ref.set_index('station'), on=val)
    df[x + "_latitude"] = df1["latitude"]
    df[x + "_longitude"] = df1["longitude"]
print(df)
  train_board_station train_off_station board_latitude board_longitude  \
0               Tokyo           Phoenix         -34.54           34.89
1                  LA            London          23.78           23.23
2               Paris            Sydney         -78,98           78.89
3            New_York            Berlin          45.62           34.12
4               Delhi          Shanghai          23.89          21.645

  off_latitude off_longitude
0        23.98         23.67
1       56.789        -78.55
2       -98.40        -78.65
3       -76.89         34.76
4       111.67         56.56
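As an alternative sketch (not part of the answer above), the same lookup can be written with `DataFrame.merge`, doing one left merge per station column. The helper name `add_coords` is made up for this example. A left merge keeps every row of `df` and leaves NaN where a station is missing from `ref`.

```python
import pandas as pd

df = pd.DataFrame({
    "train_board_station": ["Tokyo", "LA", "Paris", "New_York", "Delhi"],
    "train_off_station": ["Phoenix", "London", "Sydney", "Berlin", "Shanghai"],
})
ref = pd.DataFrame({
    "station": ["Tokyo", "London", "Paris", "New_York", "Shanghai",
                "LA", "Sydney", "Berlin", "Phoenix", "Delhi"],
    "latitude": ["-34.54", "56.789", "-78,98", "45.62", "111.67",
                 "23.78", "-98.40", "-76.89", "23.98", "23.89"],
    "longitude": ["34.89", "-78.55", "78.89", "34.12", "56.56",
                  "23.23", "-78.65", "34.76", "23.67", "21.645"],
})

def add_coords(frame, lookup, prefix):
    """Hypothetical helper: attach <prefix>_latitude / <prefix>_longitude."""
    key = "train_" + prefix + "_station"
    # Rename the lookup columns so they merge on the right key and
    # arrive already carrying the board_/off_ prefix.
    renamed = lookup.rename(columns={
        "station": key,
        "latitude": prefix + "_latitude",
        "longitude": prefix + "_longitude",
    })
    # how="left" keeps all rows of frame, in their original order.
    return frame.merge(renamed, on=key, how="left")

result = df
for prefix in ["board", "off"]:
    result = add_coords(result, ref, prefix)

print(result)
```

This relies on each station appearing only once in `ref`; duplicated station names would duplicate rows in the merged result.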