Extracting a row value from one file and putting it into a row of another file (where the filename corresponds to a row of the first file)

Question

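I have a list of CSV file names (stored in another CSV file called CSV_file_1). I want to add two extra columns to CSV_file_1, whose row values will come from thousands of individual CSV files.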

CSV_file_1 contains the following:

1.csv
2.csv
3.csv

The thousands of files in another folder contain the values I want to put into CSV_file_1. For example, 1.csv contains the following rows:

LATITUDE : ;13.63345
LONGITUDE : ;123.207083

2.csv contains the following rows:

LATITUDE : ;13.11111
LONGITUDE : ;123.22222

3.csv contains the following rows:

LATITUDE : ;13.22222
LONGITUDE : ;123.11111

And so on.

I want CSV_file_1 to end up like this:

FILENAME:      LATITUDE:     LONGITUDE:
    1.csv      13.63345      123.207083
    2.csv      13.11111      123.22222
    3.csv      13.22222      123.11111

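I have managed to create CSV_file_1, but it does not yet have the latitude and longitude (these will come from the individual files, delimited as shown above).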

My code looks like this:

import pandas as pd

import glob

print(glob.glob("D:/2021/*.csv")) 

#list of all the filenames collated and put in CSV_file_1
CSV_file_1 = pd.DataFrame(glob.glob("D:/2021/*.csv")) 


 #creating blank columns in CSV_file_1
CSV_file_1 ['Latitude'] = ""
CSV_file_1 ['Longitude'] = ""

#here im trying to access each file in the given folder(file name must correspond to the row in CSV_file_1), extract the data (latitude and longitude) and copy it to CSV_file_1
 import csv
 with open('D:/2021/*.csv','rt')as file:
      data = csv.reader(file)
      for row in file:
            if glob.glob("D:/2021/*.csv") = CSV_file_1['FILENAME']:
                CSV_file_1.iloc[i] ['LATITUDE:'] ==file.iloc[i]
        
        
        
    CSV_file_1.to_csv('D:/2021/CSV_file_1.csv', index = False)

But I get an invalid syntax error:

 if glob.glob("D:/2021/*.csv") = CSV_file_1['FILENAME']:
            ^
SyntaxError: invalid syntax

I am new to Python, so I would like help fixing my code.

Answer 1

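If I understand your question correctly, I think your approach is a bit too complicated. I implemented a script that creates the desired output.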

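First, the CSV file containing the names of the other files is read directly into the first column of a data frame. The file names are then used to extract the longitude and latitude from each file. For this I wrote a function, which you can see in the first part of the script. Finally, I add the extracted values to the data frame and store it in a file in the desired format.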

import pandas as pd
import csv

# Function that takes the path of a csv file and returns its latitude and longitude as strings
def get_lati_and_long_from_csv(csv_path):
    with open(csv_path,'rt') as file:
        # Read csv file content to list of rows
        data = list(csv.reader(file, delimiter =';'))
        
        # Take values from row zero and one
        latitude = data[0][1]
        longitude = data[1][1]
      
        
        return (latitude, longitude)

def main():      
    # Define path of first csv file
    csv_file_1_path = "CSV_file_1.csv"

    # Read data frame from csv file and create correct column name
    CSV_file_1 = pd.read_csv(csv_file_1_path, header=None)
    CSV_file_1.columns = ['FILENAME:']
    
    # Create list of files to read the coordinates
    list_of_csvs = list(CSV_file_1['FILENAME:'])

    # Define empty lists to add the coordinates
    lat_list = []
    lon_list = []
    
    # Iterate over all csv files and extract longitude and latitude
    for csv_path in list_of_csvs:
        lat, lon = get_lati_and_long_from_csv(csv_path)
        lat_list.append(lat)
        lon_list.append(lon)
        
    # Add coordinates to the data frame
    CSV_file_1['Latitude:'] = lat_list
    CSV_file_1['Longitude:'] = lon_list
 
    # Save final data frame to csv file
    CSV_file_1.to_csv(csv_file_1_path+'.out', index = False, sep='\t')
    
if __name__ == "__main__":
    main()
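
The function above takes the values from fixed positions (rows zero and one). If some of the files might contain other rows before the coordinates, a label-based lookup may be more tolerant. The following is only a sketch under that assumption: the name get_coords_by_label and the extra STATION row in the demo file are hypothetical and do not come from the question.

```python
import csv
import os
import tempfile

def get_coords_by_label(csv_path):
    """Return (latitude, longitude) as strings by matching row labels.

    Hypothetical helper: returns None for a value whose label is not found.
    """
    coords = {}
    with open(csv_path, "rt", newline="") as file:
        for row in csv.reader(file, delimiter=";"):
            if len(row) < 2:
                continue  # skip blank or malformed rows
            # "LATITUDE : " -> "LATITUDE"
            label = row[0].replace(":", "").strip().upper()
            if label in ("LATITUDE", "LONGITUDE"):
                coords[label] = row[1].strip()
    return coords.get("LATITUDE"), coords.get("LONGITUDE")

# Small demo with a temporary file that has an extra leading row
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "1.csv")
    with open(sample, "w", newline="") as f:
        f.write("STATION : ;demo\nLATITUDE : ;13.63345\nLONGITUDE : ;123.207083\n")
    print(get_coords_by_label(sample))
```

It can be dropped in wherever get_lati_and_long_from_csv is called, since it returns the same kind of tuple.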

Content of the test input file:

1.csv
2.csv
3.csv

Content of the test output file:

FILENAME:   Latitude:   Longitude:
1.csv   13.63345    123.207083  
2.csv   13.11111    123.22222 
3.csv   13.22222    123.11111
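
This test assumes that 1.csv, 2.csv and 3.csv sit in the working directory. In the question the individual files live in a separate folder, so the listed names probably need to be joined with that folder before they are opened. A minimal sketch of that step, assuming a hypothetical helper resolve_paths and a data_dir argument that you would set to your own folder (for example "D:/2021"):

```python
import os
import tempfile

def resolve_paths(filenames, data_dir):
    """Join each listed name with data_dir and split into (found, missing)."""
    found, missing = [], []
    for name in filenames:
        path = os.path.join(data_dir, name.strip())
        if os.path.isfile(path):
            found.append(path)
        else:
            missing.append(path)
    return found, missing

# Small demo: only 1.csv exists in the temporary folder
with tempfile.TemporaryDirectory() as data_dir:
    open(os.path.join(data_dir, "1.csv"), "w").close()
    found, missing = resolve_paths(["1.csv", "2.csv"], data_dir)
    print(len(found), "found,", len(missing), "missing")
```

Reporting the missing paths up front avoids a FileNotFoundError halfway through thousands of files.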

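Edit: If your files do not contain any other data, I would suggest simplifying things and dropping pandas, since it is not needed. The following main() function produces the same result but uses only the csv module.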

def main():      
    # Define path of first csv file
    csv_file_1_path = "CSV_file_1.csv"

    # Read file to list containing the paths of the other csv files
    with open(csv_file_1_path,'rt') as file:
        list_of_csvs = file.read().splitlines()
        
    print(list_of_csvs)
    # Define empty lists to add the coordinates
    lat_list = []
    lon_list = []
    
    # Iterate over all csv files and extract longitude and latitude
    for csv_path in list_of_csvs:
        lat, lon = get_lati_and_long_from_csv(csv_path)
        lat_list.append(lat)
        lon_list.append(lon)
    
    # Combine the three different lists to create the rows of the new csv file
    data = list(zip(list_of_csvs, lat_list, lon_list))
    
    # Create the headers and combine them with the other rows
    rows = [['FILENAME:', 'Latitude:', 'Longitude:']]
    rows.extend(data)
    
    # Write everything to the final csv file
    with open(csv_file_1_path + '.out', 'w', newline='') as file:
        csv_writer = csv.writer(file, dialect='excel', delimiter='\t')
        csv_writer.writerows(rows)
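
As for the SyntaxError in the question: a single = is an assignment and is not allowed inside an if condition, while == is the comparison operator. In addition, open() does not expand wildcards, so 'D:/2021/*.csv' cannot be opened directly; each path returned by glob.glob has to be opened on its own. A sketch closer to the original glob-based approach, assuming a hypothetical function collect_coordinates and the same two-row file layout as above:

```python
import csv
import glob
import os
import tempfile

def collect_coordinates(folder, output_name="CSV_file_1.csv"):
    """Build [filename, latitude, longitude] rows for every csv in folder."""
    rows = []
    pattern = os.path.join(glob.escape(folder), "*.csv")
    for path in sorted(glob.glob(pattern)):
        name = os.path.basename(path)
        if name == output_name:
            continue  # do not parse a summary file left by a previous run
        with open(path, "rt", newline="") as file:
            data = list(csv.reader(file, delimiter=";"))
        # Same positional assumption as the answer: rows 0 and 1 hold the values
        rows.append([name, data[0][1].strip(), data[1][1].strip()])
    return rows

# Small demo in a temporary folder standing in for D:/2021
with tempfile.TemporaryDirectory() as folder:
    samples = [("1.csv", "13.63345", "123.207083"),
               ("2.csv", "13.11111", "123.22222")]
    for name, lat, lon in samples:
        with open(os.path.join(folder, name), "w", newline="") as f:
            f.write(f"LATITUDE : ;{lat}\nLONGITUDE : ;{lon}\n")
    rows = collect_coordinates(folder)
    with open(os.path.join(folder, "CSV_file_1.csv"), "w", newline="") as out:
        writer = csv.writer(out, delimiter="\t")
        writer.writerow(["FILENAME:", "Latitude:", "Longitude:"])
        writer.writerows(rows)
    print(rows)
```

Skipping output_name matters because the question writes CSV_file_1.csv into the same folder that the glob pattern scans.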

python

dataframe

data-extraction

pandas