How to use Python to separate a one-column CSV file if the columns have no headings, then save this into a new Excel file?

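So, I am quite new to Python and have been googling, but I have not found a good solution. What I want to do is use Python to automate text-to-columns in an Excel document without headers.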

Here is the Excel sheet I have.

It is a CSV file where all the data is in one column without headers:

e.g. hi ho loe time work barber jim joan hello

009 00487 08234 0240 2.0348 20.34829

The separators are space and comma.

What I want to come out is saved in another Excel file, with the first two rows deleted and the data separated into columns. (This can be done with text-to-columns in Excel, but I want to automate it across several Excel sheets.)

009 | 00487 | 08234 | 0240 | 2.0348 | 20.34829

The code I have written so far is this:

    import os
    import csv

    import pandas as pd

    path = 'C:/Users/ionan/OneDrive - Universiteit Utrecht/Desktop/UCU/test_excel'

    os.chdir(path)

    for root, dirs, files in os.walk(path):
        for f in files:
            df = pd.read_csv(f, delimiter='\t' + ';', engine='python')

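Say the original file is named data.xlsx.

This means that all the data we need is under the Data column.

Code to split the data of a single file into multiple columns: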

    import pandas as pd
    import numpy as np

    f = 'data.xlsx'

    # -- Insert the following code in your `for f in files` loop --
    file_data = pd.read_excel(f)

    # Since the number of values to be split is not known, set `num_cols` to the
    # number of columns you expect in the modified excel file. It must be larger
    # than the longest row, otherwise the assignment in the loop fails.
    num_cols = 20

    # Create a dataframe with `num_cols` columns
    new_file = pd.DataFrame(columns=["col_{}".format(i) for i in range(num_cols)])

    # Change the column name of the first column in new_file to "Data"
    new_file = new_file.rename(columns={"col_0": file_data.columns[0]})

    # Add the value of the first cell in the original file to the first cell of the
    # new excel file
    new_file.loc[0, new_file.columns[0]] = file_data.iloc[0, 0]

    # Loop through all rows of the original excel file
    for index, row in file_data.iterrows():

        # Skip the first row
        if index == 0:
            continue

        # Split the row by `space`. This gives us a list of strings.
        split_data = file_data.loc[index, "Data"].split(" ")
        print(split_data)

        # Convert each element to a float (a number) if we want numbers and not strings
        # split_data = [float(i) for i in split_data]

        # Pad the list so its length matches the number of columns in `new_file`.
        # The leading NaN leaves the "Data" column empty; np.nan represents no value.
        # (np.NaN was removed in NumPy 2.0, so the lowercase name is used here.)
        split_data = [np.nan] + split_data + [np.nan] * (num_cols - len(split_data) - 1)

        # Store the list at a given index using the `.loc` method
        new_file.loc[index] = split_data

    # Drop all the columns where there is not a single value
    new_file.dropna(axis=1, how='all', inplace=True)

    # Get the original excel file name without its extension
    new_file_name = f.split(".")[0]

    # Save the new excel file at the same location where the original file is.
    new_file.to_excel(new_file_name + "_modified.xlsx", index=False)

This creates a new Excel file named data_modified.xlsx (with one sheet).
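The loop above splits on spaces only, while the question mentions both spaces and commas as separators. As a rough sketch of an alternative, and not the method used in the answer, the same split can be done in one step with pandas' `Series.str.split` and a regular expression. The helper name `split_column` and the `col_N` column labels are made up for illustration.

```python
import pandas as pd


def split_column(series: pd.Series) -> pd.DataFrame:
    """Split each string on runs of spaces and/or commas into separate columns.

    `split_column` is a hypothetical helper, not part of pandas.
    """
    # A pattern longer than one character is treated as a regular expression.
    parts = series.astype(str).str.strip().str.split(r"[ ,]+", expand=True)
    # Give the resulting columns simple positional names.
    parts.columns = [f"col_{i}" for i in range(parts.shape[1])]
    return parts


# Sample rows: the first is taken from the question, the second is invented
# to show mixed separators and a shorter row.
sample = pd.Series(["009 00487 08234 0240 2.0348 20.34829", "1,2 3"])
result = split_column(sample)
print(result)
```

Shorter rows are padded with missing values, so no `num_cols` guess is needed. The values stay as strings, which keeps leading zeros such as `009`; converting them to numbers would drop those zeros.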

Summary (the code without comments):

    import pandas as pd
    import numpy as np

    f = 'data.xlsx'

    file_data = pd.read_excel(f)

    num_cols = 20
    new_file = pd.DataFrame(columns=["col_{}".format(i) for i in range(num_cols)])
    new_file = new_file.rename(columns={"col_0": file_data.columns[0]})
    new_file.loc[0, new_file.columns[0]] = file_data.iloc[0, 0]

    for index, row in file_data.iterrows():

        if index == 0:
            continue

        split_data = file_data.loc[index, "Data"].split(" ")
        split_data = [np.nan] + split_data + [np.nan] * (num_cols - len(split_data) - 1)
        new_file.loc[index] = split_data

    new_file.dropna(axis=1, how='all', inplace=True)
    new_file_name = f.split(".")[0]
    new_file.to_excel(new_file_name + "_modified.xlsx", index=False)
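If the inputs really are plain-text CSV files, as the question describes, something along these lines might cover the whole task: drop the first two rows, split on spaces and commas, and write one Excel file per input. This is only a sketch under those assumptions. The function names, the `*.csv` pattern, the UTF-8 encoding, and the `_modified.xlsx` suffix (borrowed from the answer above) are illustrative choices, and writing `.xlsx` files requires an Excel engine such as openpyxl to be installed.

```python
import re
from pathlib import Path

import pandas as pd

# Runs of spaces and/or commas count as one separator.
SEPARATORS = re.compile(r"[ ,]+")


def read_single_column_csv(csv_path, skip_rows=2):
    """Read a headerless one-column text file into a multi-column DataFrame.

    The first `skip_rows` lines are dropped, then blank lines are ignored.
    Assumes the file is UTF-8 encoded.
    """
    with open(csv_path, encoding="utf-8") as handle:
        lines = [line.strip() for line in handle]
    rows = [SEPARATORS.split(line) for line in lines[skip_rows:] if line]
    # Rows of unequal length are padded with missing values by pandas.
    return pd.DataFrame(rows)


def convert_folder(folder, skip_rows=2):
    """Convert every .csv file under `folder` and return the written paths."""
    written = []
    for csv_path in sorted(Path(folder).rglob("*.csv")):
        table = read_single_column_csv(csv_path, skip_rows)
        out_path = csv_path.with_name(csv_path.stem + "_modified.xlsx")
        # No header row and no index column, matching the headerless input.
        table.to_excel(out_path, index=False, header=False)
        written.append(out_path)
    return written
```

Calling `convert_folder(path)` with the folder from the question would then process every CSV file below it. Building full paths with `pathlib` avoids the need for `os.chdir`, which matters because `os.walk` also descends into subfolders where bare file names would not resolve.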