How to separate a CSV column by position in Python

Question

I want to split the data in a single column of a CSV file into 3 columns. The original file looks like this:

0400000006340000000000965871       
0700000007850000000000336487    
0100000003360000000000444444

I want to split the column so that it looks like the list below, while still keeping the leading zeros:

04 0000000634 0000000000965871   
07 0000000785 0000000000336487   
01 0000000336 0000000000444444

I can load the file into Python, but I don't know which delimiter or positions I have to use. My code so far:

import pandas as pd
df = pd.read_csv('new_numbers.txt', header=None)

Thanks for your help.

Answer 1

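It looks like there is no delimiter; you are working with fixed-length fields.

Fixed-length fields can be accessed by position using slice notation.

For example: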

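str1 = "0400000006340000000000965871"

str1A = str1[:2]     # first 2 characters: "04"
str1B = str1[2:12]   # next 10 characters: "0000000634"
str1C = str1[12:]    # remaining 16 characters: "0000000000965871"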

Unless you need a DataFrame further down the line, I wouldn't bother with pandas.
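
The same slicing can be applied to every line of the file. The sketch below is one way to generalize it. The helper name split_fixed_width is hypothetical, and the widths (2, 10, 16) are an assumption inferred from the sample rows in the question.

```python
from itertools import accumulate

# Field widths inferred from the sample rows (assumption): 2 + 10 + 16 = 28 characters.
WIDTHS = (2, 10, 16)


def split_fixed_width(line, widths=WIDTHS):
    """Slice one record into fields by position, keeping leading zeros."""
    line = line.strip()  # some sample lines carry trailing spaces
    ends = list(accumulate(widths))   # end offset of each field
    starts = [0] + ends[:-1]          # start offset of each field
    return [line[start:end] for start, end in zip(starts, ends)]


sample = [
    "0400000006340000000000965871       ",
    "0700000007850000000000336487    ",
    "0100000003360000000000444444",
]
rows = [split_fixed_width(line) for line in sample]
for row in rows:
    print(" ".join(row))
```

Because the fields stay strings the whole way through, the leading zeros are never lost.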

Answer 2

You don't need pandas to load your text file and read its contents (and, besides, you are not loading a CSV file).

with open("new_numbers.txt") as f:
    lines = f.readlines()

I suggest using the re module.

import re

PATTERN = re.compile(r"(0*[1-9]+)(0*[1-9]+)(0*[1-9]+)")

You can check here the result of this expression on your sample data.
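
One caveat: the pattern above splits on runs of non-zero digits, so it fits the three sample rows but would split in the wrong place if a field contained a zero after its first non-zero digit (for example 10 or 0000001034). If the fields really are fixed-width, a pattern that counts digits may be safer. This is only a sketch: the widths 2, 10 and 16 are assumed from the sample, the name WIDTH_PATTERN is illustrative, and the test row is made up.

```python
import re

# Assumed layout: a 2-digit, a 10-digit and a 16-digit field, in that order.
WIDTH_PATTERN = re.compile(r"(\d{2})(\d{10})(\d{16})")

# Made-up row whose fields contain interior zeros.
line = "10" + "0000001034" + "0000000000965801"

match = WIDTH_PATTERN.match(line)
fields = match.groups() if match else None
print(fields)
```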

Then you need to get the matches from your lines and join them with a space.

matches = []
for line in lines:
    match = PATTERN.match(line)
    first, second, third = match.group(1, 2, 3)
    matches.append(" ".join([first, second, third]))

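In the end, matches will be a list of strings, each holding the space-separated numbers (with leading zeros) for one line.

At this point you can write them to another file, or do whatever else you need with them.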

towrite = "\n".join(matches)

with open("output.txt", "w") as f:
    f.write(towrite)
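
If the goal is a real CSV with three columns rather than space-separated text, the standard csv module can write the matched groups directly. A minimal sketch, reusing the pattern from this answer; the in-memory buffer stands in for a file such as output.csv (a placeholder name), and skipping non-matching lines is an assumption about how malformed rows should be handled.

```python
import csv
import io
import re

PATTERN = re.compile(r"(0*[1-9]+)(0*[1-9]+)(0*[1-9]+)")  # pattern from this answer

lines = [
    "0400000006340000000000965871       \n",
    "0700000007850000000000336487    \n",
]

buffer = io.StringIO()  # stand-in for open("output.csv", "w", newline="")
writer = csv.writer(buffer)
for line in lines:
    match = PATTERN.match(line.strip())
    if match is None:
        continue  # skip blank or malformed lines instead of raising
    writer.writerow(match.groups())

csv_text = buffer.getvalue()
print(csv_text)
```

When writing to a real file, opening it with newline="" lets the csv module control line endings itself.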

Answer 3

Use the pandas read_fwf() function; "fwf" stands for "fixed-width format":

pd.read_fwf('new_numbers.txt', widths=[2, 10, 16], header=None)

This drops the leading zeros:

   0    1       2
0  4  634  965871
1  7  785  336487
2  1  336  444444

To keep them, read the columns as strings by specifying dtype=object:

pd.read_fwf('new_numbers.txt', widths=[2, 10, 16], dtype=object, header=None)

Output:

    0           1                 2
0  04  0000000634  0000000000965871
1  07  0000000785  0000000000336487
2  01  0000000336  0000000000444444
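
To end up with a comma-separated file, the resulting DataFrame can be written back out with to_csv(). The sketch below reads from an in-memory buffer so that it is self-contained. The column names code, id and value are hypothetical, and dtype=str is used here as an equivalent way to keep the fields as strings.

```python
import io

import pandas as pd

# In-memory stand-in for new_numbers.txt, using the rows from the question.
raw = io.StringIO(
    "0400000006340000000000965871\n"
    "0700000007850000000000336487\n"
    "0100000003360000000000444444\n"
)

df = pd.read_fwf(
    raw,
    widths=[2, 10, 16],
    header=None,
    names=["code", "id", "value"],  # hypothetical column names
    dtype=str,                      # keep every field as text
)

csv_text = df.to_csv(index=False)  # pass a file path instead to write to disk
print(csv_text)
```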

python

csv

fixed-width

pandas