Separate frequencies using Python and write values multiple times (7895 7895 7895 7895) instead of (4*7895)

Question

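I am a basic Python user and I have a large text data file (OUT2.txt) in which many values are written as 2*150, meaning two values of 150 (150 150), or 4*7895, meaning four values of 7895 (7895 7895 7895 7895). I want to change every value of this kind into values written next to each other, that is, 7895 7895 7895 7895 instead of 4*7895.

I tried this code but got the following error: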

**parts = fl.split()
AttributeError: 'list' object has no attribute 'split'**

fl = open('OUT2.txt', 'r').readlines()
parts = fl.split()
lst = []
for part in parts:
    _parts = part.split('*')
    if len(_parts) == 1:
        lst.append(_parts[0])
    else:
        times = int(_parts[0])
        for i in range(times):
            lst.append(_parts[1])
open('OUT.3.txt','w+').writelines(lst)

Any suggestions are welcome. Thank you.

From this sample of the text data file

2*8.17997 723.188 4*33.33 3*11.0524 380.811 149.985 5*13.9643 22.8987 76.2205 2*24.7059 64.821

into this

8.17997 8.17997 723.188 33.33 33.33 33.33 33.33 11.0524 11.0524 11.0524 and so on...

Answer 1

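The following should work. readlines() returns a list of lines, and a list has no split method, so each line has to be split on its own: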

with open('in.txt') as f:
    out_lines = []
    lines = [l.strip() for l in f.readlines()]
    for l in lines:
        parts = l.split()
        lst = []
        for part in parts:
            _parts = part.split('*')
            if len(_parts) == 1:
                lst.append(_parts[0])
            else:
                times = int(_parts[0])
                for i in range(times):
                    lst.append(_parts[1])
        out_lines.append(' '.join(lst))
with open('out.txt', 'w') as f1:
    for line in out_lines:
        f1.write(line + '\n')

in.txt

2*8.17997 723.188 4*33.33 3*11.0524 380.811 149.985 5*13.9643 22.8987 76.2205 2*24.7059 64.821
10*8.17997 723.188 4*33.33 3*11.0524 380.811 149.985 5*13.9643 22.8987 76.2205 2*24.7059 64.821

out.txt

8.17997 8.17997 723.188 33.33 33.33 33.33 33.33 11.0524 11.0524 11.0524 380.811 149.985 13.9643 13.9643 13.9643 13.9643 13.9643 22.8987 76.2205 24.7059 24.7059 64.821
8.17997 8.17997 8.17997 8.17997 8.17997 8.17997 8.17997 8.17997 8.17997 8.17997 723.188 33.33 33.33 33.33 33.33 11.0524 11.0524 11.0524 380.811 149.985 13.9643 13.9643 13.9643 13.9643 13.9643 22.8987 76.2205 24.7059 24.7059 64.821
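Since the question mentions that the file is large, here is a minimal sketch of the same idea that handles one line at a time instead of collecting all output lines in a list first. The names expand_line and expand_file are hypothetical helpers introduced only for this illustration, and the sketch assumes every token is either a plain value or has the form count*value with an integer count.

```python
def expand_line(line):
    """Expand every 'count*value' token in one line of text."""
    expanded = []
    for token in line.split():
        # partition() returns (before, separator, after); the separator
        # is an empty string when the token contains no '*'.
        count, star, value = token.partition('*')
        if star:
            expanded.extend([value] * int(count))
        else:
            expanded.append(token)
    return ' '.join(expanded)


def expand_file(src_path, dst_path):
    """Write an expanded copy of src_path to dst_path, line by line."""
    with open(src_path) as src, open(dst_path, 'w') as dst:
        for line in src:  # iterating the file object reads lazily
            dst.write(expand_line(line) + '\n')
```

Calling expand_file('in.txt', 'out.txt') should then produce the same out.txt as above without holding the whole file in memory.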

Answer 2

Split the string, split on *, and convert back to a string

s = "2*8.17997 723.188 4*33.33 3*11.0524 380.811 149.985 5*13.9643 22.8987 76.2205 2*24.7059 64.821"

# split the string
l = s.split()

# split on "*"
l = [x.split('*') for x in l]

# multiply recurring values, keep the single ones
l = [x[0] if len(x) == 1 else " ".join([x[1]] * int(x[0])) for x in l]

# join back to a string
result = " ".join(l)

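If an item has no *, it is simply kept as a string (x[0], because split("*") returns a single-element list). If it does, split("*") returns two values: the first, x[0], has to be parsed as an int, and [x[1]] * i is a list of i repeated items, which are then joined with whitespace:

>>> ["11.883"] * 4
['11.883', '11.883', '11.883', '11.883']
>>> " ".join(["11.883"] * 4)
'11.883 11.883 11.883 11.883'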
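The same steps can also be written as one flattened comprehension inside a function. This is only a sketch: expand_flat is a hypothetical name, and whether a count of 0 can occur in the real data is an assumption. Flattening means a 0*value token contributes nothing at all, rather than an empty string that would leave a double space after joining.

```python
def expand_flat(text):
    """Expand 'count*value' tokens, building one flat list of values."""
    tokens = [t.split('*') for t in text.split()]
    flat = [
        parts[-1]  # the value is always the last field
        for parts in tokens
        # repeat 'count' times for count*value, once for a plain value
        for _ in range(int(parts[0]) if len(parts) == 2 else 1)
    ]
    return ' '.join(flat)


print(expand_flat("2*8.17997 723.188 4*33.33"))
```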

Answer 3

Try using a regular expression:

import re

# this is what you'll have after you read the file, for example
text = "2*8.17997 723.188 4*33.33 3*11.0524"

matches = re.findall(r'(\d+\*)?(\d+\.\d+)', text)
# matches = [('2*', '8.17997'), ('', '723.188'), ('4*', '33.33'), ('3*', '11.0524')]

output = []
for match in matches:
    if match[0]:
        times = int(match[0][:-1])  # remove the `*`
    else:
        times = 1  # no `x*y` means one time y
    for _ in range(times):
        output.append(match[1])

output_str = ' '.join(output)
# output_str = '8.17997 8.17997 723.188 33.33 33.33 33.33 33.33 11.0524 11.0524 11.0524'

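This code is not very polished; it is only meant to convey the idea. The interesting part here is the regular expression. You can examine it in more detail here: https://regex101.com/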
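One thing to watch: the pattern above requires a decimal point in the value, so integer tokens such as 4*7895 from the question would not be matched. A possible variant, sketched here with re.sub and a replacement function, accepts any non-whitespace value after the *. The names TOKEN and expand_with_regex are hypothetical, and the sketch assumes counts are plain non-negative integers at the start of a token.

```python
import re

# A count, a literal '*', then the value; the lookbehind makes sure the
# count starts at the beginning of a whitespace-separated token.
TOKEN = re.compile(r'(?<!\S)(\d+)\*(\S+)')


def expand_with_regex(text):
    """Replace each 'count*value' token with the value repeated count times."""
    def repeat(match):
        count = int(match.group(1))
        return ' '.join([match.group(2)] * count)
    return TOKEN.sub(repeat, text)


print(expand_with_regex("2*150 4*7895 723.188"))
```

Tokens without a * are left untouched because the pattern never matches them, so the original spacing and line breaks are kept.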
