Recognizing and converting characters at specific positions

Input:

0      1     2
TNN    R11W  MSLQEMFRFPRGLLLGSVLLVASAPATL
ASTN1  E5V   MALAALCALLACCWGPAAVLATAAGDVDPSK
HSPB7  H19P  MSHRTSSTFRAERSFHSSHSSSSSSTSSSASRALPAQDPPMEK
CLCNKB C3Y   MECFVGLREGSSGNPVTLQELWGPCPRIRRGIRG
SZRD1  P10L  MEDEEVAESWEEAADSGEIDRRLEKKL

Expected output:

0      1     2
TNN    R11W  MSLQEMFRFPWGLLLGSVLLVASAPATL
ASTN1  E5V   NaN
HSPB7  H19P  MSHRTSSTFRAERSFHSSPSSSSSSTSSSASRALPAQDPPMEK
CLCNKB C3Y   MEYFVGLREGSSGNPVTLQELWGPCPRIRRGIRG
SZRD1  P10L  NaN

Example code:

with open('temp.txt', 'w') as fw:
    for x in range(len(merge_two_files[1])):
        for i in range(len(merge_two_files[2])):
            if merge_two_files[1][x] == something:
                data = anything
                fw.write(str(data))

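I want to use the index given in 'column 1' to change one character in 'column 2'. For example, in the first row, 'column 1' is R11W, so I look at the 11th character of 'column 2' and check whether it is 'R'. If it is 'R', I want to change it to 'W'. If it is not, I want to write 'NaN' in that cell. Is there a way to do this with Pandas?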
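The codes in 'column 1' pack three pieces of information into one string: the expected character, a 1-based position, and the replacement character. As a minimal sketch of that decoding step, assuming every code is one uppercase letter, some digits, and one uppercase letter (the name `parse_mutation` and the regular expression are illustrative, not part of the original question):

```python
import re

# Assumed format: one uppercase letter, a 1-based position, one uppercase letter.
MUTATION_RE = re.compile(r"([A-Z])(\d+)([A-Z])")

def parse_mutation(code):
    """Split a code such as 'R11W' into ('R', 11, 'W'); return None if malformed."""
    match = MUTATION_RE.fullmatch(code)
    if match is None:
        return None
    expected, position, replacement = match.groups()
    return expected, int(position), replacement

print(parse_mutation("R11W"))  # ('R', 11, 'W')
print(parse_mutation("H19P"))  # ('H', 19, 'P')
```

Returning None for a malformed code lets the caller decide whether to write 'NaN' or to stop with an error.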

Write a custom function:

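import numpy as np

def replace_char(row):
    # Split a code such as 'R11W' into c='R', p=11 (1-based), r='W'
    c, p, r = row[1][0], int(row[1][1:-1]), row[1][-1]
    s1 = row[2]
    s2 = np.nan  # np.NaN was removed in NumPy 2.0; np.nan works in all versions
    # Replace only when the expected character sits at position p
    if s1[p-1] == c:
        s2 = f"{s1[:p-1]}{r}{s1[p:]}"
    return s2

df[2] = df.apply(replace_char, axis=1)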

Output:

>>> df
        0     1                                            2
0     TNN  R11W                 MSLQEMFRFPWGLLLGSVLLVASAPATL
1   ASTN1   E5V                                          NaN
2   HSPB7  H19P  MSHRTSSTFRAERSFHSSPSSSSSSTSSSASRALPAQDPPMEK
3  CLCNKB   C3Y           MEYFVGLREGSSGNPVTLQELWGPCPRIRRGIRG
4   SZRD1  P10L                                          NaN
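One case the sample data does not exercise is a position that falls outside the sequence: plain indexing such as `s1[p-1]` would raise an IndexError there. The following is a rough sketch of a bounds-checked variant in plain Python; the name `apply_mutation` and the choice to return None (rather than NaN) are assumptions made for illustration.

```python
def apply_mutation(code, sequence):
    """Return the mutated sequence, or None if the expected character is not at that position."""
    expected, replacement = code[0], code[-1]
    index = int(code[1:-1]) - 1  # codes are 1-based, Python strings are 0-based
    # Guard against positions outside the sequence instead of raising IndexError
    if not 0 <= index < len(sequence):
        return None
    if sequence[index] != expected:
        return None
    return sequence[:index] + replacement + sequence[index + 1:]

print(apply_mutation("R11W", "MSLQEMFRFPRGLLLGSVLLVASAPATL"))     # MSLQEMFRFPWGLLLGSVLLVASAPATL
print(apply_mutation("E5V", "MALAALCALLACCWGPAAVLATAAGDVDPSK"))   # None
print(apply_mutation("K99A", "MEDEEV"))                           # None
```

With a DataFrame laid out as above, something like `df[2] = [apply_mutation(c, s) for c, s in zip(df[1], df[2])]` should give the same column, with None where the check fails.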

Setup:

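import pandas as pd

# header=None keeps the first data row (TNN) as data instead of using it as column names
df = pd.read_csv('file1', header=None)
print(df)

# Output:
        0     1                                            2
0     TNN  R11W                 MSLQEMFRFPRGLLLGSVLLVASAPATL
1   ASTN1   E5V              MALAALCALLACCWGPAAVLATAAGDVDPSK
2   HSPB7  H19P  MSHRTSSTFRAERSFHSSHSSSSSSTSSSASRALPAQDPPMEK
3  CLCNKB   C3Y           MECFVGLREGSSGNPVTLQELWGPCPRIRRGIRG
4   SZRD1  P10L                  MEDEEVAESWEEAADSGEIDRRLEKKL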

Here is my answer:

data = [
    ["TNN",    "R11W",  "MSLQEMFRFPRGLLLGSVLLVASAPATL"],
    ["ASTN1",  "E5V",   "MALAALCALLACCWGPAAVLATAAGDVDPSK"],
    ["HSPB7",  "H19P",  "MSHRTSSTFRAERSFHSSHSSSSSSTSSSASRALPAQDPPMEK"],
    ["CLCNKB", "C3Y",   "MECFVGLREGSSGNPVTLQELWGPCPRIRRGIRG"],
    ["SZRD1",  "P10L",  "MEDEEVAESWEEAADSGEIDRRLEKKL"],
]


result = []

for row in data:
    res_row = []
    res_row.append(row[0])
    res_row.append(row[1])

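    c1 = row[1][0]           # expected character
    c2 = row[1][-1]          # replacement character
    num = int(row[1][1:-1])  # 1-based position

    # Replace the character only if the expected one is at that position
    if row[2][num-1] == c1:
        chars = list(row[2])
        chars[num-1] = c2
        res_row.append(''.join(chars))
    else:
        res_row.append("NaN")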

    result.append(res_row)

print(result)
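The question's snippet writes to 'temp.txt', so a natural last step is saving rows shaped like `result` to a file. Here is a small sketch using the standard `csv` module with tab separators; the helper name `write_rows` and the tab delimiter are assumptions, and an in-memory buffer stands in for the real file so the example can run anywhere.

```python
import csv
import io

# Sample rows in the same shape as `result` (first two rows only)
rows = [
    ["TNN", "R11W", "MSLQEMFRFPWGLLLGSVLLVASAPATL"],
    ["ASTN1", "E5V", "NaN"],
]

def write_rows(rows, handle):
    """Write each row as one tab-separated line."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)

buffer = io.StringIO()  # stand-in for open('temp.txt', 'w', newline='')
write_rows(rows, buffer)
print(buffer.getvalue())
```

To write an actual file, pass the handle from `open('temp.txt', 'w', newline='')` instead of the buffer.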