Identifying and replacing a character at a specific position
Input:
0 1 2
TNN R11W MSLQEMFRFPRGLLLGSVLLVASAPATL
ASTN1 E5V MALAALCALLACCWGPAAVLATAAGDVDPSK
HSPB7 H19P MSHRTSSTFRAERSFHSSHSSSSSSTSSSASRALPAQDPPMEK
CLCNKB C3Y MECFVGLREGSSGNPVTLQELWGPCPRIRRGIRG
SZRD1 P10L MEDEEVAESWEEAADSGEIDRRLEKKL
Expected output:
0 1 2
TNN R11W MSLQEMFRFPWGLLLGSVLLVASAPATL
ASTN1 E5V NaN
HSPB7 H19P MSHRTSSTFRAERSFHSSPSSSSSSTSSSASRALPAQDPPMEK
CLCNKB C3Y MEYFVGLREGSSGNPVTLQELWGPCPRIRRGIRG
SZRD1 P10L NaN
Code (example):
# `something` and `anything` are placeholders from my attempt, not real names
with open('temp.txt', 'w') as fw:
    for x in range(len(merge_two_files[1])):
        for i in range(len(merge_two_files[2])):
            if merge_two_files[1][x] == something:
                data = anything
                fw.write(str(data))
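I want to use the index given in 'column 1' to change one character in 'column 2'. For example, in the first row 'column 1' is R11W, so I look for 'R' at the 11th character of 'column 2'. If that character is 'R', I want to change it to 'W'. If it is not, I want to write 'NaN' in the cell. Is there a way to do this with Pandas?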
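Write a custom function:
import numpy as np

def replace_char(row):
    # Split a code such as 'R11W' into c='R', p=11, r='W'
    c, p, r = (row[1][0], int(row[1][1:-1]), row[1][-1])
    s1 = row[2]
    s2 = np.nan  # result when the character at position p is not c
    # p is 1-based, so the character to check is s1[p-1]
    if s1[p-1] == c:
        s2 = f"{s1[:p-1]}{r}{s1[p:]}"
    return s2

df[2] = df.apply(replace_char, axis=1)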
Output:
>>> df
0 1 2
0 TNN R11W MSLQEMFRFPWGLLLGSVLLVASAPATL
1 ASTN1 E5V NaN
2 HSPB7 H19P MSHRTSSTFRAERSFHSSPSSSSSSTSSSASRALPAQDPPMEK
3 CLCNKB C3Y MEYFVGLREGSSGNPVTLQELWGPCPRIRRGIRG
4 SZRD1 P10L NaN
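One caveat with the function above: `s1[p-1]` raises an `IndexError` when the position in column 1 is larger than the sequence length, and `int(...)` raises a `ValueError` on a malformed code. Below is a minimal sketch of a more defensive variant, assuming such rows should also become NaN. The name `replace_char_safe`, the regular expression and the `SHORT` row are illustrative and not part of the original answer.

```python
import re

import numpy as np
import pandas as pd

# Assumed code format: one letter, a 1-based position, one letter (e.g. 'R11W').
CODE_RE = re.compile(r"([A-Za-z])(\d+)([A-Za-z])")


def replace_char_safe(row):
    """Return the substituted sequence, or NaN if the code does not apply."""
    m = CODE_RE.fullmatch(str(row[1]))
    seq = row[2]
    if m is None or not isinstance(seq, str):
        return np.nan
    c, p, r = m.group(1), int(m.group(2)), m.group(3)
    # p is 1-based; reject positions outside the sequence instead of raising.
    if p < 1 or p > len(seq) or seq[p - 1] != c:
        return np.nan
    return seq[:p - 1] + r + seq[p:]


df = pd.DataFrame([
    ["TNN", "R11W", "MSLQEMFRFPRGLLLGSVLLVASAPATL"],
    ["ASTN1", "E5V", "MALAALCALLACCWGPAAVLATAAGDVDPSK"],
    ["SHORT", "A99G", "MA"],  # made-up row: position past the end of the sequence
])
df[2] = df.apply(replace_char_safe, axis=1)
print(df)
```

Whether an out-of-range position should become NaN or be reported as an error depends on the data; the sketch picks NaN to match the behaviour asked for in the question.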
Setup:
import pandas as pd

df = pd.read_csv('file1', header=None)
print(df)
# Output:
0 1 2
0 TNN R11W MSLQEMFRFPRGLLLGSVLLVASAPATL
1 ASTN1 E5V MALAALCALLACCWGPAAVLATAAGDVDPSK
2 HSPB7 H19P MSHRTSSTFRAERSFHSSHSSSSSSTSSSASRALPAQDPPMEK
3 CLCNKB C3Y MECFVGLREGSSGNPVTLQELWGPCPRIRRGIRG
4 SZRD1 P10L MEDEEVAESWEEAADSGEIDRRLEKKL
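If the file is not at hand, the same frame can be rebuilt from the sample rows in the question. This is a sketch that assumes the columns are separated by whitespace and that there is no header line; `sep=r"\s+"` and the inline text are assumptions for illustration, so adjust them to the real file.

```python
import io

import pandas as pd

# Sample rows copied from the question; assumed whitespace-separated, no header line.
raw = """\
TNN R11W MSLQEMFRFPRGLLLGSVLLVASAPATL
ASTN1 E5V MALAALCALLACCWGPAAVLATAAGDVDPSK
HSPB7 H19P MSHRTSSTFRAERSFHSSHSSSSSSTSSSASRALPAQDPPMEK
CLCNKB C3Y MECFVGLREGSSGNPVTLQELWGPCPRIRRGIRG
SZRD1 P10L MEDEEVAESWEEAADSGEIDRRLEKKL
"""

# header=None gives the integer column labels 0, 1, 2 used in the answer.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", header=None)
print(df.shape)
```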
Here is my answer:
data = [
    ["TNN", "R11W", "MSLQEMFRFPRGLLLGSVLLVASAPATL"],
    ["ASTN1", "E5V", "MALAALCALLACCWGPAAVLATAAGDVDPSK"],
    ["HSPB7", "H19P", "MSHRTSSTFRAERSFHSSHSSSSSSTSSSASRALPAQDPPMEK"],
    ["CLCNKB", "C3Y", "MECFVGLREGSSGNPVTLQELWGPCPRIRRGIRG"],
    ["SZRD1", "P10L", "MEDEEVAESWEEAADSGEIDRRLEKKL"],
]
result = []
for row in data:
    res_row = []
    res_row.append(row[0])
    res_row.append(row[1])
    c1 = row[1][0]
    c2 = row[1][-1]
    num = int(row[1][1:-1])
    if row[2][num-1] == c1:
        c3 = row[2]
        l = list(c3)
        l[num-1] = c2
        c3 = ''.join(l)
        res_row.append(c3)
    else:
        res_row.append("NaN")
    result.append(res_row)
print(result)
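The loop above stores the literal string "NaN" and only prints the list. If a real missing value and tabular output are wanted, as in the question's `temp.txt` attempt, a sketch along these lines may help. It assumes pandas is available; the function name `apply_substitution` and the tab separator are choices made for this illustration.

```python
import pandas as pd


def apply_substitution(code, seq):
    """Apply a code such as 'R11W' to seq; return None if it does not match."""
    old, new, pos = code[0], code[-1], int(code[1:-1])
    # pos is 1-based; slicing avoids an IndexError when pos is past the end.
    if seq[pos - 1:pos] != old:
        return None
    return seq[:pos - 1] + new + seq[pos:]


data = [
    ["TNN", "R11W", "MSLQEMFRFPRGLLLGSVLLVASAPATL"],
    ["ASTN1", "E5V", "MALAALCALLACCWGPAAVLATAAGDVDPSK"],
    ["CLCNKB", "C3Y", "MECFVGLREGSSGNPVTLQELWGPCPRIRRGIRG"],
]
result = [[name, code, apply_substitution(code, seq)] for name, code, seq in data]

out = pd.DataFrame(result)
# With no path, to_csv returns the text; na_rep sets how missing values are written.
# Pass a path such as "temp.txt" as the first argument to write to disk instead.
text = out.to_csv(sep="\t", header=False, index=False, na_rep="NaN")
print(text)
```

Keeping the missing value as `None` until the last step means the choice between "NaN", an empty cell, or any other marker is made once, when the table is written.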