If a value from this list appears somewhere in a column's value, can I replace that column value with the list value? Pandas DataFrame
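There is a lot of documentation on this, but I still can't figure it out.
Here is the list. I need to check whether one of these values appears in my column value. If it does, replace the entire cell with the list value.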
active_crews = ["CREW #101", "CREW #102", "CREW #203", "CREW #301", "CREW #404", "CREW #501", "CREW #406", "CREW #304", "CREW #701", "CREW #702", "CREW #703", "CREW #704", "CREW #705", "CREW #706",
"CREW #707" "CREW #708", "CREW #801", "CREW #802", "CREW #803", "CREW #805"]
A sample of the data I want to replace. And yes, there are slight formatting differences:
Debris Crew WO#
REFER TO IAP 12/16 TO 12/19 CREW #405
REFER TO IAP 06/02 TO 06/05 CREW #406
REFER TO IAP 03/24TO 03/27 CREW # 803
Expected output:
Debris Crew WO#
CREW #405
CREW #406
CREW #803
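My problem is that I don't know how to tell Python to search each column value for a match against the list and, if a list value is found inside that column value, replace the current column value with the list value.
Code I have tried: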
1)
df.loc[df['Debris Crew WO#'] == active_crews, 'Debris Crew WO#']
# doesn't work. This was done before research lol I get the following error, which makes sense
# ValueError: ('Lengths must match to compare', (2216,), (19,))
df.loc[:, ['Place Holder']] = df.loc[:, 'Debris Crew WO#'].str[28:]
# this code "works" but due to different formatting i get data back like this:
8 REW #406
9 CREW #406
# not very effective and can not be relied on. I hate hard coding anything.
df.loc[:, ['Place Holder']] = df.loc[:, 'Debris Crew WO#'].str[26:]
df.loc[:, ['Place Holder']] = df[['Place Holder']].str.split().join(" ")
# tried this because I use the same filter for special characters (inside a for loop) in a different script, yet here I get this error and I have no clue why. It works in my other code with no problems
#AttributeError: 'DataFrame' object has no attribute 'str'
# even if I use .loc I get the same error:
df.loc[:, ['Place Holder']] = df.loc[:, 'Debris Crew WO#'].str[26:]
df.loc[:, ['Place Holder']] = df.loc[:, ['Place Holder']].str.split().join(" ")
#plus its still hard coding (gross)
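Next I'm going to try RE. I've been told it is great for "CTRL+F"-style filtering and is a key tool in data science. So next week I'm heading down that rabbit hole, starting with the RE documentation and practicing on this problem. I will update this post as I make progress.
That said, I have been learning Python for almost exactly two months. Please excuse any "newbie" style or coding; I'm just trying and experimenting so that life gets better for me and the people around me.
Any help would be appreciated. Thanks in advance.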
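A note on the AttributeError in the attempts above: `df[['Place Holder']]` (double brackets) and `df.loc[:, ['Place Holder']]` (a list as the column selector) both return a DataFrame, and the `.str` accessor exists only on a Series. Also, `.str.split()` returns a Series of lists, so the join has to go through `.str.join(" ")` as well. A minimal sketch, assuming a small made-up frame that only reuses the column name:

```python
import pandas as pd

# Hypothetical sample data, not the asker's real frame.
df = pd.DataFrame({"Place Holder": ["CREW  #406", " CREW #  803 "]})

# Double brackets select a one-column DataFrame, which has no .str accessor.
print(type(df[["Place Holder"]]).__name__)
# Single brackets select a Series, which does.
print(type(df["Place Holder"]).__name__)

# Split on any run of whitespace, then re-join each row's pieces with one space.
cleaned = df["Place Holder"].str.split().str.join(" ")
print(cleaned.tolist())
```

This only collapses repeated whitespace; it does not remove the space in `CREW # 803`, so the slicing approach would still need the regex methods below.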
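Method #1, referencing the list:
You can use str.extract(), with a capture group built by joining the list with join('|'). The | symbol means OR, which lets you search each row for multiple values at once. A capture group needs parentheses around it, which is why I add the opening and closing parentheses as strings before and after the joined list.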
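# Note the comma after "CREW #707": without it, Python silently concatenates
# the two adjacent strings into "CREW #707CREW #708", which can never match.
active_crews = ["CREW #101", "CREW #102", "CREW #203", "CREW #301", "CREW #404", "CREW #501",
"CREW #406", "CREW #304", "CREW #701", "CREW #702", "CREW #703", "CREW #704",
"CREW #705", "CREW #706", "CREW #707", "CREW #708", "CREW #801", "CREW #802",
"CREW #803", "CREW #805"]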
df['Debris Crew WO#'] = df['Debris Crew WO#'].str.extract('(' + '|'.join(active_crews) + ')')
df
#You can also use a formatted string like this:
df['Debris Crew WO#'] = df['Debris Crew WO#'].str.extract(f'({"|".join(active_crews)})')
Out[1]:
Debris Crew WO#
0 NaN
1 CREW #406
2 NaN
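As a self-contained sketch of Method #1, here it is run on the three sample rows from the question. The list is shortened for illustration, and `re.escape` is an extra precaution added here (it is not part of the one-liner above) so that any regex metacharacters in the list entries are matched literally. Passing `expand=False` makes `str.extract` return a Series rather than a one-column DataFrame.

```python
import re
import pandas as pd

# Shortened, illustrative list; note the comma between every entry.
active_crews = ["CREW #406", "CREW #707", "CREW #708", "CREW #803"]

df = pd.DataFrame({"Debris Crew WO#": [
    "REFER TO IAP 12/16 TO 12/19 CREW #405",
    "REFER TO IAP 06/02 TO 06/05 CREW #406",
    "REFER TO IAP 03/24TO 03/27 CREW # 803",
]})

# Build one capture group that is an alternation of the escaped list entries.
pattern = "(" + "|".join(re.escape(crew) for crew in active_crews) + ")"

# expand=False returns a Series; rows containing no listed crew become NaN.
df["Debris Crew WO#"] = df["Debris Crew WO#"].str.extract(pattern, expand=False)
print(df)
```

Rows 0 and 2 come back as NaN for the same reasons as in Out[1]: `CREW #405` is not in the list, and `CREW # 803` has an extra space that the literal list entry does not.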
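Method #2 extracts based on a regex pattern and ignores the list. The ? after a space means that space is optional. Instead of a literal space you could also use \s, or \s+ for multiple spaces. \d+ means one or more consecutive digits. If the numbers contained commas, the regex would be slightly different: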
# raw string (r'...') so that \d reaches the regex engine as written
df['Debris Crew WO#'] = df['Debris Crew WO#'].str.extract(r'(CREW ?# ?\d+)')
Out[2]:
Debris Crew WO#
0 CREW #405
1 CREW #406
2 CREW # 803
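To get from Method #2 to the exact expected output in the question (where `CREW # 803` becomes `CREW #803`), one option is to capture only the digits and rebuild the label in a single canonical form. The sketch below does that and then uses `isin` to flag which rows are in the list. The `is_active` column name and the shortened list are assumptions made for illustration.

```python
import pandas as pd

active_crews = ["CREW #406", "CREW #803"]  # shortened, illustrative list

df = pd.DataFrame({"Debris Crew WO#": [
    "REFER TO IAP 12/16 TO 12/19 CREW #405",
    "REFER TO IAP 06/02 TO 06/05 CREW #406",
    "REFER TO IAP 03/24TO 03/27 CREW # 803",
]})

# Capture just the digits; \s* tolerates zero or more spaces around '#'.
digits = df["Debris Crew WO#"].str.extract(r"CREW\s*#\s*(\d+)", expand=False)

# Rebuild every label as "CREW #<digits>"; rows without a match stay NaN.
df["Debris Crew WO#"] = "CREW #" + digits

# Hypothetical helper column: True where the normalized label is in the list.
df["is_active"] = df["Debris Crew WO#"].isin(active_crews)
print(df)
```

Normalizing first and checking membership second keeps the list in its clean form, so formatting quirks in the raw data do not have to be mirrored in `active_crews`.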