Check if a string value of a column in a Pandas DataFrame starts with the value of another column

Question

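I am trying to concatenate two string columns, col1 and col2, in a Pandas DataFrame. However, I don't want to concatenate them if the value of col2 already starts with the value of col1 (ignoring case). In that case I want to use col2 as it is. This would be the expected behavior: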

col1	col2	result
ABC	ABC	ABC
ABC	XYZ	ABCXYZ
AbC	abc123	abc123

I tried this code:

import pandas as pd

df = pd.DataFrame({
    'col1': ['ABC', 'ABC', 'AbC'],
    'col2': ['ABC', 'XYZ', 'abc123'],
})


df['result'] = df['col2'].where(df['col2'].str.lower().str.startswith(df['col1'].str.lower()), df['col1'] + df['col2'])

df

But this results in:

col1	col2	result
ABC	ABC	ABCABC
ABC	XYZ	ABCXYZ
AbC	abc123	AbCabc123

For testing purposes, I passed a string literal as the argument to startswith and got the expected result:

df['result'] = df['col2'].where(df['col2'].str.lower().str.startswith('abc'), df['col1'] + df['col2'])

I found that startswith always returns NaN when I pass the column as the argument:

df['result'] = df['col2'].str.lower().str.startswith(df['col1'].str.lower())

col1	col2	result
ABC	ABC	NaN
ABC	XYZ	NaN
AbC	abc123	NaN

If I replace the startswith argument with a string literal, I get boolean values as expected:

df['result'] = df['col2'].str.lower().str.startswith('abc')

col1	col2	result
ABC	ABC	True
ABC	XYZ	False
AbC	abc123	True

As far as I understand, passing a Series as the argument to startswith seems to be the problem, but I could not get it to work.

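I am new to Python and Pandas, and I made heavy use of search engines and Stack Overflow's search function before creating my first post. What do I have to change in my code to get the desired behavior? Any help is highly appreciated. Thanks!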

Answer 1

Inspired by this answer:

Write your own startswith function and vectorize it with numpy.vectorize. That way the strings in col1 and col2 are compared row by row.

from numpy import vectorize

def startswith(str1, str2):
    """Check if str1 starts with str2 (case insensitive)"""
    return str1.lower().startswith(str2.lower())

# Vectorize the function so it is applied element-wise to both columns
startswith = vectorize(startswith)
# Keep col2 where it already starts with col1; otherwise concatenate
df['result'] = df['col2'].where(startswith(df['col2'], df['col1']), df['col1'] + df['col2'])
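
As a possible alternative (a sketch, not part of the code above): the NumPy documentation notes that numpy.vectorize is provided primarily for convenience, not performance, and is essentially a for loop. A plain list comprehension over zip should therefore do the same row-by-row comparison without the extra wrapper. The name mask is made up for illustration, and the sketch assumes that neither column contains missing values, since .lower() would fail on NaN.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['ABC', 'ABC', 'AbC'],
    'col2': ['ABC', 'XYZ', 'abc123'],
})

# Row-wise, case-insensitive prefix test: does col2 start with col1?
# Assumes both columns hold only strings (no NaN / None values).
mask = pd.Series(
    [b.lower().startswith(a.lower()) for a, b in zip(df['col1'], df['col2'])],
    index=df.index,
)

# Keep col2 where the prefix is already present; otherwise concatenate.
df['result'] = df['col2'].where(mask, df['col1'] + df['col2'])
print(df)
```

With the sample data from the question, the result column should come out as ABC, ABCXYZ, abc123. Building mask as a Series with the DataFrame's index keeps it aligned with df even if the index is not the default 0..n-1.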

Tags: python, pandas, startswith