Why does it show "string index out of range" when the string is long enough?

Question

I need some help figuring out the problem here. I am trying to clean a dataset using a Pandas DataFrame in Jupyter Notebook. My DataFrame looks like this: Click here for the Dataframe

I am trying to use a for loop to filter out and drop the rows that do not have 'B', 'R', 'T', or 'L' at index 11 of each string in the df2["Data"] column.

for unit in df2["Data"]:
    if unit[11] != "B":
        if unit[11] != "R":
            if unit[11] != "T":
                if unit[11] != "L":
                    df2.drop(df2.index[df2['Data'] == unit], inplace = True)

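However, it raises an error saying "string index out of range", even though a string such as "Calculator BW_2_1:Number\ttt" is clearly longer than 11 characters. Why does it still raise this error?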

IndexError                                Traceback (most recent call last)
<ipython-input-15-e1d0470b9bfb> in <module>
      1 for unit in df2["Data"]:
----> 2     if unit[11] != "B":
      3         if unit[11] != "R":
      4             if unit[11] != "T":
      5                 if unit[11] != "L":

IndexError: string index out of range

Answer 1

Use the str accessor and isin:

df2 = df2[df2['Data'].str[11].isin(list('BRTL'))]
print(df2)

# Output
                   Data
1         Calculator BW

Setup:

import pandas as pd

df2 = pd.DataFrame({'Data': ['Calculator DienInTol', 'Calculator BW']})
print(df2)

# Output
                   Data
0  Calculator DienInTol
1         Calculator BW

Step by step:

# Extract the character at index 11 (the twelfth character)
>>> df2['Data'].str[11]
0    D
1    B
Name: Data, dtype: object

# Check if the character is in the list ['B', 'R', 'T', 'L']
>>> df2['Data'].str[11].isin(list('BRTL'))
0    False
1     True
Name: Data, dtype: bool

# Keep rows where the condition is true
>>> df2[df2['Data'].str[11].isin(list('BRTL'))]
            Data
1  Calculator BW
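
The steps above also cover strings that are too short to have an index 11. The sketch below uses a hypothetical sample (the value 'Calc' is not from the question) to show that `.str[11]` yields NaN for such a string instead of raising, and that `isin` then treats the NaN as not matching, so the row is simply filtered out.

```python
import pandas as pd

# Hypothetical sample: the last value is shorter than 12 characters.
df = pd.DataFrame({'Data': ['Calculator DienInTol', 'Calculator BW', 'Calc']})

# .str[11] gives NaN where the string has no character at index 11
chars = df['Data'].str[11]
print(chars)

# NaN is not one of 'B', 'R', 'T', 'L', so the short row is excluded
mask = chars.isin(list('BRTL'))
print(df[mask])
```

This is why the vectorized filter does not need a separate length check.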

Note: list('BRTL') creates the list ['B', 'R', 'T', 'L'].
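
As for the IndexError in the original loop, a likely explanation is that at least one value in the column has fewer than 12 characters, even if the example string does not. The full data is not shown, so this is an assumption. The sketch below, with hypothetical sample values, reproduces the error on a short string and uses `str.len()` to list the rows that would trigger it.

```python
import pandas as pd

# Hypothetical sample: two values are too short to have an index 11.
df = pd.DataFrame({'Data': ['Calculator BW_2_1:Number', 'Calc', '']})

# Plain Python indexing raises on a short string, as in the traceback
try:
    'Calc'[11]
except IndexError as exc:
    print(exc)

# Rows whose string has fewer than 12 characters
short_rows = df[df['Data'].str.len() < 12]
print(short_rows)
```

Running the same `str.len()` check on df2["Data"] should show whether such rows exist in the real dataset.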
