How to iterate over strings of a dataframe cell?

Question

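I have a dataframe with text in every cell. I want to iterate over the dataframe and over the individual characters of each cell, and fill a list with 0 for a whitespace character and 1 for any other character. I tried itertuples, iterrows and iteritems, but I could not access the individual characters of the strings.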

crispr = pd.DataFrame({'Name': ['Bob', 'Jane', 'Alice'], 
                       'Issue': ['Handling data', 'Could not read sample', 'No clue'],
                       'Comment': ['Need to revise data', 'sample preparation', 'could not find out where problem occurs']})

What I tried:

dflist = []
countchar= 0
for i,j in crispr.iteritems():
    for x in range(len(j)):
        test = j[countchar].isspace()
        countchar+=1
        if test == True:
            dflist.append(0)
        else:
            dflist.append(1)

I also tried to find out whether it works with itertuples() or iterrows():

for i in crispr.itertuples():
    for j in i:
        for b in j:
            print(b)

This produces the following error:

 TypeError: 'int' object is not iterable

The expected output is a list containing 1 for each character and 0 for each space:

dflist = [[1,1,1], [1,1,1,1], [1,1,1,1,1]],[[1,1,1,1,1,1,1,0,1,1,1,1], ...]]

Answer 1

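The code you posted (before your last edit) contained errors: several names were undefined, which led to errors different from the one you posted. I fixed your code to: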

dflist = []                    # added this
for i,j in crispr.items():     # items() replaces iteritems(), which was removed in pandas 2.0
    for x in range(len(j)):
        test = j[x].isspace()  # changed countchar to x
        # countchar+=1         # removed this
        if test == True:
            dflist.append(0)
        else:
            dflist.append(1)

for i in crispr.itertuples():
    for j in i:
        for b in j:  # this produces your error
            print(b)

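If you inspect the first value that j takes, you will see that it is 0: itertuples() puts the row index at the start of each tuple. That is the cause of the error, because you cannot iterate over the integer 0.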
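To see where that 0 comes from, here is a minimal sketch. The small frame df is only a stand-in for the data in the question. It relies on the default behaviour of itertuples(), which yields the row index as the first field, and on its index parameter, which leaves the index out when set to False.

```python
import pandas as pd

# Small stand-in frame; the column names are borrowed from the question.
df = pd.DataFrame({'Name': ['Bob', 'Jane'],
                   'Issue': ['Handling data', 'No clue']})

# By default each tuple starts with the row index, which is an int here.
first_row = next(df.itertuples())
print(first_row[0])  # 0, the index label of the first row

# With index=False every field is a cell value, so all of them are strings
# and the innermost loop can walk over their characters.
chars = []
for row in df.itertuples(index=False):
    for cell in row:
        for ch in cell:
            chars.append(ch)

print(len(chars))  # 27 characters in total across the four cells
```

This only works as long as every cell really holds a string. Numeric or missing values would raise the same kind of TypeError.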

Solution:

import pandas as pd

crispr = pd.DataFrame({
    'Name': ['Bob', 'Jane', 'Alice'],
    'Issue': ['Handling data', 'Could not read sample', 'No clue'],
    'Comment': ['Need to revise data', 'sample preparation', 
                'could not find out where problem occurs']})

print(crispr)
outer_list = []
for i,j in crispr.items():  # items() replaces iteritems(), which was removed in pandas 2.0
    dflist = []
    for word in j:
        wordlist = [] 
        for char in word:
            if char.isspace():
                wordlist.append(0)
            else:
                wordlist.append(1)
        dflist.append(wordlist)
    outer_list.append(dflist)

print(outer_list)

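Output (comments added for clarity; the columns appear in alphabetical order here, whereas newer pandas versions keep the order of the dict keys):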

                                   Comment                  Issue   Name
0                      Need to revise data          Handling data    Bob
1                       sample preparation  Could not read sample   Jane
2  could not find out where problem occurs                No clue  Alice

# Comment
[[[1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1], 
  [1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], 
  [1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 
   1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1]], 
 # Issue
 [[1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1], 
  [1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1], 
  [1, 1, 0, 1, 1, 1, 1]],
 # Name 
 [[1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1, 1]]]

This should do what you want.
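The same nested result can also be written more compactly with list comprehensions. The following is only a sketch of that variant, not a replacement for the loop above. The helper name char_mask is made up for this example, and the code assumes that every cell holds a string.

```python
import pandas as pd

crispr = pd.DataFrame({
    'Name': ['Bob', 'Jane', 'Alice'],
    'Issue': ['Handling data', 'Could not read sample', 'No clue'],
    'Comment': ['Need to revise data', 'sample preparation',
                'could not find out where problem occurs']})


def char_mask(text):
    """Return 0 for each whitespace character in text and 1 for any other."""
    return [0 if char.isspace() else 1 for char in text]


# One list per column, holding one mask per cell.
# Columns are looked up by name, so neither iteritems() nor items() is needed.
outer_list = [[char_mask(cell) for cell in crispr[column]]
              for column in crispr.columns]

# Find the 'Name' column by label because column order can differ
# between pandas versions.
name_position = list(crispr.columns).index('Name')
print(outer_list[name_position])  # [[1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1, 1]]
```

Keeping the per-string logic in a small function makes it easy to test on a single string, for example char_mask('No clue'), before applying it to the whole dataframe.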
