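How to iterate over the strings in a DataFrame cell?
I have a DataFrame with text in every cell. I want to iterate over the DataFrame and over the individual characters of each cell, and fill a list with 0 for a space or 1 for any other character. I tried itertuples, iterrows and iteritems, but I could not get at the individual characters of each string.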
crispr = pd.DataFrame({'Name': ['Bob', 'Jane', 'Alice'],
                       'Issue': ['Handling data', 'Could not read sample', 'No clue'],
                       'Comment': ['Need to revise data', 'sample preparation', 'could not find out where problem occurs']})
What I tried:
dflist = []
countchar= 0
for i,j in crispr.iteritems():
    for x in range(len(j)):
        test = j[countchar].isspace()
        countchar+=1
        if test == True:
            dflist.append(0)
        else:
            dflist.append(1)
I also tried to find out whether it works with itertuples() or iterrows():
for i in crispr.itertuples():
    for j in i:
        for b in j:
            print(b)
This raises the following error:
TypeError: 'int' object is not iterable
The expected output is a list containing 1 for each character and 0 for each space:
dflist = [[1,1,1], [1,1,1,1], [1,1,1,1,1]],[[1,1,1,1,1,1,1,0,1,1,1,1], ...]]
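The code you posted (before your last edit) contained errors and several undefined names, which led to errors other than the one you quoted. I fixed your code to: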
dflist = []  # added this
for i,j in crispr.iteritems():  # on pandas >= 2.0 use crispr.items() instead
    for x in range(len(j)):
        test = j[x].isspace()  # changed countchar to x
        # countchar+=1         # removed this
        if test == True:
            dflist.append(0)
        else:
            dflist.append(1)

for i in crispr.itertuples():
    for j in i:
        for b in j:  # this produces your error
            print(b)
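If you inspect the first j, you will see that its value is 0 (the row index, which itertuples() places first in each tuple), hence the error: you cannot iterate over 0.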
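To make the cause of the TypeError concrete, here is a minimal sketch on a shortened version of the frame. It assumes every cell is a string, and the names index_value, flags_per_row and row_flags are only illustrative. Passing index=False to itertuples() is one way to drop the integer index. Note that this groups the result by row, not by column as in the expected output.

```python
import pandas as pd

crispr = pd.DataFrame({
    'Name': ['Bob', 'Jane'],
    'Issue': ['Handling data', 'No clue'],
})

# By default, element 0 of each tuple is the row index (an int here),
# which is what the innermost loop in the question tried to iterate over.
first_row = next(crispr.itertuples())
index_value = first_row[0]

# With index=False every element is a cell value, so iterating works.
flags_per_row = []
for row in crispr.itertuples(index=False):
    row_flags = []
    for cell in row:
        # 0 for whitespace, 1 for any other character
        row_flags.append([0 if char.isspace() else 1 for char in cell])
    flags_per_row.append(row_flags)

print(index_value)
print(flags_per_row)
```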
Solution:
import pandas as pd

crispr = pd.DataFrame({
    'Name': ['Bob', 'Jane', 'Alice'],
    'Issue': ['Handling data', 'Could not read sample', 'No clue'],
    'Comment': ['Need to revise data', 'sample preparation',
                'could not find out where problem occurs']})

print(crispr)

outer_list = []
# items() yields (column name, column Series) pairs;
# iteritems() did the same but was removed in pandas 2.0
for i,j in crispr.items():
    dflist = []
    for word in j:           # each cell of the column
        wordlist = []
        for char in word:    # each character of the cell
            if char.isspace():
                wordlist.append(0)
            else:
                wordlist.append(1)
        dflist.append(wordlist)
    outer_list.append(dflist)

print(outer_list)
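Output (comments added for clarity; the columns appear here in the sorted order an older pandas produced, whereas newer versions keep the order Name, Issue, Comment):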
                                   Comment                  Issue   Name
0                      Need to revise data          Handling data    Bob
1                       sample preparation  Could not read sample   Jane
2  could not find out where problem occurs                No clue  Alice
# Comment
[[[1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0,
1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1]],
# Issue
[[1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1],
[1, 1, 0, 1, 1, 1, 1]],
# Name
[[1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1, 1]]]
That should do what you want.
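For comparison, the same nested result can be built more compactly with list comprehensions. This is only a sketch of an alternative, not a replacement for the loop version: space_flags is a hypothetical helper name, and the code assumes every cell is a string (a NaN cell would raise a TypeError).

```python
import pandas as pd

crispr = pd.DataFrame({
    'Name': ['Bob', 'Jane', 'Alice'],
    'Issue': ['Handling data', 'Could not read sample', 'No clue'],
    'Comment': ['Need to revise data', 'sample preparation',
                'could not find out where problem occurs']})


def space_flags(text):
    """Return 0 for each whitespace character in text and 1 for anything else."""
    return [0 if char.isspace() else 1 for char in text]


# One list per column, each holding one list of flags per cell.
outer_list = [[space_flags(cell) for cell in crispr[col]]
              for col in crispr.columns]

print(outer_list)
```

The outer lists follow the order of crispr.columns, so the position of each column in the result depends on the pandas version in use.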