TypeError "object of type 'float' has no len()" when removing HTML text from a column
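I am doing sentiment analysis on an Amazon dataset. The dataset contents look like this:
https://i.stack.imgur.com/qcKZp.png
The dataset is available at:
https://www.kaggle.com/PromptCloudHQ/amazon-reviews-unlocked-mobile-phones
I am trying to remove HTML from the Reviews column.
This is what I am doing. Note: the dataset is assigned to df.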
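from bs4 import BeautifulSoup

df_removedNoise = []

def removingHTML(text):
    # Parse the review as HTML and keep only the visible text.
    soup = BeautifulSoup(text, 'lxml').get_text()
    return soup

def removingNoise(text):
    html_removed = removingHTML(text)
    return html_removed

# Clean every review and collect the results.
for i in df["Reviews"]:
    text = removingNoise(i)
    df_removedNoise.append(text)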
Even though the Reviews column has the object dtype, I still get an error like this:
TypeError                                 Traceback (most recent call last)
<ipython-input-83-3591f5d7a54f> in <module>
      9
     10 for i in df["Reviews"]:
---> 11     df_removedNoise.append(removingNoise(i))

<ipython-input-83-3591f5d7a54f> in removingNoise(text)
      5
      6 def removingNoise(text):
----> 7     html_removed = removingHTML(text)
      8     return html_removed
      9

<ipython-input-83-3591f5d7a54f> in removingHTML(text)
      1 df_removedNoise = []
      2 def removingHTML(text):
----> 3     soup = BeautifulSoup(text, 'lxml').get_text()
      4     return soup
      5

~/anaconda3/lib/python3.7/site-packages/bs4/__init__.py in __init__(self, markup, features, builder, parse_only, from_encoding, exclude_encodings, **kwargs)
    244         if hasattr(markup, 'read'):        # It's a file-type object.
    245             markup = markup.read()
--> 246         elif len(markup) <= 256 and (
    247                 (isinstance(markup, bytes) and not b'<' in markup)
    248                 or (isinstance(markup, str) and not '<' in markup)

TypeError: object of type 'float' has no len()
Any help would be appreciated!
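Use df[df['Reviews'].isnull()] to check for NaN values. Missing values are typically stored as float NaN even in an object column, which would explain the len() error. If you find any, try dropna first.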
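The check-then-drop approach suggested above could look roughly like the sketch below. It assumes a small made-up DataFrame in place of the real dataset, and it uses the built-in 'html.parser' instead of 'lxml' so that it runs without the extra lxml dependency. The names removing_html, missing and cleaned are illustrative only.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real dataset: the second review is missing.
df = pd.DataFrame(
    {"Reviews": ["<p>Great phone</p>", float("nan"), "Battery is <b>weak</b>"]}
)

# An object column can still contain float NaN for missing entries.
missing = df[df["Reviews"].isnull()]
print(len(missing), "missing review(s)")


def removing_html(text):
    # 'html.parser' ships with Python; the question used 'lxml'.
    return BeautifulSoup(text, "html.parser").get_text()


# Drop rows without a review, then strip the markup from the rest.
cleaned = df.dropna(subset=["Reviews"]).copy()
cleaned["Reviews"] = cleaned["Reviews"].apply(removing_html)
print(cleaned["Reviews"].tolist())
```

If the rows need to stay aligned with the other columns, replacing the missing values with an empty string via df["Reviews"].fillna("") before cleaning is a possible alternative to dropping them.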