Python - Scrubadub.clean not working - Cannot properly scrub text PII + HTTP Error 503
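Sorry, this may be a basic question, but I am learning scrubadub and trying to get it to run in a Jupyter notebook. It keeps showing: HTTP Error 503: Service Unavailable
This is what I entered, exactly as in the scrubadub documentation: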
text = u"John is a cat"
scrubadub.clean(text, replace_with='placeholder')
u"{{NAME}} is a cat"
This is the error message I received:
HTTPError Traceback (most recent call last)
<ipython-input-92-5b0754baae94> in <module>()
1 text = u"John is a cat"
----> 2 scrubadub.clean(text, replace_with='placeholder')
3 u"{{NAME}} is a cat"
/anaconda3/lib/python3.7/site-packages/scrubadub/__init__.py in clean(text, cls, **kwargs)
14 cls = cls or Scrubber
15 scrubber = cls()
---> 16 return scrubber.clean(text, **kwargs)
/anaconda3/lib/python3.7/site-packages/scrubadub/scrubbers.py in clean(self, text, **kwargs)
55 clean_chunks = []
56 filth = Filth()
---> 57 for next_filth in self.iter_filth(text):
58 clean_chunks.append(text[filth.end:next_filth.beg])
59 clean_chunks.append(next_filth.replace_with(**kwargs))
I also tried the code below, but I got an error message as well. I suspect I am missing a parameter somewhere in the code...
import scrubadub
class MyFilth(scrubadub.filth.base.Filth):
    type = 'mine'
class MyDetector(scrubadub.detectors.base.Detector):
    filth_cls = MyFilth
    def iter_filth(self, text):
        # do something here
        pass
scrubber = scrubadub.Scrubber()
scrubber.add_detector(MyDetector)

text = u"My stuff can be found there"
scrubadub.clean(text)
u"{{MINE}} can be found there."
StopIteration Traceback (most recent call last)
/anaconda3/lib/python3.7/site-packages/scrubadub/detectors/base.py in iter_filth(self, text)
21 if self.filth_cls.regex is None:
---> 22 raise StopIteration
23 for match in self.filth_cls.regex.finditer(text):
StopIteration:
The above exception was the direct cause of the following exception:
RuntimeError Traceback (most recent call last)
<ipython-input-94-2cc23d003da7> in <module>()
11
12 text = u"My stuff can be found there"
---> 13 scrubadub.clean(text)
14 u"{{MINE}} can be found there."
/anaconda3/lib/python3.7/site-packages/scrubadub/__init__.py in clean(text, cls, **kwargs)
14 cls = cls or Scrubber
15 scrubber = cls()
---> 16 return scrubber.clean(text, **kwargs)
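There is an open issue on GitHub: scrubadub does not seem to work well with Python 3.7, which you are currently using.
I could also reproduce it with 3.7 outside a notebook, so it is not a notebook problem.
As a temporary workaround, switching your environment to 3.6 (or, as a least-recommended last resort, 2.7) works.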
https://github.com/datascopeanalytics/scrubadub/issues/40 (Stop Iteration issue)
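
The second traceback shows `raise StopIteration` inside `iter_filth`, followed by a `RuntimeError`. That matches the change from PEP 479, which Python 3.7 turns on by default: a `StopIteration` raised inside a generator body is converted into a `RuntimeError`. Here is a minimal sketch of the effect, independent of scrubadub. `broken_matches` and `fixed_matches` are made-up names for illustration, not scrubadub functions.

```python
def broken_matches(items, pattern):
    """Generator that exits early with `raise StopIteration` (the old idiom)."""
    if pattern is None:
        # On Python 3.7+ this surfaces as RuntimeError (PEP 479).
        raise StopIteration
    for item in items:
        if pattern in item:
            yield item


def fixed_matches(items, pattern):
    """Same generator, but a plain `return` ends it cleanly on every version."""
    if pattern is None:
        return
    for item in items:
        if pattern in item:
            yield item


try:
    list(broken_matches(["John is a cat"], None))
except RuntimeError as exc:
    outcome = "RuntimeError: {}".format(exc)
else:
    # Reached on Python 3.6 and earlier, where the generator just stops.
    outcome = "no error"

print(outcome)
print(list(fixed_matches(["John is a cat"], None)))
```

On 3.6 and earlier the first generator simply ends, which would explain why downgrading the environment works around the problem. A proper fix in the library would presumably be replacing the `raise StopIteration` with `return`, but that is my assumption; check the linked issue for what the maintainers actually did.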