How to translate a Pandas Series in Python using googletrans?
I want to translate the text in a pandas column from Indonesian to English and add the translated text to my DataFrame as a new column named 'English'. Here is my code:
from googletrans import Translator
translator = Translator()
df['English'] = translator.translate(df['Review to Translate'], src='id', dest='en')
However, I get this error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-81-0fd41a244785> in <module>()
2
3 translator = Translator()
----> 4 y['Review in English'] = translator.translate(y['Review to Translate'], src='id', dest='en')
~/anaconda3/lib/python3.6/site-packages/googletrans/client.py in translate(self, text, dest, src)
170
171 origin = text
--> 172 data = self._translate(text, dest, src)
173
174 # this code will be updated when the format is changed.
~/anaconda3/lib/python3.6/site-packages/googletrans/client.py in _translate(self, text, dest, src)
73 text = text.decode('utf-8')
74
---> 75 token = self.token_acquirer.do(text)
76 params = utils.build_params(query=text, src=src, dest=dest,
77 token=token)
~/anaconda3/lib/python3.6/site-packages/googletrans/gtoken.py in do(self, text)
179 def do(self, text):
180 self._update()
--> 181 tk = self.acquire(text)
182 return tk
~/anaconda3/lib/python3.6/site-packages/googletrans/gtoken.py in acquire(self, text)
145 size = len(text)
146 for i, char in enumerate(text):
--> 147 l = ord(char)
148 # just append if l is less than 128(ascii: DEL)
149 if l < 128:
TypeError: ord() expected a character, but string of length 516 found
Does anyone know how I can fix this? I have a fairly large pandas df.
My guess is that you get this error because you are passing a pandas Series object to the translate function (see the docs) rather than a str (string) object.
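To see where the message comes from: the traceback shows googletrans looping over its input with `for i, char in enumerate(text)` and calling `ord(char)`. When the input is a collection of strings instead of one string, each "char" is a whole review. A minimal sketch of the same failure, using a plain list with made-up sample reviews in place of the Series:

```python
# Made-up sample reviews standing in for the Series values.
reviews = ["makanan enak", "pelayanan lambat"]

message = None
try:
    # Iterating a collection of strings yields whole strings, not characters,
    # so ord() receives a 12-character string on the first pass.
    for i, char in enumerate(reviews):
        ord(char)
except TypeError as exc:
    message = str(exc)

print(message)
```

The length reported in the message is the length of the first review, which matches the "string of length 516" in the traceback above.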
Try using apply, so that each string is translated on its own:
from googletrans import Translator
translator = Translator()
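# translate() returns a Translated object; its .text attribute holds the translated string
df['English'] = df['Review to Translate'].apply(lambda text: translator.translate(text, src='id', dest='en').text)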
If I run this example on repl.it:
from googletrans import Translator
import pandas as pd
translator = Translator()
df = pd.DataFrame({'Spanish':['piso','cama']})
df['English'] = df['Spanish'].apply(translator.translate, src='es', dest='en').apply(getattr, args=('text',))
it works as expected.
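Since the DataFrame in the question is fairly large, it may help to translate each distinct string only once and map the results back onto the column. The following is only a sketch: `translate_series` and `fake_translate` are hypothetical names, and `fake_translate` is an offline stand-in used so the example runs without calling the translation service.

```python
import pandas as pd

def translate_series(series, translate_fn):
    """Translate each distinct non-null value once, then map results back."""
    unique_texts = series.dropna().unique()
    lookup = {text: translate_fn(text) for text in unique_texts}
    # Missing values have no entry in lookup, so map() leaves them as NaN.
    return series.map(lookup)

# Offline stand-in for a real translator (hypothetical, not part of googletrans).
calls = []
def fake_translate(text):
    calls.append(text)
    return text.upper()

df = pd.DataFrame({'Review to Translate': ['bagus', 'buruk', 'bagus', None]})
df['English'] = translate_series(df['Review to Translate'], fake_translate)
print(df)
print(len(calls))  # number of translation calls actually made
```

With a real translator, `translate_fn` could be something like `lambda s: translator.translate(s, src='id', dest='en').text`, assuming the synchronous googletrans API shown in the question. How much this saves depends on how many reviews are repeated.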