Python requests and LanguageTool encoding error
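I am trying to POST text data to a LanguageTool server. My text includes symbols such as the trademark and copyright signs.
On my first attempt I posted the text like this:
response = requests.post(
    LANGUAGETOOL_URL,
    data=f"language=en-US&text={text}"
)
I received this error from requests: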
UnicodeEncodeError: 'latin-1' codec can't encode character '\u2122' in position 317: Body ('™') is not valid Latin-1. Use body.encode('utf-8') if you want to send it encoded in UTF-8.
Following this post, I updated my request as follows:
response = requests.post(
    LANGUAGETOOL_URL,
    data=f"language=en-US&text={text}".encode('utf-8')
)
Now requests raises no error, but the LanguageTool server complains that it cannot decode the query:
2022-01-23 13:09:47.366 +0000 INFO [lt-server-thread-6] [logError] rID:- org.languagetool.server.LanguageToolHttpHandler An error has occurred: 'Could not decode query. Query length: 3085 Request method: POST', sending HTTP code 400. Access from 172.17.0.1, HTTP user agent: python-requests/2.27.1, User agent param: null, Referrer: null, language: null, h: 1, r: 29, time: 0m: ALL, l: DEFAULT, Stacktrace follows:org.languagetool.server.BadRequestException: Could not decode query. Query length: 3085 Request method: POST
at org.languagetool.server.LanguageToolHttpHandler.getParameterMap(LanguageToolHttpHandler.java:470)
at org.languagetool.server.LanguageToolHttpHandler.parseQuery(LanguageToolHttpHandler.java:452)
at org.languagetool.server.LanguageToolHttpHandler.getRequestQuery(LanguageToolHttpHandler.java:417)
at org.languagetool.server.LanguageToolHttpHandler.handle(LanguageToolHttpHandler.java:152)
at jdk.httpserver/com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:77)
at jdk.httpserver/sun.net.httpserver.AuthFilter.doFilter(AuthFilter.java:82)
at jdk.httpserver/com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:80)
at jdk.httpserver/sun.net.httpserver.ServerImpl$Exchange$LinkHandler.handle(ServerImpl.java:725)
at jdk.httpserver/com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:77)
at jdk.httpserver/sun.net.httpserver.ServerImpl$Exchange.run(ServerImpl.java:694)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
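I have checked all of the LanguageTool documentation but could not find anything about encoding. At this point I don't know whether the problem lies with requests, with LanguageTool, or with something I am doing wrong. Is it possible to POST characters such as the trademark symbol to LanguageTool? If so, how?
Pass the parameters as a dictionary. There is no need to encode anything manually: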
import requests
import json

response = requests.post(
    'https://api.languagetoolplus.com/v2/check',
    data={'text': 'check for mispelling™ © 2022', 'language': 'en-US'}
)
print(json.dumps(response.json(), ensure_ascii=False, indent=2))
Output:
{
"software": {
"name": "LanguageTool",
"version": "5.7-SNAPSHOT",
"buildDate": "2022-01-18 13:50:09 +0000",
"apiVersion": 1,
"premium": true,
"premiumHint": "You might be missing errors only the Premium version can find. Contact us at support<at>languagetoolplus.com.",
"status": ""
},
"warnings": {
"incompleteResults": false
},
"language": {
"name": "English (US)",
"code": "en-US",
"detectedLanguage": {
"name": "English (US)",
"code": "en-US",
"confidence": 0.924
}
},
"matches": [
{
"message": "This sentence does not start with an uppercase letter.",
"shortMessage": "",
"replacements": [
{
"value": "Check"
}
],
"offset": 0,
"length": 5,
"context": {
"text": "check for mispelling™ © 2022",
"offset": 0,
"length": 5
},
"sentence": "check for mispelling™ © 2022",
"type": {
"typeName": "Other"
},
"rule": {
"id": "UPPERCASE_SENTENCE_START",
"description": "Checks that a sentence starts with an uppercase letter",
"issueType": "typographical",
"category": {
"id": "CASING",
"name": "Capitalization"
},
"isPremium": false
},
"ignoreForIncompleteSentence": true,
"contextForSureMatch": -1
},
{
"message": "Possible spelling mistake found.",
"shortMessage": "Spelling mistake",
"replacements": [
{
"value": "misspelling"
},
{
"value": "dispelling"
},
{
"value": "mi spelling"
}
],
"offset": 10,
"length": 10,
"context": {
"text": "check for mispelling™ © 2022",
"offset": 10,
"length": 10
},
"sentence": "check for mispelling™ © 2022",
"type": {
"typeName": "Other"
},
"rule": {
"id": "MORFOLOGIK_RULE_EN_US",
"description": "Possible spelling mistake",
"issueType": "misspelling",
"category": {
"id": "TYPOS",
"name": "Possible Typo"
},
"isPremium": false
},
"ignoreForIncompleteSentence": false,
"contextForSureMatch": 0
}
]
}
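To see why the dictionary form works, it may help to look at what a form-encoded body looks like. The sketch below uses only the standard library's urllib.parse.urlencode, which should produce roughly the same body requests builds from a dict. The helper name build_form_body is hypothetical and only for illustration. A likely (but unconfirmed) reason the hand-built string failed on the server is that it was sent raw, without percent-encoding, so characters such as %, & or non-ASCII bytes could not be parsed as a query.

```python
from urllib.parse import urlencode, parse_qs


def build_form_body(text: str, language: str = "en-US") -> str:
    """Percent-encode the form fields (hypothetical helper, UTF-8 by default)."""
    return urlencode({"language": language, "text": text})


body = build_form_body("mispelling™ © 2022")
print(body)
# language=en-US&text=mispelling%E2%84%A2+%C2%A9+2022

# The encoded body is plain ASCII, so no Latin-1 encoding error can occur.
print(body.isascii())
# True

# Decoding the body recovers the original text, symbols included.
print(parse_qs(body)["text"][0])
# mispelling™ © 2022

# Characters that have a meaning inside a query string are escaped as well,
# which a hand-built f-string would not do.
print(build_form_body("100% & more"))
# language=en-US&text=100%25+%26+more
```

If you really need to build the body yourself, passing this string as data together with a Content-Type: application/x-www-form-urlencoded header should be equivalent, but the dictionary form is simpler.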