Lucee URI encoding issue (Cyrillic)
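I have just moved one of our core applications from Windows + IIS + ColdFusion to Ubuntu + Apache + Lucee. The first big problem is URI encoding for non-Latin alphabets.
For example, requesting this URL: http://www.example.com/ru/Солнцезащитные-очки/saint-laurent/
produces this record in the Apache access.log: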
http://www.example.com/ru/%D0%A1%D0%BE%D0%BB%D0%BD%D1%86%D0%B5%D0%B7%D0%B0%D1%89%D0%B8%D1%82%D0%BD%D1%8B%D0%B5-%D0%BE%D1%87%D0%BA%D0%B8/saint-laurent/
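So I think the URL itself is encoded correctly. I then use a rewrite rule in my .htaccess file to pass that (Cyrillic) part of the URL along as a query string parameter (say, "foo").
Dumping the result with cflog, I see this in the application log: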
/index.cfm?foo=оÑки-длÑ-зÑениÑ&
...which is clearly wrong, because what I need is the original string, in UTF-8 Cyrillic.
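To illustrate the kind of garbling involved, here is a minimal Python sketch that decodes the same escaped bytes twice: once as UTF-8 and once as ISO-8859-1. It takes the escaped form of "очки" from the access.log line above. The exact characters that end up in a given log depend on how the log is written and viewed, so this only shows the general mechanism, not a reproduction of the cflog output.

```python
from urllib.parse import unquote

# Percent-encoded UTF-8 bytes of the Cyrillic word "очки",
# copied from the access.log line above.
encoded = "%D0%BE%D1%87%D0%BA%D0%B8"

# Interpreting the bytes as UTF-8 gives back the original 4 letters.
as_utf8 = unquote(encoded, encoding="utf-8")

# Interpreting the same bytes as ISO-8859-1 turns each of the 8 bytes
# into its own character, which is what shows up as garbage.
as_latin1 = unquote(encoded, encoding="iso-8859-1")

print(as_utf8)                       # очки
print(len(as_utf8), len(as_latin1))  # 4 8

# As long as no bytes were dropped along the way, the damage is reversible:
recovered = as_latin1.encode("iso-8859-1").decode("utf-8")
print(recovered == as_utf8)          # True
```

If the wrongly decoded value can be turned back into the original this way, the bytes themselves survived and only the charset used to interpret them is wrong.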
I tried adding the URIEncoding attribute to the Tomcat HTTP connector in my server.xml, but it had no effect:
<Connector port="8888" protocol="HTTP/1.1"
connectionTimeout="20000"
redirectPort="8443"
URIEncoding="UTF-8" />
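How can I get the URL parameter in UTF-8?
It is better not to use Cyrillic in URIs at all. Putting anything other than ASCII in them is very bad practice. I am telling you this as a native Russian speaker from Moscow, Russia.
There is a so-called Russian transliteration (romanization of Russian), in which each of the 33 letters maps directly to Latin characters. You can apply such a transliteration on the backend to convert Russian to Latin and back.
Like this: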
hostname:8888/index.cfm?foo=Solntsezaschitnye-ochki
Or use ID numbers instead of text wherever possible.
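A minimal sketch of such a transliteration in Python follows. The mapping table is an assumption: several romanization schemes exist, and this one was chosen only because it reproduces the example slug above. The names `TRANSLIT` and `transliterate` are hypothetical.

```python
# Assumed romanization table for the 33 Russian letters (lowercase).
# Other schemes differ, e.g. for "х", "щ", "й" and the hard/soft signs.
TRANSLIT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "yo",
    "ж": "zh", "з": "z", "и": "i", "й": "y", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "h", "ц": "ts", "ч": "ch", "ш": "sh", "щ": "sch",
    "ъ": "", "ы": "y", "ь": "", "э": "e", "ю": "yu", "я": "ya",
}


def transliterate(text: str) -> str:
    """Replace Russian letters with Latin ones; leave everything else as is."""
    out = []
    for ch in text:
        latin = TRANSLIT.get(ch.lower())
        if latin is None:
            out.append(ch)                  # hyphens, digits, Latin letters
        elif ch.isupper():
            out.append(latin.capitalize())  # "С" -> "S", "Щ" -> "Sch"
        else:
            out.append(latin)
    return "".join(out)


print(transliterate("Солнцезащитные-очки"))  # Solntsezaschitnye-ochki
```

Note that a table like this is not reliably reversible (for example, "y" can come from "й" or "ы", and the signs "ъ" and "ь" disappear), so going back from the Latin slug to the Russian text is safer as a lookup against stored values than as a reverse transliteration.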
I found the solution myself.
Source: http://blogs.warwick.ac.uk/kieranshaw/entry/utf-8_internationalisation_with
Apache
Generally you don't need to worry about Apache as it shouldn't be
messing with your HTML or URLs. However, if you are doing some
proxying with mod_proxy then you might need to have a think about
this. We use mod_proxy to do proxying from Apache through to Tomcat.
If you've got encoded characters in URL that you need to convert into
some query string for your underlying app then you're going to have a
strange little problem.
If you have a URL coming into Apache that looks like this:
http://mydomain/%E4%B8%AD.doc and you have a mod_rewrite/proxy rule
like this:
RewriteRule ^/(.*) http://mydomain:8080/filedownload/?filename=$1 [QSA,L,P]
Unfortunately the $1 is going to get mangled during the rewrite. QSA
(QueryStringAppend) actually deals with these characters just fine and
will send this through untouched, but when you grab a bit of the URL
such as my $1 here then the characters get mangled as Apache tries to
do some unescaping of its own into ISO-8859-1, but it's UTF-8 not
ISO-8859-1 so it doesn't work properly. So, to keep our special
characters in UTF-8, we'll escape it back again.
RewriteMap escape int:escape
RewriteRule ^/(.*) http://mydomain:8080/filedownload/?filename=${escape:$1} [QSA,L,P]
Take a look at your rewrite logs to see if this is working.
It was really hard to find.
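The effect of the escape map can be modelled in a few lines of Python. This is only a sketch of the behaviour described in the quoted post, not of Apache's actual implementation: the function `rewrite` is hypothetical, and `urllib.parse.quote` merely stands in for the `int:escape` map.

```python
from urllib.parse import quote, unquote, unquote_to_bytes


def rewrite(path: str, escape: bool) -> bytes:
    """Model of the rule: capture the path, then build the backend URL."""
    # The captured group is taken from the already unescaped path,
    # so it holds raw bytes rather than %XX sequences.
    captured = unquote_to_bytes(path.lstrip("/"))
    if escape:
        # Stand-in for ${escape:$1}: turn the raw bytes back into %XX.
        value = quote(captured, safe="").encode("ascii")
    else:
        value = captured
    return b"/filedownload/?filename=" + value


plain = rewrite("/%E4%B8%AD.doc", escape=False)
escaped = rewrite("/%E4%B8%AD.doc", escape=True)

print(plain.isascii())    # False: raw bytes leak into the query string
print(escaped.isascii())  # True: the backend receives %E4%B8%AD.doc

# A backend that decodes the query string as UTF-8 gets the original name.
param = escaped.decode("ascii").split("filename=", 1)[1]
print(unquote(param, encoding="utf-8"))  # 中.doc
```

With the value escaped again, a backend that decodes the query string as UTF-8 sees the original characters. One practical caveat for the setup in the question: RewriteMap can only be declared in the server or virtual host configuration, not in .htaccess, although a map declared there can be used by rules in .htaccess.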