Is there a way to use FINDSTR with non-ASCII (in this case Japanese/Chinese) characters in batch?

Question

I have a list of Japanese kanji and their readings saved in a text file (JouyouKanjiReadings.txt) like this:

亜   ア
哀   アイ,あわれ,あわれむ
愛   アイ
悪   アク,オ,わるい
握   アク,にぎる
圧   アツ
(each gap is made by pressing TAB)

I have this script:

@echo off
set /p text=Enter here: 
echo %text%>Search.txt
echo.
findstr /G:"Search.txt" JouyouKanjiReadings.txt || echo No Results && pause > nul && exit
pause > nul

However, when I run the script, I always get "No Results". With English characters it works fine. I also tried the same script with this line instead:

findstr "%text%" JouyouKanjiReadings.txt || echo No Results && pause > nul && exit

but got the same result. Is there any way to fix this? Also, I got these characters to display correctly in the Command Prompt by using

chcp 65001

and a different font.

Answer 1

You need to use find (which supports Unicode but not regex) instead of findstr (which supports regex but not Unicode). See "Why are there both FIND and FINDSTR programs, with unrelated feature sets?"

D:\kanji>chcp
Active code page: 65001

D:\kanji>find "哀" JouyouKanjiReadings.txt

---------- JOUYOUKANJIREADINGS.TXT
哀      アイ,あわれ,あわれむ

Redirect the output to NUL to suppress it when you don't need it.
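Applied to the script from the question, the find approach might look like the sketch below. It assumes the list is saved as UTF-8, that the console accepts code page 65001, and that the typed text is non-empty and contains no double quotes (find treats the quoted string as a literal, not a regex).

```batch
@echo off
rem Switch the console to UTF-8 so the typed kanji and the file use the same encoding.
chcp 65001 > nul
set /p "text=Enter here: "
echo.
rem find returns errorlevel 1 when no line matches, which triggers the fallback message.
rem Append "> nul" to the find command to hide the matching lines and keep only the check.
find "%text%" JouyouKanjiReadings.txt || echo No Results
pause > nul
```

Unlike the original findstr /G version, this does not need the temporary Search.txt file, because the search text is passed directly on the command line.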

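That said, find isn't a good solution either. Nowadays you should use PowerShell instead of cmd, which carries many quirks kept for legacy compatibility. PowerShell fully supports Unicode and can call any .NET Framework method. To search for a string, you can use the Select-String cmdlet or its alias sls: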

PS D:\kanji> Select-String '握'  JouyouKanjiReadings.txt

JouyouKanjiReadings.txt:5:握    アク,にぎる

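In fact, you don't even need UTF-8 and code page 65001. Just store the file as UTF-16 with a BOM, and find and sls will automatically search it as UTF-16. This also makes the file smaller, because it consists mostly of Japanese characters, which take 3 bytes each in UTF-8 but only 2 in UTF-16.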
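One way to produce such a UTF-16 copy is to let Windows PowerShell re-encode the file. This is only a sketch: it assumes the existing file is UTF-8, and JouyouKanjiReadings.utf16.txt is a hypothetical output name chosen for illustration.

```batch
@echo off
rem Re-encode the UTF-8 list as UTF-16 LE with a BOM ("Unicode" in Windows PowerShell terms).
rem The output file name is an assumption; pick any name you like.
powershell -NoProfile -Command "Get-Content -Encoding UTF8 'JouyouKanjiReadings.txt' | Set-Content -Encoding Unicode 'JouyouKanjiReadings.utf16.txt'"

rem Search the converted copy, relying on find recognizing the UTF-16 BOM as described above.
set /p "text=Enter here: "
find "%text%" JouyouKanjiReadings.utf16.txt || echo No Results
pause > nul
```

The original file is left untouched, so you can compare the results of both copies before switching over.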

Of course, if you have a lot of existing batch code, you can call PowerShell from cmd like this:

powershell -Command "Select-String '哀'  JouyouKanjiReadings.txt"
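To search for text typed by the user rather than a fixed character, the batch script could hand the input to PowerShell through an environment variable, which sidesteps cmd quoting problems. A minimal sketch, assuming the file is UTF-8 and the input is non-empty; the variable name text comes from the question's script, and $m is just an illustrative name:

```batch
@echo off
set /p "text=Enter here: "
echo.
rem PowerShell reads the search text from the inherited environment ($env:text),
rem so the characters never pass through cmd's command-line quoting.
rem -SimpleMatch makes Select-String treat the text literally instead of as a regex.
powershell -NoProfile -Command "$m = Select-String -SimpleMatch -Pattern $env:text -Path 'JouyouKanjiReadings.txt' -Encoding UTF8; if ($m) { $m } else { 'No Results' }"
pause > nul
```

Select-String does not signal "no match" through an exit code the way find does, so the check for an empty result is done inside the PowerShell command itself.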

But if the project is brand new, save yourself the trouble and just use PowerShell.

unicode

batch-file

findstr