Variable transliteration (PowerShell)
I have a problem that I have not been able to solve quickly.
The point is to go through all the letter matches in two tables.
Here is my script:
function global:TranslitToLAT {
    param([string]$inString)
    $Translit_To_LAT = @{
        [char]'а' = "a"
        [char]'А' = "a"
        [char]'б' = "b"
        [char]'Б' = "b"
        [char]'в' = "v"
        [char]'В' = "v"
        [char]'г' = "g"
        [char]'Г' = "g"
        [char]'д' = "d"
        [char]'Д' = "d"
        [char]'е' = "e"
        [char]'Е' = "e"
        [char]'ё' = "e"
        [char]'Ё' = "e"
        [char]'ж' = "zh"
        [char]'Ж' = "zh"
        [char]'з' = "z"
        [char]'З' = "z"
        [char]'и' = "i"
        [char]'И' = "i"
        [char]'й' = "y"
        [char]'Й' = "y"
        [char]'к' = "k"
        [char]'К' = "k"
        [char]'л' = "l"
        [char]'Л' = "l"
        [char]'м' = "m"
        [char]'М' = "m"
        [char]'н' = "n"
        [char]'Н' = "n"
        [char]'о' = "o"
        [char]'О' = "o"
        [char]'п' = "p"
        [char]'П' = "p"
        [char]'р' = "r"
        [char]'Р' = "r"
        [char]'с' = "s"
        [char]'С' = "s"
        [char]'т' = "t"
        [char]'Т' = "t"
        [char]'у' = "u"
        [char]'У' = "u"
        [char]'ф' = "f"
        [char]'Ф' = "f"
        [char]'х' = "h"
        [char]'Х' = "h"
        [char]'ц' = "ts"
        [char]'Ц' = "ts"
        [char]'ч' = "ch"
        [char]'Ч' = "ch"
        [char]'ш' = "sh"
        [char]'Ш' = "sh"
        [char]'щ' = "sch"
        [char]'Щ' = "sch"
        [char]'ъ' = "" # "``"
        [char]'Ъ' = "" # "``"
        [char]'ы' = "y" # "y`"
        [char]'Ы' = "y" # "Y`"
        [char]'ь' = "" # "`"
        [char]'Ь' = "" # "`"
        [char]'э' = "e" # "e`"
        [char]'Э' = "e" # "E`"
        [char]'ю' = "yu"
        [char]'Ю' = "yu"
        [char]'я' = "ya"
        [char]'Я' = "ya"
        [char]' ' = "_"
    }
    $outChars = ""
    $TwoLetter_To_LAT = @{
        [string]'ъи' = 'yi'
        [string]'ьи' = 'yi'
        [string]'ье' = 'ye'
        [string]'ъе' = 'ye'
        [string]'ий' = 'ii'
        [string]'кс' = 'x'
        [string]'ц' = 'c'
    }
    $chars = $inString.ToCharArray();
    $outChars1 = $outChars
    foreach ($char in $chars) {
        $outChars1 += $Translit_To_LAT[$char]
        $outChars11 = Write-Output $outChars1 `n
    }
    $TwoLetter_To_LAT.GetEnumerator().name | % {
        $inString = $inString.Replace($_, $TwoLetter_To_LAT.Item($_))
    }
    $outChars2 = $outChars
    foreach ($c in $inChars = $inString.ToCharArray()) {
        if ($Translit_To_LAT[$c] -ne $Null )
        { $outChars2 += $Translit_To_LAT[$c] }
        else
        { $outChars2 += $c }
        $outChars22 = Write-Output $outChars2 `n
    }
    $outChars3 = $outChars11 + $outChars22
    Write-Output $outChars3
}
$text = Read-Host "Second name"
$log = TranslitToLAT $text | select $log > c:\users.txt
$log
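It partially works. When I enter a surname in Russian that has two matches in the second table, I get one overall result from the first table and one from the second, but I should get 4 transliteration options!
I would be happy with an example of how to make the loop go through the tables again and again.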
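To paraphrase, I think the question you are asking is: "given a source string $inString and a list of substitutions $Translit_To_LAT, return all possible combinations of substitutions from the list applied to the source string".
For example, I assume your sample - Алексий - would give the following 4 substitutions: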
1. А-л-е-к-с-и-й -> a-l-e-k-s-i-y
2. А-л-е-к-с-ий -> a-l-e-k-s-ii
3. А-л-е-кс-и-й -> a-l-e-x-i-y
4. А-л-е-кс-ий -> a-l-e-x-ii
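To make the four splits above concrete, here is a minimal sketch that only enumerates the ways of cutting the source string into pieces, without transliterating anything. The function name Get-Segmentation and its -MultiCharKeys parameter are made up for this illustration; they are not part of your script.

```powershell
function Get-Segmentation {
    param(
        [string]   $InputString,
        [string[]] $MultiCharKeys   # hypothetical list of multi-character keys
    )

    # Base case: nothing left to split.
    if ($InputString.Length -eq 0) { return '' }

    # Candidate first pieces: the first character on its own, plus every
    # multi-character key that matches (ordinal, case-sensitive) at the start.
    $pieces = @($InputString.Substring(0, 1))
    foreach ($key in $MultiCharKeys) {
        if ($InputString.StartsWith($key, [System.StringComparison]::Ordinal)) {
            $pieces += $key
        }
    }

    # Combine each candidate first piece with every split of the remainder.
    foreach ($piece in $pieces) {
        $rest = $InputString.Substring($piece.Length)
        foreach ($tail in @(Get-Segmentation -InputString $rest -MultiCharKeys $MultiCharKeys)) {
            if ($tail -eq '') { $piece } else { "$piece-$tail" }
        }
    }
}

Get-Segmentation -InputString 'Алексий' -MultiCharKeys 'кс', 'ий'
# А-л-е-к-с-и-й
# А-л-е-к-с-ий
# А-л-е-кс-и-й
# А-л-е-кс-ий
```

Each split then maps to exactly one Latin string as long as every piece has a single replacement; pieces with several replacements multiply the count further.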
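Based on your substitution list, though, there are a few things to note:
It is case-sensitive, so upper and lower case may have different substitutions, e.g. б and Б.
Some substitutions have multiple characters in the replacement text, e.g. ж => "zh".
Some substitutions have multiple replacement options, e.g. ц => c and ts.
Some substitutions match more than one source character, e.g. ъи => yi.
Sometimes the same character sequence has several possible substitutions, e.g. ий can be two separate substitutions (и => i and й => y) or a single "compound" substitution (ий => ii).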
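The case-sensitivity point is easy to check in isolation. A small sketch, assuming the default behaviour of PowerShell hashtable literals (string keys are compared case-insensitively, while [char] keys are compared by character value); the variable names are only for illustration:

```powershell
# String keys: the hashtable literal treats 'б' and 'Б' as the same key.
$byString = @{ 'б' = 'b' }
$byString.ContainsKey('Б')          # True

# [char] keys: compared by character value, so case matters.
$byChar = @{ [char] 'б' = 'b' }
$byChar.ContainsKey([char] 'Б')     # False
$byChar.ContainsKey([char] 'б')     # True
```

This is why both your table and the rewrite cast single-letter keys to [char] and list the upper- and lower-case letters separately.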
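I have rewritten your code as a recursive function that basically does the following:
If the input string is empty, return an empty string.
Otherwise, enumerate all the substitutions that can be applied at the start of the string, and "multiply" them with the results of a recursive call on the rest of the string.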
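The "multiply" step is just a cross product: every replacement option for the matched prefix is combined with every result already computed for the rest of the string. A minimal sketch with made-up $heads and $tails values, chosen only for illustration:

```powershell
$heads = @('c', 'ts')     # replacement options for the matched prefix (example values)
$tails = @('iy', 'ii')    # results for the remainder of the string (example values)

# Every head is combined with every tail.
$combined = foreach ($head in $heads) {
    foreach ($tail in $tails) {
        $head + $tail
    }
}

$combined
# ciy
# cii
# tsiy
# tsii
```

With 2 heads and 2 tails there are 4 results; in general the number of results is the number of heads times the number of tails.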
The full code is as follows:
function Get-Transliteration
{
    param(
        [string] $InputString
    )
    $lookups = [ordered] @{
        # single character substitutions
        # (we need to use the [char] cast to force case sensitivity for keys)
        [char] "а" = @( "a" )
        [char] "А" = @( "a" )
        [char] "б" = @( "b" )
        [char] "Б" = @( "b" )
        [char] "в" = @( "v" )
        [char] "В" = @( "v" )
        [char] "г" = @( "g" )
        [char] "Г" = @( "g" )
        [char] "д" = @( "d" )
        [char] "Д" = @( "d" )
        [char] "е" = @( "e" )
        [char] "Е" = @( "e" )
        [char] "ё" = @( "e" )
        [char] "Ё" = @( "e" )
        [char] "ж" = @( "zh" )
        [char] "Ж" = @( "zh" )
        [char] "з" = @( "z" )
        [char] "З" = @( "z" )
        [char] "и" = @( "i" )
        [char] "И" = @( "i" )
        [char] "й" = @( "y" )
        [char] "Й" = @( "y" )
        [char] "к" = @( "k" )
        [char] "К" = @( "k" )
        [char] "л" = @( "l" )
        [char] "Л" = @( "l" )
        [char] "м" = @( "m" )
        [char] "М" = @( "m" )
        [char] "н" = @( "n" )
        [char] "Н" = @( "n" )
        [char] "о" = @( "o" )
        [char] "О" = @( "o" )
        [char] "п" = @( "p" )
        [char] "П" = @( "p" )
        [char] "р" = @( "r" )
        [char] "Р" = @( "r" )
        [char] "с" = @( "s" )
        [char] "С" = @( "s" )
        [char] "т" = @( "t" )
        [char] "Т" = @( "t" )
        [char] "у" = @( "u" )
        [char] "У" = @( "u" )
        [char] "ф" = @( "f" )
        [char] "Ф" = @( "f" )
        [char] "х" = @( "h" )
        [char] "Х" = @( "h" )
        [char] "ц" = @( "c", "ts")
        [char] "Ц" = @( "ts" )
        [char] "ч" = @( "ch" )
        [char] "Ч" = @( "ch" )
        [char] "ш" = @( "sh" )
        [char] "Ш" = @( "sh" )
        [char] "щ" = @( "sch" )
        [char] "Щ" = @( "sch" )
        [char] "ъ" = @( "" )
        [char] "Ъ" = @( "" )
        [char] "ы" = @( "y" )
        [char] "Ы" = @( "y" )
        [char] "ь" = @( "" )
        [char] "Ь" = @( "" )
        [char] "э" = @( "e" )
        [char] "Э" = @( "e" )
        [char] "ю" = @( "yu" )
        [char] "Ю" = @( "yu" )
        [char] "я" = @( "ya" )
        [char] "Я" = @( "ya" )
        [char] " " = @( "_" )
        # multi-character substitutions
        [string] "ъи" = @( "yi" )
        [string] "ьи" = @( "yi" )
        [string] "ье" = @( "ye" )
        [string] "ъе" = @( "ye" )
        [string] "ий" = @( "ii" )
        [string] "кс" = @( "x" )
    }
    # if the input is empty then there's no work to do,
    # so just return an empty string
    if( [string]::IsNullOrEmpty($InputString) )
    {
        return [string]::Empty;
    }
    # find all the lookups that can be applied at the start of the string
    $keys = @( $lookups.Keys | where-object { $InputString.StartsWith($_) } );
    # if there are no lookups found at the start of the string we'll keep
    # the first character as-is and prefix it to all the transliterations
    # for the remainder of the string
    if( $keys.Length -eq 0 )
    {
        $results = @();
        $head = $InputString[0];
        $rest = $InputString.Substring(1);
        $tails = Get-Transliteration -InputString $rest;
        foreach( $tail in $tails )
        {
            $results += $head + $tail;
        }
        return $results;
    }
    # if we found any lookups at the start of the string we need to "multiply"
    # them with all the transliterations for the remainder of the string
    $results = @();
    foreach( $key in $keys )
    {
        if( $InputString.StartsWith($key) )
        {
            $heads = $lookups[$key];
            $rest = $InputString.Substring(([string] $key).Length);
            $tails = Get-Transliteration -InputString $rest;
            foreach( $head in $heads )
            {
                foreach( $tail in $tails )
                {
                    $results += $head + $tail;
                }
            }
        }
    }
    return $results;
}
Here are some examples:
# no substitutions to apply
PS> Get-Transliteration "abc"
abc
# single substitution with multiple characters in the replacement text
PS> Get-Transliteration "[ж]"
[zh]
# multiple replacement options for a single match
PS> Get-Transliteration "[ц]"
[c]
[ts]
# replace multiple source characters for a single match
PS> Get-Transliteration "[ъи]"
[i]
[yi]
# replace multiple possible options
PS> Get-Transliteration "[кс]-[ий]"
[ks]-[iy]
[ks]-[ii]
[x]-[iy]
[x]-[ii]
# original sample - "Алексий"
PS> Get-Transliteration "Алексий"
aleksiy
aleksii
alexiy
alexii
# original sample - "Зелекский"
PS> Get-Transliteration "Зелекский"
zelekskiy
zelekskii
zelexkiy
zelexkii
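Looking at the Wikipedia page here - https://en.wikipedia.org/wiki/BGN/PCGN_romanization_of_Russian - I think the transliteration rules are a bit more complicated than this function can handle, but hopefully it's a start...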
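As one illustration of why a fixed lookup table falls short: as far as I understand the BGN/PCGN table, е is written "ye" at the start of a word and after vowels, й, ъ and ь, and "e" elsewhere. Below is a rough sketch of applying that one rule before a plain per-character lookup. The function name ConvertTo-ContextAwareLatin and the tiny $map are made up for this example, it handles lowercase input only, and the table covers just the letters in the two sample words.

```powershell
function ConvertTo-ContextAwareLatin {
    param([string] $InputString)

    # Context rule first (assumption, lowercase only): "е" becomes "ye" at the
    # start of the string, after whitespace, or after a vowel, й, ъ or ь.
    $text = $InputString -creplace '(?<=^|\s|[аеёиоуыэюяйъь])е', 'ye'

    # Plain per-character lookup for whatever is left (deliberately tiny table).
    $map = @{
        [char] 'в' = 'v'; [char] 'д' = 'd'; [char] 'е' = 'e'; [char] 'и' = 'i'
        [char] 'й' = 'y'; [char] 'к' = 'k'; [char] 'л' = 'l'; [char] 'н' = 'n'
        [char] 'о' = 'o'; [char] 'с' = 's'; [char] 'т' = 't'; [char] 'ц' = 'ts'
        [char] 'ь' = ''
    }

    # Characters without a table entry (including the Latin "ye") pass through.
    $pieces = foreach ($ch in $text.ToCharArray()) {
        if ($map.ContainsKey($ch)) { $map[$ch] } else { [string] $ch }
    }
    -join $pieces
}

ConvertTo-ContextAwareLatin 'ельцин'        # yeltsin
ConvertTo-ContextAwareLatin 'достоевский'   # dostoyevskiy
```

A rule like this depends on the neighbouring characters, so it cannot be expressed as a key in $lookups; it would have to run as a separate pass or be built into the matching step.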