PowerShell issues with JSON format (log4jscanner & UTF-16)
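I am trying to retrieve some JSON data from log4jscanner.exe (Qualys software that detects whether vulnerable files or components are present on your PC/server), but after spending many hours on it, I think I have hit a problem with PowerShell.
If I store the result of the following command in PowerShell 5.1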
$a = .\Log4jScanner.exe /scan /report_pretty
the result is "displayed" as:
PS C:\temp> $a |where {$_ -ne ""}
{
"scanSummary": {
"scanEngine": "2.0.2.7",
"scanHostname": "XXXXXXXXX",
"scanDate": "2022-01-20T18:02:26+0100",
"scanDurationSeconds": 28,
"scanErrorCount": 54,
"scanStatus": "Partially Successful",
"scannedFiles": 649020,
"scannedDirectories": 209514,
"scannedJARs": 31,
"scannedWARs": 0,
"scannedEARs": 0,
"scannedPARs": 0,
"scannedTARs": 5,
"scannedCompressed": 43,
"vulnerabilitiesFound": 1
},
"scanDetails": [
{
"file": "XXXXXX.jar",
"manifestVendor": "Unknown",
"manifestVersion": "Unknown",
"detectedLog4j": true,
"detectedLog4j1x": true,
"detectedLog4j2x": false,
"detectedJNDILookupClass": false,
"detectedLog4jManifest": false,
"log4jVendor": "log4j",
"log4jVersion": "1.2.17",
"cve20214104Mitigated": false,
"cve202144228Mitigated": true,
"cve202144832Mitigated": true,
"cve202145046Mitigated": true,
"cve202145105Mitigated": true,
"cveStatus": "Potentially Vulnerable ( CVE-2021-4104: Found )"
}
]
}
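Next, I want to convert that data so I can work with specific values. First I tried converting it from JSON; here the text turns red and the following error occurs: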
PS C:\temp> $a | convertfrom-json
convertfrom-json : Objet non valide passé, ':' ou '}' attendu. (2): {
"scanSummary": {
"scanEngine": "2.0.2.7",
"scanHostname": "FRBOURWXT013379.vcn.ds.volvo.net",
"scanDate": "2022-01-20T18:02:26+0100",
.... .... ....
Finally, if I copy/paste the content of $a into another variable, like
$b = '
{
"scanSummary": {
"scanEngine": "2.0.2.7",
"scanHostname": "XXXXXXXXX",
"scanDate": "2022-01-20T18:02:26+0100",
"scanDurationSeconds": 28,
... ... ...
'
then I can access the converted data:
PS C:\temp> $b | convertfrom-json
scanSummary
-----------
@{scanEngine=2.0.2.7; scanHostname=XXXXXXXXX; scanDate=2022-01-20T18:02:26+0100; scanDurationSeconds=28; scanErrorCount=54; scanStatus=Partially Successful; scann...
At this point, $a is of type Object[] and $b is of type String.
So I tried converting $a to a string
PS C:\temp> $a = [string] $a
PS C:\temp> $a
{ "scanSummary": { "scanEngine": "2.0.2.7", "scanHostname": "XXXXXXXXX", "scanDate": "2022-01-20T18:02:26+0100", "scanDurationSeconds": 28, "scanErrorCount": 54, "scanStatus": "Partially Successful", "scannedFiles": 649020, "scannedDirectories": 209514, "scannedJARs": 31, "scannedWARs": 0, "scannedEARs": 0, "scannedPARs": 0, "scannedTARs": 5, "scannedCompressed": 43, "vulnerabilitiesFound": 1 }, "scanDetails": [ { "file": "XXXXX.jar", "manifestVendor": "Unknown", "manifestVersion": "Unknown", "detectedLog4j": true, "detectedLog4j1x": true, "detectedLog4j2x": false, "detectedJNDILookupClass": false, "detectedLog4jManifest": false, "log4jVendor": "log4j", "log4jVersion": "1.2.17", "cve20214104Mitigated": false, "cve202144228Mitigated": true, "cve202144832Mitigated": true, "cve202145046Mitigated": true, "cve202145105Mitigated": true, "cveStatus": "Potentially Vulnerable ( CVE-2021-4104: Found )" } ] }
and then converting it from JSON, but the result is a mess
PS C:\temp> $a | convertfrom-json
convertfrom-json : Objet non valide passé, ':' ou '}' attendu. (2): { "scanSummary": { "scanEngine": "2.0.2.7",
"scanHostname": "XXXXXX", "scanDate":
"2022-01-20T18:02:26+0100", "scanDurationSeconds": 28, "scanErrorCount":
54, "scanStatus": "Partially Successful", "scannedFiles": 649020,
"scannedDirectories": 209514, "scannedJARs": 31, "scannedWARs": 0,
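Finally, if I export any of the data to a .json file, I cannot open it with Notepad or Codium (every character = nul nul nul nul), whereas I can read it with Get-Content in PowerShell.
There seem to be some hidden characters, or something else I don't know about, and I can't figure out how to easily convert and access the JSON data.
Am I missing something?
Many thanks for your support!
Edit 1 - If I save the output, I cannot open the .json file properly, but PowerShell seems to understand it fine.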
Log4jScanner.exe outputs Unicode (UTF-16).
A bug in PowerShell mangles the output of programs that send Unicode bytes to their STDOUT/STDERR streams.
This is easy to confirm. When you run the command
Log4jScanner.exe /scan_directory C:\something /report_pretty > output.json
in cmd.exe, output.json will be clean UTF-16:
0d 00 0a 00 7b 00 0d 00 0a 00 20 00 20 00 20 00 .␀.␀{␀.␀.␀ ␀ ␀ ␀
20 00 22 00 73 00 63 00 61 00 6e 00 53 00 75 00 ␀"␀s␀c␀a␀n␀S␀u␀
6d 00 6d 00 61 00 72 00 79 00 22 00 3a 00 20 00 m␀m␀a␀r␀y␀"␀:␀ ␀
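But PowerShell blindly assumes a single-byte encoding for the program's output stream and encodes it as UTF-16 again, including the NUL bytes that were actually part of the UTF-16 characters: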
ff fe 0d 00 0a 00 00 00 0d 00 0a 00 00 00 7b 00 ÿþ.␀.␀␀␀.␀.␀␀␀{␀
00 00 0d 00 0a 00 00 00 0d 00 0a 00 00 00 20 00 ␀␀.␀.␀␀␀.␀.␀␀␀ ␀
00 00 20 00 00 00 20 00 00 00 20 00 00 00 22 00 ␀␀ ␀␀␀ ␀␀␀ ␀␀␀"␀
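Here we see the UTF-16 BOM (ff fe), followed by a real NUL character (00 00) for every NUL byte in the original output, except for line breaks, which is why we still see regular \r\n (0d 00 0a 00). For example, a space (20 00 in UTF-16) becomes 20 00 00 00 and shows up in a text editor as a space plus a NUL, as you saw in Notepad++.
This is, of course, terrible.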
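The double encoding can be reproduced without the scanner. The following is a minimal sketch, not what PowerShell does internally: it assumes ISO-8859-1 as a stand-in for whatever single-byte code page the console really uses, and the sample JSON string is made up.

```powershell
# Simulate what the scanner writes: text encoded as UTF-16LE bytes.
$original   = '{ "scannedJARs": 31 }'
$utf16Bytes = [System.Text.Encoding]::Unicode.GetBytes($original)

# Simulate the faulty decoding: treat every byte as one character.
# ISO-8859-1 is an assumption standing in for the console code page.
$singleByte = [System.Text.Encoding]::GetEncoding('iso-8859-1')
$mangled    = $singleByte.GetString($utf16Bytes)

# Each original character is now followed by a NUL character.
$mangled.Length -eq (2 * $original.Length)
$mangled.Contains([string][char]0)

# Encoding the mangled string as UTF-16 again doubles the bytes.
$reencoded  = [System.Text.Encoding]::Unicode.GetBytes($mangled)
$firstBytes = ($reencoded | Select-Object -First 8 |
    ForEach-Object { $_.ToString('x2') }) -join ' '
$firstBytes   # 7b 00 00 00 20 00 00 00
```

The `7b 00 00 00 20 00 00 00` pattern is the same shape as in the second hex dump: each character is followed by an extra `00 00`.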
Your options are:
- run Log4jScanner.exe from cmd.exe
- remove the superfluous NUL characters from the output before parsing
The latter would look like this:
$json = Log4jScanner.exe /scan_directory C:\something /report_pretty
$data = $json.Replace(([char]0).ToString(), "") | ConvertFrom-Json
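.NET strings can legally contain NUL characters (unlike, for example, C strings), but the JSON we expect from this program contains no legitimate NUL characters, which is why throwing them all away works. It is certainly not pretty, though: it only works for program output that does not actually contain non-ASCII characters (which happens to be the case here, since all characters in the JSON are in the ASCII range).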
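To try the NUL-stripping approach without running a scan, the mangled lines can be simulated. This is only a sketch under assumptions: `$cleanLines` is a made-up, shortened stand-in for the scanner output, and ISO-8859-1 again stands in for the real console code page.

```powershell
# Hypothetical, shortened stand-in for the scanner's JSON output.
$cleanLines = @(
    '{'
    '  "scanSummary": { "vulnerabilitiesFound": 1 },'
    '  "scanDetails": [ { "log4jVersion": "1.2.17" } ]'
    '}'
)

# Mangle each line the way the captured output is mangled:
# UTF-16LE bytes decoded one byte per character (assumed code page).
$singleByte   = [System.Text.Encoding]::GetEncoding('iso-8859-1')
$mangledLines = $cleanLines | ForEach-Object {
    $singleByte.GetString([System.Text.Encoding]::Unicode.GetBytes($_))
}

# Join the lines into one string, drop every NUL, then parse.
$nul  = [string][char]0
$json = ($mangledLines -join "`n").Replace($nul, '')
$data = $json | ConvertFrom-Json

# Individual values are now reachable as properties.
$data.scanSummary.vulnerabilitiesFound
$data.scanDetails[0].log4jVersion
```

If stripping NULs feels too blunt, another route that may work is to set `[Console]::OutputEncoding` to `[System.Text.Encoding]::Unicode` before calling the program, so that PowerShell decodes its output as UTF-16LE. That assumes the scanner's stdout really is UTF-16LE, as the hex dump suggests.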
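Here is some odd code I found that can identify Unicode (UTF-16LE) files that have no BOM. These turn up occasionally on Windows. Notepad can recognize them as well (and UTF-8 without a BOM).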
# istextunicode.ps1
param([Parameter(ValueFromPipeline=$True)] $filename)
# https://devblogs.microsoft.com/scripting/use-powershell-to-interact-with-the-windows-api-part-1/
begin {
  $MethodDefinition = @'
[DllImport("Advapi32",SetLastError=false)]
public static extern bool IsTextUnicode(byte[] buf, int len,
    ref IsTextUnicodeFlags opt);
[Flags]
public enum IsTextUnicodeFlags:int
{
    IS_TEXT_UNICODE_ASCII16 = 0x0001,
    IS_TEXT_UNICODE_REVERSE_ASCII16 = 0x0010,
    IS_TEXT_UNICODE_STATISTICS = 0x0002,
    IS_TEXT_UNICODE_REVERSE_STATISTICS = 0x0020,
    IS_TEXT_UNICODE_CONTROLS = 0x0004,
    IS_TEXT_UNICODE_REVERSE_CONTROLS = 0x0040,
    IS_TEXT_UNICODE_SIGNATURE = 0x0008,
    IS_TEXT_UNICODE_REVERSE_SIGNATURE = 0x0080,
    IS_TEXT_UNICODE_ILLEGAL_CHARS = 0x0100,
    IS_TEXT_UNICODE_ODD_LENGTH = 0x0200,
    IS_TEXT_UNICODE_DBCS_LEADBYTE = 0x0400,
    IS_TEXT_UNICODE_NULL_BYTES = 0x1000,
    IS_TEXT_UNICODE_UNICODE_MASK = 0x000F,
    IS_TEXT_UNICODE_REVERSE_MASK = 0x00F0,
    IS_TEXT_UNICODE_NOT_UNICODE_MASK = 0x0F00,
    IS_TEXT_UNICODE_NOT_ASCII_MASK = 0xF000
}
'@
  Add-Type Advapi32 $MethodDefinition -Namespace Win32
  $totalcount = 8
}
process {
  if ( (get-item $filename).length -lt 1mb ) {
    #$bytes = [io.file]::ReadAllBytes($filename)
    $bytes = get-content $filename -encoding byte -totalcount $totalcount
    # reset every time
    [Win32.Advapi32+IsTextUnicodeFlags]$opt = 0xffff
    $result = [win32.advapi32]::IsTextUnicode($bytes, $bytes.length, [ref]$opt)
    #$result = [win32.advapi32]::IsTextUnicode($bytes, $totalcount, [ref]$opt)
    [pscustomobject]@{
      Filename = $filename
      Result = $result
      Flags = $opt
    }
    #if($result) { write-host $filename }
  }
}
# error.log
# icacls (no bom), task scheduler, regedit
# gpreport.html
# $a = ls -force -r -file -exclude *.dll,*.exe,*.mui,*.jpg,*.jar,*.zip,*.msb,*.dat | get-item | istextunicode | where result
# -filter *.ini *.txt *.log
# $a = ls -force -Recurse -Filter *.ini | get-item | istextunicode | where {$_.result -and $_.flags -notmatch 'signature' }
Run this in cmd as administrator (in PowerShell it adds a BOM signature with the wrong encoding):
sfc > file
Then in PowerShell:
.\istextunicode file
Filename Result Flags
-------- ------ -----
file True IS_TEXT_UNICODE_ASCII16, IS_TEXT_UNICODE_STATISTICS, IS_TEXT_UNICODE_CONTROLS, IS_TEXT_UNICODE_NULL_BYTES
format-hex file | select -first 1
Path: C:\users\admin\foo\file
00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
00000000 0D 00 0D 00 0A 00 4D 00 69 00 63 00 72 00 6F 00 ......M.i.c.r.o.
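The pattern visible in the hex dump (every second byte is 00) can also be checked directly, without P/Invoke. The following is a rough sketch and only a heuristic for ASCII-range text, not a replacement for IsTextUnicode; the function name `Test-LooksLikeUtf16LE` is made up for this illustration.

```powershell
function Test-LooksLikeUtf16LE {
    param([byte[]] $Bytes)

    # Need at least one full 2-byte code unit and an even length.
    if ($Bytes.Length -lt 2 -or $Bytes.Length % 2 -ne 0) { return $false }

    # A buffer starting with the UTF-16LE BOM (FF FE) is not BOM-less.
    if ($Bytes[0] -eq 0xFF -and $Bytes[1] -eq 0xFE) { return $false }

    # ASCII-range text in UTF-16LE: low byte non-zero, high byte zero.
    for ($i = 0; $i -lt $Bytes.Length; $i += 2) {
        if ($Bytes[$i] -eq 0 -or $Bytes[$i + 1] -ne 0) { return $false }
    }
    return $true
}

# The first 16 bytes from the Format-Hex output above.
$sample = [byte[]](0x0D,0x00,0x0D,0x00,0x0A,0x00,0x4D,0x00,
                   0x69,0x00,0x63,0x00,0x72,0x00,0x6F,0x00)
$plain  = [System.Text.Encoding]::ASCII.GetBytes('plain ASCII text')

Test-LooksLikeUtf16LE $sample   # True
Test-LooksLikeUtf16LE $plain    # False
```

Any text containing characters outside the ASCII range would be rejected by this check, which is why the Windows API with its statistical flags is the more robust choice.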