PowerShell script optimization

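I need some help here. To put things in context: I'm new to PowerShell, and I have a task that, in simple terms, takes one CSV with more than 2 million records (from BigFix) and a lot of columns, and splits it into multiple CSVs by selecting specific columns. The code below is my attempt at this; the CSVs it creates then get zipped.

The issue: with only 200k records this took around 4 hours. So first, is there a way to import the CSV just once instead of importing it every time I have to select different columns for output? Apart from the copy task at the very beginning (which has to come first) and the zipping (which has to happen after all the CSVs are created), the rest could run simultaneously, but I don't know how to do that. Thanks for your help.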

$filePath = "C:\location2\powerShellTesting\Input\bigFixDataNew.csv"

Copy-Item "\location1191213_BFI_SAMPLE_DATA_csv.csv" -Destination $filePath




$System = "..\Output\System.csv"
$AddRemove = "..\Output\AddRemove.csv"
$GS_PC_BIOS = "..\Output\GS_PC_BIOS.csv"
$GS_PROCESSOR = "..\Output\GS_PROCESSOR.csv"
$GS_LOGICAL_DISK = "..\Output\GS_LOGICAL_DISK.csv"
$GS_X86_PC_MEMORY = "..\Output\GS_X86_PC_MEMORY.csv"
$GS_COMPUTER_SYSTEM = "..\Output\GS_COMPUTER_SYSTEM.csv"
$GS_OPERATING_SYSTEM = "..\Output\GS_OPERATING_SYSTEM.csv"
$GS_WORKSTATION_STATUS = "..\Output\GS_WORKSTATION_STATUS.csv"



$desiredColumnsAddRemove = @{ expression = {$_.'Internal Computer ID'}; label = 'RESOURCEID' },
@{ expression = {$_.'Component Name'}; label ='DISPLAYNAME'},
@{ expression = {$_.'Product Version'}; label = 'VERSION'},
@{ expression = {$_.'Publisher Name'}; label = 'PUBLISHER'},
@{ expression = {$_.'Creation'}; label = 'INSTALLDATE'}

$desiredColumnsGS_COMPUTER_SYSTEM = @{ expression = {$_.'Internal Computer ID'}; label = 'RESOURCEID' },
@{ expression = {$_.'Server Vendor'}; label = 'MANUFACTURER0'},
@{ expression = {$_.'Server Model'}; label = 'MODEL0'},
@{ expression = {$_.'Partition Virtual Processors'}; label = 'NUMBEROFPROCESSORS0'}

$desiredColumnsGS_OPERATING_SYSTEM = @{ expression = {$_.'Internal Computer ID'}; label = 'RESOURCEID' },
@{ expression = {$_.'Operating System'}; label = 'NAME0'},
@{ expression = {$_.'Operating System'}; label = 'CAPTION0'}

$desiredColumnsGS_WORKSTATION_STATUS = @{ expression = {$_.'Internal Computer ID'}; label = 'RESOURCEID'},
@{ expression = {$_.'Last Scan Attempt'}; label = 'LASTHWSCAN'}

$desiredColumnsSystem = @{ expression = {$_.'Internal Computer ID'}; label = 'RESOURCEID' },
@{ expression = {$_.'DNS Name'}; label = 'NAME'},
@{ expression = {$_.'User Name'}; label = 'USER_NAME'}

$desiredColumnsGS_X86_PC_MEMORY = @{ expression = {$_.'Internal Computer ID'}; label = 'RESOURCEID' }

$desiredColumnsGS_PROCESSOR = @{ expression = {$_.'Internal Computer ID'}; label = 'RESOURCEID' },
@{ expression = {$_.'Vendor'}; label = 'MANUFACTURER0'},
@{ expression = {$_.'Processor Brand String'}; label = 'NAME0'}

$desiredColumnsGS_PC_BIOS = @{ expression = {$_.'Internal Computer ID'}; label = 'RESOURCEID' },
@{ expression = {$_.'Server Vendor'}; label = 'MANUFACTURER0'},
@{ expression = {$_.'Server Serial Number'}; label = 'SERIALNUMBER0'}

$desiredColumnsGS_LOGICAL_DISK = @{ expression = {$_.'Internal Computer ID'}; label = 'RESOURCEID' }




Import-Csv $filePath | Select $desiredColumnsGS_X86_PC_MEMORY -Unique |
Export-Csv -Path $GS_X86_PC_MEMORY -NoTypeInformation

Import-Csv $filePath | Select $desiredColumnsGS_PROCESSOR -Unique |
Export-Csv -Path $GS_PROCESSOR -NoTypeInformation

Import-Csv $filePath | Select $desiredColumnsGS_PC_BIOS -Unique |
Export-Csv -Path $GS_PC_BIOS -NoTypeInformation

Import-Csv $filePath | Select $desiredColumnsGS_LOGICAL_DISK -Unique |
Export-Csv -Path $GS_LOGICAL_DISK -NoTypeInformation

Import-Csv $filePath | Select $desiredColumnsGS_OPERATING_SYSTEM -Unique |
Export-Csv -Path $GS_OPERATING_SYSTEM -NoTypeInformation

Import-Csv $filePath | Select $desiredColumnsGS_WORKSTATION_STATUS -Unique |
Export-Csv -Path $GS_WORKSTATION_STATUS -NoTypeInformation

Import-Csv $filePath | Select $desiredColumnsSystem -Unique |
Export-Csv -Path $System -NoTypeInformation

Import-Csv $filePath | Select $desiredColumnsGS_COMPUTER_SYSTEM -Unique |
Export-Csv -Path $GS_COMPUTER_SYSTEM -NoTypeInformation

Import-Csv $filePath | Select $desiredColumnsAddRemove |
Export-Csv -Path $AddRemove -NoTypeInformation



# Creating the Zip File
$compress = @{
    Path = "..\Output\AddRemove.csv",
    "..\Output\GS_COMPUTER_SYSTEM.csv" ,
    "..\Output\GS_OPERATING_SYSTEM.csv",
    "..\Output\GS_WORKSTATION_STATUS.csv",
    "..\Output\System.csv",
    "..\Output\GS_X86_PC_MEMORY.csv",
    "..\Output\GS_PROCESSOR.csv",
    "..\Output\GS_PC_BIOS.csv",
    "..\Output\GS_LOGICAL_DISK.csv"

    CompressionLevel = "Fastest"
    DestinationPath = "..\Output\BigFix.Zip"
}
Compress-Archive @compress -Force

Second attempt, importing the CSV once into a variable and reusing it:

$filePath = "C:\location2\powerShellTesting\Input\bigFixDataNew.csv"

Copy-Item "\location1191213_BFI_SAMPLE_DATA_csv.csv" -Destination $filePath




$System = "..\Output\System.csv"
$AddRemove = "..\Output\AddRemove.csv"
$GS_PC_BIOS = "..\Output\GS_PC_BIOS.csv"
$GS_PROCESSOR = "..\Output\GS_PROCESSOR.csv"
$GS_LOGICAL_DISK = "..\Output\GS_LOGICAL_DISK.csv"
$GS_X86_PC_MEMORY = "..\Output\GS_X86_PC_MEMORY.csv"
$GS_COMPUTER_SYSTEM = "..\Output\GS_COMPUTER_SYSTEM.csv"
$GS_OPERATING_SYSTEM = "..\Output\GS_OPERATING_SYSTEM.csv"
$GS_WORKSTATION_STATUS = "..\Output\GS_WORKSTATION_STATUS.csv"



$desiredColumnsAddRemove = @{ expression = {$_.'Internal Computer ID'}; label = 'RESOURCEID' },
@{ expression = {$_.'Component Name'}; label ='DISPLAYNAME'},
@{ expression = {$_.'Product Version'}; label = 'VERSION'},
@{ expression = {$_.'Publisher Name'}; label = 'PUBLISHER'},
@{ expression = {$_.'Creation'}; label = 'INSTALLDATE'}

$desiredColumnsGS_COMPUTER_SYSTEM = @{ expression = {$_.'Internal Computer ID'}; label = 'RESOURCEID' },
@{ expression = {$_.'Server Vendor'}; label = 'MANUFACTURER0'},
@{ expression = {$_.'Server Model'}; label = 'MODEL0'},
@{ expression = {$_.'Partition Virtual Processors'}; label = 'NUMBEROFPROCESSORS0'}

$desiredColumnsGS_OPERATING_SYSTEM = @{ expression = {$_.'Internal Computer ID'}; label = 'RESOURCEID' },
@{ expression = {$_.'Operating System'}; label = 'NAME0'},
@{ expression = {$_.'Operating System'}; label = 'CAPTION0'}

$desiredColumnsGS_WORKSTATION_STATUS = @{ expression = {$_.'Internal Computer ID'}; label = 'RESOURCEID'},
@{ expression = {$_.'Last Scan Attempt'}; label = 'LASTHWSCAN'}

$desiredColumnsSystem = @{ expression = {$_.'Internal Computer ID'}; label = 'RESOURCEID' },
@{ expression = {$_.'DNS Name'}; label = 'NAME'},
@{ expression = {$_.'User Name'}; label = 'USER_NAME'}

$desiredColumnsGS_X86_PC_MEMORY = @{ expression = {$_.'Internal Computer ID'}; label = 'RESOURCEID' }

$desiredColumnsGS_PROCESSOR = @{ expression = {$_.'Internal Computer ID'}; label = 'RESOURCEID' },
@{ expression = {$_.'Vendor'}; label = 'MANUFACTURER0'},
@{ expression = {$_.'Processor Brand String'}; label = 'NAME0'}

$desiredColumnsGS_PC_BIOS = @{ expression = {$_.'Internal Computer ID'}; label = 'RESOURCEID' },
@{ expression = {$_.'Server Vendor'}; label = 'MANUFACTURER0'},
@{ expression = {$_.'Server Serial Number'}; label = 'SERIALNUMBER0'}

$desiredColumnsGS_LOGICAL_DISK = @{ expression = {$_.'Internal Computer ID'}; label = 'RESOURCEID' }


$csvfile = Import-Csv $filePath


$csvfile | Select $desiredColumnsGS_X86_PC_MEMORY -Unique |
Export-Csv -Path $GS_X86_PC_MEMORY -NoTypeInformation

$csvfile | Select $desiredColumnsGS_PROCESSOR -Unique |
Export-Csv -Path $GS_PROCESSOR -NoTypeInformation

$csvfile | Select $desiredColumnsGS_PC_BIOS -Unique |
Export-Csv -Path $GS_PC_BIOS -NoTypeInformation

$csvfile | Select $desiredColumnsGS_LOGICAL_DISK -Unique |
Export-Csv -Path $GS_LOGICAL_DISK -NoTypeInformation

$csvfile | Select $desiredColumnsGS_OPERATING_SYSTEM -Unique |
Export-Csv -Path $GS_OPERATING_SYSTEM -NoTypeInformation

$csvfile | Select $desiredColumnsGS_WORKSTATION_STATUS -Unique |
Export-Csv -Path $GS_WORKSTATION_STATUS -NoTypeInformation

$csvfile | Select $desiredColumnsSystem -Unique |
Export-Csv -Path $System -NoTypeInformation

$csvfile | Select $desiredColumnsGS_COMPUTER_SYSTEM -Unique |
Export-Csv -Path $GS_COMPUTER_SYSTEM -NoTypeInformation

$csvfile | Select $desiredColumnsAddRemove |
Export-Csv -Path $AddRemove -NoTypeInformation



# Creating the Zip File
$compress = @{
    Path = "..\Output\AddRemove.csv",
    "..\Output\GS_COMPUTER_SYSTEM.csv" ,
    "..\Output\GS_OPERATING_SYSTEM.csv",
    "..\Output\GS_WORKSTATION_STATUS.csv",
    "..\Output\System.csv",
    "..\Output\GS_X86_PC_MEMORY.csv",
    "..\Output\GS_PROCESSOR.csv",
    "..\Output\GS_PC_BIOS.csv",
    "..\Output\GS_LOGICAL_DISK.csv"

    CompressionLevel = "Fastest"
    DestinationPath = "..\Output\BigFix.Zip"
}
Compress-Archive @compress -Force

Instead of importing it so many times, import it once into a variable and then work on that variable.

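The problem, of course, is that you're reading and parsing the file at $filePath once per output file, whereas ideally it would be read and parsed once. The temptation might be to simply store the result of Import-Csv $filePath in a variable for reuse, but, as you discovered, that brings problems of its own. Even if it didn't, you'd still be tying up a lot of memory while your script runs.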

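Instead of writing one output file at a time, we can read $filePath only once by scattering the writes, record by record, across every output file. First, let's clean up the code that defines which columns go to which files...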

$filePath = "C:\location2\powerShellTesting\Input\bigFixDataNew.csv"

Copy-Item "\location1191213_BFI_SAMPLE_DATA_csv.csv" -Destination $filePath

$outputFileDescriptors = @(
    @{
        Path = "..\Output\System.csv"
        Columns = @(
            @{ expression = {$_.'Internal Computer ID'}; label = 'RESOURCEID' },
            @{ expression = {$_.'DNS Name'}; label = 'NAME'},
            @{ expression = {$_.'User Name'}; label = 'USER_NAME'}
        )
    },
    @{
        Path = "..\Output\AddRemove.csv"
        Columns = @(
            @{ expression = {$_.'Internal Computer ID'}; label = 'RESOURCEID' },
            @{ expression = {$_.'Component Name'}; label ='DISPLAYNAME'},
            @{ expression = {$_.'Product Version'}; label = 'VERSION'},
            @{ expression = {$_.'Publisher Name'}; label = 'PUBLISHER'},
            @{ expression = {$_.'Creation'}; label = 'INSTALLDATE'}
        )
    },
    @{
        Path = "..\Output\GS_PC_BIOS.csv"
        Columns = @(
            @{ expression = {$_.'Internal Computer ID'}; label = 'RESOURCEID' },
            @{ expression = {$_.'Server Vendor'}; label = 'MANUFACTURER0'},
            @{ expression = {$_.'Server Serial Number'}; label = 'SERIALNUMBER0'}
        )
    },
    @{
        Path = "..\Output\GS_PROCESSOR.csv"
        Columns = @(
            @{ expression = {$_.'Internal Computer ID'}; label = 'RESOURCEID' },
            @{ expression = {$_.'Vendor'}; label = 'MANUFACTURER0'},
            @{ expression = {$_.'Processor Brand String'}; label = 'NAME0'}
        )
    },
    @{
        Path = "..\Output\GS_LOGICAL_DISK.csv"
        Columns = @(
            @{ expression = {$_.'Internal Computer ID'}; label = 'RESOURCEID' }
        )
    },
    @{
        Path = "..\Output\GS_X86_PC_MEMORY.csv"
        Columns = @(
            @{ expression = {$_.'Internal Computer ID'}; label = 'RESOURCEID' }
        )
    },
    @{
        Path = "..\Output\GS_COMPUTER_SYSTEM.csv"
        Columns = @(
            @{ expression = {$_.'Internal Computer ID'}; label = 'RESOURCEID' },
            @{ expression = {$_.'Server Vendor'}; label = 'MANUFACTURER0'},
            @{ expression = {$_.'Server Model'}; label = 'MODEL0'},
            @{ expression = {$_.'Partition Virtual Processors'}; label = 'NUMBEROFPROCESSORS0'}
        )
    },
    @{
        Path = "..\Output\GS_OPERATING_SYSTEM.csv"
        Columns = @(
            @{ expression = {$_.'Internal Computer ID'}; label = 'RESOURCEID' },
            @{ expression = {$_.'Operating System'}; label = 'NAME0'},
            @{ expression = {$_.'Operating System'}; label = 'CAPTION0'}
        )
    },
    @{
        Path = "..\Output\GS_WORKSTATION_STATUS.csv"
        Columns = @(
            @{ expression = {$_.'Internal Computer ID'}; label = 'RESOURCEID'},
            @{ expression = {$_.'Last Scan Attempt'}; label = 'LASTHWSCAN'}         
        )
    } `
        | ForEach-Object -Process { [PSCustomObject] $_ }
)

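$outputFileDescriptors will contain an array of [PSCustomObject] instances, each with the Path and Columns properties that define one output file. At this point you could rewrite the end of your script as...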
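
To make the shape of that array concrete, here is a minimal sketch that builds two shortened descriptors the same way and inspects them. The variable name $demoDescriptors and the bare file names are placeholders for illustration, not part of the script above.

```powershell
# Two shortened, hypothetical descriptors built with the same pattern as above.
$demoDescriptors = @(
    @{
        Path    = 'System.csv'
        Columns = @(
            @{ expression = { $_.'Internal Computer ID' }; label = 'RESOURCEID' },
            @{ expression = { $_.'DNS Name' }; label = 'NAME' }
        )
    },
    @{
        Path    = 'GS_LOGICAL_DISK.csv'
        Columns = @(
            @{ expression = { $_.'Internal Computer ID' }; label = 'RESOURCEID' }
        )
    } | ForEach-Object -Process { [PSCustomObject] $_ }
)

# Member enumeration returns the Path property of every element in the array.
$demoDescriptors.Path

# Each descriptor carries its own list of calculated properties.
$demoDescriptors | ForEach-Object { '{0}: {1} column(s)' -f $_.Path, @($_.Columns).Count }
```

Because every descriptor exposes the same two properties, the rest of the script can loop over the array instead of repeating one variable pair per output file.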

foreach ($outputFileDescriptor in $outputFileDescriptors)
{
    Import-Csv $filePath | Select $outputFileDescriptor.Columns -Unique |
        Export-Csv -Path $outputFileDescriptor.Path -NoTypeInformation
}

# Creating the Zip File
Compress-Archive -Path ($outputFileDescriptors).Path -DestinationPath "..\Output\BigFix.Zip" `
    -CompressionLevel "Fastest" -Force

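...but that offers no performance improvement over your original script; we're still calling Import-Csv once per output file.

Instead, let's modify that loop like this...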

foreach ($record in Import-Csv $filePath)
{
    foreach ($outputFileDescriptor in $outputFileDescriptors)
    {
        $record | Select $outputFileDescriptor.Columns |
            Export-Csv -Path $outputFileDescriptor.Path -NoTypeInformation -Append
    }
}

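Now we're calling Import-Csv only once and, for each input record, writing the appropriate columns to each file. Best of all, we only hold a variable reference to one record at a time, which reduces memory usage.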
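
One caveat on the memory point: my understanding is that the foreach statement runs the command in its parentheses to completion and collects the results before the first iteration, so the whole input may still end up in memory. A pipeline into ForEach-Object hands records over as they are produced. The sketch below shows that variant on a small throwaway file; the sample data, $workDir, $inputPath and $streamDescriptors are all hypothetical names made up for the demonstration.

```powershell
# Throwaway working folder and sample input (hypothetical data, for illustration only).
$workDir = Join-Path ([System.IO.Path]::GetTempPath()) ([System.Guid]::NewGuid().ToString())
New-Item -ItemType Directory -Path $workDir | Out-Null
$inputPath = Join-Path $workDir 'input.csv'
@(
    [PSCustomObject]@{ 'Internal Computer ID' = '1'; 'DNS Name' = 'alpha'; 'Vendor' = 'Intel' }
    [PSCustomObject]@{ 'Internal Computer ID' = '2'; 'DNS Name' = 'beta'; 'Vendor' = 'AMD' }
) | Export-Csv -Path $inputPath -NoTypeInformation

# Shortened descriptors in the same Path/Columns shape as $outputFileDescriptors.
$streamDescriptors = @(
    [PSCustomObject]@{
        Path    = (Join-Path $workDir 'System.csv')
        Columns = @(
            @{ expression = { $_.'Internal Computer ID' }; label = 'RESOURCEID' },
            @{ expression = { $_.'DNS Name' }; label = 'NAME' }
        )
    }
    [PSCustomObject]@{
        Path    = (Join-Path $workDir 'GS_PROCESSOR.csv')
        Columns = @(
            @{ expression = { $_.'Internal Computer ID' }; label = 'RESOURCEID' },
            @{ expression = { $_.'Vendor' }; label = 'MANUFACTURER0' }
        )
    }
)

# Import-Csv emits one record at a time; ForEach-Object handles each as it arrives.
Import-Csv -Path $inputPath | ForEach-Object -Process {
    $record = $_
    foreach ($descriptor in $streamDescriptors) {
        $record | Select-Object -Property $descriptor.Columns |
            Export-Csv -Path $descriptor.Path -NoTypeInformation -Append
    }
}

# Show what ended up in one of the output files.
Import-Csv -Path (Join-Path $workDir 'System.csv')
```

Since -Append adds to whatever is already in the target file, output files left over from an earlier run would need to be deleted before running a loop like this again.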

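There are two other changes worth noting here. First, we're passing -Append to Export-Csv so that the whole file isn't overwritten by each record. Second, we're not passing -Unique to Select-Object. We could, but it wouldn't do anything, because here Select only ever sees a single record, not the whole input data set, when it evaluates uniqueness.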
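
A small sketch of why -Unique has no effect per record, using three made-up rows (two of them identical) and a single calculated column. $rows, $columns, $wholeSet and $perRecord are illustrative names only.

```powershell
# Hypothetical sample rows; the first two are duplicates of each other.
$rows = @(
    [PSCustomObject]@{ 'Internal Computer ID' = '1'; 'Vendor' = 'Intel' }
    [PSCustomObject]@{ 'Internal Computer ID' = '1'; 'Vendor' = 'Intel' }
    [PSCustomObject]@{ 'Internal Computer ID' = '2'; 'Vendor' = 'AMD' }
)
$columns = @{ expression = { $_.'Internal Computer ID' }; label = 'RESOURCEID' }

# Whole data set in one pipeline: -Unique can compare the records with each other.
$wholeSet = @($rows | Select-Object -Property $columns -Unique)

# One record per Select-Object call: there is nothing to compare against,
# so every record passes through, duplicates included.
$perRecord = @(foreach ($row in $rows) { $row | Select-Object -Property $columns -Unique })

'{0} row(s) when piped together, {1} when piped one at a time' -f $wholeSet.Count, $perRecord.Count
```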

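Unfortunately, Select ... -Unique can't be used in a streaming-output scenario like this, because it waits until it has evaluated all of the input data before passing anything down the pipeline (it seems like it could output a value the first time it encounters it, but evidently it doesn't). If you do have redundant output data that needs to be filtered out, you could keep track of the data you've already seen yourself... but collecting data in memory pretty much puts us back where we started, unless the amount of unique data is a small fraction of the whole data set and/or removing redundant data is really only a concern for particular output files.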
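
One possible way to track seen data yourself is a HashSet of string keys, sketched below under some assumptions: the key is built by joining the projected values with a separator character assumed never to occur in the data, and comparison is case-sensitive. The names $seen, $separator and $uniqueRows are hypothetical. A full script would keep one set per output file that needs de-duplication.

```powershell
# Hypothetical sample rows; the first two are duplicates of each other.
$rows = @(
    [PSCustomObject]@{ 'Internal Computer ID' = '1'; 'Vendor' = 'Intel' }
    [PSCustomObject]@{ 'Internal Computer ID' = '1'; 'Vendor' = 'Intel' }
    [PSCustomObject]@{ 'Internal Computer ID' = '2'; 'Vendor' = 'AMD' }
)
$columns = @(
    @{ expression = { $_.'Internal Computer ID' }; label = 'RESOURCEID' },
    @{ expression = { $_.'Vendor' }; label = 'MANUFACTURER0' }
)

# Keys of the rows already emitted; memory grows with the number of distinct rows only.
$seen = New-Object 'System.Collections.Generic.HashSet[string]'

# Assumption: the ASCII unit separator never appears in the data itself.
$separator = [string][char]31

$uniqueRows = @(foreach ($row in $rows) {
    $projected = $row | Select-Object -Property $columns
    $key = ($projected.PSObject.Properties | ForEach-Object { [string] $_.Value }) -join $separator

    # HashSet.Add returns $true only the first time a given key is added.
    if ($seen.Add($key)) { $projected }
})

$uniqueRows
```

In the streaming loop, the `if ($seen.Add($key))` check would guard the Export-Csv -Append call, so each distinct row is written as soon as it is first encountered.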