PowerShell CSV row/column transpose and manipulation

I'm new to PowerShell. I'm trying to transpose rows and columns of a medium-sized CSV-based record set (around 10,000 rows). The original CSV contains roughly 10,000 rows and 3 columns ("Time","Id","IOT"), like this:

"Time","Id","IOT" 
"00:03:56","23","26" 
"00:03:56","24","0" 
"00:03:56","25","0" 
"00:03:56","26","1" 
"00:03:56","27","0" 
"00:03:56","28","0" 
"00:03:56","29","0" 
"00:03:56","30","1953" 
"00:03:56","31","22" 
"00:03:56","32","39" 
"00:03:56","33","8" 
"00:03:56","34","5" 
"00:03:56","35","269" 
"00:03:56","36","5" 
"00:03:56","37","0" 
"00:03:56","38","0" 
"00:03:56","39","0" 
"00:03:56","40","1251" 
"00:03:56","41","103" 
"00:03:56","42","0" 
"00:03:56","43","0" 
"00:03:56","44","0" 
"00:03:56","45","0" 
"00:03:56","46","38" 
"00:03:56","47","14" 
"00:03:56","48","0" 
"00:03:56","49","0" 
"00:03:56","2013","0" 
"00:03:56","2378","0" 
"00:03:56","2380","32" 
"00:03:56","2758","0" 
"00:03:56","3127","0" 
"00:03:56","3128","0" 
"00:09:16","23","22" 
"00:09:16","24","0" 
"00:09:16","25","0" 
"00:09:16","26","2" 
"00:09:16","27","0" 
"00:09:16","28","0" 
"00:09:16","29","21" 
"00:09:16","30","48" 
"00:09:16","31","0" 
"00:09:16","32","4" 
"00:09:16","33","4" 
"00:09:16","34","7" 
"00:09:16","35","382" 
"00:09:16","36","12" 
"00:09:16","37","0" 
"00:09:16","38","0" 
"00:09:16","39","0" 
"00:09:16","40","1882" 
"00:09:16","41","42" 
"00:09:16","42","0" 
"00:09:16","43","3" 
"00:09:16","44","0" 
"00:09:16","45","0" 
"00:09:16","46","24" 
"00:09:16","47","22" 
"00:09:16","48","0" 
"00:09:16","49","0" 
"00:09:16","2013","0" 
"00:09:16","2378","0" 
"00:09:16","2380","19" 
"00:09:16","2758","0" 
"00:09:16","3127","0" 
"00:09:16","3128","0" 
... 
... 
... 

I tried to transpose it using code based on a PowerShell script downloaded from https://gallery.technet.microsoft.com/scriptcenter/Powershell-Script-to-7c8368be. Basically my PowerShell code is as follows:

# $a holds the imported CSV, e.g. $a = Import-Csv -Path "D:\temp\data.csv"
$b = @()
foreach ($Time in $a.Time | Select-Object -Unique) {
    $Props = [ordered]@{ Time = $Time }
    foreach ($Id in $a.Id | Select-Object -Unique) {
        $IOT = ($a.Where({ $_.Id -eq $Id -and $_.Time -eq $Time })).IOT
        $Props += @{ $Id = $IOT }
    }
    $b += New-Object -TypeName PSObject -Property $Props
}
$b | Format-Table -AutoSize
$b | Out-GridView

The code above gives me the expected result: all "Id" values become column headers, all "Time" values become unique rows, and each "IOT" value sits at the intersection of "Id" x "Time", like this:

"Time","23","24","25","26","27","28","29","30","31","32","33","34","35","36","37","38","39","40","41","42","43","44","45","46","47","48","49","2013","2378","2380","2758","3127","3128" 
"00:03:56","26","0","0","1","0","0","0","1953","22","39","8","5","269","5","0","0","0","1251","103","0","0","0","0","38","14","0","0","0","0","32","0","0","0" 
"00:09:16","22","0","0","2","0","0","21","48","0","4","4","7","382","12","0","0","0","1882","42","0","3","0","0","24","22","0","0","0","0","19","0","0","0" 

With only a few hundred rows involved, the result comes out as quickly as expected. The problem now is processing the whole 10,000-row CSV file: the script above seems to 'keep executing' for a very long time (hours) without completing or producing any result. So could some PowerShell experts from Stack Overflow help review the code above and perhaps suggest modifications to speed it up?

Many thanks in advance for your suggestions.

10,000 records is a good bit, but I don't think it's enough to warrant suggesting StreamReader* and parsing the CSV manually. The biggest thing working against you, though, is the following line:

$b += New-Object -TypeName PSObject -Property $Props 

What PowerShell is doing here is creating a brand-new array and appending the element to it. That is a very memory-intensive operation, and you are repeating it thousands of times. A better approach in this case is to use the pipeline to your advantage.
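(As an aside: if you do want to collect results outside the pipeline, a generic list avoids the re-allocation that `+=` causes. A minimal sketch, where `$a` stands for the imported CSV from the question; note this only fixes the append cost, the nested `.Where()` scan is still the dominant expense:)

```powershell
# A List grows in place, so .Add() is cheap compared to $b += ...
$b = [System.Collections.Generic.List[object]]::new()
foreach ($Time in $a.Time | Select-Object -Unique) {
    $Props = [ordered]@{ Time = $Time }
    foreach ($Id in $a.Id | Select-Object -Unique) {
        $Props[$Id] = ($a.Where({ $_.Id -eq $Id -and $_.Time -eq $Time })).IOT
    }
    $b.Add([pscustomobject]$Props)
}
```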

$data = Import-Csv -Path "D:\temp\data.csv"
$headers = $data.ID | Sort-Object {[int]$_} -Unique

$data | Group-Object Time | ForEach-Object{
    $props = [ordered]@{Time = $_.Name}
    foreach($header in $headers){
        $props."$header" = ($_.Group | Where-Object{$_.ID -eq $header}).IOT
    }
    [pscustomobject]$props
} | Export-Csv d:\temp\testing.csv -NoTypeInformation

$data will be the whole file as objects in memory. We need to get all the $headers that will become the column headers.

Group the data by each Time. Then, within each group, we get the value for every ID. If an ID does not exist in that period, the entry shows up blank.
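(The `Where-Object` scan inside the header loop is still a linear search per lookup. A hedged variation, not the code above: index each group by ID first, making every lookup constant-time.)

```powershell
$data | Group-Object Time | ForEach-Object {
    # Build an ID -> IOT hashtable once per Time group
    $byId = @{}
    foreach ($row in $_.Group) { $byId[$row.ID] = $row.IOT }
    $props = [ordered]@{ Time = $_.Name }
    # Each lookup is now a hash probe instead of a scan of the group
    foreach ($header in $headers) { $props."$header" = $byId[$header] }
    [pscustomobject]$props
} | Export-Csv d:\temp\testing.csv -NoTypeInformation
```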

This is not the best approach, but it should be faster than yours. I ran 10,000 records in under a minute (51 seconds averaged over 3 passes). I would show you the benchmark if I could.

I ran your code just once against my own data and it took 13 minutes. I think it is safe to say mine performs faster.
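(The timings above can be reproduced with `Measure-Command`; a sketch wrapping the answer's pipeline, with the file paths being assumptions:)

```powershell
# Time the whole transpose, import to export
$elapsed = Measure-Command {
    $data = Import-Csv -Path "D:\temp\data.csv"
    $headers = $data.ID | Sort-Object {[int]$_} -Unique
    $data | Group-Object Time | ForEach-Object {
        $props = [ordered]@{ Time = $_.Name }
        foreach ($header in $headers) {
            $props."$header" = ($_.Group | Where-Object { $_.ID -eq $header }).IOT
        }
        [pscustomobject]$props
    } | Export-Csv "D:\temp\testing.csv" -NoTypeInformation
}
"{0:N1} seconds" -f $elapsed.TotalSeconds
```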


The dummy data was made with this logic, just for reference:

1..100 | ForEach-Object {
    $time = Get-Date -Format "hh:mm:ss"
    Start-Sleep -Seconds 1
    1..100 | ForEach-Object {
        [pscustomobject]@{
            time = $time
            id   = $_
            iot  = Get-Random -Minimum 0 -Maximum 7
        }
    }
} | Export-Csv d:\temp\data.csv -NoTypeInformation

* Not a great fit for your case; StreamReader is mentioned just to point out that it is the better way to read large files. You would merely have to parse the strings line by line.
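(For reference, the StreamReader idea from the footnote looks roughly like this. A sketch only, assuming the simple quoted three-column layout shown above, where no field contains a comma:)

```powershell
$reader = [System.IO.StreamReader]::new("D:\temp\data.csv")
try {
    $null = $reader.ReadLine()                          # skip the header row
    while ($null -ne ($line = $reader.ReadLine())) {
        # Naive parse: strip quotes, then split on commas
        $time, $id, $iot = ($line.Trim() -replace '"') -split ','
        # ...accumulate $time/$id/$iot into whatever structure you need...
    }
}
finally {
    $reader.Dispose()
}
```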