PowerShell or Python 3 - CSV file: remove rows based on duplicates in one column, with IF/ELSE conditions on another column

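I'm a bit weak at coding and have some experience with both PowerShell and Python, so I'm open to a solution in either.

This may be hard to describe, so I created a fake dataset that will hopefully make it clearer.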

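What I want to do is deduplicate the rows of every CSV in a directory by NAME, applying these rules in order:

  1. If NARRATIVE = "CAUGHT", keep that row.
  2. Otherwise, if NARRATIVE contains a URL, keep that row.
  3. If neither of those is true, keep the last (bottom-most) entry.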

I feel I'm closest with PowerShell, so I'll use that example, but if you can solve this in Python I'm entirely open to that too. Where am I going wrong?

gci -Filter *.csv | Select-Object -ExpandProperty FullName | Import-Csv | Foreach-Object {Select-Object where $_.NARRATIVE -Contains "Caught"} | export-csv test1.csv -NoTypeInformation

Main dataset:

SITE,DATE,URL,SITE2,NAME,NARRATIVE
AAA,03/17/2020,https://someurl.com/1234,BBB,NAME1,CAUGHT
AAA,03/17/2020,https://someurl.com/1234,BBB,NAME2,CAUGHT
AAA,03/17/2020,https://someurl.com/1234,BBB,NAME3,CAUGHT
AAA,03/17/2020,https://someurl.com/1234,BBB,NAME4,CAUGHT
AAA,03/17/2020,https://someurl.com/1234,BBB,NAME5,CAUGHT
AAA,03/17/2020,https://someurl.com/1234,BBB,NAME6,CAUGHT
AAA,03/17/2020,https://someurl.com/1234,BBB,NAME7,CAUGHT
AAA,03/17/2020,https://someurl.com/1234,BBB,NAME1,only visited http://thisismyhouse.com once
AAA,03/17/2020,https://someurl.com/1234,BBB,NAME2,NAME2 did some stuff and here's how/why
AAA,03/17/2020,https://someurl.com/1234,BBB,NAME5,NAME5 just sat there
AAA,03/17/2020,https://someurl.com/1234,BBB,NAME3,NAME3 was really important right here
AAA,03/17/2020,https://someurl.com/1234,BBB,NAME6,NAME6 fell down and couldn’t get up
AAA,03/17/2020,https://someurl.com/1234,BBB,NAME3,NAME3 was MOST important right here
AAA,03/17/2020,https://someurl.com/1234,BBB,NAME8,NAME8 Dropped the beat
AAA,03/17/2020,https://someurl.com/1234,BBB,NAME9,After the game NAME9 went home
AAA,03/17/2020,https://someurl.com/1234,BBB,NAME4,"while NAME4 was at the store, they found a grape"
AAA,03/17/2020,https://someurl.com/1234,BBB,NAME7,NAME7 got hit in the head
AAA,03/17/2020,https://someurl.com/1234,BBB,NAME9,NAME9 spends a lot of time on http://dungeondepths.com
AAA,03/17/2020,https://someurl.com/1234,BBB,NAME1,On Friday the 13th NAME1 got a tattoo
AAA,03/17/2020,https://someurl.com/1234,BBB,NAME4,For dinner NAME4 ordered pizza
AAA,03/17/2020,https://someurl.com/1234,BBB,NAME8,NAME8 Fired the Bass Cannon
AAA,03/17/2020,https://someurl.com/1234,BBB,NAME9,NAME9 is rebooting
AAA,03/17/2020,https://someurl.com/1234,BBB,NAME6 ,NAME6 broke their leg
AAA,03/17/2020,https://someurl.com/1234,BBB,NAME8,NAME8 Put the needle on the record

Desired result:

SITE,DATE,URL,SITE2,NAME,NARRATIVE
AAA,03/17/2020,https://someurl.com/1234,BBB,NAME1,CAUGHT
AAA,03/17/2020,https://someurl.com/1234,BBB,NAME2,CAUGHT
AAA,03/17/2020,https://someurl.com/1234,BBB,NAME3,CAUGHT
AAA,03/17/2020,https://someurl.com/1234,BBB,NAME4,CAUGHT
AAA,03/17/2020,https://someurl.com/1234,BBB,NAME5,CAUGHT
AAA,03/17/2020,https://someurl.com/1234,BBB,NAME6,CAUGHT
AAA,03/17/2020,https://someurl.com/1234,BBB,NAME7,CAUGHT
AAA,03/17/2020,https://someurl.com/1234,BBB,NAME9,NAME9 spends a lot of time on http://dungeondepths.com
AAA,03/17/2020,https://someurl.com/1234,BBB,NAME8,NAME8 Put the needle on the record

Now that I fully understand the question, try this (it makes a few more assumptions):

# Group the rows by NAME so each group can be reduced independently
$groupedCsv = Import-Csv .\stackTest.csv | Group-Object Name
$result = @()
foreach ($csvObject in $groupedCsv){
    # Conditions are tested in priority order; the first one that matches wins
    if($value = $csvObject.Group | Where-Object Narrative -eq "Caught"){
        $result += $value
    } elseif ($value = $csvObject.Group | Where-Object Narrative -like "*http*"){
        $result += $value
    } else {
        # No match: keep the last (bottom-most) row of the group
        $result += $csvObject.Group[-1]
    }
}

#This is just to show the result
$result | ft

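Assumptions:

  1. Each name group is matched by only one condition, and the conditions are checked in the order you gave.

With this I get your desired result.

I hope this solves your problem. If you want to run this against multiple CSVs, I suggest wrapping it in a function and calling it once per CSV, so the logic stays no more complex than it needs to be and can be reused, like this: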

function Parse-Csv{
    Param(
        [Parameter(Mandatory=$true, ValueFromPipelineByPropertyName=$true)]
        [Alias("Path")]
        [String] $FullName
    )
    process{
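        # -PathType Leaf checks that the file actually exists (-IsValid only validates the path syntax)
        if(Test-Path -Path $FullName -PathType Leaf){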
            #SetUp
            $groupedCsv = Import-Csv -Path $FullName | Group-Object Name
            $result = @()

            #Main
            foreach ($csvObject in $groupedCsv){
                # Conditions are tested in priority order; the first one that matches wins
                if($value = $csvObject.Group | Where-Object Narrative -eq "Caught"){
                    $result += $value
                } elseif ($value = $csvObject.Group | Where-Object Narrative -like "*http*"){
                    $result += $value
                } else {
                    # No match: keep the last (bottom-most) row of the group
                    $result += $csvObject.Group[-1]
                }
            }
            $result | Export-Csv -Path $FullName -Force -NoTypeInformation
        } else{
            Write-Error "Invalid path provided ($FullName), please verify and try again."
        }
    }
}

gci -Filter *.csv | Parse-Csv
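
Since the question also allows Python, here is a rough Python 3 sketch of the same priority rules, using only the standard library. It is not a port of the PowerShell above and rests on a few assumptions of mine: the function names (pick_rows, dedupe_csv, dedupe_directory) are made up for illustration, a "URL" is taken to mean any NARRATIVE containing "http", stray whitespace around NAME is ignored, exactly one row is kept per name, and results are written to a separate directory rather than over the source files.

```python
import csv
from pathlib import Path


def pick_rows(rows):
    """Keep one row per NAME: CAUGHT first, then a URL in NARRATIVE, else the last row."""
    groups = {}  # dicts preserve insertion order, so names stay in first-seen order
    for row in rows:
        # Assumption: whitespace around NAME is noise, so "NAME6 " and "NAME6" match.
        groups.setdefault(row["NAME"].strip(), []).append(row)

    kept = []
    for group in groups.values():
        caught = [r for r in group if r["NARRATIVE"].strip().upper() == "CAUGHT"]
        # Assumption: "contains a URL" means the text contains "http".
        with_url = [r for r in group if "http" in r["NARRATIVE"].lower()]
        if caught:
            kept.append(caught[0])
        elif with_url:
            kept.append(with_url[-1])  # several URL rows: keep the bottom-most one
        else:
            kept.append(group[-1])  # fallback: bottom-most row of the group
    return kept


def dedupe_csv(src, dst):
    """Read one CSV, apply pick_rows, and write the result to dst."""
    with open(src, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames or []  # empty file: no header to copy
        rows = list(reader)
    with open(dst, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(pick_rows(rows))


def dedupe_directory(src_dir, dst_dir):
    """Process every *.csv directly inside src_dir, writing same-named files to dst_dir."""
    dst_dir = Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    for src in sorted(Path(src_dir).glob("*.csv")):
        dedupe_csv(src, dst_dir / src.name)
```

Calling dedupe_directory(".", "deduped") would process the CSVs in the current directory and leave the originals untouched. Rows come out in the order each name first appears, so the row order can differ from the desired-result listing in the question even when the same rows are kept. The sketch also assumes every file has a header containing NAME and NARRATIVE and that no row is short of columns.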