Identify duplicate files from leading number of characters

Question

I have a directory containing approximately 600 employee image files that were copied from another source.

The file names have the following format:

xxxxxx_123456_123_20141212.jpg

When an employee image is updated, another file is simply created in the same location, and only the date at the end of the name changes.

I need to be able to identify the most recent file, but first I need to determine which files are 'duplicates'.

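My initial idea was to compare the first 14 characters of each name and, where they match, work out which file has the most recent modification date and delete the older ones.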
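As a quick illustration of that matching idea, the sketch below derives the 14-character key from the sample name in the question and from a second, hypothetical name that differs only in its date. For this format the first 14 characters end at the second underscore, so the 123 segment is not part of the key.

```powershell
$old = 'xxxxxx_123456_123_20141212.jpg'   # sample name from the question
$new = 'xxxxxx_123456_123_20150101.jpg'   # hypothetical later version of the same image

# Key = first 14 characters (indices 0 through 13)
$oldKey = $old.Substring(0, 14)
$newKey = $new.Substring(0, 14)

$oldKey                 # xxxxxx_123456_
$oldKey -eq $newKey     # True
```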

Answer 1

This requires PowerShell 3.0 or later, because the -File switch of Get-ChildItem was introduced in that version.

$Path = 'C:\Users\madtomvane\Documents\PowerShellTest'

# Get the files, group them by the first 14 characters of the name (indices 0..13),
# and select the most recent file in each group (the files that will remain)
$FilesToKeep = Get-ChildItem $Path -Recurse -File |
    Group-Object -Property { -join ($_.Name[0..13]) } |
    ForEach-Object { $_.Group | Sort-Object -Property LastWriteTime -Descending | Select-Object -First 1 }

# Get the files, group them the same way, keep only groups with more than one file,
# and select the older files in each group
$FilesToRemove = Get-ChildItem $Path -Recurse -File |
    Group-Object -Property { -join ($_.Name[0..13]) } |
    Where-Object { $_.Group.Count -gt 1 } |
    ForEach-Object { $_.Group | Sort-Object -Property LastWriteTime -Descending | Select-Object -Skip 1 }
$FilesToRemove | Remove-Item
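Because the question says only the trailing date changes, another option is to group on the name with that date removed instead of on a fixed number of characters. The sketch below wraps this in a hypothetical helper, Get-OlderDuplicate (not a built-in cmdlet). It assumes every base name ends in an underscore followed by eight digits, and it still ranks files by LastWriteTime, as the answer does.

```powershell
# Hypothetical helper: returns the older copies in a folder, treating files as
# duplicates when their base names differ only in a trailing "_" + 8 digits
function Get-OlderDuplicate {
    param([string]$Path)

    Get-ChildItem -Path $Path -Recurse -File |
        # Strip the trailing "_yyyyMMdd" from the base name to build the group key
        Group-Object -Property { $_.BaseName -replace '_\d{8}$', '' } |
        Where-Object { $_.Count -gt 1 } |
        ForEach-Object {
            # Newest first; skip it and emit the older copies
            $_.Group | Sort-Object -Property LastWriteTime -Descending | Select-Object -Skip 1
        }
}
```

For example, Get-OlderDuplicate -Path 'C:\Users\madtomvane\Documents\PowerShellTest' | Remove-Item -WhatIf lists what would be deleted without deleting anything; dropping -WhatIf performs the deletion.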
