user3653766 - 3 months ago
PowerShell Question

Delete duplicate files with PowerShell except the file specified

I am using the following code to delete duplicate files in one folder:

ls *.wav -recurse | get-filehash | group -property hash | where { $_.count -gt 1 } | % { $_.group | select -skip 1 } | del


I have two issues. I want to limit this to only one filehash at a time and I need to specify the file name I want to keep.

As an example, I have a folder named Recordings. The first five files listed all have the same filehash but only one has the filename that has already been entered in my database.

[Image: file listing of the Recordings folder]

It would be great if I could use the -Exclude parameter for the del cmdlet but that parameter does not accept pipeline input.

I also considered using the code above and then renaming the remaining file afterward but the code is not limited to one filehash.

Answer

It all depends on how you want it to work. For example, if you know the file name you want to keep in advance, you could do it this way:

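# Name of the file to keep
$fileName = 'file1.txt'
# Hash of the file to keep
$fileHash = Get-FileHash .\$fileName
# Collect every file (-File skips folders) with the same hash but a different name
$duplicates = ls -Recurse -File | Get-FileHash | Where-Object { $_.Hash -eq $fileHash.Hash -and ($_.Path | Split-Path -Leaf) -ne $fileName }
# Delete only after the whole list has been collected
$duplicates | del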

This sequence sets the file name, gets the hash of that file, and then the main command finds all other files that have the same hash but a different file name.
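Because the filter above compares only the leaf name, a copy with the same name in a subfolder is not treated as a duplicate. If that matters, the same steps can be sketched with a full-path comparison instead. Get-DuplicateOf and its parameters KeepPath and Root are hypothetical names made up for this illustration, not part of PowerShell, and the function only lists paths without deleting anything:

```powershell
function Get-DuplicateOf {
    # Hypothetical helper: lists files under $Root whose content hash
    # matches the file at $KeepPath, excluding $KeepPath itself.
    param(
        [Parameter(Mandatory)] [string] $KeepPath,
        [string] $Root = '.'
    )

    # Resolve to a full path so the comparison below is unambiguous.
    $keep = (Resolve-Path -LiteralPath $KeepPath).ProviderPath
    $keepHash = (Get-FileHash -LiteralPath $keep).Hash

    # @( ) runs the whole pipeline to completion before anything is returned,
    # so all hashing is finished by the time a caller acts on the paths.
    $found = @(
        Get-ChildItem -LiteralPath $Root -Recurse -File |
            Get-FileHash |
            Where-Object { $_.Hash -eq $keepHash -and $_.Path -ne $keep } |
            Select-Object -ExpandProperty Path
    )
    return $found
}
```

Running Get-DuplicateOf -KeepPath .\file1.txt prints the candidate paths for review. Once the list looks right, it could be piped to ForEach-Object { Remove-Item -LiteralPath $_ }.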

Note: Test first to make sure this will do what you expect before you execute the del command.
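One way to run that test is the -WhatIf switch of Remove-Item, which reports what would be deleted and deletes nothing. A minimal demonstration on a throwaway temp file (the variable names are only for illustration):

```powershell
# Throwaway file so the demonstration touches nothing real.
$temp = [System.IO.Path]::GetTempFileName()

# -WhatIf reports the operation and skips it.
Remove-Item -LiteralPath $temp -WhatIf
$existsAfterDryRun = Test-Path -LiteralPath $temp

# Without -WhatIf the file is really deleted.
Remove-Item -LiteralPath $temp
$existsAfterDelete = Test-Path -LiteralPath $temp
```

Applied to the code above, the dry run would be $duplicates | del -WhatIf.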

Update: It appears that Get-FileHash puts some sort of lock on the files being hashed, so you cannot pipe its output straight to the del (Remove-Item) command. I modified the code to save the array of duplicates to a variable first and then pass that variable to the delete command, which works.