Update 20220510
While sorting out my fileserver, I needed a deduplication script.
I've been copying many files from backups, clouds, mobile devices, and workstations, so ending up with many copies was inevitable.
The script below walks a directory and uses locate to find files with the same name elsewhere. It then compares md5sums to check whether they are the same file. As soon as an identical file is found, it stops searching, removes the copy from the directory being checked, and moves on to the next file.
#!/bin/bash
# Copy this script into the directory you want to clean.
# If a copy of this script also exists elsewhere on your fileserver,
# the copy in the directory being cleaned will be removed as well.
# Don't want that? Change
# find . -type f |
# into
# find . -type f | grep -v <nameofthisscript> |
# dont is the current directory; locate results under it are skipped
dont=$(pwd)
# Never start in /mnt? Uncomment below
# echo "$dont" | grep -q "^/mnt" && { echo "start in tank" ; exit 1 ; }
# IFS= and -r keep leading spaces and backslashes in file names intact
find . -type f | while IFS= read -r file ; do
filemd5=$(md5sum "$file" | cut -f1 -d" ")
basenamefile=$(basename "$file")
echo "searching $basenamefile"
# grep -F treats $dont as a fixed string instead of a regular expression
locate -i "/$basenamefile" | grep -vF -- "$dont" | while IFS= read -r location ; do
if [ -f "$location" ] ; then
locatedfilemd5sum=$(md5sum "$location" | cut -f1 -d" ")
if [ "$filemd5" == "$locatedfilemd5sum" ] ; then
echo "found same md5sum at $location"
rm "$file"
break
fi
fi
done
done
# Remove empty dirs?
# find . -type d -empty -delete
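Because the script deletes files, it can be worth trying the comparison step in a dry run first. Below is a minimal sketch of that step, not part of the original script: the helper names `md5_of` and `remove_if_duplicate` and the `DRY_RUN` variable are made up for illustration, and GNU md5sum is assumed.

```shell
#!/bin/bash
# Hypothetical helpers, not part of the original script.

# Print only the MD5 hash of a file.
md5_of() {
    md5sum -- "$1" | cut -f1 -d" "
}

# Remove "$1" when "$2" is a regular file with the same MD5 hash.
# With DRY_RUN=1 (the default here) it only reports what it would remove.
# Returns 1 when either file is missing or the hashes differ.
remove_if_duplicate() {
    local file=$1 location=$2
    [ -f "$file" ] && [ -f "$location" ] || return 1
    [ "$(md5_of "$file")" = "$(md5_of "$location")" ] || return 1
    if [ "${DRY_RUN:-1}" = "1" ] ; then
        echo "would remove $file (same md5sum at $location)"
    else
        rm -- "$file"
    fi
}
```

Inside the inner loop this could replace the md5sum comparison, for example `remove_if_duplicate "$file" "$location" && break`. Set `DRY_RUN=0` once the reported list looks right.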
Locate can be slow; sometimes it is better to put the locate DB in memory or on another fast storage system.
mkdir /ramdisk
mount ramfs -t ramfs /ramdisk/
cp /var/lib/mlocate/mlocate.db /ramdisk/
# change the locate command in the script above so it reads the copy on the ramdisk
locate -d /ramdisk/mlocate.db -i IMG20191123.jpg
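A ramfs mount is emptied on unmount or reboot, so the copied database can disappear. A small sketch of how the script could fall back to the on-disk database in that case; the function name `pick_locate_db` is hypothetical, and the two paths are simply the ones used above.

```shell
#!/bin/bash
# Hypothetical helper: print the first readable database among the arguments.
# Returns 1 when none of them is readable.
pick_locate_db() {
    local db
    for db in "$@" ; do
        if [ -r "$db" ] ; then
            echo "$db"
            return 0
        fi
    done
    return 1
}

# Possible use in the script (paths taken from the commands above):
# db=$(pick_locate_db /ramdisk/mlocate.db /var/lib/mlocate/mlocate.db) || exit 1
# locate -d "$db" -i "/$basenamefile"
```

Listing the ramdisk copy first means the fast path is used whenever it exists, and the script keeps working when it does not.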
Want to remove empty directories as well?
Uncomment the last line of the script, or add the line below at the end:
find . -type d -empty -delete
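With GNU find, `-delete` implies `-depth`, so a chain of nested empty directories is removed from the bottom up in a single pass. The sketch below wraps the command in a function so it can be tried on a scratch directory first. The name `prune_empty_dirs` is made up, and `-mindepth 1` is an addition of mine that keeps the starting directory itself in place.

```shell
#!/bin/bash
# Hypothetical wrapper around the command above (GNU find assumed).
# Removes empty directories below "$1" (default: current directory).
prune_empty_dirs() {
    local dir=${1:-.}
    # -mindepth 1: never consider the starting directory itself for deletion.
    find "$dir" -mindepth 1 -type d -empty -delete
}

# To preview instead of deleting, swap -delete for -print:
# find . -mindepth 1 -type d -empty -print
```

Note that the `-print` preview only lists directories that are empty right now, so it can show fewer entries than `-delete` will eventually remove.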