Posted on 2020-05-15
Fearing data loss, I have a bad habit of making a hasty backup whenever I need to change machines. Since it’s always laborious to compare a directory of photos with what I have already archived, I usually back up the whole directory and deal with it later.
After a few months (sometimes years), I end up with many backup directories. Some images are duplicated across these directories, with various sizes and names. This makes the task of archiving even more difficult.
I recently had to tackle this again, and I created some tiny utilities that made the task much easier. Let’s dig in!
The first thing to do is to properly name each image. I chose the format year-month-day_hour-minutes-seconds.ext, e.g. 2020-05-15_14h30m05s.jpg. Usually this data is available in the photo’s Exif metadata, and you can extract it with exiftool. For instance, you can create a small script named rename-exif-date:
#!/bin/sh
# Rename wrt date and hour of shot: 2020-12-25_20h03m12s.jpg
# %%-c appends a copy number when two shots share the same timestamp,
# %%le lowercases the file extension.
exiftool -d %Y-%m-%d_%Hh%Mm%Ss%%-c.%%le "-filename<CreateDate" "$@"
And run it with fd:
> fd -t f -x rename-exif-date {}
That will rename every file that carries an Exif creation date.
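If fd is not installed, plain find gives the same result (a minimal equivalent, assuming rename-exif-date is on your PATH):
> find . -type f -exec rename-exif-date {} +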
If you have thumbnails cluttering your directories, you can detect and remove them with feh.
To remove files smaller than 300x300, create a script named remove-small-images:
#!/bin/sh
# Delete every image whose dimensions fit within the given WxH bound.
feh --recursive --list --max-dimension "$1" --action 'rm %F'
And run it:
> remove-small-images 300x300
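Deleting files sight unseen is a bit scary, so you may want to preview the matches first. Swapping rm for echo in the action simply prints the candidates without deleting anything:
> feh --recursive --list --max-dimension 300x300 --action 'echo %F'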
Now you should have clean directories, but potentially full of duplicates.
The tool I recommend for detecting (and removing) them is dupeGuru.
It can detect perfect duplicates (byte-identical files), or similar images using a tunable similarity threshold based on picture content.
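If you only care about perfect duplicates and want to stay on the command line, a crude checksum-based sketch can do the job (this assumes GNU coreutils, and only catches byte-identical files, not resized or re-encoded copies):
#!/bin/sh
# List groups of byte-identical files, grouped by their SHA-256 checksum.
# -w64 compares only the 64-character digest; blank lines separate groups.
find . -type f -exec sha256sum {} + | sort | uniq -w64 --all-repeated=separate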