How to sort multiple photo directories

Posted on 2020-05-15

Too many backups

I have a bad habit: I fear losing data, so I make a hasty backup whenever I need to change machines. Since it’s always laborious to compare a directory of photos with what I have already archived, I usually back up the whole directory and deal with it later.

After a few months (sometimes years), I end up with many backup directories. Some images are duplicated across these directories, with various sizes and names. This makes the task of archiving even more difficult.

I recently had to do it and I created some tiny utilities that made this task much easier. Let’s dig in!

Proper naming

The first thing to do is to give each image a proper name. I chose the format year-month-day_hour-minutes-seconds.ext, e.g. 2020-05-15_14h30m05s.jpg. This data is usually available in the photo’s Exif metadata, and you can extract it with exiftool.

For instance, you can create a small script named rename-exif-date:

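#!/bin/sh
# Rename each file after the date and time it was shot, e.g. 2020-12-25_20h03m12s.jpg.
# %%-c appends a copy number if the name is already taken; %%le is the lowercase extension.
exiftool -d %Y-%m-%d_%Hh%Mm%Ss%%-c.%%le "-filename<CreateDate" "$@"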

And run it with fd:

> fd -t f -x rename-exif-date {}

That will rename every file fd finds under the current directory; files without a CreateDate tag are left untouched.
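
Since the rename happens in place, it can be worth previewing the result first. Here is a minimal sketch, assuming exiftool's TestName tag, which prints the old and new names without touching the files. The function name preview_exif_rename is made up for this example.

```shell
# Dry run of the rename: print "old --> new" pairs, change nothing.
# Assumption: exiftool is installed. TestName is exiftool's test-only
# counterpart of FileName; preview_exif_rename is a hypothetical name.
preview_exif_rename() {
    exiftool -d '%Y-%m-%d_%Hh%Mm%Ss%%-c.%%le' '-testname<CreateDate' "$@"
}
```

Once the function is defined in your shell, calling it as preview_exif_rename *.jpg should show what rename-exif-date would do to the same files.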

Detect thumbnails

If you have thumbnails cluttering your directories, you can detect and remove them with feh.

To remove images no larger than 300x300, create a script named remove-small-images:

#!/bin/sh
# $1 is the maximum size as WIDTHxHEIGHT, e.g. 300x300
feh --recursive --list --max-dimension "$1" --action 'rm %F'

> remove-small-images 300x300
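
Before deleting anything, you may want to see what would be removed. A minimal sketch, reusing the same feh options without the action; the function name list_small_images and the 300x300 default are assumptions chosen for illustration.

```shell
# List images whose width and height both fit within the given size,
# searching the current directory recursively. Nothing is deleted.
# Assumption: feh is installed; list_small_images is a hypothetical name.
list_small_images() {
    feh --recursive --list --max-dimension "${1:-300x300}"
}
```

Running list_small_images 300x300 from the top of a backup directory should print the candidates, so you can check them before running remove-small-images.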

Detect duplicates

Now you should have clean directories, but potentially full of duplicates.

The tool I recommend for detecting (and removing) them is dupeGuru.

It can detect exact duplicates (files with identical content) as well as similar images, using a similarity threshold based on the image content.
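
If you only care about exact duplicates and want a quick look from the shell, a checksum comparison can be sketched as below. This is not how dupeGuru works internally, just an illustration of the "same file" case. It assumes GNU coreutils (sha256sum, and uniq with -w and --all-repeated) and file names without newlines; the function name list_exact_duplicates is hypothetical.

```shell
# Print groups of byte-identical files under a directory (default: the
# current one), one "checksum  path" line per file, groups separated by
# a blank line. Files that have no duplicate are not printed.
# Assumptions: GNU coreutils; no newlines in file names.
list_exact_duplicates() {
    find "${1:-.}" -type f -exec sha256sum {} + |
        sort |
        uniq -w 64 --all-repeated=separate
}
```

A SHA-256 checksum is 64 hexadecimal characters, so uniq -w 64 compares only the checksum and ignores the path. Similar but not identical images (resized or re-encoded copies) will not show up here; that is where dupeGuru's content-based matching is needed.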