Sorting Photoprism Photos With the Mistral Cat
I chose to experiment with Le Chat by Mistral, the French AI alternative to Gemini, Claude and CatIFARTED (ChatGPT). For the experiment I copied my Photoprism photos from the drive connected to my Raspberry Pi onto a laptop, before running scripts to sort them and remove duplicates. It worked well, with a nice little bonus which I’ll expand on later.
Goal: Clean Up Duplicate Photos
My objective was to remove duplicate photos from a large collection while keeping the best version of each file. I used jdupes to identify duplicates and a custom script to decide which files to keep.
The sources of duplication were that I imported photos from Google Takeout on one side, as well as from two or three iPhones and an Android phone. I suspect that PhotoSync might also contribute by encouraging the creation of a folder per device we import from.
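jdupes writes each set of identical files as a block of paths separated by a blank line, which makes its reports easy to post-process. Here is a minimal sketch of reading such a report, assuming it was saved to a hypothetical jdupes_output.txt (for example via something like `jdupes -r /photos > jdupes_output.txt`):

```python
from pathlib import Path

def read_duplicate_groups(report_path: str) -> list[list[str]]:
    """Split a saved jdupes report into lists of duplicate file paths."""
    groups, current = [], []
    for line in Path(report_path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line:
            current.append(line)      # another path in the current duplicate set
        elif current:
            groups.append(current)    # a blank line closes the set
            current = []
    if current:
        groups.append(current)
    return groups

if __name__ == "__main__":
    # "jdupes_output.txt" is a hypothetical name for the saved report.
    for group in read_duplicate_groups("jdupes_output.txt"):
        print(f"{len(group)} copies:", *group, sep="\n  ")
```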
Custom Rules for Keeping Files
After running jdupes I set up custom rules. The first looked at file naming: prefer IMG_* or VIRB* over hash-named files. iPhones, Android phones and photo cameras never, or rarely, use names that are hashes; those are usually created by WhatsApp and similar apps. The second applied directory priority: keep files in human-readable directories (e.g., “Spain bike ride”) over generic ones (e.g., “Photos from 2018”). Google Takeout creates two or more folders: a primary year folder with all photos from that year, plus secondary event-specific folders named either after the album name we chose, for example ‘Spain bike ride’, or after the date if we did not give a specific name.
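To make those two rules concrete, here is a rough sketch of how the scoring could look. The pattern names, regexes and point values are my own illustrative assumptions, not the exact rules the script ended up with:

```python
import re

# Illustrative patterns: camera-style names win, hash-style names lose ties.
CAMERA_NAME = re.compile(r"^(IMG_|VIRB)", re.IGNORECASE)
HASH_NAME = re.compile(r"^[0-9a-f]{16,}\.[a-z0-9]+$", re.IGNORECASE)
GENERIC_DIR = re.compile(r"^Photos from \d{4}$")   # Takeout's year folders

def name_score(filename: str) -> int:
    if CAMERA_NAME.match(filename):
        return 2        # IMG_* / VIRB* names come from cameras and phones
    if HASH_NAME.match(filename):
        return 0        # hash-named files (WhatsApp and friends) lose
    return 1            # anything else is neutral

def directory_score(parent_dir: str) -> int:
    # A human-readable event folder beats a generic "Photos from 2018" folder.
    return 0 if GENERIC_DIR.match(parent_dir) else 1
```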
In the final step I noticed that it seemed to be choosing to delete HEIC files rather than .JPG/.JPEG files. As HEIC files are usually the originals, I wanted to keep them. Eventually I saw that we also had duplicate HEIC files, in which case I allowed it to remove duplicates of that file type. Finally I noticed that video files existed both as .MOV originals and as converted copies, so I accepted a rule to choose .MOV over .MP4. I used Le Chat (The Cat) to help me understand the output from jdupes runs.
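The extension preference can be expressed the same way, as a small lookup table; again the weights below are illustrative assumptions rather than the script’s actual values:

```python
# Higher is better: HEIC and .MOV originals outrank their JPEG / MP4 copies.
EXTENSION_SCORE = {".heic": 3, ".mov": 3, ".jpg": 1, ".jpeg": 1, ".mp4": 1}

def extension_score(suffix: str) -> int:
    return EXTENSION_SCORE.get(suffix.lower(), 2)   # unknown types sit in between
```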
Script Development and Testing
As we progressed through the project The Cat offered a few approaches to automation: digiKam, Pillow, a bespoke Python script, or exiftool. In several cases it wrote a Python script to apply the rules and generate a list of files to delete based on the jdupes output.
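Putting the pieces together, a script along these lines can pick one keeper per duplicate group and write everything else to a delete list. It reuses the hypothetical helpers sketched above, and the to_delete.txt file name is an assumption of mine, not the name the actual script used:

```python
from pathlib import Path

def choose_keeper(group: list[str]) -> str:
    # Rank every copy with the rules sketched above and keep the best one.
    def score(path_str: str) -> tuple[int, int, int]:
        p = Path(path_str)
        return (extension_score(p.suffix), name_score(p.name), directory_score(p.parent.name))
    return max(group, key=score)

def build_delete_list(groups: list[list[str]], out_file: str = "to_delete.txt") -> None:
    doomed = []
    for group in groups:
        keeper = choose_keeper(group)
        doomed.extend(path for path in group if path != keeper)
    Path(out_file).write_text("\n".join(doomed), encoding="utf-8")
    print(f"{len(doomed)} files queued for deletion in {out_file}")
```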
Testing and Iteration
Part of collaborating with AI tools is experimentation and iteration. It’s about running a command, seeing the output, understanding what you see, and then perfecting the command until you get what you want. It’s also about seeing opportunities.
One of the scripts I got The Cat to run checked whether the files on the “to delete” list had EXIF data for the creation date, i.e. the date when the photos were taken. Once a script confirmed that this was the case, the process of fine-tuning the deletion script advanced, using the rules mentioned above.
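A check like that could look roughly like the following, leaning on exiftool (one of the tools The Cat suggested) because it reads HEIC and MOV files as well as JPEG; the file name and function name are hypothetical:

```python
import json
import subprocess

def files_missing_creation_date(delete_list: str = "to_delete.txt") -> list[str]:
    # exiftool's -@ option reads the file paths straight from the delete list,
    # and -json gives one record per file with its DateTimeOriginal, if present.
    result = subprocess.run(
        ["exiftool", "-json", "-DateTimeOriginal", "-@", delete_list],
        capture_output=True, text=True, check=False,
    )
    records = json.loads(result.stdout or "[]")
    return [r.get("SourceFile", "?") for r in records if not r.get("DateTimeOriginal")]

if __name__ == "__main__":
    missing = files_missing_creation_date()
    print(f"{len(missing)} files on the delete list have no EXIF creation date")
```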
Verification and Safety Checks
We ran a lot of dry runs. When you run jdupes without the delete option it simply checks for duplicates and prints the results to the terminal. When you have thousands of duplicates the terminal scrollback loses plenty of results; that’s where writing to a text file helps. It’s persistent.
The beauty of these text files is that they’re light, and you can share them with The Cat. In some situations it will actually run the script you discussed with it, rather than just outputting the Python code. This differs from Gemini in two ways: first, it runs the script so you don’t have to; second, if it reaches the token limit for script execution it gives you the Python script to run locally.
What is especially nice is that you can still keep “chatting” even if you reach that limit. It just won’t run scripts internally.
Backup: Emphasized
Along the way The Cat constantly encourages you to make sure you have a backup before running a command. As I was working from a copy, rather than the primary library, I felt safe to experiment. Eventually I executed the command to delete the duplicates and ran jdupes one last time to ensure the duplicates were gone.
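For that final deletion step, a cautious sketch with an explicit dry-run switch might look like this; it assumes the reviewed to_delete.txt from the earlier steps and, as The Cat kept insisting, a backup:

```python
from pathlib import Path

def delete_from_list(delete_list: str = "to_delete.txt", dry_run: bool = True) -> None:
    # Walk the reviewed delete list and remove each file. Keep dry_run=True
    # until the list has been checked and a backup exists.
    for path_str in Path(delete_list).read_text(encoding="utf-8").splitlines():
        path = Path(path_str)
        if not path.exists():
            continue                       # already gone or a stale entry
        if dry_run:
            print(f"would delete {path}")
        else:
            path.unlink()
            print(f"deleted {path}")
```

Rerunning jdupes on the library afterwards should then report no remaining duplicate sets.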
And Finally
While experimenting I hit the limitations of the free plan, first for code execution, and then for chat. I didn’t intend for it to run scripts on the files I uploaded; I think running scripts locally makes more sense. I uploaded the data so The Cat could get a better understanding of it.
Hitting the data limit is a feature. It encourages us to take a break and work on something else.
What surprised me yesterday, and again today, is that I get fatigued from playing with AI: although large language models do some of the thinking, you still need to babysit them, and understand and supervise what they’re doing.