Playing with Hard Drives


One of the easiest things to do is to buy a hard drive, fill it over time, and then buy another drive, and a third, a fourth and a fifth. People go from a small drive to a larger one, and a larger one after that, and back up plenty of files across three to five drives. The result is terabytes of storage in which some files are “backed up” on every drive while others are precariously stored on just one. If you then want to consolidate that data onto a single volume, you need an ever larger drive.

Over the years I have worked as a Media Asset Manager, and I have developed a technique for finding duplicates and freeing hard drive space while ensuring that data is still backed up to at least two volumes, if not more. As I deal with video, I organise every project by year-month-day-country-subject-individual, where “individual” is the person working on that project.
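As an illustration, the naming convention can be expressed as a small helper. This is a minimal sketch in Python; the function name `project_folder_name` and the choice to replace spaces with underscores (so each component stays one hyphen-separated token) are my assumptions, not part of the author's method. Because the date comes first in ISO format, folder names sort chronologically.

```python
from datetime import date


def project_folder_name(shot_on: date, country: str, subject: str, individual: str) -> str:
    """Build a folder name in the year-month-day-country-subject-individual pattern.

    Hypothetical helper: spaces become underscores so that hyphens only ever
    separate the components of the name.
    """
    parts = [shot_on.isoformat(), country, subject, individual]
    return "-".join(p.strip().replace(" ", "_") for p in parts)


print(project_folder_name(date(2024, 3, 15), "Switzerland", "Glacier walk", "Anna"))
# 2024-03-15-Switzerland-Glacier_walk-Anna
```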

Consolidate Media Assets onto a Single Volume

The idea is that, as you copy files and folders from a collection of external hard drives, you consolidate everything onto a NAS or another form of storage server. If a date is not provided, I use the file names, the project title and related information to place it in time.
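When a date is embedded in a file name, pulling it out can be partly automated. The sketch below is a hypothetical helper (`date_from_name` is my name for it) that assumes dates appear as 2024-03-15, 2024_03_15, 2024.03.15 or 20240315, as camera and export names often do; anything else still needs a human to read the project title and context.

```python
import re
from datetime import date
from typing import Optional

# Assumed formats: 2024-03-15, 2024_03_15, 2024.03.15 or 20240315, not glued to other digits.
DATE_PATTERN = re.compile(
    r"(?<!\d)(19\d{2}|20\d{2})[-_.]?(0[1-9]|1[0-2])[-_.]?(0[1-9]|[12]\d|3[01])(?!\d)"
)


def date_from_name(name: str) -> Optional[date]:
    """Return the first valid calendar date embedded in a file or folder name, if any."""
    for match in DATE_PATTERN.finditer(name):
        year, month, day = (int(g) for g in match.groups())
        try:
            return date(year, month, day)
        except ValueError:
            continue  # e.g. 2023-02-30 looks like a date but is not one
    return None


print(date_from_name("interview_2023-11-04_final.mov"))  # 2023-11-04
print(date_from_name("final final.mov"))  # None
```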

Iterate

At first the process is easy, because every project and set of files is unique. The beauty of the year-month-day folder structure is that, as you progress, you notice when two or three folders have the same name, and that is when you go and check file dates to see which is the most recent.
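Spotting same-named folders across several dumped drives, and checking which copy holds the most recently modified files, can be scripted. This is a sketch under the assumption that each drive's dated project folders sit directly under its root; `same_named_folders` and `newest_mtime` are hypothetical names, and modification time is used as the "most recent" signal.

```python
import os
from collections import defaultdict
from datetime import datetime
from pathlib import Path


def newest_mtime(folder: Path) -> float:
    """Latest modification time of any file under folder; the folder's own mtime if it is empty."""
    times = [
        os.path.getmtime(os.path.join(root, name))
        for root, _dirs, files in os.walk(folder)
        for name in files
    ]
    return max(times, default=folder.stat().st_mtime)


def same_named_folders(drive_roots: list[Path]) -> dict[str, list[tuple[Path, datetime]]]:
    """Group top-level project folders by name and keep only names seen more than once."""
    seen: dict[str, list[Path]] = defaultdict(list)
    for root in drive_roots:
        for entry in root.iterdir():
            if entry.is_dir():
                seen[entry.name].append(entry)
    report = {}
    for name, paths in seen.items():
        if len(paths) > 1:
            dated = [(p, datetime.fromtimestamp(newest_mtime(p))) for p in paths]
            # Most recently modified copy first.
            report[name] = sorted(dated, key=lambda item: item[1], reverse=True)
    return report
```

The script only reports; deciding which copy to keep stays a manual check.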

Fine-Tuning

One nuance: if one file is named “final final”, another “I hope this is the final” and a third “final final final final”, then you rename them. Each file is renamed according to its creation date. This lets you see within seconds which file was exported last, for example, and helps bring order to chaos.
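A minimal sketch of renaming by creation date, assuming the date is added as a prefix and the original name kept after it (the author may replace the name entirely). Creation time is platform dependent: macOS exposes `st_birthtime`, Windows reports creation time in `st_ctime` (and `st_birthtime` from Python 3.12), while Linux usually exposes neither through `os.stat`, so this sketch falls back to the modification time there. The function names are hypothetical, and it defaults to a dry run.

```python
import re
import sys
from datetime import datetime
from pathlib import Path

ALREADY_DATED = re.compile(r"^\d{4}-\d{2}-\d{2}_\d{6}_")


def creation_time(path: Path) -> float:
    """Best available creation timestamp; falls back to modification time."""
    st = path.stat()
    if hasattr(st, "st_birthtime"):  # macOS, BSD, Windows on Python 3.12+
        return st.st_birthtime
    if sys.platform == "win32":  # older Pythons on Windows: st_ctime is creation time
        return st.st_ctime
    return st.st_mtime  # Linux: creation time usually unavailable


def dated_name(path: Path) -> str:
    """Prefix the name with its creation time, e.g. 2024-03-15_140210_final final.mov."""
    stamp = datetime.fromtimestamp(creation_time(path)).strftime("%Y-%m-%d_%H%M%S")
    return f"{stamp}_{path.name}"


def rename_by_creation_date(folder: Path, dry_run: bool = True) -> list[tuple[Path, Path]]:
    """Plan (and, if dry_run is False, perform) creation-date renames for files in folder."""
    plan = []
    for path in sorted(p for p in folder.iterdir() if p.is_file()):
        if ALREADY_DATED.match(path.name):
            continue  # already renamed on an earlier pass
        target = path.with_name(dated_name(path))
        if target.exists():
            continue  # never overwrite; resolve clashes by hand
        plan.append((path, target))
        if not dry_run:
            path.rename(target)
    return plan
```

Running it once with the default `dry_run=True` shows the planned names before anything on the drive changes.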

Backing up the Consolidated Files

In an ideal world you would have enough terabytes, or even petabytes, of storage to back up the NAS, but this isn’t always possible. What I did instead was consolidate the data from the smaller external drives onto the central storage and then keep those drives, as they were, as a backup in case the NAS fails. This isn’t ideal, but it’s a good compromise when budget is limited.

Imagine that you have three or four one-terabyte drives and they are all getting full. If you have a two- or three-terabyte drive, you may be able to back up one or two of them, but you will run out of space for the others. The solution is to get a four- or five-terabyte hard disk. With this you can dump the four smaller drives onto it, and then start sorting projects and media assets by year-month-day and also by project name. With this workflow you can identify duplicates with ease.
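The sizing reasoning above is simple arithmetic, sketched below. The 10% headroom is my assumption (a nearly full drive is awkward to sort on), as are the function names; `shutil.disk_usage` reports usage for the whole volume at a mount point, not for a single folder.

```python
import shutil

GB = 1000 ** 3
TB = 1000 ** 4  # drive makers quote decimal units


def consolidation_fits(source_used: list[int], dest_free: int, headroom: float = 0.10) -> bool:
    """True if all source data fits on the destination with some space left over."""
    return sum(source_used) <= dest_free * (1 - headroom)


def used_bytes(mount_point: str) -> int:
    """Bytes in use on the volume mounted at mount_point."""
    return shutil.disk_usage(mount_point).used


# Four one-terabyte drives holding about 950 GB each (3.8 TB in total):
four_drives = [950 * GB] * 4
print(consolidation_fits(four_drives, 4 * TB))  # False: a 4 TB disk is too tight
print(consolidation_fits(four_drives, 5 * TB))  # True
```

With a three-terabyte drive the same check accepts two of those source drives but not three, which matches the "one or two drives" situation described above.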

Reducing the Number of Drives Plugged In

The aim of dumping drives A, B, C and D onto drive E is that you no longer need four drives plugged in at once. On drive E you can have a projects folder for 2024, with each source drive copied into its own folder. You open two Finder windows and start organising your projects by year-month-day-project-name, moving each project’s folder from the Drive A folder into the year folder. When you finish bringing in drive A, you repeat the process with drives B, C and D. When you spot duplicates, you delete them.
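The Finder-window shuffle can also be scripted. This sketch assumes a hypothetical layout in which each dumped drive lives in its own folder (for example `Dump/Drive A`) and projects are filed under `Projects/<year>`; only folders whose names start with a year-month-day date are moved. A folder whose name already exists in the year folder is left in place and reported, since it is probably a duplicate to review.

```python
import re
import shutil
from pathlib import Path

YEAR_PREFIX = re.compile(r"^(\d{4})-\d{2}-\d{2}")


def file_projects_by_year(drive_folder: Path, projects_root: Path) -> list[Path]:
    """Move dated project folders from one dumped drive into year folders.

    Returns the project folders left behind because a folder with the same
    name already exists in the year folder (likely duplicates to check).
    """
    clashes = []
    for project in sorted(p for p in drive_folder.iterdir() if p.is_dir()):
        match = YEAR_PREFIX.match(project.name)
        if not match:
            continue  # undated: place it in time by hand first
        year_folder = projects_root / match.group(1)
        year_folder.mkdir(parents=True, exist_ok=True)
        target = year_folder / project.name
        if target.exists():
            clashes.append(project)
            continue
        shutil.move(str(project), str(target))
    return clashes
```

Calling it once per drive folder, Drive A first and then B, C and D, mirrors the order described above.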

Removing Duplicates

As you progress, you go from having four terabytes of files to three, then two and a half, and so on. As duplicates are eliminated you regain that space, so your four-terabyte drive does not fill up instantly. Ideally you would have the four-terabyte drive mirrored on another four-terabyte drive. If space is tight, you can re-use the small drives instead, and 2024 can then be split across two or three of them. Now you have a primary storage solution and several secondary drives. If the four-terabyte drive fails, you either fail over to the clone or fail over to the re-used smaller drives.
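Duplicates that have been renamed or moved will not show up as same-named folders, so comparing file contents helps. This is a common technique, sketched here rather than taken from the author's workflow: group files by size first (a file with a unique size cannot have a duplicate), then hash only the candidates with SHA-256, reading in chunks so large video files never load fully into memory. It only reports groups; choosing which copy to delete is left to you.

```python
import hashlib
import os
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: Path) -> list[list[Path]]:
    """Return groups of files under root that have the same size and the same SHA-256 digest."""
    by_size: dict[int, list[Path]] = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = Path(dirpath, name)
            if path.is_file() and not path.is_symlink():
                by_size[path.stat().st_size].append(path)

    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # unique size: cannot be a duplicate
        by_hash: dict[str, list[Path]] = defaultdict(list)
        for path in same_size:
            by_hash[sha256_of(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```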

If the four-terabyte volume fails, you buy a new four-terabyte drive, copy the data back from the four backup drives, and within a short time you have recovered.
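After copying back, a quick check that nothing was missed is reassuring. This sketch assumes each backup drive keeps the same relative folder layout as the primary volume (for example `2024/2024-03-15-shoot/...`), and it compares file sizes only, which is fast but weaker than hashing. `missing_after_restore` is a hypothetical name.

```python
import os
from pathlib import Path


def relative_files(root: Path) -> dict[Path, int]:
    """Map each file's path relative to root to its size in bytes."""
    result = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = Path(dirpath, name)
            result[path.relative_to(root)] = path.stat().st_size
    return result


def missing_after_restore(backups: list[Path], restored: Path) -> list[Path]:
    """List backup files that are absent from, or a different size on, the restored volume."""
    restored_files = relative_files(restored)
    missing = []
    for backup in backups:
        for rel, size in relative_files(backup).items():
            if restored_files.get(rel) != size:
                missing.append(backup / rel)
    return sorted(missing)
```

An empty list means every file on the backup drives has a same-sized counterpart on the new drive.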

Shifting to exFAT

For a long time I was using macOS, so I had drives formatted with APFS or Mac OS Extended (Journaled). As I shifted from macOS to Windows and Linux, I needed drives that all three systems could read. By consolidating media, I gave myself the room I needed to back up a drive from one volume to another, reformat it as exFAT, and then move the data back.
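One snag when moving data from APFS back onto exFAT is that macOS allows file names exFAT does not. Per the exFAT specification, names may not contain control characters (0x00 to 0x1F) or any of `" * / : < > ? \ |`, and are limited to 255 UTF-16 code units. The sketch below scans a folder for such names before copying; the function names are my own, and it checks only those two rules.

```python
import os
from pathlib import Path

# Characters the exFAT specification does not allow in file names.
EXFAT_INVALID = set('"*/:<>?\\|') | {chr(c) for c in range(0x20)}


def is_valid_exfat_name(name: str) -> bool:
    """Check a single name against exFAT's forbidden characters and 255-unit length limit."""
    if not name or len(name.encode("utf-16-le")) // 2 > 255:
        return False
    return not any(ch in EXFAT_INVALID for ch in name)


def names_invalid_on_exfat(root: Path) -> list[Path]:
    """Return files and folders under root whose names exFAT would reject."""
    bad = []
    for dirpath, dirs, files in os.walk(root):
        for name in dirs + files:
            if not is_valid_exfat_name(name):
                bad.append(Path(dirpath, name))
    return sorted(bad)
```

Renaming whatever this reports before reformatting avoids copies failing halfway through the move back.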

If the Mac, the Windows machine or the Linux machine fails, I do not want to be forced to replace it with the same kind of machine just because of a file system. With exFAT I have the freedom to move from machine to machine fluidly.

And Finally

It can be overwhelming to see that you have ten to twenty hard drives that may contain duplicates. By getting a larger drive, you can go from a dozen drives, and a dozen places where things are stored, to a single place where everything is centralised. Once everything is centralised, you can order files in folders organised by year, and then by year-month-day. In doing so, you don’t need intimate knowledge of what you’re archiving. You just need to be able to organise things by year-month-day.

I know that I repeat this point a lot, but there is a reason. If you know when something was photographed, filmed or worked on, you can find its files within seconds rather than hours, even without the CMS, should the CMS fail. The other reason is that a project might have two or three names, depending on who worked on it. If the archive is well organised, tracing a project back upstream is easier.

This should be an iterative process. Start with the easiest task, then iterate and fine-tune the files and folders until everything is well organised and duplicates have been detected and consolidated or deleted, depending on context. Done well, it can be relaxing.

I’ve been consolidating files and data to free up drives to experiment with Immich, PhotoPrism and Nextcloud in parallel.