Organising Terabytes Fast
Imagine for a second that you have six hard drives filled with data. Some are four terabytes, others are one terabyte each, and you’ve already sorted personal videos and photos from other media files. The contents of each drive are copied to their own folder on a single large volume.
Two Terabyte Seagate has gone from being a drive to a folder. Bob 2017 has also become a folder, rather than a drive. Samantha 2018 is also a folder, rather than a drive. There is a reason for consolidating the media from smaller drives onto a large volume, and that is speed.
If you want to organise photos into a “photos” folder, videos into a “videos” folder, and then organise those photos by year, moving files and folders from location A to location B on the same drive takes seconds, because only the directory entries change and the data itself is not copied.
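As a rough illustration of that first pass, here is a minimal Python sketch that sorts files into photos/<year> and videos/<year> folders. The extension lists, the function names, and the use of the file's modification time as the year are all assumptions made for the example, not a fixed method. It defaults to a dry run so you can review the planned moves first.

```python
import shutil
from datetime import datetime
from pathlib import Path
from typing import Optional

# Assumed extension lists; extend them to match your own collection.
PHOTO_EXTS = {".jpg", ".jpeg", ".png", ".heic", ".cr2", ".dng"}
VIDEO_EXTS = {".mp4", ".mov", ".avi", ".mkv", ".mts"}


def target_folder(path: Path, root: Path) -> Optional[Path]:
    """Return root/photos/<year> or root/videos/<year>, or None for other files."""
    ext = path.suffix.lower()
    if ext in PHOTO_EXTS:
        kind = "photos"
    elif ext in VIDEO_EXTS:
        kind = "videos"
    else:
        return None
    # Assumption: the modification time is a good enough guess at the year.
    year = datetime.fromtimestamp(path.stat().st_mtime).year
    return root / kind / str(year)


def organise(source: Path, root: Path, dry_run: bool = True) -> list[tuple[Path, Path]]:
    """Plan (and optionally perform) moves from source into root/<kind>/<year>."""
    moves = []
    planned = set()
    # Build the full file list first so moving files does not disturb the walk.
    for path in sorted(source.rglob("*")):
        if not path.is_file():
            continue
        folder = target_folder(path, root)
        if folder is None:
            continue
        dest = folder / path.name
        # Never overwrite: name clashes are left in place for the duplicate check.
        if dest.exists() or dest in planned:
            continue
        planned.add(dest)
        moves.append((path, dest))
        if not dry_run:
            folder.mkdir(parents=True, exist_ok=True)
            shutil.move(str(path), str(dest))
    return moves
```

Calling `organise(Path("Bob 2017"), Path("."))` would only list the planned moves; passing `dry_run=False` performs them. Files whose names clash with something already in the target folder are deliberately left where they are, so nothing is overwritten before duplicates have been checked.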
To be more specific, you have volumes A, B and C with files in them, and there is a lot of overlap between them. You could compare the files in three locations to make sure that there are no duplicates across a spaghetti junction of physical drives, but by centralising everything you have a single volume to check, whether it’s a RAID or a single hard drive.
When everything is on a single volume, organising files becomes as easy as creating folders and moving files to the right folder. Video files from 2024 all go into videos/2024, videos from 2016 go to videos/2016. You may notice that I’m not going through the year-month-date yet. That’s because I find that iteration is faster. The aim is to identify the duplicates fast, and delete them from the aggregation/consolidation drive. Having three or four copies across three or four drives makes sense for data recovery. On a single drive it’s a waste of space.
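One way to find those duplicates on the consolidated volume is to compare file contents rather than names. The sketch below is an assumption about how you might do it, not a description of any particular tool: it groups files by size first, then hashes only the files that share a size, using SHA-256 from Python's standard library. The function names are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large videos are never loaded into memory whole."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root: Path) -> list[list[Path]]:
    """Return groups of files under root that have identical contents."""
    # Cheap first pass: files of different sizes cannot be identical.
    by_size = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file():
            by_size[path.stat().st_size].append(path)

    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue
        # Expensive second pass: hash only the files that share a size.
        by_hash = defaultdict(list)
        for path in same_size:
            by_hash[file_digest(path)].append(path)
        groups.extend(sorted(paths) for paths in by_hash.values() if len(paths) > 1)
    return groups
```

Each returned group lists the paths holding the same content, so you can keep one and review the rest before deleting anything. The size check is what keeps this quick on a year folder full of large videos, since most files never need to be hashed.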
Once duplicates have been identified and removed, time can be spent on standardising date formats and file names. Remember, eventually you can move the well-organised files back to a smaller drive as a low-cost backup.
10 Terabytes Become Four
This figure is an imaginary one. The point is that if you have data across 6-8 drives, and their storage amounts to 10-12 terabytes, but a lot of that data is duplicates, then the real space needed and used is lower. You should have two local backups and one off-site backup.
By copying data from multiple drives to a single drive it becomes easy to get everything organised, and once it is organised you can dump that data, in an organised manner, back to external drives.
For example, you could have a drive for photos from 2010-2024, and another for videos from 2020-2024, and so on. Usually I don’t print labels for drives, so it can get confusing. That’s where I like to use post-it notes. They’re quick, cheap, and versatile, and they need to last only until you finish organising your files.
If I had a label maker I might print labels. I would consider printed labels once things are finalised, rather than when they’re in flux. Post-its are good for constant change.
Knowing What You Have and Where
If you back up from your laptop to drive A when you run out of space, and then to drive B when you run out of space again, you end up with duplicates, triplicates, maybe even five or six copies of the same file. The problem is that because the backups are decentralised, it’s easier to back everything up again and be safe than to assume a file or folder is backed up when it isn’t.
By aggregating smaller drives to a big drive you gain control of your former chaos. You go from thinking you need ten terabytes to realising you need four to six terabytes instead.
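To put a number on that difference, you could compare the total size of everything on the consolidated volume with the size of the unique content. This is a minimal sketch under the assumption that two files count as the same only when their SHA-256 hashes match. `space_summary` is a hypothetical name, and hashing every file will be slow on terabytes of data.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files are not read into memory at once."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def space_summary(root: Path) -> tuple[int, int]:
    """Return (total_bytes, unique_bytes) for all files under root."""
    total = 0
    unique = {}  # content hash -> size of one copy
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        size = path.stat().st_size
        total += size
        unique[file_digest(path)] = size
    return total, sum(unique.values())
```

The gap between the two numbers is the space currently taken up by extra copies, which is the space you would get back after removing duplicates.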
Two Motivations
The first motivation for finally doing this properly is that I noticed a few years ago that I had lost track of hundreds of files from the uni years and I wanted to recover them. Between Picasa, iPhoto, Aperture, Google Photos and other solutions I lost track of these files. Now that I have regained track of them I can take advantage.
My second motivation is that PhotoPrism and Immich look like interesting solutions. In the good old days I had so few photos that they fit on my laptop’s drive with ease, but in the age of having a camera with us everywhere we go we end up with thousands of photos. These take up space, and with a self-hosted solution like PhotoPrism or Immich we can keep track of these images with ease.
Estimating Cost
For the sake of argument, let’s say that you have twenty drives, and they vary from 750 gigabytes to 8 terabytes in size. In theory you would assume that you need a 30+ terabyte RAID to back up all that data. The issue is that this 30 terabyte RAID costs hundreds, if not thousands, of francs. If you first work out how much space you really need, using the smaller drives you already have, then you get a better idea of how much storage to buy.
An empty four-bay Synology device costs over four hundred francs, and that’s before you get disk drives. With the disk drives you get to 1500 CHF. I am not against getting a Synology or other device. I’m encouraging people to get into good habits, to ensure that there are two local copies and a third off-site backup, rather than 15 drives that all have similar files.
And Finally
I chose to write this blog post today because yesterday I suddenly felt overwhelmed by the amount of data I felt I still had to consolidate and by how little space I had relative to the requirement. Initially my idea was to dump all the data from the smaller drives to the central server, but I don’t have that much storage.
By organising files into photos, documents, and videos, and then organising them by year, I can quickly detect and delete duplicates. This helps me streamline how much storage I need, and I can then back up that data with photos on one drive, videos on another, and documents on a third, for example. I could also organise them chronologically.
By organising the video and photo files on a single volume, though, I prepare them to be indexed and catalogued by either PhotoPrism or Immich, or both, to see which one I prefer over time.