Storing Your Digital Assets: A Practical Guide for Photographers
Posted by: Mr. Tech in Backup, Digital Photography, Hard Drives & Storage, Mac, Small Business Corner, Windows
Terms like “data storage”, “backup”, and “disaster recovery planning” used to be primarily associated with pocket-protector-wearing nerds who inhabit the cubicles and refrigerator-like datacenters of Information Technology departments in large corporations. (I’m allowed to cast such libelous aspersions since I myself transitioned from an exceedingly Kafkaesque, corporate IT background into the world of digital photography.)
But now that photography has largely gone digital, photographers find themselves having to decide how to store, organize, and secure the lifeblood of their business: digital images, whether in RAW format, as JPEGs, TIFFs, or retouched Photoshop files.
While many different computer programs exist that can organize, manipulate, and retrieve digital images (applications like iPhoto, Adobe Lightroom, Aperture, and iView Media Pro, to name a few), the focus of this article is the physical media used to store and back up this data, regardless of what applications are used.
Direct Attached Storage:
The simplest method of saving data to an external hard drive is to use the physical ports commonly found on most computers, namely USB and FireWire.
Unless your computer is really old, it probably has USB (Universal Serial Bus) 2.0 ports. USB version 1.1 was quite slow (12 “Mbps”, or megabits per second, at best), whereas USB 2.0 supports a theoretical data rate of 480 Mbps.
(NOTE: I don’t think you need to understand “megabits per second”, so I won’t explain further, but if you do, well, that’s why the gods invented Wikipedia!)
For comparison’s sake, FireWire “400” (a.k.a. “1394a”) is roughly 400 Mbps, while FireWire “800” (a.k.a. “1394b”) is, you guessed it, roughly 800 Mbps. A new Apple MacBook Pro (which is a laptop) comes with both FireWire 400 and FireWire 800 ports, and if these ports aren’t already built into your computer, add-in cards that provide them can be purchased for both Apple and Microsoft operating systems.
Moving up the speed chain comes eSATA, which is roughly 3,000 Mbps – a lot faster than even the fastest FireWire connection currently available. Most computers do not come with eSATA ports, but cards can be purchased that provide them.
Many external hard drives now support multiple connections, e.g., the “G-DRIVE Q” with its “Quad Interface”.
It has four types of connections: eSATA, FireWire 400, FireWire 800, and USB 2.0. So, if you’re using a computer with the high-speed eSATA port, you’d want to use it to copy data to the hard drive to achieve the fastest data transfer speed (throughput), but someone on an older machine could still connect the same drive through one of the older, slower ports (FireWire or USB). This allows maximum flexibility. Even if you don’t yet have an eSATA card in your computer, I suggest purchasing a drive with an eSATA port anyway so you can take advantage of that added speed in the future. It’s a sort of insurance or hedge against the inevitable built-in obsolescence of the technology we purchase.
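To get a feel for what these numbers mean in practice, here is a rough Python sketch that estimates best-case copy times using the theoretical rates quoted above. The function name and the 16 GB card size are made up for illustration, and real-world throughput will be noticeably lower than these theoretical peaks.

```python
# Theoretical peak rates quoted in this article, in megabits per second.
INTERFACE_MBPS = {
    "USB 2.0": 480,
    "FireWire 400": 400,
    "FireWire 800": 800,
    "eSATA": 3000,
}


def transfer_seconds(size_gigabytes, rate_mbps):
    """Best-case seconds to move size_gigabytes at rate_mbps (decimal units)."""
    megabits = size_gigabytes * 1000 * 8  # 1 GB = 1,000 MB; 1 byte = 8 bits
    return megabits / rate_mbps


if __name__ == "__main__":
    card_gb = 16  # hypothetical: one full 16 GB memory card
    for name, rate in INTERFACE_MBPS.items():
        print(f"{name}: {transfer_seconds(card_gb, rate):.0f} s")
```

By this estimate, the same card that takes about 160 seconds over FireWire 800 would take well under a minute over eSATA, at least on paper.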
RAID:
RAID stands for “Redundant Arrays of Inexpensive Disks” (some say “Independent” Disks: for some odd reason, the technology world is fraught with strongly contested, ambiguous acronyms).
If hard drives were perfect, there would be no need for RAID, but hard drives can and do fail, potentially rendering all of their data irretrievable. This can be due to environmental factors (fire, water, a freak Fluffernutter accident, etc.), but sometimes drives simply break through no fault of the owner and for no external reason at all.
According to a Carnegie Mellon study, hard drives fail 15 times more frequently than hard drive vendors claim (http://www.pcworld.com/article/id,129558/article.html). If the data you store is critical to your business, you should assume that the external hard drive you’re using to store data may fail at any time. The use of RAID can mitigate this risk to some degree.
RAID 1:
“RAID 1”, also known as mirroring, is most commonly implemented with just two hard drives: everything written to hard drive A is also written to hard drive B. If either hard drive fails, you still have a complete copy of your data. If both drives fail, however, you’ll need to restore from backup (this assumes you have a backup!). If a single hard drive fails, you can simply replace the failed drive to rebuild the RAID 1.
The “G-SAFE” is a good example of a RAID 1 solution (http://www.g-technology.com/Products/G-SAFE.cfm). It’s basically an enclosure with two hard drives: all data written to drive A is also written to drive B.
Bear in mind this also means you are paying a price for the redundancy of RAID 1: if each drive is 1 terabyte in size, you still have only 1 terabyte of usable space, since each drive contains an exact copy of the other.
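If it helps to see mirroring in action, here is a toy Python model of a two-drive RAID 1. It is only a sketch: the class and method names are invented for illustration, each “drive” is just a dictionary in memory, and a real RAID controller works at the level of disk blocks rather than files.

```python
class MirroredPair:
    """Toy model of a two-drive RAID 1: every write lands on both drives."""

    def __init__(self):
        # Each "drive" maps a file name to its contents; None marks a failed drive.
        self.drives = [{}, {}]

    def write(self, name, data):
        # Mirroring: the same data goes to every working drive.
        for drive in self.drives:
            if drive is not None:
                drive[name] = data

    def fail(self, index):
        # Simulate a dead drive.
        self.drives[index] = None

    def read(self, name):
        # Any surviving drive holds a complete copy.
        for drive in self.drives:
            if drive is not None:
                return drive[name]
        raise OSError("both drives failed: restore from backup")

    def replace(self, index):
        # Rebuild: copy everything from the surviving drive onto the new one.
        survivor = next((d for d in self.drives if d is not None), None)
        if survivor is None:
            raise OSError("nothing left to rebuild from: restore from backup")
        self.drives[index] = dict(survivor)


if __name__ == "__main__":
    pair = MirroredPair()
    pair.write("wedding_001.raw", b"pixels")
    pair.fail(0)                         # drive A dies
    print(pair.read("wedding_001.raw"))  # still readable from drive B
    pair.replace(0)                      # swap in a new drive and rebuild
```

Notice that the pair never holds more than one drive’s worth of distinct data, which is exactly the capacity price described above.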
While the G-SAFE is advertised as “The Perfect Storage Solution for Professional Digital Photographers,” that’s just marketing speak: any brand RAID 1 enclosure from any vendor will protect data of any kind.
While my examples thus far have been of products from G-Technology, there are a plethora of other hardware vendors available. G-Technology drives tend to look cool, especially with Mac hardware, but you’ll pay a premium for their style and name. I own some myself, but they’re certainly not a bargain solution.
RAID 0:
“RAID 0” is also known as “striping”: it is actually NOT redundant. (Hence the contention that RAID actually stands for “Random Array of Independent Disks”; again, don’t get me started on the rampant acronym ambiguity!)
In fact, one may argue that RAID 0 is statistically less reliable than using just a single hard drive, since the data is striped between two or more drives to improve read/write performance, and losing any one of those drives loses the whole array. With striping between two drives, half the data is written to one drive and half to the other. This is useful for video editing when single hard drive speeds are insufficient, but again, it provides no redundancy whatsoever. If you’re buying hard drives for redundancy, do NOT buy a RAID 0 solution.
I bring up RAID 0 because many enclosures are sold which come with two drives that can be configured as either a RAID 0 or a RAID 1 device: this is perfectly acceptable, just be sure to configure it as a RAID 1 for redundancy.
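To put a rough number on why striping is riskier than a single drive while mirroring is safer, here is a small Python sketch. It assumes drives fail independently and ignores the rebuild window, both of which are simplifications, and the 5% failure probability is a made-up figure for illustration, not a measured one.

```python
def raid0_loss_probability(drive_failure_prob, drives=2):
    """Striped data is lost if ANY drive in the set fails."""
    return 1 - (1 - drive_failure_prob) ** drives


def raid1_loss_probability(drive_failure_prob, drives=2):
    """Mirrored data is lost only if ALL drives in the set fail."""
    return drive_failure_prob ** drives


if __name__ == "__main__":
    p = 0.05  # hypothetical chance that one drive fails in a given period
    print(f"single drive:   {p:.4f}")
    print(f"2-drive RAID 0: {raid0_loss_probability(p):.4f}")
    print(f"2-drive RAID 1: {raid1_loss_probability(p):.4f}")
```

Under these assumptions a two-drive stripe is nearly twice as likely to lose data as a single drive, while a two-drive mirror is far less likely to.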
RAID 0+1:
You might see “RAID 0+1,” “RAID 1+0,” or my least favorite nomenclature, “RAID 10”. For a buyer’s purposes they come to the same thing: striping combined with mirroring, so you get the speed advantages of striping (RAID 0) together with the redundancy of mirroring (RAID 1). (Strictly speaking, RAID 0+1 mirrors two striped sets while RAID 1+0 stripes across mirrored pairs; the usable capacity is the same either way.) The disadvantage? If you have four 500 GB drives in a RAID 0+1 configuration, you will get 1,000 GB (about 1 TB) of usable space. Remember, mirroring always cuts your usable storage in half due to the redundancy.
RAID 5:
RAID 5 requires three or more hard drives and provides redundancy such that if one drive in the array fails, no data will be lost. If two drives fail, you must restore from backup. You’ll find RAID 5 arrays with as few as three drives and others with more than ten. You give up one drive’s worth of capacity to redundancy, leaving “n-1” drives of usable space; e.g., if you have three 500 GB hard drives in a RAID 5 configuration, you’ll have about 1,000 GB of storage (about 1 TB). If you have ten 500 GB drives in a RAID 5 array, you’ll have about 4,500 GB of usable space, or 4.5 TB.
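The capacity arithmetic for the RAID levels covered so far can be summed up in a few lines of Python. This is a sketch under simplifying assumptions: all drives in the array are the same size, formatting overhead is ignored, and the function name and the level labels ("0", "1", "10", "5") are my own shorthand.

```python
def usable_capacity_gb(level, drive_count, drive_size_gb):
    """Approximate usable space for an array of identical drives."""
    if level == "0":
        if drive_count < 2:
            raise ValueError("RAID 0 needs at least two drives")
        return drive_count * drive_size_gb          # striping: no space lost
    if level == "1":
        if drive_count < 2:
            raise ValueError("RAID 1 needs at least two drives")
        return drive_size_gb                        # every drive is a full copy
    if level == "10":
        if drive_count < 4 or drive_count % 2:
            raise ValueError("RAID 10 needs an even number of drives, four or more")
        return (drive_count // 2) * drive_size_gb   # mirroring halves the space
    if level == "5":
        if drive_count < 3:
            raise ValueError("RAID 5 needs at least three drives")
        return (drive_count - 1) * drive_size_gb    # one drive's worth is lost
    raise ValueError(f"unsupported RAID level: {level}")


if __name__ == "__main__":
    print(usable_capacity_gb("10", 4, 500))  # four 500 GB drives, striped and mirrored
    print(usable_capacity_gb("5", 3, 500))   # three 500 GB drives
    print(usable_capacity_gb("5", 10, 500))  # ten 500 GB drives
```

The three calls at the bottom reproduce the examples in this article: 1,000 GB, 1,000 GB, and 4,500 GB respectively.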
Network Attached Storage:
RAID 1 and RAID 5 are not only used in hard drive enclosures that are meant to connect directly to one computer at a time (whether using eSATA, FireWire, or USB). Many servers or “Network Appliances” are sold that utilize RAID 1 and RAID 5 technology with the added advantage that they can be accessed over a network, using Ethernet cables or WiFi (wireless) connections. This allows multiple workstations to connect to a centralized data storage device and access the same network shares. However, accessing a server over the network is bottlenecked by the speed of the Ethernet connection itself, which is often 10 or 100 Mbps. Gigabit (1,000 Mbps) Ethernet connectivity to the server is recommended; you just have to invest in the network infrastructure to support this (i.e., gigabit network switches as well as gigabit network cards in each workstation).
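The bottleneck idea is simple enough to sketch in Python: however fast the drives inside the server are, a copy can go no faster than the slowest link along the way. The function name and the 10 GB folder size are invented for illustration, and the results are theoretical best cases that ignore protocol overhead.

```python
def copy_seconds(size_gigabytes, *link_rates_mbps):
    """Best-case copy time when data must cross every link in link_rates_mbps."""
    bottleneck_mbps = min(link_rates_mbps)  # the slowest link sets the pace
    return size_gigabytes * 1000 * 8 / bottleneck_mbps


if __name__ == "__main__":
    folder_gb = 10  # hypothetical: a 10 GB folder of images
    # Server drives at eSATA-class speed (3,000 Mbps) behind two kinds of network.
    print(f"100 Mbps Ethernet: {copy_seconds(folder_gb, 3000, 100):.0f} s")
    print(f"Gigabit Ethernet:  {copy_seconds(folder_gb, 3000, 1000):.0f} s")
```

On paper, moving from 100 Mbps to gigabit Ethernet cuts the copy from roughly 800 seconds to roughly 80, which is why the network upgrade is worth the investment.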
For those who require even faster access to a server, Fibre Channel cards can be used to connect each workstation to a backend server or SAN (Storage Area Network).
The servers themselves often use fast “SCSI” hard drives that are either locally attached or accessed using the iSCSI protocol. SCSI drives are often much faster than IDE hard drives. (Most likely, every hard drive you’ve ever used is an IDE drive). With IDE drives, the hard drive controller is built into the drive itself, so there is no separate card required to communicate with the hard drives as with SCSI. Most servers use SCSI hard drives and most workstations use IDE.
SATA drives have become more common as well, mostly in the server market. They are generally cheaper and slower than SCSI drives, but “SATA 3.0” is almost as fast as SCSI at a much lower cost. SCSI drives also provide greater sustained throughput, so if you need the best performance, SCSI is still king, but a SATA solution is probably more cost effective per megabyte.
Datacenters:
Having your own server requires that you have adequate IT support to deploy and support it. It also requires that you have enough power to run the server and enough air conditioning to cool it. Ideally you should have redundant power as well.
This is why it’s often better to colocate your server in someone else’s datacenter, one with redundant power, adequate cooling, redundant Internet connectivity, and offsite backups. The facility should also have adequate, proven disaster recovery capabilities, tight security, and 24/7 monitoring of the server in case something goes wrong.
If you do locate your own server in a datacenter, remember that your Internet connectivity had better be good, since when it’s down you can’t access your remote server!
Another option is to utilize “Managed Services”. Instead of purchasing your own server, you simply pay for the service of having a certain amount of disk space in someone’s datacenter. This way you need not worry about how many servers they have, what type of drives they’re using, etc. Typically you pay a flat monthly fee for a certain amount of disk space that is tied to a Service Level Agreement (SLA). If the datacenter that’s hosting your data is inaccessible for some reason, they can offer a refund or free service to make up for the outage.
Conclusion:
Hopefully, understanding the connectivity options for a hard drive (or hard drive array), as well as the different RAID levels, will make you a much more educated consumer. Plus you can impress your friends with your refined understanding of storage options!