Microsoft has given us some genuinely exciting improvements to storage technologies over the last couple of years, and Data Deduplication stands out as a significant one.
Plan A for storing files on a hard drive is so straightforward that it almost goes without saying. Plan A says that if you want to store a file on a drive, and you have permission to store it there, and there’s room for it, it gets stored. Obvious, right?
There’s a problem with Plan A, but you don’t start to notice it until you’re a large corporate organization. Consider the situation where HR updates the company handbook and emails a copy to everybody in the organization. Hundreds of employees find the attachment in their inbox and save it to their My Documents folder. In large organizations, though, that location isn’t local on the user’s hard drive; it’s up on a file server thanks to Folder Redirection. So now there are hundreds of redundant copies of the same 10 MB PDF on the file server that you manage. So what are you going to do about it, huh?
Here’s what you’re going to do: you’re going to turn on Data Deduplication. It’s built into Windows Server 2012 and 2012 R2, and can be activated simply by installing the feature (it’s a role service underneath File and Storage Services), and then right-clicking a volume in Server Manager, choosing “Configure Data Deduplication”, and enabling deduplication for that volume.
And it’s just that simple. Windows now starts watching that volume, looking for ways to optimize the use of storage on it by analyzing each file as a set of content-dependent chunks. On a server that holds the My Documents folders for our users, we can expect that maybe 30 percent of the data on those drives is redundant. So with a 1 TB hard drive used to capacity, after deduplication we might end up with 700 GB of used space and another 300 GB available, for free! Those formerly redundant copies of the file are now served from one copy of its content, or maybe from a couple of copies if it’s a heavily used file.
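To make the chunk idea concrete, here is a minimal Python sketch of a store that keeps each unique chunk only once. It is an illustration under stated assumptions, not how Windows implements the feature: the `ChunkStore` class, the `make_handbook` and `store_copies` helpers, the fixed 64 KiB chunk size, and the use of SHA-256 are all hypothetical choices made for the example, and a 1 MiB block of generated bytes stands in for the 10 MB handbook PDF.

```python
import hashlib

# Assumption: fixed 64 KiB chunks keep the sketch short. The feature described
# above uses content-dependent chunk boundaries instead.
CHUNK_SIZE = 64 * 1024


class ChunkStore:
    """Toy content-addressed store: every unique chunk is kept exactly once."""

    def __init__(self):
        self.chunks = {}  # SHA-256 digest -> chunk bytes
        self.files = {}   # file name -> ordered list of chunk digests

    def add_file(self, name, data):
        digests = []
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).digest()
            # Store the chunk only if this content has not been seen before.
            self.chunks.setdefault(digest, chunk)
            digests.append(digest)
        self.files[name] = digests

    def read_file(self, name):
        # Reassemble the file from its chunk list, transparently to the reader.
        return b"".join(self.chunks[d] for d in self.files[name])

    def logical_size(self):
        # Bytes the users believe they have stored.
        return sum(len(self.chunks[d]) for ds in self.files.values() for d in ds)

    def physical_size(self):
        # Bytes actually held in the chunk store.
        return sum(len(chunk) for chunk in self.chunks.values())


def make_handbook(size_bytes=1024 * 1024):
    """Deterministic stand-in for the handbook: 1 MiB of non-repeating bytes."""
    blocks = (hashlib.sha256(i.to_bytes(4, "big")).digest()
              for i in range(size_bytes // 32))
    return b"".join(blocks)


def store_copies(copies=200):
    """Save the same handbook once per user, as in the HR email scenario."""
    handbook = make_handbook()
    store = ChunkStore()
    for user in range(copies):
        store.add_file(f"user{user}/handbook.pdf", handbook)
    return store


if __name__ == "__main__":
    store = store_copies()
    print("logical bytes: ", store.logical_size())
    print("physical bytes:", store.physical_size())
```

In this toy version, 200 identical copies cost the same physical space as one, and every user still reads back the full file. Real savings depend on how much of the data actually repeats.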
How about drives where you store the installation media for software your organization uses? Think about how much redundant content is in application install packages. Many of them use identical installation software. Many of them carry redundant copies of the commonly used DLLs that make up the underlying plumbing of the application. Many apps package whole copies of the .NET Framework, the C++ runtime, or other API content in their installation media. Microsoft is seeing reductions in disk consumption in the area of 75 percent. So, that 1 TB drive we were using before? It’s now down to 250 GB used, with 750 GB free!
And we can do even better than that. How about storage for virtual machines? Each of those VMs has a virtual hard drive, stored as a VHD file. That file contains the entire operating system used by a VM. If you’ve got a dozen servers running in VMs on top of a Windows Server 2012 Datacenter host, you’ve got a lot of redundancy. Consider that each of those VMs contains a nearly identical copy of the Windows\System32 directory. That’s a couple of dozen gigabytes of totally redundant Microsoft code. How about we store just one of those sets of files, and not 12? Now we’re looking at savings in the neighborhood of 90 percent compared to the original size of the stored files.
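The arithmetic behind those three scenarios is simple enough to sketch in a few lines of Python. The function names below are hypothetical, and the percentages are just the ballpark figures quoted above (30, 75, and 90 percent), not measurements.

```python
def used_after_dedup(logical_gb, savings_pct):
    """Physical space still consumed once the given share of data is deduplicated."""
    if not 0 <= savings_pct <= 100:
        raise ValueError("savings_pct must be between 0 and 100")
    return logical_gb * (100 - savings_pct) / 100


def effective_capacity(physical_tb, savings_pct):
    """Logical data that fits on a volume at a given savings rate."""
    if not 0 <= savings_pct < 100:
        raise ValueError("savings_pct must be at least 0 and below 100")
    return physical_tb * 100 / (100 - savings_pct)


if __name__ == "__main__":
    # A 1 TB (1000 GB) volume, full before deduplication.
    for label, pct in [("user documents", 30), ("install media", 75), ("VM storage", 90)]:
        used = used_after_dedup(1000, pct)
        print(f"{label}: {used:.0f} GB used, {1000 - used:.0f} GB free")
    # How much logical data fits on 10 TB of disk at 90 percent savings.
    print(f"10 TB at 90 percent savings holds about {effective_capacity(10, 90):.0f} TB")
```

Note how quickly the effective capacity grows as the savings rate climbs: every step closer to 100 percent multiplies what fits on the same disks.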
Can you imagine asking for approval from the boss to buy more storage, and having this conversation:
Boss: You just asked me for another 10 TB of storage six months ago — are you sure you need this storage?
You: Well, we’ve stored 80 terabytes of data in those 10 terabytes of space you bought. We might be able to get all the way up to 90 terabytes if we push it, but I think it’s time to give us some more headroom.
Boss: Wait, you’ve stored 80 terabytes of data in a 10 tera … um, fine, go ahead and make the purchase. Will they accept our corporate credit card?
In Windows Server 2012 and 2012 R2, Microsoft has recommended Data Deduplication for volumes that are 10 TB or smaller, but that all changes in Windows Server 2016, which scales all the way up to a 64 TB volume. It provides great performance even on busy systems: dedup does its heaviest work when the system is idle, and backs off when the system is busy doing “real work.” Users transparently access their deduplicated data with no awareness that the technology is even in use, and they’ll wonder why the IT manager has such a big smile!