Windows Server 2012 Deduplication is Amazing!

Saturday, February 16, 2013

This article describes how to use Windows Server data deduplication on a solid-state drive (SSD) that holds active Hyper-V virtual machines.

Coloring Outside the Lines Statement:
This configuration is not supported by Microsoft. See Plan to Deploy Data Deduplication for more information. Use these procedures at your own risk. That said, it works great for me. Your mileage may vary.

A while back I decided to add another 224GB SATA III SSD to my blistering Windows Server 2012 Hyper-V server for my active VMs. The performance is outstanding and it makes the server dead silent. I moved my primary always-on Hyper-V VM workloads to this new SSD:

Domain Controller on WS2012
Exchange 2010 multi-role server on WS2012
TMG server on WS2008 R2

These VMs took 134GB, or 60%, of the drive's capacity, which was fine at the time. Later, I added a multi-role Exchange 2013 server, which took up another 60GB of space. That left me with only 13% free space, which didn't leave much room for VHD expansion and certainly not enough to host any other VMs. Rather than buy a larger and more expensive SSD, I decided to see how data deduplication performs in Windows Server 2012.
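The capacity figures above work out as follows. This treats the nominal 224GB as the drive's capacity, so the percentages are approximate:

```latex
\begin{aligned}
\text{original three VMs} &: \frac{134}{224} \approx 0.60 \\
\text{after adding Exchange 2013} &: 134 + 60 = 194\,\text{GB used}, \quad 224 - 194 = 30\,\text{GB free} \\
\text{free fraction} &: \frac{30}{224} \approx 0.13
\end{aligned}
```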

Add the Data Deduplication Feature

Data Deduplication is a feature of the File and Storage Services role in Windows Server 2012. It's not installed by default, so you need to install it using the Add Roles and Features Wizard or by using the following PowerShell commands:

PS C:\> Import-Module ServerManager
PS C:\> Add-WindowsFeature -Name FS-Data-Deduplication
PS C:\> Import-Module Deduplication

Next, you need to enable data deduplication on the volume. Use the File and Storage Services node of Server Manager and click Volumes. Then right-click the drive you want to configure for deduplication and select Configure Data Deduplication, as shown below:

Configuring Data Deduplication on Volume X:

So far, this is the standard way to configure deduplication for a volume. Normally you would set deduplication to process files older than X days, enable background optimization, and schedule throughput optimization to run at specified days and times. It's pretty much a "set it and forget it" configuration.

From here on I'm going to customize deduplication for my Hyper-V SSD.

In the Configure Data Deduplication Settings for the SSD, select Enable data deduplication and configure it to deduplicate files older than 0 days. Click the Set Deduplication Schedule button and uncheck Enable background optimization, Enable throughput optimization, and Create a second schedule for throughput optimization.

Enable Data Deduplication for Files Older Than 0 Days

Disable Background Optimization and Throughput Optimization Schedules

Click OK twice to finish the configuration. What we've done is enable data deduplication for all files on the volume, but deduplication will not run in real time or on a schedule. Note that these deduplication schedule settings are global and affect all drives configured for deduplication on the server.

You can also configure these data deduplication settings from PowerShell using the following commands:

PS C:\> Enable-DedupVolume X:
PS C:\> Set-DedupVolume X: -MinimumFileAgeDays 0
PS C:\> Set-DedupSchedule -Name "BackgroundOptimization", "ThroughputOptimization", "ThroughputOptimization-2" -Enabled $false
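To confirm what those commands changed, you could use a small helper like the one below. It is only a sketch. The function name Show-DedupSettings is hypothetical, and X: is assumed to be the deduplicated volume. The property names passed to Format-List are the ones I'd expect Get-DedupVolume to expose.

```powershell
# Hypothetical helper (not part of the Deduplication module): displays the
# per-volume and server-wide settings changed above. Run elevated on the host.
function Show-DedupSettings {
    param(
        [string]$Volume = 'X:'   # assumed deduplicated volume
    )

    # Per-volume settings: expect Enabled = True and MinimumFileAgeDays = 0
    Get-DedupVolume -Volume $Volume |
        Format-List -Property Volume, Enabled, MinimumFileAgeDays

    # Server-wide schedules: the optimization schedules disabled above
    # should now show Enabled = False
    Get-DedupSchedule | Format-Table -AutoSize
}
```

Running Show-DedupSettings after the three commands above should show the volume enabled with a minimum file age of 0, and the optimization schedules disabled.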

This configuration works around the reason Microsoft does not support data deduplication on drives that host Hyper-V VMs. Mounted VMs are always open for writing and have a fairly large change rate. This is why Microsoft says, "Deduplication is not supported for files that are open and constantly changing for extended periods of time or that have high I/O requirements." With the optimization schedules disabled, optimization runs only when you start it manually, so you can make sure it never processes VHDs that are open and changing.

To deduplicate the files and recover substantial disk space, you need to shut down the VMs hosted on the volume and then run deduplication manually with this command:

PS C:\> Start-DedupJob -Volume X: -Type Optimization
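You can script the shut-down-then-optimize sequence on the Hyper-V host. The sketch below makes a few assumptions: the WS2012 Hyper-V and Deduplication modules are available, X: is the deduplicated volume, and Start-OfflineDedup is a hypothetical function name. It shuts down only running VMs that have at least one virtual hard disk on that volume, queues the optimization job, and returns the stopped VMs so they can be restarted later.

```powershell
# Hypothetical helper: shuts down running VMs whose virtual disks live on the
# deduplicated volume, then queues a manual optimization job on that volume.
# Assumes the WS2012 Hyper-V and Deduplication modules; run elevated.
function Start-OfflineDedup {
    param(
        [string]$Volume = 'X:'   # assumed deduplicated volume
    )

    # Running VMs with at least one virtual hard disk under $Volume
    $vms = @(Get-VM | Where-Object {
        $_.State -eq 'Running' -and
        @(Get-VMHardDiskDrive -VM $_ |
            Where-Object { $_.Path -like "$Volume\*" }).Count -gt 0
    })

    foreach ($vm in $vms) {
        Write-Host "Shutting down $($vm.Name)"
        # Clean guest shutdown; relies on the guest's integration services
        Stop-VM -VM $vm
    }

    # Queue the optimization job; discard the job object it returns
    $null = Start-DedupJob -Volume $Volume -Type Optimization

    # Return the stopped VMs so the caller can restart them afterwards
    return $vms
}
```

For example, you could run $stopped = Start-OfflineDedup -Volume X:. Once the job has finished, foreach ($vm in $stopped) { Start-VM -VM $vm } would bring the same VMs back up.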

This manual deduplication job can take some time to run depending on the amount of data and the speed of your drive. In my environment it took about 90 minutes to deduplicate a 224GB SATA III SSD that was 87% full. You can monitor the progress of the deduplication job at any time using the Get-DedupJob cmdlet. The cmdlet shows the percentage of progress, but does not return any output once the job finishes.
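Get-DedupJob returns nothing once the job is done, so you can poll its output until it comes back empty. This sketch assumes X: is the volume and uses a hypothetical function name, Wait-DedupJobCompletion. It also assumes the job objects expose Type, Progress, State, and Volume properties, which are the columns Get-DedupJob shows by default.

```powershell
# Hypothetical helper: polls Get-DedupJob until no job remains for the volume.
# Run on the host after Start-DedupJob; requires the Deduplication module.
function Wait-DedupJobCompletion {
    param(
        [string]$Volume = 'X:',   # assumed deduplicated volume
        [int]$PollSeconds = 60    # how often to check progress
    )

    while ($true) {
        # Get-DedupJob returns nothing once the job has finished
        $jobs = @(Get-DedupJob -Volume $Volume -ErrorAction SilentlyContinue)
        if ($jobs.Count -eq 0) { break }

        foreach ($job in $jobs) {
            Write-Host ("{0} job on {1}: {2}% ({3})" -f $job.Type, $job.Volume, $job.Progress, $job.State)
        }
        Start-Sleep -Seconds $PollSeconds
    }
    Write-Host "No deduplication jobs remain on $Volume."
}
```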

You can also monitor the job using Resource Monitor, as shown below:

Resource Monitor During Deduplication

Here you can see that the Microsoft File Server Data Management Host process (fsdmhost.exe) is processing the X: volume. When the deduplication process completes, the X: volume queue length will return to 0.

Once deduplication completes, you can restart your VMs and check the deduplication rate and how much space has been recovered. From the File and Storage Services console, right-click the volume and select Properties:

Properties of Deduplicated SSD Volume

Here we can see that 256GB of raw data has been deduplicated to 61.5GB on this 224GB SSD - a savings of 75%!!! That leaves 162GB of raw disk storage free. I could easily create or move additional VMs to this disk and run the deduplication job again.
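You can check the numbers from the Properties dialog by hand. Working from the rounded figures above:

```latex
\begin{aligned}
\text{space saved} &= 256\,\text{GB} - 61.5\,\text{GB} = 194.5\,\text{GB} \\
\text{savings rate} &= \frac{194.5}{256} \approx 0.76 \\
\text{free space} &= 224\,\text{GB} - 61.5\,\text{GB} = 162.5\,\text{GB} \\
\text{effective ratio} &= \frac{256}{61.5} \approx 4.2 : 1
\end{aligned}
```

These figures give about 76%, close to the 75% reported. The small gap presumably comes from rounding in the displayed values.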

The drive now holds more data, in its reconstituted (undeduplicated) form, than its own capacity, with no noticeable degradation in performance. It currently hosts the following active Hyper-V VMs:

Domain Controller on WS2012
Exchange 2010 multi-role server on WS2012
TMG server on WS2008 R2
Exchange 2013 multi-role server on WS2012
Exchange 2013 CAS on WS2012
Exchange 2013 Mailbox Server on WS2012

Caveats:

Because real-time optimization is not being performed, the VMs will grow over time as changes are made and data is added. The manual deduplication job needs to be rerun periodically to recover that space.
Since the SSD holds more undeduplicated data than it could physically store, I'm unable to disable deduplication without first moving some data off the volume.
Even though more VMs can be added to this volume, you have to be sure that there is sufficient free space on the volume to perform deduplication.
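Before adding VMs and rerunning the job, it's worth checking how much room is left. The sketch below is one way to do that. Test-DedupHeadroom and Get-DedupHeadroom are hypothetical names, and the 10% minimum is an arbitrary placeholder, not a figure from Microsoft. The helper reads Size and SizeRemaining from Get-Volume and prints the volume's Get-DedupStatus output.

```powershell
# Hypothetical helpers for the free-space caveat above.
# The 10% default is an illustrative placeholder, not a Microsoft requirement.
function Test-DedupHeadroom {
    param(
        [Parameter(Mandatory)][uint64]$SizeBytes,
        [Parameter(Mandatory)][uint64]$FreeBytes,
        [double]$MinFreePercent = 10
    )
    if ($SizeBytes -eq 0) { throw 'SizeBytes must be greater than zero.' }
    # Percentage of the volume that is still free
    $freePercent = 100.0 * $FreeBytes / $SizeBytes
    return ($freePercent -ge $MinFreePercent)
}

# Reads the live numbers for a drive letter (requires the Storage and
# Deduplication modules on WS2012); X is the assumed deduplicated volume.
function Get-DedupHeadroom {
    param(
        [char]$DriveLetter = 'X',
        [double]$MinFreePercent = 10
    )
    $vol = Get-Volume -DriveLetter $DriveLetter
    # Print saved space, free space, and file counts straight to the console
    Get-DedupStatus -Volume "$($DriveLetter):" | Format-List | Out-Host
    # Return only the True/False headroom result
    Test-DedupHeadroom -SizeBytes $vol.Size -FreeBytes $vol.SizeRemaining -MinFreePercent $MinFreePercent
}
```

For example, Get-DedupHeadroom -DriveLetter X returns True when at least 10% of the volume is free. Raise MinFreePercent if you want more margin before the next optimization run.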

For even more information about Windows Server 2012 data deduplication, I encourage you to read Step-by-Step: Reduce Storage Costs with Data Deduplication in Windows Server 2012!

I hope you find this article useful in your own deployments and I'm interested to know what your experience is. Please leave a comment below!

15 comments:

Andrew Higginbotham, February 16, 2013 at 7:58 PM
Cool proof of concept. Hopefully customers will read your disclaimer at the top. I'd be interested in seeing what performance impact, if any, this has on an Exchange or SQL database afterwards.
Anonymous, February 18, 2013 at 11:40 AM
Thanks
I saw similar post a few months ago, still very helpful article
too bad I use VMware workstation:((for testing)
Anonymous, March 14, 2013 at 8:39 PM
Actually, you don't need to explicitly use Import-Module in Server 2012. It will load that module automatically once you use that cmdlet.
Aman Ayaz, April 1, 2013 at 9:27 AM
Great work Jeff!
Dino Caputo, April 8, 2013 at 10:10 AM
I have 2 SSDs set up in a JBOD for my VM storage. I wonder if I can set up deduplication on a volume spanned over 2 drives? Thoughts?
Anonymous, April 10, 2013 at 8:50 AM
Hello Dear,

I have some questions:

My file server runs Windows Server 2008 R2.
My backup server runs Windows Server 2008 R2.
On the backup server, I use robocopy to copy data from the file server, and then save it to LTO tape with Arcserve Backup.

If I upgrade the server to Windows Server 2012 and deploy deduplication, will I have problems with deduplication when robocopy runs the nightly incremental backup of files from the file server to the backup server?

Will the files copied to the backup server return to their normal size, or keep their deduplicated size?

What is the better scenario in my case? Should I install Windows Server 2012 on the file server and enable deduplication?

Thank you.

Regards,

Andre Santos
Unknown, April 17, 2013 at 3:14 AM
I may have missed this in your example but I had to turn off dedup (uncheck the box) before I could start up my VMs. Did you run into that?
