Windows Server 2012 Deduplication is Amazing!

Saturday, February 16, 2013
The following article describes how to use Windows Server data deduplication on a solid-state drive (SSD) that holds active Hyper-V virtual machines.

Coloring Outside the Lines Statement:
This configuration is not supported by Microsoft.  See Plan to Deploy Data Deduplication for more information.  Use these procedures at your own risk. That said, it works great for me.  Your mileage may vary.

A while back I decided to add another 224GB SATA III SSD to my blistering Windows Server 2012 Hyper-V server for my active VMs.  The performance is outstanding and it makes the server dead silent.  I moved my primary always-on Hyper-V VM workloads to this new SSD:
  • Domain Controller on WS2012
  • Exchange 2010 multi-role server on WS2012
  • TMG server on WS2008 R2
These VMs took 134GB, or 60%, of the capacity of the drive which was fine at the time.  Later, I added a multi-role Exchange 2013 server which took up another 60GB of space.  That left me with only 13% free space, which didn't leave much room for VHD expansion and certainly not enough to host any other VMs.  Rather than buy another larger and more expensive SSD, I decided to see how data deduplication performs in Windows Server 2012.
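The percentages above follow directly from the sizes quoted. As a quick sanity check, the arithmetic can be sketched in PowerShell (the variable names are illustrative only and are not part of any cmdlet):

```powershell
# Sizes quoted in the article, in GB. Variable names are illustrative only.
$capacityGB     = 224
$initialVMsGB   = 134
$exchange2013GB = 60

# Share of the drive used by the first three VMs, then after adding Exchange 2013.
$usedBeforePct = [math]::Round(100 * $initialVMsGB / $capacityGB)
$usedAfterPct  = [math]::Round(100 * ($initialVMsGB + $exchange2013GB) / $capacityGB)
$freeAfterPct  = 100 - $usedAfterPct

"{0}% used before, {1}% used after, {2}% free" -f $usedBeforePct, $usedAfterPct, $freeAfterPct
```

This works out to 60% used before, 87% used after, and 13% free, which matches the figures above.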

Add the Data Deduplication Feature
Data Deduplication is a feature of the File and Storage Services role in Windows Server 2012.  It's not installed by default, so you need to install it using the Add Roles and Features Wizard or by using the following PowerShell commands:

PS C:\> Import-Module ServerManager
PS C:\> Add-WindowsFeature -Name FS-Data-Deduplication
PS C:\> Import-Module Deduplication
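To confirm the feature actually installed before moving on, a small check along these lines may help. This is only a sketch: `Test-DedupFeatureInstalled` is a hypothetical helper name, and it assumes the `InstallState` property that `Get-WindowsFeature` reports on Windows Server 2012.

```powershell
# Hypothetical helper: returns $true when the Data Deduplication feature is installed.
function Test-DedupFeatureInstalled {
    # Get-WindowsFeature ships with the ServerManager module on Windows Server.
    $feature = Get-WindowsFeature -Name FS-Data-Deduplication
    return ($feature.InstallState -eq 'Installed')
}
```

Running `Test-DedupFeatureInstalled` from an elevated prompt on the Hyper-V host should return True once the install has completed.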

Next, you need to enable data deduplication on the volume.  Use the File and Storage Services node of Server Manager and click Volumes.  Then right-click the drive you want to configure for deduplication and select Configure Data Deduplication, as shown below:

Configuring Data Deduplication on Volume X:
So far, this is how you normally configure deduplication for a volume.  You would normally configure deduplication to run on files older than X days, enable background optimization, and schedule throughput optimization to run on specified days and times.  It's pretty much a "set it and forget it" configuration.

From here on I'm going to customize deduplication for my Hyper-V SSD.

In the Configure Data Deduplication Settings for the SSD, select Enable data deduplication and configure it to deduplicate files older than 0 days. Click the Set Deduplication Schedule button and uncheck Enable background optimization, Enable throughput optimization, and Create a second schedule for throughput optimization.

Enable Data Deduplication for Files Older Than 0 Days

Disable Background Optimization and Throughput Optimization Schedules
Click OK twice to finish the configuration.  What we've done is enabled data deduplication for all files on the volume, but deduplication will not run in real-time or on a schedule.  Note that these deduplication schedule settings are global and affect all drives configured for deduplication on the server.

You can also configure these data deduplication settings from PowerShell using the following commands:
PS C:\> Enable-DedupVolume X:
PS C:\> Set-DedupVolume X: -MinimumFileAgeDays 0
PS C:\> Set-DedupSchedule -Name "BackgroundOptimization", "ThroughputOptimization", "ThroughputOptimization-2" -Enabled $false
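To double-check that the settings took effect, something like the following sketch could be used. `Get-DedupConfigSummary` is a hypothetical helper name; it relies only on the `Get-DedupVolume` and `Get-DedupSchedule` cmdlets from the Deduplication module and assumes the volume letter X: used in this article.

```powershell
# Hypothetical helper: summarizes the dedup settings that matter for this setup.
function Get-DedupConfigSummary {
    param([string]$Volume = 'X:')

    $vol = Get-DedupVolume -Volume $Volume

    # Schedules are server-wide, so list every schedule that is still enabled.
    $enabled = @(Get-DedupSchedule | Where-Object { $_.Enabled })

    [pscustomobject]@{
        Volume             = $vol.Volume
        Enabled            = $vol.Enabled
        MinimumFileAgeDays = $vol.MinimumFileAgeDays
        EnabledSchedules   = ($enabled | ForEach-Object { $_.Name }) -join ', '
    }
}
```

For the configuration described here, `Get-DedupConfigSummary -Volume X:` should show the volume enabled with a minimum file age of 0 days. Any optimization schedule still listed under EnabledSchedules has not been turned off.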
This configuration mitigates the reason why Microsoft does not support data deduplication on drives that host Hyper-V VMs.  Mounted VMs are always open for writing and have a fairly large change rate.  This is the reason Microsoft says, "Deduplication is not supported for files that are open and constantly changing for extended periods of time or that have high I/O requirements."

In order to deduplicate the files and recover substantial disk space you need to shut down the VMs hosted on the volume and then run deduplication manually with this command:
PS C:\> Start-DedupJob -Volume X: -Type Optimization
This manual deduplication job can take some time to run depending on the amount of data and the speed of your drive.  In my environment it took about 90 minutes to deduplicate a 224GB SATA III SSD that was 87% full.  You can monitor the progress of the deduplication job at any time using the Get-DedupJob cmdlet.  The cmdlet shows the percentage of progress, but does not return any output once the job finishes.
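The shut down, deduplicate, restart cycle can be scripted. The following is a minimal sketch, not a tested production script. `Invoke-OfflineDedup` is a hypothetical function name, it assumes the Hyper-V and Deduplication modules are available on the host, and it only touches running VMs that have a virtual hard disk on the target volume.

```powershell
# Hypothetical helper: stop the VMs on a volume, deduplicate it, then start them again.
function Invoke-OfflineDedup {
    param([string]$Volume = 'X:')

    # Running VMs with at least one virtual hard disk stored on the target volume.
    $vms = @(Get-VM | Where-Object {
        $_.State -eq 'Running' -and
        ($_ | Get-VMHardDiskDrive | Where-Object { $_.Path -like "$Volume\*" })
    })

    # Stop-VM requests a guest shutdown by default.
    $vms | Stop-VM

    # -Wait blocks until the optimization job has finished.
    Start-DedupJob -Volume $Volume -Type Optimization -Wait

    # Bring the same VMs back online.
    $vms | Start-VM
}
```

Because `-Wait` blocks the session, progress can still be checked with `Get-DedupJob` from a second PowerShell window. A VM that does not shut down cleanly would need handling that this sketch leaves out.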

You can also monitor the job using Resource Monitor, as shown below:

Resource Monitor During Deduplication
Here you can see that the Microsoft File Server Data Management Host process (fsdmhost.exe) is processing the X: volume.  When the deduplication process completes, the X: volume queue length will return to 0.

Once deduplication completes you can restart your VMs, check the level of deduplication, and how much data has been recovered.  From the File and Storage Services console, right-click the volume and select Properties:

Properties of Deduplicated SSD Volume
Here we can see that 256GB of raw data has been deduplicated to 61.5GB on this 224GB SSD disk - a savings of 75%!!!  That leaves 162GB of raw disk storage free.  I could easily create or move additional VMs to this disk and run the deduplication job again.
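The savings figure can be reproduced from the numbers in the volume properties. A short sketch of the arithmetic, with illustrative variable names:

```powershell
# Figures from the volume properties described above, in GB.
$logicalGB  = 256     # size of the data before deduplication
$physicalGB = 61.5    # space the data occupies after deduplication
$capacityGB = 224     # capacity of the SSD

$savedGB    = $logicalGB - $physicalGB                      # space recovered
$savingsPct = [math]::Round(100 * $savedGB / $logicalGB)    # savings as a percentage
$freeGB     = $capacityGB - $physicalGB                     # space left on the drive

"Saved {0}GB ({1}%), {2}GB free" -f $savedGB, $savingsPct, $freeGB
```

From these rounded inputs the savings come out at roughly 76%, in line with the 75% the console reports; the small difference is presumably down to rounding in the displayed sizes. The same information should also be available from PowerShell through the `SavedSpace` and `SavingsRate` properties returned by `Get-DedupVolume`.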

The drive above now actually holds more reconstituted data than the capacity of the drive itself with no noticeable degradation in performance.  It currently hosts the following active Hyper-V VMs:

  • Domain Controller on WS2012
  • Exchange 2010 multi-role server on WS2012
  • TMG server on WS2008 R2
  • Exchange 2013 multi-role server on WS2012
  • Exchange 2013 CAS on WS2012
  • Exchange 2013 Mailbox Server on WS2012
Caveats:
  • Because real-time optimization is not being performed, the VMs will grow over time as changes are made and data is added. The manual deduplication job would need to be run as needed to recover space.
  • Since the SSD actually contains more raw duplicated data than the drive can hold, I'm unable to disable deduplication without moving some data off the volume first.
  • Even though more VMs can be added to this volume, you have to be sure that there is sufficient free space on the volume to perform deduplication.
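Given the last caveat, it may be worth checking free space before each manual run. The sketch below uses `Get-Volume` from the Storage module. `Test-DedupHeadroom` is a hypothetical helper, and the 10% default threshold is an arbitrary illustrative value, not a figure published by Microsoft.

```powershell
# Hypothetical helper: returns $true when the volume has at least the given share of free space.
function Test-DedupHeadroom {
    param(
        [char]$DriveLetter = 'X',
        [double]$MinimumFreePercent = 10   # arbitrary illustrative threshold
    )

    $v = Get-Volume -DriveLetter $DriveLetter

    # SizeRemaining and Size are both reported in bytes.
    $freePercent = 100 * $v.SizeRemaining / $v.Size
    return ($freePercent -ge $MinimumFreePercent)
}
```

A call such as `Test-DedupHeadroom -DriveLetter X` could gate the manual optimization job, so that the job is only started while the volume still has room to work.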
For even more information about Windows Server 2012 data deduplication, I encourage you to read Step-by-Step: Reduce Storage Costs with Data Deduplication in Windows Server 2012!

I hope you find this article useful in your own deployments and I'm interested to know what your experience is.  Please leave a comment below!

15 comments:

  1. Cool proof of concept. Hopefully customers will read your disclaimer at the top. I'd be interested in seeing what performance impact, if any, this has on an Exchange or SQL database afterwards.

    Replies
    1. Hi Andrew,

      I've been running all my VMs on this deduplicated SSD volume for a month before I wrote this article. There has been ZERO perceptible performance impact. The only downside is having to take all the VMs offline to run the deduplication job.

  2. Thanks
    I saw similar post a few months ago, still very helpful article
    too bad I use VMware workstation:((for testing)

    Replies
    1. Forgot to mention: because I like the VMware Workstation features for testing and can't use deduplication, I use clones. They are a cool alternative (especially if all or most base machines are 2012), and the SSD gives better performance when using these clones.

  3. Actually you don't need to explicitly use Import-Module in Server 2012. It will load that module automatically once you use that cmdlet.

  4. Great work Jeff!

  5. I have 2 SSD's setup in a JBOD for my VM storage. I wonder if I can setup deduplication against a volume spanned over 2 drives? Thoughts?

    Replies
    1. I imagine it should work. Please post back and let the community know!

  6. Hello Dear,

    I have some questions:

    My file server runs Windows Server 2008 R2;
    My backup server runs Windows Server 2008 R2;
    On the backup server I use robocopy to pull data from the file server, and then save it to LTO tape with Arcserve Backup.

    If I upgrade the server to Windows Server 2012 and deploy deduplication, will I have problems with deduplication at night, when robocopy performs the incremental backup of files from the file server to the backup server?

    Will files copied to the backup server return to their normal size, or keep the size that deduplication produced?

    What is the better scenario in my case? Install Windows Server 2012 on the file server and enable deduplication?

    Thank you.

    Att

    Andre Santos

    Replies
    1. If I understand your question, you want to know if you upgrade your file server to 2012 and enable deduplication, what happens to files that are copied to a Windows Server 2008 R2 server for backup?

      The files will be copied as their original size (for example, a 100GB file stored as 50GB on 2012 will be 100GB on the target server).

    2. Thanks for the feedback,

      My question was about the opposite case: the files are on the Windows 2008 R2 file server and will be stored on Windows 2012 with deduplication.

      Deduplication creates references to reduce the space used. Since Windows 2008 R2 does not use deduplication, will the same copy overwrite the references created on Windows 2012, so that the files return to their normal size?

      eg

      The File Server (2008R2 Win.) has a 1GB folder;
      The Backup Server (Win. 2012) via Robocopy copies daily 1GB folder, the folder with deduplication reduces the size to 100MB;

      The next day, the process occurs again. Does robocopy copy the whole 1GB folder, or will robocopy understand that they are the same files?
      I do not know if robocopy understands that the 1GB folder from the File Server is the same as the 100MB on the Backup Server.

      Att.

      Andre Santos

    3. So your scenario is different. You want to use WS2012 dedup on the backup server. The WS2012 backup server can use dedup to reduce the files copied to it. If it copies 1GB of data from WS2008R2 it will copy the full 1GB and store it as deduped data.

      By default, robocopy only copies changed files. For example, if the 1GB folder contains 3 new or changed files since the last time it was run robocopy will only copy those files (in their original size). Dedup will run against that new data.

    4. Understood.

      I'll test it here and report back soon.

      Thanks Jeff.

    5. Hello Jeff, how are you?

      A new question came up about deduplication in the scenario I described.

      Deduplication stores its information in SVI (System Volume Information). Since I copy the files from the file server with robocopy, do I also need to copy this directory?

      Tks.

      Andre Santos

  7. I may have missed this in your example but I had to turn off dedup (uncheck the box) before I could start up my VMs. Did you run into that?

