Storage Efficiency: Multi-Layering of Deduplication
Thursday, 06 August 2009 by Michel Roth
Continuing with the theme of Storage Efficiency, this article talks about some of the unique things that NetApp does with deduplication.  
As we're all aware, the amount of data being created today is growing exponentially as new devices (iPhone, BlackBerry) and applications (social media) encourage users to create more digital bits. Analytics also play an ever-greater role in how we conduct business and predict future trends.

So IT managers are now facing three issues:
  1. The amount of data being created is growing faster than the incremental increases in disk size.  So more disks are constantly needed to keep up with data growth.
  2. Data storage growth requires additional rack space, power, and cooling.  None of these are falling in price in most markets.  
  3. Companies are deriving more and more value from all of this data, so discouraging users, customers or applications from generating it is probably not recommended. 
So how is an IT manager supposed to deal with these conflicting challenges?  For those IT managers that use NetApp storage, one of the most powerful tools they have in their toolbox is Deduplication.  

Why is deduplication so powerful?  First, it addresses item #1: data is often replicated (with limited changes) into multiple forms.  NetApp deduplication is able to remove that replicated/redundant data from your primary storage arrays, immediately saving disk space.  Typically we see this form of deduplication save 50-70% of the space.  And the value of deduplication extends beyond primary storage: when data is replicated via SnapMirror to backup or disaster-recovery sites, those deduplication savings carry over to the remote site, saving not only WAN bandwidth but also disk costs at the remote site.
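To make the general idea concrete (this is a minimal sketch, not NetApp's actual on-disk implementation), block-level deduplication can be thought of as splitting data into fixed-size blocks, fingerprinting each block with a hash, and storing each unique block only once while keeping a list of references that can rebuild the original data. The block size, the hash choice, and the helper names below are all assumptions for illustration.

```python
import hashlib

BLOCK_SIZE = 4096  # illustrative fixed 4 KB block size


def dedupe(data: bytes):
    """Split data into fixed-size blocks and store each unique block once.

    Returns the block store (fingerprint -> block) and the ordered list of
    fingerprints ("recipe") needed to reconstruct the original data.
    """
    store = {}    # fingerprint -> block contents, stored once
    recipe = []   # ordered fingerprints to rebuild the data
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        fp = hashlib.sha256(block).hexdigest()
        if fp not in store:          # only new, unique blocks consume space
            store[fp] = block
        recipe.append(fp)
    return store, recipe


def rehydrate(store, recipe):
    """Reassemble the original data from the block store and the recipe."""
    return b"".join(store[fp] for fp in recipe)


if __name__ == "__main__":
    data = b"A" * 8192 + b"B" * 4096 + b"A" * 4096   # deliberately redundant
    store, recipe = dedupe(data)
    logical = len(data)
    physical = sum(len(b) for b in store.values())
    assert rehydrate(store, recipe) == data
    print(f"logical {logical} bytes -> physical {physical} bytes "
          f"({100 * (1 - physical / logical):.0f}% saved)")
```

Run against the redundant sample above, the sketch stores only two unique blocks for four logical ones, a 50% saving; the same principle is what makes replicating already-deduplicated volumes cheaper in both bandwidth and remote disk.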

But we still haven't addressed item #2: more data typically requires more storage, whether for capacity or performance reasons. This is where deduplication can once again be a powerful tool, this time in the form of dedupe-aware Intelligent Caching.  By adding a NetApp Performance Acceleration Module (PAM), the storage arrays can significantly increase performance by offloading frequent reads and writes from disk.  This intelligent cache reduces the amount of disk required without sacrificing application performance.  Taking this a step further, the PAM module is aware of deduplication, so it avoids keeping redundant copies of the same data in cache.  This allows more frequently accessed information to populate the PAM, again helping reduce the amount of storage needed to deliver the required performance.  
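As a rough illustration of why dedupe-awareness helps a cache (again a sketch under assumptions, not the PAM firmware), imagine a read cache keyed by block fingerprint rather than by logical address: every logical block that shares the same content maps to a single cache entry, so redundant data occupies one slot and more distinct hot blocks fit. The `fetch_from_disk` callback below is a hypothetical stand-in for the back-end read path.

```python
from collections import OrderedDict


class DedupeAwareCache:
    """Toy LRU read cache keyed by block fingerprint instead of address.

    Because all logical blocks sharing a fingerprint map to one entry,
    duplicate data takes up a single slot, leaving room for more distinct
    hot blocks -- the basic idea behind dedupe-aware caching.
    """

    def __init__(self, capacity_blocks: int):
        self.capacity = capacity_blocks
        self.entries = OrderedDict()   # fingerprint -> block contents

    def read(self, fingerprint: str, fetch_from_disk):
        if fingerprint in self.entries:         # cache hit: no disk I/O
            self.entries.move_to_end(fingerprint)
            return self.entries[fingerprint]
        block = fetch_from_disk(fingerprint)    # cache miss: go to disk
        self.entries[fingerprint] = block
        if len(self.entries) > self.capacity:   # evict least recently used
            self.entries.popitem(last=False)
        return block
```

In a virtual-desktop scenario, for example, dozens of near-identical system images share most of their blocks, so after the first machine boots, subsequent boots are served largely from the same cached entries.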

Related Items:

FlexClones or Deduplication? (6 July 2009)
NetApp Data Motion (31 August 2009)
Demo of Quest vWorkspace With NetApp Storage Integration (18 September 2009)
Hyper-V and NetApp Storage Videos (17 June 2009)
Quest Software Virtualization Group Speeds VDI Adoption With New Edition of Quest vWorkspace (9 September 2009)
Sneak Preview - NetApp RCU 3.0 (7 December 2009)
Thin Provisioning In a VMware-NetApp Environment Part I and II (2 November 2009)
Solid State Disk will change the storage world (2 November 2009)
Get Thin Provisioning working for you in vSphere (26 October 2009)
Free tool: CTXCOMMAP (8 April 2008)
Comments (2)
written by Peter Smails, August 06, 2009
Hi Michel...one other technology admins are using is real-time compression. It reduces the size of every file created, up to 10x, before it's even written to disk. There's no degradation in performance, and admins can still dedupe compressed data. The combination of real-time compression and NetApp dedupe creates huge savings throughout the data lifecycle. For more, check out www.storwize.com.

Thanks...Peter Smails
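[Editor's note: to illustrate the general idea of in-line compression that the comment above describes (not Storwize's product, which compresses in a dedicated appliance), here is a minimal sketch that compresses data on the host before it ever reaches disk and reports the reduction; the file name and helper are purely illustrative.]

```python
import zlib


def write_compressed(path: str, data: bytes, level: int = 6):
    """Compress data in-line before writing it to disk; return sizes."""
    compressed = zlib.compress(data, level)
    with open(path, "wb") as f:
        f.write(compressed)
    return len(data), len(compressed)


if __name__ == "__main__":
    # Repetitive data (e.g. log lines) compresses especially well.
    sample = b"2009-08-06 10:00:00 INFO request served in 12ms\n" * 2000
    logical, physical = write_compressed("sample.log.z", sample)
    print(f"{logical} bytes in, {physical} bytes written "
          f"({logical / physical:.1f}x reduction)")
```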
written by M.Roth, August 06, 2009
Hi Peter,

Thanks for the reply. What puzzles me is how one can compress without performance degradation. Is there really no CPU overhead, or are you just saying it is small :-) ?