If you work in IT and are responsible for backing up large amounts of data, you’ve probably heard the term data deduplication. What exactly does it mean, and why does it matter?

The purpose of data deduplication is to eliminate redundant data. In the deduplication process, extra copies of the same data are deleted, leaving only one copy to be stored. The data is analyzed for duplicate byte patterns to confirm that each copy really is identical to the stored instance. Each duplicate is then replaced with a reference that points to the stored chunk.
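
To make the idea concrete, here is a minimal sketch in Python of a store that keeps each unique chunk once and hands back references. It assumes fixed-size chunks and SHA-256 digests as the references; the class name `DedupStore` and the tiny chunk size are illustrative choices, not a description of any particular product.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store: each unique chunk is kept only once."""

    def __init__(self, chunk_size=4):
        self.chunk_size = chunk_size   # assumption: fixed-size chunks
        self.chunks = {}               # digest -> chunk bytes

    def put(self, data):
        """Store data and return the list of references (digests)."""
        refs = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            # Only the first occurrence of a byte pattern is stored.
            self.chunks.setdefault(digest, chunk)
            refs.append(digest)
        return refs

    def get(self, refs):
        """Rebuild the original data by following the references."""
        return b"".join(self.chunks[r] for r in refs)

    def stored_bytes(self):
        return sum(len(c) for c in self.chunks.values())


store = DedupStore(chunk_size=4)
original = b"ABCDABCDABCDXYZ"
refs = store.put(original)
print(len(original), "bytes in,", store.stored_bytes(), "bytes stored")
```

The three repeated `ABCD` chunks are stored once, and `store.get(refs)` still returns the original bytes.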

Given that the same byte pattern may occur dozens, hundreds, or even thousands of times (think about how often you make only small changes to a PowerPoint file or share another important business asset), the amount of duplicate data can be significant. In some companies, 80% of corporate data is duplicated across the organization. Storing and transmitting less data can significantly cut storage costs and backup time, in some cases by up to 90%.

As an example, imagine an email server that contains 100 instances of the same 1 MB file attachment, perhaps a product planning chart that was sent to everyone on the sales staff. Without data deduplication, if everyone backs up their email inbox, all 100 instances of the product planning chart are saved, requiring 100 MB of storage space. With data deduplication, only one instance of the attachment is actually stored; each subsequent instance is just referenced back to the one saved copy, reducing storage and bandwidth demand to only 1 MB.
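
The arithmetic in this example can be checked with a short Python sketch. It assumes whole-file deduplication keyed by a SHA-256 digest, and it uses a block of zero bytes as a stand-in for the planning chart; `backup_sizes` and the `inboxes` layout are hypothetical names made up for the illustration.

```python
import hashlib

MB = 1024 * 1024


def backup_sizes(inboxes):
    """Return (bytes without dedup, bytes with whole-file dedup)."""
    stored = {}    # digest -> size of the single saved copy
    logical = 0    # what a backup without deduplication would store
    for attachments in inboxes.values():
        for blob in attachments:
            logical += len(blob)
            stored.setdefault(hashlib.sha256(blob).hexdigest(), len(blob))
    return logical, sum(stored.values())


chart = bytes(MB)  # stand-in for the 1 MB product planning chart
inboxes = {f"user{i}": [chart] for i in range(100)}

logical, physical = backup_sizes(inboxes)
print(logical // MB, "MB without deduplication")  # 100 MB
print(physical // MB, "MB with deduplication")    # 1 MB
```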

Even more meaningful than the storage savings: in environments with limited bandwidth, deduplication can decide whether the data gets backed up at all.

Want a more in-depth explanation of deduplication? Read “Understanding Data Deduplication” and “3 Ways to Dedupe your Duplicated Duplicates” for the architectural details.

Deduplication is a key technology for Druva inSync. Druva’s approach has four unique attributes:

  • It is performed on the client (versus the server), thereby reducing the amount of data that must be sent over the network.
  • The analysis is done at the sub-file or block-level to find duplicate data within a file.
  • It is aware of the applications from which data is generated. That is, Druva inSync looks inside files, such as an Outlook email file (using MAPI), to find duplicate data in email attachments.
  • Druva’s deduplication scales beyond a single user to find duplicate data (say, an email sent to an entire organization) across multiple users and devices.
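
As a rough illustration of how client-side, block-level deduplication across users can save bandwidth, the Python sketch below has the client send only chunk digests first and then upload just the chunks the server has not seen. This is a generic sketch under stated assumptions, not Druva inSync's actual protocol: `Server`, `client_backup`, the fixed chunk size, and the in-memory index are all hypothetical.

```python
import hashlib


def chunk_digests(data, chunk_size):
    """Split data into fixed-size chunks, each paired with its SHA-256 digest."""
    pairs = []
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        pairs.append((hashlib.sha256(chunk).hexdigest(), chunk))
    return pairs


class Server:
    """Hypothetical backup server with one chunk index shared by all users."""

    def __init__(self):
        self.chunks = {}      # digest -> chunk bytes, shared across users
        self.manifests = {}   # (user, filename) -> ordered list of digests

    def missing(self, digests):
        return {d for d in digests if d not in self.chunks}

    def upload(self, user, name, digests, new_chunks):
        self.chunks.update(new_chunks)
        self.manifests[(user, name)] = digests


def client_backup(server, user, name, data, chunk_size=8):
    """Back up one file; return the number of chunk bytes actually sent."""
    pairs = chunk_digests(data, chunk_size)      # hashing happens on the client
    digests = [d for d, _ in pairs]
    need = server.missing(digests)               # only digests cross the wire here
    new_chunks = {d: c for d, c in pairs if d in need}
    server.upload(user, name, digests, new_chunks)
    return sum(len(c) for c in new_chunks.values())


server = Server()
memo = b"company-wide memo " * 4
sent_a = client_backup(server, "alice", "memo.txt", memo)
sent_b = client_backup(server, "bob", "memo.txt", memo)
print("alice sent", sent_a, "bytes; bob sent", sent_b, "bytes")
```

Because the server already holds every chunk of the memo after the first backup, the second user uploads only the list of digests and no chunk data.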

Learn more about data deduplication in the technology brief below.