

We are encountering an explosion of data volume: one study estimates that global data will reach 40 zettabytes by the end of 2020. However, a large portion of this huge data volume consists of massive redundancies created by users, applications, systems, and communication models. This data explosion places a significant burden not only on storage space but also on access latency, manageability, processing, and network bandwidth.

In the age of data science, the rapidly increasing amount of data is a major concern in numerous applications of computing and data storage, and duplicated or redundant data is a central challenge in data science research. Data Deduplication Approaches: Concepts, Strategies, and Challenges shows readers the various methods that can be used to eliminate multiple copies of the same files as well as duplicated segments or chunks of data within the associated files. Because data duplication is ever increasing, deduplication has become an especially active field of research for storage environments, in particular persistent data storage. Data Deduplication Approaches provides readers with an overview of the concepts and background of data deduplication approaches, then proceeds to demonstrate in technical detail the strategies and challenges of real-time implementations for handling big data, data science, data backup, and recovery. The book also includes future research directions, case studies, and real-world applications of data deduplication, focusing on reduced storage, backup, recovery, and reliability.

Key features:
- Includes data deduplication methods for a wide variety of applications
- Includes concepts and implementation strategies that will help the reader apply the suggested methods
- Provides a robust set of methods so that readers can judiciously select the methods suitable for their applications
- Focuses on reduced storage, backup, recovery, and reliability, the most important aspects of implementing data deduplication approaches
- Includes case studies

This book introduces the fundamentals and trade-offs of data deduplication techniques. It describes novel emerging deduplication techniques that remove duplicate data both in storage and in the network in an efficient and effective manner. It explains where duplicate data originate and provides solutions that remove the duplicate data. It classifies existing deduplication techniques by the size of the unit data being compared, the place of deduplication, and the time of deduplication.

Chapter 3 considers redundancies in email servers and a deduplication technique that increases reduction performance with low overhead by switching between chunk-based deduplication and file-based deduplication. Chapter 4 develops a deduplication technique for cloud-storage services in which the unit data being compared are not in physical format but in a logical structured format, reducing processing time efficiently. Chapter 5 presents network deduplication, in which redundant data packets sent by clients are encoded (shrunk to a small-sized payload) and decoded (restored to the original-size payload) by routers or switches on their way to remote servers across the network. Chapter 6 introduces a mobile deduplication technique for image (JPEG) and video (MPEG) data that considers the performance and overhead of encryption algorithms used for security on mobile devices.
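The chunk-based approach surveyed above can be illustrated with a minimal sketch (a simplified, hypothetical illustration, not code from the book): the data stream is split into fixed-size chunks, each chunk is identified by its hash, and only the first copy of each unique chunk is stored, together with a "recipe" of hashes from which the original data can be restored.

```python
import hashlib

CHUNK_SIZE = 4096  # fixed-size chunking; real systems often use content-defined chunking


def deduplicate(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Split data into fixed-size chunks and store each unique chunk once.

    Returns a chunk store (hash -> chunk bytes) and a recipe (ordered list
    of chunk hashes) from which the original data can be reassembled.
    """
    store = {}
    recipe = []
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # keep only the first copy of each chunk
        recipe.append(digest)
    return store, recipe


def restore(store, recipe) -> bytes:
    """Reassemble the original data by following the recipe of chunk hashes."""
    return b"".join(store[h] for h in recipe)
```

For highly redundant input (e.g., ten identical 4 KiB blocks), the store holds a single chunk while the recipe records every occurrence, which is the storage saving the book's techniques refine with variable-size chunking, deduplication placement, and timing choices.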
