Intelligent Data Management and the lifecycle of information

By Satyen Vyas, Director Advanced Systems Group, Small & Medium Business (SMB) Dell India

We are living in the age of ‘information overload’. Companies rely heavily on enterprise systems such as ERP, SCM and CRM to automate and manage their resources. These systems generate and host vast amounts of data, each set structured to fulfill a specific need. Apart from this structured data, enterprises also store massive volumes of unstructured data in the form of e-mail, IM, documents and images. Both structured and unstructured information must be stored and retained to meet strategic, business and regulatory requirements.

As information accumulates on production servers, system performance deteriorates, storage needs grow, and disaster recovery, backups and upgrades take longer, leading to extended business outages. Today’s dynamic enterprises cannot afford these outages. No wonder Charlie Gary from Meta Group says, “Data is growing at 125 percent per year. In a typical enterprise, up to 80 percent of this data is inactive and yet remains in production systems to cripple performance.” This is what I call the data tsunami, and it can be avoided by intelligent data management.

Enter Intelligent Data Management (IDM)

Information is itself a resource, and enterprises need to plan effectively and put the right combination of strategy, software and hardware tools in place to avoid a data tsunami. Apart from the new data generated every day, strict data retention policies and legal requirements to keep transactional data over long periods are fuelling data growth. These ever-increasing volumes of inactive data, retained for compliance, degrade application performance, limit data access and strain the storage infrastructure. The result is greater complexity in mission-critical IT environments and a growing concern among businesses. This is where the increasingly popular concept of intelligent data management (IDM) comes into the picture.

IDM helps companies plan how to manage data through its lifecycle, from the time the data is generated or captured to the time it is deleted from the systems.

The value of information keeps changing with time, processes, business and regulatory needs. This in turn affects how likely the data is to be used. Data reuse and data deduplication have been among the key metrics of IDM; they help plan how data is stored across different tiers so that the storage infrastructure is used cost-effectively and performance improves. A well-planned IDM strategy allows the enterprise to retain all its reporting and access capabilities, as if the data were all lying on the same server.

Analysts have been scouring the experiences of various organizations for best practices that can guide companies through the changing times of IDM. Those experiences point to the practices described here.

Classifying data

Data retention policies are of key significance. Establishing the value of data, otherwise called data classification, forms the foundation of successful and efficient information management. The retention policies need buy-in from all the entities that own or use the data. Classification, which helps organize the data onto different tiers, is probably the most important step in IDM.
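A minimal sketch of what a classification rule might look like, assuming the policy is driven only by the number of days since a record was last accessed. The thresholds, tier names and the Record type are hypothetical illustrations, not part of any product or regulation; real values would come from the retention policy agreed with the data owners.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical thresholds; real values come from the agreed retention policy.
ACTIVE_DAYS = 90            # accessed within this window: keep in production
RETENTION_DAYS = 7 * 365    # older than this: no longer needs to be retained


@dataclass
class Record:
    name: str
    last_accessed: date


def classify(record: Record, today: date) -> str:
    """Return the tier a record belongs on, based on days since last access."""
    age_days = (today - record.last_accessed).days
    if age_days <= ACTIVE_DAYS:
        return "production"
    if age_days <= RETENTION_DAYS:
        return "archive"
    return "eligible_for_deletion"


today = date(2024, 1, 1)
records = [
    Record("open_invoice", date(2023, 12, 15)),
    Record("closed_order_2021", date(2021, 6, 30)),
    Record("old_log_2010", date(2010, 1, 1)),
]

# Map each record name to the tier chosen by the rule.
tiers = {r.name: classify(r, today) for r in records}
print(tiers)
```

In practice the rule would also weigh business value and regulatory category, not just age, which is why buy-in from the data owners matters.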

Choosing the right storage tier

At a recent conference in California, database administrators complained that their senior management was misinterpreting hierarchical storage management (HSM) and was looking to remove Tier 1 (the production tier) from their IT environment altogether. But Tier 2 storage cannot handle the data requests of a real-time production environment; it is meant only for data that is rarely accessed. The purpose of tiering data is to eliminate unnecessary load on the production servers, improve performance and optimize storage utilization.

Data deduplication

Deduplication is a storage-optimization technology that reduces the data footprint by eliminating multiple copies of redundant data and storing only unique data. Copies of the redundant data are replaced by references to the original data, sometimes called “pointers”. Deduplication can occur at the file, sub-file or block level. At the sub-file and block levels, data is commonly divided into smaller segments, which can be analyzed for redundancies more easily than whole files and stored more efficiently.
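The block-level idea can be sketched as follows. This is an illustration under simple assumptions (fixed-size blocks, a SHA-256 digest serving as the pointer, an in-memory dictionary as the block store) and does not describe how any particular product implements deduplication. The function names and the tiny block size are chosen only for readability.

```python
import hashlib

BLOCK_SIZE = 4  # tiny block size, chosen only to keep the example readable


def deduplicate(data: bytes, store: dict) -> list:
    """Split data into fixed-size blocks and store each unique block once.

    Returns a list of pointers (block digests) that describes the data.
    """
    pointers = []
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # keep only the first copy of a block
        pointers.append(digest)
    return pointers


def restore(pointers: list, store: dict) -> bytes:
    """Rebuild the original data by following the pointers."""
    return b"".join(store[p] for p in pointers)


store = {}
original = b"ABCDABCDABCDWXYZ"
pointers = deduplicate(original, store)

print(len(pointers))  # 4 blocks referenced
print(len(store))     # 2 unique blocks actually stored
print(restore(pointers, store) == original)  # True
```

Here sixteen bytes are described by four pointers but only two blocks are stored; the more repetitive the data, the larger the saving.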

Deduplication can occur in primary storage, such as file-sharing devices (NAS) like the Dell NF and NX products. For example, the NX4 can help reduce the footprint of large file workloads with redundant or static data. However, secondary storage (that is, backup data), with its vast amounts of redundant data, is currently receiving most of the industry's attention, for example in backup-to-disk (B2D) implementations. B2D is especially attractive because of the nature of backup data. Typically the bulk of an organization's data has not changed considerably since the last backup job, so storing copy after copy of the same file or data unnecessarily consumes resources: storage capacity, power, cooling and management.

Restoring data

Businesses need to expect the unexpected and be prepared for any eventuality. Archived data is always kept in ‘read-only’ mode for compliance reasons. The software that archives the data must therefore also allow the data to be de-archived into the production database without losing data integrity. This is necessary when archived data has to be edited (e.g. for a product recall).
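As a rough sketch of the archive and de-archive round trip, assume records are plain dictionaries, the archive stores each one as an immutable JSON string alongside a SHA-256 checksum, and the checksum is verified before anything is written back to production. The functions archive and dearchive and the sample order are hypothetical, not the interface of any archiving product.

```python
import hashlib
import json


def _checksum(payload: str) -> str:
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def archive(key: str, production: dict, archived: dict) -> None:
    """Move a record out of production and keep it as an immutable payload."""
    record = production.pop(key)
    payload = json.dumps(record, sort_keys=True)
    archived[key] = (payload, _checksum(payload))


def dearchive(key: str, production: dict, archived: dict) -> None:
    """Verify the stored checksum, then return the record to production."""
    payload, expected = archived[key]
    if _checksum(payload) != expected:
        raise ValueError(f"integrity check failed for {key!r}")
    production[key] = json.loads(payload)
    del archived[key]


production = {"order-1001": {"product": "widget", "qty": 5, "status": "shipped"}}
archived = {}

archive("order-1001", production, archived)
in_production_after_archive = "order-1001" in production  # False

# A product recall requires the order to be editable again.
dearchive("order-1001", production, archived)
production["order-1001"]["status"] = "recalled"
print(production["order-1001"])
```

The edit happens only after the record is back in production, so the archived copy itself is never modified in place.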

Data security and compliance

The need to set appropriate user-level and management-level access privileges grows as data is classified into tiers based on its value. Users should be given access to production, archive, or both, only as their responsibilities require. Sensitive data (e.g. financial or health data) also needs to be protected in production, archive and non-production environments (testing, development and outsourcing).
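One simple way to picture tier-based privileges is a mapping from roles to the tiers each role may read, plus masking of sensitive values before data leaves production. The roles, tier names and masking rule here are assumptions made up for illustration only.

```python
# Hypothetical roles and the tiers each one may read.
ROLE_TIERS = {
    "order_clerk": {"production"},
    "auditor": {"archive"},
    "compliance_officer": {"production", "archive"},
}


def can_access(role: str, tier: str) -> bool:
    """Allow access only if the role is explicitly granted the tier."""
    return tier in ROLE_TIERS.get(role, set())


def mask_account(number: str, visible: int = 4) -> str:
    """Hide all but the last few characters before data goes to test or dev."""
    hidden = max(len(number) - visible, 0)
    return "*" * hidden + number[-visible:]


print(can_access("auditor", "archive"))      # True
print(can_access("order_clerk", "archive"))  # False
print(mask_account("1234567890"))            # ******7890
```

Unknown roles get no access by default, which keeps the rule on the safe side when new users or tiers are added.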

One of the key drivers for IDM is compliance. Regulatory bodies across the world have been issuing their own rules governing data retention. For today’s global companies, archiving software should allow any number of regulations to be incorporated without one overriding another, and so help achieve compliance.

Data integrity

ILM requires that data of any value be available for immediate access for reporting and compliance purposes. A few regulatory bodies also require all the tiered data, say production and archived data, to be accessible through the same application that created it. This seamless online availability can be achieved only if data integrity and referential integrity are maintained during the hierarchical staging of data.
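To make the referential integrity point concrete, here is a small sketch using Python's built-in sqlite3 module. It assumes a hypothetical schema of orders and order lines and moves a parent row together with its child rows in a single transaction, so that neither tier is left with orphaned rows. The table names and the archive_order function are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce references in SQLite
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT);
CREATE TABLE order_lines (
    id INTEGER PRIMARY KEY,
    order_id INTEGER NOT NULL REFERENCES orders(id),
    item TEXT
);
CREATE TABLE archived_orders (id INTEGER PRIMARY KEY, customer TEXT);
CREATE TABLE archived_order_lines (
    id INTEGER PRIMARY KEY,
    order_id INTEGER NOT NULL REFERENCES archived_orders(id),
    item TEXT
);
INSERT INTO orders VALUES (1, 'Acme');
INSERT INTO order_lines VALUES (10, 1, 'bolt'), (11, 1, 'nut');
""")


def archive_order(conn: sqlite3.Connection, order_id: int) -> None:
    """Move an order and all its lines to the archive tables atomically."""
    with conn:  # commits on success, rolls back if any statement fails
        conn.execute(
            "INSERT INTO archived_orders SELECT * FROM orders WHERE id = ?",
            (order_id,))
        conn.execute(
            "INSERT INTO archived_order_lines "
            "SELECT * FROM order_lines WHERE order_id = ?",
            (order_id,))
        conn.execute("DELETE FROM order_lines WHERE order_id = ?", (order_id,))
        conn.execute("DELETE FROM orders WHERE id = ?", (order_id,))


def count(conn: sqlite3.Connection, table: str) -> int:
    # Table names here are fixed literals from this example, not user input.
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


archive_order(conn, 1)

# Every archived line must still point at an archived order.
orphans = conn.execute(
    "SELECT COUNT(*) FROM archived_order_lines "
    "WHERE order_id NOT IN (SELECT id FROM archived_orders)"
).fetchone()[0]
print(count(conn, "orders"), count(conn, "archived_order_lines"), orphans)
```

Because the move runs inside one transaction, a failure part-way through leaves the production tables exactly as they were.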

Many vendors are attacking the archive market from a packaged-application perspective (e.g. Oracle Applications, PeopleSoft and SAP). But most companies will need to archive more than a single application; for this reason, users should evaluate the scope of packaged solutions. What companies need is a comprehensive enterprise archiving solution that covers both structured data, as in packaged applications like Oracle Apps, and unstructured data like e-mail, IM and documents.


About Andy Painter

A passionate Information and Data Architect with experience of the financial services industry, Andy’s background spans pharmaceuticals, publishing, e-commerce, retail banking and insurance, but always with a focus on data. One of Andy’s principal philosophies is that data is a key business asset.