New Ideas for Old Information

Software/service providers are a natural fit for a growing archival headache; for many organizations, the cloud looks like the best answer

Jim Ericson writes:

We are barely a generation past the day when a stack of ledgers and cabinets of contracts and correspondence best represented the paper trail of a corporate history.

The business information record of today looks much different than it did 20 years ago, but is no less bulky. The fruit of the Information Age is also a creaking attic of accumulation full of diverse records created by more people and systems than ever. From microfiche to punch cards to tape drives to hard disks to solid state, we have counted on technology to outpace our need to retain everything, even as those resources inevitably become strained and call for new solutions.

The largest businesses still manage their own archives, and there is no shortage of archival disk and tape drives in corporate back offices. Yet more organizations are colocating storage in data centers or calling on providers who bring software and private dedicated storage to assume the records chore for overtaxed IT departments. These services are not all equal and don’t always come with significant cost savings. While the trend of lower costs for storage devices may offset some of the growth of information, the human expense of managing archives is not going down.

Now, cloud computing can be added to the list of alternatives to do-it-yourself corporate record libraries. Through software/service vendors and their partnerships, cloud archival services have come to market with lower price points and “limitless” infrastructure, security, backup and analysis performance that would otherwise be prohibitively expensive for small and midsized companies to deliver internally. While this market is immature and evolving, it is marching upstream with a paradigm that is likely to shift the long-term archival landscape once again.

Asset and Burden

The duality of information is that it is an asset in its existence and a liability in its maintenance, and the cost tradeoff has always been between time to access and persistence. Storage strategies already tier information across a lifecycle of access and assign seldom-used records to cheaper, slower storage. But with today’s regulatory rules and legal compliance for e-discovery, time to access is now important for more and more corporate information and is abetted by search, auditing and analysis to speed retrieval of records.

While any form of content is subject to scrutiny, email has glutted the corporate record more than any other content to date. Petabytes' worth of email records have landed at individual companies over a remarkably short span of time, and initial management concerns were more about operational continuity than e-discovery. “People started archiving email a few years ago but they weren’t doing it for compliance, they did it because their email servers would fail on some irregular basis,” says Alan Pelz-Sharpe, an analyst at CMS Watch.

As the digital tail grew, businesses struggled with how much information to keep for how long, well before legal mandates made some archiving decisions academic. But in certain industries, the uptime of the archive is now more important to administrators than the uptime of in-house mail servers. “If you are a broker/dealer and you are required to archive email and instant messaging, if your archive goes down but email and IM are still running, you’ve got content coming and going that’s not being archived,” says Michael Osterman of Osterman Research. “That kind of downtime is no fun in the course of a SEC audit.”

The risk associated with email and other forms of archiving has crossed the radar of midsized as well as large businesses. A 2009 Osterman Research survey of IT decision-makers at different-sized organizations found that 46 percent need to handle routine e-discovery requests; 42 percent have end users who need to recover their own missing files; and 34 percent need to extract old email and other electronic content for regulatory audits.

Own or Offload

All the attention to archives has been good news to the storage, software as a service and hosting industries. The same Osterman study found that more than half of those surveyed had been ordered by a court or regulator to produce employee email or instant messaging and that 57 percent of midsized and large organizations indicated a need to manage email server storage more effectively by offloading it to less-expensive storage.

Competitors in the archive hosting arena come in all sizes and include HP, Proofpoint, Sun, EMC, Iron Mountain and a host of smaller vendors and startups. Competitive pressure and lower storage device costs have brought hosting prices down generally, depending on the services offered, but ongoing management issues and growing volumes of information have led many hosted vendors to develop their own cloud infrastructure models alongside in-house or offsite data centers.

The ubiquity of broadband connections that allow quicker Web access to cloud-based archives and the emergence of prominent cloud services, notably Amazon Web Services (AWS), have also brought early legitimacy to the cloud model. In a September Gartner Research paper, analyst Adam Couture identified large vendors specifically focused on building proprietary and/or private clouds, including Google, Microsoft and Dell, to name a few.

These and many other vendors, Couture says, are offering hybrid models consisting of on-premise and cloud archiving products.

Still other startups, he notes, such as Clearpace, Moonwalk and Sonian are offering their own software as a service linked directly to the cloud, but assigning infrastructure duties to AWS or other providers in a partnership model.

Multi-Vendor Models

Sonian may have been the first vendor to arise directly from the cloud partnership model and in August secured an additional $5.6 million round of funding based on early success. Greg Arnette, the company’s founder and CTO, says a straight-to-cloud business model and architecture are a natural fit for archival management, with the additional benefit of quick scaling and unlimited CPU access. “One advantage of cloud computing infrastructure is the ability to pick and choose the requirements for a system that meets the requirements for our audience. As a subscription service, our job is to please customers with one-off expectations.”

Sonian’s SA2 archive stack runs on top of AWS and serves a target audience of midsized organizations with 500 to 5,000 employees in public, government, health care and professional services. Email is the most popular archive service, though the company also hosts documents, messaging, social media, wikis and other content.

Cuyahoga County, Ohio, is a Sonian client. Joe Hernandez works as a security analyst in the county’s Information Services Center, which serves between 3,500 and 4,500 employees at public-facing agencies. The county presently stores and accesses about three terabytes of email on Sonian SA2. Hernandez says the cost of storage and onsite management led the county to look for a cheaper solution, but one that could be trusted.

“The first thing we wanted was integrity of the data being archived offsite for litigation, investigative purposes or incident response so we were sure we were managing our legal liability,” Hernandez says. “The other thing we came across in our diligence was the cost-effectiveness of cloud computing, the guaranteed backup, the ease of archive requests and the support, which in my view has been excellent.”

Sonian offers services on a flat fee or schedule, and because the company was on the government purchasing schedule, its services came to Cuyahoga County at what Hernandez says was a discount. In fairness, he says he’s not sure how the county’s pricing and cost savings compare to those of public customers, but he’s currently exploring Sonian and other hosted services for ways he might offload other IT duties that have become a burden on his department.

A prominent value-add for Sonian customers is reporting and analysis, which arises from another vendor partnership, this one with Vertica, a provider of columnar database software. Vertica’s deployment options include AWS, where its query and reporting functions deliver fast search and standard reporting to Sonian customers through a Web interface or through an exposed application programming interface that advanced users can use to build their own reports and views.

“You see this attraction of companies that get the cloud, and Vertica was there with this really high-performance database,” says Arnette. “So when we work with Amazon, Vertica and others, the model supports more than just managing data. Customers want to mine and unlock the dark data that comes from analyzing communications from different perspectives, everything from social graphing to themes and project experts.”

Customers don’t see Vertica branding in the Sonian Web interface, but they get the benefits at the interface in the form of exports, PDFs and reports Hernandez calls “pretty remarkable.”

Sonian guarantees its service level agreements directly, promising 99.99 percent availability and even greater information durability.
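For a sense of what a 99.99 percent availability figure permits, the arithmetic can be sketched as follows. This is a rough illustration, not a statement of Sonian’s SLA terms: the function name, the 365-day year and the 30-day month are assumptions made here, and real agreements typically define their own measurement windows and exclusions.

```python
# Illustrative arithmetic only; names and period lengths are assumptions,
# not terms taken from any vendor's service level agreement.

MINUTES_PER_YEAR = 365 * 24 * 60   # 525,600 minutes, ignoring leap years
MINUTES_PER_MONTH = 30 * 24 * 60   # 43,200 minutes in a 30-day month


def allowed_downtime_minutes(availability_percent: float,
                             period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Return the downtime a given availability percentage leaves room for."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability_percent must be between 0 and 100")
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return period_minutes * unavailable_fraction


if __name__ == "__main__":
    per_year = allowed_downtime_minutes(99.99)
    per_month = allowed_downtime_minutes(99.99, MINUTES_PER_MONTH)
    print(f"99.99% availability: {per_year:.2f} min/year, {per_month:.2f} min/month")
```

Under these assumptions, 99.99 percent availability leaves room for roughly 52.6 minutes of downtime a year, or a little over four minutes in a 30-day month.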

For its part, Amazon Web Services doesn’t comment specifically on partnerships but surely benefits from startups leveraging AWS to sell software and services. AWS spokesperson Kay Kinton will say that the company is seeing fast-changing demands and listening carefully to customers. “The bulk of what we’ve released today in terms of additional features and services came directly from feedback from our developer customers,” Kinton says.

Good Enough for Now

While the small and midsized path to hosted cloud archiving may be paved, in-house and dedicated hosted archiving still have a respectable future, even as large businesses are likely to be experimenting with their own “private” internal clouds.

Indeed, there’s no clear impetus yet for very large businesses to hand the task to cloud-based providers as long as archiving has internal scale with many advocates and vested careers. “Small businesses, yes, but larger enterprises are so ingrained and embedded that it’s going to take a long time to shift the psychology,” says Pelz-Sharpe. “Cloud computing is still scary to a lot of people.”

There’s also no guarantee that cloud-based solutions are ready to meet the rigors large companies have established for internal use. “Cloud vendors using Web interfaces will be trying to shave milliseconds here and there with the software they are building,” says Michael Coté, an analyst at RedMonk. “With on-premise apps by comparison, you can be a lot more sloppy with the performance of your application; it doesn’t factor as much on your network.”

When it comes to information security, AWS offers a kind of partitioning that can isolate silos of customer data, but it’s not yet clear how that compares to contemporary definitions of compliance.

“The concern people always have with the cloud business model is that a lot of users are going to be accessing something concurrently or accessing your service at the same time,” Coté says. “It’s still a little fuzzy in how that is being managed and explained.”

The same can be said for regulators who write mandates, place the burden of timely compliance on the respondent and later deal with companies at different stages of the archiving technology curve. From the enforcement side, the definition of what constitutes compliance will not be fully clear until judges and regulators have a better track record and case history to dictate what is expected and reasonable.

Cleaning House

In the interim, companies can do a much better job of deciding what they archive and why. If more racks and more labor to maintain archives are not a sustainable model, neither is the practice of archiving all the content that exists. Even with a data lifecycle, organizations are surely going to be storing more data, not less.

Arnette is hopeful that even the (relatively) slow pipes of the broadband Internet will keep up with information archiving and prove that a Web portal is a more efficient use of bandwidth for managing data than direct online storage. “Right now it makes more sense to me than the data center where you’re reading and writing the same file 10 times over,” he figures. “The management picture is going to sort out over time.”

To Pelz-Sharpe, the more important point is that as people confront the growing glut, they are becoming savvy to the fact that most of what they store is redundant or duplicate. “Storing junk is not a regulation,” says the analyst. “It’s only now dawning on IT that they can reduce archive volumes by 80 percent by housekeeping. Any good records manager will tell you the art of archiving is getting rid of stuff.”
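One simple form of that housekeeping is dropping exact duplicates before they reach the archive. The sketch below is a minimal illustration, assuming records arrive as plain strings and that byte-identical content counts as a duplicate; the function names and the sample data are hypothetical, and it does not represent how any vendor named here deduplicates.

```python
import hashlib
from typing import Iterable, List


def deduplicate(records: Iterable[str]) -> List[str]:
    """Keep the first copy of each record, comparing by SHA-256 content hash."""
    seen = set()
    unique = []
    for record in records:
        digest = hashlib.sha256(record.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(record)
    return unique


def reduction_percent(original_count: int, kept_count: int) -> float:
    """Percentage of records removed by housekeeping."""
    if original_count == 0:
        return 0.0
    return 100.0 * (original_count - kept_count) / original_count


if __name__ == "__main__":
    # Made-up sample: the same memo forwarded to several mailboxes.
    mailbox = ["Q3 memo", "Q3 memo", "Q3 memo", "Lunch Friday?", "Q3 memo"]
    kept = deduplicate(mailbox)
    print(kept, reduction_percent(len(mailbox), len(kept)))
```

Exact-match hashing catches only identical copies. Near-duplicates, such as the same message with a different signature block, and records that must be retained regardless would need policy decisions this sketch does not attempt.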

Unlike building an addition, maintaining and cleaning house is seldom a welcome exercise, and for business, the attic awaits.


About Andy Painter

Andy is a passionate Information and Data Architect with experience of the financial services industry. His background spans pharmaceuticals, publishing, e-commerce, retail banking and insurance, but always with a focus on data. One of Andy’s principal philosophies is that data is a key business asset.