DFW UNIX Users Group
Content Last Modified on August 07, 2005, at 08:39 AM CST

NETWORK WORLD NEWSLETTER: 11/11/04

Today's focus:  The half-life of data

By Mike Karp

It is often tempting to borrow terms from physics for areas where
they don't really belong.  For example, we occasionally hear the
term "data half-life," which implies that the value of data
decays at a predictable rate that goes on forever.

If the value of data truly had a half-life, however, data would
decrease in value at a steady rate until it had no value at all
(strictly speaking, of course, as in one of Zeno's paradoxes,
such constant "halving" can never reach zero, but the principle
holds).  With some data, particularly data
that must conform to regulatory requirements, that just doesn't
happen. Regulated data must often be capable of being recalled
within a set amount of time, must be auditable, and so forth.
Old data may never be accessed, but it must be capable (and
provably so) of being accessed whenever the need arises.
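
To make the physics analogy concrete, here is a minimal sketch of
what a strict half-life model would imply for data value. The
function name, the starting value of 100 and the 30-day half-life
are illustrative assumptions, not figures from this column.

```python
def half_life_value(initial_value: float, age_days: float,
                    half_life_days: float) -> float:
    """Value left after age_days if value halves every half_life_days."""
    if half_life_days <= 0:
        raise ValueError("half_life_days must be positive")
    # Each elapsed half-life multiplies the remaining value by one half.
    return initial_value * 0.5 ** (age_days / half_life_days)


if __name__ == "__main__":
    # Hypothetical data set worth 100 units with a 30-day half-life.
    for age in (0, 30, 60, 90, 365):
        print(age, half_life_value(100.0, age, 30.0))
```

Under this model the value keeps shrinking but is mathematically
never zero, and nothing in it says the data must still be
retrievable on demand, which is exactly what regulated data
requires.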

That being said, the value of most data changes over time, and
although not all data changes at the same rate, we can make
several useful generalizations about these changes because the
value attached to certain kinds, or classes, of data changes in
predictable ways.

Consider, for example, the case of a company's sales data. The
current month's sales figures, because they define business
success, merchandise ordering, commissions, accounts receivable
and so forth, are always going to be high-value data by just
about anybody's standards.

As this data ages, however, it typically diminishes in value.
When this information becomes last month's sales figures, it may
find use in month-over-month comparisons; it may feed into the
general ledger and accounts payable, as merchandise has to be
ordered to fulfill orders.  As the quarters roll past, the value
of a month's sales data typically diminishes further and users
refer to it with lessening frequency.  Beyond a year, it may be
of interest only to internal accountants and outside auditors.

A similar pattern of changing data value can be identified for
most data that lies outside a database (often referred to as
"unstructured" data). Unfortunately, unstructured data is by its
very nature often hard to manage because it is challenging to
categorize. Most of the time it is simply tagged by file type and
dealt with by file type, and that, essentially, is where
management ends.

Obviously, if the value of data lessens over time, data that once
deserved high-priority services and the very best storage assets
available logically becomes less deserving as it ages. Any time
budgets are constrained (and when are they not?), making the most
of the assets you do have becomes all the more important.

This understanding of how the value of data changes is the
fundamental reason many IT sites are moving toward information
lifecycle management (ILM).

ILM is still a developing concept, but there is already much to
be said for it (and indeed, in this column much already has been
said about it).  Unfortunately, many people still assume that ILM
is hierarchical storage management (HSM) warmed over for the new
millennium.  In this they are dead wrong.

HSM works fine when the concept of data half-life is applicable.
After all, its only concern is demoting data so that less
valuable data doesn't overflow the capacity of our high-end
storage systems. But HSM misses the boat when it comes to
retrieving data quickly and efficiently. Rest assured, your
auditors aren't going to be impressed that you have
cost-effectively archived data to tape; what they will want to
see is whether you can retrieve and protect your data according
to whichever laws or regulations apply.

This calls for a far more complex management scheme than simply
moving data down the storage food chain to cheaper devices. For
compliance you will require some policy-driven mechanism that
understands how the value of your data changes over time, that
migrates the data to the appropriate storage device
automatically, that keeps track of every transaction involving
the data, and quite a bit more besides.
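
As a rough illustration of the requirements just listed, the
sketch below shows one way a policy-driven mechanism might be
modeled: a per-class policy maps data age to a storage tier, and
every migration is written to an audit log. All names here
(Policy, Record, target_tier, apply_policy), the tier labels and
the age thresholds are hypothetical; this is not a description of
any particular ILM product.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Policy:
    # (max_age_days, tier) pairs in ascending age order (hypothetical values).
    tiers: List[Tuple[int, str]]
    final_tier: str           # tier used once every age limit is exceeded
    max_recall_hours: float   # recall window the data class must meet


@dataclass
class Record:
    name: str
    data_class: str
    age_days: int
    tier: str


def target_tier(policy: Policy, age_days: int) -> str:
    """Return the tier the policy assigns to data of the given age."""
    for max_age, tier in policy.tiers:
        if age_days <= max_age:
            return tier
    return policy.final_tier


def apply_policy(record: Record, policies: Dict[str, Policy],
                 audit_log: List[str]) -> None:
    """Migrate the record if its policy calls for it, logging the move."""
    policy = policies[record.data_class]
    new_tier = target_tier(policy, record.age_days)
    if new_tier != record.tier:
        audit_log.append(
            f"{record.name}: {record.tier} -> {new_tier} "
            f"(age {record.age_days}d, recall within "
            f"{policy.max_recall_hours}h)"
        )
        record.tier = new_tier


if __name__ == "__main__":
    policies = {"sales": Policy([(31, "primary"), (365, "midrange")],
                                "archive", 24.0)}
    log: List[str] = []
    rec = Record("monthly-sales", "sales", 400, "primary")
    apply_policy(rec, policies, log)
    print(rec.tier, log)
```

The point of the sketch is the difference from plain HSM: the
policy carries a recall requirement alongside the demotion rule,
and each move leaves an audit trail instead of happening silently.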

If you are not already aware of how the data you manage changes
in value, you might begin by checking the service-level
agreements for which you are responsible.
_______________________________________________________________

Mike Karp is senior analyst with Enterprise Management
Associates, focusing on storage, storage management and the
methodology that brings these issues into the marketplace.
Mike can be reached via e-mail
<mailto:mkarp@enterprisemanagement.com>.
_______________________________________________________________

Copyright Network World, Inc., 2004

 
