DFW UNIX Users Group
Content Last Modified on December 24, 2007, at 02:37 PM CST
The Storage Triage
Information Centric Technology Centric Manageware
The tasks are:
Web 2.0 Wine Tasting - Hu Yoshida's Blog (HDS), ***** rating

(Jeremiah Owyang comment to this blog post)
Hu,
Wonderful insight into this growing and emerging market.
As a web professional, I do indeed believe that companies will offer online data storage for free, and then offer it as a giveaway such as "free checking".
Over time, personal data and media will grow online; it will be loosely collected and organized by common attributes.
Many of these online data storing companies will open up their APIs to share data with other companies to create applications we've not even imagined yet.
At some point, I suspect companies like Amazon and Google will be able to pay consumers to put data on this "storage cloud" in hopes of understanding the specific interests of the individual, and thereby serving back contextual and relevant marketing.
I also predict that this content in the storage cloud will be retrieved from a variety of clients anytime, anywhere, such as mobile devices, cars, the living room, and in front of the computer.
This is not to suggest that there will be NO data on a person locally, but it may be synched with the storage cloud in real time for both privacy and performance reasons.
The biggest challenges are going to be bandwidth, security, and identity.

The future of data storage - by Martin McKeay on December 7, 2006 09:16 AM

(Article excerpt)
But the most interesting conversation of the night was when Ben Rockwood asked Dave Roberson about the future of storage technology. First of all, Dave said that for all of the redundancy built into storage devices, they're still effectively a single point of failure; in order to combat that a large part of the future of storage will be replication of the data in real time to redundant sites. As Ben pointed out even with the speed of today's Internet, replicating gigabytes or terabytes of data is incredibly time consuming. Dave was a little cagey and kept saying 'what if' as if he already had a solution in mind, which he probably did.
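
To put rough numbers on Ben's point, the sketch below estimates how long a one-off bulk copy takes over a sustained link. The function name `transfer_hours`, the two link speeds, and the use of decimal units (1 TB = 10^12 bytes) are illustrative assumptions, not figures from the conversation; protocol overhead, latency, and compression are ignored.

```python
def transfer_hours(size_bytes, link_mbit_per_s):
    """Hours to move size_bytes over a link sustaining link_mbit_per_s."""
    bits = size_bytes * 8
    seconds = bits / (link_mbit_per_s * 1_000_000)
    return seconds / 3600


TB = 10**12  # decimal terabyte (assumption for this sketch)

# Hypothetical link speeds, chosen only for illustration.
for label, mbit in (("100 Mbit/s", 100), ("1 Gbit/s", 1000)):
    print(f"1 TB over {label}: {transfer_hours(TB, mbit):.1f} hours")
```

Under these assumptions a single terabyte takes roughly a day at 100 Mbit/s and a couple of hours at 1 Gbit/s, which is why shrinking the amount of data to be replicated matters as much as the link itself.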

Dave then went on to say that the other issue that needs to be addressed is the redundancy of data in storage. Most of the data we have in storage is replicated at least four times and often more than ten. All the documents, emails, databases that have copies all over your storage media. If those redundant copies could be eliminated or become pointers to the original files, the storage needs of companies would be greatly reduced and replication across sites would be much easier. I found it interesting that the different forms of redundancy are both a solution and a problem to be solved.
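
The idea of turning redundant copies into pointers to one original can be sketched as a small content-addressed store. This is only an illustration of the general technique, not a description of any HDS product; the class name `DedupStore`, the file names, and the choice of SHA-256 as the fingerprint are all assumptions made for the example.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store: identical contents are kept once."""

    def __init__(self):
        self.blobs = {}     # digest -> bytes (the single stored copy)
        self.pointers = {}  # logical name -> digest (pointer to that copy)

    def put(self, name, data):
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)  # keep only the first copy
        self.pointers[name] = digest
        return digest

    def get(self, name):
        return self.blobs[self.pointers[name]]

    def logical_bytes(self):
        """Bytes the users believe they are storing."""
        return sum(len(self.blobs[d]) for d in self.pointers.values())

    def physical_bytes(self):
        """Bytes actually held, and therefore needing replication."""
        return sum(len(b) for b in self.blobs.values())


store = DedupStore()
report = b"quarterly report " * 1000  # 17,000 bytes of sample content

# Four hypothetical copies of the same document, as in "replicated at least four times".
for name in ("mail/attachment.doc", "share/report.doc",
             "backup/report.doc", "home/report-copy.doc"):
    store.put(name, report)
store.put("share/other.doc", b"something else")

print(store.logical_bytes(), store.physical_bytes())
```

Here the four copies of the report occupy the space of one, so a remote site would only need to receive the unique blobs plus the much smaller pointer table.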

Data storage is only going to be more of a concern going forward, both for businesses and consumers. I know that on my own home network I have nearly a terabyte of storage capability, though close to half of that is just for backups. As social networking services become more popular, not just the venue of the young and technologically hip, their storage needs are going to skyrocket. HDS is being very smart by trying to be one of the first companies to court these young companies.

Prosuming - Jeff Tash's ITscout Blog

(Article excerpt)
In 1979, futurist Alvin Toffler coined the term "prosumer" to describe the open source-like phenomenon of people producing what they consume. The term applies to individuals who prefer to be involved in designing the things they purchase. In other words, new products and/or services are created by combining together the roles of producer and consumer.

The hottest new way to prosume comes from a Web 2.0 development called mashups which enable people to seamlessly combine content from more than one source into an integrated experience. And, of course, the granddaddy of prosuming is open source software which allows programmers to read and modify source code for a piece of software thereby improving it, adapting it, as well as fixing bugs.

The Sands are Shifting on the Desert - DrunkenData.com - Jon Toigo

Robert Pearson just submitted a response to a long thread that has been going on here for the last couple of days that was stimulated by IBM's speeds-and-feeds bragging around a new storage platform. Read the original post and commentary here if you want to get caught up. (I've set the link to open in a new window so you won't lose your place here.)

Anyway, I believe that Pearson's comments merit front page attention because they cut to the heart of several timely issues -- timely to me, at least, because I am waist deep in work trying to synthesize the responses to a lengthy survey we just completed of storage managers and administrators on the topic of the state of storage into a set of use cases. These cases will become part of a requirements specification for the folks at Hermes SoftLabs to consider as they help the StorageRevolution create version one of its free CAPSAIL framework for open storage management.

Here is the quote, which was in part prompted by some excellent opinions offered by Pq65 and Chuck Hollis from EMC.

This thread is the real Storage Revolution!!!!!!!!!

Is anyone listening to what Pq65 is saying? Or are they just hearing the words?

Chuck touched on the key point in his paragraph: "The previous comments of need to use snaps could be augmented by need to vary workloads over time, need to change access patterns, need to fill arrays up, need to replicate, need to figure out what happens when a drive is rebuilding, what happens when cache gets full, etc. etc. which are all very real-world performance issues that SPC (or any other standardized test) can't capture."

What would it take for EMC (substitute your favorite vendor here) to capture this Performance information?

Everybody in the trenches knows SNIA, SMI-S, SPC, TPC and others are very primitive and crude attempts to establish some very necessary "Speed Limit of the Information Universe" baselines. Like cutting butter with a sledge hammer.

We need a way to gather Performance information about the requirements of the Managed Unit of Information. Not the Managed Unit of "Enabling" Technology. Who cares? I never saw a blank disk drive that made any money.

If you are in the trenches, where you really need this information, you know the same thing that all vendors know. You can never performance test the "real" environment. It is too risky or too expensive. For a vendor, it might show that the design tradeoffs in your Unit of "Enabling" Technology are not the best fit for that Unit of Information. The vendors do the minimum out of regard for the loss risk. The trench people do the minimum because they don't have the resources. All we get are minimums.

We need a Fundamental Shift in the Storage Paradigm. We need to be able to give Pq65 what he wants, needs and deserves. At the same time we don't want to threaten the vendors who could help but are lost in their own fog.

What is the feature/function set for the Unit of "Enabling" Technology for defined Units of Information? Is it a Performance or a Bandwidth Unit of Information? Once these relationships are defined "for your IT shop" then you can specify the generic Unit of Technology. Once the generic requirements for your Units of Information are known you can figure out some way to benchmark specific Units of "Enabling" Technology.

Maybe we should reverse the Storage acquisition process. The customers create representative Units of Information and the vendors have to show that their Units of "Enabling" Technology will meet that IT shops "Speed Limit of the Information Universe" requirements.

Robert, with your permission, we are going to borrow some of your terms in my spec. But let's dig into what they mean.

You refer to "Unit of Enabling Technology for defined Units of Information." I confess that I originally bristled when I read your first mention of these terms many posts back. With all due respect, they sounded like just so much more tech doublespeak. But, now, the sands have shifted on the desert. Within the context of performance testing, I think I finally understood what you were talking about. So forgive my earlier blockheadedness and read on.

I would like to understand, as you would, what describes a Unit of Enabling Technology:

  • what are its component features
  • functions
  • how do we measure them
  • how could we represent them on a "dashboard" or display console for anyone who gives a hoot.

If these units of enabling technology could be defined, we might be able to manage IT much as we manage inventory. Never buying what we don't need and buying resource just in time to meet business requirements.

This would have great merit also as a means of gauging the efficacy of IT investments. We could calculate how the expense of a unit of enabling technology mapped back to the value provided to the business that purchased the unit.

We also seem to be in agreement that an understanding of Units of Information is a prerequisite for buying the correct unit of enabling technology. As you wrote, "Once the generic requirements for your Units of Information are known you can figure out some way to benchmark specific Units of "Enabling" Technology".

In short, we can't tell vendors what we need to buy if we don't understand the requirements ourselves. But that begs the question of how we define units of information. I would very much like to hear your thoughts on this.

From a procurement standpoint, you are spot on when you say "Maybe we should reverse the Storage acquisition process. The customers create representative Units of Information and the vendors have to show that their Units of "Enabling" Technology will meet that IT shops "Speed Limit of the Information Universe" requirements".

I have made this same point (albeit, without the "units" terminology) over and over on this blog.

Are you sure my dad didn't know your mom somewhere along the way?

Anyway, I think this blog (and the StorageRevolution effort) could do a great service if we could do the following:

  • Define what a unit of information is and how a consumer can, procedurally speaking, characterize his data.
  • We need to make the definition and the process "drool proof simple".
  • Identify opportunities for automating the process by which we define the information unit so that anyone can do it.
  • Define what a unit of enabling technology is:
    • the procedure for defining such a unit,
    • or at a minimum a checklist or breakdown of features/functions to consider.

Starting with these two concepts, we could begin

  • Establishing worthwhile testing regimes for use in comparing the wares of vendors seeking our dollars.
  • Monitor and perhaps manage the operations of infrastructure so that they better serve the company.
  • Measure the performance of IT investments over time.
  • Begin doing some modeling to anticipate future needs and their associated costs.

A noble objective and maybe too much for a blog. However, I find it interesting that once we hung this issue on a "What is IBM smoking?" rant, everyone inadvertently put on their thinking caps?

Can we possibly do, in a conversation here, what SNIA has failed to do with its big brainiacs behind closed doors for the past seven years?

It will be interesting to see the responses to this post.

Unstructured Information Management - StorageMojo.com

Robert Pearson said, on December 9th, 2006 at 2:53 am

I would like to share some information I have about "unstructured" Information. First, a quick Operational Definition. I use "Information" to refer to information that can produce revenue and "information" to refer to still combined data and Information using Walter A. Shewhart's data presentation rules:

  1. Data has no meaning apart from its context.
  2. Data contains both signal and noise. To be able to extract Information, one must separate the signal from the noise within the data.

Unstructured data has an unknown Information Context. The Information Content may be known but this is often meaningless out of the Context. There are various ways of attempting to quickly determine Context. Ad hoc Information spaces seem to be the most popular.

How does this apply?
Quoting Ray Ozzie from an ACM interview:

"But IT really needs to leave it to the line of business to understand what the key collaborative processes are. The line of business really has to understand the role of structured processes versus ad hoc projects, and it has to assist IT in defining what tools are best to enable its structured processes and ad hoc processes. The line of business must understand the skills of the people and how best to match the tools with those skills. ...[snip]... For example, if somebody did all of his or her work totally in an ad hoc, unstructured manner, there would be no artifacts that benefited the organization."

This quote is from page 3 of a 5 page interview. I recommend reading the entire interview.

I have been a big fan of the concept of "Weaving the Pervasive Information Fabric using agents and multi-agents" since I first ran across it. The closest thing to a home page is:
http://eprints.ecs.soton.ac.uk/3797/

The best write-up is the PDF at:
http://eprints.ecs.soton.ac.uk/3797/01/ohs6-weaving.pdf

A companion PDF is the "Hypermedia by coincidence" at:
http://eprints.ecs.soton.ac.uk/5901/01/ht01-coincidence.pdf

What does all this mean?
The "Pervasive Information Fabric" is required by the IoD (Information on Demand) infrastructure. The IoD is necessary for the timely creation and rapid dispersal, with Persistence, of transient "ad hoc" Information spaces. The "ad hoc" Information spaces extract Information from "unstructured" data to satisfy User requests. Think of it as JIT (Just In Time) Information.

To deal with the "Fundamental Shift in the Information Paradigm" we need some new concepts. The two key ones are Information Centric and Technology Centric. To manage these two concepts we need the "Lower Metric" definitions of the Unit of Information and the Unit of "Enabling" Technology. The Unit of Information is the key element and the Managed Units of Information are the "Profit Center" of the enterprise. A Managed Unit of Information is simply a Unit of Information with an SLA (Service Level Agreement). By definition a Managed Unit of Information must be "Enabled" by a Managed Unit of Technology.

All the IDCs (Internet Data Centers) have to be doing this to survive.

Managed Unit of Information (MUoI) - Under Construction


Once a Unit of Information is identified as having a sufficient (ROI/TCO) value to justify having an SLA (Service Level Agreement) it becomes a Managed Unit of Information (MUoI)

  • The SLA must be Managed by the SLM (Service Level Management) system, if it exists
  • By definition, a Managed Unit of Information (MUoI) must be enabled by a Managed Unit of Technology (MUoT)
  • The Managed Unit of Technology is a Unit of Technology with an SLA
  • This SLA is separate from the MUoI SLA but each references the other
  • They are interdependent
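
The relationships in the list above can be sketched as a small data model. This is one possible reading of the draft definitions, not a finished specification; the class names, field names, and the helpers `promote` and `is_valid_muoi` are hypothetical, as are the sample unit and SLA names.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SLA:
    name: str
    # The companion SLA; excluded from repr/eq because the two SLAs
    # point at each other and would otherwise recurse.
    references: Optional["SLA"] = field(default=None, repr=False, compare=False)


@dataclass
class UnitOfTechnology:
    name: str
    sla: Optional[SLA] = None  # a UoT with an SLA is a MUoT

    @property
    def managed(self):
        return self.sla is not None


@dataclass
class UnitOfInformation:
    name: str
    sla: Optional[SLA] = None  # a UoI with an SLA is a MUoI
    enabled_by: Optional[UnitOfTechnology] = None

    @property
    def managed(self):
        return self.sla is not None


def promote(uoi, uot, uoi_sla, uot_sla):
    """Make uoi a MUoI enabled by uot as a MUoT, with cross-referencing SLAs."""
    uoi_sla.references = uot_sla
    uot_sla.references = uoi_sla
    uoi.sla, uot.sla = uoi_sla, uot_sla
    uoi.enabled_by = uot
    return uoi


def is_valid_muoi(uoi):
    """Check the rules in the list: managed, enabled by a MUoT, SLAs interdependent."""
    uot = uoi.enabled_by
    return (
        uoi.managed
        and uot is not None
        and uot.managed
        and uoi.sla.references is uot.sla
        and uot.sla.references is uoi.sla
    )


orders = UnitOfInformation("orders database")   # hypothetical UoI
array = UnitOfTechnology("tier-1 disk array")   # hypothetical UoT
print(is_valid_muoi(orders))                    # plain UoI, not yet managed
promote(orders, array, SLA("orders availability"), SLA("array performance"))
print(is_valid_muoi(orders))
```

Keeping the two SLAs as separate objects that reference each other mirrors the statement that they are distinct but interdependent; an unmanaged Unit of Information simply has no SLA and needs no Managed Unit of Technology.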

Unit of Information (UoI) - Under Construction


  • Information Technology Specific
    • In Information Technology the standard basic UoI is the file or record, if the record contains sufficient Information to supply meaningful Context
    • A collection of files could also be a UoI
    • A database or collection of databases can be UoIs if they pass the Context test
    • One record or file in a database is not a UoI. The whole database may be, if it passes the Context test
    • The Context test is the difference between a sentence and a sentence fragment
    • The sentence, if a correct one, gives meaning by conveying a complete thought. A fragment does not
    • By definition, a Unit of Information (UoI) does not require being enabled by a Managed Unit of Technology (MUoT)
  • General Definitions
    • A Unit of Information is defined as the smallest human-recognizable entity or object that has meaning
    • For example, a word
    • Some single character entities or objects appear to meet the test but they have no meaning out of Context. Therefore they fail
  • Future comment
  • Future Topic
  • Future Topic
  • Future Topic


Under Construction
