
Tuesday, November 08, 2005

ACM Queue - Managing Semi-Structured Data:
"Most of the world's data does not fit into a traditional database structure. Fortunately, work is being done on various fronts to harness this vast information pool.

I vividly remember during my first college class my fascination with the relational database—an information oasis that guaranteed a constant flow of correct, complete, and consistent information at our disposal. In that class I learned how to build a schema for my information, and I learned that to obtain an accurate schema there must be a priori knowledge of the structure and properties of the information to be modeled. I also learned the ER (entity-relationship) model as a basic tool for all further data modeling, as well as the need for an a priori agreement on both the general structure of the information and the vocabularies used by all communities producing, processing, or consuming this information.

Several years later I was working with an organization whose goal was to create a large repository of food recipes. The intent was to include recipes from around the world and their nutritional information, as well as the historical and cultural aspects of food creation.

I was involved in creating the database schema to hold this information. Suddenly the axioms I had learned in school collapsed. There was no way we could know in advance what kind of schema was necessary to describe French, Chinese, Indian, and Ethiopian recipes. The information that we had to model was practically unbound and unknown. There was no common vocabulary. The available information was contained mostly in natural language descriptions; even with significant effort, modeling it using entities and relationships would have been impossible. Asking a cook to enter the data in tables, rows, objects, or XML elements was unthinkable, and building an entry form for such flexible and unpredictable information structures was difficult, if not impossible. The project stopped. Years later I believe we still do not have such information available to us in the way we envisioned it..."

The article goes on to give a very good overview of the problem of managing unstructured and semi-structured information.
