Metadata: What You Need to Know (And Why You Need to Know It)

Challenges
Besides the previously noted challenges in automated metadata entry, two other issues stand in the way of metadata’s widespread adoption: classification and the "freshness" of metadata.

Figure 4. The old stalwarts of automated ISR (indexing, search, and retrieval), such as Pictron and Autonomy Virage (pictured here), have been automating media-based metadata with nominal success for more than a decade.

The Classification Conundrum
Classification is the act of creating a framework for cataloging, much in the same way that the Dewey Decimal System created a library catalog framework, only to be replaced by the more-detailed Library of Congress system when Dewey could no longer account for more granular book placement. The data world is facing the same issue, as classification decisions have often been made more on age of content than on value of content.

"The current state of data classification is largely a byproduct of historical, hierarchical storage management (HSM) implementations," says Dave Vellante, a co-founder of the open source advisory Wikibon.org. "Data age was the primary classification criterion."

"Early visions of classifying data based on business value never fully came to fruition because it required a manual, brute force approach and was too hard to automate," Vellante says. "New business value drivers include ‘never delete’ retention policies as well as performance, availability, and recovery attributes. While generally age-based schema dominate, they must more aggressively incorporate richer classification attributes. However, this extension should be accomplished with an eye toward automation where data set metadata is autoclassified upon creation and/or use of the data set."

Vellante’s premise is that older technologies are limiting metadata’s growth, because past solutions carried a self-imposed limit rooted in HSM. HSM determined whether stored content should remain on the database server’s hard drive, move to near-field storage, or be banished to a tape library where record retrieval could take several minutes. In other words, some of the limitations in metadata usefulness have less to do with the content’s quality than with its accessibility.
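To make the contrast concrete, here is a minimal Python sketch of an age-only tiering decision next to one that also looks at a business attribute. It is an illustration only: the function names, the tier labels, and the 90-day and 365-day thresholds are assumptions made for this example and do not describe any particular HSM product.

```python
from datetime import date

# Hypothetical thresholds for illustration; real HSM policies vary by site.
DISK_MAX_AGE_DAYS = 90
NEAR_FIELD_MAX_AGE_DAYS = 365


def tier_by_age(last_modified: date, today: date) -> str:
    """Pick a storage tier from content age alone."""
    age_days = (today - last_modified).days
    if age_days <= DISK_MAX_AGE_DAYS:
        return "disk"
    if age_days <= NEAR_FIELD_MAX_AGE_DAYS:
        return "near-field"
    return "tape"


def tier_by_age_and_value(last_modified: date, today: date,
                          high_availability: bool = False) -> str:
    """Let one business attribute override the age-only decision.

    Assumption for this sketch: content flagged as high availability
    always stays on disk, however old it is.
    """
    if high_availability:
        return "disk"
    return tier_by_age(last_modified, today)
```

Under the age-only rule, a heavily used record that is more than a year old lands on tape along with everything else of that age. The second function shows how a single richer attribute changes that outcome.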

Jean Graef, founder of the Montague Institute, which works with federal agencies (including the U.S. Senate) on setting up classification structures, has a slightly different view.

"The more structured the content, the better autocategorization will work," says Graef, who has tested a variety of systems for clients. "For example, many of these programs were tested on large repositories of news stories—which have good, complete metadata and are written in a highly structured way (headline, byline, date, summary paragraph, details).

"Most intranet content, however, doesn’t have this well-defined structure," Graef continued. "We have found that a hybrid approach to classification works best, where content or subject specialists refine the behavior of the autocategorization program."

From Graef’s viewpoint, this means three things: developing a list of categories or "buckets" into which the software places content objects; helping to develop the rules that the software uses to evaluate content (or helping to "train" software that doesn’t use rules); and reviewing the classified output to make sure it delivers the desired results.
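The hybrid approach Graef describes can be sketched in a few lines of Python. This is a toy illustration, not the method used by any product or agency mentioned here: the bucket names, the keyword patterns, and the categorize function are all hypothetical, and production systems use far richer rules or statistical training.

```python
import re

# Hypothetical buckets and keyword rules written by subject specialists.
RULES = {
    "budget": [r"\bbudget\b", r"\bappropriations?\b"],
    "personnel": [r"\bhiring\b", r"\bpayroll\b"],
}


def categorize(text, rules=RULES):
    """Return (bucket, needs_review) for one piece of content.

    The bucket whose rules match most often wins. Content that matches
    no rule, or that ties between buckets, is flagged for human review.
    """
    scores = {
        bucket: sum(1 for pattern in patterns
                    if re.search(pattern, text, re.IGNORECASE))
        for bucket, patterns in rules.items()
    }
    best = max(scores, key=scores.get)
    best_score = scores[best]
    if best_score == 0:
        return None, True  # no rule fired: a specialist has to decide
    tied = [b for b, s in scores.items() if s == best_score]
    return best, len(tied) > 1
```

The review flag is the point of the sketch: whatever the software cannot place with confidence goes back to the specialists, whose decisions then feed the next revision of the rules.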

"This is not a trivial job," says Graef. "In a large federal agency, it took a team of three or more part-time developers 3 years to develop both the taxonomy and the rule sets, including 1 year for the rule sets to optimize content for an automated system."

Fresh Metadata
Yet classification is only one part of the solution, because metadata’s value also tends to change as it is used. Vellante suggests that classification occur not just at the time of entry but also at the time of use, so that metadata is judged on its current relevance rather than only on its initial classification and tagging. There is a business opportunity here for those who can make metadata a part of mission-critical, high-availability workflows, while limiting duplication when metadata is available in multiple locations.
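One way to picture classification at the time of use is a record that is updated whenever the content is accessed. The Python sketch below is an assumption-laden illustration: the MetadataRecord fields, the record_use and is_stale helpers, and the 180-day idle window are all invented for this example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class MetadataRecord:
    """Hypothetical metadata record that tracks use as well as entry."""
    tags: set
    classified_on: date
    last_used: Optional[date] = None
    use_count: int = 0


def record_use(rec: MetadataRecord, today: date, new_tags=()) -> None:
    """Refresh the record at the time of use, optionally adding tags."""
    rec.last_used = today
    rec.use_count += 1
    rec.tags.update(new_tags)


def is_stale(rec: MetadataRecord, today: date, max_idle_days: int = 180) -> bool:
    """Treat metadata as stale if it has been neither used nor
    classified within the idle window (an assumed policy)."""
    reference = rec.last_used or rec.classified_on
    return (today - reference).days > max_idle_days
```

A record classified once and never touched again eventually reports itself as stale, while one that is used and retagged stays current.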

Given this topic of "freshness," one would think social media tagging, where metadata is consistently updated, might provide an answer. While social networking tagging can be useful, Graef and Vellante both feel it is a means to an end but may be too free-flowing to meet long-term metadata needs. Vellante says it may not meet legal and compliance standards, while Graef says social media tagging is useful for individuals and small teams looking to organize small, specialized document repositories. It’s also useful for those who maintain classification systems, Graef noted, such as the people responsible for updating and maintaining enterprise or divisional taxonomies.

Conclusion
To rephrase Steve Mack, metadata’s value isn’t in the gathering or creating of the metadata but in the use and analytic measurement of the metadata. Digitalsmiths calls it a "deep and intelligent approach to metadata management," while I call it the engine that will drive continued streaming media adoption in social, political, and entertainment settings.

Whether it’s customized playlist construction or ad targeting, manual metadata such as content recommendations, coupled with increasingly accurate automated metadata creation, is an area we all contribute to. And, like exercise, getting metadata into a usable form is a somewhat Herculean task, but the benefits are clear: Metadata is both necessary and highly beneficial to the overall well-being of your burgeoning content repositories if content monetization is in your future.
