The Transformational Power of Metadata, May 20, 2012


Metadata, by definition, is data about data.

What a boring, obtuse notion. You know exactly what your data stands for. Your computer program doesn't know what it stands for, but that doesn't matter, right? As a programmer, your task is to tell the program what to do with the data. What the data stands for is irrelevant to the program. Under what circumstances could it possibly make a difference if your computer program "knows" what your data stands for?

The answer is that a program that works with your data's metadata can accomplish far more than a program that doesn't. When the program can read your data's metadata and then autonomously make choices about what to do with the data, that's one less thing that you must design it to do. This increases both the flexibility and power of the application you ultimately construct. It can also go a long way toward avoiding duplication of code.

One commonly encountered type of metadata is a database schema. This metadata describes how data is organized in the database. Many SQL databases expose it through a set of system views that you can query like ordinary tables. If you have access to such a database, try running the query "SELECT * FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME='$yourtable'". This query will reveal a number of facts about the columns in $yourtable. For example, if $yourtable contains a varchar(n) column, the CHARACTER_MAXIMUM_LENGTH column shows the maximum number of characters allowed. The application of this is that if the textual data saved in this column is input by a user into an application, you could program the application to retrieve this metadata at runtime and autonomously decide how many characters to accept from the user.
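
As a rough sketch of that idea in Python, the function below reads the character limits from the schema and uses them to check user input. It assumes a DB-API cursor connected to a database that provides INFORMATION_SCHEMA.COLUMNS. The function names, the `placeholder` parameter, and the handling of non-positive lengths are illustrative assumptions, not part of any particular driver.

```python
def max_lengths(cursor, table_name, placeholder="%s"):
    """Return {column_name: max_chars} for the length-limited columns of a table.

    `cursor` is any Python DB-API cursor for a database that exposes
    INFORMATION_SCHEMA.COLUMNS. `placeholder` is the driver's parameter
    marker (assumption: "%s" for some drivers, "?" for others).
    """
    query = (
        "SELECT COLUMN_NAME, CHARACTER_MAXIMUM_LENGTH "
        "FROM INFORMATION_SCHEMA.COLUMNS "
        "WHERE TABLE_NAME = " + placeholder
    )
    cursor.execute(query, (table_name,))
    limits = {}
    for name, length in cursor.fetchall():
        # Non-character columns report NULL. Treating non-positive values
        # as "no limit" is an assumption; check how your database reports
        # unbounded text types.
        if length is not None and length > 0:
            limits[name] = length
    return limits


def fits_column(text, column, limits):
    """Accept the text only if it fits within the column's schema limit."""
    limit = limits.get(column)
    return limit is None or len(text) <= limit
```

Because the limit is read from the schema at runtime, widening or narrowing the column requires no change to this code.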

In the previous example, you avoid duplication of code. Without using metadata in your application, you must specify the number of characters the user may enter into a text field, possibly in multiple places in your application. If the database schema is changed so that more or fewer characters are allowed, you must find and change all references to the maximum number of characters. Multiply this by many different fields, and a serious database schema change could be very bad news, indeed. With just this tiny bit of metadata, your application is more flexible and easier to maintain.

If that sounds like a powerful idea, allow me to introduce you to lightning in a bottle. The previous example used data that was automatically generated in the course of setting up your application's database. That's nothing compared to what you can accomplish when you write your own metadata.

Let's suppose that you begin programming in a new environment where there is a database containing years' worth of data about a company's records. Soon after you begin, the company comes under an income tax audit. Inquisitive government agents begin demanding all sorts of information about the company's records. These demands are heard by the company's lawyers and accountants, who are familiar with your reputation as an outstanding programmer. They deluge you with requests for reports about all sorts of things contained in the company's database.

As fast as you can, you begin constructing an application that will generate these reports. However, it's very difficult to keep up. The data is not all well-annotated, which results in much confusion about exactly what data should be in which report, and the details of the reports keep changing. For instance, the Employee Expense Report contains data about work-related expenses that have been reimbursed to employees for the past three years. Bob from the accounting department is delighted with your ability to retrieve all this data from the database so that he can present it to the government agents, but the agents keep changing their demands regarding what data they want to see.

First, they keep changing the time frame that they want to see data for. Then, they keep changing which categories of expense they will and will not accept as legitimate business expenses. Furthermore, they want the report filtered down to transactions over $1000 one week, and then $500 the next. It would seem like a hopeless situation because every time a new demand for data is received, you must write code to generate a report to meet the demand.

This serves as an extreme example of changing requirements. Consider writing metadata about these reports that you must generate. Many modern languages and frameworks include libraries for working with standardized formats, such as XML and JSON. Try writing a file in an appropriate format that contains data about the report that must be generated. Whatever decisions you must make about the report that are subject to change are best kept as metadata whenever possible. Put the categories of expenses for each report, time frames, and the minimum transaction amounts in metadata. From now on, whenever you need to change what is on this report, you no longer have to write and debug code. Just modify your metadata. Once you have extensive metadata, you may even be able to generate reports that are brand new to the recipient, but simply a matter of modifying metadata on your part.
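
Here is a minimal sketch of what that might look like in Python with JSON. The field names, dates, categories, and sample transactions are all made up for illustration, and in practice the JSON would live in its own file rather than in a string.

```python
import json
from datetime import date

# Hypothetical report definition. Every value here is subject to change,
# which is exactly why it lives in metadata rather than in code.
REPORT_METADATA = """
{
  "name": "Employee Expense Report",
  "start_date": "2009-05-01",
  "end_date": "2012-05-01",
  "categories": ["travel", "lodging"],
  "minimum_amount": 1000
}
"""


def build_report(transactions, metadata):
    """Return the transactions that satisfy the report's metadata."""
    start = date.fromisoformat(metadata["start_date"])
    end = date.fromisoformat(metadata["end_date"])
    categories = set(metadata["categories"])
    minimum = metadata["minimum_amount"]
    return [
        t for t in transactions
        if start <= t["date"] <= end
        and t["category"] in categories
        and t["amount"] > minimum  # "transactions over" the threshold
    ]


# Made-up sample data standing in for rows fetched from the database.
TRANSACTIONS = [
    {"date": date(2011, 3, 14), "category": "travel", "amount": 1250},
    {"date": date(2011, 6, 2), "category": "lodging", "amount": 640},
    {"date": date(2011, 9, 9), "category": "gifts", "amount": 2000},
    {"date": date(2008, 1, 5), "category": "travel", "amount": 3000},
]

metadata = json.loads(REPORT_METADATA)
report = build_report(TRANSACTIONS, metadata)

# Next week's demand: lower the threshold. Only the metadata changes.
metadata["minimum_amount"] = 500
revised_report = build_report(TRANSACTIONS, metadata)
```

With the sample data above, the first report holds only the $1250 travel expense, and the revised one adds the $640 lodging expense. The `build_report` function is untouched between the two runs.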

Here, the word "transformative" applies doubly. Metadata can transform your client's perception of your ability to deliver solutions, just as it allows you to transform one program into a seemingly completely different one.