The poetry of Database Programming

Posted on January 5, 2011

1


Building a database is very much like poetry.

In the wee hours I am working on a new project involving database design for a site that should be able to handle thousands of concurrent user sessions per hour.

It has been a while since I have been doing something more than the mundane database designs, even since I have done anything with a Relational Database System.

And I love it.

I stopped hardcore backend-development a while ago as there is hardly any fame or glory in that. Where the flash-guys and girls and (from now on) HTML guys and girls and the designers can show you something nice and sweet for the eye the backend people only have diagrams and complex theories. They are the nerds. The wallflowers. Only understood by others similar to them if asked: “so what do you do?” as the answer is so far from anyones day to day concerns that eyes will faze and hands will start to fidget and minds start to think of escape-strategies. Like the drummer, bass player or lead singer is much more sexy than the guy who stands like a stick in the corner and plays the keyboard.

Backend development is a lonely place.

The poetry of database programming

Where coding in Java and any front end language is very similar to writing a novel, the database is like a poem: you try to put as much information as possible in as little words as possible.

For instance: when I have entities like “dog”, “cat”, “musquito” and “lobster” the general OOP approach is to create some base classes describing the commonalities and then extend these in an Aristotelan hierarchy of expanding properties. And each creature (cat, dog, lobster) will have its own specific class to be represented by “as they are all different”.

A database person will try to abstract and go the other way, “as there must be a point where they are all the same”. So instead of fanning out, he or she will try to compress towards the most optimal and compact way of defining the object. A good database will not contain a table “cats” for cats and “dogs” for dogs, but something like “animals” or “creatures” or “entities” if also rocks, crystals and plant-life are included.

For databases, findability is the main goal and purpose. When I am looking for “something” in my database, it should be almost like a google search: “SELECT * FROM entities WHERE name = ‘pooch’ ” which might result in a selection of cats, dogs, clams, lobsters and rocks.

In other words: Less is more.

You separate items by classifications. Which can be free-hand values (which will create a mess) or via classification tables. So: entities might have a classification entityType, stored in a separate table and provided with a unique ID per type. To state the entity type, you simply assign the number of that type to the entity and suddenly you can start filtering.

As your entity-types can be many – there are a lot of different types of animals, insects, rocks, minerals and plants, you want to move one step up in the hierarchy. So you add a classification, called – for instance – entityClass. Now, when you classify an entity, you state the class first (animal, insect, rock). This will automatically filter out the relevant entityTypes – if you also classifies those – thus limiting the choice of types to what is relevant.

You can step up even more and add more filters and definitions. Mainly to increase the options you have to find specific information. For instance: “number of legs”. Or “type of reproduction” where you can state per entity type or entity sub class whether it lays eggs, holds the fetus in the womb, and so on.

Smart database design aims at reduction of work. If you can save time by moving a specific classification up the hierarchy, than do it. If you “inherit” properties which are also implicating other rules like: “all mammels have 4 legs” then – instead of defining per entity type (dog, cat, etc) the number of legs – you do it in in a “super” or parent table.

Inheritance versus the relational model

To understand relational databases from a OOP angle, consider the concept of inheritance from multiple objects. Or the concept of composition.

In our case the “entity” will have several objects that will expand the capabilities of our class. Using common methods on each of these classes or objects will create a specific response.

The inheritance model in most OOP languages only allow a “one parent, inherited by multiple children” approach. “One child, inheriting multiple parents” will not fly.

In other words:

public class myClass inherits parentClass

Will fly.

public class myClass inherits parentClass1, parentClass2, parentClass3

will break.

Coding is like writing a novel

When you code, you use a different approach to build your map or model of the world: represented in your application.

As you do much more than offering an optimal model for data retrieval, you create a lot more code. As your programming language does not allow for dynamic definitions for objects as much as database systems do, you manually code and hardcode things like Constants and Classes, representing very specific items – which would be fluid and extendable in a Relational Database System

This means that what is compressed in the poem that is the Database, becomes uncompressed in code. In other words: where one line can tell the world in a poem, the novellist will take one or two pages – or the entire novel – to capture the same sentiment or idea.

Unfortunately, many coders never built database systems, leading to a lot of overhead where this might not be needed.

Writing this article I came to realize why my approach to OOP is and was – some times – so much different from that of my collegues. I have seen projects where each and every item in a website was represented in XML and in a Class representing that item in the code. Even when that class was 99% similar to 10 or 100 other classes in that same project and basically consisted of:

public MyClassRepresentingBlackSock23 extends TheGenericClass {
    public function MyClassRepresentingBlackSock23():void
    {
        super();
    }
    // Done
}

When you are used to normalizing your objects and models in a Relational Data Model, this is crazyness and you will start to search for – and find – commonalities and re-use. To the – sometimes obsessive – point where you can build applications from libraries of “smart objects” by using parameters instead of code: where behavior and appearence is determined by the values injected from your data-sources.

Which brings me to:

Maintainability

The damaging blows to the maintainability of any project are the lines of code created by Copy & Paste from other parts in the project. The more repetition you will find, the larger the risk that – when specifications or requirements change – you will spend a lot of time refactoring and rebuilding your code, because there are a lot of places you need to change your code.

OOP and Design Patterns do not automatically ward you from this. Nor do they automatically warrant that your code will be elegant, lean and efficient. I have seen code – in projects I was hired to fix – which were complete nightmares of redundant and repetitive code BECAUSE they tried to be completely compliant to specific Design Patterns.

The same goes for frameworks like Robotlegs, PureMVC, Mate and Swiz. The fact that someone else took the effort to solve some basic issues and offers you a very specific way of working does not make your project a better and more maintanable project “because it is a ‘standard’ and because you use it”. On the contrary I believe.

Each of these frameworks use tricks that you can easily reproduce once you understand what you are working with. And instead of getting trapped in a very specific shape and direction – that is enforced by any framework – you can mix these tricks where they apply.

I also went through my own code after a year of neglect, to find stuff so compact and obscured by behavioral patterns based on parameters injected by something or someone else that debugging was like unfolding a Chines Puzzle.

Maintainability is in my experience a combination of:

  1. A clear structure – In coding, in names given to classes, methods and variables, in the distribution of responsibility (what happens where and by whom) and in the locations of fields and classes
  2. A reduction of repetition of code – By using proper OOP principles, allowing you to do things once and then re-use them everywhere else
  3. A clear and consistent repetition of design patters – Where they are your own, or those designed by others. Whether they are in your code – by doing things consistently, or in the organization of your classes and distribution of responsibilities
Advertisements