Tuesday, November 25, 2008

Handling database changes without complete migration

The ORM market
ORM tools are great. Products like Hibernate, JDO, JPA, IBATIS, Torque, and others made life easier for developing database enabled applications.
Using JDBC when your application is database intensive with lots of table can be lots of work especially if lots of your existing code base doesn't provide some DAO classes.
Usually in ORM tools, you map a set of fields to some columns, using XML or annotations, and you're done.

Most J2EE and core Java developers have faced database changes and migration issues at least once. The problem is pretty crucial when your database model is shared by other applications which can't be upgraded(for many reasons).

The concern
  • How to handle database changes which keep happening?
  • Should/Could you stop providing backward compatibility?
  • Is upgrading the database model your only solution?
The application history
  • You have an existing application with a model which has been designed carefully and everything is going well.
  • You have a server side application with a database model and client applications with the same database model as the "main server".
  • A month or a year later, you need to make lots of changes in couple of tables, replace some primary keys, introduce some non null foreign keys, etc.
  • You were using raw JDBC mostly and plain SQL. Now, you would like to use that brand new bleeding edge technology(Hibernate, JPA, Ibatis, name it).
  • Here and there, you might have been using a very old ORM tool which was convenient at the time and is still getting the job done.
Constraints
  • You need to be able to support simultaneously clients(applications) running older and newer versions of the database schema.
  • You cannot force the customer to upgrade for many reasons(hardware dependencies, partner application compatibility, the customer doesn't want to, etc.)
  • You need to keep adding new features which might involve altering again the existing schema
  • Your table contents are now messed up, invalid or irrelevant values here and there because the database column has a "NOT NULL" property.
Possible solutions
  • One might be tempted to maintain different versions of the same database, but let's say I have 100 versions since 1994.
  • Ok, let's use JCR to provide another abstraction level, maybe checking a node property before deciding which class to map, overkill in most cases?
  • "Dear customer, please, upgrade and buy the new pack to be able to use that version which also provide bug fixes and new features"
  • Hum... last resort "Dear customer, you should upgrade because that X, Y, W feature fixes lots of serious security holes which will affect your network"
  • Ok, from now on, every table will be like a key-pair, probably not very wise most of the time, especially if it will involve rewriting most parts of a huge application.
The problem is here, a fix must be delivered!
What I would probably do is :
  • Stick with raw SQL and migrate ORM mappings to JDBC, as needed(if the ORM tools cannot ommit fields), to ignore some properties depending on the client database version. I can insert some dummy values when I have no choice when dealing with "old software clients".
  • Use native SQL queries and JDBC only
  • Use named SQL queries with binding and anything tool that supports it Hibernate, Ibatis, a resultset handler from JdbcTemplate or DbUtils, etc. Some dummy data will need to be inserted when not available(new non null columns).
  • If the problem gets out of hand, way too many changes, I'll probably want to use a non relational database and handle relationships myself(An object or XML database might do, but might not scale)
  • Another solution, would be JCR. I messed with JackRabbit once, and the pain was brought. Performance, concurrency and the API probably improved since then.
I would definintely try to avoid running multiple database versions at the same time. You can easily go from 1, 2 versions and then reach 100.

What would you developers do in such a situation?

1 comment:

Jason Monberg said...

An XML database is an excellent solution when the schema is changing often or may not be well understood, such as unstructured documents. Full disclosure: I work for Mark Logic, we develop, sell, and support the MarkLogic Server. It is an XQuery compliant XML database which scales very well - 100's of TB's. We have a community version that developers can try out on projects available here:
http://developer.marklogic.com/

Another important aspect of using an XML Server is that XML can be served directly to the client, for some applications this means serving XHTML directly to the browser. The OR impedance mismatch is removed, simplifying the architecture and development cycle.