Seeking an Agile Database Definition Language
The database schemas for my current project are getting hashed out, mostly in Visio. This is pretty but not very automatable. I’d like to see if we can follow a more agile, iterative approach to database development. In particular, I’d like to be able to check the database definition into source code control and build the whole thing, including database tables and sample databases for testing, from Ant. Requirements include:
- Supports (at a minimum) MySQL and Derby
- Can be read by Java and Python
- Allows inserting data for unit testing; i.e., not just table definitions
It also wouldn’t hurt if it could reverse engineer existing SQL databases.
I’m tempted to write my own, probably using XML, but surely someone has already done this? I haven’t found a lot though. What I’ve got so far are these, none of which really meet the requirements:
Andromeda
Andromeda uses YAML for the database definition language. The downside is that only a PHP parser is currently available. I’d have to write one for Java and/or Python.
Python
We have some database generation code written in Python. Maybe I should just use Jython to call this from Java? It still feels like a hack though.
Raw SQL
We could just write a big SQL script to set up the tables and load in the data. However, JDBC can’t execute SQL scripts, only individual statements. (Perhaps this is fixed somewhere in some project?)
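If nobody has fixed it, a naive script runner isn’t much code. Here’s a hedged sketch (my own, not an existing library): read the file, skip comment lines, split on semicolons, and execute each piece through JDBC. The split would break on semicolons inside string literals, so treat it as an illustration, not a solution:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.sql.Connection;
import java.sql.Statement;

// A deliberately naive SQL script runner: no quoted-semicolon handling,
// no delimiter configuration, just enough to show the shape of the problem.
public class SqlScriptRunner {

    public static void run(Connection conn, String scriptPath) throws Exception {
        StringBuilder script = new StringBuilder();
        BufferedReader in = new BufferedReader(new FileReader(scriptPath));
        try {
            String line;
            while ((line = in.readLine()) != null) {
                if (!line.trim().startsWith("--")) { // skip SQL comment lines
                    script.append(line).append('\n');
                }
            }
        } finally {
            in.close();
        }

        Statement stmt = conn.createStatement();
        try {
            for (String sql : script.toString().split(";")) {
                if (sql.trim().length() > 0) {
                    stmt.execute(sql); // one JDBC call per statement
                }
            }
        } finally {
            stmt.close();
        }
    }
}
```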
Surely this is an obvious enough idea that someone has already done it? Probably in XML? Any ideas?
June 14th, 2007 at 7:55 AM
Maybe your definition of SQL script is more nuanced than mine, but I’ve certainly used the JDBC executeBatch() method to run multiple SQL statements in one go.
We’ve been using DbUnit with some success to insert unit test data. I have a little command line app (built using their classes) which I give a query to, and it dumps the data out in an XML file which you can later reload, using their JUnit extensions. There’s some fun to be had around null handling, but it works fine. We extract data from our main database (big commercial DB) and insert it into an in-memory HSQLDB for the tests. I’m sure it’ll work with Derby just as easily.
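For what it’s worth, the round trip looks roughly like this (a sketch only; the table name, query, and connection URL are placeholders, and the exact DbUnit API may differ between versions):

```java
import java.io.File;
import java.io.FileOutputStream;
import java.sql.Connection;
import java.sql.DriverManager;

import org.dbunit.database.DatabaseConnection;
import org.dbunit.database.IDatabaseConnection;
import org.dbunit.database.QueryDataSet;
import org.dbunit.dataset.xml.FlatXmlDataSet;
import org.dbunit.operation.DatabaseOperation;

public class DatasetRoundTrip {

    public static void main(String[] args) throws Exception {
        Class.forName("org.hsqldb.jdbcDriver");
        Connection jdbc = DriverManager.getConnection("jdbc:hsqldb:mem:test", "sa", "");
        IDatabaseConnection db = new DatabaseConnection(jdbc);

        // Dump the rows matching a query into a flat XML file...
        QueryDataSet export = new QueryDataSet(db);
        export.addTable("CUSTOMER", "SELECT * FROM CUSTOMER WHERE REGION = 'EU'");
        FlatXmlDataSet.write(export, new FileOutputStream("customers.xml"));

        // ...and later reload it, replacing the table contents first.
        FlatXmlDataSet reload = new FlatXmlDataSet(new File("customers.xml"));
        DatabaseOperation.CLEAN_INSERT.execute(db, reload);
    }
}
```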
For schema generation in the HSQLDB we’re lucky enough to be able to use Hibernate to do most of it. As many of the tables we are interested in are bound to POJO JavaBeans with Hibernate, we can simply ask Hibernate to generate the necessary table creation statements, foreign keys etc. It might be complete overkill to write a Hibernate .hbm XML file for each of your tables solely to support creating them for unit testing, but it does work…
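In case it helps, the generation step is only a few lines against Hibernate 3’s SchemaExport tool (a sketch, assuming a hibernate.cfg.xml that lists your mappings and connection settings):

```java
import org.hibernate.cfg.Configuration;
import org.hibernate.tool.hbm2ddl.SchemaExport;

public class GenerateSchema {

    public static void main(String[] args) {
        // Reads hibernate.cfg.xml (mappings plus connection settings)
        Configuration cfg = new Configuration().configure();

        SchemaExport export = new SchemaExport(cfg);
        export.setOutputFile("schema.sql"); // keep the DDL for DBAs to review
        export.create(true, true);          // echo the script and run it against the db
    }
}
```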
June 14th, 2007 at 8:35 AM
I don’t know why the Ant “sql” task doesn’t do what you’re wanting. It will execute batches of statements from a file, it uses JDBC, and it even supports transactions. http://ant.apache.org/manual/CoreTasks/sql.html
Maybe I’m misunderstanding your requirements.
June 14th, 2007 at 10:37 AM
I use AQT, which talks ODBC to most any database. It has DDL generation, etc. Applying DDL requires the database connection, of course, but there’s no need to add a language engine too. Any command line, DOS/*nix/MVS will do. The whole point of relational databases is to Cut The Cord to Languages.
If your question is, what meta-DDL creators are there out there, Andromeda gets my vote. IDEA has DBHelper. Erwin, Rational Rose (now of IBM), Embarcadero, etc. Rails has a Migrations function, which generates DDL incrementally. RoR of course.
Databases can use SQL to create test data as needed. Graeme Birchall’s DB2 book has a chapter on this (http://mysite.verizon.net/Graeme_Birchall/id1.html). It’s pretty vanilla; it should be usable for most any database.
June 14th, 2007 at 10:42 AM
Ant has a task to execute a SQL script, I believe…
June 14th, 2007 at 11:03 AM
I think the best practice is to put raw SQL into source control, and just use the Ant sql task to execute it. Note that in production you need to not only modify the DDL but also check in an upgrade script that transforms the existing database by adding (or dropping) tables or columns or what not.
June 14th, 2007 at 12:46 PM
We use a combination of Hibernate and JUnit load tests.
Hibernate can generate the tables from the Java source (we use XDoclet, but annotations work the same way).
Then, after this, we run JUnit tests that create POJOs and insert them into the db. In this way, we can radically refactor object names, relationships, etc. with the compiler’s help to fix everything, with no manual SQL generation at all.
We have various tests to create the standard db config, as well as those for test, general dev, and sales demos.
The downside to this approach is when you update the production environment. In this case, you really need to use SQL update statements. It’s easy to generate the schema, but migrating all the data from the previous schema can be a pain if many changes have been made.
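To make that concrete, here’s a hedged sketch of the shape such a test might take (mine, not the poster’s actual code): an annotated POJO plus a JUnit test that inserts it, assuming a hibernate.cfg.xml whose hbm2ddl.auto setting creates the tables first:

```java
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;

import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.cfg.AnnotationConfiguration;

import junit.framework.TestCase;

public class StandardConfigLoadTest extends TestCase {

    // Illustrative mapped POJO. Renaming it or its fields is an ordinary
    // Java refactoring; there is no hand-written SQL to keep in sync.
    @Entity
    public static class Customer {
        @Id @GeneratedValue private Long id;
        private String name;
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
    }

    public void testLoadStandardConfiguration() {
        // hibernate.cfg.xml is assumed to set hbm2ddl.auto=create,
        // so the schema exists before any rows are inserted.
        SessionFactory factory = new AnnotationConfiguration()
                .addAnnotatedClass(Customer.class)
                .configure()
                .buildSessionFactory();

        Session session = factory.openSession();
        session.beginTransaction();
        Customer customer = new Customer();
        customer.setName("Standard demo customer");
        session.save(customer);
        session.getTransaction().commit();
        session.close();
    }
}
```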
June 14th, 2007 at 3:15 PM
Hibernate+annotations.
June 14th, 2007 at 5:25 PM
I know this doesn’t meet all of your requirements, but Ruby on Rails could handle the database definitions and use migrations to track changes. There is an export function that will dump all the SQL scripts to build the database. You can also define some stock data for various unit tests. Now with JRuby 1.0, it will support Derby along with MySQL. And then if you want some hard-copy documents of your schema, you can use http://railroad.rubyforge.org/.
CW
June 14th, 2007 at 9:00 PM
Hibernate may be possible. I’ll have to explore it further.
executeBatch won’t work. That lets me queue up a bunch of separate execute calls in one go. However, each of those is still a separate Java statement. It doesn’t let me load in one SQL script file. I’d like the database people not to have to edit the Java code.

Ant’s sql task is good for deployment but not for testing. For unit testing you need to set up the database before each test and tear it down after each test. Ant would only let you do that before the test suite, not before each test.
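What I’m after is roughly this shape of fixture (a sketch only, using JUnit 3 and an in-memory HSQLDB; SqlScriptRunner is the kind of hypothetical script-loading helper sketched in the post above):

```java
import java.sql.Connection;
import java.sql.DriverManager;

import junit.framework.TestCase;

public class FooDaoTest extends TestCase {

    private Connection conn;

    protected void setUp() throws Exception {
        // A fresh in-memory database for every test method
        Class.forName("org.hsqldb.jdbcDriver");
        conn = DriverManager.getConnection("jdbc:hsqldb:mem:testdb", "sa", "");
        // Hypothetical helper: loads schema and sample data from a script
        SqlScriptRunner.run(conn, "schema-and-data.sql");
    }

    protected void tearDown() throws Exception {
        conn.createStatement().execute("SHUTDOWN"); // discard the in-memory db
        conn.close();
    }

    public void testSomethingAgainstFreshData() throws Exception {
        // every test method sees the same pristine schema and sample rows
    }
}
```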
June 14th, 2007 at 10:08 PM
We use Hibernate for our object relational mapping and use the tools it provides to automatically generate the schema from the mapping files. We define all our reference and configuration data as Spring beans and have a task in Ant to populate all of this into the database as well. Part of our automated build process is to blow the database away and start from scratch, to prove that the application is not just working because somebody has been tinkering directly with the tables and that the application can be made to work solely from what is contained in the source repository.
You could also use this approach in the setUp() method of your unit tests, but it tends to make the tests run rather slowly. I believe Spring has an extension to JUnit which doesn’t let the unit tests commit anything, and rolls the transaction back after each test (see http://www.springframework.org/docs/reference/testing.html#testing-tx).
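For example, with Spring 2.0’s test support that looks roughly like this (a sketch; the config location, table, and column are placeholders):

```java
import org.springframework.test.AbstractTransactionalDataSourceSpringContextTests;

public class CustomerDaoTransactionalTest
        extends AbstractTransactionalDataSourceSpringContextTests {

    protected String[] getConfigLocations() {
        // assumed Spring config defining the DataSource and transaction manager
        return new String[] { "classpath:test-context.xml" };
    }

    public void testInsertIsRolledBack() {
        // jdbcTemplate is provided by the superclass
        jdbcTemplate.update("INSERT INTO CUSTOMER (NAME) VALUES ('Acme')");
        assertEquals(1, jdbcTemplate.queryForInt("SELECT COUNT(*) FROM CUSTOMER"));
        // the framework rolls the transaction back after the test,
        // so nothing is committed to the database
    }
}
```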
Anything that can’t be handled by Hibernate, we keep as SQL scripts in the repository which the Ant script applies after Hibernate has done its thing. Hibernate will also save the schema generation SQL to a file, so you can include this in your distribution so that clients/DBAs can create the schema on site without having to have your build file and source code.
When we have a release, we still have to perform a diff between the schema generation SQL for this release and the previous release, and hand-craft an upgrade script to take clients from one release to the next, preserving their existing data.
June 15th, 2007 at 12:40 AM
The Apache Derby database comes with a scripting tool called ij (see the description page in the Apache Derby ij tool documentation).
It is written in Java, is database-neutral, and can even be called directly from Java without forking a new VM. We use it to fill newly created databases with our table layout, which is specified in SQL files. It is deployed in a JAR file separate from the rest of Derby.
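Called in-process, that looks roughly like this (a sketch against org.apache.derby.tools.ij.runScript; check the ij documentation for the exact signature in your Derby version):

```java
import java.io.FileInputStream;
import java.sql.Connection;
import java.sql.DriverManager;

import org.apache.derby.tools.ij;

public class LoadSchema {

    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.derby.jdbc.EmbeddedDriver");
        Connection conn = DriverManager.getConnection("jdbc:derby:testdb;create=true");

        // Run a SQL script through ij without forking a new VM;
        // the return value is the number of statements that failed.
        int failures = ij.runScript(conn, new FileInputStream("tables.sql"),
                                    "UTF-8", System.out, "UTF-8");
        System.out.println(failures + " statement(s) failed");
        conn.close();
    }
}
```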
June 15th, 2007 at 3:43 AM
I’m rather happy with the Oracle command-line SQL*Plus (not its ugly quasi-GUI relative with the same name). After you define an alias for the connection parameters you can simply call
sqlplus user/password@alias @myscript
to run myscript.sql, which can be automatically generated and can be a container that calls other scripts (some of which can be fixed or generated in different ways).
Similar command line tools doubtless exist for other DBMSs and are easily scriptable from Java (JUnit), Ant, Python, etc.
For production environments, the scripts can be regenerated with some controlled differences, like switching names and omitting INSERTs of read-only test data; DB administrators can easily review, edit and execute pure SQL code without interacting with the code generation and configuration control tools and procedures that originated them.
June 15th, 2007 at 9:08 AM
Elliotte – just to be 100% clear, in the past I’ve been able to load SQL scripts with executeBatch(), i.e. something like
```sql
insert into foo values (…);
select * from foo where …;
update bar set …;
```
It could just be an artifact of the driver I was using, but as long as I used the right statement separator (in this case ‘;’) between the statements I was able to pass in a whole file in one go. Now that I’m looking at the API again, I wonder if I even used addBatch() and executeBatch() at all, or whether I just passed the whole script to execute() as one big string…
I don’t know if JDBC requires this behavior, whether it defines a standard statement separator, or whether that is vendor-specific, but I do know it worked on the database I was using a couple of years ago.
And failing that, something that read a file line by line and called addBatch() for each line in the file would be about 30 seconds work for the guy who wrote Java I/O :)
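Something like this, say (my sketch, assuming one statement per line; note that JDBC batches are generally limited to statements returning update counts, so SELECTs don’t belong in one):

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.sql.Connection;
import java.sql.Statement;

public class BatchLoader {

    // Assumes exactly one SQL statement per line; blank lines are skipped.
    public static void runLineByLine(Connection conn, String path) throws Exception {
        Statement stmt = conn.createStatement();
        BufferedReader in = new BufferedReader(new FileReader(path));
        try {
            String line;
            while ((line = in.readLine()) != null) {
                line = line.trim();
                if (line.length() > 0) {
                    stmt.addBatch(line);
                }
            }
            stmt.executeBatch(); // run the whole script in one round trip
        } finally {
            in.close();
            stmt.close();
        }
    }
}
```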
June 15th, 2007 at 1:52 PM
If you’d consider perl, the DBIx::Class module team is working on supporting diffs between versions of a database with the ability to move data from the old schema to the new. Check out the 6/12/2007 perlcast podcast, http://www.perlcast.com.
June 15th, 2007 at 5:49 PM
I use either Ant’s sql task or SQLUnit (which we also normally use from Ant). Both mix happily with Groovy’s AntBuilder if you want the ability to refactor to reduce duplication. Both also mix happily with WebTest for functional testing of HTML and Web Service applications (in either XML or Groovy form). This allows you, for example, to populate the test database with data or test some database preconditions, then run a web test, then clear away the test data in the database or check some database postcondition.
I am a fan of the Groovy-flavored tests myself, as they tend to be faster to write and run than the XML variants, and tend to be much clearer and more succinct thanks to Groovy’s nifty syntax abbreviations and DSL capabilities.
June 15th, 2007 at 9:54 PM
The problem I experienced with DbUnit is that the dataset files it loads to set up a test are hard to read when you’re trying to understand the explicit test conditions being established. I’ve had more success setting up the database by writing code that uses the objects the application itself uses to interface with the database. The unit tests are more readable that way.
I’m sure there is a place for DbUnit. Maybe someone could explain where they found it useful in a context where the quality and readability of the unit tests are stressed?
June 16th, 2007 at 8:49 AM
To clarify about YAML and Andromeda.
YAML parsers are available in just about every language out there. You can use YAML as a human-writable file format and then parse it in any language.
As far as Andromeda’s syntax is concerned, what we have defined is little more than a set of keywords that we use to specify database builds. Using YAML does not mean you have to use our keywords, especially if you are not using our parser.
I cannot help but comment on one thing, though. Look deeper at Andromeda. You are recapitulating the course that led to its development, and you will find it anticipates your needs one after the other. After you get the format down, you need a builder. Andromeda has that. After you get it built, you need an upgrader. Andromeda has that. After you can build and upgrade you will want to publish the app. Andromeda has that. Then you may wish to host multiple versions of the same app on a server. Andromeda has that. You will need a rational way to store application-level and instance-level code. Andromeda has that. You will then want to document the database structure. Andromeda has that. And so on and so on.
End of commercial :)
June 16th, 2007 at 2:41 PM
I second the idea of using Rails migrations; should have thought of that myself. Combined with JRuby, you have perfect integration.
June 17th, 2007 at 8:52 PM
Liquibase looks like it might be headed in the direction of something you’d be interested in, but I don’t think it’s quite where you need it to be yet (data insertions, for instance). Might be worth a look.
March 1st, 2008 at 3:03 PM
Hand-code SQL scripts for each db vendor and make sure you have a solid persistence layer abstraction. It’s the only way to handle these kinds of requirements in agile production environments.