This article shows how Alice, a development manager, applies a few techniques in her team to give Bob, a developer, some guarantees that his code is compliant with a given database schema, i.e. that each field of a data object is also present as a column in a table. The approach is based on integration tests, Postgres dockerization, version control and database schema versioning. The reader can apply the same ideas to other technologies, such as WSDL versioning or other database engines.
I'm Alice, and before introducing my approach I'll clarify the meaning of a few keywords I've just used.
First, what is an integration test? It is the phase of software testing where two or more modules are combined and tested as a group. In our case we want to test whether our code is able to perform real database selections, i.e. that every column the code needs from a particular table is present both in the schema and in the code. In a more complex scenario it would sound like: "we want to test whether our data access layer is able to perform CRUD operations against the datasource currently configured in the preproduction environment".
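As a toy illustration of the idea (this is not the article's actual test, which runs real queries against a live database; every file and column name below is made up), a shell sketch that checks whether the columns the code needs are declared in the versioned DDL:

```shell
# Toy illustration only: the real check is an integration test running
# real SELECTs against a dockerized postgres, not a text comparison.
mkdir -p /tmp/schema_check
cat > /tmp/schema_check/1_0_create_people.sql <<'SQL'
CREATE TABLE people (
    id      SERIAL PRIMARY KEY,
    name    VARCHAR(100),
    surname VARCHAR(100)
);
SQL
# columns the data access layer expects (hypothetical)
for col in id name surname; do
    if grep -q "^[[:space:]]*$col[[:space:]]" /tmp/schema_check/1_0_create_people.sql; then
        echo "OK: column $col is in the schema"
    else
        echo "MISSING: column $col"
    fi
done
```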
Second, what is schema versioning? A good practice in software development is to keep code and schema strictly coupled; for this reason the *.sql scripts should be versioned too. In my team we order them with a naming convention (e.g. 0_1 -> 0_2 -> 1_0 -> ..., possibly following some semantic versioning convention) and we separate DML (data manipulation language) scripts from DDL (data definition language) scripts. One of the main benefits is that we can reproduce the schema programmatically anywhere: in production, in a particular stage of the continuous integration pipeline, or ... in a dockerized database :-)
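The alphabetical execution order the naming convention relies on can be checked with plain `sort` (the script names below are made-up examples, not the article's actual files):

```shell
# Versioned scripts run in alphabetical order, so the names must sort
# correctly (note: zero-padding matters, e.g. 0_10 would sort before 0_2).
printf '%s\n' 1_0_drop_address.sql 0_2_insert_seed_data.sql 0_1_create_schema.sql | sort
```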
Third, what is a dockerized Postgres? You can think of it as a lightweight virtual machine that runs on your OS and contains your Postgres database. You can start it, stop it, access it, configure it and distribute it, and, more importantly, you do all of this once, programmatically and in a centralized way!
Bob's development machine
I'm Alice again. In my team we distribute this development machine configuration. As you can see it is quite generic, and the guys can still browse Facebook or read some tennis article :-)
- Ubuntu 16.04 LTS
- jdk 1.8.0_111
- docker 1.12.5
- mvn 3.3.9
- git 2.7.4
Hi, I'm Bob, a software engineer. I'd like to talk a bit about my daily development activity. Once I've opened the project simple_db_reader in my IDE, I write some code. Before committing I follow these steps:
- I run sh ./docker_start_env.sh to start/restart my local Postgres. This is nice since I don't need any database installation, and all the *.sql scripts located in simple_db_reader/sql are automatically executed in alphabetical order. Also, I can access the tables with my pgAdmin, which is great for testing new scripts locally!
- I compile and test my code with Maven.
- I can run the resulting jar.
- If everything is fine, I push to some remote branch.
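A sketch of what docker_start_env.sh might contain (container name, port, credentials and image tag are assumptions, not the article's actual script):

```shell
#!/bin/sh
# Remove any previous container, ignoring the error if none exists.
docker rm -f simple_db_reader_pg 2>/dev/null

# The official postgres image executes any *.sql found in
# /docker-entrypoint-initdb.d at first startup, in alphabetical order,
# which matches the naming convention of the versioned scripts.
docker run -d --name simple_db_reader_pg \
    -p 5432:5432 \
    -e POSTGRES_PASSWORD=postgres \
    -v "$PWD/sql":/docker-entrypoint-initdb.d \
    postgres:9.6
```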
A busy working day in Alice's office
Alice: A few days ago we released a first version in production (changeset da1d744). You can see it with the commands:
git checkout master
git checkout da1d744
Figure 2 and Figure 3 show the da1d744 checkout in the whole git log and the relation between the code and the database schema.
In the meanwhile the DBA submitted a change request about a database denormalization: dropping the table address and adding a new column zipcode to people. It is important to remark that the request has a great impact on the code, so we need to modify the integration test AppTest.testDatabaseAccess to be sure that the codebase is still aligned with the new schema. A new branch addressesDenormalization is created and Bob is working on it:
git checkout addressesDenormalization
Figure 4 and Figure 5 show addressesDenormalization in the git log and the new database schema with the compliant codebase.
Suddenly, during development, we received a ticket from the help desk about a bug in production: the "HALLO world" print is wrong and has to be changed to "Hello World!". We have to hotfix production now. Bob checks out da1d744, creates a new branch productionBugFixing, fixes the code and merges into master, ready to be deployed to production:
git checkout master
git checkout 472daa4
In productionBugFixing (Figure 6) Bob can work on a codebase aligned with the database schema in production (Figure 7). Also, since the DDL scripts are versioned, he can replicate the database locally and develop in a safer way.
Alice: Hey Bob, I heard about your fix. Are you confident the code is still compliant with the database schema in production? No SQLExceptions are going to be raised, right!?
Bob: Just before pushing I ran the integration test pointing to the production database schema, replicated by the local dockerized Postgres:
Alice: That's ok! We can release the fix in production (Figure 8) ... and you can carry on working on addressesDenormalization (Figure 9).
Bob: For this activity I checked out addressesDenormalization, added a DDL script to perform the denormalization, fixed the broken data layer code and ran my tests. I'm ready to merge into master (Figure 10).
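The incremental DDL for the denormalization might look like the following sketch (the file name, column type and migration details are assumptions, not the article's actual script):

```shell
# Write a sketch of a possible incremental DDL script, then show it.
mkdir -p /tmp/sql/ddl
cat > /tmp/sql/ddl/1_1_address_denormalization.sql <<'SQL'
-- add the new column to people
ALTER TABLE people ADD COLUMN zipcode VARCHAR(10);
-- (a separate DML script would migrate the existing data here)
-- then drop the denormalized table
DROP TABLE address;
SQL
cat /tmp/sql/ddl/1_1_address_denormalization.sql
```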
Alice: Once again, I've seen you had conflicts when merging into master, and I know your fix has some impact. Can you test the master again?
git checkout master
Bob: Dear Alice, I'm confident that you can tell the DBA to run the new incremental *.sql scripts and we can deploy the denormalization!
Alice: Fine ... I'll let you know soon :-)
Points of Interest
Bob's point: with dockerization I can set up my development environment easily, skipping painful details like database installation and configuration. I can also access the database and run scripts locally, avoiding the annoying issues that usually come from a shared development environment. And since the *.sql scripts are versioned, I'm confident I can replicate the related schema automatically.
Alice's point: in the first stage of building up the development environment we configured a dockerized Postgres once and shared it within the team. Everyone enjoyed a ready-to-go installation, and we used a clone of the container in our continuous integration pipeline as well. The practice of keeping the database schema versioned has several advantages: we can rebuild from scratch, at any time, a particular snapshot coupled with a particular checkout of the codebase, which is useful when different environments are involved in the release life cycle. Moreover, by sharing those scripts we can communicate formally with the DBA and other colleagues, avoiding dangerous misunderstandings. Finally, integration tests that run the data access layer against a real instance of the database give us a certain level of confidence in technical details like the JDBC driver or table indexing.