Cheddar now uses the DynamoDB Document API

We have recently moved to v1.9 of the AWS SDK for Java in Cheddar, travel.cloud’s Hexagonal Domain-Driven Architecture framework. This brings some new features which we have started to integrate into Cheddar. This article looks at some of those new features and discusses the improvements that they enable.

JSON storage

Here at travel.cloud we use Domain-Driven Design (DDD) as the basis for the software architecture of our microservices. One element of DDD is the aggregate pattern, which is defined as “a cluster of domain objects that can be treated as a single unit”. The contract of this pattern states that loads and saves should each be achieved in a single transaction to ensure consistency. The new Document API allows better compliance with this pattern; let’s find out why.

In our domain objects we often have a structure where one object contains a list of other domain objects. With SDK versions earlier than 1.9, the most practical way to persist this kind of one-to-many relationship is to use two tables and reference IDs. This can cause problems if an update fails part way through the commit to table 2 after table 1 has been updated successfully: the data is left inconsistent and the aggregate pattern is broken. By using the Document API, introduced in v1.9, and storing the data as JSON, one-to-many relationships can be modelled as nested objects and updates can be committed in one transaction.
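To make the failure mode concrete, here is a minimal in-memory sketch. It does not use DynamoDB or Cheddar at all: the class name, the three maps standing in for tables, and the failAfter parameter that simulates a mid-commit failure are all assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: plain maps stand in for DynamoDB tables.
final class AggregateWriteSketch {
    final Map<String, List<String>> parentTable = new HashMap<>();          // table 1: parent id -> child ids
    final Map<String, String> childTable = new HashMap<>();                 // table 2: child id -> child data
    final Map<String, Map<String, Object>> documentTable = new HashMap<>(); // single-table alternative

    // Two-table approach: the parent is written first, then each child in turn.
    // failAfter simulates a failure once that many children have been written.
    void saveAcrossTwoTables(final String id, final Map<String, String> rooms, final int failAfter) {
        parentTable.put(id, new ArrayList<>(rooms.keySet()));
        int written = 0;
        for (final Map.Entry<String, String> room : rooms.entrySet()) {
            if (written == failAfter) {
                throw new IllegalStateException("simulated write failure");
            }
            childTable.put(room.getKey(), room.getValue());
            written++;
        }
    }

    // Document approach: the children are nested in the parent and stored in one write.
    void saveAsDocument(final String id, final Map<String, String> rooms) {
        final Map<String, Object> document = new HashMap<>();
        document.put("id", id);
        document.put("rooms", new HashMap<>(rooms));
        documentTable.put(id, document);
    }

    // True when every child id referenced by the parent exists in the child table.
    boolean isConsistent(final String id) {
        final List<String> childIds = parentTable.get(id);
        return childIds == null || childTable.keySet().containsAll(childIds);
    }
}
```

If saveAcrossTwoTables fails after the first child, the parent row references a child that was never stored, so isConsistent returns false. saveAsDocument has no intermediate state to get stuck in, because the aggregate is written as one unit.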

The Document API is implemented wholly in the SDK, so using it only requires moving to the latest version of the SDK. The Document API has two further benefits: it makes the Java code responsible for database interaction less verbose, and it enables better alignment with OO design.

Storage example

Use of the Document API changes the way data is stored in DynamoDB. Here is an example of the difference.

Within our system the user can search for hotels. When the results return from suppliers, we cache them in a DynamoDB row. This means that future searches for the same criteria are quicker and that we can paginate the result set by reading local results rather than performing fresh queries each time. A typical result set can be modelled in JSON as follows:

{
 "id": "978153a6c0f9f8c5317c09c2a4760280",
 "searchResultList": [
 {
 "propertyID": "123456",
 "type": "com.clicktravel.persistence.model.SearchResult",
 "name": "Cosy Hotel Birmingham",
 "phone": "02125467654",
 "addressLines": [
 "Street",
 "Town",
 "Post Code"
 ],
 "rooms": [
 {
 "id": "762dc402-9e34-4dce-8617-cf0fbfaa86b9",
 "name": "Twin Room",
 "value": "30.00",
 "currency": "GBP"
 },
 {
 "id": "f6ea82b9-8fd1-453c-b9dd-24f423e2bdaf",
 "name": "Twin Room",
 "value": "30.00",
 "currency": "GBP"
 }
 ]
 }
 ],
 "version": "1"
}

Using the old storage mechanism for JSON in Cheddar, this object would have been persisted in DynamoDB with the ID and version as accessible attributes but the remainder of the document stored as a single JSON string, for example:

{
 "id": {
 "S": "978153a6c0f9f8c5317c09c2a4760280"
 },
 "searchResultList": {
 "S": "[{\"propertyID\": \"123456\",\"type\": \"com.clicktravel.persistence.model.SearchResult\",\"name\": \"Cosy Hotel Birmingham\",\"phone\": \"02125467654\",\"addressLines\": [\"Street\",\"Town\",\"Post Code\"],\"rooms\": [{\"id\": \"762dc402-9e34-4dce-8617-cf0fbfaa86b9\",\"name\": \"Twin Room\",\"value\": \"30.00\",\"currency\": \"GBP\"},{\"id\": \"f6ea82b9-8fd1-453c-b9dd-24f423e2bdaf\",\"name\": \"Twin Room\",\"value\": \"30.00\",\"currency\": \"GBP\"}]}]"
 },
 "version": {
 "N": "1"
 }
}

Using the Document API means that JSON is now persisted with its structure, so our row in DynamoDB now looks like this:

{
 "id": {
 "S": "978153a6c0f9f8c5317c09c2a4760280"
 },
 "searchResultList": {
 "L": [
 {
 "M": {
 "propertyID": {
 "S": "123456"
 },
 "type": {
 "S": "com.clicktravel.persistence.model.SearchResult"
 },
 "name": {
 "S": "Cosy Hotel Birmingham"
 },
 "phone": {
 "S": "02125467654"
 },
 "addressLines": {
 "L": [
 {
 "S": "Street"
 },
 {
 "S": "Town"
 },
 {
 "S": "Post Code"
 }
 ]
 },
 "rooms": {
 "L": [
 {
 "M": {
 "id": {
 "S": "762dc402-9e34-4dce-8617-cf0fbfaa86b9"
 },
 "name": {
 "S": "Twin Room"
 },
 "value": {
 "S": "30.00"
 },
 "currency": {
 "S": "GBP"
 }
 }
 },
 {
 "M": {
 "id": {
 "S": "f6ea82b9-8fd1-453c-b9dd-24f423e2bdaf"
 },
 "name": {
 "S": "Twin Room"
 },
 "value": {
 "S": "30.00"
 },
 "currency": {
 "S": "GBP"
 }
 }
 }
 ]
 }
 }
 }
 ]
 },
 "version": {
 "N": "1"
 }
}
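The mapping from plain JSON to this type-annotated form is mechanical. The sketch below shows the idea using only the Java standard library. TypedAttributeSketch and toTyped are hypothetical names for illustration, not part of Cheddar or the AWS SDK, which performs this conversion for us.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: wraps plain Java values in the type descriptors
// (S, N, BOOL, NULL, L, M) seen in the stored row above.
final class TypedAttributeSketch {

    static Map<String, Object> toTyped(final Object value) {
        final Map<String, Object> typed = new LinkedHashMap<>();
        if (value == null) {
            typed.put("NULL", true);
        } else if (value instanceof String) {
            typed.put("S", value);
        } else if (value instanceof Number) {
            typed.put("N", value.toString()); // numbers are carried as strings
        } else if (value instanceof Boolean) {
            typed.put("BOOL", value);
        } else if (value instanceof List) {
            final List<Object> elements = new ArrayList<>();
            for (final Object element : (List<?>) value) {
                elements.add(toTyped(element)); // recurse into list elements
            }
            typed.put("L", elements);
        } else if (value instanceof Map) {
            final Map<String, Object> entries = new LinkedHashMap<>();
            for (final Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
                entries.put(String.valueOf(entry.getKey()), toTyped(entry.getValue()));
            }
            typed.put("M", entries);
        } else {
            throw new IllegalArgumentException("Unsupported type: " + value.getClass());
        }
        return typed;
    }
}
```

Each nested list becomes an "L" and each nested object an "M", which is why the structured row is so much longer than the original JSON.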

One drawback of the Document API is that more data is written to DynamoDB than before; however, this is outweighed by the benefits.

A quick note about the JSON above: you’ll notice that there is a "type" field being persisted, which contains the fully qualified Java object type. This is included to enable objects of the correct type to be instantiated from the JSON on unmarshalling where polymorphism is used.
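As a rough illustration of why the "type" field matters, the sketch below instantiates the class named in a document by reflection. The Room, TwinRoom and DoubleRoom types and the TypeFieldSketch helper are hypothetical and exist only for this example; this is not how Cheddar's unmarshalling is actually implemented.

```java
import java.util.Map;

// Hypothetical domain types, used only for this illustration.
interface Room {
    String describe();
}

final class TwinRoom implements Room {
    @Override
    public String describe() {
        return "two single beds";
    }
}

final class DoubleRoom implements Room {
    @Override
    public String describe() {
        return "one double bed";
    }
}

final class TypeFieldSketch {

    // Reads the fully qualified class name from the "type" field and
    // creates an instance of that class through its no-argument constructor.
    static Room instantiate(final Map<String, String> document) {
        try {
            return Class.forName(document.get("type"))
                    .asSubclass(Room.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (final ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + document.get("type"), e);
        }
    }
}
```

Without the "type" field, the unmarshaller would only know that it needs a Room and could not tell which concrete subtype to create.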

Document Path

Another feature introduced in v1.9, and enabled by storing JSON, is the ability to retrieve or update part of a document using a path. The path syntax is similar to XPath, although it is not quite as powerful.

Retrieval

Using the first example JSON above, the following code can be used to retrieve part of a document:

final Item partialDocItem = table.getItem(new GetItemSpec()
        .withPrimaryKey("id", "978153a6c0f9f8c5317c09c2a4760280")
        .withProjectionExpression(documentPath));

If "documentPath" is set to one of the paths below then the associated value would be returned:

Document Path	Value returned
id	{ "id": "978153a6c0f9f8c5317c09c2a4760280" }
searchResultList[0].propertyID	{ "propertyID": "123456" }
searchResultList[0].rooms[0]	{ "id": "762dc402-9e34-4dce-8617-cf0fbfaa86b9", "name": "Twin Room", "value": "30.00", "currency": "GBP" }
searchResultList[0].rooms[0].name	{ "name": "Twin Room" }

It would be nice if future releases of the API included more XPath-like functionality. At the moment only known parts of the document can be returned, which means that items in a list have to be referenced explicitly by index. It would be useful to be able to return the list item where one of its attributes matches a given criterion.
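To show how a path of this form addresses a nested value, here is a small sketch that walks a document held as plain Java maps and lists. DocumentPathSketch, resolve and firstMatching are hypothetical names and the code runs entirely in memory. It is not the SDK's implementation, and it returns the bare value rather than an Item. The firstMatching method shows one way the attribute-based lookup described above could be done on the client once the list has been retrieved.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical in-memory illustration of document paths.
final class DocumentPathSketch {

    // Matches either an attribute name or a list index such as [0].
    private static final Pattern TOKEN = Pattern.compile("([^.\\[\\]]+)|\\[(\\d+)\\]");

    // Walks the document one path token at a time and returns the value found.
    static Object resolve(final Object document, final String path) {
        Object current = document;
        final Matcher matcher = TOKEN.matcher(path);
        while (matcher.find()) {
            if (matcher.group(1) != null) {
                current = ((Map<?, ?>) current).get(matcher.group(1));
            } else {
                current = ((List<?>) current).get(Integer.parseInt(matcher.group(2)));
            }
        }
        return current;
    }

    // Client-side lookup: the first list item whose attribute equals the expected value, or null.
    static Map<?, ?> firstMatching(final List<?> items, final String attribute, final Object expected) {
        for (final Object item : items) {
            if (item instanceof Map && expected.equals(((Map<?, ?>) item).get(attribute))) {
                return (Map<?, ?>) item;
            }
        }
        return null;
    }
}
```

For example, resolving searchResultList[0].rooms and passing the result to firstMatching with the attribute "name" finds a room by name without needing to know its index in advance.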

Update

A document path can also be used to update specific parts of a document. Using the first example JSON above, the following code can be used to update part of a document:

table.updateItem(new UpdateItemSpec()
        .withPrimaryKey("id", "978153a6c0f9f8c5317c09c2a4760280")
        .withUpdateExpression("SET version = :version")
        .withValueMap(new ValueMap().withLong(":version", 3)));

In an update, an update expression has to be provided as well as a value map. At runtime, placeholders in the update expression are replaced with the values from the value map.
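The relationship between the expression and the value map can be pictured with the in-memory sketch below. It handles only a single "SET attribute = :placeholder" clause and uses hypothetical names (UpdateExpressionSketch, applySet). In reality the expression and values are sent to DynamoDB and resolved by the service, so this is an analogy rather than a description of the SDK's internals.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration: applies one SET clause to an item held as a map.
final class UpdateExpressionSketch {

    private static final Pattern SET_CLAUSE =
            Pattern.compile("^\\s*SET\\s+(\\w+)\\s*=\\s*(:\\w+)\\s*$");

    static void applySet(final Map<String, Object> item, final String expression,
            final Map<String, Object> values) {
        final Matcher matcher = SET_CLAUSE.matcher(expression);
        if (!matcher.matches()) {
            throw new IllegalArgumentException("Unsupported expression: " + expression);
        }
        final String attribute = matcher.group(1);
        final String placeholder = matcher.group(2);
        if (!values.containsKey(placeholder)) {
            throw new IllegalArgumentException("No value supplied for " + placeholder);
        }
        // The placeholder is looked up in the value map and its value is stored.
        item.put(attribute, values.get(placeholder));
    }
}
```

Keeping values out of the expression text means the expression stays the same between calls, and only the value map changes.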

Modular Maven Libraries

The final improvement added in v1.9 is modular Maven libraries. In v1.8 of the SDK, class files for all AWS services were included in two JAR files, aws-java-sdk-1.8.11.jar and aws-java-sdk-core-1.8.11.jar, which together weigh in at about 14MB. With the introduction of modular Maven libraries in 1.9 we now only have to include the dependencies for the services that Cheddar uses. This has two effects: firstly, there is a slight increase in the number of JAR files on the classpath, now nine rather than two; secondly, there is a reduction in the total size of the JAR files loaded into the JVM, now 3.6MB.

Using modules enables Cheddar to work with different versions of the SDK for different AWS products. For instance, if a bug fix in a point release of the SQS module is useful for our platform, we now only have to bump the version of the SQS JAR from aws-java-sdk-sqs-1.9.XX.jar to aws-java-sdk-sqs-1.9.XY.jar.

This article has explored some of the newer AWS SDK features that have been incorporated into Cheddar. Subscribe to our engineering blog for more updates and details of the fantastic work being done at travel.cloud.