Modern (Go) application design
When it comes to application design, I’ve formed a few opinions backed by experience. The most important one is: structure matters. In my first years of development, I built a CMS that was copied over more than 100 times for different websites. You don’t get there unless you repeat the same process over and over.
Application development is like that. If you’re writing one middleware, you want the process to be repeatable for each following middleware.
The more people work on the project, the more consistent you want the code base to be. Principles like SOLID or DDD give you a repeatable structuring model. Extending your application with a new service or component encourages composition and locality of behaviour, while adding new bounded scopes for testing.
Smaller package scopes lead to better tests, which lead to fewer bugs overall.
There are two ways to think about this, or rather two principles that apply to application development leaning into composability:
- Use cases for the application
- Data model first principles
As we know, applications range from CLI tooling to services of various sizes providing APIs (REST, gRPC) or a package API. A web application may use templating to render a data model into other representations aimed at browsers. There’s variety, and the use cases largely dictate the top-level components.
Use cases dictate structure
Let’s take a look at an example familiar to most, `git`. The tool provides a set of typical commands which developers use daily.
```mermaid
flowchart TD
    git --> status
    git --> clone
    git --> pull
    git --> commit
    git --> push
```
Git can also be extended: if you provide a `git-st` binary on the system, the `git` command will execute that binary when `git st` is invoked. I’ve used this in situations where multiple repositories are composed into a single application and I had to work on multiple source trees at the same time.
Sometimes the data model is more integrated into the CLI, giving you a hint at additional structures. An example of that would be the `docker compose config` command, which makes configuration a primary component of the tool. The command evaluates the dynamic parts of the configuration, including things like environment variables, and prints out the resolved result.
If someone comes along and sees a component like that, they are well equipped to integrate against it and use it for new purposes, extending the original tooling without tight coupling.
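As a hedged sketch of what integrating against that component could look like from Go (assuming a Compose version where `docker compose config --format json` is available; the decoded structure below is only a fragment):

```go
package main

import (
	"encoding/json"
	"fmt"
	"os/exec"
)

// composeConfig decodes only the fragment of the output used here.
type composeConfig struct {
	Services map[string]struct {
		Image string `json:"image"`
	} `json:"services"`
}

func main() {
	// Let docker compose evaluate the configuration and print it as JSON.
	out, err := exec.Command("docker", "compose", "config", "--format", "json").Output()
	if err != nil {
		panic(err)
	}

	var cfg composeConfig
	if err := json.Unmarshal(out, &cfg); err != nil {
		panic(err)
	}
	for name, svc := range cfg.Services {
		fmt.Printf("%s -> %s\n", name, svc.Image)
	}
}
```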
The practice of introducing barriers between components makes software more reliable. Beyond the development benefits (time to value), well-defined components are even more valuable for the safe changes they enable, with type safety providing that security:
- Reduction in on-call to practically 0, bugs disappear
- Safer development of new features, APIs and components
- A set of common patterns for development from end to end
In the context of terminal CLI apps, the interface usually follows similar practices, so you’d expect `terraform apply`, `terraform sync` and similar invocations of various CLI tools to structure their code according to their interface. Maybe I want to implement `terraform stats`, and there should be a clear-cut way to begin.
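As a sketch only (the subcommand and package names are hypothetical), a use-case-driven layout often boils down to a thin dispatcher where each command maps to its own package:

```go
package main

import (
	"context"
	"fmt"
	"os"

	// Hypothetical packages, one per use case / subcommand.
	"example.com/cli/internal/apply"
	"example.com/cli/internal/stats"
)

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: cli <command> [args]")
		os.Exit(1)
	}

	ctx := context.Background()
	var err error
	switch os.Args[1] {
	case "apply":
		err = apply.Run(ctx, os.Args[2:])
	case "stats":
		err = stats.Run(ctx, os.Args[2:])
	default:
		err = fmt.Errorf("unknown command %q", os.Args[1])
	}
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Adding a new command is then a new package implementing the same `Run(ctx, args)` shape.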
Data model first principles
When it comes to application development with SQL, there’s a good chance that people just described the work of a database architect as domain driven design (DDD).
The top level entities I’d expect to see in any database driven application are:
- `user` - providing user registration, login
- `profile` - providing public data for a user
- `session` - authenticating user requests with a session ID
- `user_group` - define user groups
- `user_group_member` - link users to user groups
- `user_group_rule` - link access controls and permissions to user groups
These are typical, common, repetitive business entities that can be interfaced to componentize services around them. Whether OAuth2 or some other authentication is in use is an implementation detail, another component of the composed system.
If you wanted to build customer-relationship management software, this gets extended again with domain-specific tables and schema. You may have a `customer` table, a `mail_list` table, a `mail_list_members` table. All of these things are additional data models which come from the business domain.
Personally, I use a pet project of mine to take this known data model and generate Go code for it, similar to how protobuf / gRPC generates the data model for `.proto` service definitions. I’d reach for go-bridget/mig to map the SQL schema to code, and buf.build to generate the gRPC models.
This gives me:
- A source of truth for the API definitions (`.proto` + buf)
- A source of truth for the repository / storage layer (`.sql` + mig)
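For illustration only (the field names below are assumptions, not actual generated output), the storage model generated for the `user` table could look something like:

```go
package model

import "time"

// User mirrors the `user` table; the tags map columns for the storage layer.
type User struct {
	ID        string    `db:"id" json:"id"`
	Email     string    `db:"email" json:"email"`
	Password  string    `db:"password" json:"-"`
	CreatedAt time.Time `db:"created_at" json:"created_at"`
}
```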
This schema-first approach keeps a clean data model, against which one or multiple services can be written. The components written usually form a repetitive CRUD (create-read-update-delete) pattern, which generally maps better to RPC than to REST.
The data access layer of the internal components could be expressed with a generic interface that each of the tables should satisfy:
type Repository[T any] interface {
	Create(context.Context, T) error
	Update(context.Context, T) error
	Get(ctx context.Context, id string) (T, error)
	Select(ctx context.Context, query string, args ...any) ([]T, error)
}
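As a sketch of how an implementation could satisfy this interface in tests (the `Identifiable` constraint is an assumption, and `Select` here returns everything instead of filtering on the query):

```go
package storage

import (
	"context"
	"fmt"
	"sync"
)

// Identifiable is an assumed constraint so items can be keyed by their ID.
type Identifiable interface {
	GetID() string
}

// MemoryRepository is a map-backed Repository, handy for unit tests.
type MemoryRepository[T Identifiable] struct {
	mu    sync.RWMutex
	items map[string]T
}

func NewMemoryRepository[T Identifiable]() *MemoryRepository[T] {
	return &MemoryRepository[T]{items: map[string]T{}}
}

func (r *MemoryRepository[T]) Create(ctx context.Context, item T) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.items[item.GetID()] = item
	return nil
}

func (r *MemoryRepository[T]) Update(ctx context.Context, item T) error {
	return r.Create(ctx, item)
}

func (r *MemoryRepository[T]) Get(ctx context.Context, id string) (T, error) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	item, ok := r.items[id]
	if !ok {
		var zero T
		return zero, fmt.Errorf("not found: %s", id)
	}
	return item, nil
}

// Select returns everything; a real test double would filter on the query.
func (r *MemoryRepository[T]) Select(ctx context.Context, query string, args ...any) ([]T, error) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	out := make([]T, 0, len(r.items))
	for _, item := range r.items {
		out = append(out, item)
	}
	return out, nil
}
```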
A more complex repository would compose multiple repositories or implement functions specific to that repository. This is in general called an aggregate:
type UserGroupAggregate struct {
	users   Repository[*model.User]
	groups  Repository[*model.UserGroup]
	members Repository[*model.UserGroupMember]
	rules   Repository[*model.UserGroupRule]
}

func (r *UserGroupAggregate) IsMember(ctx context.Context, userID string, groupID string) (bool, error) {
	// Validate that the group and the user exist before checking membership.
	if _, err := r.groups.Get(ctx, groupID); err != nil {
		return false, err
	}
	if _, err := r.users.Get(ctx, userID); err != nil {
		return false, err
	}
	m, err := r.GetMember(ctx, groupID, userID)
	if err != nil {
		return false, err
	}
	return len(m) > 0, nil
}
func (r *UserGroupAggregate) GetMember(ctx context.Context, groupID string, userID string) ([]*model.UserGroupMember, error) {
return r.members.Select(ctx, "user_id=? and group_id=?", userID, groupID)
}
func (r *UserGroupAggregate) GetMembers(ctx context.Context, groupID string) ([]*model.UserGroupMember, error) {
return r.members.Select(ctx, "group_id=?", groupID)
}
This handles the surface-level operations that can be invoked on a table. Separating the implementation for write and read operations can be classified as following the CQRS pattern.
Reads can typically be scaled independently of writes with approaches like in-memory caches or Redis, while writes are harder to scale due to consistency guarantees and indexing.
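A hedged sketch of what that split could look like on top of the `Repository` interface above (the interface names are assumptions):

```go
// Reader covers the query side; it can be backed by a cache or a replica.
type Reader[T any] interface {
	Get(ctx context.Context, id string) (T, error)
	Select(ctx context.Context, query string, args ...any) ([]T, error)
}

// Writer covers the command side; it talks to the primary storage.
type Writer[T any] interface {
	Create(ctx context.Context, item T) error
	Update(ctx context.Context, item T) error
}
```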
The usage of `Repository` somewhat guarantees that the correct underlying storage driver is in use. If we wrote raw SQL and added an `*sqlx.DB` into the struct, the storage for all three areas would have to be available on the same connection. In practical use, the user groups, members and permissions may be cached by the application, avoiding a storage hit altogether.
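For example, a read-through cache could wrap a repository without the aggregate knowing about it (a sketch; cache invalidation is left out entirely):

```go
// CachedRepository caches Get results in memory and falls back to the
// wrapped Repository on a miss. Only Get is shown; the remaining methods
// would delegate to next directly.
type CachedRepository[T any] struct {
	next Repository[T]

	mu   sync.RWMutex
	byID map[string]T
}

func (c *CachedRepository[T]) Get(ctx context.Context, id string) (T, error) {
	c.mu.RLock()
	item, ok := c.byID[id]
	c.mu.RUnlock()
	if ok {
		return item, nil
	}

	item, err := c.next.Get(ctx, id)
	if err != nil {
		return item, err
	}

	c.mu.Lock()
	if c.byID == nil {
		c.byID = map[string]T{}
	}
	c.byID[id] = item
	c.mu.Unlock()
	return item, nil
}
```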
The IsMember function also carries a certain amount of validation:
- Does the group exist,
- Does the user exist,
- Is the user a member of the group.
The smaller `GetMember` function can be considered the data access layer for this particular business logic call. It’s also relatively easy to replace the implementation of `GetMember`, as on the surface all it provides are the known data model types in the response.
In practice there are some additional considerations for handling writes, like performing them in a database transaction. The important thing to note is that the business layer should not be aware of the underlying storage driver types like `db.Tx`, `db.Conn` and so on. These are all implementation details of a repository, and should not be exposed as a coupling through function arguments or otherwise.
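One way to keep that detail contained is for the storage implementation to open and commit the transaction itself (a sketch only; the `sqlUserGroupStore` type and column names are assumptions):

```go
// sqlUserGroupStore is a hypothetical SQL-backed store for this sketch.
type sqlUserGroupStore struct {
	db *sql.DB
}

// AddMember performs the related writes in a single transaction, without
// leaking *sql.Tx into the business layer through function arguments.
func (s *sqlUserGroupStore) AddMember(ctx context.Context, groupID, userID string) error {
	tx, err := s.db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}
	// Rollback is a no-op once the transaction has been committed.
	defer tx.Rollback()

	if _, err := tx.ExecContext(ctx,
		"INSERT INTO user_group_member (group_id, user_id) VALUES (?, ?)",
		groupID, userID,
	); err != nil {
		return err
	}
	return tx.Commit()
}
```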
Data model separation benefits
Ok, let’s assume you’re further down the implementation journey. You’ve structured your services into packages, your data model is separated, and new code can be written that uses and extends it.
You decide to write a web application. A common pattern for that is to take the business-layer data model and write a template that renders it into the desired view. What you need is a controller that retrieves data from the storage layer into the business layer, completing a common pattern called MVC (model, view, controller).
In the case of a REST API or similar, the controller is an http.Handler that translates business logic into the required output format. To summarize, when it comes to the data model we always consider the same path:
- Read data from storage layer into the business layer,
- Read data from the business layer to a view
When you consider the flow of data, the REST API receives and uses the same components that the templated view would. The data gets translated on each layer, adding or omitting details as needed.
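A minimal sketch of such a controller on top of the aggregate above (the handler shape and query parameter are assumptions):

```go
// GroupMembersHandler exposes the business-layer call over HTTP as JSON.
func GroupMembersHandler(groups *UserGroupAggregate) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		groupID := r.URL.Query().Get("group_id")

		members, err := groups.GetMembers(r.Context(), groupID)
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}

		w.Header().Set("Content-Type", "application/json")
		_ = json.NewEncoder(w).Encode(members)
	}
}
```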
If we take a simple microblogging platform to the extreme, we want to have an `article` table that carries a `user_id` field. The data model for each layer can be unique, for example:
- the storage layer contains the `user_id` field,
- the business layer uses `*model.User`, going from an ID to a full object,
- the view layer omits most detail and may just use `user_id` and `user_name`.
When you consider that the user model may contain sensitive information like `email`, the view has to take only the data required and limit field usage to a separate, view-specific model.
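A sketch of what those per-layer types could look like for the article example (all names here are illustrative, not generated code):

```go
// Storage layer: mirrors the article table.
type ArticleRecord struct {
	ID     string `db:"id"`
	UserID string `db:"user_id"`
	Body   string `db:"body"`
}

// Business layer: the user_id is resolved into a full user object.
type Article struct {
	ID     string
	Author *model.User
	Body   string
}

// View layer: only the fields the template or API response needs.
type ArticleView struct {
	ID       string `json:"id"`
	UserID   string `json:"user_id"`
	UserName string `json:"user_name"`
	Body     string `json:"body"`
}
```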
Whether you use the MVC pattern or write gRPC services and REST APIs, these components of your system always follow the three package rule:
- model (reusable, build things on top)
- service (replicable, new things of the same flavour)
- storage (interface for the model “at rest”)
These things are then integrated into a transport:
- gRPC
- REST APIs
- Websocket APIs, SSE
- CLI interfaces
- …
I’m basically explaining that as long as your database model is isolated and accessible, it’s always possible to go from it to a new component or to satisfy new transport requirements.
While the components can be laid out in a single package with conventions, it’s more appropriate to put them into standalone packages that follow the explained patterns, reaching a higher level of stability, testability and consistency.
Following DDD principles on top allows you to maintain a glossary; it’s easier to discuss the business logic when this glossary is aligned between the business stakeholders and engineering teams.
An example of such a glossary is published by Docker. It defines concepts that have meaningful mappings to their offering, providing a common interpretation of `image`, `container`, `layer` and other terms specific to adopting and using Docker.
Conclusion
Good software isn’t fool’s gold. Just like work itself, where you’d break up a big task into smaller ones, you’d ideally have software which breaks itself up into sets of smaller components.
Keeping structures flat and simple is similar to factories. Factories have made insane progress in manufacturing by adopting a repeatable process. Software engineering, when done right, takes a similar approach to dividing and structuring a code base. The http.Handler abstraction is able to provide new abstractions; for example, a middleware may be defined as:
type MiddlewareFunc func(http.Handler) http.Handler
I’ve lifted this definition from gorilla/mux. By following this abstraction, everyone is able to create a new middleware, put it into a package, and set up tests that cover the required behaviour. And thus, gorilla/handlers was born. The implementation of those handlers is completely separated from gorilla/mux, but seamlessly integrated using composition.
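For example, a small logging middleware that fits this shape (a sketch, not lifted from gorilla/handlers):

```go
// RequestLogger satisfies MiddlewareFunc: it wraps a handler and logs the
// method, path and duration of every request before passing it on.
func RequestLogger(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		next.ServeHTTP(w, r)
		log.Printf("%s %s (%s)", r.Method, r.URL.Path, time.Since(start))
	})
}
```

With gorilla/mux this is registered with `r.Use(RequestLogger)`, and the same function composes just as well around a plain `http.ServeMux`.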
When it comes to the smallest applications (microservices), they generally follow the same pattern when being built (sketched after this list):
- get config/credentials from env,
- get storage connection,
- update or query data with a well known data model,
- render the response
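A hedged skeleton of that pattern (every name below is an assumption about a hypothetical service layout):

```go
func main() {
	// 1. Get config/credentials from the environment.
	dsn := os.Getenv("DATABASE_DSN")
	addr := os.Getenv("LISTEN_ADDR")

	// 2. Get the storage connection.
	db, err := sql.Open("mysql", dsn)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// 3. Query and update data through the well-known data model.
	//    NewUserGroupAggregate is hypothetical wiring for this sketch.
	groups := NewUserGroupAggregate(db)

	// 4. Render the response over the chosen transport.
	http.Handle("/members", GroupMembersHandler(groups))
	log.Fatal(http.ListenAndServe(addr, nil))
}
```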
In my experience there’s always some requirement that goes beyond the initial architecture. The consideration that should always be made is “how can I make 1,000 of these” while keeping things neatly sorted away.
How can you make this process repeatable?