Application structure

In this chapter you will learn how to create a flexible application architecture. This architecture will provide a solid base for the chapters ahead of us. You will learn how to read environment variables, configuration files into a single unified application configuration. You will also learn how to add different CLI commands to your application to make simple tasks like running migrations easier.

You can find the sample codes on GitHub

Structure for CLI commands

First we will tackle command line argument parsing, we will use the clap crate for that. You will learn how to create different subcommands for your CLI application and how to add parameters to them.

We will start from a layout similar to the structure of the hello_world application, but the crate will be called cli_app this time. Create the Cargo.toml file for the workspace:

[workspace]
resolver = "2"

members = [
  "cli_app"
]

and initialize the application crate:

$ cargo new cli_app

In cli_app/Cargo.toml add the clap crate to the list of dependencies:

[package]
name = "cli_app"
version = "0.1.0"
edition = "2021"

[dependencies]
clap = "4"
anyhow = "1"

I also added anyhow because we will use it for error handling.

Run cargo build to ensure that the dependencies are downloaded and build correctly.

To create a new subcommand with clap you have to do two things:

add the subcommand to the clap configuration
handle the different commands according to command-line parameters

We could add these code snippets simply to main.rs but that way the main.rs would become bloated really quickly.

I prefer to put these things into a dedicated commands rust module.

Let's go to our cli_app/src folder and create a new commands directory:

$ cd cli_app/src
$ mkdir commands
$ cd commands

In the commands directory create a new mod.rs file, and add two methods: one to configure the command and one to handle the CLI arguments:

use clap::{ArgMatches, Command};

pub fn configure(command: Command) -> Command {
    command.subcommand(Command::new("hello").about("Hello World!"))
}

pub fn handle(matches: &ArgMatches) -> anyhow::Result<()> {
    if let Some((cmd, _matches)) = matches.subcommand() {
        match cmd {
            "hello" => { println!("Hello world!"); },
            &_ => {}
        }
    }

    Ok(())
}

The configure method simply takes an existing Command configuration and adds a new hello subcommand to it.

The handle method takes the argument matches returned by clap and checks whether our hello subcommand was called. If that was the case, it prints Hello world! to the console.

Notice the return type anyhow::Result<()>: the handle method returns nothing by default, but it can return an error result is something goes wrong.

Now to use this code from main.rs we have to change it a little:

mod commands;

use clap::Command;

pub fn main() -> anyhow::Result<()> {

    let mut command = Command::new("Sample CLI application");
    command = commands::configure(command);
    
    let matches = command.get_matches();
    commands::handle(&matches)?;

    Ok(())
}

First, we have to add the mod commands declaration at the top to integrate our new module into the codebase.

In the main method we have to make the command instance mutable, because the commands::configure method creates a new version of it.

The get_matches() method does the heavy lifting: parses the command line arguments for us.

After calling command.get_matches() we also call commands::handle to handle all the subcommands we configured.

Notice the question mark at the end of the commands::handle call: when the method returns an error result, the execution of main will be interrupted here and the main method returns an error too.

One more trick: it's usually useful to arrange the crate to contain both a lib.rs and a main.rs file. It will contain both a library and a binary at the same time. This can make testing, benchmarking easier later.

To do so, add a lib.rs file to the src directory and move the mod commands declaration from main.rs:

pub mod commands;

We have to make it public, so main.rs can use it later. Now change the main.rs file too:

use clap::{Arg, Command};
use cli_app::commands;

pub fn main() -> anyhow::Result<()> {
  ...
}

The name of the lib module is equivalent to the name of our crate: cli_app, so to import the commands module we add use cli_app::commands to main.rs.

Now make things a little more complicated: add more subcommands. To do this, I will split the commands module into submodules. Add a new hello.rs file to the commands folder and move the hello subcommand configuration and handler there:

use clap::{ArgMatches, Command};

pub const COMMAND_NAME: &str = "hello";

pub fn configure() -> Command {
    Command::new(COMMAND_NAME).about("Hello World!")
}

pub fn handle(_matches: &ArgMatches) -> anyhow::Result<()> {
    println!("Hello World!");

    Ok(())
}

I changed the configure() method a little, so it only returns a new Command and does not configure and existing one. Also, I moved the hello string into a constant, because we will use it in multiple locations.

Now change commands/mod.rs to use the new hello submodule:

mod hello;

use clap::{ArgMatches, Command};

pub fn configure(command: Command) -> Command {
    command
        .subcommand(hello::configure())
        .arg_required_else_help(true)
}

pub fn handle(matches: &ArgMatches) -> anyhow::Result<()> {
    if let Some((cmd, matches)) = matches.subcommand() {
        match cmd {
            hello::COMMAND_NAME => hello::handle(matches)?,
            &_ => {}
        }
    }

    Ok(())
}

The configure method adds the new Command returned by hello::configure() as a subcommand to the main clap configuration. Another small change: set the arg_required_else_help flag to true, so whenever you call your application without any arguments, it will display a short description of the available subcommands.

The handle method simply dispatches the processing to the handle method of the hello module if the specified subcommand was hello. Notice the question mark: errors are returned immediately.

Now add one more command: the serve command will run our webserver later. Create a new file called commands/serve.rs:

use clap::{ArgMatches, Command};

pub const COMMAND_NAME: &str = "serve";

pub fn configure() -> Command {
    Command::new(COMMAND_NAME).about("Start HTTP server")
}

pub fn handle(_matches: &ArgMatches) -> anyhow::Result<()> {
 
    println!("TBD: start the webserver on port ??? ");

    Ok(())
}

and modify commands/mod.s to use the subcommand from serve.rs too:

mod hello;
mod serve;

use clap::{ArgMatches, Command};

pub fn configure(command: Command) -> Command {
    command
        .subcommand(hello::configure())
        .subcommand(serve::configure())
        .arg_required_else_help(true)
}

pub fn handle(matches: &ArgMatches) -> anyhow::Result<()> {
    if let Some((cmd, matches)) = matches.subcommand() {
        match cmd {
            hello::COMMAND_NAME => hello::handle(matches)?,
            serve::COMMAND_NAME => serve::handle(matches)?,
            &_ => {}
        }
    }

    Ok(())
}

Do you see the pattern? If you need more subcommands you can simply add more submodules and call them in the configure and handle methods.

Build our project again using cargo build and run the resulting binary from target/debug/cli_app. Whenever you call it without additional arguments, it will simply display a help message:

$ ./target/debug/cli_app 
Usage: cli_app [COMMAND]

Commands:
  hello  Hello World!
  serve  Start HTTP server
  help   Print this message or the help of the given subcommand(s)

Options:
  -h, --help  Print help

But if you specify one of the subcommands, serve for example, then the application will execute it:

$ ./target/debug/cli_app serve

TBD: start the webserver on port ???

You may have noticed that our serve method has to start the server on a specific TCP port, but there is no way to specify is yet. We have to introduce command line arguments for that. The TCP port is 16-bit integer value, so we have to parse it into a u16 variable. Let's name the parameter --port and also add a short version: -p to it:

use clap::{value_parser, Arg, ArgMatches, Command};

pub const COMMAND_NAME: &str = "serve";

pub fn configure() -> Command {
    Command::new(COMMAND_NAME).about("Start HTTP server").arg(
        Arg::new("port")
            .short('p')
            .long("port")
            .value_name("PORT")
            .help("TCP port to listen on")
            .default_value("8080")
            .value_parser(value_parser!(u16)),
    )
}

We use the .long() method to specify the full parameter name and .short() to specify the single character short version. We also have to define a placeholder to be displayed in the help message, this is .value_name(). The .help() message describes the parameter. When we build and run our CLI application, we can get a detailed help message for our subcommand:

$ ./target/debug/cli_app help serve

Start HTTP server

Usage: cli_app serve [OPTIONS]

Options:
  -p, --port <PORT>  TCP port to listen on [default: 8080]
  -h, --help         Print hel

We also specified a default value for the port: 8080 and set the value_parser property, so it will know that the argument must be parsed into a u16 value.

Finally, update our handler to use the new argument:

pub fn handle(matches: &ArgMatches) -> anyhow::Result<()> {
    let port: u16 = *matches.get_one("port").unwrap_or(&8080);

    println!("TBD: start the webserver on port {}", port);

    Ok(())
}

The matches.get_one() method tries to fetch the argument named port. We use unwrap_or to specify a default value for the case when the parsing runs into and error. The clap argument parser displays useful error messages whenever you try to specify invalid arguments:

$ ./target/debug/cli_app serve --port notanumber
error: invalid value 'notanumber' for '--port <PORT>': 
    invalid digit found in string

For more information, try '--help'.

$ ./target/debug/cli_app serve --port 100000
error: invalid value '100000' for '--port <PORT>': 100000 is not in 0..=65535

For more information, try '--help'.

$ ./target/debug/cli_app serve --port 
error: a value is required for '--port <PORT>' but none was supplied

For more information, try '--help'.

Command line parameters

TBD: more examples, like alias, required, how to specify and argument multiple times, etc.

TBD: alternatives style using derive macros.

Application configuration

Every application needs some configuration. For example: the URL to access a backend service or database, the log level, the addresses of telemetry data collectors, etc. These configurations can come from many sources: configuration files, environment variables, .env files, command line arguments, etc.

The config crate provides an easy way to read configuration values from these sources. Our sample code will start off where we finished the processing of command line parameters. You can find the sample code for this section in the 03-application-structure/application-configuration folder.

You can find the sample codes on GitHub

First add our new dependencies to cli_app/Cargo.toml:

[dependencies]
clap = "4"
anyhow = "1"
config = "0.14"
dotenv = "0.15"
serde = { version = "1", features = ["derive"] }

We will use the dotenv crate to read .env files into environment variables. The serde crate will be used to deserialize json format configuration files.

Next we have to build a structure for our application configuration. Create a file named settings.rs in cli_app/src and reference it from cli_app/src/lib.rs:

pub mod commands;
pub mod settings;

Start off with a simple configuration structure: create a Database struct to store database configuration, a Logging struct to store logging configuration and a Settings struct that includes both the Database and Logging structs. Our settings.rs will look like this:

pub struct Database {
    pub url: String,
}

pub struct Logging {
    pub log_level: String,
}

pub struct Settings {
    pub database: Database,
    pub logging: Logging,
}

To be able to deserialize these structs from various formats we have to add the Deserialize derive macro to the structs. I also add the Default derive macro to be able to instantiate these structs without specifying a value for all the fields. The Debug macro is also handy, so we can easily log the contents of the configuration later.

Now our structs are like this:

use serde::Deserialize;

#[derive(Debug, Deserialize, Default)]
#[allow(unused)]
pub struct Database {
    pub url: String,
}

#[derive(Debug, Deserialize, Default)]
#[allow(unused)]
pub struct Logging {
    pub log_level: String,
}

#[derive(Debug, Deserialize, Default)]
#[allow(unused)]
pub struct Settings {
    pub database: Database,
    pub logging: Logging,
}

I also added the #allow(unused) marker, to silence compiler warnings about unused fields.

There is one more problem with this schema: all the fields are required, so you cannot import an empty json for example and use the defaults for all configuration options.

There are two possible ways to solve this problem:

make fields optional with Option<>
or use default values

I usually prefer Option<> for basic values like strings, numbers, boolean flags. This way we can easily replace missing values with defaults later:

#[derive(Debug, Deserialize, Default)]
#[allow(unused)]
pub struct Database {
    pub url: Option<String>,
}

#[derive(Debug, Deserialize, Default)]
#[allow(unused)]
pub struct Logging {
    pub log_level: Option<String>
}

...

let log_level = settings.logging.log_level.unwrap_or("info");

For structures, I prefer to go with default values, so an empty structure of the configuration is always built for us:

#[derive(Debug, Deserialize, Default)]
#[allow(unused)]
pub struct Settings {
    #[serde(default)]
    pub database: Database,
    #[serde(default)]
    pub logging: Logging,
}

The config crate can load configuration from both configuration files and environment variables. A config file is handy for the bulk of the configuration, environment variables are preferable for sensitive values like passwords and settings that usually deviate in different environments. In a kubernetes-based deployment probably the whole configuration will be built from environment variables.

To use both a config file and environment variables, we use a layered configuration:

impl Settings {
    pub fn new(location: &str, env_prefix: &str) -> anyhow::Result<Self> {
        let s = Config::builder()
            .add_source(File::with_name(location))
            .add_source(
                Environment::with_prefix(env_prefix)
                    .separator("__")
                    .prefix_separator("__"),
            )
            .build()?;

        let settings = s.try_deserialize()?;

        Ok(settings)
    }
}

First we load a configuration file from location then override these settings with values found in environment variables.

Assuming an env_prefix value of APP the environment variable names will look like these:

APP__DATABASE__URL
APP__LOGGING__LOG_LEVEL

I also prefer to store the config file location and other parameters required to be able to reload the configuration later:

use config::{Config, Environment, File};

...

#[derive(Debug, Deserialize, Default)]
#[allow(unused)]
pub struct ConfigInfo {
    pub location: Option<String>,
    pub env_prefix: Option<String>,
}

#[derive(Debug, Deserialize, Default)]
#[allow(unused)]
pub struct Settings {
    #[serde(default)]
    pub config: ConfigInfo,
    #[serde(default)]
    pub database: Database,
    #[serde(default)]
    pub logging: Logging,
}

impl Settings {
    pub fn new(location: &str, env_prefix: &str) -> anyhow::Result<Self> {
        let s = Config::builder()
            .add_source(File::with_name(location))
            .add_source(
                Environment::with_prefix(env_prefix)
                    .separator("__")
                    .prefix_separator("__"),
            )
            .set_override("config.location", location)?
            .set_override("config.env_prefix", env_prefix)?
            .build()?;

        let settings = s.try_deserialize()?;

        Ok(settings)
    }
}

We save the location and env_prefix values into Setttings with the set_override calls.

Now to use this structures, we have to extend our main.rs. First, add a new command line argument so users can specify the configuration file location, but assume config.json when not specified otherwise:

use clap::{Arg, Command};
use cli_app::commands;

fn main() -> anyhow::Result<()> {
    let mut command = Command::new("Sample CLI application")
            .arg(
                Arg::new("config")
                    .short('c')
                    .long("config")
                    .help("Configuration file location")
                    .default_value("config.json"),
            );

    command = commands::configure(command);

    let matches = command.get_matches();

    let config_location = matches
        .get_one::<String>("config")
        .map(|s| s.as_str())
        .unwrap_or("");
    
    commands::handle(&matches)?;

    Ok(())
}

The arg configuration should be familiar from the previous section. We read the config_location with the matches::get_one::<String> call. We have to specify the type, because the compiler cannot guess it in this case. An alternative syntax is:

    let config_location = matches
        .get_one("config")
        .map(|s: &String| s.as_str())
        .unwrap_or("");

Here the type specified on closure parameter s gives the compiler enough information, so we do not have to specify it on the get_one() call.

Now we can load our configuration in main.rs:

let settings = settings::Settings::new(config_location, "APP")?;

println!(
    "db url: {}",
    settings
        .database
        .url
        .unwrap_or("missing database url".to_string())
);

println!(
    "log level: {}",
    settings.logging.log_level.unwrap_or("info".to_string())
);

We used the println! to print some configuration values. As you can see, we substituted the missing values with unwrap_or().

Now go to the project root directory, compile and test our code:

$ cargo build
...
$ ./target/debug/cli_app hello
Error: configuration file "config.json" not found

Well, our config.json is missing. Create a simple one:

{
   "database": {
       "url": "pgsql://"
   }
}

And run again:

$ ./target/debug/cli_app hello
db url: pgsql://
log level: info
Hello World!

We can see the db url configured in config.json and the default log level. Now define an environment variable to override the db url:

$ export APP__DATABASE__URL="mysql://"
$ ./target/debug/cli_app hello
db url: mysql://
log level: info
Hello World!

If we add dotenv parsing to the top of the main() function:

fn main() -> anyhow::Result<()> {
    dotenv().ok();

    ...
}

Then we can load a .env file too. Let's create a simple .env file:

APP__LOGGING__LOG_LEVEL="warn"

And run our application again:

$ ./target/debug/cli_app 
db url: mysql://
log level: warn
Hello World!

As you can see it loaded the .env file too.

Currently, you must use a config.json to be able to start the application. You can make it optional and depend entirely on environment variables. Modify argument handling in main.rs slightly:

let config_location = matches
    .get_one("config")
    .map(|s: &String| Some(s.as_str()))
    .unwrap_or(None);

Now our config_location is an optional. Modify settings.rs too:

impl Settings {
    pub fn new(location: Option<&str>, env_prefix: &str) -> anyhow::Result<Self> {
        let mut builder = Config::builder();
        if let Some(location) = location {
            builder = builder.add_source(File::with_name(location));
        }

        let s = builder
            .add_source(
                Environment::with_prefix(env_prefix)
                    .separator("__")
                    .prefix_separator("__"),
            )
            .set_override("config.location", location)?
            .set_override("config.env_prefix", env_prefix)?
            .build()?;

        ...
    }
}

You have to remove the default value from the command configuration as well! Now you can delete config.json and the application still works as expected:

$ ./target/debug/cli_app hello
db url: mysql://
log level: info
Hello World!

The configuration file format is not limited to JSON, the config crate can use TOML, YAML, INI and others too.

Subcommands

One more thing to do: we have to pass the settings to all the subcommands, so they can use them. Currently, we call the commands module this way in main.rs:

commands::handle(&matches)?;

Let's change it to also pass the settings:

commands::handle(&matches, &settings)?;

We can also remove the two println! macros from main.rs, they are not needed anymore.

We have to modify the handle method commands/mod.rs to accept the new parameter and pass it to all the subcommands:

pub fn handle(
    matches: &ArgMatches, 
    settings: &Settings
) -> anyhow::Result<()> {        
    if let Some((cmd, matches)) = matches.subcommand() {
        match cmd {
            hello::COMMAND_NAME => hello::handle(matches, settings)?,
            serve::COMMAND_NAME => serve::handle(matches, settings)?,
            &_ => {}
        }
    }

    Ok(())
}

Finally, extend the handle methods in both hello.rs and serve.rs to accept the settings parameter (prefixed with underscore, since we do not use it now).

pub fn handle(
    _matches: &ArgMatches, 
    _settings: &Settings
) -> anyhow::Result<()> {        
    ...
}

Do not forget to add the use crate::settings::Settings statement to the top of all these command modules.

(Re)loading configuration

TBD

Different environments

Your application will most probably run in different target environments. First in your own development environment, then in some staging or demo environment, finally in one or more production environments.

You can prepare your application for all these target environments in multiple ways. Some approaches hardcode the list of possible environments in the application or commit specific configuration files for each target environment into the source code repository. These approaches can work when you have only a limited number of fixed environments, but we do not consider this a good practice.

The best approach is to make configuration and code completely independent of each other. Of course, you can include a sample configuration with your application to showcase all the possible settings, but the actual configuration should be provided by your target environment.

Assume, for example, that you deliver your application as a docker container image, this is quite common nowadays. The image itself should not contain any configuration. You can take two approaches: read all configuration parameters from environment variables or use a mix of configuration files and environment variables. When you use environment variables only, those can be specified in a docker-compose.yml file or in a kubernetes pod definition. When you have to use configuration files, those can be mounted as volumes on specific locations. Both docker-compose and kubernetes allows you to assign these volumes to the containers. If you target kubernetes, you can create helm charts for your application too, these can serve as blueprints for deployment. We will show you some examples in later chapters.

When you deliver your application as a single binary, it will be started most probably by some kind of service management: by an init script, as a systemd unit or by supervisord for example. These solutions can specify environment variables and command line arguments for your application as well. When the application is deployed as a single binary, DevOps engineers will usually use some kind of configuration management system (like Ansible, Puppet, Chef, etc.) to automate the deployment of the application, these can generate the required configuration files from templates and a predefined set of variables.

The domain model

In the following chapters we will build a sample API. To keep things simple, it will be a simplified blogging application. Our first two models will be the User who writes blog posts and the Post itself.

A User has the following properties:

id: a unique identifier, an i64 number for example
username: also unique, but String and changeable by the user
password: for user authentication
status: to indicate active or blocked state of the user
created: the time when the user was created
updated: the last time when the user's properties were modified
last_login: the last time when the user logged in

A Post has the following properties:

id: a unique id, an i64 number
author_id: unique id of the author (the User who created the Post)
title: title of the blog post
content: content of the blog post
slug: a unique String identifier derived from the title, suitable for usage in URLs
status: to indicate draft or published state of the post
created: the time when the post was created
updated: the last time when the post's properties were modified