Persistence
In the world of web applications, persistence boils down to remembering stuff.
Unlike your regular computer programs, web applications are stateless by nature.
This means every interaction with the web server is treated as a fresh start.
Here's where persistence comes in: it's the magic that allows web applications
to hold onto information across different sessions.
There are two main types of persistence in web applications: session persistence and data persistence.
Session persistence focuses on remembering information within a single user's browsing session. Imagine you're adding items to an online shopping cart. The web application uses session persistence to keep track of the items you've added, even if you navigate to different product pages. This is commonly achieved using cookies or storing data on the server-side for a limited duration.
Data persistence is the harder one - it's about storing data permanently, beyond a single user session. This is where databases come into play. Web applications can store user information, product details, or any other kind of data in a database, ensuring it survives even after the user closes their browser or the server restarts.
Different types of data may require different types of databases. The simplest data to be stored is a binary blob: a chunk of data that we simply store as is, and we don't have to look regularly into its contents. This is the kind of data which can be stored in a file system as a file. Of course, storing this way is not ideal for scalability, although with a shared file system like AWS EFS, it can be done. Much more common is to use a dedicated object storage mechanism like AWS S3, which is designed to store and serve large amounts of data. We will cover this topic later.
A more common case is to store simple key-value pairs. This is the kind of data which can be stored in a key-value store like Redis or Memcached. Sessions are a good example of this kind of data. There are some Rust based solutions too, like TiKV, which is a distributed key-value store.
The most common case is to store structured data, which can be stored in a relational database like MySQL, PostgreSQL, or SQLite, or in a NoSQL database like MongoDB or DynamoDB.
Relation databases are the most common, they are based on the relational model of data, which is based on the idea of a table. Each table has a set of columns, and each row in the table has a value for each column. The columns are defined when the table is created, and the rows are added and removed as the data changes. The SQL language is used to interact with the database.
NoSQL databases are a newer type of database, which are designed to be more scalable and flexible than relational databases. There are many different types of NoSQL databases, but they all share the idea of storing data in a way that is not based on the relational model. This means that they can be more flexible and scalable, but it also means that they can be harder to work with, because they don't have the same kind of structure that relational databases do. Common designs are document stores like MongoDB, the key-value stores we mentioned before, and graph databases like Neo4j or Amazon Neptune. Graph databases are designed to store and query data that is connected in complex ways, like social networks or other kinds of networks. There is another kind, so called wide-column stores, like Cassandra or ScyllaDB. They are designed to store and query large amounts of data, and they are often used for time-series data, like logs or sensor data.
We have to mention the concept of ACID, which is a set of properties that guarantee that database transactions are processed reliably. ACID is an acronym for Atomicity, Consistency, Isolation, and Durability. Atomicity means that transactions are all or nothing, either all the changes are made, or none of them are. Consistency means that the database is always in a consistent state, even if a transaction fails. Isolation means that transactions are processed independently of each other, so that they don't interfere with each other. Durability means that once a transaction is committed, it is permanent, even if the database crashes. ACID is a very important concept in database design, and it is one of the reasons why relational databases are so popular.
NoSQL databases are often designed to be more scalable than relational databases, but they often sacrifice some of the ACID properties in order to achieve that scalability.
Another topic we have to mention is the CAP theorem, which states that it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees:
- Consistency: Every read receives the most recent write or an error
- Availability: Every request receives a response, without guarantee that it contains the most recent write
- Partition tolerance: The system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes
In a distributed system, you can only have two of the three. This is a very important concept in distributed systems design, and it is one of the reasons why distributed systems are so hard to design and build.
Relational databases are designed primarily to provide consistency. The most common way of replication in the relational world is to have a primary database, which is the only one that can accept writes, and one or more replica databases, which can accept reads. This is called a master-slave replication. With additional components it is possible to achieve automatic failover, so that if the primary database fails, one of the replicas can take over. This is called a high-availability setup.
NoSQL databases offer a variety of trade-offs between consistency and availability, and they are often designed to be more tolerant of network partitions. This means that they can provide better availability than relational databases, but at the cost of consistency. This is why NoSQL databases are often used in distributed systems, where it is more important to be able to continue operating in the face of network problems than it is to have a consistent view of the data. The trade-offs between the C-A-P properties are sometimes configurable, so you can choose the best setup for your use case.
This chapter will be primarily about data persistence. First we will cover traditional relational databases. They can be used directly using SQL or through an ORM (Object-Relational Mapping) library like Diesel or SeaORM.
Then we will cover NoSQL databases, MongoDB primarily. Later we will introduce SurrealDB, a Rust based NoSQL database. This one can be embedded directly into your application, and offers a highly flexible data storage layer. It can use in-memory data storage, local storage based on RocksDB or can connect to a distributed storage cluster based on TiKV. The data model of SurrealDB is document based, but it has an SQL-like query language, and it can be used as a graph database too.
Adding a database
We will use docker and docker-compose
to add a database to our project.
We will use MariaDB (which is a fork of MySQL) as our database.
A sample docker-compose.yml
file is provided in the 05-persistence/sqlx/
directory.
version: '3'
services:
db:
image: mariadb
environment:
MYSQL_ROOT_PASSWORD: password
MYSQL_DATABASE: sampledb
MYSQL_USER: user
MYSQL_PASSWORD: password
volumes:
- data:/var/lib/mysql
ports:
- "3306:3306"
volumes:
data:
According to this file, we will have a MariaDB database running in a Docker
container. The database will be named sampledb
, and it will have a user
named user
with the password password
. The database will be accessible
on port 3306.
Let's start up the database using docker-compose
:
$ cd 05-persistence/sqlx
$ docker-compose up -d
We can check if the database is running using the docker ps
command:
$ docker ps
We can also check if the database is accessible using the mysql
command line
client:
$ mysql -h 127.0.0.1 -u user -p sampledb
When it asks for the password, enter password
. If everything is set up
correctly, you should see a prompt like this:
MariaDB [sampledb]>
Now create tables for our application. We will use the mysql
command line
client to do this. The SQL commands are in the 05-persistence/sqlx/schema.sql
:
CREATE TABLE users (
id INT AUTO_INCREMENT PRIMARY KEY,
username VARCHAR(255) NOT NULL,
password VARCHAR(255) NOT NULL,
status INT NOT NULL DEFAULT 1,
created TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
last_login TIMESTAMP,
UNIQUE (username)
);
CREATE TABLE posts (
id INT AUTO_INCREMENT PRIMARY KEY,
author_id INT NOT NULL REFERENCES users(id),
slug VARCHAR(255) NOT NULL,
title VARCHAR(255) NOT NULL,
content TEXT NOT NULL,
status integer NOT NULL DEFAULT 1,
created TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
UNIQUE (slug)
);
We can execute these commands using the mysql
command line client:
$ mysql -h 127.0.0.1 -u user -p sampledb < schema.sql
Now we have a database set up and ready to use. We can use the mysql
command
to verify that the tables have been created:
$ mysql -h 127.0.0.1 -u user -p sampledb
MariaDB [sampledb]> SHOW TABLES;
Using SQLx
SQLx is a modern SQL client for Rust, which is designed to be easy to use and efficient. It is designed for asynchronous operation, so it depends on executors like Tokio or async-std. We will use it with Tokio.
You can find the sample codes on GitHub
First, add the sqlx
and sqlx-macros
dependencies to the Cargo.toml
file:
[dependencies]
sqlx = { version = "0.7", features = [ "mysql", "runtime-tokio",
"tls-rustls", "chrono", "macros" ] }
About the features:
mysql
: This feature enables support for the MySQL databaseruntime-tokio
: This feature enables support for thetokio
runtimetls-rustls
: This feature enables support for therustls
TLS librarychrono
: This feature enables support for thechrono
date and time librarymacros
: This feature enables support for thesqlx
macros, likequery!
Run cargo build
to download and compile the dependencies.
Now we can use the sqlx
library to connect to the database and execute SQL
queries. We will also need the sqlx
cli tool, to install it run:
$ cargo install sqlx-cli
To use our database with sqlx
, we need to create a DATABASE_URL
environment
variable. This variable should contain the connection string for the database.
For example:
$ export DATABASE_URL=mysql://user:[email protected]/sampledb
Now run cargo sqlx prepare --workspace
from the workspace root directory
to check our implementation and store the database schema information
for later use.
To actually run our application, we need to export the database URL in
another environment variable, APP__DATABASE__URL
:
$ export APP__DATABASE__URL=mysql://user:[email protected]/sampledb
We already had an ApplicationState
configuration in src/commands/serve.rs
:
let state = Arc::new(ApplicationState::new(settings)?);
Now we have to extend this with a database connection:
let db_url = settings
.database
.url
.clone()
.expect("Database URL is not set");
let pool = sqlx::MySqlPool::connect(&db_url).await?;
let state = Arc::new(ApplicationState::new(settings, pool)?);
We create a new MySqlPool
and pass it to the ApplicationState
constructor.
In the constructor, we use the pool to configure a new UserService
implementation, this time based on MySQL:
pub struct ApplicationState {
pub settings: ArcSwap<Settings>,
pub user_service: Arc<MySQLUserService>,
pub post_service: Arc<InMemoryPostService>,
}
impl ApplicationState {
pub fn new(settings: &Settings, pool: MySqlPool) -> anyhow::Result<Self> {
Ok(Self {
settings: ArcSwap::new(Arc::new((*settings).clone())),
user_service: Arc::new(MySQLUserService::new(pool)),
post_service: Arc::new(InMemoryPostService::default()),
})
}
}
The MySQLUserService
implementation is in src/services/user.rs
.
It implements the same UserService
trait as the InMemoryUserService
, but
uses sqlx
to interact with the database. The underlying data model and
the request-response structures are the same. The constructor simply
creates a new instance of the service, storing the database pool:
pub struct MySQLUserService {
pub pool: MySqlPool,
}
impl MySQLUserService {
pub fn new(pool: MySqlPool) -> Self {
Self { pool }
}
}
Now the trait implementation uses the pool to execute SQL queries. For example,
the get_user_by_id
method looks like this:
impl UserService for MySQLUserService {
async fn get_user_by_id(&self, id: i64) -> anyhow::Result<User> {
let res = sqlx::query!(
r#"
SELECT id, username, password, status, created, updated, last_login
FROM users
WHERE id = ?
"#,
id
);
res.fetch_one(&self.pool)
.await
.map(|row| User {
id: row.id as i64,
username: row.username,
password: row.password,
status: UserStatus::from(row.status),
created: row.created.unwrap_or_default(),
updated: row.updated.unwrap_or_default(),
last_login: row.last_login,
})
.map_err(|e| anyhow::anyhow!(e).context(format!("Failed to get user by id: {}", id)))
}
...
}
The sqlx::query!
macro is used to create a new SQL query. The question
marks are placeholders for parameter binding. Here we bind a single parameter,
the id
variable. The fetch_one
method is used to execute the query and
retrieve a single row from the result set. The row is then converted into a
User
struct. If the query fails, an error is returned.
The database stores the status
enum as an integer, so we have to convert it
using the From
trait in src/model.rs
:
impl From<i32> for UserStatus {
fn from(value: i32) -> Self {
match value {
1 => UserStatus::Active,
2 => UserStatus::Blocked,
_ => UserStatus::Active,
}
}
}
impl From<UserStatus> for i32 {
fn from(value: UserStatus) -> Self {
match value {
UserStatus::Active => 1,
UserStatus::Blocked => 2,
}
}
}
The create_user
method works similarly:
async fn create_user(&mut self, req: CreateUserRequest) -> anyhow::Result<User> {
let query = sqlx::query!(
r#"
INSERT INTO users ( username, password, status, created, updated, last_login )
VALUES ( ?, ?, ?, NOW(), NOW(), NULL )
"#,
req.username, req.password, i32::from(req.status));
let res = query.execute(&self.pool)
.await?
.last_insert_id();
let id: i64 = res.try_into().or_else(|_| anyhow::bail!("Failed to convert user id"))?;
let user = self.get_user_by_id(id).await?;
Ok(user)
}
We create a new SQL query using the sqlx::query!
macro, bind the parameters
from the CreateUserRequest
and execute the query. The last_insert_id
method
retrieves the ID assigned to the newly created user. We have to convert it
into i64
because we used this type in the User
struct (another approach
is to use the u64
type for id fields).
Finally, we retrieve the newly created record, so we can return it to the
client, including the automatically calculated created
, updated
fields.
The update_user
method is quite similar again:
async fn update_user(&mut self, id: i64, req: UpdateUserRequest) -> anyhow::Result<User> {
let query = sqlx::query!(
r#"
UPDATE users
SET username = ?, password = ?, status = ?, updated = NOW(), last_login = ?
WHERE id = ?
"#,
req.username, req.password, i32::from(req.status), req.last_login, id);
query.execute(&self.pool).await?;
let user = self.get_user_by_id(id).await?;
Ok(user)
}
The delete_user
method is even more simple, because we don't have to return
anything:
async fn delete_user(&mut self, id: i64) -> anyhow::Result<()> {
let query = sqlx::query!(
r#"
DELETE FROM users
WHERE id = ?
"#,
id);
query.execute(&self.pool).await?;
Ok(())
}
After these modifications you can run cargo sqlx prepare --workspace
and
cargo build
again and the application will use the database instead of the
in-memory storage to retrieve and store user data. The login method, for
example, will only accept a username when it is present in the database.
You can implement the MysqlPostService
in a similar way. Try to do it
yourself, and if you get stuck, you can check the solution in the
code samples.