Going up in smoke… I mean, moving to the cloud.


There are a lot of posts around (even on our website) that will be more than happy to tell you why you should move to the cloud: scaling, security, the belief that it will solve all your problems, a boost for your resume, or plain FOMO, you name it. In this post we’re not going to argue whether you should opt for a monolithic or a microservice architecture, whether and why you should make the move, or the benefits of either approach.
You probably know your business needs better than anyone anyway.

This post is mostly going to be about adapting what is a traditional monolith into something that can somehow scale. We’re not going to go into REST APIs, service meshes, CI/CD, …  (maybe another time). However, the things we will see here will help you reason better about the cloud native approach and maybe get a clearer picture of what’s going on.

If you’re looking for some basics to get you started, look no further. 🙂

We’ll be focusing mostly on PHP, but I’m pretty sure the skills are transferable to other programming languages.

It’s time to get serious and dive into it.

Sessions

Let’s talk about sessions and session storage, and why they matter.

The standard way to start a session in PHP is using the session_start() function.

<?php
session_start();
?>

This will give us access to the $_SESSION superglobal, in which we can store whatever information we wish.

Once the HTTP response is returned, it will contain a cookie, usually named PHPSESSID, with a random string value. On your next request, your browser will send the PHPSESSID cookie back and PHP will be able to retrieve the information it stored in the $_SESSION variable.

So far so good; nothing new, we all know how cookies work.

How does this work under the hood?

By default, PHP will store the session data in a file on the server (usually in the /tmp or /var/lib/php/something directory). So when the next request with a PHPSESSID cookie comes in, even to a different PHP script, PHP will still be able to retrieve the information stored in the session variable.
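In php.ini terms, these defaults look roughly like this (the exact save_path varies by distribution):

```ini
[Session]
; "files" is the default handler: one sess_<id> file per session
session.save_handler=files
session.save_path="/tmp"
```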

This is all fine while our application is confined to a single server.


This becomes a bit problematic when we decide we want to scale horizontally.

If you’re not familiar, horizontal scaling is when we add more instances of our application. The entry point to our system is a load balancer, which distributes requests across our application instances, which are working independently from one another.

This is in contrast to vertical scaling, where we just throw more hardware/resources at the existing server.

Let’s think about what happens to our sessions when we have multiple instances. For the sake of simplicity, let’s assume we have two instances (call them worker A and worker B).

The first request that comes in hits worker A, which creates a session, saves it on its local filesystem and returns a PHPSESSID cookie.

On the next request, the client will send the cookie it got. If we’re lucky, it will land on worker A, which has the session stored locally. If we’re unlucky, it will land on worker B. Worker B does not have the information worker A stored locally and has no idea what to do with the session cookie it received.

This would result in lost shopping carts, users logged out in the middle of their work, cookie consent banners spamming the user, and various other annoying behavior.

We cannot rely on luck to keep our users happy so we take a different approach.

Session handlers

Lucky for us, someone already thought about this and figured it would be a good idea to be able to store sessions somewhere else.

PHP can be configured to use different session handlers, other than the local filesystem. As a personal preference, I would suggest you consider Redis for the task.
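As an aside, every handler (the default files included) plugs into PHP through the same SessionHandlerInterface, and you can even register your own. A minimal in-memory sketch (ArraySessionHandler is a made-up name, useless across requests but handy for illustrating the contract):

```php
<?php
// Illustrative sketch: every save_handler (files, redis, memcached, ...)
// implements the same SessionHandlerInterface contract.
// ArraySessionHandler is a made-up name for this in-memory example.
class ArraySessionHandler implements SessionHandlerInterface
{
    private array $store = [];

    public function open(string $path, string $name): bool { return true; }

    public function close(): bool { return true; }

    // Return the serialized session data for this ID, or '' if unknown.
    public function read(string $id): string|false
    {
        return $this->store[$id] ?? '';
    }

    // Persist the serialized session data under this ID.
    public function write(string $id, string $data): bool
    {
        $this->store[$id] = $data;
        return true;
    }

    public function destroy(string $id): bool
    {
        unset($this->store[$id]);
        return true;
    }

    // Garbage collection; nothing to expire in this toy example.
    public function gc(int $max_lifetime): int|false { return 0; }
}

// Registering it before session_start() routes all session
// reads/writes through this object instead of the filesystem.
$handler = new ArraySessionHandler();
session_set_save_handler($handler, true);
```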

The reason for choosing Redis is quite simple. It’s very performant and can be set up in a highly available fashion, which is something you may want to consider when running things in prod.

However, we’re not here to praise Redis, but to see how we can use it for session storage. This just requires a simple change in our php.ini config file:

[Session]
session.save_handler=redis
session.save_path="tcp://redis_host:6379"

Replace redis_host with the IP or hostname of your Redis instance and you’re good to go (if you set up Redis on a different port, you need to change that as well).

How does this help us?

Let’s return to our previous example of multiple servers and annoyed users.

Now when worker A sets up a new session, it will store it in Redis instead of the local filesystem. If the second request hits worker B, it will be able to retrieve the session from Redis and will have access to the information worker A stored in it.

Hooray! We can now add as many workers as we want and everything will work fine. We’re only limited by our imagination, infrastructure budget and Redis memory limit. 🙂

Please do note that this is a simplified case. Your application or framework may not use PHP sessions (it may not even be written in PHP), so consult your senior developer before making any changes that will get you fired.

Talk is cheap. Show me the code

All of the examples you see in this post are available on https://gitlab.com/andrea181/blog-post-examples. You will find more information about the setup in the Readme page. To keep this post from getting too long, we’re just going to focus on the most important parts.

Consider the following piece of PHP code:

<?php
session_start();
echo "hello from: " . gethostname() . "\n";

if (!empty($_SESSION['hostname'])) {
    echo "you already have a cookie!\n";
    print_r($_SESSION);
} else {
    echo "Looks like you don't have a cookie. Here you go!\n";
    $_SESSION['hostname'] = gethostname();
    print_r($_SESSION);
}

The logic is very simple.

We start a session and say hello to the user from the server.

If we already have $_SESSION['hostname'] set, we inform the user that the cookie is already set and print out whatever is in our session variable. If not, we inform the user that there is no cookie set and set the $_SESSION['hostname'] variable to the hostname of our server.

example1: single application instance with local session storage

Let’s see what happens when we make a request (the response headers are truncated for brevity):

andrea@grunf:~$ curl -i https://testko2.sysb.ee  
HTTP/2 200 
set-cookie: PHPSESSID=fbce8eca27c7ae8825bce405ce188be8; path=/


hello from: e3c793ed4765
Looks like you don't have a cookie. Here you go!
Array
(
    [hostname] => e3c793ed4765
)

We didn’t have a session cookie, so we hit the second branch and got a set-cookie header with the value PHPSESSID=fbce8eca27c7ae8825bce405ce188be8.

When we make the second request with the same cookie, we hit the first branch:

andrea@grunf:~$ curl -i https://testko2.sysb.ee  \
--cookie PHPSESSID=fbce8eca27c7ae8825bce405ce188be8
HTTP/2 200 

hello from: e3c793ed4765
you already have a cookie!
Array
(
    [hostname] => e3c793ed4765
)

If we look in the /tmp directory, we will see the session file stored there:

bash-5.1# ls /tmp
sess_fbce8eca27c7ae8825bce405ce188be8
bash-5.1# cat /tmp/sess_fbce8eca27c7ae8825bce405ce188be8 
hostname|s:12:"e3c793ed4765";bash-5.1# 
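What’s inside is PHP’s native serialization format: hostname is the variable name, and s:12:"…" denotes a string of 12 bytes, which unserialize() can decode:

```php
<?php
// The value part of a session file entry is plain PHP serialization:
// "s:12:" announces a string of 12 bytes.
$value = unserialize('s:12:"e3c793ed4765";');
echo $value . "\n"; // e3c793ed4765
```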

So far so good, everything works as expected.

Let’s see what happens when we horizontally scale our app.

example2: multiple instances with local session storage

andrea@grunf:~$ curl -i https://testko2.sysb.ee
HTTP/2 200                                                                                                                                                                                                                         
set-cookie: PHPSESSID=a659f82e5f73329d33f68c540c6d43db; path=/


hello from: 109ce0ffa809                             
Looks like you don't have a cookie. Here you go!
Array                                                
(                                                    
    [hostname] => 109ce0ffa809
)   


andrea@grunf:~$ curl -i https://testko2.sysb.ee  \
--cookie PHPSESSID=a659f82e5f73329d33f68c540c6d43db
HTTP/2 200                                           

hello from: fed3907e49e4                             
Looks like you don't have a cookie. Here you go!
Array                                                
(                                                    
    [hostname] => fed3907e49e4
) 


andrea@grunf:~$ curl -i https://testko2.sysb.ee \ 
--cookie PHPSESSID=a659f82e5f73329d33f68c540c6d43db
HTTP/2 200                                           

hello from: 109ce0ffa809                             
you already have a cookie!
Array                                                
(                                                    
    [hostname] => 109ce0ffa809
)

Our first request hit the application running on host 109ce0ffa809, which provided us with a session cookie.

The second request hit the application running on host fed3907e49e4. We sent the cookie we got in the first request, but that application had no session to match it. The third request again hit the application running on host 109ce0ffa809, which had the session saved locally, and we got the expected response.

This is obviously not good, so let’s try setting up Redis as our session storage handler.

example3: multiple application instances, redis as session storage

andrea@grunf:~$ curl -i https://testko2.sysb.ee
HTTP/2 200 
set-cookie: PHPSESSID=75b4888f528382b28520e2b5479accf3; path=/

hello from: 562d348d9220
Looks like you don't have a cookie. Here you go!
Array
(
    [hostname] => 562d348d9220
)


andrea@grunf:~$ curl -i https://testko2.sysb.ee \
--cookie PHPSESSID=75b4888f528382b28520e2b5479accf3
HTTP/2 200 

hello from: c243a9ca754c
you already have a cookie!
Array
(
    [hostname] => 562d348d9220
)

As before, our first request was cookieless, and the application running on host 562d348d9220 created a session for us, storing its hostname in it. The second request landed on host c243a9ca754c. The application running on this host has access to the sessions stored in Redis and is able to give us the expected response.

We can take a look at what’s going on in Redis and see that the sessions are indeed stored there:

root@1ac28e4b7746:/data# redis-cli 
127.0.0.1:6379> keys *
1) "PHPREDIS_SESSION:75b4888f528382b28520e2b5479accf3"
127.0.0.1:6379> GET "PHPREDIS_SESSION:75b4888f528382b28520e2b5479accf3"
"hostname|s:12:\"562d348d9220\";"
127.0.0.1:6379>

User-generated content/media uploads

Another thing we need to take into consideration is what happens if our application allows users to upload files.
The problem here is pretty much of the same nature as with the sessions discussed in the previous section. If we store user-generated content on the local filesystem, we cannot scale horizontally:
one instance will have access to the uploaded content, while the others won’t.
Unfortunately, there is no silver bullet here.

If we’re running our application on multiple bare metal servers/VMs that are “close” (by close we mean low network latency between them), we might consider using NFS.
This will most likely work for most use cases. The benefit here is that we don’t need to make any application changes for it to work.
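As a sketch, each worker could mount the shared export over its uploads directory; the server name and paths below are placeholders:

```
# /etc/fstab on each application server (nfs-server and paths are placeholders)
nfs-server:/export/uploads  /var/www/html/wp-content/uploads  nfs  defaults,_netdev  0  0
```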

However, using NFS is not always an option.
If you’re running your application in a public cloud provider environment, check out what the provider has to offer. As we mostly work with AWS, the go to answer here is usually S3.
It integrates nicely with CloudFront so you have the benefit of cheap storage + CDN.
The problem is you will have to figure out a way to make the application store the files on S3.
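One common way to keep that change manageable is to put uploads behind a small storage interface, so the local filesystem and S3 become interchangeable adapters. The interface and class names below are made up for illustration; an S3 adapter would implement the same interface using an S3 client (e.g. the AWS SDK for PHP):

```php
<?php
// Hypothetical storage abstraction: the application only talks to
// UploadStorage, so the backend can change without touching call sites.
interface UploadStorage
{
    public function put(string $key, string $contents): void;
    public function get(string $key): string;
}

// Local-filesystem adapter: what a traditional monolith effectively does.
class LocalUploadStorage implements UploadStorage
{
    public function __construct(private string $baseDir) {}

    public function put(string $key, string $contents): void
    {
        file_put_contents($this->baseDir . '/' . $key, $contents);
    }

    public function get(string $key): string
    {
        return file_get_contents($this->baseDir . '/' . $key);
    }
}

// An S3 adapter would implement the same two methods, delegating to
// the S3 client's put/get object calls; the rest of the app is unchanged.
```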

Maybe you are worried about vendor lock-in or are not comfortable using a managed storage solution.
Fortunately, there is still hope for you. You can find products built for this problem, such as MinIO, an object storage solution with an S3-compatible API.

How to write application code that will use S3 object storage for uploads is out of scope of this post. But not to worry, we’re going to see an example of everything we learned here.

Putting everything together

All the examples we have seen so far have been relatively simple use cases.
While this might be useful for understanding the underlying concepts, it would be interesting to take a look at how we might adapt something that is traditionally monolithic into something scalable.

We’re going to take WordPress as an example, as it’s one of the most popular systems out there.
You probably won’t ever need to run it this way, but it will be a nice experiment and we will get to see some techniques for deploying.

If you have already taken a look at the examples, you saw that we used Docker Compose to set everything up. We will be doing the same here, and to emulate NFS storage we’re simply going to share volumes between the containers.

Let’s check the docker-compose file we’re using to simulate our infrastructure:

version: '3.7'
services:

## worker 1 start
    apache:
      image: apache:local
      ports:  
        - "81:80"                   

      volumes:
        - fpm-socket:/var/lib/php7/fpm
        - codebase:/home/appuser/public_html
        - uploads:/home/appuser/public_html/wp-content/uploads
    php:
      image: php:redis-storage
      volumes:
        - fpm-socket:/var/lib/php7/fpm
        - codebase:/home/appuser/public_html
        - uploads:/home/appuser/public_html/wp-content/uploads
    wordpress:
      image: wordpress:5.9
      volumes:
        - codebase:/home/appuser/public_html
        - uploads:/home/appuser/public_html/wp-content/uploads
      env_file:
        - ./.env

## worker 1 end
## worker 2 start

    apache2:
      image: apache:local
      ports:  
        - "82:80"                   

      volumes:
        - fpm-socket2:/var/lib/php7/fpm
        - codebase2:/home/appuser/public_html
        - uploads:/home/appuser/public_html/wp-content/uploads
    php2:
      image: php:redis-storage
      volumes:
        - fpm-socket2:/var/lib/php7/fpm
        - codebase2:/home/appuser/public_html
        - uploads:/home/appuser/public_html/wp-content/uploads
    wordpress2:
      image: wordpress:5.9
      volumes:
        - codebase2:/home/appuser/public_html
        - uploads:/home/appuser/public_html/wp-content/uploads
      env_file:
        - ./.env

## worker 2 end

volumes:
    fpm-socket:
    codebase:
    fpm-socket2:
    codebase2:
    uploads:

In each worker section we have three services running: one for Apache, one for PHP and one for the WordPress container.

All the containers in one worker share the codebase volume (this is defined as the document root for our vhost in Apache). The fpm-socket volume is shared between the Apache and PHP containers, so FastCGI can work over UNIX sockets.

All the services in both worker sections share the uploads volume; this is what is going to emulate our NFS storage for the media content.

Just a quick side note on the setup. While we could have packaged everything in one container, there are multiple benefits to having each service in its own container.
For starters, we can update each service individually, which is much quicker than rebuilding a huge container and also easier to revert if something goes wrong.
The second benefit is that we can easily separate the logs of each service. This would be much more complicated if the stdout/stderr of a single container contained mixed logs from Apache, PHP and WordPress.
Also, one day when you decide to move to K8s, this is the way to go 🙂

The WordPress container in our setup is used just to bootstrap everything.

If we look at the Dockerfile:

FROM alpine:3.13
RUN apk add --no-cache bunch of packages
RUN  addgroup -g 1337 -S appgroup && adduser -G appgroup -S appuser -s /bin/bash -u 1337
COPY bin/sync.sh /root/sync.sh
RUN chmod +x /root/sync.sh
COPY --chown=appuser:appgroup codebase/ /tmp/codebase/
RUN chmod 711 /home/appuser
ENTRYPOINT /root/sync.sh

We see that all it does is set up our unprivileged user and copy the codebase to a temporary directory; on container startup it will execute a bash script.

The bash script contains the instructions for our deployment procedure:

#!/bin/bash
rsync -a --delete --exclude "wp-content/uploads" /tmp/codebase/ /home/appuser/public_html/
chown appuser:appgroup /home/appuser/public_html/wp-content/uploads/
sudo -u appuser -i php /home/appuser/public_html/wp --path=/home/appuser/public_html config set DB_HOST $DB_HOST
sudo -u appuser -i  php /home/appuser/public_html/wp --path=/home/appuser/public_html config set DB_USER $DB_USER
sudo -u appuser -i php /home/appuser/public_html/wp --path=/home/appuser/public_html config set DB_PASSWORD $DB_PASSWORD
sudo -u appuser -i php /home/appuser/public_html/wp --path=/home/appuser/public_html config set DB_NAME $DB_NAME
sudo -u appuser -i php /home/appuser/public_html/wp --path=/home/appuser/public_html core update-db

In the script, we sync the content from the container into the shared codebase volume.

With the wp-cli command (php /home/appuser/public_html/wp in the script) we set all the variables needed for wp-config.php (we get those from the .env file).

The last line of the script is the most interesting one. Here we’re calling wp core update-db which upgrades our database schema to the one needed for the WordPress version we’re running.
This will help us out when we want to upgrade.

The upgrade procedure in this case would be to build a new WordPress image with the latest codebase and change the tag in the docker-compose file.
When we restart the WordPress containers, the new codebase will be deployed on the shared volume and the database will be up to date with our version.
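Assuming the compose file above, a redeploy could look roughly like this (the 6.0 tag is purely illustrative):

```
# Build an image with the updated codebase under a new tag...
docker build -t wordpress:6.0 .
# ...update the tag in docker-compose.yml, then recreate the bootstrap
# containers; their sync script redeploys the code and updates the schema.
docker compose up -d wordpress wordpress2
```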

A couple of noteworthy things

If you try out the last example, you will notice that nothing is stored in Redis. This is because WordPress does not use the PHP session handler; its sessions are stored in the database.

Also, to keep things simple, we haven’t considered how to handle plugins and themes. But if you got the gist of the deployment procedure, it won’t be so hard to implement that as well.

I would suggest you take a look at the wp-cli plugin and theme documentation pages.
