Scrapbook — A microservice-based photo sharing application.

Hrishikesh Paul
8 min read · Feb 9, 2022
Visit the website here

Scrapbook is a microservice-based application for uploading, downloading and sharing pictures that demonstrates the power of a distributed system architecture built with modern technologies. In this blog I am going to walk through the whole process of how the application was created. The GitHub repo for the project can be found here.

Sections

  1. Technology Stack
  2. Design
  3. Development
  4. Automation
  5. Deployment
  6. Testing (Load & Fault Tolerance Testing)
  7. Cloud Native Compliant System
  8. Reflection

Technology Stack

  • Angular 11 (TypeScript)
  • SpringBoot (Java 15)
  • Flask (Python)
  • MongoDB Atlas
  • Redis Cloud
  • Google Cloud Storage
  • Apache Kafka
  • Docker
  • Jenkins
  • Kubernetes
  • Google Kubernetes Engine
  • JMeter

Design

Napkin diagram: a high-level overview of what you can do on Scrapbook

We paid a lot of attention to the design phase of the project, starting with discussions about what the application could do, what was in scope, and what was feasible given the time frame we had.

See the mockups here (made using Adobe XD).

Workflow

Development

Microservices

  1. User Interface: Built using Angular 11 with Bootstrap and Material Design; provides the UI for all of the app's functions.
  2. Authentication Service: Built using Flask and Python. This service authenticates the user either with a local strategy or through Google OAuth, and also validates the token on each user request.
  3. Session Service: Built using NodeJS, Express and Redis to manage user sessions.
  4. User Service: Built using NodeJS and MongoDB. Stores and retrieves users.
  5. Gateway: Built using Flask and Python, this service aggregates incoming requests from the client and distributes them to the required services.
  6. Image Service: Built using SpringBoot and Java, this service is responsible for storing album and image information in MongoDB and communicating with the Google Drive Service.
  7. Google Drive Service: Built using SpringBoot and Java, this service is responsible for communicating with Google Drive. All images are stored in the cloud.
  8. Metadata Extractor Service: Built using Flask, this service extracts metadata from uploaded images. This happens asynchronously: uploaded images are placed on a Kafka topic and consumed by the extractor service, which then saves the extracted data to a Mongo database (a minimal sketch of this flow follows the list).
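
To make the asynchronous flow in the Metadata Extractor concrete, here is a minimal sketch of its consumer loop. It assumes a kafka-python consumer, a topic named uploaded-images carrying raw image bytes keyed by image ID, and an image_metadata Mongo collection; the project's actual topic names and message format may differ.

    # Hypothetical consumer loop for the metadata extractor (topic/collection names are assumptions).
    import io

    from kafka import KafkaConsumer      # kafka-python
    from PIL import ExifTags, Image      # Pillow, for basic EXIF extraction
    from pymongo import MongoClient

    consumer = KafkaConsumer("uploaded-images", bootstrap_servers="kafka:9092")
    metadata = MongoClient("mongodb://mongo:27017")["scrapbook"]["image_metadata"]

    for message in consumer:
        image = Image.open(io.BytesIO(message.value))          # raw image bytes from the topic
        exif = {ExifTags.TAGS.get(tag, str(tag)): str(value)
                for tag, value in image.getexif().items()}
        metadata.insert_one({
            "imageId": message.key.decode() if message.key else None,
            "format": image.format,
            "size": list(image.size),
            "exif": exif,
        })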

Architecture

Below is our resulting architecture:

Scrapbook Architecture

Containerization

Deploying and maintaining multiple services can be hard because the services use a wide range of technologies. To make them OS-independent, each one was containerized using Docker. A Dockerfile was created for every service and used to build an image, which was then pushed to Docker Hub. To test whether the services worked inside their containers, docker-compose was used.
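
As an illustration, a Dockerfile for one of the Flask-based services might look like the sketch below; the file names, port and base image are assumptions rather than the project's actual files.

    # Hypothetical Dockerfile for a Flask service (file names and port are assumptions).
    FROM python:3.9-slim
    WORKDIR /app
    COPY requirements.txt .
    RUN pip install --no-cache-dir -r requirements.txt
    COPY . .
    EXPOSE 5000
    CMD ["python", "app.py"]

The image is then built and pushed with docker build -t <user>/scrapbook-gateway . and docker push <user>/scrapbook-gateway, and docker-compose wires the containers together for local testing.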

Automation

Establishing CI/CD

We used Jenkins for our CI/CD, deployed on a VM on Jetstream. We also set up a GitHub webhook to automatically trigger builds when a push is made to the respective micro-service branch. Each micro-service has its own Jenkinsfile. Below are the steps in our Jenkins pipeline (a minimal Jenkinsfile sketch follows the screenshot):

  1. Clone repository: getting the repository from GitHub.
  2. Prepare repository: due to our folder structure (with the files being in a sub-directory), we need to process the folders in order to access the files.
  3. Build repository: installing the dependencies using npm, mvn or pip.
  4. Test repository: running the unit tests for the services.
  5. Build image: building the Docker image using the Dockerfile.
  6. Push image: pushing the built image to Docker Hub.
  7. Delete local image: cleaning up the local image so that dangling images do not consume space.
  8. Deploy on Kubernetes: SSH'ing into the Kubernetes master node and applying the deployment/service config files to create a pod/service.

Jenkins Screenshot
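
To make the stages concrete, here is a minimal declarative Jenkinsfile sketch for one of the Node-based services; the image name and directory are assumptions, and the real Jenkinsfiles differ per service. The deploy stage is covered in the next section.

    // Minimal declarative pipeline sketch (image name and paths are assumptions).
    pipeline {
        agent any
        stages {
            stage('Clone')       { steps { checkout scm } }
            stage('Build')       { steps { dir('user-service') { sh 'npm install' } } }
            stage('Test')        { steps { dir('user-service') { sh 'npm test' } } }
            stage('Build image') { steps { sh 'docker build -t myrepo/user-service:latest user-service' } }
            stage('Push image')  { steps { sh 'docker push myrepo/user-service:latest' } }
            stage('Clean up')    { steps { sh 'docker rmi myrepo/user-service:latest' } }
        }
    }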

Continuous Deployment

For CD we tried various Jenkins plugins for Kubernetes, but none of them were compatible with our version of Jenkins. Instead, we used a plugin called SSH Agent to establish a connection between Jenkins and the Kubernetes cluster, copy the config YAML over via scp, and then apply the config so that the pods/services are deployed on the cloud. The overall workflow of our CI/CD is shown below, followed by a sketch of the deploy stage:

CI/CD Workflow
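
Concretely, the deploy stage with the SSH Agent plugin looks roughly like the sketch below; the credential ID, user, file names and master IP are placeholders, not the project's real values.

    // Sketch of the final pipeline stage (credential ID, user and master IP are placeholders).
    stage('Deploy on Kubernetes') {
        steps {
            sshagent(['k8s-master-ssh-key']) {
                sh 'scp -o StrictHostKeyChecking=no k8s/user-service.yaml ubuntu@<master-ip>:~/'
                sh 'ssh -o StrictHostKeyChecking=no ubuntu@<master-ip> "kubectl apply -f ~/user-service.yaml"'
            }
        }
    }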

Deployment

Kubernetes

To create the Kubernetes deployment and service scripts, we used Minikube to test them out locally. Minikube is a tool that runs a single-node Kubernetes cluster on your machine. This gave us the freedom to play around with the configurations and debug them.

Each service has its own deployment/service config files that are applied on the Kubernetes master node. The gateway and the UI are exposed through external IPs (LoadBalancer-type services) because they need to be reachable from outside the cluster. Our rationale for making the gateway externally available is that applications hosted on other cloud services can still use the Scrapbook APIs. All other communication is internal, happens via ClusterIP (internal) services, and is not available to applications outside the Kubernetes cluster. An example of a Kubernetes config YAML is given here.
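
As a trimmed-down illustration of the LoadBalancer case, a service for the gateway could look like the sketch below; the names, labels and ports are assumptions, not the project's actual config.

    # Hypothetical Service exposing the gateway externally (names and ports are assumptions).
    apiVersion: v1
    kind: Service
    metadata:
      name: gateway
    spec:
      type: LoadBalancer        # internal services use the default ClusterIP type instead
      selector:
        app: gateway
      ports:
        - port: 80              # port exposed on the external IP
          targetPort: 5000      # port the gateway container listens on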

Kubernetes Cluster Creation

Using this blog we created Ansible scripts to set up the Kubernetes cluster. The scripts use the nodes created in the previous section, install Kubernetes on them, and then assign them roles using various security groups. The master node has port 6443 open so that the Control Plane can be set up, and the worker nodes have ports 30000–32767 open; these are the ports described in the Kubernetes documentation. Given the floating IP of the master node, the Ansible script configures the master and the worker nodes.
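
A rough sketch of the kind of tasks involved is below; the package list and variable names are assumptions, and the actual scripts from the referenced blog are more involved.

    # Hypothetical excerpt from the cluster-setup playbook (variable names are assumptions).
    - name: Install Kubernetes components on all nodes
      apt:
        name: [docker.io, kubelet, kubeadm, kubectl]
        state: present
        update_cache: yes

    - name: Initialise the control plane on the master (API server on port 6443)
      command: kubeadm init --apiserver-advertise-address={{ master_floating_ip }}
      when: inventory_hostname in groups['master']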

Testing

Through load, fault-tolerance and spike testing with JMeter, we could characterize our system's limits and its behaviour under a large number of users. Below are some important results.

Better performance for more replicas

Through the load tests, we observed that throughput was much better with 5 replicas per service than with 3. That said, for all the tested functions we averaged 50+ requests per second with both 3 and 5 replicas. Below are a few graphs comparing the stress tests for 3 and 5 replicas in terms of throughput and error rates.

Load test results

Effect of fault tolerance testing

The testing showed that our system is fairly fault tolerant. We manually killed pods on our Kubernetes cluster to see how the system reacts to failed instances. Errors spiked for a brief moment (a few seconds), but Kubernetes was able to pull up new instances almost immediately.

Fault tolerant test results

Cloud Native Compliant System

For this project, we scaled the infrastructure to handle user requests even if the IU VMs were down. We did this by also deploying our application on the TACC cloud and using a sidecar mechanism to re-route requests to the TACC servers in the event that the IU cloud was down. HAProxy, which handles the re-routing, is hosted on a VM in the IU cloud, so we have made the assumption that this VM never goes down. This setup also enables blue-green deployments between the IU and TACC clouds.

Overview of the infrastructure

We also decided to have an external load balancer, which would increase the security and stability of our system. We went with HAProxy as it is a widely used open-source load balancer. Using it, as mentioned above, we can do blue-green deployments at the cluster level. It has convenient options to configure (a minimal haproxy.cfg sketch follows the list):

  • Active-Passive: requests are routed to the TACC cluster if and only if IU is down.
  • Round Robin: traffic is split 50–50 between the two clusters to balance the load.
  • Blue Only: all requests are routed to the IU cluster.
  • Green Only: all requests are routed to the TACC cluster.
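
Under these assumptions, the Active-Passive mode in haproxy.cfg looks roughly like the sketch below (addresses, ports and the health-check path are placeholders); removing the backup flag gives Round Robin, and leaving out one of the server lines gives the Blue Only or Green Only modes.

    # Hypothetical haproxy.cfg excerpt (addresses and health-check path are placeholders).
    frontend scrapbook
        bind *:80
        default_backend clusters

    backend clusters
        balance roundrobin
        option httpchk GET /health
        server iu   <iu-gateway-ip>:80   check          # blue: IU cluster
        server tacc <tacc-gateway-ip>:80 check backup   # green: only used when IU is down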

Reflection

Here I've listed my major takeaways, challenges and learnings from the project.

Using environment variables

Before Dockerizing, all the URLs in the services (such as the gateway) were hard-coded. We realized this wouldn't work once we deployed, and that it would be better to set them dynamically at build and deploy time. We therefore changed all of these values to environment variables so they can be passed in via the Dockerfile, docker-compose, or the Kubernetes deployment script.
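
For example, in the Flask-based gateway the downstream URLs can be read from the environment like this; the variable names here are illustrative, not the project's exact ones.

    # Reading service URLs from environment variables instead of hard-coding them.
    import os

    # Defaults are handy for local development; Docker/Kubernetes override them at deploy time.
    IMAGE_SERVICE_URL = os.environ.get("IMAGE_SERVICE_URL", "http://localhost:8082")
    AUTH_SERVICE_URL = os.environ.get("AUTH_SERVICE_URL", "http://localhost:5001")

The same names are then set via ENV in the Dockerfile, environment: in docker-compose, or env: in the Kubernetes deployment spec.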

How container ports work

While writing the Kubernetes deployment scripts, learning how container ports work and how they're mapped to their services really helped in configuring the whole system. Minikube helped us debug these issues locally.
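
The chain we had to keep straight is: the Service's port is what other pods call, and its targetPort forwards to the containerPort the container actually listens on. A Deployment excerpt matching the gateway Service sketched earlier might look like this (image name, labels and replica count are assumptions):

    # Hypothetical Deployment excerpt; its containerPort matches the Service's targetPort above.
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: gateway
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: gateway
      template:
        metadata:
          labels:
            app: gateway
        spec:
          containers:
            - name: gateway
              image: myrepo/scrapbook-gateway:latest
              ports:
                - containerPort: 5000    # the Service's targetPort points here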

Using third-party services

We had to make a few tweaks to our architecture and then redeploy. Previously, we were using the free tiers of two cloud services, Redis Cloud and MongoDB Atlas, for our persistent storage. These services have various restrictions that hindered our system's capabilities. For example, the throughput and error rate for logging in were about 10 req/s and 90% respectively, because Redis Cloud rate-limits requests: the free tier only allows 30 requests per second, so the service would throw errors and bring down login.

We solved this by spinning up a VM on Jetstream for Redis as well as MongoDB. So essentially, our full application now runs on Jetstream, apart from the services that use the Google APIs. This significantly improved the throughput and reduced the error rate.

Through this project I've not only had the chance to apply modern DevOps and cloud-native technologies but also gained a complete understanding of how a distributed system architecture works. After putting in endless hours and having multiple discussions with my mentors to debug problems and put together a great application, it's great to hear this:

A little bragging ;)

The GitHub repo for the project can be found here. Thank you so much for reading!
