11th July 2022

Josep Stuhli On Scaling to 20 Million Users

Josep Stuhli is CTO of SofaScore. SofaScore provides live scores from more than 600 soccer leagues worldwide. According to SimilarWeb it has around 50-100 million total visits per day. The company is headquartered in Zagreb, Croatia; most users come from Brazil, Italy, USA, and GB. At DORS/CLUC 2022, a Linux and free software conference held in Zagreb, Josep Stuhli presented the evolution of the hardware and software architecture used by SofaScore.

Initially, in 2010, they started with a single server running Linux, Apache, MySQL, and PHP. They still use PHP today. That configuration would crash when more than 1,000 users visited the site simultaneously. They added NGINX in front of Apache. They added memcached but later switched to Varnish. Caching helped them a lot, though they quickly found out that they now suffered from the "Thundering Herd Problem": when a cached entry expires, many simultaneous requests miss the cache at once and all hit the backend together.

With two servers, one web server and one database server, they were able to handle several thousand users.

At one point they even manually took HTML snapshots of some of their most visited pages and uploaded them as static pages.

Varnish solved their "Thundering Herd Problem" by employing "Request Coalescing".
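The idea behind request coalescing can be sketched in a few lines of Python. This is only an illustration of the technique, not how Varnish implements it: the class `CoalescingCache` and the helper `demo` are hypothetical names, and the slow backend is simulated with a sleep. When many clients miss the cache for the same key at the same time, only the first one fetches from the backend; the others wait for that result.

```python
import threading
import time
from concurrent.futures import Future, ThreadPoolExecutor


class CoalescingCache:
    """Toy cache: concurrent misses for the same key share one backend fetch."""

    def __init__(self, fetch):
        self._fetch = fetch        # hypothetical slow backend call
        self._lock = threading.Lock()
        self._cache = {}           # key -> cached value
        self._in_flight = {}       # key -> Future of the fetch in progress

    def get(self, key):
        with self._lock:
            if key in self._cache:
                return self._cache[key]
            future = self._in_flight.get(key)
            leader = future is None
            if leader:
                # First request for this key: it will do the fetch.
                future = Future()
                self._in_flight[key] = future
        if not leader:
            # Another request is already fetching: wait for its result.
            return future.result()
        try:
            value = self._fetch(key)
        except Exception as exc:
            with self._lock:
                del self._in_flight[key]
            future.set_exception(exc)
            raise
        with self._lock:
            # Publish the value and clear the in-flight marker atomically.
            self._cache[key] = value
            del self._in_flight[key]
        future.set_result(value)
        return value


def demo(n_clients=50):
    """Fire n_clients simultaneous requests and count backend fetches."""
    calls = []

    def slow_backend(key):
        calls.append(key)          # list.append is thread-safe in CPython
        time.sleep(0.2)            # simulated expensive page render
        return f"page for {key}"

    cache = CoalescingCache(slow_backend)
    with ThreadPoolExecutor(max_workers=n_clients) as pool:
        results = list(pool.map(cache.get, ["/live-scores"] * n_clients))
    return results, len(calls)
```

Calling `demo(50)` returns fifty identical pages while the simulated backend is hit only once. Without the in-flight table, every request arriving during the 0.2 seconds would trigger its own fetch, which is exactly the thundering herd.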

They switched from their on-premise servers to the Amazon cloud, where they used Amazon Auto Scaling groups. In 2012, with their Amazon setup, they were able to handle 3,000 concurrent users. That provided some form of scaling, but they found out the hard way that it became quite expensive. By moving back to on-premise servers they were able to reduce their cost by a factor of ten!

They switched from MySQL to MongoDB because they wanted replication and master election. That turned out to be a not so good decision, and they finally switched to PostgreSQL. As Josep Stuhli said:

We were young and naive.

One important feature of PostgreSQL was "minimal locking": MVCC (multiversion concurrency control) is important for SofaScore. They can now do one million SELECT queries and 90 thousand transactions, each consisting of one SELECT, three UPDATEs, and one INSERT. This is all on one database server.
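Why MVCC leads to minimal locking can be illustrated with a toy versioned store. This is a deliberately simplified sketch and not PostgreSQL's actual implementation; the class `MVCCStore` and the key `match:42` are made up for the example. A write never overwrites a value in place but appends a new version, and each reader only sees versions committed up to its snapshot, so reads do not have to wait for writes.

```python
import itertools


class MVCCStore:
    """Toy multiversion store: writers append versions, readers use a snapshot."""

    def __init__(self):
        self._versions = {}                  # key -> [(commit_id, value), ...] ascending
        self._commit_ids = itertools.count(1)
        self._last_commit = 0

    def snapshot(self):
        """Return the newest commit id; a reader sees nothing newer than this."""
        return self._last_commit

    def write(self, key, value):
        """Append a new version instead of overwriting the old one."""
        commit_id = next(self._commit_ids)
        self._versions.setdefault(key, []).append((commit_id, value))
        self._last_commit = commit_id
        return commit_id

    def read(self, key, snapshot):
        """Return the newest value committed at or before the snapshot."""
        for commit_id, value in reversed(self._versions.get(key, [])):
            if commit_id <= snapshot:
                return value
        return None


store = MVCCStore()
store.write("match:42", "0:0")
snap = store.snapshot()                      # a reader starts here
store.write("match:42", "1:0")               # a goal is scored afterwards

old = store.read("match:42", snap)           # the earlier reader still sees "0:0"
new = store.read("match:42", store.snapshot())  # a fresh reader sees "1:0"
```

For a site where scores are updated constantly while a very large number of clients read them, this property matters: the many SELECTs are not held up by the UPDATEs.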

They use beanstalkd.

Now they use Varnish in front of their application servers.

As the access pattern of their website is quite special, i.e., the users come at very specific times rather than randomly spread over the whole day, they were experiencing issues with Cloudflare, which wrongly recognized this traffic as a DDoS attack. They even flew to Cloudflare headquarters to discuss the matter. Cloudflare then wrote special rules for them. With this they were able to reach 300 thousand concurrent users in April 2018. They use Cloudflare as their load balancer.

They use ClickHouse for analytics.

They spun up Varnish servers all over the world to cope with international growth. This reduced wait time in Australia by a factor of eight. In December 2020 they hit one million concurrent users per minute. In October 2021 they grew to 1.5 million concurrent users per minute.

They use NATS messaging.

According to his slides, SofaScore users generate more than 600 TB of data per month and more than 180 billion requests per month. Their growth is phenomenal.
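To get a feeling for these monthly figures, they can be broken down into averages. The short calculation below is a back-of-the-envelope sketch under stated assumptions: a 30-day month, decimal terabytes, and that the 600 TB can be spread evenly over the 180 billion requests. None of these assumptions come from the slides, and real traffic is far burstier than an average suggests.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60    # assumption: 30-day month

requests_per_month = 180e9               # "more than 180 billion" from the slides
bytes_per_month = 600e12                 # "more than 600 TB", assuming decimal TB

avg_requests_per_second = requests_per_month / SECONDS_PER_MONTH
avg_bytes_per_request = bytes_per_month / requests_per_month

print(f"{avg_requests_per_second:,.0f} requests/s on average")     # 69,444
print(f"{avg_bytes_per_request:,.0f} bytes per request on average")  # 3,333
```

So even the monthly average is close to 70 thousand requests per second, at a little over 3 kB each. Since the users arrive around match times, peaks will be well above that average.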