Wednesday, April 13, 2011

Building the next scalable system

Scalability has always been a nightmare for most developers. If you are thinking of building a web application that millions of users will access at the same time, you are right to worry about how scalable your design is.

In the few years that I have been developing software, I have gathered a few tips on how to design scalable systems. Last night, during a 'blue bulb' session with my roommate (PK) and Emma, there was a short discussion on scalability, so today I decided to put some of those tips together. Here they are.

Think Stateless
The problem with maintaining state, especially in memory, is that the memory of a single machine is hard to scale. Memory is meant for processing, not for storing data between requests. This is why many people think scripting languages (like PHP and Python) scale better than compiled languages (like Java). This is not wholly true. The problem is the mindset of developers who have used compiled languages. With languages like C# and Java, the application typically stays alive between requests, so you can keep data in memory from one request to the next. A typical PHP script does not work that way: once the script is done, all its variables are released and the memory is free to serve the next request.

At best, try to make (and treat) each request as independent as possible. That way, your server's memory consumption does not grow with the number of logged-in users. Let's say you maintain 100 KB of state in memory for each user logged in to your system. If you have 3M users logged in, you will need 100 KB x 3M, roughly 300 GB, of memory to keep your system from going down.
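
The arithmetic can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the function name is made up for illustration, and it assumes decimal units (1 KB = 1000 bytes).

```python
def session_memory_bytes(per_user_bytes: int, logged_in_users: int) -> int:
    """Total memory needed if every logged-in user keeps state in RAM."""
    return per_user_bytes * logged_in_users


KB = 1000        # assumption: decimal kilobyte (use 1024 for binary units)
GB = 1000 ** 3   # decimal gigabyte

total = session_memory_bytes(100 * KB, 3_000_000)
print(f"{total / GB:.0f} GB")  # 300 GB
```

The number grows linearly with logged-in users, which is exactly why per-user state in memory becomes a problem.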

From my experience it is BETTER to maintain state in a database (if you have to maintain state at all). Why? Because you can scale your database across different servers on different machines, whereas sharing memory across machines is difficult and complex.
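
Here is a minimal sketch of what keeping state in a database could look like. The table name, the function names and the "visits" counter are all hypothetical. It uses Python's built-in sqlite3 module with an in-memory database purely so the example runs on its own; in a real deployment you would point it at a shared database server instead.

```python
import json
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the state store and make sure the (hypothetical) table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS session_state ("
        "user_id TEXT PRIMARY KEY, data TEXT NOT NULL)"
    )
    return conn


def save_state(conn: sqlite3.Connection, user_id: str, state: dict) -> None:
    """Write the user's state to the database as JSON."""
    conn.execute(
        "INSERT OR REPLACE INTO session_state (user_id, data) VALUES (?, ?)",
        (user_id, json.dumps(state)),
    )
    conn.commit()


def load_state(conn: sqlite3.Connection, user_id: str) -> dict:
    """Read the user's state back; an unknown user gets an empty state."""
    row = conn.execute(
        "SELECT data FROM session_state WHERE user_id = ?", (user_id,)
    ).fetchone()
    return json.loads(row[0]) if row else {}


def handle_request(conn: sqlite3.Connection, user_id: str) -> int:
    """Stateless handler: load state, update it, save it, keep nothing."""
    state = load_state(conn, user_id)
    state["visits"] = state.get("visits", 0) + 1
    save_state(conn, user_id, state)
    return state["visits"]


conn = open_store()
handle_request(conn, "pk")
print(handle_request(conn, "pk"))  # 2
```

Nothing about the user survives in the handler between calls, so any server that can reach the database can serve the next request.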

Tips
1. Use classes for data modelling
2. Decoupling is better. Make sure you don't have complex associations between your classes. For instance, avoid using a List or LinkedList to associate classes.
3. Do not put data access methods in a data model class; use a static manager class or a manager class with static methods.
4. As far as possible, make use of the power of your DBMS. For instance, use MySQL to perform searches instead of looping through a list or hashmap in memory.
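
To make tips 1, 3 and 4 concrete, here is a rough sketch in Python. The Player class, the PlayerManager class and the players table are invented for illustration, and sqlite3 stands in for MySQL only so that the example is self-contained.

```python
import sqlite3
from dataclasses import dataclass
from typing import List


@dataclass
class Player:
    """Plain data model: fields only, no data access code (tips 1 and 3)."""
    id: int
    name: str
    score: int


class PlayerManager:
    """Manager with static methods; it holds no state of its own (tip 3)."""

    @staticmethod
    def create_table(conn: sqlite3.Connection) -> None:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS players ("
            "id INTEGER PRIMARY KEY, name TEXT NOT NULL, score INTEGER NOT NULL)"
        )

    @staticmethod
    def add(conn: sqlite3.Connection, name: str, score: int) -> Player:
        cur = conn.execute(
            "INSERT INTO players (name, score) VALUES (?, ?)", (name, score)
        )
        conn.commit()
        return Player(cur.lastrowid, name, score)

    @staticmethod
    def find_by_min_score(conn: sqlite3.Connection, min_score: int) -> List[Player]:
        # Tip 4: let the database filter and sort instead of looping in memory.
        rows = conn.execute(
            "SELECT id, name, score FROM players "
            "WHERE score >= ? ORDER BY score DESC",
            (min_score,),
        ).fetchall()
        return [Player(*row) for row in rows]


conn = sqlite3.connect(":memory:")
PlayerManager.create_table(conn)
PlayerManager.add(conn, "PK", 120)
PlayerManager.add(conn, "Emma", 80)
PlayerManager.add(conn, "Robert", 200)

top = PlayerManager.find_by_min_score(conn, 100)
print([p.name for p in top])  # ['Robert', 'PK']
```

The model class stays a simple container, and the search is a single query that the database can index and optimise.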



Use client resources for accessing external services

If possible, use the client machine for accessing external services. Loading JavaScript libraries like jQuery from a content distribution network (a CDN, such as Google's or Microsoft's) saves your server bandwidth. Also, if you are developing an application that interacts with social network sites like Facebook or Twitter, it's better to use client-side libraries than server-side libraries.

Let's assume you are building a simple app that pulls a user's friends from Facebook and displays their profile pictures. Let's say the user has 5000 friends on Facebook. It would be unwise to retrieve the user's friends on the server side. The better way is to use the Facebook JavaScript library and connect to Facebook from the client's machine.

If possible, try to learn a scripting language (like Python or PHP)
Scripting languages force the developer to think in terms of scalability. I was part of a team that developed a mobile game in Java ME with a backend in Servlets. We relied heavily on memory for maintaining the state of games in play, and it worked fine, because Java (a compiled language) supports that. We were not forced to think in terms of scalability. This is by no means the fault of Java (Java scales); we had a bad design, and we got away with it because we were using Java.

So the point is, if you have experience with languages like Python, you will be forced to think about scaling from the onset. I rewrote the backend of the game in Python and it scales.

I hope the tips were useful.
Special thanks to PK Anane, Emmanuel and Robert.