TL;DR — Crowdmap operates on a Rackspace Cloud-based infrastructure. We run nginx, PHP-FPM, MySQL and memcached. We have servers in Dallas, Chicago and Tokyo, and use Amazon's Route 53 (DNS service) to direct users based on their latency to the datacenters. New Crowdmap was designed from the start to be highly bandwidth efficient for mobile users and implements innovative techniques and bleeding edge technologies to achieve this.
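For the curious, the latency-routing piece of that boils down to publishing one DNS record per datacenter, each tagged with the AWS region nearest to it, and letting Route 53 answer each query with the lowest-latency record. Here's a rough sketch using boto3; the hostname, zone ID, addresses and region mapping below are placeholders, not our real configuration:

```python
# Sketch: latency-based DNS records in Route 53 via boto3.
# Each record points at one datacenter's load balancer and is tagged with the
# nearest AWS region; Route 53 answers queries with the lowest-latency record.
import boto3

route53 = boto3.client("route53")

DATACENTERS = [
    # (set identifier, nearest AWS region, load balancer IP) -- placeholder values
    ("dallas",  "us-east-1",      "203.0.113.10"),
    ("chicago", "us-east-2",      "203.0.113.20"),
    ("tokyo",   "ap-northeast-1", "203.0.113.30"),
]

changes = [{
    "Action": "UPSERT",
    "ResourceRecordSet": {
        "Name": "example.crowdmap.com.",   # placeholder hostname
        "Type": "A",
        "SetIdentifier": identifier,
        "Region": region,                  # marks this as a latency record
        "TTL": 60,
        "ResourceRecords": [{"Value": ip}],
    },
} for identifier, region, ip in DATACENTERS]

route53.change_resource_record_sets(
    HostedZoneId="ZEXAMPLE123",            # placeholder hosted zone ID
    ChangeBatch={"Changes": changes},
)
```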
Howdy! I’m Evan Sims and I’m a senior developer at Ushahidi. Aside from getting to build awesome software, I also help oversee our server infrastructure. Today I'd like to talk to you about the technologies and effort that go into keeping Crowdmap online and running smoothly.
At Ushahidi we get excited about building tools that empower individuals, communities and organizations to do amazing things. We’re constantly trying new things and pushing ourselves into new territory, because we know that in the end our tools will turn out the better for it. Crowdmap has been a particularly challenging and rewarding adventure for us. It was our first attempt at a large-scale hosted service, and meant hosting tens of thousands of Ushahidi installations, managing software upgrades, securing servers, ensuring databases were backed up, monitoring performance, providing support and delivering on expectations of availability. It was and continues to be no small endeavor.
Scaling out and growing up
When I joined Ushahidi in December of 2011, my first task was to transition us to a robust hosting platform that would give us room to grow. Crowdmap at the time was with a provider that, while operated by great folks, just wasn’t able to keep up with the service’s exploding growth. I immediately began moving us to Rackspace’s Cloud platform, which we continue to run on today. They’re an amazing company with a stellar team, and I really enjoy working with them. After a few months of building, tuning and rigorous stress testing (I was a bit paranoid about my first task being a horrific, flaming failure), we flipped the switch and Classic’s new infrastructure went live. This new infrastructure comprised a robust load balancer and reverse proxy, two app servers, and a MySQL replication cluster with a total of 16GB of RAM. We also switched from Apache to nginx and PHP-FPM, which was a tremendous boost in and of itself. This was also the first time Crowdmap had true redundancy and fail-over support. There were growing pains, of course; configurations had to be tuned over the year that followed to fix edge cases and allow extremely large deployments to continue running as smoothly as possible, but overall the move went off splendidly.
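That replication cluster is also the piece we keep the closest eye on. As a rough illustration of the kind of health check involved (pymysql, the credentials and the lag threshold here are my own sketch, not our actual tooling):

```python
# Sketch: check that a MySQL replica is replicating and not lagging too far
# behind its master. Threshold and library choice are illustrative only.
import pymysql

MAX_LAG_SECONDS = 30  # arbitrary threshold for this sketch

def replica_is_healthy(host: str, user: str, password: str) -> bool:
    conn = pymysql.connect(host=host, user=user, password=password)
    try:
        with conn.cursor(pymysql.cursors.DictCursor) as cur:
            cur.execute("SHOW SLAVE STATUS")
            status = cur.fetchone()
    finally:
        conn.close()

    if not status:
        return False  # replication isn't configured on this host

    io_running = status.get("Slave_IO_Running") == "Yes"
    sql_running = status.get("Slave_SQL_Running") == "Yes"
    lag = status.get("Seconds_Behind_Master")

    return io_running and sql_running and lag is not None and lag <= MAX_LAG_SECONDS
```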
Going global with Version 3
A few months ago we launched a completely new version of Crowdmap, written from the ground up to be something new and different from the core platform. This was an interesting development journey (which I wrote about recently over on my Medium account), but when it came time to launch it meant expanding our server infrastructure.
I am the Keymaster, are you the Gatekeeper?
Every byte counts.
We built New Crowdmap from a blank slate, and in doing so re-evaluated everything. Classic was a monolithic platform that emphasized volume and management from a deployer’s perspective, and it fits that need perfectly. We wanted to take a slightly different approach for this reboot, however: something more approachable to everyday users, something that emphasized the importance of content ownership and removed all friction from the posting process. That meant mobile had to be a priority for us.

Mobile is a very tricky beast. Aside from the quirks of different browsers and the variation in capabilities across operating systems, there is the issue of network performance. Most people in the world aren’t breezing along on LTE; they have to make do with EDGE or 3G: extremely narrow pipes. Worse still is the latency overhead of mobile connections. Your phone is constantly hopping between cell towers, re-establishing its data connection and dealing with packet corruption.

We knew we wanted to deliver stunning high-resolution photos to users, but we also had to accommodate users on slower connections. That seems like a contradiction, but it really just means we have to be smart about what we send to users in these different conditions.

We do a lot of leg work with every photo uploaded to us. We generate four different sizes of the photo and deliver only the one appropriate for the device’s screen resolution. We run a stripping process that removes every unnecessary byte of metadata from the produced images (and, of course, sensitive EXIF metadata like embedded geolocation stamps). We convert PNGs into JPGs, since we don’t need the transparency and JPG offers better compression options. We convert to progressive JPG to avoid Mobile WebKit’s resolution limitations and to let users see a preview of media as it loads. We also produce alternative versions of each photo resolution using Google’s WebP format, which offers insanely good compression ratios – often 50% smaller than JPG – and deliver those when the device supports it. I built a library called uxImage (available here) that helps make it all work on the client side.
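If you're curious what that server-side pipeline looks like in practice, here's a rough sketch using Pillow; the rendition widths, quality settings and file names are illustrative rather than our production values:

```python
# Sketch of a server-side image pipeline: multiple sizes, metadata stripped,
# progressive JPEG output, plus a WebP alternative for browsers that support it.
# Sizes and quality values below are illustrative, not Crowdmap's settings.
from pathlib import Path
from PIL import Image, ImageOps

TARGET_WIDTHS = (320, 640, 1280, 2048)  # one rendition per rough device class

def process_upload(src: str, out_dir: str) -> None:
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    with Image.open(src) as original:
        # Apply the EXIF orientation flag, then re-encode from raw pixels so
        # EXIF (including GPS coordinates) and other metadata never reach
        # the derived files.
        upright = ImageOps.exif_transpose(original)
        clean = Image.new("RGB", upright.size)
        clean.paste(upright.convert("RGB"))

    for width in TARGET_WIDTHS:
        if width > clean.width:
            continue  # never upscale
        height = round(clean.height * width / clean.width)
        rendition = clean.resize((width, height), Image.LANCZOS)

        # Progressive JPEG: loads as a coarse preview that sharpens, and
        # sidesteps Mobile WebKit's limits on very large baseline JPEGs.
        rendition.save(out / f"photo_{width}.jpg", "JPEG",
                       quality=80, progressive=True, optimize=True)

        # WebP alternative, served only when the client advertises support.
        rendition.save(out / f"photo_{width}.webp", "WEBP", quality=75)

if __name__ == "__main__":
    process_upload("upload.png", "renditions")
```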
I feel the need for speed!
Final thoughts
Infrastructure and application scalability are complex and constantly evolving issues, as anyone in the industry will tell you. Every single day we’re finding new ways to optimize, squeeze out a little more performance, and improve service reliability. Since our transition to our new infrastructure we haven’t suffered a major incident of downtime, thanks to careful planning and a strong backup and recovery strategy. I'm really happy with how far we've come with our infrastructure, and I'm excited to think about where we'll be a year from now.

Feel free to tap my shoulder on Twitter, App.net or Google+ if you'd like to talk about any of this in more detail. You can also reach me at evan@ushahidi.com, but I try to avoid my inbox as much as humanly possible.