Avoid a stop and rewrite
Last year I attended a CTO summit arranged by one of Vivino’s first investors. The summit featured a number of CTOs from some very successful companies. At one point they were asked to discuss some of their greatest mistakes. I was surprised that the majority of them mentioned “that one software project that we really shouldn’t have rewritten from scratch”. They described how, in trying to ship fast, they left out many things their users wanted, how long it took to get back to where they had been, and how often they burned customers along the way. It’s a classic problem that so many companies run into. I remember reading Joel Spolsky’s “Things You Should Never Do” decades ago; it highlighted that very mistake and how Netscape shot itself in the foot by making it. I have also worked at companies that made the same mistake. Usually it came with a rebranding, where a start-from-scratch system is coupled with a visual redesign to really get everyone excited. It can be very exciting! It is a way to take a team quickly from forming to storming and bring everyone together on a common goal. Until you release and realise no one is happy with your hard work. Your users aren’t, because you removed their favourite features, if not their whole reason for being there. Management isn’t happy either, because it took too long, and having pushed for a release, it finds that the corners cut were important after all. It just never ends well.
In the beginning
I joined Vivino on June 1st, 2013. I had been hired as the chief engineer, one of the first full-time in-house engineers in the company. Taking over the “backend” team, I found a big ol’ hairy mess. Our backend code was written in PHP. And it wasn’t fancy PHP. There was no framework, no tests, no objects and very little structure. Just your plain script pages thrown together to get something on a website. It was both API and website in a single codebase, mixing HTML, CSS, SQL and PHP in every single file. It didn’t perform, it wasn’t secure, it was hard to read and harder to change. Things had to change. First, we needed to be functional. We spent a year focused on modernising the development cycle, improving visibility via logging and monitoring, splitting the web from the API and getting the infrastructure working smoothly, all while continuing to fix bugs and to build and ship new features. Then it was time to look at what might come next.
Exploration
The backend team was great: enthusiastic and eager to learn, they worked hard and delivered. All in PHP. Where could I take us from there? They weren’t experienced object-oriented developers, and that in itself is a skill that takes a few years to really build up. They were good procedural programmers. At the same time, I wanted more stability and more security while retaining our velocity.
While pondering the uncertain future and what it might look like, I heard bold claims about functional languages like Haskell promising drastically fewer bugs in deployed code. I even started reading Real World Haskell, but by the sixth chapter I felt I still didn’t have a real-world solution to anything; it was still laying very abstract foundations. No, I was not going to put my team through that!
Later I remembered Go from an article about Unicode support, which stated that it had some of the best Unicode support of any language at the time. I looked into it and started on Go Dev’s Tutorial page. In a day or two I had most of the core concepts down. I then looked for tools to help me build something real and found a web framework (Revel) and a database tool (Gorm). Both were rather rough around the edges but worked, and a few test controllers later I had some of our simplest APIs working.
The pilot
Simple APIs are just that: simple, and they don’t tell you how far you can take a technology. Something meatier was required. As it happened, this coincided with a new initiative we wanted to launch: an MVP of a wine list scanner that could take a photograph of a restaurant’s wine list and match all the wines on it with those in our database. That would be our pilot project, and the race was on to deliver our first production backend in Go. In the space of a couple of months we created the APIs and set up the infrastructure we needed. Although Go 1.0 had been released only two years earlier (Go had been available in beta since 2009), an impressive suite of libraries was already available. Finding libraries for quick integration with Elasticsearch and the other technologies we used helped us along. Because a Go application compiles to a single binary, deployment can be fairly simple; a 6-line shell script got us off the ground. We delivered a working backend, released in October 2014 using Go 1.2, and the feature shipped in the Vivino app shortly afterwards. The backend worked great and the new feature, while still rough around the edges, made some waves. More than that, it gave us the confidence to continue using Go for our backend development and helped us deliver better results for our users.
The long hustle
With basic foundations and a strong belief in the technology, the long now was ahead of us. In the following year we rewrote some of our key APIs and systems for scalability. The social feed was rewritten and expanded to deliver more capabilities. We started porting more APIs to Go, either as onboarding exercises or as a way to refactor older, poorly performing ones. This continued bit by bit over the years while we launched new features and grew the app and the organisation. One of the key features we rewrote was our scanning API. It had been identified as a bottleneck for high-season scalability (for Vivino, the peaks are Christmas and New Year’s Eve), and the problems weren’t just in the code but in the architecture. The process, even when implemented in PHP, required multiple threads of work and ways to sync state between them. Go was always a far better fit for this, and as our pilot project had proved, it could solve the problem much more gracefully. We went from syncing state through database queries in PHP to doing it in memory with goroutines. We had targeted a 10x improvement, aiming to increase throughput from roughly 150 requests per minute (RPM) to 1,500 RPM. We easily delivered that, and in 2022 it was still the main endpoint for wine label uploads, with peaks exceeding 5,700 RPM on our busiest days. In the years in between, we gradually replaced and eventually retired our whole PHP API stack. The slow evolution over almost a decade meant we never had to stop the world just to solve an engineering issue or a scalability challenge. We managed a healthy mix of working off our tech debt while creating new features and launching new initiatives.
Why Go?
Go is designed for simplicity and optimised for reading over writing. Delivering that is a hard design challenge for a programming language, and we benefited from it: onboarding to the new language was easy, and it delivered on multiple fronts:
- Faster code execution: improved API response times.
- More efficient code: the same servers could handle many times the request volume of the PHP application, reducing costs.
- Compiled, statically typed code catches many bugs before deployment, improving reliability and reducing testing overhead, resulting in a more stable service.
In the beginning, hiring developers was harder: few knew the language, and not everyone was willing to take a deep dive into a new language with an uncertain job market. That quickly changed as Go became more popular, and for our own scale-up journey the timing was good: more developers were picking it up as Vivino grew. In many markets we were one of the few companies offering a sizable Go ecosystem behind a worldwide product at scale, which in itself attracted developers.
Evolution vs Revolution
For me, though, the lessons are fundamentally less about technology choice, where we got lucky and it worked for us, than about how to deal with big, unavoidable changes: avoid stopping the business to rewrite from scratch. When possible, choose an evolutionary path that allows you to replace one system with another over the time frame you can afford. We evolved from one ecosystem to another in a way that didn’t compromise the product or our users. We believe it’s a good model for transitioning complex systems successfully over time, given an engineering organisation that can sustain that vision for the time required. Pacing change to your growth without impeding that growth is, of course, the fundamental challenge for any engineering team transitioning from startup to scale-up and beyond.
It doesn’t always work. If you are burning every day due to scalability issues, you may need a revolution to survive in the long term. Just recall the old saying that the revolution devours its own children; it can be as true in software engineering as it is in politics. Revolutions are easy to start but difficult to stop. Evolve your technology at a pace that supports the needs of the business. If you can afford to do it slowly, do it slowly.