Looking at data analytics should be a natural step for almost everyone in their working roles today. After all, who doesn’t want more business intelligence supporting their opinions, experiences, and thoughts on how to improve results? Data should aid decision-making and show whether those decisions were correct.
At least, that is the goal. Instead, we find ourselves bombarded with so much data that it can be difficult to find the right data when we need it. According to research from NewVantage Partners, the percentage of companies stating that adoption of big data has been a challenge has risen over the past two years, from around 65% in 2018 to over 73% today.
About the author
Matt Yonkovit, Chief Experience Officer, Percona.
Additionally, we need to consider how we can access that data more quickly. The biggest companies in the world see slow data performance resulting in revenue drops with every additional millisecond. In 2006, Amazon estimated a 1% drop in sales for every extra 100 milliseconds it took to load pages. Just over a decade later, research from Akamai in 2017 found that the same delay would lead to 7% fewer conversions on eCommerce platforms. The need for speed has not gone away. In fact, it now affects more companies than ever.
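To put those figures in concrete terms, here is a minimal Python sketch of the arithmetic. It assumes the loss scales linearly with added latency, which is a simplification rather than anything the Amazon or Akamai figures promise, and the function name and sample numbers are hypothetical.

```python
def estimated_sales_loss(baseline_sales: float, extra_latency_ms: float,
                         loss_per_100ms: float = 0.01) -> float:
    """Estimate sales lost to extra page-load latency.

    Assumption: the loss grows linearly with latency (1% per 100 ms by
    default) and is capped at 100% of baseline sales.
    """
    if extra_latency_ms < 0:
        raise ValueError("extra_latency_ms must be non-negative")
    fraction_lost = min(1.0, loss_per_100ms * extra_latency_ms / 100.0)
    return baseline_sales * fraction_lost


if __name__ == "__main__":
    # Hypothetical shop: 1,000,000 in monthly sales, 250 ms of extra delay.
    print(estimated_sales_loss(1_000_000, 250))
```

Passing `loss_per_100ms=0.07` applies the 2017 conversion figure instead, with the caveat that conversions and sales are not the same measure.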
Building for real time – what do you need to bear in mind?
Finding the right data, and finding it quickly, means we need to build services that work in real time. We need to think about customers’ expectations and how to serve them. And, we have to meet our budgets for the IT infrastructure that underpins those services.
Meeting those three requirements is easier if you are a large enterprise with extensive teams at your disposal. That is generally not the case for smaller businesses or for those with limited resources. Here, it is important to start considering how open source can help.
For companies with limited budgets, open source can be a great option. Operating systems, database software, applications and other stack components can be accessed and used based on the availability of source code. Often, full versions are available for commercial use without licensing costs, and they can help you scale up to meet many of your technical challenges. Open source can help businesses of all sizes manage greater volumes of data.
However, moving to open source alone is not enough to support real-time performance requirements. Instead, you have to look at how you can get the best performance out of all the elements that make up your infrastructure. At the heart of this is your data — or more specifically, your database.
The importance of databases
Picking the right database design at the start can make a huge difference to the success of your operations. The evaluation needs to cover a range of requirements. For example, a NoSQL database like MongoDB can support rapid scaling and deployment, as it can expand quickly and is very easy to implement. However, MongoDB is not widely available as a managed service: you either buy from MongoDB itself or run your own instance.
Alternatively, you might find a relational database is a better fit for your business and your service. The SQL query language is widely used, so finding people that can use this data is easy. Databases like MySQL are easy to run and great where the schema is already defined. Equally, PostgreSQL is a great option for services where data types like time-series or geo-location are needed in real time. PostgreSQL supports SQL, making it easy to support and manage data, but it also handles these data use cases with ease.
Each of these open source databases can support hundreds to thousands of concurrent user requests. However, scaling up to hundreds of thousands, or millions, of simultaneous users can put a lot of pressure on that database instance. It is essential that you explore how to tune your database and improve its performance.
Data in context
Database tuning is an area that requires genuine experience. For most companies, a deep understanding of how databases work and how to tune them is not something they have on staff. Getting good consulting advice can therefore be vital in meeting your real-time service goals. It helps you avoid common mistakes, teaches you how to expand using clustering or other scaling techniques, and gives you the assistance you need to make an informed decision on the right approach for the future.
Equally, it’s worth looking at the rest of your application too. Although your database may be performing well, it might appear to be a bottleneck for requests. If it is not integrated well with other application components like message queues or analysis tools, then slow performance in those areas might affect how quickly service requests get processed. It is worth looking at these other application elements in context, to see how open source offerings can help.
For example, your storage system can affect your performance — every file, including your database, has to sit on some form of storage over time. With more data coming in, using compression to reduce the amount of space your data takes up makes sense, but it can impact performance when you have to read or write using that data.
Storage and compression
A poor approach to storage and compression can slow down performance, costing you customers while you try to manage overheads. Using the open source file system ZFS can address this problem, allowing you to compress your data and save on storage costs while not impacting performance.
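The trade-off between space saved and time spent can be seen in miniature with a short Python sketch. It uses the standard library's `zlib` module purely as a stand-in: it does not touch ZFS, and the sample data and function name are hypothetical. Actual numbers depend heavily on the data and the hardware.

```python
import time
import zlib


def compression_tradeoff(data: bytes, levels=(1, 6, 9)):
    """Return (level, compressed_size, seconds) for each zlib level."""
    results = []
    for level in levels:
        start = time.perf_counter()
        compressed = zlib.compress(data, level)
        elapsed = time.perf_counter() - start
        # Round-trip check: compression must never change the data.
        assert zlib.decompress(compressed) == data
        results.append((level, len(compressed), elapsed))
    return results


if __name__ == "__main__":
    # Hypothetical, highly repetitive sample standing in for database rows.
    sample = b"order_id=1001;status=shipped;region=eu-west\n" * 50_000
    for level, size, seconds in compression_tradeoff(sample):
        ratio = len(sample) / size
        print(f"level={level} size={size} bytes "
              f"ratio={ratio:.1f}x time={seconds * 1000:.2f} ms")
```

Higher levels generally trade more CPU time for smaller output, which is the same balance a storage layer has to strike on every read and write.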
Similarly, looking at how you manage the queue for messages going through a system can improve results overall. Offerings like RabbitMQ can manage the flow of messages through the application over time, helping you scale up the overall volume of requests that your application can handle for the same amount of hardware. By picking up efficiency gains like this, your application can handle more concurrent users in real time for the same level of spend.
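The idea of smoothing the flow of requests through a queue can be illustrated with a small Python sketch. It uses an in-process `queue.Queue` and worker threads as a stand-in for a message broker; it is not RabbitMQ code, and the function name and parameters are hypothetical.

```python
import queue
import threading

_STOP = object()  # unique sentinel telling a worker to shut down


def process_with_queue(requests, handler, workers=2, max_pending=100):
    """Push requests through a bounded queue drained by worker threads."""
    pending = queue.Queue(maxsize=max_pending)  # producers block when full
    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            item = pending.get()
            if item is _STOP:
                return
            try:
                outcome = handler(item)
            except Exception as exc:  # keep the worker alive on bad requests
                outcome = exc
            with results_lock:
                results.append(outcome)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for request in requests:
        pending.put(request)   # blocks while the queue is full (back-pressure)
    for _ in threads:
        pending.put(_STOP)     # one sentinel per worker
    for t in threads:
        t.join()
    return results             # completion order, not submission order


if __name__ == "__main__":
    print(sorted(process_with_queue(range(10), lambda x: x * x, workers=3)))
```

The bounded queue is the design choice that matters here: when requests arrive faster than the workers can handle them, the producer waits instead of overwhelming the components downstream.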
Whatever the size of your business, you need access to the right data, quickly, and being able to serve your customers’ growing needs in real time is essential. This demand used to be a concern only for companies like Google and Amazon. Now every company has to think in real time. Open source technologies can help your business meet that demand.