February 22, 2024
In this article, we talk to David Clince, senior developer on the Matching Engine team. He discusses the value Databricks offers developers and customers.
I am a senior developer on our Matching Engine team. The Matching Engine is a key product at Spanish Point that uses a lot of different technologies, but Databricks has become one of the most important over the last couple of years.
Our team first started using Databricks in 2019 to ingest, parse, and load data from a variety of file formats into our system. This process is usually referred to as ETL (Extract, Transform, Load) and is one of the many things Databricks excels at.
A little later in our development, we began using Delta Tables, another feature of Databricks. They are designed to be queried, joined, and written to at scale, which makes them very efficient. We store a copy of our system's data in these tables, and they have been a huge advantage not only for ourselves and the development of the Matching Engine, but also for our customers, who can easily access these tables and use them for their own reports and dashboards.
What is Databricks?
Databricks is a unified, open analytics platform for building, deploying, sharing, and maintaining enterprise-grade data, analytics, and AI solutions at scale. The Databricks Data Intelligence Platform integrates with cloud storage and security in your cloud account, and manages and deploys cloud infrastructure on your behalf.
What is Databricks used for?
Databricks provides tools that help you connect your sources of data to one platform to process, store, share, analyze, model, and monetize datasets with solutions from BI to generative AI.
The Databricks workspace provides a unified interface and tools for most data tasks.
What are the benefits of Databricks?
One of the biggest benefits of Databricks for our development team is the Databricks Spark clusters, and the notebooks we run on them. Apache Spark is built into Databricks and is a data processing framework that can quickly perform processing tasks on very large data sets, distributing the work across multiple computers. A Databricks cluster is essentially a group of Azure virtual machines that are started up and used together to enable Spark’s powerful parallel processing capabilities. Databricks provides a lot of options for configuring these virtual machines to ensure they are optimised for whatever job is at hand. The Matching Engine has several quite intensive jobs that run in the background, and the majority of these run on Databricks because of the power and efficiency provided by these clusters and Spark.
Our team is not the only user of Databricks on this project, though; our customers also use it for their own purposes. One of the biggest benefits for them is the ease of use. There is a built-in SQL editor on the Databricks portal for querying the Delta tables. These queries can easily be transformed into graphs and added to dashboards to be used in reports, all on the same screen with just a few clicks. Databricks SQL works in the same way as traditional SQL (with some extra features that can make it more powerful), which is great because some customers already have experience with SQL, and it is easy to pick up for those who don’t.
The editor itself is also very user-friendly – it will highlight any syntax or reference errors and suggest a correction if it can. Recently, an AI assistant was added that can not only spot and fix any problems in your query but can even create completely new queries for you!
How does Databricks integrate seamlessly with our Matching Engine?
Databricks has become one of the cornerstones of the Matching Engine. One of the main ways our customers load data into the system is through the Data Ingestion module, which is powered by a combination of Databricks and Azure Data Factory. There is also a regularly scheduled job that syncs all of our system's data into our Delta tables, a process Delta tables are designed to make seamless. This ensures the data in the Delta tables is always up to date, which is important not only for our customers who may be querying the data, but also for some of the most important jobs of the Matching Engine that make extensive use of the Delta tables. The performance of these jobs is essential, and our use of the Delta tables and Spark is crucial to keeping the execution time as low as possible.
What do our current customers use Databricks for?
Databricks provides a very straightforward way to quickly query data. The result of a query is returned in a clear table that can easily be turned into a graph for a more visual representation of the data. These tables and graphs can all be added to something called a Dashboard, which is simply a clean way of displaying multiple tables or graphs at once, making for very effective reports. This is one of the main ways our customers use Databricks.
Some customers also use Databricks to transform some of their existing data files into a format that is supported by the Matching Engine so that they can be ingested into the system.
How do interested organisations get started with Databricks?
Databricks is a very large platform and supports a number of vastly different use cases. For example, there is an entire suite of technologies inside Databricks that I have not mentioned at all, designed specifically for Machine Learning, so it can be easy to get a bit overwhelmed when looking at everything Databricks can offer. For organisations considering Databricks, it is important to first understand their current applications and technologies, and which use cases Databricks could address. I recommend attending one of our webinars or having an exploratory consultation with our team.