Building Large-Scale Data Solutions with Distributed Functional Block Programming
Six years ago, when we started Prescient, we were technologists with a vision. Big data was on the rise, and the world needed a better way to process data. At the time, the default approach was to develop data solutions in software. That method is still the default today, and it comes with its fair share of challenges:
It is not a fast process. Sure, people call it “agile” because it is faster than developing hardware, but developing production software is slow.
It gets worse as the software grows more complicated. At one point, our chief software developer said he was afraid to change a single line of code because it could cause the whole program to crash.
You’re stuck with the vendor who built it. The software can contain hundreds of thousands, even millions, of lines of code that are hard to understand even for the developers who wrote them. You are effectively locked in to the team that developed it.
This is particularly hard for industrial companies, which face more challenges attracting the best software talent. But what choice do companies have? Software development was, and still is, pretty much the only option. This is why projects are so slow, and when they are slow, the risk of failure is high.
As engineers, we were used to engineering tools such as LabVIEW, MATLAB Simulink, RSLogix, and Cadence/Synopsys. All of these have one thing in common: they support graphical programming using drag-and-drop functional blocks. Why is this approach effective? Because it allows engineers to focus on the functionality of the solution rather than on debugging fundamental software issues. So we thought, why can’t we build large-scale, distributed data solutions this way?
From our advisors to our customers, everyone thought this was a great idea. The only question was, “Is this possible?” It took us a few years, but we did it. We started with open-source tools and fixed reliability and scaling issues along the way. One of the key challenges was supporting distributed data processing across geographical locations (think edge and cloud). We did that by building containerized data engines that can be deployed and managed across physical and logical locations. Then we had to optimize and monitor the CPU and memory usage of each of these containers to support high-speed, high-volume data. After that, we had to build a dashboard solution that rivals the performance of a web application while offering the drag-and-drop development capability we were after.
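To make the resource-monitoring idea concrete, here is a minimal sketch of how per-container CPU and memory usage could be polled, assuming Docker-based engines and the `docker` Python SDK. The threshold and reporting logic are illustrative assumptions, not a description of our actual implementation.

```python
# Minimal sketch: poll CPU and memory usage of running containers.
# Assumes Docker-based data engines and the `docker` Python SDK (pip install docker).
# The 80% alert threshold is illustrative only.
import docker


def container_usage(stats: dict) -> tuple[float, float]:
    """Return (cpu_percent, mem_percent) from a single Docker stats sample."""
    cpu_delta = (stats["cpu_stats"]["cpu_usage"]["total_usage"]
                 - stats["precpu_stats"]["cpu_usage"].get("total_usage", 0))
    sys_delta = (stats["cpu_stats"]["system_cpu_usage"]
                 - stats["precpu_stats"].get("system_cpu_usage", 0))
    cpu_pct = 100.0 * cpu_delta / sys_delta if sys_delta > 0 else 0.0
    mem_pct = 100.0 * stats["memory_stats"]["usage"] / stats["memory_stats"]["limit"]
    return cpu_pct, mem_pct


def report(threshold: float = 80.0) -> None:
    """Print usage for every running container, flagging any over the threshold."""
    client = docker.from_env()
    for container in client.containers.list():
        cpu_pct, mem_pct = container_usage(container.stats(stream=False))
        flag = " <-- over threshold" if max(cpu_pct, mem_pct) > threshold else ""
        print(f"{container.name}: cpu={cpu_pct:.1f}% mem={mem_pct:.1f}%{flag}")


if __name__ == "__main__":
    report()
```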
Once we had proven performance and scale, the choice became a no-brainer. It is so much faster to build, easier to iterate on, and simpler to understand than traditional software solutions. This increases the speed of innovation and improves the rate of success.
We borrowed an approach from engineering and tackled the technology challenges along the way to deliver a solution that is faster and more agile than traditional software development. With functional block diagram programming, solutions stay simple even as they scale and grow more complex. And by building on the open-source tools we started with, we keep the platform open and flexible, so the solution is not tied to any specific vendor.
Today, our solutions process billions of data points and hundreds of millions of database queries, and they run at hundreds of locations for critical operations that customers rely on. We firmly believe this approach will replace traditional software development for building large-scale data solutions. See how we help customers scale their data solutions in our customer stories.