What is AI? A Primer for Subject Matter Experts

There is a lot of hype around AI today. Its capabilities are often exaggerated, which makes it difficult for subject matter experts to judge how effective AI will be in their projects. This article explains what AI is and offers some guidance on how to select AI solutions.

“What is AI?” A brief definition

While people may argue about the definition of AI, it generally refers to machine learning algorithms. A machine learning algorithm is one where the algorithm, or “model” as it is called in data science, is derived through regression (i.e., curve fitting) over a set of data. A classic machine learning algorithm is purely data-driven.

AI models vs. physics-based algorithms

This contrasts with physics-based algorithms, which are deduced from physics principles. Let’s say we want to predict the fatigue of equipment operating in the field, and we think that the fatigue is a function of the equipment’s speed and pressure as follows:

Fatigue = a[1] * speed ^ n[1] + a[2] * pressure ^ n[2]

In the above, speed and pressure are variables. These are data we can measure via sensors installed inside the equipment. a[1], n[1], a[2], and n[2] are parameters, or weights, as they are called in data science. These are not known. We need to figure out the parameters in order to compute the fatigue.

Using machine learning when accurate physics models are not possible

A physics-based model is derived purely from physics and math. If we know the physics of the equipment, such as material properties and fluid dynamics, then it might be possible to calculate the parameters. If that were the case, we would have a physics-based algorithm.

Yet much of the time the physics is too complex. What can we do then? Assuming we can measure the fatigue of some of the equipment, either in the field or in a laboratory setting, it might be possible to find the parameters via regression. This is a data-driven algorithm, and the process of figuring out the parameters is called model training. Since computers do the training today, the process is called machine learning. Once the parameters are determined, we can use them to calculate the fatigue of all equipment in the field. This is called inference.
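
To make training and inference concrete, here is a minimal sketch in Python of fitting the hypothetical fatigue formula above to data. The dataset is synthetic and the names (fatigue_model, measured_fatigue) are ours, for illustration only:

    import numpy as np
    from scipy.optimize import curve_fit

    # Hypothetical model from the text: a1 * speed^n1 + a2 * pressure^n2
    def fatigue_model(X, a1, n1, a2, n2):
        speed, pressure = X
        return a1 * speed**n1 + a2 * pressure**n2

    # "Training" data: fabricated measurements with a little sensor noise.
    rng = np.random.default_rng(0)
    speed = rng.uniform(1.0, 10.0, 50)
    pressure = rng.uniform(1.0, 5.0, 50)
    measured_fatigue = 2.0 * speed**1.5 + 0.5 * pressure**2.0
    measured_fatigue += rng.normal(0.0, 0.1, 50)

    # Regression (curve fitting) recovers the four parameters.
    params, _ = curve_fit(fatigue_model, (speed, pressure), measured_fatigue,
                          p0=[1.0, 1.0, 1.0, 1.0])

    # "Inference": apply the fitted parameters to new sensor readings.
    print(fatigue_model((np.array([7.5]), np.array([3.2])), *params))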

AI models vs. statistical models

Now you may say, “Wait a minute, how is this any different from the statistics I learned in college?” It is not fundamentally different, really. Machine learning, or AI, is rooted in the same math we already know. There is nothing mysterious about it. 

< 1 > Statistics focuses on data analysis, machine learning focuses on prediction

There are some differences that distinguish machine learning from statistics in practice. The first is that statistics focuses on data analysis. Many of us took data in the lab, did curve fitting, and analyzed whether the data deviated from the derived formula. Machine learning, on the other hand, focuses on prediction. If we can predict the fatigue of a piece of equipment, then we can predict when it will fail. Prediction makes machine learning much more broadly applicable.
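
For example, once the fatigue formula is fitted, prediction might look like the sketch below. The parameters, duty cycle, and failure threshold are all made-up numbers for illustration:

    # Hypothetical fitted parameters and a made-up failure threshold.
    a1, n1, a2, n2 = 2.0, 1.5, 0.5, 2.0
    FATIGUE_LIMIT = 1_000_000.0  # cumulative fatigue at expected failure

    def fatigue_per_cycle(speed, pressure):
        return a1 * speed**n1 + a2 * pressure**n2

    # Predict remaining life for equipment running a fixed duty cycle.
    per_cycle = fatigue_per_cycle(speed=7.5, pressure=3.2)
    print(f"Predicted cycles until failure: {FATIGUE_LIMIT / per_cycle:,.0f}")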

< 2 > Machine learning models can have millions to billions of parameters

The second is the scale of machine learning algorithms. While we did curve fitting with 3 to 4 parameters, machine learning models can have millions to billions of parameters. For example, GPT-5, the model behind ChatGPT, reportedly has around 600 billion parameters. Think about it: ChatGPT is a formula with around 600 billion parameters! This is not something we could derive using physics. At this level of complexity, a model can be far more accurate than the simple formulas we were familiar with.

< 3 > Machine learning leverages big data and incredible compute power

The third is that, driven by this need for prediction, many new machine learning algorithms have been invented to take advantage of the massive amounts of data and the incredible compute power we have today. For example, deep learning is a class of multi-stage, non-linear algorithms that are very good at classifying objects in images. LLMs such as ChatGPT are deep learning models with mechanisms for capturing both short-term and long-term signatures in the training data.
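
To give a feel for what “multi-stage, non-linear” means, here is a toy two-stage network forward pass in NumPy. The weights are random stand-ins; a real deep learning model would learn them from data:

    import numpy as np

    rng = np.random.default_rng(0)

    # Stage 1: linear transform followed by a non-linearity (ReLU).
    W1, b1 = rng.normal(size=(16, 4)), np.zeros(16)
    # Stage 2: another linear transform producing three class scores.
    W2, b2 = rng.normal(size=(3, 16)), np.zeros(3)

    def forward(x):
        h = np.maximum(0, W1 @ x + b1)  # non-linear hidden stage
        return W2 @ h + b2              # output stage

    x = rng.normal(size=4)  # e.g., four sensor readings
    print(forward(x))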

So there is no doubt that AI is very useful. It is a new class of mathematical algorithms that can solve problems we couldn’t solve before. And since AI is still in its infancy, there is huge potential for improvement. But AI is still math, and any mathematical algorithm has limitations. We discuss these in the next section.

What are the limitations of AI?

Now that we know that AI is just a class of purely data-driven mathematical algorithms, it is easy to understand its limitations.

< 1 > Predictions could be wrong if you don’t have all the data you need

If you do not have data representing all the variables you need, your predictions could be wrong. This is a data availability problem. In the simple example from the previous section, perhaps temperature is also an important variable. By not including it, either because we didn’t realize we should or because we could not measure it, our calculated fatigue would likely be wrong.
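
A quick way to see this is to generate data that actually depends on temperature and then fit the two-variable model from earlier without it. All the coefficients below are made up:

    import numpy as np
    from scipy.optimize import curve_fit

    rng = np.random.default_rng(1)
    speed = rng.uniform(1.0, 10.0, 200)
    pressure = rng.uniform(1.0, 5.0, 200)
    temperature = rng.uniform(20.0, 120.0, 200)

    # The "true" fatigue depends on temperature as well...
    true_fatigue = 2.0 * speed**1.5 + 0.5 * pressure**2 + 0.1 * temperature

    # ...but the model we fit omits it.
    def model(X, a1, n1, a2, n2):
        s, p = X
        return a1 * s**n1 + a2 * p**n2

    params, _ = curve_fit(model, (speed, pressure), true_fatigue,
                          p0=[1.0, 1.0, 1.0, 1.0])
    residual = true_fatigue - model((speed, pressure), *params)
    print(f"RMS error from the missing variable: "
          f"{np.sqrt(np.mean(residual**2)):.2f}")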

< 2 > Noise can steer predictions in the wrong direction

The second is that any mistakes or noise in the data can steer the prediction in the wrong direction. This is a data quality problem. Data scientists often spend up to 80% of their time resolving data quality issues, so ensuring data quality is far more time-consuming than building the AI models themselves.
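
Even a single bad reading can visibly shift a fitted model. A tiny least-squares example with synthetic data:

    import numpy as np

    x = np.arange(10.0)
    y = 3.0 * x + 1.0      # clean readings on the line y = 3x + 1
    y_bad = y.copy()
    y_bad[9] = 100.0       # one corrupted sensor reading

    # Least-squares slope and intercept, with and without the bad point.
    print("clean fit:", np.polyfit(x, y, 1))      # ~[3.0, 1.0]
    print("noisy fit:", np.polyfit(x, y_bad, 1))  # pulled toward the outlier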

< 3 > Data-driven models are not explainable

The third is that data-driven models are not explainable. If the training data is incomplete or bad, or if the algorithm is the wrong one, the AI output can be completely wrong or even nonsensical. When this happens, we say the AI “hallucinates.” Physics constraints must be placed around the AI to limit hallucination.
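
One simple form of a physics constraint is a guard that rejects or flags predictions outside a physically plausible range, as in this sketch (the bound is hypothetical):

    # Hypothetical bound: fatigue per cycle can never be negative, and
    # values beyond MAX_PLAUSIBLE indicate a model failure.
    MAX_PLAUSIBLE = 500.0

    def constrain(prediction):
        if prediction < 0 or prediction > MAX_PLAUSIBLE:
            raise ValueError(f"Implausible prediction {prediction}; flag for review")
        return prediction

    print(constrain(42.0))  # passes through
    # constrain(-3.0)       # would raise: physically impossible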

< 4 > Your data scientist’s expertise matters

The fourth is that the expertise of the data scientist matters. Good data scientists can determine and meet the right data quality requirements, choose the best machine learning model for the application, and know how to test the model to ensure accuracy across all application scenarios.

< 5 > The AI model is only a small part of the solution

The fifth is that the AI model is only a small part of an AI solution. According to the Google AI team, the AI model accounts for only about 5% of the software code of an AI solution. The other 95% covers technology infrastructure, data engineering, and application development. Therefore, developing an AI solution is as complex as any other major engineering project. It is risky, it is laborious, and it takes a skilled team to pull it off.

What should you consider when selecting an AI vendor?

Many domain experts are faced with the task of picking an AI vendor. The market is noisy, and plenty of vendors will tell you they can solve your problem with AI. How do you pick the right partner?

Having a proof-of-concept is one thing, being able to scale is another

When speaking with a vendor, it is important to verify a few things. The first is whether they have scaled deployments. It is one thing to have done proofs of concept; it is another to scale production solutions. Ask for customer testimonials or published proof.

Your solution requires more than just a platform

The second is to ask whether they have a solutions team to implement the solution you need. Many vendors, including many big ones, focus on selling platforms rather than solutions. A famous saying in the industry is that a platform gets you 80% of the way there, but the last 20% is 80% of the work. Find a vendor that is willing to build a complete solution for you.

Similarly, there are lots of telltale signs in how a vendor responds to you. If they say, “just give us your data, and we will solve your problem with AI,” then run the other way as fast as you can.

A good vendor will ask you for details about the problem you want to solve, then work with you to figure out what data will be needed. They will ask whether you have that data and how it is generated (to assess data quality), and finally they will give you an honest opinion on the probability of success. Because AI is non-deterministic, there are always uncertainties in the outcome unless the problem has already been solved. A good vendor should be able to tell you where the risks are and how to mitigate them.

Final remarks

AI is a new class of very powerful mathematical algorithms. Despite their limitations, every company needs to explore how AI could elevate its business. Some experiments will fail, so rapid implementation and iteration are key. Data scientists alone are not enough; a good software engineering team is needed to build a complete AI solution. Finally, keep in mind that AI will help your business, but it does not replace your core competency. Your domain expertise is always your most valuable asset.


Further reading

Why did we build a usage-based fatigue model using GenAI technology?

Sensor-based condition-based maintenance (CBM) is critical for prolonging the lifetime of industrial equipment and preventing unplanned downtime. We used the approach described in this primer to develop a usage-based fatigue model that provides accurate predictions for smarter predictive maintenance. Read the blog article.

Transformer vs. Classical Machine Learning: Harnessing Normalization for Enhanced Asset Life Prediction

This paper highlights the development and deployment of a Transformer-based Asset Life Model (ALM) that outperforms previously deployed classical machine-learning approaches for predicting drilling equipment component lifetimes. Presented at the SPE Annual Technical Conference and Exhibition, Houston, Texas, USA, October 2025. Paper Number: SPE-227949-MS. Access through OnePetro.

 