# Artificial Intelligence
Finally, we reach the topmost level of the hierarchy of a digital system. Right on top of the many layers we explored in all other notes, from [[Physical Layer|signals]] to [[Printed Circuit Boards|boards]] and [[Units, Chassis and Racks|units]], from racks and [[Data Centers and "The Cloud"|data centers]] to [[Software|software]], lies a layer that we can, for now, collectively call Artificial Intelligence (AI). It appears as a bit of an overarching, *magical* thing, although it also shows an internal decomposition into constituent parts. AI is not a _thing_ but a category: a superset of techniques and technologies we will explore below.
But what exactly is AI? In general, ==AI refers to the simulation of human intelligence in machines designed to perform tasks that typically require human cognition. These tasks include reasoning, learning, problem-solving, perception, and language understanding. AI systems use data and algorithms to make decisions or predictions, often improving their performance through experience.== As said, AI encompasses various technologies:
- Machine learning focuses on training algorithms to identify patterns in data and make predictions.
- Deep learning, a subset of machine learning, uses neural networks to process large amounts of complex data.
- Natural language processing (NLP) enables machines to understand and generate human language.
- Computer vision allows systems to interpret visual information from the world.
- Robotics integrates AI to enable machines to interact physically with their environment.
- Other components include expert systems for decision-making and knowledge representation for structuring data logically.
Together, these technologies form the foundation of AI applications across industries. Let's expand a bit on each of them.
## Machine Learning (ML)
ML is a core area of AI that enables systems to learn and improve from data without being explicitly programmed. Algorithms are trained on datasets to identify patterns and make predictions. ML techniques are often categorized into supervised learning, where the model is trained on labeled data; unsupervised learning, which deals with finding hidden structures in unlabeled data; and reinforcement learning, where agents learn optimal behavior by interacting with their environment through rewards and penalties.
A good crash course on Machine Learning can be found here: https://developers.google.com/machine-learning
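To make the supervised-learning idea concrete, here is a minimal sketch using scikit-learn (assumed available). The dataset is synthetic and every choice in it (sizes, model, metric) is an illustrative assumption, not a recipe.

```python
# A minimal supervised-learning sketch: train a classifier on labeled data,
# then evaluate it on data it has never seen. Synthetic data, illustrative only.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Generate a labeled dataset (features X, labels y)
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Hold out part of the data to evaluate generalization
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# "Training": the model learns patterns from labeled examples
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# "Prediction": the trained model labels data it has not seen before
print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```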
> [!info] A Real-Life Example: Training a Self-Driving Car Model
>
> Consider a company developing an AI model for self-driving cars. The process starts with collecting massive amounts of driving data from sensors, cameras, and LiDAR systems installed on test vehicles. This data is then uploaded to a data center, where it is cleaned and labeled. For example, objects like pedestrians, stop signs, and traffic lights need to be identified so the model can learn how to recognize them.
> With the dataset prepared, training begins in a GPU-powered cluster. The model is fed millions of images and sensor readings, learning to detect obstacles, predict vehicle paths, and make driving decisions. Advanced systems, such as NVIDIA’s DGX servers or Google’s TPU pods, process this data in parallel, reducing training time. After multiple training cycles and fine-tuning, the model is tested on unseen data to evaluate its accuracy. If necessary, adjustments are made, and training is repeated until performance reaches the desired level.
> Once the model is ready, it is deployed to self-driving vehicles. Instead of using massive data center GPUs, these vehicles run a smaller, optimized version of the model on specialized edge GPUs, such as NVIDIA Jetson chips. This allows the car to make fast driving decisions in real time without relying on an internet connection.
## Deep Learning
Deep Learning is a subset of machine learning that uses artificial neural networks inspired by the human brain. These networks consist of multiple layers (hence the term "deep") that process data hierarchically. Deep learning works well in processing large volumes of unstructured data such as images, text, and audio. It powers applications like image recognition, speech-to-text systems, and autonomous vehicles.
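As a rough sketch of what "deep" means in practice, the snippet below stacks a few layers with PyTorch (assumed installed). The layer sizes are arbitrary placeholders; real models have far more layers and parameters.

```python
# A minimal "deep" network: several stacked layers that transform data hierarchically.
# PyTorch is assumed to be installed; shapes and sizes are arbitrary, for illustration.
import torch
import torch.nn as nn

model = nn.Sequential(                # layers are applied one after another
    nn.Linear(784, 256), nn.ReLU(),   # first hidden layer
    nn.Linear(256, 64), nn.ReLU(),    # second hidden layer
    nn.Linear(64, 10),                # output layer (e.g., 10 classes)
)

x = torch.randn(32, 784)              # a batch of 32 fake "images" flattened to 784 values
logits = model(x)                     # forward pass: raw class scores
print(logits.shape)                   # torch.Size([32, 10])
```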
## Natural Language Processing (NLP)
NLP focuses on enabling machines to understand, interpret, and generate human language. Applications include language translation, sentiment analysis, text summarization, and chatbots. NLP combines linguistic rules with machine learning to process tasks such as tokenization, parsing, and semantic understanding. Technologies like transformers (for instance GPT models) have revolutionized NLP by improving contextual understanding and text generation. See [this](https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work/) for a great intro to Large Language Models.
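A toy illustration of the earliest steps of an NLP pipeline, tokenization and a bag-of-words count, in plain Python. Modern systems use learned subword tokenizers and embeddings instead, so treat this only as a conceptual sketch.

```python
# Two basic NLP steps: tokenization and a bag-of-words count.
# Pure Python; real NLP pipelines use subword tokenizers and learned representations.
from collections import Counter
import re

text = "NLP enables machines to understand and generate human language."

# Tokenization: split raw text into lower-cased word tokens
tokens = re.findall(r"[a-z]+", text.lower())

# Bag of words: how often each token appears (a very crude text representation)
bow = Counter(tokens)
print(tokens)
print(bow.most_common(3))
```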
## Computer Vision
Computer vision involves teaching machines to interpret and analyze visual information from the world, such as images or videos. It enables applications like facial recognition, object detection, image classification, and medical imaging. Techniques in computer vision include convolutional neural networks (CNNs), which are particularly effective for image analysis, and generative models for creating visual content.
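The sketch below shows the basic building block named above, a convolutional layer scanning an image for local patterns, using PyTorch (assumed installed) on a random tensor standing in for an image.

```python
# A minimal sketch of the building block behind most computer-vision models:
# a convolutional layer followed by pooling. Illustrative only.
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
pool = nn.MaxPool2d(2)

image = torch.randn(1, 3, 64, 64)   # one fake RGB image, 64x64 pixels
features = pool(torch.relu(conv(image)))
print(features.shape)               # torch.Size([1, 16, 32, 32]): 16 feature maps, downsampled
```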
## Robotics
Perhaps the most familiar set of technologies under the AI umbrella. Robotics closes the gap between purely algorithmic contexts and the physical world by integrating AI into physical machines, enabling them to perform tasks autonomously or semi-autonomously. Robotics often incorporates computer vision, natural language processing, and decision-making algorithms to interact with and adapt to the physical world. Examples include industrial robots, autonomous drones, and service robots that assist in healthcare or domestic settings.
## Expert Systems
Expert systems are designed to mimic human expertise in specific domains. They use knowledge bases of facts and rules, along with inference engines, to provide recommendations or solutions. Applications include medical diagnosis systems, financial decision-making tools, and fault diagnosis in engineering.
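A toy forward-chaining expert system in plain Python can make the "knowledge base plus inference engine" idea tangible. The facts and rules below are invented purely for illustration.

```python
# A toy expert system: a knowledge base of facts, a set of if-then rules,
# and a naive forward-chaining inference engine. Purely illustrative.
facts = {"engine_hot", "coolant_low"}

# Each rule: (set of required facts, fact to conclude)
rules = [
    ({"engine_hot", "coolant_low"}, "possible_coolant_leak"),
    ({"possible_coolant_leak"}, "recommend_inspection"),
]

# Forward chaining: keep applying rules until no new facts are derived
changed = True
while changed:
    changed = False
    for conditions, conclusion in rules:
        if conditions <= facts and conclusion not in facts:
            facts.add(conclusion)
            changed = True

print(facts)  # includes the derived recommendations
```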
## Knowledge Representation
Knowledge representation focuses on how information is structured and stored in a machine to facilitate reasoning and decision-making. Techniques like semantic networks, ontologies, knowledge graphs, and logical reasoning frameworks are used to represent relationships between concepts. Knowledge representation is fundamental for expert systems and NLP applications.
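As a minimal sketch, a knowledge graph can be represented as subject-predicate-object triples and queried with a few lines of plain Python. Real systems rely on ontologies, RDF/SPARQL, or graph databases; the triples below are invented for illustration.

```python
# A tiny knowledge graph as subject-predicate-object triples, plus a trivial query.
triples = [
    ("dog", "is_a", "mammal"),
    ("mammal", "is_a", "animal"),
    ("dog", "has", "four_legs"),
]

def objects_of(subject, predicate):
    """Return everything related to `subject` via `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]

def categories(entity):
    """Simple reasoning: follow 'is_a' links transitively."""
    result = []
    frontier = objects_of(entity, "is_a")
    while frontier:
        current = frontier.pop()
        result.append(current)
        frontier.extend(objects_of(current, "is_a"))
    return result

print(categories("dog"))  # ['mammal', 'animal']
```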
## Generative AI
GenAI is a growing subcategory that focuses on creating new data, such as images, code, music, or text, from existing datasets. Generative AI is used in applications like creating realistic art, synthetic data generation, and enhancing creativity tools.
A good introductory course on GenAI can be found here: https://www.cloudskillsboost.google/paths/118?utm_source=cgc&utm_medium=website&utm_campaign=evergreen
> [!Warning]
> This section is under #development
## Inference
In the field of AI, inference is the process that a trained machine learning model uses to draw conclusions from brand-new data. An AI model capable of making inferences can do so without examples of the desired result. In other words, inference is an AI model in action.
An example of AI inference would be a self-driving car that is capable of recognizing a stop sign, even on a road it has never driven on before. The process of identifying this stop sign in a new context is inference.
### AI Inference vs Training
- **Training** is the first phase for an AI model. Training may involve a process of trial and error, or a process of showing the model examples of the desired inputs and outputs, or both. At its essence, training involves feeding AI models large data sets. Those data sets can be structured or unstructured, labeled or unlabeled. Some types of models may need specific examples of inputs and their desired outputs. Other models—such as deep learning models—may only need raw data. Eventually the models learn to recognize patterns or correlations, and they can then make inferences based on new inputs. As training progresses, developers may need to fine-tune the models: they may have a model produce some inferences right after the initial training process and then correct the outputs. Imagine an AI model has been tasked to identify photos of dogs in a data set of pet photographs. If the model instead identifies photos of cats, it needs some tuning.
- **Inference** is the process that follows AI training. The better trained a model is, and the more fine-tuned it is, the better its inferences will be — although they are never guaranteed to be perfect.
To get to the point of being able to identify stop signs in new locations, machine learning models go through a process of training. For the autonomous vehicle, its developers showed the model thousands or millions of images of stop signs. A vehicle running the model may have even been driven on roads (with a human driver as backup), enabling it to learn from trial and error. Eventually, after enough training, the model was able to identify stop signs on its own.
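The split between an expensive, one-off training phase and many cheap inference calls can be sketched as follows, assuming scikit-learn and joblib are available; the data is synthetic and the file name is an arbitrary placeholder.

```python
# Training happens once and is expensive; inference reuses the frozen model on new inputs.
# Sketch with scikit-learn and joblib (both assumed available); data is synthetic.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
import joblib
import numpy as np

# --- Training phase (typically done in a data center, on large labeled datasets) ---
X, y = make_classification(n_samples=5000, n_features=30, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X, y)
joblib.dump(model, "model.joblib")      # ship the trained artifact to the field

# --- Inference phase (on the deployed system, possibly on edge hardware) ---
deployed = joblib.load("model.joblib")
new_sample = np.random.default_rng(1).normal(size=(1, 30))   # data never seen in training
print("Predicted class:", deployed.predict(new_sample)[0])
```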
> [!attention]
> Training cost in machine learning refers to the computational resources and time required to train a model. It includes several factors such as the amount of data used, the complexity of the model, and the hardware required for processing. A more complex model with millions or billions of parameters, like deep neural networks, requires significantly more computation compared to simpler models like linear regression.
> Another aspect of training cost is the amount of energy consumed by GPUs during the training process. Training large models can be expensive in terms of electricity and infrastructure, especially when running on cloud platforms where pricing depends on the number of compute hours used. The cost also includes memory usage since storing large datasets and maintaining model weights require high-capacity RAM and storage solutions.
> From a practical perspective, training cost can be measured in financial terms, where organizations budget for computing power, cloud resources, and data storage. Optimizing training cost often involves using techniques like early stopping, model pruning, or distributed training to make the process more efficient. Reducing training time without sacrificing model performance is a key challenge in machine learning development.
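As a hedged back-of-the-envelope exercise, the snippet below estimates training cost from a commonly cited rule of thumb (roughly 6 x parameters x tokens floating-point operations for transformer training). Every number in it is an assumption chosen only to show the arithmetic, not a quote of any real project.

```python
# Back-of-the-envelope training-cost estimate using the ~6 * params * tokens FLOPs rule of thumb.
# All numbers below are assumptions chosen only to show the arithmetic.
params = 7e9              # 7-billion-parameter model (assumption)
tokens = 1e12             # 1 trillion training tokens (assumption)
flops_needed = 6 * params * tokens

gpu_flops = 300e12        # ~300 TFLOP/s sustained per GPU (assumption)
gpu_utilization = 0.4     # realistic utilization is well below peak (assumption)
gpu_seconds = flops_needed / (gpu_flops * gpu_utilization)
gpu_hours = gpu_seconds / 3600

cost_per_gpu_hour = 2.5   # assumed cloud price in USD
print(f"GPU-hours: {gpu_hours:,.0f}")
print(f"Estimated cost: ${gpu_hours * cost_per_gpu_hour:,.0f}")
```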
## Time-Series Data and Forecasting
Complex systems generate tons of data. This data is generated by the mere act of the system running. It may be as simple as one bit indicating the status of a valve or as complex as LiDAR data from an autonomous vehicle. Now, imagine that someone comes and puts this plot right in front of you:

And imagine this person does not say a single word about what the plot represents. Well, that curve could be anything: a measurement of temperature from an industrial process, the inflation rate of a country, the price of some stock, the blood pressure of a panda bear, or the deformation of a beam in a particular direction. As you start looking at it, you can start to see some features. You can see a change of trend at some point—it was more or less flat in the first third, then started to ramp up. You can observe some spikes here and there. Still, this plot says very little: we don’t even know what the axes represent. Let’s add more information to the mysterious plot:

Now we see the plot comes from Google Trends (you can probably recognize the format), and you can see that this is a plot that represents searches over time—in Google terms, _interest_ over time—about something. This plot now starts to say a lot more: it represents how much Google users have been googling about that something, in a time range. But we remain largely ignorant: we do not know how many people are searching (is it one person or millions?), and we do not know the topic searched. We do know more about the axes now: the x-axis is time, and the range of this plot goes from 1 Jan 2004 until somewhere in 2018.
This is, then, time-series data. Time-series data is a collection of discrete observations, each accompanied by a timestamp that clearly indicates when that sample was obtained. It is very simple and typical to plot time-series data versus time; this means the y-axis is the value of the variable, and the x-axis is the time at which the sample of the variable was obtained.
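A minimal sketch of what time-series data looks like in code, assuming pandas and matplotlib are available: timestamped samples of one variable, plotted with the value on the y-axis and time on the x-axis. The data is synthetic.

```python
# Time-series data: timestamped samples of one variable, stored in a pandas Series
# and plotted against time. Synthetic data, illustrative only.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

timestamps = pd.date_range("2004-01-01", "2018-12-31", freq="W")   # weekly samples
rng = np.random.default_rng(0)
values = 50 + np.linspace(0, 30, len(timestamps)) + rng.normal(0, 5, len(timestamps))

series = pd.Series(values, index=timestamps, name="interest")
series.plot()                      # y-axis: the variable, x-axis: the sample time
plt.xlabel("time")
plt.ylabel("value")
plt.show()
```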
Let’s complete the plot now for clarity:

The plot is about worldwide Google searches of the animal “Dog”, from 2004 to 2018. Now you could start to believe that you know everything you need to know about the data that originated the plot. But do you? Well, no. The data is still leaving questions unanswered. All you know is that people searched for the animal “Dog” more or less steadily from 2004 until somewhere in late 2009, when the trend mysteriously changed. Why did the trend change? Here lies the core of the challenge with time-series data analysis, whether it’s Google searches about pets or telemetry from an autonomous vehicle. It is one thing to see that something has changed; it is a different story to know why it changed, i.e., the dynamics behind it. You feel ill, you grab a thermometer, you can see you have a fever (you detect there’s a change compared with your “normal” temperature, and you also feel it), but that’s not enough to solve the issue; the key is to know why your body temperature has increased.
Digital systems, internally, are networks of computers and electronic devices. And because many of such systems are teleoperated artifacts—meaning that once fielded, you can only get to know their status through measurement data—operators rely entirely on time-series data collected as the object works: currents, voltages, temperatures, different types of counters, binary status bits. Here, it is important to observe the following: measurements represent discrete “snapshots” of physical variables, but the physical variables themselves we shall never be able to observe directly. It is only through transducers (aka sensors)—devices sensitive one way or another to those physical variables of interest—that we can obtain indirect samples of such variables. For example: a temperature sensor does not output a temperature but a voltage, a current, or a digital word that represents the temperature. When you check the liquid-in-glass thermometer outside your window to see how you need to dress on a cold morning, what you are reading is the expansion/contraction of a liquid in a capillary, all put on a convenient scale which makes you believe you are reading a temperature; but you are not, nor will you ever be able to read a temperature directly.
A problem starts to reveal itself: because there will always be something between our eyes and the real physical variables, measurement devices such as sensors can—and will—introduce noise and artifacts which are not present in the physical variable being monitored. But how do we know? A faulty sensor may interfere with the time-series data by introducing a trend that is not present in the variable it is sensing, or out-of-family samples that might look like very serious, although absurdly brief, failures. The plot thickens (pun intended).
Is time-series data generated from purely stochastic processes? Most physical processes in the real world involve a random element in their structure. A stochastic process can be described as _a statistical phenomenon that evolves in time according to probabilistic laws_. Time-series data generated on board a system is a combination of deterministic processes (a power system, batteries, thermal control, star tracker sensors observing the sky) and a random part that stems from the non-ideal nature of sensing devices, thermal noise, etc.
Time-series data can show different types of variations: seasonality, trends, and some other cyclic variations. Let’s use Google Trends again to exemplify with a trivial and topical one: Christmas.

With little surprise, we see that the term “Christmas” is very popular once per year, peaking at the end of December. In digital systems, there are plenty of variables that are Christmas-like variables: variables with strong seasonality. What marks a season on a system? It can be temperatures or any cycle associated with heavy/light use aligned with rush hours or weekdays.
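One simple, hedged way to surface a Christmas-like seasonality is to average the series by calendar month, as in the sketch below (pandas assumed available, data synthetic with a December peak deliberately baked in).

```python
# Detecting a Christmas-like yearly seasonality: average the series by calendar month.
# Synthetic data with a December peak baked in, purely for illustration.
import numpy as np
import pandas as pd

idx = pd.date_range("2004-01-01", "2018-12-31", freq="W")
rng = np.random.default_rng(2)
searches = pd.Series(20 + rng.normal(0, 2, len(idx)), index=idx)
searches[idx.month == 12] += 60          # the "Christmas" peak

monthly_profile = searches.groupby(searches.index.month).mean()
print(monthly_profile.round(1))          # month 12 stands out clearly
```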
Let’s see now a bit of a less obvious example of seasonality, using the term “Football”:

There are spikes all over the place, but those in June 2006, June 2010, June 2014, and June 2018 are noticeably higher. What is the reason behind those spikes? You probably guessed it right: FIFA World Cups. Now, we should note that people around the world like different types of football: American football, Australian football, Gaelic football, etc. All those searches are most likely also part of this plot. If we were strictly searching for what is also called “soccer”, all those other “footballs” can be considered noise, but they can’t be ignored^[The plot shows an interesting “negative” spike around April 2020. Most likely related to the COVID-19 pandemic and to the fact most football leagues around the world went to a full stop during lockdown.]. Time-series data in a digital system can show plenty of world-cup-like spikes. For example, when subsystems are turned on or off, there can be peaks associated with capacitors charging, inductances, inrush currents, etc. If needed, spikes or other artifacts can be smoothed out with different signal processing techniques such as filtering, moving averages, etc. But watch out: processing time-series data, by itself, can also create mirages^[Eugen Slutsky showed that by operating on a completely random series with both averaging and differencing procedures one could induce sinusoidal variation in the data. Slutsky went on to suggest that periodic behavior in some economic time series might be accounted for by the smoothing procedures used to form the data. More info: https://www.minneapolisfed.org/article/2009/the-meaning-of-slutsky]. A proper data scientist must be able to look at the numbers with a good dose of skepticism.
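As a sketch of the smoothing mentioned above, the snippet below attenuates an injected spike with a centered moving average (pandas assumed available). The window size and data are arbitrary, and, per Slutsky's warning, smoothing itself can manufacture patterns.

```python
# Smoothing spiky time-series data with a centered moving average (pandas rolling mean).
# Synthetic data; window size is arbitrary and purely illustrative.
import numpy as np
import pandas as pd

timestamps = pd.date_range("2004-01-01", periods=500, freq="W")
rng = np.random.default_rng(1)
raw = pd.Series(rng.normal(100, 3, len(timestamps)), index=timestamps)
raw.iloc[120] += 40                      # inject a world-cup-like spike

smoothed = raw.rolling(window=9, center=True).mean()   # 9-sample centered moving average
print(raw.iloc[118:123].round(1))
print(smoothed.iloc[118:123].round(1))   # the spike is attenuated but not gone
```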
One thing might be very clear by now: the multivariate nature of system telemetry. Very rarely is a variable in a complex system absolutely “self-contained”, meaning its behavior depends entirely on itself, with all the information about it lying in its value and its trends, and nothing else. Self-contained variables are—by far—the exception. Let’s see an example, using the data about Google searches on “Pythagorean Theorem” versus time, worldwide, 2004-2021:

It shows a clear yearly seasonality, where every July of every year since 2004 the “interest” in the Pythagorean Theorem drops substantially. If this plot were to hold all the information needed, we would have to conclude that students and mathematicians just get globally uninterested or exhausted from using it, all at once, and all of this happening every July. This is absurd, and the seasonality obviously responds to the school season being over at that time of the year. The plot below also shows searches of the term "school", which correlate strongly with searches about the Pythagorean Theorem.

> [!Figure]
> In the plot above, see how a very interesting chaos kicks in around the time the COVID-19 pandemic started.
In complex systems, time-series variables depend on a variety of other variables, which in turn depend on other variables, and the analysis shall be done taking this dependence into account; otherwise, the conclusions drawn might be dangerously wrong. Drawing wrong conclusions on top of bad measurements, or mistaking correlations and covariances for causation, can make the situation spiral down into a disaster. In some cases, humans in the loop can make the situation worse by adding risky time delays; we are smart, but we take our time to think in the face of ambiguity.
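A small sketch of how easily correlation appears without causation: the two synthetic series below are both driven by the same hidden factor, so their Pearson correlation is high even though neither causes the other.

```python
# Measuring how two telemetry-like series move together with Pearson correlation.
# Correlation is not causation: here both series are driven by a shared hidden factor.
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
hidden_driver = np.cumsum(rng.normal(0, 1, 800))          # e.g., the "school season"

pythagoras = pd.Series(hidden_driver + rng.normal(0, 2, 800))
school = pd.Series(hidden_driver + rng.normal(0, 2, 800))

print("Correlation:", round(pythagoras.corr(school), 3))  # high, yet neither causes the other
```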
Although the operation of many systems remains a very human-centered activity, equipping machines with decision-making power in the face of off-nominal situations does not come without its challenges either. If anything, machines can be equipped with failure isolation capabilities which, at least, will minimize the probability of the situation worsening or snowballing.
Moreover, time-series data can show slow, long-term trends. Let’s see an example:

Here, you can see the trend for the “Mobile phone” topic, a topic which must be searched by a great many people globally. The topic was a bit of a “meh” from 2004 to 2009, then there was a step and a change of trend for some time, with a peak around 2012, followed by a decrease from 2013 on. Why? That’s a bit more complicated to guess.
Long-term, mild trends are particularly tricky to detect in system telemetry. Such trends can indicate a slow “build-up” of a situation leading to a failure. For example, a steadily increasing current consumption of some equipment on board, say, a radio, could indicate the device is requiring more and more power to function due to unknown reasons, which may eventually put too much stress on the power regulators feeding such devices if the power drain crosses some boundaries. Long-term trends can be hard to detect, as they may require processing very large amounts of data and may be deprioritized in favor of more immediate, shorter-term, easier-to-identify data features. And they are perhaps the most unsettling type of anomaly to track: hopelessly watching a telemetry variable consistently approaching a critical threshold is not precisely an enjoyable experience.
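A hedged sketch of detecting such a slow drift: fit a straight line to a synthetic current-draw telemetry channel and project when it would cross an assumed threshold. Both the drift rate and the limit are invented numbers.

```python
# Detecting a slow long-term drift: fit a least-squares line to a telemetry variable and
# project when it would cross a critical threshold. Synthetic data, illustrative only.
import numpy as np

days = np.arange(1000)
rng = np.random.default_rng(4)
current_a = 1.0 + 0.0004 * days + rng.normal(0, 0.02, days.size)   # mild upward drift + noise

slope, intercept = np.polyfit(days, current_a, 1)    # least-squares linear trend
threshold = 1.6                                      # assumed limit of the power regulator
days_to_limit = (threshold - intercept) / slope

print(f"Estimated drift: {slope*1000:.2f} mA/day")
print(f"Projected threshold crossing around day {days_to_limit:.0f}")
```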
Now, the big question is: can we predict time-series data? The quick answer is: it depends. The somewhat longer answer is that, by knowing the features of a variable (seasonal, cyclic, and trend components) and by having some knowledge about the process that generated such a variable, we can assess with some level of confidence that the variable will continue evolving one way or another, but we shall never be able to fully predict—this means, have 100% confidence in—its behavior in the long run, because there can always be “black swans”.
Let’s use a very famous example.
Consider a turkey that is fed every day. Every single feeding will firm up the bird’s belief that it is the general rule of life to be fed every day by friendly members of the human race. This feeding process goes on for days, weeks, months, and years. If a remote operator were able to monitor the turkey’s weight as a telemetry variable, it would be a very boring variable, showing a very “predictable” increasing trend. The operator could even do some hand calculations and play to assess her predictive skills by estimating how much weight the bird will gain next week. Absolutely nothing indicates the bird’s good life will change anytime soon, as the chubby feathered fella believes life couldn’t get any better. Until some Wednesday before Thanksgiving.
The history of a variable or a process tells only a partial story about what is going to happen next. Naive projections of the future from the past can be very misleading.
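To make "naive projection" concrete, here is a seasonal-naive baseline, predicting that next year will look like last year, on synthetic weekly data. It works only as long as nothing fundamental changes, which is exactly what a black swan breaks.

```python
# A very simple forecasting baseline: seasonal-naive prediction ("next year will look
# like last year"). Synthetic weekly data with yearly seasonality; a sketch, not a real forecaster.
import numpy as np
import pandas as pd

idx = pd.date_range("2015-01-01", periods=52 * 5, freq="W")   # five years of weekly samples
rng = np.random.default_rng(5)
season = 10 * np.sin(2 * np.pi * np.arange(len(idx)) / 52)    # yearly cycle
series = pd.Series(100 + season + rng.normal(0, 2, len(idx)), index=idx)

history, actual_future = series.iloc[:-52], series.iloc[-52:] # hold out the last year
forecast = history.iloc[-52:].to_numpy()                      # repeat the previous year

mae = np.mean(np.abs(actual_future.to_numpy() - forecast))
print(f"Mean absolute error of the seasonal-naive forecast: {mae:.2f}")
```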

Systems can grow statistical models where initial “black swans” become “whiter swans” as more data is gathered. Take, for example, any satellite in a low-Earth, near-polar orbit. It will periodically visit a region called the [South Atlantic Anomaly](https://en.wikipedia.org/wiki/South_Atlantic_Anomaly), which is a region where the magnetosphere allows for an increased flux of high-energy particles. As [[Reliability Assessment Methods#Radiation|we have discussed]], such particles affect the performance of onboard electronics, creating unwanted resets, bit-flips, and other undesired effects. Any untrained algorithm, as well as an inexperienced human operator, might be puzzled the first time this happens during an SAA flyover, maybe still somewhat perplexed by the second and the third. By the fourth time, and because orbits can be predicted with reasonable accuracy, they shall be able to assess whether the current geolocation of the satellite is approaching such a critical area and take proper precautions, for example avoiding critical operations or going into safe mode. Fool me once…
Digital systems are remarkable sources of data about themselves, in the form of a legion of sampled variables that can be stored for processing. This sea of data is what makes such systems time-series goldmines, generating a tide of multivariate information to be consumed. The question is: to be consumed by whom? Only by humans with a functioning brain capable of “connecting the dots”? Can’t algorithms connect those dots? They can, but such algorithms shall be equipped with the nuances needed to understand the dynamics of the processes hidden behind the numbers, and how those numbers and figures are related to each other.