Safety is All you Need

Lakera
7 min read · Jan 25, 2023


This article was originally posted here. Lakera’s developer platform enables ML teams to ship fail-safe computer vision models.

TL;DR

The rapid development of foundation models is driving a sea change in the way machine learning (ML) technologies are developed. They promise to unlock the great technological transformations of the coming decades, but they also represent single points of failure, trained on planetary-scale datasets that are unavailable to the people building on top of them. A safety-first mindset must permeate ML development if these systems are to be deployed at scale. The notion of alignment needs its engineering counterpart: it is more urgent than ever to invest in the engineering processes needed to build ML systems in line with our expectations.

I’ve recently been reflecting on the progress of deep learning over the last decade, and have been humbled by the pace and breadth of change. Over the years, I have had many discussions about the future of artificial intelligence (AI) in which I was often the skeptic. I failed to foresee how significant the advances would be and how quickly they would come. This, it turns out, is what it feels like to be in the middle of a rapidly evolving, exponentially growing movement. I now understand how hard it is to grasp exponential growth.

When I started my PhD in 2014, I was still computing gradients by hand, and I remember tedious hours debugging them with finite differences. I also built large recurrent networks for language, which were hard to scale without knowing how to implement CUDA kernels at the time.
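
For readers who never had to do this: the check itself is simple in principle. You compare your hand-derived gradient against a centered finite-difference estimate and flag any coordinate where the two disagree. A minimal sketch in Python, using a toy loss rather than any network I actually trained:

```python
import numpy as np

def loss(x):
    # Toy scalar loss; stands in for a hand-derived network loss.
    return np.sum(x ** 2) + np.sum(np.sin(x))

def analytic_grad(x):
    # Gradient of the toy loss, derived by hand.
    return 2 * x + np.cos(x)

def numeric_grad(fn, x, h=1e-5):
    # Centered finite differences: (f(x + h*e_i) - f(x - h*e_i)) / (2h).
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e.flat[i] = h
        g.flat[i] = (fn(x + e) - fn(x - e)) / (2 * h)
    return g

x = np.random.randn(5)
max_err = np.max(np.abs(analytic_grad(x) - numeric_grad(loss, x)))
print(f"max gradient error: {max_err:.2e}")  # should be tiny, on the order of 1e-10
```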

A couple of years later, AlphaGo and AlphaZero happened (here again, as a Go player myself, I was skeptical and confidently bet 5–0 against AlphaGo). We were using word2vec embeddings and similar techniques to achieve what would have been unthinkable a few years earlier. By the time I finished my PhD, Attention Is All You Need [1] was already out, giving birth to the era of transformers. Since then, we have seen the emergence of Meta’s DETR, OpenAI’s GPT-3 and CLIP, Google’s BERT, and more broadly the transition to the world of foundation models [2]. The machine is in motion, and with that come massive opportunities, but also unprecedented challenges.

At the same time, the methodologies required to deploy these models safely and in line with expectations have not improved at the same pace. The notion of alignment, which has so far been of mostly philosophical interest and refers to an AI’s alignment with objectives and values, does not yet have its engineering counterpart: how do we build AI systems in line with our expectations? Specific instances of alignment are often discussed, for example when considering the bias of ML systems. But this corresponds to a narrow notion of alignment. Building an aligned ML system that “just works” by performing exactly as the developers expect is very challenging today: the methodology and tools do not currently exist.

We are reaching a tipping point. As the number of applications built on top of foundation models explodes, our lack of understanding of how to build these systems will finally catch up with us. These models are extremely large, trained on data that developers building on top of them never get to see, and they learn emergent behaviors that are mostly unknown. They represent single points of failure that will put all downstream applications at risk. The lack of progress in AI safety and reliability could prevent us from unlocking the economic opportunities offered by advances in the technology, and presents a new set of risks.

While proponents of the scale hypothesis (roughly speaking, this means that we can achieve human-level intelligence by continuing to scale models and datasets without a paradigm shift in the methodology) believe that many of these problems will resolve themselves with “more”, I expect that attention is not quite “all you need”, and a safety mindset has to permeate academia and, especially, industry.

We are entering a new era.

It is helpful to take a step back and look at how the complexity and transparency of software have evolved over the last few decades with the emergence of deep learning. The trend is this: as complexity and functionality increase, transparency decreases rapidly. Foundation models push this to the extreme. By complexity I mean “unaccounted complexity”: the behaviors of the system that are not explicitly encoded by a human (some pieces of traditional software are, of course, incredibly complex). In ML models, these behaviors are learned from the data rather than explicitly written by a human as code.

Foundation models are trained on planetary-scale datasets that are inaccessible to those building applications on top of them. A previously unimaginable level of functionality becomes available at marginal cost, at the expense of making it extremely difficult for developers to know if what they are building will actually work once it is deployed, or misbehave in unforeseen ways.

Let’s have a simplified look at the evolution of transparency vs. complexity through time.

  1. Pre-ML (1960–2011). In traditional software, what you see is what you get. The final system is usually the accumulation of a large number of small steps, each of which implements a specific behavior with a clear contract. Complexity emerges slowly from a large number of well-understood components interacting together. Transparency is high since we can inspect these components at any time, as well as the decisions that went into their design.
  2. First steps of deep learning (2012–2016). The central quest is identifying the best neural network on a fixed dataset. This led to significant progress on many academic benchmarks and to the appearance of the first reusable feature extractors that could be leveraged for downstream tasks. The complexity of these systems is much higher since the behaviors are learned from data and only emerge once training is complete. As a result, transparency massively decreases, but all of the training data is available in-house.
  3. Productionization (2016–now). Models need to be put into production and the main focus is on getting the right data. Model architectures are often fixed and no longer the main area of focus. Pre-trained models are often chosen and fine-tuned. Most work is spent collecting and annotating data. It’s a constant catch-up game. Most companies out there are currently in this phase, and many of the corresponding challenges will remain going forward. Complexity remains high, and transparency decreases further, especially since pre-trained models add a layer of “indirection” and a portion of the data used to train the system is typically not available in-house.
  4. Commoditization and the foundation-model era (2022 onwards). The appearance of foundation models reduces the marginal cost of developing powerful AI-based systems. Few companies can afford to train such models. Individuals and companies that could never have trained such powerful models in the first place (training some of these models costs upwards of $10M) can now unlock a whole new world of applications. The main drawback is that these models lack transparency. It is very difficult for someone building on top of them to understand where the limitations and risks are. Moreover, a failure or vulnerability in a foundation model is likely to be inherited by all downstream models, creating major security and safety vulnerabilities. At this point, the developer has no idea what data went into the model. The phenomenon of “emergence” [3] also means that as the model scales, it is able to perform tasks that were not part of the training objectives, further reducing transparency and increasing the risk of undesired behaviors. The vast majority of the data used to train the model is not available in-house.
Safety is no longer a nice-to-have. From “On the Opportunities and Risks of Foundation Models” [2].

ML development is changing.

ML development is going to look very different in this new era and will require a different set of skills and day-to-day concerns. The most pressing challenge will be to understand what these models are capable of and where they might fail unexpectedly, far more than how they perform on some held-out dataset (likely very well). To deploy aligned AI systems, developers will need to probe for such behaviors without access to any of the data used to train the model.
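
What might such probing look like in practice? Here is a minimal sketch, assuming a hypothetical predict function as a stand-in for whatever opaque foundation-model API the application is built on. Since the training data is unavailable, we test invariances we expect the system to satisfy, for example stability under small, label-preserving input changes, instead of measuring accuracy on a held-out set.

```python
import numpy as np

def predict(image: np.ndarray) -> np.ndarray:
    # Placeholder for a call to the opaque foundation-model API; returns class scores.
    rng = np.random.default_rng(int(image.sum() * 1e6) % 2**32)
    return rng.random(10)

def brightness_shift(image: np.ndarray, delta: float) -> np.ndarray:
    # A small, label-preserving perturbation of the input.
    return np.clip(image + delta, 0.0, 1.0)

def probe_brightness_invariance(image, deltas=(-0.1, -0.05, 0.05, 0.1)):
    # Fraction of small brightness shifts that change the predicted class.
    base_label = predict(image).argmax()
    flips = [predict(brightness_shift(image, d)).argmax() != base_label for d in deltas]
    return sum(flips) / len(flips)

image = np.random.rand(224, 224, 3)
print(f"instability under brightness shifts: {probe_brightness_invariance(image):.2f}")
```

In a real application, the perturbations, the invariances tested, and the threshold for “too unstable” would come from the product’s requirements rather than from a toy brightness shift.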

As these large models become single points of failure for an ocean of applications, developers will need to worry about how to assess and mitigate all the issues their systems inherit. These risks manifest in two overlapping areas:

  • Safety: Will my system behave as expected (alignment) and can I safely deploy it to production? This includes issues around bias in language models and whether the statements made by the models are truthful.
  • Security: Can someone exploit vulnerabilities in the base model to fool my system? For example, if you are tailoring a foundation model for identity verification based on facial recognition, someone who learns how to fool the base model is likely to be able to fool your model as well, as sketched below.
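
To make the security point concrete, here is a minimal sketch of why such attacks can transfer, using PyTorch with an off-the-shelf ResNet-18 as a stand-in for the base model and an untrained linear head as a stand-in for the fine-tuned application; both choices are illustrative and do not reflect any particular product. The perturbation is crafted against the base model only, and we then check whether it also changes the downstream model’s decision.

```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet18, ResNet18_Weights

# Stand-in for a public base ("foundation") vision model.
base = resnet18(weights=ResNet18_Weights.DEFAULT).eval()

# Stand-in for a downstream model that reuses the base backbone and adds its
# own task head (untrained here, purely for illustration).
downstream = torch.nn.Sequential(base, torch.nn.Linear(1000, 2)).eval()

x = torch.rand(1, 3, 224, 224, requires_grad=True)  # placeholder input image
label = base(x).argmax(dim=1)

# FGSM-style perturbation crafted against the *base* model only.
loss = F.cross_entropy(base(x), label)
loss.backward()
x_adv = (x + (8 / 255) * x.grad.sign()).clamp(0.0, 1.0)

# Does the same perturbation change the downstream model's decision?
with torch.no_grad():
    flipped = downstream(x).argmax(1) != downstream(x_adv).argmax(1)
print("perturbation transferred to the downstream model:", bool(flipped.item()))
```

With a real fine-tuned head, attacks that succeed against the shared backbone tend to carry over; the untrained head here only keeps the sketch self-contained and runnable.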

To successfully bring user-facing applications to market, development teams will need to mitigate all of these threats by fundamentally changing the way they work and adopting a safety- and security-oriented mindset. We’ve learned from decades of engineering that these are not afterthoughts but need to be at the core of engineering processes from day one [4].

Mitigating these unknown issues will become the core concern of the ML developer. There is still time to invest in developing the tools and methodologies needed for AI safety, but it must be done now. The positive impact that AI could have will depend on how well we address this challenge. Safety is no longer a nice-to-have. So maybe, after all, safety — and not attention — is all we need now.

[1] “Attention Is All You Need”, Vaswani et al., NIPS 2017.

[2] “On the Opportunities and Risks of Foundation Models”, Bommasani et al., 2021.

[3] “Emergent Abilities of Large Language Models”, Wei et al., 2022.

[4] “How Did Software Get So Reliable Without Proof?”, Hoare, FME 1996.
