“To understand life, we need to understand proteins,” said New Scientist magazine in July 2022. And yet, if we’re being honest, the topic of “protein folding” doesn’t sound all that exciting to most of us.
Compare it to black holes, quantum computing or gene editing. You don’t need much prior knowledge of any of these to get a sense that they are thrilling developments at the cutting edge of science. But proteins? Well… To the uninitiated, protein equals nutrition. Important? Yes. Exciting? Not so much.
Of course, this couldn’t be more wrong. From the workings of every living thing to the spread of bacteria, viruses and diseases, proteins are central to pretty much everything. The clue is in the word: it derives from the Greek proteios, meaning “primary” or “of first importance”.
To understand how proteins work, we need to know their shape. This is complicated stuff. Each protein folds into a unique and absurdly complex 3D structure. In theory, we can predict this shape from each protein's sequence of amino acids. But in practice, it's nowhere near that simple: there are more possible folding configurations for an average protein than atoms in the known universe.
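Rough arithmetic shows why the numbers get out of hand so quickly. This is a back-of-the-envelope estimate in the spirit of what is often called Levinthal’s paradox, not a model of how real proteins fold. The figures are assumptions chosen for illustration: each amino acid is taken to adopt just three backbone shapes, a typical protein is taken to be about 300 amino acids long, and the observable universe is assumed to hold roughly 10^80 atoms.

```python
import math

# Hypothetical, order-of-magnitude assumptions (not measured values):
STATES_PER_RESIDUE = 3  # backbone shapes each amino acid is assumed to adopt
ATOMS_LOG10 = 80        # commonly quoted estimate: ~10^80 atoms in the observable universe


def log10_conformations(residues: int, states: int = STATES_PER_RESIDUE) -> float:
    """Return log10(states ** residues): how many shapes the whole chain could take."""
    return residues * math.log10(states)


def residues_to_exceed_atoms(states: int = STATES_PER_RESIDUE) -> int:
    """Return the shortest chain whose shape count exceeds 10 ** ATOMS_LOG10."""
    n = 1
    while log10_conformations(n, states) <= ATOMS_LOG10:
        n += 1
    return n


if __name__ == "__main__":
    chain_length = 300  # assumed length of a typical protein, in amino acids
    exponent = log10_conformations(chain_length)
    print(f"~10^{exponent:.0f} possible shapes for {chain_length} amino acids")
    print(f"vs ~10^{ATOMS_LOG10} atoms; crossover at {residues_to_exceed_atoms()} amino acids")
```

Under these assumptions, a 300-amino-acid chain has roughly 10^143 possible shapes, and the count overtakes the atom estimate at around 168 amino acids. Different assumptions shift the exponents, but any plausible choice still leaves far too many shapes to try one by one.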
This “protein folding problem” has long been considered one of biology's biggest and most important challenges. Scientists spent decades trying to crack it, with relatively little success.
That was until artificial intelligence (AI) got involved. With AlphaFold, a model developed by Google’s DeepMind, we may be on the verge of a new scientific era.
The protein folding problem
The term “protein” was coined in 1838 by Swedish chemist Jöns Jacob Berzelius.¹ But it wasn’t until the 20th century that protein science started to get really interesting.
In the early 1950s, British biochemist Frederick Sanger sequenced the first protein, determining the order of the amino acids in insulin (the complete sequence was published in 1955). That Nobel Prize-winning work was a significant breakthrough. But to truly grasp how a protein functions, we also need to know its structure – the intricate three-dimensional shape that a chain of amino acids twists into. In 1957, John Kendrew (another British biochemist) took this next step when he became the first person to determine the three-dimensional structure of a protein – myoglobin.
These firsts were important, but they really only scratched the surface. It took scientists years to map the folding structure of one protein. How would we ever get close to discovering and understanding all of them?
Hot dogs and video games
Before scientists could begin studying proteins, they first had to overcome a logistical challenge: getting their hands on enough raw material. In the 1940s, the Chicago-based Armour Hot Dog Company purified 1kg of a protein found in cattle and made it freely available to scientists (it may not sound like much, but 1kg was a lot of purified protein by the standards of the time). The protein in question, ribonuclease, became a key focus of protein research for much of the 20th century – all thanks to a hot dog maker’s donation.
Hot dogs may seem an unexpected ingredient in a major scientific breakthrough, but it took an intervention from an equally unlikely source to finally solve the “protein folding problem”. The story of how we cracked this grand challenge of biology starts with the classic 1990s video game Theme Park.
AlphaFold and the protein folding Olympics
After decades of slow progress in protein folding research, a group of biologists decided a new approach was needed.
In the early 1990s they launched an international competition, held every two years, in which teams competed to find the best techniques for predicting protein structures from amino acid sequences. Known as CASP (Critical Assessment of protein Structure Prediction), the competition was dubbed “the Olympics of protein folding”.
At around the same time, a 17-year-old programmer named Demis Hassabis was spending his gap year working for the video game company Bullfrog Productions. He co-designed and led the programming on a new style of simulation game in which players built and operated an amusement park.
The gap year placement did more than just help Hassabis pay his way through his studies at Cambridge University. The game he developed – Theme Park – employed a very early form of the AI and machine learning technology that is proving so revolutionary today.
As Hassabis explains in an interview with the New York Times:
So the cool thing about Theme Park — and why it was so successful and sold millions of copies around the world was that everybody who played it got a unique experience because the game adapted to how you were playing. It’s very primitive by today’s standards and learning systems, but back then, it was pretty groundbreaking. It was one of the first games, along with SimCity, to do these kinds of complicated simulations under the hood powered by AI.
When Hassabis co-founded the AI lab DeepMind in 2010, the company initially created neural networks to play games. Starting with simple ones like Pong, their AI systems developed rapidly and, within a few years, mastered one of the most complicated games known to man – Go.²
In 2016 DeepMind’s AlphaGo defeated world champion Lee Sedol in a best-of-five match. This marked a major milestone in the development of machine learning. Not only could AlphaGo beat the best human players, its series of neural networks actually developed new strategies for playing the game. AI wasn’t just mimicking humans, it was tackling complex problems that were previously thought to require human intuition and creativity.
After AlphaGo’s success, Hassabis decided that the “protein folding problem” was an ideal challenge for DeepMind’s increasingly powerful AI. There was a wealth of available data on which to train DeepMind’s systems (the 100,000 or so proteins whose structures had been identified in 50 years of research). And, crucially, there was a contest to compete in. He explains:
There was a game we could win and a leaderboard we could optimise against... That was another reason we picked this problem to work on because it had this amazing competition.
In 2018, DeepMind’s AlphaFold was declared the winner of the 13th edition of CASP, setting a new record for accuracy in predicting the structures of proteins whose experimentally determined shapes had not yet been published. By the next competition in 2020, AlphaFold2 predicted protein structures so accurately that CASP’s organisers declared that the challenge was essentially over – the “protein folding problem” had been solved.
Revolutionising biology
The rate of progress since AlphaFold has been extraordinary. By the middle of 2021, AlphaFold2 had predicted the structures of nearly all of the 20,000 or so proteins found in the human body. The following year, DeepMind announced it had predicted the structures of nearly all 200 million proteins known to science.
In May 2024, AlphaFold3 was launched, capable of modelling not just how individual proteins fold, but how they interact with other molecules, including other proteins, DNA, RNA and potential drug molecules. This promises to be another big step forward for drug discovery and medical research.
Real-world impact
Demis Hassabis told the New York Times:
I worked on AI my whole career because I think it’s going to be the most beneficial technology for humanity ever…
The 200 million protein structures modelled by AlphaFold and the algorithm's source code are now freely available online, opening up a new era of scientific possibilities.
At Oxford University, AlphaFold is helping scientists to create a more effective malaria vaccine. Malaria parasites constantly change appearance, making them elusive targets for traditional vaccines. Biochemist Matthew Higgins, who has been leading the research since 2006, said:
The crucial AlphaFold information enabled us to decide which bits of the protein we want to put in a vaccine and how we want to organise those proteins. AlphaFold has allowed us to take our project to the next level, from a fundamental science stage to the preclinical and clinical development stage.
Researchers at the University of Colorado Boulder have used AlphaFold to identify the structure of an enzyme crucial to antibiotic resistance in just minutes – a feat that a decade of lab work couldn’t achieve. The implications for medicine and the fight against antibiotic resistance could be huge.
It doesn’t end there. Armed with this newfound understanding of proteins, scientists are using AlphaFold for everything from designing plastic-eating enzymes that could help clean up our oceans to researching evolutionary history and piecing together the genetic code of the 50,000-year-old “demon-duck of doom”.
The future of science
The impact of AlphaFold extends far beyond a single breakthrough. It could mark the beginning of a transformative new era in how scientific research is conducted. Technology has been crucial to most major scientific advances of recent years, but AlphaFold is different in that it taught itself what to look for in the data.
Google’s former CEO Eric Schmidt argues:
With the advent of AI, science is about to become much more exciting – and in some ways unrecognisable. The reverberations of this shift will be felt far outside the lab; they will affect us all.
Schmidt envisions a future where science is conducted in “self-driving labs” powered by AI, where robotic platforms conduct experiments at scales no human team could match, at a fraction of the cost. In this brave new AI-powered world, scientists will be able to test thousands of hypotheses simultaneously, making breakthroughs that might otherwise have taken decades.
Next up: Saving the planet?
Having successfully cracked one of science’s great challenges, Hassabis and the DeepMind team have set their sights on another – tackling climate change.
Long-term iluli followers will be familiar with the concept of nuclear fusion and its promise of a near-infinite supply of clean energy. (See my October 2022 newsletter for more on this.)
It’s a long-running joke in science circles that nuclear fusion is always 30 years away.
Recent years have seen some encouraging progress, but not quite enough for us to pin our hopes on fusion replacing our reliance on fossil fuels anytime soon.
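The plasma at the heart of a fusion reactor can reach more than 100 million °C, many times hotter than the core of the sun, and it is incredibly unstable. To harness the potential of fusion, we need to find a way to control and stabilise this plasma – something that has perplexed scientists for decades.
But could DeepMind soon have the answer? Working with the Swiss Plasma Center at EPFL, its researchers have trained a neural network to control the magnetic coils that shape and hold the plasma in place inside a tokamak reactor. It may just help solve the problem. Watch this space…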
Recommended links and further reading
Eric Schmidt: This is how AI will transform the way science gets done (MIT Technology Review)
What is a protein? A biologist explains (The Conversation)
DeepMind AI can predict how drugs interact with proteins (New Scientist – subscription required)
AlphaGo - The Movie | Full award-winning documentary (YouTube)
How AI solved a biological mystery (What’s Your Problem? podcast): interview with DeepMind’s vice president of research, Pushmeet Kohli
¹ Berzelius was among the first to recognise the importance of this group of molecules and suggested the term in a letter to a fellow chemist.
² AI made headlines in 1997 when IBM’s Deep Blue defeated world chess champion Garry Kasparov in a match. Go presents a far greater challenge in terms of complexity. Like those of chess, its rules are fairly straightforward, but there are far more possible moves in a single turn (roughly 250 in Go against about 35 in chess), and the number of possible games grows exponentially with every extra move, making it a much tougher game for AI to master.