- Bridgewater Associates reveals how the company is thinking about and using large language models to formulate investment strategies.
- McCann Worldgroup used AI to make individual signs and menus for owners of Mexican hot dog and hamburger stands.
- Insilico Medicine started the human trial phase of a lung disease drug designed by generative AI.
- The US Air Force is testing various large language models, including Scale AI Donovan.
I might have overwhelmed you with prompting techniques and tutorials in the last few weeks. I'm told that many of you still have a backlog of things to try. Plus, this is vacation time.
Also, next week, OpenAI will open access to the GPT-4 variant known as Code Interpreter, which also allows us to upload and analyze files. This will unlock a whole new set of use cases and tutorials. Stay tuned.
So, this week, let’s take a short break and fully focus on the adoption of AI across industries, feeding additional data to the AI Adoption Tracker.
This use case documentation activity is critical to understanding how far your peers have come in their AI adoption journey, and it remains the thing customers ask about the most in consultations.
More importantly, even if there’s no immediate value in knowing what company XYZ is doing in the Financial Services or Health Care industry, by reading about its experience you’ll discover precious insights that inform your everyday use of AI. That’s certainly the case with the Bridgewater Associates interview below.
Speaking of which: do you remember that I always said that the AI Adoption Tracker would exclusively focus on those use cases deployed in production? Well. I changed my mind.
There is enormous value in documenting use cases that are in a testing phase and might be abandoned later on. So from this week, you’ll see a new column in the AI Adoption Tracker: Adoption Status.
When an AI technology is not yet rolled out in production, you’ll see the Testing label.
I dedicated much of the week to revamping the database structure behind the AI Adoption Tracker to accommodate this and other future changes.
The tool is not yet ready for mobile use, but it will be. I appreciate your patience.
As always, let me know what you’d like to read next in this Splendid Edition of Synthetic Work.
What we talk about here is not what could happen, but what is happening today.
Every organization adopting AI that is mentioned in this section is recorded in the AI Adoption Tracker.
In the Financial Services industry, Bridgewater Associates co-Chief Investment Officer Greg Jensen reveals how the company is thinking about and using large language models to formulate investment strategies. It’s a fascinating perspective and there’s a lot to learn from it.
Tracy Alloway and Joe Weisenthal interview him for Bloomberg’s Odd Lots podcast:
And then in college, when I heard about Bridgewater, Bridgewater was a tiny place at the time. But the basic idea was that there was a place where we were trying to understand the world, trying to predict what was next, but doing that by taking human intuition and translating that into algorithms to predict what was next.
So if you go back, and this is now in the nineties, kind of where artificial intelligence was at the time, most of the focus was still on expert systems, was still on the notion that you could take human intuition, you could translate that into algorithms. And if you did enough of that, if you kept kind of representing things in symbolic algorithms, that you could build enough human knowledge to get kind of a superpowered human.
And Bridgewater was a rare example of where that worked, where given the focus of trying to predict what was next in markets, given the incredible investment that we made, creating the technology to take human intuition and translate that into algorithms and stress test. It’s an incredibly successful expert system, essentially, that was built over the years. I’d say probably the most profitable expert system out there.
If you go through the history of our competitors, they’re littered by people that tried to do something more statistical. Meaning that they would take the data, run regressions, and then after regressions, let’s say basic machine learning techniques, to predict the future.
And the problem that always had is that there wasn’t enough data. The truth is that market data isn’t like the data in the physical world in the sense that you only have one run through human history. You don’t have very many cycles. Debt cycles could take 70 years to play out. Economic cycles tend to play out over around seven years. There’s just not enough data to represent the world.
And secondly, the game changes as participants learn. So the existence of algorithms, as an example, changed the nature of markets such that the history that preceded it was less and less relevant to the world you’re living in. So those are big problems with, let’s say, a more pure statistical technique to markets.
So you had to get to a world where statistical techniques or machine learning could substitute for human intuition. And that’s really where kind of the exciting leaps are now. That you’re getting closer. It’s not totally there, but you’re much closer than you’ve ever been, where large language models actually allow a path to something that at least mimics human intuition, if not is human intuition.
AI mimicking human intuition is not as crazy as it sounds when you think about what intuition is.
Let’s continue to discover how much visibility Jensen has into the progress of generative AI:
And then in 2016 or ‘17, I was introduced to OpenAI and actually as they transitioned from a charity to a company, I was in that first round and I met a lot of the people and looked hard at their vision using scale, technical scale, to build general intelligence and build reasoning.
So I both was working with Dave Ferrucci and sort of understood many of the people at OpenAI at the time and moving forward with those things. And then I was literally the first check for Anthropic, another large language model, kind of [made by] people that had been at OpenAI.
many things are coming together now to say, okay, you can actually – in a way, at a pace and a speed humans could never do – you can replicate human reasoning.
Jensen gets into the details of how Bridgewater Associates is reinventing itself around machine learning:
specifically what we’ve done on the AI ML side is we’ve set up this venture. Essentially there’s 17 of us with me leading it. You know, I’m still very much involved in the core of Bridgewater, but the 16 others are a hundred percent dedicated to kind of reinventing Bridgewater in a way with machine learning.
We’re going to have a fund specifically run by machine learning techniques
on Bridgewater’s internal tests, you suddenly got to the point where it was able to answer our investment associate tests at the level of first year IA, right around with ChatGPT-3.5 and Anthropic’s most recent Claude. And then GPT-4 was able to do significantly better.
And yet it’s still 80th percentile kind of thing on a lot of those things
so if somebody’s going to use large language models to pick stocks, I think that’s hopeless. That is just a hopeless path. But if you use large language models to create some theories – it can theorize about things – and you use other techniques to judge those theories and you iterate between them to create a sort of an artificial reasoner where language models are good at certainly generating theories, any theories that already exist in human knowledge, and putting those things connected together.
But there are other ways to pair it with statistical models and other types of AI to combine those together. And that’s really what we’re focused on, which is combining large language models that are bad at precision with statistical models that are good at being precise about the past, but terrible about the future.
if you can query a model and think about what it has done and what it hasn’t done, then you can figure out what data’s missing, right? And you need to set up adversarial techniques in order to keep querying an algorithm for what it’s doing.
And again, I think that’s still an area of research, but a process that’s moving along quickly to basically get to the point where the standard is, even though a machine learning technique might be doing something very different than a human is, that it can still explain itself. And it might not perfectly explain itself just like humans don’t perfectly explain themselves, but to a very high degree of confidence across a wide range of outcomes so that you have a sense of what’s going on is possible. And that’s part of the design of what we’re putting in, which is, well, how do you query it? How do you give it more information, remove information, etc., see how it changes its mind to determine roughly what’s going on.
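The querying process Jensen describes, adding and removing information to see how the model changes its mind, resembles what machine learning practitioners call an ablation probe. Here is a minimal, purely illustrative sketch of the idea; the model, the feature names, and the weights are all hypothetical stand-ins, not anything Bridgewater has disclosed:

```python
def probe(model, features: dict) -> list:
    """Ablation-style probe: remove one input at a time and record how much
    the model's answer moves. Large swings hint at what the model relies on;
    no swing anywhere hints that the relevant data may be missing entirely.
    `model` is any callable mapping a feature dict to a numeric prediction."""
    base = model(features)
    report = []
    for name in features:
        ablated = {k: v for k, v in features.items() if k != name}
        report.append((name, abs(model(ablated) - base)))
    # Most influential inputs first.
    return sorted(report, key=lambda t: -t[1])

# Toy stand-in model: the prediction is a weighted sum of whatever is present.
weights = {"inflation": -0.8, "growth": 0.5, "sentiment": 0.1}
toy_model = lambda feats: sum(weights[k] * v for k, v in feats.items())

print(probe(toy_model, {"inflation": 4.0, "growth": 2.0, "sentiment": 1.0}))
```

With a real model the probe would be run across many scenarios, but the principle is the same: you learn roughly what the model is doing by watching how its answers change.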
Some of the prompting techniques and GPT-4 tutorials we have discussed in the last few Splendid Editions are focused on decision-making for a reason.
Like Jensen, I believe that AI models can help all of us, not just business leaders, make better decisions. Not by replacing our decision-making capabilities, but by helping us overcome our biases and blind spots.
And so, when you use GPT-4 to look at an Excel spreadsheet that contains finance or budget data (even just at a personal level), a use case I’ve been asked about a lot, you can use the AI model to go beyond the obvious and formulate theories about what to do next that you would not have thought of because of your own biases and blind spots.
There’s new, exciting research on this front that I plan to use for more tutorials in the future. I’m still reviewing it.
Let’s continue with Jensen’s interview:
all of a sudden if you have an 80th percentile investment associate, technologically, you have, you know, millions of them at once. And if you have the ability to control their hallucinations and their errors by having a rigorous statistical backdrop, you could do a tremendous amount at a rapid rate. And that’s really what we’re doing in our lab and proving out that process can work.
Statistical AI can then take theories and generate whether those have at least been true in the past and what the flaws with them are and refine them, offer suggestions on how to do them differently, which then you could dialogue with.
So then the other strength the language model has that humans are weaker at is, now take a complex statistical model and talk about what it’s doing. And there’s ways to train language models to do that then allow sort of a judgment to say, okay, now let’s think about what’s happening here and reason over what’s happening.
So you use, the way we’ve modeled this kind of out is that language models can come up with potential theories. Now there’s a limit to that. It’s not the most creative thing in the world, although it’s theory at scale, for sure. And again, that’s language models with good, you know, you have to tune your language models in a certain way so it’s not straight out of the box. But then you can use statistical things to control that.
Then you can use language models again to take what’s coming out of that statistical engine and talk about it with a human or other machine learning agents and kind of report back on what you’re finding and what that is and the types of theories that are out there that might run contrary to what you believe, which can lead to more tests and other things.
So that’s the loop that I’m very excited about and as I said, up until, statistical AI was limited because it was focused on the data of markets. Where language models, the good thing is it has a much better sense of something that a statistical model wouldn’t really have.
A statistical model of markets doesn’t get the concept of greed. Language models pretty much understand the concept of greed. They’ve read everything that’s ever been written about greed and fear and whatever. So now they can start to think about statistical results in the context of the human condition that generates those results. Big deal. And really a radical difference.
Inevitably, the conversation steers towards the impact of AI on jobs:
Let me ask you one very simple question, and it might be one that speaks to an anxiety of listeners. If already GPT can perform at maybe the type of level that a high-quality first-year or second-year associate or analyst at Bridgewater can do, does it mean fewer hires in the future – humans being hired at Bridgewater? Or does it mean the same number or more humans doing even more? Is it a replacement? What does it mean for the type of person that would’ve been, 10 years ago, a first-year employee at Bridgewater?
What I think people should expect at Bridgewater and just generally, is things are changing quick. That really requires people to be capable of playing whatever role is necessary in order to do that
We kind of got to this point where it was, I’d say, kind of humans settled into the role of intuition and idea-generation. And we used computers for memory and for constantly running those rules accurately, etc. That was a transition that half, it got to 50/50, technology and people. And now this is another leap, right?
And it’s definitely true that it’s going to change the roles that investment associates play. Now exactly how, and you still need– for as far foreseeable future – you’re going to want people around that, out that working on those things, there’s edges that these techniques I’m describing certainly won’t do well for an extended period of time. And there’s how to build the ecosystem of these machine learning agents, etc.
And so what I’ve found, and certainly the people in the lab, you want people who are curious about these new technologies. You want to utilize them. And that’s going to be really part of the future of work. I think it’s going to be very hard in any knowledge industry to not utilize these.
So all of a sudden, the skill sets are changing and they’re changing in ways that I think are a surprise to many because it’s actually a lot of the knowledge work, a lot of the things where, you know, content creating and whatever that I think people thought would be later in computer replacement that are happening faster.
So the main thing is, I’d say right now there’s so much in flux, that having flexible, the more you need flexible generalists who can have an eye towards this, an eye towards the goal, and be able to utilize whatever tools are necessary to get there. That’s really where I think, you know, you’re seeing a fair amount of change quickly.
you asked the question that’s like, “Can AI do our jobs?” and I don’t think the answer is yes. And I think it’s like, can the AI replace the stock picker? It doesn’t sound like the [answer] is yes. But can the AI augment the way someone is thinking, come up with theories that then can be rapidly tested, have that sort of go back and forth, and sort of do some of the work that currently sort of like junior analysts do in terms of testing ideas and stuff like that. You could see how it could be a force multiplier at a large fund.
And now, for some critical insight about the data you have to feed to AI models:
the basic problem is that the data that you’re looking at isn’t necessarily the data you’ll face in the real world, you’re not facing the adversarial problem when you’re looking at that data the way they were.
A statistical technique that’s very good at seasonality and trend following might not be very good at understanding macro cycles and so on.
the recognition that it’s not as simple as taking machine learning out of the pack and applying it to this problem. Even when there’s a ton of data
Some of the places where there is a lot more machine learning going on, very short term trading, arguably is better for machine learning because there’s a lot of data and you can learn faster over that data. And there’s some merit to that. And in terms of tangible places – this is now years ago – but where we started applying some of these techniques were in things like monitoring our transaction costs and looking for patterns in shorter term data. Because there’s a lot more data.
We’re trying to learn things that we don’t already know. So we’re being careful about what kind of Bridgewater knowledge we put in here because it’s not that helpful if we reinvent Bridgewater. Somewhat helpful, but it’s not as helpful as, let’s say, reinventing everything that we don’t know about. That other people have thought about, etc.
And so in the lab right now, at least we’re focused on not making this too Bridgewater-centric on purpose because it’s in that way we’ll learn things that we don’t already know. And if you just fed a Bridgewater information, which we may well do – that could be a productivity enhancing thing – but you’ll quickly, you know, produce something very similar to Bridgewater.
Whereas what’s been amazing so far is we’re producing good results by Bridgewater standards, but different, very, very different conclusions and different thoughts than what we have internally.
we’re big believers that you need to stress test across a very long period of time. So we have much longer data histories.
on the large language models, there’s still a lot of work to be done, but you certainly can train through reinforcement learning to, you know, to make sure that they’re not making mistakes that you know about. And so there’s ways to do that. Now we’ve been trying to avoid that for the reasons I was describing before. I avoid doing too much of that, injecting our own knowledge, and use external sources to do that. But that’s still part of you know, part of the tool set that will be available that, yes, you can train it more directly on things you already believe to be true if you want to do that. And that certainly will lead to answers that replicate your thinking more quickly.
Notice Jensen’s reference to a training toolset. Just yesterday, OpenAI provided some hints about an upcoming mechanism to fine-tune GPT-4.
Finally, Jensen’s take on the capability of large language models to make predictions:
ChatGPT as it comes out-of-the-box is only trained over to a certain history and it doesn’t care. Like unless you know how to make it care, it doesn’t care that it’s just answering a question about inflation based on everything it’s ever read about inflation. Time isn’t even that important unless you make time be very important to it and predicting.
And so, you have to know how to use the tools to generate the type of outcome that you’re describing. So do I think like AI out-of-the-box will do that? No, absolutely not. It’ll be awful at that. Are there ways to take what’s embedded in AI to come up with a way to do that? Embedded in language models and if you combine that with statistical tools? Yeah, there’s a path there, but it’s not going to be as simple as open up ChatGPT and ask it that question. There’s more involved. But it is helpful to have an analyst that’s read everything that was ever produced, even if they stopped reading in 2022 or in 2021, I should say. There’s a way to use that, but you have to use it correctly and not misuse it in order to try to generate that answer.
In the Advertising industry, McCann Worldgroup used AI to make individual signs and menus for owners of Mexican hot dog and hamburger stands.
John Gapper, reporting for Financial Times:
the marketing group McCann Worldgroup used AI to make 42,000 individual signs and menus for 8,400 owners of Mexican hot dog and hamburger stands who are customers of its client Bimbo, the bakery group. While having an AI-designed fast food display cannot put you on a par with McDonald’s or KFC, it all helps.
More information about this is provided by LBB:
“It’s relatively easy to use the latest round of generative AI tools to produce stunning combinations of visuals and words,” says Ian Mackenzie, chief creative officer, Performance Art. “What’s much harder is using the tools to express the spirit of a brand like Bimbo, at scale, with cohesion across thousands of pieces of high visibility brand collateral. With the ‘Greatest Guide,’ we’re unlocking the power of generative AI for our client, while keeping the focus squarely on its customers and their highly creative food. At a time when generative AI is the story, we’re using its full power to shape local businesses and help drive economic impact.”
The inaugural promotional campaign for ‘Greatest Guide’ included 350 pieces of content that invited people to use the map to visit restaurants and carts throughout Mexico, reaching more than 77K visits and an average engagement rate of 10.85%, in addition to a 12% increase in sales in these stalls and carts—breaking sales record in the weeks of campaign implementation. Finally achieving total sales of the category in special channels that was +23% growth vs AA.
In the Pharmaceutical industry, Insilico Medicine started the human trial phase of a lung disease drug designed by generative AI.
Jamie Smyth, reporting for Financial Times:
Insilico Medicine, which was founded by Latvian-born scientist Alex Zhavoronkov, said it had dosed a patient in China with a novel therapy to treat the chronic lung disease idiopathic pulmonary fibrosis.
The company said the drug, INS018_055, was the first entirely “AI-discovered and AI designed” drug to begin a phase 2 clinical trial and represented an important milestone for the industry.
“Our company, and it’s a big, bold claim, can double the productivity of pretty much every big pharma company”.
Insilico is one of a new generation of biotechs, which have collectively raised billions of dollars to develop AI tools aimed at revolutionising drug development. It is part of a race by Big Pharma and investors to capitalise on a $50bn market opportunity for AI in the sector, according to a report by Morgan Stanley.
Zhavoronkov said Insilico’s AI platforms could potentially halve the time it took to discover drugs and slash the cost of bringing medicines to market — estimated by Deloitte at $2.3bn on average per therapy.
Sanofi, Fosun and Johnson & Johnson were among several pharma companies that had signed partnership deals that provided access to Insilico’s technology, he said.
Insilico uses generative AI to rapidly select novel drug targets and then design new molecules that can target a particular disease.
Zhavoronkov said Insilico’s AI could save two to four years in pre-clinical discovery depending on the novelty and complexity of the target. It did not save a lot of time in clinical development but improved the probability of success of a drug because of better chemistry and target choice. Insilico also used AI to recruit patients who were more likely to respond to the therapy, he said.
There are no guarantees AI-discovered drugs or the platforms that create them will be successful, and some critics warn the technology’s potential is overhyped. Last month Benevolent AI, a London-based biotech with an AI drug discovery platform, said it would lay off 180 staff, almost half its workforce, following the failure of its lead drug candidate.
Insilico conducted phase 1 trials on INS018_055 in New Zealand and China, which it said demonstrated favourable results that supported a phase 2 trial. This mid-stage trial will recruit 60 people with IPF in China and the US to assess the safety, tolerability and preliminary efficacy of the drug.
In the Defense industry, the US Air Force is testing various large language models, including Scale AI Donovan.
Katrina Manson, reporting for Bloomberg:
Five of these are being put through the paces as part of a broader series of Defense Department experiments that are focused on developing data integration and digital platforms across the military. The exercises are run by the Pentagon’s digital and AI office and military top brass, with participation from US allies. The Pentagon won’t say which LLMs are in testing, though Scale AI, a San Francisco-based startup, says its new Donovan product is among the LLM platforms being tested.
The use of LLMs would represent a major shift for the military, where so little is digitized or connected. Currently, making a request for information to a specific part of the military can take several staffers hours or even days to complete, as they jump on phones or rush to make slide decks, Strohmeyer says.
In one test, one of the AI tools completed a request in 10 minutes.
“That doesn’t mean it’s ready for primetime right now. But we just did it live. We did it with secret-level data,” he says of the experiment, adding it could be deployed by the military in the very near term.
Strohmeyer says they have fed the models with classified operational information to inform sensitive questions. The long-term aim of such exercises is to update the US warhorse so it can use AI-enabled data in decision-making, sensors and ultimately firepower.
The military exercise, which runs until July 26, will also serve as a test of whether military officials can use LLMs to generate entirely new options they’ve never considered.
For now, the US military team will experiment by asking LLMs for help planning the military’s response to an escalating global crisis that starts small and then shifts into the Indo-Pacific region.
In a demonstration based on feeding the model with 60,000 pages of open-source data, including US and Chinese military documents, Bloomberg News asked Scale AI’s Donovan whether the US could deter a Taiwan conflict, and who would win if war broke out. A series of bullet points with explanations came back within seconds.
Notice the similarities with the Bridgewater Associates story: LLMs are used to formulate new theories that humans then consider.
Both Bridgewater and the US Air Force are perfectly aware of the biases and risks associated with LLMs. They are full of competent people well-versed in emerging technologies. It’s telling that, despite that, they are keenly experimenting with generative AI.
P.S.: We already saw the US Air Force testing AI for at least one other use case in Issue #13 – Hey, great news, you potentially are a problem gambler: autopiloting F16 fighter jets.