---
title: "Drafting Climate Guidance with RAG: What Worked, What Didn't"
date: 2026-04-13T16:34:39Z
modified: 2026-04-13T16:34:39Z
permalink: "https://labs.chancerylaneproject.org/2026/04/13/drafting-climate-guidance-with-rag-what-worked-what-didnt/"
type: post
status: publish
excerpt: ""
wpid: 208
categories:
  - Uncategorized
---

Here at The Chancery Lane Project, [we are in the process of building a knowledge graph](https://chancerylaneproject.org/news/our-next-steps-in-scaling-the-impact-of-climate-aligned-contracting/). This graph will give our content a more intuitive structure, allowing for traversal and relationship discovery by machine learning models, namely large language models (LLMs).

For me, as a machine learning engineer (MLE), this prospect is exciting. Currently, any leveraging of our store of high-quality text data is made possible only by embedding that text into numeric space. We have some metadata related to our content, such as jurisdiction and practice area for our clauses, but for the most part, we know only what is stated in the text.

But we also have so much institutional knowledge. And the knowledge graph is designed to codify and make explicit those relationships between pieces of information that currently sit in the heads of our team of lawyers. For more on that project, you can read [the blog announcing the initiative](https://chancerylaneproject.org/news/our-next-steps-in-scaling-the-impact-of-climate-aligned-contracting/).

In the meantime, however, we wanted to see what we could do with _just the semantic embedding of our current content._ If we’re still a couple of steps away from being able to use GraphRAG, what can we learn by doing simple RAG?

![](https://labs.chancerylaneproject.org/wp-content/uploads/2026/03/imported-image-RvO.png)

Image borrowed from [Kaan Sezen](https://medium.com/@kaantruk1923/rag-vs-graphrag-which-one-fits-better-for-your-use-case-c33b5b322d3f)

## **What is RAG?**

Retrieval Augmented Generation (RAG) is, basically, a fancy version of prompting an LLM. As a technique, it accomplishes little more than a standard user, using the standard user interface of their LLM of choice, could do with manual labor and copy-and-paste skills.

The big benefit of RAG is that it automates what would be an arduous, manual process, performs it at scale, and selects content to ‘copy-and-paste’ in a way that is native to the machine.

So let’s talk a bit more about what actually happens in a RAG system.

Typically, when a user queries an LLM, the model takes the user’s query and embeds it into numeric space. This embedding can then be compared, using similarity measures such as cosine similarity, to other embedded entities. The embedding captures not just the meaning of the individual words in the query but also the relationships between them.

It happens in high dimensions, but if we want to think about it in a 2D space, it might look something like this:

![](https://labs.chancerylaneproject.org/wp-content/uploads/2026/03/imported-image-Zb.png)

Image borrowed from [datadocs](https://polakowo.io/datadocs/docs/deep-learning/word-embeddings)

_Man_ is to _woman_ as _king_ is to _queen_. _Man_ is to _king_ as _woman_ is to _queen_. These words are embedded in relationship to one another, and when queried, we find that _woman_ sits a similar distance from _queen_ and _man_, but further from _king_. Real embeddings operate in many more dimensions and capture relationships as well as the words themselves, but this example illustrates the basics of the method.
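
To make that concrete, here is a toy sketch in Python of the same analogy. The two-dimensional coordinates are invented purely for illustration; real embedding models place words (and whole passages) in hundreds of dimensions.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Similarity of direction between two vectors (1.0 = identical direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Invented toy coordinates: the first axis loosely encodes 'royalty',
# the second loosely encodes 'gender'.
man   = np.array([1.0, 1.0])
woman = np.array([1.0, 5.0])
king  = np.array([5.0, 1.0])
queen = np.array([5.0, 5.0])

# The classic word-analogy arithmetic: king - man + woman lands on queen.
analogy = king - man + woman
print(cosine_similarity(analogy, queen))  # 1.0 in this toy example
print(cosine_similarity(analogy, king))   # lower, roughly 0.83
```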

So, an LLM always embeds a user’s query in the manner discussed above. The difference in a RAG system is that we embed not only the queries but also pre-populated documents of interest. These documents are broken into smaller chunks, each of which is itself cast into that numeric space. Then, when a user query comes in, we can retrieve (that’s the R in RAG) the chunks that are most numerically similar to the given query. In theory, this practice augments (there’s the A) the generation (and the G) of content from the LLM, because the model is relying on the parts of larger documents that are most similar (and, theoretically, most relevant) to a user’s query.
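
As a rough illustration of the retrieval step, the sketch below chunks documents, embeds the chunks, and returns the _k_ chunks closest to a query. The embedding model, chunk size, and _k_ are illustrative choices rather than our production settings; any embedding model and chunking strategy could be slotted in.

```python
import numpy as np
from sentence_transformers import SentenceTransformer  # any embedding model would do

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice

def chunk(document: str, size: int = 500) -> list[str]:
    """Naive fixed-size chunking; real systems often split on document structure."""
    return [document[i:i + size] for i in range(0, len(document), size)]

def retrieve(query: str, chunks: list[str], k: int = 5) -> list[str]:
    """Return the k chunks whose embeddings are most similar to the query."""
    query_vec = model.encode([query])[0]
    chunk_vecs = model.encode(chunks)
    # Cosine similarity between the query and every chunk.
    sims = chunk_vecs @ query_vec / (
        np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(query_vec)
    )
    top = np.argsort(sims)[::-1][:k]
    return [chunks[i] for i in top]

# The retrieved chunks are then placed into the prompt sent to the LLM:
# that is the 'augmentation' of its generation.
```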

That’s the theory. So does it work?

Well, sometimes. We performed this process twice, and we got two very different results.

### **Why not use existing RAG tools?**

There are tools online, like NotebookLM, that can perform RAG for you. A service like this takes the provided content, embeds it, and pulls the top _k_ relevant chunks when a user submits a query.

But there are limitations with these generic tools. For one, the astute reader will notice the stand-in _k_ in the prior paragraph. We don’t actually know, and cannot control, how many chunks these generic RAG tools are pulling.

Further, while you can select which documents are queried for a given question, you can’t design a system that pulls different documents depending on the desired use case. And, you definitely cannot change the _k_ or the re-ranking algorithm depending on the instance of retrieval and generation.

Put simply, in a bespoke system, you make all the rules and choices. In a generic system, someone else decides the rules, the choices, and even what you can know about those rules and choices. And those decisions are not necessarily aligned with your use case or organisational requirements.
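
To illustrate the kind of control a bespoke system gives you, here is a sketch of per-use-case retrieval settings. The use cases, document sets, and values are invented for illustration; the point is simply that the searched subset, _k_, and re-ranking are all yours to decide, per retrieval.

```python
from dataclasses import dataclass

@dataclass
class RetrievalConfig:
    document_sets: list[str]  # which subsets of the store to search
    k: int                    # how many chunks to retrieve
    rerank: bool              # whether to apply a re-ranking step

# Hypothetical use cases, each with its own retrieval rules.
USE_CASES = {
    "guide_drafting": RetrievalConfig(
        document_sets=["tclp_guides", "jurisdiction_context"], k=8, rerank=True
    ),
    "clause_lookup": RetrievalConfig(
        document_sets=["clauses"], k=3, rerank=False
    ),
}

def config_for(use_case: str) -> RetrievalConfig:
    """In a generic hosted tool, none of these choices is exposed to you."""
    return USE_CASES[use_case]
```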

In law, this flexibility is paramount, as legal systems and content must be intelligible and defensible. In sustainability, custom builds can choose greener LLMs (to the extent they are made public), test answer quality against energy intensity, and build systems that scale on the sleekest infrastructure.

## **Experiment #1: California**

For our first experiment, we aimed to generate TCLP-style content to meet demand for guidance amid California’s changing corporate compliance landscape. The California legislature had introduced SB253 and SB261, which would mandate that large entities doing business in California disclose greenhouse gas emissions and climate-related financial risks. However, even when we began this project about eight months ago, there were noises about challenges and revisions to these proposed laws. So, we wanted to create something flexible and dynamic that could update as changes rolled in. Meanwhile, we were curious about trying out a RAG with our existing content.

Conceptually, the idea was to take a TCLP guide, add in content about the Californian context, and output a California guide (if woman + royal = queen, maybe TCLP guide + California context = TCLP California guide).

We stored the data as shown in the figure below, prioritising the delineation of specific types of content so that each could be given to the model at the point in guide generation where it would be most useful. That way, the model was not only retrieving relevant context for the query at hand but also querying the right subset of information. This structure also made it easier to target updates as we received feedback and as things changed in California.

![](https://labs.chancerylaneproject.org/wp-content/uploads/2026/03/1-1024x576.png)

A visualisation of the data storage space for the California Project.

Once we had the data stored in this way, we guardrailed the generation of the guide via a pipeline of messages coordinated with the general format of a TCLP guide. This kind of structured, guardrailed generation is only possible in a custom-built system, especially the ability to constrain and refine prompts when instructions are breached.
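
A minimal sketch of that pipeline is below. The section names, content types, and the `store.retrieve` and `llm.generate` helpers are stand-ins showing the shape of the approach, not our actual implementation: each section of the guide retrieves only from the content types relevant to it and is drafted under its own instructions.

```python
# Illustrative pipeline: one retrieval-and-generation step per guide section.
GUIDE_PIPELINE = [
    {"section": "Background",        "content_types": ["california_context", "news"]},
    {"section": "Key obligations",   "content_types": ["legislation", "official_guidance"]},
    {"section": "Contract drafting", "content_types": ["tclp_guides", "clauses"]},
]

def generate_guide(topic: str, store, llm) -> str:
    sections = []
    for step in GUIDE_PIPELINE:
        # Retrieve only from the subset of the store relevant to this section.
        context = store.retrieve(topic, content_types=step["content_types"], k=5)
        # Constrain the model to the section being drafted and to that context.
        prompt = (
            f"Draft the '{step['section']}' section of a TCLP-style guide on {topic}.\n"
            f"Use only the context provided below.\n\n{context}"
        )
        sections.append(llm.generate(prompt))
    return "\n\n".join(sections)
```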

Once an initial guide was generated, it was tweaked for format by our internal team, and then we received expert feedback from lawyers and professionals working closely with these proposed regulations in California.

As we received their feedback and updates were made in California, we updated the guide accordingly.

The guide was well-received and was [published in full as part of this blog on our main website](https://chancerylaneproject.org/news/guidance-for-california-climate-disclosure-laws/).

## **Experiment #2: International Sustainability Standards Board (ISSB)**

Using the same backbone and guided by what worked well and what could use refinement from the California project, we then tackled guiding those attempting to align with ISSB standards.

Immediately, as the person building this system for the ISSB context, I had more concerns.

In contrast to the California project, where we had started with written guidance, we began the ISSB project with interviews with experts. These interviews were designed to form the foundation of our document store, but they also made clear something that was not true in the California experiment: ISSB guidance is wide-ranging, jurisdictionally fragmented, deeply interwoven with other standards, and has evolved over time.

Unlike California, where we could fairly cleanly provide all documents that would be relevant, I felt immediately out of my depth on gathering the sources that would be required to give the RAG system the context it needed.

I attempted to structure a similar, divided space of data with the following delineations:

- Accounting standards
- Official guidance
- Interviews
- Jurisdictional profiles
- News
- IFRS
- Other guidance

But there was an additional source of uncertainty: I did not have a good sense of the structure that would dictate this guide, and I did not know where to direct the model to use these sources.

When we received less-than-glowing feedback for the document created from this process, the undertone of the comments was, ‘The LLM did a terrible job,’ but as the MLE behind this project, I knew it was more nuanced than that. I knew we had not created a structure that would have allowed the model to succeed.

Where California was straightforward and discrete, the ISSB process was fragmented and wide-ranging. The difference in complexity was like asking an intern to summarise a new internal process from 20 comprehensive documents versus asking them to map the entire competitive landscape for your company.

_To hammer home my point: this failure was not a failure of the model. It was a failure of how we used the model._ And using a different model will not help. Using a generic RAG will not improve results. It is the workflow that must improve, as imagined below.

## **The ideal workflow (for TCLP)**

Undoubtedly, the relational context provided by the knowledge graph mentioned at the beginning will improve our RAG system’s ability to find relevant information. No longer will retrieval strategies have to be precisely hand-coded by non-domain experts like me.

If we can integrate our knowledge graph with those being [developed by friends at Climate Policy Radar](https://github.com/climatepolicyradar/knowledge-graph) and like organisations, we suddenly have a much wider-ranging and better-connected exploration space. Additionally, I anticipate foundation models will begin allowing users to link their LLM instances directly to these structured knowledge sources through their interfaces, reducing the need for bespoke systems. We’re already moving in that direction with protocols like [Model Context Protocol (MCP)](https://www.anthropic.com/news/model-context-protocol).

However, for now, domains like ours still require thoughtful RAG design and implementation from both domain experts and machine learning engineers.

The system, as I imagine it, looks something like this:

![](https://labs.chancerylaneproject.org/wp-content/uploads/2026/03/Establishing-the-structure-Domain-experts-should-be-gathered-to-define-the-structure-they-would-expect-from-a-guide-or-other-piece-of-content-1024x659.png)

An imagined human-in-the-loop workflow.

The lessons from California and ISSB are instructive: successful RAG systems in complex domains require not just better models or more data, but better collaboration between those who understand the content and those who understand the systems.

If you are interested in learning more about how we are using knowledge sources, LLMs, and human experts to generate climate-aligned content dynamically and flexibly, [reach out](mailto:georgia.ray@chancerylaneproject.org). If you are looking for a less technical, more domain-focused version of this blog, please look [here](https://chancerylaneproject.org/news/guidance-for-california-climate-disclosure-laws/) and [here](https://chancerylaneproject.org/news/navigate-california-laws-using-contracts/).