{"id":2999,"date":"2025-04-10T07:02:46","date_gmt":"2025-04-10T07:02:46","guid":{"rendered":"https:\/\/mailitics.com\/index.php\/2025\/04\/10\/deb8flow-orchestrating-autonomous-ai-debates-with-langgraph-and-gpt-4o\/"},"modified":"2025-04-10T07:02:46","modified_gmt":"2025-04-10T07:02:46","slug":"deb8flow-orchestrating-autonomous-ai-debates-with-langgraph-and-gpt-4o","status":"publish","type":"post","link":"https:\/\/mailitics.com\/index.php\/2025\/04\/10\/deb8flow-orchestrating-autonomous-ai-debates-with-langgraph-and-gpt-4o\/","title":{"rendered":"Deb8flow: Orchestrating Autonomous AI Debates with LangGraph and GPT-4o"},"content":{"rendered":"<div>\n<h2 class=\"wp-block-heading\">Introduction<\/h2>\n<p class=\"wp-block-paragraph\">I\u2019ve always been fascinated by debates: the strategic framing, the sharp retorts, and the carefully timed comebacks. Debates aren\u2019t just entertaining; they\u2019re structured battles of ideas, driven by logic and evidence. Recently, I started wondering: could we replicate that dynamic using AI agents, having them debate each other autonomously, complete with real-time fact-checking and moderation? 
The result was <strong>Deb8flow<\/strong>, an autonomous AI debating environment powered by <strong><a href=\"https:\/\/langchain-ai.github.io\/langgraph\/tutorials\/introduction\/\">LangGraph<\/a><\/strong>, OpenAI\u2019s <strong>GPT-4o<\/strong> model, and the new integrated <strong><a href=\"https:\/\/platform.openai.com\/docs\/guides\/tools-web-search?api-mode=chat\">Web Search<\/a><\/strong> feature.<\/p>\n<p class=\"wp-block-paragraph\">In Deb8flow, two agents\u2014Pro and Con\u2014square off on a given topic while a Moderator manages turn-taking. A dedicated Fact Checker reviews every claim in real time using GPT-4o\u2019s new browsing capabilities, and a final Judge evaluates the arguments for quality and coherence. If an agent repeatedly makes factual errors, they\u2019re automatically disqualified\u2014ensuring the debate stays grounded in truth.<\/p>\n<p class=\"wp-block-paragraph\">This article offers an in-depth look at the advanced architecture and dynamic workflows that power autonomous AI debates. I\u2019ll walk you through how Deb8flow\u2019s modular design leverages LangGraph\u2019s state management and conditional routing, alongside GPT-4o\u2019s capabilities.<\/p>\n<p class=\"wp-block-paragraph\">Even if you\u2019re new to AI agents or LangGraph (see resources [1] and [2] for primers), I\u2019ll explain the key concepts clearly. 
And if you\u2019d like to explore further, the full project is available on <a class=\"\" href=\"https:\/\/github.com\/iason-solomos\/Deb8flow\"><strong>GitHub: iason-solomos\/Deb8flow<\/strong><\/a>.<\/p>\n<p class=\"wp-block-paragraph\">Ready to see how AI agents can debate autonomously in practice?<\/p>\n<p class=\"wp-block-paragraph\"><strong>Let\u2019s dive in.<\/strong><\/p>\n<h2 class=\"wp-block-heading\">High-Level Overview: Autonomous Debates with Multiple Agents<\/h2>\n<p class=\"wp-block-paragraph\">In Deb8flow, we orchestrate a <strong>formal debate<\/strong> between two AI agents \u2013 one arguing <strong>Pro<\/strong> and one <strong>Con<\/strong> \u2013 complete with a <strong>Moderator<\/strong>, a <strong>Fact Checker<\/strong>, and a final <strong>Judge<\/strong>. The debate unfolds autonomously, with each agent playing a role in a structured format.<\/p>\n<p class=\"wp-block-paragraph\">At its core, Deb8flow is a LangGraph-powered agent system, built atop LangChain, using GPT-4o to power each role: Pro, Con, Judge, and beyond. We use GPT-4o\u2019s preview model with browsing capabilities to enable real-time fact-checking. In essence, the Pro and Con agents debate; after each statement, a fact-checker agent uses GPT-4o\u2019s web search to catch any hallucinations or inaccuracies in that statement <strong>in real time<\/strong>. The debate only continues once the statement is verified. 
The whole process is coordinated by a LangGraph-defined workflow that ensures proper turn-taking and conditional logic.<\/p>\n<p class=\"wp-block-paragraph\">\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/04\/workflow-diagram.png?ssl=1\" alt=\"\" class=\"wp-image-600973\"><figcaption class=\"wp-element-caption\"><em>High-level debate flow graph. Each rectangle is an agent node (Pro\/Con debaters, Fact Checker, Judge, etc.), and diamonds are control nodes (Moderator and a router after fact-checking). Solid arrows denote the normal progression, while dashed arrows indicate retries if a claim fails fact-check. The Judge node outputs the final verdict, then the workflow ends.<\/em> <br \/>Image generated by the author with DALL-E<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\"> The debate workflow goes through these stages:<\/p>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">\n<strong>Topic Generation:<\/strong> A Topic Generator agent produces a nuanced, debatable topic for the session (e.g. 
<em>\u201cShould AI be used in classroom education?\u201d<\/em>).<\/li>\n<li class=\"wp-block-list-item\">\n<strong>Opening: <\/strong>The <strong>Pro Argument<\/strong> agent makes an opening statement in favor of the topic, kicking off the debate.<\/li>\n<li class=\"wp-block-list-item\">\n<strong>Rebuttal: <\/strong>The <strong>Debate Moderator<\/strong> then gives the floor to the <strong>Con Argument<\/strong> agent, who rebuts the Pro\u2019s opening statement.<\/li>\n<li class=\"wp-block-list-item\">\n<strong>Counter: <\/strong>The Moderator gives the floor back to the <strong>Pro<\/strong> agent, who counters the Con agent\u2019s points.<\/li>\n<li class=\"wp-block-list-item\">\n<strong>Closing: <\/strong>The Moderator switches the floor to the <strong>Con<\/strong> agent one last time for a closing argument.<\/li>\n<li class=\"wp-block-list-item\">\n<strong>Judgment: <\/strong>Finally, the <strong>Judge<\/strong> agent reviews the full debate history and evaluates both sides based on argument quality, clarity, and persuasiveness. The most convincing side wins.<\/li>\n<\/ul>\n<p class=\"wp-block-paragraph\">After <strong>every single speech<\/strong>, the <strong>Fact Checker<\/strong> agent steps in to verify the factual accuracy of that statement. If a debater\u2019s claim doesn\u2019t hold up (e.g. it cites a wrong statistic or \u201challucinates\u201d a fact), the workflow triggers a <em>retry<\/em>: the speaker has to correct or modify their statement. (If either debater accumulates 3 fact-check failures, they are <strong>automatically disqualified<\/strong> for repeatedly spreading inaccuracies, and their opponent wins by default.) 
This mechanism keeps our AI debaters honest and grounded in reality!<\/p>\n<h2 class=\"wp-block-heading\">Prerequisites and Setup<\/h2>\n<p class=\"wp-block-paragraph\">Before diving into the code, make sure you have the following in place:<\/p>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">\n<strong>Python 3.12+<\/strong> installed.<\/li>\n<li class=\"wp-block-list-item\">An <strong>OpenAI API key<\/strong> with access to the GPT-4o model. You can create your own API key here: <a href=\"https:\/\/platform.openai.com\/settings\/organization\/api-keys\">https:\/\/platform.openai.com\/settings\/organization\/api-keys<\/a>\n<\/li>\n<li class=\"wp-block-list-item\">\n<strong>Project Code<\/strong>: Clone the Deb8flow repository from GitHub (<code>git clone https:\/\/github.com\/iason-solomos\/Deb8flow.git<\/code>). The repo includes a <code>requirements.txt<\/code> for all required packages. Key dependencies include LangChain\/LangGraph (for building the agent graph) and the OpenAI Python client.<\/li>\n<li class=\"wp-block-list-item\">\n<strong>Install Dependencies<\/strong>: In your project directory, run: <code>pip install -r requirements.txt<\/code> to install the necessary libraries.<\/li>\n<li class=\"wp-block-list-item\">Create a <code>.env<\/code> file in the project root to hold your OpenAI API credentials. It should be of the form: <code>OPENAI_API_KEY_GPT4O = \"sk-\u2026\"<\/code>\n<\/li>\n<li class=\"wp-block-list-item\">You can also at any time check out the README file: <a href=\"https:\/\/github.com\/iason-solomos\/Deb8flow\">https:\/\/github.com\/iason-solomos\/Deb8flow<\/a> if you simply want to run the finished app.<\/li>\n<\/ul>\n<p class=\"wp-block-paragraph\">Once dependencies are installed and the environment variable is set, you should be ready to run the app. 
The project structure is organized for clarity:<\/p>\n<p class=\"wp-block-paragraph\">Deb8flow\/<br \/>\u251c\u2500\u2500 configurations\/<br \/>\u2502 \u251c\u2500\u2500 debate_constants.py<br \/>\u2502 \u2514\u2500\u2500 llm_config.py<br \/>\u251c\u2500\u2500 nodes\/<br \/>\u2502 \u251c\u2500\u2500 base_component.py<br \/>\u2502 \u251c\u2500\u2500 topic_generator_node.py<br \/>\u2502 \u251c\u2500\u2500 pro_debater_node.py<br \/>\u2502 \u251c\u2500\u2500 con_debater_node.py<br \/>\u2502 \u251c\u2500\u2500 debate_moderator_node.py<br \/>\u2502 \u251c\u2500\u2500 fact_checker_node.py<br \/>\u2502 \u251c\u2500\u2500 fact_check_router_node.py<br \/>\u2502 \u2514\u2500\u2500 judge_node.py<br \/>\u251c\u2500\u2500 prompts\/<br \/>\u2502 \u251c\u2500\u2500 topic_generator_prompts.py<br \/>\u2502 \u251c\u2500\u2500 pro_debater_prompts.py<br \/>\u2502 \u251c\u2500\u2500 con_debater_prompts.py<br \/>\u2502 \u2514\u2500\u2500 \u2026 (prompts for other agents)<br \/>\u251c\u2500\u2500 tests\/ (contains unit and whole workflow tests)<br \/>\u2514\u2500\u2500 debate_workflow.py<\/p>\n<p class=\"wp-block-paragraph\">A quick tour of this structure:<\/p>\n<p class=\"wp-block-paragraph\"><strong><code>configurations\/<\/code><\/strong> holds constant definitions and LLM configuration classes.<\/p>\n<p class=\"wp-block-paragraph\"><strong><code>nodes\/<\/code><\/strong> contains the implementation of each agent or functional node in the debate (each of these is a module defining one agent\u2019s behavior).<\/p>\n<p class=\"wp-block-paragraph\"><strong><code>prompts\/<\/code><\/strong> stores the prompt templates for the language model (so each agent knows how to prompt GPT-4o for its specific task).<\/p>\n<p class=\"wp-block-paragraph\"><strong><code>debate_workflow.py<\/code><\/strong> ties everything together by defining the LangGraph workflow (the graph of nodes and transitions).<\/p>\n<p class=\"wp-block-paragraph\"><strong><code>debate_state.py<\/code><\/strong> defines the 
shared data structure that the agents will be using on each run.<\/p>\n<p class=\"wp-block-paragraph\"><strong><code>tests\/<\/code><\/strong> includes some basic tests and example runs to help you verify everything is working.<\/p>\n<h2 class=\"wp-block-heading\">Under the Hood: State Management and Workflow Setup<\/h2>\n<p class=\"wp-block-paragraph\">To coordinate a complex multi-turn debate, we need a shared state and a well-defined flow. We\u2019ll start by looking at how Deb8flow defines the <strong>debate state<\/strong> and constants, and then see how the <strong>LangGraph workflow<\/strong> is constructed.<\/p>\n<h3 class=\"wp-block-heading\">Defining the Debate State Schema (<code>debate_state.py<\/code>)<\/h3>\n<p class=\"wp-block-paragraph\">Deb8flow uses a <strong>shared state<\/strong> (<a href=\"https:\/\/langchain-ai.github.io\/langgraph\/concepts\/low_level\/#state\">https:\/\/langchain-ai.github.io\/langgraph\/concepts\/low_level\/#state <\/a>) in the form of a Python <code>TypedDict<\/code> that all agents can read from and update. This state tracks the debate\u2019s progress and context \u2013 things like the topic, the history of messages, whose turn it is, etc. By centralizing this information, each agent node can make decisions based on the current state of the debate.<\/p>\n<p class=\"wp-block-paragraph\">Link: <a href=\"https:\/\/github.com\/iason-solomos\/Deb8flow\/blob\/main\/debate_state.py\"><strong>debate_state.py<\/strong><\/a><\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">from typing import TypedDict, List, Dict, Literal\n\n\nDebateStage = Literal[\"opening\", \"rebuttal\", \"counter\", \"final_argument\"]\n\nclass DebateMessage(TypedDict):\n    speaker: str  # e.g. 
pro or con\n    content: str  # The message each speaker produced\n    validated: bool  # Whether the FactChecker ok\u2019d this message\n    stage: DebateStage # The stage of the debate when this message was produced\n\nclass DebateState(TypedDict):\n    debate_topic: str\n    positions: Dict[str, str]\n    messages: List[DebateMessage]\n    opening_statement_pro_agent: str\n    stage: str  # \"opening\", \"rebuttal\", \"counter\", \"final_argument\"\n    speaker: str  # \"pro\" or \"con\"\n    times_pro_fact_checked: int # The number of times the pro agent has been fact-checked. If it reaches 3, the pro agent is disqualified.\n    times_con_fact_checked: int # The number of times the con agent has been fact-checked. If it reaches 3, the con agent is disqualified.<\/code><\/pre>\n<p class=\"wp-block-paragraph\">Key fields that we need to have in the <code>DebateState<\/code> include:<\/p>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">\n<code>debate_topic<\/code> (str): The topic being debated.<\/li>\n<li class=\"wp-block-list-item\">\n<code>messages<\/code> (List[DebateMessage]): A list of all messages exchanged so far. Each message is a dictionary with fields for <code>speaker<\/code> (e.g. 
<code>\"pro\"<\/code>, <code>\"con\"<\/code>, or <code>\"fact_checker\"<\/code>), the message <code>content<\/code> (text), a <code>validated<\/code> flag (whether it passed fact-check), and the <code>stage<\/code> of the debate when it was produced.<\/li>\n<li class=\"wp-block-list-item\">\n<code>stage<\/code> (str): The current debate stage (one of <code>\"opening\"<\/code>, <code>\"rebuttal\"<\/code>, <code>\"counter\"<\/code>, <code>\"final_argument\"<\/code>).<\/li>\n<li class=\"wp-block-list-item\">\n<code>speaker<\/code> (str): Whose turn it is currently (<code>\"pro\"<\/code> or <code>\"con\"<\/code>).<\/li>\n<li class=\"wp-block-list-item\">\n<code>times_pro_fact_checked<\/code> \/ <code>times_con_fact_checked<\/code> (int): Counters for how many times each side has failed a fact-check. (Under our rules, a debater who fails fact-check 3 times is disqualified, and their opponent wins by default.)<\/li>\n<li class=\"wp-block-list-item\">\n<code>positions<\/code> (Dict[str, str]): (Optional) A mapping of each side\u2019s general stance (e.g., <code>\"pro\": \"In favor of the topic\"<\/code>).<\/li>\n<\/ul>\n<p class=\"wp-block-paragraph\">With the debate\u2019s state structured this way, agents can easily access the conversation history or check the current stage, and the control logic can update the state between turns. The state is essentially the memory of the debate.<\/p>\n<h3 class=\"wp-block-heading\">Constants and Configuration<\/h3>\n<p class=\"wp-block-paragraph\">To avoid \u201cmagic strings\u201d scattered in the code, we define some constants in <code>debate_constants.py<\/code>. For example, constants for stage names (<code>STAGE_OPENING = \"opening\"<\/code>, etc.), speaker identifiers (<code>SPEAKER_PRO = \"pro\"<\/code>, <code>SPEAKER_CON = \"con\"<\/code>, etc.), and node names (<code>NODE_PRO_DEBATER = \"pro_debater_node\"<\/code>, etc.). 
These make the code easier to maintain and read.<\/p>\n<p class=\"wp-block-paragraph\"><a href=\"https:\/\/github.com\/iason-solomos\/Deb8flow\/blob\/main\/configurations\/debate_constants.py\"><strong>debate_constants.py<\/strong><\/a><strong>:<\/strong><\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\"># Stage names\nSTAGE_OPENING = \"opening\"\nSTAGE_REBUTTAL = \"rebuttal\"\nSTAGE_COUNTER = \"counter\"\nSTAGE_FINAL_ARGUMENT = \"final_argument\"\nSTAGE_END = \"end\"\n\n# Speakers\nSPEAKER_PRO = \"pro\"\nSPEAKER_CON = \"con\"\nSPEAKER_JUDGE = \"judge\"\n\n# Node names\nNODE_PRO_DEBATER = \"pro_debater_node\"\nNODE_CON_DEBATER = \"con_debater_node\"\nNODE_DEBATE_MODERATOR = \"debate_moderator_node\"\nNODE_JUDGE = \"judge_node\"\n<\/code><\/pre>\n<p class=\"wp-block-paragraph\">We also set up LLM configuration in <a href=\"https:\/\/github.com\/iason-solomos\/Deb8flow\/blob\/main\/configurations\/llm_config.py\">llm_config.py<\/a>. Here, we define classes for OpenAI or Azure OpenAI configs and then create a dictionary <code>llm_config_map<\/code> mapping model names to their config. For instance, we map <code>\"gpt-4o\"<\/code> to an <code>OpenAILLMConfig<\/code> that holds the model name and API key. This way, whenever we need to initialize a GPT-4o agent, we can just do <code>llm_config_map[\"gpt-4o\"]<\/code> to get the right config. 
All our main agents (debaters, topic generator, judge) use this same GPT-4o configuration.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">import os\nfrom dataclasses import dataclass\nfrom typing import Union\n\n@dataclass\nclass OpenAILLMConfig:\n    \"\"\"\n    A data class to store configuration details for OpenAI models.\n\n    Attributes:\n        model_name (str): The name of the OpenAI model to use.\n        openai_api_key (str): The API key for authenticating with the OpenAI service.\n    \"\"\"\n    model_name: str\n    openai_api_key: str\n\n\nllm_config_map = {\n    \"gpt-4o\": OpenAILLMConfig(\n        model_name=\"gpt-4o\",\n        openai_api_key=os.getenv(\"OPENAI_API_KEY_GPT4O\"),\n    )\n}\n<\/code><\/pre>\n<h3 class=\"wp-block-heading\">Building the LangGraph Workflow (<code>debate_workflow.py<\/code>)<\/h3>\n<p class=\"wp-block-paragraph\">With state and configs in place, we construct the <a href=\"https:\/\/github.com\/iason-solomos\/Deb8flow\/blob\/main\/workflow\/debate_workflow.py\">debate workflow graph<\/a>. LangGraph\u2019s <strong>StateGraph<\/strong> is the backbone that connects all our agent nodes in the order they should execute. 
Here\u2019s how we set it up:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">from langgraph.graph import END, StateGraph\n\n# DebateState, llm_config_map, and the node classes are imported from the\n# project modules (debate_state.py, configurations\/, nodes\/).\n\n\nclass DebateWorkflow:\n\n    def _initialize_workflow(self) -&gt; StateGraph:\n        workflow = StateGraph(DebateState)\n        # Nodes\n        workflow.add_node(\"generate_topic_node\", GenerateTopicNode(llm_config_map[\"gpt-4o\"]))\n        workflow.add_node(\"pro_debater_node\", ProDebaterNode(llm_config_map[\"gpt-4o\"]))\n        workflow.add_node(\"con_debater_node\", ConDebaterNode(llm_config_map[\"gpt-4o\"]))\n        workflow.add_node(\"fact_check_node\", FactCheckNode())\n        workflow.add_node(\"fact_check_router_node\", FactCheckRouterNode())\n        workflow.add_node(\"debate_moderator_node\", DebateModeratorNode())\n        workflow.add_node(\"judge_node\", JudgeNode(llm_config_map[\"gpt-4o\"]))\n\n        # Entry point\n        workflow.set_entry_point(\"generate_topic_node\")\n\n        # Flow\n        workflow.add_edge(\"generate_topic_node\", \"pro_debater_node\")\n        workflow.add_edge(\"pro_debater_node\", \"fact_check_node\")\n        workflow.add_edge(\"con_debater_node\", \"fact_check_node\")\n        workflow.add_edge(\"fact_check_node\", \"fact_check_router_node\")\n        workflow.add_edge(\"judge_node\", END)\n        return workflow\n\n    async def run(self):\n        workflow = self._initialize_workflow()\n        graph = workflow.compile()\n        # graph.get_graph().draw_mermaid_png(output_file_path=\"workflow_graph.png\")\n        # Keys must match the DebateState schema (debate_topic, not topic)\n        initial_state = {\n            \"debate_topic\": \"\",\n            \"positions\": {}\n        }\n        final_state = await graph.ainvoke(initial_state, config={\"recursion_limit\": 50})\n        return final_state\n<\/code><\/pre>\n<p class=\"wp-block-paragraph\">Let\u2019s break down what\u2019s happening:<\/p>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">We initialize a new <code>StateGraph<\/code> with our <code>DebateState<\/code> type as the state 
schema.<\/li>\n<li class=\"wp-block-list-item\">We add each node (agent) to the graph with a name. For nodes that need an LLM, we pass in the GPT-4o config. For example, <code>\"pro_debater_node\"<\/code> is added as <code>ProDebaterNode(llm_config_map[\"gpt-4o\"])<\/code>, meaning the Pro debater agent will use GPT-4o as its underlying model.<\/li>\n<li class=\"wp-block-list-item\">We set the <strong>entry point<\/strong> of the graph to <code>\"generate_topic_node\"<\/code>. This means the first step of the workflow is to generate a debate topic.<\/li>\n<li class=\"wp-block-list-item\">Then we add directed edges to connect nodes. The edges above encode the primary sequence: topic -&gt; pro\u2019s turn -&gt; fact-check -&gt; (then a routing decision) -&gt; \u2026 eventually -&gt; judge -&gt; END. We don\u2019t add static outgoing edges from the Moderator or Fact Check Router, since these nodes use dynamic commands to redirect the flow. The final edge connects the judge to an <code>END<\/code> marker to terminate the graph.<\/li>\n<\/ul>\n<p class=\"wp-block-paragraph\">When the workflow runs, control follows these static edges, but whenever we hit a <strong>router or moderator node<\/strong>, that node outputs a command telling the graph which node to go to next (these nodes have no static outgoing edge of their own). This is how we create conditional loops: the <code>fact_check_router_node<\/code> might send us back to a debater node for a retry, instead of following a straight line. LangGraph supports this by allowing nodes to return a special <code>Command<\/code> object with <code>goto<\/code> instructions.<\/p>\n<p class=\"wp-block-paragraph\">In summary, we\u2019ve defined an <strong>agentic workflow<\/strong>: a graph of autonomous agents where control can branch and loop based on the agents\u2019 outputs. 
Now, let\u2019s explore what each of these agent nodes actually does.<\/p>\n<h2 class=\"wp-block-heading\">Agent Nodes Breakdown<\/h2>\n<p class=\"wp-block-paragraph\">Each stage or role in the debate is encapsulated in a node (agent). In LangGraph, nodes are often simple functions, but I wanted a more object-oriented approach for clarity and reusability. So in Deb8flow, every node is a <strong>class<\/strong> with a <code>__call__<\/code> method. All the main agent classes inherit from a common <code>BaseComponent<\/code> for shared functionality. This design makes the system modular: we can easily swap out or extend agents by modifying their class definitions, and each agent class is responsible for its piece of the workflow.<\/p>\n<p class=\"wp-block-paragraph\">Let\u2019s go through the key agents one by one.<\/p>\n<h3 class=\"wp-block-heading\">\n<code>BaseComponent<\/code> \u2013 A Reusable Agent Base Class<\/h3>\n<p class=\"wp-block-paragraph\">Most of our agent nodes (like the debaters and judge) share common needs: they use an LLM to generate output, they might need to retry on errors, and they should track token usage. 
The <code>BaseComponent<\/code> class (defined in <a href=\"https:\/\/github.com\/iason-solomos\/Deb8flow\/blob\/main\/nodes\/base_component.py\"><code>nodes\/base_component.py<\/code><\/a>) provides these common features so we don\u2019t repeat code.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\"># Imports inferred from the names used below; check the repository for the exact list.\nimport logging\nimport time\nfrom typing import Any, List, Optional, Type\n\nfrom langchain_community.callbacks import get_openai_callback\nfrom langchain_core.output_parsers import StrOutputParser\nfrom langchain_core.prompts import ChatPromptTemplate\nfrom langchain_core.runnables import RunnableSequence\nfrom langchain_openai import AzureChatOpenAI, ChatOpenAI\nfrom opentelemetry import trace\nfrom opentelemetry.trace import get_tracer_provider\nfrom pydantic import BaseModel\n\n# LLMConfig, AzureOpenAILLMConfig, OpenAILLMConfig, and DebateState come from the project modules.\n\n\nclass BaseComponent:\n    \"\"\"\n    A foundational class for managing LLM-based workflows with token tracking.\n    Can handle both Azure OpenAI (AzureChatOpenAI) and OpenAI (ChatOpenAI).\n    \"\"\"\n\n    def __init__(\n        self,\n        llm_config: Optional[LLMConfig] = None,\n        temperature: float = 0.0,\n        max_retries: int = 5,\n    ):\n        \"\"\"\n        Initializes the BaseComponent with optional LLM configuration and temperature.\n\n        Args:\n            llm_config (Optional[LLMConfig]): Configuration for either Azure or OpenAI.\n            temperature (float): Controls the randomness of LLM outputs. Defaults to 0.0.\n            max_retries (int): How many times to retry on 429 errors.\n        \"\"\"\n        logger = logging.getLogger(self.__class__.__name__)\n        tracer = trace.get_tracer(__name__, tracer_provider=get_tracer_provider())\n\n        self.logger = logger\n        self.tracer = tracer\n        self.llm: Optional[ChatOpenAI] = None\n        self.output_parser: Optional[StrOutputParser] = None\n        self.state: Optional[DebateState] = None\n        self.prompt_template: Optional[ChatPromptTemplate] = None\n        self.chain: Optional[RunnableSequence] = None\n        self.documents: Optional[List] = None\n        self.prompt_tokens = 0\n        self.completion_tokens = 0\n        self.max_retries = max_retries\n\n        if llm_config is not None:\n            self.llm = self._init_llm(llm_config, temperature)\n            self.output_parser = StrOutputParser()\n\n    def _init_llm(self, config: LLMConfig, temperature: float):\n        \"\"\"\n        Initializes an LLM 
instance for either Azure OpenAI or standard OpenAI.\n        \"\"\"\n        if isinstance(config, AzureOpenAILLMConfig):\n            # If it's Azure, use the AzureChatOpenAI class\n            return AzureChatOpenAI(\n                deployment_name=config.deployment_name,\n                azure_endpoint=config.azure_endpoint,\n                openai_api_version=config.openai_api_version,\n                openai_api_key=config.openai_api_key,\n                temperature=temperature,\n            )\n        elif isinstance(config, OpenAILLMConfig):\n            # If it's standard OpenAI, use the ChatOpenAI class\n            return ChatOpenAI(\n                model_name=config.model_name,\n                openai_api_key=config.openai_api_key,\n                temperature=temperature,\n            )\n        else:\n            raise ValueError(\"Unsupported LLMConfig type.\")\n\n    def validate_initialization(self) -&gt; None:\n        \"\"\"\n        Ensures we have an LLM and an output parser.\n        \"\"\"\n        if not self.llm:\n            raise ValueError(\"LLM is not initialized. 
Ensure `llm_config` is provided.\")\n        if not self.output_parser:\n            raise ValueError(\"Output parser is not initialized.\")\n\n    def execute_chain(self, inputs: Any) -&gt; Any:\n        \"\"\"\n        Executes the LLM chain, tracks token usage, and retries on 429 errors.\n        \"\"\"\n        if not self.chain:\n            raise ValueError(\"No chain is initialized for execution.\")\n\n        retry_wait = 1  # Initial wait time in seconds\n\n        for attempt in range(self.max_retries):\n            try:\n                with get_openai_callback() as cb:\n                    result = self.chain.invoke(inputs)\n                    self.logger.info(\"Prompt Token usage: %s\", cb.prompt_tokens)\n                    self.logger.info(\"Completion Token usage: %s\", cb.completion_tokens)\n                    self.prompt_tokens = cb.prompt_tokens\n                    self.completion_tokens = cb.completion_tokens\n\n                return result\n\n            except Exception as e:\n                # If the error mentions 429, do exponential backoff and retry\n                if \"429\" in str(e):\n                    self.logger.warning(\n                        f\"Rate limit reached. Retrying in {retry_wait} seconds... 
\"\n                        f\"(Attempt {attempt + 1}\/{self.max_retries})\"\n                    )\n                    time.sleep(retry_wait)\n                    retry_wait *= 2\n                else:\n                    self.logger.error(f\"Unexpected error: {str(e)}\")\n                    raise e\n\n        raise Exception(\"API request failed after maximum number of retries\")\n\n    def create_chain(\n        self, system_template: str, human_template: str\n    ) -&gt; RunnableSequence:\n        \"\"\"\n        Creates a chain for unstructured outputs.\n        \"\"\"\n        self.validate_initialization()\n        self.prompt_template = ChatPromptTemplate.from_messages(\n            [\n                (\"system\", system_template),\n                (\"human\", human_template),\n            ]\n        )\n        self.chain = self.prompt_template | self.llm | self.output_parser\n        return self.chain\n\n    def create_structured_output_chain(\n        self, system_template: str, human_template: str, output_model: Type[BaseModel]\n    ) -&gt; RunnableSequence:\n        \"\"\"\n        Creates a chain that yields structured outputs (parsed into a Pydantic model).\n        \"\"\"\n        self.validate_initialization()\n        self.prompt_template = ChatPromptTemplate.from_messages(\n            [\n                (\"system\", system_template),\n                (\"human\", human_template),\n            ]\n        )\n        self.chain = self.prompt_template | self.llm.with_structured_output(output_model)\n        return self.chain\n\n    def build_return_with_tokens(self, node_specific_data: dict) -&gt; dict:\n        \"\"\"\n        Convenience method to add token usage info into the return values.\n        \"\"\"\n        return {\n            **node_specific_data,\n            \"prompt_tokens\": self.prompt_tokens,\n            \"completion_tokens\": self.completion_tokens,\n        }\n\n    def __call__(self, state: DebateState) -&gt; None:\n        
\"\"\"\n        Updates the node's local copy of the state.\n        \"\"\"\n        self.state = state\n        for key, value in state.items():\n            setattr(self, key, value)\n<\/code><\/pre>\n<p class=\"wp-block-paragraph\">Key features of <code>BaseComponent<\/code>:<\/p>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">It stores an LLM client (e.g. an OpenAI <code>ChatOpenAI<\/code> instance) initialized with a given model and API key, as well as an output parser.<\/li>\n<li class=\"wp-block-list-item\">It provides a method <code>create_chain(system_template, human_template)<\/code> which sets up a LangChain <strong>prompt chain<\/strong> (a <code>RunnableSequence<\/code>) combining a system prompt and a human prompt. This chain is what actually generates outputs when run.<\/li>\n<li class=\"wp-block-list-item\">It has an <code>execute_chain(inputs)<\/code> method that invokes the chain and includes logic to <strong>retry<\/strong> if the OpenAI API returns a rate-limit error (HTTP 429). 
This is done with exponential backoff (the wait starts at one second and doubles after each failed attempt) for up to <code>max_retries<\/code> attempts.<\/li>\n<li class=\"wp-block-list-item\">It keeps track of token usage (prompt tokens and completion tokens) for logging or analysis.<\/li>\n<li class=\"wp-block-list-item\">The <code>__call__<\/code> method of BaseComponent (which each subclass will call via <code>super().__call__(state)<\/code>) stores the incoming state on the instance and exposes each state key as an attribute, so the node\u2019s main logic can read the current debate state directly.<\/li>\n<\/ul>\n<p class=\"wp-block-paragraph\">By building on <code>BaseComponent<\/code>, each agent class can focus on its unique logic (like what prompt to use and how to handle the state), while inheriting the heavy lifting of interacting with GPT-4o reliably.<\/p>\n<h3 class=\"wp-block-heading\">Topic Generator Agent (<code>GenerateTopicNode<\/code>)<\/h3>\n<p class=\"wp-block-paragraph\">The <strong>Topic Generator<\/strong> <strong>(<\/strong><a href=\"https:\/\/github.com\/iason-solomos\/Deb8flow\/blob\/main\/nodes\/topic_generator_node.py\">topic_generator_node.py<\/a><strong>)<\/strong> is the first agent in the graph. Its job is to come up with a debatable topic for the session. We give it a prompt that instructs it to output a nuanced topic that could reasonably have a pro and con side.<\/p>\n<p class=\"wp-block-paragraph\">This agent inherits from <code>BaseComponent<\/code> and uses a prompt chain (system + human prompt) to generate one item of text \u2013 the debate topic. When called, it executes the chain (with no special input, just using the prompt) and gets back a <code>topic_text<\/code>. 
It then updates the state with:<\/p>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">\n<code>debate_topic<\/code>: the generated topic (stripped of any extra whitespace),<\/li>\n<li class=\"wp-block-list-item\">\n<code>positions<\/code>: a dictionary assigning the pro and con stances (by default we use <code>\"In favor of the topic\"<\/code> and <code>\"Against the topic\"<\/code>),<\/li>\n<li class=\"wp-block-list-item\">\n<code>stage<\/code>: set to <code>\"opening\"<\/code>,<\/li>\n<li class=\"wp-block-list-item\">\n<code>speaker<\/code>: set to <code>\"pro\"<\/code> (so the Pro side will speak first).<\/li>\n<\/ul>\n<p class=\"wp-block-paragraph\">In code, the return might look like:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">return {\n    \"debate_topic\": debate_topic,\n    \"positions\": positions,\n    \"stage\": \"opening\",\n    \"speaker\": first_speaker  # \"pro\"\n}\n<\/code><\/pre>\n<p class=\"wp-block-paragraph\">Here are the prompts for the topic generator:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">SYSTEM_PROMPT = \"\"\"\nYou are a brainstorming AI that suggests debate topics.\nYou will provide a single, interesting or timely topic that can have two opposing views.\n\"\"\"\n\nHUMAN_PROMPT = \"\"\"\nPlease suggest one debate topic for two AI agents to discuss.\nFor example, it could be about technology, politics, philosophy, or any interesting domain.\nJust provide the topic in a concise sentence.\n\"\"\"<\/code><\/pre>\n<p class=\"wp-block-paragraph\">Then we pass these prompts in the constructor of the class itself.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">class GenerateTopicNode(BaseComponent):\n    def __init__(self, llm_config, temperature: float = 0.7):\n        super().__init__(llm_config, temperature)\n        # Create the prompt chain.\n        self.chain: RunnableSequence = self.create_chain(\n            
system_template=SYSTEM_PROMPT,\n            human_template=HUMAN_PROMPT\n        )\n\n    def __call__(self, state: DebateState) -&gt; Dict[str, str]:\n        \"\"\"\n        Generates a debate topic and assigns positions to the two debaters.\n        \"\"\"\n        super().__call__(state)\n\n        topic_text = self.execute_chain({})\n\n        # Store the topic and assign stances in the DebateState\n        debate_topic = topic_text.strip()\n        positions = {\n            \"pro\": \"In favor of the topic\",\n            \"con\": \"Against the topic\"\n        }\n\n        \n        first_speaker = \"pro\"\n        self.logger.info(\"Welcome to our debate panel! Today's debate topic is: %s\", debate_topic)\n        return {\n            \"debate_topic\": debate_topic,\n            \"positions\": positions,\n            \"stage\": \"opening\",\n            \"speaker\": first_speaker\n        }<\/code><\/pre>\n<p class=\"wp-block-paragraph\">It\u2019s a pattern we will repeat for all classes except for those not using LLMs and the fact checker.<\/p>\n<p class=\"wp-block-paragraph\">Now we can implement the 2 stars of the show, the Pro and Con argument agents!<\/p>\n<h3 class=\"wp-block-heading\">Debater Agents (Pro and Con)<\/h3>\n<p class=\"wp-block-paragraph\">Link: <a href=\"https:\/\/github.com\/iason-solomos\/Deb8flow\/blob\/main\/nodes\/pro_debater_node.py\">pro_debater_node.py<\/a><\/p>\n<p class=\"wp-block-paragraph\">The two debater agents are very similar in structure, but each uses different prompt templates tailored to their role (pro vs con) and the stage of the debate.<\/p>\n<p class=\"wp-block-paragraph\">The Pro debater, for example, has to handle an <strong>opening statement<\/strong> and a <strong>counter-argument<\/strong> (countering the Con\u2019s rebuttal). We also need logic for retries in case a statement fails fact-check. 
In code, the <code>ProDebaterNode<\/code> class sets up multiple prompt chains:<\/p>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">\n<code>opening_chain<\/code> and an <code>opening_retry_chain<\/code> (using slightly different human prompts \u2013 the retry prompt might instruct it to try again without repeating any factually dubious claims).<\/li>\n<li class=\"wp-block-list-item\">\n<code>counter_chain<\/code> and <code>counter_retry_chain<\/code> for the counter-argument stage.<\/li>\n<\/ul>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">class ProDebaterNode(BaseComponent):\n    def __init__(self, llm_config, temperature: float = 0.7):\n        super().__init__(llm_config, temperature)\n        self.opening_chain = self.create_chain(SYSTEM_PROMPT, OPENING_HUMAN_PROMPT)\n        self.opening_retry_chain = self.create_chain(SYSTEM_PROMPT, OPENING_RETRY_HUMAN_PROMPT)\n        self.counter_chain = self.create_chain(SYSTEM_PROMPT, COUNTER_HUMAN_PROMPT)\n        self.counter_retry_chain = self.create_chain(SYSTEM_PROMPT, COUNTER_RETRY_HUMAN_PROMPT)\n\n    def __call__(self, state: DebateState) -&gt; Dict[str, Any]:\n        super().__call__(state)\n\n        debate_topic = state.get(\"debate_topic\")\n        messages = state.get(\"messages\", [])\n        stage = state.get(\"stage\")\n        speaker = state.get(\"speaker\")\n\n        # Check if retrying (last message was by pro and not validated)\n        last_msg = messages[-1] if messages else None\n        retrying = last_msg and last_msg[\"speaker\"] == SPEAKER_PRO and not last_msg[\"validated\"]\n\n        if stage == STAGE_OPENING and speaker == SPEAKER_PRO:\n            # Select the chain to trigger: the normal one, or the retry one after a failed fact check\n            chain = self.opening_retry_chain if retrying else self.opening_chain\n            result = chain.invoke({\n                \"debate_topic\": debate_topic\n            })\n        elif stage == STAGE_COUNTER and speaker == SPEAKER_PRO:\n            opponent_msg = self._get_last_message_by(SPEAKER_CON, messages)\n            debate_history = get_debate_history(messages)\n            chain = self.counter_retry_chain if retrying else self.counter_chain\n            result = chain.invoke({\n                \"debate_topic\": debate_topic,\n                \"opponent_statement\": opponent_msg,\n                \"debate_history\": debate_history\n            })\n        else:\n            raise ValueError(f\"Unknown turn for ProDebater: stage={stage}, speaker={speaker}\")\n        new_message = create_debate_message(speaker=SPEAKER_PRO, content=result, stage=stage)\n        self.logger.info(\"Speaker: %s, Stage: %s, Retry: %s\\nMessage:\\n%s\", speaker, stage, retrying, result)\n        return {\n            \"messages\": messages + [new_message]\n        }\n\n    def _get_last_message_by(self, speaker_prefix, messages):\n        # Walk the history backwards and return the most recent message by the given speaker\n        for m in reversed(messages):\n            if m.get(\"speaker\") == speaker_prefix:\n                return m[\"content\"]\n        return \"\"<\/code><\/pre>\n<p class=\"wp-block-paragraph\">When the ProDebater\u2019s <code>__call__<\/code> runs, it looks at the current <code>stage<\/code> and <code>speaker<\/code> in the state to decide what to do:<\/p>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">If it\u2019s the <strong>opening<\/strong> stage and the speaker is \u201cpro\u201d, it uses the <code>opening_chain<\/code> to generate an opening argument. If the last message from Pro was marked invalid (not validated), it knows this is a retry, so it uses the <code>opening_retry_chain<\/code> instead.<\/li>\n<li class=\"wp-block-list-item\">If it\u2019s the <strong>counter<\/strong> stage and the speaker is \u201cpro\u201d, it generates a counter-argument to whatever the opponent (Con) just said. It fetches the last message by the Con from the <code>messages<\/code> history and feeds that into the prompt (so that the Pro can directly counter it). 
Again, if the last Pro message was invalid, it switches to the retry chain.<\/li>\n<\/ul>\n<p class=\"wp-block-paragraph\">After generating its argument, the Debater agent creates a new message entry (with <code>speaker=\"pro\"<\/code>, the content text, <code>validated=False<\/code> initially, and the stage) and appends it to the state\u2019s message list. That becomes the output of the node (LangGraph will merge this partial state update into the global state).<\/p>\n<p class=\"wp-block-paragraph\">The <strong>Con Debater<\/strong> agent (<a href=\"https:\/\/github.com\/iason-solomos\/Deb8flow\/blob\/main\/nodes\/con_debater_node.py\">con_debater_node.py<\/a>) mirrors this logic for its stages:<\/p>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">It has a <strong>rebuttal<\/strong> stage and a <strong>closing argument<\/strong> (final argument) stage, each with a normal and a retry chain.<\/li>\n<li class=\"wp-block-list-item\">It checks whether the current stage is the rebuttal or the final argument (with speaker \u201ccon\u201d in both cases) and invokes the appropriate chain, possibly using the last Pro message for context when rebutting.<\/li>\n<li class=\"wp-block-list-item\">It similarly appends its message to the state.<\/li>\n<\/ul>\n<p class=\"wp-block-paragraph\">By using a class-based implementation, our debaters\u2019 code is easier to maintain. We can clearly separate what the Pro does vs what the Con does, even if they share structure. Also, by encapsulating prompt chains inside the class, each debater can manage multiple possible outputs (regular vs retry) cleanly.<\/p>\n<p class=\"wp-block-paragraph\"><strong>Prompt design:<\/strong> The actual prompts (in <code>prompts\/pro_debater_prompts.py<\/code> and <code>con_debater_prompts.py<\/code>) guide the GPT-4o model to take on a persona (\u201cYou are a debater arguing <em>for\/against<\/em> the topic\u2026\u201d) and produce the argument. 
They also instruct the model to keep statements factual and logical. If a fact check fails, the retry prompt may say something like: \u201cYour previous statement had an unverified claim. Revise your argument to be factually correct while maintaining your position.\u201d \u2013 encouraging the model to correct itself.<\/p>\n<p class=\"wp-block-paragraph\">With this, our AI debaters can engage in a multi-turn duel, and even recover from factual missteps.<\/p>\n<h3 class=\"wp-block-heading\">Fact Checker Agent (<code>FactCheckNode<\/code>)<\/h3>\n<p class=\"wp-block-paragraph\">After each debater speaks, the Fact Checker agent swoops in to verify their claims. This agent is implemented in <a href=\"https:\/\/github.com\/iason-solomos\/Deb8flow\/blob\/main\/nodes\/fact_checker_node.py\"><code>fact_checker_node.py<\/code><\/a>, and interestingly, it uses the <strong>GPT-4o model\u2019s browsing ability<\/strong> rather than a LangChain prompt chain like the other agents. Essentially, we delegate the fact-checking to OpenAI\u2019s GPT-4o with web search.<\/p>\n<p class=\"wp-block-paragraph\">How does this work? The OpenAI Python client lets us send a user message to the search-enabled model (<code>gpt-4o-search-preview<\/code>) and get a structured response back. In <code>FactCheckNode.__call__<\/code>, we do something like:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">completion = self.client.beta.chat.completions.parse(\n            model=\"gpt-4o-search-preview\",\n            web_search_options={},\n            messages=[{\n                \"role\": \"user\",\n                \"content\": (\n                        f\"Consider the following statement from a debate. \"\n                        f\"If the statement contains numbers, or figures from studies, fact-check it online.\\n\\n\"\n                        f\"Statement:\\n\\\"{claim}\\\"\\n\\n\"\n                        f\"Reply clearly whether any numbers or studies might be inaccurate or hallucinated, and why.\"\n                        f\"\\n\"\n                        f\"If the statement doesn't contain references to studies or numbers cited, don't go online to fact-check, and just consider it successfully fact-checked, with a 'yes' score.\\n\\n\"\n                )\n            }],\n            response_format=FactCheck\n        )<\/code><\/pre>\n<p class=\"wp-block-paragraph\">If the result is <strong>\u201cyes\u201d<\/strong> (meaning the claim seems truthful or at least not factually wrong), the Fact Checker will mark the last message\u2019s <code>validated<\/code> field as True in the state, and output <code>{\"validated\": True}<\/code> with no further changes. This signals that the debate can continue normally.<\/p>\n<p class=\"wp-block-paragraph\">If the result is <strong>\u201cno\u201d<\/strong> (meaning it found the claim to be incorrect or dubious), the Fact Checker will append a new message to the state with <code>speaker=\"fact_checker\"<\/code> describing the finding (or we could simply mark it, but providing a brief note like <em>\u201c(Fact Checker: The statistic cited could not be verified.)\u201d<\/em> can be useful). It will also set <code>validated: False<\/code> and increment a counter for whichever side made the claim. 
The output state from this node includes <code>validated: False<\/code> and an updated <code>times_pro_fact_checked<\/code> or <code>times_con_fact_checked<\/code> count.<\/p>\n<p class=\"wp-block-paragraph\">We also use a Pydantic <code>BaseModel<\/code> to constrain the structure of the LLM\u2019s output:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">from pydantic import BaseModel, Field\n\n\nclass FactCheck(BaseModel):\n    \"\"\"\n    Pydantic model for fact-checking the claims made by debaters.\n\n    Attributes:\n        binary_score (str): 'yes' if the claim is verifiable and truthful, 'no' otherwise.\n        justification (str): Explanation of the reasoning behind the score.\n    \"\"\"\n\n    binary_score: str = Field(\n        description=\"Indicates if the claim is verifiable and truthful. 'yes' or 'no'.\"\n    )\n    justification: str = Field(\n        description=\"Explanation of the reasoning behind the score.\"\n    )<\/code><\/pre>\n<h3 class=\"wp-block-heading\">Debate Moderator Agent (<code>DebateModeratorNode<\/code>)<\/h3>\n<p class=\"wp-block-paragraph\">The Debate Moderator is the conductor of the debate. Instead of producing lengthy text, this agent\u2019s job is to manage <strong>turn-taking<\/strong> and stage progression. In the workflow, after a statement is validated by the Fact Checker, control passes (via the Fact Check Router) to the Moderator node. 
The Moderator then issues a <code>Command<\/code> that updates the state for the next turn and directs the flow to the appropriate next agent.<\/p>\n<p class=\"wp-block-paragraph\">The logic in <code>DebateModeratorNode.__call__<\/code> (see <a href=\"https:\/\/github.com\/iason-solomos\/Deb8flow\/blob\/main\/nodes\/debate_moderator_node.py\"><code>nodes\/debate_moderator_node.py<\/code><\/a>) goes roughly like this:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">if stage == STAGE_OPENING and speaker == SPEAKER_PRO:\n    return Command(\n        update={\"stage\": STAGE_REBUTTAL, \"speaker\": SPEAKER_CON},\n        goto=NODE_CON_DEBATER\n    )\nelif stage == STAGE_REBUTTAL and speaker == SPEAKER_CON:\n    return Command(\n        update={\"stage\": STAGE_COUNTER, \"speaker\": SPEAKER_PRO},\n        goto=NODE_PRO_DEBATER\n    )\nelif stage == STAGE_COUNTER and speaker == SPEAKER_PRO:\n    return Command(\n        update={\"stage\": STAGE_FINAL_ARGUMENT, \"speaker\": SPEAKER_CON},\n        goto=NODE_CON_DEBATER\n    )\nelif stage == STAGE_FINAL_ARGUMENT and speaker == SPEAKER_CON:\n    return Command(\n        update={},\n        goto=NODE_JUDGE\n    )\n\nraise ValueError(f\"Unexpected stage\/speaker combo: stage={stage}, speaker={speaker}\")<\/code><\/pre>\n<p class=\"wp-block-paragraph\">Each conditional corresponds to a point in the debate where a turn just ended, and sets up the next turn. For example, after the <strong>opening<\/strong> (Pro just spoke), it sets stage to <strong>rebuttal<\/strong>, switches speaker to Con, and directs the workflow to the Con debater node. 
After the <strong>final_argument<\/strong> (Con\u2019s closing), it directs to the Judge with no further update (the debate stage effectively ends).<\/p>\n<h3 class=\"wp-block-heading\">Fact Check Router (<code>FactCheckRouterNode<\/code>)<\/h3>\n<p class=\"wp-block-paragraph\">This is another control node (like the Moderator) that introduces conditional logic. The Fact Check Router sits right after the Fact Checker agent in the flow. Its purpose is to <strong>branch the workflow depending on the fact-check result<\/strong>.<\/p>\n<p class=\"wp-block-paragraph\">In <a href=\"https:\/\/github.com\/iason-solomos\/Deb8flow\/blob\/main\/nodes\/fact_check_router_node.py\"><code>nodes\/fact_check_router_node.py<\/code><\/a>, the logic is:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">if pro_fact_checks &gt;= 3 or con_fact_checks &gt;= 3:\n    disqualified = SPEAKER_PRO if pro_fact_checks &gt;= 3 else SPEAKER_CON\n    winner = SPEAKER_CON if disqualified == SPEAKER_PRO else SPEAKER_PRO\n\n    verdict_msg = {\n        \"speaker\": \"moderator\",\n        \"content\": (\n            f\"Debate ended early due to excessive factual inaccuracies.\\n\\n\"\n            f\"DISQUALIFIED: {disqualified.upper()} (exceeded fact check limit)\\n\"\n            f\"WINNER: {winner.upper()}\"\n        ),\n        \"validated\": True,\n        \"stage\": \"verdict\"\n    }\n    return Command(\n        update={\"messages\": messages + [verdict_msg]},\n        goto=END\n    )\nif last_message.get(\"validated\"):\n    return Command(goto=NODE_DEBATE_MODERATOR)\nelif speaker == SPEAKER_PRO:\n    return Command(goto=NODE_PRO_DEBATER)\nelif speaker == SPEAKER_CON:\n    return Command(goto=NODE_CON_DEBATER)\nraise ValueError(\"Unable to determine routing in FactCheckRouterNode.\")<\/code><\/pre>\n<p class=\"wp-block-paragraph\">First, the Fact Check Router checks if either side\u2019s fact-check count has reached 3. If so, it creates a Moderator-style message announcing an early end: the offending side is disqualified and the other side is the winner. It appends this verdict to the messages and returns a <code>Command<\/code> that jumps to <code>END<\/code>, effectively terminating the debate without going to the Judge (because we already know the outcome).<\/p>\n<p class=\"wp-block-paragraph\">If we\u2019re not ending the debate early, it then looks at the Fact Checker\u2019s result for the last message (which is stored as <code>validated<\/code> on that message). If <code>validated<\/code> is <em>True<\/em>, we go to the debate moderator: <code>Command(goto=NODE_DEBATE_MODERATOR)<\/code>.<\/p>\n<p class=\"wp-block-paragraph\">Otherwise, the statement has failed the fact check, and the workflow goes back to the same debater to produce a revised statement (with the state counters updated to reflect the failure). This loop can happen multiple times if needed (up to the disqualification limit).<\/p>\n<p class=\"wp-block-paragraph\">This dynamic control is the heart of Deb8flow\u2019s \u201cagentic\u201d nature \u2013 the ability to adapt the path of execution based on the content of the agents\u2019 outputs. It showcases LangGraph\u2019s strength: combining control flow with state. We\u2019re essentially encoding debate rules (like allowing retries for false claims, or ending the debate if someone cheats too often) directly into the workflow graph.<\/p>\n<h3 class=\"wp-block-heading\">Judge Agent (<code>JudgeNode<\/code>)<\/h3>\n<p class=\"wp-block-paragraph\"><strong>Last but not least, the Judge agent<\/strong> delivers the final verdict based on <strong>rhetorical skill, clarity, structure, and overall persuasiveness<\/strong>. 
Its <strong>system prompt<\/strong> and <strong>human prompt<\/strong> make this explicit:<\/p>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">\n<strong>System Prompt<\/strong>: \u201cYou are an impartial debate judge AI. \u2026 Evaluate which debater presented their case more clearly, persuasively, and logically. You must focus on communication skills, structure of argument, rhetorical strength, and overall coherence.\u201d<\/li>\n<li class=\"wp-block-list-item\">\n<strong>Human Prompt<\/strong>: \u201cHere is the full debate transcript. Please analyze the performance of both debaters\u2014PRO and CON. Evaluate rhetorical performance\u2014clarity, structure, persuasion, and relevance\u2014and decide who presented their case more effectively.\u201d<\/li>\n<\/ul>\n<p class=\"wp-block-paragraph\">When the <strong>Judge<\/strong> node runs, it receives the entire debate transcript (all validated messages) alongside the original topic. It then uses GPT-4o to examine how each side framed their arguments, handled counterpoints, and supported (or failed to support) claims with examples or logic. Crucially, the <strong>Judge<\/strong> is forbidden to evaluate which position is <em>objectively correct<\/em> (or who it <strong>thinks <\/strong>might be correct)\u2014only <em>who argued more persuasively<\/em>.<\/p>\n<p class=\"wp-block-paragraph\">Below is an example final verdict from a Deb8flow run on the topic:<br \/><strong>\u201cShould governments implement a universal basic income in response to increasing automation in the workforce?\u201d<\/strong><\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-python\">WINNER: PRO\n\nREASON: The PRO debater presented a more compelling and rhetorically effective case for universal basic income. Their arguments were well-structured, beginning with a clear statement of the issue and the necessity of UBI in response to automation. 
They effectively addressed potential counterarguments by highlighting the unprecedented speed and scope of current technological changes, which distinguishes the current situation from past technological shifts. The PRO also provided empirical evidence from UBI pilot programs to counter the CON's claims about work disincentives and economic inefficiencies, reinforcing their argument with real-world examples.\n\nIn contrast, the CON debater, while presenting valid concerns about UBI, relied heavily on historical analogies and assumptions about workforce adaptability without adequately addressing the unique challenges posed by modern automation. Their arguments about the fiscal burden and potential inefficiencies of UBI were less supported by specific evidence compared to the PRO's rebuttals.\n\nOverall, the PRO's arguments were more coherent, persuasive, and backed by empirical evidence, making their case more convincing to a neutral observer.\n<\/code><\/pre>\n<h2 class=\"wp-block-heading\">LangSmith Tracing<\/h2>\n<p class=\"wp-block-paragraph\">Throughout Deb8flow\u2019s development, I relied on <strong>LangSmith<\/strong> (LangChain\u2019s tracing and observability toolkit) to ensure the entire debate pipeline was behaving correctly. Because we have multiple agents passing control between themselves, it\u2019s easy for unexpected loops or misrouted states to occur. <strong>LangSmith<\/strong> provides a convenient way to:<\/p>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">\n<strong>Visualize Execution Flow:<\/strong> You can see each agent\u2019s prompt, <strong>the tokens consumed<\/strong> (so you can also track costs), and any intermediate states. 
This makes it much simpler to confirm that, say, the Con Debater is properly referencing the Pro Debater\u2019s last message, or that the Fact Checker is accurately receiving the claim to verify.<\/li>\n<li class=\"wp-block-list-item\">\n<strong>Debug State Updates:<\/strong> If the Moderator or Fact Check Router is sending the flow to the wrong node, the trace makes that mismatch visible. You can trace which agent was invoked at each step and why, helping you spot stage or speaker misalignments early.<\/li>\n<li class=\"wp-block-list-item\">\n<strong>Track Prompt and Completion Tokens:<\/strong> With multiple GPT-4o calls, it\u2019s useful to see how many tokens each stage is using, which LangSmith logs automatically if you enable tracing.<\/li>\n<\/ul>\n<p class=\"wp-block-paragraph\">Integrating <strong>LangSmith<\/strong> is surprisingly easy. You just need to set three variables in your <code>.env<\/code> file: <code>LANGCHAIN_API_KEY<\/code>, <code>LANGCHAIN_TRACING_V2<\/code>, and <code>LANGCHAIN_PROJECT<\/code>.<\/p>\n<p class=\"wp-block-paragraph\">Then you can open the LangSmith UI to see a structured trace of each run. This greatly reduces the guesswork involved in debugging multi-agent systems and is, in my experience, essential for more <strong>complex AI orchestration<\/strong> like ours. Example of a single run:<\/p>\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" height=\"1024\" width=\"664\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/04\/trace-in-langsmith-664x1024.png?resize=664%2C1024&#038;ssl=1\" alt=\"\" class=\"wp-image-601086\"><figcaption class=\"wp-element-caption\">The trace of one run in LangSmith\u2019s waterfall mode, showing how the whole flow ran. 
Source: Generated by the author using LangSmith.<\/figcaption><\/figure>\n<h2 class=\"wp-block-heading\">Reflections and Next Steps<\/h2>\n<p class=\"wp-block-paragraph\">Building Deb8flow was an eye-opening exercise in orchestrating <strong>autonomous agent workflows<\/strong>. We didn\u2019t just chain a single model call \u2013 we created an entire debate simulation with AI agents, each with a specific role, and allowed them to interact according to a set of rules. LangGraph provided a clear framework to define how data and control flow between agents, making the complex sequence manageable in code. By using class-based agents and a shared state, we maintained modularity and clarity, which pays off in the long run for any software engineering project.<\/p>\n<p class=\"wp-block-paragraph\">An exciting aspect of this project was seeing emergent behavior. Even though each agent follows a script (a prompt), the unscripted combination \u2013 a debater trying to deceive, a fact-checker catching it, the debater rephrasing \u2013 felt surprisingly realistic! 
It\u2019s a small step toward more <strong><a href=\"https:\/\/towardsdatascience.com\/tag\/agentic-ai\/\" title=\"Agentic Ai\">agentic AI<\/a> systems<\/strong> whose agents can perform non-trivial multi-step tasks while overseeing one another.<\/p>\n<p class=\"wp-block-paragraph\">There are plenty of ideas for improvement:<\/p>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">\n<strong>User Interaction:<\/strong> Currently it\u2019s fully autonomous, but one could add a mode where a human provides the topic or even takes the role of one side against an AI opponent.<\/li>\n<li class=\"wp-block-list-item\">\n<strong>Speaking Order:<\/strong> We can switch the order in which the debaters talk.<\/li>\n<li class=\"wp-block-list-item\">\n<strong>Prompt Experiments:<\/strong> We can change the prompts, and thus to a good degree the behavior of the agents, and compare the results.<\/li>\n<li class=\"wp-block-list-item\">\n<strong>Web Search for Debaters:<\/strong> We can let the debaters also perform a web search before producing their statements, giving them access to the latest information.<\/li>\n<\/ul>\n<p class=\"wp-block-paragraph\">The broader implication of Deb8flow is how it showcases a pattern for <strong>composable AI agents<\/strong>. By defining clear boundaries and interactions (just like microservices in software), we can have complex AI-driven processes that remain interpretable and controllable. Each agent is like a cog in a machine, and LangGraph is the gear system making them work in unison.<\/p>\n<p class=\"wp-block-paragraph\">I found this project energizing, and I hope it inspires you to explore multi-agent workflows. Whether it\u2019s debating, collaborating on writing, or solving problems from different expert angles, the combination of <strong>GPT<\/strong>, <strong>tools<\/strong>, and structured <strong>agentic workflows<\/strong> opens up a new world of possibilities for AI development. 
Happy hacking!<\/p>\n<h2 class=\"wp-block-heading\">References<\/h2>\n<p class=\"wp-block-paragraph\"><a href=\"https:\/\/medium.com\/data-science\/from-basics-to-advanced-exploring-langgraph-e8c1cf4db787\">[1] D. Bouchard, \u201cFrom Basics to Advanced: Exploring LangGraph,\u201d <em>Medium<\/em>, Nov. 22, 2023. [Online]. Available: <\/a><a class=\"\" href=\"https:\/\/medium.com\/data-science\/from-basics-to-advanced-exploring-langgraph-e8c1cf4db787\">https:\/\/medium.com\/data-science\/from-basics-to-advanced-exploring-langgraph-e8c1cf4db787<\/a>. [Accessed: Apr. 1, 2025].<\/p>\n<p class=\"wp-block-paragraph\">[2] A. W. T. Ng, \u201cBuilding a Research Agent that Can Write to Google Docs: Part 1,\u201d <em>Towards Data Science<\/em>, Jan. 11, 2024. [Online]. Available: <a class=\"\" href=\"https:\/\/towardsdatascience.com\/building-a-research-agent-that-can-write-to-google-docs-part-1-4b49ea05a292\/\">https:\/\/towardsdatascience.com\/building-a-research-agent-that-can-write-to-google-docs-part-1-4b49ea05a292\/<\/a>. [Accessed: Apr. 1, 2025].<\/p>\n<p>The post <a href=\"https:\/\/towardsdatascience.com\/deb8flow-orchestrating-autonomous-ai-debates-with-langgraph-and-gpt-4o\/\">Deb8flow: Orchestrating Autonomous AI Debates with LangGraph and GPT-4o<\/a> appeared first on <a href=\"https:\/\/towardsdatascience.com\/\">Towards Data Science<\/a>.<\/p>\n<\/div>\n<p> \t<BR><br \/>\n <BR><\/BR><br \/>\n    Iason Solomos<br \/>\n \t<BR><br \/>\n<BR><\/BR><br \/>\n<a href=\"https:\/\/towardsdatascience.com\/deb8flow-orchestrating-autonomous-ai-debates-with-langgraph-and-gpt-4o\/\">Go to original source<\/a><br \/>\n \t<BR><br \/>\n <BR><\/BR><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Deb8flow: Orchestrating Autonomous AI Debates with LangGraph and GPT-4o Introduction I\u2019ve always been fascinated by debates\u2014the strategic framing, the sharp retorts, and the carefully timed comebacks. 
Debates aren\u2019t just entertaining; they\u2019re structured battles of ideas, driven by logic and evidence. Recently, I started wondering: could we replicate that dynamic using AI agents\u2014having them debate each [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[678,62,69,1200,67,88,160],"tags":[98,2341,1163],"class_list":["post-2999","post","type-post","status-publish","format-standard","hentry","category-agentic-ai","category-aimldsaimlds","category-artificial-intelligence","category-autonomous-agent","category-deep-dives","category-deep-learning","category-programming","tag-ai","tag-debflow","tag-gpt"],"_links":{"self":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/2999"}],"collection":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/comments?post=2999"}],"version-history":[{"count":0,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/2999\/revisions"}],"wp:attachment":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/media?parent=2999"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/categories?post=2999"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/tags?post=2999"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}