Problem
Building a conversational AI application on raw cloud APIs becomes unwieldy faster than expected. Every invocation requires hand-crafting JSON payloads, navigating nested response objects, and re-implementing message history from scratch — none of which contributes to the application itself. When a model updates its response schema or you need to switch providers, all that integration code breaks and must be rewritten. For teams trying to build real products, this scaffolding consumes development cycles that belong to application logic. The result is a persistent gap between "I can call the model" and "I can ship a production LLM application" that most developers quietly struggle to close.
Solution
LangChain provides a composable abstraction layer over LLM APIs that handles message formatting, response parsing, prompt templating, streaming, and stateful memory — eliminating everything that has nothing to do with application behavior. This project built a staff scheduling chatbot for a fictional music venue, AMusicVenue, using Amazon Bedrock's Nova Lite model through LangChain's ChatBedrock interface. The implementation progressed through a series of deliberate phases — raw boto3 invocations, structured output parsers, streaming responses, message history, and CSV document injection — exposing exactly how each LangChain component replaces low-level boilerplate. By the final phase, the chatbot could read a staff shift schedule from a CSV file loaded at startup, maintain multi-turn conversation context across the session, and answer natural language scheduling questions grounded in real data. A direct boto3-versus-LangChain comparison made the architectural trade-off concrete: less fine-grained control over the wire, dramatically less code per feature.
Skills Acquired
- LangChain — the primary application framework. LangChain Expression Language (LCEL) chains a prompt template, model call, and output parser into a single composable pipeline using the `|` operator — the same mental model as Unix pipes. Cross-provider compatibility means switching the underlying model from Bedrock to OpenAI requires changing one line.
- Amazon Bedrock — AWS's managed inference service that hosts and serves foundation models without requiring GPU provisioning. The project called Nova Lite through Bedrock's API both directly via boto3 and through LangChain, demonstrating both the low-level and high-level access patterns side by side.
- ChatBedrock — LangChain's adapter class that wraps Bedrock models as a standard LangChain chat interface. It handles Bedrock-specific request and response formatting internally, so the rest of the chain remains provider-agnostic and portable across LLM providers.
- Python — the implementation language for all four phases of the project, from the initial boto3 comparison through the final document-injected chatbot.
- Jupyter — the development environment for iterating on each chatbot phase, combining executable code with inline outputs and explanatory prose that documents the evolution of the design.
- boto3 — AWS's Python SDK, used in the project's first phase to invoke Bedrock at the raw API level. Working with boto3 directly before introducing LangChain made the abstraction trade-offs visible and grounded.
- langchain-aws — the integration package that provides the `ChatBedrock` class and other AWS-specific LangChain components. It sits on top of boto3 and handles the AWS serialization details that make LangChain's cross-provider abstraction possible.
With those building blocks in place, the central design question becomes concrete — and the answer turns out to be more interesting than a simple preference for one SDK over another.
Deep Dive
If you can already invoke an Amazon Bedrock model with boto3 — a few lines of JSON, an API call, some indexing into the response object — why would you add an abstraction layer on top?
That question is at the center of this project. Working through an AWS Cloud Institute lab using the fictional AMusicVenue as a business context, I built a series of AI tools — ticket sales assistant, bar manager utilities, and a staff scheduling chatbot — first with raw boto3, then with LangChain. The difference in code complexity, flexibility, and capability answered the question clearly.
What This Project Covers
This project spans four tasks, each building on the last to produce a fully functional, stateful, context-aware chatbot — all running against Amazon Bedrock's Nova Lite model via LangChain.
- Comparing low-level boto3 invocations to high-level LangChain `ChatBedrock` calls
- Working with structured messages: `SystemMessage`, `HumanMessage`, `AIMessage`
- Building reusable prompt templates with `PromptTemplate` and `ChatPromptTemplate`
- Parsing structured output with `CommaSeparatedListOutputParser`
- Streaming model responses chunk by chunk for a better user experience
- Adding statefulness to a chatbot by maintaining a message history list
- Loading CSV data with `CSVLoader` and injecting it into the model's system context
- Calculating per-invocation cost from token usage metadata
boto3 vs. LangChain: The Core Comparison
The project starts by invoking the same model — Amazon Nova Lite — using both approaches side by side.
The boto3 path is explicit but verbose: construct a JSON payload, encode it, call `invoke_model()`, decode the response body, and navigate nested keys to extract the generated text.
```python
import boto3
import json

bedrock_client = boto3.client('bedrock-runtime')

native_request = {
    "messages": [{"role": "user", "content": [{"text": prompt}]}],
    "inferenceConfig": {"maxTokens": 512, "temperature": 0.5, "topP": 0.9}
}

response = bedrock_client.invoke_model(
    modelId="amazon.nova-lite-v1:0",
    body=json.dumps(native_request).encode('utf-8')
)

response = json.loads(response["body"].read())
generated_text = response.get('output').get('message').get('content')[0].get('text')
```
The LangChain equivalent reduces the same call to three lines — and the response is a clean Python object, not a raw JSON blob requiring chain-indexed extraction.
```python
from langchain_aws.chat_models.bedrock import ChatBedrock

nova_llm = ChatBedrock(model_id="amazon.nova-lite-v1:0")
response = nova_llm.invoke([("human", prompt)])
print(response.content)  # clean string, no JSON indexing needed
```
The difference isn't just aesthetics. LangChain's abstraction becomes critical the moment you add streaming, structured messages, prompt templates, output parsers, or conversation history — each of which would require significant boilerplate to implement against the raw boto3 API.
How It Was Built
Task 3 — Part 1
Structured Messages: Booking Shows with ChatBedrock
LangChain's chat models accept lists of role-tagged messages rather than raw strings. This is the same interface that all modern chat APIs expose — but LangChain provides first-class Python classes for each role, making conversation construction explicit and type-safe.
```python
nova_chat = ChatBedrock(
    client=bedrock_client,
    model_id="amazon.nova-lite-v1:0",
    model_kwargs={"maxTokens": 512, "temperature": 0.5, "topP": 0.9}
)

messages = [
    ("system", "You are the manager of a music venue. You respond to artists who reach out to book shows."),
    ("human", "Hello! We are an up-and-coming punk band with thousands of fans. We are coming to your city on September 17th.")
]

ai_msg = nova_chat.invoke(messages)
print(ai_msg.content)
```
The model — primed with a system message defining its role as a venue manager — responded with a detailed, contextually appropriate email requesting show details, financial terms, and logistics. The system message pattern is the primary mechanism for giving a chat model a persona and behavioral constraints before the user speaks.
Task 4 — Part 1
LangChain Message Classes: HumanMessage & SystemMessage
Rather than passing tuples, LangChain's `langchain_core.messages` module provides typed classes — `SystemMessage`, `HumanMessage`, and `AIMessage` — that become the building blocks for stateful conversation history. I used these to build a bar staff communication tool: draft an email to all bar staff about updated closing procedures.
The model also supports streaming via `nova_chat.stream(prompt)`, which yields response chunks as they are generated — making long outputs feel responsive rather than frozen while the model thinks.
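The consumption pattern is easiest to see with a stand-in generator — no Bedrock call here; `fake_stream` is a hypothetical helper that only mimics the chunk-by-chunk shape of the real `stream()` iterator:

```python
# Stand-in for nova_chat.stream(): yields the response in small chunks so the
# caller can render text incrementally instead of waiting for the full reply.
def fake_stream(text: str, chunk_size: int = 8):
    for i in range(0, len(text), chunk_size):
        yield text[i:i + chunk_size]

# Print each chunk as it "arrives" — the same loop you would write around
# a real streaming iterator.
for chunk in fake_stream("Subject: Updated closing procedures for bar staff..."):
    print(chunk, end="", flush=True)
print()
```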
Task 4 — Part 2
Prompt Templates: Automating Purchase Orders
`PromptTemplate` and `ChatPromptTemplate` separate the structure of a prompt from its variable inputs — enabling the same prompt design to be reused across many inputs without string concatenation. I applied this to generate three purchase orders (soda, napkins, receipt paper) from a list of order dictionaries, and built a bar recipe assistant using `ChatPromptTemplate`.
```python
from langchain_core.prompts import PromptTemplate

template = """
Human: Create a purchase order for {product} to {supplier} from our company, AMusicVenue.
The order should include {quantity} units at ${price} per unit.
Assistant:"""

prompt_template = PromptTemplate.from_template(template)

orders = [
    {"product": "soda", "supplier": "BevCo", "quantity": 100, "price": 1.50},
    {"product": "napkins", "supplier": "SupplyCo", "quantity": 500, "price": 0.10},
    {"product": "receipt paper", "supplier": "OfficePro", "quantity": 50, "price": 3.00},
]

for order in orders:
    formatted_prompt = prompt_template.format(**order)
    response = nova_chat.invoke(formatted_prompt)
    print(response.content)
```
The loop generates three distinct, properly formatted purchase orders without any repeated prompt engineering — the template handles the structure, the data dictionary provides the values. This pattern scales: the same template runs against any number of orders with zero additional code.
Task 4 — Part 3
Output Parsers: Structured List Extraction
Raw model output is a string. For applications that need structured data — a list of items, a JSON object, a yes/no decision — that string needs to be parsed. LangChain's output parsers sit between the model response and your application logic, handling that conversion automatically.
I used `CommaSeparatedListOutputParser` to build an ingredient substitution tool for the AMusicVenue bar — given a cocktail ingredient, return a comma-separated list of alternatives the bar could use if the original is unavailable.
```python
from langchain_core.output_parsers import CommaSeparatedListOutputParser
from langchain_core.prompts import ChatPromptTemplate

output_parser = CommaSeparatedListOutputParser()
format_instructions = output_parser.get_format_instructions()

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a knowledgeable bartender with expertise in cocktail ingredients."),
    ("human", "List 5 substitutes for {ingredient} in cocktails.\n{format_instructions}")
])

chain = prompt | nova_chat | output_parser

result = chain.invoke({
    "ingredient": "triple sec",
    "format_instructions": format_instructions
})
print(result)
# ['Cointreau', 'Grand Marnier', 'Blue Curacao', 'limoncello', 'orange juice']
```
The `chain = prompt | nova_chat | output_parser` pattern is LangChain's pipe operator — composing prompt construction, model invocation, and output parsing into a single callable. The result is a Python list ready for downstream use, not a string that requires further parsing.
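The composition semantics of `|` can be sketched in plain Python. This is an analogy only — it is not LangChain's actual `Runnable` implementation, and the `Pipe` class and canned model output are invented for illustration:

```python
# Analogy only: a toy Pipe class showing how `|` composes stages left to right.
# This is NOT LangChain's Runnable implementation — the names are invented.
class Pipe:
    def __init__(self, fn):
        self.fn = fn

    def __or__(self, other):
        # (a | b) applies a first, then b — the same order as an LCEL chain
        return Pipe(lambda x: other.fn(self.fn(x)))

    def invoke(self, x):
        return self.fn(x)

prompt = Pipe(lambda d: f"List substitutes for {d['ingredient']}.")
model = Pipe(lambda p: "Cointreau, Grand Marnier, limoncello")  # canned LLM stand-in
parser = Pipe(lambda s: [item.strip() for item in s.split(",")])

chain = prompt | model | parser
print(chain.invoke({"ingredient": "triple sec"}))
# → ['Cointreau', 'Grand Marnier', 'limoncello']
```

The point of the analogy is that each stage only agrees on its input and output types, which is why swapping the model stage leaves the prompt and parser untouched.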
Task 5
Stateless Chatbot: The Starting Point and Its Flaws
The first version of the AMusicVenue shift scheduling chatbot was deliberately simple: a loop that takes user input, sends it to the model, and prints the response. No history, no context — each message is treated as a fresh conversation.
```python
from langchain_core.messages import SystemMessage, HumanMessage

messages = [
    SystemMessage(content="You are a helpful assistant for AMusicVenue staff scheduling.")
]

while True:
    user_input = input("You: ")
    if user_input.lower() == "exit":
        break
    # Stateless: only the latest message is sent — the system message and all
    # prior turns are never included in the request
    response = nova_chat.invoke([HumanMessage(content=user_input)])
    print(f"Assistant: {response.content}")
```
Two fundamental problems emerge immediately. First, the model has no memory between turns — asking "Who is working Friday night?" followed by "Can they swap with Saturday?" produces a confused response because the second message contains no reference to the first. Second, the model has no knowledge of the actual shift schedule — it can only respond with generic advice, not specific data about AMusicVenue's staff.
Task 6 — Part 1
Adding Statefulness: Persistent Message History
The fix for the memory problem is straightforward: maintain a running list of messages and append each exchange — both the human message and the AI's response — before passing the full list to the model on the next invocation.
```python
from langchain_core.messages import SystemMessage, HumanMessage, AIMessage

messages = [
    SystemMessage(content="You are a helpful assistant for AMusicVenue staff scheduling.")
]

while True:
    user_input = input("You: ")
    if user_input.lower() == "exit":
        break
    messages.append(HumanMessage(content=user_input))
    response = nova_chat.invoke(messages)  # full history sent every turn
    messages.append(AIMessage(content=response.content))
    print(f"Assistant: {response.content}")

    # Track cost per turn
    input_tokens = response.usage_metadata["input_tokens"]
    output_tokens = response.usage_metadata["output_tokens"]
    cost = (input_tokens * 0.00000006) + (output_tokens * 0.00000024)
    print(f"[Cost: ${cost:.6f}]")
```
The model now receives the full conversation on each turn. A follow-up like "Can they swap shifts?" resolves correctly because the prior context — who was mentioned, what day was discussed — is present in the message list. Token cost is calculated per turn from `response.usage_metadata`, giving operational visibility into per-conversation cost at Nova Lite's rates.
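The per-turn arithmetic factors cleanly into a helper. The rates below are the ones hard-coded in the loop above ($0.00000006 per input token, $0.00000024 per output token) — verify current Nova Lite pricing before relying on them:

```python
def nova_lite_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of one invocation at the Nova Lite rates used in this project."""
    INPUT_RATE = 0.00000006   # $ per input token
    OUTPUT_RATE = 0.00000024  # $ per output token
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

print(f"${nova_lite_cost(1200, 300):.6f}")  # → $0.000144
```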
Task 6 — Part 2
Adding Context: CSVLoader and Shift Data Injection
Statefulness solves memory, but the second problem remains: the model doesn't know AMusicVenue's actual shift schedule. The solution is to load the `shifts.csv` file using LangChain's `CSVLoader` from `langchain_community` and inject it as context into the system message before the conversation begins.
```python
from langchain_community.document_loaders import CSVLoader

loader = CSVLoader(file_path="shifts.csv")
docs = loader.load()

# Convert documents to a single string for injection
shifts_string = "\n".join([doc.page_content for doc in docs])

system_prompt = f"""You are a helpful assistant for AMusicVenue staff scheduling.
You have access to the current shift schedule below. Use this data to answer
questions accurately about who is working, when, and in what role.

SHIFT SCHEDULE:
{shifts_string}
"""

messages = [SystemMessage(content=system_prompt)]
```
With both statefulness and context in place, the chatbot can answer precise questions: "Who is bartending on Friday?" returns the actual name from the CSV. "Can Alex cover Saturday?" takes into account what was already discussed in the session. The combination of document context and conversation history transforms a generic Q&A loop into a functional scheduling assistant.
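For intuition about what ends up in the system prompt: `CSVLoader` produces roughly one `column: value` block per row. A stdlib sketch with invented sample data (the real `shifts.csv` columns may differ) shows the shape of the injected string:

```python
import csv
import io

# Invented sample rows standing in for shifts.csv
sample = "name,day,role\nAlex,Friday,bartender\nSam,Saturday,door\n"

# Approximate CSVLoader's per-row "column: value" formatting
rows = csv.DictReader(io.StringIO(sample))
shifts_string = "\n\n".join(
    "\n".join(f"{key}: {value}" for key, value in row.items())
    for row in rows
)
print(shifts_string)
```

Because each row arrives as labeled key-value text rather than bare commas, the model can answer "who is bartending on Friday?" without guessing which column is which.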
Key Takeaways
- LangChain abstracts model-specific API formats. The same LangChain code — message lists, prompt templates, output parsers — works against any supported model provider. Switching from Nova Lite to Claude or GPT-4 is a one-line change to the model ID, not a rewrite of the application logic.
- Statefulness requires explicit message management. LLMs are stateless by design — every invocation is independent. Conversation memory is an application-layer concern. Maintaining a message history list and appending to it on every turn is the simplest pattern; production systems use memory modules or vector-store-backed retrieval.
- Prompt Templates separate structure from data. Hard-coding variables into prompt strings doesn't scale. Templates enforce a consistent format and make prompts testable, versionable, and reusable across different inputs.
- Output Parsers bridge LLM responses and application logic. `CommaSeparatedListOutputParser` is the simplest example of a broader pattern: structured output (JSON, lists, typed objects) that can be consumed directly by downstream code without ad-hoc string parsing.
- CSVLoader enables lightweight document context. For small, structured datasets, injecting the full document as a system message is a viable alternative to building a full RAG pipeline. Understanding when to use direct injection vs. vector retrieval is an important architectural decision.
- Token costs are observable and calculable. `response.usage_metadata` exposes input and output token counts on every invocation. In a stateful chatbot, input tokens grow with each turn as history accumulates — a cost model consideration for long sessions.
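That growth is easy to quantify with a back-of-the-envelope simulation — the per-turn token counts below are invented round numbers, not measurements from the project:

```python
# Rough model of history growth in a stateful chat: every turn re-sends the
# entire accumulated history as input tokens.
def cumulative_input_tokens(turns: int, user_tokens: int = 80,
                            reply_tokens: int = 120, system_tokens: int = 60) -> int:
    history = system_tokens
    total_input = 0
    for _ in range(turns):
        history += user_tokens    # user message joins the history
        total_input += history    # full history is billed as input this turn
        history += reply_tokens   # model reply joins the history
    return total_input

print(cumulative_input_tokens(1), cumulative_input_tokens(10))
```

Because the full history is resent each turn, cumulative input tokens grow quadratically with turn count — the practical argument for trimming or summarizing history in long sessions.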
What I Learned & Why It Matters to Employers
- LangChain is a de facto standard for LLM application development. Many production AI applications — whether on AWS, Azure, or GCP — are built with LangChain or a comparable orchestration framework. Understanding its core abstractions (chains, prompts, parsers, document loaders) is a prerequisite for contributing to these systems.
- Stateful conversation is an architectural pattern, not a framework feature. The message history approach demonstrated here — append, invoke, append — generalizes to every chat model API. Understanding the underlying pattern means I can implement it in LangChain, raw boto3, the OpenAI SDK, or any other tool that surfaces a messages-based interface.
- Prompt engineering is engineering. PromptTemplates, format instructions, and output parsers are not tricks — they are the disciplined application of software engineering principles (separation of concerns, reusability, testability) to the specific challenge of instructing language models.
- The boto3 comparison clarifies when abstractions add value. Going through both implementations for the same task builds intuition about where the abstraction boundary sits and what it costs. That judgment — when to use managed tooling vs. raw APIs — is a real skill in cloud AI development.
Conclusion & Reflections
Starting with boto3 and rebuilding the same features in LangChain made the value of the abstraction concrete rather than theoretical. The jump from raw API calls to structured messages, typed prompt templates, and chainable output parsers is not cosmetic — each layer removes a class of boilerplate and enables composition that would otherwise require significant custom code.
The final chatbot — stateful, context-aware, and cost-observable — demonstrates a pattern applicable to any domain: load a document, inject it as context, maintain conversation history, parse outputs. For AMusicVenue's operations team, this translates to a shift scheduling assistant that knows the actual schedule and can hold a real back-and-forth conversation about coverage, swaps, and staffing decisions — without anyone writing a single SQL query or reading a spreadsheet.
| Objective | Status |
|---|---|
| Invoke Amazon Bedrock models with LangChain's ChatBedrock | COMPLETED ✓ |
| Compare boto3 low-level API with LangChain high-level abstraction | COMPLETED ✓ |
| Use SystemMessage, HumanMessage, AIMessage typed classes | COMPLETED ✓ |
| Build and apply PromptTemplate and ChatPromptTemplate | COMPLETED ✓ |
| Parse structured list output with CommaSeparatedListOutputParser | COMPLETED ✓ |
| Implement a stateful chatbot with persistent message history | COMPLETED ✓ |
| Load CSV data with CSVLoader and inject into model context | COMPLETED ✓ |
| Track and calculate per-invocation token cost | COMPLETED ✓ |