Problem
Foundation models are trained on public data, which means they know about the world but nothing about your organization. When a customer service assistant gets asked about a specific product's warranty terms, internal return policy, or a historical support ticket, the model can only generate plausible-sounding text — it cannot retrieve accurate, company-specific facts it was never trained on. Building a custom RAG pipeline from scratch to solve this requires assembling an embedding model, a vector database, a chunking strategy, a retrieval layer, and a generation step — each with its own infrastructure overhead. For most teams, that complexity is prohibitive relative to the business value they need to unlock. The gap between a capable foundation model and a genuinely useful enterprise assistant comes down to one unsolved problem: grounding.
Solution
Amazon Bedrock Knowledge Bases provides a fully managed RAG pipeline that handles document ingestion, embedding generation, vector storage, semantic retrieval, and generation augmentation — without requiring any custom infrastructure. This project built a proof of concept for AnyCompany Consumer Electronics, tasked with answering customer service questions using internal documents: product descriptions, satisfaction surveys, support ticket histories, and sales records. Documents were uploaded to S3, ingested into a Knowledge Base backed by OpenSearch Serverless, and embedded using Amazon Titan Text Embeddings v2. The RetrieveAndGenerate API then combined semantic document retrieval with Nova Lite generation, producing answers that cited specific source documents rather than hallucinating generic responses. A direct comparison between a knowledge-base-augmented LLM response and a vanilla LLM response made the grounding improvement immediately concrete.
Skills Acquired
- Amazon Bedrock — the managed foundation model service used both as the generation backbone (Nova Lite) and as the RAG orchestration layer through its Knowledge Bases feature. Bedrock handles inference without requiring any model hosting infrastructure.
- Knowledge Bases — Amazon Bedrock's managed RAG feature. A Knowledge Base connects a document source to a vector store, runs the full embedding pipeline on ingestion, and exposes a single `RetrieveAndGenerate` API endpoint that handles retrieve-then-generate in one call.
- RAG (Retrieval Augmented Generation) — the architectural pattern at the core of this project. RAG retrieves relevant document chunks at query time and injects them into the model's context window, letting a frozen LLM answer questions about data it was never trained on — without retraining or fine-tuning.
- OpenSearch Serverless — the vector store that indexes document embeddings and serves approximate nearest-neighbor queries at retrieval time. Bedrock Knowledge Bases provisions and manages the OpenSearch collection automatically, requiring no direct OpenSearch configuration.
- Amazon S3 — the document source layer. Raw files — PDFs, CSVs, and text documents — were uploaded to an S3 bucket and synced into the Knowledge Base, which handled chunking and embedding on ingestion.
- Python — used to query the Knowledge Base programmatically via the `RetrieveAndGenerate` API, enabling scripted testing and integration into downstream application code.
- boto3 — the AWS Python SDK used to send retrieve-and-generate requests, inspect source attribution citations, and compare grounded versus ungrounded model responses in code.
What those tools accomplish together is best understood by seeing them in action — starting with the question that motivated the entire architecture.
Deep Dive
The hardest part of building a Retrieval Augmented Generation (RAG) system isn't the AI model — it's connecting proprietary data to an LLM in a way that's reliable, grounded, and maintainable. Most RAG tutorials show you how to wire up vector databases, embedding pipelines, and retrieval logic from scratch. Amazon Bedrock Knowledge Bases lets you skip that scaffolding entirely.
In this project, I played the role of an AI specialist at AnyCompany Consumer Electronics — a fictional company with a real-world problem. The VP of Customer Service wanted a chat assistant that could answer questions using the team's internal documents: product descriptions, customer satisfaction data, support ticket histories, and sales records. The goal was to build a proof of concept without building a custom RAG pipeline from scratch.
Why This Project?
My previous AWS project (Serverless AI Application) wired Lambda, API Gateway, and Amazon Bedrock together to invoke an LLM via API. But the LLM had no awareness of AnyCompany's internal data — it could only generate generic answers. RAG solves that problem by giving the model a memory of your organization's specific knowledge at query time.
Understanding how to implement RAG is increasingly central to applied AI work. Whether you're building enterprise chatbots, internal search tools, or domain-specific assistants, the ability to ground an LLM in proprietary data without retraining it is a foundational skill in the modern AI stack.
What this project covers:
- Creating an S3-backed data source and uploading structured + unstructured documents
- Creating and syncing an Amazon Bedrock Knowledge Base with a vector store (OpenSearch Serverless)
- Generating text embeddings using Amazon Titan Text Embeddings v2
- Testing RAG responses in the Bedrock console and analyzing source attribution
- Querying the knowledge base programmatically using the `RetrieveAndGenerate` API via Python and boto3
Architecture Overview
The knowledge base sits between the user's query and the foundation model, retrieving the most relevant document chunks and injecting them into the prompt as context.
| Component | Service | Role |
|---|---|---|
| Data Source | Amazon S3 | Stores the raw documents (CSV files, text files) that the knowledge base ingests |
| Embedding Model | Titan Text Embeddings v2 | Converts document chunks and queries into dense numerical vectors for semantic comparison |
| Vector Store | OpenSearch Serverless | Stores and indexes the embeddings; performs approximate nearest-neighbor search at query time |
| Foundation Model | Amazon Nova Lite (on-demand) | Generates responses grounded in the retrieved context; provides source citations |
| Orchestration | Amazon Bedrock Knowledge Bases | Manages the full RAG pipeline — ingestion, sync, retrieval, and generation — without custom code |
| API Access | Bedrock Agent Runtime (boto3) | Exposes RetrieveAndGenerate for programmatic RAG queries |
How It Was Built
Step 01
Create the S3 Bucket and Upload Documents
The knowledge base needs a data source. I created an S3 bucket with a randomized name to avoid conflicts, then uploaded AnyCompany's internal documents — a mix of CSV files (customer satisfaction scores, interaction data, support tickets, sales records) and text files (product descriptions, investor summary).
```shell
# Create bucket with randomized name
kbbucket=$(aws s3api create-bucket \
  --bucket kbbucket-$RANDOM-$RANDOM \
  --region us-east-1 \
  | cut -d '"' -f4 | cut -d '/' -f2 | sed -n 2p)
echo "bucket: "$kbbucket

# Upload all CSV and TXT files to the bucket
for csv in $(ls *.csv); do
  aws s3 cp $csv s3://$kbbucket/
done
for txt in $(ls *.txt); do
  aws s3 cp $txt s3://$kbbucket/
done

# Verify: 6 documents now in the bucket
aws s3 ls $kbbucket
```
Console View — Task 4.1
After uploading, the S3 console shows anycompany-product-descriptions.txt (2.8 KB) alongside the other five documents in the bucket. Each object is listed with its size, last modified date, and S3 URI. This S3 bucket becomes the single data source that Bedrock Knowledge Bases ingests, indexes, and queries against.
The six documents cover different facets of AnyCompany's business: product specs, customer interactions, support ticket history, CSAT scores, sales data, and an investor summary. Together they give the knowledge base the context to answer questions a real customer support team would ask.
Step 02
Create a Bedrock Knowledge Base and Sync the Data
In the Bedrock console, I created the AnyCompanyCustomerSupport knowledge base, selecting the S3 bucket as the data source. The key configuration decisions:
- Embeddings model: Amazon Titan Text Embeddings v2 — converts document chunks into dense vectors that encode semantic meaning
- Vector store: An existing OpenSearch Serverless collection (pre-provisioned for the lab) with a pre-created vector index
- IAM role: A pre-configured KBRole with permissions to read from S3 and write embeddings to OpenSearch Serverless
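The same configuration choices can be expressed programmatically. As a hedged sketch, this builds the configuration payload for the `bedrock-agent` client's `create_knowledge_base` call; every ARN and the index/field names here are placeholders, not values from the lab.

```python
def build_kb_config(embedding_model_arn: str, collection_arn: str,
                    index_name: str) -> dict:
    """Assemble a vector Knowledge Base configuration backed by
    OpenSearch Serverless. All arguments are caller-supplied placeholders."""
    return {
        "knowledgeBaseConfiguration": {
            "type": "VECTOR",
            "vectorKnowledgeBaseConfiguration": {
                "embeddingModelArn": embedding_model_arn,
            },
        },
        "storageConfiguration": {
            "type": "OPENSEARCH_SERVERLESS",
            "opensearchServerlessConfiguration": {
                "collectionArn": collection_arn,
                "vectorIndexName": index_name,
                # Field names must match the pre-created vector index schema
                "fieldMapping": {
                    "vectorField": "vector",
                    "textField": "text",
                    "metadataField": "metadata",
                },
            },
        },
    }
```

These keyword arguments, plus a name, the S3 data source, and the IAM role ARN, are what the console wizard assembles behind the scenes.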
After creation, I triggered a Sync operation. Syncing is what actually processes the documents: Bedrock reads each file, splits it into chunks, generates embeddings using Titan, and writes the embedding vectors to the OpenSearch Serverless index. If new documents are added to S3 later, a re-sync is required to ingest them.
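The console's Sync button corresponds to the `start_ingestion_job` API on the `bedrock-agent` client. A minimal sketch of triggering and polling a sync in code, assuming placeholder knowledge base and data source IDs:

```python
import time

def sync_knowledge_base(agent_client, kb_id: str, data_source_id: str) -> str:
    """Start an ingestion job and poll until it finishes.

    agent_client is a boto3 'bedrock-agent' client; kb_id and
    data_source_id are placeholders for your own resource IDs.
    """
    job = agent_client.start_ingestion_job(
        knowledgeBaseId=kb_id, dataSourceId=data_source_id
    )["ingestionJob"]
    # Poll until the job leaves its in-progress states
    while job["status"] not in ("COMPLETE", "FAILED"):
        time.sleep(10)
        job = agent_client.get_ingestion_job(
            knowledgeBaseId=kb_id,
            dataSourceId=data_source_id,
            ingestionJobId=job["ingestionJobId"],
        )["ingestionJob"]
    return job["status"]
```

Running this after uploading new documents to S3 is the programmatic equivalent of the manual re-sync described above.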
Step 03
Test RAG Responses in the Bedrock Console
The Bedrock Knowledge Base test panel provides an interactive way to validate the RAG pipeline before touching any code. I ran the same prompt against the LLM alone (Chat/Text playground) and then with the knowledge base enabled — the difference in response quality demonstrated RAG's core value.
The console test panel also surfaces source attribution: each response includes numbered footnotes linking to the specific document chunk that informed the answer, with the S3 URI of the source file.
Console View — Task 4.4
After the initial smartphone features query, I continued testing with more specific questions. The knowledge base correctly identified that Company L primarily uses email to open support tickets, and that the most common smartphone issue was network connectivity — information that only exists inside AnyCompany's support ticket documents, not in any public training data. A follow-up question asking "Were most of those issues resolved?" received a response acknowledging that most were not — again sourced directly from the CSV data.
This multi-turn capability (where follow-up questions carry context from earlier in the session) is managed via a `sessionId` — a token returned by the API that links conversation turns together.
Step 04
Query the Knowledge Base Programmatically via RetrieveAndGenerate API
The same RAG capability available in the console is fully accessible through the `bedrock-agent-runtime` boto3 client. The Python script accepts a prompt from the command line, calls `retrieve_and_generate()`, and prints the full structured response — including retrieved document chunks, S3 source URIs, and the generated output text.
```python
import boto3
import json
import sys

boto3_session = boto3.session.Session()
region = boto3_session.region_name

bedrock_agent_runtime_client = boto3.client(
    'bedrock-agent-runtime', region_name=region
)

# Knowledge Base ID (FMI-1) and foundation model ID (FMI-2)
kb_id = "JYOO2OEH0A"
print('kb_id = ' + kb_id)

model_id = "amazon.nova-lite-v1:0"
model_arn = f'arn:aws:bedrock:us-east-1::foundation-model/{model_id}'
sessionId = ""

def retrieveAndGenerate(input_value, kbId, model_arn, sessionId):
    """Query the knowledge base and generate a grounded response."""
    print(input_value, kbId, model_arn, sessionId)
    config = {
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": kbId,
            "modelArn": model_arn,
        },
    }
    if sessionId:
        print("sessionId is not empty")
        response = bedrock_agent_runtime_client.retrieve_and_generate(
            input={"text": input_value},
            retrieveAndGenerateConfiguration=config,
            sessionId=sessionId
        )
    else:
        print("sessionId is empty")
        response = bedrock_agent_runtime_client.retrieve_and_generate(
            input={"text": input_value},
            retrieveAndGenerateConfiguration=config,
        )
    print(json.dumps(response, indent=4, default=str))
    return response['sessionId']

# Accept prompt from command line
if len(sys.argv) < 2:
    print("Usage: python retrieve-generate.py '<your prompt>'")
    sys.exit(1)

input_value = sys.argv[1]
sessionId = retrieveAndGenerate(input_value, kb_id, model_arn, sessionId)
```
VSCode + Terminal — Task 5.3
Running `python retrieve-generate.py "Name a few of AnyCompany's products"` in the terminal returns the full JSON response. The `citations` array contains the retrieved document chunk (raw text from the knowledge base), the `s3Location` URI identifying the exact source file, and the `output.text` field with the generated answer. The `sessionId` returned can be passed back in subsequent calls to maintain conversation context across turns.
The API response structure mirrors what the console shows visually — retrieved chunks, source attribution, and generated text — but as a structured JSON object that can be parsed, stored, and displayed in any application. This is the foundation for building a production chat interface on top of the knowledge base.
Key Takeaways
- RAG solves the knowledge cutoff problem without retraining. An LLM queried directly about AnyCompany's products or customers returns generic or hallucinated answers. With a knowledge base, those same queries return grounded, document-sourced responses.
- Embeddings are the bridge between text and semantic search. The Titan Text Embeddings v2 model converts both the documents and incoming queries into high-dimensional vectors. Proximity in vector space = semantic similarity — not keyword overlap.
- Source attribution is built in. Every response includes footnotes linking back to the specific document chunk — and by extension, the S3 object — that grounded the answer. This is critical for enterprise trust and auditability.
- Session IDs enable multi-turn conversation. The `sessionId` returned by the API links conversation turns, allowing follow-up questions like "Were most of those issues resolved?" to be answered with awareness of the prior question.
- Sync is required to ingest new data. The knowledge base isn't live — it reflects the state of the S3 bucket at the time of the last sync. Adding new documents requires re-syncing to regenerate embeddings and update the vector index.
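The "proximity in vector space" idea can be made concrete in a few lines of plain Python. This is a toy sketch with made-up 3-dimensional vectors, not the actual Titan/OpenSearch pipeline, but the ranking principle is the same one the approximate nearest-neighbor search applies at scale.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 means same direction, near 0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" (real Titan v2 vectors have up to 1024 dimensions)
query   = [0.9, 0.1, 0.0]
ticket  = [0.8, 0.2, 0.1]   # semantically close document chunk
invoice = [0.0, 0.1, 0.9]   # unrelated chunk

# Retrieval returns the chunk with the highest similarity to the query
assert cosine_similarity(query, ticket) > cosine_similarity(query, invoice)
```

Because Titan v2 can emit normalized vectors, this similarity reduces to a dot product in practice, which is what makes the nearest-neighbor index fast.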
What I Learned & Why It Matters to Employers
- Managed RAG vs. custom RAG: Bedrock Knowledge Bases handles chunking, embedding, indexing, and retrieval without custom code. Understanding when to use a managed service vs. building with LangChain or LlamaIndex is a real architectural decision.
- Vector databases in practice: OpenSearch Serverless serves as a production vector store here — the same role that Pinecone, Weaviate, or pgvector play in custom RAG stacks. The underlying concept (approximate nearest-neighbor search over embedding vectors) is the same across all of them.
- Grounding LLMs in organizational data is the dominant enterprise use case. Most enterprise AI deployments aren't fine-tuning models — they're building RAG pipelines over internal documents. This project demonstrates that workflow end-to-end, from data ingestion through programmatic API access.
- API structure mirrors console behavior. The `RetrieveAndGenerate` API returns the same citations and source metadata visible in the console UI. Understanding both the managed interface and the programmatic API provides flexibility for production integration.
Conclusion & Reflections
This project reinforced a core idea in applied AI: the model is often not the hard part. Getting the right data in front of the model — cleanly, reliably, with traceable sourcing — is where most of the engineering effort goes. Amazon Bedrock Knowledge Bases abstracts that plumbing, but understanding what it's doing underneath (chunking, embedding, vector search, prompt augmentation) makes you a better practitioner of RAG regardless of which tooling you use.
For AnyCompany's customer support team, the POC demonstrated that support staff could query their own historical data conversationally — finding out which customers prefer email, which product issues are most common, and whether prior tickets were resolved — without writing a single SQL query. That's a tangible productivity gain, and it's the kind of AI application that organizations are actively building.
| Objective | Status |
|---|---|
| Add source documents to an Amazon S3 bucket | COMPLETED ✓ |
| Create a Bedrock Knowledge Base with OpenSearch Serverless vector store | COMPLETED ✓ |
| Sync documents — generate embeddings with Titan Text Embeddings v2 | COMPLETED ✓ |
| Test RAG responses in the Bedrock console test panel | COMPLETED ✓ |
| Analyze source attribution and document citations in responses | COMPLETED ✓ |
| Use the RetrieveAndGenerate API via Python and boto3 | COMPLETED ✓ |