Problem
Foundation models are trained on public data, which means they know about the world but nothing about your organization. When a customer service assistant gets asked about a specific product's warranty terms, internal return policy, or a historical support ticket, the model can only generate plausible-sounding text — it cannot retrieve accurate, company-specific facts it was never trained on. Building a custom RAG pipeline from scratch to solve this requires assembling an embedding model, a vector database, a chunking strategy, a retrieval layer, and a generation step — each with its own infrastructure overhead. For most teams, that complexity is prohibitive relative to the business value they need to unlock. The gap between a capable foundation model and a genuinely useful enterprise assistant comes down to one unsolved problem: grounding.
Solution
Amazon Bedrock Knowledge Bases provides a fully managed RAG pipeline that handles document ingestion, embedding generation, vector storage, semantic retrieval, and generation augmentation — without requiring any custom infrastructure. This project built a proof of concept for AnyCompany Consumer Electronics, tasked with answering customer service questions using internal documents: product descriptions, satisfaction surveys, support ticket histories, and sales records. Documents were uploaded to S3, ingested into a Knowledge Base backed by OpenSearch Serverless, and embedded using Amazon Titan Text Embeddings v2. The RetrieveAndGenerate API then combined semantic document retrieval with Nova Lite generation, producing answers that cited specific source documents rather than hallucinating generic responses. A direct comparison between a knowledge-base-augmented LLM response and a vanilla LLM response made the grounding improvement immediately concrete.
Skills Acquired
- Amazon Bedrock — the managed foundation model service used both as the generation backbone (Nova Lite) and as the RAG orchestration layer through its Knowledge Bases feature. Bedrock handles inference without requiring any model hosting infrastructure.
- Knowledge Bases — Amazon Bedrock's managed RAG feature. A Knowledge Base connects a document source to a vector store, runs the full embedding pipeline on ingestion, and exposes a single `RetrieveAndGenerate` API endpoint that handles retrieve-then-generate in one call.
- RAG (Retrieval Augmented Generation) — the architectural pattern at the core of this project. RAG retrieves relevant document chunks at query time and injects them into the model's context window, letting a frozen LLM answer questions about data it was never trained on — without retraining or fine-tuning.
- OpenSearch Serverless — the vector store that indexes document embeddings and serves approximate nearest-neighbor queries at retrieval time. Bedrock Knowledge Bases provisions and manages the OpenSearch collection automatically, requiring no direct OpenSearch configuration.
- Amazon S3 — the document source layer. Raw files — PDFs, CSVs, and text documents — were uploaded to an S3 bucket and synced into the Knowledge Base, which handled chunking and embedding on ingestion.
- Python — used to query the Knowledge Base programmatically via the `RetrieveAndGenerate` API, enabling scripted testing and integration into downstream application code.
- boto3 — the AWS Python SDK used to send retrieve-and-generate requests, inspect source attribution citations, and compare grounded versus ungrounded model responses in code.
What those tools accomplish together is best understood by seeing them in action — starting with the question that motivated the entire architecture.
Deep Dive
The hardest part of building a Retrieval Augmented Generation (RAG) system isn't the AI model — it's connecting proprietary data to an LLM in a way that's reliable, grounded, and maintainable. Most RAG tutorials show you how to wire up vector databases, embedding pipelines, and retrieval logic from scratch. Amazon Bedrock Knowledge Bases lets you skip that scaffolding entirely.
In this project, I played the role of an AI specialist at AnyCompany Consumer Electronics — a fictional company with a real-world problem. The VP of Customer Service wanted a chat assistant that could answer questions using the team's internal documents: product descriptions, customer satisfaction data, support ticket histories, and sales records. The goal was to build a proof of concept without building a custom RAG pipeline from scratch.
Why This Project?
My previous AWS project (Serverless AI Application) wired Lambda, API Gateway, and Amazon Bedrock together to invoke an LLM via API. But the LLM had no awareness of AnyCompany's internal data — it could only generate generic answers. RAG solves that problem by giving the model a memory of your organization's specific knowledge at query time.
Understanding how to implement RAG is increasingly central to applied AI work. Whether you're building enterprise chatbots, internal search tools, or domain-specific assistants, the ability to ground an LLM in proprietary data without retraining it is a foundational skill in the modern AI stack.
What this project covers:
- Creating an S3-backed data source and uploading structured + unstructured documents
- Creating and syncing an Amazon Bedrock Knowledge Base with a vector store (OpenSearch Serverless)
- Generating text embeddings using Amazon Titan Text Embeddings v2
- Testing RAG responses in the Bedrock console and analyzing source attribution
- Querying the knowledge base programmatically using the `RetrieveAndGenerate` API via Python and boto3
Architecture Overview
The knowledge base sits between the user's query and the foundation model, retrieving the most relevant document chunks and injecting them into the prompt as context.
| Component | Service | Role |
|---|---|---|
| Data Source | Amazon S3 | Stores the raw documents (CSV files, text files) that the knowledge base ingests |
| Embedding Model | Titan Text Embeddings v2 | Converts document chunks and queries into dense numerical vectors for semantic comparison |
| Vector Store | OpenSearch Serverless | Stores and indexes the embeddings; performs approximate nearest-neighbor search at query time |
| Foundation Model | Amazon Nova Lite (on-demand) | Generates responses grounded in the retrieved context; provides source citations |
| Orchestration | Amazon Bedrock Knowledge Bases | Manages the full RAG pipeline — ingestion, sync, retrieval, and generation — without custom code |
| API Access | Bedrock Agent Runtime (boto3) | Exposes RetrieveAndGenerate for programmatic RAG queries |
How It Was Built
Step 01
Create the S3 Bucket and Upload Documents
The knowledge base needs a data source. I created an S3 bucket with a randomized name to avoid conflicts, then uploaded AnyCompany's internal documents — a mix of CSV files (customer satisfaction scores, interaction data, support tickets, sales records) and text files (product descriptions, investor summary).
```shell
# Create bucket with randomized name
kbbucket=$(aws s3api create-bucket \
  --bucket kbbucket-$RANDOM-$RANDOM \
  --region us-east-1 \
  | cut -d '"' -f4 | cut -d '/' -f2 | sed -n 2p)
echo "bucket: "$kbbucket

# Upload all CSV and TXT files to the bucket
for csv in $(ls *.csv); do
  aws s3 cp $csv s3://$kbbucket/
done
for txt in $(ls *.txt); do
  aws s3 cp $txt s3://$kbbucket/
done

# Verify: 6 documents now in the bucket
aws s3 ls $kbbucket
```
Console View — Task 4.1
After uploading, the S3 console shows anycompany-product-descriptions.txt (2.8 KB) alongside the other five documents in the bucket. Each object is listed with its size, last modified date, and S3 URI. This S3 bucket becomes the single data source that Bedrock Knowledge Bases ingests, indexes, and queries against.
The six documents cover different facets of AnyCompany's business: product specs, customer interactions, support ticket history, CSAT scores, sales data, and an investor summary. Together they give the knowledge base the context to answer questions a real customer support team would ask.
Step 02
Create a Bedrock Knowledge Base and Sync the Data
In the Bedrock console, I created the AnyCompanyCustomerSupport knowledge base, selecting the S3 bucket as the data source. The key configuration decisions:
- Embeddings model: Amazon Titan Text Embeddings v2 — converts document chunks into dense vectors that encode semantic meaning
- Vector store: An existing OpenSearch Serverless collection (pre-provisioned for the lab) with a pre-created vector index
- IAM role: A pre-configured KBRole with permissions to read from S3 and write embeddings to OpenSearch Serverless
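The same configuration choices can be expressed programmatically. As a hedged sketch, this builds the configuration payload for the `bedrock-agent` client's `create_knowledge_base` call; every ARN and the index/field names here are placeholders, not values from the lab.

```python
def build_kb_config(embedding_model_arn: str, collection_arn: str,
                    index_name: str) -> dict:
    """Assemble a vector Knowledge Base configuration backed by
    OpenSearch Serverless. All arguments are caller-supplied placeholders."""
    return {
        "knowledgeBaseConfiguration": {
            "type": "VECTOR",
            "vectorKnowledgeBaseConfiguration": {
                "embeddingModelArn": embedding_model_arn,
            },
        },
        "storageConfiguration": {
            "type": "OPENSEARCH_SERVERLESS",
            "opensearchServerlessConfiguration": {
                "collectionArn": collection_arn,
                "vectorIndexName": index_name,
                # Field names must match the pre-created vector index schema
                "fieldMapping": {
                    "vectorField": "vector",
                    "textField": "text",
                    "metadataField": "metadata",
                },
            },
        },
    }
```

These keyword arguments, plus a name, the S3 data source, and the IAM role ARN, are what the console wizard assembles behind the scenes.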
After creation, I triggered a Sync operation. Syncing is what actually processes the documents: Bedrock reads each file, splits it into chunks, generates embeddings using Titan, and writes the embedding vectors to the OpenSearch Serverless index. If new documents are added to S3 later, a re-sync is required to ingest them.
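The console's Sync button corresponds to the `start_ingestion_job` API on the `bedrock-agent` client. A minimal sketch of triggering and polling a sync in code, assuming placeholder knowledge base and data source IDs:

```python
import time

def sync_knowledge_base(agent_client, kb_id: str, data_source_id: str) -> str:
    """Start an ingestion job and poll until it finishes.

    agent_client is a boto3 'bedrock-agent' client; kb_id and
    data_source_id are placeholders for your own resource IDs.
    """
    job = agent_client.start_ingestion_job(
        knowledgeBaseId=kb_id, dataSourceId=data_source_id
    )["ingestionJob"]
    # Poll until the job leaves its in-progress states
    while job["status"] not in ("COMPLETE", "FAILED"):
        time.sleep(10)
        job = agent_client.get_ingestion_job(
            knowledgeBaseId=kb_id,
            dataSourceId=data_source_id,
            ingestionJobId=job["ingestionJobId"],
        )["ingestionJob"]
    return job["status"]
```

Running this after uploading new documents to S3 is the programmatic equivalent of the manual re-sync described above.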
Step 03
Test RAG Responses in the Bedrock Console
The Bedrock Knowledge Base test panel provides an interactive way to validate the RAG pipeline before touching any code. I ran the same prompt against the LLM alone (Chat/Text playground) and then with the knowledge base enabled — the difference in response quality demonstrated RAG's core value.
The console test panel also surfaces source attribution: each response includes numbered footnotes linking to the specific document chunk that informed the answer, with the S3 URI of the source file.
Console View — Task 4.4
After the initial smartphone features query, I continued testing with more specific questions. The knowledge base correctly identified that Company L primarily uses email to open support tickets, and that the most common smartphone issue was network connectivity — information that only exists inside AnyCompany's support ticket documents, not in any public training data. A follow-up question asking "Were most of those issues resolved?" received a response acknowledging that most were not — again sourced directly from the CSV data.
This multi-turn capability (where follow-up questions carry context from earlier in the session) is managed via a `sessionId` — a token returned by the API that links conversation turns together.
Step 04
Query the Knowledge Base Programmatically via RetrieveAndGenerate API
The same RAG capability available in the console is fully accessible through the `bedrock-agent-runtime` boto3 client. The Python script accepts a prompt from the command line, calls `retrieve_and_generate()`, and prints the full structured response — including retrieved document chunks, S3 source URIs, and the generated output text.
```python
import boto3
import json
import sys

boto3_session = boto3.session.Session()
region = boto3_session.region_name

bedrock_agent_runtime_client = boto3.client(
    'bedrock-agent-runtime', region_name=region
)

# Knowledge Base ID (FMI-1) and foundation model ID (FMI-2)
kb_id = "JYOO2OEH0A"
print('kb_id = ' + kb_id)

model_id = "amazon.nova-lite-v1:0"
model_arn = f'arn:aws:bedrock:us-east-1::foundation-model/{model_id}'
sessionId = ""

def retrieveAndGenerate(input_value, kbId, model_arn, sessionId):
    """Query the knowledge base and generate a grounded response."""
    print(input_value, kbId, model_arn, sessionId)
    config = {
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": kbId,
            "modelArn": model_arn,
        },
    }
    if sessionId:
        print("sessionId is not empty")
        response = bedrock_agent_runtime_client.retrieve_and_generate(
            input={"text": input_value},
            retrieveAndGenerateConfiguration=config,
            sessionId=sessionId
        )
    else:
        print("sessionId is empty")
        response = bedrock_agent_runtime_client.retrieve_and_generate(
            input={"text": input_value},
            retrieveAndGenerateConfiguration=config,
        )
    print(json.dumps(response, indent=4, default=str))
    return response['sessionId']

# Accept prompt from command line
if len(sys.argv) < 2:
    print("Usage: python retrieve-generate.py '<your prompt>'")
    sys.exit(1)

input_value = sys.argv[1]
sessionId = retrieveAndGenerate(input_value, kb_id, model_arn, sessionId)
```
VSCode + Terminal — Task 5.3
Running `python retrieve-generate.py "Name a few of AnyCompany's products"` in the terminal returns the full JSON response. The `citations` array contains the retrieved document chunk (raw text from the knowledge base), the `s3Location` URI identifying the exact source file, and the `output.text` field with the generated answer. The `sessionId` returned can be passed back in subsequent calls to maintain conversation context across turns.
The API response structure mirrors what the console shows visually — retrieved chunks, source attribution, and generated text — but as a structured JSON object that can be parsed, stored, and displayed in any application. This is the foundation for building a production chat interface on top of the knowledge base.
Key Takeaways
- RAG solves the knowledge cutoff problem without retraining. An LLM queried directly about AnyCompany's products or customers returns generic or hallucinated answers. With a knowledge base, those same queries return grounded, document-sourced responses.
- Embeddings are the bridge between text and semantic search. The Titan Text Embeddings v2 model converts both the documents and incoming queries into high-dimensional vectors. Proximity in vector space = semantic similarity — not keyword overlap.
- Source attribution is built in. Every response includes footnotes linking back to the specific document chunk — and by extension, the S3 object — that grounded the answer. This is critical for enterprise trust and auditability.
- Session IDs enable multi-turn conversation. The `sessionId` returned by the API links conversation turns, allowing follow-up questions like "Were most of those issues resolved?" to be answered with awareness of the prior question.
- Sync is required to ingest new data. The knowledge base isn't live — it reflects the state of the S3 bucket at the time of the last sync. Adding new documents requires re-syncing to regenerate embeddings and update the vector index.
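The "proximity in vector space" idea can be made concrete in a few lines of plain Python. This is a toy sketch with made-up 3-dimensional vectors, not the actual Titan/OpenSearch pipeline, but the ranking principle is the same one the approximate nearest-neighbor search applies at scale.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 means same direction, near 0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" (real Titan v2 vectors have up to 1024 dimensions)
query   = [0.9, 0.1, 0.0]
ticket  = [0.8, 0.2, 0.1]   # semantically close document chunk
invoice = [0.0, 0.1, 0.9]   # unrelated chunk

# Retrieval returns the chunk with the highest similarity to the query
assert cosine_similarity(query, ticket) > cosine_similarity(query, invoice)
```

Because Titan v2 can emit normalized vectors, this similarity reduces to a dot product in practice, which is what makes the nearest-neighbor index fast.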
What I Learned & Why It Matters to Employers
- Managed RAG vs. custom RAG: Bedrock Knowledge Bases handles chunking, embedding, indexing, and retrieval without custom code. Understanding when to use a managed service vs. building with LangChain or LlamaIndex is a real architectural decision.
- Vector databases in practice: OpenSearch Serverless serves as a production vector store here — the same role that Pinecone, Weaviate, or pgvector play in custom RAG stacks. The underlying concept (approximate nearest-neighbor search over embedding vectors) is the same across all of them.
- Grounding LLMs in organizational data is the dominant enterprise use case. Most enterprise AI deployments aren't fine-tuning models — they're building RAG pipelines over internal documents. This project demonstrates that workflow end-to-end, from data ingestion through programmatic API access.
- API structure mirrors console behavior. The `RetrieveAndGenerate` API returns the same citations and source metadata visible in the console UI. Understanding both the managed interface and the programmatic API provides flexibility for production integration.
Conclusion & Reflections
This project reinforced a core idea in applied AI: the model is often not the hard part. Getting the right data in front of the model — cleanly, reliably, with traceable sourcing — is where most of the engineering effort goes. Amazon Bedrock Knowledge Bases abstracts that plumbing, but understanding what it's doing underneath (chunking, embedding, vector search, prompt augmentation) makes you a better practitioner of RAG regardless of which tooling you use.
For AnyCompany's customer support team, the POC demonstrated that support staff could query their own historical data conversationally — finding out which customers prefer email, which product issues are most common, and whether prior tickets were resolved — without writing a single SQL query. That's a tangible productivity gain, and it's the kind of AI application that organizations are actively building.
| Objective | Status |
|---|---|
| Add source documents to an Amazon S3 bucket | COMPLETED ✓ |
| Create a Bedrock Knowledge Base with OpenSearch Serverless vector store | COMPLETED ✓ |
| Sync documents — generate embeddings with Titan Text Embeddings v2 | COMPLETED ✓ |
| Test RAG responses in the Bedrock console test panel | COMPLETED ✓ |
| Analyze source attribution and document citations in responses | COMPLETED ✓ |
| Use the RetrieveAndGenerate API via Python and boto3 | COMPLETED ✓ |