Viet Nguyen

Machine Learning Engineer · Data Scientist

Summary

Machine Learning Engineer and Data Scientist with hands-on experience building end-to-end ML pipelines, RAG systems, LangChain conversational AI, and serverless generative AI on AWS. Proficient in Python, scikit-learn, LangChain, Amazon Bedrock, FastAPI, and SQL. Pursuing an M.S. in Computer Science (Machine Learning) at Georgia Tech. Seeking ML Engineering and Data Science roles in generative AI and model deployment.

Skills

Languages & Tools: Python, SQL, Git, GitHub, Jupyter Notebook
Machine Learning: scikit-learn, Random Forest, Decision Tree, Gradient Boosting, Feature Engineering, SMOTE, Supervised Learning, Unsupervised Learning, ML Pipelines, Hyperparameter Tuning, GridSearchCV, Model Evaluation, Classification, Regression
Generative AI & LLMs: LangChain, ChatBedrock, Amazon Bedrock, Retrieval-Augmented Generation (RAG), Prompt Engineering, LLM APIs, Conversational AI, Vector Databases, Titan Embeddings, OpenSearch Serverless
Data Science: Pandas, NumPy, Matplotlib, Seaborn, SciPy, Exploratory Data Analysis (EDA), Statistical Analysis, Hypothesis Testing, A/B Testing
Cloud & AWS: AWS Lambda, Amazon API Gateway, Amazon S3, Amazon EMR, Amazon EC2, CloudWatch, IAM, boto3, AWS Certified Cloud Practitioner
Model Deployment & APIs: FastAPI, Pydantic, REST API, Model Serving, joblib
Familiar With: PyTorch, TensorFlow, Docker, Hugging Face Transformers, LightGBM

Experience

Co-Founder & Operations Lead

TechX Robotics · Tustin, CA

  • Managed full P&L for a $43,000+ revenue robotics education business across payroll, insurance, equipment, and tournament operations
  • Applied LLM-based comparative analysis (Claude, ChatGPT, Gemini) to evaluate owner compensation scenarios — direct application of LLM evaluation to a real financial decision
  • Designed competitive curriculum that led VEX Robotics teams to 1 World Championship and multiple State Championships

Marketing Data Analyst · Volunteer

Association for Talent Development – Orange County (ATD-OC) · Anaheim, CA

  • Delivered a data-driven digital marketing analysis using an AI-powered agentic workspace, surfacing actionable growth opportunities across member engagement channels
  • Developed a 3-phase strategic roadmap targeting 15% membership growth and 40% engagement improvement
  • Deployed an interactive marketing analytics dashboard via Vercel to accompany the strategic report

Engineering & Robotics Teacher · VEX AI Robotics Coach

Jeffrey Trail Middle School · Irvine, CA

  • Manage a $15,000–$20,000 annual engineering and robotics program budget
  • Teach AI, machine learning concepts, robotics, and computer science to middle school students
  • Coached teams to 4 consecutive World Championships and 5 consecutive State Championships

I.T. & Cybersecurity Support Intern

Manhattan Beachwear · Cypress, CA

  • Supported post-ransomware IT overhaul following a $10M breach; deployed MFA, AI-based endpoint protection, and secure remote access across all employee systems

Projects

Built a stateful, context-aware staff scheduling chatbot using LangChain's ChatBedrock — implementing prompt templates, output parsers, conversation memory, and CSV document injection to power multi-turn AI conversations on Amazon Bedrock.

Engineered a retrieval-augmented generation (RAG) pipeline on Amazon Bedrock using Titan Embeddings for semantic search, OpenSearch Serverless as the vector store, and the RetrieveAndGenerate API to ground LLM responses in private enterprise documents.

Deployed a Random Forest regression model as a production REST API using FastAPI and Pydantic schema validation, with startup model loading via joblib and a /health endpoint for monitoring.

Built a binary classification model using feature engineering, SMOTE oversampling, and GridSearchCV across 108 hyperparameter combinations. Achieved F1 = 0.6197 on an imbalanced dataset with a 20% minority class.

Benchmarked Apache Spark and Hadoop across EC2 instance types on Amazon EMR to identify the optimal distributed computing configuration for a real-world platform. Delivered cost/performance analysis via CloudWatch metrics.

Connected Amazon Bedrock to a frontend via Lambda and API Gateway to generate AI-powered flashcards from study notes — fully serverless, CORS-enabled, and prompt-engineered.

Analyzed 500 clients across 5 merged tables to compute per-user monthly revenue; confirmed via two-sample t-test that Surf generates significantly more revenue than Ultimate (avg $50.33 vs $47.31, p ≈ 0).

Built a Random Forest classifier that recommends the right Megaline mobile plan; reached 81.8% test accuracy, beating the target by 6.8 points.

Used SQL and hypothesis testing to show that bad weather extends Loop-to-O'Hare rides by 20.6% (t = 6.84, p ≈ 0).

Mined 16,715 game records to build a data-backed 2017 ad strategy — hypothesis-tested platform and genre preferences across NA, EU, and Japan.

Tested whether highly rated "Golden Age" TV shows also get the most IMDb votes; cleaned messy real-world data before confirming the hypothesis.

Cleaned and analyzed 4.5M order records to uncover peak shopping windows, reorder rhythms, and top items across 206K customers.

Proposed SpotFi localization, RGB-D 3D mapping, and P2P decentralized coordination to solve the limited field-of-view problem in autonomous VEX robots. (Georgia Tech CS6675)

AI Fairness in Housing Lending

Applied Disparate Impact and Statistical Parity Difference to 2.5M Fannie Mae mortgage records across Race and Gender; measured whether Reweighting bias mitigation survives classifier training.

Certifications

AWS Certified Cloud Practitioner

Amazon Web Services

AWS Cloud Institute — Cloud Application Developer

Amazon Web Services

Education

M.S. Computer Science — Machine Learning Specialization

Georgia Institute of Technology

AI and Machine Learning Bootcamp

TripleTen

M.A. Teaching — Science Education

University of Southern California