Machine Learning Engineer · Data Scientist
Machine Learning Engineer and Data Scientist with hands-on experience building end-to-end ML pipelines, RAG systems, LangChain conversational AI, and serverless generative AI on AWS. Proficient in Python, scikit-learn, LangChain, Amazon Bedrock, FastAPI, and SQL. Pursuing an M.S. in Computer Science (Machine Learning) at Georgia Tech. Seeking ML Engineering and Data Science roles in generative AI and model deployment.
TechX Robotics · Tustin, CA
Association for Talent Development – Orange County (ATD-OC) · Anaheim, CA
Jeffrey Trail Middle School · Irvine, CA
Manhattan Beachwear · Cypress, CA
Built a stateful, context-aware staff scheduling chatbot with LangChain's ChatBedrock, using prompt templates, output parsers, conversation memory, and CSV document injection to support multi-turn conversations on Amazon Bedrock.
Engineered a retrieval-augmented generation (RAG) pipeline on Amazon Bedrock using Titan Embeddings for semantic search, OpenSearch Serverless as the vector store, and the RetrieveAndGenerate API to ground LLM responses in private enterprise documents.
Deployed a Random Forest regression model as a production REST API using FastAPI and Pydantic schema validation, with startup model loading via joblib and a /health endpoint for monitoring.
Built a binary classification model using feature engineering, SMOTE oversampling, and GridSearchCV across 108 hyperparameter combinations. Achieved F1 = 0.6197 on an imbalanced dataset with a 20% minority class.
Benchmarked Apache Spark and Hadoop across EC2 instance types on Amazon EMR to identify the optimal distributed computing configuration for a real-world platform. Delivered cost/performance analysis via CloudWatch metrics.
Connected Amazon Bedrock to a frontend through Lambda and API Gateway to generate AI-powered flashcards from study notes, using a fully serverless, CORS-enabled backend and engineered prompts.
Analyzed 500 clients across 5 merged tables to compute per-user monthly revenue; confirmed via two-sample t-test that Surf generates significantly more revenue than Ultimate (avg $50.33 vs $47.31, p ≈ 0).
Trained a Random Forest classifier to recommend the right Megaline mobile plan, reaching 81.8% test accuracy and beating the target by 6.8 percentage points.
Used SQL and hypothesis testing to show that bad weather extends Loop-to-O'Hare rides by 20.6% (t = 6.84, p ≈ 0).
Mined 16,715 game records to build a data-backed 2017 ad strategy — hypothesis-tested platform and genre preferences across NA, EU, and Japan.
Tested whether highly rated "Golden Age" TV shows also receive the most IMDb votes; cleaned messy real-world data before confirming the hypothesis.
Cleaned and analyzed 4.5M order records to uncover peak shopping windows, reorder rhythms, and top items across 206K customers.
Proposed SpotFi localization, RGB-D 3D mapping, and P2P decentralized coordination to solve the limited field-of-view problem in autonomous VEX robots. (Georgia Tech CS6675)
Applied Disparate Impact and Statistical Parity Difference to 2.5M Fannie Mae mortgage records across Race and Gender; measured whether Reweighting bias mitigation survives classifier training.
Amazon Web Services
Amazon Web Services
Georgia Institute of Technology
TripleTen
University of Southern California