Hello 👋 I'm Cat, a Computer Scientist passionate about machine learning, deep learning, and building intelligence.
Computer Scientist 🧑🏻‍💻 focused on machine learning, deep learning, and AI research. Passionate about reinforcement learning, mathematics, and building intelligent systems at scale.
In my free time, I love swimming, playing basketball, and exploring new places. Growing up in a coastal town, I’ve always been drawn to the sky and the ocean—and the pictures above remind me of home!
May 2025 - Aug 2025
Cohere
Designed and deployed scalable data pipelines using PySpark on GCP Dataproc to ingest, preprocess, and store millions of multi-domain documents daily for LLM training
Developed and benchmarked a high-precision RAG framework, leveraging retrieval outputs to contextualize and curate long-context training datasets
Trained and conducted ablation studies on LLMs across data and architecture configurations using distributed GPU clusters on CoreWeave with Kubernetes and Grafana
May 2024 - May 2025
University of Cincinnati
Improved a patch-based DDPM framework by incorporating conditional diffusion, achieving a 15% boost in SSIM for PET image reconstruction
Developed two meta-learning algorithms for image classification, DG-SharpMAML and AS-MAML, incorporating gradient matching and sharpness-aware minimization
Demonstrated performance through ablation studies and theoretical analyses, beating the state of the art by 2% on accuracy benchmarks
Aug 2023 - Dec 2023
Kinetic Vision
Enhanced NVIDIA's NeMo ASR into a speaker diarization pipeline, achieving a word error rate below 5% on videos over 1 hour in length
Built a Streamlit app for video transcription with OpenAI Whisper, integrating Longformer and BART for summarization
Proposed a robust training pipeline for YOLOv8 to detect pharmaceutical products using 21,500 synthetic images, achieving 98% mAP@50
Jan 2023 - May 2023
FPT Software
Optimized PaddleOCR for information extraction from 40,000+ multilingual invoices for SAP clients
Automated data labeling and information extraction with Python, reducing processing time per batch by 16%
Jan 2022 - Dec 2022
Digital Scholarship Center
Standardized 31 datasets and integrated 106K documents via the New York Times API to create 3 new datasets
Streamlined dataset mapping with Python and JavaScript on AWS OpenSearch, reducing upload time by 12%
Categorized 256 application essays using Linear SVC, Naive Bayes, and KNN, achieving 88% accuracy
If you want to connect with me, feel free to reach out on LinkedIn, or you can send me an email and I'll be sure to get back to you.