About Me

Hello 👋 I'm Cat, a computer scientist passionate about machine learning, deep learning, and building intelligent systems.

Short Bio

Computer Scientist 🧑🏻‍💻 focused on machine learning, deep learning, and AI research. Passionate about reinforcement learning, mathematics, and building intelligent systems at scale.

In my free time, I love swimming, playing basketball, and exploring new places. Growing up in a coastal town, I’ve always been drawn to the sky and the ocean—and the pictures above remind me of home!

Experience

May 2025 - Aug 2025

Member of Technical Staff

Cohere

Designed and deployed scalable data pipelines using PySpark on GCP Dataproc to ingest, preprocess, and store millions of multi-domain documents daily for LLM training.
Developed and benchmarked a high-precision RAG framework, leveraging retrieval outputs to contextualize and curate long-context training datasets.
Trained LLMs and ran ablation studies across data and architecture configurations on distributed GPU clusters on CoreWeave, using Kubernetes and Grafana.

May 2024 - May 2025

Undergraduate Researcher

University of Cincinnati

Improved a patch-based DDPM framework by incorporating conditional diffusion, achieving a 15% boost in SSIM for PET image reconstruction.
Developed two meta-learning algorithms for image classification, DG-SharpMAML and AS-MAML, incorporating gradient matching and sharpness-aware minimization.
Validated their performance through ablation studies and theoretical analyses, outperforming the state of the art by 2% on accuracy benchmarks.

Aug 2023 - Dec 2023

Machine Learning Intern

Kinetic Vision

Extended NVIDIA's NeMo ASR into a speaker diarization pipeline, achieving a word error rate below 5% on videos over 1 hour long.
Built a Streamlit app for video transcription with OpenAI Whisper, integrating Longformer and BART for summarization.
Proposed a robust training pipeline for YOLOv8 to detect pharmaceutical products using 21,500 synthetic images, achieving an mAP@50 of 98%.

Jan 2023 - May 2023

AI Engineer Intern

FPT Software

Optimized PaddleOCR for information extraction from 40,000+ multilingual invoices for SAP clients.
Automated data labeling and information extraction with Python, reducing processing time per batch by 16%.

Jan 2022 - Dec 2022

Data Science Intern

Digital Scholarship Center

Standardized 31 datasets and integrated 106K documents via the New York Times API to create 3 new datasets.
Streamlined dataset mapping with Python and JavaScript on AWS OpenSearch, reducing upload time by 12%.
Categorized 256 application essays using Linear SVC, Naive Bayes, and KNN, achieving 88% accuracy.

Let's Connect

If you want to connect with me, feel free to reach out on LinkedIn, or you can send me an email and I'll be sure to get back to you.