About Me
Hi, there!
I’m Jiufeng (pronounced /jo phone/) from China; you can also call me FENG (/phone/)! Welcome to my personal website. :D
I recently completed my master’s degree in Data & Computer Science at Heidelberg University and finished my Data Science & AI internship and working student position at Mercedes-Benz Group, where I worked on Recommendation Systems and AI Agents. I’m fascinated by data science, machine learning, and AI, especially the advances in Generative AI and Large Language Models. I’m also diving into the startup world with a small AI-focused team, aiming to build products that are truly valuable to people.
Interests & Hobbies
Technical Skills
3+ Years Experience
Languages & Frameworks
Tools & Libraries
Databases
Core AI Skills
Data + AI Frameworks
Models
Cloud Platforms
MLOps
Languages
Work Experience
My professional journey in data science, ML, and AI
Mercedes-Benz Group
May 2024 - August 2025
Data Scientist & Machine Learning Engineer (Intern & Working Student)
- AI-Driven MVPs: Built and shipped AI-driven MVPs, including an enterprise RAG pipeline for technical docs (a minimal sketch of the retrieve-then-generate pattern follows this list), a vehicle recommendation pipeline, a GPT-4o-based conversational assistant, and an agentic vehicle consulting system, using Python, LangChain, and OpenAI APIs to support millions of customers across 8 markets
- ML Pipeline Optimization: Maintained, optimized, and deployed end-to-end AI/ML pipelines and recommendation models (retrieval and ranking models) on Azure (Azure Databricks), ensuring high availability, cost efficiency, and real-time insights
- Large-scale Pipelines & Feature Engineering: Designed, developed, maintained, and monitored Databricks + Delta Lake ETL pipelines moving 2.4 TB/day of click-stream and user-behaviour data, cutting query latency by 65% and keeping data freshness under 15 min; generated 200+ user–vehicle features with PySpark, orchestrated with Airflow and Databricks Workflows, and validated with dbt tests and Great Expectations
- Multi-Agent Systems: Engineered multi-agent systems in Python with LangChain, LangGraph, Pinecone, and MCP tool-calling, focusing on tool orchestration, retrieval workflows, and guardrails to make autonomous decision workflows more reliable and debuggable
- CI/CD & MLOps: Developed and maintained CI/CD pipelines (GitHub Actions) for Python services and ML/LLM workloads, automating tests, model validation, and deployments to the cloud (Azure), cutting release time by 30% and ensuring reproducible agent and API rollouts
- Mentorship & Innovation: Mentored a new Data Science intern in Recommendation System modeling, LLM fine-tuning, Retrieval-Augmented Generation (RAG), and AI-powered automation, while driving continuous innovation in GenAI applications
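The RAG pipeline above followed the standard retrieve-then-generate pattern: chunk the technical docs, embed them, pull the most similar chunks for a query, and ground the model's answer in that context. Below is a minimal, illustrative sketch of that pattern in plain Python/NumPy; `embed` and `generate` are hypothetical placeholders for the embedding model and the GPT-4o call, not the production implementation.

```python
import numpy as np

# Hypothetical stand-ins for the real embedding model and GPT-4o call.
def embed(texts: list[str]) -> np.ndarray:
    """Return one embedding vector per input text (placeholder)."""
    raise NotImplementedError("plug in your embedding model here")

def generate(prompt: str) -> str:
    """Call the chat model with the assembled prompt (placeholder)."""
    raise NotImplementedError("plug in your LLM call here")

def retrieve(query: str, chunks: list[str], chunk_vecs: np.ndarray, k: int = 4) -> list[str]:
    """Rank document chunks by cosine similarity to the query and keep the top-k."""
    q = embed([query])[0]
    sims = chunk_vecs @ q / (np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(q) + 1e-9)
    top = np.argsort(sims)[::-1][:k]
    return [chunks[i] for i in top]

def answer(query: str, chunks: list[str]) -> str:
    """Retrieve relevant context, then ask the model to answer grounded in it."""
    chunk_vecs = embed(chunks)  # in production these are precomputed and stored in a vector index
    context = "\n\n".join(retrieve(query, chunks, chunk_vecs))
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    return generate(prompt)
```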
Heidelberg University Data Science & NLP Group
April 2024 - August 2024
Data Science & AI Engineer Intern
- LLM/RAG Extraction Pipeline: Built an LLM/RAG extraction pipeline over a 12 TB financial corpus in Python, achieving 94% field-level accuracy and automating ETL plus data-quality validation, cutting manual curation costs by 45%
- AI Agent for Financial Data: Designed a production-oriented AI agent for quantitative financial data extraction, using LLM-as-a-judge validation to automatically score outputs and surface errors, improving efficiency by 35% and reducing operational costs by 15%
- Schema-Guided Extraction: Implemented schema-guided extraction for semi-structured PDFs/images (tables/OCR) with constrained JSON outputs in Python, including unit/currency normalization and date parsing (a small normalization sketch follows this list); ingested the JSON into Snowflake, flattened nested structures, and converted them to a relational model, cutting manual post-editing time per document by 65%
- Production API Deployment: Deployed the quantity-extraction AI agent as a production API on AWS Bedrock with monitoring and alerting, adding dynamic prompt adaptation and fallback strategies to maintain 99%+ uptime under varying workloads
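The schema-guided extraction above boils down to constraining the LLM to a JSON schema and then validating and normalizing each field before loading it into Snowflake. Here is a small, illustrative sketch of that validation/normalization step in plain Python; the RevenueRecord fields, currency table, and number/date formats are assumptions made for the example, not the project's actual schema.

```python
import json
from dataclasses import dataclass
from datetime import date, datetime

# Illustrative currency table; the real mapping covered far more symbols and codes.
CURRENCY_ALIASES = {"€": "EUR", "EUR": "EUR", "$": "USD", "USD": "USD"}

@dataclass
class RevenueRecord:
    company: str
    fiscal_date: date
    amount: float
    currency: str

def parse_record(raw_json: str) -> RevenueRecord:
    """Validate and normalize one LLM output constrained to a JSON object."""
    data = json.loads(raw_json)  # raises if the model broke the JSON constraint
    # Normalize currency symbols/codes to ISO 4217.
    currency = CURRENCY_ALIASES[data["currency"].strip()]
    # Normalize European "1.234,56"-style number formatting to a float.
    amount = float(str(data["amount"]).replace(".", "").replace(",", "."))
    # Parse the reported date into a proper date object.
    fiscal_date = datetime.strptime(data["fiscal_date"], "%Y-%m-%d").date()
    return RevenueRecord(data["company"], fiscal_date, amount, currency)

print(parse_record('{"company": "ACME AG", "fiscal_date": "2023-12-31", '
                   '"amount": "1.234,56", "currency": "€"}'))
```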
Heidelberg Academy of Sciences
August 2023 - March 2024
Student Research Assistant (Data Science & Machine Learning)
- Data Transformations: Migrated legacy XML to an event-driven JSON microservice (a toy conversion sketch follows this list); added QA gates and clear documentation, improving reliability and making handovers and onboarding smoother
- Generative AI Applications: Deployed a LangChain + GPT-3.5 query assistant; standardized prompts and examples, cutting manual lookups and saving researcher query time by 40%
- Frontend Development & UX Implementation: Developed and maintained the Heidelberg Academy website using TypeScript, implementing user-centered design principles and ensuring responsive, intuitive interfaces that improved user engagement and accessibility across devices
- Collaborative Expertise: Worked closely with researchers, computer linguists, data scientists, and developer teams to validate and enhance AI tools and frameworks
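As a toy illustration of the XML-to-JSON migration mentioned above, the sketch below converts one legacy-style XML entry into a JSON event and applies a simple QA gate; the element names and event type are invented for the example and do not reflect the Academy's actual schema.

```python
import json
import xml.etree.ElementTree as ET

# Toy input in the spirit of a legacy record; the real schema was far richer.
LEGACY_XML = """
<entry id="hadw-0042">
  <headword>Beispiel</headword>
  <sense>example, instance</sense>
</entry>
"""

def xml_entry_to_event(xml_text: str) -> str:
    """Convert one legacy XML entry into a JSON event for the downstream microservice."""
    root = ET.fromstring(xml_text)
    event = {
        "type": "entry.updated",          # hypothetical event name
        "id": root.get("id"),
        "headword": root.findtext("headword"),
        "sense": root.findtext("sense"),
    }
    # Basic QA gate: refuse to emit events with missing required fields.
    if not event["id"] or not event["headword"]:
        raise ValueError("entry is missing required fields")
    return json.dumps(event, ensure_ascii=False)

print(xml_entry_to_event(LEGACY_XML))
```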
BGI Genomics
November 2021 - April 2022
ML & Software Intern
- ML Pipeline Development: Trained a YOLOv4 detector and built an ML pipeline using PyTorch on AWS Cloud services for nuclei and cell segmentation with class-balanced sampling and hard-negative mining, then converted detections into instance masks via morphological refinements and watershed segmentation
- Experimentation & MLOps: Conducted experimentation with PyTorch and CUDA, tracked hyperparameters and metrics in MLflow, and versioned datasets and labels with DVC for full reproducibility
- Auto-labeling Pipeline: Built an auto-labeling pipeline with Airflow that combined denoising and teacher–student pseudo-labeling, reducing annotation time by 45% and improving dataset consistency
- Training Optimization: Ran structured training cycles with ablations and HPO (learning-rate schedules, optimizer/weight decay) and addressed class imbalance with focal loss, oversampling, and hard-negative mining, improving detection accuracy by ~15% and stabilizing results across microscopes and staining conditions
- Production Deployment: Deployed models as TorchScript/ONNX and exposed them via a FastAPI REST microservice in Docker on AWS, with CI/CD and readiness/liveness health checks to ensure reliable inference in production (a minimal serving sketch follows this list)
- Cross-Team Collaboration: Collaborated closely with machine learning engineers, developers, and product teams to validate and enhance model performance and advance the automation pipeline and AI platforms
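The production deployment described above pairs an exported model with a small REST service and health probes. The sketch below shows one way to do that with FastAPI and ONNX Runtime, assuming a hypothetical nuclei_detector.onnx with a single NCHW image input; it illustrates the shape of the service, not the production code.

```python
import numpy as np
import onnxruntime as ort
from fastapi import FastAPI
from pydantic import BaseModel

# Model path and input/output layout are illustrative, not the production artifact.
session = ort.InferenceSession("nuclei_detector.onnx")
INPUT_NAME = session.get_inputs()[0].name

app = FastAPI()

class Image(BaseModel):
    # Grayscale image as a nested list of pixel intensities.
    pixels: list[list[float]]

@app.get("/healthz")
def liveness() -> dict:
    """Liveness probe: the process is up."""
    return {"status": "alive"}

@app.get("/readyz")
def readiness() -> dict:
    """Readiness probe: the model session is loaded and can serve."""
    return {"status": "ready", "model_inputs": [i.name for i in session.get_inputs()]}

@app.post("/predict")
def predict(image: Image) -> dict:
    """Run the exported detector on one image and return raw detections."""
    x = np.asarray(image.pixels, dtype=np.float32)[None, None, :, :]  # NCHW batch of 1
    outputs = session.run(None, {INPUT_NAME: x})
    return {"detections": outputs[0].tolist()}
```

Served with, e.g., `uvicorn service:app`, the container's liveness and readiness checks can then point at /healthz and /readyz.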
Projects
Explore my latest fun projects
Latest Blog Posts
Explore my latest thoughts on AI, technology, and innovation