About Me
Hi, there!
I’m Jiufeng (pronounced /jo phone/) from China; you can also call me FENG (/phone/)! Welcome to my personal website. :D
I recently completed my master’s degree in Data & Computer Science at Heidelberg University and finished my Data Science & AI internship and working student position at Mercedes-Benz Group, where I worked on Recommendation Systems and AI Agents. I’m fascinated by data science, machine learning, and AI, especially the advances in Generative AI and Large Language Models. I’m also diving into the startup world with a small AI-focused team, aiming to build products that are truly valuable to people.
Interests & Hobbies
Technical Skills
3+ Years Experience
Languages & Frameworks
Tools & Libraries
Databases
Core AI Skills
Data + AI Frameworks
Models
Cloud Platforms
MLOps
Languages
Work Experience
My professional journey in data science, ML, and AI
Mercedes-Benz Group
May 2024 - August 2025
Data Scientist & Machine Learning Engineer (Intern & Working Student)
- AI-Driven MVPs: Built and shipped AI-driven MVPs, including an enterprise RAG pipeline for technical docs (a minimal sketch of the retrieve-then-generate pattern follows this list), a vehicle recommendation pipeline, a GPT-4o-based conversational assistant, and an agentic vehicle consulting system, using Python, LangChain, and OpenAI APIs to support millions of customers across 8 markets
- ML Pipeline Optimization: Maintained, optimized, and deployed end-to-end AI/ML pipelines and recommendation models (retrieval and ranking models) on Azure (Azure Databricks), ensuring high availability, cost efficiency, and real-time insights
- Large-scale Pipelines & Feature Engineering: Designed, developed, maintained, and monitored Databricks + Delta Lake ETL pipelines moving 2.4 TB/day of click-stream and user-behaviour data, cutting query latency by 65% and keeping data freshness under 15 min; generated 200+ user–vehicle features with PySpark, orchestrated with Airflow and Databricks Workflows, and validated with dbt tests and Great Expectations
- Multi-Agent Systems: Engineered multi-agent systems in Python with LangChain, LangGraph, Pinecone, and MCP tool-calling, focusing on tool orchestration, retrieval workflows, and guardrails to make autonomous decision workflows more reliable and debuggable
- CI/CD & MLOps: Developed and maintained CI/CD pipelines (GitHub Actions) for Python services and ML/LLM workloads, automating tests, model validation, and deployments to the cloud (Azure), cutting release time by 30% and ensuring reproducible agent and API rollouts
- Mentorship & Innovation: Mentored a new Data Science intern in Recommendation System modeling, LLM fine-tuning, Retrieval-Augmented Generation (RAG), and AI-powered automation, while driving continuous innovation in GenAI applications
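The RAG pipeline above followed the standard retrieve-then-generate pattern: chunk the technical docs, embed them, pull the most similar chunks for a query, and ground the model's answer in that context. Below is a minimal, illustrative sketch of that pattern in plain Python/NumPy; `embed` and `generate` are hypothetical placeholders for the embedding model and the GPT-4o call, not the production implementation.

```python
import numpy as np

# Hypothetical stand-ins for the real embedding model and GPT-4o call.
def embed(texts: list[str]) -> np.ndarray:
    """Return one embedding vector per input text (placeholder)."""
    raise NotImplementedError("plug in your embedding model here")

def generate(prompt: str) -> str:
    """Call the chat model with the assembled prompt (placeholder)."""
    raise NotImplementedError("plug in your LLM call here")

def retrieve(query: str, chunks: list[str], chunk_vecs: np.ndarray, k: int = 4) -> list[str]:
    """Rank document chunks by cosine similarity to the query and keep the top-k."""
    q = embed([query])[0]
    sims = chunk_vecs @ q / (np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(q) + 1e-9)
    top = np.argsort(sims)[::-1][:k]
    return [chunks[i] for i in top]

def answer(query: str, chunks: list[str]) -> str:
    """Retrieve relevant context, then ask the model to answer grounded in it."""
    chunk_vecs = embed(chunks)  # in production these are precomputed and stored in a vector index
    context = "\n\n".join(retrieve(query, chunks, chunk_vecs))
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    return generate(prompt)
```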
Heidelberg University Data Science & NLP Group
April 2024 - August 2024
Data Science & AI Engineer Intern
- LLM/RAG Extraction Pipeline: Built an LLM/RAG extraction pipeline over a 12 TB financial corpus in Python, achieving 94% field-level accuracy and automating ETL plus data-quality validation, cutting manual curation costs by 45%
- AI Agent for Financial Data: Designed a production-oriented AI agent for quantitative financial data extraction, using LLM-as-a-judge validation to automatically score outputs and surface errors, improving efficiency by 35% and reducing operational costs by 15%
- Schema-Guided Extraction: Implemented schema-guided extraction for semi-structured PDFs/images (tables/OCR) with constrained JSON outputs in Python, including unit/currency normalization and date parsing (a small normalization sketch follows this list); ingested the JSON into Snowflake, flattened nested structures, and converted them to a relational model, cutting manual post-editing time per document by 65%
- Production API Deployment: Deployed the quantity-extraction AI agent as a production API on AWS Bedrock with monitoring and alerting, adding dynamic prompt adaptation and fallback strategies to maintain 99%+ uptime under varying workloads
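The schema-guided extraction above boils down to constraining the LLM to a JSON schema and then validating and normalizing each field before loading it into Snowflake. Here is a small, illustrative sketch of that validation/normalization step in plain Python; the RevenueRecord fields, currency table, and number/date formats are assumptions made for the example, not the project's actual schema.

```python
import json
from dataclasses import dataclass
from datetime import date, datetime

# Illustrative currency table; the real mapping covered far more symbols and codes.
CURRENCY_ALIASES = {"€": "EUR", "EUR": "EUR", "$": "USD", "USD": "USD"}

@dataclass
class RevenueRecord:
    company: str
    fiscal_date: date
    amount: float
    currency: str

def parse_record(raw_json: str) -> RevenueRecord:
    """Validate and normalize one LLM output constrained to a JSON object."""
    data = json.loads(raw_json)  # raises if the model broke the JSON constraint
    # Normalize currency symbols/codes to ISO 4217.
    currency = CURRENCY_ALIASES[data["currency"].strip()]
    # Normalize European "1.234,56"-style number formatting to a float.
    amount = float(str(data["amount"]).replace(".", "").replace(",", "."))
    # Parse the reported date into a proper date object.
    fiscal_date = datetime.strptime(data["fiscal_date"], "%Y-%m-%d").date()
    return RevenueRecord(data["company"], fiscal_date, amount, currency)

print(parse_record('{"company": "ACME AG", "fiscal_date": "2023-12-31", '
                   '"amount": "1.234,56", "currency": "€"}'))
```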
Heidelberg Academy of Sciences
August 2023 - March 2024
Student Research Assistant (Data Science & Machine Learning)
- Data Transformations: Migrated legacy XML to an event-driven JSON microservice (a toy conversion sketch follows this list); added QA gates and clear documentation, improving reliability and making handovers and onboarding smoother
- Generative AI Applications: Deployed a LangChain + GPT-3.5 query assistant; standardized prompts and examples, cutting manual lookups and saving researcher query time by 40%
- Frontend Development & UX Implementation: Developed and maintained the Heidelberg Academy website using TypeScript, implementing user-centered design principles and ensuring responsive, intuitive interfaces that improved user engagement and accessibility across devices
- Collaborative Expertise: Worked closely with researchers, computer linguists, data scientists, and developer teams to validate and enhance AI tools and frameworks
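As a toy illustration of the XML-to-JSON migration mentioned above, the sketch below converts one legacy-style XML entry into a JSON event and applies a simple QA gate; the element names and event type are invented for the example and do not reflect the Academy's actual schema.

```python
import json
import xml.etree.ElementTree as ET

# Toy input in the spirit of a legacy record; the real schema was far richer.
LEGACY_XML = """
<entry id="hadw-0042">
  <headword>Beispiel</headword>
  <sense>example, instance</sense>
</entry>
"""

def xml_entry_to_event(xml_text: str) -> str:
    """Convert one legacy XML entry into a JSON event for the downstream microservice."""
    root = ET.fromstring(xml_text)
    event = {
        "type": "entry.updated",          # hypothetical event name
        "id": root.get("id"),
        "headword": root.findtext("headword"),
        "sense": root.findtext("sense"),
    }
    # Basic QA gate: refuse to emit events with missing required fields.
    if not event["id"] or not event["headword"]:
        raise ValueError("entry is missing required fields")
    return json.dumps(event, ensure_ascii=False)

print(xml_entry_to_event(LEGACY_XML))
```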
BGI Genomics
November 2021 - April 2022
ML & Software Intern
- ML Pipeline Development: Trained a YOLOv4 detector and built an ML pipeline using PyTorch on AWS Cloud services for nuclei and cell segmentation with class-balanced sampling and hard-negative mining, then converted detections into instance masks via morphological refinements and watershed segmentation
- Experimentation & MLOps: Conducted experimentation with PyTorch and CUDA, tracked hyperparameters and metrics in MLflow, and versioned datasets and labels with DVC for full reproducibility
- Auto-labeling Pipeline: Built an auto-labeling pipeline with Airflow that combined denoising and teacher–student pseudo-labeling, reducing annotation time by 45% and improving dataset consistency
- Training Optimization: Ran structured training cycles with ablations and HPO (learning-rate schedules, optimizer/weight decay) and addressed class imbalance with focal loss, oversampling, and hard-negative mining, improving detection accuracy by ~15% and stabilizing results across microscopes and staining conditions
- Production Deployment: Deployed models as TorchScript/ONNX and exposed them via a FastAPI REST microservice in Docker on AWS, with CI/CD and readiness/liveness health checks to ensure reliable inference in production (a minimal serving sketch follows this list)
- Cross-Team Collaboration: Collaborated closely with machine learning engineers, developers, and product teams to validate and enhance model performance and advance the automation pipeline and AI platforms
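The production deployment described above pairs an exported model with a small REST service and health probes. The sketch below shows one way to do that with FastAPI and ONNX Runtime, assuming a hypothetical nuclei_detector.onnx with a single NCHW image input; it illustrates the shape of the service, not the production code.

```python
import numpy as np
import onnxruntime as ort
from fastapi import FastAPI
from pydantic import BaseModel

# Model path and input/output layout are illustrative, not the production artifact.
session = ort.InferenceSession("nuclei_detector.onnx")
INPUT_NAME = session.get_inputs()[0].name

app = FastAPI()

class Image(BaseModel):
    # Grayscale image as a nested list of pixel intensities.
    pixels: list[list[float]]

@app.get("/healthz")
def liveness() -> dict:
    """Liveness probe: the process is up."""
    return {"status": "alive"}

@app.get("/readyz")
def readiness() -> dict:
    """Readiness probe: the model session is loaded and can serve."""
    return {"status": "ready", "model_inputs": [i.name for i in session.get_inputs()]}

@app.post("/predict")
def predict(image: Image) -> dict:
    """Run the exported detector on one image and return raw detections."""
    x = np.asarray(image.pixels, dtype=np.float32)[None, None, :, :]  # NCHW batch of 1
    outputs = session.run(None, {INPUT_NAME: x})
    return {"detections": outputs[0].tolist()}
```

Served with, e.g., `uvicorn service:app`, the container's liveness and readiness checks can then point at /healthz and /readyz.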
Projects
Explore my latest fun projects
Latest Blog Posts
Explore my latest thoughts on AI, technology, and innovation