Project Overview
Led development of a prototype web app for monitoring NPU cards used for AI inference. The platform is the first of its kind in Korea's AI-chip market to integrate multiple NPU chip brands.
Key Responsibilities
- Project Management: Led the entire project lifecycle, ensuring seamless communication and issue resolution with Vietnamese designers and developers.
- Dashboard and Architecture Design: Crafted intuitive dashboard screens and architected data structures to manage users, clusters, servers, NPUs, storage, and inference endpoints.
- API Development: Created NPU inference endpoint APIs using Kubernetes and Istio to streamline AI inference workflows.
- Object Storage Integration: Implemented features for user-specific Object Storage (MinIO) creation and deletion to manage inference data effectively.
- Time-Series Data Collection: Deployed InfluxDB for capturing and analyzing NPU utilization metrics over time.
- User Interface Development: Built a Streamlit-based UI for NPU inference tasks, enabling video uploads for object detection and returning detailed inference results.
Achievements
- Market Innovation: Developed the first-ever Prometheus-compatible exporter, CLI tool, and unified monitoring dashboard in Korea’s emerging NPU and AI-Chip market.
- Efficiency Gains: Reduced the project timeline by one-third through optimized collaboration with the Vietnamese branch.
- Cloud Inference Demo Service: Delivered a Kubernetes cluster-based NPU inference demo platform, showcasing real-time AI capabilities to stakeholders.
Key Learnings and Insights
This project provided deep technical exposure to NPU monitoring, AI inference infrastructure, and multi-vendor AI chip integration, reinforcing my expertise in observability, system architecture, and cloud-based inference services.
- Building a Unified Monitoring Platform for Various AI Chips
  - Designed a multi-brand NPU monitoring system and learned how different AI chips handle and export performance data.
  - Learned vendor-specific logging, telemetry, and data extraction mechanisms, which made it possible to integrate each chip into one unified dashboard.
- Understanding Prometheus and Time-Series Data Collection
  - Developed custom Prometheus exporters for several NPU chips to track performance in real time.
  - Gained a deeper understanding of Prometheus’s data collection architecture and applied the same principles to build custom monitoring tools.
  - Integrated InfluxDB for historical analysis of NPU utilization and inference workloads.
- Enhancing Expertise in Kubernetes and Cloud-Based AI Inference
  - Designed Kubernetes-based inference APIs and applied Istio and service mesh concepts to AI inference workloads.
  - Implemented user-specific MinIO storage for managing inference results and AI model artifacts.
  - Built a Streamlit-based UI for interactive testing of video-based AI inference.
- Scaling AI Infrastructure and Tooling Capabilities
  - Gained proficiency in developing exporters, monitoring tools, and cloud-based inference services.
  - Strengthened the DevOps and MLOps skills needed to run scalable, high-performance AI infrastructure.
  - Managed a cross-functional team of Vietnamese developers and designers, reducing the project timeline by one-third.
This project significantly expanded my expertise in AI infrastructure monitoring, multi-chip compatibility, and cloud-based AI inference, setting a strong foundation for scalable and efficient AI system management.