

- API Pods and logs from Kubernetes

Project Overview
Implemented a big data analysis platform (Kubeflow) for statistical and AI services within a closed network. I joined this project and have operated it for more than three years.
- Developed a course recommendation model for current students by analyzing data from graduates and enrolled students.
- Built a predictive model to analyze the probability of student attrition (e.g., withdrawal, expulsion) for currently enrolled students.
- Analyzed system usage logs to calculate the most frequently used programs in real-time, daily, and weekly intervals.
- Evaluated course registration logs to compute competition rates for popular courses during enrollment periods.
- Created a model to provide popular book lists by department and grade using library data.
All models were implemented with linear regression techniques (using scikit-learn) rather than deep learning.
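For example, the attrition-probability model can be sketched as a scikit-learn linear model trained on features pulled from the student database. This is a minimal sketch only: the column names, file paths, and the choice of LogisticRegression (scikit-learn's linear classifier for probabilities) are illustrative assumptions, not the project's actual schema.

```python
# Minimal sketch: training a linear attrition-probability model with scikit-learn.
# Column names and file paths are placeholders, not the project's real schema.
import pandas as pd
import joblib
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Features extracted from the student database (hypothetical columns).
df = pd.read_csv("students.csv")
X = df[["gpa", "credits_completed", "absences", "semester"]]
y = df["attrited"]  # 1 = withdrew/expelled, 0 = still enrolled

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Logistic regression is scikit-learn's linear model for probability outputs.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
print("holdout accuracy:", model.score(X_test, y_test))

# Persist the model as a pkl artifact so the Flask API can load it later.
joblib.dump(model, "attrition_model.pkl")
```

The resulting pkl file is the kind of artifact that the Flask API serves and that DVC versions.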
Key Responsibilities
- Built CI/CD environments for model development and deployment in a closed network with isolated clusters, using DVC, MinIO, GitLab Community Edition, Harbor Registry, and Kubeflow.
- Developed statistical models using Pandas and NumPy.
- Created APIs for AI and statistical models using Python Flask.
- Load-tested the APIs with Locust and distributed requests round-robin across multiple Deployments (see the sketches after this list).
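A minimal sketch of the Flask serving layer, assuming the model ships as the pkl artifact from the training sketch above; the route name, payload fields, port, and feature schema are assumptions for illustration.

```python
# Minimal Flask sketch for serving a pickled model.
# Route, payload fields, feature schema, and model path are assumptions.
import joblib
import pandas as pd
from flask import Flask, jsonify, request

FEATURE_COLUMNS = ["gpa", "credits_completed", "absences", "semester"]  # assumed schema

app = Flask(__name__)
model = joblib.load("attrition_model.pkl")  # pkl artifact produced in the notebook

@app.route("/predict", methods=["POST"])
def predict():
    # Expect a JSON object whose keys match the training feature columns.
    payload = request.get_json(force=True)
    features = pd.DataFrame([payload], columns=FEATURE_COLUMNS)
    proba = model.predict_proba(features)[0][1]
    return jsonify({"attrition_probability": float(proba)})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```

For the overload test, a Locust test file is itself plain Python; the endpoint and payload below are placeholders matching the Flask sketch above.

```python
# Minimal locustfile sketch for the API overload test.
# Endpoint and payload are placeholders matching the Flask sketch.
from locust import HttpUser, task, between

class ApiUser(HttpUser):
    wait_time = between(0.1, 0.5)

    @task
    def predict(self):
        self.client.post(
            "/predict",
            json={"gpa": 3.1, "credits_completed": 60, "absences": 2, "semester": 5},
        )
```

Run with `locust -f locustfile.py --host http://<api-service>` against the target environment; traffic hitting the Kubernetes Service is then spread round-robin across the replicas behind it.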
Achievements
- Automated the model service redeployment process, reducing time from 1 hour to 10 minutes.
- Enhanced project stability and security by segregating dev, stage, and production environments.
Details
Model
- Extracted data in CSV and pkl format using Kubeflow Notebooks and served it through an API (a minimal export sketch follows below).
- Limited use of Kubeflow features, relying primarily on Notebooks.
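A minimal sketch of the export step inside a Notebook; the DataFrame contents and file names are placeholders, shown only to illustrate the CSV/pkl hand-off to the API.

```python
# Minimal sketch: exporting extracted data as CSV and pkl from a Notebook.
# DataFrame contents and file names are placeholders.
import pandas as pd

popular_books = pd.DataFrame(
    {"department": ["CS", "CS"], "grade": [1, 2], "title": ["Book A", "Book B"]}
)

popular_books.to_csv("popular_books.csv", index=False)  # CSV export
popular_books.to_pickle("popular_books.pkl")            # pkl export consumed by the API
```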
Kubernetes (Version 1.15–1.16)
- Consisted of 3 master nodes and 2 worker nodes.
- Utilized Rook-Ceph as the storage class.
- Managed API services with separate environments for loc, dev, and prod, each using distinct StatefulSets, Services, and VirtualServices.
Kubeflow (Version 1.0 or 1.1)
- Only the Notebook functionality was actively used.
- Workflow:
- Model development in Notebooks → Push to GitLab → Build and deploy.
- Queried the database directly from Notebooks to extract data.
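A minimal sketch of the extraction step, assuming a SQL database reachable from the Notebook inside the closed network; the connection string, table, and columns are placeholders.

```python
# Minimal sketch: extracting training data directly from the database in a Notebook.
# Connection string, table, and columns are placeholders.
import pandas as pd
from sqlalchemy import create_engine

# Internal database reachable only from the closed network (hypothetical DSN).
engine = create_engine("postgresql+psycopg2://user:password@db-host:5432/academics")

query = """
    SELECT student_id, gpa, credits_completed, absences, semester, attrited
    FROM student_records
"""
df = pd.read_sql(query, engine)
df.to_csv("students.csv", index=False)  # feeds the model-training step
```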
GitLab
- Deployed GitLab on Kubernetes for internal network use only.
- Implemented CI/CD pipelines.
- Automated workflow: updating model data or pushing code changes triggers a build and deployment.
MinIO
- Adopted for DVC (Data Version Control), enabling storage and retrieval of DVC data via the internal MinIO setup.
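As an illustration of how a script or service could fetch a DVC-tracked artifact from the MinIO-backed remote, here is a sketch using DVC's Python API. The repo URL, file path, and remote name are placeholders, and the actual pipeline may simply run `dvc pull` instead.

```python
# Minimal sketch: reading a DVC-tracked model from the MinIO-backed remote
# via DVC's Python API. Repo URL, path, and remote name are placeholders.
import joblib
import dvc.api

with dvc.api.open(
    "models/attrition_model.pkl",                   # path tracked by DVC (placeholder)
    repo="http://gitlab.internal/ai/models.git",    # internal GitLab repo (placeholder)
    remote="minio",                                 # DVC remote backed by internal MinIO
    mode="rb",
) as fd:
    model = joblib.load(fd)
```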
Harbor
- Operated a private Harbor registry for internal use.
PyPI Uploads
- Managed an internal PyPI server for package uploads, as external network access was restricted.
I update the models each semester with little effort:
- Extract data from the database (Python)
- Run notebooks to train models
- Test models with scripts (see the smoke-test sketch after this list)
- dvc add models
- dvc push
- git add .
- git commit
- git push
- GitLab CI/CD updates the served models with the newest data
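A hedged sketch of the "test models with scripts" step: a small smoke test that the retrained artifact loads and returns a sane probability before it is committed and pushed. The file name, feature columns, and checks are assumptions.

```python
# Minimal sketch of a model smoke test run before committing a retrained model.
# File name, feature columns, and thresholds are assumptions.
import joblib
import pandas as pd

def test_attrition_model():
    model = joblib.load("models/attrition_model.pkl")
    sample = pd.DataFrame(
        [{"gpa": 3.1, "credits_completed": 60, "absences": 2, "semester": 5}]
    )
    proba = model.predict_proba(sample)[0][1]
    assert 0.0 <= proba <= 1.0, "probability out of range"

if __name__ == "__main__":
    test_attrition_model()
    print("model smoke test passed")
```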
