kade.im
[DevOps] Operating Multiple k8s GPU Clusters with On-Premises in startup (Part 2)

Tags: Ops, Infra, k8s
Wrote: 2024.04
✏️ First written on 2024-04-05

Volume and Storage Setups

  • I use a Synology NAS with an NFS StorageClass to dynamically provision storage for each PVC request.
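As a rough illustration of what a PVC request against an NFS-backed StorageClass looks like, here is a minimal sketch that builds the manifest as a Python dict and prints it as JSON. The StorageClass name `nfs-client`, the claim name, and the size are assumptions for the example, not values from my clusters.

```python
import json


def build_pvc(name: str, size: str, storage_class: str = "nfs-client") -> dict:
    """Return a PersistentVolumeClaim manifest as a plain dict.

    "nfs-client" is an assumed StorageClass name used only for illustration.
    """
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            # ReadWriteMany lets pods on different nodes mount the same
            # NFS-backed volume at the same time.
            "accessModes": ["ReadWriteMany"],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


if __name__ == "__main__":
    # The claim name and size below are hypothetical.
    print(json.dumps(build_pvc("dataset-cache", "100Gi"), indent=2))
```

Since kubectl accepts JSON as well as YAML, the output can be piped into `kubectl apply -f -`. The NFS provisioner then creates a directory on the NAS for the claim, so no volume has to be prepared by hand.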
 
Advantages of using a Synology NAS on-premises (rather than sharing worker-node disks over NFS)
  • Cost-Effectiveness : A more budget-friendly option than enterprise-level storage solutions.
  • Efficiency : Offers a good balance of read and write speed, which helps overall system performance.
  • Reliability : RAID 10 provides good data protection and fault tolerance, since it survives the loss of one disk in each mirrored pair.
  • Centralized Storage Management : The NAS provides a single point of management for all storage needs.
  • Scalability : I can add storage capacity to the NAS without physically adding hard disks to each worker node.
  • Flexibility : I can reorganize or repurpose storage without reconfiguring each node.
  • Reduced Network Load on Worker Nodes : Worker nodes do not have to serve storage traffic to each other.
 
 

Network Stack

I use the Istio service mesh on Kubernetes for two reasons:
  1. Integrated TLS/SSL certificate management with cert-manager
  2. VirtualService
 
Integrated Management: Handling TLS/SSL within Kubernetes and Istio simplifies certificate management as it's integrated with the existing infrastructure. This eliminates the need for additional tools or processes on the host node.
 
Automatic Certificate Rotation: With cert-manager, Kubernetes and Istio can automate certificate issuance and rotation.
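To make the rotation part concrete, here is a minimal sketch of a cert-manager `Certificate` resource, built as a Python dict and printed as JSON. The issuer name `letsencrypt-dns01`, the domain, and the 30-day renewal window are assumptions for the example. The `istio-system` namespace is used because the Istio ingress gateway reads TLS secrets from its own namespace, and that is the default install location.

```python
import json


def build_certificate(name: str, dns_names: list, issuer: str = "letsencrypt-dns01") -> dict:
    """Return a cert-manager Certificate manifest as a plain dict.

    The issuer name is a hypothetical ClusterIssuer, not one from the post.
    """
    return {
        "apiVersion": "cert-manager.io/v1",
        "kind": "Certificate",
        # The secret must live next to the ingress gateway so Istio can load it.
        "metadata": {"name": name, "namespace": "istio-system"},
        "spec": {
            # cert-manager stores the issued key pair in this Secret.
            "secretName": f"{name}-tls",
            "dnsNames": list(dns_names),
            # Renew 30 days (720h) before expiry; this value is an assumption.
            "renewBefore": "720h",
            "issuerRef": {"name": issuer, "kind": "ClusterIssuer"},
        },
    }


if __name__ == "__main__":
    # "app.example.com" is a placeholder domain.
    print(json.dumps(build_certificate("app-example", ["app.example.com"]), indent=2))
```

Once this resource exists, cert-manager keeps the Secret up to date on its own, so renewal no longer depends on anything running on the host.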
 
Fine-Grained Control: By managing TLS/SSL at the Istio level, you can enforce specific security policies and configurations at the service level, offering more control than traditional perimeter-based security models.
 
Reduced Complexity: Handling TLS/SSL internally reduces the complexity associated with configuring external load balancers or reverse proxies for TLS termination.
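A sketch of how TLS termination and routing can both live inside the mesh: an Istio `Gateway` terminates HTTPS using a TLS secret (for example one issued by cert-manager), and a `VirtualService` routes the host to a backend service. All names, hosts, and ports below are hypothetical, and the `istio: ingressgateway` selector assumes the default ingress gateway deployment.

```python
import json

ISTIO_NETWORKING_API = "networking.istio.io/v1beta1"


def build_gateway(name: str, host: str, tls_secret: str) -> dict:
    """Return an Istio Gateway that terminates HTTPS for one host."""
    return {
        "apiVersion": ISTIO_NETWORKING_API,
        "kind": "Gateway",
        "metadata": {"name": name, "namespace": "istio-system"},
        "spec": {
            # Assumes the default ingress gateway labels.
            "selector": {"istio": "ingressgateway"},
            "servers": [
                {
                    "port": {"number": 443, "name": "https", "protocol": "HTTPS"},
                    # SIMPLE = standard server-side TLS using the named Secret.
                    "tls": {"mode": "SIMPLE", "credentialName": tls_secret},
                    "hosts": [host],
                }
            ],
        },
    }


def build_virtual_service(name: str, namespace: str, host: str,
                          gateway: str, service: str, port: int) -> dict:
    """Return a VirtualService that routes one host to one backend service."""
    return {
        "apiVersion": ISTIO_NETWORKING_API,
        "kind": "VirtualService",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "hosts": [host],
            # A gateway in another namespace is referenced as "namespace/name".
            "gateways": [gateway],
            "http": [
                {"route": [{"destination": {"host": service, "port": {"number": port}}}]}
            ],
        },
    }


if __name__ == "__main__":
    gw = build_gateway("app-gateway", "app.example.com", "app-example-tls")
    vs = build_virtual_service("app", "default", "app.example.com",
                               "istio-system/app-gateway", "app-svc", 8080)
    print(json.dumps([gw, vs], indent=2))
```

With this pair in place, no separate reverse proxy on the host is needed for TLS termination; adding another service is a matter of adding another VirtualService.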
 
End-to-End Encryption: This setup allows for end-to-end encryption within the cluster, ensuring that traffic remains secure all the way to the destination service.
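One common way to get encryption between services in Istio is a `PeerAuthentication` policy that requires mutual TLS. The sketch below is an assumption about how this could be configured, not a copy of my setup, and the namespace name is hypothetical.

```python
import json


def build_peer_authentication(namespace: str) -> dict:
    """Return a namespace-wide PeerAuthentication that requires mTLS."""
    return {
        "apiVersion": "security.istio.io/v1beta1",
        "kind": "PeerAuthentication",
        # A policy named "default" with no selector applies to the whole namespace.
        "metadata": {"name": "default", "namespace": namespace},
        # STRICT rejects plaintext traffic between sidecar-injected workloads.
        "spec": {"mtls": {"mode": "STRICT"}},
    }


if __name__ == "__main__":
    # "ml-serving" is a placeholder namespace.
    print(json.dumps(build_peer_authentication("ml-serving"), indent=2))
```

This only covers workloads that have the Istio sidecar injected, so it is worth checking injection on a namespace before switching it to STRICT.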
 
Compliance and Security: Internal management of TLS/SSL helps in meeting compliance requirements for data protection by ensuring encrypted communication between services.
 
If I managed TLS/SSL certificates myself on the host node, I would have to renew them through a Let's Encrypt DNS challenge every 3 months. I know certbot can renew automatically, but it does not work as well as cert-manager does on k8s.
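For comparison, this is roughly what the DNS challenge looks like when cert-manager handles it: a `ClusterIssuer` with a DNS-01 solver. The sketch assumes Cloudflare as the DNS provider purely for illustration (the post does not name one), and the email, issuer name, and secret names are hypothetical.

```python
import json

# Let's Encrypt production ACME endpoint.
LETSENCRYPT_ACME_URL = "https://acme-v02.api.letsencrypt.org/directory"


def build_cluster_issuer(name: str, email: str, api_token_secret: str) -> dict:
    """Return a cert-manager ClusterIssuer that solves ACME DNS-01 challenges.

    Cloudflare is an assumed DNS provider; other providers use a different
    key under "dns01".
    """
    return {
        "apiVersion": "cert-manager.io/v1",
        "kind": "ClusterIssuer",
        "metadata": {"name": name},
        "spec": {
            "acme": {
                "server": LETSENCRYPT_ACME_URL,
                "email": email,
                # Secret where cert-manager keeps the ACME account key.
                "privateKeySecretRef": {"name": f"{name}-account-key"},
                "solvers": [
                    {
                        "dns01": {
                            "cloudflare": {
                                # Secret holding the DNS provider API token.
                                "apiTokenSecretRef": {
                                    "name": api_token_secret,
                                    "key": "api-token",
                                }
                            }
                        }
                    }
                ],
            }
        },
    }


if __name__ == "__main__":
    issuer = build_cluster_issuer("letsencrypt-dns01", "ops@example.com",
                                  "cloudflare-api-token")
    print(json.dumps(issuer, indent=2))
```

cert-manager creates and cleans up the DNS TXT records itself on every renewal, which is the part that is tedious to keep working with certbot on a host.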