kade.im
[DevOps] Operating Multiple k8s GPU Clusters with On-Premises in startup (Part1)

Tags: Ops, Infra, k8s
Written: 2024.03

I have a lot of experience operating and hosting services on multiple K8s clusters running on-premises (bare-metal machines). In this post I will explain each part of the stack in detail.

Main Architecture

[Image: main architecture diagram]

Domain Resolving and Setup?

  • My company bought its domain from “whois.co.kr”.
  • At whois.co.kr, I set the domain's nameservers to Google Cloud DNS.
  • Google Cloud DNS resolves the domain to the office’s static IP.
  • I always use Let's Encrypt to serve the domains over HTTPS. The certificates are renewed automatically inside my k8s cluster (using a certificate and a gcloud credential).
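
One way to confirm that automatic renewal keeps working is to check how many days remain on the certificate a domain actually serves. The Python sketch below is only an illustration and not part of the original setup: the function names `days_until_expiry` and `fetch_not_after`, and the host in the usage comment, are hypothetical.

```python
import socket
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Return the days between `now` and a certificate's notAfter timestamp.

    `not_after` uses the format returned by SSLSocket.getpeercert(),
    for example "Jun  1 12:00:00 2024 GMT".
    """
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).total_seconds() / 86400


def fetch_not_after(host: str, port: int = 443) -> str:
    """Open a TLS connection and return the served certificate's notAfter field."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=5) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


# Example usage (hypothetical host; needs network access):
# remaining = days_until_expiry(
#     fetch_not_after("app-prod.example.com"), datetime.now(timezone.utc)
# )
# print(f"certificate expires in {remaining:.1f} days")
```

If the remaining days keep shrinking toward zero instead of resetting after a renewal, the automatic update inside the cluster has probably stopped working.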
 

Raspberry Pi?

  • We set up a Raspberry Pi running nginx to handle port forwarding.
  • When the main Raspberry Pi goes down, traffic goes to a second one.
  • The nginx forwarding configuration looks like this:

      map $host $ip {
          ~[^\.]prod.kade.com 192.xx.xx.xx;
          ~[^\.]dev.kade.com  192.xx.xx.xx;
      }

      server {
          listen 30000-32767;

          location / {
              proxy_set_header X-Real-IP $remote_addr;
              proxy_set_header Host $host;
              proxy_pass http://$ip:$server_port$request_uri;
          }
      }
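
To make the `map` lookup concrete, the following Python sketch mimics how nginx picks `$ip` from `$host` and then builds the `proxy_pass` URL. It is only an illustration under stated assumptions: the IPs are placeholders from the documentation range 192.0.2.0/24 (the original masks the real ones), and `resolve_upstream` and `proxy_target` are hypothetical helper names.

```python
import re

# Placeholder addresses: the original config masks the real IPs as
# 192.xx.xx.xx, so these values are made up for illustration.
HOST_MAP = [
    (re.compile(r"[^\.]prod.kade.com"), "192.0.2.10"),
    (re.compile(r"[^\.]dev.kade.com"), "192.0.2.20"),
]


def resolve_upstream(host: str) -> str:
    """Return the IP of the first matching pattern.

    Like an nginx map without a `default` entry, an unmatched host
    yields an empty string. The patterns are unanchored, so they may
    match anywhere in the host name.
    """
    for pattern, ip in HOST_MAP:
        if pattern.search(host):
            return ip
    return ""


def proxy_target(host: str, server_port: int, request_uri: str) -> str:
    """Build the URL that `proxy_pass http://$ip:$server_port$request_uri` forms."""
    ip = resolve_upstream(host)
    if not ip:
        raise LookupError(f"no upstream configured for host {host!r}")
    return f"http://{ip}:{server_port}{request_uri}"


# A request keeps its port and URI; only the destination IP changes.
print(proxy_target("api-prod.kade.com", 31380, "/health?x=1"))
```

Two details are worth noticing. The `listen 30000-32767` range equals the default Kubernetes NodePort range, so every NodePort is forwarded on the same port number. Also, `[^\.]` requires a character other than a dot right before `prod` or `dev`, so a host such as `api-prod.kade.com` matches while `api.prod.kade.com` does not; the unescaped dots in the rest of the pattern match any character.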
       

HTTPS (TLS/SSL)

  • Nginx forwards traffic to each k8s cluster's istio-ingressgateway.
  • Once traffic arrives at the “istio-ingressgateway”, the gateway handles TLS/SSL.
 
 

Required Skills for the Above Architecture

  • Network : Domain nameservers, DNS, proxy, ingress, port forwarding, VPN, DHCP, gateway, static IP, etc.
  • Network Devices : Switch, Router,