KANS3 - Cilium CNI - 03/03

L2 Announcements / L2 Aware LB (Beta)

  • L2 Announcements is a feature that makes services visible and reachable on the local area network. It is primarily intended for on-premises deployments on networks without BGP-based routing, such as office or campus networks.
  • When enabled, Cilium answers ARP queries for ExternalIPs and/or LoadBalancer IPs. Because these are virtual IPs shared across multiple nodes (they are not installed on any network device), exactly one node per service answers ARP queries at a time, replying with its own MAC address. That node then load-balances the traffic using the service load-balancing feature, acting as a north/south load balancer.
  • The advantage over NodePort services is that each service can use a unique IP, so multiple services can share the same port number. With NodePort, it is up to the client to decide which host to send traffic to, and if that node goes down the IP+port combination becomes unusable. With L2 announcements, the service VIP simply migrates to another node and keeps working.
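Conceptually, Cilium elects one leader node per service (via Kubernetes lease objects), and only the leader answers ARP for that VIP; when the leader disappears, another eligible node takes over. A toy Python sketch of this failover behavior — hypothetical names, not Cilium's actual implementation:

```python
# Toy model of per-service L2 leadership: one node "holds the lease"
# for each service VIP and answers ARP for it; when that node goes
# away, leadership moves to another node and the VIP keeps working.

def elect_leaders(services, healthy_nodes):
    """Deterministically assign each service VIP to one healthy node."""
    nodes = sorted(healthy_nodes)
    if not nodes:
        return {}
    return {svc: nodes[i % len(nodes)] for i, svc in enumerate(sorted(services))}

services = ["svc1", "svc2", "svc3"]
print(elect_leaders(services, {"k8s-w1", "k8s-w2"}))  # one node per VIP

# Node failure: the lease expires and each VIP migrates, unlike a
# NodePort IP+port combo, which would simply become unreachable.
survivors = elect_leaders(services, {"k8s-w2"})
assert all(node == "k8s-w2" for node in survivors.values())
```

The key property is in the last assertion: losing a node re-homes VIPs rather than breaking them.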

Hands-on

# Configure
helm upgrade cilium cilium/cilium --namespace kube-system --reuse-values \
--set l2announcements.enabled=true --set externalIPs.enabled=true \
--set l2announcements.leaseDuration=3s --set l2announcements.leaseRenewDeadline=1s --set l2announcements.leaseRetryPeriod=200ms
 
# Verify
c0 config --all | grep L2
---
EnableL2Announcements             : true
EnableL2NeighDiscovery            : true
L2AnnouncerLeaseDuration          : 3000000000
L2AnnouncerRenewDeadline          : 1000000000
L2AnnouncerRetryPeriod            : 200000000
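The agent prints the lease values in nanoseconds; a quick sanity check that they match the Helm values we set (3s / 1s / 200ms):

```python
# The cilium agent reports lease durations in nanoseconds; convert
# them back to seconds to confirm they match the Helm values.
NS_PER_SEC = 1_000_000_000

printed = {
    "L2AnnouncerLeaseDuration": 3_000_000_000,  # --set l2announcements.leaseDuration=3s
    "L2AnnouncerRenewDeadline": 1_000_000_000,  # --set l2announcements.leaseRenewDeadline=1s
    "L2AnnouncerRetryPeriod":     200_000_000,  # --set l2announcements.leaseRetryPeriod=200ms
}

for name, ns in printed.items():
    print(f"{name}: {ns / NS_PER_SEC}s")  # -> 3.0s, 1.0s, 0.2s
```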

# Create a CiliumL2AnnouncementPolicy
cat <<EOF | kubectl apply -f - 
apiVersion: "cilium.io/v2alpha1"
kind: CiliumL2AnnouncementPolicy
metadata:
  name: policy1
spec:
  serviceSelector:
    matchLabels:
      color: blue
  nodeSelector:
    matchExpressions:
      - key: node-role.kubernetes.io/control-plane
        operator: DoesNotExist
  interfaces:
  - ^ens[0-9]+
  externalIPs: true
  loadBalancerIPs: true
EOF

# Verify
kubectl get ciliuml2announcementpolicy
---
NAME      AGE
policy1   4s

kc describe l2announcement
---
Name:         policy1
Namespace:
Labels:       <none>
Annotations:  <none>
API Version:  cilium.io/v2alpha1
Kind:         CiliumL2AnnouncementPolicy
Metadata:
  Creation Timestamp:  2024-10-26T15:37:41Z
  Generation:          1
  Resource Version:    8607
  UID:                 1f042825-4cb6-4324-9f91-aa69e5a425e4
Spec:
  External I Ps:  true
  Interfaces:
    ^ens[0-9]+
  Load Balancer I Ps:  true
  Node Selector:
    Match Expressions:
      Key:       node-role.kubernetes.io/control-plane
      Operator:  DoesNotExist
  Service Selector:
    Match Labels:
      Color:  blue
Events:       <none>

# Add an IP pool
cat <<EOF | kubectl apply -f -
apiVersion: "cilium.io/v2alpha1"
kind: CiliumLoadBalancerIPPool
metadata:
  name: "cilium-pool"
spec:
  allowFirstLastIPs: "No"
  blocks:
  - cidr: "10.10.200.0/29"
EOF

# Verify
kubectl get CiliumLoadBalancerIPPool
---
NAME          DISABLED   CONFLICTING   IPS AVAILABLE   AGE
cilium-pool   false      False         6               9s
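The 6 available IPs follow directly from the pool definition: a /29 holds 8 addresses, and `allowFirstLastIPs: "No"` excludes the first and last. A quick check with Python's `ipaddress` module:

```python
import ipaddress

# 10.10.200.0/29 contains 8 addresses; with allowFirstLastIPs: "No",
# the first (.0) and last (.7) are excluded, leaving 6 assignable IPs.
pool = ipaddress.ip_network("10.10.200.0/29")
usable = list(pool.hosts())   # hosts() already drops the first and last address

print(len(usable))            # -> 6, matching "IPS AVAILABLE"
print(usable[0], usable[-1])  # -> 10.10.200.1 10.10.200.6
```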

# Create test resources
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: webpod1
  labels:
    app: webpod
spec:
  nodeName: k8s-w1
  containers:
  - name: container
    image: traefik/whoami
  terminationGracePeriodSeconds: 0
---
apiVersion: v1
kind: Pod
metadata:
  name: webpod2
  labels:
    app: webpod
spec:
  nodeName: k8s-w2
  containers:
  - name: container
    image: traefik/whoami
  terminationGracePeriodSeconds: 0
---
apiVersion: v1
kind: Service
metadata:
  name: svc1
spec:
  ports:
    - name: svc1-webport
      port: 80
      targetPort: 80
  selector:
    app: webpod
  type: LoadBalancer  # service type is LoadBalancer
---
apiVersion: v1
kind: Service
metadata:
  name: svc2
spec:
  ports:
    - name: svc2-webport
      port: 80
      targetPort: 80
  selector:
    app: webpod
  type: LoadBalancer
---
apiVersion: v1
kind: Service
metadata:
  name: svc3
spec:
  ports:
    - name: svc3-webport
      port: 80
      targetPort: 80
  selector:
    app: webpod
  type: LoadBalancer
EOF

# Verify the deployment
kubectl get svc,ep
---
NAME                 TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
service/kubernetes   ClusterIP      10.10.0.1       <none>        443/TCP        58m
service/svc1         LoadBalancer   10.10.65.127    10.10.200.1   80:31443/TCP   23s
service/svc2         LoadBalancer   10.10.179.33    10.10.200.2   80:30125/TCP   23s
service/svc3         LoadBalancer   10.10.197.106   10.10.200.3   80:31986/TCP   23s

NAME                   ENDPOINTS                        AGE
endpoints/kubernetes   192.168.10.10:6443               58m
endpoints/svc1         172.16.1.93:80,172.16.2.115:80   23s
endpoints/svc2         172.16.1.93:80,172.16.2.115:80   23s
endpoints/svc3         172.16.1.93:80,172.16.2.115:80   23s

# Test
curl -s 10.10.200.1
---
Hostname: webpod1
IP: 127.0.0.1
IP: ::1
IP: 172.16.1.93
IP: fe80::c2d:7bff:fe9a:4e3
RemoteAddr: 192.168.10.10:46986
GET / HTTP/1.1
Host: 10.10.200.1
User-Agent: curl/7.81.0
Accept: */*

Cilium Egress Gateway

  • Pins egress traffic from specific pods to a specific egress gateway (a fixed IP) when communicating with the outside world (or with a specific CIDR).
  • Enable the egress gateway feature and configure an egress SNAT policy.
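The matching logic can be sketched as: if a pod matches the policy's label selector and the destination falls inside `destinationCIDRs`, its traffic is SNATed through the gateway node's egress IP; otherwise it leaves as usual. A toy Python model of that decision (not Cilium's actual datapath):

```python
import ipaddress

# Toy model of CiliumEgressGatewayPolicy matching: traffic from pods
# selected by labels, toward the destination CIDRs, is SNATed to the
# configured egress IP; everything else keeps its normal source IP.
policy = {
    "selector": {"org": "empire", "class": "mediabot"},
    "destination_cidrs": [ipaddress.ip_network("0.0.0.0/0")],
    "egress_ip": "192.168.10.240",
}

def source_ip_for(pod_labels, dst, pod_ip, policy):
    """Return the source IP the destination will see for this flow."""
    matches = all(pod_labels.get(k) == v for k, v in policy["selector"].items())
    in_cidr = any(ipaddress.ip_address(dst) in n for n in policy["destination_cidrs"])
    return policy["egress_ip"] if matches and in_cidr else pod_ip

# mediabot's traffic leaves via the gateway; other pods keep their own IP
print(source_ip_for({"org": "empire", "class": "mediabot"}, "1.1.1.1", "172.16.2.155", policy))
print(source_ip_for({"app": "webpod"}, "1.1.1.1", "172.16.1.93", policy))
```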

I went through the hands-on below, but it did not behave correctly; to be reworked later. (Note that the egress gateway also requires BPF masquerading, e.g. --set bpf.masquerade=true, which may be the missing piece here.)

Hands-on

# Configure
helm upgrade cilium cilium/cilium --namespace kube-system --reuse-values --set egressGateway.enabled=true --set kubeProxyReplacement=true

kubectl rollout restart ds cilium -n kube-system
kubectl rollout restart deploy cilium-operator -n kube-system

# Deploy
kubectl create -f https://raw.githubusercontent.com/cilium/cilium/1.11.2/examples/kubernetes-dns/dns-sw-app.yaml

# Verify the deployment
k get pod,svc -o wide
---
NAME           READY   STATUS    RESTARTS   AGE     IP             NODE     NOMINATED NODE   READINESS GATES
pod/mediabot   1/1     Running   0          2m35s   172.16.2.155   k8s-w2   <none>           <none>

NAME                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE   SELECTOR
service/kubernetes   ClusterIP   10.10.0.1    <none>        443/TCP   75m   <none>

# Check the pod's current egress IP
kubectl exec mediabot -- curl -s ipinfo.io
---
{
  "ip": "13.125.***.***",
  "hostname": "ec2-13-125-***-***.ap-northeast-2.compute.amazonaws.com",
  "city": "Incheon",
  "region": "Incheon",
  "country": "KR",
  "loc": "37.4565,126.7052",
  "org": "AS16509 Amazon.com, Inc.",
  "postal": "21505",
  "timezone": "Asia/Seoul",
  "readme": "https://ipinfo.io/missingauth"
}

# Assign the egress IPs on the gateway node's interface
cat <<EOF | kubectl create -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: "egress-ip-assign"
  labels:
    name: "egress-ip-assign"
spec:
  replicas: 1
  selector:
    matchLabels:
      name: "egress-ip-assign"
  template:
    metadata:
      labels:
        name: "egress-ip-assign"
    spec:
      affinity:
        # the following pod affinity ensures that the "egress-ip-assign" pod
        # runs on the same node as the mediabot pod
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: class
                    operator: In
                    values:
                      - mediabot
                  - key: org
                    operator: In
                    values:
                      - empire
              topologyKey: "kubernetes.io/hostname"
      hostNetwork: true
      containers:
      - name: egress-ip
        image: docker.io/library/busybox:1.31.1
        command: ["/bin/sh","-c"]
        securityContext:
          privileged: true
        env:
        - name: EGRESS_IPS
          value: "192.168.10.240/24 192.168.10.241/24"
        args:
        - "for i in \$EGRESS_IPS; do ip address add \$i dev ens5; done; sleep 10000000"
        lifecycle:
          preStop:
            exec:
              command:
              - "/bin/sh"
              - "-c"
              - "for i in \$EGRESS_IPS; do ip address del \$i dev ens5; done"
EOF

# On k8s-w2, watch the secondary addresses being added
Every 2.0s: ip -4 addr show ens5                                                        k8s-w2: Sun Oct 27 01:01:18 2024

2: ens5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP group default qlen 1000
    altname enp0s5
    inet 192.168.10.102/24 metric 100 brd 192.168.10.255 scope global dynamic ens5
       valid_lft 2325sec preferred_lft 2325sec
    inet 192.168.10.240/24 scope global secondary ens5
       valid_lft forever preferred_lft forever
    inet 192.168.10.241/24 scope global secondary ens5
       valid_lft forever preferred_lft forever

# Create a CiliumEgressGatewayPolicy
cat <<EOF | kubectl apply -f -
apiVersion: cilium.io/v2
kind: CiliumEgressGatewayPolicy
metadata:
  name: egress-sample
spec:
  selectors:
  - podSelector:
      matchLabels:
        org: empire
        class: mediabot
        io.kubernetes.pod.namespace: default
  destinationCIDRs:
  - "192.168.20.100/32"
  egressGateway:
      nodeSelector:
        matchLabels:  
          kubernetes.io/hostname: k8s-w2
      egressIP: "192.168.10.240"
EOF

# Create a test pod that matches the policy selectors
cat <<EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: net-pod
  labels:
    org: empire
    class: mediabot
spec:
  nodeName: k8s-w1
  containers:
  - name: netshoot-pod
    image: nicolaka/netshoot
    command: ["tail"]
    args: ["-f", "/dev/null"]
  terminationGracePeriodSeconds: 0
EOF


# Delete the policy, then re-apply it with destinationCIDRs 0.0.0.0/0 and the second egress IP
kubectl delete ciliumegressgatewaypolicy egress-sample


cat <<EOF | kubectl apply -f -
apiVersion: cilium.io/v2
kind: CiliumEgressGatewayPolicy
metadata:
  name: egress-sample
spec:
  selectors:
  - podSelector:
      matchLabels:
        org: empire
        class: mediabot
        io.kubernetes.pod.namespace: default
  destinationCIDRs:
  - "0.0.0.0/0"
  egressGateway:
      nodeSelector:
        matchLabels:  
          kubernetes.io/hostname: k8s-w2
      egressIP: "192.168.10.241"
EOF

LoadBalancer IP Address Management (LB IPAM)

  • Starting with Cilium v1.10, service IPs can be advertised externally via BGP without any additional components ← MetalLB is embedded in the Cilium pod
    • Now, services are able to be reached externally from traffic outside of the cluster, without any additional components.
  • Starting with Cilium v1.11, Pod CIDRs can also be advertised via BGP (the same release added the Egress IP Gateway). - link
    • BGP Pod CIDR Announcement: Advertise PodCIDR IP routes to your network using BGP.

This did not work correctly in my test environment; to be rewritten later.

IPSec, WireGuard

cilium config view | grep l7
---
enable-l7-proxy                                   true

# Configure
helm upgrade cilium cilium/cilium --namespace kube-system --reuse-values --set l7Proxy=false --set encryption.enabled=true --set encryption.type=wireguard

# Verify the configuration
cilium config view | egrep 'l7|wireguard'
---
enable-l7-proxy                                   false
enable-wireguard                                  true
wireguard-persistent-keepalive                    0s

c0 status --verbose | grep Encryption:
---
Encryption:       Wireguard   [NodeEncryption: Disabled, cilium_wg0 (Pubkey: vzWePSrGmmKpZl9V0pF3KDwMHns5lTw6TjiV1BCucAM=, Port: 51871, Peers: 1)]

kubectl get cn -o yaml | grep annotations -A1
---
    annotations:
      network.cilium.io/wg-pub-key: vzWePSrGmmKpZl9V0pF3KDwMHns5lTw6TjiV1BCucAM=
--
    annotations:
      network.cilium.io/wg-pub-key: clU4IL5aPG86LxitH3d9hPh86xC+gXK3z0Iz/2pReWY=
--
    annotations:
      network.cilium.io/wg-pub-key: 05+bM1oUXdNfvVUPyDo14tCSmlbwZ3iPXRAtcNWBBjE=


wg show all endpoints
---
cilium_wg0	05+bM1oUXdNfvVUPyDo14tCSmlbwZ3iPXRAtcNWBBjE=	192.168.10.102:51871
clU4IL5aPG86LxitH3d9hPh86xC+gXK3z0Iz/2pReWY=	192.168.10.101:51871

wg show all transfer
---
cilium_wg0	05+bM1oUXdNfvVUPyDo14tCSmlbwZ3iPXRAtcNWBBjE=	0	0
cilium_wg0	clU4IL5aPG86LxitH3d9hPh86xC+gXK3z0Iz/2pReWY=	0	0

ip -d -c addr show cilium_wg0
---
30: cilium_wg0: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 8921 qdisc noqueue state UNKNOWN group default
    link/none  promiscuity 0 minmtu 0 maxmtu 2147483552
    wireguard numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
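The cilium_wg0 MTU of 8921 is the ens5 MTU of 9001 minus the conventional 80-byte WireGuard encapsulation reservation (enough for an IPv6 outer header plus UDP plus the WireGuard data-message header). The arithmetic:

```python
# WireGuard tunnel MTU = underlay MTU - encapsulation overhead.
# The usual reservation is 80 bytes: IPv6 outer header (40)
# + UDP (8) + WireGuard data-message header (32).
ENS5_MTU = 9001      # underlay NIC MTU (from `ip addr show ens5`)
WG_OVERHEAD = 40 + 8 + 32

print(ENS5_MTU - WG_OVERHEAD)  # -> 8921, matching cilium_wg0
```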

# Deploy sample pods
cat <<EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: netpod1
  labels:
    app: netpod
spec:
  nodeName: k8s-w1
  containers:
  - name: netshoot-pod
    image: nicolaka/netshoot
    command: ["tail"]
    args: ["-f", "/dev/null"]
  terminationGracePeriodSeconds: 0
---
apiVersion: v1
kind: Pod
metadata:
  name: netpod2
  labels:
    app: netpod
spec:
  nodeName: k8s-w2
  containers:
  - name: netshoot-pod
    image: nicolaka/netshoot
    command: ["tail"]
    args: ["-f", "/dev/null"]
  terminationGracePeriodSeconds: 0
EOF

# Set variables
POD1IP=$(kubectl get pods netpod1 -o jsonpath='{.status.podIP}')
POD2IP=$(kubectl get pods netpod2 -o jsonpath='{.status.podIP}')

# Test
kubectl exec -it netpod1 -- ping -c 1 $POD2IP
kubectl exec -it netpod2 -- ping -c 1 $POD1IP

# Verify
hubble observe --pod netpod2  --protocol icmp
---
Oct 26 16:44:02.825: default/netpod1 (ID:54400) -> default/netpod2 (ID:54400) to-network FORWARDED (ICMPv4 EchoRequest)
Oct 26 16:44:02.825: default/netpod1 (ID:54400) -> default/netpod2 (ID:54400) to-endpoint FORWARDED (ICMPv4 EchoRequest)
Oct 26 16:44:02.825: default/netpod1 (ID:54400) <- default/netpod2 (ID:54400) to-network FORWARDED (ICMPv4 EchoReply)
Oct 26 16:44:02.826: default/netpod1 (ID:54400) <- default/netpod2 (ID:54400) to-endpoint FORWARDED (ICMPv4 EchoReply)
Oct 26 16:44:05.765: default/netpod2 (ID:54400) -> default/netpod1 (ID:54400) to-network FORWARDED (ICMPv4 EchoRequest)
Oct 26 16:44:05.765: default/netpod2 (ID:54400) -> default/netpod1 (ID:54400) to-endpoint FORWARDED (ICMPv4 EchoRequest)
Oct 26 16:44:05.765: default/netpod2 (ID:54400) <- default/netpod1 (ID:54400) to-network FORWARDED (ICMPv4 EchoReply)
Oct 26 16:44:05.766: default/netpod2 (ID:54400) <- default/netpod1 (ID:54400) to-endpoint FORWARDED (ICMPv4 EchoReply)

hubble observe --pod netpod1
---
Oct 26 16:44:02.825: default/netpod1 (ID:54400) -> default/netpod2 (ID:54400) to-endpoint FORWARDED (ICMPv4 EchoRequest)
Oct 26 16:44:02.825: default/netpod1 (ID:54400) <- default/netpod2 (ID:54400) to-network FORWARDED (ICMPv4 EchoReply)
Oct 26 16:44:05.765: default/netpod2 (ID:54400) -> default/netpod1 (ID:54400) to-network FORWARDED (ICMPv4 EchoRequest)
Oct 26 16:44:05.766: default/netpod2 (ID:54400) <- default/netpod1 (ID:54400) to-endpoint FORWARDED (ICMPv4 EchoReply)

Closing thoughts

This was the session on Cilium I had been waiting for. Cilium is a CNI I first became aware of while learning about eBPF, and before I knew it, it had accumulated a wide range of capabilities, including Service Mesh.

Of course, working well in general does not mean it fits every environment and policy. As someone currently testing Dataplane V2 (Cilium) on a development GKE cluster, it reminded me to review it carefully and put its many features to good use.