Operations Overview

This chapter covers the first operational checks for Nantian Gateway and points to the deeper metrics, Grafana, alerting, troubleshooting, and backup pages.

Nantian Gateway is mostly stateless. Kubernetes resources are the source of truth, the control plane rebuilds routing state from the Kubernetes API, and the data plane receives runtime snapshots over gRPC/xDS. Start operations work by checking Kubernetes status, route attachment, control plane logs, data plane logs, metrics, and admin endpoints.

Run these checks before debugging individual routes:

Terminal window
kubectl get pods -n nantian-gw
kubectl get gatewayclass nantian-gw
kubectl get svc -n nantian-gw
kubectl logs -n nantian-gw deploy/nantian-gw-controlplane --tail=100
kubectl logs -n nantian-gw deploy/nantian-gw-dataplane --tail=100

Interpret the result in this order:

| Check | What To Look For | Meaning |
| --- | --- | --- |
| Pods | Control plane and data plane pods are Running and ready. | Workloads are scheduled, probes are passing, and the data plane has completed startup. |
| GatewayClass | nantian-gw exists and uses controller gateway.networking.k8s.io/nantian-gw. | The chart installed the class that application Gateway objects should reference. |
| Services | Fixed service names and ports match the table below. | Other components and operators can use stable in-cluster addresses. |
| Control plane logs | Reconciliation, status, snapshot, or xDS messages. | The control plane is watching resources and publishing state. |
| Data plane logs | xDS connection and configuration-apply messages. | The data plane is connected and has received runtime configuration. |
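The first two checks in this order can be scripted so that the first failure is reported and later checks are skipped. The following is a minimal sketch, not part of the chart: the function name `first_checks` and the 60-second timeout are assumptions, and the Deployment `Available` condition is used as an approximation of "pods are Running and ready".

```sh
# Run the first checks in table order and stop at the first failure.
# NS, the 60s timeout, and the function name are illustrative assumptions.
NS=nantian-gw

first_checks() {
  # 1. Pods: both Deployments must report the Available condition.
  for d in nantian-gw-controlplane nantian-gw-dataplane; do
    kubectl wait --for=condition=Available "deployment/$d" -n "$NS" --timeout=60s \
      || { echo "FAIL: deployment/$d is not Available"; return 1; }
  done

  # 2. GatewayClass: compare the controller name with the value in the table.
  want=gateway.networking.k8s.io/nantian-gw
  got=$(kubectl get gatewayclass nantian-gw -o jsonpath='{.spec.controllerName}') \
    || { echo "FAIL: gatewayclass nantian-gw not found"; return 1; }
  [ "$got" = "$want" ] || { echo "FAIL: controllerName is '$got'"; return 1; }

  echo "OK: deployments available, GatewayClass matches"
}
```

Call `first_checks` after sourcing the snippet. Log inspection is left manual because the exact log messages are not specified here.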

The fixed services created by the Helm chart are:

| Service | Port | Purpose |
| --- | --- | --- |
| nantian-gw-controlplane-grpc | 18080 | Data plane xDS/gRPC connection. |
| nantian-gw-controlplane-admin | 18081 | Control plane admin API. |
| nantian-gw-controlplane-metrics | 18082 | Control plane Prometheus metrics. |
| nantian-gw-dataplane-admin | 19080 | Data plane admin API. |
| nantian-gw-dataplane-metrics | 19080 | Data plane metrics scrape entry. |
| nantian-gw-dashboard | 3000 | Dashboard web UI. |
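To compare a running cluster against this table, a small loop can read each Service's port and flag mismatches. This is a sketch under one assumption: each Service exposes a single port, so only `.spec.ports[0].port` is read. The function name `check_service_ports` is illustrative.

```sh
# Compare each fixed Service's first port with the table above.
# The function name and the name:port list format are illustrative assumptions.
check_service_ports() {
  rc=0
  for pair in \
    nantian-gw-controlplane-grpc:18080 \
    nantian-gw-controlplane-admin:18081 \
    nantian-gw-controlplane-metrics:18082 \
    nantian-gw-dataplane-admin:19080 \
    nantian-gw-dataplane-metrics:19080 \
    nantian-gw-dashboard:3000
  do
    name=${pair%%:*}   # text before the colon
    want=${pair##*:}   # text after the colon
    got=$(kubectl get svc "$name" -n nantian-gw -o jsonpath='{.spec.ports[0].port}' 2>/dev/null)
    if [ "$got" = "$want" ]; then
      echo "ok       $name:$want"
    else
      echo "MISMATCH $name: want $want, got '${got:-none}'"
      rc=1
    fi
  done
  return "$rc"
}
```

The function returns non-zero if any Service is missing or uses a different port, so it can gate later steps in a script.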

The data plane runtime HTTP listener is configured as 0.0.0.0:10080. Port-forward the deployment for local route tests:

Terminal window
kubectl port-forward -n nantian-gw deploy/nantian-gw-dataplane 10080:10080
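With the port-forward running in another terminal, a route can be exercised by sending a request with the hostname the route matches. The helper below is a sketch: `route_status` is a hypothetical name, and the host and path you pass in must come from your own HTTPRoute.

```sh
# Send one request through the forwarded listener and print the status code.
# The host name and path passed in are placeholders for values from your own HTTPRoute.
route_status() {
  host=$1
  path=${2:-/}
  curl -s -o /dev/null -w '%{http_code}\n' -H "Host: $host" "http://localhost:10080$path"
}
```

For example, `route_status app.example.com /healthz` prints only the HTTP status code; `app.example.com` and `/healthz` are placeholder values.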

Use Deployment targets for quick log inspection; kubectl then reads logs from one pod that it selects from the Deployment:

Terminal window
kubectl logs -n nantian-gw deploy/nantian-gw-controlplane --tail=100
kubectl logs -n nantian-gw deploy/nantian-gw-dataplane --tail=100

For a specific pod:

Terminal window
kubectl get pods -n nantian-gw
kubectl logs -n nantian-gw pod/<pod-name> --tail=200

Set debug logging only for short investigations. Debug output can be large and should not remain enabled in production.
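When the log output is long, filtering for likely problem lines can shorten the first pass. This sketch assumes that problem lines contain words such as "error" or "warn"; the gateway's actual log format is not described here, so adjust the keyword list as needed. Both function names are illustrative.

```sh
# Print only lines that look like problems from a component's recent logs.
# The keyword list is an assumption; adjust it to the gateway's actual log format.
problem_lines() {
  grep -iE 'error|warn|fail|panic'
}

recent_problems() {
  # $1 is a Deployment name such as nantian-gw-controlplane.
  kubectl logs -n nantian-gw "deploy/$1" --since=10m | problem_lines
}
```

For example, `recent_problems nantian-gw-dataplane` prints matching lines from the last ten minutes, or nothing if there are none.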

Forward control plane admin and metrics services when inspecting locally:

Terminal window
kubectl port-forward -n nantian-gw svc/nantian-gw-controlplane-admin 18081:18081
kubectl port-forward -n nantian-gw svc/nantian-gw-controlplane-metrics 18082:18082

Then query:

Terminal window
curl -s http://localhost:18081/livez
curl -s http://localhost:18081/readyz
curl -s http://localhost:18082/metrics | head

Forward the data plane admin service when you need data plane runtime details:

Terminal window
kubectl port-forward -n nantian-gw svc/nantian-gw-dataplane-admin 19080:19080
curl -s http://localhost:19080/livez
curl -s http://localhost:19080/readyz
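The liveness and readiness queries for either component can be combined into one helper that prints only the HTTP status codes. This is a sketch: `probe_admin` is a hypothetical name, the 5-second timeout is an assumption, and treating 200 as healthy follows the usual Kubernetes probe convention rather than anything stated for these endpoints.

```sh
# Print the HTTP status of /livez and /readyz on a forwarded admin port.
# Assumes the matching port-forward is already running in another terminal.
probe_admin() {
  port=$1
  for path in /livez /readyz; do
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "http://localhost:$port$path")
    echo "$path $code"
  done
}
```

Run `probe_admin 18081` for the control plane or `probe_admin 19080` for the data plane. curl reports a code of 000 when the connection fails, which usually means the port-forward is not running.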

Prometheus Operator ServiceMonitor resources are disabled by default in the Helm chart. Enable them only when the Prometheus Operator CRDs are installed and NetworkPolicies allow the Prometheus namespace to scrape the metrics services.
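Before enabling ServiceMonitor resources, you can confirm that the Prometheus Operator CRD is present. The sketch below checks only for the CRD. The Helm value that enables ServiceMonitor resources is not shown because it is not named here, and the function name is illustrative.

```sh
# Succeeds only when the Prometheus Operator ServiceMonitor CRD is installed.
servicemonitor_crd_present() {
  kubectl get crd servicemonitors.monitoring.coreos.com >/dev/null 2>&1
}
```

A typical use is `servicemonitor_crd_present && echo "CRD installed"`. Listing policies with `kubectl get networkpolicy -n nantian-gw` shows what may need to allow traffic from the Prometheus namespace.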

When a route does not work, inspect status before testing traffic:

Terminal window
kubectl get gateway,httproute -A
kubectl describe gateway <gateway-name> -n <namespace>
kubectl describe httproute <route-name> -n <namespace>

Look for accepted parents, listener matches, backend reference resolution, and status conditions. A route that is not attached will not receive traffic even if the data plane is healthy.
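The `describe` output can be long, so a JSONPath query over the standard Gateway API `status.parents` field gives a compact view of the same conditions. This is a sketch, and the function name `route_conditions` is hypothetical.

```sh
# Print one line per parent: the parent Gateway name followed by type=status pairs.
route_conditions() {
  # $1 = HTTPRoute name, $2 = namespace
  kubectl get httproute "$1" -n "$2" -o jsonpath='{range .status.parents[*]}{.parentRef.name}{": "}{range .conditions[*]}{.type}={.status}{" "}{end}{"\n"}{end}'
}
```

Each output line lists the conditions reported for one parent Gateway, typically including the Gateway API `Accepted` and `ResolvedRefs` types. No output means that no parent has written status for the route, which usually indicates the route is not attached.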

Restart stateless components with rolling updates:

Terminal window
kubectl rollout restart deployment/nantian-gw-controlplane -n nantian-gw
kubectl rollout restart deployment/nantian-gw-dataplane -n nantian-gw

Scale the data plane when traffic or resource pressure requires it:

Terminal window
kubectl scale deployment/nantian-gw-dataplane -n nantian-gw --replicas=4

Watch rollout status:

Terminal window
kubectl rollout status deployment/nantian-gw-controlplane -n nantian-gw
kubectl rollout status deployment/nantian-gw-dataplane -n nantian-gw
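
The restart and status commands can be chained so that a script blocks until the rollout completes or a timeout expires. This is a sketch: `restart_and_wait` is a hypothetical name and the 120-second default is an assumption.

```sh
# Restart one Deployment and block until the rollout finishes or times out.
# The 120s default timeout is an illustrative assumption.
restart_and_wait() {
  deploy=$1
  timeout=${2:-120s}
  kubectl rollout restart "deployment/$deploy" -n nantian-gw &&
    kubectl rollout status "deployment/$deploy" -n nantian-gw --timeout="$timeout"
}
```

For example, `restart_and_wait nantian-gw-dataplane 300s` returns non-zero if the restart fails or the rollout does not finish within five minutes.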

The deeper operations pages are:

| Page | Covers |
| --- | --- |
| Metrics Reference | Prometheus metrics emitted by the control plane and data plane. |
| Grafana Dashboard | How to import and use the bundled dashboard assets. |
| Alerting Rules | Recommended Prometheus alert rules. |
| Troubleshooting | Common symptoms and diagnostic flows. |
| Backup & Recovery | What to back up and how to restore gateway configuration. |