Operations Overview
This chapter covers the first operational checks for Nantian Gateway and points to the deeper metrics, Grafana, alerting, troubleshooting, and backup pages.
Nantian Gateway is mostly stateless. Kubernetes resources are the source of truth, the control plane rebuilds routing state from the Kubernetes API, and the data plane receives runtime snapshots over gRPC/xDS. Start operations work by checking Kubernetes status, route attachment, control plane logs, data plane logs, metrics, and admin endpoints.
First Checks After Install
Run these checks before debugging individual routes:

    kubectl get pods -n nantian-gw
    kubectl get gatewayclass nantian-gw
    kubectl get svc -n nantian-gw
    kubectl logs -n nantian-gw deploy/nantian-gw-controlplane --tail=100
    kubectl logs -n nantian-gw deploy/nantian-gw-dataplane --tail=100

Interpret the result in this order:

| Check | What To Look For | Meaning |
|---|---|---|
| Pods | Control plane and data plane pods are Running and ready. | Workloads are scheduled, probes are passing, and the data plane has completed startup. |
| GatewayClass | nantian-gw exists and uses controller gateway.networking.k8s.io/nantian-gw. | The chart installed the class that application Gateway objects should reference. |
| Services | Fixed service names and ports match the table below. | Other components and operators can use stable in-cluster addresses. |
| Control plane logs | Reconciliation, status, snapshot, or xDS messages. | The control plane is watching resources and publishing state. |
| Data plane logs | xDS connection and configuration-apply messages. | The data plane is connected and has received runtime configuration. |
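
The checks in the table can be chained so that the first failure stops the run. The following is a minimal sketch, not part of the chart: the function name `nantian_first_checks`, the `NS` and `TIMEOUT` variables, and the 60-second default are illustrative assumptions. It only uses the Deployment and GatewayClass names shown in the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run the first checks in table order and stop at the first failure.
# NS and TIMEOUT are illustrative defaults; override them through the environment.
NS="${NS:-nantian-gw}"
TIMEOUT="${TIMEOUT:-60s}"

nantian_first_checks() {
  local d
  # 1. Pods: wait until each Deployment reports a completed rollout.
  for d in nantian-gw-controlplane nantian-gw-dataplane; do
    kubectl rollout status "deployment/$d" -n "$NS" --timeout="$TIMEOUT" >/dev/null \
      || { echo "FAIL: deployment/$d is not ready"; return 1; }
  done
  # 2. GatewayClass: it must exist before application Gateways can reference it.
  kubectl get gatewayclass nantian-gw >/dev/null \
    || { echo "FAIL: gatewayclass nantian-gw not found"; return 1; }
  # 3. Services: only confirms that the listing succeeds; ports are not compared here.
  kubectl get svc -n "$NS" >/dev/null \
    || { echo "FAIL: cannot list services in $NS"; return 1; }
  echo "OK: pods, GatewayClass, and services look healthy"
}
```

Source the file and call `nantian_first_checks`. Log inspection is left out on purpose, because the log messages to look for are not fixed strings that a script can match reliably.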
Helm Service Reference
The fixed services created by the Helm chart are:

| Service | Port | Purpose |
|---|---|---|
| nantian-gw-controlplane-grpc | 18080 | Data plane xDS/gRPC connection. |
| nantian-gw-controlplane-admin | 18081 | Control plane admin API. |
| nantian-gw-controlplane-metrics | 18082 | Control plane Prometheus metrics. |
| nantian-gw-dataplane-admin | 19080 | Data plane admin API. |
| nantian-gw-dataplane-metrics | 19080 | Data plane metrics scrape entry. |
| nantian-gw-dashboard | 3000 | Dashboard web UI. |
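
To compare a running install with the service table, the port of each Service can be read with a JSONPath query. This is a sketch under one assumption that the source does not state: each Service exposes the listed port as its first entry in `spec.ports`. The function name `check_service_ports` and the `NS` variable are illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare each Service's first port with the documented value.
# Assumes the documented port is the first entry in spec.ports.
NS="${NS:-nantian-gw}"

check_service_ports() {
  local rc=0 pair name want got
  # name:expected-port pairs copied from the service table.
  for pair in \
    nantian-gw-controlplane-grpc:18080 \
    nantian-gw-controlplane-admin:18081 \
    nantian-gw-controlplane-metrics:18082 \
    nantian-gw-dataplane-admin:19080 \
    nantian-gw-dataplane-metrics:19080 \
    nantian-gw-dashboard:3000
  do
    name="${pair%%:*}"
    want="${pair##*:}"
    got="$(kubectl get svc "$name" -n "$NS" -o jsonpath='{.spec.ports[0].port}' 2>/dev/null)"
    if [ "$got" = "$want" ]; then
      echo "ok       $name $got"
    else
      echo "MISMATCH $name expected=$want got=${got:-none}"
      rc=1
    fi
  done
  return "$rc"
}
```

The function prints one line per Service and returns non-zero if any Service is missing or reports a different port.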
The data plane runtime HTTP listener is configured as 0.0.0.0:10080. Port-forward the deployment for local route tests:

    kubectl port-forward -n nantian-gw deploy/nantian-gw-dataplane 10080:10080

Use Deployment targets for quick log inspection:

    kubectl logs -n nantian-gw deploy/nantian-gw-controlplane --tail=100
    kubectl logs -n nantian-gw deploy/nantian-gw-dataplane --tail=100

For a specific pod:

    kubectl get pods -n nantian-gw
    kubectl logs -n nantian-gw pod/<pod-name> --tail=200

Set debug logging only for short investigations. Debug output can be large and should not remain enabled in production.
Metrics And Admin Access
Forward control plane admin and metrics services when inspecting locally. Each port-forward runs in the foreground, so start each one in its own terminal:

    kubectl port-forward -n nantian-gw svc/nantian-gw-controlplane-admin 18081:18081
    kubectl port-forward -n nantian-gw svc/nantian-gw-controlplane-metrics 18082:18082

Then query:

    curl -s http://localhost:18081/livez
    curl -s http://localhost:18081/readyz
    curl -s http://localhost:18082/metrics | head

Forward the data plane admin service when you need data plane runtime details:

    kubectl port-forward -n nantian-gw svc/nantian-gw-dataplane-admin 19080:19080
    curl -s http://localhost:19080/livez
    curl -s http://localhost:19080/readyz

Prometheus Operator ServiceMonitor resources are disabled by default in the Helm chart. Enable them only when the Prometheus Operator CRDs are installed and NetworkPolicies allow the Prometheus namespace to scrape the metrics services.
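
When the health endpoints are polled from a script, the HTTP status code is usually easier to act on than the response body. The sketch below assumes that a healthy endpoint answers with HTTP 200, which the source does not state. The function name `probe_admin` and the 5-second timeout are illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report the HTTP status of /livez and /readyz on a forwarded admin port.
# Assumes a healthy endpoint returns 200; adjust if the admin API behaves differently.
probe_admin() {
  local base="$1" rc=0 path code
  for path in /livez /readyz; do
    # -w '%{http_code}' prints only the status code; curl reports 000 if the connection fails.
    code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$base$path")"
    echo "$path $code"
    [ "$code" = "200" ] || rc=1
  done
  return "$rc"
}
```

With the port-forwards from this section running, `probe_admin http://localhost:18081` checks the control plane and `probe_admin http://localhost:19080` checks the data plane.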
Route-Level Checks
When a route does not work, inspect status before testing traffic:

    kubectl get gateway,httproute -A
    kubectl describe gateway <gateway-name> -n <namespace>
    kubectl describe httproute <route-name> -n <namespace>

Look for accepted parents, listener matches, backend reference resolution, and status conditions. A route that is not attached will not receive traffic even if the data plane is healthy.
Common Operations
Restart stateless components with rolling updates:

    kubectl rollout restart deployment/nantian-gw-controlplane -n nantian-gw
    kubectl rollout restart deployment/nantian-gw-dataplane -n nantian-gw

Scale the data plane when traffic or resource pressure requires it:

    kubectl scale deployment/nantian-gw-dataplane -n nantian-gw --replicas=4

Watch rollout status:

    kubectl rollout status deployment/nantian-gw-controlplane -n nantian-gw
    kubectl rollout status deployment/nantian-gw-dataplane -n nantian-gw

Chapter Structure
| Page | Covers |
|---|---|
| Metrics Reference | Prometheus metrics emitted by the control plane and data plane. |
| Grafana Dashboard | How to import and use the bundled dashboard assets. |
| Alerting Rules | Recommended Prometheus alert rules. |
| Troubleshooting | Common symptoms and diagnostic flows. |
| Backup & Recovery | What to back up and how to restore gateway configuration. |