diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
new file mode 100644
index 0000000..47717f1
--- /dev/null
+++ b/CONTRIBUTING.md
@@ -0,0 +1,216 @@
# Contributing to LIDSOL Infrastructure

Thank you for your interest in contributing to the LIDSOL infrastructure repository! This document explains how to get started, what kinds of contributions are welcome, and how the review process works.

## Table of Contents

- [Code of Conduct](#code-of-conduct)
- [Ways to Contribute](#ways-to-contribute)
- [Getting Started](#getting-started)
- [Development Workflow](#development-workflow)
- [Making Changes](#making-changes)
  - [Ansible Playbooks and Roles](#ansible-playbooks-and-roles)
  - [Kubernetes Manifests (GitOps)](#kubernetes-manifests-gitops)
  - [Documentation](#documentation)
- [Commit Messages](#commit-messages)
- [Pull Request Process](#pull-request-process)
- [Reporting Issues](#reporting-issues)
- [Contact](#contact)

## Code of Conduct

This project follows UNAM's community standards. We expect all contributors to be respectful, constructive, and collaborative. Harassment or discrimination of any kind will not be tolerated.

## Ways to Contribute

- **Bug reports**: Open an issue describing a misconfiguration, broken deployment, or unexpected behavior.
- **Feature requests**: Suggest new services, roles, or infrastructure improvements.
- **Documentation**: Improve or translate any documentation file.
- **Ansible roles**: Add new roles or improve existing ones (security, mirrors, network, etc.).
- **Kubernetes manifests**: Add new application deployments to `gitops/` or tune existing ones.
- **Security improvements**: Harden configurations, reduce attack surfaces, or update vulnerable components.

## Getting Started

### 1. Fork and clone

```bash
# Fork the repo on GitHub, then:
git clone --recurse-submodules https://github.com/<your-username>/infra.git
cd infra
```

### 2. Install prerequisites

| Tool | Minimum Version |
|------|----------------|
| [Ansible](https://docs.ansible.com/ansible/latest/installation_guide/) | 2.14 |
| [kubectl](https://kubernetes.io/docs/tasks/tools/) | 1.28 |
| [Git](https://git-scm.com/) | 2.40 |
| Python | 3.10 |

Install the Ansible collections used in this repo:

```bash
ansible-galaxy collection install community.general
```

### 3. Set up a test environment

> **Important**: Never test changes directly against production nodes without first verifying them in a safe environment. Use local VMs (e.g., with Vagrant or Multipass) or dedicated test machines.

```bash
# Example: spin up test VMs with multipass
multipass launch --name test-node --cpus 2 --memory 2G
```

## Development Workflow

```
fork → clone → branch → change → test → commit → push → pull request
```

1. Create a branch from `main`:
   ```bash
   git checkout -b feat/my-feature
   ```
2. Make your changes (see [Making Changes](#making-changes) below).
3. Test your changes in a non-production environment.
4. Commit your changes following the [commit message guidelines](#commit-messages).
5. Push and open a pull request against `main`.

## Making Changes

### Ansible Playbooks and Roles

- Follow the standard [Ansible best practices](https://docs.ansible.com/ansible/latest/tips_tricks/ansible_tips_tricks.html).
- Keep tasks idempotent: running a playbook multiple times must always produce the same result.
- Use **roles** for logically grouped tasks (e.g., `security`, `nginx`, `mirrors`).
- Store sensitive values (passwords, tokens, keys) in **Ansible Vault** — never commit them in plain text.
- Use descriptive `name:` fields on every task.
- Prefer YAML block scalars over long single-line strings.
- Add or update the relevant `docs/` file when introducing a new role or changing an existing one significantly.
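As a compact illustration of the guidelines above, a task list following them might look like the sketch below. The role layout, the `vault_tls_key` variable, and the file paths are hypothetical examples, not actual contents of this repository:

```yaml
# roles/nginx/tasks/main.yaml — hypothetical example
- name: Ensure nginx is installed           # descriptive name on every task
  ansible.builtin.package:
    name: nginx
    state: present                          # idempotent: no change on re-run

- name: Deploy TLS private key from Ansible Vault
  ansible.builtin.copy:
    content: "{{ vault_tls_key }}"          # value lives encrypted in a vault file
    dest: /etc/nginx/tls/server.key
    owner: root
    mode: "0600"
  no_log: true                              # keep the secret out of task output
```

Running such a playbook twice should report `changed=0` on the second run, which is a quick idempotence check.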

**Running a playbook in check mode** (dry run, no changes applied):

```bash
ansible-playbook -i public-server/inventory.yml public-server/main.yaml --check
```

**Limiting execution to a single host or tag**:

```bash
# Single host
ansible-playbook -i public-server/inventory.yml public-server/main.yaml --limit mirrors

# Single tag
ansible-playbook -i public-server/inventory.yml public-server/main.yaml --tags nginx
```

**Linting Ansible files**:

```bash
pip install ansible-lint
ansible-lint public-server/main.yaml
```

### Kubernetes Manifests (GitOps)

All Kubernetes manifests live in `gitops/`. ArgoCD watches this directory and automatically applies changes when they are merged to `main`.

- One manifest file per application (e.g., `gitops/myapp-deployment.yaml`).
- Always specify resource `requests` and `limits` for CPU and memory.
- Use `Namespace` resources within the manifest when a dedicated namespace is needed.
- Prefer `ClusterIssuer` and `Certificate` resources from cert-manager for TLS — do not store TLS secrets in the repository.
- For `LoadBalancer` services that require a static IP from MetalLB, annotate the service with the desired IP:
  ```yaml
  metadata:
    annotations:
      metallb.universe.tf/loadBalancerIPs: "10.8.24.1XX"
  ```

**Validate manifests before committing**:

```bash
kubectl apply --dry-run=client -f gitops/myapp-deployment.yaml
```

### Documentation

- Documentation lives in `docs/` (detailed per-component guides) and in the root `README.md` and `CONTRIBUTING.md`.
- Write in English; Spanish is also acceptable for documents that are primarily read by the local team.
- Use [Markdown](https://www.markdownlang.org/). Keep lines under 120 characters.
- Update the table of contents in any file you modify if it has one.
- Reference related files with relative links (e.g., `[cluster.md](docs/cluster.md)`).

## Commit Messages

Follow the [Conventional Commits](https://www.conventionalcommits.org/) specification:

```
<type>(<scope>): <subject>

[optional body]

[optional footer]
```

**Types**:

| Type | When to use |
|------|------------|
| `feat` | New feature or new Ansible role |
| `fix` | Bug fix or misconfiguration correction |
| `docs` | Documentation-only changes |
| `refactor` | Code restructuring without behavior change |
| `chore` | Dependency updates, minor maintenance |
| `security` | Security-related hardening or patch |

**Examples**:

```
feat(mirrors): add Fedora mirror sync role

fix(router): correct nftables masquerade rule for WAN interface

docs(cluster): document MetalLB IP range configuration

security(ssh): disable root login in sshd hardening task
```

## Pull Request Process

1. Ensure your branch is up to date with `main`:
   ```bash
   git fetch origin
   git rebase origin/main
   ```
2. Open a pull request against `main` on GitHub.
3. Fill in the pull request template:
   - Describe **what** changed and **why**.
   - List any manual steps required (e.g., rotating a secret, running a migration).
   - Reference related issues with `Closes #<issue-number>`.
4. At least **one maintainer review** is required before merging.
5. All checks (if any CI workflows are configured) must pass.
6. The PR author is responsible for resolving review comments.
7. Squash or rebase commits before merging if the history is noisy.

> **Production changes**: Any change that will affect production infrastructure (playbooks, K8s manifests) must be tested in a staging or dev environment first and documented in the PR description.

## Reporting Issues

Use the [GitHub Issues](https://github.com/LIDSOL/infra/issues) tracker to report bugs or request features.

When reporting a bug, please include:

- A clear title and description of the problem.
- Steps to reproduce (which playbook/command was run, against which host).
- Expected behavior vs.
actual behavior.
- Relevant log output or error messages.
- Environment details (OS, Ansible version, Kubernetes version if applicable).

## Contact

- **Email**: lidsol-info@proton.me
- **GitHub**: [@LIDSOL](https://github.com/LIDSOL)
- **Website**: [lidsol.unam.mx](https://lidsol.unam.mx)
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..13a9749
--- /dev/null
+++ b/README.md
@@ -0,0 +1,176 @@
# LIDSOL Infrastructure

[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](LICENSE)

Infrastructure-as-code repository for [LIDSOL](https://lidsol.unam.mx) (Laboratorio de Investigación y Desarrollo de Software Libre) at UNAM. This repository manages the full stack: a production Kubernetes cluster, public-facing services, Linux distribution mirrors, and the underlying network.

## Table of Contents

- [Architecture Overview](#architecture-overview)
- [Repository Structure](#repository-structure)
- [Prerequisites](#prerequisites)
- [Quick Start](#quick-start)
- [Components](#components)
- [Contributing](#contributing)
- [License](#license)

## Architecture Overview

```
Internet
   │
   ▼
┌─────────────────────────────────────────────────────────────────────┐
│                     Router / Gateway (Armbian)                      │
│            WAN: 132.248.59.72/24  ──►  LAN: 10.8.24.0/24            │
│              nftables firewall, DNSmasq, WireGuard VPN              │
└──────────────────────────────────┬──────────────────────────────────┘
                                   │ LAN
              ┌────────────────────┼────────────────────┐
              ▼                    ▼                    ▼
        ┌──────────┐         ┌──────────┐         ┌──────────┐
        │ hp-alpha │         │ hp-beta  │         │  gamma   │
        │K3s master│         │K3s agent │         │K3s agent │
        └──────────┘         └──────────┘         └──────────┘
              │                    │                    │
              └────────────────────┼────────────────────┘
                                   │
                        ┌──────────▼──────────┐
                        │ Kubernetes (K3s)    │
                        │ API: 10.8.24.101    │
                        │ MetalLB: .102-.120  │
                        │                     │
                        │ ArgoCD (GitOps)     │
                        │ Cert-Manager        │
                        │ Ingress-Nginx       │
                        │ Longhorn (storage)  │
                        │ LIDSOL Website      │
                        │ Nextcloud           │
                        │ DrawDB              │
                        │ Speedtest Tracker   │
                        └─────────────────────┘

Public Server (lidsol.fi-b.unam.mx)
   Nginx + Let's Encrypt
   Linux Mirrors: AlmaLinux, ArchLinux, Debian, Fedora, Linux Mint
```

## Repository Structure

```
infra/
├── README.md                  # This file
├── CONTRIBUTING.md            # How to contribute
├── LICENSE                    # Apache 2.0 License
├── docs/                      # Detailed documentation
│   ├── cluster.md             # K3s cluster setup and management
│   ├── gitops.md              # GitOps / ArgoCD deployments
│   ├── public-server.md       # Public server (mirrors, router, security)
│   └── mirrors.md             # Linux distribution mirrors
├── cluster/                   # Kubernetes cluster provisioning
│   ├── argo-config/           # ArgoCD application manifests
│   ├── config/                # Ansible inventory and variables
│   ├── k3s-ansible/           # Git submodule: k3s-ansible
│   ├── node.yml               # Ansible playbook: node preparation
│   └── argocd.yml             # Ansible playbook: ArgoCD + Longhorn install
├── gitops/                    # Kubernetes manifests (managed by ArgoCD)
│   ├── certmanager-deployment.yaml
│   ├── ingress-deployment.yaml
│   ├── longhorn-ui-deployment.yaml
│   ├── nextcloud-deployment.yaml
│   ├── website-deployment.yaml
│   ├── https-issuer-deployment.yaml
│   ├── drawdb-deployment.yaml
│   ├── speedtest-tracker-deployment.yaml
│   └── ExplicacionIngress.md  # Ingress controller documentation (Spanish)
└── public-server/             # Public server configuration
    ├── main.yaml              # Main Ansible playbook
    ├── inventory.yml          # Ansible inventory
    ├── security/              # Role: SSH hardening + automatic updates
    ├── network/               # Role: IRQ balancing + sysctl tuning
    ├── nginx/                 # Role: Nginx + Let's Encrypt
    ├── router/                # Role: routing, firewall, VPN, DNS
    └── mirrors/               # Role: Linux distribution mirrors
```

## Prerequisites

The following tools must be installed on your workstation before working with this repository.

| Tool | Minimum Version | Purpose |
|------|----------------|---------|
| [Ansible](https://docs.ansible.com/ansible/latest/installation_guide/) | 2.14 | Configuration management and playbook execution |
| [kubectl](https://kubernetes.io/docs/tasks/tools/) | 1.28 | Kubernetes cluster management |
| [Git](https://git-scm.com/) | 2.40 | Version control and submodule support |
| Python | 3.10 | Required by Ansible |

> **Note**: Access to the cluster nodes via SSH (with your key in `cluster/config/authorized_keys`) is required for cluster operations.

## Quick Start

### 1. Clone the repository

```bash
git clone --recurse-submodules https://github.com/LIDSOL/infra.git
cd infra
```

If you already cloned without `--recurse-submodules`:

```bash
git submodule update --init --recursive
```

### 2. Set up the Kubernetes cluster

See [docs/cluster.md](docs/cluster.md) for full details.

```bash
# 1. Configure node variables
cp cluster/config/host_vars/hp-alpha.yml cluster/config/host_vars/<hostname>.yml
# Edit the file with your node's IP and network settings

# 2. Prepare the nodes
ansible-playbook -i cluster/config/hosts.ini cluster/node.yml

# 3. Deploy K3s
ansible-playbook -i cluster/config/hosts.ini cluster/k3s-ansible/site.yml

# 4. Install ArgoCD and Longhorn
ansible-playbook -i cluster/config/hosts.ini cluster/argocd.yml
```

### 3. Configure the public server

See [docs/public-server.md](docs/public-server.md) for full details.

```bash
# Review and update the inventory
vim public-server/inventory.yml

# Run the main playbook
ansible-playbook -i public-server/inventory.yml public-server/main.yaml
```

### 4. Deploy applications via GitOps

Once ArgoCD is running, it automatically syncs manifests from the `gitops/` directory. See [docs/gitops.md](docs/gitops.md) for details.

## Components

| Component | Documentation | Description |
|-----------|--------------|-------------|
| K3s Cluster | [docs/cluster.md](docs/cluster.md) | Lightweight Kubernetes cluster on bare metal |
| GitOps | [docs/gitops.md](docs/gitops.md) | ArgoCD-managed application deployments |
| Public Server | [docs/public-server.md](docs/public-server.md) | Nginx, routing, security, and mirrors |
| Mirrors | [docs/mirrors.md](docs/mirrors.md) | Linux distribution mirror setup and sync |

## Contributing

We welcome contributions of all kinds — bug fixes, documentation improvements, new features, and feedback.

Please read [CONTRIBUTING.md](CONTRIBUTING.md) before submitting a pull request.

## License

This project is licensed under the [Apache License 2.0](LICENSE).
diff --git a/docs/cluster.md b/docs/cluster.md
new file mode 100644
index 0000000..f03deaf
--- /dev/null
+++ b/docs/cluster.md
@@ -0,0 +1,242 @@
# K3s Cluster

This document covers how to provision, configure, and manage the LIDSOL Kubernetes cluster, which is built on [K3s](https://k3s.io/) and deployed using Ansible.

## Table of Contents

- [Architecture](#architecture)
- [Node Inventory](#node-inventory)
- [Network Configuration](#network-configuration)
- [Prerequisites](#prerequisites)
- [Cluster Setup](#cluster-setup)
  - [1. Prepare the Nodes](#1-prepare-the-nodes)
  - [2. Deploy K3s](#2-deploy-k3s)
  - [3. Install ArgoCD and Longhorn](#3-install-argocd-and-longhorn)
- [Accessing the Cluster](#accessing-the-cluster)
- [Upgrading K3s](#upgrading-k3s)
- [Storage (Longhorn)](#storage-longhorn)
- [Load Balancing (MetalLB)](#load-balancing-metallb)
- [High Availability (Kube-vip)](#high-availability-kube-vip)
- [Troubleshooting](#troubleshooting)

## Architecture

The cluster consists of three bare-metal nodes connected through the internal LAN (`10.8.24.0/24`).
A virtual IP (`10.8.24.101`) is provided by Kube-vip for high-availability access to the Kubernetes API server.

```
LAN: 10.8.24.0/24
       ┌──────────────────────────────┐
       │   Virtual IP: 10.8.24.101    │
       │     (Kube-vip, ARP mode)     │
       └───────────────┬──────────────┘
                       │
     ┌─────────────────┼─────────────────┐
     ▼                 ▼                 ▼
┌──────────┐      ┌──────────┐      ┌──────────┐
│ hp-alpha │      │ hp-beta  │      │  gamma   │
│  master  │      │  agent   │      │  agent   │
└──────────┘      └──────────┘      └──────────┘

MetalLB pool: 10.8.24.102 – 10.8.24.120 (LoadBalancer services)
ArgoCD:       10.8.24.102
Longhorn UI:  10.8.24.105
```

**Key components and versions** (K3s, MetalLB, and Kube-vip are installed by the k3s-ansible playbook; ArgoCD, Longhorn, and the Image Updater by `argocd.yml`):

| Component | Version | Purpose |
|-----------|---------|---------|
| K3s | v1.30.2+k3s2 | Lightweight Kubernetes distribution |
| ArgoCD | Latest stable | GitOps continuous delivery |
| Longhorn | v1.8.1 | Distributed block storage |
| ArgoCD Image Updater | Latest | Automatic container image updates |
| MetalLB | v0.14.8 | Bare-metal load balancer |
| Kube-vip | v0.8.2 | HA virtual IP for the API server |

## Node Inventory

Inventory file: `cluster/config/hosts.ini`

| Hostname | Role | Internal IP | Config File |
|----------|------|------------|-------------|
| `hp-alpha` | master | 10.8.24.x | `cluster/config/host_vars/hp-alpha.yml` |
| `hp-beta` | agent | 10.8.24.x | `cluster/config/host_vars/hp-beta.yml` |
| `gamma` | agent | 10.8.24.x | `cluster/config/host_vars/gamma.yml` |

Node-specific variables (network interface, IP, etc.) are stored in `cluster/config/host_vars/<hostname>.yml`.
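For orientation, a node's `host_vars` file might look like the sketch below. Only `flannel_iface` appears in this repo's documented variable table; the other key shown is an illustrative assumption, and the real files are the authoritative reference:

```yaml
# cluster/config/host_vars/hp-alpha.yml — illustrative sketch only
ansible_host: 10.8.24.x   # placeholder address, redacted as in the table above
flannel_iface: eno1       # hypothetical interface name for the Flannel CNI
```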

## Network Configuration

Cluster-wide network settings live in `cluster/config/group_vars/all.yml`:

| Variable | Default | Description |
|----------|---------|-------------|
| `k3s_version` | `v1.30.2+k3s2` | K3s release to install |
| `k3s_token` | (set by operator) | Shared cluster token — **keep secret** |
| `api_endpoint` | `10.8.24.101` | Kube-vip virtual IP |
| `kube_vip_tag_version` | `v0.8.2` | Kube-vip image version |
| `metal_lb_speaker_tag_version` | `v0.14.8` | MetalLB version |
| `metal_lb_ip_range` | `10.8.24.102-10.8.24.120` | Pool for LoadBalancer IPs |
| `flannel_iface` | node-specific | Network interface for Flannel CNI |
| `cluster_cidr` | `10.42.0.0/16` | Pod CIDR |
| `service_cidr` | `10.43.0.0/16` | Service CIDR |

**CNI options**: Flannel (default), Calico, or Cilium — controlled by variables in `all.yml`.

## Prerequisites

- SSH access to all cluster nodes with a key listed in `cluster/config/authorized_keys`.
- The user running the playbooks must have passwordless `sudo` on the nodes (or supply the sudo password via `--ask-become-pass`).
- Ansible ≥ 2.14 installed on the control machine.
- The k3s-ansible submodule must be initialized:
  ```bash
  git submodule update --init --recursive
  ```

## Cluster Setup

### 1. Prepare the Nodes

The `node.yml` playbook configures each node before K3s is deployed. It:

- Creates a dedicated sudo user.
- Installs authorized SSH keys.
- Enables unattended security updates.
- Installs NFS and iSCSI client packages (for external storage).
- Configures the network interface used for the cluster.

```bash
ansible-playbook -i cluster/config/hosts.ini cluster/node.yml
```

> You may be prompted for the SSH password on the first run if key-based auth is not yet set up. Pass `--ask-pass` if needed.

### 2. Deploy K3s

The `k3s-ansible` submodule provides the deployment playbook.

```bash
ansible-playbook -i cluster/config/hosts.ini cluster/k3s-ansible/site.yml
```

After this step, K3s is running. Retrieve the kubeconfig:

```bash
scp <master-ip>:/etc/rancher/k3s/k3s.yaml ~/.kube/config
# Replace the server IP 127.0.0.1 with the VIP:
sed -i 's/127.0.0.1/10.8.24.101/' ~/.kube/config
kubectl get nodes
```

### 3. Install ArgoCD and Longhorn

The `argocd.yml` playbook installs ArgoCD, Longhorn, and the ArgoCD Image Updater on top of the running cluster.

```bash
ansible-playbook -i cluster/config/hosts.ini cluster/argocd.yml
```

After completion:

- **ArgoCD** is accessible at `http://10.8.24.102` (LoadBalancer IP).
- The initial admin password is stored in the `argocd-initial-admin-secret` Kubernetes secret:
  ```bash
  kubectl -n argocd get secret argocd-initial-admin-secret \
    -o jsonpath="{.data.password}" | base64 -d
  ```
- ArgoCD is configured (via `cluster/argo-config/default-manifest.yml`) to watch the `gitops/` directory of this repository and auto-sync all manifests.

## Accessing the Cluster

```bash
# List nodes
kubectl get nodes -o wide

# List all pods
kubectl get pods -A

# Access the ArgoCD UI (port-forward if the LoadBalancer IP is not reachable)
kubectl port-forward svc/argocd-server -n argocd 8080:443
# Then open https://localhost:8080
```

## Upgrading K3s

To upgrade K3s, update the `k3s_version` variable in `cluster/config/group_vars/all.yml` and re-run the K3s playbook:

```bash
ansible-playbook -i cluster/config/hosts.ini cluster/k3s-ansible/site.yml
```

Always check the [K3s release notes](https://github.com/k3s-io/k3s/releases) for breaking changes before upgrading.

## Storage (Longhorn)

[Longhorn](https://longhorn.io/) provides distributed replicated block storage across the cluster nodes.

- **Version**: v1.8.1
- **UI**: Available at `http://10.8.24.105` (LoadBalancer service `longhorn-ui` in namespace `longhorn-system`).
- **Default StorageClass**: Longhorn is set as the default `StorageClass`, so `PersistentVolumeClaims` without an explicit `storageClassName` use it automatically.

To check storage status:

```bash
kubectl get storageclass
kubectl get pv,pvc -A
```

## Load Balancing (MetalLB)

[MetalLB](https://metallb.universe.tf/) allocates external IP addresses from the pool `10.8.24.102–10.8.24.120` to `LoadBalancer`-type services.

To request a specific IP for a service, annotate it:

```yaml
metadata:
  annotations:
    metallb.universe.tf/loadBalancerIPs: "10.8.24.110"
```

Currently allocated IPs:

| Service | IP |
|---------|-----|
| ArgoCD | 10.8.24.102 |
| Longhorn UI | 10.8.24.105 |

## High Availability (Kube-vip)

[Kube-vip](https://kube-vip.io/) keeps the Kubernetes API server accessible via a virtual IP (`10.8.24.101`) using ARP broadcasts. If the current leader node fails, the VIP migrates to another master node automatically.

The Kube-vip version is controlled by `kube_vip_tag_version` in `cluster/config/group_vars/all.yml`.

## Troubleshooting

**Check K3s service status on a node**:

```bash
sudo systemctl status k3s        # on master
sudo systemctl status k3s-agent  # on agent nodes
sudo journalctl -u k3s -f        # follow logs
```

**Node not joining the cluster**:

- Verify the `k3s_token` in `all.yml` matches on all nodes.
- Ensure the API server endpoint (`10.8.24.101`) is reachable from all nodes.
- Check that firewall rules allow TCP 6443 (API) and UDP 8472 (Flannel VXLAN).
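To test the TCP reachability items above from a node, a small probe using bash's `/dev/tcp` can help. This is a generic sketch, not a script shipped in this repository, and UDP 8472 cannot be checked this way:

```shell
# Print "open" or "closed" for a TCP host/port pair.
check_port() {
  timeout 2 bash -c "cat < /dev/null > /dev/tcp/$1/$2" 2>/dev/null \
    && echo "open" || echo "closed"
}

check_port 10.8.24.101 6443   # API server via the kube-vip VIP
```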

**Pod stuck in `Pending`**:

```bash
kubectl describe pod <pod-name> -n <namespace>
# Check for storage (Longhorn) or scheduling issues
kubectl get events -n <namespace> --sort-by='.lastTimestamp'
```

**ArgoCD application out of sync**:

```bash
argocd app sync <app-name>
# or from the UI: click "Sync" on the application card
```
diff --git a/docs/gitops.md b/docs/gitops.md
new file mode 100644
index 0000000..e59b702
--- /dev/null
+++ b/docs/gitops.md
@@ -0,0 +1,289 @@
# GitOps Deployments

This document describes the GitOps workflow used by LIDSOL to deploy and manage applications on the Kubernetes cluster. All Kubernetes manifests are stored in the `gitops/` directory and continuously reconciled by [ArgoCD](https://argo-cd.readthedocs.io/).

## Table of Contents

- [Overview](#overview)
- [How GitOps Works Here](#how-gitops-works-here)
- [Deployed Applications](#deployed-applications)
- [Adding a New Application](#adding-a-new-application)
- [TLS / HTTPS](#tls--https)
- [Ingress](#ingress)
- [Updating a Container Image](#updating-a-container-image)
- [Troubleshooting](#troubleshooting)

## Overview

```
GitHub repository (LIDSOL/infra)
        │
        │ git push to main
        ▼
  gitops/ directory
        │
        │ ArgoCD polls every 3 minutes (or webhook)
        ▼
  Kubernetes cluster
  (auto-sync, auto-prune, self-heal)
```

ArgoCD monitors the `gitops/` directory of the `main` branch. Any change merged to `main` is automatically applied to the cluster within minutes, without manual `kubectl apply` commands.

## How GitOps Works Here

The ArgoCD `Application` resource that drives the sync is defined in `cluster/argo-config/default-manifest.yml`.
Key settings:

| Setting | Value | Meaning |
|---------|-------|---------|
| Source repo | `https://github.com/LIDSOL/infra` | This repository |
| Source path | `gitops/` | Only manifests under this folder are synced |
| Target branch | `main` | Changes on `main` trigger a sync |
| Auto-sync | enabled | ArgoCD applies changes automatically |
| Auto-prune | enabled | Deleted manifests are removed from the cluster |
| Self-heal | enabled | Manual `kubectl` edits are reverted back to the declared state |
| Destination | `https://kubernetes.default.svc` | In-cluster API server |
| Namespace | `gitops` | Default namespace for resources without an explicit namespace |

## Deployed Applications

### LIDSOL Website

| Property | Value |
|----------|-------|
| Manifest | `gitops/website-deployment.yaml` |
| Image | `ghcr.io/lidsol/sitio-web-lidsol:master` |
| Replicas | 3 |
| Namespace | `gitops` |
| Public URL | `https://lidsol.unam.mx` |

Static website for LIDSOL, served through Ingress-Nginx with a Let's Encrypt TLS certificate.

### Nextcloud

| Property | Value |
|----------|-------|
| Manifest | `gitops/nextcloud-deployment.yaml` |
| Namespace | `gitops` |
| Storage (DB) | 20 Gi (Longhorn) |
| Storage (Data) | 80 Gi (Longhorn) |

Collaborative file-sharing suite backed by MariaDB. Persistent volumes are provisioned automatically by Longhorn.

### Cert-Manager

| Property | Value |
|----------|-------|
| Manifest | `gitops/certmanager-deployment.yaml` |
| Namespace | `cert-manager` |

Automates TLS certificate issuance and renewal using Let's Encrypt. See [TLS / HTTPS](#tls--https).

### Ingress-Nginx

| Property | Value |
|----------|-------|
| Manifest | `gitops/ingress-deployment.yaml` |
| Version | v1.13.2 |
| Namespace | `ingress-nginx` |
| HTTP port | 80 |
| HTTPS port | 443 |

Reverse proxy and load balancer for all HTTP/HTTPS traffic entering the cluster.
See also `gitops/ExplicacionIngress.md` (Spanish) for additional details on the ingress controller configuration and how to write `Ingress` resources.

### HTTPS Issuer

| Property | Value |
|----------|-------|
| Manifest | `gitops/https-issuer-deployment.yaml` |
| Type | `ClusterIssuer` |
| Provider | Let's Encrypt (ACME) |

Defines the `ClusterIssuer` resources (`letsencrypt-staging` and `letsencrypt-prod`) used by cert-manager to obtain TLS certificates.

### Longhorn UI

| Property | Value |
|----------|-------|
| Manifest | `gitops/longhorn-ui-deployment.yaml` |
| IP | 10.8.24.105 |
| Port | 80 |

Exposes the Longhorn storage dashboard as a `LoadBalancer` service on the internal network.

### DrawDB

| Property | Value |
|----------|-------|
| Manifest | `gitops/drawdb-deployment.yaml` |
| Namespace | `gitops` |

Web-based database schema design tool.

### Speedtest Tracker

| Property | Value |
|----------|-------|
| Manifest | `gitops/speedtest-tracker-deployment.yaml` |
| Namespace | `gitops` |

Continuously monitors and records internet connection speed.

## Adding a New Application

1. Create a manifest file in `gitops/`:
   ```bash
   touch gitops/myapp-deployment.yaml
   ```

2. Write the manifest. A minimal example:

   ```yaml
   ---
   apiVersion: apps/v1
   kind: Deployment
   metadata:
     name: myapp
     namespace: gitops
   spec:
     replicas: 1
     selector:
       matchLabels:
         app: myapp
     template:
       metadata:
         labels:
           app: myapp
       spec:
         containers:
           - name: myapp
             image: myorg/myapp:latest
             resources:
               requests:
                 cpu: "100m"
                 memory: "128Mi"
               limits:
                 cpu: "500m"
                 memory: "256Mi"
   ---
   apiVersion: v1
   kind: Service
   metadata:
     name: myapp
     namespace: gitops
   spec:
     selector:
       app: myapp
     ports:
       - port: 80
         targetPort: 8080
   ```

3. Add an `Ingress` resource if the app needs external HTTP/HTTPS access (see [Ingress](#ingress) below).

4. Validate the manifest before committing:
   ```bash
   kubectl apply --dry-run=client -f gitops/myapp-deployment.yaml
   ```

5. Commit and open a pull request. After merge to `main`, ArgoCD applies the manifest automatically.

## TLS / HTTPS

Certificates are issued by [cert-manager](https://cert-manager.io/) using Let's Encrypt.

To enable HTTPS for an ingress, add the cert-manager annotation and a `tls` block:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp
  namespace: gitops
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - myapp.lidsol.unam.mx
      secretName: myapp-tls
  rules:
    - host: myapp.lidsol.unam.mx
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: myapp
                port:
                  number: 80
```

> Use `letsencrypt-staging` during development to avoid rate limits; switch to `letsencrypt-prod` for production.

## Ingress

All ingress traffic is handled by **ingress-nginx** (v1.13.2). The controller listens on the cluster node ports and routes traffic based on `Ingress` resources.

For detailed examples and configuration options, see `gitops/ExplicacionIngress.md`.

Common annotations:

| Annotation | Example | Purpose |
|-----------|---------|---------|
| `nginx.ingress.kubernetes.io/rewrite-target` | `/` | URL rewriting |
| `nginx.ingress.kubernetes.io/ssl-redirect` | `"true"` | Force HTTPS |
| `nginx.ingress.kubernetes.io/proxy-body-size` | `"50m"` | Increase upload limit |
| `cert-manager.io/cluster-issuer` | `letsencrypt-prod` | Request a TLS certificate |

## Updating a Container Image

Images in `gitops/` manifests can be updated in two ways:

1. **Manual update**: Edit the `image:` field in the manifest file, commit, and push to `main`. ArgoCD will detect the change and perform a rolling update.

2. **ArgoCD Image Updater** (configured by `argocd.yml`): Automatically detects new tags in the container registry and commits updated image tags back to the repository. Configure the update policy with annotations on the `Application` resource.

## Troubleshooting

**Check ArgoCD application status**:

```bash
# List all applications
kubectl get applications -n argocd

# Describe a specific application
kubectl describe application <app-name> -n argocd
```

**Force a manual sync**:

```bash
# Using kubectl
kubectl patch application <app-name> -n argocd \
  --type merge -p '{"operation":{"initiatedBy":{"username":"admin"},"sync":{}}}'

# Using the argocd CLI
argocd app sync <app-name>
```

**Inspect events in a namespace**:

```bash
kubectl get events -n gitops --sort-by='.lastTimestamp'
```

**View logs for a pod**:

```bash
kubectl logs -n gitops deployment/myapp --tail=100 -f
```

**CrashLoopBackOff or ImagePullBackOff**:

```bash
kubectl describe pod <pod-name> -n gitops
# Look at the Events section for the root cause
```
diff --git a/docs/mirrors.md b/docs/mirrors.md
new file mode 100644
index 0000000..02729a9
--- /dev/null
+++ b/docs/mirrors.md
@@ -0,0 +1,245 @@
# Linux Distribution Mirrors

This document describes the Linux distribution mirror infrastructure hosted by LIDSOL at `lidsol.fi-b.unam.mx`. Mirrors are managed by the `mirrors` Ansible role located at `public-server/mirrors/`.

## Table of Contents

- [Overview](#overview)
- [Hosted Distributions](#hosted-distributions)
- [Role Structure](#role-structure)
- [Variables](#variables)
- [Deploying the Mirrors Role](#deploying-the-mirrors-role)
- [Mirror Sync Tools](#mirror-sync-tools)
  - [AlmaLinux](#almalinux)
  - [ArchLinux](#archlinux)
  - [Debian](#debian)
  - [Fedora](#fedora)
  - [Linux Mint](#linux-mint)
- [Nginx Configuration](#nginx-configuration)
- [rsync Daemon](#rsync-daemon)
- [Adding a New Mirror](#adding-a-new-mirror)
- [Troubleshooting](#troubleshooting)

## Overview

LIDSOL mirrors several popular Linux distributions to provide fast, local package downloads for users at UNAM. The mirrors are synchronized from upstream sources using `rsync` and custom sync scripts, and served over HTTP/HTTPS by Nginx.

```
Upstream mirror sources (internet)
        │
        │ rsync / ftpsync
        ▼
/srv/<distro>/  (disk storage on mirror server)
        │
        │ Nginx (with per-mirror access/error logs)
        ▼
lidsol.fi-b.unam.mx/<distro>/
```

A dedicated `mirrors` OS user owns all mirror data under `/srv/`.
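Conceptually, each sync job boils down to an rsync invocation run as the `mirrors` user. The sketch below only assembles and prints such a command; the flag choices are common mirror practice, and the real scripts under `public-server/mirrors/files/distros/` are authoritative:

```shell
# Illustrative only: build (but do not run) a mirror sync command.
UPSTREAM="rsync://rsync.repo.almalinux.org/almalinux/"
DEST="/srv/almalinux/"
# -a archive mode, -H preserve hard links, --delete drop files removed
# upstream, --safe-links skip symlinks that point outside the tree.
OPTS="-aH --delete --safe-links"

CMD="rsync $OPTS $UPSTREAM $DEST"
echo "$CMD"
```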
+ +## Hosted Distributions + +| Distribution | Local Path | Upstream Source | +|-------------|-----------|----------------| +| AlmaLinux | `/srv/almalinux/` | rsync.repo.almalinux.org | +| ArchLinux | `/srv/archlinux/` | mirror.rackspace.com | +| Debian | `/srv/debian/` | ftp.us.debian.org (ftpsync) | +| Debian CD | `/srv/debian-cd/` | cdimage.debian.org | +| Fedora | `/srv/fedora/` | dl.fedoraproject.org | +| Linux Mint | `/srv/linux-mint/` | rsync.linuxmint.com | +| Linux Mint ISO | `/srv/linux-mint-cd/` | rsync.linuxmint.com::linuxmint-cd | + +## Role Structure + +``` +public-server/mirrors/ +├── tasks/ +│ ├── main.yaml # Orchestrator: includes all subtasks +│ ├── almalinux.yaml # AlmaLinux sync setup +│ ├── archlinux.yaml # ArchLinux sync setup +│ ├── debian.yaml # Debian & Debian-CD sync setup +│ ├── fedora.yaml # Fedora sync setup +│ ├── linux-mint.yaml # Linux Mint sync setup +│ ├── nginx.yaml # Nginx virtual host configuration +│ └── rsync-daemon.yaml # rsync daemon setup +├── files/distros/ +│ ├── AlmaLinux/ # AlmaLinux sync scripts and config +│ ├── ArchLinux/ # ArchLinux sync scripts +│ ├── Debian/ +│ │ ├── repository/ftpsync/ # Debian ftpsync tool +│ │ └── scripts/ # Mirror check script +│ ├── Fedora/ # Fedora sync scripts and config +│ └── LinuxMint/ # Linux Mint sync scripts +├── handlers/ +├── templates/ # Jinja2 templates (nginx vhosts, cron entries, etc.) +└── vars/ + └── main.yml # Mirror paths, logging, distro metadata +``` + +## Variables + +All mirror-related variables are defined in `public-server/mirrors/vars/main.yml`. 
+ +Key variables: + +| Variable | Description | +|----------|-------------| +| `mirrors_base_dir` | Base directory for all mirrors (default: `/srv`) | +| `mirrors_user` | OS user owning mirror data (default: `mirrors`) | +| `mirrors_log_dir` | Directory for per-mirror logs | +| Distro-specific vars | Upstream rsync URLs, local paths, bandwidth limits | + +## Deploying the Mirrors Role + +```bash +# Deploy only the mirrors role +ansible-playbook -i public-server/inventory.yml public-server/main.yaml \ + --limit servers --tags mirrors + +# Deploy mirrors + nginx together +ansible-playbook -i public-server/inventory.yml public-server/main.yaml \ + --limit servers --tags mirrors,nginx +``` + +After deployment, the cron jobs / systemd timers for each distribution will be configured and sync will begin on schedule. + +## Mirror Sync Tools + +### AlmaLinux + +- **Script/tool**: Custom rsync wrapper +- **Source**: `public-server/mirrors/files/distros/AlmaLinux/` +- **Upstream**: `rsync://rsync.repo.almalinux.org/almalinux/` +- **Local path**: `/srv/almalinux/` +- Sync is scheduled via a cron job (or systemd timer) installed by the `almalinux.yaml` task. + +### ArchLinux + +- **Script**: `archlinux-syncrepo.sh` +- **Source**: `public-server/mirrors/files/distros/ArchLinux/repository/scripts/` +- **Upstream**: `rsync://mirror.rackspace.com/archlinux/` +- **Local path**: `/srv/archlinux/` +- The script follows the [ArchLinux Mirror Guidelines](https://wiki.archlinux.org/title/DeveloperWiki:NewMirrors). + +### Debian + +- **Tool**: [ftpsync](https://salsa.debian.org/mirror-team/archvsync) (included at `public-server/mirrors/files/distros/Debian/repository/ftpsync/`) +- **Local paths**: `/srv/debian/`, `/srv/debian-cd/` +- **Check script**: `public-server/mirrors/files/distros/Debian/repository/scripts/debian-mirror-sync-check.sh` — validates that the mirror is up to date and reports errors. 
+
+Running the check script manually:
+
+```bash
+sudo -u mirrors /home/mirrors/debian/scripts/debian-mirror-sync-check.sh
+```
+
+### Fedora
+
+- **Source**: `public-server/mirrors/files/distros/Fedora/`
+- **Upstream**: `rsync://dl.fedoraproject.org/fedora-enchilada/linux/`
+- **Local path**: `/srv/fedora/`
+- Includes configuration to sync only the actively supported Fedora releases.
+
+### Linux Mint
+
+- **Scripts**:
+  - `linuxmint-mirror-sync.sh` — syncs package repositories
+  - `linuxmint-cd-sync.sh` — syncs ISO images
+- **Source**: `public-server/mirrors/files/distros/LinuxMint/`
+- **Upstream**: `rsync://rsync.linuxmint.com/`
+- **Local paths**: `/srv/linux-mint/`, `/srv/linux-mint-cd/`
+
+## Nginx Configuration
+
+The `nginx.yaml` task deploys a virtual host configuration for the mirrors. Each distribution is accessible at:
+
+```
+http://lidsol.fi-b.unam.mx/<distro>/
+https://lidsol.fi-b.unam.mx/<distro>/
+```
+
+Each distribution has its own Nginx `access_log` and `error_log` for easier monitoring. Log paths are configured in `vars/main.yml`.
+
+To check the current Nginx mirror configuration on the server:
+
+```bash
+sudo nginx -T | grep -A 20 "server_name lidsol.fi-b.unam.mx"
+sudo tail -f /var/log/nginx/mirrors-access.log
+```
+
+## rsync Daemon
+
+The `rsync-daemon.yaml` task sets up an `rsyncd` service so that other mirror operators can pull from LIDSOL. This is important for participating in the wider mirror network.
+
+The rsync daemon configuration restricts which modules (paths) are shared, sets bandwidth limits, and enforces read-only access.
+
+To check rsync daemon status:
+
+```bash
+sudo systemctl status rsync
+cat /etc/rsyncd.conf
+```
+
+## Adding a New Mirror
+
+1. **Create the task file**: Add `public-server/mirrors/tasks/<distro>.yaml` following the pattern of an existing task (e.g., `almalinux.yaml`).
+
+2. **Add variables**: Add the new distro's variables (path, upstream URL, log files) to `public-server/mirrors/vars/main.yml`.
+
+3. 
**Add sync scripts**: Place any sync scripts or tools in `public-server/mirrors/files/distros/<distro>/`.
+
+4. **Update nginx.yaml**: Add a new `location` block for the distro's path.
+
+5. **Update rsync-daemon.yaml**: Add a new module to `rsyncd.conf` if the distro should be available for re-mirroring.
+
+6. **Update this document**: Add the new distribution to the [Hosted Distributions](#hosted-distributions) table.
+
+7. **Provision disk space**: Ensure the server has sufficient disk space. Before deploying, estimate the mirror size from the upstream documentation.
+
+8. **Test the sync**:
+   ```bash
+   ansible-playbook -i public-server/inventory.yml public-server/main.yaml \
+     --limit servers --tags mirrors --check
+   ```
+
+## Troubleshooting
+
+**Sync script fails with rsync error**:
+
+```bash
+# Check upstream reachability
+rsync --list-only rsync://<upstream-host>/<module>/
+
+# Run the sync manually as the mirrors user
+sudo -u mirrors /home/mirrors/<distro>/scripts/sync.sh
+```
+
+**Nginx returns 403 Forbidden for a mirror path**:
+
+```bash
+# Check file permissions
+ls -la /srv/<distro>/
+# The mirrors user must own the files; the Nginx worker must be able to read them
+sudo chown -R mirrors:mirrors /srv/<distro>/
+sudo chmod -R o+r /srv/<distro>/
+```
+
+**Mirror is out of date**:
+
+```bash
+# Check when the last sync ran
+sudo journalctl -u rsync -f
+tail -50 /var/log/mirrors/<distro>-sync.log
+
+# Force an immediate sync (starting the .timer alone does not trigger a run)
+sudo systemctl start <distro>-mirror-sync.service  # or run the sync script directly
+```
+
+**Disk space exhausted**:
+
+```bash
+df -h /srv
+du -sh /srv/*   # Check per-distro usage
+# Consider pruning old releases or adding storage
+```
diff --git a/docs/public-server.md b/docs/public-server.md
new file mode 100644
index 0000000..e0430b6
--- /dev/null
+++ b/docs/public-server.md
@@ -0,0 +1,236 @@
+# Public Server
+
+This document describes the configuration of LIDSOL's public-facing server (`lidsol.fi-b.unam.mx`).
It is managed with Ansible and includes four roles: **security**, **network**, **nginx**, and **router**. The mirrors role is covered separately in [mirrors.md](mirrors.md). + +## Table of Contents + +- [Overview](#overview) +- [Inventory](#inventory) +- [Running the Playbook](#running-the-playbook) +- [Roles](#roles) + - [Security](#security) + - [Network](#network) + - [Nginx](#nginx) + - [Router](#router) +- [Network Diagram](#network-diagram) +- [Firewall (nftables)](#firewall-nftables) +- [VPN (WireGuard)](#vpn-wireguard) +- [DNS / DHCP (DNSmasq)](#dns--dhcp-dnsmasq) +- [Troubleshooting](#troubleshooting) + +## Overview + +The public server acts as the perimeter router and mirrors host for LIDSOL. It bridges the university's public network (`132.248.59.0/24`) to the internal lab LAN (`10.8.24.0/24`) and exposes services on `lidsol.fi-b.unam.mx`. + +| Host group | Description | +|------------|-------------| +| `servers` | The main public server; runs mirrors, nginx, security hardening | +| `routers` | The gateway/router; runs routing, firewall, VPN, DNSmasq | + +## Inventory + +File: `public-server/inventory.yml` + +Edit this file to set the correct `ansible_host` (IP address) and `ansible_user` for each host. + +## Running the Playbook + +```bash +# Full run (all roles on all hosts) +ansible-playbook -i public-server/inventory.yml public-server/main.yaml + +# Dry run (no changes applied) +ansible-playbook -i public-server/inventory.yml public-server/main.yaml --check + +# Limit to a single host group +ansible-playbook -i public-server/inventory.yml public-server/main.yaml --limit servers + +# Limit to a single role/tag +ansible-playbook -i public-server/inventory.yml public-server/main.yaml --tags nginx +``` + +## Roles + +### Security + +**Path**: `public-server/security/` + +Applied to all hosts. Hardens the server by: + +1. **SSH hardening** (`tasks/sshd.yml`): + - Disables password authentication (public-key only). + - Disables root login over SSH. 
+ - Restarts `sshd` via a handler only when the configuration changes. + +2. **Unattended upgrades** (`tasks/unattended-upgrades.yml`): + - Installs `unattended-upgrades` and enables automatic security updates. + - Ensures that critical OS patches are applied without operator intervention. + +> **Important**: Before applying this role to a new server, ensure your SSH public key is present in `~/.ssh/authorized_keys` on the target host. Once password auth is disabled, key-based access is the only way in. + +### Network + +**Path**: `public-server/network/` + +Optimizes the network stack for high-throughput workloads (e.g., mirror syncing). + +1. **IRQ balancing** (`tasks/irqbalance.yaml`): + - Installs and enables `irqbalance` to distribute hardware interrupts across CPU cores. + +2. **Kernel tuning** (`tasks/sysctl.yaml`): + - Sets `net.core` and `net.ipv4` kernel parameters for improved TCP performance. + - Parameters are persisted in `/etc/sysctl.d/`. + +3. **Interface configuration** (`tasks/interfaces.yaml`): + - Configures network interfaces as needed. + +### Nginx + +**Path**: `public-server/nginx/` + +Sets up the Nginx web server and reverse proxy for public-facing services. + +- Installs Nginx. +- Configures virtual hosts for `lidsol.fi-b.unam.mx`. +- Obtains and renews TLS certificates via **Certbot** (Let's Encrypt). +- Serves as the front end for the Linux distribution mirrors (see [mirrors.md](mirrors.md)). + +**Useful commands on the server**: + +```bash +sudo systemctl status nginx +sudo nginx -t # Test configuration syntax +sudo certbot renew --dry-run # Test certificate renewal +``` + +### Router + +**Path**: `public-server/router/` + +Configures the gateway node to route traffic between the public internet and the internal LAN. 
+ +| Task file | Purpose | +|-----------|---------| +| `tasks/main.yaml` | Orchestrates all router sub-tasks | +| `files/armbian.yaml` | Netplan network configuration (WAN + LAN bridge) | +| `files/lan.conf` | DNSmasq LAN DNS/DHCP configuration | +| `files/nftables.conf` | Main nftables firewall rules | +| `files/nftables-override.conf` | Additional nftables rules (overrides) | +| `files/99-router.conf` | Router-specific sysctl tuning | +| `files/wg0.conf` | WireGuard VPN tunnel configuration | + +Sub-tasks performed: + +- **Netplan**: Applies the `armbian.yaml` network configuration file and runs `netplan apply`. +- **DNSmasq**: Deploys `lan.conf` to provide DHCP and local DNS resolution on the LAN. +- **nftables**: Deploys firewall rules and enables the `nftables` service. +- **WireGuard**: Deploys `wg0.conf` and enables the `wg-quick@wg0` service. + +## Network Diagram + +``` +Internet (132.248.59.0/24) + │ + │ WAN: 132.248.59.72/24 + │ GW: 132.248.59.254 + ▼ +┌───────────────────────────────┐ +│ Router / Gateway │ +│ nftables (NAT + stateful) │ +│ DNSmasq (DHCP/DNS) │ +│ WireGuard (VPN, wg0) │ +└──────────────┬────────────────┘ + │ LAN bridge br0 + │ 10.8.24.0/24 + ▼ + ┌───────────────────┐ + │ Internal LAN │ + │ K3s cluster │ + │ (10.8.24.x) │ + └───────────────────┘ +``` + +## Firewall (nftables) + +Rules are defined in `public-server/router/files/nftables.conf`. The firewall implements: + +- **Stateful packet filtering**: Established/related connections are automatically allowed. +- **NAT masquerade**: Internal LAN traffic is masqueraded to the WAN IP when going out to the internet. +- **Port forwarding**: Specific ports can be forwarded from the WAN to internal hosts. +- **Default drop policy**: Any traffic not explicitly allowed is dropped. 
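Those four behaviors can be sketched in nftables syntax. This is an illustrative fragment only, not the deployed ruleset: the interface names `eth0` (WAN) and `br0` (LAN) follow the diagram above, the open ports are assumptions, and the authoritative rules live in `files/nftables.conf`:

```
# Illustrative sketch — see public-server/router/files/nftables.conf for the real rules.
table inet filter {
  chain input {
    type filter hook input priority 0; policy drop;   # default drop
    ct state established,related accept               # stateful filtering
    iifname "br0" accept                              # trust the LAN
    tcp dport { 22, 80, 443 } accept                  # assumed: sshd + nginx
  }
  chain forward {
    type filter hook forward priority 0; policy drop;
    ct state established,related accept
    iifname "br0" oifname "eth0" accept               # LAN → internet
  }
}
table ip nat {
  chain postrouting {
    type nat hook postrouting priority 100;
    oifname "eth0" masquerade                         # NAT for the LAN
  }
}
```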
+ +To inspect active rules on the router: + +```bash +sudo nft list ruleset +``` + +To reload rules after a change (applied by Ansible via the `nftables` handler): + +```bash +sudo systemctl reload nftables +``` + +## VPN (WireGuard) + +WireGuard is used to establish secure tunnels between the lab's infrastructure and remote peers. The configuration is stored in `public-server/router/files/wg0.conf`. + +> **Note**: `wg0.conf` contains private keys. This file must **never** be committed to the repository with real key material. Use Ansible Vault to encrypt sensitive values, or manage the file out-of-band. + +**WireGuard management commands**: + +```bash +sudo wg show # Show current VPN status and peers +sudo wg-quick up wg0 # Bring the tunnel up manually +sudo wg-quick down wg0 # Bring the tunnel down +sudo systemctl status wg-quick@wg0 +``` + +## DNS / DHCP (DNSmasq) + +DNSmasq provides DHCP leases and local DNS resolution for the `10.8.24.0/24` LAN. Configuration: `public-server/router/files/lan.conf`. + +Key settings: + +- **DHCP range**: Defined in `lan.conf` (check the file for the exact range). +- **DNS forwarding**: Upstream DNS servers are configured for external name resolution. +- **Static leases**: Cluster nodes (hp-alpha, hp-beta, gamma) have static DHCP leases tied to their MAC addresses. + +To check DNSmasq status on the router: + +```bash +sudo systemctl status dnsmasq +sudo journalctl -u dnsmasq -f +``` + +## Troubleshooting + +**Cannot connect via SSH after running security role**: +- Verify your public key was in `authorized_keys` before the playbook ran. +- Try connecting from the console (KVM/IPMI) to recover access. 
+
+**Nginx returns 502 Bad Gateway**:
+```bash
+sudo nginx -t  # Check config syntax
+sudo systemctl status nginx
+sudo journalctl -u nginx -f
+# A 502 usually means the proxied upstream service is down or unreachable —
+# check the status and logs of the backend the vhost proxies to as well
+```
+
+**WireGuard tunnel not coming up**:
+```bash
+sudo journalctl -u wg-quick@wg0 -f
+sudo wg show  # Check handshake timestamps
+```
+
+**nftables rules not applied after reboot**:
+```bash
+sudo systemctl enable nftables
+sudo systemctl start nftables
+sudo nft list ruleset
+```
+
+**Netplan configuration errors**:
+```bash
+sudo netplan try  # Apply with 120-second rollback safety
+sudo netplan apply  # Apply permanently
+```
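When debugging netplan issues it helps to know the shape of the expected configuration. A minimal sketch of the WAN + LAN bridge layout that `files/armbian.yaml` implements, using the addresses from the network diagram (the interface names `eth0`/`eth1` and the router's LAN address are assumptions — consult the actual file):

```
# Illustrative only; the deployed config is public-server/router/files/armbian.yaml.
network:
  version: 2
  ethernets:
    eth0:                          # WAN uplink
      addresses: [132.248.59.72/24]
      routes:
        - to: default
          via: 132.248.59.254
    eth1: {}                       # LAN port, enslaved to the bridge
  bridges:
    br0:
      interfaces: [eth1]
      addresses: [10.8.24.1/24]    # assumed router address on the LAN
```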