* feat: add CUDA/gsplat environment check script
Add scripts/gsplat_check — a lightweight Python tool (managed by uv) that
verifies whether the current device can run the gsplat 3DGS training backend.
Checks performed:
- CUDA GPU detection via nvidia-smi + PyTorch tensor smoke-test
- gsplat library import and rasterization kernel validation (8 Gaussians)
- External tool availability (nvidia-smi, python3, ffmpeg, colmap)
Reports a structured pass/fail verdict similar to the Rust preflight binary.
Usage: cd scripts/gsplat_check && uv run main.py
Also adds a reference to the new tool in the root README documentation section.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* feat: add Azure Container App Job infrastructure for GPU 3DGS processing
Add complete azd-based infrastructure for running the 3DGS video processor
as a GPU Container App Job on Azure:
Infrastructure (infra/):
- Bicep modules for ACR, Storage, Container Apps Environment (GPU T4 profile),
Container Apps Job, Managed Identity, Log Analytics, and RBAC
- Standalone RBAC deployment (infra/rbac/) for privileged user separation
- Parameter bindings for azd environment variables
Scripts (infra/scripts/):
- assign-rbac.sh / verify-rbac.sh / cleanup-rbac.sh for RBAC management
- hooks/acr-build.sh: builds GPU image via ACR Tasks with minimal staging dir
- hooks/preprovision.sh: captures deployer identity, RBAC preflight
- hooks/postprovision.sh: builds image and updates job post-provision
- run-job.sh: start job with --wait/--logs support
- deploy-job.sh: rebuild and redeploy image
- upload-testdata.sh: upload South Building test videos to blob storage
Bug fixes required for GPU batch mode:
- src/azure/sdk.rs: pass AZURE_CLIENT_ID to ManagedIdentityCredential for
user-assigned managed identity support in Container Apps
- src/backends/gsplat.rs: fix COLMAP sparse dir and images dir resolution
for batch mode (TEMP_PATH-based layout vs workspace-relative)
- src/backends/gsplat.rs: add inline PLY-to-SPLAT converter fallback when
external converter tools are unavailable
- scripts/gsplat_train.py: fix cameras.bin parser to read correct number of
parameters per camera model instead of reading to EOF
- .dockerignore: exclude .venv dirs, output/, infra/ from build context;
keep Dockerfile for ACR Tasks
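
For context on the inline PLY-to-SPLAT fallback listed above, here is a minimal stdlib-only sketch of packing one Gaussian into the 32-byte record commonly used by `.splat` viewers. The converter in src/backends/gsplat.rs is not shown in this commit message, so the record layout, the PLY property names in the parameter list, and the name `pack_splat_record` are assumptions for illustration, not the repository's implementation.

```python
import math
import struct

# Zeroth-order spherical harmonic constant used to turn SH DC terms into RGB.
SH_C0 = 0.28209479177387814


def _sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def _to_byte(value):
    """Truncate to an integer and clamp into the uint8 range."""
    return max(0, min(255, int(value)))


def pack_splat_record(position, log_scales, f_dc, opacity_logit, rotation):
    """Pack one Gaussian as 3 x float32 position, 3 x float32 scale,
    4 x uint8 RGBA, 4 x uint8 quaternion (assumed layout, 32 bytes)."""
    scales = [math.exp(s) for s in log_scales]            # PLY stores log-scales
    rgb = [_to_byte((0.5 + SH_C0 * c) * 255) for c in f_dc]
    alpha = _to_byte(_sigmoid(opacity_logit) * 255)       # PLY stores a logit
    norm = math.sqrt(sum(q * q for q in rotation)) or 1.0  # avoid divide-by-zero
    quat = [_to_byte(q / norm * 128 + 128) for q in rotation]
    return struct.pack("<3f3f4B4B", *position, *scales, *rgb, alpha, *quat)
```

A full converter would also need to parse the PLY header and iterate over the vertex rows; that part is omitted here.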
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* fix: disable ANSI color codes in container log output
Use std::io::IsTerminal to detect non-TTY environments (containers,
log aggregators) and disable ANSI escape sequences. This ensures
clean text output in Azure Container Apps Log Analytics and other
log collection systems.
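
The commit itself uses Rust's std::io::IsTerminal. The same idea can be sketched in Python for the repository's helper scripts; the function name `colorize` and its parameters are hypothetical and do not appear in the repository.

```python
import sys


def colorize(text, ansi_code, stream=None):
    """Wrap text in an ANSI color sequence only when the stream is a terminal."""
    stream = sys.stdout if stream is None else stream
    isatty = getattr(stream, "isatty", None)
    if callable(isatty) and isatty():
        return f"\x1b[{ansi_code}m{text}\x1b[0m"
    # Non-TTY targets (containers, log aggregators) get plain text.
    return text
```

For example, `print(colorize("ok", 32))` would emit green text in an interactive shell and plain `ok` when stdout is redirected to a log collector.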
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* docs: add Azure Container Apps Job GPU deployment guide
Add comprehensive section to DEPLOYMENT.md covering:
- Quick start steps for azd-based GPU deployment
- What resources get provisioned (table of Bicep modules)
- Detailed RBAC requirements with specific permissions, role IDs,
and clear guidance on what fails without each role
- Deployer vs Managed Identity permission separation
- Test data upload and job execution instructions
- Configuration variables and GPU region availability
- Scripts reference with privilege requirements
- Troubleshooting guide for common failures
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* docs: add changelog for Azure Container Apps GPU infrastructure
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* fix: enable BuildKit inline cache for ACR builds
Add BUILDKIT_INLINE_CACHE=1 build arg to embed cache metadata in
pushed images. ACR Tasks don't persist Docker layer cache between
runs (each run gets a fresh VM), but the inline cache metadata lets
BuildKit-aware builders reuse layers from the pushed image when it
is passed via --cache-from.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* fix: address PR #14 review feedback (7 items)
1. gsplat_train.py: Fix COLMAP camera model parameter counts
(OPENCV_FISHEYE=8, FULL_OPENCV=12, FOV=5, THIN_PRISM_FISHEYE=12).
Fail fast on unknown model_id instead of defaulting to 4.
2. gsplat.rs: Fix PLY-to-SPLAT converter fallback order — try
configured converter binary before inline Python fallback.
3. gsplat.rs: Remove unused numpy import from inline Python script;
keep it stdlib-only (struct + sys) for maximum portability.
4. run-job.sh: Apply BATCH_INPUT_PREFIX to job env vars via
az containerapp job update before starting execution.
5. storage.bicep: Default allowSharedKeyAccess to false; only enable
when useStorageKeys=true (reduces blast radius of leaked keys).
6. main.bicep + storage.bicep: Wire storageConnectionString from
storage module to job module when useStorageKeys=true, preventing
empty-secret deployment failures.
7. sdk.rs: Filter empty AZURE_CLIENT_ID strings before constructing
ManagedIdentityCredentialOptions to prevent confusing auth errors.
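
Item 1 can be illustrated with a minimal sketch of a cameras.bin reader that reads exactly the number of parameters each camera model defines and fails fast on an unknown model ID. This is not the code in scripts/gsplat_train.py; the names `CAMERA_MODEL_NUM_PARAMS`, `read_cameras_bin`, and `_read_exact` are hypothetical. The record layout (a uint64 camera count, then per camera two int32 values and two uint64 values followed by float64 parameters) and the parameter counts follow COLMAP's published binary format and camera model definitions.

```python
import io
import struct

# Number of float64 parameters per COLMAP camera model ID.
CAMERA_MODEL_NUM_PARAMS = {
    0: 3,    # SIMPLE_PINHOLE
    1: 4,    # PINHOLE
    2: 4,    # SIMPLE_RADIAL
    3: 5,    # RADIAL
    4: 8,    # OPENCV
    5: 8,    # OPENCV_FISHEYE
    6: 12,   # FULL_OPENCV
    7: 5,    # FOV
    8: 4,    # SIMPLE_RADIAL_FISHEYE
    9: 5,    # RADIAL_FISHEYE
    10: 12,  # THIN_PRISM_FISHEYE
}


def _read_exact(stream, size):
    """Read exactly `size` bytes or raise instead of silently truncating."""
    data = stream.read(size)
    if len(data) != size:
        raise ValueError(f"unexpected end of file: wanted {size} bytes, got {len(data)}")
    return data


def read_cameras_bin(stream):
    """Parse a COLMAP cameras.bin stream into a dict keyed by camera ID."""
    (num_cameras,) = struct.unpack("<Q", _read_exact(stream, 8))
    cameras = {}
    for _ in range(num_cameras):
        camera_id, model_id, width, height = struct.unpack(
            "<iiQQ", _read_exact(stream, 24)
        )
        if model_id not in CAMERA_MODEL_NUM_PARAMS:
            # Fail fast rather than guessing a parameter count.
            raise ValueError(f"unknown COLMAP camera model_id {model_id}")
        num_params = CAMERA_MODEL_NUM_PARAMS[model_id]
        params = struct.unpack(f"<{num_params}d", _read_exact(stream, 8 * num_params))
        cameras[camera_id] = {
            "model_id": model_id,
            "width": width,
            "height": height,
            "params": params,
        }
    return cameras
```

Reading a fixed count per model keeps the stream position aligned for the next camera record, which reading to EOF cannot do when a file contains more than one camera.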
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* feat: add local Docker build+push for fast incremental rebuilds
Add local-build.sh as alternative to acr-build.sh for development:
- Uses Docker BuildKit with persistent layer cache
- Incremental rebuilds (src/ change only) take ~2.5 min vs ~35 min on ACR
- Push uses delta layers — only changed layers uploaded (~12s vs full push)
- deploy-job.sh supports --local flag to select build method
Build comparison (src/ change only):
ACR Tasks: ~35 min (no cache between runs, fresh VM each time)
Local+push: ~2.5 min build + ~12s push = ~3 min total
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---------
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
- A custom role with `Microsoft.Authorization/roleAssignments/write` permission

> **If the deployer does not have these permissions**, the `azd provision` step will
> succeed but the job will fail at runtime with authentication errors (HTTP 403 on
> Storage or image pull failures on ACR). Have a privileged user run the RBAC scripts.

#### Assigning RBAC Roles

```bash
# Assign roles via Azure CLI (reads values from azd env automatically)
./infra/scripts/assign-rbac.sh

# Or assign via Bicep deployment (alternative)
./infra/scripts/assign-rbac.sh --use-bicep
```

#### Verifying RBAC Roles

```bash
# Check that both roles are assigned
./infra/scripts/verify-rbac.sh
```

Expected output when roles are correctly assigned:
```
🔍 Verifying RBAC role assignments for Managed Identity...
Principal ID: <managed-identity-principal-id>

✅ AcrPull on Container Registry
✅ Storage Blob Data Contributor on Storage Account

✅ All RBAC role assignments are in place.
```

If any roles are missing:
```
❌ AcrPull on Container Registry — MISSING

⚠️ 1 RBAC role assignment(s) missing.
Run './infra/scripts/assign-rbac.sh' as a privileged user to fix.
```
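
When the verification step runs in automation, the output format shown above can be scanned for missing roles. The following is a small sketch that assumes the script prints one line per role exactly as in the samples above; the function name `find_missing_roles` is hypothetical and is not part of the repository.

```python
def find_missing_roles(output):
    """Return role names from verify-rbac.sh output lines flagged as MISSING."""
    missing = []
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("❌") and line.endswith("MISSING"):
            # The role name sits between the marker and the " on <resource>" part.
            role = line[len("❌"):].split(" on ", 1)[0].strip()
            missing.append(role)
    return missing
```

An empty list would indicate that no role line was flagged. It does not by itself prove the script ran successfully, so checking the script's exit status as well seems prudent.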

#### What Fails Without RBAC

| Missing Role | Symptom |
|--------------|---------|
| **AcrPull** | Job execution fails immediately — Container Apps cannot pull the image. The execution shows `Failed` status with no container logs (image never starts). |
| **Storage Blob Data Contributor** | Container starts but fails with `Failed to download input blobs from Azure Blob Storage` — the managed identity token is rejected with HTTP 403. |

### Uploading Test Data

The South Building dataset (128 multi-view images from UNC Chapel Hill) is used for testing.
The upload script downloads it if needed, creates 3 test videos, and uploads them:

```bash
# Download test data + upload to blob storage (default prefix: south_building/)
```