nas: 4 new CLI commands + --group on model list (G4) #474

Merged
digaobarbosa merged 5 commits into main from peter/nas-cli on May 7, 2026

@probicheaux probicheaux commented May 6, 2026

Description

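Wraps the new public API routes from the agentic-surface-area onsite plan as CLI commands. Companion backend PRs (all live on staging):

  • roboflow#11603 (G1, validator)
  • roboflow#11605 (G6, projection + ?group=)
  • roboflow#11610 (G2, public train cancel/stop + favorite)
  • roboflow#11612 (G3, training results)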

New CLI commands

roboflow train cancel <project>/<version> [--continue-if-no-refund]
roboflow train stop   <project>/<version>
roboflow train results <project>/<version>
roboflow model star   <model-id> [--unstar]

Plus roboflow model list -p <project> -g/--group <modelGroup> — the canonical "list NAS models per run" path. When --group is set, the list command hits the public /models endpoint with the enriched projection (hardware/latency/map5095/paretoOptimalFor/recommended) instead of walking versions via the SDK.
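
As a rough illustration of the --group switch, the sketch below builds the request URL an adapter call such as list_project_models might use. The path layout (`<api_url>/<workspace>/<project>/models`) and the helper name `build_models_url` are assumptions made for this example; the PR only states that the public /models endpoint accepts `?group=`.

```python
from urllib.parse import quote, urlencode


def build_models_url(api_url, workspace, project, group=None):
    """Build a /models URL, adding ?group= only when a model group is given.

    Hypothetical helper: the path layout is an assumption, not confirmed
    by the PR description.
    """
    base = "/".join(
        [
            api_url.rstrip("/"),
            quote(workspace, safe=""),
            quote(project, safe=""),
            "models",
        ]
    )
    if group:
        # urlencode escapes the value so it is safe in a query string.
        return base + "?" + urlencode({"group": group})
    return base
```

Without `group`, the helper returns the bare /models URL; per the description, the real CLI walks versions via the SDK in that case rather than calling this endpoint.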

Adapter additions

In roboflow/adapters/rfapi.py:

  • cancel_version_training, stop_version_training
  • get_training_results
  • list_project_models (with optional group=), get_model_by_url
  • favorite_nas_model

How tested

Unit tests

+13 cases across test_train_handler.py and test_model_handler.py covering register, happy paths, the 409/CANNOT_CANCEL hint surfacing, the MODEL_NOT_NAS hint, the unstar flow, and the --group endpoint switch. 298/298 CLI tests pass locally; ruff check + ruff format clean.

CLI-COMMANDS.md got two new sections — Train, monitor, cancel, stop and NAS models — list, star, deploy.

E2E verified live on staging

Driven against api.roboflow.one on peter-robicheaux/beer-can-hackathon with API_URL=https://api.roboflow.one:

train results on a NAS parent (v410, 52 children):

$ roboflow --json train results peter-robicheaux/beer-can-hackathon/410 | jq '{jobType, modelGroup, modelCount, recommendedByHardware}'
{
  "jobType": "nas",
  "modelGroup": "pVYKOWUB6AUIVJMgPc7u-410-rfdetrNasGroup",
  "modelCount": 52,
  "recommendedByHardware": {"gpu": "C6KmrxA6h85kMmamGnsS"}
}

model list --group rendered a 53-row leaderboard:

$ roboflow model list -p peter-robicheaux/beer-can-hackathon -g pVYKOWUB6AUIVJMgPc7u-410-rfdetrNasGroup
URL                                                 TYPE        HARDWARE  LATENCY  MAP50  MAP5095  REC
peter-robicheaux/beer-can-hackathon-410-nas-gpu-…  rfdetr-nas                      100
peter-robicheaux/beer-can-hackathon-410-nas-gpu-b  rfdetr-nas  gpu       4.36     100    83.97
peter-robicheaux/beer-can-hackathon-410-nas-gpu-…  rfdetr-nas  gpu       1.09     100    62.57    ★
... 50 more rows

model star on a real NAS child:

$ roboflow --json -w peter-robicheaux model star 14CwSGmGetWh6rB0EnjL
{"success": true, "model": {"id": "14CwSGmGetWh6rB0EnjL", "favorites": {...}}}
$ roboflow --json -w peter-robicheaux model star 14CwSGmGetWh6rB0EnjL --unstar
{"success": true, ...}

train cancel on a finished version (text mode shows the actionable hint):

$ roboflow train cancel peter-robicheaux/beer-can-hackathon/318
Error: Cannot cancel non-running train job.
  Hint: Cancel only applies to in-flight runs. Check status with
        'roboflow train results <project>/<version>'.

E2E exposed one error-shape mismatch on the backend (flat {error: "Conflict", message: ...} lost the descriptive message through the CLI's parser); fixed in roboflow#11610 (nested {error: {message, code, type}}).
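
To make the mismatch concrete, here is a minimal sketch of a client-side parser that tolerates both shapes. It is not the CLI's actual parser (the fix described above landed on the backend), and the function name `extract_error` is hypothetical.

```python
def extract_error(body):
    """Return (message, code) from either error shape.

    Hypothetical helper for illustration only.
    Nested shape: {"error": {"message": ..., "code": ..., "type": ...}}
    Flat shape:   {"error": "Conflict", "message": ...}
    """
    err = body.get("error")
    if isinstance(err, dict):
        # Nested shape: the descriptive message and code live under "error".
        return err.get("message"), err.get("code")
    # Flat shape: prefer the top-level message, fall back to the error label.
    return body.get("message") or err, None
```

A parser that only reads `error.message` would drop the descriptive text from the flat shape, which matches the symptom seen during E2E.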

🤖 Generated with Claude Code

Wraps the new public API routes from the agentic-surface-area onsite plan:

  roboflow train cancel <project>/<version> [--continue-if-no-refund]
  roboflow train stop   <project>/<version>
  roboflow train results <project>/<version>
  roboflow model star   <model-id> [--unstar]

Plus extends `roboflow model list -p <project>` with `-g/--group <modelGroup>`,
the canonical "list NAS models per run" path. When --group is set, the list
command hits the public /models endpoint (full enriched projection: hardware,
latency, map5095, paretoOptimalFor, recommended ★) instead of walking versions
via the SDK.

Adapter additions in roboflow/adapters/rfapi.py:
  cancel_version_training, stop_version_training, get_training_results,
  list_project_models (with optional group), get_model_by_url,
  favorite_nas_model

Backend companions:
  - roboflow#11603 (G1, validator)
  - roboflow#11605 (G6, projection + ?group=)
  - roboflow#11610 (G2, public train cancel/stop + favorite)
  - roboflow#11612 (G3, training results)

Tests: +13 cases across test_train_handler.py and test_model_handler.py
covering register, success paths, 409 + MODEL_NOT_NAS hint surfacing,
unstar flow, and the --group endpoint switch. All 298 CLI tests pass
locally; ruff check + ruff format clean.

CLI-COMMANDS.md updated with two new sections (train lifecycle + NAS
list/star/deploy).

E2E: driven against staging (api.roboflow.one) on
peter-robicheaux/beer-can-hackathon:
  - `train results .../410` returned full NAS bundle (52 models,
    recommendedByHardware, modelGroup)
  - `model list -p ... -g <modelGroup>` rendered 53-row leaderboard
    table with HARDWARE / LATENCY / MAP50 / MAP5095 / REC columns
  - `model star 14CwSGmGetWh6rB0EnjL` → success, favorites reflected
  - `model star --unstar` flips state
  - `train cancel .../318` (finished version) → 409 surfaces hint
    "Cancel only applies to in-flight runs."

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@digaobarbosa digaobarbosa self-requested a review May 7, 2026 14:24
Comment thread roboflow/cli/handlers/model.py Outdated
str,
typer.Argument(
help=(
"NAS-trained model id (Firestore document id). "

I don't think we should mention Firestore here. This text is shown to the user.

Comment thread roboflow/cli/handlers/model.py
tonylampada previously approved these changes May 7, 2026
output_error(args, msg, hint=hint, exit_code=3)
return

output(

[Medium] train cancel may report success even when the server did not cancel

  • File: roboflow/cli/handlers/train.py:338-342 (refund semantics documented at L92-95).
  • _cancel always emits status: "cancelled" and Training cancelled ... after any 2xx, but the documented default behavior is for the server to reply refund:false without cancelling unless --continue-if-no-refund is passed.
  • Why it matters: agents/scripts will treat the CLI result as confirmation that an in-flight paid run was cancelled when the server may have only returned a refund-window check. Behavioral correctness issue on a destructive command.
  • Fix: derive status from the payload — if cancelled:false or refund:false, output a distinct status: "not_cancelled" and surface a hint to rerun with --continue-if-no-refund. Add a unit test for that exact response.
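
A minimal sketch of the suggested fix, assuming the 2xx payload carries the `cancelled` and `refund` fields named above. The exact response shape is not shown in this thread, and `cancel_status` is a hypothetical helper rather than the code in train.py.

```python
def cancel_status(payload):
    """Derive the reported status from the server payload, not from the HTTP code.

    Assumption: the payload may carry boolean "cancelled" and "refund" fields.
    """
    if payload.get("cancelled") is False or payload.get("refund") is False:
        return {
            "status": "not_cancelled",
            "hint": "Rerun with --continue-if-no-refund to cancel without a refund.",
        }
    return {"status": "cancelled"}
```

The checks use `is False` so that a payload missing either field is still reported as cancelled. If the server returns refund:false together with cancelled:true when --continue-if-no-refund is passed, the rule would need to give `cancelled` precedence.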

return response.json()


def get_model_by_url(api_key: str, workspace_url: str, model_url: str):

Claude-only — [Medium] get_model_by_url is dead code

  • File: roboflow/adapters/rfapi.py:154-161.
  • No CLI command calls it. Either remove or comment as scaffolding for a follow-on PR.


Can delete this

The public favorite endpoint now accepts the model URL slug
(roboflow#11646), so the CLI can drop the Firestore-doc-id wart.

Changes:
- star_model argument is now `model_url`, accepting either the bare
  slug (when -w is set / a default workspace exists) or the
  workspace-prefixed form `<ws>/<slug>` — same shape as `model get`.
- rfapi.favorite_nas_model parameter renamed `model_id` → `model_url`
  with urllib.parse.quote() for safety, since the slug is now what
  appears in the path.
- Hints updated to point at models[].modelUrl instead of modelId, and
  the workspace fallback hint mentions the prefix form.
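
The parsing described above can be sketched roughly as follows. `resolve_model_url` is a hypothetical name, and the real handler may treat edge cases differently.

```python
from urllib.parse import quote


def resolve_model_url(model_url, workspace=None):
    """Split '<ws>/<slug>' or fall back to the given workspace for a bare slug.

    Hypothetical helper; returns (workspace, path-safe slug).
    """
    if "/" in model_url:
        # Workspace-prefixed form: the prefix overrides any -w value.
        ws, slug = model_url.split("/", 1)
    else:
        ws, slug = workspace, model_url
    if not ws:
        raise ValueError("No workspace: pass -w or use the <ws>/<slug> form.")
    # safe="" also escapes "/" so the slug cannot add extra path segments.
    return ws, quote(slug, safe="")
```

Escaping with `safe=""` is a choice made for this sketch; the commit only says that urllib.parse.quote() is applied to the slug.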

Tests: +2 cases for the new parsing (workspace-prefixed URL vs bare
slug + -w fallback). 22/22 model handler tests pass; 36/36 across
model + train.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirrors the backend cleanup in roboflow#11646. The training-results
fixture now uses the public shape (trainingId is workspace/project/
version, models[].modelUrl, recommendedByHardware values are URL
slugs).

No behavior change in the CLI handler — it passes the response through.
@probicheaux

✅ E2E verified against api.roboflow.one (backend roboflow#11646 deployed to staging via light-v2-api).

Branch installed via pip install -e . into a clean venv; commands run with API_URL=https://api.roboflow.one.

roboflow --json train results peter-robicheaux/beer-can-hackathon/410 (52-model NAS run):

{
  "trainingId": "peter-robicheaux/beer-can-hackathon/410",
  "versionId": "410",
  "status": "finished",
  "jobType": "nas",
  "modelGroup": "pVYKOWUB6AUIVJMgPc7u-410-rfdetrNasGroup",
  "modelCount": 52,
  "recommendedByHardware": { "gpu": "beer-can-hackathon-410-nas-gpu-ec8a0e" },
  "models": [{ "modelUrl": "beer-can-hackathon-410-nas-gpu-066866", "modelType": "rfdetr-nas", "metrics": {...} }]
}

trainingId is the slug form, no datasetId, recommendedByHardware value is a URL slug, every models[] row has modelUrl only.
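
For scripted use, a caller could pull the recommended slug out of this payload and pass it to `roboflow model star`. The sketch below assumes the response shape shown above (with the `metrics` field omitted); `recommended_slug` is a hypothetical helper.

```python
def recommended_slug(results, hardware):
    """Return the recommended model slug for a hardware target, or None.

    Hypothetical helper; assumes the 'recommendedByHardware' mapping shown
    in the train results output.
    """
    return (results.get("recommendedByHardware") or {}).get(hardware)


# Trimmed copy of the payload above, used only as sample input.
sample = {
    "trainingId": "peter-robicheaux/beer-can-hackathon/410",
    "jobType": "nas",
    "modelCount": 52,
    "recommendedByHardware": {"gpu": "beer-can-hackathon-410-nas-gpu-ec8a0e"},
}

gpu_slug = recommended_slug(sample, "gpu")
```

A missing hardware key yields `None` rather than raising, so callers can branch on it before invoking the star command.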

roboflow --json -w peter-robicheaux model star beer-can-hackathon-410-nas-gpu-ec8a0e

  • Star: 200, response body is the public projection only — no id, dataset, owner, or projects[].
  • Unstar (--unstar): 200, favorites[<uid>] = false. Round-trip clean.

roboflow --json -w peter-robicheaux model star bogus-not-a-real-slug

  • Output: {"error": {"type": "NotFound", "message": "Model not found in this workspace", "code": "MODEL_NOT_IN_WORKSPACE", "hint": "Verify the model URL and workspace. The slug is the same value 'roboflow train results' returns as models[].modelUrl."}}
  • Exit code: 3 (not-found).

probicheaux and others added 2 commits May 7, 2026 12:20
Mirrors the wire rename in roboflow#11646. The public API field for
the opaque model identifier is now `modelId` (the value is still the
URL slug; that's an implementation detail callers shouldn't have to
reason about).

Changes:
- `roboflow model star` argument: `model_url` → `model_id`. Help text
  and error hints updated to point at `models[].modelId`.
- `rfapi.favorite_nas_model(model_url=...)` → `favorite_nas_model(
  model_id=...)`. Internal local var becomes `public_model_id` to
  keep the call-site readable.
- Test fixtures: `model_url` arg → `model_id`, `models[].modelUrl` →
  `models[].modelId`. 36/36 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@digaobarbosa digaobarbosa merged commit 2d1acde into main May 7, 2026
13 checks passed

3 participants