smp-server: support namespaces#1784
Open
shumvgolove wants to merge 31 commits into
Open
Conversation
13bb6f5 to
a6b960b
Compare
a6b960b to
236e69d
Compare
Field additions in NameRecord will be gated by SMP version bumps, matching the IDS QIK precedent at Protocol.hs:1912-1979. The separate nrVersion field created a redundant version axis. Add unNameOwner and unNameLink so downstream consumers can read back the underlying bytes/text without the bare constructors.
Add Resolver party, RSLV command, NAME response, NameRecord (with NameOwner/NameLink newtypes + smart ctors), LookupKey, and version constant namesSMPVersion = 20 in Transport.hs. Bump current/proxied SMP versions 19/18 -> 20. Includes minimal stub clauses in vc/processCommand so the build passes; proper forwarded gate and handler land in steps 2-3.
Add Bool forwarded arg to verifyTransmission/verifyLoadedQueue/ verifyQueueTransmission so vc SResolver (RSLV _) can distinguish direct (reject CMD PROHIBITED) from forwarded (verify, defer auth to handler). Direct path passes False; forwarded path (rejectOrVerify) passes True.
Add Cmd SResolver (RSLV _) to the forwarded-command whitelist in rejectOrVerify so RSLV can be processed inside PFWD. Relocate incStat from Server.hs to Server/Stats.hs and export it so the upcoming Resolver module can use it. processCommand still returns ERR AUTH for RSLV — real handler lands once Env, Stats, and Resolver subtree are in place (steps 4-7).
Add Server/Names.hs skeleton (NamesConfig, RpcAuth, NamesEnv, ResolveError, newNamesEnv, closeNamesEnv, stub resolveName). Real implementation lands in step 5. Add namesConfig field to ServerConfig and namesEnv field to Env. newEnv constructs NamesEnv when [NAMES] enable: on, and refuses startup when combined with allowSMPProxy unless allow_dangerous_colocation is set. closeServer now closes the NamesEnv on shutdown. Add [NAMES] section parser (readNamesConfig) with parseEthAddr (0x-prefixed 40 hex, EIP-55 check deferred to step 5) and parseRpcAuth (bearer / basic). Default INI template carries the [NAMES] block (enable: off) with documented keys.
Add Server/Names/Eth/SNRC.hs: Keccak-256 namehash (NOT SHA3-256), SNRC selector, encodeGetRecord, and bounded ABI primitives (decodeWord256Int64, decodeAddress, decodeString, decodeStringArray) enforcing the 8 safety invariants. decodeGetRecord is a placeholder that maps non-zero owners to Nothing pending the Part 1 contract ABI; the primitives are production-ready. Add Server/Names/Eth/RPC.hs: EthRpcEnv with http-client Manager, http-client-tls for HTTPS, QSem-bounded concurrency, brReadSome-bounded response body, Basic/Bearer Authorization, JSON-RPC envelope, and scrubUrl that strips userinfo from log strings. Add Server/Names/Resolver.hs: NamesConfig, ResolveError, NamesEnv with HashPSQ cache (TTL + byte cap + FIFO eviction) and inflight TMap. resolveName coalesces concurrent identical requests under E.mask so the TMap is always cleaned up; fetchOnceTimed bounds RPCs by rpcTimeoutMs. Refactor Server/Names.hs into a thin façade re-exporting from Resolver (types/functions) and Eth.RPC (RpcAuth) — keeps the import surface stable for Env/STM.hs and Main.hs while the implementation modules break the cyclic import. Cabal: add http-client, http-client-tls, network-uri, psqueues deps; expose the three new Server.Names.* modules in the !flag(client_library) block.
Add NameResolverStats and NameResolverStatsData sub-records to ServerStats with seven counters: reqs, succ, notFound, ethErrs, cacheHits, cacheMiss, disabled. Provide newNameResolverStats, get/getReset/setNameResolverStats and a StrEncoding instance keyed by "rslvStats:" with backwards-compatible parsing (zero defaults). Append the seven counters to the daily CSV stats log and emit a simplex_smp_names_* Prometheus metric block.
Pass rslvCacheHits and rslvCacheMiss IORefs from ServerStats.rslvStats into NamesEnv at construction so the resolver's cache hit/miss counters share storage with the periodic stats exporter (no double bookkeeping). Replace the ERR-AUTH stub in processCommand with the real RSLV handler: increment rslvReqs, dispatch to resolveName when namesEnv is Just, map Right -> NAME / NotFound -> ERR AUTH / other Left -> ERR AUTH while bumping the respective counters.
Add tests/SMPNamesTests.hs covering the highest-risk surface:
- NameRecord wire encoding round-trip, negative-expiry rejection,
combined channel+contact list cap, max-payload size budget.
- LookupKey 64-byte parser cap; mkNameOwner / mkNameLink length
invariants; smart-constructor / accessor round-trip.
- Keccak-256 reference vectors (empty, "abc"); assert Keccak ≠
SHA3-256; namehash of "" and "eth" matches ENS vectors; selector
and encodeGetRecord byte layout.
- ABI primitive bounds: each of decodeWord256Int64,
decodeAddress, decodeString, decodeStringArray fails its
relevant safety invariant without crashing.
- decodeGetRecord: zero-owner returns Nothing, truncated buffer
returns AbiTruncated.
- Resolver: stub-backed resolveName routes zero-owner to NotFound,
counts as a miss; concurrent identical lookups don't crash.
Tests run via `cabal test --test-option=--match="Names resolver
tests"` and complete in ~5ms.
Defer to follow-up: ForwardedRslvSpec (PFWD round-trip), MockRpcSpec
(fake HTTP server), AbiSpec fixture-based encoder, StartupGuardSpec,
UrlValidationSpec, EipChecksumSpec — all of which need either a
running server harness or pinned binary fixtures.
Derive Eq for ResolveError so tests can `shouldBe` against
constructors.
Bump SMP protocol version 19 -> 20 (header line, abstract, version
history list). Add "Resolver commands" subsection to the SMP commands
section with:
- RSLV / NAME ABNF and byte layout for NameRecord (display name,
owner, channel + contact links capped at 8 combined, optional
admin contact, expiry, isTest).
- Forwarded-only access via PFWD — direct RSLV rejected with
CMD PROHIBITED.
- Error semantics — every failure collapses to ERR AUTH; per-cause
stats are out of band.
- Note that the backing store is implementation-defined (the
reference impl uses an Ethereum SNRC contract).
Update "Router security requirements" to spell out the names-role
outbound HTTP threat: the lookup key reaches the Ethereum endpoint,
so operators MUST run their own endpoint and MUST NOT co-locate the
names role with the SMP proxy role (RSLV cache miss can serialise
other forwarded commands).
SMP server gains public-namespace resolution (SMP protocol v20). The names role is disabled by default; configure [NAMES] to enable.
The hard refusal in newEnv assumed allowSMPProxy was a real on/off flag, but it's hardcoded to True in Server/Main.hs:606 — there's no [PROXY] enable key. So every smp-server with [NAMES] enable: on would fail to start unless the operator also set allow_dangerous_colocation, which made that flag mandatory rather than an opt-in. Demote to a one-time startup warning. The performance footgun is real (slow RSLV cache miss can head-of-line-block other forwarded commands on the same proxy-relay session) but it's a soft problem the operator should be told about, not a hard refusal blocking trivial deployments. Tighten back to a refusal once [PROXY] gains a real enable toggle or forkForwardedCmd makes the head-of-line concern moot.
The flag served no purpose once the hard refusal became a warning — operators couldn't change behaviour, only silence the message. Remove it from NamesConfig, INI parser, default template, plan, and tests. The warning now fires unconditionally for any [NAMES] enable: on deployment co-located with the proxy role (i.e. every smp-server, since proxy is always on).
decodeString returns raw bytes; the module header invariant 8 promised UTF-8 validation via decodeUtf8' returning AbiBadUtf8, which was never implemented. Add decodeUtf8Text :: ... -> Either AbiError Text that composes decodeString + decodeUtf8' and returns AbiBadUtf8 on invalid input. This will be used by the eventual NameRecord decoder for displayName / link / admin fields; AbiBadUtf8 becomes a real return path instead of a phantom constructor.
The previous leader-side catch used E.catch on SomeException, which silently converted AsyncCancelled / ThreadKilled into a normal Left EthHttpErr result and let the leader thread continue running. Supervisors that try to cancel an outstanding eth_call (server shutdown, client disconnect) had no way to actually abort the lookup. Rewrite the leader path with E.try + SomeAsyncException detection: on async exception, fill the leader's TMVar with EthHttpErr so waiters unblock, remove the inflight entry to drop the cache pollution, then rethrow so the supervisor sees cancellation actually take effect. Synchronous exceptions still collapse to EthHttpErr as before.
The constant 4096 made cacheMaxBytes effectively a 16384-entry count cap under defaults, regardless of actual content size. A maximal NameRecord (8 × 1024-byte links + texts) is ~17 KB real memory, so operators who set cacheMaxBytes = 64 MB could see ~280 MB actual usage. Sum the encoded byte lengths of the variable-length fields (displayName, links, admin contact/email) plus a 256-byte fixed overhead for the wrapper (PSQ node + ByteString headers + key copy). Tight enough that cacheMaxBytes becomes a meaningful upper bound; still cheap to compute on insert.
Two pieces of dead code from the public-names branch: - unLookupKey was exported but had zero callers; destructure via the constructor (RSLV (LookupKey key)) where needed. Delete. - scrubUrl was exported but had zero callers; useful enough to wire in rather than delete. Add a logInfo line at newEnv when [NAMES] enable is on, showing the configured endpoint with userinfo stripped — so operators get auditable visibility without leaking basic-auth credentials into logs.
Server/Main.hs:hexDecode and Names/Eth/RPC.hs:decodeHex were byte-for-byte identical, plus Main's version was partial-by-contract (caller had to pre-validate length; B.index s 1 crashes on odd input). Export the validating fromHex from Eth/RPC, rewrite parseEthAddr to use it, drop the local hexDecode and the now-redundant isHex predicate. Also clean up trailing imports unused after earlier refactors: - Env/STM.hs no longer needs closeNamesEnv (used only in Server.hs) - Main.hs no longer needs Control.Applicative.<|> (parseEthAddr was the only consumer) - Resolver.hs fetchOnce never used the as-pattern env binding
The plan promised a URL validator for [NAMES] ethereum_endpoint that rejects userinfo, query, fragment, requires an explicit port, and only permits http/https schemes. Was not implemented — the previous code passed the raw string straight to http-client's parseRequest, which accepts userinfo (silently honoured: credentials leak via Host header and logs). Add validateUrl using network-uri's parseAbsoluteURI. Reject: - non-http(s) schemes - empty host - userinfo (use rpc_auth instead — separate concern, separate parsing, no header-injection surface) - missing explicit port (a config like "http://localhost" silently defaults to :80 when Reth listens on :8545 — that's a likely-typo) - query string - URL fragment Also reject https://non-loopback without rpc_auth — almost always a misconfiguration. Loopback exempt so http://127.0.0.1:8545 without auth remains the recommended same-host topology. Surfaces validation failures at startup as fail-fast error with the specific reason ("[NAMES] ethereum_endpoint: ..."), rather than the silent SSRF/credential-leak hazard before.
Previous test stubbed ethCall as a fast non-blocking function and asserted callCount >= 1 — which a broken impl that fires one ethCall per caller also satisfies. The test passed against any implementation. Block the stub on a TMVar gate so the leader's eth_call cannot return before the 7 waiters race to register at the inflight TMap. Fire 8 concurrent resolveName calls in a background async, sleep 50 ms to let the waiters attach, then release the gate. Assert callCount == 1 — only the leader hit the RPC. Now the test genuinely fails if coalescing breaks.
decodeWord256Int64 rejected values with any non-zero byte in the high 24 bytes, but accepted values 2^63 .. 2^64-1 in the low 8 bytes — those silently decode to *negative* Int64 because the Int64 type is signed. Downstream length math (decodeString len, decodeStringArray cnt) sees the negative value, compares len > cap as False, off + 32 + len as not-overflowing, then B.take clamps negatives to 0 and returns empty bytes instead of failing. A malicious or buggy SNRC contract could null out string and array fields with no AbiOversized / AbiTruncated signal. Add the sign-bit check on byte (off + 24). Both invariants are now necessary and together sufficient for a safe Int64 cast. Add tests covering the sign-bit case and the max representable positive value (Int64.maxBound = 0x7FFF...FFFF).
Infura-style endpoints like https://mainnet.infura.io/v3/<API_KEY> embed the secret in the URL path. They were accepted by the previous validator (path wasn't checked), and the startup logInfo line scrubbed only userinfo — not path — so the API key would land in journald. Reject any path beyond / at config time. Operators with a path-style endpoint must move the credential into rpc_auth (Basic or Bearer) instead. Loopback endpoints (http://127.0.0.1:8545) naturally have empty/"/" paths, so the same-host topology is unaffected. Also flips the URL leak vector promised to be defended-against when scrubUrl was wired into the startup log.
- CHANGELOG bullet 6 referenced allow_dangerous_colocation which was dropped in 7f94f49, and the hard refusal was demoted to a warning in f9269be. Rewrite the bullet to match current behaviour. - The [NAMES] INI template's snrc_address example was the zero address. An operator uncommenting it verbatim would have a syntactically valid config that eth_calls to 0x0, gets zero bytes back, hits the zero-owner sentinel, and silently AUTHs every RSLV. Replace with "0x<paste-your-contract-address>" so an unsubstituted copy fails hex parsing at startup. - Add an INI comment explaining cache_max_entries / cache_max_bytes interaction — whichever fills first triggers eviction.
Previously fetchOnce returned Left NotFound and coalesce skipped cache insertion for any Left, so every RSLV for a non-existent name hit eth_call. An adversary holding a proxy connection could spam unique non-existent names to keep rpcMaxConcurrency saturated indefinitely. Cache NotFound results with a shorter TTL (30s, bounded by cacheSeconds) so newly-registered names become visible quickly while still absorbing unique-name bursts. CacheEntry now carries ceResult :: Maybe NameRecord (Nothing = NotFound) and ceTtlNs per-entry so positive and negative results expire on different schedules. Transient errors (HTTP / decode / timeout) still bypass the cache so a flapping endpoint doesn't durably poison entries. Add a regression test: two sequential NotFound lookups must produce exactly one ethCall (hit + miss counters confirm).
nrExpiry was parsed and bounds-checked at the wire layer but never compared against current time. A pre-upgrade or buggy SNRC contract that returns expired records unmodified would have its data cached and served to clients. In fetchOnce, after a successful ABI decode, compare nrExpiry to the current POSIX time. If nrExpiry > 0 and < now, treat the record as NotFound (now also cache-friendly per the previous fix). nrExpiry == 0 is the sentinel for "never expires" (reserved names); unaffected. This is defense in depth — the contract is expected to filter expired records itself by returning the zero-owner sentinel — but the smp-server also enforces, so a stale contract doesn't poison the user experience.
Caddy and most HTTP/RFC 7235 docs write the auth scheme capitalized (Bearer / Basic). Operators who paste "Bearer <token>" verbatim into rpc_auth previously got a parse error because the matcher only accepted the all-lowercase form. Fold the scheme keyword to lowercase before matching. Token / user / pass preserve case (HTTP wire convention applies the scheme name only).
Two observability fixes for [NAMES] enable: on deployments: 1. Endpoint probe. newEnv now calls pingEndpoint immediately after constructing NamesEnv: a single eth_call to getRecord(namehash "") against the configured snrcAddress. Logs "endpoint probe ok" on success or a warning on transport failure (DNS, TLS, refused, 4xx, 5xx). JSON-RPC errors are tolerated — those indicate the endpoint is reachable but possibly misconfigured, which surfaces later via the rslvEthErrs counter. Deliberately does NOT exitFailure: a network blip or an Ethereum host that comes up minutes after smp-server should not block the server from accepting non-RSLV traffic. The warning is the operator signal. 2. Log exception text before mapping to EthHttpErr. The previous mapSyncEthExn returned EthHttpErr with no diagnostic — TLS errors, DNS failures, IOExceptions all collapsed identically and the only observability was the aggregate counter. Now the original exception message is logged at the synchronous catch point in coalesce.
Five parallel cases mapping EthRpcError to ResolveError were inlined in fetchOnce, conflating the structural mapping with the resolver's main flow. Extract mapEthRpcError so fetchOnce reads as one round-trip: "call, map transport errors, decode, expiry-check." The mapping is a named concept; the resolver loop reads as the domain. No behavior change.
Single-use one-liner that discarded its argument — the exception text was already logged immediately above the call site by the N7 fix. The function was a remnant from before that logging was added. Inline the constant return at the call site; the line is gone.
Four parallel branches that each (a) chose a counter and (b) built a response carried the same shape — "incStat <X> $> response (corrId, NoEntity, <msg>)". Compute the (counter, msg) pair in one dispatch, then increment + respond once. The post-processing on a sum value is now in one place; adding a new ResolveError variant means changing one mapping line, not four. No behavior change.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.