
Feat: per-request proxy #186

Closed
AltayAkkus wants to merge 3 commits into internetarchive:master from AltayAkkus:feat-granular-proxies


@AltayAkkus

Issues #362 and #136 in Zeno ask for more granular control over the proxies that gowarc uses.

Previous implementation

Previously, the proxy was set statically per client via NewWARCWritingHTTPClient, more specifically in newCustomDialer():

gowarc/dialer.go, lines 243 to 261 in ee15d6a:

```go
if proxyURL != "" {
	u, err := url.Parse(proxyURL)
	if err != nil {
		return nil, err
	}
	var proxyDialer proxy.Dialer
	if proxyDialer, err = proxy.FromURL(u, d); err != nil {
		return nil, err
	}
	d.proxyDialer = proxyDialer.(proxy.ContextDialer)
	// Determine if this proxy requires hostname (remote DNS) or can use IP (local DNS)
	// Proxies with remote DNS: socks5h, socks4a, http, https
	// Proxies with local DNS: socks5, socks4
	d.proxyNeedsHostname = u.Scheme == "socks5h" || u.Scheme == "socks4a" ||
		u.Scheme == "http" || u.Scheme == "https"
}
```

The d *customDialer returned by that function is then reused for every request made with the Client. Its proxy is fully static: it is set once, when the Client is created.

When you call client.Do(req), the request enters here:

gowarc/dialer.go, lines 397 to 408 in ee15d6a:

```go
func (d *customDialer) CustomDialTLSContext(ctx context.Context, network, address string) (net.Conn, error) {
	var plainConn net.Conn
	var err error
	if d.proxyDialer != nil && d.proxyNeedsHostname {
		// Remote DNS proxy (socks5h, socks4a, http, https)
		// Skip DNS archiving to avoid privacy leak and ensure accuracy.
		plainConn, err = d.proxyDialer.DialContext(ctx, network, address)
		if err != nil {
			return nil, err
		}
```

gowarc/dialer.go, lines 350 to 360 in ee15d6a:

```go
func (d *customDialer) CustomDialContext(ctx context.Context, network, address string) (conn net.Conn, err error) {
	if d.proxyDialer != nil && d.proxyNeedsHostname {
		// Remote DNS proxy (socks5h, socks4a, http, https)
		// Skip DNS archiving to avoid privacy leak and ensure accuracy.
		conn, err = d.proxyDialer.DialContext(ctx, network, address)
		if err != nil {
			return nil, err
		}
		return d.wrapConnection(ctx, conn, "http"), nil
	}
```

called by

New implementation

gowarc already provides a way to configure behavior for a single request only, via context keys:

gowarc/dialer.go, lines 31 to 42 in ee15d6a:

```go
const (
	// ContextKeyFeedback is the context key for the feedback channel.
	// When provided, the channel will receive a signal once the WARC record
	// has been written to disk, making WARC writing synchronous.
	// Use WithFeedbackChannel() helper function for convenience.
	ContextKeyFeedback contextKey = "feedback"
	// ContextKeyWrappedConn is the context key for the wrapped connection channel.
	// This is used internally to retrieve the wrapped connection for advanced use cases.
	// Use WithWrappedConnection() helper function for convenience.
	ContextKeyWrappedConn contextKey = "wrappedConn"
)
```

This is where I added ContextKeyProxy, so that gowarc retrieves the proxy configured for each request instead of reading the static d.proxyDialer. Because of how DNS resolution interacts with proxies, the proxy also has to be handled at the CustomDialContext level; see the README:

gowarc/README.md, lines 119 to 143 in ee15d6a:

```markdown
### DNS Resolution and Proxy Behavior

The library handles DNS resolution differently depending on the connection type:

#### Direct Connections (No Proxy)

- DNS is resolved locally using configured DNS servers
- DNS queries and responses are archived in WARC files as `resource` records
- Resolved IP addresses are cached with configurable TTL

#### Local DNS Proxies (`socks5://`, `socks4://`)

- DNS is resolved locally by gowarc
- DNS records are archived to WARC files
- Resolved IP addresses are sent to the proxy
- Only one DNS query is made (no duplicate resolution)

#### Remote DNS Proxies (`socks5h://`, `socks4a://`, `http://`, `https://`)

- **DNS archiving is skipped** to prevent privacy leaks
- Hostnames are sent directly to the proxy
- The proxy handles DNS resolution on its end
- Trade-offs:
  - **Privacy**: No local DNS queries that could expose browsing activity
  - **Accuracy**: WARC reflects the actual connection (no potential DNS mismatch)
  - ⚠️ **No DNS WARC records**: DNS information is not archived for these connections

**Important for Privacy**: When using `socks5h://` or other remote DNS proxies, your local DNS servers will not see any queries for the target domains, maintaining better privacy and anonymity.
```
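The three cases in that README section can be expressed as a small classifier over the proxy URL, where an empty string means no proxy. This is a sketch that mirrors the scheme lists above. The names dnsMode and classifyProxy are hypothetical. Rejecting unknown schemes with an error is an assumption of this sketch and not necessarily how gowarc reacts.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// dnsMode names the three behaviours listed in the README excerpt.
type dnsMode int

const (
	dnsDirect dnsMode = iota // no proxy: resolve locally and archive DNS records
	dnsLocal                 // socks5, socks4: resolve locally, send the IP to the proxy
	dnsRemote                // socks5h, socks4a, http, https: the proxy resolves the hostname
)

func (m dnsMode) String() string {
	switch m {
	case dnsDirect:
		return "direct"
	case dnsLocal:
		return "local-dns"
	case dnsRemote:
		return "remote-dns"
	}
	return "unknown"
}

// classifyProxy maps a proxy URL ("" meaning no proxy) to its DNS behaviour.
func classifyProxy(proxyURL string) (dnsMode, error) {
	if proxyURL == "" {
		return dnsDirect, nil
	}
	u, err := url.Parse(proxyURL)
	if err != nil {
		return 0, err
	}
	switch strings.ToLower(u.Scheme) { // normalise case defensively
	case "socks5", "socks4":
		return dnsLocal, nil
	case "socks5h", "socks4a", "http", "https":
		return dnsRemote, nil
	}
	return 0, fmt.Errorf("unsupported proxy scheme %q", u.Scheme)
}

func main() {
	for _, p := range []string{"", "socks5://127.0.0.1:1080", "socks5h://127.0.0.1:1080", "http://proxy.example:3128"} {
		m, err := classifyProxy(p)
		fmt.Printf("%q -> %v (err=%v)\n", p, m, err)
	}
}
```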

The proxy dialers are cached in a map keyed by their connection string (the proxy URL), so they can be reused. The resolved dialer is handed from the DialContext functions down to dialParallel and/or dialSingle, which is why the diff of dialer.go looks more complicated than I had hoped, sorry.

Tests

client_test.go already hosts a SOCKS5 proxy for testing; I consolidated its bring-up into startSOCKS5Server.
That function returns the connection string, a counter that increments on each request through the proxy, and a cleanup/stop func, so the tests no longer need the goroutine channel-signalling logic.

The proxy counts how many requests pass through it using a things-go/go-socks5 RuleSet.

I validated the per-request proxy feature with that counter; see client_test.go.

```go
// WithProxy adds a per-request proxy URL to the request context.
// When provided, this proxy will be used instead of the client-level default.
// Pass an empty string to force a direct connection, bypassing any default proxy.
func WithProxy(ctx context.Context, proxyURL string) context.Context {
```

Author comment: helper

```go
if proxyURL == "" {
	return nil, nil
}
if cached, ok := d.proxyCache.Load(proxyURL); ok {
```

Author comment: try the cache first before resolving.

```go
}

func (d *customDialer) CustomDialContext(ctx context.Context, network, address string) (conn net.Conn, err error) {
	if d.proxyDialer != nil && d.proxyNeedsHostname {
```

Author comment: use the context-resolved proxy instead of the static d.proxyDialer.

```diff
@@ -396,12 +457,16 @@ func (d *customDialer) CustomDial(network, address string) (net.Conn, error) {
 
 func (d *customDialer) CustomDialTLSContext(ctx context.Context, network, address string) (net.Conn, error) {
```

Author comment: CustomDialTLSContext and CustomDialContext are the two places where implementation bugs are most likely, so they need special care in review.

dialParallel just passes rp through, and the changes in dialSingle are minimal.

@AltayAkkus (Author)

I should have looked through the PRs before writing this; this is possibly a duplicate of #160 😵‍💫

@NGTmeaty (Collaborator) commented Mar 25, 2026

Hi there @AltayAkkus, it is definitely similar to #160, and we had some unresolved questions there that we never surfaced on GitHub, primarily concerns around how ProxyTypes were being treated. Both are fairly large changes to how gowarc operates and need to be done carefully to ensure we don't break anything and that they are "in scope" for gowarc. One of our design philosophies was to minimize anything Zeno-specific we add to gowarc, so that it can be used in other projects. Proxy support historically lived in Zeno itself, but I can definitely see why it would be added to gowarc.

We'll also need to think a little on this one as well.
(cc @willmhowes )

@AltayAkkus (Author)

I think #160 is better than this branch, even though I don't understand why ProxyTypes should become a concept inside gowarc: you could group the proxies yourself by their URL, which already identifies each one uniquely, and differentiate between proxy providers, costs, etc. without any of that becoming part of gowarc.
Closing this, cya!
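The point about grouping proxies on the caller side can be sketched as follows. A caller keeps its own pools keyed by provider or cost tier, picks a URL per request, and hands gowarc only that URL, for example through a per-request context value like the WithProxy helper in this PR. proxyGroup, pick, and the provider names are hypothetical. The design choice is that gowarc never needs a ProxyType concept, because the URL already identifies the proxy.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// proxyGroup is a hypothetical caller-side pool (one per provider, cost tier, ...).
// gowarc never sees the group, only the URL chosen for each request.
type proxyGroup struct {
	urls []string
	next atomic.Uint64
}

// pick returns the group's URLs in round-robin order; safe for concurrent use.
func (g *proxyGroup) pick() string {
	n := g.next.Add(1) - 1
	return g.urls[n%uint64(len(g.urls))]
}

func main() {
	groups := map[string]*proxyGroup{
		"provider-a": {urls: []string{"socks5h://a1.example:1080", "socks5h://a2.example:1080"}},
		"provider-b": {urls: []string{"http://b1.example:3128"}},
	}
	for i := 0; i < 3; i++ {
		fmt.Println(groups["provider-a"].pick()) // a1, a2, a1
	}
	fmt.Println(groups["provider-b"].pick()) // b1
}
```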

AltayAkkus closed this Mar 28, 2026

2 participants