[Bug] DNSCache never evicts unresolvable hostnames after a BE is dropped, causing be.WARNING flood and persistent brpc EPOLLOUT timeout #63358

### Search before asking

- [x] I had searched in the [issues](https://github.com/apache/doris/issues?q=is%3Aissue) and found no similar issues.


### Version

Doris BE 2.1.x.

### What's Wrong?

After a group of BE nodes was permanently removed from the cluster (DROP BACKEND on the FE, the machines were shut down, and their DNS A/PTR records were deleted), every surviving BE in the same cluster keeps logging two kinds of WARNING forever:

#### Symptom A: the DNSCache refresh thread floods be.WARNING

    W<date> <ts> <tid> network_util.cpp:115] failed to get ip from host: be-old-1.example.com  err: Name or service not known
    W<date> <ts> <tid> status.h:415] meet error status: [INTERNAL_ERROR]failed to get ip from host: be-old-1.example.com, err: Name or service not known
            0#  doris::hostname_to_ipv4(...) at be/src/util/network_util.cpp:125
            1#  doris::hostname_to_ip(...)   at be/src/util/network_util.cpp:104
            2#  doris::DNSCache::_update(...) at be/src/common/status.h:494
            3#  doris::DNSCache::_refresh_cache() at be/src/common/status.h:380

This repeats once per minute for every stale hostname, indefinitely.

#### Symptom B: brpc keeps reconnecting to the cached (now unreachable) IP

    W<date> <ts> <tid> socket.cpp:1270] Fail to wait EPOLLOUT of fd=<n>: Connection timed out [110]

In our case this fires about 4 times per second (~340K times per day), accumulating more than 3.7M occurrences over 11 days. The IPs the BE keeps trying to reach are the last successfully resolved IPs of the dropped hostnames, which DNSCache::_resolve_hostname() hands back after every failed refresh. A single BE's be.WARNING grew to 634 MB in 11 days, and every BE in the cluster is affected the same way.
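Working the Symptom B numbers through at about 4 events per second gives the hourly, daily, and 11-day counts:

```math
\begin{aligned}
4\ \mathrm{s^{-1}} \times 3600\ \mathrm{s/h} &= 14{,}400\ \text{per hour} \\
4\ \mathrm{s^{-1}} \times 86{,}400\ \mathrm{s/day} &= 345{,}600 \approx 3.5 \times 10^{5}\ \text{per day} \\
345{,}600\ \text{per day} \times 11\ \text{days} &= 3{,}801{,}600 \approx 3.8 \times 10^{6}
\end{aligned}
```

The 11-day figure is consistent with the > 3.7M occurrences observed.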

#### Root cause

  be/src/util/dns_cache.cpp (master HEAD, lines 57–121):

  - _refresh_cache() iterates every cached hostname every 60 s and calls _update.
  - _update → _resolve_hostname. On resolution failure, _resolve_hostname returns the stale cached IP so callers can keep using it. That is a reasonable graceful-degradation choice.
  - However, the entry is never removed from the cache map. There is no failure counter, no TTL, no eviction policy.
  - Consequence: for as long as the BE process lives, the hostname is re-resolved (and fails again) once per minute. BrpcClientCache / ClientCache keep handing the stale IP to brpc, which keeps timing out at the kernel level (ETIMEDOUT after tcp_syn_retries, ~127 s).
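The ~127 s figure matches the Linux defaults documented for the net.ipv4.tcp_syn_retries sysctl: 6 SYN retransmissions, an initial retransmission timeout of 1 s, and a timeout that doubles after each attempt. Assuming those defaults are in effect on the BE hosts:

```math
T = \sum_{i=0}^{6} 2^{i}\ \mathrm{s} = \left(2^{7} - 1\right)\ \mathrm{s} = 127\ \mathrm{s}
```

So each connection attempt to a dead IP can stay pending for about two minutes before ETIMEDOUT (errno 110 on Linux, as in the Symptom B log line) is returned.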

### What You Expected?

After a hostname has failed to resolve for a configurable number of consecutive refresh attempts (e.g. 30 failures, i.e. 30 minutes at the 60 s refresh interval), the entry should be evicted from the cache. Subsequent callers will then either re-resolve successfully (if DNS comes back) or get a clean InternalError, rather than silently retrying a long-dead IP.
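A minimal C++ sketch of the expected policy, under stated assumptions: the class EvictingDnsCache, its get()/refresh()/contains() methods, and the max_consecutive_failures parameter are hypothetical names for illustration and do not match Doris's actual DNSCache API. The resolver is injected as a callback so the policy can be exercised without real DNS, and an empty std::optional stands in for the InternalError status. A failed refresh keeps serving the stale IP (the existing graceful-degradation behavior) until the consecutive-failure count reaches the threshold, at which point the entry is dropped.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical sketch of the proposed policy; names are illustrative and do
// not match Doris's DNSCache. The resolver is injected so no real DNS is needed.
class EvictingDnsCache {
public:
    // Returns the resolved IP, or std::nullopt if the hostname does not resolve.
    using Resolver = std::function<std::optional<std::string>(const std::string&)>;

    EvictingDnsCache(Resolver resolver, std::size_t max_consecutive_failures)
            : _resolver(std::move(resolver)), _max_failures(max_consecutive_failures) {
        assert(_max_failures > 0);
    }

    // Serve the cached IP (possibly stale); on a miss, resolve once and cache
    // only on success. An empty result stands in for an InternalError status.
    std::optional<std::string> get(const std::string& host) {
        std::lock_guard<std::mutex> lock(_mu);
        if (auto it = _cache.find(host); it != _cache.end()) {
            return it->second.ip;
        }
        std::optional<std::string> ip = _resolver(host);
        if (ip) {
            _cache[host] = Entry{*ip, 0};
        }
        return ip;
    }

    // Periodic refresh (every 60 s in the report). A success updates the IP and
    // resets the counter; a failure keeps the stale IP until the counter
    // reaches the threshold, at which point the entry is evicted.
    void refresh() {
        std::lock_guard<std::mutex> lock(_mu);
        for (auto it = _cache.begin(); it != _cache.end();) {
            if (std::optional<std::string> ip = _resolver(it->first)) {
                it->second = Entry{*ip, 0};
                ++it;
            } else if (++it->second.failures >= _max_failures) {
                it = _cache.erase(it);
            } else {
                ++it;
            }
        }
    }

    bool contains(const std::string& host) const {
        std::lock_guard<std::mutex> lock(_mu);
        return _cache.count(host) > 0;
    }

private:
    struct Entry {
        std::string ip;
        std::size_t failures = 0;  // consecutive failed refreshes
    };

    Resolver _resolver;
    std::size_t _max_failures;
    mutable std::mutex _mu;
    std::unordered_map<std::string, Entry> _cache;
};
```

A successful refresh resets the counter, so only an unbroken run of failures leads to eviction. For brevity the sketch calls the resolver while holding the lock; a real implementation would probably resolve outside the lock so that a slow DNS server does not block get() callers.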


### How to Reproduce?

  1. Bring up a Doris cluster (≥ 2 BEs).
  2. Pick a hostname victim.example.com that points to a working BE. Issue queries / data ingestion that go through DNSCache::get (e.g. broker load, internal RPC) so the hostname enters the cache.
  3. Decommission and remove the BE: ALTER SYSTEM DROP BACKEND "victim.example.com:9050";
  4. Delete victim.example.com from DNS (or /etc/hosts).
  5. Observe be.WARNING on the other BEs. Within 1 minute the first "failed to get ip from host" line appears, and it keeps recurring for as long as the BE runs.
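Before step 5, it can help to confirm from a surviving BE host that the name really no longer resolves through the libc resolver. The sketch below is a standalone helper (the function name resolve_ipv4 is hypothetical and not part of Doris) that asks getaddrinfo for an IPv4 address and, on failure, reports the gai_strerror() text; glibc's text for EAI_NONAME is the same "Name or service not known" seen in the Symptom A log. Whether Doris's hostname_to_ipv4 goes through exactly this call path is an assumption.

```cpp
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <sys/socket.h>

#include <cerrno>
#include <cstring>
#include <optional>
#include <string>

// Hypothetical standalone helper, not part of Doris: resolve `host` to its
// first IPv4 address via getaddrinfo(). On failure, writes a human-readable
// reason to *err (glibc reports EAI_NONAME as "Name or service not known").
std::optional<std::string> resolve_ipv4(const std::string& host, std::string* err) {
    addrinfo hints;
    std::memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET;        // IPv4 only
    hints.ai_socktype = SOCK_STREAM;  // avoid one duplicate entry per socket type

    addrinfo* result = nullptr;
    const int rc = getaddrinfo(host.c_str(), nullptr, &hints, &result);
    if (rc != 0) {
        if (err != nullptr) *err = gai_strerror(rc);
        return std::nullopt;
    }

    // Format the first returned address as dotted-quad text.
    char buf[INET_ADDRSTRLEN] = {};
    const auto* addr = reinterpret_cast<const sockaddr_in*>(result->ai_addr);
    const char* text = inet_ntop(AF_INET, &addr->sin_addr, buf, sizeof(buf));
    const int saved_errno = errno;
    freeaddrinfo(result);
    if (text == nullptr) {
        if (err != nullptr) *err = std::strerror(saved_errno);
        return std::nullopt;
    }
    return std::string(buf);
}
```

For example, if resolve_ipv4("victim.example.com", &err) returns an empty optional with err set to "Name or service not known", the record is gone from the resolver's point of view, and any further connection attempts to the old IP can only come from a cached entry.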

### Anything Else?

_No response_

### Are you willing to submit PR?

- [x] Yes I am willing to submit a PR!

### Code of Conduct

- [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)

