Convert heavy queries from 5xx to 4xx #7374

Open
eeldaly wants to merge 14 commits into cortexproject:master from eeldaly:query-4xx

eeldaly commented Mar 24, 2026

What this PR does:
This PR adds a timeout in the querier (default 59s) that fires before the Ambassador timeout is reached. When it fires, queries that spent longer than a configurable threshold (default 40s) in PromQL evaluation are converted from a 5XX response to a 4XX response. Both the conversion and the 1s-earlier timeout are disabled by default.
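
The decision described above can be illustrated with a minimal Go sketch. This is not the code from this PR: the `classificationConfig` struct and the `statusForTimeout` function are hypothetical names, and the 422 and 504 status codes are taken from the response outputs shown in this description.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// classificationConfig mirrors the three settings described in the PR.
// The struct and its field names are hypothetical, for illustration only.
type classificationConfig struct {
	Enabled       bool
	Deadline      time.Duration // querier-side timeout (default 59s)
	EvalThreshold time.Duration // PromQL evaluation-time threshold (default 40s)
}

// statusForTimeout picks the HTTP status for a query that has already hit
// the querier deadline. A query whose evaluation time exceeds the threshold
// is reported as a client error (422); anything else stays a 504.
func statusForTimeout(cfg classificationConfig, evalTime time.Duration) int {
	if cfg.Enabled && evalTime > cfg.EvalThreshold {
		return http.StatusUnprocessableEntity
	}
	return http.StatusGatewayTimeout
}

func main() {
	cfg := classificationConfig{
		Enabled:       true,
		Deadline:      59 * time.Second,
		EvalThreshold: 40 * time.Second,
	}
	fmt.Println(statusForTimeout(cfg, 45*time.Second)) // above threshold: 422
	fmt.Println(statusForTimeout(cfg, 10*time.Second)) // below threshold: 504

	cfg.Enabled = false
	fmt.Println(statusForTimeout(cfg, 45*time.Second)) // feature off: 504
}
```

Keeping the decision in a pure function of the config and the measured evaluation time makes it easy to unit test without running a query engine.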

Default new configs:
querier.timeout-classification-enabled: false
querier.timeout-classification-deadline: 59s
querier.timeout-classification-eval-threshold: 40s
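
As a rough illustration of how these three settings and their defaults could be exposed, here is a sketch using Go's standard `flag` package. The flag names and default values come from the list above; the `timeoutClassificationConfig` type, the `parseConfig` helper, and the help strings are assumptions and may not match the PR's actual implementation.

```go
package main

import (
	"flag"
	"fmt"
	"time"
)

// timeoutClassificationConfig is a hypothetical holder for the new settings.
type timeoutClassificationConfig struct {
	Enabled       bool
	Deadline      time.Duration
	EvalThreshold time.Duration
}

// RegisterFlags registers the three flags with the defaults listed in the PR.
// The help strings are illustrative wording, not taken from the PR.
func (c *timeoutClassificationConfig) RegisterFlags(f *flag.FlagSet) {
	f.BoolVar(&c.Enabled, "querier.timeout-classification-enabled", false,
		"Enable the earlier querier timeout and 5XX to 4XX conversion.")
	f.DurationVar(&c.Deadline, "querier.timeout-classification-deadline", 59*time.Second,
		"Querier-side deadline applied before the upstream proxy timeout.")
	f.DurationVar(&c.EvalThreshold, "querier.timeout-classification-eval-threshold", 40*time.Second,
		"Evaluation time above which a timed-out query is returned as 4XX.")
}

// parseConfig builds a FlagSet, registers the flags, and parses args.
func parseConfig(args []string) (timeoutClassificationConfig, error) {
	var cfg timeoutClassificationConfig
	fs := flag.NewFlagSet("querier", flag.ContinueOnError)
	cfg.RegisterFlags(fs)
	err := fs.Parse(args)
	return cfg, err
}

func main() {
	// With no arguments, the defaults apply and the feature is off.
	cfg, err := parseConfig(nil)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(cfg.Enabled, cfg.Deadline, cfg.EvalThreshold)

	// Enabling the feature and lowering the evaluation threshold.
	cfg, err = parseConfig([]string{
		"-querier.timeout-classification-enabled=true",
		"-querier.timeout-classification-eval-threshold=30s",
	})
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(cfg.Enabled, cfg.Deadline, cfg.EvalThreshold)
}
```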

Response Outputs:

Current output on timeout:

'Response code: 504\n'
{'Date': 'Thu, 26 Mar 2026 20:59:06 GMT', 'Content-Type': 'text/plain', 'Content-Length': '24', 'Connection': 'keep-alive', 'x-amzn-RequestId': 'ee98f602-651c-4783-946e-f547a8d88e32', 'server': 'amazon'}
''
upstream request timeout

New output on timeout (non-breaching, evaluation time at or below the threshold):

'Response code: 504\n'
{'Date': 'Thu, 26 Mar 2026 22:21:05 GMT', 'Content-Type': 'application/json', 'Content-Length': '75', 'Connection': 'keep-alive', 'x-amzn-RequestId': 'ec88ba9b-7d21-45f4-8996-418b2e7b812b', 'server': 'amazon', 'vary': 'Origin'}
''
{"status":"error","errorType":"timeout","error":"upstream request timeout"}

New output on timeout (breaching, evaluation time above the threshold):

'Response code: 422\n'
{'Date': 'Thu, 26 Mar 2026 21:05:46 GMT', 'Content-Type': 'application/json', 'Content-Length': '138', 'Connection': 'keep-alive', 'x-amzn-RequestId': 'bf5b69b7-8d11-4e9e-95be-09fbfa017bbd', 'server': 'amazon', 'vary': 'Origin'}
''
{"status":"error","errorType":"execution","error":"query timed out: query spent too long in evaluation - consider simplifying your query"}
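
Both new bodies share the same three-field JSON shape. The sketch below shows one way such a body could be produced in Go with `encoding/json`; the `apiError` type and `errorBody` function are hypothetical names, and only the field names and message strings are taken from the outputs above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// apiError models the JSON error body shown in the new outputs.
// The type name is hypothetical; the JSON field names come from the PR.
type apiError struct {
	Status    string `json:"status"`
	ErrorType string `json:"errorType"`
	Error     string `json:"error"`
}

// errorBody serializes an error response with the given type and message.
func errorBody(errorType, msg string) (string, error) {
	b, err := json.Marshal(apiError{Status: "error", ErrorType: errorType, Error: msg})
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	// Non-breaching timeout: returned with a 504 status.
	body, err := errorBody("timeout", "upstream request timeout")
	if err != nil {
		fmt.Println("marshal error:", err)
		return
	}
	fmt.Println(body)

	// Breaching timeout: returned with a 422 status.
	body, err = errorBody("execution",
		"query timed out: query spent too long in evaluation - consider simplifying your query")
	if err != nil {
		fmt.Println("marshal error:", err)
		return
	}
	fmt.Println(body)
}
```

The first body is 75 bytes long, which matches the `Content-Length` reported in the non-breaching output.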

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Signed-off-by: Essam Eldaly <eeldaly@amazon.com>
eeldaly added 2 commits March 26, 2026 15:24
eeldaly added 8 commits March 30, 2026 15:59
eeldaly marked this pull request as ready for review March 31, 2026 16:59
dosubot bot added the component/querier, type/feature, and type/production labels Mar 31, 2026
eeldaly added 3 commits March 31, 2026 11:11

Labels

component/querier, size/XL, type/feature, type/production (Issues related to the production use of Cortex, inc. configuration, alerting and operating.)
