[Azure Logs] Add dedicated aadgraphactivitylogs data stream#18880

Draft
terrancedejesus wants to merge 9 commits into main from
enhancement/azure-ad-graph-activitylogs

Conversation

@terrancedejesus
Contributor

@terrancedejesus terrancedejesus commented May 7, 2026

Adds the azure.aadgraphactivitylogs data stream to ingest the AzureADGraphActivityLogs diagnostic category from Microsoft Entra ID, parallel to azure.graphactivitylogs for Microsoft Graph. Without this, AAD Graph events fall through to azure.platformlogs and the AAD-Graph-specific properties survive only inside event.original.

Proposed commit message

azure: add aadgraphactivitylogs data stream

Add a dedicated data stream for the AzureADGraphActivityLogs
diagnostic category from Microsoft Entra ID. Without this,
legacy Azure AD Graph (graph.windows.net) events fall through
to the platformlogs catch-all and lose schema-aware parsing.

The events router maps routing.category ==
"AzureADGraphActivityLogs" to the new dataset. The ingest
pipeline extracts ECS fields: event.action from HTTP method +
URI collection, event.outcome from response status,
event.category [iam, web], and related.user including the
OAuth app_id for client correlation.

Legacy AAD Graph is still actively used by Microsoft first-party
tooling, older line-of-business apps, and adversary tooling
(ROADtools, AzureHound v1, AADInternals). The dedicated dataset
makes these events available for detection rules and dashboards.
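The ECS derivations described in the commit message can be sketched in plain Python. This is an illustrative approximation only — the real logic lives in the data stream's ingest pipeline (Painless processors), and the exact event.action format is an assumption here, not taken from the pipeline source:

```python
# Illustrative sketch of the ECS field derivations the pipeline performs.
# Function names and the action string format are hypothetical.

def derive_outcome(status_code: int) -> str:
    # event.outcome from the HTTP response status: 2xx/3xx -> success
    return "success" if 200 <= status_code < 400 else "failure"

def derive_action(method: str, uri_path: str) -> str:
    # event.action from the HTTP method plus the URI collection segment,
    # e.g. GET /<tenant>/users -> "get users" (format is illustrative)
    parts = [p for p in uri_path.split("/") if p]
    collection = parts[1] if len(parts) > 1 else parts[0] if parts else ""
    collection = collection.split("?")[0]
    return f"{method.lower()} {collection}".strip()

doc = {"method": "GET", "path": "/contoso.onmicrosoft.com/users", "status": 200}
print(derive_action(doc["method"], doc["path"]))  # -> get users
print(derive_outcome(doc["status"]))              # -> success
```

The pipeline additionally copies the OAuth app_id into related.user so client applications can be correlated alongside user principals.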

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • I have added an entry to my package's changelog.yml file.
  • I have verified that Kibana version constraints are current according to guidelines.
  • I have verified that any added dashboard complies with Kibana's Dashboard good practices. N/A, no dashboards added.

Author's Checklist

  • PII redacted from pipeline test fixtures (tenant IDs, user/session/request UUIDs, sign-in activity IDs, internal user-agents). Synthetic test values throughout.
  • elastic-package check passes; elastic-package test pipeline -d aadgraphactivitylogs passes.
  • End-to-end stack test: events POSTed to logs-azure.events-default are correctly rerouted to logs-azure.aadgraphactivitylogs-default with full ECS field extraction.

How to test this PR locally

cd packages/azure
elastic-package check
elastic-package stack up -d -v
elastic-package install
elastic-package test pipeline -d aadgraphactivitylogs

Optional end-to-end:

# Pipe the captured fixtures through the live events router
python3 -c "
import json
with open('data_stream/aadgraphactivitylogs/_dev/test/pipeline/test-aadgraph-activity.log') as f:
    for line in f:
        line = line.strip()
        if line:
            print(json.dumps({'create': {'_index': 'logs-azure.events-default'}}))
            print(json.dumps({'message': line}))
" > /tmp/aadgraph-bulk.ndjson

curl -sk -u user:pass -X POST \
  "https://localhost:9200/_bulk?refresh=wait_for" \
  -H "Content-Type: application/x-ndjson" \
  --data-binary @/tmp/aadgraph-bulk.ndjson

curl -sk -u user:pass  \
  "https://localhost:9200/logs-azure.aadgraphactivitylogs-default/_count"
# expected: {"count":4,...}
  1. Open Discover at https://localhost:5601 (user / pass).
  2. Select the logs-azure.aadgraphactivitylogs-* data view.
  3. Confirm that event.action, event.outcome, http.*, url.path, azure.aadgraphactivitylogs.properties.*, and related.user are all populated.

Related issues

Screenshots

Pipeline tests passing locally

Screenshot 2026-05-07 at 2 05 54 PM

Discover view of the new dataset with ECS-parsed events
Expanded document showing the full ECS field tree

Screenshot 2026-05-07 at 2 07 25 PM Screenshot 2026-05-07 at 2 09 42 PM

terrancedejesus and others added 2 commits May 7, 2026 14:23
Adds the azure.aadgraphactivitylogs data stream to ingest the
AzureADGraphActivityLogs diagnostic category from Microsoft Entra ID,
parallel to azure.graphactivitylogs for Microsoft Graph. Without this,
AAD Graph events fall through to azure.platformlogs and the
AAD-Graph-specific properties survive only inside event.original.
Contributor

@efd6 efd6 left a comment


Please update the proposed commit message so that it's something that can be used in context of git; no Markdown, appropriately wrapped etc.

For example based on the current code here (update as needed):

azure: add aadgraphactivitylogs data stream

Add a dedicated data stream for the AzureADGraphActivityLogs
diagnostic category from Microsoft Entra ID. Without this,
legacy Azure AD Graph (graph.windows.net) events fall through
to the platformlogs catch-all and lose schema-aware parsing.

The events router maps routing.category ==
"AzureADGraphActivityLogs" to the new dataset. The ingest
pipeline extracts ECS fields: event.action from HTTP method +
URI collection, event.outcome from response status,
event.category [iam, web], and related.user including the
OAuth app_id for client correlation.

Legacy AAD Graph is still actively used by Microsoft first-party
tooling, older line-of-business apps, and adversary tooling
(ROADtools, AzureHound v1, AADInternals). The dedicated dataset
makes these events available for detection rules and dashboards.

@terrancedejesus
Contributor Author

terrancedejesus commented May 8, 2026

@efd6 proposed commit message updated. Thank you!
Any specific labeling or additional checks?

@efd6
Contributor

efd6 commented May 8, 2026

The build is complaining:

Error: error validating packages in directory 'packages': error checking data streams from 'packages/azure': package "packages/azure" shares ownership across data streams but these ones [packages/azure/data_stream/aadgraphactivitylogs] lack owners

I think you will need to add a line before this. Who will be the owner of this data stream?

@terrancedejesus
Contributor Author

terrancedejesus commented May 8, 2026

Who will be the owner of this data stream?

Yes, I noticed the Buildkite failure related to owners. There are a few different owners across the Azure package data streams, so I am not sure which team should own and maintain this one. Since this is the legacy counterpart of the Microsoft Graph data stream and it was provisioned so that threat detection rules could be written against it, I assume we mirror that: @elastic/security-service-integrations?


@elastic-vault-github-plugin-prod

🚀 Benchmarks report

To see the full report comment with /test benchmark fullreport

@elasticmachine

💚 Build Succeeded

History

Contributor


This file shouldn't be here, because the log input isn't used. It should be removed for all data streams. If you don't want to include that in this PR, at least don't add the new one.

Contributor Author


Contributor


The list of data streams in the README should be updated.

If it's intended that this data stream has its own item in the integrations browser, separate from the full integration, as is the case for several of the other data streams, then there should be a policy template entry in the top level manifest, and an extra file of documentation in _dev/build/docs/.
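If the dedicated integrations-browser tile is wanted, the top-level manifest entry might look roughly like this. This is a hedged sketch: the key names, input type, and wording are assumptions modeled on the Azure package's existing entries, and the real entry should be copied from an existing one such as graphactivitylogs.

```yaml
# Illustrative policy template entry for packages/azure/manifest.yml;
# mirror an existing entry (e.g. graphactivitylogs) for the exact keys.
- name: aadgraphactivitylogs
  title: Azure AD Graph Activity Logs
  description: Collect AzureADGraphActivityLogs from Microsoft Entra ID.
  data_streams:
    - aadgraphactivitylogs
  categories:
    - security
  inputs:
    - type: azure-eventhub
      title: Collect AAD Graph activity logs from Event Hub
      description: Collect AAD Graph activity logs from Azure Event Hub.
```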

Contributor


This pipeline does match about half the other pipelines with the same name:

find -name 'azure-shared-pipeline.yml' | xargs md5sum | sort
28624170d9ba87d593c9aef7dd72284a  ./data_stream/application_gateway/elasticsearch/ingest_pipeline/azure-shared-pipeline.yml
28624170d9ba87d593c9aef7dd72284a  ./data_stream/firewall_logs/elasticsearch/ingest_pipeline/azure-shared-pipeline.yml
352b4a0232fcf818b45a174958b161e8  ./data_stream/identity_protection/elasticsearch/ingest_pipeline/azure-shared-pipeline.yml
352b4a0232fcf818b45a174958b161e8  ./data_stream/provisioning/elasticsearch/ingest_pipeline/azure-shared-pipeline.yml
8fb30aa0822189b990f17ba026aeb928  ./data_stream/platformlogs/elasticsearch/ingest_pipeline/azure-shared-pipeline.yml
d9d42c87fb050f4264f140fe5c4fddd0  ./data_stream/aadgraphactivitylogs/elasticsearch/ingest_pipeline/azure-shared-pipeline.yml
d9d42c87fb050f4264f140fe5c4fddd0  ./data_stream/activitylogs/elasticsearch/ingest_pipeline/azure-shared-pipeline.yml
d9d42c87fb050f4264f140fe5c4fddd0  ./data_stream/auditlogs/elasticsearch/ingest_pipeline/azure-shared-pipeline.yml
d9d42c87fb050f4264f140fe5c4fddd0  ./data_stream/eventhub/elasticsearch/ingest_pipeline/azure-shared-pipeline.yml
d9d42c87fb050f4264f140fe5c4fddd0  ./data_stream/graphactivitylogs/elasticsearch/ingest_pipeline/azure-shared-pipeline.yml
d9d42c87fb050f4264f140fe5c4fddd0  ./data_stream/signinlogs/elasticsearch/ingest_pipeline/azure-shared-pipeline.yml
d9d42c87fb050f4264f140fe5c4fddd0  ./data_stream/springcloudlogs/elasticsearch/ingest_pipeline/azure-shared-pipeline.yml

It's a good time to use the new links functionality, at least for the ones that do still match.

Comment on lines +4 to +7
dynamic_fields:
  # This can be removed after ES 8.14 is the minimum version.
  # Relates: https://github.com/elastic/elasticsearch/pull/105689
  url.extension: '^.*$'
Contributor


Looks like 8.19 is the minimum, so this comment's advice can be followed.

  type: keyword
  description: |
    Result signature.
- name: properties
Contributor


Are all the fields documented in https://learn.microsoft.com/en-us/azure/azure-monitor/reference/tables/aadgraphactivitylogs retained here or mapped to common fields outside of this prefix or to ECS fields?

Contributor


It would be helpful for future maintenance if that API doc was linked either in the README or in a comment in the ingest pipeline.

Contributor


The description in data_stream/events/manifest.yml, starting with

Collect all the supported (see list below) Azure logs from Event Hub to a target data stream.

New in version 1.20.0+: by enabling this integration, you can collect all the logs from the following Azure services and route them to the appropriate data stream:

should also be updated.

@github-actions
Contributor

TL;DR

Buildkite failed before tests ran because the PR checkout hook hit a GitHub remote 500 while fetching the target branch, so this is an infrastructure/transient fetch failure rather than a code regression in this PR. Re-run the build first; if it repeats, add retry/backoff around the target-branch git fetch in the hook.

Remediation

  • Re-run Buildkite build #42650 (same commit 30046ce36184e5916a2b5f69455b6c5b0fe976f6).
  • If this recurs, harden .buildkite/hooks/post-checkout by retrying git fetch -v origin "${target_branch}" (line 28) with bounded retries/backoff, then re-run Get reference from target branch.
Investigation details

Root Cause

The failing step is Get reference from target branch, which executes the repository post-checkout hook. The hook fetches the PR base branch in .buildkite/hooks/post-checkout and exited non-zero when GitHub returned HTTP 500 during git fetch.

Relevant code path:

  • .buildkite/hooks/post-checkout:28 runs git fetch -v origin "${target_branch}".
  • .buildkite/hooks/post-checkout:70 invokes checkout_merge, so this fetch is required for all PR builds.

The PR changes are focused on Azure package files/CODEOWNERS and do not modify Buildkite hooks, which supports this being infra/transient rather than a PR logic/config bug.

Evidence

remote: Internal Server Error
fatal: unable to access 'https://github.com/elastic/integrations.git/': The requested URL returned error: 500
🚨 Error: running "repository post-checkout" shell hook: The repository post-checkout hook exited with status 128

Verification

  • Not run locally (failure is in CI checkout/bootstrap phase before package checks/tests).

Follow-up

If the retry succeeds, no PR code change is needed. If repeated 500s continue across builds, treat as persistent CI infrastructure issue and add fetch retry logic in the hook.


What is this? | From workflow: PR Buildkite Detective




Development

Successfully merging this pull request may close these issues.

[Azure]: Extend Support to AAD Graph Activity Logs

4 participants