On a recent client engagement I had the opportunity to develop Datadog dashboards to monitor a production Microsoft Active Directory deployment and expose vital metrics relating to service health and capacity. Building on basic monitoring of the core infrastructure running these AD/DNS instances, we dived deeper to collect and present service-specific metrics, service statuses and Windows events relating to these functions.
This post covers the steps I followed to find the information needed to report on these functions, and then to configure the Datadog Agent to collect it.
Remember to install the “Windows Service” and “WMI” Datadog Integrations from the Datadog web UI.
Windows System Events
Windows system events are a core source of service information on Active Directory instances. Finding the right logs to monitor is a relatively straightforward process:
- Open Windows Event Viewer (run eventvwr from the command line)
- Open “Applications and Services Logs”
- Open each of the application logs of interest and select an event in order to locate the “Log Name” field. This value is required to configure the “Win32 Event Log” section in the Datadog Agent config
- The application logs required for AD/DNS monitoring are:
  - Active Directory Web Services
  - DFS Replication
  - Directory Service
  - DNS Server
- I tested with the following log levels enabled, but turned off “Information” after initial testing because Windows services can be very chatty:
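As a rough illustration of the steps above, a minimal sketch of a “Win32 Event Log” check configuration might look like the following. It assumes the legacy (WMI-based) form of the Datadog win32_event_log check, typically placed in win32_event_log.d/conf.yaml under the Agent's conf.d directory. The tag values are hypothetical, the levels shown are only an example rather than the exact set I used, and each log name should be copied verbatim from the “Log Name” field in Event Viewer.

```yaml
# Sketch only: assumes the legacy win32_event_log check options
# (log_file, type, tags). Verify against the conf.yaml.example
# shipped with your Agent version before using.
init_config:

instances:
  - log_file:
      # Values must match the "Log Name" field in Event Viewer exactly.
      - Directory Service
      - DNS Server
      - DFS Replication
    type:
      # "Information" is left out here because it proved too noisy.
      - Error
      - Warning
    tags:
      # Hypothetical tags, used to filter these events in dashboards.
      - role:active-directory
      - source:windows-event-log
```

After changing the configuration, restart the Datadog Agent so the check is reloaded, then confirm events are arriving in the Datadog event stream before building dashboard widgets on top of them.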
I also configured the Datadog agent to return particular high-value security, application and audit events, as per the following Datadog blog post: