Forum Discussion

hodgm's avatar
hodgm
Icon for Contributor rankContributor
3 months ago

Azure subscription monitoring max retries error

Hi all,

We are in the process of setting up SL1 to monitor a mix of Azure and on-prem VMWare devices.

The Azure subscription monitors repeatedly error with:
"Azure: Network Failure on App: ARMVMPerformance, Message: HTTPSConnectionPool(host='management.azure.com', port=443): Max retries exceeded with url:
Failed to establish a new connection: [Errno -2] Name or service not known',)),".

We have 14 subscriptions setup and some don't show the error at all, some show it weekly and 1 shows it every 15 minutes generating 1000's of critical alerts a month.

SL1 support/PS say it is down to a DNS issue: We have checked with the ENG Team and it looks like a random DNS issue where it can't resolve "management.azure.com" from the collector and it looks transient. The error message points to a failure to resolve the hostname when the config da’s are run.

However we use the same DNS server for all collectors and some subscriptions don't see the error. I can also do successful nslookups/pings/nmaps from the collectors.

Has anyone seen something like this before?

Thanks,

Mark

  • Further information on this issue for anyone experiencing the problem.
    It turns out I had 2 issues:

    - There was a DNS issue on with our collectors in Azure UK West, I spotted this when our on-prem exchange servers had the same issues contacting Azure. The DNS server forwarders were pointing to another internal DNS server that was decommissioned rather than a working internal (or external) DNS server

    - The second issue was a collector based in Azure UK South where DNS was working fine , muddying the water for troubleshooting. This issue was down to the Subscription being monitored missing part of the URL.
    So for a working subscription you would see it accessing https://management.azure.com/subscriptions/abcdefg12345 etc. The non working subscription in my case was missing a / and subscription so looks like: https://management.azure.comdefg12345
    I have checked the credential page, which is how the url is pulled together and it is being passed to the SaaS backend correctly.
    Support are still troubleshooting for me but I hope this helps. To check the URL check the summary tab of the subscription and the device logs section at the bottom

  • In our case we found our that NG firewalls were opening ssl traffic, and the list of source addresses were not complete, so looked like the correct EDL files + some additions were needed for FW. The connections went randomly to IP addresses that were not listed, so ssl certs were opened in FW and therefore DA, correctly, thought that there was in issue.