URL Monitoring

URL_Alerts

The Blackbox exporter monitors whether websites and web services are up and how long they take to respond. It sends a probe request to each configured URL, then reports metrics on availability and speed.

Metrics Overview

  1. probe_success

    • Description: Indicates whether the probe was successful (1 for success, 0 for failure).
    • Example: 1 (success), 0 (failure)
  2. probe_http_status_code

    • Description: The HTTP status code returned by the server (normally between 100 and 599; 0 if no HTTP response was received).
    • Example: 200, 404, 503
  3. probe_http_ssl

    • Description: Indicates whether SSL/TLS was used for the final request (1 for yes, 0 for no). The dashboard displays this as YES/NO.
    • Example: 1 (SSL enabled, shown as YES), 0 (SSL not enabled, shown as NO)
  4. probe_ssl_earliest_cert_expiry

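    • Description: The expiration time of the earliest-expiring SSL certificate in the chain, as a Unix timestamp (seconds since epoch). The dashboard shows the seconds remaining until that time.
    • Example: 7400972.36 (seconds remaining as shown on the dashboard, about 85.7 days)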
  5. probe_duration_seconds

    • Description: The time taken to complete the probe, in seconds.
    • Example: 0.2345 (Time in seconds)
  6. probe_dns_lookup_time_seconds

    • Description: The time taken for DNS lookup during the probe, in seconds.
    • Example: 0.0617 (Time in seconds)
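
To make the metric list above concrete, here is a minimal Python sketch that reads the text a probe returns and collects the unlabelled metrics into a dictionary. The function name parse_probe_metrics and the SAMPLE text are hypothetical: the sample is hand-written for illustration (the certificate expiry timestamp in particular is made up), and the exporter emits numeric values such as 1 and 0 rather than YES and NO.

```python
def parse_probe_metrics(text: str) -> dict[str, float]:
    """Collect unlabelled samples from Prometheus text-format output."""
    metrics: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition(" ")
        # Labelled series (name{label="..."}) are ignored in this sketch.
        if "{" in name or not value:
            continue
        metrics[name] = float(value.split()[0])
    return metrics


# Hand-written sample for illustration only; not captured from a real probe.
SAMPLE = """\
# TYPE probe_success gauge
probe_success 1
probe_http_status_code 200
probe_http_ssl 1
probe_duration_seconds 0.2345
probe_dns_lookup_time_seconds 0.0617
probe_ssl_earliest_cert_expiry 1.7356032e+09
"""

if __name__ == "__main__":
    for name, value in parse_probe_metrics(SAMPLE).items():
        print(f"{name} = {value}")
```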

Main Panels

Status

Shows if each URL is UP (reachable) or DOWN (not reachable).

HTTP Status Code

The response code, like 200 for success.

  • Successful responses ( 200 – 299 )
  • Redirection messages ( 300 – 399 )
  • Client error responses ( 400 – 499 )
  • Server error responses ( 500 – 599 )

SSL

Indicates whether the site is using SSL for secure connections, and shows how long until the SSL certificate expires.

Average Response Time

This panel shows how long a service takes to respond to requests. Short response times indicate good performance, while longer times may point to issues such as server overload or high traffic.

Average DNS Lookup

The Average DNS Lookup panel shows how long it takes to translate a domain name into an IP address. This step matters because a slow DNS lookup delays the loading of a website or service.

Current values and thresholds

| URL | Status | HTTP Status Code | SSL | SSL Expiration (sec) | Average Probe Duration | Average Probe Duration Warning | Average Probe Duration Critical | Avg DNS Lookup | Avg DNS Lookup Warning | Avg DNS Lookup Critical |
|---|---|---|---|---|---|---|---|---|---|---|
| https://api.rupid.in | UP | 200 | YES | 7400972.36 | 1002.56ms | 3000ms | 9000ms | 22.16ms | 200ms | 400ms |
| https://elitical.sayukth.com | UP | 200 | YES | 7401618.36 | 186.69ms | 3000ms | 9000ms | 49.82ms | 200ms | 400ms |
| https://sayukth.com | UP | 200 | YES | 7401681.36 | 422.11ms | 3000ms | 9000ms | 69.89ms | 200ms | 400ms |
| https://merusphere.in | UP | 200 | YES | 7401642.36 | 380.38ms | 3000ms | 9000ms | 17.85ms | 200ms | 400ms |
| https://merusphere.com | UP | 200 | YES | 7401634.36 | 723.36ms | 3000ms | 9000ms | 70.00ms | 200ms | 400ms |
| https://panchayatseva.com | UP | 200 | YES | 7401661.36 | 696.83ms | 3000ms | 9000ms | 61.69ms | 200ms | 400ms |
| https://erp.panchayatseva.com | UP | 200 | YES | 7401624.36 | 694.13ms | 3000ms | 9000ms | 77.11ms | 200ms | 400ms |
| https://demo-webapp.eagleeyeview.ai | UP | 200 | YES | 7403143.36 | 174.48ms | 3000ms | 9000ms | 21.24ms | 200ms | 400ms |
| https://app.eagleeyeview.ai | UP | 200 | YES | 7403112.36 | 156.96ms | 3000ms | 9000ms | 32.45ms | 200ms | 400ms |
| https://api.eagleeyeview.ai | UP | 200 | YES | 7403103.36 | 173.38ms | 3000ms | 9000ms | 24.79ms | 200ms | 400ms |
| https://demo-api.eagleeyeview.ai | UP | 200 | YES | 7403122.36 | 160.30ms | 3000ms | 9000ms | 37.51ms | 200ms | 400ms |
| https://demo-app.eagleeyeview.ai | UP | 200 | YES | 7403133.36 | 146.25ms | 3000ms | 9000ms | 21.66ms | 200ms | 400ms |
| https://webapp.eagleeyeview.ai | UP | 200 | YES | 7403165.36 | 159.70ms | 3000ms | 9000ms | 66.35ms | 200ms | 400ms |
| https://app.sb.panchayatseva.com | UP | 200 | YES | 3347117.36 | 329.80ms | 3000ms | 9000ms | 157.92ms | 200ms | 400ms |
| https://orbit.panchayatseva.com | UP | 200 | YES | 3347077.36 | 117.97ms | 3000ms | 9000ms | 47.49ms | 200ms | 400ms |
| https://api.sb.panchayatseva.com | UP | 200 | YES | 3347131.36 | 222.15ms | 3000ms | 9000ms | 82.56ms | 200ms | 400ms |
| https://apd.sb.panchayatseva.com | UP | 200 | YES | 3347145.36 | 433.52ms | 3000ms | 9000ms | 61.77ms | 200ms | 400ms |
| https://mdm.panchayatseva.com | UP | 200 | YES | 4161224.36 | 126.95ms | 3000ms | 9000ms | 103.56ms | 200ms | 400ms |
| https://artifacts.panchayatseva.in | UP | 200 | YES | 860132.36 | 94.27ms | 3000ms | 9000ms | 41.30ms | 200ms | 400ms |
| https://surveydoc.sb.panchayatseva.com | UP | 200 | YES | 3462083.36 | 267.22ms | 3000ms | 9000ms | 112.53ms | 200ms | 400ms |
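
The warning and critical columns above can be read as a simple comparison. The following Python sketch is one possible way to express it; classify_latency and the two constant names are hypothetical, and treating a value exactly equal to a threshold as the lower severity is an assumption, since the table does not say how boundary values are handled.

```python
# Thresholds taken from the table above, in milliseconds: (warning, critical).
PROBE_DURATION_THRESHOLDS_MS = (3000.0, 9000.0)
DNS_LOOKUP_THRESHOLDS_MS = (200.0, 400.0)


def classify_latency(value_ms: float, warning_ms: float, critical_ms: float) -> str:
    """Return OK, WARNING, or CRITICAL for a latency measured in milliseconds."""
    # Assumption: values strictly above a threshold trigger that severity.
    if value_ms > critical_ms:
        return "CRITICAL"
    if value_ms > warning_ms:
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    # Values for https://api.rupid.in from the table above.
    print(classify_latency(1002.56, *PROBE_DURATION_THRESHOLDS_MS))
    print(classify_latency(22.16, *DNS_LOOKUP_THRESHOLDS_MS))
```

Note that the exporter reports these metrics in seconds, so a value such as 0.2345 needs to be multiplied by 1000 before it is compared with these millisecond thresholds.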

Remedies for Warning and Critical states

1. URL Status (probe_success)

  • Warning:
    • If the URL keeps going up and down, check whether the network or server is having issues.
  • Critical:
    • If the URL stays down (probe_success = 0), make sure the service is running. Try restarting the service or server, and notify the DevOps or Development team if the problem continues.
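
One way to tell a URL that "keeps going up and down" from one that is "staying down" is to look at a short history of probe_success samples. The Python sketch below is an illustration only: url_state, count_transitions, and the default values flap_threshold=3 and down_for=3 are hypothetical and not taken from the dashboard configuration. A single isolated failure is treated as OK here, which is also an assumption.

```python
def count_transitions(samples: list[int]) -> int:
    """Count how often consecutive probe_success samples change value."""
    return sum(1 for prev, cur in zip(samples, samples[1:]) if prev != cur)


def url_state(samples: list[int], flap_threshold: int = 3, down_for: int = 3) -> str:
    """Classify a history of probe_success samples (oldest first)."""
    if not samples:
        return "UNKNOWN"
    # Staying down: the most recent `down_for` probes all failed.
    if len(samples) >= down_for and all(s == 0 for s in samples[-down_for:]):
        return "CRITICAL"
    # Flapping: the state changed at least `flap_threshold` times.
    if count_transitions(samples) >= flap_threshold:
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    print(url_state([1, 0, 1, 0, 1]))  # flapping
    print(url_state([1, 1, 0, 0, 0]))  # staying down
    print(url_state([1, 1, 1, 1, 1]))  # healthy
```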

2. HTTP Status Code (probe_http_status_code)

  • Warning:
    • 3xx Status Codes (Redirection): If you see 3xx codes, it means the URL is being redirected. Confirm if this is expected. If not, you may need to update the URL to avoid unnecessary redirects, which can slow down response times.
    • 4xx Status Codes (Client Errors): If there are 4xx codes, the URL might be incorrect or the page could be missing. Double-check the URL to ensure it’s accessible and correct. If the error continues, reach out to the team managing the content.
  • Critical:
    • 5xx Status Codes (Server Errors): If you’re getting 5xx codes, the server has an issue. First, check if the service is running properly and restart the server if needed. If the error persists, it could be a problem with the application itself, so notify the Development or DevOps team to investigate.
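
The mapping above from status code to severity can be sketched in a few lines of Python. The function name http_status_severity is hypothetical, and returning UNKNOWN for anything outside the 2xx to 5xx ranges (for example 0 when no response was received) is an assumption rather than something this page defines.

```python
def http_status_severity(code: int) -> str:
    """Map an HTTP status code to the severity used in this section."""
    if 200 <= code <= 299:
        return "OK"
    # 3xx (redirection) and 4xx (client errors) are warnings.
    if 300 <= code <= 499:
        return "WARNING"
    # 5xx (server errors) are critical.
    if 500 <= code <= 599:
        return "CRITICAL"
    # Assumption: anything else is left for a human to interpret.
    return "UNKNOWN"


if __name__ == "__main__":
    for code in (200, 301, 404, 503, 0):
        print(code, http_status_severity(code))
```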

3. SSL Status (probe_http_ssl)

  • Warning:

    • If SSL is suddenly disabled, review recent changes to see if SSL settings were affected. If SSL is supposed to be active, re-enable it.
  • Critical:

    • If SSL remains disabled, check the SSL configuration files. If it still cannot be enabled, escalate the issue to the DevOps team for further investigation and resolution.

4. SSL Expiration (probe_ssl_earliest_cert_expiry)

  • Warning:

    • If the SSL certificate expiration is approaching (e.g., within 10 or 5 days), start the renewal process and inform the relevant teams.
  • Critical:

    • If the SSL certificate is expired, prioritize renewing it immediately to avoid service interruptions and security issues.
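
The expiry check above comes down to simple arithmetic: subtract the current time from the certificate's expiry timestamp and convert the result to days. The Python sketch below assumes the metric value is a Unix timestamp and uses a 10-day warning window, following the example given above; cert_expiry_severity and its parameters are hypothetical names.

```python
import time
from typing import Optional

SECONDS_PER_DAY = 86_400


def cert_expiry_severity(
    expiry_epoch: float,
    now: Optional[float] = None,
    warning_days: float = 10.0,
) -> tuple[str, float]:
    """Return (severity, days_left) for a certificate expiry timestamp."""
    now = time.time() if now is None else now
    days_left = (expiry_epoch - now) / SECONDS_PER_DAY
    if days_left <= 0:
        return "CRITICAL", days_left  # certificate already expired
    if days_left <= warning_days:
        return "WARNING", days_left  # renewal should start
    return "OK", days_left


if __name__ == "__main__":
    now = time.time()
    # 7400972.36 seconds remaining, as shown for https://api.rupid.in.
    severity, days = cert_expiry_severity(now + 7400972.36, now=now)
    print(severity, round(days, 2))
```

Passing now explicitly keeps the function easy to test; in normal use it can be left out so that the current time is used.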

5. Average Probe Duration (probe_duration_seconds)

  • Warning:

    • If response times are between 3,000 ms and 9,000 ms, check if there’s a spike in traffic or any performance bottlenecks.
    • Investigate server resources or network issues that could be causing delays.
  • Critical:

    • For response times exceeding 9,000 ms, investigate server load, check for resource constraints (e.g., CPU, memory), and review application performance.
    • Restart the service if needed, or escalate to the corresponding team if the issue persists.

6. DNS Lookup Time (probe_dns_lookup_time_seconds)

  • Warning:

    • For DNS lookup times above 200 ms but below 400 ms, check the performance of the DNS server and review any potential network issues that could be affecting speed.
  • Critical:

    • For DNS lookup times above 400 ms, investigate the DNS provider’s status and consider switching to a faster, more responsive DNS server.
    • Monitor for network connectivity issues and address any problems that might be slowing down DNS resolution.

General Remedy Steps

  • Service-Based URLs:

    • Restart the service.
    • Check Jenkins jobs for any related automation processes.
    • Escalate to the appropriate team based on the HTTP status codes and the type of application (e.g., Development team for application issues, DevOps for infrastructure-related issues).
  • Non-Service URLs:

    • For web servers or other types of services, inform the Development or DevOps team based on the issue type (e.g., HTTP error codes or SSL expiration).
    • Ensure quick resolution to prevent further disruptions in service.