Blackbox Exporter

The Blackbox Exporter monitors the following services:

https://api.pod1.3kosh.in : Tomcat service

https://build.3kosh.in : Jenkins service

https://code.3kosh.in : SonarQube

https://prodmon.3kosh.in : Grafana service

https://prodmondb.3kosh.in : Prometheus service

https://prodmongw.3kosh.in : Pushgateway

For each service, the dashboard shows the SSL certificate expiry date, whether the service is up or down, the HTTP status code, the probe duration, and the DNS lookup time.
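As a rough illustration of how these services are probed, the sketch below builds the URL that Prometheus (or a person with a browser) would request from the exporter's /probe endpoint for each target. The exporter address (localhost on the default port 9115) and the http_2xx module name are assumptions; substitute whatever your deployment actually uses.

```python
from urllib.parse import urlencode

# Assumed exporter address; 9115 is the Blackbox Exporter's default port.
EXPORTER = "http://localhost:9115"

# The services listed above.
TARGETS = [
    "https://api.pod1.3kosh.in",
    "https://build.3kosh.in",
    "https://code.3kosh.in",
    "https://prodmon.3kosh.in",
    "https://prodmondb.3kosh.in",
    "https://prodmongw.3kosh.in",
]


def probe_url(target: str, module: str = "http_2xx", exporter: str = EXPORTER) -> str:
    """Return the exporter URL that probes `target` with `module`.

    `module` must match a module defined in your blackbox.yml;
    "http_2xx" is assumed here.
    """
    query = urlencode({"target": target, "module": module})
    return f"{exporter}/probe?{query}"


for target in TARGETS:
    print(probe_url(target))
```

Requesting one of the printed URLs returns the probe_* metrics for that target in the Prometheus text format.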

The Blackbox Exporter is a Prometheus exporter that allows you to probe endpoints over various network protocols (HTTP, HTTPS, TCP, ICMP, etc.) and collect metrics related to the connectivity and response times of those endpoints. In a Blackbox Exporter dashboard in Grafana, you can have several panels that provide insights into the probed endpoints. The common panels are described in the Grafana Dashboard Panels section.

Blackbox Exporter

The Blackbox Exporter is a Prometheus exporter that allows you to monitor the availability and performance of external services or endpoints. It supports HTTP, HTTPS, DNS, TCP, and ICMP probes.

The Blackbox Exporter is a valuable tool for monitoring your infrastructure, as it allows you to:

  • Monitor the availability of external services, such as APIs, websites, and databases.
  • Monitor the performance of external services, such as response times.
  • Identify potential problems with external services before they cause outages or performance problems for your applications.

The Blackbox Exporter is easy to use and configure. You can use it to monitor any external service reachable over a supported protocol, and you can probe your services as often as you need by adjusting the Prometheus scrape interval.

Metrics

  • probe_success: Reports whether the probe succeeded. It is a gauge with a value of 1 if the probe succeeded and 0 if it failed. Averaging it over time (for example with avg_over_time) gives the success rate, between 0 and 1.

  • blackbox_exporter_config_last_reload_successful: Reports whether the last configuration reload of the Blackbox Exporter succeeded. It is a gauge with a value of 1 if the last reload succeeded and 0 if it failed.

  • probe_duration_seconds: Reports how long the probe took to complete, in seconds.

  • probe_http_status_code: Exposes the HTTP status code of the response received when the Blackbox Exporter probes an HTTP endpoint. When applications consume third-party services, it is important to detect outages or unexpected response codes, and this metric provides that information. It is a gauge whose value is normally between 100 and 599; it is typically 0 when no HTTP response was received.

  • probe_ssl_earliest_cert_expiry: An expired certificate can break communication between your services. This metric provides the Unix timestamp, in seconds, at which the earliest-expiring certificate in the chain stops being valid.

  • probe_http_duration_seconds: Exposes the duration of each phase of an HTTP probe (resolve, connect, tls, processing, transfer), identified by the phase label. It is a gauge with non-negative values. High values can indicate problems with the HTTP endpoint being probed.

  • probe_icmp_duration_seconds: Exposes the duration of each phase of an ICMP probe (resolve, setup, rtt), identified by the phase label. High values can indicate network problems between the Blackbox Exporter and the target.

  • probe_http_redirects: Exposes the number of HTTP redirects followed during a probe. A high number of redirects can indicate that the endpoint is not configured correctly or that there is a problem with the network.
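To make the metrics above more concrete, here is a minimal sketch that parses a probe response in the Prometheus text format and pulls out a few of them. The sample text is hand-written for illustration, not captured from a real probe, and the parser is deliberately simplistic (it assumes no timestamps on the sample lines); the helper names are hypothetical.

```python
# Hand-written sample in the Prometheus text exposition format.
SAMPLE = """\
# HELP probe_success Displays whether or not the probe was a success
# TYPE probe_success gauge
probe_success 1
probe_http_status_code 200
probe_duration_seconds 0.153
probe_http_duration_seconds{phase="connect"} 0.021
probe_http_redirects 0
"""


def parse_samples(text: str) -> dict[str, float]:
    """Map each series (metric name plus labels) to its value.

    Comment lines (# HELP, # TYPE) and blank lines are skipped.
    """
    samples: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The value is the last whitespace-separated field on the line.
        series, value = line.rsplit(None, 1)
        samples[series] = float(value)
    return samples


def summarize_probe(samples: dict[str, float]) -> dict:
    """Pick out the values most dashboards show first."""
    return {
        "up": samples.get("probe_success") == 1,
        "status_code": int(samples.get("probe_http_status_code", 0)),
        "duration_seconds": samples.get("probe_duration_seconds"),
    }


print(summarize_probe(parse_samples(SAMPLE)))
```

For anything beyond a quick check, a maintained parser such as the one in the official Prometheus Python client is likely a better choice than this hand-rolled one.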


Grafana Dashboard for Each API

[Dashboard screenshots]


Grafana Dashboard Panels

Endpoint Availability Panel:

This panel displays the availability status of the probed endpoints. It indicates whether the endpoints are up or down based on the probe results. It can show a simple “Up/Down” status or provide more detailed information, such as the percentage of successful probes.
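The "percentage of successful probes" is simply the mean of the probe_success samples over the chosen window. A minimal sketch, assuming the samples have already been fetched as a list of 0/1 values (the function name is hypothetical):

```python
def availability_percent(samples: list[int]) -> float:
    """Return the share of successful probes as a percentage.

    `samples` holds probe_success values: 1 for success, 0 for failure.
    """
    if not samples:
        raise ValueError("no probe samples in the window")
    return 100.0 * sum(samples) / len(samples)


# Three successes out of four probes.
print(availability_percent([1, 1, 0, 1]))
```

In Grafana the same figure is usually computed in PromQL, for example by averaging probe_success over a time range and multiplying by 100.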

Response Time Panel:

This panel shows the response time metrics of the probed endpoints. It provides information about how long it takes for the endpoint to respond to the probe requests. It may include metrics like average response time, minimum response time, maximum response time, and response time percentiles.
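The statistics named above can be sketched as follows, assuming a list of probe_duration_seconds samples is already at hand. The percentile uses the simple nearest-rank method, which may differ slightly from what Grafana or PromQL computes; the function names are hypothetical.

```python
import math
import statistics


def percentile(ordered: list[float], p: int) -> float:
    """Nearest-rank percentile of an ascending list, for 0 < p <= 100."""
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def summarize_durations(durations: list[float]) -> dict[str, float]:
    """Return average, minimum, maximum, and 95th percentile in seconds."""
    ordered = sorted(durations)
    return {
        "avg": statistics.fmean(ordered),
        "min": ordered[0],
        "max": ordered[-1],
        "p95": percentile(ordered, 95),
    }


print(summarize_durations([0.10, 0.12, 0.11, 0.50, 0.13]))
```

A single slow probe (0.50 s here) barely moves the average but dominates the maximum and the high percentiles, which is why dashboards often show both.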

Status Codes Panel:

If the Blackbox Exporter is probing HTTP or HTTPS endpoints, this panel displays the distribution of HTTP status codes received from the endpoints. It can show the count or percentage of responses for each status code category (1xx, 2xx, 3xx, 4xx, 5xx, etc.). Monitoring status codes helps identify any errors or issues returned by the endpoints.
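Grouping codes into categories can be sketched like this. The input list is invented for illustration, and the "other" bucket is an assumption meant to cover values outside 100-599, such as the 0 typically reported when no HTTP response was received.

```python
from collections import Counter


def status_class(code: int) -> str:
    """Return the category of an HTTP status code, e.g. 404 -> '4xx'."""
    if 100 <= code <= 599:
        return f"{code // 100}xx"
    return "other"  # e.g. 0 when the probe got no HTTP response


def is_http_failure(code: int) -> bool:
    """True when the code is outside the 200-399 range."""
    return code <= 199 or code >= 400


# Invented probe_http_status_code samples.
codes = [200, 200, 301, 404, 503, 0]
print(Counter(status_class(c) for c in codes))
print([c for c in codes if is_http_failure(c)])
```

The is_http_failure check mirrors the condition used by the BlackboxProbeHttpFailure alert rule in this document.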

4xx Client Error Responses

400 Bad Request: The server could not understand the request due to invalid syntax.

  • This might happen if there’s an issue with the configuration files (e.g., pom.xml for Maven, build.gradle for Gradle) or if the API request to SonarQube or Jenkins has incorrect parameters.

401 Unauthorized: The client must authenticate itself to get the requested response.

  • Commonly seen when there are issues with authentication credentials in Jenkins, SonarQube, or when accessing secured resources on Tomcat.

403 Forbidden: The client does not have access rights to the content.

  • This can occur if the user does not have the necessary permissions to perform an action on Jenkins, SonarQube, or when deploying to Tomcat.

404 Not Found: The server cannot find the requested resource.

  • Occurs if a specified resource (such as a Jenkins job, Maven repository, or SonarQube project) does not exist or if the URL is incorrect.

405 Method Not Allowed: The request method is known by the server but has been disabled and cannot be used.

  • May happen if an inappropriate HTTP method is used in API calls to Jenkins or SonarQube.

408 Request Timeout: The server timed out waiting for the client to finish sending the request and would like to shut down the unused connection.

  • Can occur when a client is slow to send its request, for example when a large upload stalls over a slow network connection.

409 Conflict: The request could not be completed due to a conflict with the current state of the target resource.

  • Can happen in Jenkins when trying to create a job that already exists or in SonarQube when there’s a conflict in project configurations.

413 Payload Too Large: The request entity is larger than limits defined by the server.

  • Seen when uploading large files or artifacts that exceed server limits in Jenkins, SonarQube, or Tomcat.

5xx Server Error Responses

500 Internal Server Error: The server has encountered a situation it doesn’t know how to handle.

  • Common in all tools due to unhandled exceptions, misconfigurations, or bugs in the server-side code.

501 Not Implemented: The request method is not supported by the server and cannot be handled.

  • Rare but can occur if a feature is not implemented in the server.

502 Bad Gateway: The server, while acting as a gateway or proxy, received an invalid response from the upstream server.

  • Seen when a reverse proxy in front of Jenkins, SonarQube, or Tomcat receives an invalid response from the service behind it.

503 Service Unavailable: The server is not ready to handle the request.

  • Occurs during server maintenance, overload, or if the service (Jenkins, SonarQube, Tomcat) is temporarily down.

504 Gateway Timeout: The server is acting as a gateway and cannot get a response in time.

  • Can happen if Jenkins or SonarQube is taking too long to respond due to heavy load or network issues.

505 HTTP Version Not Supported: The HTTP version used in the request is not supported by the server.

  • This might occur if there’s a mismatch in the HTTP version supported by the client and server.

DNS Resolution Panel:

If the Blackbox Exporter is probing endpoints using DNS, this panel provides insights into DNS resolution. It can show metrics like DNS resolution time, success rate, and any errors encountered during resolution.

TCP Connectivity Panel:

If the Blackbox Exporter is probing TCP endpoints, this panel displays the TCP connectivity status. It indicates whether the TCP connection was successful or if there were any errors or timeouts.

ICMP Response Panel:

If the Blackbox Exporter is probing endpoints using ICMP (ping), this panel shows the ICMP response metrics. It may include metrics like average round-trip time (RTT), packet loss percentage, and any errors encountered during the ICMP probes.

Custom Panels:

Depending on your monitoring requirements, you can create custom panels to display any specific metrics or information collected by the Blackbox Exporter. This allows you to monitor and visualize additional aspects of the probed endpoints, such as SSL certificate expiration, specific headers in HTTP responses, or any other custom probes you have configured.

These are some examples of panels you may find in a Blackbox Exporter dashboard in Grafana. The actual panels and their configurations may vary depending on your specific monitoring needs, the metrics collected by the Blackbox Exporter, and the protocols and probes you have configured. Grafana provides flexibility to customise and create panels based on the metrics you want to monitor from the probed endpoints.


Alert rules

  • alert: BlackboxProbeFailed

    • expr: probe_success == 0
    • summary: Blackbox probe failed
  • alert: BlackboxConfigurationReloadFailure

    • expr: blackbox_exporter_config_last_reload_successful != 1
    • description: Blackbox configuration reload failure VALUE = {{ $value }}
    • summary: Blackbox configuration reload failure
  • alert: BlackboxSlowProbe

    • expr: avg_over_time(probe_duration_seconds[1m]) > 1
    • description: Blackbox probe took more than 1s to complete VALUE = {{ $value }}
    • summary: Blackbox slow probe (instance {{ $labels.instance }})
  • alert: BlackboxProbeHttpFailure

    • expr: probe_http_status_code <= 199 or probe_http_status_code >= 400
    • description: HTTP status code is not 200-399 VALUE = {{ $value }}
    • summary: Blackbox probe HTTP failure
  • alert: BlackboxSslCertificateWillExpireSoon

    • expr: 3 <= round((last_over_time(probe_ssl_earliest_cert_expiry[10m]) - time()) / 86400, 0.1) < 20
    • description: SSL certificate expires in less than 20 days
    • summary: Blackbox SSL certificate will expire soon
  • alert: BlackboxSslCertificateWillExpireSoon

    • expr: 0 <= round((last_over_time(probe_ssl_earliest_cert_expiry[10m]) - time()) / 86400, 0.1) < 3
    • description: SSL certificate expires in less than 3 days
    • summary: Blackbox SSL certificate will expire soon
  • alert: BlackboxSslCertificateExpired

    • expr: round((last_over_time(probe_ssl_earliest_cert_expiry[10m]) - time()) / 86400, 0.1) < 0
    • description: SSL certificate has expired already VALUE = {{ $value }}
    • summary: Blackbox SSL certificate expired
  • alert: BlackboxProbeSlowHttp

    • expr: avg_over_time(probe_http_duration_seconds[1m]) > 1
    • description: HTTP request took more than 1s VALUE = {{ $value }}
    • summary: Blackbox probe slow HTTP
  • alert: BlackboxProbeSlowPing

    • expr: avg_over_time(probe_icmp_duration_seconds[1m]) > 1
    • description: Blackbox ping took more than 1s VALUE = {{ $value }}
    • summary: Blackbox probe slow ping
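
The three SSL certificate rules share one calculation: the expiry timestamp minus the current time, divided by 86400 seconds per day, rounded to the nearest 0.1 day. The sketch below reproduces that arithmetic outside Prometheus so the thresholds are easier to follow. It is only an approximation of the rules' behavior (it ignores last_over_time), and the returned labels are illustrative, not values Prometheus produces.

```python
import math

SECONDS_PER_DAY = 86400


def days_until_expiry(expiry_ts: float, now: float) -> float:
    """Days until expiry, rounded to the nearest 0.1 like round(v, 0.1)."""
    days = (expiry_ts - now) / SECONDS_PER_DAY
    # Round to the nearest multiple of 0.1, with ties rounded up.
    return math.floor(days * 10 + 0.5) / 10


def classify_certificate(expiry_ts: float, now: float) -> str:
    """Return which of the three SSL rules (if any) would match."""
    days = days_until_expiry(expiry_ts, now)
    if days < 0:
        return "expired"
    if days < 3:
        return "expires in less than 3 days"
    if days < 20:
        return "expires in less than 20 days"
    return "ok"


now = 1_700_000_000  # arbitrary example timestamp
print(classify_certificate(now + 10 * SECONDS_PER_DAY, now))
```

Because the ranges 0 to 3 and 3 to 20 do not overlap, at most one of the two "will expire soon" rules matches a given certificate at any time.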