Guide to Building Network Monitoring Services for Web3 Apps
If you’ve ever seen hashtags like #WhatsAppIsDown or #GithubIsDown trending, chances are you’ve experienced a service outage or heard complaints from people who have. These are times when services are unavailable for various reasons. To manage this, companies often create service monitoring pages that report outages and provide latency and uptime information, helping users understand the reliability and health of their services.
In this guide, we’ll walk you through creating a network monitoring page for Shardeum, which monitors our core services such as our archive server, JSON RPC server, monitor server, documentation, explorer, and website. By the end, you’ll learn the decisions we made, the approaches we took, and the strategies we employed to create our service monitoring pages.
You can follow the setup instructions here to build your own service monitoring page for blockchain applications, APIs, node servers, WebSockets, and pretty much anything else. Additionally, we built this project live on our BIP livestream, which you can watch now in Shardeum’s YouTube playlist.
Before diving into the details, let’s look at the high-level architecture. The diagram below shows how data flows from our Node server into Prometheus and on to a GUI or data visualization tool like Grafana.
Prometheus is a time-series database we use to store network request data. To set up Prometheus on your local machine, download the Prometheus binary for your platform and edit the `prometheus.yml` file at the root of the extracted folder as follows:

```yaml
global:
  scrape_interval: 20s

scrape_configs:
  - job_name: 'shardeum_services'
    static_configs:
      - targets: ['127.0.0.1:3002']
```

This configuration instructs Prometheus to scrape your server on port 3002 (at its `/metrics` endpoint, the default path) every 20 seconds. To run Prometheus, run the executable at the root of the Prometheus folder:

```bash
./prometheus  # path to the executable file
```

The Prometheus web UI is then available at http://localhost:9090, where you can confirm under Status → Targets that your scrape target is up.
With Prometheus correctly set up and running locally, it’s time to set up a Node server to make network requests to our services and feed the results to Prometheus.
According to our desired architecture, we want to make network requests to our individual services at intervals, and feed the result of those requests to our Prometheus database. Follow these steps to set up a Node server on your local machine for this purpose:
1. Create a new directory for your project and navigate into it:

```bash
mkdir shardeum-network-status
cd shardeum-network-status
mkdir backend
cd backend
```

2. Initialize a new Node.js project:

```bash
npm init -y
```

This creates a `package.json` file at the root of the directory where you ran the command. Next, install the necessary dependencies:

```bash
npm install axios express typescript nodemon @types/express prom-client ts-node
```

3. Create an entry file `app.ts` and add the following code:
```typescript
import { Request, Response } from 'express';

const express = require('express');
const prometheus = require('prom-client');

const app = express();
const PORT = process.env.PORT || 3002;

// Expose collected metrics in the Prometheus exposition format
app.get('/metrics', async (req: Request, res: Response) => {
  const metrics = await prometheus.register.metrics();
  res.set('Content-Type', prometheus.register.contentType);
  res.send(metrics);
});

app.get('/', async (req: Request, res: Response) => {
  res.send('Hello World!');
});

app.listen(PORT, () => {
  console.log('Server started on PORT ' + PORT);
});
```
This sets up a simple Node.js server using Express and exposes it on port 3002. It also sets up a Prometheus client to expose the application metrics on the `/metrics` endpoint. To run the server, update the `start` script in the `package.json` file:

```json
"scripts": {
  "start": "npx tsc && nodemon build/app.js"
}
```
4. Create a `tsconfig.json` file in the backend directory with the following content:

```json
{
  "compilerOptions": {
    "target": "es2016",
    "module": "commonjs",
    "rootDir": "./",
    "outDir": "build",
    "esModuleInterop": true,
    "forceConsistentCasingInFileNames": true,
    "strict": true,
    "skipLibCheck": true
  }
}
```
Lastly, start the server with:

```bash
npm start
```

You should see `Server started on PORT 3002` in the console, and visiting port 3002 in the browser should return the `Hello World!` response we defined on the home route.
With our Node server setup complete, let’s now dive into the core functionality: monitoring our services. We’ll be creating a script that periodically checks the status of various services and reports their health to Prometheus.
Create a new file named `servicecheck.ts` in a `services` directory. This file will contain the logic for checking the health of each service and updating Prometheus with the status. First, import the necessary dependencies: `prom-client` for Prometheus metrics and `axios` for making HTTP requests:

```typescript
const prometheus = require('prom-client');
const axios = require('axios');
```
Next, we define interfaces for our configuration types. `Group` represents a collection of endpoints, and `Endpoint` defines the structure of an individual service endpoint:

```typescript
interface Group {
  group: string;
  servers: Endpoint[];
}

interface Endpoint {
  url: string;
  name: string;
  help: string;
  // Optional JSON-RPC request body for services checked via POST
  body?: {
    jsonrpc: string;
    method: string;
    params: any[];
    id: number;
  };
  expectedResponse: any;
}
```
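To make the shape concrete, here is an illustrative `Endpoint` value for a JSON-RPC health check. The URL, method, and expected fields are placeholders for the example, not values from our actual configuration:

```typescript
// Illustrative only: a JSON-RPC service checked via POST.
// All values here (URL, method, expected fields) are placeholders.
const rpcEndpoint: Endpoint = {
  url: 'http://localhost:8080',
  name: 'JSON RPC',
  help: 'Example JSON RPC server',
  body: { jsonrpc: '2.0', method: 'eth_blockNumber', params: [], id: 1 },
  expectedResponse: { jsonrpc: '2.0' },
};
```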
With the preliminary setup done, we can now create the `HealthChecker` class to manage our health checks and all the underlying logic:

```typescript
class HealthChecker {
  static instance: HealthChecker;
  endpoints: (Group | Endpoint)[] = [];
  serverHealth: any;
  serviceHealthGauge: any;
  checkingStatus = new Map();

  constructor() {
    // Singleton guard: reuse the existing instance if one was already created
    if (HealthChecker.instance) {
      return HealthChecker.instance;
    }
    const loadedEndpoints = require('../../endpoints.json');
    this.endpoints = this.flattenEndpoints(loadedEndpoints.urls);
    this.serviceHealthGauge = new prometheus.Gauge({
      name: 'Shardeum',
      help: 'Current health status of services (1 = online, 0 = offline)',
      labelNames: ['name', 'duration', 'timestamp'],
    });
    HealthChecker.instance = this;
    this.startPeriodicChecks();
  }

  flattenEndpoints(urls: any[]) {
    // If an endpoint is part of a group, it is flattened so that each service can be checked individually.
    return urls.flatMap(endpoint => 'servers' in endpoint ? endpoint.servers : [endpoint]);
  }

  // ...
}
```
The constructor initializes the class. It sets up a singleton pattern to ensure only one instance exists, loads endpoints from a configuration file, initializes the Prometheus gauge, and starts the periodic checks.
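As a quick illustration of the singleton guard (hypothetical usage, not part of the service):

```typescript
// Both constructor calls return the same underlying instance,
// so the endpoints are loaded and the gauge registered only once.
const a = new HealthChecker();
const b = new HealthChecker();
console.log(a === b); // true
```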
The `flattenEndpoints` method processes nested groups of endpoints, ensuring that each service can be checked individually, in line with the structure of our `endpoints.json` file:
```json
{
  "urls": [
    {
      "group": "Archive Servers",
      "servers": [
        {
          "url": "http://172.105.153.160:4000/nodelist",
          "name": "Archiver 1",
          "help": "This is the first Archiver server",
          "expectedResponse": {
            "nodeList": [{}, {}],
            "sign": {
              "owner": "7af699dd711074eb96a8xxx"
            }
          }
        }
      ]
    },
    {
      "url": "https://explorer-sphinx.shardeum.org/",
      "name": "Explorer",
      "help": "This is the Shardeum Explorer",
      "expectedResponse": "The Shardeum Betanet Explorer"
    }
  ]
}
```
Ideally, this is where you define the structure of all the services you want to monitor. In your case, it might be a group of multiple servers, APIs, WebSockets, or good ol’ web apps.
With that out of the way, we can proceed to the `checkService` method, which performs the health check for a given service. It measures the response time, checks the response status, and verifies that the response matches the expected value.
```typescript
async checkService(service: Endpoint) {
  this.checkingStatus.set(service.name, true);
  const startTime = Date.now();
  try {
    // POST when a JSON-RPC body is defined, otherwise a plain GET
    const response = service.body
      ? await axios.post(service.url, service.body)
      : await axios.get(service.url);
    const statusOk = response.status >= 200 && response.status < 300;

    let isExpectedResponseIncluded = false;
    if (typeof service.expectedResponse === 'string') {
      const responseString = response.data.toString().toLowerCase().trim();
      const expectedResponseString = service.expectedResponse.toString().toLowerCase().trim();
      isExpectedResponseIncluded = responseString.includes(expectedResponseString);
    } else {
      isExpectedResponseIncluded = this.checkJsonResponse(response.data, service.expectedResponse);
    }

    const duration = (Date.now() - startTime) / 1000;
    this.serviceHealthGauge.set(
      { name: service.name, duration, timestamp: Date.now() },
      statusOk && isExpectedResponseIncluded ? 1 : 0
    );
    console.log(`Service ${service.name} is ${statusOk && isExpectedResponseIncluded ? 'healthy' : 'unhealthy'}`);
  } catch (error) {
    // Any network or HTTP error marks the service as offline
    const duration = (Date.now() - startTime) / 1000;
    this.serviceHealthGauge.set({ name: service.name, duration, timestamp: Date.now() }, 0);
    console.log(`Service ${service.name} is unhealthy`);
  } finally {
    this.checkingStatus.set(service.name, false);
  }
}
```
You may notice that we validate the response from each service by calling a `checkJsonResponse` method. This is only necessary because of the structure of our JSON file and may not be needed if your data is structured differently.
```typescript
checkJsonResponse(response: any, expectedResponse: any) {
  for (const key of Object.keys(expectedResponse)) {
    if (!response.hasOwnProperty(key)) {
      return false;
    }
    if (typeof expectedResponse[key] === 'object' && !Array.isArray(expectedResponse[key])) {
      // Nested objects are compared recursively
      if (!this.checkJsonResponse(response[key], expectedResponse[key])) {
        return false;
      }
    } else if (Array.isArray(expectedResponse[key])) {
      // Arrays only need to exist and be non-empty
      if (!Array.isArray(response[key]) || response[key].length === 0) {
        return false;
      }
    } else {
      // Primitives must match exactly
      if (response[key] !== expectedResponse[key]) {
        return false;
      }
    }
  }
  return true;
}
```
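To illustrate the matching rules, here is a hypothetical call with made-up data (assuming `checker` is an instance of `HealthChecker`):

```typescript
// Objects are compared recursively, arrays only need to be non-empty,
// and primitives must match exactly; extra fields in the response are ignored.
const expected = { sign: { owner: 'abc123' }, nodeList: [{}] };
const actual = { sign: { owner: 'abc123' }, nodeList: [{ id: 1 }], extraField: true };
checker.checkJsonResponse(actual, expected); // true
```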
Next, we want to ensure that individual service checks do not exceed a specific threshold. This is important because we do not want a pending service check to delay the rest of the checks and overload the network. The timeout lets us stop waiting on any service check that takes longer than 2 minutes, so the next check cycle starts without pending checks in progress.
```typescript
async checkServiceWithTimeout(service: Endpoint, timeout = 120000) {
  // Reject if the check has not settled within the timeout window
  const timeoutPromise = new Promise((_, reject) =>
    setTimeout(() => reject(new Error('Service check timed out')), timeout)
  );
  try {
    await Promise.race([this.checkService(service), timeoutPromise]);
  } catch (error) {
    console.error(`Error or timeout checking service: ${service.name}`, error);
  }
}
```
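One caveat: `Promise.race` only stops us waiting; the underlying HTTP request keeps running until it settles on its own. If you want the request itself to abort, axios accepts a `timeout` option, so a variant of the request line in `checkService` could look like this (a sketch, not what our code above does):

```typescript
// Sketch: let axios abort the request itself after 120s; the existing
// catch block in checkService would then treat it as a failed check.
const response = service.body
  ? await axios.post(service.url, service.body, { timeout: 120000 })
  : await axios.get(service.url, { timeout: 120000 });
```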
Next, we kick off the service checks, iterating through all the endpoints from our `endpoints.json` file and initiating a health check for each service that is not already being checked:
```typescript
async runChecks() {
  this.endpoints.forEach((service: any) => {
    // Skip services that still have a check in progress
    if (!this.checkingStatus.get(service.name)) {
      this.checkServiceWithTimeout(service);
    }
  });
}
```
And finally, we create a `startPeriodicChecks` method to initiate the periodic health checks, running every 2 minutes:
```typescript
  startPeriodicChecks() {
    this.runChecks();
    setInterval(() => this.runChecks(), 120000); // Run checks every 2 minutes
  }
} // end of the HealthChecker class
```
With this, we can then export the `HealthChecker` class to be used in our Node server.
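For example, a CommonJS export at the bottom of `servicecheck.ts`, matching the `require` call we use in `app.ts` below:

```typescript
// Export the class so app.ts can require() it
module.exports = HealthChecker;
```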
To wrap up, we head back to the `app.ts` file and update it like so:
```typescript
import { Request, Response } from 'express';

const express = require('express');
const prometheus = require('prom-client');
const HealthChecker = require('./services/servicecheck');

const app = express();
// Instantiating the checker kicks off the periodic health checks
const healthChecker = new HealthChecker();
const PORT = process.env.PORT || 3002;

app.get('/metrics', async (req: Request, res: Response) => {
  const metrics = await prometheus.register.metrics();
  res.set('Content-Type', prometheus.register.contentType);
  res.send(metrics);
});

app.listen(PORT, () => {
  console.log('Server started on PORT ' + PORT);
});
```
Here, we import our service check class and initialize a new instance of it so that it executes when the server starts. It performs the service checks and feeds the results back to Prometheus, which collects the metrics for us.
With these modifications, when we run the Node server again, it will kick off all the expected processes and we should see the relevant logs in the console.
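Once a check cycle has completed, the `/metrics` endpoint should expose our gauge in the Prometheus text format, along these lines (illustrative values only):

```text
# HELP Shardeum Current health status of services (1 = online, 0 = offline)
# TYPE Shardeum gauge
Shardeum{name="Explorer",duration="0.42",timestamp="1700000000000"} 1
Shardeum{name="Archiver 1",duration="1.07",timestamp="1700000000000"} 0
```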
You can now use Prometheus to store service check data and query it using visualization tools like Grafana or build a frontend app to display the data to users, as we’ve done here.
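If you go the frontend route, Prometheus exposes an HTTP query API at `/api/v1/query` that you can call directly. Here is a minimal sketch, assuming Prometheus runs on localhost:9090 and the gauge is named `Shardeum` as above:

```typescript
const axios = require('axios');

// Query the instant value of the Shardeum gauge for every labeled service.
async function fetchServiceHealth() {
  const res = await axios.get('http://localhost:9090/api/v1/query', {
    params: { query: 'Shardeum' },
  });
  // Each result carries the labels we set plus the current value ("1" or "0")
  for (const r of res.data.data.result) {
    console.log(`${r.metric.name}: ${r.value[1] === '1' ? 'online' : 'offline'}`);
  }
}

fetchServiceHealth();
```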