Guide to Building Network Monitoring Services for Web3 Apps

Learn how to build a network monitoring service for your blockchain applications, APIs, servers, etc. featuring a practical example with...

If you’ve ever seen hashtags like #WhatsAppIsDown or #GithubIsDown, chances are you’ve experienced a service outage or heard complaints from people who have. These are times when services are unavailable for various reasons. To manage this, companies often create service monitoring pages that report outages and provide latency and uptime information. This helps users understand the reliability and health of their services.

In this guide, we’ll walk you through creating a network monitoring page for Shardeum, which monitors our core services such as our archive server, JSON RPC server, monitor server, documentation, explorer, and website. By the end, you’ll learn the decisions we made, the approaches we took, and the strategies we employed to create our service monitoring pages.

You can follow the setup instructions here to build your own service monitoring page for blockchain applications, APIs, Node servers, WebSockets, and pretty much anything else. Additionally, we built this project live on our BIP livestream, which you can watch now in Shardeum’s YouTube playlist.

High-Level Architecture

Before diving into the details, let’s look at the high-level architecture. This diagram shows how data flows from our Node server into Prometheus and on to a GUI (or a data visualization tool like Grafana).

Building a Network Monitoring Service for Shardeum: Tutorial

Set Up Prometheus

Prometheus is a time-series database we use to store network request data. To set up Prometheus on your local machine:

  1. Go to the Prometheus website.
  2. Download the appropriate binary for your system.
  3. Unzip the file and open the extracted Prometheus folder in your code editor.
  4. Edit the prometheus.yml file as follows:
global:
  scrape_interval: 20s

scrape_configs:
  - job_name: 'shardeum_services'
    static_configs:
      - targets: ['127.0.0.1:3002']

This configuration instructs Prometheus to scrape your server on port 3002 every 20 seconds (by default, it requests the /metrics path). To start Prometheus, run the executable from the root of the extracted Prometheus folder:

./prometheus   # run from the folder that contains prometheus.yml

With Prometheus correctly set up and running locally, it’s time to set up a Node server to make network requests to our services and feed the results to Prometheus.

Set Up a Node.js Server

According to our desired architecture, we want to make network requests to our individual services at intervals, and feed the result of those requests to our Prometheus database. Follow these steps to set up a Node server on your local machine for this purpose:

1. Create a new directory for your project and navigate into it

mkdir shardeum-network-status
cd shardeum-network-status
mkdir backend
cd backend

2. Initialize a new Node.js project:

npm init -y

This creates a package.json file at the root of the directory where you ran the command. Next, install the necessary dependencies:

npm install axios express typescript nodemon @types/express prom-client ts-node

3. Create an entry file app.ts and add the following code:

import { Request, Response } from 'express';
const express = require('express');
const prometheus = require('prom-client');
const app = express();

const PORT = process.env.PORT || 3002;

app.get('/metrics', async (req: Request, res: Response) => {
  const metrics = await prometheus.register.metrics();
  res.set('Content-Type', prometheus.register.contentType);
  res.send(metrics);
});

app.get('/', async (req: Request, res: Response) => {
  res.send("Hello World!");
});

app.listen(PORT, () => {
  console.log('Server started on PORT ' + PORT);
});

This sets up a simple Node.js server using Express and exposes it on port 3002. It also sets up a Prometheus client to expose the application metrics on the /metrics endpoint. To run the server, update the start script in the package.json file:

"scripts": {
  "start": "npx tsc && nodemon build/app.js"
}

4. Create a tsconfig.json file in the backend directory with the following content:

{
  "compilerOptions": {
    "target": "es2019",
    "module": "commonjs",
    "rootDir": "./",
    "outDir": "build",
    "esModuleInterop": true,
    "forceConsistentCasingInFileNames": true,
    "strict": true,
    "skipLibCheck": true
  }
}

Lastly, start the server with:

npm start

If we visit http://localhost:3002 in the browser, we should get the Hello World response we defined on the home route.

Building the Monitoring Functionality

With our Node server setup complete, let’s now dive into the core functionality: monitoring our services. We’ll be creating a script that periodically checks the status of various services and reports their health to Prometheus.

Create a new file named servicecheck.ts in a services directory inside backend. This file will contain the logic for checking the health of each service and updating Prometheus with the status. First, import the dependencies the file needs: prom-client for Prometheus metrics and axios for making HTTP requests.

const prometheus = require('prom-client');
const axios = require('axios');

Next, we define interfaces for our type configuration. Group represents a collection of endpoints, and Endpoint defines the structure of an individual service endpoint.

interface Group {
    group: string;
    servers: Endpoint[];
}

interface Endpoint {
    url: string;
    name: string;
    help: string;
    body?: {
        jsonrpc: string;
        method: string;
        params: any[];
        id: number;
    };
    expectedResponse: any;
}
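
The optional body field is what lets an entry describe a JSON-RPC service. As a rough illustration, an entry for a JSON-RPC server might look like the sketch below. The URL is a placeholder, and the choice of eth_chainId as the probe method is an assumption made for this example, not something taken from our configuration.

```typescript
// Mirrors the Endpoint interface defined above.
interface Endpoint {
    url: string;
    name: string;
    help: string;
    body?: {
        jsonrpc: string;
        method: string;
        params: any[];
        id: number;
    };
    expectedResponse: any;
}

// Hypothetical entry: the URL is a placeholder and eth_chainId is an assumed probe method.
const rpcEndpoint: Endpoint = {
    url: 'https://rpc.example.com',
    name: 'JSON-RPC Server',
    help: 'Illustrative JSON-RPC endpoint entry',
    body: { jsonrpc: '2.0', method: 'eth_chainId', params: [], id: 1 },
    // A JSON-RPC 2.0 response echoes the protocol version and the request id.
    expectedResponse: { jsonrpc: '2.0', id: 1 },
};

// Entries that carry a body are sent as POST requests; the rest are sent as GET.
const httpMethod = rpcEndpoint.body ? 'POST' : 'GET';
console.log(httpMethod, JSON.stringify(rpcEndpoint.body));
```

Entries without a body, such as a website or explorer page, only need url, name, help, and expectedResponse.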

With the preliminary setup done, we can now create the HealthChecker class to manage our health checks and all their underlying logic.

class HealthChecker {
    static instance: HealthChecker;
    endpoints: (Group | Endpoint)[] = [];
    serverHealth: any;
    serviceHealthGauge: any;
    checkingStatus = new Map();
    constructor() {
        if (HealthChecker.instance) {
            return HealthChecker.instance;
        }
        const loadedEndpoints = require('../../endpoints.json');
        this.endpoints = this.flattenEndpoints(loadedEndpoints.urls);
        this.serviceHealthGauge = new prometheus.Gauge({
            name: 'Shardeum',
            help: 'Current health status of services (1 = online, 0 = offline)',
            labelNames: ['name', 'duration', 'timestamp'],
        });
        HealthChecker.instance = this;
        this.startPeriodicChecks();
    }
    flattenEndpoints(urls: any[]) {
        // If an endpoint is part of a group, it is flattened so that each service can be checked individually.
        return urls.flatMap(endpoint => 'servers' in endpoint ? endpoint.servers : [endpoint]);
    }
    // ...
}

The constructor initializes the class. It sets up a singleton pattern to ensure only one instance exists, loads endpoints from a configuration file, initializes the Prometheus gauge, and starts the periodic checks.

The flattenEndpoints method processes nested groups of endpoints so that each service can be checked individually. It follows the structure of our endpoints.json file:

{
    "urls": [
        {
            "group": "Archive Servers",
            "servers": [
                {
                    "url": "http://172.105.153.160:4000/nodelist",
                    "name": "Archiver 1",
                    "help": "This is the first Archiver server",
                    "expectedResponse": {
                        "nodeList": [{}, {}],
                        "sign": {
                            "owner": "7af699dd711074eb96a8xxx"
                        }
                    }
                }
            ]
        },
        {
            "url": "https://explorer-sphinx.shardeum.orgdfsgs/",
            "name": "Explorer",
            "help": "This is the Shardeum Explorer",
            "expectedResponse": "The Shardeum Betanet Explorer"
        }
    ]
} 

Ideally, this is where you define the structure of all the services you want to monitor. In your case, it might be a group of multiple servers, APIs, WebSockets, or good ol’ web apps.
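
To make the flattening step concrete, here is a minimal, self-contained sketch of the same idea applied to a small sample. The isGroup type guard and the example.com URLs are introduced for this illustration only; the interfaces are trimmed versions of the ones defined earlier.

```typescript
// Trimmed versions of the Endpoint and Group interfaces from servicecheck.ts.
interface Endpoint { url: string; name: string; help: string; expectedResponse: unknown; }
interface Group { group: string; servers: Endpoint[]; }

// Hypothetical type guard: an entry is a group when it carries a `servers` array.
function isGroup(entry: Group | Endpoint): entry is Group {
    return 'servers' in entry;
}

// Groups are replaced by their servers; standalone endpoints are kept as they are.
function flattenEndpoints(urls: (Group | Endpoint)[]): Endpoint[] {
    return urls.flatMap(entry => (isGroup(entry) ? entry.servers : [entry]));
}

// Placeholder data (the example.com URLs are illustrative, not real services).
const urls: (Group | Endpoint)[] = [
    {
        group: 'Archive Servers',
        servers: [
            { url: 'http://example.com/a1', name: 'Archiver 1', help: 'First archiver', expectedResponse: {} },
            { url: 'http://example.com/a2', name: 'Archiver 2', help: 'Second archiver', expectedResponse: {} },
        ],
    },
    { url: 'https://example.com/', name: 'Explorer', help: 'Explorer page', expectedResponse: 'Explorer' },
];

const flat = flattenEndpoints(urls);
console.log(flat.map(endpoint => endpoint.name));
```

After flattening, the group name is no longer attached to each endpoint, so every service is checked and reported under its own name.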

With that out of the way, we can proceed to the checkService method that performs the health check for a given service. It measures the response time, checks the response status, and verifies if the resulting response matches the expected value.

async checkService(service: Endpoint) {
        this.checkingStatus.set(service.name, true);
        const startTime = Date.now();

        try {
            const response = service.body ? await axios.post(service.url, service.body) : await axios.get(service.url);
            const statusOk = response.status >= 200 && response.status < 300;
            let isExpectedResponseIncluded = false;

            if (typeof service.expectedResponse === 'string') {
                const responseString = response.data.toString().toLowerCase().trim();
                const expectedResponseString = service.expectedResponse.toString().toLowerCase().trim();
                isExpectedResponseIncluded = responseString.includes(expectedResponseString);
            } else {
                isExpectedResponseIncluded = this.checkJsonResponse(response.data, service.expectedResponse);
            }

            const duration = (Date.now() - startTime) / 1000;
            this.serviceHealthGauge.set({
                name: service.name,
                duration,
                timestamp: Date.now()
            }, statusOk && isExpectedResponseIncluded ? 1 : 0);

            console.log(`Service ${service.name} is ${statusOk && isExpectedResponseIncluded ? 'healthy' : 'unhealthy'}`);
        } catch (error) {
            const duration = (Date.now() - startTime) / 1000;
            this.serviceHealthGauge.set({
                name: service.name,
                duration,
                timestamp: Date.now()
            }, 0);
            console.log(`Service ${service.name} is unhealthy`);
        } finally {
            this.checkingStatus.set(service.name, false);
        }
    }

You may notice that we validate the response from each service by calling a checkJsonResponse method. This is only necessary because of the structure of our JSON file and may not be needed if your data is structured differently.

    checkJsonResponse(response: any, expectedResponse: any) {
        for (const key of Object.keys(expectedResponse)) {
            if (!response.hasOwnProperty(key)) {
                return false;
            }

            if (typeof expectedResponse[key] === 'object' && !Array.isArray(expectedResponse[key])) {
                if (!this.checkJsonResponse(response[key], expectedResponse[key])) {
                    return false;
                }
            } else if (Array.isArray(expectedResponse[key])) {
                // The array under this key must exist and be non-empty.
                if (!Array.isArray(response[key]) || response[key].length === 0) {
                    return false;
                }
            } else {
                if (response[key] !== expectedResponse[key]) {
                    return false;
                }
            }
        }
        return true;
    }
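
To see how this kind of template matching behaves, here is a standalone sketch that follows the same rules: expected keys must be present, arrays must be non-empty, nested objects are checked recursively, and primitives must match exactly. The function name matchesTemplate and the null guards are additions for this illustration and are not part of the class above.

```typescript
// Hypothetical standalone variant of checkJsonResponse, with null guards added.
function matchesTemplate(actual: unknown, expected: unknown): boolean {
    if (Array.isArray(expected)) {
        // Arrays only need to exist and be non-empty; their elements are not compared.
        return Array.isArray(actual) && actual.length > 0;
    }
    if (expected !== null && typeof expected === 'object') {
        if (actual === null || typeof actual !== 'object') {
            return false;
        }
        const actualObj = actual as Record<string, unknown>;
        const expectedObj = expected as Record<string, unknown>;
        // Every expected key must exist on the response and match recursively.
        return Object.keys(expectedObj).every(key =>
            Object.prototype.hasOwnProperty.call(actualObj, key) &&
            matchesTemplate(actualObj[key], expectedObj[key])
        );
    }
    // Primitives (and null) must match exactly.
    return actual === expected;
}

// Template shaped like the Archiver entry in endpoints.json (owner value shortened).
const template = { nodeList: [{}, {}], sign: { owner: 'abc' } };

const healthy = matchesTemplate({ nodeList: [{ id: 1 }], sign: { owner: 'abc', sig: 'x' } }, template);
const emptyList = matchesTemplate({ nodeList: [], sign: { owner: 'abc' } }, template);
console.log(healthy, emptyList);
```

Extra keys in the response are ignored, so the template only needs to list the fields you care about.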

Next, we want to ensure that no individual service check runs past a set threshold. This matters because a slow or hanging service should not hold up the rest of the service checks. The timeout lets us stop waiting on any check that takes longer than 2 minutes and report it as an error. Note that Promise.race only stops waiting; it does not cancel the underlying HTTP request.

    async checkServiceWithTimeout(service: Endpoint, timeout = 120000) {
        const timeoutPromise = new Promise((_, reject) =>
            setTimeout(() => reject(new Error('Service check timed out')), timeout)
        );

        try {
            await Promise.race([this.checkService(service), timeoutPromise]);
        } catch (error) {
            console.error(`Error or timeout checking service: ${service.name}`, error);
        }
    }
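
The same race-against-a-timer pattern can be written as a small reusable helper. The sketch below is one possible variant, not the code used in the class: withTimeout is a hypothetical name, the checks are simulated with timers instead of HTTP requests, and the helper also clears its timer once the race settles so no stray timers are left behind.

```typescript
// Hypothetical helper: rejects if `work` does not settle within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
    let timer: ReturnType<typeof setTimeout> | undefined;
    const timeout = new Promise<never>((_, reject) => {
        timer = setTimeout(() => reject(new Error('Service check timed out')), ms);
    });
    // Whichever promise settles first wins; the timer is cleared either way.
    return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}

// Simulated checks (stand-ins for real HTTP requests).
const fastCheck = new Promise<string>(resolve => setTimeout(() => resolve('ok'), 10));
const slowCheck = new Promise<string>(resolve => setTimeout(() => resolve('late'), 200));

withTimeout(fastCheck, 100).then(result => console.log('fast check:', result));
withTimeout(slowCheck, 50).catch(err => console.log('slow check:', (err as Error).message));
```

As with the method above, the slow operation keeps running in the background after the timeout fires; only the waiting stops.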

Next, we want to kick off the service checks that will iterate through all the endpoints in our endpoints.json file and initiate health checks.

    async runChecks() {
        this.endpoints.forEach((service: any) => {
            if (!this.checkingStatus.get(service.name)) {
                this.checkServiceWithTimeout(service);
            }
        });
    }

And finally, we create a new method startPeriodicChecks to initiate the periodic health checks, running every 2 minutes.

    startPeriodicChecks() {
        this.runChecks();
        setInterval(() => this.runChecks(), 120000); // Run checks every 2 minutes
    }
}
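
The checkingStatus map acts as an in-flight guard: a service whose previous check is still pending is skipped in the next cycle. Here is a minimal standalone sketch of that idea. The names inFlight and startIfIdle are hypothetical, and the check is simulated with a timer.

```typescript
// Hypothetical standalone version of the in-flight guard kept in checkingStatus.
const inFlight = new Map<string, boolean>();

// Starts `task` only if no earlier run for `name` is still pending.
// Returns true when the task was started and false when it was skipped.
function startIfIdle(name: string, task: () => Promise<void>): boolean {
    if (inFlight.get(name)) {
        return false;
    }
    inFlight.set(name, true);
    task()
        .catch(() => { /* a failed check is reported elsewhere */ })
        .finally(() => { inFlight.set(name, false); });
    return true;
}

// A simulated check that takes 50 ms to finish.
const simulatedCheck = () => new Promise<void>(resolve => setTimeout(resolve, 50));

const first = startIfIdle('Explorer', simulatedCheck);  // started
const second = startIfIdle('Explorer', simulatedCheck); // skipped: the first run is still pending
console.log(first, second);
```

Because the flag is reset in finally, a check that throws still frees its slot for the next cycle.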

With this, we can export the HealthChecker class (for example, with module.exports = HealthChecker; at the end of servicecheck.ts) so it can be used in our Node server.

To wrap it up, we head back over to the app.ts file and update it as follows:

import { Request, Response } from 'express';
const express = require('express');
const prometheus = require('prom-client');
const HealthChecker = require('./services/servicecheck');
const app = express();


const healthChecker = new HealthChecker();
const PORT = process.env.PORT || 3002;

app.get('/metrics', async (req: Request, res: Response) => {

  const metrics = await prometheus.register.metrics();
  res.set('Content-Type', prometheus.register.contentType);
  res.send(metrics);
});

app.listen(PORT, () => {
  console.log('Server started on PORT ' + PORT);
});

Here, we import our service check class and create a new instance of it so that the checks begin when the server starts. It performs the service checks and records the results in the gauge, which Prometheus then collects by scraping the /metrics endpoint.

With these modifications, when we run the Node server again, it will kick off all the expected processes, and we should see the relevant logs in the console.
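
When Prometheus scrapes /metrics, it receives plain text in its exposition format. As a rough illustration of what a single sample from our gauge looks like, the sketch below renders one by hand. The helper names are hypothetical and the label values are made up; in the real server, prom-client produces this text for us.

```typescript
// Hypothetical helpers that render one gauge sample in the Prometheus text exposition format.
type Labels = Record<string, string | number>;

function escapeLabelValue(value: string): string {
    // Backslashes, double quotes, and newlines must be escaped inside label values.
    return value.replace(/\\/g, '\\\\').replace(/"/g, '\\"').replace(/\n/g, '\\n');
}

function renderGaugeSample(name: string, help: string, labels: Labels, value: number): string {
    const labelText = Object.entries(labels)
        .map(([key, val]) => `${key}="${escapeLabelValue(String(val))}"`)
        .join(',');
    return [
        `# HELP ${name} ${help}`,
        `# TYPE ${name} gauge`,
        `${name}{${labelText}} ${value}`,
    ].join('\n');
}

// Made-up label values for illustration.
const sample = renderGaugeSample(
    'Shardeum',
    'Current health status of services (1 = online, 0 = offline)',
    { name: 'Explorer', duration: 0.42, timestamp: 1700000000000 },
    1
);
console.log(sample);
```

Each distinct combination of label values becomes its own time series in Prometheus, which is worth keeping in mind when labels such as duration and timestamp change on every check.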

Quick Recap

  • We set up a Node.js server that uses Prometheus to store metrics for various services.
  • We provided a JSON file containing the services to track and report.
  • Every 2 minutes, a service check cycle initiates, making HTTP requests to all endpoints and logging their health.
  • The results are sent to Prometheus, which collects the metrics.
  • Timeout constraints ensure that pending requests do not delay or stall the system.
  • You can contribute to the project on GitHub.

Next Steps

You can now use Prometheus to store service check data and query it using visualization tools like Grafana or build a frontend app to display the data to users, as we’ve done here.
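
If you build your own frontend, one way to read the data is Prometheus’s HTTP API, which serves instant queries at /api/v1/query. The sketch below only builds the request URL. It assumes Prometheus is running locally on its default port 9090, and buildQueryUrl is a hypothetical helper name.

```typescript
// Builds a URL for Prometheus's instant-query endpoint (GET /api/v1/query).
// The base URL is an assumption: a local Prometheus on its default port 9090.
function buildQueryUrl(baseUrl: string, promql: string): string {
    const url = new URL('/api/v1/query', baseUrl);
    // searchParams takes care of percent-encoding the PromQL expression.
    url.searchParams.set('query', promql);
    return url.toString();
}

// Selects the series of the Shardeum gauge for the service named "Explorer".
const queryUrl = buildQueryUrl('http://localhost:9090', 'Shardeum{name="Explorer"}');
console.log(queryUrl);
```

Fetching this URL with any HTTP client returns a JSON document containing the matching series.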
