The Keep Alive application is a simple example of a heartbeat/watchdog for packet processing cores. It demonstrates how to detect ‘failed’ DPDK cores and notify a fault management entity of this failure. Its purpose is to ensure the failure of the core does not result in a fault that is not detectable by a management entity.
The application demonstrates how to protect against ‘silent outages’ on packet processing cores. A Keep Alive Monitor Agent Core (master) monitors the state of packet processing cores (worker cores) by dispatching pings at a regular time interval (default is 5ms) and monitoring the state of the cores. Cores states are: Alive, MIA, Dead or Buried. MIA indicates a missed ping, and Dead indicates two missed pings within the specified time interval. When a core is Dead, a callback function is invoked to restart the packet processing core; A real life application might use this callback function to notify a higher level fault management entity of the core failure in order to take the appropriate corrective action.
Note: Only the worker cores are monitored. A local (on the host) mechanism or agent to supervise the Keep Alive Monitor Agent Core DPDK core is required to detect its failure.
Note: This application is based on the L2 Forwarding Sample Application (in Real and Virtualized Environments). As such, the initialization and run-time paths are very similar to those of the L2 forwarding application.
To compile the sample application see Compiling the Sample Applications.
The application is located in the l2fwd_keep_alive sub-directory.
The application has a number of command line options:
./build/l2fwd-keepalive [EAL options] \
-- -p PORTMASK [-q NQ] [-K PERIOD] [-T PERIOD]
where,
To run the application in linuxapp environment with 4 lcores, 16 ports 8 RX queues per lcore and a ping interval of 10ms, issue the command:
./build/l2fwd-keepalive -l 0-3 -n 4 -- -q 8 -p ffff -K 10
Refer to the DPDK Getting Started Guide for general information on running applications and the Environment Abstraction Layer (EAL) options.
The following sections provide some explanation of the The Keep-Alive/’Liveliness’ conceptual scheme. As mentioned in the overview section, the initialization and run-time paths are very similar to those of the L2 Forwarding Sample Application (in Real and Virtualized Environments).
The Keep-Alive/’Liveliness’ conceptual scheme:
The following sections provide some explanation of the code aspects that are specific to the Keep Alive sample application.
The keepalive functionality is initialized with a struct rte_keepalive and the callback function to invoke in the case of a timeout.
rte_global_keepalive_info = rte_keepalive_create(&dead_core, NULL);
if (rte_global_keepalive_info == NULL)
rte_exit(EXIT_FAILURE, "keepalive_create() failed");
The function that issues the pings keepalive_dispatch_pings() is configured to run every check_period milliseconds.
if (rte_timer_reset(&hb_timer,
(check_period * rte_get_timer_hz()) / 1000,
PERIODICAL,
rte_lcore_id(),
&rte_keepalive_dispatch_pings,
rte_global_keepalive_info
) != 0 )
rte_exit(EXIT_FAILURE, "Keepalive setup failure.\n");
The rest of the initialization and run-time path follows the same paths as the L2 forwarding application. The only addition to the main processing loop is the mark alive functionality and the example random failures.
rte_keepalive_mark_alive(&rte_global_keepalive_info);
cur_tsc = rte_rdtsc();
/* Die randomly within 7 secs for demo purposes.. */
if (cur_tsc - tsc_initial > tsc_lifetime)
break;
The rte_keepalive_mark_alive function simply sets the core state to alive.
static inline void
rte_keepalive_mark_alive(struct rte_keepalive *keepcfg)
{
keepcfg->live_data[rte_lcore_id()].core_state = RTE_KA_STATE_ALIVE;
}