An important aspect of network management is fault management: the set of functions that detect and correct malfunctions in a telecommunications network. When a fault occurs, a network component typically sends a notification to the network operator using a protocol such as SNMP (Simple Network Management Protocol). An alarm is a persistent indication of a fault that clears only when the triggering condition has been resolved. The problems currently occurring on a network component are often kept in an active alarm list. Most telecom network management systems also maintain a list of cleared faults.
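The relationship between active alarms and cleared faults can be illustrated with a small sketch. The class name `AlarmList`, its methods, and the device and fault names are hypothetical, chosen only for illustration; real management systems store considerably more detail (timestamps, probable cause, acknowledgement state).

```python
from dataclasses import dataclass, field


@dataclass
class AlarmList:
    """Tracks active and cleared alarms (illustrative structure only)."""
    active: dict = field(default_factory=dict)    # (source, fault) -> severity
    cleared: list = field(default_factory=list)   # history of resolved faults

    def raise_alarm(self, source, fault, severity="major"):
        # An alarm is persistent: a repeated notification for the same
        # fault updates the existing entry instead of adding a new one.
        self.active[(source, fault)] = severity

    def clear_alarm(self, source, fault):
        # An alarm leaves the active list only when its condition is resolved.
        severity = self.active.pop((source, fault), None)
        if severity is not None:
            self.cleared.append((source, fault, severity))
        return severity is not None


alarms = AlarmList()
alarms.raise_alarm("router-1", "linkDown")
alarms.raise_alarm("router-1", "linkDown")            # duplicate notification
alarms.raise_alarm("router-1", "fanFailure", severity="minor")
alarms.clear_alarm("router-1", "linkDown")            # link restored
```

After these calls the active list holds only the fan failure, and the link fault has moved to the cleared list.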
A fault management console allows an operator to monitor events from multiple systems and act on that information. A fault management system should therefore be able to identify events correctly and respond automatically, either by launching a program that takes corrective action or by triggering notification software so that a human can intervene.
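One way to picture this automatic response is a table that maps identified event types to actions, with human notification as the fallback. This is a minimal sketch under stated assumptions: the event fields (`type`, `source`, `service`), the event type `serviceDown`, and both handler functions are hypothetical and do not correspond to any particular product.

```python
def restart_service(event):
    # Hypothetical corrective action; a real system would launch a program here.
    return f"restarting {event['service']} on {event['source']}"


def notify_operator(event):
    # Hypothetical notification hook (e-mail, pager, trouble ticket, ...).
    return f"notify operator: {event['type']} on {event['source']}"


# Event types that the system knows how to correct automatically.
ACTIONS = {
    "serviceDown": restart_service,
}


def handle_event(event):
    """Run the automatic action for a known event, else involve a human."""
    action = ACTIONS.get(event["type"], notify_operator)
    return action(event)


known = handle_event({"type": "serviceDown", "source": "server-3", "service": "dns"})
unknown = handle_event({"type": "highTemperature", "source": "switch-7"})
```

Here `known` describes an automatic restart, while `unknown` falls through to operator notification because no corrective action is registered for that event type.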
There are two primary ways to perform fault management: active and passive. Passive fault management collects the alarms that devices send when something goes wrong. In this mode the fault management system learns of a problem only if the monitored device is intelligent enough to detect the error and report it to the management tool. Importantly, if the monitored device fails completely or locks up, it will not raise an alarm and the problem will go undetected. Active fault management addresses this issue by actively monitoring devices with tools such as ping to determine whether each device is active and responding. If a device stops responding, active monitoring raises an alarm showing the device as unavailable, which allows the problem to be corrected proactively.
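A single active-monitoring pass might look like the sketch below. Note the substitution: instead of ICMP ping, which usually requires raw-socket privileges, the probe attempts a TCP connection. The port number, timeout, and function names are assumptions made for illustration.

```python
import socket


def tcp_probe(host, port=22, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds.

    Stands in for ping; port 22 is an arbitrary assumption.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def poll(devices, active_alarms, probe=tcp_probe):
    """Probe each device once and update the set of unavailable devices."""
    for device in devices:
        if probe(device):
            active_alarms.discard(device)   # device answered: clear any alarm
        else:
            active_alarms.add(device)       # no answer: mark as unavailable
    return active_alarms
```

A monitoring system would call `poll` on a fixed interval. Because the alarm is raised by the poller rather than by the device, a device that has locked up completely is still detected.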