FCAPS

The FCAPS concept was developed by the International Telecommunication Union (ITU-T) and applied to data networks by the International Organization for Standardization (ISO) in its Open Systems Interconnection (OSI) Network Management Model, which forms the basis for most network management implementations. The OSI model specifies five functional areas:

Fault Management - To detect, log, notify users of, and (to the extent possible) automatically fix network problems to keep the network running effectively. Because faults can cause downtime or unacceptable network degradation, fault management is perhaps the most widely implemented of the ISO network management elements.
Configuration Management - To monitor network and system configuration information so that the effects on network operation of various versions of hardware and software elements can be tracked and managed.
Accounting Management (or Asset / Inventory Management for networks that don't apply charges to users) - To measure network utilisation and the activities of individual or group users on the network for the purpose of network usage regulation and billing.
Performance Management - To measure and make available various aspects of network performance for network performance monitoring and optimisation. The network performance variables include network throughput, user response times, and line utilisation.
Security Management - To control access to network resources so that the network cannot be sabotaged and sensitive information can only be accessed by those with authorisation.

FCAPS is an extension of the network management conceptual framework called the Telecommunications Management Network (TMN). Each TMN layer needs to perform some or all FCAPS functions in certain ways. The TMN logical layered architecture (LLA) consists of five management layers:

Network Element Layer (NEL) defines interfaces for the network elements.
Element Management Layer (EML) provides management functions for network elements on an individual or group basis. It also supports an abstraction of the functions provided by the network element layer. Examples include determining equipment errors, measuring device temperatures, collecting statistical data for accounting purposes, and logging event notifications and performance statistics.
Network Management Layer (NML) offers a holistic view of the network, between multiple pieces of equipment and independent of device types and vendors. It manages a network as supported by the element management layer. Examples include end-to-end network utilisation reports, root cause analysis, and traffic engineering.
Service Management Layer (SML). The main functions of this layer are service creation, order handling, service implementation, service monitoring, complaint handling, and invoicing. Examples include QoS management, accounting per service (VPN), and SLA monitoring and notification.
Business Management Layer (BML) is responsible for the total enterprise. Business management can be considered a goal-setting approach: what are the objectives, and how can the network (and network management specifically) help achieve them?

FCAPS can be seen as the predecessor of the newer Fulfillment/Assurance/Billing (FAB) model defined in eTOM.

Fault

Fault Management (FM) is the collection and analysis of alarms and faults in a service. These faults can be either transient or persistent. Transient failures are not alarmed if their occurrence does not exceed a threshold. These events are, however, logged. Some transient problems can be automatically corrected within the service, while others may require different levels of management services to resolve. Faults can be determined from unsolicited alarm messages or by log analysis; the latter may be the only course when, say, existing services/applications do not have internal monitoring and/or alarm generation capabilities.
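The threshold rule for transient faults described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the class name `TransientFaultFilter`, the sliding time window and the default values are assumptions made for the example and do not come from the FCAPS or TMN specifications.

```python
from collections import defaultdict, deque


class TransientFaultFilter:
    """Log every fault event; alarm only when a source exceeds a threshold.

    Hypothetical sketch: the threshold and window are illustrative values.
    """

    def __init__(self, threshold=3, window_seconds=60.0):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self.log = []                      # every event is logged, alarmed or not
        self._recent = defaultdict(deque)  # per-source timestamps inside the window

    def record(self, source, timestamp):
        """Record one fault event; return True if an alarm should be raised."""
        self.log.append((timestamp, source))
        recent = self._recent[source]
        recent.append(timestamp)
        # Discard timestamps that have fallen out of the sliding window.
        while recent and timestamp - recent[0] > self.window_seconds:
            recent.popleft()
        # Alarm only when the occurrence count exceeds the threshold.
        return len(recent) > self.threshold
```

With `threshold=2`, the first two events from a source inside the window are only logged, and the third raises an alarm. Once the window has passed, the count starts again.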

The FM function analyses and filters the fault messages and correlates them so that the number of reported events reflects the real condition of the services. The root cause is reported, while other related fault messages are suppressed. All faults are logged. When an FM at some layer is able to resolve a fault, the resolving FM still creates a trouble ticket recording the fault details and any corrective actions performed.
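Root-cause reporting with suppression of related alarms can be illustrated with a simple dependency walk. The sketch assumes each component depends on at most one upstream component, which is a simplification. The function name `correlate` and the `depends_on` mapping are hypothetical.

```python
def correlate(alarms, depends_on):
    """Split alarming components into root causes and suppressed symptoms.

    alarms:     list of component names currently raising alarms
    depends_on: maps a component to the single component it relies on,
                or None (an assumed, simplified dependency model)
    """
    alarming = set(alarms)
    roots, suppressed = [], []
    for component in alarms:
        parent = depends_on.get(component)
        seen = {component}  # guards against cycles in the dependency map
        has_alarming_ancestor = False
        # Walk up the dependency chain looking for an alarming ancestor.
        while parent is not None and parent not in seen:
            if parent in alarming:
                has_alarming_ancestor = True
                break
            seen.add(parent)
            parent = depends_on.get(parent)
        (suppressed if has_alarming_ancestor else roots).append(component)
    return roots, suppressed
```

For example, if `web` depends on `app` and `app` depends on `db`, simultaneous alarms from all three are reduced to one reported root cause, `db`. The `web` and `app` alarms are suppressed but remain available for logging.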

Fault Management covers:

Fault detection
Fault correction
Fault isolation
Service recovery
Alarm handling
Alarm filtering
Alarm generation
Fault correlation
Diagnostics
Error (fault) logging
Error (fault) handling
Error (fault) statistics

Configuration

Configuration management (CM) capabilities are responsible for the life cycle of a service agent from its inception to its final shutdown; CM standardises the activation and deactivation of services in a regulated and controlled manner. CM also includes change management to keep track of modifications to the services. Configuration information describes the make-up of the service and the agents that realise it. Configuration management provides the location, setup, inventory and maintenance of service agents, their components and their realisation configurations. Information on the service agents is collected regularly, tracking the types of resources and their details. When changes in a service agent configuration occur, the CM can collect and analyse the changes and ensure that they were authorised and are acceptable; unauthorised changes are reversed and also alarmed, as they may point to, say, a security breach. The configuration of multiple services is complex, and failure or delay can have a detrimental impact on many customer services. The Configuration Manager needs to support transaction-like configuration operations in which multiple features in one or more service agents are configured at the same time in a single action; this can be achieved using Group and Orchestration agent organisations.

CM is responsible for configuration change management. The proposed new configuration version is compared with the current (last working) version, and all changes are identified and notified to the responsible parties. CM can also check whether the proposed version has been used in the past and whether it caused any problems; previous versions that caused problems may have been marked as unusable, in which case a matching new version is not allowed. CM maintains all past configuration versions, allowing quick restoration to a previously working configuration. All changes have to be authorised, and change notices are sent to multiple users and service agents.
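The change-management checks described above (diff against the current version, rejection of versions marked unusable, retained history for rollback) can be sketched as follows. All names here (`ConfigStore`, `diff_config`, `fingerprint`) are hypothetical, and configurations are assumed to be flat, JSON-serialisable dictionaries.

```python
import hashlib
import json


def fingerprint(config):
    """Stable hash of a configuration, used to recognise earlier versions."""
    canonical = json.dumps(config, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def diff_config(current, proposed):
    """Return {key: (old, new)} for every added, removed or changed key.

    A missing key is reported as None.
    """
    changes = {}
    for key in sorted(set(current) | set(proposed)):
        old, new = current.get(key), proposed.get(key)
        if old != new:
            changes[key] = (old, new)
    return changes


class ConfigStore:
    """Keeps all past versions and refuses versions marked unusable."""

    def __init__(self, initial):
        self.history = [initial]  # last element is the active version
        self.unusable = set()     # fingerprints of known-bad versions

    def mark_unusable(self, config):
        self.unusable.add(fingerprint(config))

    def propose(self, proposed, authorised):
        """Return (accepted, changes); changes would be sent to responsible parties."""
        changes = diff_config(self.history[-1], proposed)
        if not authorised or fingerprint(proposed) in self.unusable:
            return False, changes
        self.history.append(proposed)
        return True, changes

    def rollback(self):
        """Restore the previous version, if any, and return the active one."""
        if len(self.history) > 1:
            self.history.pop()
        return self.history[-1]
```

Hashing a canonical serialisation is one simple way to match a proposal against earlier versions. A real CM would more likely track explicit version identifiers.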

The automatic service discovery component of a CM (its tools and methods) provides continuous discovery and mapping of services, their dependencies, components and configurations with respect to their underlying realisations. The tools provide accurate, real-time visibility into the service configurations.

Configuration Management covers:

Service initialisation
Service provisioning
Auto-discovery
Backup and restore
Resource shut down
Change management
Pre-provisioning
Inventory/asset management
Copy configuration
Remote configuration
Automated software/fix distribution
Job initiation, tracking, and execution

Accounting

Accounting management (AM) capabilities deal with specifying the parameters to be monitored that define usage, setting usage limits (this is done through a configuration manager), monitoring and costing usage, ensuring that usage quotas are not exceeded, detecting and reporting fraud (and attempts to defraud), and supporting accounting audits (logging). AM also collects the accounting and usage information, analyses it (for example, to detect fraud or to measure resource utilisation) and sends reports to other services (for example, other AMs). AM supports audits and fraud reporting by analysing suspect and/or unusual behaviors. Precise accounting across all layers of the environment is required.
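Usage tracking, quota enforcement, costing across several resources and audit logging can be sketched in a few lines. The class `UsageAccount`, the per-unit rates and the per-user, per-resource quota model are assumptions chosen for illustration only.

```python
from collections import defaultdict


class UsageAccount:
    """Track usage per user and resource, enforce quotas, and cost the usage."""

    def __init__(self, rates, quotas):
        self.rates = rates    # resource -> cost per unit (illustrative values)
        self.quotas = quotas  # (user, resource) -> maximum units allowed
        self.usage = defaultdict(lambda: defaultdict(float))
        self.audit_log = []   # every request is logged for later audits

    def record(self, user, resource, units):
        """Record usage; refuse (and log) a request that would exceed the quota."""
        limit = self.quotas.get((user, resource), float("inf"))
        allowed = self.usage[user][resource] + units <= limit
        status = "ok" if allowed else "quota_exceeded"
        self.audit_log.append((user, resource, units, status))
        if allowed:
            self.usage[user][resource] += units
        return allowed

    def cost(self, user):
        """Combine the costs of all resources the user has consumed."""
        return sum(
            units * self.rates[resource]
            for resource, units in self.usage[user].items()
        )
```

Refused requests stay in the audit log, which is the kind of record a fraud analysis would start from.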

Accounting Management covers:

Track service/resource use
Cost for services
Accounting limit
Usage quotas
Audits
Fraud reporting
Combine costs from multiple resources
Support for different accounting modes

Performance

Performance management (PM) measures and analyses service performance. PM collects resource performance data, evaluates the data and raises alerts when actual performance falls outside the performance threshold limits. The PM may take corrective action when the correction falls within its scope; the scope of the PM of a service agent is that service agent. PM maintains logs and performs trend analysis to predict and anticipate performance problems. PM consists of performance policy definitions and measurement and analysis systems. Services may temporarily run short of capacity, for example, in the face of high demand. To improve the performance of a service, including its capacity, PM may cooperate with Configuration Managers (CM) to improve the performance of the underlying services.
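Threshold alerting and simple trend analysis can be sketched as below. The text does not prescribe a method; the least-squares slope used here is one common, simple choice, and the function names and sample values are hypothetical.

```python
import math


def threshold_alerts(samples, low, high):
    """Return (index, value) for every sample outside the limits [low, high]."""
    return [(i, v) for i, v in enumerate(samples) if v < low or v > high]


def linear_trend(samples):
    """Least-squares slope of the samples, in units per sample interval."""
    n = len(samples)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def samples_until_breach(samples, high):
    """Extrapolate the trend from the last sample to the upper limit.

    Returns None when the series is empty or not rising, and 0 when the
    limit is already reached.
    """
    if not samples:
        return None
    if samples[-1] >= high:
        return 0
    slope = linear_trend(samples)
    if slope <= 0:
        return None
    return math.ceil((high - samples[-1]) / slope)
```

For utilisation samples of 50, 55, 60, 65 and 70 percent, the fitted slope is 5 points per interval, so a 90 percent limit is projected to be reached in four more intervals. A PM could use that lead time to request more capacity from a CM.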

Performance Management covers:

Utilisation and error rates
Performance data collection
Consistent performance level
Performance data analysis
Problem reporting
Capacity planning
Performance report generation
Maintaining and examining historical logs

Security

Security management (SM) provides multi-service, defense-in-depth security to control access to and utilisation of the services and to maintain privacy, confidentiality and information integrity. SM is designed to protect the services and prevent malicious, negligent and abusive behavior by authorised and non-authorised users alike. SM capabilities are distributed at all levels, and the service realisations may also implement or incorporate the security needed to protect the service. In this defense-in-depth approach, selected services may require multiple levels of screening of access and message flows; for example, at every mediator, at every collector and at the service itself.

SM maintains access rights (task based), access logs and audit trails, enforces management and governance policies, raises security alarms and distributes necessary security-related information. In our model, some of the service interfaces may be secure and, thus, be under access control. The security capabilities may be configured or changed only by authorised personnel; all changes would be distributed.
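Task-based access rights with an access log and security alarms can be sketched as follows. The class `AccessController` and its data layout are assumptions for illustration: rights are attached to tasks, and users acquire rights only through the tasks assigned to them.

```python
class AccessController:
    """Task-based access checks with an access log and security alarms."""

    def __init__(self, task_rights):
        # task name -> set of (service, action) pairs that the task permits
        self.task_rights = task_rights
        self.assignments = {}  # user -> set of task names
        self.access_log = []   # (user, service, action, allowed) for audits
        self.alarms = []       # denied attempts, reported as security alarms

    def assign(self, user, task):
        self.assignments.setdefault(user, set()).add(task)

    def check(self, user, service, action):
        """Log the attempt and return True only if one of the user's tasks permits it."""
        tasks = self.assignments.get(user, set())
        allowed = any(
            (service, action) in self.task_rights.get(task, set())
            for task in tasks
        )
        self.access_log.append((user, service, action, allowed))
        if not allowed:
            self.alarms.append((user, service, action))
        return allowed
```

Every attempt is logged whether or not it succeeds, so the same record serves both the access log and the audit trail.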

Service realisations may require point-to-point and/or end-to-end security mechanisms, depending upon the degree of threat or risk. Traditional, connection-oriented, point-to-point security mechanisms may not meet the end-to-end (e2e) security requirements of services. For example, traditional network-level security mechanisms, such as Virtual Private Networks (VPNs) and Secure/Multipurpose Internet Mail Extensions (S/MIME), are point-to-point technologies and are not sufficient for providing e2e security for a services environment that uses messages for complex interactions and where messages flow between and across various trust domains, facilitated by intermediaries. Therefore, e2e message-level security is important, as the intermediary may need access to some but not all of the information in the message. In this context, e2e security encompasses the security of messages between the intermediary and the original requester agent, and between the intermediary and the ultimate receiver agent; when more than one intermediary is involved, it also covers the security between each pair of adjacent intermediaries. Secure message protocols have to be constructed to secure the message payload while providing access to, say, performance-level information; for example, the message may specify non-logging at intermediaries or specify an end of message life. Messages to the mediators, controllers, manager components of services and management services are screened to ensure security.
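One aspect of message-level security, end-to-end payload integrity across an intermediary, can be sketched with the Python standard library. This is a deliberately narrow illustration. It assumes the requester and the ultimate receiver share a secret key that the intermediary does not hold, it protects integrity only (payload confidentiality would additionally require encryption, which is not shown), and the message layout and function names are hypothetical.

```python
import hashlib
import hmac


def seal(payload: bytes, routing: dict, key: bytes) -> dict:
    """Requester: attach an HMAC-SHA256 tag computed over the payload only."""
    tag = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return {"routing": routing, "payload": payload.hex(), "tag": tag}


def forward(message: dict, hop: str) -> dict:
    """Intermediary: read and update routing data without touching the payload."""
    via = message["routing"].get("via", []) + [hop]
    updated = dict(message)
    updated["routing"] = {**message["routing"], "via": via}
    return updated


def verify(message: dict, key: bytes) -> bool:
    """Receiver: recompute the tag and compare it in constant time."""
    payload = bytes.fromhex(message["payload"])
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, message["tag"])
```

The intermediary can work with the routing section freely, while any change to the payload along the path makes `verify` fail at the receiver.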

While virtualisation can have many worthwhile security benefits, security also becomes more of a management issue. In a services environment, there are many more services to secure and more interconnection points. The security manageability problem is addressed by distributing security management to all levels and all service realisations, where each realisation may implement a particular security capability differently; this significantly complicates the task of anyone with the intent to cause harm. All services in the implementation of a process must meet or exceed the security requirements for the process. It should be noted that sub-processes may require a higher level of security control; differences in security levels require a controller or mediator service at the interaction point.

Security Management covers:

Selective resource access
Access logs
Data privacy
User access rights checking
Security audit trail log
Security alarm/event reporting
Taking care of security breaches and attempts
Security-related information distributions