Blog

Entries with tag fault management.

OSSera is implementing OSS project in Asia

OSSera won the phase-I OSS project from one of the new Fiber Network Operators in Asia in 2012. OSSera is currently implementing four of its products at the operator: Inventory Management, Resource/Service Activation, Fault Management, and Performance Management. The operator is expected to provide FTTH services to its customers in Spring 2014.

IPTV End-to-End Fault, Performance, Service Planning and Monitoring

IPTV has seen steady growth in the broadband market but with growth there are increasing challenges with customer experience management and overall quality of service.  Some problems include:

“We have a large number of Set-Top Boxes (STBs) in one area that cannot boot up.”

“Some of our subscribers are having video quality issues.”

“Some subscribers are encountering video quality issues on one particular channel.”

“A single STB is unable to boot up.”

In the above problem scenarios, the root cause can be anywhere along the end-to-end service chain from the head end to the home. There are many options for determining the root cause, but the question for the service provider is who can provide the most flexible solution, one that helps to transform their overall business operations. Existing management tools often fall short in the following areas:

  • High Performance & Event Volumes - When a severe problem triggers an “event storm” or “alarm storm”, single-threaded, processor-bound architectures reach their limit. These applications sometimes lose alarms and metrics or slow down while processing data.
  • No E2E Network View - A NOC requires a topology view, and alert displays are generally not the best way to visualize a unified network. In addition, network topologies are often planned by engineering rather than designed by operations, so there is often a disconnect in knowledge transfer.
  • Effective Alarm Management, Reporting and Real-time Analysis - Correlation, suppression, and reporting analysis tools that capture human expert knowledge are required.
  • Real-time Monitoring - Performance Management systems sometimes do not handle real-time performance data from which Threshold Crossing Alarms (TCAs) can be triggered. Sometimes only historical data is captured and reported. Other solutions can plot real-time metrics, but again no TCAs are triggered according to KPI/KQI formulas and thresholds.
  • Complex KPI/KQI Definitions - Complex mathematical calculations are required so that formulas can be applied to KPIs/KQIs and they can be monitored effectively.
  • Trend Analysis - Performance trends can often seem haphazard. Hourly, daily, weekly, and monthly moving averages are required to monitor overall trends effectively and proactively.
  • High Availability and Lack of Fault-Tolerance - Management systems are still architected with primary and secondary HA clusters, and failover times are unacceptable.
  • Lack of Visibility into Service and Customer Impact - Operators need to identify customers and services potentially impacted by changes in the network, analyze the root cause of problems impacting end-to-end services, and re-use service components as a reference for new service designs.
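As a rough illustration of the real-time monitoring, KPI formula, and trend analysis points above, the following Python sketch computes a KPI from raw counters, smooths it with a moving average, and raises a Threshold Crossing Alarm only when the averaged value crosses the threshold. The packet-loss formula, the 1.0% threshold, the window size, and all names are assumptions chosen for illustration; they are not taken from any OSSera product.

```python
from collections import deque


def packet_loss_pct(lost, received):
    """Hypothetical KPI formula: percentage of packets lost in one sample."""
    total = lost + received
    return 100.0 * lost / total if total else 0.0


class ThresholdMonitor:
    """Tracks a moving average of a KPI and reports threshold crossings."""

    def __init__(self, threshold, window):
        self.threshold = threshold
        self.samples = deque(maxlen=window)  # keeps only the last `window` values
        self.in_alarm = False

    def update(self, value):
        """Add a sample; return 'RAISE', 'CLEAR', or None if the state is unchanged."""
        self.samples.append(value)
        average = sum(self.samples) / len(self.samples)
        crossed = average > self.threshold
        event = None
        if crossed and not self.in_alarm:
            event = "RAISE"
        elif not crossed and self.in_alarm:
            event = "CLEAR"
        self.in_alarm = crossed
        return event


# Assumed (lost, received) counter pairs, one per polling interval.
monitor = ThresholdMonitor(threshold=1.0, window=3)
counters = [(0, 1000), (5, 995), (40, 960), (30, 970), (0, 1000), (0, 1000), (0, 1000)]
events = [monitor.update(packet_loss_pct(lost, ok)) for lost, ok in counters]
print(events)
```

Because the alarm is driven by state changes rather than by every sample above the threshold, a sustained problem produces one raise and one clear instead of a stream of repeated alarms.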

 

Solution

A single Fault Management or Performance Management solution is not enough and will not provide a clear view into root cause analysis of IPTV solutions.

For IPTV a total Service Assurance solution is required.

OSSera’s IPTV probe partnership, combined with OSSera’s Service Quality Management functions, is required to resolve Set-Top Box, Middleware, Video Channel Quality, and other possible problems.

OSSera’s Unified Management and Data Modeling tool brings together all probe, event, and metric data into a single framework for fault, performance, service quality, and modeling of network topology for root cause analysis.

OSSera's OSS Explorer Platform is unique because of its multi-threaded symmetrically distributed architecture which can provide a 99.999% Fault-Tolerant Service Assurance monitoring solution.
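To put the 99.999% figure in perspective, the sketch below converts an availability percentage into the downtime it allows per year. This is plain arithmetic on a 365-day year, not a measurement of any OSSera system, and the function name is illustrative.

```python
def allowed_downtime_minutes_per_year(availability_pct, days_per_year=365):
    """Return the minutes of downtime per year permitted by an availability target."""
    minutes_per_year = days_per_year * 24 * 60
    return minutes_per_year * (1 - availability_pct / 100.0)


for target in (99.9, 99.99, 99.999):
    print(f"{target}% -> {allowed_downtime_minutes_per_year(target):.2f} minutes/year")
```

Five nines leaves roughly 5.26 minutes of downtime per year, which is why the time spent failing over to a standby system matters so much at this level.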

This component-based approach to End-to-End Service Assurance can be adopted in phases, according to the following scenarios:

  • Scenario 1: Probes feed data into the OSSera Service Assurance solution
  • Scenario 2: Probes with Network Topology feed the Service Assurance solution.  Network topology can be imported from an existing Inventory system.
  • Scenario 3: Probes, Network Topology, and fault alarms can be gathered from network elements, Element Management Systems (EMS), or existing fault management systems via OSSera’s Data Mediation Platform (DMP).
  • Scenario 4: Probes, Network Topology, fault events, and performance metrics can be gathered from network elements, EMS, or existing performance management systems via OSSera’s DMP.
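Scenario 2 is where topology starts to pay off: once alarms can be placed on a topology, symptoms can be separated from probable root causes. The following is a minimal sketch of that idea, assuming a simple tree from head end to set-top box in which each node has one upstream parent. The node names and the rule "an alarmed node with no alarmed node upstream is a probable root cause" are illustrative assumptions, not OSSera's correlation logic.

```python
# Hypothetical topology: each node maps to its upstream parent (None = head end).
UPSTREAM = {
    "headend": None,
    "core-router": "headend",
    "olt-1": "core-router",
    "olt-2": "core-router",
    "stb-a": "olt-1",
    "stb-b": "olt-1",
    "stb-c": "olt-2",
}


def upstream_path(node):
    """Yield every node upstream of `node`, nearest first."""
    parent = UPSTREAM[node]
    while parent is not None:
        yield parent
        parent = UPSTREAM[parent]


def probable_root_causes(alarmed):
    """Return alarmed nodes that have no alarmed node upstream of them."""
    alarmed = set(alarmed)
    return sorted(
        node for node in alarmed
        if not any(up in alarmed for up in upstream_path(node))
    )


alarms = ["stb-a", "stb-b", "olt-1", "stb-c"]
print(probable_root_causes(alarms))
```

Here the two set-top boxes behind olt-1 are treated as symptoms of the olt-1 alarm, while stb-c is reported on its own. That mirrors the difference between the "many STBs in one area" and "a single STB" problems described earlier.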

Download our latest solution brochure to see how OSSera can help.

Versatile Data Mediation Platform

OSSera Data Mediation Platform provides a set of drag and drop components to process:

  • incoming alarms, metrics, Call/Signaling Data Records, and Inventory Retrieval data, and
  • bi-directional command and response interactions (e.g., testing, activating, reconfiguring, pings, etc.)

The Data Mediation Platform has collectors for standard interfaces as well as tools for users to develop user-specific data collectors. These data collectors can collect and process data in any format.  For example, OSSera's Fault Management framework uses these alarm data collectors to collect and pre-process raw alarm data from NE/EMS/NMS or any other entities.  

Multi-Protocol

OSSera's Data Mediation Platform (DMP) supports SNMP, Socket, File, and CORBA, with supporting Management Information Base (MIB) loading tools and ID/Parse rules for non-SNMP protocols. Data from the various protocols is normalized through a standardized Flow Builder.
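One way to picture ID and parse rules for non-SNMP sources is as a list of patterns: an ID rule recognizes what kind of message a raw line is, and a parse rule extracts its fields into a common record. The sketch below assumes two made-up text formats and a made-up normalized alarm layout; none of the field names, formats, or rule names come from the DMP itself.

```python
import re

# Each hypothetical rule pairs an ID name with a regex whose named groups
# are the parsed fields.
RULES = [
    ("syslog_style", re.compile(r"^<(?P<severity>\d)> (?P<source>\S+) (?P<text>.+)$")),
    ("csv_style", re.compile(r"^(?P<source>[^,]+),(?P<severity>\d),(?P<text>.+)$")),
]


def normalize(raw_line):
    """Identify a raw line, parse it, and return a normalized alarm dict (or None)."""
    for rule_name, pattern in RULES:
        match = pattern.match(raw_line.strip())
        if match:
            return {
                "rule": rule_name,
                "source": match["source"],
                "severity": int(match["severity"]),
                "text": match["text"],
            }
    return None  # unidentified input would be routed to an error path


print(normalize("<2> olt-1 Loss of signal on port 3"))
print(normalize("stb-a,4,Boot timeout"))
print(normalize("garbage"))
```

Whatever the source format, downstream logic only ever sees the same four fields, which is the point of normalizing at the mediation layer.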

Flow Builder

Flow Builder is a drag and drop tool that allows the user to create states. States model the input, ID, parse, and error stages of the data flow. Users can also model the output, transition, and final states of the flow. This streamlines the process of managing resources across a wide domain.
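The states described above can be thought of as a small state machine. The following sketch is one possible reading, assuming the flow moves from Input to ID to Parse to Output to Final and diverts to Error when a step fails. The transition logic, handler functions, and message format are hypothetical and are not Flow Builder's actual model.

```python
def step_input(ctx):
    ctx["raw"] = ctx["raw"].strip()
    return "ID" if ctx["raw"] else "ERROR"


def step_id(ctx):
    # Identify the record type from its prefix (assumed format: "TYPE:payload").
    kind, sep, payload = ctx["raw"].partition(":")
    if not sep:
        return "ERROR"
    ctx["kind"], ctx["payload"] = kind, payload
    return "PARSE"


def step_parse(ctx):
    # Parse "key=value" pairs separated by semicolons.
    try:
        ctx["fields"] = dict(pair.split("=", 1) for pair in ctx["payload"].split(";"))
    except ValueError:
        return "ERROR"
    return "OUTPUT"


def step_output(ctx):
    ctx["result"] = {"kind": ctx["kind"], **ctx["fields"]}
    return "FINAL"


def step_error(ctx):
    ctx["result"] = None
    return "FINAL"


HANDLERS = {"INPUT": step_input, "ID": step_id, "PARSE": step_parse,
            "OUTPUT": step_output, "ERROR": step_error}


def run_flow(raw):
    """Run one message through the flow; return (result, list of visited states)."""
    ctx, state, visited = {"raw": raw}, "INPUT", []
    while state != "FINAL":
        visited.append(state)
        state = HANDLERS[state](ctx)  # each handler returns the next state
    visited.append("FINAL")
    return ctx["result"], visited


print(run_flow("ALARM:node=olt-1;severity=2"))
print(run_flow("no separator here"))
```

Recording the visited states is also a cheap stand-in for the kind of step-by-step tracing a flow debugger would provide.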

Control Flow

Action States are implemented for applications such as Advanced Troubleshooting where commands are sent to various managed resources for testing, pings, and retrieving data.

 

Debugging Mode

From experience, managing data mediation can be painfully cumbersome. Debugging features have been added, including breakpoints, stepping, value checks, and flow animation, to make sure DMP functions are implemented correctly when building adapters.

Deploy and Execute Flows

Flows are packaged and can be installed with an installer for ease of distribution across a carrier's network. Once deployed, they can be executed.

A Unified Data Mediation Platform

Using the OSSera platform as a foundation, the DMP automates interactions critical for Advanced Troubleshooting, Fault Management, Performance Management, Service Problem Management, Service Quality Management, Inventory Retrieval/Resource Management, and Customer Experience Management/CDR Service Usage Management.

 

 

OSSera's Data Mediation Platform Brochure

OSSera Tackles Fault-Tolerant Fault Management - don't skip a beat

OSSera's OSS Explorer Platform is unique because of its multi-threaded symmetrically distributed architecture which can provide a 99.999% Fault-Tolerant Fault Management monitoring solution.

  • Unlike other monitoring solutions that rely on a High Availability add-on component, OSSera's OSS Explorer platform does not have a Primary and Secondary system architecture, so there is no Secondary system that must be kept in sync with the Primary.
  • Unlike other monitoring solutions that rely on a Hot and/or Cold Standby system, OSSera's OSS Explorer platform has no transition time from the runtime production system to a Hot and/or Cold Standby system.

These industry options are common to High Availability Clusters (HACs). In summary, HACs are groups of computers that support server applications (e.g., mission-critical fault management software applications) that can be reliably utilized with minimum downtime. HACs detect hardware/software faults and immediately restart the application on another system without requiring administrative intervention, a process known as failover (see the HAC definition from Wikipedia below).

Fault-Tolerance

Unlike High Availability Clusters, a fault-tolerant system must be architected to just continue to run without skipping a beat.

Unlike High Availability Clusters, a fault-tolerant system does not have a failover system that is restarted.

A fault-tolerant system is designed from the ground up for reliability...

OSSera's Fault Management architecture is built upon OSS Explorer which has been designed from the ground up to be fault-tolerant due to its unique ability to distribute processing across a multi-server/multi-core virtualized environment and shift the load transparently based upon available processors and servers.

Therefore, based upon the definitions below, fault-tolerance is even more reliable and available than High Availability because a Fault-Tolerant system does not have to "resubmit", "restart", and/or "failover" to a secondary system.

Imagine being able to handle disaster recovery, event storms, and maintenance upgrades without skipping a beat. Never lose sight of critical resources and services.
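The difference from a failover cluster can be illustrated with a toy model: if every server is an active peer and work is assigned only among servers that are currently alive, losing one server shrinks the pool but nothing has to be restarted on a standby. The sketch below assumes a trivial hash-based assignment and hypothetical server names; it illustrates the general active-active idea, not OSS Explorer's internal design.

```python
import zlib


class PeerPool:
    """Toy active-active pool: every live server takes a share of the work."""

    def __init__(self, servers):
        self.alive = sorted(servers)

    def fail(self, server):
        """Mark a server as down; the remaining peers absorb its share."""
        self.alive.remove(server)

    def owner(self, event_key):
        """Pick a live server for an event using a stable hash of its key."""
        if not self.alive:
            raise RuntimeError("no servers available")
        index = zlib.crc32(event_key.encode()) % len(self.alive)
        return self.alive[index]


pool = PeerPool(["srv-1", "srv-2", "srv-3"])
events = [f"alarm-{i}" for i in range(6)]

pool.fail("srv-2")
after = {event: pool.owner(event) for event in events}

# Every event still has an owner, and none is routed to the failed server.
print(all(owner != "srv-2" for owner in after.values()))
```

A real system would also have to preserve in-flight state and rebalance with minimal reshuffling, which this model deliberately ignores.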

See Related Blog: 99.999% - Myth or Reality?!?

Download the Brochure on Fault Management:

 


Reference: Wikipedia

Fault-tolerance or graceful degradation is the property that enables a system (often computer-based) to continue operating properly in the event of the failure of (or one or more faults within) some of its components. A newer approach is progressive enhancement. If its operating quality decreases at all, the decrease is proportional to the severity of the failure, as compared to a naïvely-designed system in which even a small failure can cause total breakdown. Fault-tolerance is particularly sought-after in high-availability or life-critical systems.

PC Magazine Definition of: fault tolerant 

The ability to continue non-stop when a hardware failure occurs. A fault-tolerant system is designed from the ground up for reliability by building multiples of all critical components, such as CPUs, memories, disks and power supplies into the same computer. In the event one component fails, another takes over without skipping a beat.

Tandem and Stratus were the first two manufacturers that were dedicated to building fault-tolerant computer systems for the transaction processing (OLTP) market.

High Availability
Many systems are designed to recover from a failure by detecting the failed component and switching to another computer system. These systems, although sometimes called fault tolerant, are more widely known as "high availability" systems, requiring that the software resubmits the job when the second system is available.

Redundant Hardware
True fault tolerant systems with redundant hardware are the most costly because the additional components add to the overall system cost. However, fault tolerant systems provide the same processing capacity after a failure as before, whereas high availability systems often provide reduced capacity.

Definition of: fault management 

The monitoring of error indications in a computer system in order to log the occurrences and send alerts to system administrators and field service. Fault management software keeps track of hardware faults such as memory parity errors and software crashes. The proper analysis of the frequency and type of such errors is intended to initiate a repair order before a total breakdown occurs.

See Related Wiki Pages:

High Availability Clusters (HAC)

High-availability clusters (also known as HA clusters or failover clusters) are groups of computers that support server applications that can be reliably utilized with a minimum of down-time. They operate by harnessing redundant computers in groups or clusters that provide continued service when system components fail. Without clustering, if a server running a particular application crashes, the application will be unavailable until the crashed server is fixed. HA clustering remedies this situation by detecting hardware/software faults, and immediately restarting the application on another system without requiring administrative intervention, a process known as failover. As part of this process, clustering software may configure the node before starting the application on it. For example, appropriate filesystems may need to be imported and mounted, network hardware may have to be configured, and some supporting applications may need to be running as well.

Veritas Clustering:

High availability clusters (HAC) improve availability of applications by failing them over or switching them over in a group of systems as opposed to High Performance Clusters which improve performance of applications by allowing them to run on multiple systems simultaneously. 

High Performance Computing (HPC)


 

About Us  -  OSSera, Inc. is a global provider of Operational Support System (OSS) solutions for IT organizations, service planning, service operations, and network operations.  OSSera's multi-threaded symmetrically distributed platform fully leverages modern multi-core server hardware to provide higher flexibility, reliability, and scalability for service and resource management solutions.  OSSera's products support the TM Forum's suite of standards especially in the area of Service Management, Fault Management, Performance Management, Data Mediation, and Configuration Management.   Meet the management team.
