Agenda & Session Schedule

Automated Crash Analysis


Nace Sapundziev

Nikolai Tankov

If something goes wrong with a Java process, you would typically investigate it manually. Now imagine a large landscape with thousands of Java processes. Detecting the problem, collecting the right raw data, and investigating the root cause and a possible solution would take a huge amount of time.
Automated Crash Analysis is a framework for detecting problems, collecting and analyzing data, and, for some types of problems, even recovering automatically. The framework includes a central monitoring system that aggregates data from all instances. Based on this data, a rule engine detects problems such as slow response times, unresponsive applications, or constantly high CPU or memory consumption. Upon detection, the rule engine triggers collection of the relevant data, which is archived and sent to a central analysis module. That module unpacks the data and distributes it to different analyzers. A cumulative report containing the proposed (or even already taken) actions is then sent to the system administrator.
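To make the detection step more concrete, here is a minimal Java sketch of how a rule engine might evaluate aggregated metrics and decide for which instances data should be collected. It is not the framework's actual code: the class and field names (`RuleEngineSketch`, `Sample`, `Rule`) and the thresholds are assumptions made for illustration only. For brevity, each rule looks at a single sample, whereas detecting "constantly" high consumption would require evaluating a time window.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Illustrative sketch only; all names and thresholds are hypothetical.
public class RuleEngineSketch {

    // One metrics sample reported by a monitored Java process.
    static final class Sample {
        final String instanceId;
        final double cpuLoad;        // fraction between 0.0 and 1.0
        final double heapUsedRatio;  // fraction between 0.0 and 1.0
        final long responseTimeMillis;

        Sample(String instanceId, double cpuLoad, double heapUsedRatio, long responseTimeMillis) {
            this.instanceId = instanceId;
            this.cpuLoad = cpuLoad;
            this.heapUsedRatio = heapUsedRatio;
            this.responseTimeMillis = responseTimeMillis;
        }
    }

    // A named condition that flags a problem in a sample.
    static final class Rule {
        final String name;
        final Predicate<Sample> condition;

        Rule(String name, Predicate<Sample> condition) {
            this.name = name;
            this.condition = condition;
        }
    }

    // Assumed example thresholds; a real deployment would configure these.
    static List<Rule> defaultRules() {
        return Arrays.asList(
            new Rule("high-cpu", s -> s.cpuLoad > 0.9),
            new Rule("high-memory", s -> s.heapUsedRatio > 0.9),
            new Rule("slow-response", s -> s.responseTimeMillis > 5000));
    }

    // Returns one "instanceId: ruleName" finding for every rule that fires.
    static List<String> evaluate(List<Rule> rules, List<Sample> samples) {
        List<String> findings = new ArrayList<>();
        for (Sample sample : samples) {
            for (Rule rule : rules) {
                if (rule.condition.test(sample)) {
                    findings.add(sample.instanceId + ": " + rule.name);
                }
            }
        }
        return findings;
    }

    public static void main(String[] args) {
        List<Sample> samples = Arrays.asList(
            new Sample("node-1", 0.20, 0.30, 120),
            new Sample("node-2", 0.99, 0.30, 9000));
        // In the described framework, each finding would trigger data collection.
        for (String finding : evaluate(defaultRules(), samples)) {
            System.out.println("collect data for " + finding);
        }
    }
}
```

Keeping rules as named predicates means new problem types can be added without touching the evaluation loop. The collection, archiving, and analysis steps described above would hang off each finding.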
Automated Crash Analysis greatly simplifies the investigation and shortens the time to resolution. The framework is part of SAP’s R&D and will soon be contributed to the open source community.

Mobile2Days