Reverse Engineering

Current Projects

  • Reverse Engineering Tool Knowledge Base
    We have created a knowledge base on obtaining, installing, and using various reverse engineering research tools. It is hosted on a MoinMoin wiki, whose pages any user can edit, so that other tool users can benefit from and add to our experience.
  • MRS.G: An Eclipse plug-in for GXL
    MRS.G is an environment for working with GXL (Graph eXchange Language) files. Currently, it consists of an editor with syntax highlighting and a converter framework that chains individual GXL converters to provide end-to-end conversion between two local formats.
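The idea of chaining converters for end-to-end conversion can be sketched as follows. This is a minimal illustration, not MRS.G's actual design: the `ConverterRegistry` class, the format names `format_a` and `format_b`, and the toy converters are all hypothetical. Each converter is registered as an edge between two formats, and a breadth-first search finds the shortest chain from the source format to the target.

```python
from collections import deque
from typing import Callable, Dict, List, Tuple

Converter = Callable[[str], str]


class ConverterRegistry:
    """Hypothetical registry: formats are nodes, converters are directed edges."""

    def __init__(self) -> None:
        self._edges: Dict[str, List[Tuple[str, Converter]]] = {}

    def register(self, source: str, target: str, fn: Converter) -> None:
        self._edges.setdefault(source, []).append((target, fn))

    def find_chain(self, source: str, target: str) -> List[Converter]:
        """Breadth-first search for the shortest converter chain."""
        queue = deque([(source, [])])
        seen = {source}
        while queue:
            fmt, chain = queue.popleft()
            if fmt == target:
                return chain
            for next_fmt, fn in self._edges.get(fmt, []):
                if next_fmt not in seen:
                    seen.add(next_fmt)
                    queue.append((next_fmt, chain + [fn]))
        raise ValueError(f"no converter chain from {source!r} to {target!r}")

    def convert(self, data: str, source: str, target: str) -> str:
        """Apply each converter in the chain, in order."""
        for fn in self.find_chain(source, target):
            data = fn(data)
        return data


# Toy converters that only wrap the text, standing in for real ones.
registry = ConverterRegistry()
registry.register("format_a", "gxl", lambda text: f"gxl({text})")
registry.register("gxl", "format_b", lambda text: f"format_b({text})")

result = registry.convert("facts", "format_a", "format_b")
print(result)  # format_b(gxl(facts))
```

Using GXL as the common intermediate format means that each local format needs only one converter to GXL and one from it, rather than one converter per pair of formats.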

Background

A standard definition for reverse engineering was given by Chikofsky and Cross in 1990:

"Reverse engineering is the process of analyzing a subject [software] system to:

  • identify the system’s components and their interrelationships and
  • create representations of the system in another form or at a higher level of abstraction" [1].

This definition has proven useful because it is flexible: it describes reverse engineering as a process without explicitly stating its inputs or outputs. In practice, the inputs to the process include source code, logs from configuration management systems, interviews with developers, and documentation. The outputs can be diagrams, new documentation, a searchable database, or re-formatted source code. Research in reverse engineering is concerned with creating tools and techniques to facilitate this analysis, particularly for legacy software systems whose size is measured in thousands (KLOC) or millions (MLOC) of lines of code. The research area focuses on tools because it is difficult to apply techniques to large legacy systems without tool support.

[Figure: Diagram of Reverse Engineering Processing Pipeline]

The figure shows a typical sequence of steps for reverse engineering a software system. Inputs and outputs are shown in boxes with rounded corners; the steps in the process are shown as rectangles with square corners. Below each box are example products or tools. The first step extracts the desired facts from the input. The second step analyzes those facts further, and the third and final step presents the results to the user. This sequence resembles compilation, with its front-end analysis, optimisation, and code generation.

The distinction between extraction and analysis is somewhat artificial, since most extraction and presentation tools also perform some analysis, but it is useful because most tool suites have separate tools for each stage. Because the individual tools are difficult to use in isolation, most researchers create workbenches or pipelines that combine a tool for each step with software to link them together. The integration software typically consists of a controller that invokes each tool in sequence and a database, or factbase, for storing and querying the facts.
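The three steps and the integration software can be sketched in a few lines. This is an illustrative toy under stated assumptions, not any particular workbench: Python's standard `ast` module stands in for the extractor, an in-memory SQLite table named `facts` (a hypothetical schema) plays the factbase, the analysis derives a transitive "reaches" relation from the extracted call facts, and the presentation step emits Graphviz DOT text. The function names and `SAMPLE_SOURCE` are invented for the example.

```python
import ast
import sqlite3
from typing import List, Tuple

Fact = Tuple[str, str, str]  # (relation, subject, object)

SAMPLE_SOURCE = '''
def load():
    return parse()

def parse():
    return tokenize()

def tokenize():
    return []
'''


def extract(source: str) -> List[Fact]:
    """Step 1: extract 'defines' and 'calls' facts from the input source."""
    facts: List[Fact] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            facts.append(("defines", "module", node.name))
            for inner in ast.walk(node):
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    facts.append(("calls", node.name, inner.func.id))
    return facts


def store(db: sqlite3.Connection, facts: List[Fact]) -> None:
    """Factbase: one table of (relation, subject, object) triples."""
    db.execute("CREATE TABLE IF NOT EXISTS facts "
               "(relation TEXT, subject TEXT, object TEXT)")
    db.executemany("INSERT INTO facts VALUES (?, ?, ?)", facts)


def analyze(db: sqlite3.Connection) -> None:
    """Step 2: derive 'reaches' facts as the transitive closure of 'calls'."""
    rows = db.execute("""
        WITH RECURSIVE reach(src, dst) AS (
            SELECT subject, object FROM facts WHERE relation = 'calls'
            UNION
            SELECT reach.src, facts.object
            FROM reach JOIN facts ON facts.subject = reach.dst
            WHERE facts.relation = 'calls'
        )
        SELECT src, dst FROM reach
    """).fetchall()
    db.executemany("INSERT INTO facts VALUES ('reaches', ?, ?)", rows)


def present(db: sqlite3.Connection) -> str:
    """Step 3: render the direct call graph as Graphviz DOT text."""
    rows = db.execute("SELECT subject, object FROM facts "
                      "WHERE relation = 'calls' "
                      "ORDER BY subject, object").fetchall()
    lines = ["digraph calls {"]
    lines += [f'  "{src}" -> "{dst}";' for src, dst in rows]
    lines.append("}")
    return "\n".join(lines)


def run_pipeline(source: str) -> Tuple[sqlite3.Connection, str]:
    """Controller: invoke each step in sequence around a shared factbase."""
    db = sqlite3.connect(":memory:")
    store(db, extract(source))
    analyze(db)
    return db, present(db)


factbase, dot_text = run_pipeline(SAMPLE_SOURCE)
print(dot_text)
```

Because every step reads from and writes to the same factbase, a step can be swapped for a different tool without changing the others, which is the main appeal of the pipeline arrangement.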

References

[1] Elliot J. Chikofsky and James H. Cross II, "Reverse Engineering and Design Recovery: A Taxonomy," IEEE Software, vol. 7, no. 1, pp. 13-17, January 1990.