Reverse Engineering
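Background
A standard definition for reverse engineering was given by Chikofsky and Cross in 1990 [1]:
"Reverse engineering is the process of analyzing a subject [software] system to identify the system's components and their interrelationships and create representations of the system in another form or at a higher level of abstraction."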
This definition has proven useful because it is flexible: it describes reverse engineering as a process without explicitly stating its inputs or outputs. In practice, the inputs to the process include source code, logs from configuration management systems, interviews with developers, and documentation. The outputs can be diagrams, new documentation, a searchable database, or re-formatted source code. Research in reverse engineering is concerned with the creation of tools and techniques to facilitate this analysis, particularly for legacy software systems whose size is measured in thousands of lines of code (KLOC) or millions of lines of code (MLOC). The research area focuses on tools because techniques are difficult to apply to large legacy systems without tool support.
The figure shows a typical sequence of steps to reverse engineer a software system. The inputs and outputs are shown in boxes with rounded corners, and the steps in the process are shown as rectangles with square corners. The first step is to extract the desired facts from the input. In the second step, the facts are analyzed further, and in the third and final step they are presented to the user. Below each of the boxes are example products or tools. This sequence of steps is similar to compilation, with its front-end analysis, optimisation, and code generation. The distinction between extraction and analysis is somewhat artificial, since most extraction and presentation tools also perform some analysis, but it is useful because most tool suites have separate tools for each stage.
Because the individual tools are difficult to use in isolation, most researchers create workbenches or pipelines that consist of a tool for each step along with software to link them together. The integration software typically consists of controller software that invokes each of the tools in sequence and a database, or factbase, for storing and querying the facts.
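The extract, analyze, and present steps, together with a controller and a factbase, can be illustrated with a minimal sketch. It is not any particular workbench: the function names (`extract`, `analyze`, `present`, `run_pipeline`), the triple-based fact format, and the sample input are all assumptions made for illustration. The extractor uses Python's standard `ast` module to collect "calls" facts, the analysis derives "reaches" facts as a transitive closure, and the presentation step emits Graphviz DOT text.

```python
import ast
from collections import defaultdict

# Hypothetical fact format: (relation, source, target) triples.
# The factbase is simply a set of such triples held in memory.
Fact = tuple[str, str, str]

def extract(source: str) -> set[Fact]:
    """Extraction: parse source text and emit 'calls' facts between functions."""
    facts: set[Fact] = set()
    for func in ast.walk(ast.parse(source)):
        if isinstance(func, ast.FunctionDef):
            for node in ast.walk(func):
                # Only direct calls to plain names are recorded in this sketch.
                if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                    facts.add(("calls", func.name, node.func.id))
    return facts

def analyze(facts: set[Fact]) -> set[Fact]:
    """Analysis: add 'reaches' facts, the transitive closure of 'calls'."""
    succ = defaultdict(set)
    for rel, a, b in facts:
        if rel == "calls":
            succ[a].add(b)
    derived: set[Fact] = set()
    for start in list(succ):
        stack, seen = list(succ[start]), set()
        while stack:
            n = stack.pop()
            if n not in seen:
                seen.add(n)
                stack.extend(succ.get(n, ()))
        derived |= {("reaches", start, n) for n in seen}
    return facts | derived

def present(facts: set[Fact]) -> str:
    """Presentation: render the direct call graph as Graphviz DOT text."""
    lines = ["digraph calls {"]
    for rel, a, b in sorted(facts):
        if rel == "calls":
            lines.append(f'  "{a}" -> "{b}";')
    lines.append("}")
    return "\n".join(lines)

def run_pipeline(source: str) -> tuple[set[Fact], str]:
    """Controller: invoke each tool in sequence, passing the factbase along."""
    factbase = extract(source)
    factbase = analyze(factbase)
    return factbase, present(factbase)

# Made-up input program used only to exercise the pipeline.
SAMPLE = '''
def main():
    load()
    report()

def load():
    parse()

def parse():
    pass

def report():
    pass
'''

if __name__ == "__main__":
    factbase, dot = run_pipeline(SAMPLE)
    print(dot)
    # Querying the factbase: everything reachable from main.
    print(sorted(t for rel, s, t in factbase if rel == "reaches" and s == "main"))
```

Keeping the facts in one shared set mirrors the role of the factbase described above: each stage reads and extends the same store, so a stage can be swapped out without changing the others. A real workbench would typically persist the facts in a database or an exchange file rather than in memory.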
References
[1] Elliot J. Chikofsky and James H. Cross II, "Reverse Engineering and Design Recovery: A Taxonomy," IEEE Software, pp. 13-17, 1990.