CppETS 1.0 and 1.1 have been developed and successfully deployed.
Susan Elliott Sim, Richard C. Holt, Steve Easterbrook. "On Using a Benchmark to Evaluate C++ Extractors." Proceedings of the Tenth International Workshop on Program Comprehension, Paris, France, pp. 114-123, 26-29 June, 2002.
Naturally, the next step is to develop CppETS 2.0.
Here are some ideas for improvement that came from our experience working with CppETS.
- Interpretation Framework for Results: The results are hard to interpret, and therefore hard for someone to use when selecting a fact extractor. A user would need to understand his or her requirements very well and to look very closely at how each fact extractor performed on particular test cases. A search form would help: the user would say what they want to use the extractor for and which features are important, and the framework would return a short list of candidates.
- Schema Zoo: It would be useful to collect the schemas for various fact extractors (and downstream analysis tools), so that a fact extractor could be evaluated and selected analytically. I have a collection of schemas, and I think other people do too (e.g., Jean-Marie Favre and his MDE site).
- Data Requirements for Downstream Analysis: An examination of these requirements could start with the aforementioned schema zoo. Other factors such as accuracy would also need to be considered. For example, is an extractor that is 97% accurate good enough? Are some errors more serious than others?
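
To make the ideas above more concrete, here is a minimal sketch of how an analytic short list might be computed from a schema zoo. Everything in it is an assumption made for illustration: the `ExtractorProfile` record, the fact-kind names, the extractor names, and the accuracy figures are invented and are not part of CppETS or of any real extractor's schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractorProfile:
    """Hypothetical summary of one fact extractor (not a CppETS format)."""
    name: str
    schema: frozenset  # kinds of facts the extractor's schema can represent
    accuracy: float    # fraction of benchmark answers judged correct, 0.0 to 1.0


def schema_coverage(profile, required):
    """Fraction of the required fact kinds present in the extractor's schema."""
    if not required:
        return 1.0
    return len(profile.schema & required) / len(required)


def shortlist(profiles, required, min_accuracy=0.0, limit=3):
    """Keep extractors whose schema covers every required fact kind and whose
    accuracy meets the floor, ordered from most to least accurate."""
    candidates = [
        p for p in profiles
        if required <= p.schema and p.accuracy >= min_accuracy
    ]
    candidates.sort(key=lambda p: (-p.accuracy, p.name))
    return candidates[:limit]


# Invented example data standing in for entries in a schema zoo.
profiles = [
    ExtractorProfile("extractor_a", frozenset({"function", "call", "include"}), 0.97),
    ExtractorProfile("extractor_b", frozenset({"function", "call", "template", "macro"}), 0.91),
    ExtractorProfile("extractor_c", frozenset({"function", "include"}), 0.99),
]

# A downstream call-graph tool might need only these two kinds of facts.
required = frozenset({"function", "call"})

result = [p.name for p in shortlist(profiles, required, min_accuracy=0.9)]
print(result)
```

In this sketch the most accurate extractor is excluded because its schema lacks a required fact kind, which is the sort of trade-off a user currently has to spot by hand. A single accuracy number is also a simplification: it cannot express that some errors are more serious than others, so a fuller version would probably need per-error-type weights chosen by the downstream analysis.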