Internet-Scale Source Code Searching


Participate in Our Web Survey: Now Open


Current Work

The availability of massive repositories of open source code on the Internet, and the ability to search them, have the potential to revolutionize software construction and component reuse. However, we do not yet have a good understanding of how programmers use open source code repositories, or of the extent to which the search capabilities of these repositories meet their needs.

In the proposed research, we seek to study users of open source code repositories to understand what their goals are when searching, what they search for, and how they perform these searches. We plan to conduct this empirical study through an online survey on the search practices of software developers, distributed to developers in both academia and industry. The results will be used to guide improvements to open source code search engines.

Throughout the proposed work, we seek to answer the following research questions:

  • Who searches for source code on the Internet?
  • For what purpose do they conduct these searches?
  • How do they conduct these searches?
  • To what extent do existing repositories and tools meet or fail to meet their needs?
  • How do they use the code that they find?
  • Has the availability of large code repositories changed how they program?

The survey is now online and open.


Background

Prior work on source code searching (Sim et al., 1998) showed that an exploratory web-based survey can yield useful information about how software developers search source code.

That study produced a set of archetypal searches that characterized how software developers perform software maintenance activities. It provides the intellectual and methodological background for the proposed empirical study.

We wish to build on this prior work because we believe that searching for source code on the Internet is fundamentally different from searching within an integrated development environment. A new empirical study is therefore needed to understand how programmers use existing source code repositories to support the software development process.

The software engineering community will benefit from knowing how open source software repositories are being used to support software development. This study will also inform improvements to Sourcerer, a search engine for source code. These improvements may allow more relevant and useful source code to be retrieved, thereby reducing software production time and increasing software reuse.


References

[1] Sim, S. E., Clarke, C. L. A., and Holt, R. C., Archetypal Source Code Searching: A Survey of Software Developers and Maintainers, Proc. Sixth International Workshop on Program Comprehension, Ischia, Italy, pp. 180-187, 24-26 June, 1998.