[CPM-SPIRE-L] Postdoc position in Paris on pattern matching of source code

Gregory Kucherov Gregory.Kucherov at univ-mlv.fr
Sun Nov 29 07:10:51 PST 2015


On behalf of Roberto Di Cosmo

================================

We are looking for a candidate for a two-year postdoctoral position at the
INRIA center in Paris, France, starting in 2016 (possibly at any time between
January and October), on the subject detailed below.

Expressions of interest must be sent no later than December 7, 2015.

Anyone interested is invited to contact Roberto Di Cosmo (roberto at dicosmo.org) as soon as possible.

Subject:
  Efficient pattern matching in large masses of source code.

Summary:

Searching for patterns in source code is a routine activity
for anyone who develops software or tries to understand it.

In the 1970s, finite automata theory and regular expressions provided the basis
for building versatile tools such as "grep", which are still in everyday use.

With the rise of free software, the number of programs available together with
their source code has skyrocketed, bringing a new challenge: performing efficient
pattern matching not only in the few files of a given software project, but in
the entire source code of all the free software on the planet.

On a large code base such as the Debian distribution, an approach based on
inverted indexes of trigrams has made the problem tractable up to a billion
lines of code (see https://swtch.com/~rsc/regexp/regexp4.html and
http://sources.debian.net/).
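To illustrate the trigram idea, here is a minimal Python sketch, assuming only literal (substring) queries and an in-memory index; the names `trigrams` and `TrigramIndex` are hypothetical and not taken from the tools cited above. Each file is indexed under every 3-character substring it contains; a query is answered by intersecting the posting lists of its trigrams to obtain candidate files, which are then verified by an actual scan. The article linked above goes further and derives trigram queries from full regular expressions, which this sketch does not attempt.

```python
from collections import defaultdict


def trigrams(text: str) -> set[str]:
    """Return the set of all 3-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


class TrigramIndex:
    """Toy inverted index (hypothetical): trigram -> ids of files containing it."""

    def __init__(self) -> None:
        self.postings: dict[str, set[int]] = defaultdict(set)
        self.files: list[str] = []

    def add(self, content: str) -> int:
        """Store a file and record each of its trigrams; return the file id."""
        file_id = len(self.files)
        self.files.append(content)
        for t in trigrams(content):
            self.postings[t].add(file_id)
        return file_id

    def candidates(self, literal: str) -> set[int]:
        """Files containing every trigram of the literal (a superset of true matches)."""
        grams = trigrams(literal)
        if not grams:
            # Queries shorter than 3 characters cannot be pruned: scan everything.
            return set(range(len(self.files)))
        result = None
        for t in grams:
            # .get avoids inserting empty sets into the defaultdict.
            ids = self.postings.get(t, set())
            result = set(ids) if result is None else result & ids
        return result

    def search(self, literal: str) -> list[int]:
        """Verify each candidate with a real substring scan to drop false positives."""
        return sorted(i for i in self.candidates(literal) if literal in self.files[i])


idx = TrigramIndex()
idx.add("int main(void) { return 0; }")
idx.add("def main():\n    pass")
idx.add("abc bcd")
print(idx.search("main"))  # [0, 1]
```

The verification step is essential: file 2 contains both trigrams of "abcd" ("abc" and "bcd") without containing "abcd" itself, so it is a candidate but not a match. The index pays off because only the candidate files, rather than the whole corpus, need to be scanned.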

But the source code available today amounts to several billion unique files and
hundreds of billions of lines of code!

The purpose of this postdoc position is to explore the new approaches needed to
scale up, and to experiment with them together with a team that already has a
collection of source code of realistic size.

Solid knowledge of the relevant theoretical fields, and a certain passion
for tool development, will be highly appreciated.


