Today I moved my Generic Dependency Parser (rcdp) utility to google code hosting.
The parser is generic and can work on arbitrary files and dependency rules. For demonstration purposes rcdp recorded inter-project dependencies of Boost 1.40.0 by parsing over 6000 Boost header files.
The Image below is the dependency graph for the Boost.Archive library.
How is Boost parsed?
ruby rcdp/dependencies/boost.rb <path_to_boost>
The script records all header files within the boost include directory. Each file path is assigned a vertex name: if the file path contains a nested directory inside boost include directory, then the first nested directory name is used as a vertex name in the graph. If the file path points directly to the boost include directory the vertex name is derived as follows: if a directory with the same name exists (except for the file extension ‘.hpp’) then the directory name is used. Else, the basename of the file including the extension is used (considered as mini-library).
Next, each recorded file is parsed for matching #include preprocessor statements. On every such statement the script tries to lookup the file inside boost directory and its vertex name. A dependency is then generated between the vertex name of the file parsed and the vertex name from the resolved include statement. If the dependency causes a cycle in the dependency graph, the dependency is not added but error’d to the logger.
After all files are parsed, the dependency graph is reduced by removing edges u -> w, where u -> … -> w exists and written to a ‘.dot’ file that can be converted into various formats using http://www.graphviz.org/.
Additionally, for each project a more readable dependency graph is generated by masking unrelated dependencies.
Limitations of parsing
The parser is not
- a complete preprocessor parser: it does not care about conditional include statements or commented ones. Parsing is based on simple pattern matching to keep to code small nice. You might however add a complete parser if you like to
- recording cyclic dependencies: some include statements cause cyclic dependencies that simply result from the choice of mapping to vertex names. I.e. when file boost/a/detail/detail.hpp includes boost/b/win32/abc.hpp which in turn includes boost/a/other/other.hpp and the mapping resolves vertex names from the first nested directory inside boost, we generate a dependency from a -> b and finally another one from b -> a which will not be added.