A Build System for Complex Projects: Part 2


DependencyAnalyzer.py

Let's examine the DependencyAnalyzer.py module. The entry point is the get_project_dependencies() function. It accepts the project directory, the static libraries directory, the include search path, and a list of file extensions, and returns the list of projects that the current project depends on. The algorithm is simple: get the dependencies of each file in the project that has one of the listed extensions, then prune duplicates.


import glob
import os
import re

def get_project_dependencies(project_dir, libs_dir, search_path,
                             extensions=('.cpp', '.hpp', '.c', '.h')):
  """Get all the projects in the libs dir that the target project depends on

  The algorithm gets the dependencies of every file in the current project
  and keeps a list of all the directories the files reside in.

  @project_dir: the target project
  @libs_dir: the name of the static libraries parent dir (e.g. 'nta')
  @search_path: the list of include directories
  @extensions: the file extensions that are checked for dependencies
  """
  # get_file_dependencies() stores absolute paths, so normalize project_dir
  # the same way; otherwise the removal at the end would not find it
  project_dir = os.path.abspath(project_dir)
  files = glob.glob(os.path.join(project_dir, '*.*'))
  all_dependencies = []
  for f in files:
    if not os.path.isfile(f):
      continue
    if os.path.splitext(f)[1] not in extensions:
      continue
    file_dependencies = []
    get_file_dependencies(f, libs_dir, file_dependencies, search_path)
    all_dependencies += file_dependencies

  # Map each file to its directory and drop duplicates, keeping first-seen order
  dependencies = []
  for p in [os.path.dirname(f) for f in all_dependencies]:
    if p not in dependencies:
      dependencies.append(p)

  # The project does not depend on itself
  if project_dir in dependencies:
    dependencies.remove(project_dir)
  return dependencies
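
The duplicate-pruning step at the end of get_project_dependencies() keeps directories in the order they were first seen. As a minimal sketch of the same idea, assuming Python 3.7 or later (where dicts preserve insertion order), the loop can be expressed with dict.fromkeys(). The helper name unique_dirs and the sample paths are hypothetical and are not part of ibs.

```python
import os

def unique_dirs(paths):
    """Return the directory of each path, without duplicates, in first-seen order."""
    # dict keys are unique and keep insertion order (Python 3.7+)
    return list(dict.fromkeys(os.path.dirname(p) for p in paths))

# Hypothetical file list: two files from one project, one from another
deps = [
    os.path.join('nta', 'utils', 'Log.hpp'),
    os.path.join('nta', 'utils', 'Log.cpp'),
    os.path.join('nta', 'math', 'Vector.hpp'),
]
print(unique_dirs(deps))
```

The membership test on a list is linear, so for large projects the dict-based version also avoids the quadratic cost of the explicit loop.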

How do you find the dependencies of a file? Via a recursive scan of its #include statements. The whole point of the exercise is to figure out which projects you depend on so you can build them, so system #includes whose libraries are already built are irrelevant. The prefix therefore serves as a filter that limits the search to #include statements whose path begins with the prefix.


def get_file_dependencies(filename, prefix, file_dependencies, search_path):
  """Get all the projects in the prefix dir that the target file depends on

  All the filenames it #includes are extracted using get_file_includes().
  The dependencies of each dependency are extracted recursively.

  @filename: the target filename
  @prefix: the prefix of interesting dependencies
  @file_dependencies: the list of dependencies (grows as the function trundles along)
  @search_path: the list of include directories
  """
  filename = os.path.abspath(filename)
  if filename in file_dependencies:
    return
  else:
    file_dependencies.append(filename)

  # Use a context manager so the file is closed promptly
  with open(filename) as f:
    text = f.read()
  includes = get_file_includes(text, prefix, search_path)

  includes = [i[0] for i in includes if starts_with_prefix(i[0], prefix, search_path)]
  for i in includes:
    if i not in file_dependencies:
      get_file_dependencies(i, prefix, file_dependencies, search_path)
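
The early return when a file is already in the list is what keeps the recursion from looping forever when headers include each other. The following is a simplified, self-contained sketch of that guard, not the ibs code: the function name collect, the quoted-only pattern, and resolving includes relative to the including file are all assumptions made to keep the example short.

```python
import os
import re
import tempfile

# Simplified pattern for this sketch: quoted includes only
INCLUDE_RE = re.compile(r'\s*#include "([^"]*)"')

def collect(filename, seen):
    """Append filename and everything it transitively includes to seen."""
    filename = os.path.abspath(filename)
    if filename in seen:
        return  # already visited: this is what stops include cycles
    seen.append(filename)
    with open(filename) as f:
        for line in f:
            m = INCLUDE_RE.match(line)
            if m:
                # Assumption: resolve relative to the including file's directory
                collect(os.path.join(os.path.dirname(filename), m.group(1)), seen)

with tempfile.TemporaryDirectory() as d:
    # a.h and b.h include each other; main.cpp includes a.h
    for name, body in [('main.cpp', '#include "a.h"\n'),
                       ('a.h', '#include "b.h"\n'),
                       ('b.h', '#include "a.h"\n')]:
        with open(os.path.join(d, name), 'w') as f:
            f.write(body)
    seen = []
    collect(os.path.join(d, 'main.cpp'), seen)
    print([os.path.basename(p) for p in seen])  # ['main.cpp', 'a.h', 'b.h']
```

Each file appears exactly once even though a.h and b.h form a cycle.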

The get_file_dependencies() function uses the helper function get_file_includes() to get all the relevant #include statements. It tries a regular expression against every line in the file. The regular expression is compiled using Python's re module and works with both double quotes (") and angle brackets (< >) around the included file. It can also handle leading whitespace and any trailing whitespace or comments. The regex also uses groups, the parts of the expression surrounded by parentheses, as in (.*). This allows the interesting parts to be extracted directly without further parsing of each line.

The entire function is a nice example of using regular expressions in Python. A successful match returns a match object that exposes the captured groups. The function stores each match as a 3-tuple (the included filename, the original line, and the text following the closing delimiter) and then filters the tuples according to the prefix.
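
The match-and-extract step described above can be tried in isolation. This is a minimal sketch rather than the module itself: the pattern is a variant of the article's include_re written as a raw string, with the filename group restricted to characters other than the closing quote or bracket (an assumption made here so that a trailing comment containing quotes cannot extend the match). The sample lines and header names are hypothetical.

```python
import re

# Variant of the article's pattern (assumption): raw string, and the
# filename group stops at the first closing quote or angle bracket.
include_re = re.compile(r'\s*#include [<"]([^>"]*)[>"](.*)')

samples = [
    '#include <nta/utils/Log.hpp>  // logging',
    '  #include "nta/math/Vector.hpp"',
    'int x = 0;',
]

parsed = []
for line in samples:
    m = include_re.match(line)
    if m is not None:
        # group(1) is the filename, group(2) is whatever follows
        # the closing delimiter (whitespace, comments, or nothing)
        parsed.append((m.group(1), line, m.group(2)))

for entry in parsed:
    print(entry)
```

The third sample line is not an #include statement, so match() returns None and the line is skipped.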


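# Pick up both forms of #include statement: "file" and <file>
# Also take care of comments following the #include statement
# (the filename group stops at the first closing quote or bracket,
# and the raw string keeps \s from being treated as a string escape)
include_re = re.compile(r'\s*#include [<"]([^>"]*)[>"](.*)')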

def get_file_includes(text, prefix, search_path):
  """Get the filenames from #include statements in a text

  The text usually comes from a source file. If prefix
  is not empty it will return only include statements
  whose content (following the first quote or angle bracket)
  begins with the prefix.

  The algorithm is to extract the relative filename using a regex,
  then scan the search path, joining each directory with the relative
  filename, and keep the first resulting path that exists.
  """
  includes = []
  lines = text.split('\n')
  for line in lines:
    m = include_re.match(line)
    if m is not None:
      includes.append((m.group(1), line, m.group(2)))

  results = []

  for i in includes:
    if not i[0].startswith(prefix):
      continue
    for d in search_path:
      full_path = os.path.join(d, i[0])
      if os.path.exists(full_path):
        results.append((full_path, i[1], i[2]))
        break

  return results
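
The search-path lookup in the second loop follows the usual "first match wins" rule. The sketch below isolates that rule so it can be run against a throwaway directory tree. The helper name resolve_include and the directory and header names are hypothetical and are used only for illustration.

```python
import os
import tempfile

def resolve_include(relative_name, search_path):
    """Return the first existing path formed from search_path, or None."""
    for d in search_path:
        full_path = os.path.join(d, relative_name)
        if os.path.exists(full_path):
            return full_path
    return None

with tempfile.TemporaryDirectory() as root:
    first = os.path.join(root, 'first')
    second = os.path.join(root, 'second')
    os.makedirs(os.path.join(first, 'nta'))
    os.makedirs(os.path.join(second, 'nta'))
    # Only the second directory contains the header
    target = os.path.join(second, 'nta', 'Log.hpp')
    open(target, 'w').close()

    found = resolve_include(os.path.join('nta', 'Log.hpp'), [first, second])
    missing = resolve_include(os.path.join('nta', 'Nope.hpp'), [first, second])
    print(found == target, missing)
```

An include that cannot be found in any search directory is silently dropped, which mirrors the behavior of the loop in get_file_includes().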

Conclusion

This article discussed the architecture and implementation of the generic core of ibs. It explored several interesting aspects of the architecture and the code: separating generic logic from custom logic using lightweight plug-ins (dynamically loaded helper modules), generating build files from templates (objects that manage text files with placeholders and substitution dicts), and automatically discovering dependencies by matching #include statements with regular expressions. The next article will delve into the implementation of a specific build system (NetBeans 6) within ibs and demonstrate how the sage development manager Isaac and the dedicated Bob "the Builder" use it to build their enterprise "Hello, World!" system.

