abyss7 / dist-clang

Distributed [cross-]compilation for Clang

License: GNU General Public License v3.0

dist-clang's Introduction

DistClang is a Clang compiler extension with a client-server infrastructure. It features distributed cross-platform compilation and caching of intermediate results.

The project consists of 2 executables and a couple of configuration files.

clang is the client part and should reside on the machine where the compilation is invoked. It replaces the invocation of the original compiler.

clangd is a server that has 2 different roles: emitter and absorber.

How to build

First, clone the repository with the --recurse-submodules argument, then configure the project:

./build/configure

To build, you need a recent Clang compiler with C++14 support.

For debugging and local usage

ninja -C out/Debug.gn All
cd out/Debug.gn
ln -s clang clang++

The resulting files clang, clang++ and clangd are located in the out/Debug.gn folder.

Linux DEB and RPM packages

ninja -C out/Release.gn rpm_package deb_package

The resulting packages are:

out/Release.gn/dist-clang_<version>_amd64.deb
out/Release.gn/rpmbuild/RPMS/x86_64/dist-clang-<version>-1.x86_64.rpm

Don't use the clang and clangd from the out/Release.gn folder locally, since they are hardcoded to use libraries from the /usr/lib/dist-clang folder.

Mac OS X package

ninja -C out/Release.gn pkg_package

The resulting package is out/Release.gn/dist-clang-<version>.pkg

How to configure the emitter

TODO!

How to configure the absorber

TODO!

How to run local compilation

The basic approach is to use dist-clang's clang and clang++ as the compilers. Setting

export CC=/usr/bin/dist-clang/clang CXX=/usr/bin/dist-clang/clang++

should work in most cases.

To work properly, dist-clang needs to know the real compiler's path and version.

Use local config file

One way to provide information about the real compiler is to put a config file somewhere along the path to the folder where the build is performed. The file must be named .distclang and should contain something like this:

path: "third_party/llvm-build/Release+Asserts/bin/clang"
version: "clang version 3.7.0 (trunk 231690)"
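
The lookup of the local config file might be sketched as follows. This is only an illustration: it assumes that "on the path to the folder" means the build folder and each of its parents, and that the nearest file wins. The function name find_local_config is hypothetical and not part of dist-clang.

```python
from pathlib import Path
from typing import Optional

CONFIG_NAME = ".distclang"  # file name taken from the text above


def find_local_config(build_dir: Path) -> Optional[Path]:
    """Return the nearest .distclang in build_dir or one of its parents.

    Assumption: the build folder and every ancestor are searched and the
    nearest file wins. The real lookup in dist-clang may differ.
    """
    build_dir = build_dir.resolve()
    for folder in (build_dir, *build_dir.parents):
        candidate = folder / CONFIG_NAME
        if candidate.is_file():
            return candidate
    return None
```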

Use environment variables

Another way is to set environment variables:

export DC_CLANG_PATH="/usr/bin/clang"
export DC_CLANG_VERSION="clang version 3.7.0 (trunk 231690)"

Rely on auto-detect

The last resort is dist-clang's auto-detect feature: it looks for the next clang in the PATH that differs from the current binary, i.e. /usr/bin/dist-clang/clang. This approach is error-prone and not recommended, since paths are compared internally as raw strings, without link resolution, etc.

In any case, if the clang path is provided without a version, the version is extracted from the real clang's output.
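
To see why raw string comparison is fragile, here is a minimal sketch of such a search. It is not dist-clang's actual code: find_next_clang and its is_executable parameter are hypothetical, and the sketch simply returns the first differing clang on the search path.

```python
import os
from typing import Callable, Optional


def _default_is_executable(path: str) -> bool:
    return os.path.isfile(path) and os.access(path, os.X_OK)


def find_next_clang(
    current_binary: str,
    search_path: str,
    is_executable: Callable[[str], bool] = _default_is_executable,
) -> Optional[str]:
    """Return the first "clang" on search_path whose path string differs
    from current_binary.

    Paths are compared as raw strings on purpose, to mirror the limitation
    described above: symlinks and ".." components are not resolved.
    """
    for folder in search_path.split(os.pathsep):
        if not folder:
            continue
        candidate = os.path.join(folder, "clang")
        if candidate != current_binary and is_executable(candidate):
            return candidate
    return None
```

If the PATH contains the dist-clang folder under a different spelling (through a symlink or a ".." component), the comparison fails and the search returns dist-clang's own binary, which is the failure mode the text warns about.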

dist-clang's Issues

Implement store for remote-unfriendly files

Some preprocessed files don't compile on the remote side because of warnings like -Wtautological-compare, etc. It would be wise to store hashes of such files and not attempt to compile them remotely.
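
A minimal sketch of such a store, assuming an in-memory set of SHA-256 digests (the class name, its methods and the choice of hash are illustrative and not taken from dist-clang):

```python
import hashlib


class RemoteUnfriendlyStore:
    """Remembers hashes of preprocessed sources that failed remotely."""

    def __init__(self) -> None:
        self._hashes = set()

    @staticmethod
    def _digest(preprocessed: bytes) -> str:
        # SHA-256 is an assumption; any stable content hash would do.
        return hashlib.sha256(preprocessed).hexdigest()

    def mark_failed(self, preprocessed: bytes) -> None:
        """Record a preprocessed source that did not compile remotely."""
        self._hashes.add(self._digest(preprocessed))

    def should_try_remote(self, preprocessed: bytes) -> bool:
        """False for sources that are already known to fail remotely."""
        return self._digest(preprocessed) not in self._hashes
```

A real store would presumably be persisted between daemon runs; that part is left out here.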

Direct cache may not get full dependencies

The problem is with some Clang options that restrict headers in the produced dependency file to the in-project headers only, i.e. it skips system headers.

We need to override such behaviour and store full dependencies.
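
For illustration: in Clang, -MMD and -MM write dependency information without system headers, while -MD and -M include them. Assuming these are the options the issue has in mind, one way to obtain full dependencies is to rewrite the flags used for the cache's own dependency scan. The helper name below is hypothetical.

```python
from typing import List

# -MMD / -MM omit system headers; -MD / -M are the variants that keep them.
FULL_DEPS_EQUIVALENT = {"-MMD": "-MD", "-MM": "-M"}


def with_full_dependencies(args: List[str]) -> List[str]:
    """Return a copy of args where header-restricting flags are replaced
    by their full-dependency equivalents. Other arguments are untouched."""
    return [FULL_DEPS_EQUIVALENT.get(arg, arg) for arg in args]
```

Since this changes the dependency file the user asked for, a real implementation would likely apply the rewritten flags only to an internal run and leave the user-visible output alone.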

Implement the load balancer

The balancer should be able to dynamically increase or decrease the number of available local jobs for each daemon according to the average third-party load on the workstation. The balancer should also be able to dynamically change the number of available jobs for the remote daemons, to properly balance all workstations in use.
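
The local half of this idea could look roughly like the sketch below. The formula is an assumption made for illustration: third-party load is expressed in busy cores and simply subtracted from the configured job count.

```python
import math


def available_local_jobs(total_jobs: int, third_party_load: float) -> int:
    """Jobs left for the daemon once third-party load is accounted for.

    third_party_load is measured in busy cores (the same unit as a load
    average) and must already exclude the daemon's own compilations.
    """
    busy = math.ceil(max(0.0, third_party_load))
    return max(0, total_jobs - busy)
```

On Unix, os.getloadavg() could serve as the input, but it also counts the daemon's own jobs, so those would need to be subtracted first.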

Two source files with the same content may "collide" in direct cache

A source root can have two identical files in different directories that must compile into different object code. For example, if both files do #include "file.c", each from its own location, two different object files must be generated, because the contents of file.c at the two locations may not be the same.

When checking the direct cache with clangd, we only look at the (source file content, command line, version) triple, where the command line does not include the source file name. If two source files are compiled with the same version and flags (which is not unusual in a typical project), clangd may cache the object code from compiling the first file and then hit the cache when searching for the matching content of the second file, producing wrong object code for it.

I created a simple example and attached it. The reproduction is as follows:

malets@ub:/tmp/clang-test$ make clean
rm -f a/file.o a/file.d b/file.o b/file.d lib.so
malets@ub:/tmp/clang-test$ make
cc -c -MD -MF a/file.d -o a/file.o a/file.c
cc -c -MD -MF b/file.d -o b/file.o b/file.c
ld -shared -o lib.so a/file.o b/file.o
malets@ub:/tmp/clang-test$ make clean
rm -f a/file.o a/file.d b/file.o b/file.d lib.so
malets@ub:/tmp/clang-test$ make CC=/usr/bin/dist-clang/clang
/usr/bin/dist-clang/clang -c -MD -MF a/file.d -o a/file.o a/file.c
/usr/bin/dist-clang/clang -c -MD -MF b/file.d -o b/file.o b/file.c
ld -shared -o lib.so a/file.o b/file.o
b/file.o: In function `f':
a/file.c:(.text+0x0): multiple definition of `f'
a/file.o:a/file.c:(.text+0x0): first defined here
make: *** [lib.so] Error 1
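
The collision can be reproduced on paper with a toy cache key. The sketch below is not clangd's real hashing code: direct_cache_key is a hypothetical function, and adding the source path to the key is just one possible fix.

```python
import hashlib


def direct_cache_key(
    content: bytes, command_line: str, version: str, source_path: str = ""
) -> str:
    """Hash the (content, command line, version) triple, optionally
    extended with the source path.

    Each part is length-prefixed so that adjacent parts cannot run together.
    """
    digest = hashlib.sha256()
    parts = (content, command_line.encode(), version.encode(), source_path.encode())
    for part in parts:
        digest.update(len(part).to_bytes(8, "little"))
        digest.update(part)
    return digest.hexdigest()


content = b'#include "file.c"\n'
flags = "-c -MD"
version = "clang version 3.7.0 (trunk 231690)"

# Without the path, both source files map to the same cache entry.
collides = direct_cache_key(content, flags, version) == direct_cache_key(
    content, flags, version
)
# With the path in the key, the entries are kept apart.
distinct = direct_cache_key(content, flags, version, "a/file.c") != direct_cache_key(
    content, flags, version, "b/file.c"
)
```

Keying on the path reduces cache sharing between checkouts in different folders, so a real fix might prefer to key on the resolved include set instead.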

Command '--print-file-name' prints two lines instead of one

This command may be used to determine the path to some internal files (e.g. to link compiler-rt builtins for ubsan), but when used with dist-clang the client outputs a bare filename as an extra first line. The client should output only the single line provided by the real compiler.

Implement a fair file cache clean-up

Right now the cache is cleaned in the following way:

  • Find the least recently used 1st-level folder (folder #1).
  • Find the least recently used 2nd-level folder (folder #2) inside folder #1.
  • Delete the least recently used files inside folder #2 until the cache size fits the limit.

A more precise clean-up would be better.
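
One candidate for a fairer policy is a global least-recently-used eviction that ignores the folder hierarchy. The sketch below only selects the victims; the Entry type and the function name are hypothetical, and the issue itself does not prescribe this policy.

```python
from typing import Iterable, List, NamedTuple


class Entry(NamedTuple):
    path: str
    size: int          # bytes
    last_used: float   # timestamp, larger means more recent


def entries_to_evict(entries: Iterable[Entry], limit: int) -> List[str]:
    """Pick the globally least recently used entries until the total
    size fits the limit. Returns the paths to delete, oldest first."""
    ordered = sorted(entries, key=lambda entry: entry.last_used)
    total = sum(entry.size for entry in ordered)
    evicted = []
    for entry in ordered:
        if total <= limit:
            break
        evicted.append(entry.path)
        total -= entry.size
    return evicted
```

Sorting every entry costs more than the folder-based scheme, which may be why the current approach was chosen; an index ordered by access time would avoid the full scan.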

Store actual object files, etc. via the client

Right now there is a problem when the build is interrupted: the clang clients are already gone, while the daemon may still produce object files, which is unexpected behavior.

The daemon should produce temporary files and report them to the client. If the client is still active, it should move them to the expected paths. Otherwise, the daemon should remove the temporary files.
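
A minimal sketch of this hand-over, with both sides collapsed into two plain functions. The function names and the client_alive flag are hypothetical; the real exchange would go through the client-daemon protocol.

```python
import os
import tempfile


def produce_temporary(output_path: str, data: bytes) -> str:
    """Daemon side: write the result next to the final path, so that the
    later move stays on one file system. Returns the temporary path."""
    folder = os.path.dirname(os.path.abspath(output_path))
    fd, tmp_path = tempfile.mkstemp(dir=folder, suffix=".tmp")
    with os.fdopen(fd, "wb") as tmp_file:
        tmp_file.write(data)
    return tmp_path


def finish(tmp_path: str, output_path: str, client_alive: bool) -> None:
    """Move the file into place if the client is still there,
    otherwise discard it."""
    if client_alive:
        os.replace(tmp_path, output_path)  # atomic on the same file system
    else:
        os.remove(tmp_path)
```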

Implement framework for custom tools

Refactor the code so that it is easy to implement any custom tool that has the same environment (configs, etc.) as a client or daemon.

Do the cache entry compression asynchronously

Right now the compression is a part of the method FileCache::DoStore(), which requires holding the whole object in memory to perform the compression. It also consumes compilation time. If we delay the compression, we can efficiently copy the object file to the cache and compress it later using memory mappings.
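
The idea of deferring compression can be sketched as below. This is not FileCache::DoStore(): the store function is hypothetical, gzip stands in for whatever compression the cache uses, and chunked streaming is used here in place of memory mappings.

```python
import gzip
import os
import shutil
from concurrent.futures import Future, ThreadPoolExecutor

# A single background worker keeps compression off the compile path.
_compressor = ThreadPoolExecutor(max_workers=1)


def _compress(path: str) -> str:
    """Compress path into path + ".gz" in chunks, then drop the original."""
    compressed = path + ".gz"
    with open(path, "rb") as src, gzip.open(compressed, "wb") as dst:
        shutil.copyfileobj(src, dst)  # never holds the whole file in memory
    os.remove(path)
    return compressed


def store(object_file: str, cache_entry: str) -> Future:
    """Copy the object file into the cache right away and schedule the
    compression for later. The Future resolves to the compressed path."""
    shutil.copyfile(object_file, cache_entry)
    return _compressor.submit(_compress, cache_entry)
```

Until the Future completes, the entry exists uncompressed, so a lookup would have to accept both forms.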

Update expectations about the compiler path in messages from the client

Right now the client code won't send a message without a compiler path. The client also does path replacements inside flags according to the specific compiler path, so it's impossible to send a proper message from the client to the Emitter without a predefined compiler path. We should reflect this logic in the daemon's code and tests.

clangd does not correctly exit on SIGTERM and SIGINT in most situations

The cause is the following:

  1. Upon receiving a signal, the main thread joins all the worker threads it has created.
  2. File cache has a "Cache Resetter Worker" (https://github.com/abyss7/dist-clang/blob/master/src/cache/file_cache.cc#L108), which uses simple thread sleeps without any synchronisation (https://github.com/abyss7/dist-clang/blob/master/src/cache/file_cache.cc#L104).
  3. By default, if not overridden in the config, the sleep duration for this resetter thread is 600 seconds (https://github.com/abyss7/dist-clang/blob/master/src/daemon/configuration.proto#L31).
  4. As a result, clangd may exit only up to 10 minutes after receiving a termination signal. supervisorctl and other process management tools usually wait only a few seconds before sending SIGKILL, so clangd always gets killed by this signal and does not exit gracefully.
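
The usual remedy for the steps above is to replace the plain sleep with a wait that can be interrupted. The sketch below shows the pattern with threading.Event; the class is hypothetical and not the project's code. In C++ the analogous tool is std::condition_variable::wait_for.

```python
import threading
from typing import Callable


class CacheResetter:
    """Periodic worker whose sleep can be interrupted on shutdown."""

    def __init__(self, period_seconds: float, reset: Callable[[], None]) -> None:
        self._period = period_seconds
        self._reset = reset
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run)

    def _run(self) -> None:
        # Event.wait() returns True as soon as stop() sets the event and
        # False when the timeout expires, so the loop ends without delay.
        while not self._stop.wait(self._period):
            self._reset()

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        """Wake the worker and wait for it; call this from the signal path."""
        self._stop.set()
        self._thread.join()
```

With a period of 600 seconds, stop() returns almost immediately instead of blocking until the sleep ends.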

Implement permissive cache mode

It looks like the direct cache has problems. To debug them, it is convenient to compare the results of a direct cache hit and a simple cache hit.

Collect basic statistics and be able to print them

The list of stats is:

  • Amount of time wasted on the remote side
  • Maximum local compilation time, which will help estimate the new required timeout
  • Number of direct cache hits
  • Number of simple cache hits
  • Number of successful local tasks
  • Number of successful remote tasks
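
The counters listed above could be gathered in a small structure like the following sketch. All field and method names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Stats:
    wasted_remote_seconds: float = 0.0
    max_local_compile_seconds: float = 0.0
    direct_cache_hits: int = 0
    simple_cache_hits: int = 0
    successful_local_tasks: int = 0
    successful_remote_tasks: int = 0

    def record_local_compile(self, seconds: float) -> None:
        """Count a successful local task and track the slowest one."""
        self.max_local_compile_seconds = max(self.max_local_compile_seconds, seconds)
        self.successful_local_tasks += 1

    def report(self) -> str:
        """Render one "name: value" line per counter."""
        return "\n".join(f"{name}: {value}" for name, value in vars(self).items())
```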

Allow to parse Clang args externally

This feature should be turned on in config.

It's required in case the internal command-line parser becomes incompatible with previous Clang versions.
