.. _compiler-tools:

=======================
Anaconda compiler tools
=======================

Anaconda 5.0 switched from OS-provided compiler tools to our own toolsets. This
allows improved compiler capabilities, including better security and
performance. This page describes how to use these tools and enable these
benefits.

Compiler packages
=================

Before Anaconda 5.0, compilers were installed using system tools such as Xcode
or ``yum install gcc``. Now there are conda packages for Linux and macOS
compilers. Unlike the previous GCC 4.8.5 packages, which bundled GCC, g++, and
GFortran in a single package, these conda packages are split into separate
compilers:

macOS:

* ``clang_osx-64``
* ``clangxx_osx-64``
* ``gfortran_osx-64``

Linux:

* ``gcc_linux-64``
* ``gxx_linux-64``
* ``gfortran_linux-64``

A compiler's "build platform" is the platform where the compiler runs and
builds the code.

A compiler's "host platform" is the platform where the built code will finally
be hosted and run.

Notice that all of these package names end in a platform identifier which
specifies the host platform. All compiler packages are specific to both the
build platform and the host platform.
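
As a small illustration of the naming convention, the following shell sketch
splits a compiler package name into its base name and its host platform
identifier. The helper names ``package_base`` and ``package_host_platform`` are
hypothetical and exist only for this example; they are not part of conda or
conda-build.

```shell
#!/bin/bash
# Hypothetical helpers: split "<base>_<host platform>" at the last underscore.
package_base() {
  printf '%s\n' "${1%_*}"
}

package_host_platform() {
  printf '%s\n' "${1##*_}"
}

for pkg in clang_osx-64 gxx_linux-64 gfortran_linux-64; do
  echo "${pkg}: base=$(package_base "${pkg}") host=$(package_host_platform "${pkg}")"
done
```

Splitting at the last underscore keeps base names that themselves contain
underscores intact.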

Using the compiler packages
===========================

The compiler packages can be installed with conda. Because they are designed
with (pseudo) cross-compiling in mind, all of the executables in a compiler
package are "prefixed." Instead of ``gcc``, the executable name of the compiler
you use will be something like ``x86_64-conda_cos6-linux-gnu-gcc``. These full
compiler names are shown in the build logs, recording the host platform and
helping prevent the common mistake of using the wrong compiler.

Many build tools such as ``make`` and ``CMake`` search by default for a
compiler named simply ``gcc``, so we set environment variables to point these
tools to the correct compiler.

We set these variables in conda ``activate.d`` scripts, so any environment in
which you will use the compilers must first be activated so the scripts will
run. Conda-build does this activation for you using activation hooks installed
with the compiler packages in ``CONDA_PREFIX/etc/conda/activate.d``, so no
additional effort is necessary.

You can activate the base environment (formerly called the root environment)
with the command ``conda activate base``.
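
To check that activation took effect, a build script can print the compiler
variables before using them. This is a minimal sketch that assumes the
activation scripts export the conventional ``CC`` and ``CXX`` variables (the
text above does not name them). The fallbacks to ``gcc`` and ``g++`` are only
there so the snippet also runs in a shell without the compiler packages, and
the prefix check is a rough heuristic.

```shell
#!/bin/bash
# Assumption: activation exports CC and CXX. Fall back to the plain
# names when no compiler environment is active.
CC="${CC:-gcc}"
CXX="${CXX:-g++}"

echo "C compiler:   ${CC}"
echo "C++ compiler: ${CXX}"

# Rough heuristic: a prefixed name carries a dash-separated host triplet.
case "${CC}" in
  *-*-*) echo "CC looks like a prefixed (cross-capable) compiler" ;;
  *)     echo "CC is an unprefixed compiler name" ;;
esac
```

Calling ``"${CC}"`` in build scripts instead of a hardcoded ``gcc`` keeps the
script working with both the system compilers and the prefixed ones.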

.. _mac-SDK:

macOS SDK
=========

The macOS compilers require the macOS 10.9 SDK or later. The SDK license prevents
it from being bundled in the conda package. We know of two current sources for the
macOS SDKs:

- https://github.com/devernay/xcodelegacy
- https://github.com/phracker/MacOSX-SDKs

We usually install the 10.10 SDK at ``/opt/MacOSX10.10.sdk`` but you may install
it anywhere. Edit your ``conda_build_config.yaml`` file to point to it, like this::

    CONDA_BUILD_SYSROOT:
      - /opt/MacOSX10.10.sdk        # [osx]

At Anaconda, we have this configuration setting in a centralized
``conda_build_config.yaml`` at the root of our recipe repository. Since we run
build commands from that location, the file and the setting are used for all
recipes. The ``conda_build_config.yaml`` search order is described further at
:ref:`conda-build-variant-config-files`.

Build scripts for macOS should make use of the variables
``MACOSX_DEPLOYMENT_TARGET`` and ``CONDA_BUILD_SYSROOT``, which are set by
conda-build (see :ref:`env-vars`). These variables should be translated into
correct compiler arguments, e.g. for Clang this would be::

    clang .. -isysroot ${CONDA_BUILD_SYSROOT} -mmacosx-version-min=${MACOSX_DEPLOYMENT_TARGET} ..

Most build tools, e.g. CMake and distutils (setuptools), will automatically pick
up ``MACOSX_DEPLOYMENT_TARGET`` but you need to pass ``CONDA_BUILD_SYSROOT``
explicitly. For CMake, this can be done with the option
``-DCMAKE_OSX_SYSROOT=${CONDA_BUILD_SYSROOT}``. When building Python extensions
with distutils, one should always extend ``CFLAGS`` before calling
``setup.py``::

    export CFLAGS="${CFLAGS} -isysroot ${CONDA_BUILD_SYSROOT}"

When building C++ extensions with Cython, ``CXXFLAGS`` must be similarly modified.
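
The flag handling described above can be collected in one place in a
``build.sh``. The following is a sketch, not part of conda-build: the helper
name ``macos_sdk_flags`` is hypothetical, and it only emits a flag when the
corresponding variable is set, so the same script is harmless on other
platforms.

```shell
#!/bin/bash
# Hypothetical helper: build the macOS-specific compiler flags from the
# variables that conda-build sets. Unset variables contribute nothing.
macos_sdk_flags() {
  local flags=""
  if [ -n "${CONDA_BUILD_SYSROOT:-}" ]; then
    flags="-isysroot ${CONDA_BUILD_SYSROOT}"
  fi
  if [ -n "${MACOSX_DEPLOYMENT_TARGET:-}" ]; then
    flags="${flags:+${flags} }-mmacosx-version-min=${MACOSX_DEPLOYMENT_TARGET}"
  fi
  printf '%s\n' "${flags}"
}

# Extend both flag sets so C and C++ extensions see the same SDK.
export CFLAGS="${CFLAGS:-} $(macos_sdk_flags)"
export CXXFLAGS="${CXXFLAGS:-} $(macos_sdk_flags)"
```

Note that this simple version does not quote the SDK path, so it assumes a
sysroot location without spaces.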


Backward compatibility
======================

Some users want to use the latest Anaconda packages but do not yet want to use
the Anaconda compilers. To enable this, the latest Python package builds have
a default ``_sysconfigdata`` file. This file sets the compilers provided by the
system, such as ``gcc`` and ``g++``, as the default compilers, which allows
legacy recipes to keep working.

Python packages also include an alternative ``_sysconfigdata`` file that sets
the Anaconda compilers as the default compilers. The Anaconda Python executable
itself is made with these Anaconda compilers.

The compiler packages set the environment variable
``_PYTHON_SYSCONFIGDATA_NAME``, which tells Python which ``_sysconfigdata`` file
to use. This variable is set at activation time using the activation hooks
described above.

The new ``_sysconfigdata`` customization system is only present in recent
versions of the Python package. Conda-build automatically tries to use the
latest Python version available in the currently configured channels, which
normally gets the latest from the default channel. If you're using something
other than conda-build while working with the new compilers, conda does not
automatically update Python, so make sure you have the correct
``_sysconfigdata`` files by updating your Python package manually.
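
When working outside conda-build, it can help to confirm which
``_sysconfigdata`` file Python will pick up. This sketch only inspects the
``_PYTHON_SYSCONFIGDATA_NAME`` variable described above; the helper name
``sysconfigdata_status`` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: report whether the compiler packages have
# selected an alternative _sysconfigdata file for Python.
sysconfigdata_status() {
  if [ -n "${_PYTHON_SYSCONFIGDATA_NAME:-}" ]; then
    echo "override: ${_PYTHON_SYSCONFIGDATA_NAME}"
  else
    echo "default _sysconfigdata (system compilers)"
  fi
}

sysconfigdata_status
```

With Python available, ``python -c "import sysconfig; print(sysconfig.get_config_var('CC'))"``
shows the compiler recorded in the file that was actually loaded.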

Anaconda compilers and conda-build 3
====================================

The Anaconda 5.0 compilers and conda-build 3 are designed to work together.

Conda-build 3 defines a special Jinja2 function, ``compiler()``, to make it
easy to specify compiler packages dynamically on many platforms. The
``compiler`` function takes at least one argument, the language of the compiler
to use::

    requirements:
      build:
        - {{ compiler('c') }}

"Cross-capable" recipes can be used to make packages with a host platform
different from the build platform where conda-build runs. To write
cross-capable recipes, you may also need to use the ``host`` section in the
requirements section. In this example we list ``zlib`` under ``host`` to tell
conda-build to use the zlib in the conda environment and not the system
zlib. This makes sure conda-build uses the zlib for the host platform
and not the zlib for the build platform.

::

    requirements:
      build:
        - {{ compiler('c') }}
      host:
        - zlib

Generally, the build section should include compilers and other build tools and
the host section should include everything else, including shared libraries,
Python, and Python libraries.

An aside on CMake and sysroots
==============================

Anaconda's compilers for Linux are built with something called crosstool-ng.
They include not only GCC, but also a "sysroot" with glibc, as well as the rest
of the toolchain (binutils). Ordinarily, the sysroot is something that your
system provides, and it is what establishes the libc compatibility bound for
your compiled code. Any compilation that uses a sysroot other than the system
sysroot is said to be "cross-compiling." When the target OS and the build OS
are the same, it is called a "pseudo-cross-compiler." This is the case for
normal builds with Anaconda's compilers on Linux.

Unfortunately, some software tools do not handle sysroots in intuitive ways.
CMake is especially bad for this. Even though the compiler itself understands
its own sysroot, CMake insists on ignoring that. We've filed an issue at:

* https://gitlab.kitware.com/cmake/cmake/issues/17483

Additionally, this Stack Overflow question has some more information:
https://stackoverflow.com/questions/36195791/cmake-missing-sysroot-when-cross-compiling

In order to teach CMake about the sysroot, you must do additional work. As an
example, please see our recipe for libnetcdf at
https://github.com/AnacondaRecipes/libnetcdf-feedstock/tree/master/recipe

In particular, you'll need to copy the ``cross-linux.cmake`` file there, and
reference it in your ``build.sh`` file:

::

    CMAKE_PLATFORM_FLAGS+=(-DCMAKE_TOOLCHAIN_FILE="${RECIPE_DIR}/cross-linux.cmake")

    cmake -DCMAKE_INSTALL_PREFIX=${PREFIX} \
      ${CMAKE_PLATFORM_FLAGS[@]} \
      ${SRC_DIR}

Customizing the compilers
=========================

The compiler packages listed above are small packages that only include the
activation scripts and list most of the software they provide as runtime
dependencies.

This design is intended to make it easy for you to customize your own compiler
packages by copying these recipes and changing the flags. You can then edit the
``conda_build_config.yaml`` file to specify your own packages.

We have been careful to select good, general purpose, secure, and fast flags.
We have also used them for all packages in Anaconda Distribution 5.0.0, except
for some minor customizations in a few recipes. When changing these flags,
remember that choosing the wrong flags can reduce security, reduce performance,
and cause incompatibilities.

With that warning in mind, let's look at good ways to customize Clang.

1. Download or fork the code from https://github.com/anacondarecipes/aggregate.
   The Clang package recipe is in the ``clang`` folder. The main material is in the
   ``llvm-compilers-feedstock`` folder.

2. Edit ``clang/recipe/meta.yaml``::

       package:
         name: clang_{{ target_platform }}
         version: {{ version }}

   The name here does not matter but the output names below do. Conda-build
   expects any compiler to follow the BASENAME_PLATFORMNAME pattern, so it is
   important to keep the ``{{ target_platform }}`` part of the name.

   ``{{ version }}`` is left as an intentionally undefined jinja2 variable. It
   is set later in ``conda_build_config.yaml``.

3. Before any packaging is done, run the build.sh script:
   https://github.com/AnacondaRecipes/aggregate/blob/master/clang/build.sh

   This script defines the flag values for the recipe. Those values are
   substituted into the activate scripts that are installed later.

   ::

       #!/bin/bash

       CHOST=${macos_machine}

       FINAL_CPPFLAGS="-D_FORTIFY_SOURCE=2 -mmacosx-version-min=${macos_min_version}"
       FINAL_CFLAGS="-march=core2 -mtune=haswell -mssse3 -ftree-vectorize -fPIC -fPIE -fstack-protector-strong -O2 -pipe"
       FINAL_CXXFLAGS="-march=core2 -mtune=haswell -mssse3 -ftree-vectorize -fPIC -fPIE -fstack-protector-strong -O2 -pipe -stdlib=libc++ -fvisibility-inlines-hidden -std=c++14 -fmessage-length=0"
       # These are the LDFLAGS for when the linker is being called directly, without "-Wl,"
       FINAL_LDFLAGS="-pie -headerpad_max_install_names"
       # These are the LDFLAGS for when the linker is being driven by a compiler, with "-Wl,"
       FINAL_LDFLAGS_CC="-Wl,-pie -Wl,-headerpad_max_install_names"
       FINAL_DEBUG_CFLAGS="-Og -g -Wall -Wextra -fcheck=all -fbacktrace -fimplicit-none -fvar-tracking-assignments"
       FINAL_DEBUG_CXXFLAGS="-Og -g -Wall -Wextra -fcheck=all -fbacktrace -fimplicit-none -fvar-tracking-assignments"
       FINAL_DEBUG_FFLAGS="-Og -g -Wall -Wextra -fcheck=all -fbacktrace -fimplicit-none -fvar-tracking-assignments"

       find "${RECIPE_DIR}" -name "*activate*.sh" -exec cp {} . \;

       find . -name "*activate*.sh" -exec sed -i.bak "s|@CHOST@|${CHOST}|g" "{}" \;
       find . -name "*activate*.sh" -exec sed -i.bak "s|@CPPFLAGS@|${FINAL_CPPFLAGS}|g"             "{}" \;
       find . -name "*activate*.sh" -exec sed -i.bak "s|@CFLAGS@|${FINAL_CFLAGS}|g"                 "{}" \;
       find . -name "*activate*.sh" -exec sed -i.bak "s|@DEBUG_CFLAGS@|${FINAL_DEBUG_CFLAGS}|g"     "{}" \;
       find . -name "*activate*.sh" -exec sed -i.bak "s|@CXXFLAGS@|${FINAL_CXXFLAGS}|g"             "{}" \;
       find . -name "*activate*.sh" -exec sed -i.bak "s|@DEBUG_CXXFLAGS@|${FINAL_DEBUG_CXXFLAGS}|g" "{}" \;
       # find . -name "*activate*.sh" -exec sed -i.bak "s|@FFLAGS@|${FINAL_FFLAGS}|g"                 "{}" \;
       # find . -name "*activate*.sh" -exec sed -i.bak "s|@DEBUG_FFLAGS@|${FINAL_DEBUG_FFLAGS}|g"     "{}" \;
       find . -name "*activate*.sh" -exec sed -i.bak "s|@LDFLAGS@|${FINAL_LDFLAGS}|g"               "{}" \;
       find . -name "*activate*.sh" -exec sed -i.bak "s|@LDFLAGS_CC@|${FINAL_LDFLAGS_CC}|g"         "{}" \;
       find . -name "*activate*.sh.bak" -exec rm "{}" \;

4. With those changes to the activate scripts in place, it's time to move on to
   installing things. Look back at the ``clang`` folder's ``meta.yaml``. Here's
   where we change the package name. Notice what comes before the
   ``{{ target_platform }}``.

   ::

       outputs:
         - name: super_duper_clang_{{ target_platform }}
           script: install-clang.sh
           requirements:
             - clang {{ version }}

   The script reference here is another place you might add customization.
   You'll either change the contents of those install scripts or change the
   scripts that those install scripts are installing.

   Note that we make the package ``clang`` in the main material agree in version
   with our output version. This is implicitly the same as the top-level
   recipe. The ``clang`` package sets no environment variables at all, so it
   may be difficult to use directly.

5. Let's examine the script ``install-clang.sh``::

       #!/bin/bash

       set -e -x

       CHOST=${macos_machine}

       mkdir -p "${PREFIX}"/etc/conda/{de,}activate.d/
       cp "${SRC_DIR}"/activate-clang.sh "${PREFIX}"/etc/conda/activate.d/activate_"${PKG_NAME}".sh
       cp "${SRC_DIR}"/deactivate-clang.sh "${PREFIX}"/etc/conda/deactivate.d/deactivate_"${PKG_NAME}".sh

       pushd "${PREFIX}"/bin
         ln -s clang ${CHOST}-clang
       popd

   Nothing here is too unusual.

   Activate scripts are named according to our package name so they won't
   conflict with other activate scripts.

   The symlink for Clang is a Clang implementation detail that sets the host
   platform.

   We define ``macos_machine`` in aggregate's ``conda_build_config.yaml``:
   https://github.com/AnacondaRecipes/aggregate/blob/master/conda_build_config.yaml#L79

   The activate scripts that are being installed are where we actually set the
   environment variables. Remember that these have been modified by build.sh.

6. With any of your desired changes in place, go ahead and build the recipe.

   You should end up with a super_duper_clang_osx-64 package. Or, if you're not
   on macOS and are modifying a different recipe, you should end up with an
   equivalent package for your platform.
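
The placeholder substitution that ``build.sh`` performs in step 3 can be tried
in isolation. The following sketch applies the same ``sed -i.bak`` pattern to a
throwaway template in a temporary directory. The template contents and the
values assigned to ``CHOST`` and ``FINAL_CFLAGS`` are made up for illustration
and are not the values used by the real recipe.

```shell
#!/bin/bash
# Work on a throwaway template so no real activate script is touched.
workdir="$(mktemp -d)"
template="${workdir}/activate-demo.sh"

cat > "${template}" <<'EOF'
export CHOST="@CHOST@"
export CFLAGS="@CFLAGS@"
EOF

# Hypothetical values, standing in for the ones defined in build.sh.
CHOST="example-host-triplet"
FINAL_CFLAGS="-O2 -pipe"

# Same pattern as build.sh: replace each placeholder in place,
# then remove the backup file that sed leaves behind.
sed -i.bak "s|@CHOST@|${CHOST}|g" "${template}"
sed -i.bak "s|@CFLAGS@|${FINAL_CFLAGS}|g" "${template}"
rm -f "${template}.bak"

cat "${template}"
```

Using ``|`` as the ``sed`` delimiter avoids clashes with the ``/`` characters
that can appear in flag values, and ``-i.bak`` works with both GNU and BSD
``sed``.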

.. _using-your-customized-compiler-package-with-conda-build-3:

Using your customized compiler package with conda-build 3
=========================================================

Remember the Jinja2 function, ``{{ compiler('c') }}``? Here's where that comes
in. Specific keys in ``conda_build_config.yaml`` are named for the language
argument to that jinja2 function. In your ``conda_build_config.yaml``, add
this::

    c_compiler:
      - super_duper_clang

Note that we're not adding the ``target_platform`` part, which is separate. You
can define that key, too::

    c_compiler:
      - super_duper_clang
    target_platform:
      - win-64

With those two keys defined, conda-build will try to use a compiler package
named ``super_duper_clang_win-64``. That package needs to exist for your native
platform. For example, if you're on macOS, your native platform is ``osx-64``.

The package subdirectory for your native platform is the build platform. The
build platform and the ``target_platform`` can be the same, and they are the
same by default, but they can also be different. When they are different,
you're cross-compiling.
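
The name lookup described above can be sketched in shell. This only
illustrates the ``<compiler>_<target_platform>`` pattern; the helper
``resolve_compiler_package`` is hypothetical and is not how conda-build
implements the lookup.

```shell
#!/bin/bash
# Hypothetical helper: join the compiler key value and the target
# platform the way the resulting package name is described above.
resolve_compiler_package() {
  local compiler_name="$1"
  local target_platform="$2"
  printf '%s_%s\n' "${compiler_name}" "${target_platform}"
}

resolve_compiler_package super_duper_clang win-64
resolve_compiler_package super_duper_clang osx-64
```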

If you ever need a different compiler key for the same language, remember
that the language key is arbitrary. For example, we might want different
compilers for Python and for R within one ecosystem. On Windows, the Python
ecosystem uses the Microsoft Visual C compilers, while the R ecosystem uses the
MinGW compilers.

Let's start in ``conda_build_config.yaml``::

    python_c_compiler:
      - vs2015
    r_c_compiler:
      - m2w64-gcc
    target_platform:
      - win-64

In Python recipes, you'd have::

    requirements:
      build:
        - {{ compiler('python_c') }}

In R recipes, you'd have::

    requirements:
      build:
        - {{ compiler('r_c') }}

This example is a little contrived, because the ``m2w64-gcc_win-64`` package is
not available. You'd need to create a metapackage ``m2w64-gcc_win-64`` to
point at the ``m2w64-gcc`` package, which does exist on the msys2 channel on
`repo.anaconda.com <https://repo.anaconda.com/>`_.

Expressing the relation between compiler and its standard library
=================================================================

For most languages, certainly for "c" and for "cxx", compiling any given
program *may* create a run-time dependence on symbols from the respective
standard library. For example, the standard library for C on Linux is generally
``glibc``, which is a core component of your operating system. Conda is not able
to change or supersede this library (it would be too risky to try). A similar
situation exists on macOS and on Windows.

Compiler packages usually have two ways to deal with this dependence:

* assume the package must be there (like ``glibc`` on Linux).
* always add a run-time requirement on the respective stdlib (e.g. ``libcxx``
  on macOS).

However, even if we assume the package must be there, the ``glibc`` version is
still highly relevant information, which is why it is reflected in the
``__glibc``
`virtual package <https://docs.conda.io/projects/conda/en/stable/user-guide/tasks/manage-virtual.html>`_.

For example, newer packages may decide over time to increase the lowest version
of ``glibc`` that they support. We therefore need a way to express this
dependence in a way that conda will be able to understand, so that (in
conjunction with the ``__glibc`` virtual package) the environment resolver will
not consider those packages on machines whose ``glibc`` version is too old.

The way to do this is to use the Jinja2 function ``{{ stdlib('c') }}``, which
matches ``{{ compiler('c') }}`` in as many ways as possible. Let's start again
with the ``conda_build_config.yaml``::

    c_stdlib:
      - sysroot                     # [linux]
      - macosx_deployment_target    # [osx]
    c_stdlib_version:
      - 2.17                        # [linux]
      - 10.13                       # [osx]

In the recipe we would then use::

    requirements:
      build:
        - {{ compiler('c') }}
        - {{ stdlib('c') }}

This would then express that the resulting package requires ``sysroot ==2.17``
(corresponds to ``glibc``) on Linux and ``macosx_deployment_target ==10.13`` on
macOS in the build environment, respectively. How this translates into a
run-time dependence can be defined in the metadata of the respective conda
(meta-)package which represents the standard library (i.e. those defined under
``c_stdlib`` above).

In this example, ``sysroot 2.17`` would generate a run-export on
``__glibc >=2.17`` and ``macosx_deployment_target 10.13`` would similarly
generate ``__osx >=10.13``. This way, we enable packages to define their own
expectations about the standard library in a unified way, without
implicitly depending on some global assumption about what the lowest supported
version on a given platform must be.
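
The mapping from a stdlib package to its run-export can be written out as a
small lookup. This sketch covers only the two packages named in the example
above; the helper ``stdlib_run_export`` is hypothetical, and the real
run-exports are defined in the metadata of the respective packages.

```shell
#!/bin/bash
# Hypothetical helper: map a stdlib package and version to the virtual
# package constraint it is described as exporting.
stdlib_run_export() {
  local name="$1"
  local version="$2"
  case "${name}" in
    sysroot)                  printf '__glibc >=%s\n' "${version}" ;;
    macosx_deployment_target) printf '__osx >=%s\n' "${version}" ;;
    *) echo "unknown stdlib package: ${name}" >&2; return 1 ;;
  esac
}

stdlib_run_export sysroot 2.17
stdlib_run_export macosx_deployment_target 10.13
```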

In principle, this facility would make it possible to also express the
dependence on separate stdlib implementations (like ``musl`` instead of
``glibc``), or to remove the need to assume that a C++ compiler always needs to
add a run-export on the C++ stdlib -- it could then be left up to packages
themselves whether they need ``{{ stdlib('cxx') }}`` or not.

Anaconda compilers implicitly add RPATH pointing to the conda environment
=========================================================================

You might want to use the Anaconda compilers outside of ``conda-build``
so that you use the same versions, flags, and configuration for maximum
compatibility with Anaconda packages, for example when you want to distribute
simple tarballs. In this case, there is a gotcha.

Even if Anaconda compilers are used from outside of ``conda-build``, the GCC
specs are customized so that, when linking an executable or a shared library,
an RPATH pointing to ``lib/`` inside the current environment prefix directory
(``$CONDA_PREFIX/lib``) is added. This is done by changing the
``link_libgcc:`` section inside the GCC ``specs`` file, so that
``LD_LIBRARY_PATH`` isn't required for basic libraries.

``conda-build`` knows how to make this automatically relocatable, so that
this ``RPATH`` will be changed to point to the environment where the package
is being installed (at installation time, by ``conda``). But if you only pack
this binary in a tarball, it will still contain this hardcoded ``RPATH``
pointing to an environment on your machine. In this case, it is recommended to
manually remove the ``RPATH``.
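
One possible way to inspect and remove the entry on Linux is sketched below.
It assumes that ``readelf`` (from binutils) and ``patchelf`` are installed,
which this page does not require; the helper names and the ``./my_binary``
path are hypothetical.

```shell
#!/bin/bash
# Hypothetical helpers for ELF binaries on Linux.

# Print any RPATH/RUNPATH entries recorded in the binary.
show_rpath() {
  readelf -d "$1" | grep -E 'RPATH|RUNPATH' || echo "no RPATH/RUNPATH entry"
}

# Remove the RPATH/RUNPATH entry in place (requires patchelf).
strip_rpath() {
  patchelf --remove-rpath "$1"
}

# Usage, with a placeholder path:
#   show_rpath ./my_binary
#   strip_rpath ./my_binary
```

Running ``show_rpath`` again after ``strip_rpath`` is a simple way to confirm
that the hardcoded path is gone before the tarball is created.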