******************************** Channels and generating an index ******************************** Channel layout -------------- .. code-block:: bash . ├── channeldata.json ├── linux-32 | ├── repodata.json │   └── package-0.0.0.tar.bz2 ├── linux-64 | ├── repodata.json │   └── package-0.0.0.tar.bz2 ├── win-64 | ├── repodata.json │   └── package-0.0.0.tar.bz2 ├── win-32 | ├── repodata.json │   └── package-0.0.0.tar.bz2 ├── osx-64 | ├── repodata.json │   └── package-0.0.0.tar.bz2 ... Parts of a channel ------------------ * ``channeldata.json`` contains metadata about the channel, including: - What subdirs the channel contains. - What packages exist in the channel and what subdirs they are in. * Subdirs are associated with platforms. For example, the linux-64 subdir contains packages for linux-64 systems. * ``repodata.json`` contains an index of the packages in a subdir. Each subdir will have its own repodata. * Channels have packages as tarballs under corresponding subdirs. ``channeldata.json`` -------------------- .. code-block:: bash { "channeldata_version": 1, "packages": { "super-fun-package": { "activate.d": false, "binary_prefix": false, "deactivate.d": false, "home": "https://github.com/Home/super-fun-package", "license": "BSD", "post_link": false, "pre_link": false, "pre_unlink": false, "reference_package": "win-64/super-fun-package-0.1.0-py310_0.tar.bz2", "run_exports": {}, "subdirs": [ "win-64" ], "summary": "A fun package! Open me up for rainbows", "text_prefix": false, "version": "0.1.0" }, "subdirs": [ "win-64", ... ] } ``repodata.json`` ----------------- .. code-block:: bash { "packages": { "super-fun-package-0.1.0-py310_0.tar.bz2": { "build": "py37_0", "build_number": 0, "depends": [ "some-depends" ], "license": "BSD", "md5": "a75683f8d9f5b58c19a8ec5d0b7f796e", "name": "super-fun-package", "sha256": "1fe3c3f4250e51886838e8e0287e39029d601b9f493ea05c37a2630a9fe5810f", "size": 3832, "subdir": "win-64", "timestamp": 1530731681870, "version": "0.1.0" }, ... } How an index is generated ------------------------- For each subdir: * Look at all the packages that exist in the subdir. * Generate a list of packages to add/update/remove. * Remove all packages that need to be removed. * For all packages that need to be added/updated: * Extract the package to access metadata, including full package name, file modification time (``mtime``), size, and ``index.json``. * Aggregate package metadata to repodata collection. * Apply repodata hotfixes (patches). * Compute and save the reduced ``current_index.json`` index. Example: Building a channel --------------------------- To build a local channel and put a package in it, follow the directions below: #. Make the channel directory. .. code-block:: bash $ mkdir local-channel $ cd local-channel #. Now, download your favorite package. We'll use SciPy in our example. The next steps depend on your platform: #. Windows .. code-block:: bash $ mkdir win-64 $ curl -L https://anaconda.org/anaconda/scipy/1.9.1/download/win-64/scipy-1.9.1-py310h86744a3_0.tar.bz2 -o win-64\scipy-1.9.1-py310h86744a3_0.tar.bz2 #. Linux #. Most Linux systems come with ``curl`` pre-installed. Let's install it if you don't already have it. #. Check if you have ``curl``: .. code-block:: bash $ which curl #. If ``curl`` is not found, then install it: .. code-block:: bash $ conda install curl #. Create a local copy of the package you want to include in your channel: .. code-block:: bash $ mkdir linux-64 $ curl -L https://anaconda.org/anaconda/scipy/1.9.1/download/linux-64/scipy-1.9.1-py310hd5efca6_0.tar.bz2 -o linux-64\scipy-1.9.1-py310hd5efca6_0.tar.bz2 #. macOS, Intel chip .. code-block:: bash $ mkdir osx-64 $ curl -L https://anaconda.org/anaconda/scipy/1.9.1/download/osx-64/scipy-1.9.1-py310h09290a1_0.tar.bz2 -o osx-64/scipy-1.9.1-py310h09290a1_0.tar.bz2 #. macOS, Apple chip .. code-block:: bash $ mkdir osx-arm64 $ curl -L https://anaconda.org/anaconda/scipy/1.9.1/download/osx-arm64/scipy-1.9.1-py310h20cbe94_0.tar.bz2 -o osx-arm64/scipy-1.9.1-py310h20cbe94_0.tar.bz2 #. Other To find the latest SciPy on a platform not included in the list above, go to the `Anaconda Packages file list for SciPy`_. #. Run a conda index. This will generate both ``channeldata.json`` for the channel and ``repodata.json`` for the linux-64 and osx-64 subdirs, along with some other files: .. code-block:: bash $ conda index . #. Check your work by searching the channel: .. code-block:: bash $ conda search -c file://local-channel scipy SciPy should be available in several channels, including ``local-channel``. More details behind the scenes ------------------------------ Caching package metadata ~~~~~~~~~~~~~~~~~~~~~~~~ Caching utilizes the existing ``repodata.json`` file if it exists. Indexing checks which files to update based on which files are new, removed, or changed since the last ``repodata.json`` was created. When a package is new or changed, its metadata is extracted and cached in the subdir to which the package belongs. The subfolder is the ``.cache`` folder. This folder has one file of interest: ``stat.json``, which contains results from the ``stat`` command for each file. This is used for understanding when a file has changed and needs to be updated. In each of the other subfolders, the extracted metadata file for each package is saved as the original package name, plus a ``.json`` extension. Having these already extracted can save a lot of time in fully re-creating the index, should that be necessary. An aside: one design goal of the ``.conda`` package format was to make indexing as fast as possible. To achieve this, the .conda format separates metadata from the actual package contents. Where the old ``.tar.bz2`` container required extracting the entire package to obtain the metadata, the new package format allows extraction of metadata without touching the package contents. This allows indexing speed to be independent of the package size. Large ``.tar.bz2`` packages can take a very long time to extract and index. It is generally never necessary to manually alter the cache. To force an update/rescan of all cached packages, you can delete the .cache folder, or you can delete just the ``.cache/stat.json`` file. Ideally, you could remove only one package of interest from the cache, but that functionality does not currently exist. Repodata patching ~~~~~~~~~~~~~~~~~ Package repodata is bootstrapped from the index.json file within packages. Unfortunately, that metadata is not always correct. Sometimes a version bound needs to be added retroactively. The process of altering repodata from the values derived from package index.json files is called "hotfixing." Hotfixing is tricky, as it has the potential to break environments that have worked, but it is also sometimes necessary to fix environments that are known not to work. Repodata patches generated from a python script =============================================== On your own server, you're probably fine to run arbitrary python code that you have written to apply your patches. The advantage here is that the patches are generated on the fly every time the index is generated. That means that any new packages that have been added since the patch python file was last committed will be picked up and will have hotfixes applied to them where appropriate. Anaconda applies hotfixes by providing a python file to ``conda index`` that has logic on how to alter metadata. Anaconda's repository of hotfixes is at https://github.com/AnacondaRecipes/repodata-hotfixes Repodata patches applied from a JSON file ========================================= Unfortunately, you can't always run your python code directly - other people who host your patches may not allow you to run code. What you can do instead is package the patches as .json files. These will clobber the entries in the ``repodata.json`` when they are applied. This is the approach that conda-forge has to take, for example. Their patch creation code is here: https://github.com/conda-forge/conda-forge-repodata-patches-feedstock/tree/main/recipe What that code does is to download the current ``repodata.json``, then runs their python logic to generate the patch JSON file. Those patches are placed into a location where Anaconda's mirroring tools will find them and apply them to conda-forge's ``repodata.json`` at mirroring time. The downside here is that this JSON file is only as new as the last time that the repodata-patches feedstock last generated a package. Any new packages that have been added to the index in the meantime will not have any hotfixes applied to them, because the hotfix JSON file does not know about those files. Trimming to "current" repodata ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The number of packages available is always growing. That means conda is always having to do more and more work. To slow down this growth, in conda 4.7, we added the ability to have alternate ``repodata.json`` files that may represent a subset of the normal ``repodata.json``. One in particular is ``current_repodata.json``, which represents: 1. the latest version of each package 2. any earlier versions of dependencies needed to make the latest versions satisfiable ``current_repodata.json`` also keeps only one file type: ``.conda`` where it is available, and ``.tar.bz2`` where only ``.tar.bz2`` is available. For Anaconda's defaults "main" channel, the current_repodata.json file is approximately 1/7 the size of ``repodata.json``. This makes downloading the repodata faster, and it also makes loading the repodata into its python representation faster. For those interested in how this is achieved, please refer to the code at https://github.com/conda/conda-build/blob/90a6de55d8b9e36fc4a8c471b566d356e07436c7/conda_build/index.py#L695-L737 .. _`Anaconda Packages file list for SciPy`: https://anaconda.org/anaconda/scipy/files