Configuration

3LC has several configuration options, which control data discovery, logging, and network configuration.

All the options are documented in the tlcconfig.options module. The documentation for each option details the configuration file key, environment variable and command-line argument to the 3LC Object Service that can be used to set the option.

3LC aims to provide sensible defaults, but sometimes you may want to persist a machine-wide configuration. This can be done with the default configuration file, which is located at $HOME/.config/3LC/config.yaml on Linux and at %LOCALAPPDATA%\3LC\3LC\config.yaml on Windows. The location can be overridden with the TLC_CONFIG_FILE environment variable.
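The lookup described above can be sketched in a few lines of Python. This is only an illustration of the documented rule, not the code 3LC itself uses. The function name `default_config_path` and its `env`/`platform` parameters are hypothetical and were chosen so the logic can be exercised without touching the real environment.

```python
import os
import sys
from pathlib import PurePosixPath, PureWindowsPath
from typing import Mapping, Optional


def default_config_path(
    env: Optional[Mapping[str, str]] = None,
    platform: Optional[str] = None,
) -> str:
    """Return the configuration file path implied by the rules in the text.

    Hypothetical helper: `env` and `platform` default to the real process
    environment and `sys.platform`.
    """
    env = os.environ if env is None else env
    platform = sys.platform if platform is None else platform

    # TLC_CONFIG_FILE, when set, overrides the per-platform default.
    override = env.get("TLC_CONFIG_FILE")
    if override:
        return override

    if platform.startswith("win"):
        # %LOCALAPPDATA%\3LC\3LC\config.yaml
        return str(PureWindowsPath(env["LOCALAPPDATA"]) / "3LC" / "3LC" / "config.yaml")
    if platform.startswith("linux"):
        # $HOME/.config/3LC/config.yaml
        return str(PurePosixPath(env["HOME"]) / ".config" / "3LC" / "config.yaml")

    # The text only documents Linux and Windows, so nothing is guessed here.
    raise NotImplementedError(f"no documented default for platform {platform!r}")
```

Calling `default_config_path()` with no arguments uses the current environment, which is a quick way to check which file a given shell session would point at under these rules.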

To create an initial configuration file, you can run the following command:

$ 3lc config --write  # add --quiet to produce less verbose output

To inspect the current configuration at the command line, run:

$ 3lc config --to-yaml

The config command reads the current configuration, including any environment variables and command-line options given to the command. When you write the configuration file with config --write, it therefore writes the configuration as loaded at that moment. This also makes it possible to update an existing configuration file: pass the new options or environment variables to the config command and add --write --force to overwrite the file.

# Setting the root index location in the configuration file
$ 3lc --project-root-url "/some/location" config --write --force
# Setting alias with name CIFAR10_LOCATION in the configuration file
$ TLC_ALIAS_CIFAR10_LOCATION="/some/location" 3lc config --write --force

An example configuration file can be seen below.

Example config.yaml
## Configuration file for `tlc`.
##
## This YAML file contains setting for the `tlc` Python Package.
## Created at 2024-11-08 17:47:14
## Documentation starts with ##,
## commented out default values start with #

##
## Service Settings
service:
  ## Port for the server.
  # port: 5015

  ## Host for the server.
  # host: 127.0.0.1

  ## Specify license or license file
  ##
  ##     The option can either be the license key or point to a local file containing the license key.
  # license: 

  ## Specify the amount of memory to use for caching, in bytes.
  ##
  ##     Setting the value to 0 will disable in-memory caching.
  ##
  ##     Default: 1073741824 (1 GB)
  # cache_in_memory_size: 1073741824

  ## Specify the cache item time out, in seconds.
  ##
  ##     Setting the value to 0 will disable cache eviction based on time.
  ##
  ##     Default: 3600 (1 hour)
  # cache_time_out: 3600

##
## Indexing Settings
indexing:
  ## Location for reading and writing 3LC data.
  ##
  ##     This option is mandatory and must point to a location (e.g. directory on disk or object store bucket) with write
  ##     access. The location will be created if it does not exist.
  ##
  ##     If the option value contains an environment variable, it will be expanded.
  # project-root-url: /home/build/.local/share/3LC/projects

  ## Locations to scan for 3LC objects (runs and tables) following a standard 3LC project layout.
  ##
  ##     The option value should be a list of URLs to scan for 3LC objects. Items in the list can be represented either by a
  ##     string value (e.g., "s3://bucket") or as a dictionary with the required field 'url' and optional extra fields
  ##     (e.g., {"url": "s3://bucket", "static": True}).
  ##
  ##     Example values:
  ##      - "C:\Users\user\Documents\3LC\projects"
  ##      - "s3://my-bucket/3LC/projects"
  ##      - {"url": "s3://my-bucket/3LC/read-only-projects", "static": True}
  ##
  ##     The only public optional field is 'static', which, when set to 'True', prevents the indexer from re-scanning
  ##     the location.
  ##
  ##     Each URL in this list is assumed to contain sub-projects. Default (sub-)directories will be created if they do
  ##     not exist.
  # project-scan-urls:

##
## Logging Settings
logging:
  ## Log file for the 3LC logger.
  ##
  ##     The directory will be created if it does not exist.
  ##
  ##     If the option value contains an environment variable, it will be expanded.
  # logfile: /home/build/.local/state/3LC/log/3LC.log

  ## Log level for the 3LC logger.
  ##
  ##     The `tlc` Python package adheres to the standard Python logging levels:
  ##
  ##       - DEBUG:  Detailed information, typically of interest only when diagnosing problems.
  ##       - INFO: Confirmation that things are working as expected.
  ##       - WARNING: An indication that something unexpected happened, or indicative of some problem in the near future
  ##         (e.g. "disk space low"). The software is still working as expected.
  ##       - ERROR: Due to a more serious problem, the software has not been able to perform some function.
  ##       - CRITICAL: A serious error, indicating that the program itself may be unable to continue running.
  # loglevel: WARNING

##
## Tlc Settings
tlc:
  ## Whether to display progress bars or not.
  ##
  ##     The option can be either 0 or 1, where 0 means no progress bars and 1 means progress bars.
  # display-progress: 1

## List of registered aliases
##
##     The option value should be a dictionary with the alias name as the key and the alias value as the value. In addition
##     to the aliases in this list, the system will also consider 'data aliases' stored in special data configuration files
##     and merge them with the aliases defined here.
##     A data configuration file is a partial 3LC configuration that contains only one key: `aliases`. The system will look
##     for such files, upon startup only, in the following locations:
##
##        - `<PROJECT_SCAN_URL>/config.3lc.yaml` # top level
##        - `<PROJECT_SCAN_URL>/<project_xyz>/config.3lc.yaml` # per project
##        - `<EXTRA_TABLE_SCAN_URL>/config.3lc.yaml`
##        - `<EXTRA_RUN_SCAN_URL>/config.3lc.yaml`
##
##     ## Special alias syntax
##     For aliases defined inside *data configuration files* it is possible to define an alias-value that is relative to
##     the configuration file in which the alias is defined. This is done by prefixing the alias-value with `$.`, `$..` and
##     so forth. The system will then expand the alias-value relative to the configuration file in which the alias is
##     defined.
##
##     ## Examples
##
##     Example values (for all configuration files):
##     ```
##       PROJECT_XYZ_REMOTE_ALIAS: s3://my-bucket/3LC/projects/project-xyz
##       PROJECT_XYZ_LOCAL_ALIAS: /home/user/projects/project-xyz
##     ```
##
##     Special syntax examples (only for data configuration files):
##     ```
##       PROJECT_XYZ_DATA_ALIAS: $./data # expands to the data directory relative to the configuration file itself
##       PROJECT_XYZ_PARENT_DATA_ALIAS: $../data # relative to parent dir
##       PROJECT_XYZ_PARENT_PARENT_DATA_ALIAS: $.../data # relative to parent's parent dir
##     ```
# aliases:

Data configuration files

For convenience, there is a special kind of partial configuration that can be stored with the data, for example inside an example project. The file must be named config.3lc.yaml and may contain only one key, aliases. For details, see the module documentation.

The system will detect data configuration files in the following locations:

    <PROJECT_SCAN_URL>/config.3lc.yaml # top level
    <PROJECT_SCAN_URL>/<project_xyz>/config.3lc.yaml # per project
    <EXTRA_TABLE_SCAN_URL>/config.3lc.yaml
    <EXTRA_RUN_SCAN_URL>/config.3lc.yaml

Data configuration files are read once, on system startup, and are not monitored for changes. This means that data configuration files in scan URLs added programmatically after startup will not be discovered.
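The example configuration above describes a relative alias syntax for data configuration files: a value prefixed with `$.` is resolved relative to the directory holding the file, `$..` relative to its parent, and so on. The sketch below is one possible reading of that rule, not 3LC's actual implementation. The function name `expand_relative_alias` is hypothetical, and the sketch assumes the dots are followed by `/` or by the end of the value.

```python
import posixpath


def expand_relative_alias(value: str, config_file_url: str) -> str:
    """Expand a `$.`-style alias value relative to its data configuration file.

    Hypothetical helper: one dot means the directory containing the file,
    and each additional dot moves one directory further up.
    """
    if not value.startswith("$."):
        # Ordinary alias values are used unchanged.
        return value

    rest = value[1:]  # drop the leading "$"
    n_dots = len(rest) - len(rest.lstrip("."))
    remainder = rest[n_dots:].lstrip("/")

    # Start at the directory of the configuration file, then walk upwards.
    base = posixpath.dirname(config_file_url)
    for _ in range(n_dots - 1):
        base = posixpath.dirname(base)

    return posixpath.join(base, remainder) if remainder else base


config_file = "/home/user/projects/project-xyz/config.3lc.yaml"
print(expand_relative_alias("$./data", config_file))
print(expand_relative_alias("$../data", config_file))
print(expand_relative_alias("$.../data", config_file))
```

The sketch uses `posixpath` so that forward-slash URLs such as `s3://...` are handled the same way on every platform. Windows-style paths would need separate handling.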

Overriding Configuration Options

3LC applies the following precedence rules for configuration, from highest to lowest:

api-call > command-line argument > environment variable > configuration file
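As a rough illustration of this ordering, the following sketch resolves a single option from four dictionaries, one per source. It is not how 3LC is implemented internally. The function `resolve_option` and its parameters are hypothetical names used only to make the precedence chain concrete.

```python
from typing import Any, Mapping, Optional


def resolve_option(
    name: str,
    *,
    api: Optional[Mapping[str, Any]] = None,
    cli: Optional[Mapping[str, Any]] = None,
    env: Optional[Mapping[str, Any]] = None,
    file: Optional[Mapping[str, Any]] = None,
    default: Any = None,
) -> Any:
    """Return the value of `name` from the highest-precedence source that sets it."""
    # Sources ordered from highest to lowest precedence, as listed above.
    for source in (api, cli, env, file):
        if source is not None and name in source:
            return source[name]
    # No source sets the option, so fall back to the built-in default.
    return default


# The environment variable wins over the configuration file.
print(resolve_option("port", env={"port": 6000}, file={"port": 5015}))
```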

The rules above are straightforward when launching the Object Service from the command line, but things are more complicated when importing the tlc package (e.g. import tlc) in a Jupyter notebook. Importing tlc causes the default configuration file to be read, and it is too late to modify these options once the module has been imported.

For this reason, there is a tlcconfig module that allows you to create and modify an OptionLoader instance before importing the main tlc module. See the module documentation for an example of how to use this module.
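A different approach, which does not use tlcconfig, is to set environment variables before the first import of tlc. The sketch below only prepares the environment. It assumes that tlc reads TLC_CONFIG_FILE and TLC_ALIAS_<NAME> variables when it is imported, which follows from the precedence rules above but is not confirmed here. The helper name `set_tlc_environment` is hypothetical.

```python
import os
import sys
from typing import Mapping, MutableMapping, Optional


def set_tlc_environment(
    config_file: Optional[str] = None,
    aliases: Optional[Mapping[str, str]] = None,
    environ: Optional[MutableMapping[str, str]] = None,
) -> MutableMapping[str, str]:
    """Set 3LC-related environment variables before `tlc` is imported.

    Hypothetical helper: raises if `tlc` is already loaded, since the text
    states that options can no longer be modified at that point.
    """
    if "tlc" in sys.modules:
        raise RuntimeError("tlc is already imported; set the environment first")

    environ = os.environ if environ is None else environ
    if config_file is not None:
        environ["TLC_CONFIG_FILE"] = config_file
    for name, value in (aliases or {}).items():
        # Same naming pattern as the TLC_ALIAS_CIFAR10_LOCATION example.
        environ[f"TLC_ALIAS_{name}"] = value
    return environ
```

In a notebook, this would be called in the first cell, before any cell that runs `import tlc`.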