Deployment Example: Remote Training with Shared S3 Data

This example shows a common deployment scenario where:

  • An object detection model is trained on a remote VM with a GPU

  • Each user runs the Object Service locally in a default deployment

  • Tables and Runs (data and metrics) are stored on S3

  • Bulk data (images) is also stored on S3

The following users will interact with the project:

  • The VM uses Tables from S3 and local images for training, and produces Runs which are written to a shared location on S3.

  • One user has a copy of the images locally and uses a local alias so that images are read from disk instead of S3

  • Another user does not have a copy of the images and relies on a project default alias to read them from S3

Setup Instructions

What you need to do on the remote training VM:

  1. Install 3LC:

    pip install 3lc
    
  2. Login:

    3lc login <api_key>
    
  3. Configure S3 access:

    export AWS_ACCESS_KEY_ID=your_access_key
    export AWS_SECRET_ACCESS_KEY=your_secret_key
    export AWS_DEFAULT_REGION=your_region
    
  4. Configure 3LC:

    In config.3lc.yaml, set:

    indexing:
      project-root-url: "s3://your-3lc-bucket/projects"
    
    aliases:
      PROJECT_DATA_ROOT: /path/to/vm/local/images
    

    This configuration:

    • sets the project root URL, ensuring Tables and Runs are written to the shared location on S3.

    • sets an alias to the images where they are stored on the training VM, ensuring fast reads instead of downloading them from S3.

  5. Create initial tables (only do this once):

    In a Python script, create initial Tables based on your input format. Here we assume the data is in COCO format:

    import tlc
    
    train_table = tlc.Table.from_coco(
        annotations_file="train.json",
        # Use the folder that the PROJECT_DATA_ROOT alias in config.3lc.yaml points to
        image_folder="/path/to/vm/local/images",
        project_name="My Project",
        dataset_name="My Train Dataset",
        table_name="initial",
    )
    
  6. Set a project default alias:

    To share the project with your teammates, create a project default alias that points to the images on S3:

    import tlc
    
    tlc.register_project_url_alias("PROJECT_DATA_ROOT", "s3://your-3lc-bucket/images")
    
  7. Run your training:

    python your_training_script.py
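
Before starting a long training run, it can help to confirm that the S3 credentials and the local image folder from the steps above are actually in place. The following is a minimal sketch using only the Python standard library. It is not part of 3LC; the helper names `missing_aws_variables` and `preflight_problems` are hypothetical, and the image path is the placeholder used in the configuration above.

```python
import os
from pathlib import Path

# The three variables exported in the "Configure S3 access" step.
REQUIRED_AWS_VARIABLES = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_DEFAULT_REGION",
)


def missing_aws_variables(env):
    """Return the required AWS variables that are unset or empty in `env`."""
    return [name for name in REQUIRED_AWS_VARIABLES if not env.get(name)]


def preflight_problems(env, image_root):
    """Collect readable problems; an empty list means all checks passed."""
    problems = [
        f"environment variable {name} is not set"
        for name in missing_aws_variables(env)
    ]
    if not Path(image_root).is_dir():
        problems.append(f"image folder {image_root} does not exist")
    return problems


if __name__ == "__main__":
    # Placeholder path from the configuration above; replace with your own.
    for problem in preflight_problems(os.environ, "/path/to/vm/local/images"):
        print("preflight:", problem)
```

The check only looks at the environment and the file system. It does not verify that the credentials are valid or that the bucket is reachable.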
    

What you need to do on a local machine without a copy of the images:

  1. Install 3LC:

    pip install 3lc
    
  2. Login:

    3lc login <api_key>
    
  3. Configure S3 access:

    export AWS_ACCESS_KEY_ID=your_access_key
    export AWS_SECRET_ACCESS_KEY=your_secret_key
    export AWS_DEFAULT_REGION=your_region
    
  4. Configure 3LC to index the S3 storage:

    In config.3lc.yaml, set:

    indexing:
      project-scan-urls:
        - "s3://your-3lc-bucket/projects"
    
  5. Start Object Service:

    3lc service
    
  6. Open Dashboard:

    Go to https://dashboard.3lc.ai. When you open the project, you should see the Runs and Tables created on the remote VM. When you open a Run or Table, the images are fetched from S3 on the fly, because the project default alias points to them.
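
The role of the project default alias can be illustrated with a small sketch. This is not 3LC's implementation: it assumes that stored image references start with an alias token written as `<PROJECT_DATA_ROOT>`, and that an alias set in a machine's own configuration takes precedence over the project default, which is consistent with the behavior described in this example. The function `resolve_image_url` and the image path are hypothetical.

```python
def resolve_image_url(stored_url, machine_aliases, project_default_aliases):
    """Expand a leading "<ALIAS>" token, preferring machine-level aliases.

    Illustrative only: this is not 3LC's implementation.
    """
    # Machine-level entries override project defaults with the same name.
    effective = {**project_default_aliases, **machine_aliases}
    for name, target in effective.items():
        token = f"<{name}>"
        if stored_url.startswith(token):
            return target.rstrip("/") + stored_url[len(token):]
    return stored_url  # no alias token: use the URL unchanged


PROJECT_DEFAULTS = {"PROJECT_DATA_ROOT": "s3://your-3lc-bucket/images"}
stored = "<PROJECT_DATA_ROOT>/train/0001.jpg"  # hypothetical image reference

# Machine with no alias in config.3lc.yaml: the project default applies.
print(resolve_image_url(stored, {}, PROJECT_DEFAULTS))

# Machine whose config.3lc.yaml sets the alias: the local path wins.
local = {"PROJECT_DATA_ROOT": "/path/to/user/local/images"}
print(resolve_image_url(stored, local, PROJECT_DEFAULTS))
```

Under these assumptions the first call yields an S3 URL and the second a local path, even though both machines read the same stored reference.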

What you need to do on a local machine with a local copy of the images:

  1. Install 3LC:

    pip install 3lc
    
  2. Login:

    3lc login <api_key>
    
  3. Configure S3 access:

    export AWS_ACCESS_KEY_ID=your_access_key
    export AWS_SECRET_ACCESS_KEY=your_secret_key
    export AWS_DEFAULT_REGION=your_region
    
  4. Configure 3LC for mixed storage:

    In config.3lc.yaml, set:

    indexing:
      project-scan-urls:
        - "s3://your-3lc-bucket/projects"
    aliases:
      PROJECT_DATA_ROOT: /path/to/user/local/images
    
  5. Start Object Service:

    3lc service
    
  6. Open Dashboard:

    Go to https://dashboard.3lc.ai. This user sees the same project, with Runs and Tables read from S3, but the alias to the local images ensures that images are read from local disk. Images therefore load faster when the Dashboard requests them, which is most noticeable for large images.
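
A local alias is only useful if the local folder mirrors the layout of the images on S3. The sketch below assumes that an alias replaces a path prefix and leaves the rest of the path unchanged. Under that assumption, it reports which images are missing from the local copy. The helper `missing_local_images` and the relative paths are hypothetical, and only the Python standard library is used.

```python
from pathlib import Path


def missing_local_images(relative_paths, local_root):
    """Return the relative paths that have no file under `local_root`."""
    root = Path(local_root)
    return [rel for rel in relative_paths if not (root / rel).is_file()]


if __name__ == "__main__":
    # Hypothetical relative paths, as they would appear below the alias.
    expected = ["train/0001.jpg", "train/0002.jpg"]
    for rel in missing_local_images(expected, "/path/to/user/local/images"):
        print("missing locally:", rel)
```

Any path reported here would need to be copied to the local folder before the alias can serve it from disk.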