
SKA SDP Instrumental Calibration Pipeline


The Instrumental Calibration Pipeline (INST) is a CLI application used to perform calibration operations on SKA visibility data.

This repository contains the functions to generate the initial calibration products during standard SKA batch processing. It includes processing functions to prepare, model and calibrate a visibility dataset, data handling functions for parallel processing, and high-level workflow scripts and notebooks.

If you wish to contribute to this repository, please refer to the Developer Guide.

Requirements for running the pipeline

The INST pipeline primarily depends on these external astronomy-related libraries:

  1. python-casacore
  2. everybeam

Apart from the above, the pipeline uses standard SKA processing functions from

  1. sdp-func-python
  2. sdp-func (optional)

and SKA standard data models in the ska-sdp-datamodels repository.

All of the above dependencies are installed along with the pipeline using the standard installation steps.

The prerequisites for prediction of model visibilities are:

  • The GLEAM extragalactic catalogue or a CSV file. This and other catalogues will soon be available via global-sky-model, but at present a local copy is needed for prediction of model visibilities. The gleamegc catalogue can be downloaded via FTP from VizieR.
  • A measurement set with appropriate metadata to initialise the everybeam beam models. An appropriate measurement set for basic tests can be downloaded using the everybeam package script download_ms.sh, but one will also be made available in this package.
  • The everybeam coeffs directory is also needed to generate beam models. The directory path supplied to predict_from_components is used to set the environment variable EVERYBEAM_DATADIR.
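
If the everybeam coefficients are installed in a non-standard location, the environment variable can also be exported manually before running the pipeline; a minimal sketch, in which the path is a placeholder:

# Point everybeam at its coefficients directory (placeholder path)
export EVERYBEAM_DATADIR=/path/to/everybeam/coeffs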

Foreword: Dask distribution

This section is inspired by the batch-preprocessing pipeline.

This is only applicable when you run the INST pipeline with a Dask cluster (LocalCluster or SLURMCluster).

In the load_data stage, the INST pipeline converts the MSv2 data into a Zarr file and stores it under the cache_directory path.

During testing, we found that it is better to limit the number of parallel tasks that run during the conversion from MSv2 to Zarr, so that each task can get enough memory.

The only reliable solution is to use worker resources.

The instrumental calibration pipeline assumes that all workers define a resource called process; each worker may hold one or more process resources. Each conversion task is defined to use one process resource, so at any time each worker runs at most as many conversion tasks concurrently (using its thread pool) as it has process resources.

To define the resources when starting a Dask worker using the CLI command:

dask worker <SCHEDULER_ADDRESS> <OPTIONS> --resources "process=1"
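
If the workers are launched through a Slurm batch script rather than dask-jobqueue, the same flag can be passed to each dask worker. A minimal sketch, in which the scheduler address and all Slurm options are placeholders:

#!/bin/bash
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=8
#SBATCH --mem-per-cpu=4G

# Each Slurm task starts one worker advertising a single "process" resource,
# so it runs at most one MSv2-to-Zarr conversion task at a time.
srun dask worker <SCHEDULER_ADDRESS> --nworkers 1 --resources "process=1"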

Or in a LocalCluster:

cluster = LocalCluster(resources={'process': 1})

⚠️ Warning: If the process resource is not defined on any worker, the pipeline (or rather, the Dask scheduler) will hang indefinitely.

Installing the pipeline

In Python environments

It is always recommended to create a separate Python environment for the pipeline. For that, you can use conda or uv; a conda alternative is sketched after the uv example below.

# To create a virtual environment using uv
# This will be created in the `.venv` directory
uv venv --python 3.10 --seed

# To activate the environment
source .venv/bin/activate
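
Alternatively, a minimal sketch using conda (the environment name is a placeholder):

# To create and activate a conda environment with a compatible Python version
conda create --name inst-pipeline python=3.10
conda activate inst-pipeline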

Stable release from SKAO pip index (recommended)

Run the following command to install the latest stable release (0.3.4) of the pipeline from the SKAO Python artifact repository:

INST_VERSION=0.3.4

# if using uv, use `uv pip install ...`
pip install --extra-index-url "https://artefact.skao.int/repository/pypi-internal/simple" "ska-sdp-instrumental-calibration[python-casacore,ska-sdp-func]==$INST_VERSION"

Latest pipeline from git

Run the following command to install the latest pipeline from the main branch of the git repository:

INST_BRANCH=main

# if using uv, use `uv pip install ...`
pip install --extra-index-url "https://artefact.skao.int/repository/pypi-internal/simple" "ska-sdp-instrumental-calibration[python-casacore,ska-sdp-func]@git+https://gitlab.com/ska-telescope/sdp/science-pipeline-workflows/ska-sdp-instrumental-calibration.git@$INST_BRANCH"

As spack package

The INST pipeline is available as a spack package in the ska-sdp-spack repository. Please follow its README to set up spack on your machine. Then install the pipeline using this command:

INST_VERSION=0.3.4

spack install "py-ska-sdp-instrumental-calibration@$INST_VERSION"

Then load the spack package with this command:

spack load "py-ska-sdp-instrumental-calibration"

As OCI container

We also provide an OCI (Docker) image, which is hosted on the SKA Docker artifact repository. To pull the image for the latest stable release, please run:

INST_VERSION=0.3.4
docker pull "artefact.skao.int/ska-sdp-instrumental-calibration:$INST_VERSION"

The entrypoint of the above image is set to the ska-sdp-instrumental-calibration executable.

Run the image with volume mounts to enable read/write access to storage.

docker run [-v local:container] <image-name> ...<cli_options>...
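
For example, a minimal sketch that mounts a local data directory into the container (all paths are placeholders):

# The entrypoint is the pipeline CLI, so only its subcommand and options are passed
docker run -v /local/data:/data \
    "artefact.skao.int/ska-sdp-instrumental-calibration:$INST_VERSION" \
    run --input /data/input.ms --config /data/config.yaml --output /data/output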

Using the CLI

Once you have installed the pipeline, you should be able to access the pipeline CLI with the ska-sdp-instrumental-calibration command.

Running ska-sdp-instrumental-calibration --help should show the following output:

usage: ska-sdp-instrumental-calibration [-h] {run,install-config,experimental} ...

positional arguments:
  {run,install-config,experimental}
    run                 Run the pipeline
    install-config      Installs the default config at --config-install-path
    experimental        Allows reordering of stages via additional config section

options:
  -h, --help            show this help message and exit

Generating YAML config of the pipeline

The INST pipeline expects a YAML config file as one of its inputs, which defines the stages and their parameters. Information about the stages is available in the documentation.

Install the default config YAML of the pipeline to a specific directory using the install-config subcommand.

ska-sdp-instrumental-calibration install-config --config-install-path path/to/dir

Parameters of the default configuration can be overridden:

ska-sdp-instrumental-calibration install-config --config-install-path path/to/dir \
                    --set parameters.bandpass_calibration.flagging true \
                    --set parameters.load_data.fchunk 64

Running the pipeline

Run the instrumental calibration pipeline using the run subcommand.

Example:

ska-sdp-instrumental-calibration run \
--input /path/to/ms \
--config /path/to/config \
--output /path/to/output/dir

Please run ska-sdp-instrumental-calibration run --help to see all supported options of the run subcommand.

Reordering stages in pipeline

Run the instrumental calibration pipeline using the experimental subcommand to provide an alternative stage order to the default.

Example:

ska-sdp-instrumental-calibration experimental \
--input /path/to/ms \
--config /path/to/config \
--output /path/to/output/dir

The configuration is used to control both the execution order and any additional settings for each stage. The experimental subcommand allows the same stage to be used multiple times.

Experimental Configuration

global_parameters:
  experimental:
    pipeline:
      - load_data: {}
      - predict_vis:
          beam_type: everybeam
          export_model_vis: true
          flux_limit: 2.0
          fov: 30.0
      - bandpass_calibration: {}
      - delay_calibration: {}
      - generate_channel_rm:
          fchunk: -1
          run_solver_config:
              solver: normal_equations
              refant: 0
              niter: 30
      - delay_calibration: {}
      - export_gain_table: {}
parameters:
  bandpass_calibration:
    flagging: false
    plot_config:
      fixed_axis: false
      plot_table: true
    run_solver_config:
      niter: 10
      refant: 0
      solver: gain_substitution
  delay_calibration:
    oversample: 16
    plot_config:
      fixed_axis: false
      plot_table: false
  export_gain_table:
    export_format: h5parm
    export_metadata: false
    file_name: gaintable
  load_data:
    fchunk: 32
pipeline: {}

The pipeline defined under global_parameters.experimental.pipeline will be used to construct the execution pipeline. It will consist of the following stages, in order: (1) load_data, (2) predict_vis, (3) bandpass_calibration, (4) delay_calibration, (5) generate_channel_rm, (6) delay_calibration_1 and (7) export_gain_table. No stage-specific validation is done while constructing the execution order, so the user should pay special attention to the stage order and to the inclusion of mandatory stages. The pipeline will not function if the load_data stage is not set as the first stage.

Stage Names: The ska-sdp-instrumental-calibration experimental feature will update the stage names for duplicated stages as follows: the first occurrence of a stage name will remain unchanged, without a suffix. Subsequent duplicates will be renamed using the format <stage-name>_x, where x is the duplicate index starting from 1. For example, the second occurrence of delay_calibration will be renamed to delay_calibration_1. This numbering is automatically incremented for each duplicate, preserving the order as defined in the global_parameters.experimental.pipeline section. This approach ensures that each stage has a unique and identifiable name.

The stage configurations have the following precedence, from highest to lowest: (1) --set overrides, (2) configuration provided under the global_parameters.experimental.pipeline.<stage> section, (3) configuration provided under the parameters.<stage> section, and (4) the default configurations used for the stage definitions.

While using the --set CLI option, please be mindful of the suffix appended to the stage name. Example: ska-sdp-instrumental-calibration experimental ... --set parameters.delay_calibration_1.plot_config.plot_table true

Please note that the pipeline section is intentionally left blank and is ignored by the ska-sdp-instrumental-calibration experimental feature, as the stage execution order is determined from the global_parameters.experimental.pipeline section.
