The Instrumental Calibration Pipeline (INST) is a CLI application that performs calibration operations on SKA visibility data.
This repository contains the functions to generate the initial calibration products during standard SKA batch processing. It includes processing functions to prepare, model and calibrate a visibility dataset, data handling functions for parallel processing, and high-level workflow scripts and notebooks.
If you wish to contribute to this repository, please refer to the Developer Guide.
The INST pipeline is primarily dependent on these external astronomy-related libraries:
Apart from the above, the pipeline uses the standard SKA processing functions in:
- sdp-func-python
- sdp-func (optional)
and the standard SKA data models in the ska-sdp-datamodels repository.
All of the above dependencies are installed along with the pipeline by the standard installation steps.
For prediction of model visibilities, the prerequisites are:
- The GLEAM extragalactic catalogue or a CSV file. This and other catalogues will soon be available via global-sky-model, but at present a local copy is needed for prediction of model visibilities. The gleamegc catalogue can be downloaded via FTP from VizieR.
- A measurement set with appropriate metadata to initialise the everybeam beam models. An appropriate measurement set for basic tests can be downloaded using the everybeam package script download_ms.sh, but one will also be made available in this package.
- The everybeam coeffs directory, which is also needed to generate beam models. The directory path supplied to predict_from_components is used to set the environment variable EVERYBEAM_DATADIR.
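Before launching a run that predicts model visibilities, it can help to confirm that these three inputs are actually present. The following is a minimal sketch of such a check, not part of the pipeline: the function name and its arguments are hypothetical, and it assumes the catalogue is a single file while the measurement set and the coeffs directory are directories.

```python
from pathlib import Path


def missing_prediction_inputs(catalogue, measurement_set, coeffs_dir):
    """Return a list of problems; an empty list means every input was found.

    Hypothetical helper for illustration only. Assumes the catalogue is a
    single file, and that the measurement set and the everybeam coeffs
    location are directories.
    """
    problems = []
    if not Path(catalogue).is_file():
        problems.append(f"catalogue file not found: {catalogue}")
    if not Path(measurement_set).is_dir():
        problems.append(f"measurement set not found: {measurement_set}")
    if not Path(coeffs_dir).is_dir():
        problems.append(f"everybeam coeffs directory not found: {coeffs_dir}")
    return problems


if __name__ == "__main__":
    # Placeholder paths; replace them with real locations.
    for problem in missing_prediction_inputs(
        "/path/to/gleamegc.dat", "/path/to/ms", "/path/to/coeffs"
    ):
        print(problem)
```

An empty result only shows that the paths exist; it says nothing about whether the measurement set metadata is suitable for everybeam.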
This section is inspired by the batch-preprocessing pipeline.
This is only applicable when you run the INST pipeline with a Dask cluster (LocalCluster or SlurmCluster).
In the load_data stage, the INST pipeline converts the MSv2 into a Zarr file and stores it in the cache_directory path.
During testing, we found that it is better to limit the number of parallel tasks that run during the conversion from MSv2 to Zarr, so that each task gets enough memory.
The only reliable solution is to use worker resources.
The instrumental calibration pipeline assumes that all workers define a resource called process; each worker may hold one or more process resources.
Each conversion task is defined to use 1 process resource.
Thus each worker runs at most as many conversion tasks at a time as it has process resources (in parallel, using its thread pool).
To define the resources when starting a Dask worker with the CLI command:
dask worker <SCHEDULER_ADDRESS> <OPTIONS> --resources "process=1"
Or in a LocalCluster:
cluster = LocalCluster(resources={'process': 1})
⚠️ Warning: If the process resource is not defined on any worker, the pipeline (or rather, the Dask scheduler) will hang indefinitely.
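The effect of the process resource on concurrency can be estimated before a run. The sketch below is a hypothetical pre-flight helper, not part of the pipeline. It assumes each worker is described by a plain dictionary holding an nthreads count and a resources mapping (an input shape chosen here for illustration), and that each conversion task requests one process resource, as described above.

```python
def max_concurrent_conversions(workers, resource="process", per_task=1):
    """Estimate how many conversion tasks can run at once across a cluster.

    `workers` maps a worker name to a dict such as
    {"nthreads": 4, "resources": {"process": 1}}; this shape is an
    assumption made for the example.
    """
    total = 0
    for info in workers.values():
        available = info.get("resources", {}).get(resource, 0)
        slots = int(available // per_task)
        # A worker cannot run more tasks than it has threads.
        total += min(info["nthreads"], slots)
    if total == 0:
        # Mirrors the warning above: with no resource defined anywhere,
        # the conversion tasks could never be scheduled.
        raise ValueError(f"no worker defines the '{resource}' resource")
    return total


if __name__ == "__main__":
    cluster = {
        "worker-0": {"nthreads": 4, "resources": {"process": 1}},
        "worker-1": {"nthreads": 2, "resources": {"process": 3}},
    }
    print(max_concurrent_conversions(cluster))  # prints 3
```

Dividing the memory available per worker by the number of process resources it holds gives a rough idea of the memory each conversion task can expect.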
It is always recommended to create a separate Python environment for the pipeline.
For that, you can use conda or uv.
# To create a virtual environment using uv
# This will be created in the `.venv` directory
uv venv --python 3.10 --seed
# To activate the environment
source .venv/bin/activate
Run the following command to install the latest stable release (0.3.4) of the pipeline from the SKAO Python artifact repository:
INST_VERSION=0.3.4
# if using uv, use `uv pip install ...`
pip install --extra-index-url "https://artefact.skao.int/repository/pypi-internal/simple" "ska-sdp-instrumental-calibration[python-casacore,ska-sdp-func]==$INST_VERSION"
Run the following command to install the latest pipeline from the main branch of the git repository:
INST_BRANCH=main
# if using uv, use `uv pip install ...`
pip install --extra-index-url "https://artefact.skao.int/repository/pypi-internal/simple" "ska-sdp-instrumental-calibration[python-casacore,ska-sdp-func]@git+https://gitlab.com/ska-telescope/sdp/science-pipeline-workflows/ska-sdp-instrumental-calibration.git@$INST_BRANCH"
The INST pipeline is available as a spack package in the ska-sdp-spack repository. Please follow the README to set up spack on your machine. Then install the pipeline using this command:
INST_VERSION=0.3.4
spack install "py-ska-sdp-instrumental-calibration@$INST_VERSION"
Then load the spack package with this command:
spack load "py-ska-sdp-instrumental-calibration"
We also provide an OCI (Docker) image, which is hosted on the SKA Docker artifact repository. To pull the Docker image for the latest stable release, please run:
INST_VERSION=0.3.4
docker pull "artefact.skao.int/ska-sdp-instrumental-calibration:$INST_VERSION"
The entrypoint of the above image is set to the executable ska-sdp-instrumental-calibration.
Run the image with volume mounts to enable reading from and writing to storage.
docker run [-v local:container] <image-name> ...<cli_options>...
Once the pipeline is installed, you should be able to access its CLI with the ska-sdp-instrumental-calibration command.
Running ska-sdp-instrumental-calibration --help should show the following output:
usage: ska-sdp-instrumental-calibration [-h] {run,install-config,experimental} ...
positional arguments:
{run,install-config,experimental}
run Run the pipeline
install-config Installs the default config at --config-install-path
experimental Allows reordering of stages via additional config section
options:
-h, --help show this help message and exit
The INST pipeline expects a YAML config file as one of its inputs, which defines the stages and their parameters. Information about the stages is available in the documentation.
Install the default config YAML of the pipeline to a specific directory using the install-config subcommand:
ska-sdp-instrumental-calibration install-config --config-install-path path/to/dir
Parameters of the default configuration can be overridden with the --set option:
ska-sdp-instrumental-calibration install-config --config-install-path path/to/dir \
--set parameters.bandpass_calibration.flagging true \
--set parameters.load_data.fchunk 64
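Each --set key is a dotted path into the nested YAML structure. The sketch below illustrates how such a path maps onto a nested dictionary. It is only an illustration of the addressing scheme, not the pipeline's own parser: the helper name is hypothetical, and the conversion of command-line strings into typed values is left out.

```python
def set_by_dotted_path(config, dotted_key, value):
    """Set config[a][b][c] = value for a key written as 'a.b.c'.

    Hypothetical helper; intermediate sections are created when missing.
    """
    keys = dotted_key.split(".")
    node = config
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value
    return config


if __name__ == "__main__":
    config = {"parameters": {"load_data": {"fchunk": 32}}}
    set_by_dotted_path(config, "parameters.load_data.fchunk", 64)
    set_by_dotted_path(config, "parameters.bandpass_calibration.flagging", True)
    print(config)
```

After these two calls, the load_data section holds fchunk 64 and a bandpass_calibration section with flagging enabled has been created, which corresponds to the two overrides in the command above.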
Run the instrumental calibration pipeline using the run subcommand.
Example:
ska-sdp-instrumental-calibration run \
--input /path/to/ms \
--config /path/to/config \
--output /path/to/output/dir
Please run ska-sdp-instrumental-calibration run --help to see all supported options of the run subcommand.
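When the pipeline is launched from a batch script, the same invocation can be assembled in Python. This is a minimal sketch that uses only the three options shown in the example above; the helper name is hypothetical and the paths are placeholders.

```python
import shlex


def build_run_command(ms_path, config_path, output_dir):
    """Assemble the argument list for the run subcommand (hypothetical helper)."""
    return [
        "ska-sdp-instrumental-calibration",
        "run",
        "--input", str(ms_path),
        "--config", str(config_path),
        "--output", str(output_dir),
    ]


if __name__ == "__main__":
    command = build_run_command("/path/to/ms", "/path/to/config", "/path/to/output/dir")
    # Print the command as it would be typed in a shell.
    print(shlex.join(command))
    # To execute it, pass the list to subprocess.run(command, check=True).
```

Passing the arguments as a list avoids shell quoting problems when a path contains spaces.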
Run the instrumental calibration pipeline using the experimental subcommand to provide a stage order other than the default.
Example:
ska-sdp-instrumental-calibration experimental \
--input /path/to/ms \
--config /path/to/config \
--output /path/to/output/dir
The configuration controls both the execution order and any additional settings for each stage. The experimental subcommand also allows the same stage to be used multiple times.
global_parameters:
  experimental:
    pipeline:
      - load_data: {}
      - predict_vis:
          beam_type: everybeam
          export_model_vis: true
          flux_limit: 2.0
          fov: 30.0
      - bandpass_calibration: {}
      - delay_calibration: {}
      - generate_channel_rm:
          fchunk: -1
          run_solver_config:
            solver: normal_equations
            refant: 0
            niter: 30
      - delay_calibration: {}
      - export_gain_table: {}
parameters:
  bandpass_calibration:
    flagging: false
    plot_config:
      fixed_axis: false
      plot_table: true
    run_solver_config:
      niter: 10
      refant: 0
      solver: gain_substitution
  delay_calibration:
    oversample: 16
    plot_config:
      fixed_axis: false
      plot_table: false
  export_gain_table:
    export_format: h5parm
    export_metadata: false
    file_name: gaintable
  load_data:
    fchunk: 32
pipeline: {}
The pipeline defined under global_parameters.experimental.pipeline is used to construct the execution pipeline. It will consist of the following stages, in order: (1) load_data, (2) predict_vis, (3) bandpass_calibration, (4) delay_calibration, (5) generate_channel_rm, (6) delay_calibration_1, and (7) export_gain_table. No stage-specific validation is performed while constructing the execution order, so the user should pay special attention to the stage order and to the inclusion of all required stages. The pipeline will not function if the load_data stage is not the first stage.
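Since the stage order is not validated, a small check on the config before submitting a job can catch the most obvious mistake. The sketch below is hypothetical and not part of the pipeline. It assumes the YAML has already been loaded into a dictionary, with each pipeline entry being a single-key mapping as in the example config, and it checks only the load_data rule stated above.

```python
def experimental_stage_order(config):
    """Return the stage names listed under global_parameters.experimental.pipeline.

    Raises ValueError if an entry is malformed or load_data is not first.
    Hypothetical helper; `config` is the dictionary a YAML loader would
    produce for the config file.
    """
    entries = config["global_parameters"]["experimental"]["pipeline"]
    names = []
    for entry in entries:
        if not isinstance(entry, dict) or len(entry) != 1:
            raise ValueError(f"expected a single-stage mapping, got: {entry!r}")
        names.append(next(iter(entry)))
    if not names or names[0] != "load_data":
        raise ValueError("load_data must be the first stage")
    return names


if __name__ == "__main__":
    config = {
        "global_parameters": {
            "experimental": {
                "pipeline": [
                    {"load_data": {}},
                    {"bandpass_calibration": {}},
                    {"export_gain_table": {}},
                ]
            }
        }
    }
    print(experimental_stage_order(config))
```

Checks for other required stages are deliberately left out, because which stages are required depends on the workflow being assembled.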
Stage Names: The ska-sdp-instrumental-calibration experimental feature renames duplicated stages as follows. The first occurrence of a stage name remains unchanged, without a suffix. Subsequent duplicates are renamed using the format <stage-name>_x, where x is the duplicate index starting from 1. For example, the second occurrence of delay_calibration is renamed to delay_calibration_1. The index is incremented automatically for each further duplicate, preserving the order defined in the global_parameters.experimental.pipeline section. This ensures that each stage has a unique and identifiable name.
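The renaming rule can be reproduced in a few lines, which is handy for working out the exact name to pass to --set. This is a sketch of the rule as described above, not the pipeline's own code; the function name is hypothetical.

```python
from collections import Counter


def unique_stage_names(stage_names):
    """Apply the <stage-name>_x rule: the first occurrence keeps its name,
    later occurrences get the suffixes _1, _2, ... in order of appearance."""
    seen = Counter()
    result = []
    for name in stage_names:
        count = seen[name]
        result.append(name if count == 0 else f"{name}_{count}")
        seen[name] += 1
    return result


if __name__ == "__main__":
    order = [
        "load_data",
        "predict_vis",
        "bandpass_calibration",
        "delay_calibration",
        "generate_channel_rm",
        "delay_calibration",
        "export_gain_table",
    ]
    for name in unique_stage_names(order):
        print(name)
```

For the example config, the sixth printed name is delay_calibration_1, matching the stage list given earlier.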
The stage configurations have the following precedence, highest first: (1) values passed with --set, (2) configuration provided under the global_parameters.experimental.pipeline.<stage> section, (3) configuration provided under the parameters.<stage> section, and (4) the default configuration in the stage definitions.
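One way to picture this precedence is as a layered merge, applied from the lowest priority to the highest. The sketch below assumes that nested sections are merged key by key rather than replaced wholesale; that merge behaviour is an assumption made for illustration, the function names are hypothetical, and the values are invented for the example.

```python
import copy


def deep_merge(base, override):
    """Return a copy of `base` with `override` applied; nested dicts are merged."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def resolve_stage_config(defaults, parameters, experimental, cli_set):
    """Merge the four layers; later (higher-priority) layers win."""
    resolved = {}
    for layer in (defaults, parameters, experimental, cli_set):
        resolved = deep_merge(resolved, layer)
    return resolved


if __name__ == "__main__":
    # Invented values for one stage, one dictionary per layer.
    defaults = {"oversample": 16, "plot_config": {"fixed_axis": False, "plot_table": False}}
    parameters = {"oversample": 8}
    experimental = {"plot_config": {"fixed_axis": True}}
    cli_set = {"plot_config": {"plot_table": True}}
    print(resolve_stage_config(defaults, parameters, experimental, cli_set))
```

In this example every layer changes a different key, so all three overrides survive; had two layers set the same key, the higher-priority one would win.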
When using the --set CLI option, please be mindful of the suffix appended to the stage name. Example: ska-sdp-instrumental-calibration experimental ... --set parameters.delay_calibration_1.plot_config.plot_table true
Please note that the top-level pipeline section is intentionally left blank and is ignored by the ska-sdp-instrumental-calibration experimental feature, as the stage execution order is taken from the global_parameters.experimental.pipeline section.