Data Products, Management, and Long-Term Archive


Data Products

Interferometer Data

Pre-processing pipeline: For each observed beam, a set of uncalibrated correlated visibilities is provided. The data products are stored in Measurement Sets (MS), which can be archived to the LTA. The data, written in the DATA column, have been flagged, demixed (if requested) and averaged in time and frequency according to the user specification.

 

preFactor pipeline: The ASTRON Radio Observatory is currently bringing preFactor v3, the direction-independent calibration pipeline, into production. It produces direction-independent calibrated visibilities, wide-band images of the target field, calibration solutions and diagnostic plots. Commissioning of preFactor v3 in the operational system is at an advanced stage. While preFactor cannot yet be widely offered in Cycle 13, the Radio Observatory will select a sample of appropriate projects that will be offered the opportunity to obtain data products processed through this pipeline. preFactor consists of three pipelines: calibrator, target and imaging. For each target beam, a set of direction-independent calibrated visibilities is provided. These data products are stored in Measurement Sets (MS), which can be archived to the LTA. In addition, a full-bandwidth image is produced for each observed target beam and can also be archived to the LTA. Calibration solutions and diagnostic plots from each preFactor pipeline will be made available as well.

 

Long baseline pipeline: The data products of the long baseline pipeline are the outputs of the calibrator pipeline, the target pipeline, the phase-shift pipeline (which also adds the core stations together into a single station), and the concat pipeline. All products are stored in Measurement Sets (MS), which are also archived to the LTA. In particular, the product of the concat pipeline is a Measurement Set in which linear polarization has been converted into circular polarization; the user can save it in FITS format and load it into AIPS for fringe fitting and further processing. Ancillary files, such as INST files, can be transferred to CEP3 if such a request is included in the proposal.

 

Please note that the naming of the data products of the preprocessing, calibrator and target pipelines changed when the Radio Observatory introduced CEP4 as a production cluster. Data recorded on CEP2 used the naming convention L####_SBXXX_uv.dppp.MS, where #### gives the pipeline ID and XXX indicates the sub-band number. For data recorded on CEP4, the naming convention was changed to L####_SBXXX_uv.MS.
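As an illustration of the two naming conventions, the following is a minimal Python sketch that splits a product name into its pipeline ID and sub-band number and reports which convention it follows. The function and class names are hypothetical, and the pattern assumes that the #### and XXX placeholders stand for plain decimal digits, which the text above does not state explicitly.

```python
import re
from typing import NamedTuple, Optional

# Assumption: '####' and 'XXX' are runs of decimal digits of any length.
# The optional '.dppp' suffix distinguishes the CEP2 convention from CEP4.
_NAME_RE = re.compile(r"L(?P<pipeline_id>\d+)_SB(?P<subband>\d+)_uv(?P<dppp>\.dppp)?\.MS")


class ProductName(NamedTuple):
    pipeline_id: int
    subband: int
    cluster: str  # "CEP2" or "CEP4", inferred from the file name only


def parse_product_name(name: str) -> Optional[ProductName]:
    """Return the parsed fields, or None if the name fits neither convention."""
    match = _NAME_RE.fullmatch(name)
    if match is None:
        return None
    cluster = "CEP2" if match.group("dppp") else "CEP4"
    return ProductName(
        pipeline_id=int(match.group("pipeline_id")),
        subband=int(match.group("subband")),
        cluster=cluster,
    )
```

For example, parse_product_name("L123456_SB012_uv.dppp.MS") would be classified as a CEP2-style name, while the same name without ".dppp" would be classified as CEP4-style. The example ID is made up for illustration.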

 

Beam-formed Data

Raw beam-formed data: These consist of two files: a 'raw' file containing the raw data in a binary format, and an HDF5 ('h5') file containing the metadata and a data array linking to the raw data. The most straightforward way to access the data is therefore to use HDF5 tools and open only the 'h5' file. The data can then be found in the sub-folder '/SUB_ARRAY_POINTING_xxx/BEAM_xxx/STOKES_x', with metadata stored as attributes of the root folder and of each sub-folder.
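The layout described above could be explored from Python roughly as follows. This is a sketch only: it relies on the third-party h5py package, the function names are hypothetical, and the three-digit zero padding of the pointing and beam indices is an assumption read off the 'xxx' placeholders rather than something the text guarantees.

```python
def stokes_path(sap: int, beam: int, stokes: int) -> str:
    """Build the HDF5 path of one Stokes data array.

    Assumption: 'xxx' means a zero-padded three-digit index and 'x' a
    single-digit Stokes index.
    """
    return f"/SUB_ARRAY_POINTING_{sap:03d}/BEAM_{beam:03d}/STOKES_{stokes}"


def describe_stokes(h5_filename: str, sap: int = 0, beam: int = 0, stokes: int = 0) -> dict:
    """Open only the 'h5' file and collect metadata for one Stokes array."""
    import h5py  # third-party; imported here so stokes_path works without it

    with h5py.File(h5_filename, "r") as h5_file:
        node = h5_file[stokes_path(sap, beam, stokes)]
        return {
            # Metadata is stored as attributes of the root and of each sub-folder.
            "root_attrs": dict(h5_file.attrs),
            "node_attrs": dict(node.attrs),
            # shape/dtype exist if the node is a dataset; None otherwise.
            "shape": getattr(node, "shape", None),
            "dtype": str(getattr(node, "dtype", None)),
        }
```

Calling describe_stokes on the 'h5' file of an observation would return the root-level and dataset-level attributes without reading the bulk data into memory.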

Pulsar data products: These are written in a pulsar-standard PRESTO format, generated by the Known Pulsar Pipeline developed by the LOFAR Pulsar Working Group.  Potential users are advised to contact any member of this group for details of this format.

 

Transient Buffer Board Data

Data are stored as raw voltages per station in HDF5 format, including some of the metadata. PyCRTools, the software package used to access the data and do some processing (e.g. FFT, RFI mitigation, ...) with Python scripts, is available at CEP.

 

The Long-Term Archive (LTA)

Processed data products and, if requested and granted, raw data products are stored in the LOFAR long-term archive.  The LTA currently involves sites in the Netherlands, Germany, and Poland.  For astronomers, the LOFAR LTA provides the principal interface to LOFAR data retrieval and data mining. In the future, facilities to further process these data are also expected to be available.  

Data can be downloaded by a user associated with the project using the LOFAR LTA interface. Initially, data can only be accessed by users associated with the given project but, after one year, the data become public (see the LOFAR data policy for further details). This is the main way in which users will access their data following an observation. However, in the event that the data cannot be archived in this way, the data may be transferred to the CEP3 user cluster, from where the user has four weeks in which to download the data to their own facilities.

Full details on how to use the LTA and retrieve LOFAR data can be found on the LTA public wiki page.   Given the potential quantities of data involved, it is strongly recommended to use Grid facilities and download via SRM if these tools are available.

Users can also use processing resources offered by the LTA sites to further process their data.

 

LTA usage statistics per LOFAR cycle

Since May 2013 (start of Cycle 0), about 30 PB of LOFAR data have been downloaded from the LTA. Only ~8% of these data were downloaded by non-proprietary users after the data became publicly available. However, the volume of downloaded non-proprietary data has increased significantly during the past few semesters. This is expected, as more data have become publicly available, including data from the LOFAR Two-metre Sky Survey (LoTSS), which will eventually cover the whole northern sky.

Volume of proprietary and non-proprietary data downloaded from the LTA in each LOFAR cycle.

Since 2017, there has been a strong increase in the volume of downloaded interferometric data. This is around the time when data processing was started on the grid at SURFsara for LoTSS as well as LoTSS co-observing projects. The drop in the volume of downloaded beamformed data during the past few semesters reflects the increase in the proportion of interferometric observations.

Volume of beamformed and interferometric data downloaded from the LTA

One caveat is that data retrieved from the LTA in a non-standard way, e.g. obtained directly from disk without having to stage the data, are not tracked in these plots. Also, some expert users of beamformed data based at ASTRON do not download their data from the LTA but copy them from CEP4. The download volumes for some communities may therefore be underestimated or missing from these plots.

© 2020 ASTRON