Tutorial for trXPS for energy calibration using core level side-bands#

Preparation#

Import necessary libraries#

[1]:
%load_ext autoreload
%autoreload 2

from pathlib import Path
import os

from sed import SedProcessor
from sed.dataset import dataset
import numpy as np

%matplotlib widget
import matplotlib.pyplot as plt

### for automatic peak finding
from scipy.signal import find_peaks

Get data paths#

If it is your beamtime, you can read the raw data and write to the processed directory. For the public data, you can not write to the processed directory.

The paths are such that if you are on Maxwell, it uses those. Otherwise, data is downloaded in the current directory from Zenodo: https://zenodo.org/records/12609441

[2]:
beamtime_dir = "/asap3/flash/gpfs/pg2/2023/data/11019101" # on Maxwell
if os.path.exists(beamtime_dir) and os.access(beamtime_dir, os.R_OK):
    path = beamtime_dir + "/raw/hdf/offline/fl1user3"
    buffer_path = beamtime_dir + "/processed/tutorial/"
else:
    # data_path can be defined and used to store the data in a specific location
    dataset.get("W110") # Put in Path to a storage of at least 10 GByte free space.
    path = dataset.dir
    buffer_path = path + "/processed/"
INFO - Not downloading W110 data as it already exists at "/home/runner/work/sed/sed/docs/tutorial/datasets/W110".
Set 'use_existing' to False if you want to download to a new location.
INFO - Using existing data path for "W110": "/home/runner/work/sed/sed/docs/tutorial/datasets/W110"
INFO - W110 data is already present.

Config setup#

Here, we get the path to the config file and set up the relevant directories. This can also be done directly in the config file.

[3]:
# pick the default configuration file for hextof@FLASH
config_file = Path('../sed/config/flash_example_config.yaml')
assert config_file.exists()
[4]:
# here we setup a dictionary that will be used to override the path configuration
config_override = {
    "core": {
        "beamtime_id": 11019101,
        "paths": {
            "data_raw_dir": path,
            "data_parquet_dir": buffer_path
        },
    },
}

Reference calibration from a bias series#

[5]:
sp_44455 = SedProcessor(runs=[44455], config=config_override, system_config=config_file)
sp_44455.add_jitter()
sp_44455.align_dld_sectors()
Folder config loaded from: [/home/runner/work/sed/sed/docs/tutorial/sed_config.yaml]
System config loaded from: [/home/runner/work/sed/sed/docs/sed/config/flash_example_config.yaml]
Default config loaded from: [/home/runner/work/sed/sed/sed/config/default.yaml]
Reading files: 0 new files of 6 total.
All files converted successfully!
Filling nan values...
loading complete in  0.04 s

find calibration parameters#

We now will fit the tof-energy relation. This is done by finding the maxima of a peak in the tof spectrum, and then fitting the square root relation to obtain the calibration parameters.

[6]:
axes = ['sampleBias','dldTimeSteps']
bins = [4, 250]
ranges = [[77.5,81.5],  [4050,4500]]
res = sp_44455.compute(bins=bins, axes=axes, ranges=ranges)
sp_44455.load_bias_series(binned_data=res)
[7]:
ranges=(4120, 4200)
ref_id=0
sp_44455.find_bias_peaks(ranges=ranges, ref_id=ref_id, apply=True)

We offset the reference energy by the different in bias voltages between this run and run 44498

[8]:
sp_44455.calibrate_energy_axis(
    ref_id=0,
    ref_energy=-34.9,
    method="lmfit",
    energy_scale='kinetic',
    d={'value':1.0,'min': .7, 'max':1.0, 'vary':True},
    t0={'value':5e-7, 'min': 1e-7, 'max': 1e-6, 'vary':True},
    E0={'value': 0., 'min': -200, 'max': 100, 'vary': True},
    verbose=True,
)
[[Fit Statistics]]
    # fitting method   = leastsq
    # function evals   = 46
    # data points      = 4
    # variables        = 3
    chi-square         = 0.00151332
    reduced chi-square = 0.00151332
    Akaike info crit   = -25.5189696
    Bayesian info crit = -27.3600865
[[Variables]]
    d:   0.80966772 +/- 1.56525760 (193.32%) (init = 1)
    t0:  4.0148e-07 +/- 1.6505e-07 (41.11%) (init = 5e-07)
    E0: -101.048293 +/- 12.5092127 (12.38%) (init = 0)
[[Correlations]] (unreported correlations are < 0.100)
    C(d, t0)  = -0.9999
    C(d, E0)  = -0.9998
    C(t0, E0) = +0.9995
Quality of Calibration:
E/TOF relationship:

Now that we have the calibration parameters, we can generate the energy axis for each spectrum

[9]:
sp_44455.save_energy_calibration("reference_calib.yaml")
Saved energy calibration parameters to "reference_calib.yaml".

Now we can use those parameters and load our trXPS data using the additional config file#

[10]:
run_number = 44498
sp_44498 = SedProcessor(runs=[run_number], config=config_override, folder_config="reference_calib.yaml", system_config=config_file, verbose=True)
sp_44498.add_jitter()
sp_44498.append_energy_axis()
Folder config loaded from: [/home/runner/work/sed/sed/docs/tutorial/reference_calib.yaml]
System config loaded from: [/home/runner/work/sed/sed/docs/sed/config/flash_example_config.yaml]
Default config loaded from: [/home/runner/work/sed/sed/sed/config/default.yaml]
Reading files: 0 new files of 14 total.
All files converted successfully!
Filling nan values...
loading complete in  0.04 s
Adding energy column to dataframe:
Using energy calibration parameters generated on 12/21/2024, 23:40:18
Dask DataFrame Structure:
               trainId pulseId electronId timeStamp  dldPosX  dldPosY dldTimeSteps cryoTemperature dldTimeBinSize extractorCurrent extractorVoltage sampleBias sampleTemperature tofVoltage pulserSignAdc monochromatorPhotonEnergy   gmdBda      bam delayStage dldSectorID   energy
npartitions=14
                uint32   int64      int64   float64  float64  float64      float64         float32        float32          float32          float32    float32           float32    float32       float32                   float32  float32  float32    float32        int8  float64
                   ...     ...        ...       ...      ...      ...          ...             ...            ...              ...              ...        ...               ...        ...           ...                       ...      ...      ...        ...         ...      ...
...                ...     ...        ...       ...      ...      ...          ...             ...            ...              ...              ...        ...               ...        ...           ...                       ...      ...      ...        ...         ...      ...
                   ...     ...        ...       ...      ...      ...          ...             ...            ...              ...              ...        ...               ...        ...           ...                       ...      ...      ...        ...         ...      ...
                   ...     ...        ...       ...      ...      ...          ...             ...            ...              ...              ...        ...               ...        ...           ...                       ...      ...      ...        ...         ...      ...
Dask Name: assign, 17 graph layers

And bin an energy spectrum for reference

[11]:
axes = ['energy']
ranges = [[-37.5,-27.5]]
bins = [200]
res_ref = sp_44498.compute(bins=bins, axes=axes, ranges=ranges)

plt.figure()
res_ref.plot()
[11]:
[<matplotlib.lines.Line2D at 0x7fe52f74ebe0>]

Energy calibration using side-band peaks#

Visualize trXPS data bin in the dldTimeSteps and the corrected delay axis to prepare for energy calibration using SB#

We now prepare for an alternative energy calibration based on the side-bands of the time-dependent dataset. This is e.g. helpful if no bias series has been obtained.

[12]:
run_number = 44498
sp_44498 = SedProcessor(runs=[run_number], config=config_override, system_config=config_file, verbose=True)
sp_44498.add_jitter()
Folder config loaded from: [/home/runner/work/sed/sed/docs/tutorial/sed_config.yaml]
System config loaded from: [/home/runner/work/sed/sed/docs/sed/config/flash_example_config.yaml]
Default config loaded from: [/home/runner/work/sed/sed/sed/config/default.yaml]
Reading files: 0 new files of 14 total.
All files converted successfully!
Filling nan values...
loading complete in  0.04 s

We correct delay stage, t0 position and BAM (see previous tutorial)#

[13]:
sp_44498.add_delay_offset(
    constant=-1448, # this is time zero position determined from side band fit
    flip_delay_axis=True, # invert the direction of the delay axis
    columns=['bam'], # use the bam to offset the values
    weights=[-0.001], # bam is in fs, delay in ps
    preserve_mean=True # preserve the mean of the delay axis to keep t0 position
)
Adding delay offset to dataframe:
Delay offset parameters:
   Column[bam]: Weight=-0.001, Preserve Mean: True,  Reductions: None.
   Constant: -1448
   Flip delay axis: True
Dask DataFrame Structure:
               trainId pulseId electronId timeStamp  dldPosX  dldPosY dldTimeSteps cryoTemperature dldTimeBinSize extractorCurrent extractorVoltage sampleBias sampleTemperature tofVoltage pulserSignAdc monochromatorPhotonEnergy   gmdBda      bam delayStage dldSectorID
npartitions=14
                uint32   int64      int64   float64  float64  float64      float64         float32        float32          float32          float32    float32           float32    float32       float32                   float32  float32  float32    float64        int8
                   ...     ...        ...       ...      ...      ...          ...             ...            ...              ...              ...        ...               ...        ...           ...                       ...      ...      ...        ...         ...
...                ...     ...        ...       ...      ...      ...          ...             ...            ...              ...              ...        ...               ...        ...           ...                       ...      ...      ...        ...         ...
                   ...     ...        ...       ...      ...      ...          ...             ...            ...              ...              ...        ...               ...        ...           ...                       ...      ...      ...        ...         ...
                   ...     ...        ...       ...      ...      ...          ...             ...            ...              ...              ...        ...               ...        ...           ...                       ...      ...      ...        ...         ...
Dask Name: assign, 26 graph layers
[14]:
axes = ['dldTimeSteps', 'delayStage']
ranges = [[3900,4200], [-1.5,1.5]]
bins = [100,60]
res_corr = sp_44498.compute(bins=bins, axes=axes, ranges=ranges, normalize_to_acquisition_time="delayStage")

fig,ax = plt.subplots(1,2,figsize=(8,3), layout='constrained')
fig.suptitle(f"Run {run_number}: W 4f, side bands")
res_corr.plot(robust=True, ax=ax[0], cmap='terrain')
ax[0].set_title('raw')
bg = res_corr.sel(delayStage=slice(-1.3,-1.0)).mean('delayStage')
(res_corr-bg).plot(robust=True, ax=ax[1])
ax[1].set_title('difference')
Calculate normalization histogram for axis 'delayStage'...
[14]:
Text(0.5, 1.0, 'difference')

Automatically extract number and position of peaks in the ROI around t0#

[15]:
# binned data
roi = slice(3980, 4160)
delay = slice(-0.5,0.5)
data = res_corr.sel(dldTimeSteps = roi, delayStage=delay).sum('delayStage')
distance = 7
peaks, _ = find_peaks(data, height=None, distance=distance)

p1SB = data[peaks]['dldTimeSteps'][0]
W4f5 = data[peaks]['dldTimeSteps'][1]
m1SB = data[peaks]['dldTimeSteps'][2]
W4f7 = data[peaks]['dldTimeSteps'][3]
mm1SB = data[peaks]['dldTimeSteps'][4]
plt.figure()
data.plot()
plt.scatter(data[peaks]['dldTimeSteps'], data[peaks], c='r')#, "x")
plt.vlines([p1SB-7,p1SB+7], 0, 150, color='violet', linestyles='dashed', label='$1^{st}$ order SB')
plt.vlines([W4f5-7,W4f5+7], 0, 150, color='b', linestyles='dashed', label='W 4f 7/2')
plt.vlines([m1SB-7,m1SB+7], 0, 150, color='g', linestyles='dashed', label='$-1^{st}$ order SB')
plt.vlines([W4f7-7,W4f7+7], 0, 150, color='r', linestyles='dashed', label='W 4f 5/2')
plt.vlines([mm1SB-7,mm1SB+7], 0, 150, color='orange', linestyles='dashed', label='$2nd -1^{st}$ order SB')
plt.legend()
plt.show()

find calibration parameters#

We now will fit the tof-energy relation. This is done using the maxima of a peak in the ToF spectrum and the known kinetic energy of those peaks (kinetic energy of e.g. W4f peaks (-31.4 and -33.6 eV) and their SB of different orders accounting energy of pump beam of 1030 nm = 1.2 eV. The calibration parameters are obtained by fitting the square root relation.

[16]:
### Kinetic energy of w4f peaks and their SB
ref_energy = -31.4
ref_id = 1
sp_44498.ec.biases = -1*np.array([-30.2,-31.4,-32.6,-33.6,-34.8])
sp_44498.ec.peaks = np.expand_dims(data[peaks]['dldTimeSteps'].data,1)
sp_44498.ec.tof = res_corr.dldTimeSteps.data

sp_44498.calibrate_energy_axis(
    ref_id=ref_id,
    ref_energy=ref_energy,
    method="lmfit",
    d={'value':1.0,'min': .8, 'max':1.0, 'vary':True},
    t0={'value':5e-7, 'min': 1e-7, 'max': 1e-6, 'vary':True},
    E0={'value': -100., 'min': -200, 'max': 15, 'vary': True},
    labels="",
    verbose=True)
[[Fit Statistics]]
    # fitting method   = leastsq
    # function evals   = 123
    # data points      = 5
    # variables        = 3
    chi-square         = 0.04811488
    reduced chi-square = 0.02405744
    Akaike info crit   = -17.2180090
    Bayesian info crit = -18.3896953
[[Variables]]
    d:   0.80482246 +/- 0.56768800 (70.54%) (init = 1)
    t0:  4.0567e-07 +/- 2.9002e-07 (71.49%) (init = 5e-07)
    E0: -59.1600349 +/- 29.3415291 (49.60%) (init = -100)
[[Correlations]] (unreported correlations are < 0.100)
    C(d, t0)  = -0.9999
    C(d, E0)  = -0.9997
    C(t0, E0) = +0.9992
E/TOF relationship:

Append energy axis into a data frame, bin and visualize data in the calibrated energy and corrected delay axis#

[17]:
sp_44498.append_energy_axis()
Adding energy column to dataframe:
Using energy calibration parameters generated on 12/21/2024, 23:40:28
Dask DataFrame Structure:
               trainId pulseId electronId timeStamp  dldPosX  dldPosY dldTimeSteps cryoTemperature dldTimeBinSize extractorCurrent extractorVoltage sampleBias sampleTemperature tofVoltage pulserSignAdc monochromatorPhotonEnergy   gmdBda      bam delayStage dldSectorID   energy
npartitions=14
                uint32   int64      int64   float64  float64  float64      float64         float32        float32          float32          float32    float32           float32    float32       float32                   float32  float32  float32    float64        int8  float64
                   ...     ...        ...       ...      ...      ...          ...             ...            ...              ...              ...        ...               ...        ...           ...                       ...      ...      ...        ...         ...      ...
...                ...     ...        ...       ...      ...      ...          ...             ...            ...              ...              ...        ...               ...        ...           ...                       ...      ...      ...        ...         ...      ...
                   ...     ...        ...       ...      ...      ...          ...             ...            ...              ...              ...        ...               ...        ...           ...                       ...      ...      ...        ...         ...      ...
                   ...     ...        ...       ...      ...      ...          ...             ...            ...              ...              ...        ...               ...        ...           ...                       ...      ...      ...        ...         ...      ...
Dask Name: assign, 36 graph layers
[18]:
axes = ['energy', 'delayStage']
ranges = [[-37.5,-27.5], [-1.5,1.5]]
bins = [200,60]
res_corr = sp_44498.compute(bins=bins, axes=axes, ranges=ranges, normalize_to_acquisition_time="delayStage")

fig,ax = plt.subplots(1,2,figsize=(8,3), layout='constrained')
fig.suptitle(f"Run {run_number}: W 4f, side bands")
res_corr.plot(robust=True, ax=ax[0], cmap='terrain')
ax[0].set_title('raw')
bg = res_corr.sel(delayStage=slice(-1.3,-1.0)).mean('delayStage')
(res_corr-bg).plot(robust=True, ax=ax[1])
ax[1].set_title('difference')
Calculate normalization histogram for axis 'delayStage'...
[18]:
Text(0.5, 1.0, 'difference')

Compare to reference#

While this calibration methods gives a reasonable approximation to the energy axis, there are some deviations to the bias method, so it should be used with care

[19]:
axes = ['energy']
ranges = [[-37.5,-27.5]]
bins = [200]
res_1D = sp_44498.compute(bins=bins, axes=axes, ranges=ranges)

plt.figure()
(res_ref/res_ref.max()).plot(label="bias series calibration")
(res_1D/res_1D.max()).plot(label="side band calibration")
plt.legend()
[19]:
<matplotlib.legend.Legend at 0x7fe4f40d0df0>
[ ]: