Tutorial for trXPS for energy calibration using core level side-bands#
Preparation#
Import necessary libraries#
[1]:
%load_ext autoreload
%autoreload 2
from pathlib import Path
import os
from sed import SedProcessor
from sed.dataset import dataset
import numpy as np
%matplotlib widget
import matplotlib.pyplot as plt
### for automatic peak finding
from scipy.signal import find_peaks
Get data paths#
If it is your beamtime, you can read the raw data and write to the processed directory. For the public data, you can not write to the processed directory.
The paths are such that if you are on Maxwell, it uses those. Otherwise, data is downloaded in the current directory from Zenodo: https://zenodo.org/records/12609441
[2]:
beamtime_dir = "/asap3/flash/gpfs/pg2/2023/data/11019101" # on Maxwell
if os.path.exists(beamtime_dir) and os.access(beamtime_dir, os.R_OK):
path = beamtime_dir + "/raw/hdf/offline/fl1user3"
buffer_path = beamtime_dir + "/processed/tutorial/"
else:
# data_path can be defined and used to store the data in a specific location
dataset.get("W110") # Put in Path to a storage of at least 10 GByte free space.
path = dataset.dir
buffer_path = path + "/processed/"
INFO - Not downloading W110 data as it already exists at "/home/runner/work/sed/sed/docs/tutorial/datasets/W110".
Set 'use_existing' to False if you want to download to a new location.
INFO - Using existing data path for "W110": "/home/runner/work/sed/sed/docs/tutorial/datasets/W110"
INFO - W110 data is already present.
Config setup#
Here, we get the path to the config file and set up the relevant directories. This can also be done directly in the config file.
[3]:
# pick the default configuration file for hextof@FLASH
config_file = Path('../src/sed/config/flash_example_config.yaml')
assert config_file.exists()
[4]:
# here we setup a dictionary that will be used to override the path configuration
config_override = {
"core": {
"beamtime_id": 11019101,
"paths": {
"raw": path,
"processed": buffer_path
},
},
}
Reference calibration from a bias series#
[5]:
sp_44455 = SedProcessor(runs=[44455], config=config_override, system_config=config_file)
sp_44455.add_jitter()
sp_44455.align_dld_sectors()
INFO - Folder config loaded from: [/home/runner/work/sed/sed/docs/tutorial/sed_config.yaml]
INFO - System config loaded from: [/home/runner/work/sed/sed/docs/src/sed/config/flash_example_config.yaml]
INFO - Default config loaded from: [/opt/hostedtoolcache/Python/3.10.16/x64/lib/python3.10/site-packages/sed/config/default.yaml]
INFO - Reading files: 0 new files of 6 total.
loading complete in 0.08 s
INFO - add_jitter: Added jitter to columns ['dldPosX', 'dldPosY', 'dldTimeSteps'].
INFO - Aligning 8s sectors of dataframe
INFO - Dask DataFrame Structure:
trainId pulseId electronId dldPosX dldPosY dldTimeSteps pulserSignAdc bam timeStamp monochromatorPhotonEnergy gmdBda delayStage sampleBias tofVoltage extractorVoltage extractorCurrent cryoTemperature sampleTemperature dldTimeBinSize dldSectorID
npartitions=6
uint32 int64 int64 float64 float64 float32 float32 float32 float64 float32 float32 float32 float32 float32 float32 float32 float32 float32 float32 int8
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
Dask Name: assign, 16 graph layers
find calibration parameters#
We now will fit the tof-energy relation. This is done by finding the maxima of a peak in the tof spectrum, and then fitting the square root relation to obtain the calibration parameters.
[6]:
axes = ['sampleBias','dldTimeSteps']
bins = [4, 250]
ranges = [[77.5,81.5], [4050,4500]]
res = sp_44455.compute(bins=bins, axes=axes, ranges=ranges)
sp_44455.load_bias_series(binned_data=res)
[7]:
ranges=(4120, 4200)
ref_id=0
sp_44455.find_bias_peaks(ranges=ranges, ref_id=ref_id, apply=True)
INFO - Use feature ranges: [(4120.2, 4199.4), (4156.2, 4231.8), (4195.8, 4282.2), (4237.2, 4325.4)].
INFO - Extracted energy features: [[4.1472e+03 1.0000e+00]
[4.1850e+03 1.0000e+00]
[4.2246e+03 1.0000e+00]
[4.2678e+03 1.0000e+00]].
[8]:
sp_44455.calibrate_energy_axis(
ref_energy=-31.4,
method="lmfit",
energy_scale='kinetic',
d={'value':1.0,'min': .7, 'max':1.0, 'vary':True},
t0={'value':5e-7, 'min': 1e-7, 'max': 1e-6, 'vary':True},
E0={'value': 0., 'min': -200, 'max': 100, 'vary': True},
)
INFO - [[Fit Statistics]]
# fitting method = leastsq
# function evals = 46
# data points = 4
# variables = 3
chi-square = 0.00151332
reduced chi-square = 0.00151332
Akaike info crit = -25.5189696
Bayesian info crit = -27.3600865
[[Variables]]
d: 0.80966772 +/- 1.56525760 (193.32%) (init = 1)
t0: 4.0148e-07 +/- 1.6505e-07 (41.11%) (init = 5e-07)
E0: -101.048293 +/- 12.5092127 (12.38%) (init = 0)
[[Correlations]] (unreported correlations are < 0.100)
C(d, t0) = -0.9999
C(d, E0) = -0.9998
C(t0, E0) = +0.9995
Now that we have the calibration parameters, we can generate the energy axis for each spectrum
[9]:
sp_44455.save_energy_calibration("reference_calib.yaml")
INFO - Saved energy calibration parameters to "reference_calib.yaml".
Now we can use those parameters and load our trXPS data using the additional config file#
To obtain a correct energy axis, we offset the energy axis by the difference of photon energy between this run and the energy calibration runs
[10]:
run_number = 44498
sp_44498 = SedProcessor(runs=[run_number], config=config_override, folder_config="reference_calib.yaml", system_config=config_file, verbose=True)
sp_44498.add_jitter()
sp_44498.append_energy_axis()
sp_44498.add_energy_offset(
constant=1,
columns=['monochromatorPhotonEnergy','tofVoltage'],
weights=[-1,-1],
preserve_mean=[True, True],
)
INFO - Folder config loaded from: [/home/runner/work/sed/sed/docs/tutorial/reference_calib.yaml]
INFO - System config loaded from: [/home/runner/work/sed/sed/docs/src/sed/config/flash_example_config.yaml]
INFO - Default config loaded from: [/opt/hostedtoolcache/Python/3.10.16/x64/lib/python3.10/site-packages/sed/config/default.yaml]
INFO - Reading files: 0 new files of 14 total.
loading complete in 0.08 s
INFO - add_jitter: Added jitter to columns ['dldPosX', 'dldPosY', 'dldTimeSteps'].
INFO - Adding energy column to dataframe:
INFO - Using energy calibration parameters generated on 02/05/2025, 22:09:16
INFO - Dask DataFrame Structure:
trainId pulseId electronId dldPosX dldPosY dldTimeSteps pulserSignAdc bam timeStamp monochromatorPhotonEnergy gmdBda delayStage sampleBias tofVoltage extractorVoltage extractorCurrent cryoTemperature sampleTemperature dldTimeBinSize dldSectorID energy
npartitions=14
uint32 int64 int64 float64 float64 float64 float32 float32 float64 float32 float32 float32 float32 float32 float32 float32 float32 float32 float32 int8 float64
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
Dask Name: assign, 29 graph layers
INFO - Adding energy offset to dataframe:
INFO - Energy offset parameters:
Column[monochromatorPhotonEnergy]: Weight=-1, Preserve Mean: True, Reductions: None.
Column[tofVoltage]: Weight=-1, Preserve Mean: True, Reductions: None.
Constant: 1
INFO - Dask DataFrame Structure:
trainId pulseId electronId dldPosX dldPosY dldTimeSteps pulserSignAdc bam timeStamp monochromatorPhotonEnergy gmdBda delayStage sampleBias tofVoltage extractorVoltage extractorCurrent cryoTemperature sampleTemperature dldTimeBinSize dldSectorID energy
npartitions=14
uint32 int64 int64 float64 float64 float64 float32 float32 float64 float32 float32 float32 float32 float32 float32 float32 float32 float32 float32 int8 float64
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
Dask Name: assign, 62 graph layers
And bin an energy spectrum for reference
[11]:
axes = ['energy']
ranges = [[-37.5,-27.5]]
bins = [200]
res_ref = sp_44498.compute(bins=bins, axes=axes, ranges=ranges)
plt.figure()
res_ref.plot()
[11]:
[<matplotlib.lines.Line2D at 0x7fae0409ba00>]
Energy calibration using side-band peaks#
Visualize trXPS data bin in the dldTimeSteps and the corrected delay axis to prepare for energy calibration using SB#
We now prepare for an alternative energy calibration based on the side-bands of the time-dependent dataset. This is e.g. helpful if no bias series has been obtained.
[12]:
run_number = 44498
sp_44498 = SedProcessor(runs=[run_number], config=config_override, system_config=config_file, verbose=True)
sp_44498.add_jitter()
INFO - Folder config loaded from: [/home/runner/work/sed/sed/docs/tutorial/sed_config.yaml]
INFO - System config loaded from: [/home/runner/work/sed/sed/docs/src/sed/config/flash_example_config.yaml]
INFO - Default config loaded from: [/opt/hostedtoolcache/Python/3.10.16/x64/lib/python3.10/site-packages/sed/config/default.yaml]
INFO - Reading files: 0 new files of 14 total.
loading complete in 0.08 s
INFO - add_jitter: Added jitter to columns ['dldPosX', 'dldPosY', 'dldTimeSteps'].
We correct delay stage, t0 position and BAM (see previous tutorial)#
[13]:
sp_44498.add_delay_offset(
constant=-1448, # this is time zero position determined from side band fit
flip_delay_axis=True, # invert the direction of the delay axis
columns=['bam'], # use the bam to offset the values
weights=[-0.001], # bam is in fs, delay in ps
preserve_mean=True # preserve the mean of the delay axis to keep t0 position
)
INFO - Adding delay offset to dataframe:
INFO - Delay offset parameters:
Column[bam]: Weight=-0.001, Preserve Mean: True, Reductions: None.
Constant: -1448
Flip delay axis: True
INFO - Dask DataFrame Structure:
trainId pulseId electronId dldPosX dldPosY dldTimeSteps pulserSignAdc bam timeStamp monochromatorPhotonEnergy gmdBda delayStage sampleBias tofVoltage extractorVoltage extractorCurrent cryoTemperature sampleTemperature dldTimeBinSize dldSectorID
npartitions=14
uint32 int64 int64 float64 float64 float64 float32 float32 float64 float32 float32 float64 float32 float32 float32 float32 float32 float32 float32 int8
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
Dask Name: assign, 34 graph layers
[14]:
axes = ['dldTimeSteps', 'delayStage']
ranges = [[3900,4200], [-1.5,1.5]]
bins = [100,60]
res_corr = sp_44498.compute(bins=bins, axes=axes, ranges=ranges, normalize_to_acquisition_time="delayStage")
fig,ax = plt.subplots(1,2,figsize=(8,3), layout='constrained')
fig.suptitle(f"Run {run_number}: W 4f, side bands")
res_corr.plot(ax=ax[0], cmap='terrain')
ax[0].set_title('raw')
bg = res_corr.sel(delayStage=slice(-1.3,-1.0)).mean('delayStage')
(res_corr-bg).plot(ax=ax[1])
ax[1].set_title('difference')
INFO - Calculate normalization histogram for axis 'delayStage'...
[14]:
Text(0.5, 1.0, 'difference')
Automatically extract number and position of peaks in the ROI around t0#
[15]:
# binned data
roi = slice(3980, 4160)
delay = slice(-0.5,0.5)
data = res_corr.sel(dldTimeSteps = roi, delayStage=delay).sum('delayStage')
distance = 7
peaks, _ = find_peaks(data, height=None, distance=distance)
p1SB = data[peaks]['dldTimeSteps'][0]
W4f5 = data[peaks]['dldTimeSteps'][1]
m1SB = data[peaks]['dldTimeSteps'][2]
W4f7 = data[peaks]['dldTimeSteps'][3]
mm1SB = data[peaks]['dldTimeSteps'][4]
plt.figure()
data.plot()
plt.scatter(data[peaks]['dldTimeSteps'], data[peaks], c='r')#, "x")
plt.vlines([p1SB-7,p1SB+7], 0, 150, color='violet', linestyles='dashed', label='$1^{st}$ order SB')
plt.vlines([W4f5-7,W4f5+7], 0, 150, color='b', linestyles='dashed', label='W 4f 7/2')
plt.vlines([m1SB-7,m1SB+7], 0, 150, color='g', linestyles='dashed', label='$-1^{st}$ order SB')
plt.vlines([W4f7-7,W4f7+7], 0, 150, color='r', linestyles='dashed', label='W 4f 5/2')
plt.vlines([mm1SB-7,mm1SB+7], 0, 150, color='orange', linestyles='dashed', label='$2nd -1^{st}$ order SB')
plt.legend()
plt.show()
find calibration parameters#
We now will fit the tof-energy relation. This is done using the maxima of a peak in the ToF spectrum and the known kinetic energy of those peaks (kinetic energy of e.g. W4f peaks (-31.4 and -33.6 eV) and their SB of different orders accounting energy of pump beam of 1030 nm = 1.2 eV. The calibration parameters are obtained by fitting the square root relation.
[16]:
### Kinetic energy of w4f peaks and their SB
ref_energy = -30.2
sp_44498.ec.biases = -1*np.array([-30.2,-31.4,-32.6,-33.6,-34.8])
sp_44498.ec.peaks = np.expand_dims(data[peaks]['dldTimeSteps'].data,1)
sp_44498.ec.tof = res_corr.dldTimeSteps.data
sp_44498.calibrate_energy_axis(
ref_energy=ref_energy,
method="lmfit",
d={'value':1.0,'min': .8, 'max':1.0, 'vary':True},
t0={'value':5e-7, 'min': 1e-7, 'max': 1e-6, 'vary':True},
E0={'value': -100., 'min': -200, 'max': 15, 'vary': True},
)
INFO - [[Fit Statistics]]
# fitting method = leastsq
# function evals = 123
# data points = 5
# variables = 3
chi-square = 0.04811488
reduced chi-square = 0.02405744
Akaike info crit = -17.2180090
Bayesian info crit = -18.3896953
[[Variables]]
d: 0.80482246 +/- 0.56768800 (70.54%) (init = 1)
t0: 4.0567e-07 +/- 2.9002e-07 (71.49%) (init = 5e-07)
E0: -59.1600349 +/- 29.3415291 (49.60%) (init = -100)
[[Correlations]] (unreported correlations are < 0.100)
C(d, t0) = -0.9999
C(d, E0) = -0.9997
C(t0, E0) = +0.9992
Append energy axis into a data frame, bin and visualize data in the calibrated energy and corrected delay axis#
To get a correct energy axis, we undo the shifts imposed by the calibration function
[17]:
sp_44498.append_energy_axis()
sp_44498.add_energy_offset(
constant=30.2,
columns=['monochromatorPhotonEnergy','tofVoltage','sampleBias'],
weights=[-1,-1,-1],
preserve_mean=[True, True,False],
)
INFO - Adding energy column to dataframe:
INFO - Using energy calibration parameters generated on 02/05/2025, 22:09:27
INFO - Dask DataFrame Structure:
trainId pulseId electronId dldPosX dldPosY dldTimeSteps pulserSignAdc bam timeStamp monochromatorPhotonEnergy gmdBda delayStage sampleBias tofVoltage extractorVoltage extractorCurrent cryoTemperature sampleTemperature dldTimeBinSize dldSectorID energy
npartitions=14
uint32 int64 int64 float64 float64 float64 float32 float32 float64 float32 float32 float64 float32 float32 float32 float32 float32 float32 float32 int8 float64
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
Dask Name: assign, 49 graph layers
INFO - Adding energy offset to dataframe:
INFO - Energy offset parameters:
Column[monochromatorPhotonEnergy]: Weight=-1, Preserve Mean: True, Reductions: None.
Column[tofVoltage]: Weight=-1, Preserve Mean: True, Reductions: None.
Column[sampleBias]: Weight=-1, Preserve Mean: False, Reductions: None.
Constant: 30.2
INFO - Dask DataFrame Structure:
trainId pulseId electronId dldPosX dldPosY dldTimeSteps pulserSignAdc bam timeStamp monochromatorPhotonEnergy gmdBda delayStage sampleBias tofVoltage extractorVoltage extractorCurrent cryoTemperature sampleTemperature dldTimeBinSize dldSectorID energy
npartitions=14
uint32 int64 int64 float64 float64 float64 float32 float32 float64 float32 float32 float64 float32 float32 float32 float32 float32 float32 float32 int8 float64
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
Dask Name: assign, 87 graph layers
[18]:
axes = ['energy', 'delayStage']
ranges = [[-37.5,-27.5], [-1.5,1.5]]
bins = [200,60]
res_corr = sp_44498.compute(bins=bins, axes=axes, ranges=ranges, normalize_to_acquisition_time="delayStage")
fig,ax = plt.subplots(1,2,figsize=(8,3), layout='constrained')
fig.suptitle(f"Run {run_number}: W 4f, side bands")
res_corr.plot(ax=ax[0], cmap='terrain')
ax[0].set_title('raw')
bg = res_corr.sel(delayStage=slice(-1.3,-1.0)).mean('delayStage')
(res_corr-bg).plot(ax=ax[1])
ax[1].set_title('difference')
INFO - Calculate normalization histogram for axis 'delayStage'...
[18]:
Text(0.5, 1.0, 'difference')
Compare to reference#
While this calibration methods gives a reasonable approximation to the energy axis, there are some deviations to the bias method, so it should be used with care
[19]:
axes = ['energy']
ranges = [[-37.5,-27.5]]
bins = [200]
res_1D = sp_44498.compute(bins=bins, axes=axes, ranges=ranges)
plt.figure()
(res_ref/res_ref.max()).plot(label="bias series calibration")
(res_1D/res_1D.max()).plot(label="side band calibration")
plt.legend()
[19]:
<matplotlib.legend.Legend at 0x7fae041cbcd0>
[ ]: