Tutorial for binning data from the HEXTOF instrument at FLASH#

Preparation#

Import necessary libraries#

[1]:

%load_ext autoreload
%autoreload 2
from typing import List
from pathlib import Path
import os

from sed import SedProcessor
from sed.dataset import dataset
import xarray as xr

%matplotlib widget
import matplotlib.pyplot as plt

Get data paths#

The paths are such that if you are on Maxwell, it uses those. Otherwise data is downloaded in current directory from Zenodo.

Generally, if it is your beamtime, you can both read the raw data and write to processed directory. However, for the public data, you can not write to processed directory.

[2]:

beamtime_dir = "/asap3/flash/gpfs/pg2/2023/data/11019101" # on Maxwell
if os.path.exists(beamtime_dir) and os.access(beamtime_dir, os.R_OK):
    path = beamtime_dir + "/raw/hdf/offline/fl1user3"
    meta_path = beamtime_dir + "/shared"
    buffer_path = "Gd_W110/processed/"
else:
    # data_path can be defined and used to store the data in a specific location
    dataset.get("Gd_W110") # Put in Path to a storage of at least 10 GByte free space.
    path = dataset.dir
    meta_path = path
    buffer_path = path + "/processed/"

INFO - Not downloading Gd_W110 data as it already exists at "/home/runner/work/sed/sed/docs/tutorial/datasets/Gd_W110".
Set 'use_existing' to False if you want to download to a new location.
INFO - Using existing data path for "Gd_W110": "/home/runner/work/sed/sed/docs/tutorial/datasets/Gd_W110"
INFO - Gd_W110 data is already present.

Config setup#

Here we get the path to the config file and setup the relevant directories. This can also be done directly in the config file.

[3]:

# pick the default configuration file for hextof@FLASH
config_file = Path('../src/sed/config/flash_example_config.yaml')
assert config_file.exists()

The path to the processed folder can also be defined as a keyword argument later.

[4]:

# here we setup a dictionary that will be used to override the path configuration
config_override = {
    "core": {
        "paths": {
            "raw": path,
            "processed": buffer_path,
        },
    },
}

cleanup previous config files#

In this notebook, we will show how calibration parameters can be generated. Therefore we want to clean the local directory of previously generated files.

WARNING running the cell below will delete the “sed_config.yaml” file in the local directory. If these contain precious calibration parameters, DO NOT RUN THIS CELL.

[5]:

local_folder_config = Path('./sed_config.yaml')
if local_folder_config.exists():
    os.remove(local_folder_config)
    print(f'deleted local config file {local_folder_config}')
assert not local_folder_config.exists()

deleted local config file sed_config.yaml

Load a chessy sample run#

The common starting point at a FLASH beamtime. Look at the Chessy sample!

run 44762: Chessy - FoV = 450 µm

Generate the Processor instance#

this cell generates an instance of the SedProcessor class. It will be our workhorse for the entire workflow.

Important note#

The following extra arguments are available for FlashLoader. None of which are necessary to give but helpful to know.

force_recreate: Probably the most useful. In case the config is changed, this allows to reduce the raw h5 files to the the intermediate parquet format again. Otherwise, the schema between the saved dataframe and config differs.
debug: Setting this runs the reduction process in serial, so the errors are easier to find.
remove_invalid_files: Sometimes some critical channels defined in the config are missing in some raw files. Setting this will make sure to ignore such files.
filter_timed_by_electron: Defaults to True. When True, the timed dataframe will only contain data points where valid electron events were detected. When False, all timed data points are included regardless of electron detection (see OpenCOMPES/sed#307)
processed_dir: Location to save the reduced parquet files.
scicat_token: Token from your scicat account.
detector: ‘1Q’ and ‘4Q’ detector for example. Useful when there are separate raw files for each detector.

[6]:

sp = SedProcessor(runs=[44762], config=config_override, system_config=config_file, collect_metadata=False)
# You can set collect_metadata=True if the scicat_url and scicat_token are defined

INFO - System config loaded from: [/home/runner/work/sed/sed/docs/src/sed/config/flash_example_config.yaml]
INFO - Default config loaded from: [/opt/hostedtoolcache/Python/3.10.18/x64/lib/python3.10/site-packages/sed/config/default.yaml]
INFO - Reading files: 0 new files of 1 total.

Add Jitter#

In order to avoid artifacts arising from incommensurate binning sizes with those imposed during data collection, e.g. by the detector, we jitter all the digital columns.

[7]:

sp.add_jitter()

INFO - add_jitter: Added jitter to columns ['dldPosX', 'dldPosY', 'dldTimeSteps'].

inspect the dataframe#

Looking at the dataframe can give quick insight about the columns loaded and the data available.

sp.dataframe shows the structure of the dataframe without computing anything. Interesting here are the columns, and their type.
The sp.dataframe.head() function accesses the first 5 events in the dataframe, giving us a view of what the values of each column look like, without computing the whole thing. sp.dataframe.tail()does the same from the end.
sp.dataframe.compute() will compute the whole dataframe, and can take a while. We should avoid doing this.

[8]:

sp.dataframe

[8]:

Dask DataFrame Structure:

	trainId	pulseId	electronId	dldPosX	dldPosY	dldTimeSteps	bam	timeStamp	monochromatorPhotonEnergy	gmdBda	delayStage	sampleBias	tofVoltage	extractorVoltage	extractorCurrent	cryoTemperature	sampleTemperature	dldTimeBinSize	dldSectorID
npartitions=1
	uint32	int64	int64	float64	float64	float64	float32	float64	float32	float32	float32	float32	float32	float32	float32	float32	float32	float32	int8
	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...

Dask Name: apply_jitter, 14 graph layers

[9]:

sp.dataframe.head()

[9]:

	trainId	pulseId	electronId	dldPosX	dldPosY	dldTimeSteps	bam	timeStamp	monochromatorPhotonEnergy	gmdBda	delayStage	sampleBias	tofVoltage	extractorVoltage	extractorCurrent	cryoTemperature	sampleTemperature	dldTimeBinSize	dldSectorID
0	1646339970	12	0	781.310241	690.310241	3050.310241	8976.09375	1.679395e+09	51.023453	45.512794	1448.312988	-0.003489	29.999065	6029.370117	-0.070368	303.940002	304.940002	0.020576	7
1	1646339970	12	1	780.608687	690.608687	3048.608687	8976.09375	1.679395e+09	51.023453	45.512794	1448.312988	-0.003489	29.999065	6029.370117	-0.070368	303.940002	304.940002	0.020576	2
2	1646339970	22	0	562.146472	231.146472	5729.146472	8990.37500	1.679395e+09	51.023453	45.512794	1448.312988	-0.003489	29.999065	6029.370117	-0.070368	303.940002	304.940002	0.020576	3
3	1646339970	22	1	937.786909	946.786909	5729.786909	8990.37500	1.679395e+09	51.023453	45.512794	1448.312988	-0.003489	29.999065	6029.370117	-0.070368	303.940002	304.940002	0.020576	2
4	1646339970	27	0	535.525536	853.525536	1570.525536	8982.87500	1.679395e+09	51.023453	45.512794	1448.312988	-0.003489	29.999065	6029.370117	-0.070368	303.940002	304.940002	0.020576	0

Visualizing event histograms#

For getting a first impression of the data, and to determine binning ranges, the method sp.view_even_histogram() allows visualizing the events in one dataframe partition as histograms. Default axes and ranges are defined in the config, and show the dldPosX, dldPosY, and dldTimeStep columns:

[10]:

sp.view_event_histogram(dfpid=0)

Binning#

Here we define the parameters for binning the dataframe to an n-dimensional histogram, which we can then plot, analyze or save.

If you never saw this before, the type after : is a “hint” to what type the object to the left will have. We include them here to make sure you know what each variable should be.

a:int = 1 # a is an integer
b:float = 1.0 # b is a float
c:str = 1 # we hint c to be a string, but it is still an integer

This is totally optional, but can help you keep track of what you are doing.

[11]:

# the name of the axes on which we want to bin
axes: List[str] = ['dldPosY', 'dldPosX']
# the number of bins for each axis
bins: List[int] = [480, 480]
# for each axis, the range of values to consider
ranges: List[List[int]] = [[420,900], [420,900]]
# here we compute the histogram
res_chessy: xr.DataArray = sp.compute(bins=bins, axes=axes, ranges=ranges)

Optical Spot Profile#

Here we load runs 44798 and 44799, which show the profile of the optical spot on the same spatial view as in our chessy run above. The two differ in transmission, being \(T=1.0\) and \(T=0.5\) respectively.

[14]:

sp = SedProcessor(runs=[44798], config=config_override, system_config=config_file, collect_metadata=False)
sp.add_jitter()
res_t05: xr.DataArray = sp.compute(bins=bins, axes=axes, ranges=ranges)

sp = SedProcessor(runs=[44799], config=config_override, system_config=config_file, collect_metadata=False)
sp.add_jitter()
res_t10: xr.DataArray = sp.compute(bins=bins, axes=axes, ranges=ranges)

INFO - System config loaded from: [/home/runner/work/sed/sed/docs/src/sed/config/flash_example_config.yaml]
INFO - Default config loaded from: [/opt/hostedtoolcache/Python/3.10.18/x64/lib/python3.10/site-packages/sed/config/default.yaml]
INFO - Reading files: 0 new files of 1 total.
INFO - add_jitter: Added jitter to columns ['dldPosX', 'dldPosY', 'dldTimeSteps'].

INFO - System config loaded from: [/home/runner/work/sed/sed/docs/src/sed/config/flash_example_config.yaml]
INFO - Default config loaded from: [/opt/hostedtoolcache/Python/3.10.18/x64/lib/python3.10/site-packages/sed/config/default.yaml]
INFO - Reading files: 0 new files of 2 total.
INFO - add_jitter: Added jitter to columns ['dldPosX', 'dldPosY', 'dldTimeSteps'].

[15]:

fig,ax = plt.subplots(1,3,figsize=(6,2), layout='tight')
res_chessy.plot(ax=ax[0], robust=True, add_colorbar=False)
res_t05.plot(ax=ax[1], robust=True, add_colorbar=False)
res_t10.plot(ax=ax[2], robust=True, add_colorbar=False)

[15]:

<matplotlib.collections.QuadMesh at 0x7f35a48245e0>

TODO: here we can add the evaluation of the spot size.

Energy Calibration#

We now load a bias series, where the sample bias was varied, effectively shifting the energy spectra. This allows us to calibrate the conversion between the digital values of the dld and the energy.

[16]:

sp = SedProcessor(runs=[44797], config=config_override, system_config=config_file, collect_metadata=False)
sp.add_jitter()

INFO - System config loaded from: [/home/runner/work/sed/sed/docs/src/sed/config/flash_example_config.yaml]
INFO - Default config loaded from: [/opt/hostedtoolcache/Python/3.10.18/x64/lib/python3.10/site-packages/sed/config/default.yaml]
INFO - Reading files: 0 new files of 5 total.
INFO - add_jitter: Added jitter to columns ['dldPosX', 'dldPosY', 'dldTimeSteps'].

We can use the view_event_histogram() function also to e.g. visualize the events per microbunch along the train, or hit multiplicity per microbunch:

[17]:

sp.view_event_histogram(dfpid=0, axes=["pulseId", "electronId"], ranges=[[0, 600], [0,10]], bins=[100, 10])

sector alignment#

as usual first we jitter, but here we also align in time the 8 sectors of the dld. This is done by finding the time of the maximum of the signal in each sector, and then shifting the signal in each sector by the difference between the maximum time and the time of the maximum in each sector.

For better precision, the photon peak can be used to track the energy shift.

[18]:

sp.align_dld_sectors()

INFO - Aligning 8s sectors of dataframe
INFO - Dask DataFrame Structure:
              trainId pulseId electronId  dldPosX  dldPosY dldTimeSteps      bam timeStamp monochromatorPhotonEnergy   gmdBda delayStage sampleBias tofVoltage extractorVoltage extractorCurrent cryoTemperature sampleTemperature dldTimeBinSize dldSectorID
npartitions=5
               uint32   int64      int64  float64  float64      float32  float32   float64                   float32  float32    float32    float32    float32          float32          float32         float32           float32        float32        int8
                  ...     ...        ...      ...      ...          ...      ...       ...                       ...      ...        ...        ...        ...              ...              ...             ...               ...            ...         ...
...               ...     ...        ...      ...      ...          ...      ...       ...                       ...      ...        ...        ...        ...              ...              ...             ...               ...            ...         ...
                  ...     ...        ...      ...      ...          ...      ...       ...                       ...      ...        ...        ...        ...              ...              ...             ...               ...            ...         ...
                  ...     ...        ...      ...      ...          ...      ...       ...                       ...      ...        ...        ...        ...              ...              ...             ...               ...            ...         ...
Dask Name: assign, 16 graph layers

time-of-flight spectrum#

to compare with what we see on the measurement computer, we might want to plot the time-of-flight spectrum. This is done here.

[19]:

sp.append_tof_ns_axis()

INFO - Adding time-of-flight column in nanoseconds to dataframe.
INFO - Dask DataFrame Structure:
              trainId pulseId electronId  dldPosX  dldPosY dldTimeSteps      bam timeStamp monochromatorPhotonEnergy   gmdBda delayStage sampleBias tofVoltage extractorVoltage extractorCurrent cryoTemperature sampleTemperature dldTimeBinSize dldSectorID  dldTime
npartitions=5
               uint32   int64      int64  float64  float64      float32  float32   float64                   float32  float32    float32    float32    float32          float32          float32         float32           float32        float32        int8  float64
                  ...     ...        ...      ...      ...          ...      ...       ...                       ...      ...        ...        ...        ...              ...              ...             ...               ...            ...         ...      ...
...               ...     ...        ...      ...      ...          ...      ...       ...                       ...      ...        ...        ...        ...              ...              ...             ...               ...            ...         ...      ...
                  ...     ...        ...      ...      ...          ...      ...       ...                       ...      ...        ...        ...        ...              ...              ...             ...               ...            ...         ...      ...
                  ...     ...        ...      ...      ...          ...      ...       ...                       ...      ...        ...        ...        ...              ...              ...             ...               ...            ...         ...      ...
Dask Name: assign, 22 graph layers

Now, to determine proper binning ranges, let’s have again a look at the event histograms:

[20]:

sp.view_event_histogram(dfpid=0, axes=["sampleBias", "dldTime"], ranges=[[27, 33], [650,1050]], bins=[50, 100])

[21]:

axes = ['sampleBias','dldTime']
bins = [5, 250]
ranges = [[28,33],  [650,800]]
res = sp.compute(bins=bins, axes=axes, ranges=ranges)

We binned not only in dldTime but also in sampleBias. This allows us to separate the spectra obtained at different bias values.

[22]:

plt.figure()
res.plot.line(x='dldTime'); # the ; here is to suppress an annoying output

find calibration parameters#

We now will fit the tof-energy relation. This is done by finding the maxima of a peak in the tof spectrum, and then fitting the square root relation to obtain the calibration parameters.

[23]:

axes = ['sampleBias', 'dldTimeSteps']
bins = [5, 500]
ranges = [[28,33], [4000, 4800]]
res = sp.compute(bins=bins, axes=axes, ranges=ranges)

[24]:

sp.load_bias_series(binned_data=res)

[25]:

ranges=(4120, 4200)
ref_id=0
sp.find_bias_peaks(ranges=ranges, ref_id=ref_id, apply=True)

INFO - Use feature ranges: [(4120.0, 4200.0), (4156.8, 4238.4), (4195.2, 4286.4), (4236.8, 4329.6), (4280.0, 4374.4)].
INFO - Extracted energy features: [[4.1488e+03 1.0000e+00]
 [4.1872e+03 1.0000e+00]
 [4.2288e+03 1.0000e+00]
 [4.2704e+03 1.0000e+00]
 [4.3152e+03 1.0000e+00]].

[26]:

sp.calibrate_energy_axis(
    ref_energy=-.55,
    method="lmfit",
    energy_scale='kinetic',
    d={'value':1.0,'min': .2, 'max':1.0, 'vary':False},
    t0={'value':5e-7, 'min': 1e-7, 'max': 1e-6, 'vary':True},
    E0={'value': 0., 'min': -100, 'max': 100, 'vary': True},
)

INFO - [[Fit Statistics]]
    # fitting method   = leastsq
    # function evals   = 22
    # data points      = 5
    # variables        = 2
    chi-square         = 0.00130709
    reduced chi-square = 4.3570e-04
    Akaike info crit   = -37.2469349
    Bayesian info crit = -38.0280591
[[Variables]]
    d:   1 (fixed)
    t0:  3.5725e-07 +/- 7.4509e-10 (0.21%) (init = 5e-07)
    E0: -54.7889861 +/- 0.10961367 (0.20%) (init = 0)
[[Correlations]] (unreported correlations are < 0.100)
    C(t0, E0) = -0.9964

generate the energy axis#

Now that we have the calibration parameters, we can generate the energy axis for each spectrum

[27]:

sp.append_energy_axis()

INFO - Adding energy column to dataframe:
INFO - Using energy calibration parameters generated on 08/29/2025, 09:12:10
INFO - Dask DataFrame Structure:
              trainId pulseId electronId  dldPosX  dldPosY dldTimeSteps      bam timeStamp monochromatorPhotonEnergy   gmdBda delayStage sampleBias tofVoltage extractorVoltage extractorCurrent cryoTemperature sampleTemperature dldTimeBinSize dldSectorID  dldTime   energy
npartitions=5
               uint32   int64      int64  float64  float64      float32  float32   float64                   float32  float32    float32    float32    float32          float32          float32         float32           float32        float32        int8  float64  float64
                  ...     ...        ...      ...      ...          ...      ...       ...                       ...      ...        ...        ...        ...              ...              ...             ...               ...            ...         ...      ...      ...
...               ...     ...        ...      ...      ...          ...      ...       ...                       ...      ...        ...        ...        ...              ...              ...             ...               ...            ...         ...      ...      ...
                  ...     ...        ...      ...      ...          ...      ...       ...                       ...      ...        ...        ...        ...              ...              ...             ...               ...            ...         ...      ...      ...
                  ...     ...        ...      ...      ...          ...      ...       ...                       ...      ...        ...        ...        ...              ...              ...             ...               ...            ...         ...      ...      ...
Dask Name: assign, 37 graph layers

Lets have a look at the dataset, and the columns we added.

[28]:

sp.dataframe[['dldTime','dldTimeSteps','energy','dldSectorID']].head()

[28]:

	dldTime	dldTimeSteps	energy	dldSectorID
0	696.431900	4230.823730	-2.642492	1
1	696.703168	4232.471680	-2.681970	0
2	697.842892	4239.395508	-2.846807	1
3	697.829711	4239.315430	-2.844910	4
4	745.186563	4527.008301	-8.463299	1

Bin in energy#

With the newly added column, we can now bin directly in energy

[29]:

axes: List[str] = ['sampleBias', 'energy']
bins: List[int] = [5, 500]
ranges: List[List[int]] = [[28,33], [-10,10]]
res: xr.DataArray = sp.compute(bins=bins, axes=axes, ranges=ranges)

[30]:

plt.figure() # if you are using interactive plots, you'll need to generate a new figure explicitly every time.
res.mean('sampleBias').plot.line(x='energy',linewidth=3)
res.plot.line(x='energy',linewidth=1,alpha=.5);

correct offsets#

The energy axis is now correct, taking the sample bias of the measurement into account. Additionally, we can compensate the photon energy (monochromatorPhotonEnergy) and the tofVoltage.

[31]:

sp.add_energy_offset(
    columns=['monochromatorPhotonEnergy','tofVoltage'],
    weights=[-1,-1],
    preserve_mean=[True, True],
)

INFO - Adding energy offset to dataframe:
INFO - Energy offset parameters:
   Column[monochromatorPhotonEnergy]: Weight=-1, Preserve Mean: True, Reductions: None.
   Column[tofVoltage]: Weight=-1, Preserve Mean: True, Reductions: None.
INFO - Dask DataFrame Structure:
              trainId pulseId electronId  dldPosX  dldPosY dldTimeSteps      bam timeStamp monochromatorPhotonEnergy   gmdBda delayStage sampleBias tofVoltage extractorVoltage extractorCurrent cryoTemperature sampleTemperature dldTimeBinSize dldSectorID  dldTime   energy
npartitions=5
               uint32   int64      int64  float64  float64      float32  float32   float64                   float32  float32    float32    float32    float32          float32          float32         float32           float32        float32        int8  float64  float64
                  ...     ...        ...      ...      ...          ...      ...       ...                       ...      ...        ...        ...        ...              ...              ...             ...               ...            ...         ...      ...      ...
...               ...     ...        ...      ...      ...          ...      ...       ...                       ...      ...        ...        ...        ...              ...              ...             ...               ...            ...         ...      ...      ...
                  ...     ...        ...      ...      ...          ...      ...       ...                       ...      ...        ...        ...        ...              ...              ...             ...               ...            ...         ...      ...      ...
                  ...     ...        ...      ...      ...          ...      ...       ...                       ...      ...        ...        ...        ...              ...              ...             ...               ...            ...         ...      ...      ...
Dask Name: assign, 67 graph layers

Now we bin again and see the result

[32]:

axes = ['sampleBias', 'energy']
bins = [5, 500]
ranges = [[28,33], [-10,2]]
res = sp.compute(bins=bins, axes=axes, ranges=ranges)

[33]:

plt.figure()
ax = plt.subplot(111)
res.energy.attrs['unit'] = 'eV' # add units to the axes
res.mean('sampleBias').plot.line(x='energy',linewidth=3, ax=ax)
res.plot.line(x='energy',linewidth=1,alpha=.5,label='all',ax=ax);

save the calibration parameters#

The parameters we have found can be saved to a file, so that we can use them later. This means the calibration can be used for different runs.

[34]:

sp.save_energy_calibration()
sp.save_energy_offset()

INFO - Saved energy calibration parameters to "sed_config.yaml".
INFO - Saved energy offset parameters to "sed_config.yaml".

A more general function, which saves parameters for all the calibrations performed. Use either the above or below function. They are equivalent (and overwrite each other)

[35]:

sp.save_workflow_params()

INFO - Saved energy calibration parameters to "sed_config.yaml".
INFO - Saved energy offset parameters to "sed_config.yaml".

Correct delay axis#

To calibrate the pump-probe delay axis, we need to shift the delay stage values to center the pump-probe-time overlap time zero. Also, we want to correct the SASE jitter, using information from the bam column.

Here we load multiple runs at once

[36]:

sp = SedProcessor(
    runs=[44824,44825,44826,44827],
    config=config_override,
    system_config=config_file,
    collect_metadata=False,
)

INFO - Folder config loaded from: [/home/runner/work/sed/sed/docs/tutorial/sed_config.yaml]
INFO - System config loaded from: [/home/runner/work/sed/sed/docs/src/sed/config/flash_example_config.yaml]
INFO - Default config loaded from: [/opt/hostedtoolcache/Python/3.10.18/x64/lib/python3.10/site-packages/sed/config/default.yaml]
INFO - Reading files: 0 new files of 62 total.

Run the workflow from the config file#

as we have saved some calibration and correction parameters, we can now run the workflow from the config file. This is done by calling each of the correction functions, with no parameters. The functions will then load the parameters from the config file.

[37]:

sp.add_jitter()
sp.align_dld_sectors()
sp.append_energy_axis()
sp.add_energy_offset()

INFO - add_jitter: Added jitter to columns ['dldPosX', 'dldPosY', 'dldTimeSteps'].
INFO - Aligning 8s sectors of dataframe
INFO - Dask DataFrame Structure:
               trainId pulseId electronId  dldPosX  dldPosY dldTimeSteps      bam timeStamp monochromatorPhotonEnergy   gmdBda delayStage sampleBias tofVoltage extractorVoltage extractorCurrent cryoTemperature sampleTemperature dldTimeBinSize dldSectorID
npartitions=62
                uint32   int64      int64  float64  float64      float32  float32   float64                   float32  float32    float32    float32    float32          float32          float32         float32           float32        float32        int8
                   ...     ...        ...      ...      ...          ...      ...       ...                       ...      ...        ...        ...        ...              ...              ...             ...               ...            ...         ...
...                ...     ...        ...      ...      ...          ...      ...       ...                       ...      ...        ...        ...        ...              ...              ...             ...               ...            ...         ...
                   ...     ...        ...      ...      ...          ...      ...       ...                       ...      ...        ...        ...        ...              ...              ...             ...               ...            ...         ...
                   ...     ...        ...      ...      ...          ...      ...       ...                       ...      ...        ...        ...        ...              ...              ...             ...               ...            ...         ...
Dask Name: assign, 16 graph layers
INFO - Adding energy column to dataframe:
INFO - Using energy calibration parameters generated on 08/29/2025, 09:12:10
INFO - Dask DataFrame Structure:
               trainId pulseId electronId  dldPosX  dldPosY dldTimeSteps      bam timeStamp monochromatorPhotonEnergy   gmdBda delayStage sampleBias tofVoltage extractorVoltage extractorCurrent cryoTemperature sampleTemperature dldTimeBinSize dldSectorID   energy
npartitions=62
                uint32   int64      int64  float64  float64      float32  float32   float64                   float32  float32    float32    float32    float32          float32          float32         float32           float32        float32        int8  float64
                   ...     ...        ...      ...      ...          ...      ...       ...                       ...      ...        ...        ...        ...              ...              ...             ...               ...            ...         ...      ...
...                ...     ...        ...      ...      ...          ...      ...       ...                       ...      ...        ...        ...        ...              ...              ...             ...               ...            ...         ...      ...
                   ...     ...        ...      ...      ...          ...      ...       ...                       ...      ...        ...        ...        ...              ...              ...             ...               ...            ...         ...      ...
                   ...     ...        ...      ...      ...          ...      ...       ...                       ...      ...        ...        ...        ...              ...              ...             ...               ...            ...         ...      ...
Dask Name: assign, 31 graph layers
INFO - Adding energy offset to dataframe:
INFO - Using energy offset parameters generated on 08/29/2025, 09:12:13
INFO - Energy offset parameters:
   Column[monochromatorPhotonEnergy]: Weight=-1.0, Preserve Mean: True, Reductions: None.
   Column[tofVoltage]: Weight=-1.0, Preserve Mean: True, Reductions: None.
INFO - Dask DataFrame Structure:
               trainId pulseId electronId  dldPosX  dldPosY dldTimeSteps      bam timeStamp monochromatorPhotonEnergy   gmdBda delayStage sampleBias tofVoltage extractorVoltage extractorCurrent cryoTemperature sampleTemperature dldTimeBinSize dldSectorID   energy
npartitions=62
                uint32   int64      int64  float64  float64      float32  float32   float64                   float32  float32    float32    float32    float32          float32          float32         float32           float32        float32        int8  float64
                   ...     ...        ...      ...      ...          ...      ...       ...                       ...      ...        ...        ...        ...              ...              ...             ...               ...            ...         ...      ...
...                ...     ...        ...      ...      ...          ...      ...       ...                       ...      ...        ...        ...        ...              ...              ...             ...               ...            ...         ...      ...
                   ...     ...        ...      ...      ...          ...      ...       ...                       ...      ...        ...        ...        ...              ...              ...             ...               ...            ...         ...      ...
                   ...     ...        ...      ...      ...          ...      ...       ...                       ...      ...        ...        ...        ...              ...              ...             ...               ...            ...         ...      ...
Dask Name: assign, 61 graph layers

plot the delayStage values#

[38]:

axes = ['energy','delayStage']
bins = [100,150]
delay_start,delay_stop=1462.00,1464.85
ranges = [[-5,2], [delay_start,delay_stop]]
res = sp.compute(bins=bins, axes=axes, ranges=ranges)

[39]:

fig,ax = plt.subplots(1,2,figsize=(6,2.25), layout='constrained')
res.plot(robust=True, ax=ax[0])
bg = res.isel(delayStage=slice(0,10)).mean('delayStage')
(res-bg).plot(robust=True, ax=ax[1])

[39]:

<matplotlib.collections.QuadMesh at 0x7f3594d781c0>

[40]:

sp.add_delay_offset(
    constant=-1463.7, # this is time zero
    flip_delay_axis=True, # invert the direction of the delay axis
    columns=['bam'], # use the bam to offset the values
    weights=[-0.001], # bam is in fs, delay in ps
    preserve_mean=True # preserve the mean of the delay axis
)

INFO - Adding delay offset to dataframe:
INFO - Delay offset parameters:
   Column[bam]: Weight=-0.001, Preserve Mean: True, Reductions: None.
   Constant: -1463.7
   Flip delay axis: True
INFO - Dask DataFrame Structure:
               trainId pulseId electronId  dldPosX  dldPosY dldTimeSteps      bam timeStamp monochromatorPhotonEnergy   gmdBda delayStage sampleBias tofVoltage extractorVoltage extractorCurrent cryoTemperature sampleTemperature dldTimeBinSize dldSectorID   energy
npartitions=62
                uint32   int64      int64  float64  float64      float32  float32   float64                   float32  float32    float64    float32    float32          float32          float32         float32           float32        float32        int8  float64
                   ...     ...        ...      ...      ...          ...      ...       ...                       ...      ...        ...        ...        ...              ...              ...             ...               ...            ...         ...      ...
...                ...     ...        ...      ...      ...          ...      ...       ...                       ...      ...        ...        ...        ...              ...              ...             ...               ...            ...         ...      ...
                   ...     ...        ...      ...      ...          ...      ...       ...                       ...      ...        ...        ...        ...              ...              ...             ...               ...            ...         ...      ...
                   ...     ...        ...      ...      ...          ...      ...       ...                       ...      ...        ...        ...        ...              ...              ...             ...               ...            ...         ...      ...
Dask Name: assign, 81 graph layers

[41]:

sp.dataframe # This has generated too many layers, there is room for improvement!

[41]:

Dask DataFrame Structure:

	trainId	pulseId	electronId	dldPosX	dldPosY	dldTimeSteps	bam	timeStamp	monochromatorPhotonEnergy	gmdBda	delayStage	sampleBias	tofVoltage	extractorVoltage	extractorCurrent	cryoTemperature	sampleTemperature	dldTimeBinSize	dldSectorID	energy
npartitions=62
	uint32	int64	int64	float64	float64	float32	float32	float64	float32	float32	float64	float32	float32	float32	float32	float32	float32	float32	int8	float64
	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...

Dask Name: assign, 81 graph layers

bin in the corrected delay axis#

[42]:

axes = ['energy','delayStage']
bins = [100,150]
delay_start,delay_stop=1462.00,1464.85
ranges = [[-3,2], [-1.1, 1.75]]
res = sp.compute(bins=bins, axes=axes, ranges=ranges)

[43]:

fig,ax = plt.subplots(1,2,figsize=(6,2.25))
res.plot(robust=True, ax=ax[0])
bg = res.sel(delayStage=slice(-1,-0.2)).mean('delayStage')
(res-bg).plot(robust=True, ax=ax[1])
fig.tight_layout()

You may note some intensity variation along the delay axis. This comes mainly from inhomogeneous speed of the delay stage, and thus inequivalent amounts of time spent on every delay point. This can be corrected for by normalizing the data to the acquisition time per delay point:

[44]:

res = sp.compute(bins=bins, axes=axes, ranges=ranges, normalize_to_acquisition_time="delayStage")
fig,ax = plt.subplots(1,2,figsize=(6,2.25), layout='constrained')
res.plot(robust=True, ax=ax[0])
bg = res.sel(delayStage=slice(-1,-.2)).mean('delayStage')
(res-bg).plot(robust=True, ax=ax[1])

INFO - Calculate normalization histogram for axis 'delayStage'...

[44]:

<matplotlib.collections.QuadMesh at 0x7f3594b0bc40>

save parameters#

as before, we can save the parameters we just used in the config for the next run

[45]:

sp.save_delay_offsets()

INFO - Saved delay offset parameters to "sed_config.yaml".

Run workflow entirely from config.#

Once all the calibrations are done, a new run can be loaded by simply calling all the calibration functions.

[46]:

from sed.core.config import load_config
import numpy as np
metadata = load_config(meta_path + "/44824_20230324T060430.json")

# Fix metadata
metadata["scientificMetadata"]["Source"]["photon_energy"]["value"] = float(metadata["scientificMetadata"]["Source"]["photon_energy"]["value"])
metadata["scientificMetadata"]["Source"]["repetition_rate"]["value"] = float(metadata["scientificMetadata"]["Source"]["repetition_rate"]["value"])
metadata["scientificMetadata"]["Source"]["dispersion"]["value"] = float(metadata["scientificMetadata"]["Source"]["dispersion"]["value"])
metadata["scientificMetadata"]["Laser"]["wavelength"]["value"] = float(metadata["scientificMetadata"]["Laser"]["wavelength"]["value"][:-2])
metadata["scientificMetadata"]["Laser"]["pulse duration"]["value"] = float(metadata["scientificMetadata"]["Laser"]["pulse duration"]["value"])
metadata["scientificMetadata"]["Laser"]["pulse_energy"]["value"] = float(metadata["scientificMetadata"]["Laser"]["pulse_energy"]["value"])
metadata["scientificMetadata"]["Laser"]["energy"] = {"value": 1239.84/metadata["scientificMetadata"]["Laser"]["wavelength"]["value"], "unit": "eV"}
metadata["scientificMetadata"]["Laser"]["polarization"] = [1, 1, 0, 0]
metadata["scientificMetadata"]["Manipulator"]["sample_bias"] = float(metadata["scientificMetadata"]["Manipulator"]["sample_bias"])
metadata["scientificMetadata"]["Collection"]["tof_voltage"] = float(metadata["scientificMetadata"]["Collection"]["tof_voltage"])
metadata["scientificMetadata"]["Collection"]["extractor_voltage"] = float(metadata["scientificMetadata"]["Collection"]["extractor_voltage"])
metadata["scientificMetadata"]["Collection"]["field_aperture"] = float(metadata["scientificMetadata"]["Collection"]["field_aperture"])
metadata["scientificMetadata"]["Collection"]["field_aperture_x"] = float(metadata["scientificMetadata"]["Collection"]["field_aperture_x"])
metadata["scientificMetadata"]["Collection"]["field_aperture_y"] = float(metadata["scientificMetadata"]["Collection"]["field_aperture_y"])
metadata["pi"] = {"institute": "JGU Mainz"}
metadata["proposer"] = {"institute": "TU Dortmund"}

[47]:

sp = SedProcessor(
    runs=[44824,44825,44826,44827],
    config=config_override,
    system_config=config_file,
    metadata = metadata,
    collect_metadata=False,
)

INFO - Folder config loaded from: [/home/runner/work/sed/sed/docs/tutorial/sed_config.yaml]
INFO - System config loaded from: [/home/runner/work/sed/sed/docs/src/sed/config/flash_example_config.yaml]
INFO - Default config loaded from: [/opt/hostedtoolcache/Python/3.10.18/x64/lib/python3.10/site-packages/sed/config/default.yaml]
INFO - Reading files: 0 new files of 62 total.

[48]:

sp.add_jitter()
sp.align_dld_sectors()
sp.append_tof_ns_axis()
sp.append_energy_axis()
sp.add_energy_offset()
sp.add_delay_offset()

INFO - add_jitter: Added jitter to columns ['dldPosX', 'dldPosY', 'dldTimeSteps'].
INFO - Aligning 8s sectors of dataframe
INFO - Dask DataFrame Structure:
               trainId pulseId electronId  dldPosX  dldPosY dldTimeSteps      bam timeStamp monochromatorPhotonEnergy   gmdBda delayStage sampleBias tofVoltage extractorVoltage extractorCurrent cryoTemperature sampleTemperature dldTimeBinSize dldSectorID
npartitions=62
                uint32   int64      int64  float64  float64      float32  float32   float64                   float32  float32    float32    float32    float32          float32          float32         float32           float32        float32        int8
                   ...     ...        ...      ...      ...          ...      ...       ...                       ...      ...        ...        ...        ...              ...              ...             ...               ...            ...         ...
...                ...     ...        ...      ...      ...          ...      ...       ...                       ...      ...        ...        ...        ...              ...              ...             ...               ...            ...         ...
                   ...     ...        ...      ...      ...          ...      ...       ...                       ...      ...        ...        ...        ...              ...              ...             ...               ...            ...         ...
                   ...     ...        ...      ...      ...          ...      ...       ...                       ...      ...        ...        ...        ...              ...              ...             ...               ...            ...         ...
Dask Name: assign, 16 graph layers
INFO - Adding time-of-flight column in nanoseconds to dataframe.
INFO - Dask DataFrame Structure:
               trainId pulseId electronId  dldPosX  dldPosY dldTimeSteps      bam timeStamp monochromatorPhotonEnergy   gmdBda delayStage sampleBias tofVoltage extractorVoltage extractorCurrent cryoTemperature sampleTemperature dldTimeBinSize dldSectorID  dldTime
npartitions=62
                uint32   int64      int64  float64  float64      float32  float32   float64                   float32  float32    float32    float32    float32          float32          float32         float32           float32        float32        int8  float64
                   ...     ...        ...      ...      ...          ...      ...       ...                       ...      ...        ...        ...        ...              ...              ...             ...               ...            ...         ...      ...
...                ...     ...        ...      ...      ...          ...      ...       ...                       ...      ...        ...        ...        ...              ...              ...             ...               ...            ...         ...      ...
                   ...     ...        ...      ...      ...          ...      ...       ...                       ...      ...        ...        ...        ...              ...              ...             ...               ...            ...         ...      ...
                   ...     ...        ...      ...      ...          ...      ...       ...                       ...      ...        ...        ...        ...              ...              ...             ...               ...            ...         ...      ...
Dask Name: assign, 22 graph layers
INFO - Adding energy column to dataframe:
INFO - Using energy calibration parameters generated on 08/29/2025, 09:12:10
INFO - Dask DataFrame Structure:
               trainId pulseId electronId  dldPosX  dldPosY dldTimeSteps      bam timeStamp monochromatorPhotonEnergy   gmdBda delayStage sampleBias tofVoltage extractorVoltage extractorCurrent cryoTemperature sampleTemperature dldTimeBinSize dldSectorID  dldTime   energy
npartitions=62
                uint32   int64      int64  float64  float64      float32  float32   float64                   float32  float32    float32    float32    float32          float32          float32         float32           float32        float32        int8  float64  float64
                   ...     ...        ...      ...      ...          ...      ...       ...                       ...      ...        ...        ...        ...              ...              ...             ...               ...            ...         ...      ...      ...
...                ...     ...        ...      ...      ...          ...      ...       ...                       ...      ...        ...        ...        ...              ...              ...             ...               ...            ...         ...      ...      ...
                   ...     ...        ...      ...      ...          ...      ...       ...                       ...      ...        ...        ...        ...              ...              ...             ...               ...            ...         ...      ...      ...
                   ...     ...        ...      ...      ...          ...      ...       ...                       ...      ...        ...        ...        ...              ...              ...             ...               ...            ...         ...      ...      ...
Dask Name: assign, 37 graph layers
INFO - Adding energy offset to dataframe:
INFO - Using energy offset parameters generated on 08/29/2025, 09:12:13
INFO - Energy offset parameters:
   Column[monochromatorPhotonEnergy]: Weight=-1.0, Preserve Mean: True, Reductions: None.
   Column[tofVoltage]: Weight=-1.0, Preserve Mean: True, Reductions: None.
INFO - Dask DataFrame Structure:
               trainId pulseId electronId  dldPosX  dldPosY dldTimeSteps      bam timeStamp monochromatorPhotonEnergy   gmdBda delayStage sampleBias tofVoltage extractorVoltage extractorCurrent cryoTemperature sampleTemperature dldTimeBinSize dldSectorID  dldTime   energy
npartitions=62
                uint32   int64      int64  float64  float64      float32  float32   float64                   float32  float32    float32    float32    float32          float32          float32         float32           float32        float32        int8  float64  float64
                   ...     ...        ...      ...      ...          ...      ...       ...                       ...      ...        ...        ...        ...              ...              ...             ...               ...            ...         ...      ...      ...
...                ...     ...        ...      ...      ...          ...      ...       ...                       ...      ...        ...        ...        ...              ...              ...             ...               ...            ...         ...      ...      ...
                   ...     ...        ...      ...      ...          ...      ...       ...                       ...      ...        ...        ...        ...              ...              ...             ...               ...            ...         ...      ...      ...
                   ...     ...        ...      ...      ...          ...      ...       ...                       ...      ...        ...        ...        ...              ...              ...             ...               ...            ...         ...      ...      ...
Dask Name: assign, 67 graph layers
INFO - Adding delay offset to dataframe:
INFO - Using delay offset parameters generated on 08/29/2025, 09:14:03
INFO - Delay offset parameters:
   Constant: -1463.7
   Flip delay axis: True
   Column[bam]: Weight=-0.001, Preserve Mean: True, Reductions: None.
INFO - Dask DataFrame Structure:
               trainId pulseId electronId  dldPosX  dldPosY dldTimeSteps      bam timeStamp monochromatorPhotonEnergy   gmdBda delayStage sampleBias tofVoltage extractorVoltage extractorCurrent cryoTemperature sampleTemperature dldTimeBinSize dldSectorID  dldTime   energy
npartitions=62
                uint32   int64      int64  float64  float64      float32  float32   float64                   float32  float32    float64    float32    float32          float32          float32         float32           float32        float32        int8  float64  float64
                   ...     ...        ...      ...      ...          ...      ...       ...                       ...      ...        ...        ...        ...              ...              ...             ...               ...            ...         ...      ...      ...
...                ...     ...        ...      ...      ...          ...      ...       ...                       ...      ...        ...        ...        ...              ...              ...             ...               ...            ...         ...      ...      ...
                   ...     ...        ...      ...      ...          ...      ...       ...                       ...      ...        ...        ...        ...              ...              ...             ...               ...            ...         ...      ...      ...
                   ...     ...        ...      ...      ...          ...      ...       ...                       ...      ...        ...        ...        ...              ...              ...             ...               ...            ...         ...      ...      ...
Dask Name: assign, 87 graph layers

Compute the results#

[49]:

axes = ['energy','delayStage']
bins = [100,150]
delay_start,delay_stop=1462.00,1464.85
ranges = [[-5,2], [-1.1, 1.75]]
res = sp.compute(bins=bins, axes=axes, ranges=ranges, normalize_to_acquisition_time="delayStage")

INFO - Calculate normalization histogram for axis 'delayStage'...

[50]:

fig,ax = plt.subplots(1,2,figsize=(6,2.25), layout='constrained')
res.plot(robust=True, ax=ax[0])
bg = res.sel(delayStage=slice(-1,-.2)).mean('delayStage')
(res-bg).plot(robust=True, ax=ax[1])

[50]:

<matplotlib.collections.QuadMesh at 0x7f35947ecd90>

Save results#

binned data can now be saved as h5 or tiff. igor binaries soon to come if requested!

[51]:

sp.save('runs44824-27.h5')

saving data to runs44824-27.h5
Saved creation_date as string.
Saved creation_date as string.
Saved creation_date as string.
Saving complete!

[52]:

sp.save('runs44824-27.tiff')

Successfully saved runs44824-27.tiff
 Axes order: ['delayStage', 'energy', 'C', 'Y', 'X', 'S']

[53]:

sp.save("runs44824-27.nxs")

Using mpes reader to convert the given files:
• ../src/sed/config/NXmpes_config-HEXTOF.json
WARNING: The unit 'eV/mm' at /ENTRY[entry]/INSTRUMENT[instrument]/beam_probe/incident_energy/@units does not match with the unit category NX_ENERGY of 'incident_energy'.
ERROR: Expected a field at /ENTRY[entry]/INSTRUMENT[instrument]/ELECTRONANALYZER[electronanalyzer]/energy_resolution/resolution but found a group.
WARNING: The unit 'V' at /ENTRY[entry]/INSTRUMENT[instrument]/ELECTRONANALYZER[electronanalyzer]/ENERGYDISPERSION[energydispersion]/pass_energy/@units does not match with the unit category NX_ENERGY of 'pass_energy'.
ERROR: Expected a field at /ENTRY[entry]/INSTRUMENT[instrument]/ELECTRONANALYZER[electronanalyzer]/momentum_resolution/resolution but found a group.
WARNING: Attribute /ENTRY[entry]/DATA[data]/AXISNAME_indices[@energy_indices] written without documentation.
WARNING: Attribute /ENTRY[entry]/DATA[data]/AXISNAME_indices[@delayStage_indices] written without documentation.
The output file generated: runs44824-27.nxs.

[ ]:

Tutorial for binning data from the HEXTOF instrument at FLASH#

Preparation#

Import necessary libraries#

Get data paths#

Config setup#

cleanup previous config files#

Load a chessy sample run#

Generate the Processor instance#

Important note#

Add Jitter#

inspect the dataframe#

Visualizing event histograms#

Binning#

visualize the result#

Optical Spot Profile#

Energy Calibration#

sector alignment#

time-of-flight spectrum#

find calibration parameters#

generate the energy axis#

Bin in energy#

correct offsets#

save the calibration parameters#

Correct delay axis#

Run the workflow from the config file#

plot the delayStage values#

bin in the corrected delay axis#

save parameters#

Run workflow entirely from config.#

Compute the results#

Save results#

This Page