{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Tutorial for binning data from the SXP instrument at the European XFEL"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Preparation\n",
    "### Import necessary libraries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "notebookRunGroups": {
     "groupValue": "1"
    }
   },
   "outputs": [],
   "source": [
    "%load_ext autoreload\n",
    "%autoreload 2\n",
    "from pathlib import Path\n",
    "import os\n",
    "import xarray as xr\n",
    "import numpy as np\n",
    "\n",
    "from sed import SedProcessor\n",
    "from sed.dataset import dataset\n",
    "\n",
    "%matplotlib widget\n",
    "import matplotlib.pyplot as plt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Get data paths\n",
    "The paths are such that if you are on Maxwell, it uses those. Otherwise data is downloaded in current directory from Zenodo.\n",
    "\n",
    "Generally, if it is your beamtime, you can both read the raw data and write to processed directory. However, for the public data, you can not write to processed directory."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "notebookRunGroups": {
     "groupValue": "1"
    }
   },
   "outputs": [],
   "source": [
    "beamtime_dir = \"/gpfs/exfel/exp/SXP/202302/p004316/\" # on Maxwell\n",
    "if os.path.exists(beamtime_dir) and os.access(beamtime_dir, os.R_OK):\n",
    "    path = beamtime_dir + \"/raw/\"\n",
    "    buffer_path = \"Au_Mica/processed/\"\n",
    "else:\n",
    "    # data_path can be defined and used to store the data in a specific location\n",
    "    dataset.get(\"Au_Mica\") # Put in Path to a storage of at least 10 GByte free space.\n",
    "    path = dataset.dir\n",
    "    buffer_path = path + \"/processed/\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Config setup\n",
    "Here we get the path to the config file and setup the relevant directories. This can also be done directly in the config file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# pick the default configuration file for SXP@XFEL\n",
    "config_file = Path('../sed/config/sxp_example_config.yaml')\n",
    "assert config_file.exists()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# here we setup a dictionary that will be used to override the path configuration\n",
    "config_override = {\n",
    "    \"core\": {\n",
    "        \"paths\": {\n",
    "            \"data_raw_dir\": path,\n",
    "            \"data_parquet_dir\": buffer_path,\n",
    "        },\n",
    "    },\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### cleanup previous config files\n",
    "In this notebook, we will show how calibration parameters can be generated. Therefore we want to clean the local directory of previously generated files.\n",
    "\n",
    "**WARNING** running the cell below will delete the \"sed_config.yaml\" file in the local directory. If these contain precious calibration parameters, **DO NOT RUN THIS CELL**."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "local_folder_config = Path('./sed_config.yaml')\n",
    "if local_folder_config.exists():\n",
    "    os.remove(local_folder_config)\n",
    "    print(f'deleted local config file {local_folder_config}')\n",
    "assert not local_folder_config.exists()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Load Au/Mica data\n",
    "Now we load a couple of scans from Au 4f core levels. Data will be processed to parquet format first, if not existing yet, and then loaded into the processor."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sp = SedProcessor(\n",
    "    runs=[\"0058\", \"0059\", \"0060\", \"0061\"],\n",
    "    config=config_override,\n",
    "    system_config=config_file,\n",
    "    collect_metadata=False,\n",
    "    verbose=True,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Inspect the dataframe\n",
    "We first try to get an overview of the structure of the data. For this, we look at the loaded dataframe:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sp.dataframe.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Train IDs in scans \n",
    "Next, let's look at the trainIDs contained in these runs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.figure()\n",
    "ids=sp.dataframe.trainId.compute().values\n",
    "plt.plot(ids)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Channel Histograms\n",
    "Let's look at the single histograms of the main data channels"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sp.view_event_histogram(dfpid=3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### PulseIds, ElectronIds\n",
    "To get a better understanding of the structure of the data, lets look at the histograms of microbunches and electrons. We see that we have more hits at later microbunches, and only few multi-hits."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "axes = [\"pulseId\", \"electronId\"]\n",
    "bins = [101, 11]\n",
    "ranges = [(-0.5, 800.5), (-0.5, 10.5)]\n",
    "sp.view_event_histogram(dfpid=1, axes=axes, bins=bins, ranges=ranges)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can also inspect the counts per train as function of the trainId and the pulseId, which gives us a good idea about the evolution of the count rate over the run(s)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.figure()\n",
    "axes = [\"trainId\", \"pulseId\"]\n",
    "bins = [100, 100]\n",
    "ranges = [(ids.min()+1, ids.max()), (0, 800)]\n",
    "res = sp.compute(bins=bins, axes=axes, ranges=ranges)\n",
    "res.plot()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Spectrum vs. MicrobunchId\n",
    "Let's check the TOF spectrum as function of microbunch ID, to understand if the increasing hit probability has any influence on the spectrum."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "axes = [\"dldTimeSteps\", \"pulseId\"]\n",
    "bins = [200, 800]\n",
    "ranges = [(8000, 28000), (0, 800)]\n",
    "res = sp.compute(bins=bins, axes=axes, ranges=ranges)\n",
    "plt.figure()\n",
    "res.plot(robust=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We see that the background below the Au 4f core levels slightly changes with microbunch ID. The origin of this is not quite clear yet."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.figure()\n",
    "(res.loc[{\"pulseId\":slice(0,50)}].sum(axis=1)/res.loc[{\"pulseId\":slice(0,50)}].sum(axis=1).mean()).plot()\n",
    "(res.loc[{\"pulseId\":slice(700,750)}].sum(axis=1)/res.loc[{\"pulseId\":slice(700,750)}].sum(axis=1).mean()).plot()\n",
    "plt.legend((\"mbID=0-50\", \"mbID=700-750\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Energy Calibration\n",
    "We now load a bias series, where the sample bias was varied, effectively shifting the energy spectra. This allows us to calibrate the conversion between the digital values of the dld and the energy."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### time-of-flight spectrum\n",
    "to compare with what we see on the measurement computer, we might want to plot the time-of-flight spectrum. This is done here."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sp.append_tof_ns_axis()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, to determine proper binning ranges, let's have again a look at the event histograms:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "axes = [\"dldTime\"]\n",
    "bins = [150]\n",
    "ranges = [(-0.5, 150.5)]\n",
    "sp.view_event_histogram(dfpid=1, axes=axes, bins=bins, ranges=ranges)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Load energy calibration files\n",
    "We now load a range of runs sequentially, that were recorded with different sample bias values, and load them afterwards into an xarray"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "runs = [\"0074\", \"0073\", \"0072\", \"0071\", \"0070\", \"0064\", \"0065\", \"0066\", \"0067\", \"0068\", \"0069\"]\n",
    "biases = np.arange(962, 951, -1)\n",
    "data = []\n",
    "for run in runs:\n",
    "    sp.load(runs=[run])\n",
    "    axes = [\"dldTimeSteps\"]\n",
    "    bins = [2000]\n",
    "    ranges = [(1000, 25000)]\n",
    "    res = sp.compute(bins=bins, axes=axes, ranges=ranges)\n",
    "    data.append(res)\n",
    "\n",
    "biasSeries = xr.concat(data, dim=xr.DataArray(biases, dims=\"sampleBias\", name=\"sampleBias\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Load bias series\n",
    "Now we load the bias series xarray into the processor for calibration"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sp.load_bias_series(binned_data=biasSeries)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### find calibration parameters\n",
    "We now will fit the tof-energy relation. This is done by finding the maxima of a peak in the tof spectrum, and then fitting the square root relation to obtain the calibration parameters. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ranges=(6380, 6700)\n",
    "ref_id=6\n",
    "sp.find_bias_peaks(ranges=ranges, ref_id=ref_id, apply=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sp.calibrate_energy_axis(\n",
    "    ref_id=5,\n",
    "    ref_energy=-33,\n",
    "    method=\"lmfit\",\n",
    "    energy_scale='kinetic',\n",
    "    d={'value':1.1,'min': .2, 'max':5.0, 'vary':False},\n",
    "    t0={'value':-1E-8, 'min': -1E-6, 'max': 1e-4, 'vary':True},\n",
    "    E0={'value': 0., 'min': -1500, 'max': 1500, 'vary': True},\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Save calibration\n",
    "Now we save the calibration parameters into a local configuration file, that will be loaded in the next step"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sp.save_energy_calibration()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Bin data with energy axis\n",
    "Now that we have the calibration parameters, we can generate the energy axis for our dataset. We need to load it again, and apply the calibration"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sp.load(runs=np.arange(58, 62))\n",
    "sp.add_jitter()\n",
    "sp.filter_column(\"pulseId\", max_value=756)\n",
    "sp.append_energy_axis()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, we can bin as function fo energy and delay stage position"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "axes = ['energy', \"delayStage\"]\n",
    "bins = [200, 100]\n",
    "ranges = [[-37,-31], [-135, -115]]\n",
    "res = sp.compute(bins=bins, axes=axes, ranges=ranges, normalize_to_acquisition_time=\"delayStage\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "fig, axs = plt.subplots(1, 1, figsize=(4, 3), constrained_layout=True)\n",
    "res.plot(ax=axs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Correct delay stage offset.\n",
    "We can also offset the zero delay of the delay stage"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sp.add_delay_offset(constant=126.9)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "axes = ['energy', \"delayStage\"]\n",
    "bins = [200, 100]\n",
    "ranges = [[-37,-31], [-8, 8]]\n",
    "res = sp.compute(bins=bins, axes=axes, ranges=ranges, normalize_to_acquisition_time=\"delayStage\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "res_sub = res - res.loc[{\"delayStage\": slice(-8, -1)}].mean(axis=1)\n",
    "fig, axs = plt.subplots(3, 1, figsize=(4, 8), constrained_layout=True)\n",
    "res.plot(ax=axs[0])\n",
    "res_sub.plot(ax=axs[1])\n",
    "res_sub.loc[{\"energy\":slice(-32.5,-32)}].sum(axis=0).plot(ax=axs[2])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "python3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.19"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}