\documentclass{article}
%other packages
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{physics}

\usepackage[
style=phys, articletitle=false, biblabel=brackets, chaptertitle=false, pageranges=false, url=true
]{biblatex}

\usepackage{graphicx}
\usepackage{todonotes}
\usepackage{siunitx}

\usepackage{hyperref}
\usepackage{cleveref}

\title{Notes on initial gradient descent testing}

% \addbibresource{./bibliography.bib}

\graphicspath{{./figures/}}

\begin{document}

\maketitle
Before beginning an implementation of the ``dimensional reduction'' solution method of the charge qubit locating problem, I started with a naïve gradient descent algorithm.
This gives us something to benchmark against later.
The (unoptimised and messily written) relevant gradient descent code is available at \url{https://gitea.deepak.science/deepak/pathfinder/src/branch/master/pathfinder/gradient_descent.py}.
\section{Gradient descent}
We are given a set of real-valued cost functions $C_n(\vec{x})$, such that a solution $\vec{x}_{sol}$ has $C_n(\vec{x}_{sol}) = 0$ for all $n$.
We want to minimise the function $f(\vec{x}) = \sum_{n} C_n(\vec{x})^2$ (which is how we keep the costs non-negative in this algorithm).
Then, given some initial point $\vec{x}_0$, we can decrease $f$ by moving to a point $\vec{x}_1 = \vec{x}_0 - \epsilon \grad f(\vec{x}_0)$, where $\epsilon$ is a step size we'll discuss shortly.
It's pretty straightforward to show that $\grad f(\vec{x}) = 2 J^\top \vec{C}$, where $\vec{C}$ is the vector of cost functions and $J$ is the Jacobian of $\vec{C}$ with respect to $\vec{x}$.
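To see this, apply the chain rule componentwise:
\begin{equation*}
	\pdv{f}{x_j} = \sum_n 2 C_n \pdv{C_n}{x_j} = 2 \left(J^\top \vec{C}\right)_j ,
\end{equation*}
where $J_{nj} = \pdv*{C_n}{x_j}$.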
The advantage of writing it this way is mostly that we already have the cost functions $C_n$, and we can pretty easily code up the Jacobian when those cost functions are simple polynomials.
There are ways of calculating the Jacobian given a black-box set of cost functions, but fortunately we don't need to use them.
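As a minimal illustrative sketch (not the actual repository code), assuming we have callables \texttt{costs} and \texttt{jacobian} that return $\vec{C}$ and $J$ as NumPy arrays, the gradient computation is a one-liner:

\begin{verbatim}
import numpy as np

def gradient(costs, jacobian, x):
    # grad f = 2 J^T C for f(x) = sum_n C_n(x)^2
    C = costs(x)     # cost vector, shape (n,)
    J = jacobian(x)  # J[n, j] = dC_n/dx_j
    return 2.0 * J.T @ C
\end{verbatim}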
The step size $\epsilon$ can be set somewhat adaptively, and it turned out that it needed to be in order to get anything useful from this method.
Essentially, a default step size $\epsilon_0$ is chosen, and at each step, if $f(\vec{x}_n - \epsilon_0 \grad f(\vec{x}_n)) > f(\vec{x}_n)$, then I instead used $\epsilon_1 = \frac{\epsilon_0}{10}$.
This is iterated up to $a$ times (so $\epsilon_k = \epsilon_0 / 10^k$), where $a$ is a parameter free to set in the code ($a$ for adaptive).
This basically lets us pick, at each step, an order of magnitude for the step size that actually lowers the cost.
We then iterate this process, lowering the cost step by step and trying different step sizes at each step.
We can set some target cost $f_{target}$, and if $f(\vec{x}_n) < f_{target}$, we end the process and return $\vec{x}_n$.
Otherwise we stop when we hit a maximum number of iterations.
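Putting the pieces together, a stripped-down sketch of the whole loop (again illustrative, not the repository implementation; the parameter names here are mine) might look like:

\begin{verbatim}
def descend(costs, jacobian, x0, eps0=1e-3, a=5,
            f_target=1e-12, max_iter=100000):
    f = lambda x: np.sum(costs(x) ** 2)
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        if f(x) < f_target:
            break                 # hit the target cost
        g = gradient(costs, jacobian, x)
        eps = eps0
        for _ in range(a):        # order-of-magnitude backoff
            if f(x - eps * g) < f(x):
                break             # this step size lowers the cost
            eps /= 10
        x = x - eps * g
    return x
\end{verbatim}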
\section{Toy problem}
The first step is to look at the toy problem of intersecting circles.
We can begin by defining two circles, one of radius $5$ centered at the origin, and another of radius $13$ centered at $(8, -8)$.
These two circles intersect at exactly two points, $(3, 4)$ and $(-4, -3)$.
Our cost functions here are $C_1(x, y) = 25 - x^2 - y^2$ and $C_2(x, y) = 13^2 - (x - 8)^2 - (y + 8)^2$.
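In the sketch framework above, these cost functions and their hand-coded Jacobian would look like:

\begin{verbatim}
def circle_costs(x):
    return np.array([
        25 - x[0]**2 - x[1]**2,                 # C_1: first circle
        13**2 - (x[0] - 8)**2 - (x[1] + 8)**2,  # C_2: second circle
    ])

def circle_jacobian(x):
    return np.array([
        [-2 * x[0],       -2 * x[1]],
        [-2 * (x[0] - 8), -2 * (x[1] + 8)],
    ])

# e.g. descend(circle_costs, circle_jacobian, x0=[1.0, 2.0])
\end{verbatim}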
This was relatively easy for the gradient descent program to solve.
After 1200 iterations, the program returned $(3.00000011, 4.00000002)$, so, as might be expected, standard gradient descent is very effective at handling this case.
\section{Single noise source example}
Next, we look at the dipole source problem.
To set up a particular example, I started by picking a dipole moment $(p_x, p_y, p_z) = (1, 3, 5)$ located at $(s_x, s_y, s_z) = (5, 6, 7)$.
Then, for five qubits located at $\vec{r}_n = (0, 0, n)$, I found $V_n = \frac{\vec{p} \cdot (\vec{r}_n - \vec{s})}{\abs{\vec{r}_n - \vec{s}}^3}$.
For the rest of the problem we take the $\vec{r}_n$ and $V_n$ as given, and our goal will be to find $(p_x, p_y, p_z, s_x, s_y, s_z)$, which should be $(1, 3, 5, 5, 6, 7)$.
I then took a cost function based on the known dipole magnitude, $\abs{\vec{p}}^2 = 1 + 9 + 25 = 35$: writing $\vec{x} = (p_x, p_y, p_z, s_x, s_y, s_z)$, we have $C_0(\vec{x}) = p_x^2 + p_y^2 + p_z^2 - 35$.
Then the next five cost functions are $C_n(\vec{x}) = V_n \abs{\vec{r}_n - \vec{s}}^3 - \vec{p} \cdot \left( \vec{r}_n - \vec{s} \right)$.
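For concreteness, a sketch of this setup in the same style (with the qubit positions assumed to be $(0, 0, n)$ for $n = 1, \dots, 5$; the exact indexing isn't specified above):

\begin{verbatim}
# Assumed: qubits at (0, 0, n) for n = 1..5; the "true" dipole is
# used only to generate the V_n data.
r = np.array([[0.0, 0.0, n] for n in range(1, 6)])
p_true = np.array([1.0, 3.0, 5.0])
s_true = np.array([5.0, 6.0, 7.0])
V = (r - s_true) @ p_true / np.linalg.norm(r - s_true, axis=1) ** 3

def dipole_costs(x):
    p, s = x[:3], x[3:]
    d = r - s                          # r_n - s for each qubit
    C0 = p @ p - 35.0                  # known |p|^2 = 35
    Cn = V * np.linalg.norm(d, axis=1) ** 3 - d @ p
    return np.concatenate(([C0], Cn))
\end{verbatim}

The corresponding Jacobian is longer but just as mechanical to code up.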
As might be expected, the naïve gradient descent solver found this six-dimensional problem much harder.
After \num[group-separator={,}]{100000} iterations, the algorithm found $\vec{x} = (1.03770348, 2.94004682, 5.02785529, 4.98257315, 6.00541671, 7.017208)$.
This is approximately the correct answer, but the low precision after such a large number of iterations is not ideal.
For higher numbers of charge sources and qubits, this approach would naturally scale quite poorly.
\section{Gradient descent thoughts}
The main point, that gradient descent works but is inefficient, is not a particularly surprising outcome, which is good.
It also suggests that if I want a backup method to test the results of the dimensional reduction method, I can re-run the same problems with this gradient descent algorithm and compare the outcomes.
\end{document}