\documentclass{article}
%other packages
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{physics}

\usepackage[
style=phys, articletitle=false, biblabel=brackets, chaptertitle=false, pageranges=false, url=true
]{biblatex}

\usepackage{graphicx}
\usepackage{todonotes}
\usepackage{siunitx}

\usepackage{hyperref}
\usepackage{cleveref}

\title{Notes on initial gradient descent testing}

% \addbibresource{./bibliography.bib}

\graphicspath{{./figures/}}

\begin{document}

\maketitle
Before beginning an implementation of the ``dimensional reduction'' solution method of the charge qubit locating problem, I started with a naïve gradient descent algorithm.
This gives us something to benchmark against later.
The (unoptimised and messily written) relevant gradient descent code is available at \url{https://gitea.deepak.science/deepak/pathfinder/src/branch/master/pathfinder/gradient_descent.py}.
\section{Gradient descent}
We are given a set of real-valued cost functions $C_n(\vec{x})$, such that a solution $\vec{x}_{sol}$ has $C_n(\vec{x}_{sol}) = 0$ for all $n$.
We want to minimise the function $f(\vec{x}) = \sum_{n} C_n(\vec{x})^2$ (which is how we keep the costs non-negative in this algorithm).
Then, given some initial point $\vec{x}_0$, we can decrease $f$ by moving to a point $\vec{x}_1 = \vec{x}_0 - \epsilon \grad f(\vec{x}_0)$, where $\epsilon$ is a step size we'll discuss shortly.
It's pretty straightforward to show that $\grad f(\vec{x}) = 2 J^\top \vec{C}$, where $\vec{C}$ is the vector of cost functions and $J$ is the Jacobian of $\vec{C}$ with respect to $\vec{x}$.
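To see this, apply the chain rule componentwise:
\begin{equation*}
	\pdv{f}{x_j} = \sum_n 2 C_n \pdv{C_n}{x_j} = 2 \left(J^\top \vec{C}\right)_j ,
\end{equation*}
where $J_{nj} = \pdv*{C_n}{x_j}$.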
The advantage of writing it this way is mostly that we already have the cost functions $C_n$, and we can pretty easily code up the Jacobian when those cost functions are simple polynomials.
There are ways of calculating the Jacobian given a black-box set of cost functions, but fortunately we don't need to use them.
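As a minimal illustrative sketch (not the actual repository code), assuming we have callables \texttt{costs} and \texttt{jacobian} that return $\vec{C}$ and $J$ as NumPy arrays, the gradient computation is a one-liner:

\begin{verbatim}
import numpy as np

def gradient(costs, jacobian, x):
    # grad f = 2 J^T C for f(x) = sum_n C_n(x)^2
    C = costs(x)     # cost vector, shape (n,)
    J = jacobian(x)  # J[n, j] = dC_n/dx_j
    return 2.0 * J.T @ C
\end{verbatim}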
The step size $\epsilon$ can be set somewhat adaptively, and it turned out that it needed to be in order to get anything useful from this method.
Essentially, a default step size $\epsilon_0$ is chosen, and at each step, if $f(\vec{x}_n - \epsilon_0 \grad f(\vec{x}_n)) > f(\vec{x}_n)$, then I instead used $\epsilon_1 = \frac{\epsilon_0}{10}$.
This is iterated up to $a$ times (so $\epsilon_k = \epsilon_0 / 10^k$), where $a$ is a parameter free to set in the code ($a$ for adaptive).
This basically lets us pick, at each step, an order of magnitude for the step size that actually lowers the cost.
We then iterate this process, lowering the cost step by step and trying different step sizes at each step.
We can set some target cost $f_{target}$, and if $f(\vec{x}_n) < f_{target}$, we end the process and return $\vec{x}_n$.
Otherwise we stop when we hit a maximum number of iterations.
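Putting the pieces together, a stripped-down sketch of the whole loop (again illustrative, not the repository implementation; the parameter names here are mine) might look like:

\begin{verbatim}
def descend(costs, jacobian, x0, eps0=1e-3, a=5,
            f_target=1e-12, max_iter=100000):
    f = lambda x: np.sum(costs(x) ** 2)
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        if f(x) < f_target:
            break                 # hit the target cost
        g = gradient(costs, jacobian, x)
        eps = eps0
        for _ in range(a):        # order-of-magnitude backoff
            if f(x - eps * g) < f(x):
                break             # this step size lowers the cost
            eps /= 10
        x = x - eps * g
    return x
\end{verbatim}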
\section{Toy problem}
The first step is to look at the toy problem of intersecting circles.
We can begin by defining two circles, one of radius $5$ centered at the origin, and another of radius $13$ centered at $(8, -8)$.
These two circles intersect at exactly two points, $(3, 4)$ and $(-4, -3)$.
Our cost functions here are $C_1(x, y) = 25 - x^2 - y^2$ and $C_2(x, y) = 13^2 - (x - 8)^2 - (y + 8)^2$.
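In the sketch framework above, these cost functions and their hand-coded Jacobian would look like:

\begin{verbatim}
def circle_costs(x):
    return np.array([
        25 - x[0]**2 - x[1]**2,                 # C_1: first circle
        13**2 - (x[0] - 8)**2 - (x[1] + 8)**2,  # C_2: second circle
    ])

def circle_jacobian(x):
    return np.array([
        [-2 * x[0],       -2 * x[1]],
        [-2 * (x[0] - 8), -2 * (x[1] + 8)],
    ])

# e.g. descend(circle_costs, circle_jacobian, x0=[1.0, 2.0])
\end{verbatim}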
This was relatively easy for the gradient descent program to solve.
After 1200 iterations, the program returned $(3.00000011, 4.00000002)$, so, as might be expected, standard gradient descent is very effective at handling this case.
\section{Single noise source example}
Next, we look at the dipole source problem.
To set up a particular example, I started by picking a dipole moment $(p_x, p_y, p_z) = (1, 3, 5)$ located at $(s_x, s_y, s_z) = (5, 6, 7)$.
Then, for five qubits located at $\vec{r}_n = (0, 0, n)$, I found $V_n = \frac{\vec{p} \cdot (\vec{r}_n - \vec{s})}{\abs{\vec{r}_n - \vec{s}}^3}$.
For the rest of the problem we take the $\vec{r}_n$ and $V_n$ as given, and our goal will be to find $(p_x, p_y, p_z, s_x, s_y, s_z)$, which should be $(1, 3, 5, 5, 6, 7)$.
I then took a cost function based on the known dipole magnitude, $\abs{\vec{p}}^2 = 1 + 9 + 25 = 35$: writing $\vec{x} = (p_x, p_y, p_z, s_x, s_y, s_z)$, we have $C_0(\vec{x}) = p_x^2 + p_y^2 + p_z^2 - 35$.
Then the next five cost functions are $C_n(\vec{x}) = V_n \abs{\vec{r}_n - \vec{s}}^3 - \vec{p} \cdot \left( \vec{r}_n - \vec{s} \right)$.
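For concreteness, a sketch of this setup in the same style (with the qubit positions assumed to be $(0, 0, n)$ for $n = 1, \dots, 5$; the exact indexing isn't specified above):

\begin{verbatim}
# Assumed: qubits at (0, 0, n) for n = 1..5; the "true" dipole is
# used only to generate the V_n data.
r = np.array([[0.0, 0.0, n] for n in range(1, 6)])
p_true = np.array([1.0, 3.0, 5.0])
s_true = np.array([5.0, 6.0, 7.0])
V = (r - s_true) @ p_true / np.linalg.norm(r - s_true, axis=1) ** 3

def dipole_costs(x):
    p, s = x[:3], x[3:]
    d = r - s                          # r_n - s for each qubit
    C0 = p @ p - 35.0                  # known |p|^2 = 35
    Cn = V * np.linalg.norm(d, axis=1) ** 3 - d @ p
    return np.concatenate(([C0], Cn))
\end{verbatim}

The corresponding Jacobian is longer but just as mechanical to code up.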
As might be expected, the naïve gradient descent solver found this six-dimensional problem much harder.
After \num[group-separator={,}]{100000} iterations, the algorithm found $\vec{x} = (1.03770348, 2.94004682, 5.02785529, 4.98257315, 6.00541671, 7.017208)$.
This is approximately the correct answer, but the low precision after such a large number of iterations is not ideal.
For higher numbers of charge sources and qubits, this approach would naturally scale quite poorly.
\section{Gradient descent thoughts}
The main point, that gradient descent works but is inefficient, is not a particularly surprising outcome, which is good.
It also suggests that if I want a backup method to test the results of the dimensional reduction method, I can re-run the same problems with this gradient descent algorithm and compare the outcomes.
\end{document}