Compare commits


1 commit

Commit 0992f0e746: chore(deps): update dependency mypy to ^0.940 (2022-03-12 01:30:52 +00:00)
All checks were successful:
* gitea-physics/deepdog/pipeline/head: This commit looks good
* gitea-physics/deepdog/pipeline/pr-master: This commit looks good
26 changed files with 961 additions and 3103 deletions

.flake8

@@ -1,3 +1,3 @@
[flake8]
-ignore = W191, E501, W503, E203
+ignore = W191, E501, W503
max-line-length = 120

.gitignore vendored (4 changes)

@@ -114,10 +114,6 @@ ENV/
env.bak/
venv.bak/
-# direnv
-.envrc
-.direnv
# Spyder project settings
.spyderproject
.spyproject

CHANGELOG.md

@@ -2,218 +2,6 @@
All notable changes to this project will be documented in this file. See [standard-version](https://github.com/conventional-changelog/standard-version) for commit guidelines.
### [0.7.8](https://gitea.deepak.science:2222/physics/deepdog/compare/0.7.7...0.7.8) (2024-02-29)
### Bug Fixes
* uses correct measurements ([5f534a6](https://gitea.deepak.science:2222/physics/deepdog/commit/5f534a60cc7c4838fcacee11a7e58b97d34e154a))
### [0.7.7](https://gitea.deepak.science:2222/physics/deepdog/compare/0.7.6...0.7.7) (2024-02-29)
### Bug Fixes
* fixes phase calculation issue with setting input array ([48e41cb](https://gitea.deepak.science:2222/physics/deepdog/commit/48e41cbd2c58d4c4d2747822d618d7d55257643d))
### [0.7.6](https://gitea.deepak.science:2222/physics/deepdog/compare/0.7.5...0.7.6) (2024-02-28)
### Features
* adds ability to use phase measurements only for correlations ([bb72e90](https://gitea.deepak.science:2222/physics/deepdog/commit/bb72e903d14704a3783daf2dbc1797b90880aa85))
### Bug Fixes
* fixes typeerror vs indexerror on bare float as cost in subset simulation ([65e1948](https://gitea.deepak.science:2222/physics/deepdog/commit/65e19488359d7f5656660da7da8f32ed474989c3))
### [0.7.5](https://gitea.deepak.science:2222/physics/deepdog/compare/0.7.4...0.7.5) (2023-12-09)
### Features
* adds direct monte carlo package ([1741807](https://gitea.deepak.science:2222/physics/deepdog/commit/1741807be43d08fb51bc94518dd3b67585c04c20))
* adds longchain logging if logging last generation ([b4e5f53](https://gitea.deepak.science:2222/physics/deepdog/commit/b4e5f5372682fc64c3734a96c4a899e018f127ce))
* allows disabling timestamp in subset simulation bayes results ([9a4548d](https://gitea.deepak.science:2222/physics/deepdog/commit/9a4548def45a01f1f518135d4237c3dc09dcc342))
### [0.7.4](https://gitea.deepak.science:2222/physics/deepdog/compare/0.7.3...0.7.4) (2023-07-27)
### Features
* adds configurable chunk size for the initial mc level 0 SS stage cost calculation to reduce memory usage ([9a7a3ff](https://gitea.deepak.science:2222/physics/deepdog/commit/9a7a3ff2c7ebe81d5e10647ce39844c372ff7b07))
* allows for deepdog bayesrun with ss to not print csv to make snapshot testing possible ([8e6ead4](https://gitea.deepak.science:2222/physics/deepdog/commit/8e6ead416c9eba56f568f648d0df44caaa510cfe))
### Bug Fixes
* fixes bug if case of clamping necessary ([161bcf4](https://gitea.deepak.science:2222/physics/deepdog/commit/161bcf42addf331661c3929073688b9f2c13502c))
* fixes bug with clamped probabilities being underestimated ([e6defc7](https://gitea.deepak.science:2222/physics/deepdog/commit/e6defc794871a48ac331023eb477bd235b78d6d0))
### [0.7.3](https://gitea.deepak.science:2222/physics/deepdog/compare/0.7.2...0.7.3) (2023-07-27)
### Features
* adds utility options and avoids memory leak ([598dad1](https://gitea.deepak.science:2222/physics/deepdog/commit/598dad1e6dc8fc0b7a5b4a90c8e17bf744e8d98c))
### [0.7.2](https://gitea.deepak.science:2222/physics/deepdog/compare/0.7.1...0.7.2) (2023-07-24)
### Features
* clamps results now ([9bb8fc5](https://gitea.deepak.science:2222/physics/deepdog/commit/9bb8fc50fe1bd1a285a333c5a396bfb6ac3176cf))
### Bug Fixes
* fixes clamping format etc. ([a170a3c](https://gitea.deepak.science:2222/physics/deepdog/commit/a170a3ce01adcec356e5aaab9abcc0ec4accd64b))
### [0.7.1](https://gitea.deepak.science:2222/physics/deepdog/compare/0.7.0...0.7.1) (2023-07-24)
### Features
* adds subset simulation stuff ([33cab9a](https://gitea.deepak.science:2222/physics/deepdog/commit/33cab9ab4179cec13ae9e591a8ffc32df4dda989))
## [0.7.0](https://gitea.deepak.science:2222/physics/deepdog/compare/0.6.7...0.7.0) (2023-05-01)
### ⚠ BREAKING CHANGES
* removes fastfilter parameter because it should never be needed
### Features
* adds pair capability to real spectrum run hopefully ([a089951](https://gitea.deepak.science:2222/physics/deepdog/commit/a089951bbefcd8a0b2efeb49b7a8090412cbb23d))
* removes fastfilter parameter because it should never be needed ([a015daf](https://gitea.deepak.science:2222/physics/deepdog/commit/a015daf5ff6fa5f6155c8d7e02981b588840a5b0))
### [0.6.7](https://gitea.deepak.science:2222/physics/deepdog/compare/0.6.6...0.6.7) (2023-04-14)
### Features
* adds option to cap core count for real spectrum run ([bf15f4a](https://gitea.deepak.science:2222/physics/deepdog/commit/bf15f4a7b7f59504983624e7d512ed7474372032))
* adds option to cap core count for temp aware run ([12903b2](https://gitea.deepak.science:2222/physics/deepdog/commit/12903b2540cefb040174d230bc0d04719a6dc1b7))
### Bug Fixes
* avoids redefinition of core count in loop ([1cf4454](https://gitea.deepak.science:2222/physics/deepdog/commit/1cf44541531541088198bd4599d467df3e1acbcf))
### [0.6.6](https://gitea.deepak.science:2222/physics/deepdog/compare/0.6.5...0.6.6) (2023-04-09)
### Bug Fixes
* removes bad logging in multiprocessing function ([8fd1b75](https://gitea.deepak.science:2222/physics/deepdog/commit/8fd1b75e1378301210bfa8f14dd09174bbd21414))
### [0.6.5](https://gitea.deepak.science:2222/physics/deepdog/compare/0.6.4...0.6.5) (2023-04-09)
### Features
* adds temp aware guy using new pdme temp-flexible feature for bundling temp models ([de1ec3e](https://gitea.deepak.science:2222/physics/deepdog/commit/de1ec3e70062d418e0d4c89716905cc9313d2e26))
### [0.6.4](https://gitea.deepak.science:2222/physics/deepdog/compare/0.6.3...0.6.4) (2022-08-13)
### Features
* Prints model names while running ([7ea1d71](https://gitea.deepak.science:2222/physics/deepdog/commit/7ea1d715f67e81c9fa841c5a62f1cc700ff7363d))
### [0.6.3](https://gitea.deepak.science:2222/physics/deepdog/compare/0.6.2...0.6.3) (2022-06-12)
### Features
* adds fast filter variant ([2c5c122](https://gitea.deepak.science:2222/physics/deepdog/commit/2c5c1228209e51d17253f07470e2f1e6dc6872d7))
* adds tester for fast filter real spectrum ([0a1a277](https://gitea.deepak.science:2222/physics/deepdog/commit/0a1a27759b0d4ab01da214b76ab14bf2b1fe00e3))
### [0.6.2](https://gitea.deepak.science:2222/physics/deepdog/compare/0.6.1...0.6.2) (2022-05-26)
### Features
* adds better import api for real data run ([d7e0f13](https://gitea.deepak.science:2222/physics/deepdog/commit/d7e0f13ca55197b24cb534c80f321ee76b9c4a40))
### [0.6.1](https://gitea.deepak.science:2222/physics/deepdog/compare/0.6.0...0.6.1) (2022-05-22)
### Features
* adds new runner for real spectra ([bd56f24](https://gitea.deepak.science:2222/physics/deepdog/commit/bd56f247748babb2ee1f2a1182d25aa968bff5a5))
## [0.6.0](https://gitea.deepak.science:2222/physics/deepdog/compare/0.5.0...0.6.0) (2022-05-22)
### ⚠ BREAKING CHANGES
* bayes run now handles multidipoles with changes to output file format etc.
* logs multiple dipoles better maybe
* switches over to pdme new stuff, uses models and scraps discretisations entirely
* removes alt_bayes bayes distinction, which was superfluous when only alt worked
### Features
* adds pdme 0.7.0 for multiprocessing ([874d876](https://gitea.deepak.science:2222/physics/deepdog/commit/874d876c9d774433b034d47c4cc0cdac41e6f2c7))
* bayes run now handles multidipoles with changes to output file format etc. ([5d0a7a4](https://gitea.deepak.science:2222/physics/deepdog/commit/5d0a7a4be09c58f8f8f859384f01d7912a98b8b9))
* logs multiple dipoles better maybe ([ae8977b](https://gitea.deepak.science:2222/physics/deepdog/commit/ae8977bb1e4d6cd71e88ea0876da8f4318e030b6))
* removes alt_bayes bayes distinction, which was superfluous when only alt worked ([101569d](https://gitea.deepak.science:2222/physics/deepdog/commit/101569d749e4f3f1842886aa2fd3321b8132278b))
* switches over to pdme new stuff, uses models and scraps discretisations entirely ([6e29f7a](https://gitea.deepak.science:2222/physics/deepdog/commit/6e29f7a702b578c266a42bba23ac973d155ada10))
* Uses multidipole for bayes run, with more verbose output ([df89776](https://gitea.deepak.science:2222/physics/deepdog/commit/df8977655de977fd3c4f7383dd9571e551eb1382))
### Bug Fixes
* another bug fix for csv generation ([b7da3d6](https://gitea.deepak.science:2222/physics/deepdog/commit/b7da3d61cc5c128cba1d2fcb3770b71b7f6fc4b8))
* fixes crash when dipole count is smaller than expected max during file write ([b5e0ecb](https://gitea.deepak.science:2222/physics/deepdog/commit/b5e0ecb52886b32d9055302eacfabb69338026b4))
* fixes format string in csv output for headers ([9afa209](https://gitea.deepak.science:2222/physics/deepdog/commit/9afa209864cdb9255988778e987fe05952848fd4))
* fixes random issue ([eec926a](https://gitea.deepak.science:2222/physics/deepdog/commit/eec926aaac654f78942b4c6b612e4d1cdcbf81dc))
* moves logging successes to after they've actually happened ([0caad05](https://gitea.deepak.science:2222/physics/deepdog/commit/0caad05e3cc6a9adba8bf937c3d2f944e1b096a3))
* now doesn't double randomise frequency ([23b202b](https://gitea.deepak.science:2222/physics/deepdog/commit/23b202beb81cb89f7f20b691e83116fa53764902))
* whoops deleted word multiprocessing ([31070b5](https://gitea.deepak.science:2222/physics/deepdog/commit/31070b5342c265d930b4c51402f42a3ee2415066))
## [0.5.0](https://gitea.deepak.science:2222/physics/deepdog/compare/0.4.0...0.5.0) (2022-04-30)
### ⚠ BREAKING CHANGES
* simulpairs now uses different rng calculator
### Features
* adds simulpairs run ([e9277c3](https://gitea.deepak.science:2222/physics/deepdog/commit/e9277c3da777359feb352c0b19f3bb029248ba2f))
* has better parallelisation ([edf0ba6](https://gitea.deepak.science:2222/physics/deepdog/commit/edf0ba6532c0588fce32341709cdb70e384b83f4))
* simulpairs now uses different rng calculator ([50dbc48](https://gitea.deepak.science:2222/physics/deepdog/commit/50dbc4835e60bace9e9b4ba37415f073a3c9e479))
### Bug Fixes
* better parallelisation hopefully ([42829c0](https://gitea.deepak.science:2222/physics/deepdog/commit/42829c0327e080e18be2fb75e746f6ac0d7c2f6d))
* Makes altbayessimulpairs available in package ([492a5e6](https://gitea.deepak.science:2222/physics/deepdog/commit/492a5e6681c85f95840e28cfd5d4ce4ca1d54eba))
* stronger names ([0954429](https://gitea.deepak.science:2222/physics/deepdog/commit/0954429e2d015a105ff16dfbb9e7a352bf53e5e9))
* Uses correct filename arg for passed in rng ([349341b](https://gitea.deepak.science:2222/physics/deepdog/commit/349341b405375a43b933f1fd7db4ee9fc501def3))
* uses correct filename for pairs guy ([4c06b39](https://gitea.deepak.science:2222/physics/deepdog/commit/4c06b3912c811c93c310b1d9e4c153f2014c4f8b))
## [0.4.0](https://gitea.deepak.science:2222/physics/deepdog/compare/0.3.5...0.4.0) (2022-04-10)
### ⚠ BREAKING CHANGES
* Adds pair calculations, with changing api format
### Features
* Adds dynamic cycle count increases to help reach minimum success count ([ec7b4ca](https://gitea.deepak.science:2222/physics/deepdog/commit/ec7b4cac393c15e94c513215c4f1ba32be2ae87a))
* Adds pair calculations, with changing api format ([6463b13](https://gitea.deepak.science:2222/physics/deepdog/commit/6463b135ef2d212b565864b5ac1b655e014d2194))
### Bug Fixes
* uses bigfix from pdme for negatives ([c1c711f](https://gitea.deepak.science:2222/physics/deepdog/commit/c1c711f47b574d3a9b8a24dbcbdd7f50b9be8ea9))
### [0.3.5](https://gitea.deepak.science:2222/physics/deepdog/compare/0.3.4...0.3.5) (2022-03-07)

Jenkinsfile vendored (20 changes)

@@ -4,7 +4,7 @@ pipeline {
label 'deepdog' // all your pods will be named with this prefix, followed by a unique id
idleMinutes 5 // how long the pod will live after no jobs have run on it
yamlFile 'jenkins/ci-agent-pod.yaml' // path to the pod definition relative to the root of our project
-defaultContainer 'poetry' // define a default container if more than a few stages use it, will default to jnlp container
+defaultContainer 'python' // define a default container if more than a few stages use it, will default to jnlp container
}
}
@@ -12,30 +12,36 @@ pipeline {
parallelsAlwaysFailFast()
}
+environment {
+POETRY_HOME="/opt/poetry"
+POETRY_VERSION="1.1.12"
+}
stages {
stage('Build') {
steps {
echo 'Building...'
sh 'python --version'
-sh 'poetry --version'
-sh 'poetry install'
+sh 'curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py | python'
+sh '${POETRY_HOME}/bin/poetry --version'
+sh '${POETRY_HOME}/bin/poetry install'
}
}
stage('Test') {
parallel{
stage('pytest') {
steps {
-sh 'poetry run pytest'
+sh '${POETRY_HOME}/bin/poetry run pytest'
}
}
stage('lint') {
steps {
-sh 'poetry run flake8 deepdog tests'
+sh '${POETRY_HOME}/bin/poetry run flake8 deepdog tests'
}
}
stage('mypy') {
steps {
-sh 'poetry run mypy deepdog'
+sh '${POETRY_HOME}/bin/poetry run mypy deepdog'
}
}
}
@@ -51,7 +57,7 @@ pipeline {
}
steps {
echo 'Deploying...'
-sh 'poetry publish -u ${PYPI_USR} -p ${PYPI_PSW} --build'
+sh '${POETRY_HOME}/bin/poetry publish -u ${PYPI_USR} -p ${PYPI_PSW} --build'
}
}

README.md

@@ -5,7 +5,7 @@
[![Jenkins](https://img.shields.io/jenkins/build?jobUrl=https%3A%2F%2Fjenkins.deepak.science%2Fjob%2Fgitea-physics%2Fjob%2Fdeepdog%2Fjob%2Fmaster&style=flat-square)](https://jenkins.deepak.science/job/gitea-physics/job/deepdog/job/master/)
![Jenkins tests](https://img.shields.io/jenkins/tests?compact_message&jobUrl=https%3A%2F%2Fjenkins.deepak.science%2Fjob%2Fgitea-physics%2Fjob%2Fdeepdog%2Fjob%2Fmaster%2F&style=flat-square)
![Jenkins Coverage](https://img.shields.io/jenkins/coverage/cobertura?jobUrl=https%3A%2F%2Fjenkins.deepak.science%2Fjob%2Fgitea-physics%2Fjob%2Fdeepdog%2Fjob%2Fmaster%2F&style=flat-square)
-![Maintenance](https://img.shields.io/maintenance/yes/2023?style=flat-square)
+![Maintenance](https://img.shields.io/maintenance/yes/2022?style=flat-square)
The DiPole DiaGnostic tool.

deepdog/__init__.py

@@ -1,24 +1,15 @@
import logging
from deepdog.meta import __version__
from deepdog.bayes_run import BayesRun
-from deepdog.bayes_run_simulpairs import BayesRunSimulPairs
-from deepdog.real_spectrum_run import RealSpectrumRun
-from deepdog.temp_aware_real_spectrum_run import TempAwareRealSpectrumRun
-from deepdog.bayes_run_with_ss import BayesRunWithSubspaceSimulation
+from deepdog.alt_bayes_run import AltBayesRun
+from deepdog.diagnostic import Diagnostic
def get_version():
return __version__
-__all__ = [
-"get_version",
-"BayesRun",
-"BayesRunSimulPairs",
-"RealSpectrumRun",
-"TempAwareRealSpectrumRun",
-"BayesRunWithSubspaceSimulation",
-]
+__all__ = ["get_version", "BayesRun", "AltBayesRun", "Diagnostic"]
logging.getLogger(__name__).addHandler(logging.NullHandler())

deepdog/alt_bayes_run.py (new file, 134 lines)

@@ -0,0 +1,134 @@
import pdme.model
import pdme.measurement.oscillating_dipole
import pdme.util.fast_v_calc
from typing import Sequence, Tuple, List
import datetime
import csv
import multiprocessing
import logging
import numpy
# TODO: remove hardcode
CHUNKSIZE = 50
# TODO: It's garbage to have this here duplicated from pdme.
DotInput = Tuple[numpy.typing.ArrayLike, float]
_logger = logging.getLogger(__name__)
def get_a_result(input) -> int:
discretisation, dot_inputs, lows, highs, monte_carlo_count, max_frequency = input
sample_dipoles = discretisation.get_model().get_n_single_dipoles(monte_carlo_count, max_frequency)
vals = pdme.util.fast_v_calc.fast_vs_for_dipoles(dot_inputs, sample_dipoles)
return numpy.count_nonzero(pdme.util.fast_v_calc.between(vals, lows, highs))
class AltBayesRun():
'''
A single Bayes run for a given set of dots.
Parameters
----------
dot_inputs : Sequence[DotInput]
The dot inputs for this bayes run.
discretisations_with_names : Sequence[Tuple(str, pdme.model.Model)]
The models to evaluate.
actual_model_discretisation : pdme.model.Discretisation
The discretisation for the model which is actually correct.
filename_slug : str
The filename slug to include.
run_count: int
The number of runs to do.
'''
def __init__(self, dot_inputs: Sequence[DotInput], discretisations_with_names: Sequence[Tuple[str, pdme.model.Discretisation]], actual_model: pdme.model.Model, filename_slug: str, run_count: int, low_error: float = 0.9, high_error: float = 1.1, monte_carlo_count: int = 10000, monte_carlo_cycles: int = 10, max_frequency: float = 20, end_threshold: float = None, chunksize: int = CHUNKSIZE) -> None:
self.dot_inputs = dot_inputs
self.dot_inputs_array = pdme.measurement.oscillating_dipole.dot_inputs_to_array(dot_inputs)
self.discretisations = [disc for (_, disc) in discretisations_with_names]
self.model_names = [name for (name, _) in discretisations_with_names]
self.actual_model = actual_model
self.model_count = len(self.discretisations)
self.monte_carlo_count = monte_carlo_count
self.monte_carlo_cycles = monte_carlo_cycles
self.run_count = run_count
self.low_error = low_error
self.high_error = high_error
self.csv_fields = ["dipole_moment", "dipole_location", "dipole_frequency"]
self.compensate_zeros = True
self.chunksize = chunksize
for name in self.model_names:
self.csv_fields.extend([f"{name}_success", f"{name}_count", f"{name}_prob"])
self.probabilities = [1 / self.model_count] * self.model_count
timestamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
self.filename = f"{timestamp}-{filename_slug}.altbayes.csv"
self.max_frequency = max_frequency
if end_threshold is not None:
if 0 < end_threshold < 1:
self.end_threshold: float = end_threshold
self.use_end_threshold = True
_logger.info(f"Will abort early, at {self.end_threshold}.")
else:
raise ValueError(f"end_threshold should be between 0 and 1, but is actually {end_threshold}")
def go(self) -> None:
with open(self.filename, "a", newline="") as outfile:
writer = csv.DictWriter(outfile, fieldnames=self.csv_fields, dialect="unix")
writer.writeheader()
for run in range(1, self.run_count + 1):
rng = numpy.random.default_rng()
frequency = rng.uniform(1, self.max_frequency)
# Generate the actual dipoles
actual_dipoles = self.actual_model.get_dipoles(frequency)
dots = actual_dipoles.get_percent_range_dot_measurements(self.dot_inputs, self.low_error, self.high_error)
lows, highs = pdme.measurement.oscillating_dipole.dot_range_measurements_low_high_arrays(dots)
_logger.info(f"Going to work on dipole at {actual_dipoles.dipoles}")
results = []
_logger.debug("Going to iterate over discretisations now")
for disc_count, discretisation in enumerate(self.discretisations):
_logger.debug(f"Doing discretisation #{disc_count}")
with multiprocessing.Pool(multiprocessing.cpu_count() - 1 or 1) as pool:
results.append(sum(
pool.imap_unordered(get_a_result, [(discretisation, self.dot_inputs_array, lows, highs, self.monte_carlo_count, self.max_frequency)] * self.monte_carlo_cycles, self.chunksize)
))
_logger.debug("Done, constructing output now")
row = {
"dipole_moment": actual_dipoles.dipoles[0].p,
"dipole_location": actual_dipoles.dipoles[0].s,
"dipole_frequency": actual_dipoles.dipoles[0].w
}
successes: List[float] = []
counts: List[int] = []
for model_index, (name, result) in enumerate(zip(self.model_names, results)):
row[f"{name}_success"] = result
row[f"{name}_count"] = self.monte_carlo_count * self.monte_carlo_cycles
successes.append(max(result, 0.5))
counts.append(self.monte_carlo_count * self.monte_carlo_cycles)
success_weight = sum([(succ / count) * prob for succ, count, prob in zip(successes, counts, self.probabilities)])
new_probabilities = [(succ / count) * old_prob / success_weight for succ, count, old_prob in zip(successes, counts, self.probabilities)]
self.probabilities = new_probabilities
for name, probability in zip(self.model_names, self.probabilities):
row[f"{name}_prob"] = probability
_logger.info(row)
with open(self.filename, "a", newline="") as outfile:
writer = csv.DictWriter(outfile, fieldnames=self.csv_fields, dialect="unix")
writer.writerow(row)
if self.use_end_threshold:
max_prob = max(self.probabilities)
if max_prob > self.end_threshold:
_logger.info(f"Aborting early, because {max_prob} is greater than {self.end_threshold}")
break
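
The probability update at the bottom of AltBayesRun.go is a discrete Bayes rule over the candidate models: each model's Monte Carlo success rate acts as its likelihood (with zero-success models clamped to 0.5 effective successes so no probability collapses to exactly zero), multiplied by the prior and renormalised. A minimal standalone sketch of just that arithmetic, with invented counts:

# Minimal sketch of the model-probability update used in AltBayesRun.go.
# The success counts below are invented for illustration.

def update_probabilities(successes, counts, priors):
    # Clamp zero-success models to 0.5 "effective successes" so that no
    # model's probability collapses to exactly zero, as in the code above.
    clamped = [max(s, 0.5) for s in successes]
    # Unnormalised posterior: likelihood (success rate) times prior.
    weight = sum((s / c) * p for s, c, p in zip(clamped, counts, priors))
    return [(s / c) * p / weight for s, c, p in zip(clamped, counts, priors)]


if __name__ == "__main__":
    priors = [0.5, 0.5]        # two models, uniform prior
    successes = [120, 30]      # hypothetical Monte Carlo hits per model
    counts = [100000, 100000]  # 10000 samples * 10 cycles each, the defaults above
    print(update_probabilities(successes, counts, priors))
    # The first model's probability rises to 0.8, since it matched 4x as often.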

deepdog/bayes_run.py

@@ -1,19 +1,17 @@
-import pdme.inputs
import pdme.model
-import pdme.measurement.input_types
-import pdme.measurement.oscillating_dipole
-import pdme.util.fast_v_calc
-import pdme.util.fast_nonlocal_spectrum
from typing import Sequence, Tuple, List
import datetime
+import itertools
import csv
-import multiprocessing
import logging
import numpy
+import scipy.optimize
+import multiprocessing
# TODO: remove hardcode
-CHUNKSIZE = 50
+COST_THRESHOLD = 1e-10
# TODO: It's garbage to have this here duplicated from pdme.
DotInput = Tuple[numpy.typing.ArrayLike, float]
@@ -22,126 +20,43 @@ DotInput = Tuple[numpy.typing.ArrayLike, float]
_logger = logging.getLogger(__name__)
-def get_a_result(input) -> int:
-model, dot_inputs, lows, highs, monte_carlo_count, max_frequency, seed = input
-rng = numpy.random.default_rng(seed)
-sample_dipoles = model.get_monte_carlo_dipole_inputs(
-monte_carlo_count, max_frequency, rng_to_use=rng
-)
-vals = pdme.util.fast_v_calc.fast_vs_for_dipoleses(dot_inputs, sample_dipoles)
-return numpy.count_nonzero(pdme.util.fast_v_calc.between(vals, lows, highs))
+def get_a_result(discretisation, dots, index) -> Tuple[Tuple[int, ...], scipy.optimize.OptimizeResult]:
+return (index, discretisation.solve_for_index(dots, index))
-def get_a_result_using_pairs(input) -> int:
-(
-model,
-dot_inputs,
-pair_inputs,
-local_lows,
-local_highs,
-nonlocal_lows,
-nonlocal_highs,
-monte_carlo_count,
-max_frequency,
-) = input
-sample_dipoles = model.get_n_single_dipoles(monte_carlo_count, max_frequency)
-local_vals = pdme.util.fast_v_calc.fast_vs_for_dipoles(dot_inputs, sample_dipoles)
-local_matches = pdme.util.fast_v_calc.between(local_vals, local_lows, local_highs)
-nonlocal_vals = pdme.util.fast_nonlocal_spectrum.fast_s_nonlocal(
-pair_inputs, sample_dipoles
-)
-nonlocal_matches = pdme.util.fast_v_calc.between(
-nonlocal_vals, nonlocal_lows, nonlocal_highs
-)
-combined_matches = numpy.logical_and(local_matches, nonlocal_matches)
-return numpy.count_nonzero(combined_matches)
-class BayesRun:
-"""
+class BayesRun():
+'''
A single Bayes run for a given set of dots.
Parameters
----------
dot_inputs : Sequence[DotInput]
The dot inputs for this bayes run.
-models_with_names : Sequence[Tuple(str, pdme.model.DipoleModel)]
+discretisations_with_names : Sequence[Tuple(str, pdme.model.Model)]
The models to evaluate.
-actual_model : pdme.model.DipoleModel
-The model which is actually correct.
+actual_model_discretisation : pdme.model.Discretisation
+The discretisation for the model which is actually correct.
filename_slug : str
The filename slug to include.
run_count: int
The number of runs to do.
-"""
-def __init__(
-self,
-dot_positions: Sequence[numpy.typing.ArrayLike],
-frequency_range: Sequence[float],
-models_with_names: Sequence[Tuple[str, pdme.model.DipoleModel]],
-actual_model: pdme.model.DipoleModel,
-filename_slug: str,
-run_count: int = 100,
-low_error: float = 0.9,
-high_error: float = 1.1,
-monte_carlo_count: int = 10000,
-monte_carlo_cycles: int = 10,
-target_success: int = 100,
-max_monte_carlo_cycles_steps: int = 10,
-max_frequency: float = 20,
-end_threshold: float = None,
-chunksize: int = CHUNKSIZE,
-) -> None:
-self.dot_inputs = pdme.inputs.inputs_with_frequency_range(
-dot_positions, frequency_range
-)
-self.dot_inputs_array = pdme.measurement.input_types.dot_inputs_to_array(
-self.dot_inputs
-)
-self.models = [model for (_, model) in models_with_names]
-self.model_names = [name for (name, _) in models_with_names]
+'''
+def __init__(self, dot_inputs: Sequence[DotInput], discretisations_with_names: Sequence[Tuple[str, pdme.model.Discretisation]], actual_model: pdme.model.Model, filename_slug: str, run_count: int, max_frequency: float = None, end_threshold: float = None) -> None:
+self.dot_inputs = dot_inputs
+self.discretisations = [disc for (_, disc) in discretisations_with_names]
+self.model_names = [name for (name, _) in discretisations_with_names]
self.actual_model = actual_model
-self.n: int
-try:
-self.n = self.actual_model.n # type: ignore
-except AttributeError:
-self.n = 1
-self.model_count = len(self.models)
-self.monte_carlo_count = monte_carlo_count
-self.monte_carlo_cycles = monte_carlo_cycles
-self.target_success = target_success
-self.max_monte_carlo_cycles_steps = max_monte_carlo_cycles_steps
+self.model_count = len(self.discretisations)
self.run_count = run_count
-self.low_error = low_error
-self.high_error = high_error
-self.csv_fields = []
-for i in range(self.n):
-self.csv_fields.extend(
-[
-f"dipole_moment_{i+1}",
-f"dipole_location_{i+1}",
-f"dipole_frequency_{i+1}",
-]
-)
+self.csv_fields = ["dipole_moment", "dipole_location", "dipole_frequency"]
self.compensate_zeros = True
-self.chunksize = chunksize
for name in self.model_names:
self.csv_fields.extend([f"{name}_success", f"{name}_count", f"{name}_prob"])
self.probabilities = [1 / self.model_count] * self.model_count
timestamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
-self.filename = f"{timestamp}-{filename_slug}.bayesrun.csv"
+self.filename = f"{timestamp}-{filename_slug}.csv"
self.max_frequency = max_frequency
if end_threshold is not None:
@@ -150,9 +65,7 @@ class BayesRun:
self.use_end_threshold = True
_logger.info(f"Will abort early, at {self.end_threshold}.")
else:
-raise ValueError(
-f"end_threshold should be between 0 and 1, but is actually {end_threshold}"
-)
+raise ValueError(f"end_threshold should be between 0 and 1, but is actually {end_threshold}")
def go(self) -> None:
with open(self.filename, "a", newline="") as outfile:
@@ -160,122 +73,56 @@ class BayesRun:
writer.writeheader()
for run in range(1, self.run_count + 1):
-# Generate the actual dipoles
-actual_dipoles = self.actual_model.get_dipoles(self.max_frequency)
-dots = actual_dipoles.get_percent_range_dot_measurements(
-self.dot_inputs, self.low_error, self.high_error
-)
-(
-lows,
-highs,
-) = pdme.measurement.input_types.dot_range_measurements_low_high_arrays(
-dots
-)
-_logger.info(f"Going to work on dipole at {actual_dipoles.dipoles}")
-# define a new seed sequence for each run
-seed_sequence = numpy.random.SeedSequence(run)
+frequency: float = run
+if self.max_frequency is not None and self.max_frequency > 1:
+rng = numpy.random.default_rng()
+frequency = rng.uniform(1, self.max_frequency)
+dipoles = self.actual_model.get_dipoles(frequency)
+dots = dipoles.get_dot_measurements(self.dot_inputs)
+_logger.info(f"Going to work on dipole at {dipoles.dipoles}")
results = []
-_logger.debug("Going to iterate over models now")
-for model_count, model in enumerate(self.models):
-_logger.debug(f"Doing model #{model_count}")
-core_count = multiprocessing.cpu_count() - 1 or 1
-with multiprocessing.Pool(core_count) as pool:
-cycle_count = 0
-cycle_success = 0
-cycles = 0
-while (cycles < self.max_monte_carlo_cycles_steps) and (
-cycle_success <= self.target_success
-):
-_logger.debug(f"Starting cycle {cycles}")
-cycles += 1
-current_success = 0
-cycle_count += self.monte_carlo_count * self.monte_carlo_cycles
-# generate a seed from the sequence for each core.
-# note this needs to be inside the loop for monte carlo cycle steps!
-# that way we get more stuff.
-seeds = seed_sequence.spawn(self.monte_carlo_cycles)
-current_success = sum(
-pool.imap_unordered(
-get_a_result,
-[
-(
-model,
-self.dot_inputs_array,
-lows,
-highs,
-self.monte_carlo_count,
-self.max_frequency,
-seed,
-)
-for seed in seeds
-],
-self.chunksize,
-)
-)
-cycle_success += current_success
-_logger.debug(f"current running successes: {cycle_success}")
-results.append((cycle_count, cycle_success))
+_logger.debug("Going to iterate over discretisations now")
+for disc_count, discretisation in enumerate(self.discretisations):
+_logger.debug(f"Doing discretisation #{disc_count}")
+with multiprocessing.Pool(multiprocessing.cpu_count() - 1 or 1) as pool:
+results.append(pool.starmap(get_a_result, zip(itertools.repeat(discretisation), itertools.repeat(dots), discretisation.all_indices())))
_logger.debug("Done, constructing output now")
row = {
-"dipole_moment_1": actual_dipoles.dipoles[0].p,
-"dipole_location_1": actual_dipoles.dipoles[0].s,
-"dipole_frequency_1": actual_dipoles.dipoles[0].w,
+"dipole_moment": dipoles.dipoles[0].p,
+"dipole_location": dipoles.dipoles[0].s,
+"dipole_frequency": dipoles.dipoles[0].w
}
-for i in range(1, self.n):
-try:
-current_dipoles = actual_dipoles.dipoles[i]
-row[f"dipole_moment_{i+1}"] = current_dipoles.p
-row[f"dipole_location_{i+1}"] = current_dipoles.s
-row[f"dipole_frequency_{i+1}"] = current_dipoles.w
-except IndexError:
-_logger.info(f"Not writing anymore, saw end after {i}")
-break
successes: List[float] = []
counts: List[int] = []
-for model_index, (name, (count, result)) in enumerate(
-zip(self.model_names, results)
-):
-row[f"{name}_success"] = result
+for model_index, (name, result) in enumerate(zip(self.model_names, results)):
+count = 0
+success = 0
+for idx, val in result:
+count += 1
+if val.success and val.cost <= COST_THRESHOLD:
+success += 1
+row[f"{name}_success"] = success
row[f"{name}_count"] = count
-successes.append(max(result, 0.5))
+successes.append(max(success, 0.5))
counts.append(count)
-success_weight = sum(
-[
-(succ / count) * prob
-for succ, count, prob in zip(successes, counts, self.probabilities)
-]
-)
-new_probabilities = [
-(succ / count) * old_prob / success_weight
-for succ, count, old_prob in zip(successes, counts, self.probabilities)
-]
+success_weight = sum([(succ / count) * prob for succ, count, prob in zip(successes, counts, self.probabilities)])
+new_probabilities = [(succ / count) * old_prob / success_weight for succ, count, old_prob in zip(successes, counts, self.probabilities)]
self.probabilities = new_probabilities
for name, probability in zip(self.model_names, self.probabilities):
row[f"{name}_prob"] = probability
_logger.info(row)
with open(self.filename, "a", newline="") as outfile:
-writer = csv.DictWriter(
-outfile, fieldnames=self.csv_fields, dialect="unix"
-)
+writer = csv.DictWriter(outfile, fieldnames=self.csv_fields, dialect="unix")
writer.writerow(row)
if self.use_end_threshold:
max_prob = max(self.probabilities)
if max_prob > self.end_threshold:
-_logger.info(
-f"Aborting early, because {max_prob} is greater than {self.end_threshold}"
-)
+_logger.info(f"Aborting early, because {max_prob} is greater than {self.end_threshold}")
break
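
One notable difference between the two sides of this hunk: the minus side threads explicit seeds through the worker pool, spawning children from a per-run numpy.random.SeedSequence so each Monte Carlo cycle gets an independent, reproducible stream. A self-contained sketch of that seeding pattern (the worker body and counts here are invented for illustration):

import multiprocessing
import numpy
import numpy.random


def count_heads(seed):
    # Each task builds its own generator from its spawned seed, so results
    # are reproducible and the streams are statistically independent.
    rng = numpy.random.default_rng(seed)
    return int(numpy.count_nonzero(rng.uniform(size=1000) < 0.5))


if __name__ == "__main__":
    for run in range(1, 3):
        # A fresh sequence per run, keyed on the run index, as in the code above.
        seed_sequence = numpy.random.SeedSequence(run)
        seeds = seed_sequence.spawn(10)  # one child seed per cycle
        with multiprocessing.Pool(2) as pool:
            total = sum(pool.imap_unordered(count_heads, seeds))
        print(run, total)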

deepdog/bayes_run_simulpairs.py

@@ -1,382 +0,0 @@
import pdme.inputs
import pdme.model
import pdme.measurement.input_types
import pdme.measurement.oscillating_dipole
import pdme.util.fast_v_calc
import pdme.util.fast_nonlocal_spectrum
from typing import Sequence, Tuple, List
import datetime
import csv
import multiprocessing
import logging
import numpy
import numpy.random
# TODO: remove hardcode
CHUNKSIZE = 50
# TODO: It's garbage to have this here duplicated from pdme.
DotInput = Tuple[numpy.typing.ArrayLike, float]
_logger = logging.getLogger(__name__)
def get_a_simul_result_using_pairs(input) -> numpy.ndarray:
(
model,
dot_inputs,
pair_inputs,
local_lows,
local_highs,
nonlocal_lows,
nonlocal_highs,
monte_carlo_count,
monte_carlo_cycles,
max_frequency,
seed,
) = input
rng = numpy.random.default_rng(seed)
local_total = 0
combined_total = 0
sample_dipoles = model.get_monte_carlo_dipole_inputs(
monte_carlo_count, max_frequency, rng_to_use=rng
)
local_vals = pdme.util.fast_v_calc.fast_vs_for_dipoleses(dot_inputs, sample_dipoles)
local_matches = pdme.util.fast_v_calc.between(local_vals, local_lows, local_highs)
nonlocal_vals = pdme.util.fast_nonlocal_spectrum.fast_s_nonlocal_dipoleses(
pair_inputs, sample_dipoles
)
nonlocal_matches = pdme.util.fast_v_calc.between(
nonlocal_vals, nonlocal_lows, nonlocal_highs
)
combined_matches = numpy.logical_and(local_matches, nonlocal_matches)
local_total += numpy.count_nonzero(local_matches)
combined_total += numpy.count_nonzero(combined_matches)
return numpy.array([local_total, combined_total])
class BayesRunSimulPairs:
"""
A dual pairs-nonpairs Bayes run for a given set of dots.
Parameters
----------
dot_inputs : Sequence[DotInput]
The dot inputs for this bayes run.
models_with_names : Sequence[Tuple(str, pdme.model.DipoleModel)]
The models to evaluate.
actual_model : pdme.model.DipoleModel
The model which is actually correct.
filename_slug : str
The filename slug to include.
run_count: int
The number of runs to do.
"""
def __init__(
self,
dot_positions: Sequence[numpy.typing.ArrayLike],
frequency_range: Sequence[float],
models_with_names: Sequence[Tuple[str, pdme.model.DipoleModel]],
actual_model: pdme.model.DipoleModel,
filename_slug: str,
run_count: int = 100,
low_error: float = 0.9,
high_error: float = 1.1,
pairs_high_error=None,
pairs_low_error=None,
monte_carlo_count: int = 10000,
monte_carlo_cycles: int = 10,
target_success: int = 100,
max_monte_carlo_cycles_steps: int = 10,
max_frequency: float = 20,
end_threshold: float = None,
chunksize: int = CHUNKSIZE,
) -> None:
self.dot_inputs = pdme.inputs.inputs_with_frequency_range(
dot_positions, frequency_range
)
self.dot_inputs_array = pdme.measurement.input_types.dot_inputs_to_array(
self.dot_inputs
)
self.dot_pair_inputs = pdme.inputs.input_pairs_with_frequency_range(
dot_positions, frequency_range
)
self.dot_pair_inputs_array = (
pdme.measurement.input_types.dot_pair_inputs_to_array(self.dot_pair_inputs)
)
self.models = [mod for (_, mod) in models_with_names]
self.model_names = [name for (name, _) in models_with_names]
self.actual_model = actual_model
self.n: int
try:
self.n = self.actual_model.n # type: ignore
except AttributeError:
self.n = 1
self.model_count = len(self.models)
self.monte_carlo_count = monte_carlo_count
self.monte_carlo_cycles = monte_carlo_cycles
self.target_success = target_success
self.max_monte_carlo_cycles_steps = max_monte_carlo_cycles_steps
self.run_count = run_count
self.low_error = low_error
self.high_error = high_error
if pairs_low_error is None:
self.pairs_low_error = self.low_error
else:
self.pairs_low_error = pairs_low_error
if pairs_high_error is None:
self.pairs_high_error = self.high_error
else:
self.pairs_high_error = pairs_high_error
self.csv_fields = []
for i in range(self.n):
self.csv_fields.extend(
[
f"dipole_moment_{i+1}",
f"dipole_location_{i+1}",
f"dipole_frequency_{i+1}",
]
)
self.compensate_zeros = True
self.chunksize = chunksize
for name in self.model_names:
self.csv_fields.extend([f"{name}_success", f"{name}_count", f"{name}_prob"])
self.probabilities_no_pairs = [1 / self.model_count] * self.model_count
self.probabilities_pairs = [1 / self.model_count] * self.model_count
timestamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
self.filename_pairs = f"{timestamp}-{filename_slug}.simulpairs.yespairs.csv"
self.filename_no_pairs = f"{timestamp}-{filename_slug}.simulpairs.noopairs.csv"
self.max_frequency = max_frequency
if end_threshold is not None:
if 0 < end_threshold < 1:
self.end_threshold: float = end_threshold
self.use_end_threshold = True
_logger.info(f"Will abort early, at {self.end_threshold}.")
else:
raise ValueError(
f"end_threshold should be between 0 and 1, but is actually {end_threshold}"
)
def go(self) -> None:
with open(self.filename_pairs, "a", newline="") as outfile:
writer = csv.DictWriter(outfile, fieldnames=self.csv_fields, dialect="unix")
writer.writeheader()
with open(self.filename_no_pairs, "a", newline="") as outfile:
writer = csv.DictWriter(outfile, fieldnames=self.csv_fields, dialect="unix")
writer.writeheader()
for run in range(1, self.run_count + 1):
# Generate the actual dipoles
actual_dipoles = self.actual_model.get_dipoles(self.max_frequency)
dots = actual_dipoles.get_percent_range_dot_measurements(
self.dot_inputs, self.low_error, self.high_error
)
(
lows,
highs,
) = pdme.measurement.input_types.dot_range_measurements_low_high_arrays(
dots
)
pair_lows, pair_highs = (None, None)
pair_measurements = actual_dipoles.get_percent_range_dot_pair_measurements(
self.dot_pair_inputs, self.pairs_low_error, self.pairs_high_error
)
(
pair_lows,
pair_highs,
) = pdme.measurement.input_types.dot_range_measurements_low_high_arrays(
pair_measurements
)
_logger.info(f"Going to work on dipole at {actual_dipoles.dipoles}")
# define a new seed sequence for each run
seed_sequence = numpy.random.SeedSequence(run)
results_pairs = []
results_no_pairs = []
_logger.debug("Going to iterate over models now")
for model_count, model in enumerate(self.models):
_logger.debug(f"Doing model #{model_count}")
core_count = multiprocessing.cpu_count() - 1 or 1
with multiprocessing.Pool(core_count) as pool:
cycle_count = 0
cycle_success_pairs = 0
cycle_success_no_pairs = 0
cycles = 0
while (cycles < self.max_monte_carlo_cycles_steps) and (
min(cycle_success_pairs, cycle_success_no_pairs)
<= self.target_success
):
_logger.debug(f"Starting cycle {cycles}")
cycles += 1
current_success_pairs = 0
current_success_no_pairs = 0
cycle_count += self.monte_carlo_count * self.monte_carlo_cycles
# generate a seed from the sequence for each core.
# note this needs to be inside the loop for monte carlo cycle steps!
# that way we get more stuff.
seeds = seed_sequence.spawn(self.monte_carlo_cycles)
_logger.debug(f"Creating {self.monte_carlo_cycles} seeds")
current_success_both = numpy.array(
sum(
pool.imap_unordered(
get_a_simul_result_using_pairs,
[
(
model,
self.dot_inputs_array,
self.dot_pair_inputs_array,
lows,
highs,
pair_lows,
pair_highs,
self.monte_carlo_count,
self.monte_carlo_cycles,
self.max_frequency,
seed,
)
for seed in seeds
],
self.chunksize,
)
)
)
current_success_no_pairs = current_success_both[0]
current_success_pairs = current_success_both[1]
cycle_success_no_pairs += current_success_no_pairs
cycle_success_pairs += current_success_pairs
_logger.debug(
f"(pair, no_pair) successes are {(cycle_success_pairs, cycle_success_no_pairs)}"
)
results_pairs.append((cycle_count, cycle_success_pairs))
results_no_pairs.append((cycle_count, cycle_success_no_pairs))
_logger.debug("Done, constructing output now")
row_pairs = {
"dipole_moment_1": actual_dipoles.dipoles[0].p,
"dipole_location_1": actual_dipoles.dipoles[0].s,
"dipole_frequency_1": actual_dipoles.dipoles[0].w,
}
row_no_pairs = {
"dipole_moment_1": actual_dipoles.dipoles[0].p,
"dipole_location_1": actual_dipoles.dipoles[0].s,
"dipole_frequency_1": actual_dipoles.dipoles[0].w,
}
for i in range(1, self.n):
try:
current_dipoles = actual_dipoles.dipoles[i]
row_pairs[f"dipole_moment_{i+1}"] = current_dipoles.p
row_pairs[f"dipole_location_{i+1}"] = current_dipoles.s
row_pairs[f"dipole_frequency_{i+1}"] = current_dipoles.w
row_no_pairs[f"dipole_moment_{i+1}"] = current_dipoles.p
row_no_pairs[f"dipole_location_{i+1}"] = current_dipoles.s
row_no_pairs[f"dipole_frequency_{i+1}"] = current_dipoles.w
except IndexError:
_logger.info(f"Not writing anymore, saw end after {i}")
break
successes_pairs: List[float] = []
successes_no_pairs: List[float] = []
counts: List[int] = []
for model_index, (
name,
(count_pair, result_pair),
(count_no_pair, result_no_pair),
) in enumerate(zip(self.model_names, results_pairs, results_no_pairs)):
row_pairs[f"{name}_success"] = result_pair
row_pairs[f"{name}_count"] = count_pair
successes_pairs.append(max(result_pair, 0.5))
row_no_pairs[f"{name}_success"] = result_no_pair
row_no_pairs[f"{name}_count"] = count_no_pair
successes_no_pairs.append(max(result_no_pair, 0.5))
counts.append(count_pair)
success_weight_pair = sum(
[
(succ / count) * prob
for succ, count, prob in zip(
successes_pairs, counts, self.probabilities_pairs
)
]
)
success_weight_no_pair = sum(
[
(succ / count) * prob
for succ, count, prob in zip(
successes_no_pairs, counts, self.probabilities_no_pairs
)
]
)
new_probabilities_pair = [
(succ / count) * old_prob / success_weight_pair
for succ, count, old_prob in zip(
successes_pairs, counts, self.probabilities_pairs
)
]
new_probabilities_no_pair = [
(succ / count) * old_prob / success_weight_no_pair
for succ, count, old_prob in zip(
successes_no_pairs, counts, self.probabilities_no_pairs
)
]
self.probabilities_pairs = new_probabilities_pair
self.probabilities_no_pairs = new_probabilities_no_pair
for name, probability_pair, probability_no_pair in zip(
self.model_names, self.probabilities_pairs, self.probabilities_no_pairs
):
row_pairs[f"{name}_prob"] = probability_pair
row_no_pairs[f"{name}_prob"] = probability_no_pair
_logger.debug(row_pairs)
_logger.debug(row_no_pairs)
with open(self.filename_pairs, "a", newline="") as outfile:
writer = csv.DictWriter(
outfile, fieldnames=self.csv_fields, dialect="unix"
)
writer.writerow(row_pairs)
with open(self.filename_no_pairs, "a", newline="") as outfile:
writer = csv.DictWriter(
outfile, fieldnames=self.csv_fields, dialect="unix"
)
writer.writerow(row_no_pairs)
if self.use_end_threshold:
max_prob = min(
max(self.probabilities_pairs), max(self.probabilities_no_pairs)
)
if max_prob > self.end_threshold:
_logger.info(
f"Aborting early, because {max_prob} is greater than {self.end_threshold}"
)
break
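
The worker get_a_simul_result_using_pairs above scores the same sample batch twice, counting a sample toward the pairs total only if it lies inside both the local and the nonlocal (pair) bands via numpy.logical_and. A toy version of that combination step, with fabricated values and bands standing in for pdme's computed spectra and its between() check:

import numpy

# Fabricated per-sample spectra: rows are samples, columns are measurements.
local_vals = numpy.array([[1.0, 2.0], [5.0, 2.0], [1.5, 1.5]])
nonlocal_vals = numpy.array([[0.2], [0.3], [0.9]])

# Acceptance bands, analogous to the lows/highs arrays in the code above.
local_ok = numpy.all((local_vals > 0.5) & (local_vals < 3.0), axis=1)
nonlocal_ok = numpy.all((nonlocal_vals > 0.1) & (nonlocal_vals < 0.5), axis=1)

# A sample counts as a "pairs" success only if it passes both filters.
combined = numpy.logical_and(local_ok, nonlocal_ok)
print(numpy.count_nonzero(local_ok), numpy.count_nonzero(combined))  # 2 1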

deepdog/bayes_run_with_ss.py

@@ -1,261 +0,0 @@
import deepdog.subset_simulation
import pdme.inputs
import pdme.model
import pdme.measurement.input_types
import pdme.measurement.oscillating_dipole
import pdme.util.fast_v_calc
import pdme.util.fast_nonlocal_spectrum
from typing import Sequence, Tuple, List, Optional
import datetime
import csv
import logging
import numpy
import numpy.typing
# TODO: remove hardcode
CHUNKSIZE = 50
# TODO: It's garbage to have this here duplicated from pdme.
DotInput = Tuple[numpy.typing.ArrayLike, float]
CLAMPING_FACTOR = 10
_logger = logging.getLogger(__name__)
class BayesRunWithSubspaceSimulation:
"""
A single Bayes run for a given set of dots.
Parameters
----------
dot_inputs : Sequence[DotInput]
The dot inputs for this bayes run.
models_with_names : Sequence[Tuple(str, pdme.model.DipoleModel)]
The models to evaluate.
actual_model : pdme.model.DipoleModel
The model which is actually correct.
filename_slug : str
The filename slug to include.
run_count: int
The number of runs to do.
"""
def __init__(
self,
dot_positions: Sequence[numpy.typing.ArrayLike],
frequency_range: Sequence[float],
models_with_names: Sequence[Tuple[str, pdme.model.DipoleModel]],
actual_model: pdme.model.DipoleModel,
filename_slug: str,
max_frequency: float = 20,
end_threshold: float = None,
run_count=100,
chunksize: int = CHUNKSIZE,
ss_n_c: int = 500,
ss_n_s: int = 100,
ss_m_max: int = 15,
ss_target_cost: Optional[float] = None,
ss_level_0_seed: int = 200,
ss_mcmc_seed: int = 20,
ss_use_adaptive_steps=True,
ss_default_phi_step=0.01,
ss_default_theta_step=0.01,
ss_default_r_step=0.01,
ss_default_w_log_step=0.01,
ss_default_upper_w_log_step=4,
ss_dump_last_generation=False,
ss_initial_costs_chunk_size=100,
write_output_to_bayesruncsv=True,
use_timestamp_for_output=True,
) -> None:
self.dot_inputs = pdme.inputs.inputs_with_frequency_range(
dot_positions, frequency_range
)
self.dot_inputs_array = pdme.measurement.input_types.dot_inputs_to_array(
self.dot_inputs
)
self.models_with_names = models_with_names
self.models = [model for (_, model) in models_with_names]
self.model_names = [name for (name, _) in models_with_names]
self.actual_model = actual_model
self.n: int
try:
self.n = self.actual_model.n # type: ignore
except AttributeError:
self.n = 1
self.model_count = len(self.models)
self.csv_fields = []
for i in range(self.n):
self.csv_fields.extend(
[
f"dipole_moment_{i+1}",
f"dipole_location_{i+1}",
f"dipole_frequency_{i+1}",
]
)
self.compensate_zeros = True
self.chunksize = chunksize
for name in self.model_names:
self.csv_fields.extend([f"{name}_likelihood", f"{name}_prob"])
self.probabilities = [1 / self.model_count] * self.model_count
if use_timestamp_for_output:
timestamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
self.filename = f"{timestamp}-{filename_slug}.bayesrunwithss.csv"
else:
self.filename = f"{filename_slug}.bayesrunwithss.csv"
self.max_frequency = max_frequency
if end_threshold is not None:
if 0 < end_threshold < 1:
self.end_threshold: float = end_threshold
self.use_end_threshold = True
_logger.info(f"Will abort early, at {self.end_threshold}.")
else:
raise ValueError(
f"end_threshold should be between 0 and 1, but is actually {end_threshold}"
)
self.ss_n_c = ss_n_c
self.ss_n_s = ss_n_s
self.ss_m_max = ss_m_max
self.ss_target_cost = ss_target_cost
self.ss_level_0_seed = ss_level_0_seed
self.ss_mcmc_seed = ss_mcmc_seed
self.ss_use_adaptive_steps = ss_use_adaptive_steps
self.ss_default_phi_step = ss_default_phi_step
self.ss_default_theta_step = ss_default_theta_step
self.ss_default_r_step = ss_default_r_step
self.ss_default_w_log_step = ss_default_w_log_step
self.ss_default_upper_w_log_step = ss_default_upper_w_log_step
self.ss_dump_last_generation = ss_dump_last_generation
self.ss_initial_costs_chunk_size = ss_initial_costs_chunk_size
self.run_count = run_count
self.write_output_to_csv = write_output_to_bayesruncsv
def go(self) -> Sequence:
if self.write_output_to_csv:
with open(self.filename, "a", newline="") as outfile:
writer = csv.DictWriter(
outfile, fieldnames=self.csv_fields, dialect="unix"
)
writer.writeheader()
return_result = []
for run in range(1, self.run_count + 1):
# Generate the actual dipoles
actual_dipoles = self.actual_model.get_dipoles(self.max_frequency)
measurements = actual_dipoles.get_dot_measurements(self.dot_inputs)
_logger.info(f"Going to work on dipole at {actual_dipoles.dipoles}")
# define a new seed sequence for each run
results = []
_logger.debug("Going to iterate over models now")
for model_count, model in enumerate(self.models_with_names):
_logger.debug(f"Doing model #{model_count}, {model[0]}")
subset_run = deepdog.subset_simulation.SubsetSimulation(
model,
self.dot_inputs,
measurements,
self.ss_n_c,
self.ss_n_s,
self.ss_m_max,
self.ss_target_cost,
self.ss_level_0_seed,
self.ss_mcmc_seed,
self.ss_use_adaptive_steps,
self.ss_default_phi_step,
self.ss_default_theta_step,
self.ss_default_r_step,
self.ss_default_w_log_step,
self.ss_default_upper_w_log_step,
initial_cost_chunk_size=self.ss_initial_costs_chunk_size,
keep_probs_list=False,
dump_last_generation_to_file=self.ss_dump_last_generation,
)
results.append(subset_run.execute())
_logger.debug("Done, constructing output now")
row = {
"dipole_moment_1": actual_dipoles.dipoles[0].p,
"dipole_location_1": actual_dipoles.dipoles[0].s,
"dipole_frequency_1": actual_dipoles.dipoles[0].w,
}
for i in range(1, self.n):
try:
current_dipoles = actual_dipoles.dipoles[i]
row[f"dipole_moment_{i+1}"] = current_dipoles.p
row[f"dipole_location_{i+1}"] = current_dipoles.s
row[f"dipole_frequency_{i+1}"] = current_dipoles.w
except IndexError:
_logger.info(f"Not writing anymore, saw end after {i}")
break
likelihoods: List[float] = []
for (name, result) in zip(self.model_names, results):
if result.over_target_likelihood is None:
if result.lowest_likelihood is None:
_logger.error(f"result {result} looks bad")
clamped_likelihood = 10**-15
else:
clamped_likelihood = result.lowest_likelihood / CLAMPING_FACTOR
_logger.warning(
f"got a none result, clamping to {clamped_likelihood}"
)
else:
clamped_likelihood = result.over_target_likelihood
likelihoods.append(clamped_likelihood)
row[f"{name}_likelihood"] = clamped_likelihood
success_weight = sum(
[
likelihood * prob
for likelihood, prob in zip(likelihoods, self.probabilities)
]
)
new_probabilities = [
likelihood * old_prob / success_weight
for likelihood, old_prob in zip(likelihoods, self.probabilities)
]
self.probabilities = new_probabilities
for name, probability in zip(self.model_names, self.probabilities):
row[f"{name}_prob"] = probability
_logger.info(row)
return_result.append(row)
if self.write_output_to_csv:
with open(self.filename, "a", newline="") as outfile:
writer = csv.DictWriter(
outfile, fieldnames=self.csv_fields, dialect="unix"
)
writer.writerow(row)
if self.use_end_threshold:
max_prob = max(self.probabilities)
if max_prob > self.end_threshold:
_logger.info(
f"Aborting early, because {max_prob} is greater than {self.end_threshold}"
)
break
return return_result
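
Worth noting in the likelihood bookkeeping above: a subset-simulation result with no over-target likelihood is clamped rather than zeroed, falling back to the lowest observed likelihood divided by CLAMPING_FACTOR, or to 10**-15 if nothing usable came back. That branch in isolation, using a hypothetical stand-in for the result object:

from dataclasses import dataclass
from typing import Optional

CLAMPING_FACTOR = 10


@dataclass
class SSResult:
    # Hypothetical stand-in for the subset-simulation result object above.
    over_target_likelihood: Optional[float]
    lowest_likelihood: Optional[float]


def clamped_likelihood(result: SSResult) -> float:
    if result.over_target_likelihood is not None:
        return result.over_target_likelihood
    if result.lowest_likelihood is None:
        return 10**-15  # nothing usable came back; keep the model barely alive
    return result.lowest_likelihood / CLAMPING_FACTOR


print(clamped_likelihood(SSResult(0.02, 0.001)))  # 0.02
print(clamped_likelihood(SSResult(None, 0.001)))  # 0.0001
print(clamped_likelihood(SSResult(None, None)))   # 1e-15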

deepdog/diagnostic.py (new file, 99 lines)

@@ -0,0 +1,99 @@
from pdme.measurement import OscillatingDipole, OscillatingDipoleArrangement
import pdme
from deepdog.bayes_run import DotInput
import datetime
import numpy
from dataclasses import dataclass
import logging
from typing import Sequence, Tuple
import csv
import itertools
import multiprocessing
_logger = logging.getLogger(__name__)
def get_a_result(discretisation, dots, index):
return (index, discretisation.solve_for_index(dots, index))
@dataclass
class SingleDipoleDiagnostic():
model: str
index: Tuple
bounds: Tuple
actual_dipole: OscillatingDipole
result_dipole: OscillatingDipole
success: bool
def __post_init__(self) -> None:
self.p_actual_x = self.actual_dipole.p[0]
self.p_actual_y = self.actual_dipole.p[1]
self.p_actual_z = self.actual_dipole.p[2]
self.s_actual_x = self.actual_dipole.s[0]
self.s_actual_y = self.actual_dipole.s[1]
self.s_actual_z = self.actual_dipole.s[2]
self.p_result_x = self.result_dipole.p[0]
self.p_result_y = self.result_dipole.p[1]
self.p_result_z = self.result_dipole.p[2]
self.s_result_x = self.result_dipole.s[0]
self.s_result_y = self.result_dipole.s[1]
self.s_result_z = self.result_dipole.s[2]
self.w_actual = self.actual_dipole.w
self.w_result = self.result_dipole.w
class Diagnostic():
'''
Represents a diagnostic for a single dipole moment given a set of discretisations.
Parameters
----------
dot_inputs : Sequence[DotInput]
The dot inputs for this diagnostic.
discretisations_with_names : Sequence[Tuple(str, pdme.model.Model)]
The models to evaluate.
actual_model_discretisation : pdme.model.Discretisation
The discretisation for the model which is actually correct.
filename_slug : str
The filename slug to include.
run_count: int
The number of runs to do.
'''
def __init__(self, actual_dipole_moment: numpy.ndarray, actual_dipole_position: numpy.ndarray, actual_dipole_frequency: float, dot_inputs: Sequence[DotInput], discretisations_with_names: Sequence[Tuple[str, pdme.model.Discretisation]], filename_slug: str) -> None:
self.dipoles = OscillatingDipoleArrangement([OscillatingDipole(actual_dipole_moment, actual_dipole_position, actual_dipole_frequency)])
self.dots = self.dipoles.get_dot_measurements(dot_inputs)
self.discretisations_with_names = discretisations_with_names
self.model_count = len(self.discretisations_with_names)
self.csv_fields = ["model", "index", "bounds", "p_actual_x", "p_actual_y", "p_actual_z", "s_actual_x", "s_actual_y", "s_actual_z", "w_actual", "success", "p_result_x", "p_result_y", "p_result_z", "s_result_x", "s_result_y", "s_result_z", "w_result"]
timestamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
self.filename = f"{timestamp}-{filename_slug}.diag.csv"
def go(self):
with open(self.filename, "a", newline="") as outfile:
# csv fields
writer = csv.DictWriter(outfile, fieldnames=self.csv_fields, dialect='unix')
writer.writeheader()
for (name, discretisation) in self.discretisations_with_names:
_logger.info(f"Working on discretisation {name}")
results = []
with multiprocessing.Pool(multiprocessing.cpu_count() - 1 or 1) as pool:
results = pool.starmap(get_a_result, zip(itertools.repeat(discretisation), itertools.repeat(self.dots), discretisation.all_indices()))
with open(self.filename, "a", newline='') as outfile:
writer = csv.DictWriter(outfile, fieldnames=self.csv_fields, dialect='unix', extrasaction="ignore")
for idx, result in results:
bounds = discretisation.bounds(idx)
actual_success = result.success and result.cost <= 1e-10
diag_row = SingleDipoleDiagnostic(name, idx, bounds, self.dipoles.dipoles[0], discretisation.model.solution_as_dipoles(result.normalised_x)[0], actual_success)
row = vars(diag_row)
_logger.debug(f"Writing result {row}")
writer.writerow(row)
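
The CSV plumbing here relies on two small tricks: __post_init__ flattens the dipole vectors into scalar attributes, and vars() dumps the whole instance into a csv.DictWriter constructed with extrasaction="ignore", which silently drops the structured fields that are not in the header. The same pattern in miniature, with a made-up record type:

import csv
import io
from dataclasses import dataclass


@dataclass
class PointRecord:
    name: str
    point: tuple  # raw structured field, not written to CSV directly

    def __post_init__(self) -> None:
        # Flatten the structured field into scalar columns for CSV output.
        self.x, self.y = self.point


out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["name", "x", "y"], extrasaction="ignore")
writer.writeheader()
# vars() includes "point", but extrasaction="ignore" silently drops it.
writer.writerow(vars(PointRecord("a", (1.0, 2.0))))
print(out.getvalue())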

deepdog/direct_monte_carlo/__init__.py

@@ -1,6 +0,0 @@
from deepdog.direct_monte_carlo.direct_mc import (
DirectMonteCarloRun,
DirectMonteCarloConfig,
)
__all__ = ["DirectMonteCarloRun", "DirectMonteCarloConfig"]

deepdog/direct_monte_carlo/direct_mc.py

@@ -1,157 +0,0 @@
import pdme.model
import pdme.measurement
import pdme.measurement.input_types
import pdme.subspace_simulation
from typing import Tuple, Sequence
from dataclasses import dataclass
import logging
import numpy
import numpy.random
import pdme.util.fast_v_calc
_logger = logging.getLogger(__name__)
@dataclass
class DirectMonteCarloResult:
successes: int
monte_carlo_count: int
likelihood: float
@dataclass
class DirectMonteCarloConfig:
monte_carlo_count_per_cycle: int = 10000
monte_carlo_cycles: int = 10
target_success: int = 100
max_monte_carlo_cycles_steps: int = 10
monte_carlo_seed: int = 1234
write_successes_to_file: bool = False
tag: str = ""
class DirectMonteCarloRun:
"""
A single model Direct Monte Carlo run, currently implemented only using single threading.
An encapsulation of the steps needed for a Bayes run.
Parameters
----------
model_name_pair : Sequence[Tuple(str, pdme.model.DipoleModel)]
The model to evaluate, with name.
measurements: Sequence[pdme.measurement.DotRangeMeasurement]
The measurements as dot ranges to use as the bounds for the Monte Carlo calculation.
monte_carlo_count_per_cycle: int
The number of Monte Carlo iterations to use in a single cycle calculation.
monte_carlo_cycles: int
The number of cycles to use in each step.
Increasing monte_carlo_count_per_cycle increases memory usage (and runtime), while this increases runtime, allowing
control over memory use.
target_success: int
The number of successes to target before exiting early.
Should likely be ~100 but can go higher too.
max_monte_carlo_cycles_steps: int
The number of steps to use. Each step consists of monte_carlo_cycles cycles, each of which has monte_carlo_count_per_cycle iterations.
monte_carlo_seed: int
The seed to use for the RNG.
"""
def __init__(
self,
model_name_pair: Tuple[str, pdme.model.DipoleModel],
measurements: Sequence[pdme.measurement.DotRangeMeasurement],
config: DirectMonteCarloConfig,
):
self.model_name, self.model = model_name_pair
self.measurements = measurements
self.dot_inputs = [(measure.r, measure.f) for measure in self.measurements]
self.dot_inputs_array = pdme.measurement.input_types.dot_inputs_to_array(
self.dot_inputs
)
self.config = config
(
self.lows,
self.highs,
) = pdme.measurement.input_types.dot_range_measurements_low_high_arrays(
self.measurements
)
def _single_run(self, seed) -> numpy.ndarray:
rng = numpy.random.default_rng(seed)
sample_dipoles = self.model.get_monte_carlo_dipole_inputs(
self.config.monte_carlo_count_per_cycle, -1, rng
)
current_sample = sample_dipoles
for di, low, high in zip(self.dot_inputs_array, self.lows, self.highs):
if len(current_sample) < 1:
break
vals = pdme.util.fast_v_calc.fast_vs_for_dipoleses(
numpy.array([di]), current_sample
)
current_sample = current_sample[
numpy.all((vals > low) & (vals < high), axis=1)
]
return current_sample
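# Illustrative sketch (editor's addition, not in the original file): the loop
# above is rejection filtering. For each dot input it keeps only the samples
# whose computed values fall inside the measured [low, high] band, e.g.:
#     vals = numpy.array([[1.0], [4.0], [2.5]])
#     mask = numpy.all((vals > 0.9) & (vals < 3.5), axis=1)
#     current_sample = current_sample[mask]  # drops the sample with value 4.0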
def execute(self) -> DirectMonteCarloResult:
step_count = 0
total_success = 0
total_count = 0
count_per_step = (
self.config.monte_carlo_count_per_cycle * self.config.monte_carlo_cycles
)
seed_sequence = numpy.random.SeedSequence(self.config.monte_carlo_seed)
while (step_count < self.config.max_monte_carlo_cycles_steps) and (
total_success < self.config.target_success
):
_logger.debug(f"Executing step {step_count}")
for cycle_i, seed in enumerate(
seed_sequence.spawn(self.config.monte_carlo_cycles)
):
cycle_success_configs = self._single_run(seed)
cycle_success_count = len(cycle_success_configs)
if cycle_success_count > 0:
_logger.debug(
f"For cycle {cycle_i} received {cycle_success_count} successes"
)
_logger.debug(cycle_success_configs)
if self.config.write_successes_to_file:
sorted_by_freq = numpy.array(
[
pdme.subspace_simulation.sort_array_of_dipoles_by_frequency(
dipole_config
)
for dipole_config in cycle_success_configs
]
)
dipole_count = numpy.array(cycle_success_configs).shape[1]
for n in range(dipole_count):
numpy.savetxt(
f"{self.config.tag}_{step_count}_{cycle_i}_dipole_{n}.csv",
sorted_by_freq[:, n],
delimiter=",",
)
total_success += cycle_success_count
_logger.debug(f"At end of step {step_count} have {total_success} successes")
step_count += 1
total_count += count_per_step
return DirectMonteCarloResult(
successes=total_success,
monte_carlo_count=total_count,
likelihood=total_success / total_count,
)
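
For reference, a minimal usage sketch of the DirectMonteCarloRun API removed above. The model construction and measurement loading are elided; make_model and load_measurements are hypothetical placeholders, not real pdme functions:

from deepdog.direct_monte_carlo.direct_mc import (
    DirectMonteCarloConfig,
    DirectMonteCarloRun,
)

model = make_model()  # placeholder: any pdme.model.DipoleModel
measurements = load_measurements()  # placeholder: Sequence[DotRangeMeasurement]

config = DirectMonteCarloConfig(
    monte_carlo_count_per_cycle=10_000,
    monte_carlo_cycles=10,
    target_success=100,
    monte_carlo_seed=1234,
)
run = DirectMonteCarloRun(("my-model", model), measurements, config)
result = run.execute()
print(result.successes, result.monte_carlo_count, result.likelihood)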


@@ -1,3 +1,3 @@
 from importlib.metadata import version
-__version__ = version("deepdog")
+__version__ = version('deepdog')


@@ -1,395 +0,0 @@
import pdme.inputs
import pdme.model
import pdme.measurement
import pdme.measurement.input_types
import pdme.measurement.oscillating_dipole
import pdme.util.fast_v_calc
import pdme.util.fast_nonlocal_spectrum
from typing import Sequence, Tuple, List, Dict, Union, Optional
import datetime
import csv
import multiprocessing
import logging
import numpy
# TODO: remove hardcode
CHUNKSIZE = 50
_logger = logging.getLogger(__name__)
def get_a_result_fast_filter_pairs(input) -> int:
(
model,
dot_inputs,
lows,
highs,
pair_inputs,
pair_lows,
pair_highs,
monte_carlo_count,
seed,
) = input
rng = numpy.random.default_rng(seed)
# TODO: A long term refactor is to pull the frequency stuff out from here. The None stands for max_frequency, which is unneeded in the actually useful models.
sample_dipoles = model.get_monte_carlo_dipole_inputs(
monte_carlo_count, None, rng_to_use=rng
)
current_sample = sample_dipoles
for di, low, high in zip(dot_inputs, lows, highs):
if len(current_sample) < 1:
break
vals = pdme.util.fast_v_calc.fast_vs_for_dipoleses(
numpy.array([di]), current_sample
)
current_sample = current_sample[numpy.all((vals > low) & (vals < high), axis=1)]
for pi, plow, phigh in zip(pair_inputs, pair_lows, pair_highs):
if len(current_sample) < 1:
break
vals = pdme.util.fast_nonlocal_spectrum.fast_s_nonlocal_dipoleses(
numpy.array([pi]), current_sample
)
current_sample = current_sample[
numpy.all(
((vals > plow) & (vals < phigh)) | ((vals < plow) & (vals > phigh)),
axis=1,
)
]
return len(current_sample)
def get_a_result_fast_filter_pair_phase_only(input) -> int:
(
model,
pair_inputs,
pair_phase_lows,
pair_phase_highs,
monte_carlo_count,
seed,
) = input
rng = numpy.random.default_rng(seed)
# TODO: A long term refactor is to pull the frequency stuff out from here. The None stands for max_frequency, which is unneeded in the actually useful models.
sample_dipoles = model.get_monte_carlo_dipole_inputs(
monte_carlo_count, None, rng_to_use=rng
)
current_sample = sample_dipoles
for pi, plow, phigh in zip(pair_inputs, pair_phase_lows, pair_phase_highs):
if len(current_sample) < 1:
break
vals = pdme.util.fast_nonlocal_spectrum.signarg(
pdme.util.fast_nonlocal_spectrum.fast_s_nonlocal_dipoleses(
numpy.array([pi]), current_sample
)
)
current_sample = current_sample[
numpy.all(
((vals > plow) & (vals < phigh)) | ((vals < plow) & (vals > phigh)),
axis=1,
)
]
return len(current_sample)
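# Editor's note on the acceptance test above: the phase bounds need not be
# ordered, so values are accepted when they lie between plow and phigh in
# either direction. E.g. with plow=0.5 and phigh=-0.5, a val of 0.0 passes
# via the second clause ((vals < plow) & (vals > phigh)).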
def get_a_result_fast_filter(input) -> int:
model, dot_inputs, lows, highs, monte_carlo_count, seed = input
rng = numpy.random.default_rng(seed)
# TODO: A long term refactor is to pull the frequency stuff out from here. The None stands for max_frequency, which is unneeded in the actually useful models.
sample_dipoles = model.get_monte_carlo_dipole_inputs(
monte_carlo_count, None, rng_to_use=rng
)
current_sample = sample_dipoles
for di, low, high in zip(dot_inputs, lows, highs):
if len(current_sample) < 1:
break
vals = pdme.util.fast_v_calc.fast_vs_for_dipoleses(
numpy.array([di]), current_sample
)
current_sample = current_sample[numpy.all((vals > low) & (vals < high), axis=1)]
return len(current_sample)
class RealSpectrumRun:
"""
A bayes run given some real data.
Parameters
----------
measurements : Sequence[pdme.measurement.DotRangeMeasurement]
The dot range measurements for this bayes run.
models_with_names : Sequence[Tuple[str, pdme.model.DipoleModel]]
The models to evaluate, with their names.
filename_slug : str
The filename slug to include.
If pair_measurements is not None, uses the pair measurement method (and single measurements too).
If pair_phase_measurements is not None, ignores measurements and uses phase measurements _only_.
This is lazy design on my part.
"""
def __init__(
self,
measurements: Sequence[pdme.measurement.DotRangeMeasurement],
models_with_names: Sequence[Tuple[str, pdme.model.DipoleModel]],
filename_slug: str,
monte_carlo_count: int = 10000,
monte_carlo_cycles: int = 10,
target_success: int = 100,
max_monte_carlo_cycles_steps: int = 10,
chunksize: int = CHUNKSIZE,
initial_seed: int = 12345,
cap_core_count: int = 0,
pair_measurements: Optional[
Sequence[pdme.measurement.DotPairRangeMeasurement]
] = None,
pair_phase_measurements: Optional[
Sequence[pdme.measurement.DotPairRangeMeasurement]
] = None,
) -> None:
self.measurements = measurements
self.dot_inputs = [(measure.r, measure.f) for measure in self.measurements]
self.dot_inputs_array = pdme.measurement.input_types.dot_inputs_to_array(
self.dot_inputs
)
if pair_measurements is not None:
self.pair_measurements = pair_measurements
self.use_pair_measurements = True
self.use_pair_phase_measurements = False
self.dot_pair_inputs = [
(measure.r1, measure.r2, measure.f)
for measure in self.pair_measurements
]
self.dot_pair_inputs_array = (
pdme.measurement.input_types.dot_pair_inputs_to_array(
self.dot_pair_inputs
)
)
elif pair_phase_measurements is not None:
self.use_pair_measurements = False
self.use_pair_phase_measurements = True
self.pair_phase_measurements = pair_phase_measurements
self.dot_pair_inputs = [
(measure.r1, measure.r2, measure.f)
for measure in self.pair_phase_measurements
]
self.dot_pair_inputs_array = (
pdme.measurement.input_types.dot_pair_inputs_to_array(
self.dot_pair_inputs
)
)
else:
self.use_pair_measurements = False
self.use_pair_phase_measurements = False
self.models = [model for (_, model) in models_with_names]
self.model_names = [name for (name, _) in models_with_names]
self.model_count = len(self.models)
self.monte_carlo_count = monte_carlo_count
self.monte_carlo_cycles = monte_carlo_cycles
self.target_success = target_success
self.max_monte_carlo_cycles_steps = max_monte_carlo_cycles_steps
self.csv_fields = []
self.compensate_zeros = True
self.chunksize = chunksize
for name in self.model_names:
self.csv_fields.extend([f"{name}_success", f"{name}_count", f"{name}_prob"])
# for now initialise priors as uniform.
self.probabilities = [1 / self.model_count] * self.model_count
timestamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
ff_string = "fast_filter"
self.filename = f"{timestamp}-{filename_slug}.realdata.{ff_string}.bayesrun.csv"
self.initial_seed = initial_seed
self.cap_core_count = cap_core_count
def go(self) -> None:
with open(self.filename, "a", newline="") as outfile:
writer = csv.DictWriter(outfile, fieldnames=self.csv_fields, dialect="unix")
writer.writeheader()
(
lows,
highs,
) = pdme.measurement.input_types.dot_range_measurements_low_high_arrays(
self.measurements
)
pair_lows = None
pair_highs = None
if self.use_pair_measurements:
(
pair_lows,
pair_highs,
) = pdme.measurement.input_types.dot_range_measurements_low_high_arrays(
self.pair_measurements
)
pair_phase_lows = None
pair_phase_highs = None
if self.use_pair_phase_measurements:
(
pair_phase_lows,
pair_phase_highs,
) = pdme.measurement.input_types.dot_range_measurements_low_high_arrays(
self.pair_phase_measurements
)
# define a new seed sequence for each run
seed_sequence = numpy.random.SeedSequence(self.initial_seed)
results = []
_logger.debug("Going to iterate over models now")
core_count = multiprocessing.cpu_count() - 1 or 1
if (self.cap_core_count >= 1) and (self.cap_core_count < core_count):
core_count = self.cap_core_count
_logger.info(f"Using {core_count} cores")
for model_count, (model, model_name) in enumerate(
zip(self.models, self.model_names)
):
_logger.debug(f"Doing model #{model_count}: {model_name}")
with multiprocessing.Pool(core_count) as pool:
cycle_count = 0
cycle_success = 0
cycles = 0
while (cycles < self.max_monte_carlo_cycles_steps) and (
cycle_success <= self.target_success
):
_logger.debug(f"Starting cycle {cycles}")
cycles += 1
current_success = 0
cycle_count += self.monte_carlo_count * self.monte_carlo_cycles
# generate a seed from the sequence for each core.
# note this needs to be inside the loop for monte carlo cycle steps!
# that way we get more stuff.
seeds = seed_sequence.spawn(self.monte_carlo_cycles)
if self.use_pair_measurements:
current_success = sum(
pool.imap_unordered(
get_a_result_fast_filter_pairs,
[
(
model,
self.dot_inputs_array,
lows,
highs,
self.dot_pair_inputs_array,
pair_lows,
pair_highs,
self.monte_carlo_count,
seed,
)
for seed in seeds
],
self.chunksize,
)
)
elif self.use_pair_phase_measurements:
current_success = sum(
pool.imap_unordered(
get_a_result_fast_filter_pair_phase_only,
[
(
model,
self.dot_pair_inputs_array,
pair_phase_lows,
pair_phase_highs,
self.monte_carlo_count,
seed,
)
for seed in seeds
],
self.chunksize,
)
)
else:
current_success = sum(
pool.imap_unordered(
get_a_result_fast_filter,
[
(
model,
self.dot_inputs_array,
lows,
highs,
self.monte_carlo_count,
seed,
)
for seed in seeds
],
self.chunksize,
)
)
cycle_success += current_success
_logger.debug(f"current running successes: {cycle_success}")
results.append((cycle_count, cycle_success))
_logger.debug("Done, constructing output now")
row: Dict[str, Union[int, float, str]] = {}
successes: List[float] = []
counts: List[int] = []
for model_index, (name, (count, result)) in enumerate(
zip(self.model_names, results)
):
row[f"{name}_success"] = result
row[f"{name}_count"] = count
successes.append(max(result, 0.5))
counts.append(count)
success_weight = sum(
[
(succ / count) * prob
for succ, count, prob in zip(successes, counts, self.probabilities)
]
)
new_probabilities = [
(succ / count) * old_prob / success_weight
for succ, count, old_prob in zip(successes, counts, self.probabilities)
]
self.probabilities = new_probabilities
for name, probability in zip(self.model_names, self.probabilities):
row[f"{name}_prob"] = probability
_logger.info(row)
with open(self.filename, "a", newline="") as outfile:
writer = csv.DictWriter(outfile, fieldnames=self.csv_fields, dialect="unix")
writer.writerow(row)
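
The probability update at the end of go() is a discrete Bayes step over the candidate models. A self-contained numeric sketch of the same arithmetic (editor's illustration; the values are invented):

probabilities = [0.5, 0.5]  # uniform priors over two models
counts = [1000, 1000]
successes = [max(20, 0.5), max(0, 0.5)]  # zero success counts are clamped to 0.5

success_weight = sum(
    (succ / count) * prob
    for succ, count, prob in zip(successes, counts, probabilities)
)
new_probabilities = [
    (succ / count) * prob / success_weight
    for succ, count, prob in zip(successes, counts, probabilities)
]
print(new_probabilities)  # ~[0.976, 0.024]: the first model dominates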


@@ -1,3 +0,0 @@
from deepdog.subset_simulation.subset_simulation_impl import SubsetSimulation
__all__ = ["SubsetSimulation"]


@@ -1,388 +0,0 @@
import logging
import numpy
import pdme.measurement
import pdme.measurement.input_types
import pdme.subspace_simulation
from typing import Sequence, Tuple, Optional
from dataclasses import dataclass
_logger = logging.getLogger(__name__)
@dataclass
class SubsetSimulationResult:
probs_list: Sequence[Tuple]
over_target_cost: Optional[float]
over_target_likelihood: Optional[float]
under_target_cost: Optional[float]
under_target_likelihood: Optional[float]
lowest_likelihood: Optional[float]
class SubsetSimulation:
def __init__(
self,
model_name_pair,
dot_inputs,
actual_measurements: Sequence[pdme.measurement.DotMeasurement],
n_c: int,
n_s: int,
m_max: int,
target_cost: Optional[float] = None,
level_0_seed: int = 200,
mcmc_seed: int = 20,
use_adaptive_steps=True,
default_phi_step=0.01,
default_theta_step=0.01,
default_r_step=0.01,
default_w_log_step=0.01,
default_upper_w_log_step=4,
keep_probs_list=True,
dump_last_generation_to_file=False,
initial_cost_chunk_size=100,
):
name, model = model_name_pair
self.model_name = name
self.model = model
_logger.info(f"got model {self.model_name}")
self.dot_inputs_array = pdme.measurement.input_types.dot_inputs_to_array(
dot_inputs
)
# _logger.debug(f"actual measurements: {actual_measurements}")
self.actual_measurement_array = numpy.array([m.v for m in actual_measurements])
def cost_function_to_use(dipoles_to_test):
return pdme.subspace_simulation.proportional_costs_vs_actual_measurement(
self.dot_inputs_array, self.actual_measurement_array, dipoles_to_test
)
self.cost_function_to_use = cost_function_to_use
self.n_c = n_c
self.n_s = n_s
self.m_max = m_max
self.level_0_seed = level_0_seed
self.mcmc_seed = mcmc_seed
self.use_adaptive_steps = use_adaptive_steps
self.default_phi_step = default_phi_step
self.default_theta_step = default_theta_step
self.default_r_step = default_r_step
self.default_w_log_step = default_w_log_step
self.default_upper_w_log_step = default_upper_w_log_step
_logger.info("using params:")
_logger.info(f"\tn_c: {self.n_c}")
_logger.info(f"\tn_s: {self.n_s}")
_logger.info(f"\tm: {self.m_max}")
_logger.info("let's do level 0...")
self.target_cost = target_cost
_logger.info(f"will stop at target cost {target_cost}")
self.keep_probs_list = keep_probs_list
self.dump_last_generations = dump_last_generation_to_file
self.initial_cost_chunk_size = initial_cost_chunk_size
def execute(self) -> SubsetSimulationResult:
probs_list = []
sample_dipoles = self.model.get_monte_carlo_dipole_inputs(
self.n_c * self.n_s,
-1,
rng_to_use=numpy.random.default_rng(self.level_0_seed),
)
# _logger.debug(sample_dipoles)
# _logger.debug(sample_dipoles.shape)
raw_costs = []
_logger.debug(
f"Using iterated cost function thing with chunk size {self.initial_cost_chunk_size}"
)
for x in range(0, len(sample_dipoles), self.initial_cost_chunk_size):
_logger.debug(f"doing chunk {x}")
raw_costs.extend(
self.cost_function_to_use(
sample_dipoles[x : x + self.initial_cost_chunk_size]
)
)
costs = numpy.array(raw_costs)
_logger.debug(f"costs: {costs}")
sorted_indexes = costs.argsort()[::-1]
_logger.debug(costs[sorted_indexes])
_logger.debug(sample_dipoles[sorted_indexes])
sorted_costs = costs[sorted_indexes]
sorted_dipoles = sample_dipoles[sorted_indexes]
threshold_cost = sorted_costs[-self.n_c]
all_dipoles = numpy.array(
[
pdme.subspace_simulation.sort_array_of_dipoles_by_frequency(samp)
for samp in sorted_dipoles
]
)
all_chains = list(zip(sorted_costs, all_dipoles))
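# Editor's sketch of the bookkeeping above: costs are sorted in descending
# order, so threshold_cost = sorted_costs[-n_c] is the n_c-th smallest cost
# and the last n_c entries of all_chains are the best samples. E.g. with
# n_c = 2 and sorted_costs = [9, 7, 5, 3, 1], the threshold is 3 and the
# seeds for the next level are the samples with costs 3 and 1.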
mcmc_rng = numpy.random.default_rng(self.mcmc_seed)
for i in range(self.m_max):
next_seeds = all_chains[-self.n_c :]
if self.dump_last_generations:
_logger.info("writing out csv file")
next_dipoles_seed_dipoles = numpy.array([n[1] for n in next_seeds])
for n in range(self.model.n):
_logger.info(f"{next_dipoles_seed_dipoles[:, n].shape}")
numpy.savetxt(
f"generation_{self.n_c}_{self.n_s}_{i}_dipole_{n}.csv",
next_dipoles_seed_dipoles[:, n],
delimiter=",",
)
next_seeds_as_array = numpy.array([s for _, s in next_seeds])
stdevs = self.get_stdevs_from_arrays(next_seeds_as_array)
_logger.info(f"got stdevs: {stdevs.stdevs}")
all_long_chains = []
for seed_index, (c, s) in enumerate(
next_seeds[:: max(1, len(next_seeds) // 20)]  # guard against a zero slice step for short seed lists
):
# chain = mcmc(s, threshold_cost, n_s, model, dot_inputs_array, actual_measurement_array, mcmc_rng, curr_cost=c, stdevs=stdevs)
# until new version gotta do
_logger.debug(f"\t{seed_index}: doing long chain on the next seed")
long_chain = self.model.get_mcmc_chain(
s,
self.cost_function_to_use,
1000,
threshold_cost,
stdevs,
initial_cost=c,
rng_arg=mcmc_rng,
)
for _, chained in long_chain:
all_long_chains.append(chained)
all_long_chains_array = numpy.array(all_long_chains)
for n in range(self.model.n):
_logger.info(f"{all_long_chains_array[:, n].shape}")
numpy.savetxt(
f"long_chain_generation_{self.n_c}_{self.n_s}_{i}_dipole_{n}.csv",
all_long_chains_array[:, n],
delimiter=",",
)
if self.keep_probs_list:
for cost_index, cost_chain in enumerate(all_chains[: -self.n_c]):
probs_list.append(
(
((self.n_c * self.n_s - cost_index) / (self.n_c * self.n_s))
/ (self.n_s ** (i)),
cost_chain[0],
i + 1,
)
)
next_seeds_as_array = numpy.array([s for _, s in next_seeds])
stdevs = self.get_stdevs_from_arrays(next_seeds_as_array)
_logger.info(f"got stdevs: {stdevs.stdevs}")
_logger.debug("Starting the MCMC")
all_chains = []
for seed_index, (c, s) in enumerate(next_seeds):
# chain = mcmc(s, threshold_cost, n_s, model, dot_inputs_array, actual_measurement_array, mcmc_rng, curr_cost=c, stdevs=stdevs)
# until new version gotta do
_logger.debug(
f"\t{seed_index}: getting another chain from the next seed"
)
chain = self.model.get_mcmc_chain(
s,
self.cost_function_to_use,
self.n_s,
threshold_cost,
stdevs,
initial_cost=c,
rng_arg=mcmc_rng,
)
for cost, chained in chain:
try:
filtered_cost = cost[0]
except (IndexError, TypeError):
filtered_cost = cost
all_chains.append((filtered_cost, chained))
_logger.debug("finished mcmc")
# _logger.debug(all_chains)
all_chains.sort(key=lambda c: c[0], reverse=True)
_logger.debug("finished sorting all_chains")
threshold_cost = all_chains[-self.n_c][0]
_logger.info(
f"current threshold cost: {threshold_cost}, at P = (1 / {self.n_s})^{i + 1}"
)
if (self.target_cost is not None) and (threshold_cost < self.target_cost):
_logger.info(
f"got a threshold cost {threshold_cost}, less than {self.target_cost}. will leave early"
)
cost_list = [c[0] for c in all_chains]
over_index = reverse_bisect_right(cost_list, self.target_cost)
shorter_probs_list = []
for cost_index, cost_chain in enumerate(all_chains):
if self.keep_probs_list:
probs_list.append(
(
(
(self.n_c * self.n_s - cost_index)
/ (self.n_c * self.n_s)
)
/ (self.n_s ** (i)),
cost_chain[0],
i + 1,
)
)
shorter_probs_list.append(
(
cost_chain[0],
((self.n_c * self.n_s - cost_index) / (self.n_c * self.n_s))
/ (self.n_s ** (i)),
)
)
# _logger.info(shorter_probs_list)
result = SubsetSimulationResult(
probs_list=probs_list,
over_target_cost=shorter_probs_list[over_index - 1][0],
over_target_likelihood=shorter_probs_list[over_index - 1][1],
under_target_cost=shorter_probs_list[over_index][0],
under_target_likelihood=shorter_probs_list[over_index][1],
lowest_likelihood=shorter_probs_list[-1][1],
)
return result
# _logger.debug([c[0] for c in all_chains[-n_c:]])
_logger.info(f"doing level {i + 1}")
if self.keep_probs_list:
for cost_index, cost_chain in enumerate(all_chains):
probs_list.append(
(
((self.n_c * self.n_s - cost_index) / (self.n_c * self.n_s))
/ (self.n_s ** (self.m_max)),
cost_chain[0],
self.m_max + 1,
)
)
threshold_cost = all_chains[-self.n_c][0]
_logger.info(
f"final threshold cost: {threshold_cost}, at P = (1 / {self.n_s})^{self.m_max + 1}"
)
for a in all_chains[-10:]:
_logger.info(a)
# for prob, prob_cost in probs_list:
# _logger.info(f"\t{prob}: {prob_cost}")
probs_list.sort(key=lambda c: c[0], reverse=True)
min_likelihood = ((1) / (self.n_c * self.n_s)) / (self.n_s ** (self.m_max))
result = SubsetSimulationResult(
probs_list=probs_list,
over_target_cost=None,
over_target_likelihood=None,
under_target_cost=None,
under_target_likelihood=None,
lowest_likelihood=min_likelihood,
)
return result
def get_stdevs_from_arrays(
self, array
) -> pdme.subspace_simulation.MCMCStandardDeviation:
# stdevs = get_stdevs_from_arrays(next_seeds_as_array, model)
if self.use_adaptive_steps:
stdev_array = []
count = array.shape[1]
for dipole_index in range(count):
selected = array[:, dipole_index]
pxs = selected[:, 0]
pys = selected[:, 1]
pzs = selected[:, 2]
thetas = numpy.arccos(pzs / self.model.pfixed)
phis = numpy.arctan2(pys, pxs)
rstdevs = numpy.maximum(
numpy.std(selected, axis=0)[3:6],
self.default_r_step / (self.n_s * 10),
)
frequency_stdevs = numpy.minimum(
numpy.maximum(
numpy.std(numpy.log(selected[:, -1])),
self.default_w_log_step / (self.n_s * 10),
),
self.default_upper_w_log_step,
)
stdev_array.append(
pdme.subspace_simulation.DipoleStandardDeviation(
p_theta_step=max(
numpy.std(thetas), self.default_theta_step / (self.n_s * 10)
),
p_phi_step=max(
numpy.std(phis), self.default_phi_step / (self.n_s * 10)
),
rx_step=rstdevs[0],
ry_step=rstdevs[1],
rz_step=rstdevs[2],
w_log_step=frequency_stdevs,
)
)
else:
default_stdev = pdme.subspace_simulation.DipoleStandardDeviation(
self.default_phi_step,
self.default_theta_step,
self.default_r_step,
self.default_r_step,
self.default_r_step,
self.default_w_log_step,
)
stdev_array = [default_stdev]
stdevs = pdme.subspace_simulation.MCMCStandardDeviation(stdev_array)
return stdevs
def reverse_bisect_right(a, x, lo=0, hi=None):
"""Return the index where to insert item x in list a, assuming a is sorted in descending order.
The return value i is such that all e in a[:i] have e >= x, and all e in
a[i:] have e < x. So if x already appears in the list, a.insert(i, x) will
insert just after the rightmost x already there.
Optional args lo (default 0) and hi (default len(a)) bound the
slice of a to be searched.
Essentially, the function returns the number of elements in a which are >= x.
>>> a = [8, 6, 5, 4, 2]
>>> reverse_bisect_right(a, 5)
3
>>> a[:reverse_bisect_right(a, 5)]
[8, 6, 5]
"""
if lo < 0:
raise ValueError("lo must be non-negative")
if hi is None:
hi = len(a)
while lo < hi:
mid = (lo + hi) // 2
if x > a[mid]:
hi = mid
else:
lo = mid + 1
return lo
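
For reference, a sketch of how the removed SubsetSimulation class was driven; the model, dot inputs, and measurements are placeholders for real pdme objects:

from deepdog.subset_simulation import SubsetSimulation

ss = SubsetSimulation(
    ("my-model", model),   # placeholder pdme.model.DipoleModel, with its name
    dot_inputs,            # placeholder sequence of (r, f) dot inputs
    actual_measurements,   # placeholder sequence of pdme.measurement.DotMeasurement
    n_c=25,
    n_s=400,
    m_max=15,
    target_cost=150,
)
result = ss.execute()
print(result.lowest_likelihood)
print(result.probs_list[:5])  # (likelihood, cost, level) tuples, if kept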


@@ -1,231 +0,0 @@
import pdme.inputs
import pdme.model
import pdme.measurement
import pdme.measurement.input_types
import pdme.measurement.oscillating_dipole
import pdme.util.fast_v_calc
import pdme.util.fast_nonlocal_spectrum
from typing import Sequence, Tuple, List, Dict, Union, Mapping
import datetime
import csv
import multiprocessing
import logging
import numpy
# TODO: remove hardcode
CHUNKSIZE = 50
_logger = logging.getLogger(__name__)
def get_a_result_fast_filter(input) -> int:
# (
# model,
# self.dot_inputs_array_dict,
# low_high_dict,
# self.monte_carlo_count,
# seed,
# )
model, dot_inputs_dict, low_high_dict, monte_carlo_count, seed = input
rng = numpy.random.default_rng(seed)
# TODO: A long term refactor is to pull the frequency stuff out from here. The None stands for max_frequency, which is unneeded in the actually useful models.
sample_dipoles = model.get_monte_carlo_dipole_inputs(
monte_carlo_count, None, rng_to_use=rng
)
current_sample = sample_dipoles
for temp, dot_inputs in dot_inputs_dict.items():
lows, highs = low_high_dict[temp]
for di, low, high in zip(dot_inputs, lows, highs):
if len(current_sample) < 1:
break
vals = pdme.util.fast_v_calc.fast_vs_for_asymmetric_dipoleses(
numpy.array([di]), current_sample, temp
)
current_sample = current_sample[
numpy.all((vals > low) & (vals < high), axis=1)
]
return len(current_sample)
class TempAwareRealSpectrumRun:
"""
A bayes run given some real data, with potentially variable temperature.
Parameters
----------
measurements_dict : Mapping[float, Sequence[pdme.measurement.DotRangeMeasurement]]
The dot range measurements for this bayes run, in a dictionary indexed by temperature.
models_with_names : Sequence[Tuple[str, pdme.model.DipoleModel]]
The models to evaluate, with their names.
filename_slug : str
The filename slug to include.
"""
def __init__(
self,
measurements_dict: Mapping[
float, Sequence[pdme.measurement.DotRangeMeasurement]
],
models_with_names: Sequence[Tuple[str, pdme.model.DipoleModel]],
filename_slug: str,
monte_carlo_count: int = 10000,
monte_carlo_cycles: int = 10,
target_success: int = 100,
max_monte_carlo_cycles_steps: int = 10,
chunksize: int = CHUNKSIZE,
initial_seed: int = 12345,
cap_core_count: int = 0,
) -> None:
self.measurements_dict = measurements_dict
self.dot_inputs_dict = {
k: [(measure.r, measure.f) for measure in measurements]
for k, measurements in measurements_dict.items()
}
self.dot_inputs_array_dict = {
k: pdme.measurement.input_types.dot_inputs_to_array(dot_inputs)
for k, dot_inputs in self.dot_inputs_dict.items()
}
self.models = [model for (_, model) in models_with_names]
self.model_names = [name for (name, _) in models_with_names]
self.model_count = len(self.models)
self.monte_carlo_count = monte_carlo_count
self.monte_carlo_cycles = monte_carlo_cycles
self.target_success = target_success
self.max_monte_carlo_cycles_steps = max_monte_carlo_cycles_steps
self.csv_fields = []
self.compensate_zeros = True
self.chunksize = chunksize
for name in self.model_names:
self.csv_fields.extend([f"{name}_success", f"{name}_count", f"{name}_prob"])
# for now initialise priors as uniform.
self.probabilities = [1 / self.model_count] * self.model_count
timestamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
ff_string = "fast_filter"
self.filename = f"{timestamp}-{filename_slug}.realdata.{ff_string}.bayesrun.csv"
self.initial_seed = initial_seed
self.cap_core_count = cap_core_count
def go(self) -> None:
with open(self.filename, "a", newline="") as outfile:
writer = csv.DictWriter(outfile, fieldnames=self.csv_fields, dialect="unix")
writer.writeheader()
low_high_dict = {}
for temp, measurements in self.measurements_dict.items():
(
lows,
highs,
) = pdme.measurement.input_types.dot_range_measurements_low_high_arrays(
measurements
)
low_high_dict[temp] = (lows, highs)
# define a new seed sequence for each run
seed_sequence = numpy.random.SeedSequence(self.initial_seed)
results = []
_logger.debug("Going to iterate over models now")
core_count = multiprocessing.cpu_count() - 1 or 1
if (self.cap_core_count >= 1) and (self.cap_core_count < core_count):
core_count = self.cap_core_count
_logger.info(f"Using {core_count} cores")
for model_count, (model, model_name) in enumerate(
zip(self.models, self.model_names)
):
_logger.debug(f"Doing model #{model_count}: {model_name}")
with multiprocessing.Pool(core_count) as pool:
cycle_count = 0
cycle_success = 0
cycles = 0
while (cycles < self.max_monte_carlo_cycles_steps) and (
cycle_success <= self.target_success
):
_logger.debug(f"Starting cycle {cycles}")
cycles += 1
current_success = 0
cycle_count += self.monte_carlo_count * self.monte_carlo_cycles
# generate a seed from the sequence for each core.
# note this needs to be inside the loop for monte carlo cycle steps!
# that way we get more stuff.
seeds = seed_sequence.spawn(self.monte_carlo_cycles)
result_func = get_a_result_fast_filter
current_success = sum(
pool.imap_unordered(
result_func,
[
(
model,
self.dot_inputs_array_dict,
low_high_dict,
self.monte_carlo_count,
seed,
)
for seed in seeds
],
self.chunksize,
)
)
cycle_success += current_success
_logger.debug(f"current running successes: {cycle_success}")
results.append((cycle_count, cycle_success))
_logger.debug("Done, constructing output now")
row: Dict[str, Union[int, float, str]] = {}
successes: List[float] = []
counts: List[int] = []
for model_index, (name, (count, result)) in enumerate(
zip(self.model_names, results)
):
row[f"{name}_success"] = result
row[f"{name}_count"] = count
successes.append(max(result, 0.5))
counts.append(count)
success_weight = sum(
[
(succ / count) * prob
for succ, count, prob in zip(successes, counts, self.probabilities)
]
)
new_probabilities = [
(succ / count) * old_prob / success_weight
for succ, count, old_prob in zip(successes, counts, self.probabilities)
]
self.probabilities = new_probabilities
for name, probability in zip(self.model_names, self.probabilities):
row[f"{name}_prob"] = probability
_logger.info(row)
with open(self.filename, "a", newline="") as outfile:
writer = csv.DictWriter(outfile, fieldnames=self.csv_fields, dialect="unix")
writer.writerow(row)
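
A usage sketch for the removed class above; the measurement sequences and the model are placeholders for real pdme objects:

run = TempAwareRealSpectrumRun(
    {
        5.0: measurements_at_5k,      # placeholder Sequence[DotRangeMeasurement]
        295.0: measurements_at_295k,  # placeholder Sequence[DotRangeMeasurement]
    },
    models_with_names=[("my-model", model)],  # placeholder pdme model
    filename_slug="temp-sweep",
)
run.go()  # appends rows to a timestamped .realdata.fast_filter.bayesrun.csv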

do.sh

@@ -16,11 +16,6 @@ test() {
 	poetry run pytest
 }
-fmt() {
-	poetry run black .
-	find . -not \( -path "./.*" -type d -prune \) -type f -name "*.py" -exec sed -i -e 's/ /\t/g' {} \;
-}
 release() {
 	./scripts/release.sh
 }

flake.lock (generated)

@@ -1,95 +0,0 @@
{
"nodes": {
"flake-utils": {
"locked": {
"lastModified": 1648297722,
"narHash": "sha256-W+qlPsiZd8F3XkzXOzAoR+mpFqzm3ekQkJNa+PIh1BQ=",
"owner": "numtide",
"repo": "flake-utils",
"rev": "0f8662f1319ad6abf89b3380dd2722369fc51ade",
"type": "github"
},
"original": {
"owner": "numtide",
"repo": "flake-utils",
"rev": "0f8662f1319ad6abf89b3380dd2722369fc51ade",
"type": "github"
}
},
"flake-utils_2": {
"locked": {
"lastModified": 1653893745,
"narHash": "sha256-0jntwV3Z8//YwuOjzhV2sgJJPt+HY6KhU7VZUL0fKZQ=",
"owner": "numtide",
"repo": "flake-utils",
"rev": "1ed9fb1935d260de5fe1c2f7ee0ebaae17ed2fa1",
"type": "github"
},
"original": {
"owner": "numtide",
"repo": "flake-utils",
"type": "github"
}
},
"nixpkgs": {
"locked": {
"lastModified": 1655087213,
"narHash": "sha256-4R5oQ+OwGAAcXWYrxC4gFMTUSstGxaN8kN7e8hkum/8=",
"owner": "NixOS",
"repo": "nixpkgs",
"rev": "37b6b161e536fddca54424cf80662bce735bdd1e",
"type": "github"
},
"original": {
"owner": "NixOS",
"repo": "nixpkgs",
"rev": "37b6b161e536fddca54424cf80662bce735bdd1e",
"type": "github"
}
},
"nixpkgs_2": {
"locked": {
"lastModified": 1655046959,
"narHash": "sha256-gxqHZKq1ReLDe6ZMJSbmSZlLY95DsVq5o6jQihhzvmw=",
"owner": "NixOS",
"repo": "nixpkgs",
"rev": "07bf3d25ce1da3bee6703657e6a787a4c6cdcea9",
"type": "github"
},
"original": {
"owner": "NixOS",
"repo": "nixpkgs",
"type": "github"
}
},
"poetry2nix": {
"inputs": {
"flake-utils": "flake-utils_2",
"nixpkgs": "nixpkgs_2"
},
"locked": {
"lastModified": 1654921554,
"narHash": "sha256-hkfMdQAHSwLWlg0sBVvgrQdIiBP45U1/ktmFpY4g2Mo=",
"owner": "nix-community",
"repo": "poetry2nix",
"rev": "7b71679fa7df00e1678fc3f1d1d4f5f372341b63",
"type": "github"
},
"original": {
"owner": "nix-community",
"repo": "poetry2nix",
"rev": "7b71679fa7df00e1678fc3f1d1d4f5f372341b63",
"type": "github"
}
},
"root": {
"inputs": {
"flake-utils": "flake-utils",
"nixpkgs": "nixpkgs",
"poetry2nix": "poetry2nix"
}
}
},
"root": "root",
"version": 7
}


@@ -1,63 +0,0 @@
{
description = "Application packaged using poetry2nix";
inputs.flake-utils.url = "github:numtide/flake-utils?rev=0f8662f1319ad6abf89b3380dd2722369fc51ade";
inputs.nixpkgs.url = "github:NixOS/nixpkgs?rev=37b6b161e536fddca54424cf80662bce735bdd1e";
inputs.poetry2nix.url = "github:nix-community/poetry2nix?rev=7b71679fa7df00e1678fc3f1d1d4f5f372341b63";
outputs = { self, nixpkgs, flake-utils, poetry2nix }:
{
# Nixpkgs overlay providing the application
overlay = nixpkgs.lib.composeManyExtensions [
poetry2nix.overlay
(final: prev: {
# The application
deepdog = prev.poetry2nix.mkPoetryApplication {
overrides = final.poetry2nix.overrides.withDefaults (self: super: {
# …
# workaround https://github.com/nix-community/poetry2nix/issues/568
pdme = super.pdme.overridePythonAttrs (old: {
buildInputs = old.buildInputs or [ ] ++ [ final.python39.pkgs.poetry-core ];
});
});
projectDir = ./.;
};
deepdogEnv = prev.poetry2nix.mkPoetryEnv {
overrides = final.poetry2nix.overrides.withDefaults (self: super: {
# …
# workaround https://github.com/nix-community/poetry2nix/issues/568
pdme = super.pdme.overridePythonAttrs (old: {
buildInputs = old.buildInputs or [ ] ++ [ final.python39.pkgs.poetry-core ];
});
});
projectDir = ./.;
};
})
];
} // (flake-utils.lib.eachDefaultSystem (system:
let
pkgs = import nixpkgs {
inherit system;
overlays = [ self.overlay ];
};
in
{
apps = {
deepdog = pkgs.deepdog;
};
defaultApp = pkgs.deepdog;
devShell = pkgs.mkShell {
buildInputs = [
pkgs.poetry
pkgs.deepdogEnv
pkgs.deepdog
];
shellHook = ''
export DO_NIX_CUSTOM=1
'';
packages = [ pkgs.nodejs-16_x ];
};
}));
}


@@ -1,11 +1,9 @@
 apiVersion: v1
 kind: Pod
 spec:
-  imagePullSecrets:
-  - name: regcreds
   containers: # list of containers that you want present for your build, you can define a default container in the Jenkinsfile
-  - name: poetry
-    image: ghcr.io/dmallubhotla/poetry-image:1
+  - name: python
+    image: python:3.8
     command: ["tail", "-f", "/dev/null"] # this or any command that is basically a noop is required, so that you don't overwrite the entrypoint of the base container
     imagePullPolicy: Always # use cache or pull image for agent
     resources: # limits the resources of your build container

poetry.lock (generated)

File diff suppressed because it is too large.


@@ -1,23 +1,19 @@
 [tool.poetry]
 name = "deepdog"
-version = "0.7.8"
+version = "0.3.5"
 description = ""
 authors = ["Deepak Mallubhotla <dmallubhotla+github@gmail.com>"]
 
 [tool.poetry.dependencies]
-python = ">=3.8.1,<3.10"
-pdme = "^0.9.3"
-numpy = "1.22.3"
-scipy = "1.10"
+python = "^3.8,<3.10"
+pdme = "^0.5.4"
 
 [tool.poetry.dev-dependencies]
 pytest = ">=6"
 flake8 = "^4.0.1"
-pytest-cov = "^4.1.0"
-mypy = "^0.971"
+pytest-cov = "^3.0.0"
+mypy = "^0.940"
 python-semantic-release = "^7.24.0"
-black = "^22.3.0"
-syrupy = "^4.0.8"
 
 [build-system]
 requires = ["poetry-core>=1.0.0"]


@@ -1,177 +0,0 @@
# serializer version: 1
# name: test_basic_analysis
list([
dict({
'connors_geom-5height-orientation_fixedxy-pfixexp_3-dipole_count_2_likelihood': 0.1,
'connors_geom-5height-orientation_fixedxy-pfixexp_3-dipole_count_2_prob': 0.3333333333333333,
'connors_geom-5height-orientation_fixedz-pfixexp_3-dipole_count_2_likelihood': 0.1,
'connors_geom-5height-orientation_fixedz-pfixexp_3-dipole_count_2_prob': 0.3333333333333333,
'connors_geom-5height-orientation_free-pfixexp_3-dipole_count_2_likelihood': 0.1,
'connors_geom-5height-orientation_free-pfixexp_3-dipole_count_2_prob': 0.3333333333333333,
'dipole_frequency_1': 0.006029931414230269,
'dipole_frequency_2': 85436.78758379082,
'dipole_location_1': array([-4.76615152, -6.33160296, 5.29522808]),
'dipole_location_2': array([-4.72700391, -2.06478573, 6.52467702]),
'dipole_moment_1': array([ 860.14181416, -450.27082062, -239.60852996]),
'dipole_moment_2': array([ 908.18325588, -208.52681777, -362.93214244]),
}),
dict({
'connors_geom-5height-orientation_fixedxy-pfixexp_3-dipole_count_2_likelihood': 0.45,
'connors_geom-5height-orientation_fixedxy-pfixexp_3-dipole_count_2_prob': 0.3103448275862069,
'connors_geom-5height-orientation_fixedz-pfixexp_3-dipole_count_2_likelihood': 0.9,
'connors_geom-5height-orientation_fixedz-pfixexp_3-dipole_count_2_prob': 0.6206896551724138,
'connors_geom-5height-orientation_free-pfixexp_3-dipole_count_2_likelihood': 0.1,
'connors_geom-5height-orientation_free-pfixexp_3-dipole_count_2_prob': 0.06896551724137932,
'dipole_frequency_1': 102275.63477261562,
'dipole_frequency_2': 1755280.9783485082,
'dipole_location_1': array([ 4.71515397, -9.70362197, 5.43016546]),
'dipole_location_2': array([3.42476038, 3.88562934, 5.15034328]),
'dipole_moment_1': array([-502.60742674, -790.60222587, 349.7626267 ]),
'dipole_moment_2': array([-192.42708465, -434.81009148, -879.7226844 ]),
}),
dict({
'connors_geom-5height-orientation_fixedxy-pfixexp_3-dipole_count_2_likelihood': 0.7,
'connors_geom-5height-orientation_fixedxy-pfixexp_3-dipole_count_2_prob': 0.6631578947368421,
'connors_geom-5height-orientation_fixedz-pfixexp_3-dipole_count_2_likelihood': 0.1,
'connors_geom-5height-orientation_fixedz-pfixexp_3-dipole_count_2_prob': 0.18947368421052635,
'connors_geom-5height-orientation_free-pfixexp_3-dipole_count_2_likelihood': 0.7,
'connors_geom-5height-orientation_free-pfixexp_3-dipole_count_2_prob': 0.1473684210526316,
'dipole_frequency_1': 2896.799464036654,
'dipole_frequency_2': 9.980565189326681e-05,
'dipole_location_1': array([-4.97465789, 12.54716531, 6.06324588]),
'dipole_location_2': array([ 9.84518459, -11.1183876 , 7.35028226]),
'dipole_moment_1': array([997.67961917, 19.6376112 , 65.19004305]),
'dipole_moment_2': array([305.63093655, 440.57669389, 844.08643362]),
}),
dict({
'connors_geom-5height-orientation_fixedxy-pfixexp_3-dipole_count_2_likelihood': 0.1,
'connors_geom-5height-orientation_fixedxy-pfixexp_3-dipole_count_2_prob': 0.663157894736842,
'connors_geom-5height-orientation_fixedz-pfixexp_3-dipole_count_2_likelihood': 0.1,
'connors_geom-5height-orientation_fixedz-pfixexp_3-dipole_count_2_prob': 0.18947368421052635,
'connors_geom-5height-orientation_free-pfixexp_3-dipole_count_2_likelihood': 0.1,
'connors_geom-5height-orientation_free-pfixexp_3-dipole_count_2_prob': 0.1473684210526316,
'dipole_frequency_1': 1.4522667818288244,
'dipole_frequency_2': 2704.9795645301197,
'dipole_location_1': array([ 7.38183022, 16.6745801 , 7.10428414]),
'dipole_location_2': array([-8.15636906, -9.56609132, 6.34141559]),
'dipole_moment_1': array([-145.9924693 , 738.74936496, 657.97839986]),
'dipole_moment_2': array([-960.16113239, 104.96824669, -258.98314046]),
}),
dict({
'connors_geom-5height-orientation_fixedxy-pfixexp_3-dipole_count_2_likelihood': 0.9,
'connors_geom-5height-orientation_fixedxy-pfixexp_3-dipole_count_2_prob': 0.9465776293823038,
'connors_geom-5height-orientation_fixedz-pfixexp_3-dipole_count_2_likelihood': 0.1,
'connors_geom-5height-orientation_fixedz-pfixexp_3-dipole_count_2_prob': 0.030050083472454105,
'connors_geom-5height-orientation_free-pfixexp_3-dipole_count_2_likelihood': 0.1,
'connors_geom-5height-orientation_free-pfixexp_3-dipole_count_2_prob': 0.02337228714524208,
'dipole_frequency_1': 3827.2315421318913,
'dipole_frequency_2': 1.9301094166184413e-05,
'dipole_location_1': array([ 5.02067673, -0.9783039 , 6.1431897 ]),
'dipole_location_2': array([ 4.66628999, 10.80907459, 7.21771744]),
'dipole_moment_1': array([ 871.30659253, -299.17389491, -388.99846068]),
'dipole_moment_2': array([-189.87268624, 677.28285845, 710.79975568]),
}),
])
# ---
# name: test_bayesss_with_tighter_cost
list([
dict({
'connors_geom-5height-orientation_fixedxy-pfixexp_3-dipole_count_2_likelihood': 9.765625e-06,
'connors_geom-5height-orientation_fixedxy-pfixexp_3-dipole_count_2_prob': 0.33333333333333337,
'connors_geom-5height-orientation_fixedz-pfixexp_3-dipole_count_2_likelihood': 9.765625e-06,
'connors_geom-5height-orientation_fixedz-pfixexp_3-dipole_count_2_prob': 0.33333333333333337,
'connors_geom-5height-orientation_free-pfixexp_3-dipole_count_2_likelihood': 9.765625e-06,
'connors_geom-5height-orientation_free-pfixexp_3-dipole_count_2_prob': 0.33333333333333337,
'dipole_frequency_1': 0.006029931414230269,
'dipole_frequency_2': 85436.78758379082,
'dipole_location_1': array([-4.76615152, -6.33160296, 5.29522808]),
'dipole_location_2': array([-4.72700391, -2.06478573, 6.52467702]),
'dipole_moment_1': array([ 860.14181416, -450.27082062, -239.60852996]),
'dipole_moment_2': array([ 908.18325588, -208.52681777, -362.93214244]),
}),
dict({
'connors_geom-5height-orientation_fixedxy-pfixexp_3-dipole_count_2_likelihood': 0.0109375,
'connors_geom-5height-orientation_fixedxy-pfixexp_3-dipole_count_2_prob': 0.1044776119402985,
'connors_geom-5height-orientation_fixedz-pfixexp_3-dipole_count_2_likelihood': 0.03125,
'connors_geom-5height-orientation_fixedz-pfixexp_3-dipole_count_2_prob': 0.2985074626865672,
'connors_geom-5height-orientation_free-pfixexp_3-dipole_count_2_likelihood': 0.0625,
'connors_geom-5height-orientation_free-pfixexp_3-dipole_count_2_prob': 0.5970149253731344,
'dipole_frequency_1': 102275.63477261562,
'dipole_frequency_2': 1755280.9783485082,
'dipole_location_1': array([ 4.71515397, -9.70362197, 5.43016546]),
'dipole_location_2': array([3.42476038, 3.88562934, 5.15034328]),
'dipole_moment_1': array([-502.60742674, -790.60222587, 349.7626267 ]),
'dipole_moment_2': array([-192.42708465, -434.81009148, -879.7226844 ]),
}),
dict({
'connors_geom-5height-orientation_fixedxy-pfixexp_3-dipole_count_2_likelihood': 9.765625e-06,
'connors_geom-5height-orientation_fixedxy-pfixexp_3-dipole_count_2_prob': 7.291135021404688e-05,
'connors_geom-5height-orientation_fixedz-pfixexp_3-dipole_count_2_likelihood': 0.021875,
'connors_geom-5height-orientation_fixedz-pfixexp_3-dipole_count_2_prob': 0.4666326413699001,
'connors_geom-5height-orientation_free-pfixexp_3-dipole_count_2_likelihood': 0.0125,
'connors_geom-5height-orientation_free-pfixexp_3-dipole_count_2_prob': 0.5332944472798858,
'dipole_frequency_1': 2896.799464036654,
'dipole_frequency_2': 9.980565189326681e-05,
'dipole_location_1': array([-4.97465789, 12.54716531, 6.06324588]),
'dipole_location_2': array([ 9.84518459, -11.1183876 , 7.35028226]),
'dipole_moment_1': array([997.67961917, 19.6376112 , 65.19004305]),
'dipole_moment_2': array([305.63093655, 440.57669389, 844.08643362]),
}),
dict({
'connors_geom-5height-orientation_fixedxy-pfixexp_3-dipole_count_2_likelihood': 9.765625e-06,
'connors_geom-5height-orientation_fixedxy-pfixexp_3-dipole_count_2_prob': 7.291135021404688e-05,
'connors_geom-5height-orientation_fixedz-pfixexp_3-dipole_count_2_likelihood': 9.765625e-06,
'connors_geom-5height-orientation_fixedz-pfixexp_3-dipole_count_2_prob': 0.4666326413699001,
'connors_geom-5height-orientation_free-pfixexp_3-dipole_count_2_likelihood': 9.765625e-06,
'connors_geom-5height-orientation_free-pfixexp_3-dipole_count_2_prob': 0.5332944472798858,
'dipole_frequency_1': 1.4522667818288244,
'dipole_frequency_2': 2704.9795645301197,
'dipole_location_1': array([ 7.38183022, 16.6745801 , 7.10428414]),
'dipole_location_2': array([-8.15636906, -9.56609132, 6.34141559]),
'dipole_moment_1': array([-145.9924693 , 738.74936496, 657.97839986]),
'dipole_moment_2': array([-960.16113239, 104.96824669, -258.98314046]),
}),
dict({
'connors_geom-5height-orientation_fixedxy-pfixexp_3-dipole_count_2_likelihood': 0.175,
'connors_geom-5height-orientation_fixedxy-pfixexp_3-dipole_count_2_prob': 0.00012008361740869356,
'connors_geom-5height-orientation_fixedz-pfixexp_3-dipole_count_2_likelihood': 0.05625,
'connors_geom-5height-orientation_fixedz-pfixexp_3-dipole_count_2_prob': 0.24702915581216964,
'connors_geom-5height-orientation_free-pfixexp_3-dipole_count_2_likelihood': 0.15,
'connors_geom-5height-orientation_free-pfixexp_3-dipole_count_2_prob': 0.7528507605704217,
'dipole_frequency_1': 3827.2315421318913,
'dipole_frequency_2': 1.9301094166184413e-05,
'dipole_location_1': array([ 5.02067673, -0.9783039 , 6.1431897 ]),
'dipole_location_2': array([ 4.66628999, 10.80907459, 7.21771744]),
'dipole_moment_1': array([ 871.30659253, -299.17389491, -388.99846068]),
'dipole_moment_2': array([-189.87268624, 677.28285845, 710.79975568]),
}),
dict({
'connors_geom-5height-orientation_fixedxy-pfixexp_3-dipole_count_2_likelihood': 9.765625e-06,
'connors_geom-5height-orientation_fixedxy-pfixexp_3-dipole_count_2_prob': 4.9116305003549454e-08,
'connors_geom-5height-orientation_fixedz-pfixexp_3-dipole_count_2_likelihood': 0.0109375,
'connors_geom-5height-orientation_fixedz-pfixexp_3-dipole_count_2_prob': 0.11316396672817797,
'connors_geom-5height-orientation_free-pfixexp_3-dipole_count_2_likelihood': 0.028125,
'connors_geom-5height-orientation_free-pfixexp_3-dipole_count_2_prob': 0.886835984155517,
'dipole_frequency_1': 1.1715179359592061e-05,
'dipole_frequency_2': 0.0019103783276337497,
'dipole_location_1': array([-0.95736547, 1.09273812, 7.47158641]),
'dipole_location_2': array([ -3.18510322, -15.64493131, 5.81623624]),
'dipole_moment_1': array([-184.64961369, 956.56786553, 225.57136075]),
'dipole_moment_2': array([ -34.63395137, 801.17771816, -597.42342885]),
}),
dict({
'connors_geom-5height-orientation_fixedxy-pfixexp_3-dipole_count_2_likelihood': 9.765625e-06,
'connors_geom-5height-orientation_fixedxy-pfixexp_3-dipole_count_2_prob': 1.977090156727901e-10,
'connors_geom-5height-orientation_fixedz-pfixexp_3-dipole_count_2_likelihood': 9.765625e-06,
'connors_geom-5height-orientation_fixedz-pfixexp_3-dipole_count_2_prob': 0.00045552157211010855,
'connors_geom-5height-orientation_free-pfixexp_3-dipole_count_2_likelihood': 0.002734375,
'connors_geom-5height-orientation_free-pfixexp_3-dipole_count_2_prob': 0.9995444782301809,
'dipole_frequency_1': 999786.9069039805,
'dipole_frequency_2': 186034.67996840767,
'dipole_location_1': array([-5.59679125, 6.3411602 , 5.33602522]),
'dipole_location_2': array([-0.03412955, -6.83522954, 5.58551513]),
'dipole_moment_1': array([826.38270589, 491.81526944, 274.24325726]),
'dipole_moment_2': array([ 202.74745884, -656.07483714, -726.95204519]),
}),
])
# ---


@@ -1,158 +0,0 @@
import deepdog
import logging
import logging.config
import numpy.random
from pdme.model import (
LogSpacedRandomCountMultipleDipoleFixedMagnitudeModel,
LogSpacedRandomCountMultipleDipoleFixedMagnitudeXYModel,
LogSpacedRandomCountMultipleDipoleFixedMagnitudeFixedOrientationModel,
)
_logger = logging.getLogger(__name__)
def fixed_z_model_func(
xmin,
xmax,
ymin,
ymax,
zmin,
zmax,
wexp_min,
wexp_max,
pfixed,
n_max,
prob_occupancy,
):
return LogSpacedRandomCountMultipleDipoleFixedMagnitudeFixedOrientationModel(
xmin,
xmax,
ymin,
ymax,
zmin,
zmax,
wexp_min,
wexp_max,
pfixed,
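# the two zeros below presumably pin the fixed orientation angles (theta, phi)
# so the dipole points along z, matching the function name (editor's assumption)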
0,
0,
n_max,
prob_occupancy,
)
def get_model(orientation):
model_funcs = {
"fixedz": fixed_z_model_func,
"free": LogSpacedRandomCountMultipleDipoleFixedMagnitudeModel,
"fixedxy": LogSpacedRandomCountMultipleDipoleFixedMagnitudeXYModel,
}
model = model_funcs[orientation](
-10,
10,
-17.5,
17.5,
5,
7.5,
-5,
6.5,
10**3,
2,
0.99999999,
)
model.n = 2
model.rng = numpy.random.default_rng(1234)
return (
f"connors_geom-5height-orientation_{orientation}-pfixexp_{3}-dipole_count_{2}",
model,
)
def test_basic_analysis(snapshot):
dot_positions = [[0, 0, 0], [0, 1, 0]]
freqs = [1, 10, 100]
models = []
orientations = ["free", "fixedxy", "fixedz"]
for orientation in orientations:
models.append(get_model(orientation))
_logger.info(f"have {len(models)} models to look at")
if len(models) == 1:
_logger.info(f"only one model, name: {models[0][0]}")
square_run = deepdog.BayesRunWithSubspaceSimulation(
dot_positions,
freqs,
models,
models[0][1],
filename_slug="test",
end_threshold=0.9,
ss_n_c=5,
ss_n_s=2,
ss_m_max=10,
ss_target_cost=150,
ss_level_0_seed=200,
ss_mcmc_seed=20,
ss_use_adaptive_steps=True,
ss_default_phi_step=0.01,
ss_default_theta_step=0.01,
ss_default_r_step=0.01,
ss_default_w_log_step=0.01,
ss_default_upper_w_log_step=4,
ss_dump_last_generation=False,
write_output_to_bayesruncsv=False,
ss_initial_costs_chunk_size=1000,
)
result = square_run.go()
assert result == snapshot
def test_bayesss_with_tighter_cost(snapshot):
dot_positions = [[0, 0, 0], [0, 1, 0]]
freqs = [1, 10, 100]
models = []
orientations = ["free", "fixedxy", "fixedz"]
for orientation in orientations:
models.append(get_model(orientation))
_logger.info(f"have {len(models)} models to look at")
if len(models) == 1:
_logger.info(f"only one model, name: {models[0][0]}")
square_run = deepdog.BayesRunWithSubspaceSimulation(
dot_positions,
freqs,
models,
models[0][1],
filename_slug="test",
end_threshold=0.9,
ss_n_c=5,
ss_n_s=2,
ss_m_max=10,
ss_target_cost=1.5,
ss_level_0_seed=200,
ss_mcmc_seed=20,
ss_use_adaptive_steps=True,
ss_default_phi_step=0.01,
ss_default_theta_step=0.01,
ss_default_r_step=0.01,
ss_default_w_log_step=0.01,
ss_default_upper_w_log_step=4,
ss_dump_last_generation=False,
write_output_to_bayesruncsv=False,
ss_initial_costs_chunk_size=1,
)
result = square_run.go()
assert result == snapshot
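
These tests compare against the syrupy snapshots shown above; when the expected behavior legitimately changes, the snapshots are regenerated by passing syrupy's --snapshot-update flag to pytest.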