Compare commits


No commits in common. "master" and "0.3.5" have entirely different histories.

55 changed files with 1104 additions and 5085 deletions


@@ -1,3 +1,3 @@
[flake8]
ignore = W191, E501, W503, E203
ignore = W191, E501, W503
max-line-length = 120

8
.gitignore vendored

@@ -114,10 +114,6 @@ ENV/
env.bak/
venv.bak/
# direnv
.envrc
.direnv
# Spyder project settings
.spyderproject
.spyproject
@@ -143,7 +139,3 @@ dmypy.json
cython_debug/
*.csv
local_scripts/
.vscode


@@ -2,344 +2,6 @@
All notable changes to this project will be documented in this file. See [standard-version](https://github.com/conventional-changelog/standard-version) for commit guidelines.
## [1.7.0](https://gitea.deepak.science:2222/physics/deepdog/compare/1.6.0...1.7.0) (2025-02-27)
### Features
* adds configurable skip if file exists ([24c6e31](https://gitea.deepak.science:2222/physics/deepdog/commit/24c6e311c1d3067eb98cc60e6ca38d76373bf08e))
## [1.6.0](https://gitea.deepak.science:2222/physics/deepdog/compare/1.5.0...1.6.0) (2025-02-27)
### Features
* Adds ability to parse bayesruns without timestamps ([46f6b6c](https://gitea.deepak.science:2222/physics/deepdog/commit/46f6b6cdf15c67aedf0c871d201b8db320bccbdf))
* allows negative log magnitude strings in models ([c8435b4](https://gitea.deepak.science:2222/physics/deepdog/commit/c8435b4b2a6e4b89030f53b5734eb743e2003fb7))
## [1.5.0](https://gitea.deepak.science:2222/physics/deepdog/compare/1.4.0...1.5.0) (2024-12-30)
### Features
* add configurable max number of dipoles to write ([a1b59cd](https://gitea.deepak.science:2222/physics/deepdog/commit/a1b59cd18b30359328a09210d9393f211aab30c2))
* add configurable max number of dipoles to write ([53f8993](https://gitea.deepak.science:2222/physics/deepdog/commit/53f8993f2b155228fff5cbee84f10c62eb149a1f))
## [1.4.0](https://gitea.deepak.science:2222/physics/deepdog/compare/1.3.0...1.4.0) (2024-09-04)
### Features
* add subset sim probs command for bayes for subset simulation results ([c881da2](https://gitea.deepak.science:2222/physics/deepdog/commit/c881da28370a1e51d062e1a7edaa62af6eb98d0a))
* allows some betetr matching for single_dipole runs ([5425ce1](https://gitea.deepak.science:2222/physics/deepdog/commit/5425ce1362919af4cc4dbd5813df3be8d877b198))
* indexifier now has len ([d962ecb](https://gitea.deepak.science:2222/physics/deepdog/commit/d962ecb11e929de1d9aa458b5d8e82270eff0039))
### Bug Fixes
* update log file arg names in cli scripts ([6a5c593](https://gitea.deepak.science:2222/physics/deepdog/commit/6a5c5931d4fc849d0d6a0f2b971523a0f039d559))
## [1.3.0](https://gitea.deepak.science:2222/physics/deepdog/compare/1.2.1...1.3.0) (2024-05-20)
### Features
* add multi run to wrap multi model and repeat runs ([92b49fc](https://gitea.deepak.science:2222/physics/deepdog/commit/92b49fce7c86f14484deb1c4aaaa810a6f69c08a))
* adds a filter that works with cost functions ([8845b28](https://gitea.deepak.science:2222/physics/deepdog/commit/8845b2875f2c91c91dd3988fabda26400c59b2d7))
* improve initial cost calculation to allow multiprocessing, adds ability to specify a number of levels to do with direct mc instead of subset simulation ([09fad2e](https://gitea.deepak.science:2222/physics/deepdog/commit/09fad2e1024d9237a6a4f7931f51cb4c84b83bf8))
### Bug Fixes
* Adds ugly hack for stdevs for this uniform range to multiply by root3, proper fix would be in pdme ([b1c01b2](https://gitea.deepak.science:2222/physics/deepdog/commit/b1c01b25c8f2c3947be23f5b2c656c37437dab17))
* fix seeding to avoid recreating seed combinations across multi runs ([24ac65b](https://gitea.deepak.science:2222/physics/deepdog/commit/24ac65bf9c74c454fec826ca9de640fe095f5a17))
### [1.2.1](https://gitea.deepak.science:2222/physics/deepdog/compare/1.2.0...1.2.1) (2024-05-12)
## [1.2.0](https://gitea.deepak.science:2222/physics/deepdog/compare/1.1.0...1.2.0) (2024-05-09)
### Features
* adds additional matching regexes ([dc1d2d4](https://gitea.deepak.science:2222/physics/deepdog/commit/dc1d2d45a3e631c5efccce80f8a24fa87c6089e0))
* adds magnitude enabled parsing option ([f0e2fa3](https://gitea.deepak.science:2222/physics/deepdog/commit/f0e2fa3da9f5a5136908d691137a904fda4e3a9a))
## [1.1.0](https://gitea.deepak.science:2222/physics/deepdog/compare/1.0.1...1.1.0) (2024-05-03)
### Features
* allows disabling timestamps in directmc bayesrun files ([fb018ab](https://gitea.deepak.science:2222/physics/deepdog/commit/fb018abeae2adf4438a030140a6c905f11bb6bc1))
* removes legacy bayes run, technically breaking but just don't use them ([5361dad](https://gitea.deepak.science:2222/physics/deepdog/commit/5361dada8be4950b5157862f6a92254b543889c3))
### [1.0.1](https://gitea.deepak.science:2222/physics/deepdog/compare/1.0.0...1.0.1) (2024-05-02)
### Bug Fixes
* fixes issue of zero division error with no successes for anything ([e25db1e](https://gitea.deepak.science:2222/physics/deepdog/commit/e25db1e0f677e8d9a657fa1631305cc8f05ff9ff))
## [1.0.0](https://gitea.deepak.science:2222/physics/deepdog/compare/0.8.1...1.0.0) (2024-05-01)
### ⚠ BREAKING CHANGES
* allows new seed spec instead of cli arg, removes old cli arg
### Features
* adds additional file slug parsing ([2105754](https://gitea.deepak.science:2222/physics/deepdog/commit/2105754911c89bde9dcbea9866462225604a3524))
* Adds more powerful direct mc runs to sub for old real spectrum run ([f2b1a1d](https://gitea.deepak.science:2222/physics/deepdog/commit/f2b1a1dd3b3436e37d84f7843b9b2a202be4b51c))
* allows new seed spec instead of cli arg, removes old cli arg ([7108dd0](https://gitea.deepak.science:2222/physics/deepdog/commit/7108dd0111c7dfd6ec204df1d0058530cd3dcab9))
### Bug Fixes
* no longer throws error for overlapping keys, the warning should hopefully be enough? ([f3ba4cb](https://gitea.deepak.science:2222/physics/deepdog/commit/f3ba4cbfd36a9f08cdc4d8774a7f745f8c98bac3))
### [0.8.1](https://gitea.deepak.science:2222/physics/deepdog/compare/0.8.0...0.8.1) (2024-04-28)
### [0.8.1](https://gitea.deepak.science:2222/physics/deepdog/compare/0.8.0...0.8.1) (2024-04-28)
## [0.8.0](https://gitea.deepak.science:2222/physics/deepdog/compare/0.7.10...0.8.0) (2024-04-28)
### ⚠ BREAKING CHANGES
* fixes the spin qubit frequency phase shift calculation which had an index problem
### Bug Fixes
* fixes the spin qubit frequency phase shift calculation which had an index problem ([f9646e3](https://gitea.deepak.science:2222/physics/deepdog/commit/f9646e33868e1a0da8ab663230c0c692ac25bb74))
### [0.7.10](https://gitea.deepak.science:2222/physics/deepdog/compare/0.7.9...0.7.10) (2024-04-28)
### Features
* adds cli probs ([4b2e573](https://gitea.deepak.science:2222/physics/deepdog/commit/4b2e57371546731137b011461849bb849d4d4e0f))
* better management of cli wrapper ([b0ad4be](https://gitea.deepak.science:2222/physics/deepdog/commit/b0ad4bead0d4762eb7f848f6e557f6d9b61200b9))
### [0.7.9](https://gitea.deepak.science:2222/physics/deepdog/compare/0.7.8...0.7.9) (2024-04-21)
### Features
* adds ability to write custom dmc filters ([ea080ca](https://gitea.deepak.science:2222/physics/deepdog/commit/ea080ca1c7068042ce1e0a222d317f785a6b05f4))
* adds tarucha phase calculation, using spin qubit precession rate noise ([3ae0783](https://gitea.deepak.science:2222/physics/deepdog/commit/3ae0783d00cbe6a76439c1d671f2cff621d8d0a8))
### [0.7.8](https://gitea.deepak.science:2222/physics/deepdog/compare/0.7.7...0.7.8) (2024-02-29)
### Bug Fixes
* uses correct measurements ([5f534a6](https://gitea.deepak.science:2222/physics/deepdog/commit/5f534a60cc7c4838fcacee11a7e58b97d34e154a))
### [0.7.7](https://gitea.deepak.science:2222/physics/deepdog/compare/0.7.6...0.7.7) (2024-02-29)
### Bug Fixes
* fixes phase calculation issue with setting input array ([48e41cb](https://gitea.deepak.science:2222/physics/deepdog/commit/48e41cbd2c58d4c4d2747822d618d7d55257643d))
### [0.7.6](https://gitea.deepak.science:2222/physics/deepdog/compare/0.7.5...0.7.6) (2024-02-28)
### Features
* adds ability to use phase measurements only for correlations ([bb72e90](https://gitea.deepak.science:2222/physics/deepdog/commit/bb72e903d14704a3783daf2dbc1797b90880aa85))
### Bug Fixes
* fixes typeerror vs indexerror on bare float as cost in subset simulation ([65e1948](https://gitea.deepak.science:2222/physics/deepdog/commit/65e19488359d7f5656660da7da8f32ed474989c3))
### [0.7.5](https://gitea.deepak.science:2222/physics/deepdog/compare/0.7.4...0.7.5) (2023-12-09)
### Features
* adds direct monte carlo package ([1741807](https://gitea.deepak.science:2222/physics/deepdog/commit/1741807be43d08fb51bc94518dd3b67585c04c20))
* adds longchain logging if logging last generation ([b4e5f53](https://gitea.deepak.science:2222/physics/deepdog/commit/b4e5f5372682fc64c3734a96c4a899e018f127ce))
* allows disabling timestamp in subset simulation bayes results ([9a4548d](https://gitea.deepak.science:2222/physics/deepdog/commit/9a4548def45a01f1f518135d4237c3dc09dcc342))
### [0.7.4](https://gitea.deepak.science:2222/physics/deepdog/compare/0.7.3...0.7.4) (2023-07-27)
### Features
* adds configurable chunk size for the initial mc level 0 SS stage cost calculation to reduce memory usage ([9a7a3ff](https://gitea.deepak.science:2222/physics/deepdog/commit/9a7a3ff2c7ebe81d5e10647ce39844c372ff7b07))
* allows for deepdog bayesrun with ss to not print csv to make snapshot testing possible ([8e6ead4](https://gitea.deepak.science:2222/physics/deepdog/commit/8e6ead416c9eba56f568f648d0df44caaa510cfe))
### Bug Fixes
* fixes bug if case of clamping necessary ([161bcf4](https://gitea.deepak.science:2222/physics/deepdog/commit/161bcf42addf331661c3929073688b9f2c13502c))
* fixes bug with clamped probabilities being underestimated ([e6defc7](https://gitea.deepak.science:2222/physics/deepdog/commit/e6defc794871a48ac331023eb477bd235b78d6d0))
### [0.7.3](https://gitea.deepak.science:2222/physics/deepdog/compare/0.7.2...0.7.3) (2023-07-27)
### Features
* adds utility options and avoids memory leak ([598dad1](https://gitea.deepak.science:2222/physics/deepdog/commit/598dad1e6dc8fc0b7a5b4a90c8e17bf744e8d98c))
### [0.7.2](https://gitea.deepak.science:2222/physics/deepdog/compare/0.7.1...0.7.2) (2023-07-24)
### Features
* clamps results now ([9bb8fc5](https://gitea.deepak.science:2222/physics/deepdog/commit/9bb8fc50fe1bd1a285a333c5a396bfb6ac3176cf))
### Bug Fixes
* fixes clamping format etc. ([a170a3c](https://gitea.deepak.science:2222/physics/deepdog/commit/a170a3ce01adcec356e5aaab9abcc0ec4accd64b))
### [0.7.1](https://gitea.deepak.science:2222/physics/deepdog/compare/0.7.0...0.7.1) (2023-07-24)
### Features
* adds subset simulation stuff ([33cab9a](https://gitea.deepak.science:2222/physics/deepdog/commit/33cab9ab4179cec13ae9e591a8ffc32df4dda989))
## [0.7.0](https://gitea.deepak.science:2222/physics/deepdog/compare/0.6.7...0.7.0) (2023-05-01)
### ⚠ BREAKING CHANGES
* removes fastfilter parameter because it should never be needed
### Features
* adds pair capability to real spectrum run hopefully ([a089951](https://gitea.deepak.science:2222/physics/deepdog/commit/a089951bbefcd8a0b2efeb49b7a8090412cbb23d))
* removes fastfilter parameter because it should never be needed ([a015daf](https://gitea.deepak.science:2222/physics/deepdog/commit/a015daf5ff6fa5f6155c8d7e02981b588840a5b0))
### [0.6.7](https://gitea.deepak.science:2222/physics/deepdog/compare/0.6.6...0.6.7) (2023-04-14)
### Features
* adds option to cap core count for real spectrum run ([bf15f4a](https://gitea.deepak.science:2222/physics/deepdog/commit/bf15f4a7b7f59504983624e7d512ed7474372032))
* adds option to cap core count for temp aware run ([12903b2](https://gitea.deepak.science:2222/physics/deepdog/commit/12903b2540cefb040174d230bc0d04719a6dc1b7))
### Bug Fixes
* avoids redefinition of core count in loop ([1cf4454](https://gitea.deepak.science:2222/physics/deepdog/commit/1cf44541531541088198bd4599d467df3e1acbcf))
### [0.6.6](https://gitea.deepak.science:2222/physics/deepdog/compare/0.6.5...0.6.6) (2023-04-09)
### Bug Fixes
* removes bad logging in multiprocessing function ([8fd1b75](https://gitea.deepak.science:2222/physics/deepdog/commit/8fd1b75e1378301210bfa8f14dd09174bbd21414))
### [0.6.5](https://gitea.deepak.science:2222/physics/deepdog/compare/0.6.4...0.6.5) (2023-04-09)
### Features
* adds temp aware guy using new pdme temp-flexible feature for bundling temp models ([de1ec3e](https://gitea.deepak.science:2222/physics/deepdog/commit/de1ec3e70062d418e0d4c89716905cc9313d2e26))
### [0.6.4](https://gitea.deepak.science:2222/physics/deepdog/compare/0.6.3...0.6.4) (2022-08-13)
### Features
* Prints model names while running ([7ea1d71](https://gitea.deepak.science:2222/physics/deepdog/commit/7ea1d715f67e81c9fa841c5a62f1cc700ff7363d))
### [0.6.3](https://gitea.deepak.science:2222/physics/deepdog/compare/0.6.2...0.6.3) (2022-06-12)
### Features
* adds fast filter variant ([2c5c122](https://gitea.deepak.science:2222/physics/deepdog/commit/2c5c1228209e51d17253f07470e2f1e6dc6872d7))
* adds tester for fast filter real spectrum ([0a1a277](https://gitea.deepak.science:2222/physics/deepdog/commit/0a1a27759b0d4ab01da214b76ab14bf2b1fe00e3))
### [0.6.2](https://gitea.deepak.science:2222/physics/deepdog/compare/0.6.1...0.6.2) (2022-05-26)
### Features
* adds better import api for real data run ([d7e0f13](https://gitea.deepak.science:2222/physics/deepdog/commit/d7e0f13ca55197b24cb534c80f321ee76b9c4a40))
### [0.6.1](https://gitea.deepak.science:2222/physics/deepdog/compare/0.6.0...0.6.1) (2022-05-22)
### Features
* adds new runner for real spectra ([bd56f24](https://gitea.deepak.science:2222/physics/deepdog/commit/bd56f247748babb2ee1f2a1182d25aa968bff5a5))
## [0.6.0](https://gitea.deepak.science:2222/physics/deepdog/compare/0.5.0...0.6.0) (2022-05-22)
### ⚠ BREAKING CHANGES
* bayes run now handles multidipoles with changes to output file format etc.
* logs multiple dipoles better maybe
* switches over to pdme new stuff, uses models and scraps discretisations entirely
* removes alt_bayes bayes distinction, which was superfluous when only alt worked
### Features
* adds pdme 0.7.0 for multiprocessing ([874d876](https://gitea.deepak.science:2222/physics/deepdog/commit/874d876c9d774433b034d47c4cc0cdac41e6f2c7))
* bayes run now handles multidipoles with changes to output file format etc. ([5d0a7a4](https://gitea.deepak.science:2222/physics/deepdog/commit/5d0a7a4be09c58f8f8f859384f01d7912a98b8b9))
* logs multiple dipoles better maybe ([ae8977b](https://gitea.deepak.science:2222/physics/deepdog/commit/ae8977bb1e4d6cd71e88ea0876da8f4318e030b6))
* removes alt_bayes bayes distinction, which was superfluous when only alt worked ([101569d](https://gitea.deepak.science:2222/physics/deepdog/commit/101569d749e4f3f1842886aa2fd3321b8132278b))
* switches over to pdme new stuff, uses models and scraps discretisations entirely ([6e29f7a](https://gitea.deepak.science:2222/physics/deepdog/commit/6e29f7a702b578c266a42bba23ac973d155ada10))
* Uses multidipole for bayes run, with more verbose output ([df89776](https://gitea.deepak.science:2222/physics/deepdog/commit/df8977655de977fd3c4f7383dd9571e551eb1382))
### Bug Fixes
* another bug fix for csv generation ([b7da3d6](https://gitea.deepak.science:2222/physics/deepdog/commit/b7da3d61cc5c128cba1d2fcb3770b71b7f6fc4b8))
* fixes crash when dipole count is smaller than expected max during file write ([b5e0ecb](https://gitea.deepak.science:2222/physics/deepdog/commit/b5e0ecb52886b32d9055302eacfabb69338026b4))
* fixes format string in csv output for headers ([9afa209](https://gitea.deepak.science:2222/physics/deepdog/commit/9afa209864cdb9255988778e987fe05952848fd4))
* fixes random issue ([eec926a](https://gitea.deepak.science:2222/physics/deepdog/commit/eec926aaac654f78942b4c6b612e4d1cdcbf81dc))
* moves logging successes to after they've actually happened ([0caad05](https://gitea.deepak.science:2222/physics/deepdog/commit/0caad05e3cc6a9adba8bf937c3d2f944e1b096a3))
* now doesn't double randomise frequency ([23b202b](https://gitea.deepak.science:2222/physics/deepdog/commit/23b202beb81cb89f7f20b691e83116fa53764902))
* whoops deleted word multiprocessing ([31070b5](https://gitea.deepak.science:2222/physics/deepdog/commit/31070b5342c265d930b4c51402f42a3ee2415066))
## [0.5.0](https://gitea.deepak.science:2222/physics/deepdog/compare/0.4.0...0.5.0) (2022-04-30)
### ⚠ BREAKING CHANGES
* simulpairs now uses different rng calculator
### Features
* adds simulpairs run ([e9277c3](https://gitea.deepak.science:2222/physics/deepdog/commit/e9277c3da777359feb352c0b19f3bb029248ba2f))
* has better parallelisation ([edf0ba6](https://gitea.deepak.science:2222/physics/deepdog/commit/edf0ba6532c0588fce32341709cdb70e384b83f4))
* simulpairs now uses different rng calculator ([50dbc48](https://gitea.deepak.science:2222/physics/deepdog/commit/50dbc4835e60bace9e9b4ba37415f073a3c9e479))
### Bug Fixes
* better parallelisation hopefully ([42829c0](https://gitea.deepak.science:2222/physics/deepdog/commit/42829c0327e080e18be2fb75e746f6ac0d7c2f6d))
* Makes altbayessimulpairs available in package ([492a5e6](https://gitea.deepak.science:2222/physics/deepdog/commit/492a5e6681c85f95840e28cfd5d4ce4ca1d54eba))
* stronger names ([0954429](https://gitea.deepak.science:2222/physics/deepdog/commit/0954429e2d015a105ff16dfbb9e7a352bf53e5e9))
* Uses correct filename arg for passed in rng ([349341b](https://gitea.deepak.science:2222/physics/deepdog/commit/349341b405375a43b933f1fd7db4ee9fc501def3))
* uses correct filename for pairs guy ([4c06b39](https://gitea.deepak.science:2222/physics/deepdog/commit/4c06b3912c811c93c310b1d9e4c153f2014c4f8b))
## [0.4.0](https://gitea.deepak.science:2222/physics/deepdog/compare/0.3.5...0.4.0) (2022-04-10)
### ⚠ BREAKING CHANGES
* Adds pair calculations, with changing api format
### Features
* Adds dynamic cycle count increases to help reach minimum success count ([ec7b4ca](https://gitea.deepak.science:2222/physics/deepdog/commit/ec7b4cac393c15e94c513215c4f1ba32be2ae87a))
* Adds pair calculations, with changing api format ([6463b13](https://gitea.deepak.science:2222/physics/deepdog/commit/6463b135ef2d212b565864b5ac1b655e014d2194))
### Bug Fixes
* uses bigfix from pdme for negatives ([c1c711f](https://gitea.deepak.science:2222/physics/deepdog/commit/c1c711f47b574d3a9b8a24dbcbdd7f50b9be8ea9))
### [0.3.5](https://gitea.deepak.science:2222/physics/deepdog/compare/0.3.4...0.3.5) (2022-03-07)

20
Jenkinsfile vendored

@@ -4,7 +4,7 @@ pipeline {
            label 'deepdog' // all your pods will be named with this prefix, followed by a unique id
            idleMinutes 5 // how long the pod will live after no jobs have run on it
            yamlFile 'jenkins/ci-agent-pod.yaml' // path to the pod definition relative to the root of our project
            defaultContainer 'poetry' // define a default container if more than a few stages use it, will default to jnlp container
            defaultContainer 'python' // define a default container if more than a few stages use it, will default to jnlp container
        }
    }
@@ -12,30 +12,36 @@ pipeline {
        parallelsAlwaysFailFast()
    }
    environment {
        POETRY_HOME="/opt/poetry"
        POETRY_VERSION="1.1.12"
    }
    stages {
        stage('Build') {
            steps {
                echo 'Building...'
                sh 'python --version'
                sh 'poetry --version'
                sh 'poetry install'
                sh 'curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py | python'
                sh '${POETRY_HOME}/bin/poetry --version'
                sh '${POETRY_HOME}/bin/poetry install'
            }
        }
        stage('Test') {
            parallel{
                stage('pytest') {
                    steps {
                        sh 'poetry run pytest'
                        sh '${POETRY_HOME}/bin/poetry run pytest'
                    }
                }
                stage('lint') {
                    steps {
                        sh 'poetry run flake8 deepdog tests'
                        sh '${POETRY_HOME}/bin/poetry run flake8 deepdog tests'
                    }
                }
                stage('mypy') {
                    steps {
                        sh 'poetry run mypy deepdog'
                        sh '${POETRY_HOME}/bin/poetry run mypy deepdog'
                    }
                }
            }
@@ -51,7 +57,7 @@ pipeline {
            }
            steps {
                echo 'Deploying...'
                sh 'poetry publish -u ${PYPI_USR} -p ${PYPI_PSW} --build'
                sh '${POETRY_HOME}/bin/poetry publish -u ${PYPI_USR} -p ${PYPI_PSW} --build'
            }
        }


@@ -5,7 +5,7 @@
[![Jenkins](https://img.shields.io/jenkins/build?jobUrl=https%3A%2F%2Fjenkins.deepak.science%2Fjob%2Fgitea-physics%2Fjob%2Fdeepdog%2Fjob%2Fmaster&style=flat-square)](https://jenkins.deepak.science/job/gitea-physics/job/deepdog/job/master/)
![Jenkins tests](https://img.shields.io/jenkins/tests?compact_message&jobUrl=https%3A%2F%2Fjenkins.deepak.science%2Fjob%2Fgitea-physics%2Fjob%2Fdeepdog%2Fjob%2Fmaster%2F&style=flat-square)
![Jenkins Coverage](https://img.shields.io/jenkins/coverage/cobertura?jobUrl=https%3A%2F%2Fjenkins.deepak.science%2Fjob%2Fgitea-physics%2Fjob%2Fdeepdog%2Fjob%2Fmaster%2F&style=flat-square)
![Maintenance](https://img.shields.io/maintenance/yes/2024?style=flat-square)
![Maintenance](https://img.shields.io/maintenance/yes/2022?style=flat-square)
The DiPole DiaGnostic tool.
@@ -13,13 +13,6 @@ The DiPole DiaGnostic tool.
`poetry install` to start locally
Commit using [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/), and when commits are on master, release with `just release`.
Commit using [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/), and when commits are on master, release with `doo release`.
In general `just --list` has some of the useful stuff for figuring out what development tools there are.
Poetry as an installer is good, even better is using Nix (maybe with direnv to automatically pick up the `devShell` from `flake.nix`).
In either case `just` should handle actually calling things in a way that's agnostic to poetry as a runner or through nix.
### local scripts
`local_scripts` folder allows for scripts to be run using this code, but that probably isn't the most auditable for actual usage.
The API is still only something I'm using so there's no guarantees yet that it will be stable; overall semantic versioning should help with API breaks.


@@ -1,18 +1,15 @@
import logging
from deepdog.meta import __version__
from deepdog.real_spectrum_run import RealSpectrumRun
from deepdog.temp_aware_real_spectrum_run import TempAwareRealSpectrumRun
from deepdog.bayes_run import BayesRun
from deepdog.alt_bayes_run import AltBayesRun
from deepdog.diagnostic import Diagnostic
def get_version():
    return __version__
__all__ = [
    "get_version",
    "RealSpectrumRun",
    "TempAwareRealSpectrumRun",
]
__all__ = ["get_version", "BayesRun", "AltBayesRun", "Diagnostic"]
logging.getLogger(__name__).addHandler(logging.NullHandler())

134
deepdog/alt_bayes_run.py Normal file

@@ -0,0 +1,134 @@
import pdme.model
import pdme.measurement.oscillating_dipole
import pdme.util.fast_v_calc
from typing import Sequence, Tuple, List
import datetime
import csv
import multiprocessing
import logging
import numpy
# TODO: remove hardcode
CHUNKSIZE = 50
# TODO: It's garbage to have this here duplicated from pdme.
DotInput = Tuple[numpy.typing.ArrayLike, float]
_logger = logging.getLogger(__name__)
def get_a_result(input) -> int:
    discretisation, dot_inputs, lows, highs, monte_carlo_count, max_frequency = input
    sample_dipoles = discretisation.get_model().get_n_single_dipoles(monte_carlo_count, max_frequency)
    vals = pdme.util.fast_v_calc.fast_vs_for_dipoles(dot_inputs, sample_dipoles)
    return numpy.count_nonzero(pdme.util.fast_v_calc.between(vals, lows, highs))
class AltBayesRun():
    '''
    A single Bayes run for a given set of dots.
    Parameters
    ----------
    dot_inputs : Sequence[DotInput]
        The dot inputs for this bayes run.
    discretisations_with_names : Sequence[Tuple(str, pdme.model.Model)]
        The models to evaluate.
    actual_model_discretisation : pdme.model.Discretisation
        The discretisation for the model which is actually correct.
    filename_slug : str
        The filename slug to include.
    run_count: int
        The number of runs to do.
    '''
    def __init__(self, dot_inputs: Sequence[DotInput], discretisations_with_names: Sequence[Tuple[str, pdme.model.Discretisation]], actual_model: pdme.model.Model, filename_slug: str, run_count: int, low_error: float = 0.9, high_error: float = 1.1, monte_carlo_count: int = 10000, monte_carlo_cycles: int = 10, max_frequency: float = 20, end_threshold: float = None, chunksize: int = CHUNKSIZE) -> None:
        self.dot_inputs = dot_inputs
        self.dot_inputs_array = pdme.measurement.oscillating_dipole.dot_inputs_to_array(dot_inputs)
        self.discretisations = [disc for (_, disc) in discretisations_with_names]
        self.model_names = [name for (name, _) in discretisations_with_names]
        self.actual_model = actual_model
        self.model_count = len(self.discretisations)
        self.monte_carlo_count = monte_carlo_count
        self.monte_carlo_cycles = monte_carlo_cycles
        self.run_count = run_count
        self.low_error = low_error
        self.high_error = high_error
        self.csv_fields = ["dipole_moment", "dipole_location", "dipole_frequency"]
        self.compensate_zeros = True
        self.chunksize = chunksize
        for name in self.model_names:
            self.csv_fields.extend([f"{name}_success", f"{name}_count", f"{name}_prob"])
        self.probabilities = [1 / self.model_count] * self.model_count
        timestamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
        self.filename = f"{timestamp}-{filename_slug}.altbayes.csv"
        self.max_frequency = max_frequency
        if end_threshold is not None:
            if 0 < end_threshold < 1:
                self.end_threshold: float = end_threshold
                self.use_end_threshold = True
                _logger.info(f"Will abort early, at {self.end_threshold}.")
            else:
                raise ValueError(f"end_threshold should be between 0 and 1, but is actually {end_threshold}")
    def go(self) -> None:
        with open(self.filename, "a", newline="") as outfile:
            writer = csv.DictWriter(outfile, fieldnames=self.csv_fields, dialect="unix")
            writer.writeheader()
        for run in range(1, self.run_count + 1):
            rng = numpy.random.default_rng()
            frequency = rng.uniform(1, self.max_frequency)
            # Generate the actual dipoles
            actual_dipoles = self.actual_model.get_dipoles(frequency)
            dots = actual_dipoles.get_percent_range_dot_measurements(self.dot_inputs, self.low_error, self.high_error)
            lows, highs = pdme.measurement.oscillating_dipole.dot_range_measurements_low_high_arrays(dots)
            _logger.info(f"Going to work on dipole at {actual_dipoles.dipoles}")
            results = []
            _logger.debug("Going to iterate over discretisations now")
            for disc_count, discretisation in enumerate(self.discretisations):
                _logger.debug(f"Doing discretisation #{disc_count}")
                with multiprocessing.Pool(multiprocessing.cpu_count() - 1 or 1) as pool:
                    results.append(sum(
                        pool.imap_unordered(get_a_result, [(discretisation, self.dot_inputs_array, lows, highs, self.monte_carlo_count, self.max_frequency)] * self.monte_carlo_cycles, self.chunksize)
                    ))
            _logger.debug("Done, constructing output now")
            row = {
                "dipole_moment": actual_dipoles.dipoles[0].p,
                "dipole_location": actual_dipoles.dipoles[0].s,
                "dipole_frequency": actual_dipoles.dipoles[0].w
            }
            successes: List[float] = []
            counts: List[int] = []
            for model_index, (name, result) in enumerate(zip(self.model_names, results)):
                row[f"{name}_success"] = result
                row[f"{name}_count"] = self.monte_carlo_count * self.monte_carlo_cycles
                successes.append(max(result, 0.5))
                counts.append(self.monte_carlo_count * self.monte_carlo_cycles)
            success_weight = sum([(succ / count) * prob for succ, count, prob in zip(successes, counts, self.probabilities)])
            new_probabilities = [(succ / count) * old_prob / success_weight for succ, count, old_prob in zip(successes, counts, self.probabilities)]
            self.probabilities = new_probabilities
            for name, probability in zip(self.model_names, self.probabilities):
                row[f"{name}_prob"] = probability
            _logger.info(row)
            with open(self.filename, "a", newline="") as outfile:
                writer = csv.DictWriter(outfile, fieldnames=self.csv_fields, dialect="unix")
                writer.writerow(row)
            if self.use_end_threshold:
                max_prob = max(self.probabilities)
                if max_prob > self.end_threshold:
                    _logger.info(f"Aborting early, because {max_prob} is greater than {self.end_threshold}")
                    break
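The probability update at the end of each run in `go()` is a discrete Bayes step: each model's success rate (successes / samples) acts as its likelihood, is multiplied by the model's current probability, and the products are renormalised by `success_weight`. Success counts are clamped to at least 0.5, so a model with zero hits is heavily down-weighted but never eliminated outright. The sketch below pulls that step out into a standalone function for illustration; `update_probabilities` and `min_success` are hypothetical names, not part of deepdog.

```python
from typing import List, Sequence


def update_probabilities(
    successes: Sequence[float],
    counts: Sequence[int],
    priors: Sequence[float],
    min_success: float = 0.5,
) -> List[float]:
    """Return normalised posterior probabilities from per-model success counts.

    Hypothetical helper mirroring the update in AltBayesRun.go(): each success
    count is clamped to at least ``min_success`` so that a model with zero hits
    keeps a small non-zero weight.
    """
    # Likelihood estimate per model: clamped hit rate.
    likelihoods = [max(s, min_success) / c for s, c in zip(successes, counts)]
    # Unnormalised posterior = likelihood * prior.
    weighted = [lk * p for lk, p in zip(likelihoods, priors)]
    total = sum(weighted)  # plays the role of success_weight
    return [w / total for w in weighted]


# Two equally likely models, 10 cycles x 10,000 samples each (the defaults above).
posterior = update_probabilities([30, 0], [100_000, 100_000], [0.5, 0.5])
print(posterior)
```

With 30 hits against a clamped 0.5, the first model ends up with probability 30 / 30.5 (about 0.98); feeding `posterior` back in as the next `priors` reproduces the run-over-run accumulation in `go()`.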

128
deepdog/bayes_run.py Normal file

@ -0,0 +1,128 @@
import pdme.model
from typing import Sequence, Tuple, List
import datetime
import itertools
import csv
import logging
import numpy
import scipy.optimize
import multiprocessing
# TODO: remove hardcode
COST_THRESHOLD = 1e-10
# TODO: It's garbage to have this here duplicated from pdme.
DotInput = Tuple[numpy.typing.ArrayLike, float]
_logger = logging.getLogger(__name__)
def get_a_result(discretisation, dots, index) -> Tuple[Tuple[int, ...], scipy.optimize.OptimizeResult]:
return (index, discretisation.solve_for_index(dots, index))
class BayesRun():
'''
A single Bayes run for a given set of dots.
Parameters
----------
dot_inputs : Sequence[DotInput]
The dot inputs for this bayes run.
discretisations_with_names : Sequence[Tuple(str, pdme.model.Model)]
The models to evaluate.
actual_model_discretisation : pdme.model.Discretisation
The discretisation for the model which is actually correct.
filename_slug : str
The filename slug to include.
run_count: int
The number of runs to do.
'''
def __init__(self, dot_inputs: Sequence[DotInput], discretisations_with_names: Sequence[Tuple[str, pdme.model.Discretisation]], actual_model: pdme.model.Model, filename_slug: str, run_count: int, max_frequency: float = None, end_threshold: float = None) -> None:
self.dot_inputs = dot_inputs
self.discretisations = [disc for (_, disc) in discretisations_with_names]
self.model_names = [name for (name, _) in discretisations_with_names]
self.actual_model = actual_model
self.model_count = len(self.discretisations)
self.run_count = run_count
self.csv_fields = ["dipole_moment", "dipole_location", "dipole_frequency"]
self.compensate_zeros = True
for name in self.model_names:
self.csv_fields.extend([f"{name}_success", f"{name}_count", f"{name}_prob"])
self.probabilities = [1 / self.model_count] * self.model_count
timestamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
self.filename = f"{timestamp}-{filename_slug}.csv"
self.max_frequency = max_frequency
if end_threshold is not None:
if 0 < end_threshold < 1:
self.end_threshold: float = end_threshold
self.use_end_threshold = True
_logger.info(f"Will abort early, at {self.end_threshold}.")
else:
raise ValueError(f"end_threshold should be between 0 and 1, but is actually {end_threshold}")
def go(self) -> None:
        with open(self.filename, "a", newline="") as outfile:
            writer = csv.DictWriter(outfile, fieldnames=self.csv_fields, dialect="unix")
            writer.writeheader()

        for run in range(1, self.run_count + 1):
            frequency: float = run
            if self.max_frequency is not None and self.max_frequency > 1:
                rng = numpy.random.default_rng()
                frequency = rng.uniform(1, self.max_frequency)
            dipoles = self.actual_model.get_dipoles(frequency)

            dots = dipoles.get_dot_measurements(self.dot_inputs)
            _logger.info(f"Going to work on dipole at {dipoles.dipoles}")

            results = []
            _logger.debug("Going to iterate over discretisations now")
            for disc_count, discretisation in enumerate(self.discretisations):
                _logger.debug(f"Doing discretisation #{disc_count}")
                with multiprocessing.Pool(multiprocessing.cpu_count() - 1 or 1) as pool:
                    results.append(pool.starmap(get_a_result, zip(itertools.repeat(discretisation), itertools.repeat(dots), discretisation.all_indices())))

            _logger.debug("Done, constructing output now")
            row = {
                "dipole_moment": dipoles.dipoles[0].p,
                "dipole_location": dipoles.dipoles[0].s,
                "dipole_frequency": dipoles.dipoles[0].w
            }
            successes: List[float] = []
            counts: List[int] = []
            for model_index, (name, result) in enumerate(zip(self.model_names, results)):
                count = 0
                success = 0
                for idx, val in result:
                    count += 1
                    if val.success and val.cost <= COST_THRESHOLD:
                        success += 1

                row[f"{name}_success"] = success
                row[f"{name}_count"] = count
                successes.append(max(success, 0.5))
                counts.append(count)

            success_weight = sum([(succ / count) * prob for succ, count, prob in zip(successes, counts, self.probabilities)])
            new_probabilities = [(succ / count) * old_prob / success_weight for succ, count, old_prob in zip(successes, counts, self.probabilities)]
            self.probabilities = new_probabilities
            for name, probability in zip(self.model_names, self.probabilities):
                row[f"{name}_prob"] = probability
            _logger.info(row)

            with open(self.filename, "a", newline="") as outfile:
                writer = csv.DictWriter(outfile, fieldnames=self.csv_fields, dialect="unix")
                writer.writerow(row)

            if self.use_end_threshold:
                max_prob = max(self.probabilities)
                if max_prob > self.end_threshold:
                    _logger.info(f"Aborting early, because {max_prob} is greater than {self.end_threshold}")
                    break
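Each pass of the loop above is a discrete Bayesian update: every model's success fraction (successes over fitted indices, with zero successes floored to 0.5) serves as its likelihood, gets multiplied by the current probability, and the results are normalised. A minimal standalone sketch of one update step, assuming plain lists in place of the class attributes; `bayes_update` is a hypothetical helper, not part of deepdog:

```python
from typing import List, Sequence


def bayes_update(
    successes: Sequence[int],
    counts: Sequence[int],
    priors: Sequence[float],
    floor: float = 0.5,
) -> List[float]:
    """One discrete Bayesian update of model probabilities.

    Each model's likelihood is approximated by its success fraction; a zero
    success count is floored so that no model's probability drops to zero.
    """
    # Likelihood-times-prior weight for each model.
    weights = [
        (max(succ, floor) / count) * prior
        for succ, count, prior in zip(successes, counts, priors)
    ]
    total = sum(weights)
    # Normalise so the posterior sums to 1.
    return [w / total for w in weights]


# Two equally likely models: A fits 30 of 100 indices, B fits 10 of 100.
posterior = bayes_update([30, 10], [100, 100], [0.5, 0.5])  # roughly [0.75, 0.25]
```

Because the posterior from one run becomes the prior for the next, repeated runs sharpen the distribution until `end_threshold` (if enabled) is crossed.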

@ -1,5 +0,0 @@
from deepdog.cli.probs.main import wrapped_main

__all__ = [
    "wrapped_main",
]

@ -1,51 +0,0 @@
import argparse
import os


def parse_args() -> argparse.Namespace:
    def dir_path(path):
        if os.path.isdir(path):
            return path
        else:
            raise argparse.ArgumentTypeError(f"readable_dir:{path} is not a valid path")

    parser = argparse.ArgumentParser(
        "probs", description="Calculating probability from finished bayesrun"
    )
    parser.add_argument(
        "--log-file",
        type=str,
        help="A filename for logging to, if not provided will only log to stderr",
        default=None,
    )
    parser.add_argument(
        "--bayesrun-directory",
        "-d",
        type=dir_path,
        help="The directory to search for bayesrun files, defaulting to cwd if not passed",
        default=".",
    )
    parser.add_argument(
        "--indexify-json",
        help="A json file with the indexify config for parsing job indexes. Will skip if not present",
        default="",
    )
    parser.add_argument(
        "--coalesced-keys",
        type=str,
        help="A comma separated list of strings over which to coalesce data. By default coalesce over all fields within model names, ignore file level names",
        default="",
    )
    parser.add_argument(
        "--uncoalesced-outfile",
        type=str,
        help="output filename for uncoalesced data. If not provided, will not be written",
        default=None,
    )
    parser.add_argument(
        "--coalesced-outfile",
        type=str,
        help="output filename for coalesced data. If not provided, will not be written",
        default=None,
    )
    return parser.parse_args()
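The nested `dir_path` helper is used as an argparse `type=` callable: argparse calls it on the raw string and reports an `ArgumentTypeError` as a usage error. A trimmed-down sketch of that behaviour, assuming a hypothetical `build_parser` that keeps only two of the options above:

```python
import argparse
import os
import tempfile


def dir_path(path: str) -> str:
    # Same check as in parse_args: accept only an existing directory.
    if os.path.isdir(path):
        return path
    raise argparse.ArgumentTypeError(f"readable_dir:{path} is not a valid path")


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical cut-down parser with just two of the options above.
    parser = argparse.ArgumentParser("probs")
    parser.add_argument("--bayesrun-directory", "-d", type=dir_path, default=".")
    parser.add_argument("--coalesced-outfile", type=str, default=None)
    return parser


with tempfile.TemporaryDirectory() as tmp:
    args = build_parser().parse_args(["-d", tmp, "--coalesced-outfile", "out.csv"])
    dir_ok = args.bayesrun_directory == tmp

# Namespace supports `in`, which is how main() tests for optional outputs.
has_outfile = "coalesced_outfile" in args and bool(args.coalesced_outfile)

bad_dir_exit = None
try:
    build_parser().parse_args(["-d", "/definitely/not/a/real/dir"])
except SystemExit as exc:
    # argparse turns the ArgumentTypeError into a usage error and exits with status 2.
    bad_dir_exit = exc.code
```

String defaults such as `"."` are also passed through `type=`, so a missing working directory would be caught the same way.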

@ -1,178 +0,0 @@
import typing
from deepdog.results import BayesrunOutput
import logging
import csv
import tqdm

_logger = logging.getLogger(__name__)


def build_model_dict(
    bayes_outputs: typing.Sequence[BayesrunOutput],
) -> typing.Dict[
    typing.Tuple, typing.Dict[typing.Tuple, typing.Dict["str", typing.Any]]
]:
    """
    Maybe someday do something smarter with the coalescing and stuff but don't want to so i won't
    """
    # assume that everything is well formatted and the keys are the same across entire list and initialise list of keys.
    # model dict will contain a model_key: {calculation_dict} where each calculation_dict represents a single calculation for that model,
    # the uncoalesced version, keyed by the specific file keys
    model_dict: typing.Dict[
        typing.Tuple, typing.Dict[typing.Tuple, typing.Dict["str", typing.Any]]
    ] = {}

    _logger.info("building model dict")
    for out in tqdm.tqdm(bayes_outputs, desc="reading outputs", leave=False):
        for model_result in out.results:
            model_key = tuple(v for v in model_result.parsed_model_keys.values())
            if model_key not in model_dict:
                model_dict[model_key] = {}
            calculation_dict = model_dict[model_key]
            calculation_key = tuple(v for v in out.data.values())
            if calculation_key not in calculation_dict:
                calculation_dict[calculation_key] = {
                    "_model_key_dict": model_result.parsed_model_keys,
                    "_calculation_key_dict": out.data,
                    "success": model_result.success,
                    "count": model_result.count,
                }
            else:
                raise ValueError(
                    f"Got {calculation_key} twice for model_key {model_key}"
                )

    return model_dict


def write_uncoalesced_dict(
    uncoalesced_output_filename: typing.Optional[str],
    uncoalesced_model_dict: typing.Dict[
        typing.Tuple, typing.Dict[typing.Tuple, typing.Dict["str", typing.Any]]
    ],
):
    if uncoalesced_output_filename is None or uncoalesced_output_filename == "":
        _logger.warning("No uncoalesced filename provided, not going to try")
        return

    first_value = next(iter(next(iter(uncoalesced_model_dict.values())).values()))
    model_field_names = set(first_value["_model_key_dict"].keys())
    calculation_field_names = set(first_value["_calculation_key_dict"].keys())
    if not (set(model_field_names).isdisjoint(calculation_field_names)):
        _logger.info(f"Detected model field names {model_field_names}")
        _logger.info(f"Detected calculation field names {calculation_field_names}")
        _logger.warning(
            f"model field names {model_field_names} and calculation {calculation_field_names} have an overlap, which is possibly a problem"
        )
    collected_fieldnames = list(model_field_names)
    collected_fieldnames.extend(calculation_field_names)
    collected_fieldnames.extend(["success", "count"])
    _logger.info(f"Full uncoalesced fieldnames are {collected_fieldnames}")
    with open(uncoalesced_output_filename, "w", newline="") as uncoalesced_output_file:
        writer = csv.DictWriter(
            uncoalesced_output_file, fieldnames=collected_fieldnames
        )
        writer.writeheader()

        for model_dict in uncoalesced_model_dict.values():
            for calculation in model_dict.values():
                row = calculation["_model_key_dict"].copy()
                row.update(calculation["_calculation_key_dict"].copy())
                row.update(
                    {
                        "success": calculation["success"],
                        "count": calculation["count"],
                    }
                )
                writer.writerow(row)


def coalesced_dict(
    uncoalesced_model_dict: typing.Dict[
        typing.Tuple, typing.Dict[typing.Tuple, typing.Dict["str", typing.Any]]
    ],
    minimum_count: float = 0.1,
):
    """
    pass in uncoalesced dict
    the minimum_count field is what we use to make sure our probs are never zero
    """
    coalesced_dict = {}

    # we're already iterating and performance really doesn't matter, so count the keys ourselves
    num_keys = 0

    # first pass coalesce
    for model_key, model_dict in uncoalesced_model_dict.items():
        num_keys += 1
        for calculation in model_dict.values():
            if model_key not in coalesced_dict:
                coalesced_dict[model_key] = {
                    "_model_key_dict": calculation["_model_key_dict"].copy(),
                    "calculations_coalesced": 0,
                    "count": 0,
                    "success": 0,
                }
            sub_dict = coalesced_dict[model_key]
            sub_dict["calculations_coalesced"] += 1
            sub_dict["count"] += calculation["count"]
            sub_dict["success"] += calculation["success"]

    # second pass do probability calculation

    prior = 1 / num_keys
    _logger.info(f"Got {num_keys} model keys, so our prior will be {prior}")

    total_weight = 0
    for coalesced_model_dict in coalesced_dict.values():
        model_weight = (
            max(minimum_count, coalesced_model_dict["success"])
            / coalesced_model_dict["count"]
        ) * prior
        total_weight += model_weight

    total_prob = 0
    for coalesced_model_dict in coalesced_dict.values():
        model_weight = (
            max(minimum_count, coalesced_model_dict["success"])
            / coalesced_model_dict["count"]
        )
        prob = model_weight * prior / total_weight
        coalesced_model_dict["prob"] = prob
        total_prob += prob

    _logger.debug(
        f"Got a total probability of {total_prob}, which should be close to 1 up to float/rounding error"
    )
    return coalesced_dict
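`coalesced_dict` first pools successes and counts over every calculation for a model key, then normalises the floored success rates under a uniform prior. A simplified sketch of the same two passes, assuming a hypothetical `coalesce_and_normalise` helper that takes plain `(success, count)` tuples instead of the nested dict:

```python
from typing import Dict, List, Tuple


def coalesce_and_normalise(
    calculations: Dict[str, List[Tuple[int, int]]],
    minimum_count: float = 0.1,
) -> Dict[str, float]:
    """Pool (success, count) pairs per model, then compute posterior probabilities.

    Assumes a uniform prior over models; zero successes are floored to
    ``minimum_count`` so no model gets probability exactly zero.
    """
    prior = 1 / len(calculations)
    weights = {}
    for model, pairs in calculations.items():
        # First pass: sum successes and counts over every calculation for this model.
        success = sum(s for s, _ in pairs)
        count = sum(c for _, c in pairs)
        weights[model] = (max(minimum_count, success) / count) * prior
    total = sum(weights.values())
    # Second pass: normalise the weights into probabilities.
    return {model: w / total for model, w in weights.items()}


probs = coalesce_and_normalise(
    {
        "model_a": [(3, 100), (5, 100)],  # 8 successes in 200 trials
        "model_b": [(0, 100), (0, 100)],  # floored to 0.1 successes
    }
)
```

Pooling before dividing weights each calculation by its count, so a calculation with more trials contributes proportionally more than one with fewer.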


def write_coalesced_dict(
    coalesced_output_filename: typing.Optional[str],
    coalesced_model_dict: typing.Dict[typing.Tuple, typing.Dict["str", typing.Any]],
):
    if coalesced_output_filename is None or coalesced_output_filename == "":
        _logger.warning("No coalesced filename provided, not going to try")
        return

    first_value = next(iter(coalesced_model_dict.values()))
    model_field_names = set(first_value["_model_key_dict"].keys())
    _logger.info(f"Detected model field names {model_field_names}")

    collected_fieldnames = list(model_field_names)
    collected_fieldnames.extend(["calculations_coalesced", "success", "count", "prob"])
    with open(coalesced_output_filename, "w", newline="") as coalesced_output_file:
        writer = csv.DictWriter(coalesced_output_file, fieldnames=collected_fieldnames)
        writer.writeheader()

        for model_dict in coalesced_model_dict.values():
            row = model_dict["_model_key_dict"].copy()
            row.update(
                {
                    "calculations_coalesced": model_dict["calculations_coalesced"],
                    "success": model_dict["success"],
                    "count": model_dict["count"],
                    "prob": model_dict["prob"],
                }
            )
            writer.writerow(row)

@ -1,100 +0,0 @@
import logging
import argparse
import json
import deepdog.cli.probs.args
import deepdog.cli.probs.dicts
import deepdog.results
import deepdog.indexify
import pathlib
import tqdm
import tqdm.contrib.logging

_logger = logging.getLogger(__name__)


def set_up_logging(log_file: str):

    log_pattern = "%(asctime)s | %(levelname)-7s | %(name)s:%(lineno)d | %(message)s"
    if log_file is None:
        handlers = [
            logging.StreamHandler(),
        ]
    else:
        handlers = [logging.StreamHandler(), logging.FileHandler(log_file)]
    logging.basicConfig(
        level=logging.DEBUG,
        format=log_pattern,
        # it's okay to ignore this mypy error because who cares about logger handler types
        handlers=handlers,  # type: ignore
    )
    logging.captureWarnings(True)


def main(args: argparse.Namespace):
    """
    Main function with passed in arguments and no additional logging setup in case we want to extract out later
    """

    with tqdm.contrib.logging.logging_redirect_tqdm():
        _logger.info(f"args: {args}")

        try:
            if args.coalesced_keys:
                raise NotImplementedError(
                    "Currently not supporting coalesced keys, but maybe in future"
                )
        except AttributeError:
            # we don't care if this is missing because we don't actually want it to be there
            pass

        indexifier = None
        if args.indexify_json:
            with open(args.indexify_json, "r") as indexify_json_file:
                indexify_spec = json.load(indexify_json_file)
            indexify_data = indexify_spec["indexes"]
            if "seed_spec" in indexify_spec:
                seed_spec = indexify_spec["seed_spec"]
                indexify_data[seed_spec["field_name"]] = list(
                    range(seed_spec["num_seeds"])
                )
            # _logger.debug(f"Indexifier data looks like {indexify_data}")
            indexifier = deepdog.indexify.Indexifier(indexify_data)

        bayes_dir = pathlib.Path(args.bayesrun_directory)
        out_files = [f for f in bayes_dir.iterdir() if f.name.endswith("bayesrun.csv")]
        _logger.info(
            f"Reading {len(out_files)} bayesrun.csv files in directory {args.bayesrun_directory}"
        )
        # _logger.info(out_files)
        parsed_output_files = [
            deepdog.results.read_output_file(f, indexifier)
            for f in tqdm.tqdm(out_files, desc="reading files", leave=False)
        ]

        # Refactor here to allow for arbitrary likelihood file sources
        _logger.info("building uncoalesced dict")
        uncoalesced_dict = deepdog.cli.probs.dicts.build_model_dict(parsed_output_files)

        if "uncoalesced_outfile" in args and args.uncoalesced_outfile:
            deepdog.cli.probs.dicts.write_uncoalesced_dict(
                args.uncoalesced_outfile, uncoalesced_dict
            )
        else:
            _logger.info("Skipping writing uncoalesced")

        _logger.info("building coalesced dict")
        coalesced = deepdog.cli.probs.dicts.coalesced_dict(uncoalesced_dict)

        if "coalesced_outfile" in args and args.coalesced_outfile:
            deepdog.cli.probs.dicts.write_coalesced_dict(
                args.coalesced_outfile, coalesced
            )
        else:
            _logger.info("Skipping writing coalesced")


def wrapped_main():
    args = deepdog.cli.probs.args.parse_args()
    set_up_logging(args.log_file)
    main(args)

@ -1,5 +0,0 @@
from deepdog.cli.subset_sim_probs.main import wrapped_main

__all__ = [
    "wrapped_main",
]

@ -1,52 +0,0 @@
import argparse
import os


def parse_args() -> argparse.Namespace:
    def dir_path(path):
        if os.path.isdir(path):
            return path
        else:
            raise argparse.ArgumentTypeError(f"readable_dir:{path} is not a valid path")

    parser = argparse.ArgumentParser(
        "subset_sim_probs",
        description="Calculating probability from finished subset sim run",
    )
    parser.add_argument(
        "--log-file",
        type=str,
        help="A filename for logging to, if not provided will only log to stderr",
        default=None,
    )
    parser.add_argument(
        "--results-directory",
        "-d",
        type=dir_path,
        help="The directory to search for subsetsim.csv files, defaulting to cwd if not passed",
        default=".",
    )
    parser.add_argument(
        "--indexify-json",
        help="A json file with the indexify config for parsing job indexes. Will skip if not present",
        default="",
    )
    parser.add_argument(
        "--outfile",
        "-o",
        type=str,
        help="output filename for coalesced data. If not provided, will not be written",
        default=None,
    )
    confirm_outfile_overwrite_group = parser.add_mutually_exclusive_group()
    confirm_outfile_overwrite_group.add_argument(
        "--never-overwrite-outfile",
        action="store_true",
        help="If a duplicate outfile is detected, skip confirmation and automatically exit early",
    )
    confirm_outfile_overwrite_group.add_argument(
        "--force-overwrite-outfile",
        action="store_true",
        help="Skips checking for duplicate outfiles and overwrites",
    )
    return parser.parse_args()
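The two overwrite flags sit in a mutually exclusive group, so argparse itself rejects a command line that passes both, before `main` ever runs. A minimal sketch, assuming a hypothetical cut-down `build_parser` with only the output-related options:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical cut-down parser with just the outfile and overwrite options.
    parser = argparse.ArgumentParser("subset_sim_probs")
    parser.add_argument("--outfile", "-o", type=str, default=None)
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--never-overwrite-outfile", action="store_true")
    group.add_argument("--force-overwrite-outfile", action="store_true")
    return parser


parser = build_parser()
ok = parser.parse_args(["-o", "out.csv", "--force-overwrite-outfile"])
neither = parser.parse_args(["-o", "out.csv"])  # both flags default to False

exit_code = None
try:
    # Passing both flags is a usage error: argparse prints a message and exits with status 2.
    parser.parse_args(["--never-overwrite-outfile", "--force-overwrite-outfile"])
except SystemExit as exc:
    exit_code = exc.code
```

When neither flag is given, `main` falls through to the interactive confirmation prompt.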

@ -1,136 +0,0 @@
import typing
from deepdog.results import GeneralOutput
import logging
import csv
import tqdm

_logger = logging.getLogger(__name__)


def build_model_dict(
    general_outputs: typing.Sequence[GeneralOutput],
) -> typing.Dict[
    typing.Tuple, typing.Dict[typing.Tuple, typing.Dict["str", typing.Any]]
]:
    """
    Maybe someday do something smarter with the coalescing and stuff but don't want to so i won't
    """
    # assume that everything is well formatted and the keys are the same across entire list and initialise list of keys.
    # model dict will contain a model_key: {calculation_dict} where each calculation_dict represents a single calculation for that model,
    # the uncoalesced version, keyed by the specific file keys
    model_dict: typing.Dict[
        typing.Tuple, typing.Dict[typing.Tuple, typing.Dict["str", typing.Any]]
    ] = {}

    _logger.info("building model dict")
    for out in tqdm.tqdm(general_outputs, desc="reading outputs", leave=False):
        for model_result in out.results:
            model_key = tuple(v for v in model_result.parsed_model_keys.values())
            if model_key not in model_dict:
                model_dict[model_key] = {}
            calculation_dict = model_dict[model_key]
            calculation_key = tuple(v for v in out.data.values())
            if calculation_key not in calculation_dict:
                calculation_dict[calculation_key] = {
                    "_model_key_dict": model_result.parsed_model_keys,
                    "_calculation_key_dict": out.data,
                    "num_finished_runs": int(
                        model_result.result_dict["num_finished_runs"]
                    ),
                    "num_runs": int(model_result.result_dict["num_runs"]),
                    "estimated_likelihood": float(
                        model_result.result_dict["estimated_likelihood"]
                    ),
                }
            else:
                raise ValueError(
                    f"Got {calculation_key} twice for model_key {model_key}"
                )

    return model_dict


def coalesced_dict(
    uncoalesced_model_dict: typing.Dict[
        typing.Tuple, typing.Dict[typing.Tuple, typing.Dict["str", typing.Any]]
    ],
):
    """
    pass in uncoalesced dict
    each model key is expected to have exactly one calculation; its estimated likelihood
    is normalised under a uniform prior to give the model's probability
    """
    coalesced_dict = {}

    # we're already iterating and performance really doesn't matter, so count the keys ourselves
    num_keys = 0

    # first pass coalesce
    for model_key, model_dict in uncoalesced_model_dict.items():
        num_keys += 1
        for calculation in model_dict.values():
            if model_key not in coalesced_dict:
                coalesced_dict[model_key] = {
                    "_model_key_dict": calculation["_model_key_dict"].copy(),
                    "calculations_coalesced": 1,
                    "num_finished_runs": calculation["num_finished_runs"],
                    "num_runs": calculation["num_runs"],
                    "estimated_likelihood": calculation["estimated_likelihood"],
                }
            else:
                _logger.error(f"We shouldn't be here! Double key for {model_key=}")
                raise ValueError(f"Got more than one calculation for model key {model_key}")

    # second pass do probability calculation

    prior = 1 / num_keys
    _logger.info(f"Got {num_keys} model keys, so our prior will be {prior}")

    total_weight = 0
    for coalesced_model_dict in coalesced_dict.values():
        model_weight = coalesced_model_dict["estimated_likelihood"] * prior
        total_weight += model_weight

    total_prob = 0
    for coalesced_model_dict in coalesced_dict.values():
        likelihood = coalesced_model_dict["estimated_likelihood"]
        prob = likelihood * prior / total_weight
        coalesced_model_dict["prob"] = prob
        total_prob += prob

    _logger.debug(
        f"Got a total probability of {total_prob}, which should be close to 1 up to float/rounding error"
    )
    return coalesced_dict
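With the uniform prior used here, $\pi_m = 1/N$ for each of the $N$ model keys, the prior cancels and each probability is just the estimated likelihood $\hat{L}_m$ (`estimated_likelihood`) divided by the sum over all models. Unlike the `probs` version there is no floor, so a model with $\hat{L}_m = 0$ gets probability exactly zero, and if every likelihood is zero the division by `total_weight` fails:

```latex
P(m \mid D)
  = \frac{\hat{L}_m \, \pi_m}{\sum_{k=1}^{N} \hat{L}_k \, \pi_k}
  = \frac{\hat{L}_m / N}{\sum_{k=1}^{N} \hat{L}_k / N}
  = \frac{\hat{L}_m}{\sum_{k=1}^{N} \hat{L}_k}
```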


def write_coalesced_dict(
    coalesced_output_filename: typing.Optional[str],
    coalesced_model_dict: typing.Dict[typing.Tuple, typing.Dict["str", typing.Any]],
):
    if coalesced_output_filename is None or coalesced_output_filename == "":
        _logger.warning("No coalesced filename provided, not going to try")
        return

    first_value = next(iter(coalesced_model_dict.values()))
    model_field_names = set(first_value["_model_key_dict"].keys())
    _logger.info(f"Detected model field names {model_field_names}")

    collected_fieldnames = list(model_field_names)
    collected_fieldnames.extend(
        ["calculations_coalesced", "num_finished_runs", "num_runs", "prob"]
    )
    with open(coalesced_output_filename, "w", newline="") as coalesced_output_file:
        writer = csv.DictWriter(coalesced_output_file, fieldnames=collected_fieldnames)
        writer.writeheader()

        for model_dict in coalesced_model_dict.values():
            row = model_dict["_model_key_dict"].copy()
            row.update(
                {
                    "calculations_coalesced": model_dict["calculations_coalesced"],
                    "num_finished_runs": model_dict["num_finished_runs"],
                    "num_runs": model_dict["num_runs"],
                    "prob": model_dict["prob"],
                }
            )
            writer.writerow(row)

@ -1,113 +0,0 @@
import logging
import argparse
import json
import deepdog.cli.subset_sim_probs.args
import deepdog.cli.subset_sim_probs.dicts
import deepdog.cli.util
import deepdog.results
import deepdog.indexify
import pathlib
import tqdm
import os
import tqdm.contrib.logging

_logger = logging.getLogger(__name__)


def set_up_logging(log_file: str):

    log_pattern = "%(asctime)s | %(levelname)-7s | %(name)s:%(lineno)d | %(message)s"
    if log_file is None:
        handlers = [
            logging.StreamHandler(),
        ]
    else:
        handlers = [logging.StreamHandler(), logging.FileHandler(log_file)]
    logging.basicConfig(
        level=logging.DEBUG,
        format=log_pattern,
        # it's okay to ignore this mypy error because who cares about logger handler types
        handlers=handlers,  # type: ignore
    )
    logging.captureWarnings(True)


def main(args: argparse.Namespace):
    """
    Main function with passed in arguments and no additional logging setup in case we want to extract out later
    """

    with tqdm.contrib.logging.logging_redirect_tqdm():
        _logger.info(f"args: {args}")

        if "outfile" in args and args.outfile:
            if os.path.exists(args.outfile):
                if args.never_overwrite_outfile:
                    _logger.warning(
                        f"Filename {args.outfile} already exists and --never-overwrite-outfile is set, so aborting."
                    )
                    return
                elif args.force_overwrite_outfile:
                    _logger.warning(f"Forcing overwrite of {args.outfile}")
                else:
                    # need to confirm
                    confirm_overwrite = deepdog.cli.util.confirm_prompt(
                        f"Filename {args.outfile} exists, overwrite?"
                    )
                    if not confirm_overwrite:
                        _logger.warning(
                            f"Filename {args.outfile} already exists and overwrite was declined, aborting."
                        )
                        return
                    else:
                        _logger.warning(f"Overwriting file {args.outfile}")

        indexifier = None
        if args.indexify_json:
            with open(args.indexify_json, "r") as indexify_json_file:
                indexify_spec = json.load(indexify_json_file)
            indexify_data = indexify_spec["indexes"]
            if "seed_spec" in indexify_spec:
                seed_spec = indexify_spec["seed_spec"]
                indexify_data[seed_spec["field_name"]] = list(
                    range(seed_spec["num_seeds"])
                )
            # _logger.debug(f"Indexifier data looks like {indexify_data}")
            indexifier = deepdog.indexify.Indexifier(indexify_data)

        results_dir = pathlib.Path(args.results_directory)
        out_files = [
            f for f in results_dir.iterdir() if f.name.endswith("subsetsim.csv")
        ]
        _logger.info(
            f"Reading {len(out_files)} subsetsim.csv files in directory {args.results_directory}"
        )
        # _logger.info(out_files)
        parsed_output_files = [
            deepdog.results.read_subset_sim_file(f, indexifier)
            for f in tqdm.tqdm(out_files, desc="reading files", leave=False)
        ]

        # Refactor here to allow for arbitrary likelihood file sources
        _logger.info("building uncoalesced dict")
        uncoalesced_dict = deepdog.cli.subset_sim_probs.dicts.build_model_dict(
            parsed_output_files
        )

        _logger.info("building coalesced dict")
        coalesced = deepdog.cli.subset_sim_probs.dicts.coalesced_dict(uncoalesced_dict)

        if "outfile" in args and args.outfile:
            deepdog.cli.subset_sim_probs.dicts.write_coalesced_dict(
                args.outfile, coalesced
            )
        else:
            _logger.info("Skipping writing coalesced")


def wrapped_main():
    args = deepdog.cli.subset_sim_probs.args.parse_args()
    set_up_logging(args.log_file)
    main(args)
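The overwrite handling at the top of `main` is a small decision table. Pulling it into a pure function makes it testable without a terminal; the sketch below is a hypothetical refactor (`should_write_outfile` is not part of deepdog), with the prompt injected as a callable so a stub can stand in for `confirm_prompt`:

```python
import os
import tempfile
from typing import Callable


def should_write_outfile(
    path: str,
    never_overwrite: bool,
    force_overwrite: bool,
    confirm: Callable[[str], bool],
) -> bool:
    """Decide whether to (over)write ``path``, mirroring the branches in main()."""
    if not os.path.exists(path):
        return True  # nothing to clobber
    if never_overwrite:
        return False  # --never-overwrite-outfile: abort without asking
    if force_overwrite:
        return True  # --force-overwrite-outfile: skip the prompt
    return confirm(f"Filename {path} exists, overwrite?")


with tempfile.NamedTemporaryFile() as existing:
    decisions = [
        should_write_outfile(existing.name, True, False, lambda q: True),
        should_write_outfile(existing.name, False, True, lambda q: False),
        should_write_outfile(existing.name, False, False, lambda q: False),
    ]
```

Because argparse already forbids setting both flags, the order of the `never_overwrite` and `force_overwrite` checks only matters if the function is called directly.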

@ -1,3 +0,0 @@
from deepdog.cli.util.confirm import confirm_prompt
__all__ = ["confirm_prompt"]

@ -1,23 +0,0 @@
_RESPONSE_MAP = {
    "yes": True,
    "ye": True,
    "y": True,
    "no": False,
    "n": False,
    "nope": False,
    "true": True,
    "false": False,
}


def confirm_prompt(question: str) -> bool:
    """Prompt with the question and return True or False based on the response."""
    prompt = question + " [y/n]: "
    while True:
        choice = input(prompt).lower()
        if choice in _RESPONSE_MAP:
            return _RESPONSE_MAP[choice]
        else:
            print('Respond with "yes" or "no"')

deepdog/diagnostic.py

@ -0,0 +1,99 @@
from pdme.measurement import OscillatingDipole, OscillatingDipoleArrangement
import pdme
from deepdog.bayes_run import DotInput
import datetime
import numpy
from dataclasses import dataclass
import logging
from typing import Sequence, Tuple
import csv
import itertools
import multiprocessing

_logger = logging.getLogger(__name__)


def get_a_result(discretisation, dots, index):
    return (index, discretisation.solve_for_index(dots, index))


@dataclass
class SingleDipoleDiagnostic():
    model: str
    index: Tuple
    bounds: Tuple
    actual_dipole: OscillatingDipole
    result_dipole: OscillatingDipole
    success: bool

    def __post_init__(self) -> None:
        self.p_actual_x = self.actual_dipole.p[0]
        self.p_actual_y = self.actual_dipole.p[1]
        self.p_actual_z = self.actual_dipole.p[2]
        self.s_actual_x = self.actual_dipole.s[0]
        self.s_actual_y = self.actual_dipole.s[1]
        self.s_actual_z = self.actual_dipole.s[2]
        self.p_result_x = self.result_dipole.p[0]
        self.p_result_y = self.result_dipole.p[1]
        self.p_result_z = self.result_dipole.p[2]
        self.s_result_x = self.result_dipole.s[0]
        self.s_result_y = self.result_dipole.s[1]
        self.s_result_z = self.result_dipole.s[2]
        self.w_actual = self.actual_dipole.w
        self.w_result = self.result_dipole.w


class Diagnostic():
    '''
    Represents a diagnostic for a single dipole moment given a set of discretisations.

    Parameters
    ----------
    actual_dipole_moment : numpy.ndarray
        The moment p of the actual dipole.
    actual_dipole_position : numpy.ndarray
        The position s of the actual dipole.
    actual_dipole_frequency : float
        The frequency w of the actual dipole.
    dot_inputs : Sequence[DotInput]
        The dot inputs for this diagnostic.
    discretisations_with_names : Sequence[Tuple[str, pdme.model.Discretisation]]
        The named discretisations to evaluate.
    filename_slug : str
        The filename slug to include in the output filename.
    '''
    def __init__(self, actual_dipole_moment: numpy.ndarray, actual_dipole_position: numpy.ndarray, actual_dipole_frequency: float, dot_inputs: Sequence[DotInput], discretisations_with_names: Sequence[Tuple[str, pdme.model.Discretisation]], filename_slug: str) -> None:
        self.dipoles = OscillatingDipoleArrangement([OscillatingDipole(actual_dipole_moment, actual_dipole_position, actual_dipole_frequency)])
        self.dots = self.dipoles.get_dot_measurements(dot_inputs)

        self.discretisations_with_names = discretisations_with_names
        self.model_count = len(self.discretisations_with_names)

        self.csv_fields = ["model", "index", "bounds", "p_actual_x", "p_actual_y", "p_actual_z", "s_actual_x", "s_actual_y", "s_actual_z", "w_actual", "success", "p_result_x", "p_result_y", "p_result_z", "s_result_x", "s_result_y", "s_result_z", "w_result"]

        timestamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
        self.filename = f"{timestamp}-{filename_slug}.diag.csv"

    def go(self):
        with open(self.filename, "a", newline="") as outfile:
            # csv fields
            writer = csv.DictWriter(outfile, fieldnames=self.csv_fields, dialect='unix')
            writer.writeheader()

        for (name, discretisation) in self.discretisations_with_names:
            _logger.info(f"Working on discretisation {name}")

            results = []
            with multiprocessing.Pool(multiprocessing.cpu_count() - 1 or 1) as pool:
                results = pool.starmap(get_a_result, zip(itertools.repeat(discretisation), itertools.repeat(self.dots), discretisation.all_indices()))

            with open(self.filename, "a", newline='') as outfile:
                writer = csv.DictWriter(outfile, fieldnames=self.csv_fields, dialect='unix', extrasaction="ignore")

                for idx, result in results:

                    bounds = discretisation.bounds(idx)

                    actual_success = result.success and result.cost <= 1e-10
                    diag_row = SingleDipoleDiagnostic(name, idx, bounds, self.dipoles.dipoles[0], discretisation.model.solution_as_dipoles(result.normalised_x)[0], actual_success)

                    row = vars(diag_row)
                    _logger.debug(f"Writing result {row}")
                    writer.writerow(row)
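`go()` relies on two details: `vars()` on the dataclass returns every instance attribute, including the `actual_dipole` and `result_dipole` objects that are not CSV columns, and `extrasaction="ignore"` tells `csv.DictWriter` to drop keys missing from `fieldnames` instead of raising `ValueError`. A minimal sketch of the same pattern, using a hypothetical `Row` class in place of `SingleDipoleDiagnostic`:

```python
import csv
import io
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Row:
    # Hypothetical stand-in for SingleDipoleDiagnostic.
    model: str
    position: Tuple[float, float, float]
    success: bool

    def __post_init__(self) -> None:
        # Flatten the vector into scalar attributes, one per CSV column.
        self.x, self.y, self.z = self.position


buffer = io.StringIO()
writer = csv.DictWriter(
    buffer,
    fieldnames=["model", "x", "y", "z", "success"],
    dialect="unix",
    # vars() also returns "position"; without this, DictWriter raises ValueError.
    extrasaction="ignore",
)
writer.writeheader()
writer.writerow(vars(Row("model_a", (1.0, 2.0, 3.0), True)))
lines = buffer.getvalue().splitlines()
```

The `unix` dialect quotes every field, so the data line comes out as `"model_a","1.0","2.0","3.0","True"`.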

@ -1,6 +0,0 @@
from deepdog.direct_monte_carlo.direct_mc import (
    DirectMonteCarloRun,
    DirectMonteCarloConfig,
)

__all__ = ["DirectMonteCarloRun", "DirectMonteCarloConfig"]

@ -1,14 +0,0 @@
from typing import Sequence
from deepdog.direct_monte_carlo.direct_mc import DirectMonteCarloFilter
import numpy


class ComposedDMCFilter(DirectMonteCarloFilter):
    def __init__(self, filters: Sequence[DirectMonteCarloFilter]):
        self.filters = filters

    def filter_samples(self, samples: numpy.ndarray) -> numpy.ndarray:
        current_sample = samples
        for filter in self.filters:
            current_sample = filter.filter_samples(current_sample)
        return current_sample

@ -1,24 +0,0 @@
from deepdog.direct_monte_carlo.direct_mc import DirectMonteCarloFilter
from typing import Callable
import numpy


class CostFunctionTargetFilter(DirectMonteCarloFilter):
    def __init__(
        self,
        cost_function: Callable[[numpy.ndarray], numpy.ndarray],
        target_cost: float,
    ):
        """
        Filters dipoles by cost, only leaving dipoles with cost below target_cost
        """
        self.cost_function = cost_function
        self.target_cost = target_cost

    def filter_samples(self, samples: numpy.ndarray) -> numpy.ndarray:
        current_sample = samples

        costs = self.cost_function(current_sample)

        current_sample = current_sample[costs < self.target_cost]
        return current_sample
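`ComposedDMCFilter` chains filters so that each one only sees the survivors of the previous one, and `CostFunctionTargetFilter` keeps samples whose cost is below a target. A pure-Python sketch of both ideas, assuming hypothetical `ListFilter`, `CostBelow` and `Composed` classes that work on lists of floats; in the numpy originals the list comprehension corresponds to the boolean mask `samples[costs < target_cost]`:

```python
from typing import Callable, List, Sequence


class ListFilter:
    """Pure-Python stand-in for DirectMonteCarloFilter, operating on lists."""

    def filter_samples(self, samples: List[float]) -> List[float]:
        raise NotImplementedError


class CostBelow(ListFilter):
    # Mirrors CostFunctionTargetFilter: keep samples whose cost is under the target.
    def __init__(self, cost_function: Callable[[float], float], target_cost: float):
        self.cost_function = cost_function
        self.target_cost = target_cost

    def filter_samples(self, samples: List[float]) -> List[float]:
        return [s for s in samples if self.cost_function(s) < self.target_cost]


class Composed(ListFilter):
    # Mirrors ComposedDMCFilter: apply each filter to the previous filter's output.
    def __init__(self, filters: Sequence[ListFilter]):
        self.filters = filters

    def filter_samples(self, samples: List[float]) -> List[float]:
        current = samples
        for f in self.filters:
            current = f.filter_samples(current)
        return current


combined = Composed([
    CostBelow(lambda s: abs(s), 5.0),        # keep |s| < 5
    CostBelow(lambda s: (s - 2) ** 2, 2.0),  # keep (s - 2)^2 < 2
])
kept = combined.filter_samples([-4.0, 0.0, 1.0, 2.0, 3.0, 4.0, 6.0])  # [1.0, 2.0, 3.0]
```

Since each stage only processes what earlier stages let through, putting cheap, highly selective filters first reduces the work done by expensive later ones.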

@ -1,435 +0,0 @@
import re
import pathlib
import csv
import pdme.model
import pdme.measurement
import pdme.measurement.input_types
import pdme.subspace_simulation
import datetime
from typing import Tuple, Dict, NewType, Any, Sequence
from dataclasses import dataclass
import logging
import numpy
import numpy.random
import pdme.util.fast_v_calc
import multiprocessing

_logger = logging.getLogger(__name__)

ANTI_ZERO_SUCCESS_THRES = 0.1


@dataclass
class DirectMonteCarloResult:
    successes: int
    monte_carlo_count: int
    likelihood: float
    model_name: str


@dataclass
class DirectMonteCarloConfig:
    monte_carlo_count_per_cycle: int = 10000
    monte_carlo_cycles: int = 10
    target_success: int = 100
    max_monte_carlo_cycles_steps: int = 10
    monte_carlo_seed: int = 1234
    write_successes_to_file: bool = False
    tag: str = ""
    cap_core_count: int = 0  # 0 means cap at num cores - 1
    chunk_size: int = 50
    # chunk size of some kind
    write_bayesrun_file: bool = True
    bayesrun_file_timestamp: bool = True
    skip_if_exists: bool = False

    def get_filename(self) -> str:
        """
        Generate a filename for the output of this run.
        """
        # set starting execution timestamp
        timestamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")

        if self.bayesrun_file_timestamp:
            timestamp_str = f"{timestamp}-"
        else:
            timestamp_str = ""
        filename = f"{timestamp_str}{self.tag}.realdata.fast_filter.bayesrun.csv"
        _logger.debug(f"Got filename {filename}")
        return filename

    def get_filename_regex(self) -> str:
        """
        Generate a regex for the output of this run.
        """

        # having both timestamp and the hyphen separately optional is a bit of a hack
        # too loose, but will never matter
        pattern = rf"(?P<timestamp>\d{{8}}-\d{{6}})?-?{self.tag}\.realdata\.fast_filter\.bayesrun\.csv"
        return pattern
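When `skip_if_exists` and `bayesrun_file_timestamp` are both set, `execute()` scans the working directory with this pattern, so a previous output for the same tag is detected whether or not it carries a timestamp. A small sketch of how the pattern behaves, assuming a hypothetical tag `run1` and a `filename_regex` helper that copies the f-string above. It also shows the looseness the comment mentions: `re.match` anchors only at the start, so trailing text is ignored (`re.fullmatch` would reject it), and a tag containing regex metacharacters such as `.` would need `re.escape` to be matched literally:

```python
import re


def filename_regex(tag: str) -> str:
    # Same pattern string that DirectMonteCarloConfig.get_filename_regex builds;
    # the tag is inserted verbatim, not passed through re.escape.
    return rf"(?P<timestamp>\d{{8}}-\d{{6}})?-?{tag}\.realdata\.fast_filter\.bayesrun\.csv"


pattern = filename_regex("run1")
names = [
    "20250227-101500-run1.realdata.fast_filter.bayesrun.csv",  # timestamped output
    "run1.realdata.fast_filter.bayesrun.csv",  # bayesrun_file_timestamp=False
    "run10.realdata.fast_filter.bayesrun.csv",  # a different tag
    "run1.realdata.fast_filter.bayesrun.csv.bak",  # trailing suffix
]
# re.match anchors only at the start, as in execute(); re.fullmatch anchors at both ends.
matches = {name: re.match(pattern, name) for name in names}
strict = {name: re.fullmatch(pattern, name) for name in names}
```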


# Aliasing dict as a generic data container
DirectMonteCarloData = NewType("DirectMonteCarloData", Dict[str, Any])


class DirectMonteCarloFilter:
    """
    Abstract base class for filters that keep only the samples matching some criteria.
    Initialise with whatever data the criteria need, then call filter_samples on each batch.
    """

    def filter_samples(self, samples: numpy.ndarray) -> numpy.ndarray:
        raise NotImplementedError


class DirectMonteCarloRun:
    """
    A Direct Monte Carlo run over each of the given models, using multiprocessing in
    execute() or a single process in execute_no_multiprocessing().
    An encapsulation of the steps needed for a Bayes run.

    Parameters
    ----------
    model_name_pairs : Sequence[Tuple(str, pdme.model.DipoleModel)]
        The models to evaluate, with names.
    filter : DirectMonteCarloFilter
        The filter applied to each batch of sampled dipoles; samples that survive it count as successes.
    config : DirectMonteCarloConfig
        The run configuration, including:

        monte_carlo_count_per_cycle: int
            The number of Monte Carlo iterations to use in a single cycle calculation.
        monte_carlo_cycles: int
            The number of cycles to use in each step.
            Increasing monte_carlo_count_per_cycle increases memory usage (and runtime), while this increases runtime, allowing
            control over memory use.
        target_success: int
            The number of successes to target before exiting early.
            Should likely be ~100 but can go higher too.
        max_monte_carlo_cycles_steps: int
            The number of steps to use. Each step consists of monte_carlo_cycles cycles, each of which has monte_carlo_count_per_cycle iterations.
        monte_carlo_seed: int
            The seed to use for the RNG.
    """

    def __init__(
        self,
        model_name_pairs: Sequence[Tuple[str, pdme.model.DipoleModel]],
        filter: DirectMonteCarloFilter,
        config: DirectMonteCarloConfig,
    ):
        self.model_name_pairs = model_name_pairs

        # self.measurements = measurements
        # self.dot_inputs = [(measure.r, measure.f) for measure in self.measurements]

        # self.dot_inputs_array = pdme.measurement.input_types.dot_inputs_to_array(
        #     self.dot_inputs
        # )

        self.config = config
        self.filter = filter
        # (
        #     self.lows,
        #     self.highs,
        # ) = pdme.measurement.input_types.dot_range_measurements_low_high_arrays(
        #     self.measurements
        # )

    def _single_run(
        self, model_name_pair: Tuple[str, pdme.model.DipoleModel], seed
    ) -> numpy.ndarray:
        rng = numpy.random.default_rng(seed)

        _, model = model_name_pair
        # don't log here it's madness
        # _logger.info(f"Executing for model {model_name}")

        sample_dipoles = model.get_monte_carlo_dipole_inputs(
            self.config.monte_carlo_count_per_cycle, -1, rng
        )

        current_sample = sample_dipoles

        return self.filter.filter_samples(current_sample)
        # for di, low, high in zip(self.dot_inputs_array, self.lows, self.highs):
        #     if len(current_sample) < 1:
        #         break
        #     vals = pdme.util.fast_v_calc.fast_vs_for_dipoleses(
        #         numpy.array([di]), current_sample
        #     )
        #     current_sample = current_sample[
        #         numpy.all((vals > low) & (vals < high), axis=1)
        #     ]
        # return current_sample

    def _wrapped_single_run(self, args: Tuple):
        """
        single run wrapped up for multiprocessing call.

        takes in a tuple of arguments corresponding to
        (model_name_pair, seed, return_configs)

        return_configs is a boolean, if true then will return tuple of (count, [matching configs])
        if false, return (count, [])
        """
        # here's where we do our work

        model_name_pair, seed, return_configs = args
        cycle_success_configs = self._single_run(model_name_pair, seed)
        cycle_success_count = len(cycle_success_configs)

        if return_configs:
            return (cycle_success_count, cycle_success_configs)
        else:
            return (cycle_success_count, [])

    def execute_no_multiprocessing(self) -> Sequence[DirectMonteCarloResult]:

        count_per_step = (
            self.config.monte_carlo_count_per_cycle * self.config.monte_carlo_cycles
        )
        seed_sequence = numpy.random.SeedSequence(self.config.monte_carlo_seed)

        # core count etc. logic here

        results = []
        for model_name_pair in self.model_name_pairs:
            step_count = 0
            total_success = 0
            total_count = 0

            _logger.info(f"Working on model {model_name_pair[0]}")
            # This is probably where multiprocessing logic should go
            while (step_count < self.config.max_monte_carlo_cycles_steps) and (
                total_success < self.config.target_success
            ):
                _logger.debug(f"Executing step {step_count}")
                for cycle_i, seed in enumerate(
                    seed_sequence.spawn(self.config.monte_carlo_cycles)
                ):
                    # here's where we do our work

                    cycle_success_configs = self._single_run(model_name_pair, seed)
                    cycle_success_count = len(cycle_success_configs)
                    if cycle_success_count > 0:
                        _logger.debug(
                            f"For cycle {cycle_i} received {cycle_success_count} successes"
                        )
                        # _logger.debug(cycle_success_configs)
                        if self.config.write_successes_to_file:
                            sorted_by_freq = numpy.array(
                                [
                                    pdme.subspace_simulation.sort_array_of_dipoles_by_frequency(
                                        dipole_config
                                    )
                                    for dipole_config in cycle_success_configs
                                ]
                            )
                            dipole_count = numpy.array(cycle_success_configs).shape[1]
                            for n in range(dipole_count):
                                number_dipoles_to_write = self.config.target_success * 5
                                _logger.info(f"Limiting to {number_dipoles_to_write=}")
                                numpy.savetxt(
                                    f"{self.config.tag}_{step_count}_{cycle_i}_dipole_{n}.csv",
                                    sorted_by_freq[:number_dipoles_to_write, n],
                                    delimiter=",",
                                )
                    total_success += cycle_success_count
                _logger.debug(
                    f"At end of step {step_count} have {total_success} successes"
                )
                step_count += 1
                total_count += count_per_step

            results.append(
                DirectMonteCarloResult(
                    successes=total_success,
                    monte_carlo_count=total_count,
                    likelihood=total_success / total_count,
                    model_name=model_name_pair[0],
                )
            )
        return results
def execute(self) -> Sequence[DirectMonteCarloResult]:
filename = self.config.get_filename()
if self.config.skip_if_exists:
_logger.info(f"Checking if {filename} exists")
cwd = pathlib.Path.cwd()
if (cwd / filename).exists():
_logger.info(f"File {filename} exists, skipping")
return []
if self.config.bayesrun_file_timestamp:
_logger.info(
"Timestamps are enabled, so also checking for existing files that match the filename pattern with any past or current timestamp"
)
pattern = self.config.get_filename_regex()
for file in cwd.iterdir():
match = re.match(pattern, file.name)
if match is not None:
_logger.info(f"Matched {file.name} to {pattern}")
_logger.info(f"Matching file {file.name} exists, skipping")
return []
_logger.info(
f"Finished checking existing files against pattern {pattern}"
)
count_per_step = (
self.config.monte_carlo_count_per_cycle * self.config.monte_carlo_cycles
)
seed_sequence = numpy.random.SeedSequence(self.config.monte_carlo_seed)
# core count etc. logic here
core_count = multiprocessing.cpu_count() - 1 or 1
if (self.config.cap_core_count >= 1) and (
self.config.cap_core_count < core_count
):
core_count = self.config.cap_core_count
_logger.info(f"Using {core_count} cores")
results = []
with multiprocessing.Pool(core_count) as pool:
for model_name_pair in self.model_name_pairs:
_logger.info(f"Working on model {model_name_pair[0]}")
# This is probably where multiprocessing logic should go
step_count = 0
total_success = 0
total_count = 0
while (step_count < self.config.max_monte_carlo_cycles_steps) and (
total_success < self.config.target_success
):
step_count += 1
_logger.debug(f"Executing step {step_count}")
seeds = seed_sequence.spawn(self.config.monte_carlo_cycles)
raw_pool_results = list(
pool.imap_unordered(
self._wrapped_single_run,
[
(
model_name_pair,
seed,
self.config.write_successes_to_file,
)
for seed in seeds
],
self.config.chunk_size,
)
)
pool_results = sum(result[0] for result in raw_pool_results)
_logger.debug(f"Pool results: {pool_results}")
if self.config.write_successes_to_file:
_logger.info("Writing dipole results")
cycle_success_configs = numpy.concatenate(
[result[1] for result in raw_pool_results]
)
max_number_dipoles_to_write = self.config.target_success * 5
_logger.debug(
f"Limiting to {max_number_dipoles_to_write=}, have {len(cycle_success_configs)}"
)
if len(cycle_success_configs):
# only read shape[1] once at least one config is known to exist
dipole_count = numpy.array(cycle_success_configs).shape[1]
sorted_by_freq = numpy.array(
[
pdme.subspace_simulation.sort_array_of_dipoles_by_frequency(
dipole_config
)
for dipole_config in cycle_success_configs[
:max_number_dipoles_to_write
]
]
)
for n in range(dipole_count):
dipole_filename = (
f"{self.config.tag}_{step_count}_dipole_{n}.csv"
)
_logger.debug(
f"Writing {min(len(cycle_success_configs), max_number_dipoles_to_write)} to {dipole_filename}"
)
numpy.savetxt(
dipole_filename,
sorted_by_freq[:, n],
delimiter=",",
)
else:
_logger.debug(
"Instructed to write results, but none obtained"
)
total_success += pool_results
total_count += count_per_step
_logger.debug(
f"At end of step {step_count} have {total_success} successes"
)
results.append(
DirectMonteCarloResult(
successes=total_success,
monte_carlo_count=total_count,
likelihood=total_success / total_count,
model_name=model_name_pair[0],
)
)
if self.config.write_bayesrun_file:
_logger.info(f"Going to write to file [{filename}]")
# row: Dict[str, Union[int, float, str]] = {}
row = {}
num_models = len(self.model_name_pairs)
success_weight = sum(
[
(
max(ANTI_ZERO_SUCCESS_THRES, res.successes)
/ res.monte_carlo_count
)
/ num_models
for res in results
]
)
for res in results:
row.update(
{
f"{res.model_name}_success": res.successes,
f"{res.model_name}_count": res.monte_carlo_count,
f"{res.model_name}_prob": (
max(ANTI_ZERO_SUCCESS_THRES, res.successes)
/ res.monte_carlo_count
)
/ (num_models * success_weight),
}
)
_logger.info(f"Writing row {row}")
fieldnames = list(row.keys())
with open(filename, "w", newline="") as outfile:
writer = csv.DictWriter(outfile, fieldnames=fieldnames, dialect="unix")
writer.writeheader()
writer.writerow(row)
return results
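The `_prob` column written in `execute` divides each model's clamped success rate by the summed weight over all models, which amounts to a posterior under a uniform prior. A minimal sketch of that arithmetic, assuming the clamp value is passed in (0.5 below is only a placeholder; the real `ANTI_ZERO_SUCCESS_THRES` is defined elsewhere in the module), with `normalized_model_probs` as a hypothetical helper name:

```python
from typing import List, Sequence


def normalized_model_probs(
    successes: Sequence[int],
    counts: Sequence[int],
    anti_zero: float = 0.5,  # placeholder for ANTI_ZERO_SUCCESS_THRES (assumed value)
) -> List[float]:
    """Hypothetical helper mirroring the `_prob` column computation.

    Each model's likelihood is its success rate, with zero successes clamped
    up to `anti_zero` so no model gets exactly zero probability. With a
    uniform prior 1/num_models, dividing by the summed weight (the evidence)
    makes the probabilities add up to one.
    """
    num_models = len(successes)
    likelihoods = [max(anti_zero, s) / c for s, c in zip(successes, counts)]
    success_weight = sum(lik / num_models for lik in likelihoods)
    return [lik / (num_models * success_weight) for lik in likelihoods]


probs = normalized_model_probs([30, 10, 0], [1000, 1000, 1000])
```

Because the 1/num_models factors cancel, each probability is simply the clamped success rate divided by the sum of all clamped success rates.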

@ -1,115 +0,0 @@
from numpy import ndarray
from deepdog.direct_monte_carlo.direct_mc import DirectMonteCarloFilter
from typing import Sequence
import pdme.measurement
import pdme.measurement.input_types
import pdme.util.fast_nonlocal_spectrum
import pdme.util.fast_v_calc
import numpy
class SingleDotPotentialFilter(DirectMonteCarloFilter):
def __init__(self, measurements: Sequence[pdme.measurement.DotRangeMeasurement]):
self.measurements = measurements
self.dot_inputs = [(measure.r, measure.f) for measure in self.measurements]
self.dot_inputs_array = pdme.measurement.input_types.dot_inputs_to_array(
self.dot_inputs
)
(
self.lows,
self.highs,
) = pdme.measurement.input_types.dot_range_measurements_low_high_arrays(
self.measurements
)
def filter_samples(self, samples: ndarray) -> ndarray:
current_sample = samples
for di, low, high in zip(self.dot_inputs_array, self.lows, self.highs):
if len(current_sample) < 1:
break
vals = pdme.util.fast_v_calc.fast_vs_for_dipoleses(
numpy.array([di]), current_sample
)
current_sample = current_sample[
numpy.all((vals > low) & (vals < high), axis=1)
]
return current_sample
class SingleDotSpinQubitFrequencyFilter(DirectMonteCarloFilter):
def __init__(self, measurements: Sequence[pdme.measurement.DotRangeMeasurement]):
self.measurements = measurements
self.dot_inputs = [(measure.r, measure.f) for measure in self.measurements]
self.dot_inputs_array = pdme.measurement.input_types.dot_inputs_to_array(
self.dot_inputs
)
(
self.lows,
self.highs,
) = pdme.measurement.input_types.dot_range_measurements_low_high_arrays(
self.measurements
)
def filter_samples(self, samples: ndarray) -> ndarray:
current_sample = samples
for di, low, high in zip(self.dot_inputs_array, self.lows, self.highs):
if len(current_sample) < 1:
break
vals = pdme.util.fast_v_calc.fast_efieldxs_for_dipoleses(
numpy.array([di]), current_sample
)
# _logger.info(vals)
current_sample = current_sample[
numpy.all((vals > low) & (vals < high), axis=1)
]
# _logger.info(f"leaving with {len(current_sample)}")
return current_sample
class DoubleDotSpinQubitFrequencyFilter(DirectMonteCarloFilter):
def __init__(
self,
pair_phase_measurements: Sequence[pdme.measurement.DotPairRangeMeasurement],
):
self.pair_phase_measurements = pair_phase_measurements
self.dot_pair_inputs = [
(measure.r1, measure.r2, measure.f)
for measure in self.pair_phase_measurements
]
self.dot_pair_inputs_array = (
pdme.measurement.input_types.dot_pair_inputs_to_array(self.dot_pair_inputs)
)
(
self.pair_phase_lows,
self.pair_phase_highs,
) = pdme.measurement.input_types.dot_range_measurements_low_high_arrays(
self.pair_phase_measurements
)
def filter_samples(self, samples: ndarray) -> ndarray:
current_sample = samples
for pi, plow, phigh in zip(
self.dot_pair_inputs_array, self.pair_phase_lows, self.pair_phase_highs
):
if len(current_sample) < 1:
break
vals = pdme.util.fast_nonlocal_spectrum.signarg(
pdme.util.fast_nonlocal_spectrum.fast_s_spin_qubit_tarucha_nonlocal_dipoleses(
numpy.array([pi]), current_sample
)
)
current_sample = current_sample[
numpy.all(
((vals > plow) & (vals < phigh)) | ((vals < plow) & (vals > phigh)),
axis=1,
)
]
return current_sample
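All three filters above narrow the sample array one measurement at a time and stop early once nothing survives; the pair-phase filter also accepts a value lying between the two bounds whichever of them is larger. When low < high the extra clause can never be true, so the same check covers the single-dot filters too. A self-contained sketch of that pattern, where `calc` is a hypothetical stand-in for the pdme `fast_*` calculators and is assumed to return one column per measurement (shape `(n_samples, 1)`):

```python
from typing import Callable, Sequence

import numpy


def within_bounds(vals: numpy.ndarray, low: float, high: float) -> numpy.ndarray:
    """True where vals lies strictly between low and high, in either order."""
    return ((vals > low) & (vals < high)) | ((vals < low) & (vals > high))


def successive_filter(
    samples: numpy.ndarray,
    calc: Callable[[int, numpy.ndarray], numpy.ndarray],
    lows: Sequence[float],
    highs: Sequence[float],
) -> numpy.ndarray:
    """Keep only samples whose predicted value for every measurement is in range.

    calc(i, samples) is a hypothetical stand-in for the pdme calculators and
    must return an array of shape (len(samples), 1) for measurement i.
    """
    current = samples
    for i, (low, high) in enumerate(zip(lows, highs)):
        if len(current) < 1:
            break  # nothing left to test, skip the remaining measurements
        vals = calc(i, current)
        current = current[numpy.all(within_bounds(vals, low, high), axis=1)]
    return current


samples = numpy.arange(10.0).reshape(-1, 1)
# measurement 0 predicts x, measurement 1 predicts 2x; the second range is given "backwards"
kept = successive_filter(samples, lambda i, s: s[:, :1] * (i + 1), [2.0, 12.0], [8.0, 4.0])
```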

@ -1,62 +0,0 @@
"""
Helpers for the indexify function reused across scripts.
Indexifying maps a single integer onto a tuple of values drawn from several lists, which is useful given how we launch CHTC runs.
"""
import itertools
import typing
import logging
import math
_logger = logging.getLogger(__name__)
# from https://stackoverflow.com/questions/5228158/cartesian-product-of-a-dictionary-of-lists
def _dict_product(dicts):
"""
>>> _dict_product(dict(number=[1, 2], character="ab"))
[{'number': 1, 'character': 'a'}, {'number': 1, 'character': 'b'}, {'number': 2, 'character': 'a'}, {'number': 2, 'character': 'b'}]
"""
return list(dict(zip(dicts.keys(), x)) for x in itertools.product(*dicts.values()))
class Indexifier:
"""
The order of keys matters, since it determines how an index maps onto values.
Dicts preserve insertion order from Python 3.7 on, so collections.OrderedDict is not needed.
"""
def __init__(self, list_dict: typing.Dict[str, typing.Sequence]):
self.dict = list_dict
self.product_dict = _dict_product(self.dict)
def indexify(self, n: int) -> typing.Dict[str, typing.Any]:
return self.product_dict[n]
def __len__(self) -> int:
weights = [len(v) for v in self.dict.values()]
return math.prod(weights)
def _indexify_indices(self, n: int) -> typing.Sequence[int]:
"""
Legacy indexify from old scripts, copy-pasted.
It could be used like
>>> ret = {}
>>> for k, i in zip(self.dict.keys(), self._indexify_indices(n)):
...     ret[k] = self.dict[k][i]
>>> ret
"""
weights = [len(v) for v in self.dict.values()]
N = math.prod(weights)
curr_n = n
curr_N = N
out = []
for w in weights[:-1]:
# print(f"current: {curr_N}, {curr_n}, {curr_n // w}")
curr_N = curr_N // w # should be int division anyway
out.append(curr_n // curr_N)
curr_n = curr_n % curr_N
# whatever remains is the index into the last key's values
out.append(curr_n)
return out
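`Indexifier.indexify` looks a flat job index up in the full `itertools.product` list, while `_indexify_indices` computes the per-key positions arithmetically. A sketch of that mixed-radix decomposition, including the final (fastest-varying) index, with `indexify_indices` as a hypothetical standalone name; the loop at the end checks it against `itertools.product`, showing the two orderings agree:

```python
import itertools
import math
from typing import List, Sequence


def indexify_indices(n: int, weights: Sequence[int]) -> List[int]:
    """Split flat index n into one index per key (first key varies slowest).

    weights[k] is the number of values for key k, so n ranges over
    0 .. prod(weights) - 1, in the same order as itertools.product.
    """
    if not 0 <= n < math.prod(weights):
        raise IndexError(f"{n} out of range for weights {list(weights)}")
    out = []
    stride = math.prod(weights)
    for w in weights:
        stride //= w  # number of combinations per step of this key
        out.append(n // stride)
        n %= stride
    return out


values = {"number": [1, 2], "character": "ab", "flag": [True, False, None]}
weights = [len(v) for v in values.values()]
product = list(itertools.product(*values.values()))
for n, combo in enumerate(product):
    picked = tuple(v[i] for v, i in zip(values.values(), indexify_indices(n, weights)))
    assert picked == combo
```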

@ -1,3 +1,3 @@
from importlib.metadata import version
__version__ = version("deepdog")
__version__ = version('deepdog')

@ -1,442 +0,0 @@
import pdme.inputs
import pdme.model
import pdme.measurement
import pdme.measurement.input_types
import pdme.measurement.oscillating_dipole
import pdme.util.fast_v_calc
import pdme.util.fast_nonlocal_spectrum
from typing import Sequence, Tuple, List, Dict, Union, Optional
import datetime
import csv
import multiprocessing
import logging
import numpy
# TODO: remove hardcode
CHUNKSIZE = 50
_logger = logging.getLogger(__name__)
def get_a_result_fast_filter_pairs(input) -> int:
(
model,
dot_inputs,
lows,
highs,
pair_inputs,
pair_lows,
pair_highs,
monte_carlo_count,
seed,
) = input
rng = numpy.random.default_rng(seed)
# TODO: A long term refactor is to pull the frequency stuff out from here. The None stands for max_frequency, which is unneeded in the actually useful models.
sample_dipoles = model.get_monte_carlo_dipole_inputs(
monte_carlo_count, None, rng_to_use=rng
)
current_sample = sample_dipoles
for di, low, high in zip(dot_inputs, lows, highs):
if len(current_sample) < 1:
break
vals = pdme.util.fast_v_calc.fast_vs_for_dipoleses(
numpy.array([di]), current_sample
)
current_sample = current_sample[numpy.all((vals > low) & (vals < high), axis=1)]
for pi, plow, phigh in zip(pair_inputs, pair_lows, pair_highs):
if len(current_sample) < 1:
break
vals = pdme.util.fast_nonlocal_spectrum.fast_s_nonlocal_dipoleses(
numpy.array([pi]), current_sample
)
current_sample = current_sample[
numpy.all(
((vals > plow) & (vals < phigh)) | ((vals < plow) & (vals > phigh)),
axis=1,
)
]
return len(current_sample)
def get_a_result_fast_filter_potential_pair_phase_only(input) -> int:
(
model,
pair_inputs,
pair_phase_lows,
pair_phase_highs,
monte_carlo_count,
seed,
) = input
rng = numpy.random.default_rng(seed)
# TODO: A long term refactor is to pull the frequency stuff out from here. The None stands for max_frequency, which is unneeded in the actually useful models.
sample_dipoles = model.get_monte_carlo_dipole_inputs(
monte_carlo_count, None, rng_to_use=rng
)
current_sample = sample_dipoles
for pi, plow, phigh in zip(pair_inputs, pair_phase_lows, pair_phase_highs):
if len(current_sample) < 1:
break
vals = pdme.util.fast_nonlocal_spectrum.signarg(
pdme.util.fast_nonlocal_spectrum.fast_s_nonlocal_dipoleses(
numpy.array([pi]), current_sample
)
)
current_sample = current_sample[
numpy.all(
((vals > plow) & (vals < phigh)) | ((vals < plow) & (vals > phigh)),
axis=1,
)
]
return len(current_sample)
def get_a_result_fast_filter_tarucha_spin_qubit_pair_phase_only(input) -> int:
(
model,
pair_inputs,
pair_phase_lows,
pair_phase_highs,
monte_carlo_count,
seed,
) = input
rng = numpy.random.default_rng(seed)
# TODO: A long term refactor is to pull the frequency stuff out from here. The None stands for max_frequency, which is unneeded in the actually useful models.
sample_dipoles = model.get_monte_carlo_dipole_inputs(
monte_carlo_count, None, rng_to_use=rng
)
current_sample = sample_dipoles
for pi, plow, phigh in zip(pair_inputs, pair_phase_lows, pair_phase_highs):
if len(current_sample) < 1:
break
###
# This should be abstracted out, but we're going to dump it here for time pressure's sake
###
# vals = pdme.util.fast_nonlocal_spectrum.signarg(
# pdme.util.fast_nonlocal_spectrum.fast_s_nonlocal_dipoleses(
# numpy.array([pi]), current_sample
# )
#
vals = pdme.util.fast_nonlocal_spectrum.signarg(
pdme.util.fast_nonlocal_spectrum.fast_s_spin_qubit_tarucha_nonlocal_dipoleses(
numpy.array([pi]), current_sample
)
)
current_sample = current_sample[
numpy.all(
((vals > plow) & (vals < phigh)) | ((vals < plow) & (vals > phigh)),
axis=1,
)
]
return len(current_sample)
def get_a_result_fast_filter(input) -> int:
model, dot_inputs, lows, highs, monte_carlo_count, seed = input
rng = numpy.random.default_rng(seed)
# TODO: A long term refactor is to pull the frequency stuff out from here. The None stands for max_frequency, which is unneeded in the actually useful models.
sample_dipoles = model.get_monte_carlo_dipole_inputs(
monte_carlo_count, None, rng_to_use=rng
)
current_sample = sample_dipoles
for di, low, high in zip(dot_inputs, lows, highs):
if len(current_sample) < 1:
break
vals = pdme.util.fast_v_calc.fast_vs_for_dipoleses(
numpy.array([di]), current_sample
)
current_sample = current_sample[numpy.all((vals > low) & (vals < high), axis=1)]
return len(current_sample)
class RealSpectrumRun:
"""
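A bayes run given some real data.
Parameters
----------
measurements : Sequence[pdme.measurement.DotRangeMeasurement]
The dot inputs for this bayes run.
models_with_names : Sequence[Tuple[str, pdme.model.DipoleModel]]
The models to evaluate, each paired with its name.
filename_slug : str
The filename slug to include in the output filename.
monte_carlo_count : int
The number of Monte Carlo samples drawn per cycle.
monte_carlo_cycles : int
The number of cycles (one spawned seed each) per step.
target_success : int
Stepping stops once the running success count exceeds this.
max_monte_carlo_cycles_steps : int
The maximum number of steps per model.
chunksize : int
The chunk size passed to ``Pool.imap_unordered``.
initial_seed : int
The seed for the ``numpy.random.SeedSequence``.
cap_core_count : int
If at least 1, caps the number of worker processes; otherwise cpu_count() - 1 (at least 1) are used.
pair_measurements : Optional[Sequence[pdme.measurement.DotPairRangeMeasurement]]
If not None, filters on pair measurements in addition to the single-dot measurements.
pair_phase_measurements : Optional[Sequence[pdme.measurement.DotPairRangeMeasurement]]
If not None and pair_measurements is None, ignores the single-dot measurements and filters on pair phase measurements _only_.
The two pair options are not combined; pair_measurements takes precedence.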
"""
def __init__(
self,
measurements: Sequence[pdme.measurement.DotRangeMeasurement],
models_with_names: Sequence[Tuple[str, pdme.model.DipoleModel]],
filename_slug: str,
monte_carlo_count: int = 10000,
monte_carlo_cycles: int = 10,
target_success: int = 100,
max_monte_carlo_cycles_steps: int = 10,
chunksize: int = CHUNKSIZE,
initial_seed: int = 12345,
cap_core_count: int = 0,
pair_measurements: Optional[
Sequence[pdme.measurement.DotPairRangeMeasurement]
] = None,
pair_phase_measurements: Optional[
Sequence[pdme.measurement.DotPairRangeMeasurement]
] = None,
) -> None:
self.measurements = measurements
self.dot_inputs = [(measure.r, measure.f) for measure in self.measurements]
self.dot_inputs_array = pdme.measurement.input_types.dot_inputs_to_array(
self.dot_inputs
)
if pair_measurements is not None:
self.pair_measurements = pair_measurements
self.use_pair_measurements = True
self.use_pair_phase_measurements = False
self.dot_pair_inputs = [
(measure.r1, measure.r2, measure.f)
for measure in self.pair_measurements
]
self.dot_pair_inputs_array = (
pdme.measurement.input_types.dot_pair_inputs_to_array(
self.dot_pair_inputs
)
)
elif pair_phase_measurements is not None:
self.use_pair_measurements = False
self.use_pair_phase_measurements = True
self.pair_phase_measurements = pair_phase_measurements
self.dot_pair_inputs = [
(measure.r1, measure.r2, measure.f)
for measure in self.pair_phase_measurements
]
self.dot_pair_inputs_array = (
pdme.measurement.input_types.dot_pair_inputs_to_array(
self.dot_pair_inputs
)
)
else:
self.use_pair_measurements = False
self.use_pair_phase_measurements = False
self.models = [model for (_, model) in models_with_names]
self.model_names = [name for (name, _) in models_with_names]
self.model_count = len(self.models)
self.monte_carlo_count = monte_carlo_count
self.monte_carlo_cycles = monte_carlo_cycles
self.target_success = target_success
self.max_monte_carlo_cycles_steps = max_monte_carlo_cycles_steps
self.csv_fields = []
self.compensate_zeros = True
self.chunksize = chunksize
for name in self.model_names:
self.csv_fields.extend([f"{name}_success", f"{name}_count", f"{name}_prob"])
# for now initialise priors as uniform.
self.probabilities = [1 / self.model_count] * self.model_count
timestamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
ff_string = "fast_filter"
self.filename = f"{timestamp}-{filename_slug}.realdata.{ff_string}.bayesrun.csv"
self.initial_seed = initial_seed
self.cap_core_count = cap_core_count
def go(self) -> None:
with open(self.filename, "a", newline="") as outfile:
writer = csv.DictWriter(outfile, fieldnames=self.csv_fields, dialect="unix")
writer.writeheader()
(
lows,
highs,
) = pdme.measurement.input_types.dot_range_measurements_low_high_arrays(
self.measurements
)
pair_lows = None
pair_highs = None
if self.use_pair_measurements:
(
pair_lows,
pair_highs,
) = pdme.measurement.input_types.dot_range_measurements_low_high_arrays(
self.pair_measurements
)
pair_phase_lows = None
pair_phase_highs = None
if self.use_pair_phase_measurements:
(
pair_phase_lows,
pair_phase_highs,
) = pdme.measurement.input_types.dot_range_measurements_low_high_arrays(
self.pair_phase_measurements
)
# define a new seed sequence for each run
seed_sequence = numpy.random.SeedSequence(self.initial_seed)
results = []
_logger.debug("Going to iterate over models now")
core_count = multiprocessing.cpu_count() - 1 or 1
if (self.cap_core_count >= 1) and (self.cap_core_count < core_count):
core_count = self.cap_core_count
_logger.info(f"Using {core_count} cores")
for model_count, (model, model_name) in enumerate(
zip(self.models, self.model_names)
):
_logger.debug(f"Doing model #{model_count}: {model_name}")
with multiprocessing.Pool(core_count) as pool:
cycle_count = 0
cycle_success = 0
cycles = 0
while (cycles < self.max_monte_carlo_cycles_steps) and (
cycle_success <= self.target_success
):
_logger.debug(f"Starting cycle {cycles}")
cycles += 1
current_success = 0
cycle_count += self.monte_carlo_count * self.monte_carlo_cycles
# Spawn one child seed per Monte Carlo cycle.
# This must stay inside the step loop so that every step
# draws fresh, independent samples.
seeds = seed_sequence.spawn(self.monte_carlo_cycles)
if self.use_pair_measurements:
_logger.debug("using pair measurements")
current_success = sum(
pool.imap_unordered(
get_a_result_fast_filter_pairs,
[
(
model,
self.dot_inputs_array,
lows,
highs,
self.dot_pair_inputs_array,
pair_lows,
pair_highs,
self.monte_carlo_count,
seed,
)
for seed in seeds
],
self.chunksize,
)
)
elif self.use_pair_phase_measurements:
_logger.debug("using pair phase measurements")
_logger.debug("specifically using tarucha")
current_success = sum(
pool.imap_unordered(
get_a_result_fast_filter_tarucha_spin_qubit_pair_phase_only,
[
(
model,
self.dot_pair_inputs_array,
pair_phase_lows,
pair_phase_highs,
self.monte_carlo_count,
seed,
)
for seed in seeds
],
self.chunksize,
)
)
else:
current_success = sum(
pool.imap_unordered(
get_a_result_fast_filter,
[
(
model,
self.dot_inputs_array,
lows,
highs,
self.monte_carlo_count,
seed,
)
for seed in seeds
],
self.chunksize,
)
)
cycle_success += current_success
_logger.debug(f"current running successes: {cycle_success}")
results.append((cycle_count, cycle_success))
_logger.debug("Done, constructing output now")
row: Dict[str, Union[int, float, str]] = {}
successes: List[float] = []
counts: List[int] = []
for model_index, (name, (count, result)) in enumerate(
zip(self.model_names, results)
):
row[f"{name}_success"] = result
row[f"{name}_count"] = count
successes.append(max(result, 0.5))
counts.append(count)
success_weight = sum(
[
(succ / count) * prob
for succ, count, prob in zip(successes, counts, self.probabilities)
]
)
new_probabilities = [
(succ / count) * old_prob / success_weight
for succ, count, old_prob in zip(successes, counts, self.probabilities)
]
self.probabilities = new_probabilities
for name, probability in zip(self.model_names, self.probabilities):
row[f"{name}_prob"] = probability
_logger.info(row)
with open(self.filename, "a", newline="") as outfile:
writer = csv.DictWriter(outfile, fieldnames=self.csv_fields, dialect="unix")
writer.writerow(row)
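The loop in `go` spawns a fresh batch of child seeds from one `SeedSequence` on every step, so each cycle draws from an independent stream while the whole run stays reproducible from `initial_seed`. A serial sketch of that stopping rule (keep stepping while under the step cap and while the running success count is at most the target), where `sample_success` is a hypothetical callable standing in for the per-cycle filter functions and the process pool is left out:

```python
from typing import Callable, Tuple

import numpy


def run_until_target(
    sample_success: Callable[[numpy.random.Generator], int],
    target_success: int,
    cycles_per_step: int,
    max_steps: int,
    initial_seed: int = 12345,
) -> Tuple[int, int]:
    """Return (steps_taken, total_successes); serial stand-in for the Pool loop."""
    seed_sequence = numpy.random.SeedSequence(initial_seed)
    steps, total = 0, 0
    while steps < max_steps and total <= target_success:
        steps += 1
        # Spawning inside the loop gives every step new, independent child seeds.
        for child in seed_sequence.spawn(cycles_per_step):
            total += sample_success(numpy.random.default_rng(child))
    return steps, total


def coin(rng: numpy.random.Generator) -> int:
    """Toy per-cycle 'success count' of 0 or 1, for illustration only."""
    return int(rng.integers(0, 2))
```

Calling `run_until_target(coin, 50, 10, 100)` twice gives identical results, because a new `SeedSequence` built from the same seed spawns the same children.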

@ -1,166 +0,0 @@
import dataclasses
import re
import typing
import logging
import deepdog.indexify
import pathlib
import csv
from deepdog.results.read_csv import (
parse_bayesrun_row,
BayesrunModelResult,
parse_general_row,
GeneralModelResult,
)
from deepdog.results.filename import parse_file_slug
_logger = logging.getLogger(__name__)
FILENAME_REGEX = re.compile(
r"(?P<timestamp>\d{8}-\d{6})-(?P<filename_slug>.*)\.realdata\.fast_filter\.bayesrun\.csv"
)
# Fallback pattern for output files written without a timestamp prefix
NO_TIMESTAMP_FILENAME_REGEX = re.compile(
r"(?P<filename_slug>.*)\.realdata\.fast_filter\.bayesrun\.csv"
)
SUBSET_SIM_FILENAME_REGEX = re.compile(
r"(?P<filename_slug>.*)-(?:no_adaptive_steps_)?(?P<num_ss_runs>\d+)-nc_(?P<n_c>\d+)-ns_(?P<n_s>\d+)-mmax_(?P<mmax>\d+)\.multi\.subsetsim\.csv"
)
@dataclasses.dataclass
class BayesrunOutputFilename:
timestamp: typing.Optional[str]
filename_slug: str
path: pathlib.Path
@dataclasses.dataclass
class BayesrunOutput:
filename: BayesrunOutputFilename
data: typing.Dict["str", typing.Any]
results: typing.Sequence[BayesrunModelResult]
@dataclasses.dataclass
class GeneralOutput:
filename: BayesrunOutputFilename
data: typing.Dict["str", typing.Any]
results: typing.Sequence[GeneralModelResult]
def _parse_string_output_filename(
filename: str,
) -> typing.Tuple[typing.Optional[str], str]:
if match := FILENAME_REGEX.match(filename):
groups = match.groupdict()
return (groups["timestamp"], groups["filename_slug"])
elif match := NO_TIMESTAMP_FILENAME_REGEX.match(filename):
groups = match.groupdict()
return (None, groups["filename_slug"])
else:
raise ValueError(f"Could not parse {filename} as a bayesrun output filename")
def _parse_output_filename(file: pathlib.Path) -> BayesrunOutputFilename:
filename = file.name
timestamp, slug = _parse_string_output_filename(filename)
return BayesrunOutputFilename(timestamp=timestamp, filename_slug=slug, path=file)
def _parse_ss_output_filename(file: pathlib.Path) -> BayesrunOutputFilename:
filename = file.name
match = SUBSET_SIM_FILENAME_REGEX.match(filename)
if not match:
raise ValueError(f"{filename} was not a valid subset sim output")
groups = match.groupdict()
return BayesrunOutputFilename(
filename_slug=groups["filename_slug"], path=file, timestamp=None
)
def read_subset_sim_file(
file: pathlib.Path, indexifier: typing.Optional[deepdog.indexify.Indexifier]
) -> GeneralOutput:
parsed_filename = tag = _parse_ss_output_filename(file)
out = GeneralOutput(filename=parsed_filename, data={}, results=[])
out.data.update(dataclasses.asdict(tag))
parsed_tag = parse_file_slug(parsed_filename.filename_slug)
if parsed_tag is None:
_logger.warning(
f"Could not parse {tag} against any matching regexes. Going to skip tag parsing"
)
else:
out.data.update(parsed_tag)
if indexifier is not None:
try:
job_index = parsed_tag["job_index"]
indexified = indexifier.indexify(int(job_index))
out.data.update(indexified)
except KeyError:
# This isn't really that important of an error, apart from the warning
_logger.warning(
f"Parsed tag to {parsed_tag}, and attempted to indexify but no job_index key was found. skipping and moving on"
)
with file.open() as input_file:
reader = csv.DictReader(input_file)
rows = [r for r in reader]
if len(rows) == 1:
row = rows[0]
else:
raise ValueError(f"Expected exactly one data row in {file.name}, found {len(rows)}")
results = parse_general_row(
row, ("num_finished_runs", "num_runs", None, "estimated_likelihood")
)
out.results = results
return out
def read_output_file(
file: pathlib.Path, indexifier: typing.Optional[deepdog.indexify.Indexifier]
) -> BayesrunOutput:
parsed_filename = tag = _parse_output_filename(file)
out = BayesrunOutput(filename=parsed_filename, data={}, results=[])
out.data.update(dataclasses.asdict(tag))
parsed_tag = parse_file_slug(parsed_filename.filename_slug)
if parsed_tag is None:
_logger.warning(
f"Could not parse {tag} against any matching regexes. Going to skip tag parsing"
)
else:
out.data.update(parsed_tag)
if indexifier is not None:
try:
job_index = parsed_tag["job_index"]
indexified = indexifier.indexify(int(job_index))
out.data.update(indexified)
except KeyError:
# This isn't really that important of an error, apart from the warning
_logger.warning(
f"Parsed tag to {parsed_tag}, and attempted to indexify but no job_index key was found. skipping and moving on"
)
with file.open() as input_file:
reader = csv.DictReader(input_file)
rows = [r for r in reader]
if len(rows) == 1:
row = rows[0]
else:
raise ValueError(f"Expected exactly one data row in {file.name}, found {len(rows)}")
results = parse_bayesrun_row(row)
out.results = results
return out
__all__ = ["read_output_file", "BayesrunOutput"]
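`_parse_string_output_filename` tries the timestamped pattern before the untimestamped one. The order matters: `NO_TIMESTAMP_FILENAME_REGEX` would also match a timestamped name, folding the timestamp into the slug. A sketch with the two patterns copied verbatim and a hypothetical `parse_output_filename` name:

```python
import re
from typing import Optional, Tuple

FILENAME_REGEX = re.compile(
    r"(?P<timestamp>\d{8}-\d{6})-(?P<filename_slug>.*)\.realdata\.fast_filter\.bayesrun\.csv"
)
NO_TIMESTAMP_FILENAME_REGEX = re.compile(
    r"(?P<filename_slug>.*)\.realdata\.fast_filter\.bayesrun\.csv"
)


def parse_output_filename(filename: str) -> Tuple[Optional[str], str]:
    # Most specific pattern first, otherwise the timestamp ends up in the slug.
    if match := FILENAME_REGEX.match(filename):
        return match["timestamp"], match["filename_slug"]
    if match := NO_TIMESTAMP_FILENAME_REGEX.match(filename):
        return None, match["filename_slug"]
    raise ValueError(f"Could not parse {filename} as a bayesrun output filename")
```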

@ -1,22 +0,0 @@
import re
import typing
FILE_SLUG_REGEXES = [
re.compile(pattern)
for pattern in [
r"(?P<tag>\w+)-(?P<job_index>\d+)",
r"mock_tarucha-(?P<job_index>\d+)",
r"(?:(?P<mock>mock)_)?tarucha(?:_(?P<tarucha_run_id>\d+))?-(?P<job_index>\d+)",
r"(?P<tag>\w+)-(?P<included_dots>[\w,]+)-(?P<target_cost>\d*\.?\d+)-(?P<job_index>\d+)",
]
]
def parse_file_slug(slug: str) -> typing.Optional[typing.Dict[str, str]]:
for pattern in FILE_SLUG_REGEXES:
match = pattern.match(slug)
if match:
return match.groupdict()
else:
return None
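`parse_file_slug` returns the groupdict of the first pattern that matches, and `re.match` only anchors at the start of the slug. As written, the generic `(?P<tag>\w+)-(?P<job_index>\d+)` pattern comes first and `\w` includes the underscore, so a slug such as `mock_tarucha-5` is captured by it rather than by the tarucha-specific patterns. A sketch reproducing that behaviour with the patterns copied from above (`first_match` is a hypothetical name):

```python
import re
from typing import Dict, Optional

FILE_SLUG_REGEXES = [
    re.compile(pattern)
    for pattern in [
        r"(?P<tag>\w+)-(?P<job_index>\d+)",
        r"mock_tarucha-(?P<job_index>\d+)",
        r"(?:(?P<mock>mock)_)?tarucha(?:_(?P<tarucha_run_id>\d+))?-(?P<job_index>\d+)",
        r"(?P<tag>\w+)-(?P<included_dots>[\w,]+)-(?P<target_cost>\d*\.?\d+)-(?P<job_index>\d+)",
    ]
]


def first_match(slug: str) -> Optional[Dict[str, Optional[str]]]:
    """Return the groupdict of the first pattern matching at the start of slug."""
    for pattern in FILE_SLUG_REGEXES:
        match = pattern.match(slug)
        if match:
            return match.groupdict()
    return None
```

Moving the tarucha-specific patterns ahead of the generic one would let them populate `mock` and `tarucha_run_id`.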

@ -1,141 +0,0 @@
import typing
import re
import dataclasses
MODEL_REGEXES = [
re.compile(pattern)
for pattern in [
r"geom_(?P<xmin>-?\d+)_(?P<xmax>-?\d+)_(?P<ymin>-?\d+)_(?P<ymax>-?\d+)_(?P<zmin>-?\d+)_(?P<zmax>-?\d+)-orientation_(?P<orientation>free|fixedxy|fixedz)-dipole_count_(?P<avg_filled>\d+)_(?P<field_name>\w*)",
r"geom_(?P<xmin>-?\d+)_(?P<xmax>-?\d+)_(?P<ymin>-?\d+)_(?P<ymax>-?\d+)_(?P<zmin>-?\d+)_(?P<zmax>-?\d+)-magnitude_(?P<log_magnitude>\d*\.?\d+)-orientation_(?P<orientation>free|fixedxy|fixedz)-dipole_count_(?P<avg_filled>\d+)_(?P<field_name>\w*)",
r"geom_(?P<xmin>-?\d*\.?\d+)_(?P<xmax>-?\d*\.?\d+)_(?P<ymin>-?\d*\.?\d+)_(?P<ymax>-?\d*\.?\d+)_(?P<zmin>-?\d*\.?\d+)_(?P<zmax>-?\d*\.?\d+)-magnitude_(?P<log_magnitude>\d*\.?\d+)-orientation_(?P<orientation>free|fixedxy|fixedz)-dipole_count_(?P<avg_filled>\d+)_(?P<field_name>\w*)",
r"geom_(?P<xmin>-?\d+)_(?P<xmax>-?\d+)_(?P<ymin>-?\d+)_(?P<ymax>-?\d+)_(?P<zmin>-?\d+)_(?P<zmax>-?\d+)-magnitude_(?P<log_magnitude>-?\d*\.?\d+)-orientation_(?P<orientation>free|fixedxy|fixedz)-dipole_count_(?P<avg_filled>\d+)_(?P<field_name>\w*)",
r"geom_(?P<xmin>-?\d*\.?\d+)_(?P<xmax>-?\d*\.?\d+)_(?P<ymin>-?\d*\.?\d+)_(?P<ymax>-?\d*\.?\d+)_(?P<zmin>-?\d*\.?\d+)_(?P<zmax>-?\d*\.?\d+)-magnitude_(?P<log_magnitude>-?\d*\.?\d+)-orientation_(?P<orientation>free|fixedxy|fixedz)-dipole_count_(?P<avg_filled>\d+)_(?P<field_name>\w*)",
]
]
@dataclasses.dataclass
class BayesrunModelResult:
parsed_model_keys: typing.Dict[str, str]
success: int
count: int
@dataclasses.dataclass
class GeneralModelResult:
parsed_model_keys: typing.Dict[str, str]
result_dict: typing.Dict[str, str]
class BayesrunColumnParsed:
"""
Parsed bayesrun column name, with the field name (e.g. success, count or prob) pulled out from the model parameters.
"""
def __init__(self, groupdict: typing.Dict[str, str]):
self.column_field = groupdict["field_name"]
self.model_field_dict = {
k: v for k, v in groupdict.items() if k != "field_name"
}
self._groupdict_str = repr(groupdict)
def __str__(self):
return f"BayesrunColumnParsed[{self.column_field}: {self.model_field_dict}]"
def __repr__(self):
return f"BayesrunColumnParsed({self._groupdict_str})"
def __eq__(self, other):
if isinstance(other, BayesrunColumnParsed):
return (self.column_field == other.column_field) and (
self.model_field_dict == other.model_field_dict
)
return NotImplemented
def _parse_bayesrun_column(
column: str,
) -> typing.Optional[BayesrunColumnParsed]:
"""
Tries, one by one, each of a predefined list of regexes that I have used in the past.
Returns a BayesrunColumnParsed built from the first match, or None if no pattern matches.
"""
for pattern in MODEL_REGEXES:
match = pattern.match(column)
if match:
return BayesrunColumnParsed(match.groupdict())
else:
return None
def _batch_iterable_into_chunks(iterable, n=1):
"""
Yield successive chunks of n items; bayesrun columns come in threes (success, count, prob) per model.
"""
for ndx in range(0, len(iterable), n):
yield iterable[ndx : min(ndx + n, len(iterable))]
def parse_general_row(
row: typing.Dict[str, str],
expected_fields: typing.Sequence[typing.Optional[str]],
) -> typing.Sequence[GeneralModelResult]:
results = []
batched_keys = _batch_iterable_into_chunks(list(row.keys()), len(expected_fields))
for model_keys in batched_keys:
parsed = [_parse_bayesrun_column(column) for column in model_keys]
values = [row[column] for column in model_keys]
result_dict = {}
parsed_keys = None
for expected_field, parsed_field, value in zip(expected_fields, parsed, values):
if expected_field is None:
continue
if parsed_field is None:
raise ValueError(
f"No viable row found for {expected_field=} in {model_keys=}"
)
if parsed_field.column_field != expected_field:
raise ValueError(
f"The column {parsed_field.column_field} does not match expected {expected_field}"
)
result_dict[expected_field] = value
if parsed_keys is None:
parsed_keys = parsed_field.model_field_dict
if parsed_keys is None:
raise ValueError(f"Could not determine model keys for {model_keys=} in {row=}")
results.append(
GeneralModelResult(parsed_model_keys=parsed_keys, result_dict=result_dict)
)
return results
def parse_bayesrun_row(
row: typing.Dict[str, str],
) -> typing.Sequence[BayesrunModelResult]:
results = []
batched_keys = _batch_iterable_into_chunks(list(row.keys()), 3)
for model_keys in batched_keys:
parsed = [_parse_bayesrun_column(column) for column in model_keys]
values = [row[column] for column in model_keys]
if parsed[0] is None:
raise ValueError(f"no viable success row found for keys {model_keys}")
if parsed[1] is None:
raise ValueError(f"no viable count row found for keys {model_keys}")
if parsed[0].column_field != "success":
raise ValueError(f"The column {model_keys[0]} is not a success field")
if parsed[1].column_field != "count":
raise ValueError(f"The column {model_keys[1]} is not a count field")
parsed_keys = parsed[0].model_field_dict
success = int(values[0])
count = int(values[1])
results.append(
BayesrunModelResult(
parsed_model_keys=parsed_keys,
success=success,
count=count,
)
)
return results
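`parse_bayesrun_row` relies on the column order of each row: the columns come in consecutive triples (`_success`, `_count`, `_prob`) per model, and the model parameters are read from the success column's name. A condensed sketch using only the first of the `MODEL_REGEXES` patterns (copied verbatim), with hypothetical helper names and a made-up two-model row for illustration:

```python
import re
from typing import Dict, Iterator, List, Sequence

# First pattern from MODEL_REGEXES, copied for a self-contained example.
MODEL_REGEX = re.compile(
    r"geom_(?P<xmin>-?\d+)_(?P<xmax>-?\d+)_(?P<ymin>-?\d+)_(?P<ymax>-?\d+)_(?P<zmin>-?\d+)_(?P<zmax>-?\d+)-orientation_(?P<orientation>free|fixedxy|fixedz)-dipole_count_(?P<avg_filled>\d+)_(?P<field_name>\w*)"
)


def batch(items: Sequence[str], n: int) -> Iterator[Sequence[str]]:
    """Yield consecutive chunks of n items (the last chunk may be shorter)."""
    for start in range(0, len(items), n):
        yield items[start : start + n]


def parse_row(row: Dict[str, str]) -> List[dict]:
    results = []
    for keys in batch(list(row.keys()), 3):
        success = MODEL_REGEX.match(keys[0])
        count = MODEL_REGEX.match(keys[1]) if len(keys) > 1 else None
        if success is None or success["field_name"] != "success":
            raise ValueError(f"{keys[0]} is not a success column")
        if count is None or count["field_name"] != "count":
            raise ValueError(f"no count column after {keys[0]}")
        model = {k: v for k, v in success.groupdict().items() if k != "field_name"}
        results.append({"model": model, "success": int(row[keys[0]]), "count": int(row[keys[1]])})
    return results


prefix_a = "geom_-10_10_-10_10_0_5-orientation_free-dipole_count_100"
prefix_b = "geom_-10_10_-10_10_0_5-orientation_fixedz-dipole_count_100"
row = {
    f"{prefix_a}_success": "12", f"{prefix_a}_count": "1000", f"{prefix_a}_prob": "0.75",
    f"{prefix_b}_success": "4", f"{prefix_b}_count": "1000", f"{prefix_b}_prob": "0.25",
}
parsed = parse_row(row)
```

This works on rows from `csv.DictReader`, since it yields dicts whose keys keep the header's column order.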

@ -1,3 +0,0 @@
from deepdog.subset_simulation.subset_simulation_impl import SubsetSimulation
__all__ = ["SubsetSimulation"]
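`SubsetSimulation.__init__` (in `subset_simulation_impl` below) multiplies several default step sizes by 1.73 to make up for a missing factor of sqrt(3) in the proposal code. Assuming the proposal draws uniform steps, the factor comes from the variance of a uniform distribution: Uniform(-a, a) has variance a**2 / 3, so a step with standard deviation s needs half-width sqrt(3) * s, about 1.732 * s. A small numerical check (the helper names are hypothetical, and the uniform proposal is an assumption):

```python
import math

import numpy


def uniform_half_width(step_std: float) -> float:
    """Half-width a such that Uniform(-a, a) has standard deviation step_std.

    Var[Uniform(-a, a)] = a**2 / 3, hence a = sqrt(3) * step_std.
    """
    return math.sqrt(3) * step_std


def uniform_step(current: numpy.ndarray, step_std: float, rng: numpy.random.Generator) -> numpy.ndarray:
    """Hypothetical symmetric random-walk proposal with the requested step std."""
    a = uniform_half_width(step_std)
    return current + rng.uniform(-a, a, size=numpy.shape(current))


rng = numpy.random.default_rng(0)
steps = uniform_step(numpy.zeros(200_000), 0.01, rng)
```

The rounded 1.73 undershoots sqrt(3) by roughly 0.1%.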

@ -1,623 +0,0 @@
import logging
import multiprocessing
import numpy
import pdme.measurement
import pdme.measurement.input_types
import pdme.model
import pdme.subspace_simulation
from typing import Sequence, Tuple, Optional, Callable, Union, List
from dataclasses import dataclass
_logger = logging.getLogger(__name__)
@dataclass
class SubsetSimulationResult:
probs_list: Sequence[Tuple]
over_target_cost: Optional[float]
over_target_likelihood: Optional[float]
under_target_cost: Optional[float]
under_target_likelihood: Optional[float]
lowest_likelihood: Optional[float]
messages: Sequence[str]
@dataclass
class MultiSubsetSimulationResult:
child_results: Sequence[SubsetSimulationResult]
model_name: str
estimated_likelihood: float
arithmetic_mean_estimated_likelihood: float
num_children: int
num_finished_children: int
clean_estimate: bool
class SubsetSimulation:
	def __init__(
		self,
		model_name_pair,
		# actual_measurements: Sequence[pdme.measurement.DotMeasurement],
		cost_function: Callable[[numpy.ndarray], numpy.ndarray],
		n_c: int,
		n_s: int,
		m_max: int,
		target_cost: Optional[float] = None,
		level_0_seed: Union[int, Sequence[int]] = 200,
		mcmc_seed: Union[int, Sequence[int]] = 20,
		use_adaptive_steps=True,
		default_phi_step=0.01,
		default_theta_step=0.01,
		default_r_step=0.01,
		default_w_log_step=0.01,
		default_upper_w_log_step=4,
		num_initial_dmc_gens=1,
		keep_probs_list=True,
		dump_last_generation_to_file=False,
		initial_cost_chunk_size=100,
		initial_cost_multiprocess=True,
		cap_core_count: int = 0, # 0 means cap at num cores - 1
	):
		name, model = model_name_pair
		self.model_name = name
		self.model = model
		_logger.info(f"got model {self.model_name}")
		# dot_inputs = [(meas.r, meas.f) for meas in actual_measurements]
		# self.dot_inputs_array = pdme.measurement.input_types.dot_inputs_to_array(
		# dot_inputs
		# )
		# _logger.debug(f"actual measurements: {actual_measurements}")
		# self.actual_measurement_array = numpy.array([m.v for m in actual_measurements])
		# def cost_function_to_use(dipoles_to_test):
		# return pdme.subspace_simulation.proportional_costs_vs_actual_measurement(
		# self.dot_inputs_array, self.actual_measurement_array, dipoles_to_test
		# )
		self.cost_function_to_use = cost_function
		self.n_c = n_c
		self.n_s = n_s
		self.m_max = m_max
		self.level_0_seed = level_0_seed
		self.mcmc_seed = mcmc_seed
		self.use_adaptive_steps = use_adaptive_steps
		self.default_phi_step = (
			default_phi_step * 1.73
		) # this is a hack to fix a missing sqrt 3 in the proposal function code.
		self.default_theta_step = default_theta_step
		self.default_r_step = (
			default_r_step * 1.73
		) # this is a hack to fix a missing sqrt 3 in the proposal function code.
		self.default_w_log_step = (
			default_w_log_step * 1.73
		) # this is a hack to fix a missing sqrt 3 in the proposal function code.
		self.default_upper_w_log_step = default_upper_w_log_step
		_logger.info("using params:")
		_logger.info(f"\tn_c: {self.n_c}")
		_logger.info(f"\tn_s: {self.n_s}")
		_logger.info(f"\tm: {self.m_max}")
		_logger.info(f"\t{num_initial_dmc_gens=}")
		_logger.info(f"\t{mcmc_seed=}")
		_logger.info(f"\t{level_0_seed=}")
		_logger.info("let's do level 0...")
		self.target_cost = target_cost
		_logger.info(f"will stop at target cost {target_cost}")
		self.keep_probs_list = keep_probs_list
		self.dump_last_generations = dump_last_generation_to_file
		self.initial_cost_chunk_size = initial_cost_chunk_size
		self.initial_cost_multiprocess = initial_cost_multiprocess
		self.cap_core_count = cap_core_count
		self.num_dmc_gens = num_initial_dmc_gens
	def _single_chain_gen(self, args: Tuple):
		threshold_cost, stdevs, rng_seed, (c, s) = args
		rng = numpy.random.default_rng(rng_seed)
		return self.model.get_repeat_counting_mcmc_chain(
			s,
			self.cost_function_to_use,
			self.n_s,
			threshold_cost,
			stdevs,
			initial_cost=c,
			rng_arg=rng,
		)
	def execute(self) -> SubsetSimulationResult:
		probs_list = []
		output_messages = []
		# If we have n_s = 10 and n_c = 100, then our big N = 1000 and p = 1/10
		# The DMC stage would normally generate 1000, then pick the best 100 and start counting prob = p/10.
		# Let's say we want our DMC stage to go down to level 2.
		# Then we need to filter out p^2, so our initial has to be N_0 = N / p = n_c * n_s^2
		initial_dmc_n = self.n_c * (self.n_s**self.num_dmc_gens)
		initial_level = (
			self.num_dmc_gens - 1
		) # This is perfunctory but let's label it here really explicitly
		_logger.info(f"Generating {initial_dmc_n} for DMC stage")
		sample_dipoles = self.model.get_monte_carlo_dipole_inputs(
			initial_dmc_n,
			-1,
			rng_to_use=numpy.random.default_rng(self.level_0_seed),
		)
		# _logger.debug(sample_dipoles)
		# _logger.debug(sample_dipoles.shape)
		_logger.debug("Finished dipole generation")
		_logger.debug(
			f"Using iterated multiprocessing cost function thing with chunk size {self.initial_cost_chunk_size}"
		)
		# core count etc. logic here
		core_count = multiprocessing.cpu_count() - 1 or 1
		if (self.cap_core_count >= 1) and (self.cap_core_count < core_count):
			core_count = self.cap_core_count
		_logger.info(f"Using {core_count} cores")
		with multiprocessing.Pool(core_count) as pool:
			# Do the initial DMC calculation in a multiprocessing
			chunks = numpy.array_split(
				sample_dipoles,
				range(
					self.initial_cost_chunk_size,
					len(sample_dipoles),
					self.initial_cost_chunk_size,
				),
			)
			if self.initial_cost_multiprocess:
				_logger.debug("Multiprocessing initial costs")
				raw_costs = pool.map(self.cost_function_to_use, chunks)
			else:
				_logger.debug("Single process initial costs")
				raw_costs = []
				for chunk_idx, chunk in enumerate(chunks):
					_logger.debug(f"doing chunk #{chunk_idx}")
					raw_costs.append(self.cost_function_to_use(chunk))
			costs = numpy.concatenate(raw_costs)
			_logger.debug("finished initial dmc cost calculation")
			# _logger.debug(f"costs: {costs}")
			sorted_indexes = costs.argsort()[::-1]
			# _logger.debug(costs[sorted_indexes])
			# _logger.debug(sample_dipoles[sorted_indexes])
			sorted_costs = costs[sorted_indexes]
			sorted_dipoles = sample_dipoles[sorted_indexes]
			all_dipoles = numpy.array(
				[
					pdme.subspace_simulation.sort_array_of_dipoles_by_frequency(samp)
					for samp in sorted_dipoles
				]
			)
			all_chains = list(zip(sorted_costs, all_dipoles))
			for dmc_level in range(initial_level):
				# if initial level is 1, we want to print out what the level 0 threshold would have been?
				_logger.debug(f"Get the pseudo statistics for level {dmc_level}")
				_logger.debug(f"Whole chain has length {len(all_chains)}")
				pseudo_threshold_index = -(
					self.n_c * (self.n_s ** (self.num_dmc_gens - dmc_level - 1))
				)
				_logger.debug(
					f"Have a pseudo_threshold_index of {pseudo_threshold_index}, or {len(all_chains) + pseudo_threshold_index}"
				)
				pseudo_threshold_cost = all_chains[-pseudo_threshold_index][0]
				_logger.info(
					f"Pseudo-level {dmc_level} threshold cost {pseudo_threshold_cost}, at P = (1 / {self.n_s})^{dmc_level + 1}"
				)
				all_chains = all_chains[pseudo_threshold_index:]
			long_mcmc_rng = numpy.random.default_rng(self.mcmc_seed)
			mcmc_rng_seed_sequence = numpy.random.SeedSequence(self.mcmc_seed)
			threshold_cost = all_chains[-self.n_c][0]
			_logger.info(
				f"Finishing DMC threshold cost {threshold_cost} at level {initial_level}, at P = (1 / {self.n_s})^{initial_level + 1}"
			)
			_logger.debug(f"Executing the MCMC with chains of length {len(all_chains)}")
			# Now we move on to the MCMC part of the algorithm
			# This is important, we want to allow some extra initial levels so we need to account for that here!
			for i in range(self.num_dmc_gens, self.m_max):
				_logger.info(f"Starting level {i}")
				next_seeds = all_chains[-self.n_c :]
				if self.dump_last_generations:
					_logger.info("writing out csv file")
					next_dipoles_seed_dipoles = numpy.array([n[1] for n in next_seeds])
					for n in range(self.model.n):
						_logger.info(f"{next_dipoles_seed_dipoles[:, n].shape}")
						numpy.savetxt(
							f"generation_{self.n_c}_{self.n_s}_{i}_dipole_{n}.csv",
							next_dipoles_seed_dipoles[:, n],
							delimiter=",",
						)
					next_seeds_as_array = numpy.array([s for _, s in next_seeds])
					stdevs = self.get_stdevs_from_arrays(next_seeds_as_array)
					_logger.info(f"got stdevs: {stdevs.stdevs}")
					all_long_chains = []
					for seed_index, (c, s) in enumerate(
						next_seeds[:: len(next_seeds) // 20]
					):
						# chain = mcmc(s, threshold_cost, n_s, model, dot_inputs_array, actual_measurement_array, mcmc_rng, curr_cost=c, stdevs=stdevs)
						# until new version gotta do
						_logger.debug(
							f"\t{seed_index}: doing long chain on the next seed"
						)
						long_chain = self.model.get_mcmc_chain(
							s,
							self.cost_function_to_use,
							1000,
							threshold_cost,
							stdevs,
							initial_cost=c,
							rng_arg=long_mcmc_rng,
						)
						for _, chained in long_chain:
							all_long_chains.append(chained)
					all_long_chains_array = numpy.array(all_long_chains)
					for n in range(self.model.n):
						_logger.info(f"{all_long_chains_array[:, n].shape}")
						numpy.savetxt(
							f"long_chain_generation_{self.n_c}_{self.n_s}_{i}_dipole_{n}.csv",
							all_long_chains_array[:, n],
							delimiter=",",
						)
				if self.keep_probs_list:
					for cost_index, cost_chain in enumerate(all_chains[: -self.n_c]):
						probs_list.append(
							(
								(
									(self.n_c * self.n_s - cost_index)
									/ (self.n_c * self.n_s)
								)
								/ (self.n_s ** (i)),
								cost_chain[0],
								i + 1,
							)
						)
				next_seeds_as_array = numpy.array([s for _, s in next_seeds])
				stdevs = self.get_stdevs_from_arrays(next_seeds_as_array)
				_logger.debug(f"got stdevs, begin: {stdevs.stdevs[:10]}")
				_logger.debug("Starting the MCMC")
				all_chains = []
				seeds = mcmc_rng_seed_sequence.spawn(len(next_seeds))
				pool_results = pool.imap_unordered(
					self._single_chain_gen,
					[
						(threshold_cost, stdevs, rng_seed, test_seed)
						for rng_seed, test_seed in zip(seeds, next_seeds)
					],
					chunksize=50,
				)
				# count for ergodicity analysis
				samples_generated = 0
				samples_rejected = 0
				for rejected_count, chain in pool_results:
					for cost, chained in chain:
						try:
							filtered_cost = cost[0]
						except (IndexError, TypeError):
							filtered_cost = cost
						all_chains.append((filtered_cost, chained))
					samples_generated += self.n_s
					samples_rejected += rejected_count
				_logger.debug("finished mcmc")
				_logger.debug(f"{samples_rejected=} out of {samples_generated=}")
				if samples_rejected * 2 > samples_generated:
					reject_ratio = samples_rejected / samples_generated
					rejectionmessage = f"On level {i}, rejected {samples_rejected} out of {samples_generated}, {reject_ratio=} is too high and may indicate ergodicity problems"
					output_messages.append(rejectionmessage)
					_logger.warning(rejectionmessage)
				# _logger.debug(all_chains)
				all_chains.sort(key=lambda c: c[0], reverse=True)
				_logger.debug("finished sorting all_chains")
				threshold_cost = all_chains[-self.n_c][0]
				_logger.info(
					f"current threshold cost: {threshold_cost}, at P = (1 / {self.n_s})^{i + 1}"
				)
				if (self.target_cost is not None) and (
					threshold_cost < self.target_cost
				):
					_logger.info(
						f"got a threshold cost {threshold_cost}, less than {self.target_cost}. will leave early"
					)
					cost_list = [c[0] for c in all_chains]
					over_index = reverse_bisect_right(cost_list, self.target_cost)
					winner = all_chains[over_index][1]
					_logger.info(f"Winner obtained: {winner}")
					shorter_probs_list = []
					for cost_index, cost_chain in enumerate(all_chains):
						if self.keep_probs_list:
							probs_list.append(
								(
									(
										(self.n_c * self.n_s - cost_index)
										/ (self.n_c * self.n_s)
									)
									/ (self.n_s ** (i)),
									cost_chain[0],
									i + 1,
								)
							)
						shorter_probs_list.append(
							(
								cost_chain[0],
								(
									(self.n_c * self.n_s - cost_index)
									/ (self.n_c * self.n_s)
								)
								/ (self.n_s ** (i)),
							)
						)
					# _logger.info(shorter_probs_list)
					result = SubsetSimulationResult(
						probs_list=probs_list,
						over_target_cost=shorter_probs_list[over_index - 1][0],
						over_target_likelihood=shorter_probs_list[over_index - 1][1],
						under_target_cost=shorter_probs_list[over_index][0],
						under_target_likelihood=shorter_probs_list[over_index][1],
						lowest_likelihood=shorter_probs_list[-1][1],
						messages=output_messages,
					)
					return result
				# _logger.debug([c[0] for c in all_chains[-n_c:]])
				_logger.info(f"doing level {i + 1}")
		if self.keep_probs_list:
			for cost_index, cost_chain in enumerate(all_chains):
				probs_list.append(
					(
						((self.n_c * self.n_s - cost_index) / (self.n_c * self.n_s))
						/ (self.n_s ** (self.m_max)),
						cost_chain[0],
						self.m_max + 1,
					)
				)
		threshold_cost = all_chains[-self.n_c][0]
		_logger.info(
			f"final threshold cost: {threshold_cost}, at P = (1 / {self.n_s})^{self.m_max + 1}"
		)
		# for a in all_chains[-10:]:
		# _logger.info(a)
		# for prob, prob_cost in probs_list:
		# _logger.info(f"\t{prob}: {prob_cost}")
		probs_list.sort(key=lambda c: c[0], reverse=True)
		min_likelihood = ((1) / (self.n_c * self.n_s)) / (self.n_s ** (self.m_max))
		result = SubsetSimulationResult(
			probs_list=probs_list,
			over_target_cost=None,
			over_target_likelihood=None,
			under_target_cost=None,
			under_target_likelihood=None,
			lowest_likelihood=min_likelihood,
			messages=output_messages,
		)
		return result
	def get_stdevs_from_arrays(
		self, array
	) -> pdme.subspace_simulation.MCMCStandardDeviation:
		# stdevs = get_stdevs_from_arrays(next_seeds_as_array, model)
		if self.use_adaptive_steps:
			stdev_array = []
			count = array.shape[1]
			for dipole_index in range(count):
				selected = array[:, dipole_index]
				pxs = selected[:, 0]
				pys = selected[:, 1]
				pzs = selected[:, 2]
				thetas = numpy.arccos(pzs / self.model.pfixed)
				phis = numpy.arctan2(pys, pxs)
				rstdevs = numpy.maximum(
					numpy.std(selected, axis=0)[3:6],
					self.default_r_step / (self.n_s * 10),
				)
				frequency_stdevs = numpy.minimum(
					numpy.maximum(
						numpy.std(numpy.log(selected[:, -1])),
						self.default_w_log_step / (self.n_s * 10),
					),
					self.default_upper_w_log_step,
				)
				stdev_array.append(
					pdme.subspace_simulation.DipoleStandardDeviation(
						p_theta_step=max(
							numpy.std(thetas), self.default_theta_step / (self.n_s * 10)
						),
						p_phi_step=max(
							numpy.std(phis), self.default_phi_step / (self.n_s * 10)
						),
						rx_step=rstdevs[0],
						ry_step=rstdevs[1],
						rz_step=rstdevs[2],
						w_log_step=frequency_stdevs,
					)
				)
		else:
			default_stdev = pdme.subspace_simulation.DipoleStandardDeviation(
				self.default_phi_step,
				self.default_theta_step,
				self.default_r_step,
				self.default_r_step,
				self.default_r_step,
				self.default_w_log_step,
			)
			stdev_array = [default_stdev]
		stdevs = pdme.subspace_simulation.MCMCStandardDeviation(stdev_array)
		return stdevs
class MultiSubsetSimulations:
	def __init__(
		self,
		model_name_pairs: Sequence[Tuple[str, pdme.model.DipoleModel]],
		# actual_measurements: Sequence[pdme.measurement.DotMeasurement],
		cost_function: Callable[[numpy.ndarray], numpy.ndarray],
		num_runs: int,
		n_c: int,
		n_s: int,
		m_max: int,
		target_cost: float,
		num_initial_dmc_gens: int = 1,
		level_0_seed_seed: int = 200,
		mcmc_seed_seed: int = 20,
		use_adaptive_steps=True,
		default_phi_step=0.01,
		default_theta_step=0.01,
		default_r_step=0.01,
		default_w_log_step=0.01,
		default_upper_w_log_step=4,
		initial_cost_chunk_size=100,
		cap_core_count: int = 0, # 0 means cap at num cores - 1
	):
		self.model_name_pairs = model_name_pairs
		self.cost_function = cost_function
		self.num_runs = num_runs
		self.n_c = n_c
		self.n_s = n_s
		self.m_max = m_max
		self.target_cost = target_cost # This is not optional here!
		self.num_dmc_gens = num_initial_dmc_gens
		self.level_0_seed_seed = level_0_seed_seed
		self.mcmc_seed_seed = mcmc_seed_seed
		self.use_adaptive_steps = use_adaptive_steps
		self.default_phi_step = default_phi_step
		self.default_theta_step = default_theta_step
		self.default_r_step = default_r_step
		self.default_w_log_step = default_w_log_step
		self.default_upper_w_log_step = default_upper_w_log_step
		self.initial_cost_chunk_size = initial_cost_chunk_size
		self.cap_core_count = cap_core_count
	def execute(self) -> Sequence[MultiSubsetSimulationResult]:
		output: List[MultiSubsetSimulationResult] = []
		for model_index, model_name_pair in enumerate(self.model_name_pairs):
			ss_results = [
				SubsetSimulation(
					model_name_pair,
					self.cost_function,
					self.n_c,
					self.n_s,
					self.m_max,
					self.target_cost,
					num_initial_dmc_gens=self.num_dmc_gens,
					level_0_seed=[model_index, run_index, self.level_0_seed_seed],
					mcmc_seed=[model_index, run_index, self.mcmc_seed_seed],
					use_adaptive_steps=self.use_adaptive_steps,
					default_phi_step=self.default_phi_step,
					default_theta_step=self.default_theta_step,
					default_r_step=self.default_r_step,
					default_w_log_step=self.default_w_log_step,
					default_upper_w_log_step=self.default_upper_w_log_step,
					keep_probs_list=False,
					dump_last_generation_to_file=False,
					initial_cost_chunk_size=self.initial_cost_chunk_size,
					cap_core_count=self.cap_core_count,
				).execute()
				for run_index in range(self.num_runs)
			]
			output.append(coalesce_ss_results(model_name_pair[0], ss_results))
		return output
def coalesce_ss_results(
	model_name: str, results: Sequence[SubsetSimulationResult]
) -> MultiSubsetSimulationResult:
	num_finished = sum(1 for res in results if res.under_target_likelihood is not None)
	estimated_likelihoods = numpy.array(
		[
			res.under_target_likelihood
			if res.under_target_likelihood is not None
			else res.lowest_likelihood
			for res in results
		]
	)
	_logger.info(estimated_likelihoods)
	geometric_mean_estimated_likelihoods = numpy.exp(
		numpy.log(estimated_likelihoods).mean()
	)
	_logger.info(geometric_mean_estimated_likelihoods)
	arithmetic_mean_estimated_likelihoods = estimated_likelihoods.mean()
	result = MultiSubsetSimulationResult(
		child_results=results,
		model_name=model_name,
		estimated_likelihood=geometric_mean_estimated_likelihoods,
		arithmetic_mean_estimated_likelihood=arithmetic_mean_estimated_likelihoods,
		num_children=len(results),
		num_finished_children=num_finished,
		clean_estimate=num_finished == len(results),
	)
	return result
def reverse_bisect_right(a, x, lo=0, hi=None):
	"""Return the index where to insert item x in list a, assuming a is sorted in descending order.
	The return value i is such that all e in a[:i] have e >= x, and all e in
	a[i:] have e < x. So if x already appears in the list, a.insert(i, x) will
	insert just after the rightmost x already there.
	Optional args lo (default 0) and hi (default len(a)) bound the
	slice of a to be searched.
	Essentially, with the default lo and hi, the function returns the number of elements in a which are >= x.
	>>> a = [8, 6, 5, 4, 2]
	>>> reverse_bisect_right(a, 5)
	3
	>>> a[:reverse_bisect_right(a, 5)]
	[8, 6, 5]
	"""
	if lo < 0:
		raise ValueError("lo must be non-negative")
	if hi is None:
		hi = len(a)
	while lo < hi:
		mid = (lo + hi) // 2
		if x > a[mid]:
			hi = mid
		else:
			lo = mid + 1
	return lo

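You can check the two module-level helpers at the end of this file on their own. The sketch below is only an illustration. `reverse_bisect_right` is copied from the hunk above. `reverse_bisect_right_via_stdlib` and `geometric_and_arithmetic_means` are hypothetical names introduced here, not part of the repository. The sketch checks that the descending-order bisection matches the standard library's `bisect.bisect_right` applied to negated values. It also reproduces the two averages that `coalesce_ss_results` computes over per-run likelihood estimates.

```python
import bisect

import numpy


def reverse_bisect_right(a, x, lo=0, hi=None):
    """Copy of the helper above; a must be sorted in descending order."""
    if lo < 0:
        raise ValueError("lo must be non-negative")
    if hi is None:
        hi = len(a)
    while lo < hi:
        mid = (lo + hi) // 2
        if x > a[mid]:
            hi = mid
        else:
            lo = mid + 1
    return lo


def reverse_bisect_right_via_stdlib(a, x):
    """Hypothetical cross-check: negating a descending list makes it ascending,
    so bisect_right on the negated values counts the elements e with e >= x."""
    return bisect.bisect_right([-e for e in a], -x)


def geometric_and_arithmetic_means(likelihoods):
    """Hypothetical helper mirroring the two averages in coalesce_ss_results."""
    arr = numpy.array(likelihoods, dtype=float)
    geometric = float(numpy.exp(numpy.log(arr).mean()))
    arithmetic = float(arr.mean())
    return geometric, arithmetic


if __name__ == "__main__":
    costs = [8, 6, 5, 4, 2]
    for target in (9, 8, 5, 3, 1):
        assert reverse_bisect_right(costs, target) == reverse_bisect_right_via_stdlib(costs, target)
    print(reverse_bisect_right(costs, 5))  # 3
    print(geometric_and_arithmetic_means([1e-2, 1e-4]))
```

The per-run estimates can differ by orders of magnitude. The geometric mean (the exponential of the mean log-likelihood) is therefore less dominated by a single large estimate than the arithmetic mean. For the inputs 1e-2 and 1e-4 it gives about 1e-3, while the arithmetic mean is about 5.05e-3.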

@@ -1,231 +0,0 @@
import pdme.inputs
import pdme.model
import pdme.measurement
import pdme.measurement.input_types
import pdme.measurement.oscillating_dipole
import pdme.util.fast_v_calc
import pdme.util.fast_nonlocal_spectrum
from typing import Sequence, Tuple, List, Dict, Union, Mapping
import datetime
import csv
import multiprocessing
import logging
import numpy
# TODO: remove hardcode
CHUNKSIZE = 50
_logger = logging.getLogger(__name__)
def get_a_result_fast_filter(input) -> int:
	# (
	# model,
	# self.dot_inputs_array_dict,
	# low_high_dict,
	# self.monte_carlo_count,
	# seed,
	# )
	model, dot_inputs_dict, low_high_dict, monte_carlo_count, seed = input
	rng = numpy.random.default_rng(seed)
	# TODO: A long term refactor is to pull the frequency stuff out from here. The None stands for max_frequency, which is unneeded in the actually useful models.
	sample_dipoles = model.get_monte_carlo_dipole_inputs(
		monte_carlo_count, None, rng_to_use=rng
	)
	current_sample = sample_dipoles
	for temp in dot_inputs_dict.keys():
		dot_inputs = dot_inputs_dict[temp]
		lows, highs = low_high_dict[temp]
		for di, low, high in zip(dot_inputs, lows, highs):
			if len(current_sample) < 1:
				break
			vals = pdme.util.fast_v_calc.fast_vs_for_asymmetric_dipoleses(
				numpy.array([di]), current_sample, temp
			)
			current_sample = current_sample[
				numpy.all((vals > low) & (vals < high), axis=1)
			]
	return len(current_sample)
class TempAwareRealSpectrumRun:
	"""
	A bayes run given some real data, with potentially variable temperature.
	Parameters
	----------
	measurements_dict : Dict[float, Sequence[pdme.measurement.DotRangeMeasurement]]
		The dot inputs for this bayes run, in a dictionary indexed by temperature.
	models_with_names : Sequence[Tuple[str, pdme.model.DipoleModel]]
		The models to evaluate.
	actual_model : pdme.model.DipoleModel
		The model which is actually correct.
	filename_slug : str
		The filename slug to include.
	run_count : int
		The number of runs to do.
	"""
	def __init__(
		self,
		measurements_dict: Mapping[
			float, Sequence[pdme.measurement.DotRangeMeasurement]
		],
		models_with_names: Sequence[Tuple[str, pdme.model.DipoleModel]],
		filename_slug: str,
		monte_carlo_count: int = 10000,
		monte_carlo_cycles: int = 10,
		target_success: int = 100,
		max_monte_carlo_cycles_steps: int = 10,
		chunksize: int = CHUNKSIZE,
		initial_seed: int = 12345,
		cap_core_count: int = 0,
	) -> None:
		self.measurements_dict = measurements_dict
		self.dot_inputs_dict = {
			k: [(measure.r, measure.f) for measure in measurements]
			for k, measurements in measurements_dict.items()
		}
		self.dot_inputs_array_dict = {
			k: pdme.measurement.input_types.dot_inputs_to_array(dot_inputs)
			for k, dot_inputs in self.dot_inputs_dict.items()
		}
		self.models = [model for (_, model) in models_with_names]
		self.model_names = [name for (name, _) in models_with_names]
		self.model_count = len(self.models)
		self.monte_carlo_count = monte_carlo_count
		self.monte_carlo_cycles = monte_carlo_cycles
		self.target_success = target_success
		self.max_monte_carlo_cycles_steps = max_monte_carlo_cycles_steps
		self.csv_fields = []
		self.compensate_zeros = True
		self.chunksize = chunksize
		for name in self.model_names:
			self.csv_fields.extend([f"{name}_success", f"{name}_count", f"{name}_prob"])
		# for now initialise priors as uniform.
		self.probabilities = [1 / self.model_count] * self.model_count
		timestamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
		ff_string = "fast_filter"
		self.filename = f"{timestamp}-{filename_slug}.realdata.{ff_string}.bayesrun.csv"
		self.initial_seed = initial_seed
		self.cap_core_count = cap_core_count
	def go(self) -> None:
		with open(self.filename, "a", newline="") as outfile:
			writer = csv.DictWriter(outfile, fieldnames=self.csv_fields, dialect="unix")
			writer.writeheader()
		low_high_dict = {}
		for temp, measurements in self.measurements_dict.items():
			(
				lows,
				highs,
			) = pdme.measurement.input_types.dot_range_measurements_low_high_arrays(
				measurements
			)
			low_high_dict[temp] = (lows, highs)
		# define a new seed sequence for each run
		seed_sequence = numpy.random.SeedSequence(self.initial_seed)
		results = []
		_logger.debug("Going to iterate over models now")
		core_count = multiprocessing.cpu_count() - 1 or 1
		if (self.cap_core_count >= 1) and (self.cap_core_count < core_count):
			core_count = self.cap_core_count
		_logger.info(f"Using {core_count} cores")
		for model_count, (model, model_name) in enumerate(
			zip(self.models, self.model_names)
		):
			_logger.debug(f"Doing model #{model_count}: {model_name}")
			with multiprocessing.Pool(core_count) as pool:
				cycle_count = 0
				cycle_success = 0
				cycles = 0
				while (cycles < self.max_monte_carlo_cycles_steps) and (
					cycle_success <= self.target_success
				):
					_logger.debug(f"Starting cycle {cycles}")
					cycles += 1
					current_success = 0
					cycle_count += self.monte_carlo_count * self.monte_carlo_cycles
					# generate a seed from the sequence for each core.
					# note this needs to be inside the loop for monte carlo cycle steps!
					# that way we get more stuff.
					seeds = seed_sequence.spawn(self.monte_carlo_cycles)
					result_func = get_a_result_fast_filter
					current_success = sum(
						pool.imap_unordered(
							result_func,
							[
								(
									model,
									self.dot_inputs_array_dict,
									low_high_dict,
									self.monte_carlo_count,
									seed,
								)
								for seed in seeds
							],
							self.chunksize,
						)
					)
					cycle_success += current_success
					_logger.debug(f"current running successes: {cycle_success}")
				results.append((cycle_count, cycle_success))
		_logger.debug("Done, constructing output now")
		row: Dict[str, Union[int, float, str]] = {}
		successes: List[float] = []
		counts: List[int] = []
		for model_index, (name, (count, result)) in enumerate(
			zip(self.model_names, results)
		):
			row[f"{name}_success"] = result
			row[f"{name}_count"] = count
			successes.append(max(result, 0.5))
			counts.append(count)
		success_weight = sum(
			[
				(succ / count) * prob
				for succ, count, prob in zip(successes, counts, self.probabilities)
			]
		)
		new_probabilities = [
			(succ / count) * old_prob / success_weight
			for succ, count, old_prob in zip(successes, counts, self.probabilities)
		]
		self.probabilities = new_probabilities
		for name, probability in zip(self.model_names, self.probabilities):
			row[f"{name}_prob"] = probability
		_logger.info(row)
		with open(self.filename, "a", newline="") as outfile:
			writer = csv.DictWriter(outfile, fieldnames=self.csv_fields, dialect="unix")
			writer.writerow(row)

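The posterior update at the end of `go()` is a single Bayes step. Each model's likelihood is estimated as successes divided by samples drawn. A zero success count is floored at 0.5, so no model's posterior drops to exactly zero. The result is then normalised by the prior-weighted sum of likelihoods. The minimal sketch below isolates that arithmetic. The function name `bayes_update` is hypothetical, and plain lists stand in for the class attributes.

```python
from typing import List, Sequence


def bayes_update(
    priors: Sequence[float],
    successes: Sequence[int],
    counts: Sequence[int],
    zero_floor: float = 0.5,
) -> List[float]:
    # Estimated likelihood per model; zero successes are replaced by zero_floor,
    # mirroring max(result, 0.5) in go().
    likelihoods = [max(s, zero_floor) / c for s, c in zip(successes, counts)]
    # Normalising constant: prior-weighted sum of likelihoods (success_weight in go()).
    evidence = sum(lik * p for lik, p in zip(likelihoods, priors))
    return [lik * p / evidence for lik, p in zip(likelihoods, priors)]


if __name__ == "__main__":
    # Two models with uniform priors, 1000 samples each; the second saw no successes.
    print(bayes_update([0.5, 0.5], [10, 0], [1000, 1000]))  # about [0.952, 0.048]
```

In that example the posteriors are exactly 20/21 and 1/21. Without the floor, the second model would be eliminated outright after one round with no successes.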
do.sh Normal file

@@ -0,0 +1,33 @@
#!/usr/bin/env bash
# Do - The Simplest Build Tool on Earth.
# Documentation and examples see https://github.com/8gears/do
set -Eeuo pipefail # -e "Automatic exit from bash shell script on error" -u "Treat unset variables and parameters as errors"
build() {
echo "I am ${FUNCNAME[0]}ing"
poetry build
}
test() {
echo "I am ${FUNCNAME[0]}ing"
poetry run flake8 deepdog tests
poetry run mypy deepdog
poetry run pytest
}
release() {
./scripts/release.sh
}
htmlcov() {
poetry run pytest --cov-report=html
}
all() {
build && test
}
"$@" # <- execute the task
[ "$#" -gt 0 ] || printf "Usage:\n\t./do.sh %s\n" "($(compgen -A function | grep '^[^_]' | paste -sd '|' -))"

flake.lock generated

@@ -1,174 +0,0 @@
{
"nodes": {
"flake-utils": {
"inputs": {
"systems": "systems"
},
"locked": {
"lastModified": 1710146030,
"narHash": "sha256-SZ5L6eA7HJ/nmkzGG7/ISclqe6oZdOZTNoesiInkXPQ=",
"owner": "numtide",
"repo": "flake-utils",
"rev": "b1d9ab70662946ef0850d488da1c9019f3a9752a",
"type": "github"
},
"original": {
"owner": "numtide",
"repo": "flake-utils",
"type": "github"
}
},
"flake-utils_2": {
"inputs": {
"systems": "systems_2"
},
"locked": {
"lastModified": 1705309234,
"narHash": "sha256-uNRRNRKmJyCRC/8y1RqBkqWBLM034y4qN7EprSdmgyA=",
"owner": "numtide",
"repo": "flake-utils",
"rev": "1ef2e671c3b0c19053962c07dbda38332dcebf26",
"type": "github"
},
"original": {
"owner": "numtide",
"repo": "flake-utils",
"type": "github"
}
},
"nix-github-actions": {
"inputs": {
"nixpkgs": [
"poetry2nixSrc",
"nixpkgs"
]
},
"locked": {
"lastModified": 1703863825,
"narHash": "sha256-rXwqjtwiGKJheXB43ybM8NwWB8rO2dSRrEqes0S7F5Y=",
"owner": "nix-community",
"repo": "nix-github-actions",
"rev": "5163432afc817cf8bd1f031418d1869e4c9d5547",
"type": "github"
},
"original": {
"owner": "nix-community",
"repo": "nix-github-actions",
"type": "github"
}
},
"nixpkgs": {
"locked": {
"lastModified": 1710703777,
"narHash": "sha256-M4CNAgjrtvrxIWIAc98RTYcVFoAgwUhrYekeiMScj18=",
"owner": "NixOS",
"repo": "nixpkgs",
"rev": "fc7885fbcea4b782142e06ce2d4d08cf92862004",
"type": "github"
},
"original": {
"owner": "NixOS",
"repo": "nixpkgs",
"type": "github"
}
},
"poetry2nixSrc": {
"inputs": {
"flake-utils": "flake-utils_2",
"nix-github-actions": "nix-github-actions",
"nixpkgs": [
"nixpkgs"
],
"systems": "systems_3",
"treefmt-nix": "treefmt-nix"
},
"locked": {
"lastModified": 1708589824,
"narHash": "sha256-2GOiFTkvs5MtVF65sC78KNVxQSmsxtk0WmV1wJ9V2ck=",
"owner": "nix-community",
"repo": "poetry2nix",
"rev": "3c92540611f42d3fb2d0d084a6c694cd6544b609",
"type": "github"
},
"original": {
"owner": "nix-community",
"repo": "poetry2nix",
"type": "github"
}
},
"root": {
"inputs": {
"flake-utils": "flake-utils",
"nixpkgs": "nixpkgs",
"poetry2nixSrc": "poetry2nixSrc"
}
},
"systems": {
"locked": {
"lastModified": 1681028828,
"narHash": "sha256-Vy1rq5AaRuLzOxct8nz4T6wlgyUR7zLU309k9mBC768=",
"owner": "nix-systems",
"repo": "default",
"rev": "da67096a3b9bf56a91d16901293e51ba5b49a27e",
"type": "github"
},
"original": {
"owner": "nix-systems",
"repo": "default",
"type": "github"
}
},
"systems_2": {
"locked": {
"lastModified": 1681028828,
"narHash": "sha256-Vy1rq5AaRuLzOxct8nz4T6wlgyUR7zLU309k9mBC768=",
"owner": "nix-systems",
"repo": "default",
"rev": "da67096a3b9bf56a91d16901293e51ba5b49a27e",
"type": "github"
},
"original": {
"owner": "nix-systems",
"repo": "default",
"type": "github"
}
},
"systems_3": {
"locked": {
"lastModified": 1681028828,
"narHash": "sha256-Vy1rq5AaRuLzOxct8nz4T6wlgyUR7zLU309k9mBC768=",
"owner": "nix-systems",
"repo": "default",
"rev": "da67096a3b9bf56a91d16901293e51ba5b49a27e",
"type": "github"
},
"original": {
"id": "systems",
"type": "indirect"
}
},
"treefmt-nix": {
"inputs": {
"nixpkgs": [
"poetry2nixSrc",
"nixpkgs"
]
},
"locked": {
"lastModified": 1708335038,
"narHash": "sha256-ETLZNFBVCabo7lJrpjD6cAbnE11eDOjaQnznmg/6hAE=",
"owner": "numtide",
"repo": "treefmt-nix",
"rev": "e504621290a1fd896631ddbc5e9c16f4366c9f65",
"type": "github"
},
"original": {
"owner": "numtide",
"repo": "treefmt-nix",
"type": "github"
}
}
},
"root": "root",
"version": 7
}


@@ -1,47 +0,0 @@
{
description = "Application packaged using poetry2nix";
inputs.flake-utils.url = "github:numtide/flake-utils";
inputs.nixpkgs.url = "github:NixOS/nixpkgs";
inputs.poetry2nixSrc = {
url = "github:nix-community/poetry2nix";
inputs.nixpkgs.follows = "nixpkgs";
};
outputs = { self, nixpkgs, flake-utils, poetry2nixSrc }:
flake-utils.lib.eachDefaultSystem (system:
let
pkgs = nixpkgs.legacyPackages.${system};
poetry2nix = poetry2nixSrc.lib.mkPoetry2Nix { inherit pkgs; };
in {
packages = {
deepdogApp = poetry2nix.mkPoetryApplication {
projectDir = self;
python = pkgs.python39;
preferWheels = true;
};
deepdogEnv = poetry2nix.mkPoetryEnv {
projectDir = self;
python = pkgs.python39;
preferWheels = true;
overrides = poetry2nix.overrides.withDefaults (self: super: {
});
};
default = self.packages.${system}.deepdogEnv;
};
devShells.default = pkgs.mkShell {
inputsFrom = [ self.packages.${system}.deepdogEnv ];
buildInputs = [
pkgs.poetry
self.packages.${system}.deepdogEnv
self.packages.${system}.deepdogApp
pkgs.just
pkgs.nodejs
];
shellHook = ''
export DO_NIX_CUSTOM=1
'';
};
}
);
}


@@ -1,11 +1,9 @@
apiVersion: v1
kind: Pod
spec:
imagePullSecrets:
- name: regcreds
containers: # list of containers that you want present for your build, you can define a default container in the Jenkinsfile
- name: poetry
image: ghcr.io/dmallubhotla/poetry-image:1
- name: python
image: python:3.8
command: ["tail", "-f", "/dev/null"] # this or any command that is basically a noop is required, this is so that you don't overwrite the entrypoint of the base container
imagePullPolicy: Always # use cache or pull image for agent
resources: # limits the resources your build container


@@ -1,60 +0,0 @@
# execute default build
default: build
# builds the python module using poetry
build:
	echo "building..."
	poetry build
# print a message displaying whether nix is being used
checknix:
	#!/usr/bin/env bash
	set -euxo pipefail
	if [[ "${DO_NIX_CUSTOM:=0}" -eq 1 ]]; then
		echo "In an interactive nix env."
	else
		echo "Using poetry as runner, no nix detected."
	fi
# run all tests
test: fmt
	#!/usr/bin/env bash
	set -euxo pipefail
	if [[ "${DO_NIX_CUSTOM:=0}" -eq 1 ]]; then
		echo "testing, using nix..."
		flake8 deepdog tests
		mypy deepdog
		pytest
	else
		echo "testing..."
		poetry run flake8 deepdog tests
		poetry run mypy deepdog
		poetry run pytest
	fi
# format code
fmt:
	#!/usr/bin/env bash
	set -euxo pipefail
	if [[ "${DO_NIX_CUSTOM:=0}" -eq 1 ]]; then
		black .
	else
		poetry run black .
	fi
	find deepdog -type f -name "*.py" -exec sed -i -e 's/ /\t/g' {} \;
	find tests -type f -name "*.py" -exec sed -i -e 's/ /\t/g' {} \;
# release the app, checking that our working tree is clean and ready for release, optionally takes target version
release version="":
	#!/usr/bin/env bash
	set -euxo pipefail
	if [[ -n "{{version}}" ]]; then
		./scripts/release.sh {{version}}
	else
		./scripts/release.sh
	fi
htmlcov:
	poetry run pytest --cov-report=html

poetry.lock generated

File diff suppressed because it is too large


@@ -1,28 +1,19 @@
[tool.poetry]
name = "deepdog"
version = "1.7.0"
version = "0.3.5"
description = ""
authors = ["Deepak Mallubhotla <dmallubhotla+github@gmail.com>"]
[tool.poetry.dependencies]
python = ">=3.8.1,<3.10"
pdme = "^1.5.0"
numpy = "1.22.3"
scipy = "1.10"
tqdm = "^4.66.2"
python = "^3.8,<3.10"
pdme = "^0.5.4"
[tool.poetry.dev-dependencies]
pytest = ">=6"
flake8 = "^4.0.1"
pytest-cov = "^4.1.0"
mypy = "^0.971"
pytest-cov = "^3.0.0"
mypy = "^0.931"
python-semantic-release = "^7.24.0"
black = "^22.3.0"
syrupy = "^4.0.8"
[tool.poetry.scripts]
probs = "deepdog.cli.probs:wrapped_main"
subset_sim_probs = "deepdog.cli.subset_sim_probs:wrapped_main"
[build-system]
requires = ["poetry-core>=1.0.0"]
@@ -43,13 +34,6 @@ module = [
]
ignore_missing_imports = true
[[tool.mypy.overrides]]
module = [
"tqdm",
"tqdm.*"
]
ignore_missing_imports = true
[tool.semantic_release]
version_toml = "pyproject.toml:tool.poetry.version"
tag_format = "{version}"


@@ -25,22 +25,15 @@ if [ -z "$(git status --porcelain)" ]; then
exit 0
fi
std_version_args=()
if [[ -n "${1:-}" ]]; then
	std_version_args+=( "--release-as" "$1" )
	echo "Parameter $1 was supplied, so we should use release-as"
else
	echo "No release-as parameter specified."
fi
# Working directory clean
echo "Doing a dry run..."
npx standard-version --dry-run "${std_version_args[@]}"
npx standard-version --dry-run
read -p "Does that look good? [y/N] " -n 1 -r
echo # (optional) move to a new line
if [[ $REPLY =~ ^[Yy]$ ]]
then
# do dangerous stuff
npx standard-version "${std_version_args[@]}"
npx standard-version
git push --follow-tags origin master
else
echo "okay, never mind then..."


@@ -1,4 +1,4 @@
const pattern = /(\[tool\.poetry\]\nname = "deepdog"\nversion = ")(?<vers>\d+\.\d+\.\d+)(")/mg;
const pattern = /(\[tool\.poetry\]\nname = "deepdog"\nversion = ")(?<vers>\d+\.\d+\.\d)(")/mg;
module.exports.readVersion = function (contents) {
const result = pattern.exec(contents);
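The two `pattern` lines differ only in the patch component: one accepts `\d+`, the other a single `\d`. The Python sketch below shows what that difference means in practice. It is an illustration only. The real file is JavaScript, and JavaScript's named group `(?<vers>...)` is written `(?P<vers>...)` in Python. The `read_version` helper and the `1.7.10` input are hypothetical.

```python
import re

# Python translations of the two JavaScript patterns (hypothetical names).
PREFIX = r'(\[tool\.poetry\]\nname = "deepdog"\nversion = ")'
SINGLE_DIGIT_PATCH = re.compile(PREFIX + r'(?P<vers>\d+\.\d+\.\d)(")', re.MULTILINE)
MULTI_DIGIT_PATCH = re.compile(PREFIX + r'(?P<vers>\d+\.\d+\.\d+)(")', re.MULTILINE)


def read_version(pattern, contents):
    # Mirrors the idea of readVersion: return the captured version, or None.
    match = pattern.search(contents)
    return match.group("vers") if match else None


contents = '[tool.poetry]\nname = "deepdog"\nversion = "1.7.10"\n'
print(read_version(SINGLE_DIGIT_PATCH, contents))  # None: the second patch digit breaks the match
print(read_version(MULTI_DIGIT_PATCH, contents))  # 1.7.10
```

Both patterns accept single-digit patch versions such as `1.7.0`. They only diverge once the patch number reaches 10.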

@ -1,177 +0,0 @@
# serializer version: 1
# name: test_basic_analysis
list([
dict({
'connors_geom-5height-orientation_fixedxy-pfixexp_3-dipole_count_2_likelihood': 0.1,
'connors_geom-5height-orientation_fixedxy-pfixexp_3-dipole_count_2_prob': 0.3333333333333333,
'connors_geom-5height-orientation_fixedz-pfixexp_3-dipole_count_2_likelihood': 0.1,
'connors_geom-5height-orientation_fixedz-pfixexp_3-dipole_count_2_prob': 0.3333333333333333,
'connors_geom-5height-orientation_free-pfixexp_3-dipole_count_2_likelihood': 0.1,
'connors_geom-5height-orientation_free-pfixexp_3-dipole_count_2_prob': 0.3333333333333333,
'dipole_frequency_1': 0.006029931414230269,
'dipole_frequency_2': 85436.78758379082,
'dipole_location_1': array([-4.76615152, -6.33160296, 5.29522808]),
'dipole_location_2': array([-4.72700391, -2.06478573, 6.52467702]),
'dipole_moment_1': array([ 860.14181416, -450.27082062, -239.60852996]),
'dipole_moment_2': array([ 908.18325588, -208.52681777, -362.93214244]),
}),
dict({
'connors_geom-5height-orientation_fixedxy-pfixexp_3-dipole_count_2_likelihood': 0.45,
'connors_geom-5height-orientation_fixedxy-pfixexp_3-dipole_count_2_prob': 0.3103448275862069,
'connors_geom-5height-orientation_fixedz-pfixexp_3-dipole_count_2_likelihood': 0.9,
'connors_geom-5height-orientation_fixedz-pfixexp_3-dipole_count_2_prob': 0.6206896551724138,
'connors_geom-5height-orientation_free-pfixexp_3-dipole_count_2_likelihood': 0.1,
'connors_geom-5height-orientation_free-pfixexp_3-dipole_count_2_prob': 0.06896551724137932,
'dipole_frequency_1': 102275.63477261562,
'dipole_frequency_2': 1755280.9783485082,
'dipole_location_1': array([ 4.71515397, -9.70362197, 5.43016546]),
'dipole_location_2': array([3.42476038, 3.88562934, 5.15034328]),
'dipole_moment_1': array([-502.60742674, -790.60222587, 349.7626267 ]),
'dipole_moment_2': array([-192.42708465, -434.81009148, -879.7226844 ]),
}),
dict({
'connors_geom-5height-orientation_fixedxy-pfixexp_3-dipole_count_2_likelihood': 0.7,
'connors_geom-5height-orientation_fixedxy-pfixexp_3-dipole_count_2_prob': 0.6631578947368421,
'connors_geom-5height-orientation_fixedz-pfixexp_3-dipole_count_2_likelihood': 0.1,
'connors_geom-5height-orientation_fixedz-pfixexp_3-dipole_count_2_prob': 0.18947368421052635,
'connors_geom-5height-orientation_free-pfixexp_3-dipole_count_2_likelihood': 0.7,
'connors_geom-5height-orientation_free-pfixexp_3-dipole_count_2_prob': 0.1473684210526316,
'dipole_frequency_1': 2896.799464036654,
'dipole_frequency_2': 9.980565189326681e-05,
'dipole_location_1': array([-4.97465789, 12.54716531, 6.06324588]),
'dipole_location_2': array([ 9.84518459, -11.1183876 , 7.35028226]),
'dipole_moment_1': array([997.67961917, 19.6376112 , 65.19004305]),
'dipole_moment_2': array([305.63093655, 440.57669389, 844.08643362]),
}),
dict({
'connors_geom-5height-orientation_fixedxy-pfixexp_3-dipole_count_2_likelihood': 0.1,
'connors_geom-5height-orientation_fixedxy-pfixexp_3-dipole_count_2_prob': 0.663157894736842,
'connors_geom-5height-orientation_fixedz-pfixexp_3-dipole_count_2_likelihood': 0.1,
'connors_geom-5height-orientation_fixedz-pfixexp_3-dipole_count_2_prob': 0.18947368421052635,
'connors_geom-5height-orientation_free-pfixexp_3-dipole_count_2_likelihood': 0.1,
'connors_geom-5height-orientation_free-pfixexp_3-dipole_count_2_prob': 0.1473684210526316,
'dipole_frequency_1': 1.4522667818288244,
'dipole_frequency_2': 2704.9795645301197,
'dipole_location_1': array([ 7.38183022, 16.6745801 , 7.10428414]),
'dipole_location_2': array([-8.15636906, -9.56609132, 6.34141559]),
'dipole_moment_1': array([-145.9924693 , 738.74936496, 657.97839986]),
'dipole_moment_2': array([-960.16113239, 104.96824669, -258.98314046]),
}),
dict({
'connors_geom-5height-orientation_fixedxy-pfixexp_3-dipole_count_2_likelihood': 0.9,
'connors_geom-5height-orientation_fixedxy-pfixexp_3-dipole_count_2_prob': 0.9465776293823038,
'connors_geom-5height-orientation_fixedz-pfixexp_3-dipole_count_2_likelihood': 0.1,
'connors_geom-5height-orientation_fixedz-pfixexp_3-dipole_count_2_prob': 0.030050083472454105,
'connors_geom-5height-orientation_free-pfixexp_3-dipole_count_2_likelihood': 0.1,
'connors_geom-5height-orientation_free-pfixexp_3-dipole_count_2_prob': 0.02337228714524208,
'dipole_frequency_1': 3827.2315421318913,
'dipole_frequency_2': 1.9301094166184413e-05,
'dipole_location_1': array([ 5.02067673, -0.9783039 , 6.1431897 ]),
'dipole_location_2': array([ 4.66628999, 10.80907459, 7.21771744]),
'dipole_moment_1': array([ 871.30659253, -299.17389491, -388.99846068]),
'dipole_moment_2': array([-189.87268624, 677.28285845, 710.79975568]),
}),
])
# ---
# name: test_bayesss_with_tighter_cost
list([
dict({
'connors_geom-5height-orientation_fixedxy-pfixexp_3-dipole_count_2_likelihood': 9.765625e-06,
'connors_geom-5height-orientation_fixedxy-pfixexp_3-dipole_count_2_prob': 0.33333333333333337,
'connors_geom-5height-orientation_fixedz-pfixexp_3-dipole_count_2_likelihood': 9.765625e-06,
'connors_geom-5height-orientation_fixedz-pfixexp_3-dipole_count_2_prob': 0.33333333333333337,
'connors_geom-5height-orientation_free-pfixexp_3-dipole_count_2_likelihood': 9.765625e-06,
'connors_geom-5height-orientation_free-pfixexp_3-dipole_count_2_prob': 0.33333333333333337,
'dipole_frequency_1': 0.006029931414230269,
'dipole_frequency_2': 85436.78758379082,
'dipole_location_1': array([-4.76615152, -6.33160296, 5.29522808]),
'dipole_location_2': array([-4.72700391, -2.06478573, 6.52467702]),
'dipole_moment_1': array([ 860.14181416, -450.27082062, -239.60852996]),
'dipole_moment_2': array([ 908.18325588, -208.52681777, -362.93214244]),
}),
dict({
'connors_geom-5height-orientation_fixedxy-pfixexp_3-dipole_count_2_likelihood': 0.0109375,
'connors_geom-5height-orientation_fixedxy-pfixexp_3-dipole_count_2_prob': 0.1044776119402985,
'connors_geom-5height-orientation_fixedz-pfixexp_3-dipole_count_2_likelihood': 0.03125,
'connors_geom-5height-orientation_fixedz-pfixexp_3-dipole_count_2_prob': 0.2985074626865672,
'connors_geom-5height-orientation_free-pfixexp_3-dipole_count_2_likelihood': 0.0625,
'connors_geom-5height-orientation_free-pfixexp_3-dipole_count_2_prob': 0.5970149253731344,
'dipole_frequency_1': 102275.63477261562,
'dipole_frequency_2': 1755280.9783485082,
'dipole_location_1': array([ 4.71515397, -9.70362197, 5.43016546]),
'dipole_location_2': array([3.42476038, 3.88562934, 5.15034328]),
'dipole_moment_1': array([-502.60742674, -790.60222587, 349.7626267 ]),
'dipole_moment_2': array([-192.42708465, -434.81009148, -879.7226844 ]),
}),
dict({
'connors_geom-5height-orientation_fixedxy-pfixexp_3-dipole_count_2_likelihood': 9.765625e-06,
'connors_geom-5height-orientation_fixedxy-pfixexp_3-dipole_count_2_prob': 7.291135021404688e-05,
'connors_geom-5height-orientation_fixedz-pfixexp_3-dipole_count_2_likelihood': 0.021875,
'connors_geom-5height-orientation_fixedz-pfixexp_3-dipole_count_2_prob': 0.4666326413699001,
'connors_geom-5height-orientation_free-pfixexp_3-dipole_count_2_likelihood': 0.0125,
'connors_geom-5height-orientation_free-pfixexp_3-dipole_count_2_prob': 0.5332944472798858,
'dipole_frequency_1': 2896.799464036654,
'dipole_frequency_2': 9.980565189326681e-05,
'dipole_location_1': array([-4.97465789, 12.54716531, 6.06324588]),
'dipole_location_2': array([ 9.84518459, -11.1183876 , 7.35028226]),
'dipole_moment_1': array([997.67961917, 19.6376112 , 65.19004305]),
'dipole_moment_2': array([305.63093655, 440.57669389, 844.08643362]),
}),
dict({
'connors_geom-5height-orientation_fixedxy-pfixexp_3-dipole_count_2_likelihood': 9.765625e-06,
'connors_geom-5height-orientation_fixedxy-pfixexp_3-dipole_count_2_prob': 7.291135021404688e-05,
'connors_geom-5height-orientation_fixedz-pfixexp_3-dipole_count_2_likelihood': 9.765625e-06,
'connors_geom-5height-orientation_fixedz-pfixexp_3-dipole_count_2_prob': 0.4666326413699001,
'connors_geom-5height-orientation_free-pfixexp_3-dipole_count_2_likelihood': 9.765625e-06,
'connors_geom-5height-orientation_free-pfixexp_3-dipole_count_2_prob': 0.5332944472798858,
'dipole_frequency_1': 1.4522667818288244,
'dipole_frequency_2': 2704.9795645301197,
'dipole_location_1': array([ 7.38183022, 16.6745801 , 7.10428414]),
'dipole_location_2': array([-8.15636906, -9.56609132, 6.34141559]),
'dipole_moment_1': array([-145.9924693 , 738.74936496, 657.97839986]),
'dipole_moment_2': array([-960.16113239, 104.96824669, -258.98314046]),
}),
dict({
'connors_geom-5height-orientation_fixedxy-pfixexp_3-dipole_count_2_likelihood': 0.175,
'connors_geom-5height-orientation_fixedxy-pfixexp_3-dipole_count_2_prob': 0.00012008361740869356,
'connors_geom-5height-orientation_fixedz-pfixexp_3-dipole_count_2_likelihood': 0.05625,
'connors_geom-5height-orientation_fixedz-pfixexp_3-dipole_count_2_prob': 0.24702915581216964,
'connors_geom-5height-orientation_free-pfixexp_3-dipole_count_2_likelihood': 0.15,
'connors_geom-5height-orientation_free-pfixexp_3-dipole_count_2_prob': 0.7528507605704217,
'dipole_frequency_1': 3827.2315421318913,
'dipole_frequency_2': 1.9301094166184413e-05,
'dipole_location_1': array([ 5.02067673, -0.9783039 , 6.1431897 ]),
'dipole_location_2': array([ 4.66628999, 10.80907459, 7.21771744]),
'dipole_moment_1': array([ 871.30659253, -299.17389491, -388.99846068]),
'dipole_moment_2': array([-189.87268624, 677.28285845, 710.79975568]),
}),
dict({
'connors_geom-5height-orientation_fixedxy-pfixexp_3-dipole_count_2_likelihood': 9.765625e-06,
'connors_geom-5height-orientation_fixedxy-pfixexp_3-dipole_count_2_prob': 4.9116305003549454e-08,
'connors_geom-5height-orientation_fixedz-pfixexp_3-dipole_count_2_likelihood': 0.0109375,
'connors_geom-5height-orientation_fixedz-pfixexp_3-dipole_count_2_prob': 0.11316396672817797,
'connors_geom-5height-orientation_free-pfixexp_3-dipole_count_2_likelihood': 0.028125,
'connors_geom-5height-orientation_free-pfixexp_3-dipole_count_2_prob': 0.886835984155517,
'dipole_frequency_1': 1.1715179359592061e-05,
'dipole_frequency_2': 0.0019103783276337497,
'dipole_location_1': array([-0.95736547, 1.09273812, 7.47158641]),
'dipole_location_2': array([ -3.18510322, -15.64493131, 5.81623624]),
'dipole_moment_1': array([-184.64961369, 956.56786553, 225.57136075]),
'dipole_moment_2': array([ -34.63395137, 801.17771816, -597.42342885]),
}),
dict({
'connors_geom-5height-orientation_fixedxy-pfixexp_3-dipole_count_2_likelihood': 9.765625e-06,
'connors_geom-5height-orientation_fixedxy-pfixexp_3-dipole_count_2_prob': 1.977090156727901e-10,
'connors_geom-5height-orientation_fixedz-pfixexp_3-dipole_count_2_likelihood': 9.765625e-06,
'connors_geom-5height-orientation_fixedz-pfixexp_3-dipole_count_2_prob': 0.00045552157211010855,
'connors_geom-5height-orientation_free-pfixexp_3-dipole_count_2_likelihood': 0.002734375,
'connors_geom-5height-orientation_free-pfixexp_3-dipole_count_2_prob': 0.9995444782301809,
'dipole_frequency_1': 999786.9069039805,
'dipole_frequency_2': 186034.67996840767,
'dipole_location_1': array([-5.59679125, 6.3411602 , 5.33602522]),
'dipole_location_2': array([-0.03412955, -6.83522954, 5.58551513]),
'dipole_moment_1': array([826.38270589, 491.81526944, 274.24325726]),
'dipole_moment_2': array([ 202.74745884, -656.07483714, -726.95204519]),
}),
])
# ---

@ -1,26 +0,0 @@
import re
import deepdog.direct_monte_carlo


def test_config_check_self():
    config = deepdog.direct_monte_carlo.DirectMonteCarloConfig(
        tag="test_tag",
        bayesrun_file_timestamp=False,
    )
    expected_filename = "test_tag.realdata.fast_filter.bayesrun.csv"
    actual_filename = config.get_filename()
    assert actual_filename == expected_filename
    regex = config.get_filename_regex()
    assert re.match(regex, actual_filename) is not None


def test_config_check_self_with_timestamp():
    config = deepdog.direct_monte_carlo.DirectMonteCarloConfig(
        tag="test_tag",
        bayesrun_file_timestamp=True,
    )
    expected_filename_ending = "test_tag.realdata.fast_filter.bayesrun.csv"
    actual_filename = config.get_filename()
    assert actual_filename.endswith(expected_filename_ending)
    regex = config.get_filename_regex()
    assert re.match(regex, actual_filename) is not None

@ -1,42 +0,0 @@
import deepdog.direct_monte_carlo.cost_function_filter
import numpy


def test_px_cost_function_filter_example():
    dipoles_1 = [
        [1, 2, 3, 4, 5, 6, 7],
        [2, 3, 2, 5, 4, 7, 6],
    ]
    dipoles_2 = [
        [15, 9, 8, 7, 6, 5, 3],
        [30, 4, 4, 7, 3, 1, 4],
    ]
    dipoleses = numpy.array([dipoles_1, dipoles_2])

    def cost_function(dipoleses: numpy.ndarray) -> numpy.ndarray:
        return dipoleses[:, :, 0].max(axis=-1)

    expected_costs = numpy.array([2, 30])
    numpy.testing.assert_array_equal(cost_function(dipoleses), expected_costs)
    filter = deepdog.direct_monte_carlo.cost_function_filter.CostFunctionTargetFilter(
        cost_function, 5
    )
    actual_filtered = filter.filter_samples(dipoleses)
    expected_filtered = numpy.array([dipoles_1])
    assert actual_filtered.size != 0
    numpy.testing.assert_array_equal(actual_filtered, expected_filtered)
    filter_stricter = (
        deepdog.direct_monte_carlo.cost_function_filter.CostFunctionTargetFilter(
            cost_function, 0.5
        )
    )
    actual_filtered_stricter = filter_stricter.filter_samples(dipoleses)
    assert actual_filtered_stricter.size == 0
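This test fixes the filter's contract. A vectorized cost function maps each sample (axis 0) to a scalar, and only samples whose cost beats the target survive: costs `[2, 30]` keep only the first sample at target 5, and none at target 0.5. Here is a minimal NumPy sketch of that contract. `CostTargetFilterSketch` is a hypothetical name, not deepdog's class. The strict `<` comparison is an assumption, because the test cannot distinguish `<` from `<=`.

```python
from typing import Callable

import numpy


class CostTargetFilterSketch:
    # Hypothetical stand-in for CostFunctionTargetFilter, not deepdog's code.
    def __init__(
        self, cost_function: Callable[[numpy.ndarray], numpy.ndarray], target_cost: float
    ):
        self.cost_function = cost_function
        self.target_cost = target_cost

    def filter_samples(self, samples: numpy.ndarray) -> numpy.ndarray:
        # One cost per sample; keep the samples under the target via a boolean mask.
        costs = self.cost_function(samples)
        return samples[costs < self.target_cost]


def max_first_column(samples: numpy.ndarray) -> numpy.ndarray:
    # Same cost as the test: largest first-column value among each sample's dipoles.
    return samples[:, :, 0].max(axis=-1)


samples = numpy.array(
    [
        [[1, 2, 3, 4, 5, 6, 7], [2, 3, 2, 5, 4, 7, 6]],
        [[15, 9, 8, 7, 6, 5, 3], [30, 4, 4, 7, 3, 1, 4]],
    ]
)
kept = CostTargetFilterSketch(max_first_column, 5).filter_samples(samples)
print(kept.shape)  # (1, 2, 7)
```

Boolean-mask indexing keeps the result shaped like the input. An all-rejected batch therefore comes back with shape `(0, 2, 7)` and `size == 0`, which matches what the stricter assertion checks.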

@ -1,137 +0,0 @@
import pdme.measurement
import pdme.measurement.input_types
from pdme.model import (
    LogSpacedRandomCountMultipleDipoleFixedMagnitudeModel,
    LogSpacedRandomCountMultipleDipoleFixedMagnitudeXYModel,
    LogSpacedRandomCountMultipleDipoleFixedMagnitudeFixedOrientationModel,
)
import deepdog.direct_monte_carlo.dmc_filters
import numpy.random
import numpy.testing
import logging

_logger = logging.getLogger(__name__)


def fixed_z_model_func(
    xmin,
    xmax,
    ymin,
    ymax,
    zmin,
    zmax,
    wexp_min,
    wexp_max,
    pfixed,
    n_max,
    prob_occupancy,
):
    return LogSpacedRandomCountMultipleDipoleFixedMagnitudeFixedOrientationModel(
        xmin,
        xmax,
        ymin,
        ymax,
        zmin,
        zmax,
        wexp_min,
        wexp_max,
        pfixed,
        0,
        0,
        n_max,
        prob_occupancy,
    )


def get_model(orientation):
    model_funcs = {
        "fixedz": fixed_z_model_func,
        "free": LogSpacedRandomCountMultipleDipoleFixedMagnitudeModel,
        "fixedxy": LogSpacedRandomCountMultipleDipoleFixedMagnitudeXYModel,
    }
    model = model_funcs[orientation](
        -10,
        10,
        -17.5,
        17.5,
        5,
        7.5,
        -5,
        6.5,
        10**3,
        2,
        0.99999999,
    )
    model.n = 2
    model.rng = numpy.random.default_rng(1234)
    return (
        f"connors_geom-5height-orientation_{orientation}-pfixexp_{3}-dipole_count_{2}",
        model,
    )


def test_electric_field_x_dmc_filter():
    dipoles_raw = [
        [(1, 2, 3), (4, 5, 6), 1],
        [(-1, 5, 2), (6, 5, 4), 10],
    ]
    dipoles = [
        pdme.measurement.OscillatingDipole(numpy.array(d[0]), numpy.array(d[1]), d[2])
        for d in dipoles_raw
    ]
    _logger.debug(f"dipoles: {dipoles}")
    dot_inputs_raw = [
        ([-1, -1, 0], 1),
        ([-1, -1, 0], 2),
        ([-1, -1, 0], 3),
        ([-1, -1, 0], 4),
    ]
    dot_inputs_array = pdme.measurement.input_types.dot_inputs_to_array(dot_inputs_raw)
    _logger.debug(f"dot_inputs_array: {dot_inputs_array}")
    arrangement = pdme.measurement.OscillatingDipoleArrangement(dipoles)
    measurements = []
    for input in dot_inputs_raw:
        ex = sum(
            [
                dipole.s_electric_fieldx_at_position(*input)
                for dipole in arrangement.dipoles
            ]
        )
        ex_low = ex * 0.5
        ex_high = ex * 1.5
        meas = pdme.measurement.DotRangeMeasurement(ex_low, ex_high, input[0], input[1])
        measurements.append(meas)
    filter = deepdog.direct_monte_carlo.dmc_filters.SingleDotSpinQubitFrequencyFilter(
        measurements
    )
    samples = numpy.array(
        [
            [
                [1, 2, 3, 4, 5, 6, 1],
                [-1, 5, 2, 6, 5, 4, 10],
            ],
            [
                [10, 20, 30, 40, 50, 60, 1],
                [-1, 5, 2, 6, 5, 4, 1],
            ],
            [
                [1, 1, 1, 1, 1, 1, 1],
                [2, 2, 2, 2, 2, 2, 1],
            ],
        ]
    )
    expected = samples[
        0:1
    ]  # only expect to see the first guy, because that's what generated our thing
    filtered = filter.filter_samples(samples)
    assert len(filtered) != len(samples), "Should have filtered some out!"
    numpy.testing.assert_array_equal(
        filtered, expected, "The filter should have only returned the first one"
    )

@ -1,21 +0,0 @@
import deepdog.indexify
import logging

_logger = logging.getLogger(__name__)


def test_indexifier():
    weight_dict = {"key_1": [1, 2, 3], "key_2": ["a", "b", "c"]}
    indexifier = deepdog.indexify.Indexifier(weight_dict)
    _logger.debug(f"setting up indexifier {indexifier}")
    assert indexifier.indexify(0) == {"key_1": 1, "key_2": "a"}
    assert indexifier.indexify(5) == {"key_1": 2, "key_2": "c"}
    assert len(indexifier) == 9


def test_indexifier_length_short():
    weight_dict = {"key_1": [1, 2, 3], "key_2": ["b", "c"]}
    indexifier = deepdog.indexify.Indexifier(weight_dict)
    _logger.debug(f"setting up indexifier {indexifier}")
    assert len(indexifier) == 6
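These assertions fix the indexing order. With two keys of three values each, index 5 maps to the second `key_1` value and the third `key_2` value. The last key therefore varies fastest (5 = 1 * 3 + 2), and the length is the product of the list lengths (3 * 3 = 9, 3 * 2 = 6). Below is a minimal mixed-radix sketch consistent with those assertions. `IndexifierSketch` is a hypothetical name, and deepdog's actual implementation may differ.

```python
import math
from typing import Any, Dict, Sequence


class IndexifierSketch:
    # Hypothetical re-creation of the behaviour exercised by test_indexifier.
    def __init__(self, list_dict: Dict[str, Sequence[Any]]):
        # Dicts keep insertion order, so the first key is the most significant digit.
        self.list_dict = dict(list_dict)

    def __len__(self) -> int:
        return math.prod(len(values) for values in self.list_dict.values())

    def indexify(self, n: int) -> Dict[str, Any]:
        if not 0 <= n < len(self):
            raise IndexError(n)
        result: Dict[str, Any] = {}
        # Peel off one digit per key, starting from the last (least significant) key.
        for key in reversed(list(self.list_dict)):
            values = self.list_dict[key]
            n, digit = divmod(n, len(values))
            result[key] = values[digit]
        # Return the keys in their original order.
        return {key: result[key] for key in self.list_dict}
```

Treating the index as a mixed-radix number lets a flat `range(len(indexifier))` enumerate every combination exactly once. That fits splitting a parameter sweep across jobs by integer index.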

@ -1,75 +0,0 @@
import deepdog.results.read_csv


def test_parse_groupdict():
    example_column_name = (
        "geom_-20_20_-10_10_0_5-orientation_free-dipole_count_100_success"
    )
    parsed = deepdog.results.read_csv._parse_bayesrun_column(example_column_name)
    assert parsed is not None
    expected = deepdog.results.read_csv.BayesrunColumnParsed(
        {
            "xmin": "-20",
            "xmax": "20",
            "ymin": "-10",
            "ymax": "10",
            "zmin": "0",
            "zmax": "5",
            "orientation": "free",
            "avg_filled": "100",
            "field_name": "success",
        }
    )
    assert parsed == expected


def test_parse_groupdict_with_magnitude():
    example_column_name = (
        "geom_-20_20_-10_10_0_5-magnitude_3.5-orientation_free-dipole_count_100_success"
    )
    parsed = deepdog.results.read_csv._parse_bayesrun_column(example_column_name)
    assert parsed is not None
    expected = deepdog.results.read_csv.BayesrunColumnParsed(
        {
            "xmin": "-20",
            "xmax": "20",
            "ymin": "-10",
            "ymax": "10",
            "zmin": "0",
            "zmax": "5",
            "orientation": "free",
            "avg_filled": "100",
            "log_magnitude": "3.5",
            "field_name": "success",
        }
    )
    assert parsed == expected


def test_parse_groupdict_with_negative_magnitude():
    example_column_name = "geom_-20_20_-10_10_0_5-magnitude_-3.5-orientation_free-dipole_count_100_success"
    parsed = deepdog.results.read_csv._parse_bayesrun_column(example_column_name)
    assert parsed is not None
    expected = deepdog.results.read_csv.BayesrunColumnParsed(
        {
            "xmin": "-20",
            "xmax": "20",
            "ymin": "-10",
            "ymax": "10",
            "zmin": "0",
            "zmax": "5",
            "orientation": "free",
            "avg_filled": "100",
            "log_magnitude": "-3.5",
            "field_name": "success",
        }
    )
    assert parsed == expected


# def test_parse_no_match_column_name():
#     parsed = deepdog.results.parse_bayesrun_column("There's nothing here")
#     assert parsed is None
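The three tests above describe a column-name grammar. A column starts with six geometry bounds, followed by an optional `magnitude_` segment whose value may be negative. Next come an orientation, a dipole count and a trailing field name. When the magnitude segment is missing, `log_magnitude` is simply absent from the parsed result. The regular expression below satisfies those cases. It is an assumption reconstructed from the tests, not the pattern used in `deepdog.results.read_csv`, and the function name is hypothetical.

```python
import re
from typing import Dict, Optional

# Hypothetical pattern reconstructed from the test cases, not deepdog's own.
BAYESRUN_COLUMN_SKETCH = re.compile(
    r"^geom_(?P<xmin>-?\d+)_(?P<xmax>-?\d+)_(?P<ymin>-?\d+)"
    r"_(?P<ymax>-?\d+)_(?P<zmin>-?\d+)_(?P<zmax>-?\d+)"
    r"(?:-magnitude_(?P<log_magnitude>-?\d+(?:\.\d+)?))?"
    r"-orientation_(?P<orientation>\w+)"
    r"-dipole_count_(?P<avg_filled>\d+)"
    r"_(?P<field_name>\w+)$"
)


def parse_bayesrun_column_sketch(column: str) -> Optional[Dict[str, str]]:
    match = BAYESRUN_COLUMN_SKETCH.match(column)
    if match is None:
        return None
    # Drop optional groups that did not participate, such as a missing magnitude.
    return {key: value for key, value in match.groupdict().items() if value is not None}
```

Filtering out `None` groups is what makes the magnitude-free column compare equal to a dictionary without a `log_magnitude` key. The `-?` inside that group is the piece that admits negative log magnitudes.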

@ -1,19 +0,0 @@
import deepdog.results
import pytest


def test_parse_bayesrun_filename():
    valid1 = "20250226-204120-dot1-dot1-2-0.realdata.fast_filter.bayesrun.csv"
    timestamp, slug = deepdog.results._parse_string_output_filename(valid1)
    assert timestamp == "20250226-204120"
    assert slug == "dot1-dot1-2-0"
    valid2 = "dot1-dot1-2-0.realdata.fast_filter.bayesrun.csv"
    timestamp, slug = deepdog.results._parse_string_output_filename(valid2)
    assert timestamp is None
    assert slug == "dot1-dot1-2-0"
    with pytest.raises(ValueError):
        deepdog.results._parse_string_output_filename("not_a_valid_filename")
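The filename test implies a few rules. A leading timestamp is optional and comes back as `None` when absent. The slug is whatever precedes the fixed suffix, and anything that does not match raises `ValueError`. Below is a minimal regex sketch of that behaviour. It assumes the `.realdata.fast_filter.bayesrun.csv` suffix seen in these tests and a `YYYYMMDD-HHMMSS` timestamp shape inferred from the single example. The function and pattern names are hypothetical.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: optional YYYYMMDD-HHMMSS- prefix, then the slug, then a fixed suffix.
FILENAME_SKETCH = re.compile(
    r"^(?:(?P<timestamp>\d{8}-\d{6})-)?(?P<slug>.+)\.realdata\.fast_filter\.bayesrun\.csv$"
)


def parse_output_filename_sketch(filename: str) -> Tuple[Optional[str], str]:
    match = FILENAME_SKETCH.match(filename)
    if match is None:
        raise ValueError(f"Could not parse bayesrun filename: {filename!r}")
    # The timestamp group is None when the optional prefix is absent.
    return match.group("timestamp"), match.group("slug")
```

Because the optional group is tried first, a name that starts with a well-formed timestamp always has it split off rather than absorbed into the slug.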

@ -1,10 +0,0 @@
# serializer version: 1
# name: test_subset_simulation_multi_result_coalescing_easy_arithmetic
MultiSubsetSimulationResult(child_results=[SubsetSimulationResult(probs_list=(), over_target_cost=1, over_target_likelihood=1, under_target_cost=0.99, under_target_likelihood=0.8, lowest_likelihood=0.5, messages=[]), SubsetSimulationResult(probs_list=(), over_target_cost=1, over_target_likelihood=1, under_target_cost=0.99, under_target_likelihood=0.6, lowest_likelihood=0.01, messages=[])], model_name='test', estimated_likelihood=0.6928203230275509, arithmetic_mean_estimated_likelihood=0.7, num_children=2, num_finished_children=2, clean_estimate=True)
# ---
# name: test_subset_simulation_multi_result_coalescing_easy_geometric
MultiSubsetSimulationResult(child_results=[SubsetSimulationResult(probs_list=(), over_target_cost=1, over_target_likelihood=1, under_target_cost=0.99, under_target_likelihood=0.1, lowest_likelihood=0.5, messages=[]), SubsetSimulationResult(probs_list=(), over_target_cost=1, over_target_likelihood=1, under_target_cost=0.99, under_target_likelihood=0.001, lowest_likelihood=0.01, messages=[])], model_name='test', estimated_likelihood=0.010000000000000004, arithmetic_mean_estimated_likelihood=0.0505, num_children=2, num_finished_children=2, clean_estimate=True)
# ---
# name: test_subset_simulation_multi_result_coalescing_include_dirty
MultiSubsetSimulationResult(child_results=[SubsetSimulationResult(probs_list=(), over_target_cost=1, over_target_likelihood=1, under_target_cost=0.99, under_target_likelihood=0.8, lowest_likelihood=0.5, messages=[]), SubsetSimulationResult(probs_list=(), over_target_cost=1, over_target_likelihood=1, under_target_cost=0.99, under_target_likelihood=0.08, lowest_likelihood=0.01, messages=[]), SubsetSimulationResult(probs_list=(), over_target_cost=None, over_target_likelihood=None, under_target_cost=None, under_target_likelihood=None, lowest_likelihood=0.0001, messages=[])], model_name='test', estimated_likelihood=0.01856635533445112, arithmetic_mean_estimated_likelihood=0.29336666666666666, num_children=3, num_finished_children=2, clean_estimate=False)
# ---

@ -1,92 +0,0 @@
import deepdog.subset_simulation.subset_simulation_impl as impl
import numpy


def test_subset_simulation_multi_result_coalescing_include_dirty(snapshot):
    res1 = impl.SubsetSimulationResult(
        probs_list=(),
        over_target_cost=1,
        over_target_likelihood=1,
        under_target_cost=0.99,
        under_target_likelihood=0.8,
        lowest_likelihood=0.5,
        messages=[],
    )
    res2 = impl.SubsetSimulationResult(
        probs_list=(),
        over_target_cost=1,
        over_target_likelihood=1,
        under_target_cost=0.99,
        under_target_likelihood=0.08,
        lowest_likelihood=0.01,
        messages=[],
    )
    res3 = impl.SubsetSimulationResult(
        probs_list=(),
        over_target_cost=None,
        over_target_likelihood=None,
        under_target_cost=None,
        under_target_likelihood=None,
        lowest_likelihood=0.0001,
        messages=[],
    )
    combined = impl.coalesce_ss_results("test", [res1, res2, res3])
    assert combined == snapshot


def test_subset_simulation_multi_result_coalescing_easy_arithmetic(snapshot):
    res1 = impl.SubsetSimulationResult(
        probs_list=(),
        over_target_cost=1,
        over_target_likelihood=1,
        under_target_cost=0.99,
        under_target_likelihood=0.8,
        lowest_likelihood=0.5,
        messages=[],
    )
    res2 = impl.SubsetSimulationResult(
        probs_list=(),
        over_target_cost=1,
        over_target_likelihood=1,
        under_target_cost=0.99,
        under_target_likelihood=0.6,
        lowest_likelihood=0.01,
        messages=[],
    )
    combined = impl.coalesce_ss_results("test", [res1, res2])
    assert combined.arithmetic_mean_estimated_likelihood == 0.7
    assert combined == snapshot


def test_subset_simulation_multi_result_coalescing_easy_geometric(snapshot):
    res1 = impl.SubsetSimulationResult(
        probs_list=(),
        over_target_cost=1,
        over_target_likelihood=1,
        under_target_cost=0.99,
        under_target_likelihood=0.1,
        lowest_likelihood=0.5,
        messages=[],
    )
    res2 = impl.SubsetSimulationResult(
        probs_list=(),
        over_target_cost=1,
        over_target_likelihood=1,
        under_target_cost=0.99,
        under_target_likelihood=0.001,
        lowest_likelihood=0.01,
        messages=[],
    )
    combined = impl.coalesce_ss_results("test", [res1, res2])
    numpy.testing.assert_allclose(combined.estimated_likelihood, 0.01)
    assert combined == snapshot
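The snapshots pin down how child runs are combined. `estimated_likelihood` is the geometric mean of the children's likelihoods: sqrt(0.8 * 0.6) ≈ 0.6928 and sqrt(0.1 * 0.001) = 0.01. `arithmetic_mean_estimated_likelihood` is the plain mean. A child that never reached the target contributes its `lowest_likelihood` instead. That is how the cube root of 0.8 * 0.08 * 0.0001 gives ≈ 0.018566, and it also marks the estimate as not clean. The sketch below reproduces those numbers using stand-in dataclasses. `ChildResult`, `CoalescedSketch` and `coalesce_sketch` are hypothetical names. The rule that a child is "finished" when `under_target_likelihood` is set is inferred from the snapshots, not taken from deepdog.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class ChildResult:
    # Hypothetical stand-in for SubsetSimulationResult, keeping only the fields used here.
    under_target_likelihood: Optional[float]
    lowest_likelihood: float


@dataclass
class CoalescedSketch:
    estimated_likelihood: float
    arithmetic_mean_estimated_likelihood: float
    num_children: int
    num_finished_children: int
    clean_estimate: bool


def coalesce_sketch(children: Sequence[ChildResult]) -> CoalescedSketch:
    # Finished children report an under-target likelihood; unfinished ones
    # fall back to the lowest likelihood they reached (inferred from snapshots).
    values: List[float] = [
        c.under_target_likelihood
        if c.under_target_likelihood is not None
        else c.lowest_likelihood
        for c in children
    ]
    finished = sum(1 for c in children if c.under_target_likelihood is not None)
    # Geometric mean via the mean of logs, which avoids underflow for tiny likelihoods.
    geometric = math.exp(sum(math.log(v) for v in values) / len(values))
    arithmetic = sum(values) / len(values)
    return CoalescedSketch(
        estimated_likelihood=geometric,
        arithmetic_mean_estimated_likelihood=arithmetic,
        num_children=len(children),
        num_finished_children=finished,
        clean_estimate=finished == len(children),
    )
```

Subset-simulation likelihoods can span many orders of magnitude, as the 0.1 and 0.001 pair shows. The geometric mean keeps one large child from dominating the estimate, while the arithmetic mean is still reported alongside it for comparison.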