Compare commits

..

239 Commits

Author SHA1 Message Date
71dc906a96
chore(release): 1.7.0
2025-02-26 21:57:13 -06:00
24c6e311c1
feat: adds configurable skip if file exists
2025-02-26 21:55:12 -06:00
4dd3004a7b
chore(release): 1.6.0
2025-02-26 21:08:00 -06:00
46f6b6cdf1
feat: Adds ability to parse bayesruns without timestamps
2025-02-26 21:01:19 -06:00
c8435b4b2a
feat: allows negative log magnitude strings in models
2025-02-24 08:34:11 -06:00
c2375e6f5c
chore(release): 1.5.0
2024-12-29 21:23:30 -06:00
a1b59cd18b
feat: add configurable max number of dipoles to write
2024-12-29 21:14:59 -06:00
53f8993f2b
feat: add configurable max number of dipoles to write 2024-12-29 21:13:34 -06:00
700f32ea58
chore(release): 1.4.0
2024-09-04 13:58:56 -05:00
3737252c4b
log: adds additional logging of dipole count
2024-09-04 13:56:09 -05:00
6f79a49e59
log: adds additional logging of dipole count
2024-09-04 13:54:50 -05:00
d962ecb11e
feat: indexifier now has len 2024-08-26 03:34:57 -05:00
7beca501bf
fmt: ran formatter 2024-08-26 03:34:50 -05:00
5425ce1362
feat: allows some betetr matching for single_dipole runs 2024-08-26 03:31:15 -05:00
6a5c5931d4
fix: update log file arg names in cli scripts 2024-05-21 16:10:02 -05:00
36ff75576c
chore: removes redundant import
2024-05-21 15:55:25 -05:00
e76c619c8b
fmt: formatting changes 2024-05-21 15:54:55 -05:00
c881da2837
feat: add subset sim probs command for bayes for subset simulation results
2024-05-21 15:54:08 -05:00
1a1ecc01ea
chore: adds vscode to gitignore 2024-05-21 15:53:21 -05:00
9cfd484d7c
chore(release): 1.3.0
2024-05-19 22:13:39 -05:00
09fad2e102
feat: improve initial cost calculation to allow multiprocessing, adds ability to specify a number of levels to do with direct mc instead of subset simulation 2024-05-19 22:11:50 -05:00
24ac65bf9c
fix: fix seeding to avoid recreating seed combinations across multi runs 2024-05-19 22:10:40 -05:00
8fbae32111
doc: some commenting and logging changes 2024-05-19 22:09:52 -05:00
b1c01b25c8
fix: Adds ugly hack for stdevs for this uniform range to multiply by root3, proper fix would be in pdme 2024-05-19 22:08:44 -05:00
a14d9834e5
doc: note on refactoring for subset sim probs 2024-05-19 22:01:42 -05:00
8d04803eb3
fmt: formatting, nicer log, removing comment 2024-05-19 02:29:59 -05:00
92b49fce7c
feat: add multi run to wrap multi model and repeat runs 2024-05-19 02:27:11 -05:00
8845b2875f
feat: adds a filter that works with cost functions 2024-05-19 02:26:00 -05:00
72791f2d0f
deps: update pdme 2024-05-19 02:25:29 -05:00
d258cfbec7
chore(release): 1.2.1
2024-05-11 20:51:05 -05:00
b3bf4cde97
perf: precompile the magic regexes for probs parsing 2024-05-11 20:49:45 -05:00
60f29b0b2f
perf: avoid recalculating product dict in indexifier to improve performance for probs 2024-05-11 20:49:26 -05:00
093a3fb5c4
chore(release): 1.2.0
2024-05-08 22:24:28 -05:00
dc1d2d45a3
feat: adds additional matching regexes
2024-05-08 22:23:57 -05:00
f0e2fa3da9
feat: adds magnitude enabled parsing option 2024-05-03 10:44:06 -05:00
2581e722e6
chore(release): 1.1.0
2024-05-02 23:13:21 -05:00
62bd63bf9b
refactor: removes redundant calculation and uses pdme 2024-05-02 23:12:21 -05:00
df4d0b5d15
deps: upgrades pdme dep 2024-05-02 22:40:06 -05:00
5361dada8b
feat: removes legacy bayes run, technically breaking but just don't use them 2024-05-02 22:04:49 -05:00
29029c137a
deps: upgrades pdme 2024-05-02 18:17:33 -05:00
fb018abeae
feat: allows disabling timestamps in directmc bayesrun files
2024-05-01 21:40:53 -05:00
d28c190816
chore(release): 1.0.1
2024-05-01 21:08:00 -05:00
0262de060f
nix: adds node for npx in dev shell 2024-05-01 21:07:48 -05:00
e25db1e0f6
fix: fixes issue of zero division error with no successes for anything
2024-05-01 20:37:08 -05:00
8fdbe4d334
chore(release): 1.0.0
2024-05-01 15:49:57 -05:00
406a1485da
build: justfile allows version 2024-05-01 15:49:01 -05:00
6dc66b1c27
release: allows dynamic release as arg 2024-05-01 15:43:33 -05:00
f2b1a1dd3b
feat: Adds more powerful direct mc runs to sub for old real spectrum run
2024-05-01 15:22:59 -05:00
cb166a399d
deps: updates pdme to 1.0.0 release yay 2024-04-28 21:43:11 -05:00
7108dd0111
feat!: allows new seed spec instead of cli arg, removes old cli arg
2024-04-28 20:54:58 -05:00
2105754911
feat: adds additional file slug parsing 2024-04-28 20:54:15 -05:00
f3ba4cbfd3
fix: no longer throws error for overlapping keys, the warning should hopefully be enough? 2024-04-28 20:51:43 -05:00
e5f7085324
chore(release): 0.8.1
2024-04-28 04:27:47 -05:00
578481324b
chore(release): 0.8.1 2024-04-28 04:27:32 -05:00
bf8ac9850d
release: fixes standard version updater which didn't allow minor version to be multidigit 2024-04-28 04:27:06 -05:00
ab408b6412
chore(release): 0.8.1 2024-04-28 04:19:08 -05:00
4aa0a6f234
chore(release): 0.8.0
2024-04-28 04:08:02 -05:00
f9646e3386
fix!: fixes the spin qubit frequency phase shift calculation which had an index problem 2024-04-28 04:07:35 -05:00
3b612b960e
chore(release): 0.7.10
2024-04-27 23:06:24 -05:00
b0ad4bead0
feat: better management of cli wrapper 2024-04-27 23:04:33 -05:00
4b2e573715
feat: adds cli probs 2024-04-27 18:43:25 -05:00
12e6916ab2
doc: documentation for myself because i'll forget otherwise
2024-04-27 12:25:39 -05:00
1e76f63725
git: ignores local_scripts directory as place to run stuff while developing
2024-04-25 16:18:14 -05:00
7aa5ad2eb9
chore(release): 0.7.9
2024-04-21 11:23:42 -05:00
fe331bb544
Merge branch 'filter_compose'
2024-04-21 11:21:36 -05:00
03ac85a967
chore: performance enhancement for fmt in justfile
2024-04-21 11:21:11 -05:00
96589ff659
adds a filter for future dmc use 2024-04-21 10:55:44 -05:00
e5b5809764
build: delete do.sh
2024-03-20 11:28:04 -05:00
1407418c60 Merge pull request 'custom_dmc' (#37) from custom_dmc into master
Reviewed-on: #37
2024-03-20 16:27:19 +00:00
383b51c35d
Merge branch 'master' into custom_dmc
2024-03-20 11:23:39 -05:00
5b9123d128 Merge pull request 'flakeupdate' (#36) from flakeupdate into master
Reviewed-on: #36
2024-03-20 16:21:41 +00:00
2b1a1c21e4
Merge branch 'master' into flakeupdate
2024-03-20 11:18:16 -05:00
ea080ca1c7
feat: adds ability to write custom dmc filters
2024-03-20 10:56:54 -05:00
028fe58561
build: fixes issue brekaing build with unused variable
2024-03-19 15:46:00 -05:00
b6a41872d5
just: fmt before test, better comments
2024-03-19 15:45:15 -05:00
731dabd74d
nix: adds just as dependency, and fixes tests by installing deepdog app locally 2024-03-19 15:42:43 -05:00
7950f19c2d
build: adds justfile to replace do
2024-03-19 15:42:18 -05:00
b27e504bbd
lint: unneeded variable definition
2024-03-17 18:40:46 -05:00
33106ba772
nix: updates nix things to work, rewrites flake
2024-03-17 15:18:52 -05:00
3ae0783d00
feat: adds tarucha phase calculation, using spin qubit precession rate noise
2024-03-17 14:11:22 -05:00
e8201865eb
chore(release): 0.7.8
2024-02-28 18:41:32 -06:00
5f534a60cc
fix: uses correct measurements
2024-02-28 18:41:05 -06:00
ce90f6774b
chore(release): 0.7.7
2024-02-28 18:34:13 -06:00
48e41cbd2c
fix: fixes phase calculation issue with setting input array
2024-02-28 18:33:05 -06:00
603c5607f7
chore(release): 0.7.6
2024-02-28 16:49:47 -06:00
bb72e903d1
feat: adds ability to use phase measurements only for correlations
2024-02-28 16:49:24 -06:00
65e1948835
fix: fixes typeerror vs indexerror on bare float as cost in subset simulation 2024-02-28 16:47:03 -06:00
310977e9b8
chore(release): 0.7.5
2023-12-09 16:27:30 -06:00
b10586bf55
fmt: auto format changes
2023-12-09 16:25:57 -06:00
1741807be4
feat: adds direct monte carlo package 2023-12-09 16:24:20 -06:00
9a4548def4
feat: allows disabling timestamp in subset simulation bayes results 2023-12-09 16:23:45 -06:00
b4e5f53726
feat: adds longchain logging if logging last generation
2023-08-12 19:48:30 -05:00
f7559b2c4f
chore(release): 0.7.4
2023-07-27 17:40:50 -05:00
9a7a3ff2c7
feat: adds configurable chunk size for the initial mc level 0 SS stage cost calculation to reduce memory usage
2023-07-27 17:39:02 -05:00
c4805806be
test: fixes lint for none type
2023-07-27 17:11:57 -05:00
161bcf42ad
fix: fixes bug if case of clamping necessary 2023-07-27 17:09:52 -05:00
8e6ead416c
feat: allows for deepdog bayesrun with ss to not print csv to make snapshot testing possible 2023-07-27 17:09:36 -05:00
e6defc7948
fix: fixes bug with clamped probabilities being underestimated 2023-07-27 17:05:33 -05:00
33d5da6a4f
fmt: adds e203 to flake8 ignore to let black do its thing 2023-07-27 16:49:31 -05:00
1110372a55
build: more efficient doo fmt 2023-07-27 16:47:11 -05:00
e6a00d6b8f
debug: adds debug logs 2023-07-27 16:25:51 -05:00
57cd746e5c
chore(release): 0.7.3
2023-07-26 20:27:39 -05:00
878e16286b
deps: updates pytest-cov
2023-07-26 20:23:48 -05:00
4726ccfb8c
fmt: formatting 2023-07-26 20:21:53 -05:00
598dad1e6d
feat: adds utility options and avoids memory leak
2023-07-26 20:14:19 -05:00
01c0d7e49b
chore(release): 0.7.2
2023-07-24 10:44:51 -05:00
a170a3ce01
fix: fixes clamping format etc.
2023-07-24 10:26:35 -05:00
9bb8fc50fe
feat: clamps results now
2023-07-24 10:24:23 -05:00
f775ed34c6
chore(release): 0.7.1
2023-07-24 02:04:42 -05:00
7d0c2b22cc Merge pull request 'mcmc' (#32) from mcmc into master
Reviewed-on: #32
2023-07-24 07:02:19 +00:00
d6e6876a79
fmt: fixes some linting issues
2023-07-24 01:59:07 -05:00
fccf50eb27
fmt: formatting improvements 2023-07-24 01:55:37 -05:00
33cab9ab41
feat: adds subset simulation stuff
2023-07-24 01:50:56 -05:00
ad521ba472
deps: upgrades pdme version to use mcmc code
2023-07-23 18:46:11 -05:00
266d6dd583
chore(release): 0.7.0
2023-05-01 10:26:01 -05:00
c573f8806d Merge pull request 'add_pairs' (#30) from add_pairs into master
Reviewed-on: #30
2023-05-01 15:24:57 +00:00
a015daf5ff
feat!: removes fastfilter parameter because it should never be needed
2023-05-01 10:17:12 -05:00
a089951bbe
feat: adds pair capability to real spectrum run hopefully
2023-05-01 10:05:46 -05:00
7568aef842
chore(release): 0.6.7
2023-04-13 20:26:06 -05:00
c4b6cbbb6f Merge pull request 'cap_core' (#29) from cap_core into master
Reviewed-on: #29
2023-04-14 01:24:01 +00:00
1cf4454153
fix: avoids redefinition of core count in loop
2023-04-13 20:21:17 -05:00
bf15f4a7b7
feat: adds option to cap core count for real spectrum run
2023-04-13 20:17:48 -05:00
12903b2540
feat: adds option to cap core count for temp aware run 2023-04-13 20:16:33 -05:00
959b9af378
chore(release): 0.6.6
2023-04-09 18:13:40 -05:00
8fd1b75e13
fix: removes bad logging in multiprocessing function
2023-04-09 18:12:57 -05:00
17ae84879d
chore(release): 0.6.5
2023-04-09 17:42:44 -05:00
fc2880ba2f
build: changes default container to be accurate
2023-04-09 17:38:38 -05:00
589c16f25c
build: removing unneeded env vars for poetry
2023-04-09 17:37:23 -05:00
743c3e22ae
build: use pre-built poetry image
2023-04-09 17:35:37 -05:00
b3e2acd79c
chore: updates maintained readme
2023-04-09 17:32:44 -05:00
de1ec3e700
feat: adds temp aware guy using new pdme temp-flexible feature for bundling temp models
2023-04-09 17:30:30 -05:00
f4964a19ea
chore(release): 0.6.4
2022-08-13 15:52:17 -05:00
08d73c73e9 Merge pull request 'feat: Prints model names while running' (#21) from printnames into master
Reviewed-on: #21
2022-08-13 20:49:03 +00:00
7ea1d715f6
feat: Prints model names while running
2022-08-13 11:10:00 -05:00
ed102799d1 Merge pull request 'chore(deps): update dependency mypy to ^0.971' (#18) from renovate/mypy-0.x into master
Reviewed-on: #18
2022-07-20 19:07:24 +00:00
0b8d14ef48 chore(deps): update dependency mypy to ^0.971
2022-07-20 01:31:30 +00:00
a5d0d257d7 Merge pull request 'nix' (#15) from nix into master
Reviewed-on: #15
2022-06-13 13:47:00 +00:00
6ee995e561 Merge branch 'master' into nix
2022-06-13 13:23:08 +00:00
a217ad2c75
nix: updates nixpkgs and uses workaround for pdme build-system
2022-06-13 08:18:55 -05:00
039f68ee97
deps: pins specific scipy and numpy version 2022-06-13 08:16:53 -05:00
e9dd21f69b
chore(release): 0.6.3
2022-06-12 17:35:08 -05:00
8303fc7860 nix: flake lock 2022-06-12 10:45:01 -05:00
2418e3a263 nix: adds nix direnv stuff to gitignore 2022-06-12 10:43:17 -05:00
73465203b2 nix: adds flake.nix 2022-06-12 10:42:45 -05:00
01ba4af229 Merge pull request 'fastfilter' (#14) from fastfilter into master
Reviewed-on: #14
2022-06-12 13:51:47 +00:00
2c5c122820
feat: adds fast filter variant
2022-06-11 21:06:27 -05:00
0a1a27759b
feat: adds tester for fast filter real spectrum 2022-06-11 12:40:32 -05:00
558a4df643 Merge pull request 'chore(deps): update dependency mypy to ^0.961' (#13) from renovate/mypy-0.x into master
Reviewed-on: #13
2022-06-07 22:03:11 +00:00
6f141af0fe chore(deps): update dependency mypy to ^0.961
2022-06-07 01:31:28 +00:00
2c99fcf687
deps: updates pdme
2022-06-04 12:33:05 -05:00
ad0ace4da3
chore(release): 0.6.2
2022-05-26 13:05:14 -05:00
3f1265e3ec
Merge branch 'master' of ssh://gitea.deepak.science:2222/physics/deepdog
2022-05-26 13:04:28 -05:00
969f01e9c5
deps: updates deps 2022-05-26 13:02:21 -05:00
b282ffa800 Merge pull request 'chore(deps): update dependency mypy to ^0.960' (#12) from renovate/mypy-0.x into master
Reviewed-on: #12
2022-05-26 12:48:42 +00:00
91e9e5198e chore(deps): update dependency mypy to ^0.960
2022-05-26 01:32:51 +00:00
d7e0f13ca5
feat: adds better import api for real data run
2022-05-22 16:47:26 -05:00
74de2b0433
chore(release): 0.6.1
2022-05-22 15:35:29 -05:00
c036028902
deps: updates to pdme 0.8.3
2022-05-22 15:35:11 -05:00
690ad9e288 Merge pull request 'feat: adds new runner for real spectra' (#11) from realdata into master
Reviewed-on: #11
2022-05-22 20:32:59 +00:00
bd56f24774
feat: adds new runner for real spectra
2022-05-22 15:26:39 -05:00
362388363f
chore(release): 0.6.0
2022-05-21 19:27:13 -05:00
252b4a4414 Merge pull request 'multi' (#8) from multi into master
Reviewed-on: #8
2022-05-22 00:24:07 +00:00
bb21355f5e
style: fmt
2022-05-21 19:18:13 -05:00
df8977655d
feat: Uses multidipole for bayes run, with more verbose output
2022-05-21 19:15:46 -05:00
5d0a7a4be0
feat!: bayes run now handles multidipoles with changes to output file format etc.
2022-05-07 18:45:58 -05:00
67a9721c31
style: don't use unused exception var
2022-05-07 14:49:09 -05:00
b5e0ecb528
fix: fixes crash when dipole count is smaller than expected max during file write 2022-05-07 14:46:56 -05:00
feeb03b27c
chore: Updates to pdme 0.8.2
2022-05-07 11:24:15 -05:00
b7da3d61cc
fix: another bug fix for csv generation
2022-04-30 19:56:17 -05:00
9afa209864
fix: fixes format string in csv output for headers
2022-04-30 19:52:46 -05:00
ae8977bb1e
feat!: logs multiple dipoles better maybe
2022-04-30 18:49:48 -05:00
0caad05e3c
fix: moves logging successes to after they've actually happened
2022-04-30 18:14:26 -05:00
eec926aaac
fix: fixes random issue 2022-04-30 18:11:22 -05:00
23b202beb8
fix: now doesn't double randomise frequency
2022-04-30 17:40:55 -05:00
6e29f7a702
feat!: switches over to pdme new stuff, uses models and scraps discretisations entirely 2022-04-30 17:31:36 -05:00
31070b5342
fix: whoops deleted word multiprocessing 2022-04-30 16:44:12 -05:00
101569d749
feat!: removes alt_bayes bayes distinction, which was superfluous when only alt worked 2022-04-30 16:43:37 -05:00
874d876c9d
feat: adds pdme 0.7.0 for multiprocessing 2022-04-30 16:41:34 -05:00
3dca288177
chore(release): 0.5.0
2022-04-30 11:19:38 -05:00
bd0b375751 Merge pull request 'betterparallel' (#7) from betterparallel into master
Reviewed-on: #7
2022-04-30 16:15:01 +00:00
0fabd8f7fb Merge branch 'master' into betterparallel
2022-04-30 16:10:15 +00:00
3ea3d1dc56 Merge pull request 'chore(deps): update dependency mypy to ^0.950' (#6) from renovate/mypy-0.x into master
Reviewed-on: #6
2022-04-30 16:08:57 +00:00
edf0ba6532
feat: has better parallelisation
2022-04-30 10:36:10 -05:00
a487309549 chore(deps): update dependency mypy to ^0.950
2022-04-28 01:31:10 +00:00
42829c0327
fix: better parallelisation hopefully
2022-04-24 12:13:10 -05:00
349341b405
fix: Uses correct filename arg for passed in rng
2022-04-18 16:00:19 -05:00
50dbc4835e
feat!: simulpairs now uses different rng calculator 2022-04-18 12:04:30 -05:00
0954429e2d
fix: stronger names 2022-04-16 13:11:08 -05:00
4c06b3912c
fix: uses correct filename for pairs guy
2022-04-16 13:04:39 -05:00
5684af783e
fmt: Adds newlines to make fmt idempotent 2022-04-16 13:04:11 -05:00
f00b29391c
style: run doo fmt
2022-04-16 12:55:52 -05:00
492a5e6681
fix: Makes altbayessimulpairs available in package 2022-04-16 12:55:29 -05:00
e9277c3da7
feat: adds simulpairs run 2022-04-16 12:54:30 -05:00
1e2657adad
chore: adds doo fmt 2022-04-16 12:51:31 -05:00
f168666045
chore(release): 0.4.0
2022-04-10 10:21:47 -05:00
604916a829 Merge pull request 'pairs' (#5) from pairs into master
Reviewed-on: #5
2022-04-10 15:20:05 +00:00
941313a14c
style: whitespace fixes
2022-04-10 10:05:15 -05:00
cb64c0b7b6
Merge branch 'master' into pairs 2022-04-10 10:03:47 -05:00
ec7b4cac39
feat: Adds dynamic cycle count increases to help reach minimum success count
2022-03-28 15:46:40 -05:00
31e6cfaf51
lint: lint fixes
2022-03-28 12:28:24 -05:00
c1c711f47b
fix: uses bigfix from pdme for negatives
2022-03-28 10:52:28 -05:00
6463b135ef
feat!: Adds pair calculations, with changing api format
2022-03-27 19:01:14 -05:00
a283cbd670 Merge pull request 'chore(deps): update dependency mypy to ^0.942' (#3) from renovate/mypy-0.x into master
Reviewed-on: #3
2022-03-25 13:15:07 +00:00
0b45172ca0 chore(deps): update dependency mypy to ^0.942
2022-03-25 01:30:47 +00:00
b6383d0a47 Merge pull request 'chore(deps): update dependency mypy to ^0.941' (#2) from renovate/mypy-0.x into master
Reviewed-on: #2
2022-03-19 00:54:39 +00:00
450d8e0ec9 chore(deps): update dependency mypy to ^0.941
2022-03-15 01:30:59 +00:00
f81904a898
chore(release): 0.3.5
2022-03-06 18:42:26 -06:00
88d961313c
feat: makes chunksize configurable
2022-03-06 18:42:05 -06:00
fa82caa752
chore(release): 0.3.4
2022-03-06 17:31:47 -06:00
0784cd53d7
feat: Changes chunksize for multiprocessing
2022-03-06 17:31:17 -06:00
fb4b012491
chore(release): 0.3.3
2022-03-06 17:23:15 -06:00
8617e4d274
fix: Fixes count to use cycles as well
2022-03-06 17:22:52 -06:00
fe2af1644e
chore(release): 0.3.2
2022-03-06 17:18:42 -06:00
e6d8d33c27
feat: Adds monte carlo cycles to trade off space and cpu 2022-03-06 17:18:24 -06:00
e00dc95f02
docs: readme badges
2022-03-06 16:48:49 -06:00
527be26fb2
chore(release): 0.3.1
2022-03-06 16:45:45 -06:00
456c81bca5
Merge branch 'master' of ssh://gitea.deepak.science:2222/physics/deepdog 2022-03-06 16:45:16 -06:00
7284dbeb34
feat: Adds alt bayes solver with monte carlo sampler 2022-03-06 16:45:09 -06:00
d078004773
feat: Updates to pdme version for faster bayes resolution 2022-03-06 15:53:49 -06:00
0441cde421
chore: adds release to do.sh 2022-03-06 15:51:32 -06:00
a9e91779bc
chore: adds standard-version release stuff 2022-03-06 15:50:39 -06:00
751bc66704 Merge pull request 'Configure Renovate' (#1) from renovate/configure into master
Reviewed-on: #1
2022-02-23 19:44:38 +00:00
413ff16acc chore(deps): add renovate.json 2022-02-23 19:42:04 +00:00
5118173f09
chore: Adds semantic release to dev dependencies
2022-02-14 10:00:59 -06:00
semantic-release
6dfc26104a
0.3.0
Automatically generated by python-semantic-release
2022-02-14 09:57:42 -06:00
3a6be738b1
feat: Actually uses probabilities to update bayes
2022-02-14 09:51:02 -06:00
bd240900b4
fix: Actually logs end threshold 2022-02-14 09:50:08 -06:00
0e1fbec043
fix: Fixes bug with end_threshold and better error logging 2022-02-14 09:44:04 -06:00
3d3b1a83f6
feat: Adds end threshold for early abort 2022-02-14 09:27:40 -06:00
63cecba824
Created version 0.2.4
2022-02-06 19:50:07 -06:00
344998835d
fix: Fixes linting 2022-02-06 19:49:02 -06:00
838aeb0cf3
Created version 0.2.3
2022-02-06 19:46:10 -06:00
e715d329fd
fix: diagnostic adds frequencies 2022-02-06 19:46:05 -06:00
521b49f14c
Created version 0.2.2
2022-02-06 19:37:15 -06:00
6d65e8dec5
fix: ignores extra fields in dictwriter 2022-02-06 19:36:59 -06:00
36354c2f2c
Adds patch
2022-02-06 19:29:17 -06:00
3534593557
fix: Take first element of solution list 2022-02-06 19:28:54 -06:00
6ada52f82c
Minor version bump
2022-02-06 19:03:24 -06:00
34e124b18f
feat: Adds better diagnostic of full dipole info 2022-02-06 19:03:04 -06:00
57 changed files with 5486 additions and 576 deletions

.flake8

@@ -1,3 +1,3 @@
[flake8]
-ignore = W191, E501, W503
+ignore = W191, E501, W503, E203
max-line-length = 120
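For context, not part of the diff: the commit "fmt: adds e203 to flake8 ignore to let black do its thing" explains this change. black's preferred slice formatting puts whitespace before ":" when the slice bounds are compound expressions, which flake8 reports as E203 unless it is ignored. A minimal illustrative Python sketch (the variable names are made up):

```python
# black formats slices with complex bounds with spaces around ":".
# flake8 flags that spacing as E203 (whitespace before ":") unless
# E203 is in the ignore list, as in the .flake8 change above.
data = list(range(10))
offset = 2
window = data[offset + 1 : offset + 5]  # black-style spacing; E203 would flag this line
print(window)  # [3, 4, 5, 6]
```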

8 .gitignore vendored

@@ -114,6 +114,10 @@ ENV/
env.bak/
venv.bak/
+# direnv
+.envrc
+.direnv
# Spyder project settings
.spyderproject
.spyproject
@@ -139,3 +143,7 @@ dmypy.json
cython_debug/
+*.csv
+local_scripts/
+.vscode

10 .versionrc Normal file

@@ -0,0 +1,10 @@
{
  "bumpFiles": [
    {
      "filename": "pyproject.toml",
      "updater": "scripts/standard-version/pyproject-updater.js"
    }
  ],
  "sign": true,
  "tag-prefix": ""
}
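This config points standard-version at a custom updater so releases can bump the version in pyproject.toml. standard-version updaters are Node modules exposing a readVersion/writeVersion pair; the actual scripts/standard-version/pyproject-updater.js is not shown in this compare, but a rough Python sketch of the contract it implements (assuming pyproject.toml carries a line like version = "1.7.0") might look like:

```python
import re

# Assumption: pyproject.toml contains a line of the form `version = "1.7.0"`.
# The real updater is JavaScript; this only mirrors its read/write contract.
VERSION_RE = re.compile(r'^version = "(\d+\.\d+\.\d+)"$', re.MULTILINE)

def read_version(contents: str) -> str:
    """Return the current version string found in pyproject.toml contents."""
    match = VERSION_RE.search(contents)
    if match is None:
        raise ValueError("no version field found in pyproject.toml")
    return match.group(1)

def write_version(contents: str, version: str) -> str:
    """Return the contents with the version field replaced by the new version."""
    return VERSION_RE.sub(f'version = "{version}"', contents)
```

The \d+ groups matter here: the later commit "release: fixes standard version updater which didn't allow minor version to be multidigit" suggests the original pattern assumed single-digit version components.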

377 CHANGELOG.md Normal file

@@ -0,0 +1,377 @@
# Changelog
All notable changes to this project will be documented in this file. See [standard-version](https://github.com/conventional-changelog/standard-version) for commit guidelines.
## [1.7.0](https://gitea.deepak.science:2222/physics/deepdog/compare/1.6.0...1.7.0) (2025-02-27)
### Features
* adds configurable skip if file exists ([24c6e31](https://gitea.deepak.science:2222/physics/deepdog/commit/24c6e311c1d3067eb98cc60e6ca38d76373bf08e))
## [1.6.0](https://gitea.deepak.science:2222/physics/deepdog/compare/1.5.0...1.6.0) (2025-02-27)
### Features
* Adds ability to parse bayesruns without timestamps ([46f6b6c](https://gitea.deepak.science:2222/physics/deepdog/commit/46f6b6cdf15c67aedf0c871d201b8db320bccbdf))
* allows negative log magnitude strings in models ([c8435b4](https://gitea.deepak.science:2222/physics/deepdog/commit/c8435b4b2a6e4b89030f53b5734eb743e2003fb7))
## [1.5.0](https://gitea.deepak.science:2222/physics/deepdog/compare/1.4.0...1.5.0) (2024-12-30)
### Features
* add configurable max number of dipoles to write ([a1b59cd](https://gitea.deepak.science:2222/physics/deepdog/commit/a1b59cd18b30359328a09210d9393f211aab30c2))
* add configurable max number of dipoles to write ([53f8993](https://gitea.deepak.science:2222/physics/deepdog/commit/53f8993f2b155228fff5cbee84f10c62eb149a1f))
## [1.4.0](https://gitea.deepak.science:2222/physics/deepdog/compare/1.3.0...1.4.0) (2024-09-04)
### Features
* add subset sim probs command for bayes for subset simulation results ([c881da2](https://gitea.deepak.science:2222/physics/deepdog/commit/c881da28370a1e51d062e1a7edaa62af6eb98d0a))
* allows some betetr matching for single_dipole runs ([5425ce1](https://gitea.deepak.science:2222/physics/deepdog/commit/5425ce1362919af4cc4dbd5813df3be8d877b198))
* indexifier now has len ([d962ecb](https://gitea.deepak.science:2222/physics/deepdog/commit/d962ecb11e929de1d9aa458b5d8e82270eff0039))
### Bug Fixes
* update log file arg names in cli scripts ([6a5c593](https://gitea.deepak.science:2222/physics/deepdog/commit/6a5c5931d4fc849d0d6a0f2b971523a0f039d559))
## [1.3.0](https://gitea.deepak.science:2222/physics/deepdog/compare/1.2.1...1.3.0) (2024-05-20)
### Features
* add multi run to wrap multi model and repeat runs ([92b49fc](https://gitea.deepak.science:2222/physics/deepdog/commit/92b49fce7c86f14484deb1c4aaaa810a6f69c08a))
* adds a filter that works with cost functions ([8845b28](https://gitea.deepak.science:2222/physics/deepdog/commit/8845b2875f2c91c91dd3988fabda26400c59b2d7))
* improve initial cost calculation to allow multiprocessing, adds ability to specify a number of levels to do with direct mc instead of subset simulation ([09fad2e](https://gitea.deepak.science:2222/physics/deepdog/commit/09fad2e1024d9237a6a4f7931f51cb4c84b83bf8))
### Bug Fixes
* Adds ugly hack for stdevs for this uniform range to multiply by root3, proper fix would be in pdme ([b1c01b2](https://gitea.deepak.science:2222/physics/deepdog/commit/b1c01b25c8f2c3947be23f5b2c656c37437dab17))
* fix seeding to avoid recreating seed combinations across multi runs ([24ac65b](https://gitea.deepak.science:2222/physics/deepdog/commit/24ac65bf9c74c454fec826ca9de640fe095f5a17))
### [1.2.1](https://gitea.deepak.science:2222/physics/deepdog/compare/1.2.0...1.2.1) (2024-05-12)
## [1.2.0](https://gitea.deepak.science:2222/physics/deepdog/compare/1.1.0...1.2.0) (2024-05-09)
### Features
* adds additional matching regexes ([dc1d2d4](https://gitea.deepak.science:2222/physics/deepdog/commit/dc1d2d45a3e631c5efccce80f8a24fa87c6089e0))
* adds magnitude enabled parsing option ([f0e2fa3](https://gitea.deepak.science:2222/physics/deepdog/commit/f0e2fa3da9f5a5136908d691137a904fda4e3a9a))
## [1.1.0](https://gitea.deepak.science:2222/physics/deepdog/compare/1.0.1...1.1.0) (2024-05-03)
### Features
* allows disabling timestamps in directmc bayesrun files ([fb018ab](https://gitea.deepak.science:2222/physics/deepdog/commit/fb018abeae2adf4438a030140a6c905f11bb6bc1))
* removes legacy bayes run, technically breaking but just don't use them ([5361dad](https://gitea.deepak.science:2222/physics/deepdog/commit/5361dada8be4950b5157862f6a92254b543889c3))
### [1.0.1](https://gitea.deepak.science:2222/physics/deepdog/compare/1.0.0...1.0.1) (2024-05-02)
### Bug Fixes
* fixes issue of zero division error with no successes for anything ([e25db1e](https://gitea.deepak.science:2222/physics/deepdog/commit/e25db1e0f677e8d9a657fa1631305cc8f05ff9ff))
## [1.0.0](https://gitea.deepak.science:2222/physics/deepdog/compare/0.8.1...1.0.0) (2024-05-01)
### ⚠ BREAKING CHANGES
* allows new seed spec instead of cli arg, removes old cli arg
### Features
* adds additional file slug parsing ([2105754](https://gitea.deepak.science:2222/physics/deepdog/commit/2105754911c89bde9dcbea9866462225604a3524))
* Adds more powerful direct mc runs to sub for old real spectrum run ([f2b1a1d](https://gitea.deepak.science:2222/physics/deepdog/commit/f2b1a1dd3b3436e37d84f7843b9b2a202be4b51c))
* allows new seed spec instead of cli arg, removes old cli arg ([7108dd0](https://gitea.deepak.science:2222/physics/deepdog/commit/7108dd0111c7dfd6ec204df1d0058530cd3dcab9))
### Bug Fixes
* no longer throws error for overlapping keys, the warning should hopefully be enough? ([f3ba4cb](https://gitea.deepak.science:2222/physics/deepdog/commit/f3ba4cbfd36a9f08cdc4d8774a7f745f8c98bac3))
### [0.8.1](https://gitea.deepak.science:2222/physics/deepdog/compare/0.8.0...0.8.1) (2024-04-28)
### [0.8.1](https://gitea.deepak.science:2222/physics/deepdog/compare/0.8.0...0.8.1) (2024-04-28)
## [0.8.0](https://gitea.deepak.science:2222/physics/deepdog/compare/0.7.10...0.8.0) (2024-04-28)
### ⚠ BREAKING CHANGES
* fixes the spin qubit frequency phase shift calculation which had an index problem
### Bug Fixes
* fixes the spin qubit frequency phase shift calculation which had an index problem ([f9646e3](https://gitea.deepak.science:2222/physics/deepdog/commit/f9646e33868e1a0da8ab663230c0c692ac25bb74))
### [0.7.10](https://gitea.deepak.science:2222/physics/deepdog/compare/0.7.9...0.7.10) (2024-04-28)
### Features
* adds cli probs ([4b2e573](https://gitea.deepak.science:2222/physics/deepdog/commit/4b2e57371546731137b011461849bb849d4d4e0f))
* better management of cli wrapper ([b0ad4be](https://gitea.deepak.science:2222/physics/deepdog/commit/b0ad4bead0d4762eb7f848f6e557f6d9b61200b9))
### [0.7.9](https://gitea.deepak.science:2222/physics/deepdog/compare/0.7.8...0.7.9) (2024-04-21)
### Features
* adds ability to write custom dmc filters ([ea080ca](https://gitea.deepak.science:2222/physics/deepdog/commit/ea080ca1c7068042ce1e0a222d317f785a6b05f4))
* adds tarucha phase calculation, using spin qubit precession rate noise ([3ae0783](https://gitea.deepak.science:2222/physics/deepdog/commit/3ae0783d00cbe6a76439c1d671f2cff621d8d0a8))
### [0.7.8](https://gitea.deepak.science:2222/physics/deepdog/compare/0.7.7...0.7.8) (2024-02-29)
### Bug Fixes
* uses correct measurements ([5f534a6](https://gitea.deepak.science:2222/physics/deepdog/commit/5f534a60cc7c4838fcacee11a7e58b97d34e154a))
### [0.7.7](https://gitea.deepak.science:2222/physics/deepdog/compare/0.7.6...0.7.7) (2024-02-29)
### Bug Fixes
* fixes phase calculation issue with setting input array ([48e41cb](https://gitea.deepak.science:2222/physics/deepdog/commit/48e41cbd2c58d4c4d2747822d618d7d55257643d))
### [0.7.6](https://gitea.deepak.science:2222/physics/deepdog/compare/0.7.5...0.7.6) (2024-02-28)
### Features
* adds ability to use phase measurements only for correlations ([bb72e90](https://gitea.deepak.science:2222/physics/deepdog/commit/bb72e903d14704a3783daf2dbc1797b90880aa85))
### Bug Fixes
* fixes typeerror vs indexerror on bare float as cost in subset simulation ([65e1948](https://gitea.deepak.science:2222/physics/deepdog/commit/65e19488359d7f5656660da7da8f32ed474989c3))
### [0.7.5](https://gitea.deepak.science:2222/physics/deepdog/compare/0.7.4...0.7.5) (2023-12-09)
### Features
* adds direct monte carlo package ([1741807](https://gitea.deepak.science:2222/physics/deepdog/commit/1741807be43d08fb51bc94518dd3b67585c04c20))
* adds longchain logging if logging last generation ([b4e5f53](https://gitea.deepak.science:2222/physics/deepdog/commit/b4e5f5372682fc64c3734a96c4a899e018f127ce))
* allows disabling timestamp in subset simulation bayes results ([9a4548d](https://gitea.deepak.science:2222/physics/deepdog/commit/9a4548def45a01f1f518135d4237c3dc09dcc342))
### [0.7.4](https://gitea.deepak.science:2222/physics/deepdog/compare/0.7.3...0.7.4) (2023-07-27)
### Features
* adds configurable chunk size for the initial mc level 0 SS stage cost calculation to reduce memory usage ([9a7a3ff](https://gitea.deepak.science:2222/physics/deepdog/commit/9a7a3ff2c7ebe81d5e10647ce39844c372ff7b07))
* allows for deepdog bayesrun with ss to not print csv to make snapshot testing possible ([8e6ead4](https://gitea.deepak.science:2222/physics/deepdog/commit/8e6ead416c9eba56f568f648d0df44caaa510cfe))
### Bug Fixes
* fixes bug if case of clamping necessary ([161bcf4](https://gitea.deepak.science:2222/physics/deepdog/commit/161bcf42addf331661c3929073688b9f2c13502c))
* fixes bug with clamped probabilities being underestimated ([e6defc7](https://gitea.deepak.science:2222/physics/deepdog/commit/e6defc794871a48ac331023eb477bd235b78d6d0))
### [0.7.3](https://gitea.deepak.science:2222/physics/deepdog/compare/0.7.2...0.7.3) (2023-07-27)
### Features
* adds utility options and avoids memory leak ([598dad1](https://gitea.deepak.science:2222/physics/deepdog/commit/598dad1e6dc8fc0b7a5b4a90c8e17bf744e8d98c))
### [0.7.2](https://gitea.deepak.science:2222/physics/deepdog/compare/0.7.1...0.7.2) (2023-07-24)
### Features
* clamps results now ([9bb8fc5](https://gitea.deepak.science:2222/physics/deepdog/commit/9bb8fc50fe1bd1a285a333c5a396bfb6ac3176cf))
### Bug Fixes
* fixes clamping format etc. ([a170a3c](https://gitea.deepak.science:2222/physics/deepdog/commit/a170a3ce01adcec356e5aaab9abcc0ec4accd64b))
### [0.7.1](https://gitea.deepak.science:2222/physics/deepdog/compare/0.7.0...0.7.1) (2023-07-24)
### Features
* adds subset simulation stuff ([33cab9a](https://gitea.deepak.science:2222/physics/deepdog/commit/33cab9ab4179cec13ae9e591a8ffc32df4dda989))
## [0.7.0](https://gitea.deepak.science:2222/physics/deepdog/compare/0.6.7...0.7.0) (2023-05-01)
### ⚠ BREAKING CHANGES
* removes fastfilter parameter because it should never be needed
### Features
* adds pair capability to real spectrum run hopefully ([a089951](https://gitea.deepak.science:2222/physics/deepdog/commit/a089951bbefcd8a0b2efeb49b7a8090412cbb23d))
* removes fastfilter parameter because it should never be needed ([a015daf](https://gitea.deepak.science:2222/physics/deepdog/commit/a015daf5ff6fa5f6155c8d7e02981b588840a5b0))
### [0.6.7](https://gitea.deepak.science:2222/physics/deepdog/compare/0.6.6...0.6.7) (2023-04-14)
### Features
* adds option to cap core count for real spectrum run ([bf15f4a](https://gitea.deepak.science:2222/physics/deepdog/commit/bf15f4a7b7f59504983624e7d512ed7474372032))
* adds option to cap core count for temp aware run ([12903b2](https://gitea.deepak.science:2222/physics/deepdog/commit/12903b2540cefb040174d230bc0d04719a6dc1b7))
### Bug Fixes
* avoids redefinition of core count in loop ([1cf4454](https://gitea.deepak.science:2222/physics/deepdog/commit/1cf44541531541088198bd4599d467df3e1acbcf))
### [0.6.6](https://gitea.deepak.science:2222/physics/deepdog/compare/0.6.5...0.6.6) (2023-04-09)
### Bug Fixes
* removes bad logging in multiprocessing function ([8fd1b75](https://gitea.deepak.science:2222/physics/deepdog/commit/8fd1b75e1378301210bfa8f14dd09174bbd21414))
### [0.6.5](https://gitea.deepak.science:2222/physics/deepdog/compare/0.6.4...0.6.5) (2023-04-09)
### Features
* adds temp aware guy using new pdme temp-flexible feature for bundling temp models ([de1ec3e](https://gitea.deepak.science:2222/physics/deepdog/commit/de1ec3e70062d418e0d4c89716905cc9313d2e26))
### [0.6.4](https://gitea.deepak.science:2222/physics/deepdog/compare/0.6.3...0.6.4) (2022-08-13)
### Features
* Prints model names while running ([7ea1d71](https://gitea.deepak.science:2222/physics/deepdog/commit/7ea1d715f67e81c9fa841c5a62f1cc700ff7363d))
### [0.6.3](https://gitea.deepak.science:2222/physics/deepdog/compare/0.6.2...0.6.3) (2022-06-12)
### Features
* adds fast filter variant ([2c5c122](https://gitea.deepak.science:2222/physics/deepdog/commit/2c5c1228209e51d17253f07470e2f1e6dc6872d7))
* adds tester for fast filter real spectrum ([0a1a277](https://gitea.deepak.science:2222/physics/deepdog/commit/0a1a27759b0d4ab01da214b76ab14bf2b1fe00e3))
### [0.6.2](https://gitea.deepak.science:2222/physics/deepdog/compare/0.6.1...0.6.2) (2022-05-26)
### Features
* adds better import api for real data run ([d7e0f13](https://gitea.deepak.science:2222/physics/deepdog/commit/d7e0f13ca55197b24cb534c80f321ee76b9c4a40))
### [0.6.1](https://gitea.deepak.science:2222/physics/deepdog/compare/0.6.0...0.6.1) (2022-05-22)
### Features
* adds new runner for real spectra ([bd56f24](https://gitea.deepak.science:2222/physics/deepdog/commit/bd56f247748babb2ee1f2a1182d25aa968bff5a5))
## [0.6.0](https://gitea.deepak.science:2222/physics/deepdog/compare/0.5.0...0.6.0) (2022-05-22)
### ⚠ BREAKING CHANGES
* bayes run now handles multidipoles with changes to output file format etc.
* logs multiple dipoles better maybe
* switches over to pdme new stuff, uses models and scraps discretisations entirely
* removes alt_bayes bayes distinction, which was superfluous when only alt worked
### Features
* adds pdme 0.7.0 for multiprocessing ([874d876](https://gitea.deepak.science:2222/physics/deepdog/commit/874d876c9d774433b034d47c4cc0cdac41e6f2c7))
* bayes run now handles multidipoles with changes to output file format etc. ([5d0a7a4](https://gitea.deepak.science:2222/physics/deepdog/commit/5d0a7a4be09c58f8f8f859384f01d7912a98b8b9))
* logs multiple dipoles better maybe ([ae8977b](https://gitea.deepak.science:2222/physics/deepdog/commit/ae8977bb1e4d6cd71e88ea0876da8f4318e030b6))
* removes alt_bayes bayes distinction, which was superfluous when only alt worked ([101569d](https://gitea.deepak.science:2222/physics/deepdog/commit/101569d749e4f3f1842886aa2fd3321b8132278b))
* switches over to pdme new stuff, uses models and scraps discretisations entirely ([6e29f7a](https://gitea.deepak.science:2222/physics/deepdog/commit/6e29f7a702b578c266a42bba23ac973d155ada10))
* Uses multidipole for bayes run, with more verbose output ([df89776](https://gitea.deepak.science:2222/physics/deepdog/commit/df8977655de977fd3c4f7383dd9571e551eb1382))
### Bug Fixes
* another bug fix for csv generation ([b7da3d6](https://gitea.deepak.science:2222/physics/deepdog/commit/b7da3d61cc5c128cba1d2fcb3770b71b7f6fc4b8))
* fixes crash when dipole count is smaller than expected max during file write ([b5e0ecb](https://gitea.deepak.science:2222/physics/deepdog/commit/b5e0ecb52886b32d9055302eacfabb69338026b4))
* fixes format string in csv output for headers ([9afa209](https://gitea.deepak.science:2222/physics/deepdog/commit/9afa209864cdb9255988778e987fe05952848fd4))
* fixes random issue ([eec926a](https://gitea.deepak.science:2222/physics/deepdog/commit/eec926aaac654f78942b4c6b612e4d1cdcbf81dc))
* moves logging successes to after they've actually happened ([0caad05](https://gitea.deepak.science:2222/physics/deepdog/commit/0caad05e3cc6a9adba8bf937c3d2f944e1b096a3))
* now doesn't double randomise frequency ([23b202b](https://gitea.deepak.science:2222/physics/deepdog/commit/23b202beb81cb89f7f20b691e83116fa53764902))
* whoops deleted word multiprocessing ([31070b5](https://gitea.deepak.science:2222/physics/deepdog/commit/31070b5342c265d930b4c51402f42a3ee2415066))
## [0.5.0](https://gitea.deepak.science:2222/physics/deepdog/compare/0.4.0...0.5.0) (2022-04-30)
### ⚠ BREAKING CHANGES
* simulpairs now uses different rng calculator
### Features
* adds simulpairs run ([e9277c3](https://gitea.deepak.science:2222/physics/deepdog/commit/e9277c3da777359feb352c0b19f3bb029248ba2f))
* has better parallelisation ([edf0ba6](https://gitea.deepak.science:2222/physics/deepdog/commit/edf0ba6532c0588fce32341709cdb70e384b83f4))
* simulpairs now uses different rng calculator ([50dbc48](https://gitea.deepak.science:2222/physics/deepdog/commit/50dbc4835e60bace9e9b4ba37415f073a3c9e479))
### Bug Fixes
* better parallelisation hopefully ([42829c0](https://gitea.deepak.science:2222/physics/deepdog/commit/42829c0327e080e18be2fb75e746f6ac0d7c2f6d))
* Makes altbayessimulpairs available in package ([492a5e6](https://gitea.deepak.science:2222/physics/deepdog/commit/492a5e6681c85f95840e28cfd5d4ce4ca1d54eba))
* stronger names ([0954429](https://gitea.deepak.science:2222/physics/deepdog/commit/0954429e2d015a105ff16dfbb9e7a352bf53e5e9))
* Uses correct filename arg for passed in rng ([349341b](https://gitea.deepak.science:2222/physics/deepdog/commit/349341b405375a43b933f1fd7db4ee9fc501def3))
* uses correct filename for pairs guy ([4c06b39](https://gitea.deepak.science:2222/physics/deepdog/commit/4c06b3912c811c93c310b1d9e4c153f2014c4f8b))
## [0.4.0](https://gitea.deepak.science:2222/physics/deepdog/compare/0.3.5...0.4.0) (2022-04-10)
### ⚠ BREAKING CHANGES
* Adds pair calculations, with changing api format
### Features
* Adds dynamic cycle count increases to help reach minimum success count ([ec7b4ca](https://gitea.deepak.science:2222/physics/deepdog/commit/ec7b4cac393c15e94c513215c4f1ba32be2ae87a))
* Adds pair calculations, with changing api format ([6463b13](https://gitea.deepak.science:2222/physics/deepdog/commit/6463b135ef2d212b565864b5ac1b655e014d2194))
### Bug Fixes
* uses bigfix from pdme for negatives ([c1c711f](https://gitea.deepak.science:2222/physics/deepdog/commit/c1c711f47b574d3a9b8a24dbcbdd7f50b9be8ea9))
### [0.3.5](https://gitea.deepak.science:2222/physics/deepdog/compare/0.3.4...0.3.5) (2022-03-07)
### Features
* makes chunksize configurable ([88d9613](https://gitea.deepak.science:2222/physics/deepdog/commit/88d961313c1db0d49fd96939aa725a8706fa0412))
### [0.3.4](https://gitea.deepak.science:2222/physics/deepdog/compare/0.3.3...0.3.4) (2022-03-06)
### Features
* Changes chunksize for multiprocessing ([0784cd5](https://gitea.deepak.science:2222/physics/deepdog/commit/0784cd53d79e00684506604f094b5d820b3994d4))
### [0.3.3](https://gitea.deepak.science:2222/physics/deepdog/compare/0.3.2...0.3.3) (2022-03-06)
### Bug Fixes
* Fixes count to use cycles as well ([8617e4d](https://gitea.deepak.science:2222/physics/deepdog/commit/8617e4d2742b112cc824068150682ce3b2cdd879))
### [0.3.2](https://gitea.deepak.science:2222/physics/deepdog/compare/0.3.1...0.3.2) (2022-03-06)
### Features
* Adds monte carlo cycles to trade off space and cpu ([e6d8d33](https://gitea.deepak.science:2222/physics/deepdog/commit/e6d8d33c27e7922581e91c10de4f5faff2a51f8b))
### [0.3.1](https://gitea.deepak.science:2222/physics/deepdog/compare/v0.3.0...v0.3.1) (2022-03-06)
### Features
* Adds alt bayes solver with monte carlo sampler ([7284dbe](https://gitea.deepak.science:2222/physics/deepdog/commit/7284dbeb34ef46189d81fb719252dfa74b8e9fa8))
* Updates to pdme version for faster bayes resolution ([d078004](https://gitea.deepak.science:2222/physics/deepdog/commit/d078004773d9d9dccd0a9a52ca96aa57690f9b7e))

Jenkinsfile

@ -4,7 +4,7 @@ pipeline {
      label 'deepdog' // all your pods will be named with this prefix, followed by a unique id
      idleMinutes 5 // how long the pod will live after no jobs have run on it
      yamlFile 'jenkins/ci-agent-pod.yaml' // path to the pod definition relative to the root of our project
-     defaultContainer 'python' // define a default container if more than a few stages use it, will default to jnlp container
+     defaultContainer 'poetry' // define a default container if more than a few stages use it, will default to jnlp container
    }
  }
@ -12,36 +12,30 @@ pipeline {
    parallelsAlwaysFailFast()
  }
- environment {
-   POETRY_HOME="/opt/poetry"
-   POETRY_VERSION="1.1.12"
- }
  stages {
    stage('Build') {
      steps {
        echo 'Building...'
-       sh 'python --version'
-       sh 'curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py | python'
-       sh '${POETRY_HOME}/bin/poetry --version'
-       sh '${POETRY_HOME}/bin/poetry install'
+       sh 'poetry --version'
+       sh 'poetry install'
      }
    }
    stage('Test') {
      parallel{
        stage('pytest') {
          steps {
-           sh '${POETRY_HOME}/bin/poetry run pytest'
+           sh 'poetry run pytest'
          }
        }
        stage('lint') {
          steps {
-           sh '${POETRY_HOME}/bin/poetry run flake8 deepdog tests'
+           sh 'poetry run flake8 deepdog tests'
          }
        }
        stage('mypy') {
          steps {
-           sh '${POETRY_HOME}/bin/poetry run mypy deepdog'
+           sh 'poetry run mypy deepdog'
          }
        }
      }
@ -57,7 +51,7 @@ pipeline {
  }
  steps {
    echo 'Deploying...'
-   sh '${POETRY_HOME}/bin/poetry publish -u ${PYPI_USR} -p ${PYPI_PSW} --build'
+   sh 'poetry publish -u ${PYPI_USR} -p ${PYPI_PSW} --build'
  }
}

README.md

@ -1,3 +1,25 @@
  # deepdog
- The dipole diagnostic tool.
+ [![Conventional Commits](https://img.shields.io/badge/Conventional%20Commits-1.0.0-green.svg?style=flat-square)](https://conventionalcommits.org)
+ [![PyPI](https://img.shields.io/pypi/v/deepdog?style=flat-square)](https://pypi.org/project/deepdog/)
+ [![Jenkins](https://img.shields.io/jenkins/build?jobUrl=https%3A%2F%2Fjenkins.deepak.science%2Fjob%2Fgitea-physics%2Fjob%2Fdeepdog%2Fjob%2Fmaster&style=flat-square)](https://jenkins.deepak.science/job/gitea-physics/job/deepdog/job/master/)
+ ![Jenkins tests](https://img.shields.io/jenkins/tests?compact_message&jobUrl=https%3A%2F%2Fjenkins.deepak.science%2Fjob%2Fgitea-physics%2Fjob%2Fdeepdog%2Fjob%2Fmaster%2F&style=flat-square)
+ ![Jenkins Coverage](https://img.shields.io/jenkins/coverage/cobertura?jobUrl=https%3A%2F%2Fjenkins.deepak.science%2Fjob%2Fgitea-physics%2Fjob%2Fdeepdog%2Fjob%2Fmaster%2F&style=flat-square)
+ ![Maintenance](https://img.shields.io/maintenance/yes/2024?style=flat-square)
+ The DiPole DiaGnostic tool.
+ ## Getting started
+ `poetry install` to start locally
+ Commit using [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/), and when commits are on master, release with `just release`.
+ In general `just --list` has some of the useful stuff for figuring out what development tools there are.
+ Poetry as an installer is good; even better is using Nix (maybe with direnv to automatically pick up the `devShell` from `flake.nix`).
+ In either case `just` should handle actually calling things in a way that's agnostic to poetry as a runner or through nix.
+ ### local scripts
+ The `local_scripts` folder allows for scripts to be run using this code, but that probably isn't the most auditable for actual usage.
+ The API is still only something I'm using, so there are no guarantees yet that it will be stable; overall, semantic versioning should help with API breaks.

deepdog/__init__.py

@ -1,14 +1,18 @@
  import logging
  from deepdog.meta import __version__
- from deepdog.bayes_run import BayesRun
- from deepdog.diagnostic import Diagnostic
+ from deepdog.real_spectrum_run import RealSpectrumRun
+ from deepdog.temp_aware_real_spectrum_run import TempAwareRealSpectrumRun
  def get_version():
      return __version__
- __all__ = ["get_version", "BayesRun", "Diagnostic"]
+ __all__ = [
+     "get_version",
+     "RealSpectrumRun",
+     "TempAwareRealSpectrumRun",
+ ]
  logging.getLogger(__name__).addHandler(logging.NullHandler())
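
For orientation, a minimal sketch of what consuming the refreshed public interface looks like after this change (using only what `__all__` exports):

import deepdog

# get_version() reads the installed version via importlib.metadata (see deepdog/meta below)
print(deepdog.get_version())
# the run classes are now the exported entry points
run_class = deepdog.RealSpectrumRun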

deepdog/bayes_run.py (deleted)

@ -1,113 +0,0 @@
import pdme.model
from typing import Sequence, Tuple, List
import datetime
import itertools
import csv
import logging
import numpy
import scipy.optimize
import multiprocessing
# TODO: remove hardcode
COST_THRESHOLD = 1e-10
# TODO: It's garbage to have this here duplicated from pdme.
DotInput = Tuple[numpy.typing.ArrayLike, float]
_logger = logging.getLogger(__name__)
def get_a_result(discretisation, dots, index) -> Tuple[Tuple[int, ...], scipy.optimize.OptimizeResult]:
return (index, discretisation.solve_for_index(dots, index))
class BayesRun():
'''
A single Bayes run for a given set of dots.
Parameters
----------
dot_inputs : Sequence[DotInput]
The dot inputs for this bayes run.
discretisations_with_names : Sequence[Tuple(str, pdme.model.Model)]
The models to evaluate.
actual_model_discretisation : pdme.model.Discretisation
The discretisation for the model which is actually correct.
filename_slug : str
The filename slug to include.
run_count: int
The number of runs to do.
'''
def __init__(self, dot_inputs: Sequence[DotInput], discretisations_with_names: Sequence[Tuple[str, pdme.model.Discretisation]], actual_model: pdme.model.Model, filename_slug: str, run_count: int, max_frequency: float = None) -> None:
self.dot_inputs = dot_inputs
self.discretisations = [disc for (_, disc) in discretisations_with_names]
self.model_names = [name for (name, _) in discretisations_with_names]
self.actual_model = actual_model
self.model_count = len(self.discretisations)
self.run_count = run_count
self.csv_fields = ["dipole_moment", "dipole_location", "dipole_frequency"]
self.compensate_zeros = True
for name in self.model_names:
self.csv_fields.extend([f"{name}_success", f"{name}_count", f"{name}_prob"])
self.probabilities = [1 / self.model_count] * self.model_count
timestamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
self.filename = f"{timestamp}-{filename_slug}.csv"
self.max_frequency = max_frequency
def go(self) -> None:
with open(self.filename, "a", newline="") as outfile:
writer = csv.DictWriter(outfile, fieldnames=self.csv_fields, dialect="unix")
writer.writeheader()
for run in range(1, self.run_count + 1):
frequency: float = run
if self.max_frequency is not None and self.max_frequency > 1:
rng = numpy.random.default_rng()
frequency = rng.uniform(1, self.max_frequency)
dipoles = self.actual_model.get_dipoles(frequency)
dots = dipoles.get_dot_measurements(self.dot_inputs)
_logger.info(f"Going to work on dipole at {dipoles.dipoles}")
results = []
_logger.debug("Going to iterate over discretisations now")
for disc_count, discretisation in enumerate(self.discretisations):
_logger.debug(f"Doing discretisation #{disc_count}")
with multiprocessing.Pool(multiprocessing.cpu_count() - 1 or 1) as pool:
results.append(pool.starmap(get_a_result, zip(itertools.repeat(discretisation), itertools.repeat(dots), discretisation.all_indices())))
_logger.debug("Done, constructing output now")
row = {
"dipole_moment": dipoles.dipoles[0].p,
"dipole_location": dipoles.dipoles[0].s,
"dipole_frequency": dipoles.dipoles[0].w
}
successes: List[int] = []
for model_index, (name, result) in enumerate(zip(self.model_names, results)):
count = 0
success = 0
for idx, val in result:
count += 1
if val.success and val.cost <= COST_THRESHOLD:
success += 1
row[f"{name}_success"] = success
row[f"{name}_count"] = count
successes.append(max(success, 1))
success_weight = sum([succ * prob for succ, prob in zip(successes, self.probabilities)])
new_probabilities = [succ * old_prob / success_weight for succ, old_prob in zip(successes, self.probabilities)]
self.probabilities = new_probabilities
for name, probability in zip(self.model_names, self.probabilities):
row[f"{name}_prob"] = probability
_logger.info(row)
with open(self.filename, "a", newline="") as outfile:
writer = csv.DictWriter(outfile, fieldnames=self.csv_fields, dialect="unix")
writer.writerow(row)

deepdog/cli/__init__.py (new, empty)

deepdog/cli/probs/__init__.py

@ -0,0 +1,5 @@
from deepdog.cli.probs.main import wrapped_main
__all__ = [
"wrapped_main",
]

deepdog/cli/probs/args.py

@ -0,0 +1,51 @@
import argparse
import os
def parse_args() -> argparse.Namespace:
def dir_path(path):
if os.path.isdir(path):
return path
else:
raise argparse.ArgumentTypeError(f"readable_dir:{path} is not a valid path")
parser = argparse.ArgumentParser(
"probs", description="Calculating probability from finished bayesrun"
)
parser.add_argument(
"--log-file",
type=str,
help="A filename for logging to, if not provided will only log to stderr",
default=None,
)
parser.add_argument(
"--bayesrun-directory",
"-d",
type=dir_path,
help="The directory to search for bayesrun files, defaulting to cwd if not passed",
default=".",
)
parser.add_argument(
"--indexify-json",
help="A json file with the indexify config for parsing job indexes. Will skip if not present",
default="",
)
parser.add_argument(
"--coalesced-keys",
type=str,
help="A comma separated list of strings over which to coalesce data. By default coalesce over all fields within model names, ignore file level names",
default="",
)
parser.add_argument(
"--uncoalesced-outfile",
type=str,
help="output filename for uncoalesced data. If not provided, will not be written",
default=None,
)
parser.add_argument(
"--coalesced-outfile",
type=str,
help="output filename for coalesced data. If not provided, will not be written",
default=None,
)
return parser.parse_args()

deepdog/cli/probs/dicts.py

@ -0,0 +1,178 @@
import typing
from deepdog.results import BayesrunOutput
import logging
import csv
import tqdm
_logger = logging.getLogger(__name__)
def build_model_dict(
bayes_outputs: typing.Sequence[BayesrunOutput],
) -> typing.Dict[
typing.Tuple, typing.Dict[typing.Tuple, typing.Dict["str", typing.Any]]
]:
"""
Maybe someday do something smarter with the coalescing and stuff, but don't want to so I won't
"""
# assume that everything is well formatted and the keys are the same across entire list and initialise list of keys.
# model dict will contain a model_key: {calculation_dict} where each calculation_dict represents a single calculation for that model,
# the uncoalesced version, keyed by the specific file keys
model_dict: typing.Dict[
typing.Tuple, typing.Dict[typing.Tuple, typing.Dict["str", typing.Any]]
] = {}
_logger.info("building model dict")
for out in tqdm.tqdm(bayes_outputs, desc="reading outputs", leave=False):
for model_result in out.results:
model_key = tuple(v for v in model_result.parsed_model_keys.values())
if model_key not in model_dict:
model_dict[model_key] = {}
calculation_dict = model_dict[model_key]
calculation_key = tuple(v for v in out.data.values())
if calculation_key not in calculation_dict:
calculation_dict[calculation_key] = {
"_model_key_dict": model_result.parsed_model_keys,
"_calculation_key_dict": out.data,
"success": model_result.success,
"count": model_result.count,
}
else:
raise ValueError(
f"Got {calculation_key} twice for model_key {model_key}"
)
return model_dict
def write_uncoalesced_dict(
uncoalesced_output_filename: typing.Optional[str],
uncoalesced_model_dict: typing.Dict[
typing.Tuple, typing.Dict[typing.Tuple, typing.Dict["str", typing.Any]]
],
):
if uncoalesced_output_filename is None or uncoalesced_output_filename == "":
_logger.warning("Not provided a uncoalesced filename, not going to try")
return
first_value = next(iter(next(iter(uncoalesced_model_dict.values())).values()))
model_field_names = set(first_value["_model_key_dict"].keys())
calculation_field_names = set(first_value["_calculation_key_dict"].keys())
if not (set(model_field_names).isdisjoint(calculation_field_names)):
_logger.info(f"Detected model field names {model_field_names}")
_logger.info(f"Detected calculation field names {calculation_field_names}")
_logger.warning(
f"model field names {model_field_names} and calculation {calculation_field_names} have an overlap, which is possibly a problem"
)
collected_fieldnames = list(model_field_names)
collected_fieldnames.extend(calculation_field_names)
collected_fieldnames.extend(["success", "count"])
_logger.info(f"Full uncoalesced fieldnames are {collected_fieldnames}")
with open(uncoalesced_output_filename, "w", newline="") as uncoalesced_output_file:
writer = csv.DictWriter(
uncoalesced_output_file, fieldnames=collected_fieldnames
)
writer.writeheader()
for model_dict in uncoalesced_model_dict.values():
for calculation in model_dict.values():
row = calculation["_model_key_dict"].copy()
row.update(calculation["_calculation_key_dict"].copy())
row.update(
{
"success": calculation["success"],
"count": calculation["count"],
}
)
writer.writerow(row)
def coalesced_dict(
uncoalesced_model_dict: typing.Dict[
typing.Tuple, typing.Dict[typing.Tuple, typing.Dict["str", typing.Any]]
],
minimum_count: float = 0.1,
):
"""
pass in uncoalesced dict
the minimum_count field is what we use to make sure our probs are never zero
"""
coalesced_dict = {}
# we are already iterating anyway, and performance really doesn't matter here, so count the keys ourselves
num_keys = 0
# first pass coalesce
for model_key, model_dict in uncoalesced_model_dict.items():
num_keys += 1
for calculation in model_dict.values():
if model_key not in coalesced_dict:
coalesced_dict[model_key] = {
"_model_key_dict": calculation["_model_key_dict"].copy(),
"calculations_coalesced": 0,
"count": 0,
"success": 0,
}
sub_dict = coalesced_dict[model_key]
sub_dict["calculations_coalesced"] += 1
sub_dict["count"] += calculation["count"]
sub_dict["success"] += calculation["success"]
# second pass do probability calculation
prior = 1 / num_keys
_logger.info(f"Got {num_keys} model keys, so our prior will be {prior}")
total_weight = 0
for coalesced_model_dict in coalesced_dict.values():
model_weight = (
max(minimum_count, coalesced_model_dict["success"])
/ coalesced_model_dict["count"]
) * prior
total_weight += model_weight
total_prob = 0
for coalesced_model_dict in coalesced_dict.values():
model_weight = (
max(minimum_count, coalesced_model_dict["success"])
/ coalesced_model_dict["count"]
)
prob = model_weight * prior / total_weight
coalesced_model_dict["prob"] = prob
total_prob += prob
_logger.debug(
f"Got a total probability of {total_prob}, which should be close to 1 up to float/rounding error"
)
return coalesced_dict
def write_coalesced_dict(
coalesced_output_filename: typing.Optional[str],
coalesced_model_dict: typing.Dict[typing.Tuple, typing.Dict["str", typing.Any]],
):
if coalesced_output_filename is None or coalesced_output_filename == "":
_logger.warning("Not provided a uncoalesced filename, not going to try")
return
first_value = next(iter(coalesced_model_dict.values()))
model_field_names = set(first_value["_model_key_dict"].keys())
_logger.info(f"Detected model field names {model_field_names}")
collected_fieldnames = list(model_field_names)
collected_fieldnames.extend(["calculations_coalesced", "success", "count", "prob"])
with open(coalesced_output_filename, "w", newline="") as coalesced_output_file:
writer = csv.DictWriter(coalesced_output_file, fieldnames=collected_fieldnames)
writer.writeheader()
for model_dict in coalesced_model_dict.values():
row = model_dict["_model_key_dict"].copy()
row.update(
{
"calculations_coalesced": model_dict["calculations_coalesced"],
"success": model_dict["success"],
"count": model_dict["count"],
"prob": model_dict["prob"],
}
)
writer.writerow(row)
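
As a sanity check of the arithmetic in coalesced_dict, here is a standalone sketch with invented success counts; the minimum_count floor is what keeps a zero-success model at a small but nonzero probability:

# hypothetical (success, count) totals per model, numbers invented
totals = {"model_a": (9, 1000), "model_b": (0, 1000), "model_c": (1, 1000)}
minimum_count = 0.1
prior = 1 / len(totals)
weights = {
    name: (max(minimum_count, success) / count) * prior
    for name, (success, count) in totals.items()
}
total_weight = sum(weights.values())
probs = {name: weight / total_weight for name, weight in weights.items()}
print(probs)  # proportional to 9 : 0.1 : 1 and sums to 1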

deepdog/cli/probs/main.py

@ -0,0 +1,100 @@
import logging
import typing
import argparse
import json
import deepdog.cli.probs.args
import deepdog.cli.probs.dicts
import deepdog.results
import deepdog.indexify
import pathlib
import tqdm
import tqdm.contrib.logging
_logger = logging.getLogger(__name__)
def set_up_logging(log_file: typing.Optional[str]):
log_pattern = "%(asctime)s | %(levelname)-7s | %(name)s:%(lineno)d | %(message)s"
if log_file is None:
handlers = [
logging.StreamHandler(),
]
else:
handlers = [logging.StreamHandler(), logging.FileHandler(log_file)]
logging.basicConfig(
level=logging.DEBUG,
format=log_pattern,
# it's okay to ignore this mypy error because who cares about logger handler types
handlers=handlers, # type: ignore
)
logging.captureWarnings(True)
def main(args: argparse.Namespace):
"""
Main function taking already-parsed arguments, with no additional logging setup, in case we want to extract it out later
"""
with tqdm.contrib.logging.logging_redirect_tqdm():
_logger.info(f"args: {args}")
try:
if args.coalesced_keys:
raise NotImplementedError(
"Currently not supporting coalesced keys, but maybe in future"
)
except AttributeError:
# we don't care if this is missing because we don't actually want it to be there
pass
indexifier = None
if args.indexify_json:
with open(args.indexify_json, "r") as indexify_json_file:
indexify_spec = json.load(indexify_json_file)
indexify_data = indexify_spec["indexes"]
if "seed_spec" in indexify_spec:
seed_spec = indexify_spec["seed_spec"]
indexify_data[seed_spec["field_name"]] = list(
range(seed_spec["num_seeds"])
)
# _logger.debug(f"Indexifier data looks like {indexify_data}")
indexifier = deepdog.indexify.Indexifier(indexify_data)
bayes_dir = pathlib.Path(args.bayesrun_directory)
out_files = [f for f in bayes_dir.iterdir() if f.name.endswith("bayesrun.csv")]
_logger.info(
f"Reading {len(out_files)} bayesrun.csv files in directory {args.bayesrun_directory}"
)
# _logger.info(out_files)
parsed_output_files = [
deepdog.results.read_output_file(f, indexifier)
for f in tqdm.tqdm(out_files, desc="reading files", leave=False)
]
# Refactor here to allow for arbitrary likelihood file sources
_logger.info("building uncoalesced dict")
uncoalesced_dict = deepdog.cli.probs.dicts.build_model_dict(parsed_output_files)
if "uncoalesced_outfile" in args and args.uncoalesced_outfile:
deepdog.cli.probs.dicts.write_uncoalesced_dict(
args.uncoalesced_outfile, uncoalesced_dict
)
else:
_logger.info("Skipping writing uncoalesced")
_logger.info("building coalesced dict")
coalesced = deepdog.cli.probs.dicts.coalesced_dict(uncoalesced_dict)
if "coalesced_outfile" in args and args.coalesced_outfile:
deepdog.cli.probs.dicts.write_coalesced_dict(
args.coalesced_outfile, coalesced
)
else:
_logger.info("Skipping writing coalesced")
def wrapped_main():
args = deepdog.cli.probs.args.parse_args()
set_up_logging(args.log_file)
main(args)
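
The indexify spec consumed above is plain JSON; a hypothetical example of its shape (field names invented for illustration), shown as the dict json.load would produce:

# "indexes" maps job fields to their possible values; the optional "seed_spec"
# expands into indexify_data[field_name] = list(range(num_seeds))
indexify_spec = {
    "indexes": {"dipole_count": [1, 2, 5], "log_magnitude": [2, 3]},
    "seed_spec": {"field_name": "seed", "num_seeds": 100},
}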

deepdog/cli/subset_sim_probs/__init__.py

@ -0,0 +1,5 @@
from deepdog.cli.subset_sim_probs.main import wrapped_main
__all__ = [
"wrapped_main",
]

deepdog/cli/subset_sim_probs/args.py

@ -0,0 +1,52 @@
import argparse
import os
def parse_args() -> argparse.Namespace:
def dir_path(path):
if os.path.isdir(path):
return path
else:
raise argparse.ArgumentTypeError(f"readable_dir:{path} is not a valid path")
parser = argparse.ArgumentParser(
"subset_sim_probs",
description="Calculating probability from finished subset sim run",
)
parser.add_argument(
"--log-file",
type=str,
help="A filename for logging to, if not provided will only log to stderr",
default=None,
)
parser.add_argument(
"--results-directory",
"-d",
type=dir_path,
help="The directory to search for bayesrun files, defaulting to cwd if not passed",
default=".",
)
parser.add_argument(
"--indexify-json",
help="A json file with the indexify config for parsing job indexes. Will skip if not present",
default="",
)
parser.add_argument(
"--outfile",
"-o",
type=str,
help="output filename for coalesced data. If not provided, will not be written",
default=None,
)
confirm_outfile_overwrite_group = parser.add_mutually_exclusive_group()
confirm_outfile_overwrite_group.add_argument(
"--never-overwrite-outfile",
action="store_true",
help="If a duplicate outfile is detected, skip confirmation and automatically exit early",
)
confirm_outfile_overwrite_group.add_argument(
"--force-overwrite-outfile",
action="store_true",
help="Skips checking for duplicate outfiles and overwrites",
)
return parser.parse_args()

deepdog/cli/subset_sim_probs/dicts.py

@ -0,0 +1,136 @@
import typing
from deepdog.results import GeneralOutput
import logging
import csv
import tqdm
_logger = logging.getLogger(__name__)
def build_model_dict(
general_outputs: typing.Sequence[GeneralOutput],
) -> typing.Dict[
typing.Tuple, typing.Dict[typing.Tuple, typing.Dict["str", typing.Any]]
]:
"""
Maybe someday do something smarter with the coalescing and stuff, but don't want to so I won't
"""
# assume that everything is well formatted and the keys are the same across entire list and initialise list of keys.
# model dict will contain a model_key: {calculation_dict} where each calculation_dict represents a single calculation for that model,
# the uncoalesced version, keyed by the specific file keys
model_dict: typing.Dict[
typing.Tuple, typing.Dict[typing.Tuple, typing.Dict["str", typing.Any]]
] = {}
_logger.info("building model dict")
for out in tqdm.tqdm(general_outputs, desc="reading outputs", leave=False):
for model_result in out.results:
model_key = tuple(v for v in model_result.parsed_model_keys.values())
if model_key not in model_dict:
model_dict[model_key] = {}
calculation_dict = model_dict[model_key]
calculation_key = tuple(v for v in out.data.values())
if calculation_key not in calculation_dict:
calculation_dict[calculation_key] = {
"_model_key_dict": model_result.parsed_model_keys,
"_calculation_key_dict": out.data,
"num_finished_runs": int(
model_result.result_dict["num_finished_runs"]
),
"num_runs": int(model_result.result_dict["num_runs"]),
"estimated_likelihood": float(
model_result.result_dict["estimated_likelihood"]
),
}
else:
raise ValueError(
f"Got {calculation_key} twice for model_key {model_key}"
)
return model_dict
def coalesced_dict(
uncoalesced_model_dict: typing.Dict[
typing.Tuple, typing.Dict[typing.Tuple, typing.Dict["str", typing.Any]]
],
):
"""
pass in uncoalesced dict
here each model's weight comes straight from its estimated likelihood, so no minimum_count floor is needed
"""
coalesced_dict = {}
# we are already iterating anyway, and performance really doesn't matter here, so count the keys ourselves
num_keys = 0
# first pass coalesce
for model_key, model_dict in uncoalesced_model_dict.items():
num_keys += 1
for calculation in model_dict.values():
if model_key not in coalesced_dict:
coalesced_dict[model_key] = {
"_model_key_dict": calculation["_model_key_dict"].copy(),
"calculations_coalesced": 1,
"num_finished_runs": calculation["num_finished_runs"],
"num_runs": calculation["num_runs"],
"estimated_likelihood": calculation["estimated_likelihood"],
}
else:
_logger.error(f"We shouldn't be here! Double key for {model_key=}")
raise ValueError()
# second pass do probability calculation
prior = 1 / num_keys
_logger.info(f"Got {num_keys} model keys, so our prior will be {prior}")
total_weight = 0
for coalesced_model_dict in coalesced_dict.values():
model_weight = coalesced_model_dict["estimated_likelihood"] * prior
total_weight += model_weight
total_prob = 0
for coalesced_model_dict in coalesced_dict.values():
likelihood = coalesced_model_dict["estimated_likelihood"]
prob = likelihood * prior / total_weight
coalesced_model_dict["prob"] = prob
total_prob += prob
_logger.debug(
f"Got a total probability of {total_prob}, which should be close to 1 up to float/rounding error"
)
return coalesced_dict
def write_coalesced_dict(
coalesced_output_filename: typing.Optional[str],
coalesced_model_dict: typing.Dict[typing.Tuple, typing.Dict["str", typing.Any]],
):
if coalesced_output_filename is None or coalesced_output_filename == "":
_logger.warning("Not provided a uncoalesced filename, not going to try")
return
first_value = next(iter(coalesced_model_dict.values()))
model_field_names = set(first_value["_model_key_dict"].keys())
_logger.info(f"Detected model field names {model_field_names}")
collected_fieldnames = list(model_field_names)
collected_fieldnames.extend(
["calculations_coalesced", "num_finished_runs", "num_runs", "prob"]
)
with open(coalesced_output_filename, "w", newline="") as coalesced_output_file:
writer = csv.DictWriter(coalesced_output_file, fieldnames=collected_fieldnames)
writer.writeheader()
for model_dict in coalesced_model_dict.values():
row = model_dict["_model_key_dict"].copy()
row.update(
{
"calculations_coalesced": model_dict["calculations_coalesced"],
"num_finished_runs": model_dict["num_finished_runs"],
"num_runs": model_dict["num_runs"],
"prob": model_dict["prob"],
}
)
writer.writerow(row)
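
The probability pass here mirrors the probs command, but weights each model by its estimated likelihood rather than a floored success ratio; a toy check of the normalisation with invented numbers:

likelihoods = {"model_a": 1e-3, "model_b": 4e-3}  # invented estimated likelihoods
prior = 1 / len(likelihoods)
total_weight = sum(lh * prior for lh in likelihoods.values())
probs = {name: lh * prior / total_weight for name, lh in likelihoods.items()}
print(probs)  # {'model_a': 0.2, 'model_b': 0.8}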

deepdog/cli/subset_sim_probs/main.py

@ -0,0 +1,113 @@
import logging
import typing
import argparse
import json
import deepdog.cli.subset_sim_probs.args
import deepdog.cli.subset_sim_probs.dicts
import deepdog.cli.util
import deepdog.results
import deepdog.indexify
import pathlib
import tqdm
import os
import tqdm.contrib.logging
_logger = logging.getLogger(__name__)
def set_up_logging(log_file: typing.Optional[str]):
log_pattern = "%(asctime)s | %(levelname)-7s | %(name)s:%(lineno)d | %(message)s"
if log_file is None:
handlers = [
logging.StreamHandler(),
]
else:
handlers = [logging.StreamHandler(), logging.FileHandler(log_file)]
logging.basicConfig(
level=logging.DEBUG,
format=log_pattern,
# it's okay to ignore this mypy error because who cares about logger handler types
handlers=handlers, # type: ignore
)
logging.captureWarnings(True)
def main(args: argparse.Namespace):
"""
Main function taking already-parsed arguments, with no additional logging setup, in case we want to extract it out later
"""
with tqdm.contrib.logging.logging_redirect_tqdm():
_logger.info(f"args: {args}")
if "outfile" in args and args.outfile:
if os.path.exists(args.outfile):
if args.never_overwrite_outfile:
_logger.warning(
f"Filename {args.outfile} already exists, and never want overwrite, so aborting."
)
return
elif args.force_overwrite_outfile:
_logger.warning(f"Forcing overwrite of {args.outfile}")
else:
# need to confirm
confirm_overwrite = deepdog.cli.util.confirm_prompt(
f"Filename {args.outfile} exists, overwrite?"
)
if not confirm_overwrite:
_logger.warning(
f"Filename {args.outfile} already exists and do not want overwrite, aborting."
)
return
else:
_logger.warning(f"Overwriting file {args.outfile}")
indexifier = None
if args.indexify_json:
with open(args.indexify_json, "r") as indexify_json_file:
indexify_spec = json.load(indexify_json_file)
indexify_data = indexify_spec["indexes"]
if "seed_spec" in indexify_spec:
seed_spec = indexify_spec["seed_spec"]
indexify_data[seed_spec["field_name"]] = list(
range(seed_spec["num_seeds"])
)
# _logger.debug(f"Indexifier data looks like {indexify_data}")
indexifier = deepdog.indexify.Indexifier(indexify_data)
results_dir = pathlib.Path(args.results_directory)
out_files = [
f for f in results_dir.iterdir() if f.name.endswith("subsetsim.csv")
]
_logger.info(
f"Reading {len(out_files)} subsetsim.csv files in directory {args.results_directory}"
)
# _logger.info(out_files)
parsed_output_files = [
deepdog.results.read_subset_sim_file(f, indexifier)
for f in tqdm.tqdm(out_files, desc="reading files", leave=False)
]
# Refactor here to allow for arbitrary likelihood file sources
_logger.info("building uncoalesced dict")
uncoalesced_dict = deepdog.cli.subset_sim_probs.dicts.build_model_dict(
parsed_output_files
)
_logger.info("building coalesced dict")
coalesced = deepdog.cli.subset_sim_probs.dicts.coalesced_dict(uncoalesced_dict)
if "outfile" in args and args.outfile:
deepdog.cli.subset_sim_probs.dicts.write_coalesced_dict(
args.outfile, coalesced
)
else:
_logger.info("Skipping writing coalesced")
def wrapped_main():
args = deepdog.cli.subset_sim_probs.args.parse_args()
set_up_logging(args.log_file)
main(args)

deepdog/cli/util/__init__.py

@ -0,0 +1,3 @@
from deepdog.cli.util.confirm import confirm_prompt
__all__ = ["confirm_prompt"]

deepdog/cli/util/confirm.py

@ -0,0 +1,23 @@
_RESPONSE_MAP = {
"yes": True,
"ye": True,
"y": True,
"no": False,
"n": False,
"nope": False,
"true": True,
"false": False,
}
def confirm_prompt(question: str) -> bool:
"""Prompt with the question and returns yes or no based on response."""
prompt = question + " [y/n]: "
while True:
choice = input(prompt).lower()
if choice in _RESPONSE_MAP:
return _RESPONSE_MAP[choice]
else:
print('Respond with "yes" or "no"')
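
A quick usage sketch, matching how subset_sim_probs wires this up below:

from deepdog.cli.util import confirm_prompt

# loops until input matches a key of _RESPONSE_MAP ("y", "no", "true", ...)
if confirm_prompt("Filename out.csv exists, overwrite?"):
    print("overwriting")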

deepdog/diagnostic.py (deleted)

@ -1,80 +0,0 @@
from pdme.measurement import OscillatingDipole, OscillatingDipoleArrangement
import pdme
from deepdog.bayes_run import DotInput
import datetime
import numpy
import logging
from typing import Sequence, Tuple
import csv
import itertools
import multiprocessing
_logger = logging.getLogger(__name__)
def get_a_result(discretisation, dots, index):
return (index, discretisation.solve_for_index(dots, index))
class Diagnostic():
'''
Represents a diagnostic for a single dipole moment given a set of discretisations.
Parameters
----------
dot_inputs : Sequence[DotInput]
The dot inputs for this diagnostic.
discretisations_with_names : Sequence[Tuple(str, pdme.model.Model)]
The models to evaluate.
actual_model_discretisation : pdme.model.Discretisation
The discretisation for the model which is actually correct.
filename_slug : str
The filename slug to include.
run_count: int
The number of runs to do.
'''
def __init__(self, actual_dipole_moment: numpy.ndarray, actual_dipole_position: numpy.ndarray, actual_dipole_frequency: float, dot_inputs: Sequence[DotInput], discretisations_with_names: Sequence[Tuple[str, pdme.model.Discretisation]], filename_slug: str) -> None:
self.dipoles = OscillatingDipoleArrangement([OscillatingDipole(actual_dipole_moment, actual_dipole_position, actual_dipole_frequency)])
self.dots = self.dipoles.get_dot_measurements(dot_inputs)
self.discretisations_with_names = discretisations_with_names
self.model_count = len(self.discretisations_with_names)
self.csv_fields = ["model", "index", "bounds", "actual_dipole_moment", "actual_dipole_position", "actual_dipole_freq", "success", "result"]
timestamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
self.filename = f"{timestamp}-{filename_slug}.csv"
def go(self):
with open(self.filename, "a", newline="") as outfile:
# csv fields
writer = csv.DictWriter(outfile, fieldnames=self.csv_fields, dialect='unix')
writer.writeheader()
for (name, discretisation) in self.discretisations_with_names:
_logger.info(f"Working on discretisation {name}")
results = []
with multiprocessing.Pool(multiprocessing.cpu_count() - 1 or 1) as pool:
results = pool.starmap(get_a_result, zip(itertools.repeat(discretisation), itertools.repeat(self.dots), discretisation.all_indices()))
with open(self.filename, "a", newline='') as outfile:
writer = csv.DictWriter(outfile, fieldnames=self.csv_fields, dialect='unix')
for idx, result in results:
bounds = discretisation.bounds(idx)
actual_success = result.success and result.cost <= 1e-10
row = {
"model": name,
"index": idx,
"bounds": bounds,
"actual_dipole_moment": self.dipoles.dipoles[0].p,
"actual_dipole_position": self.dipoles.dipoles[0].s,
"actual_dipole_freq": self.dipoles.dipoles[0].w,
"success": actual_success,
"result": result.normalised_x if actual_success else None,
}
_logger.debug(f"Writing result {row}")
writer.writerow(row)

deepdog/direct_monte_carlo/__init__.py

@ -0,0 +1,6 @@
from deepdog.direct_monte_carlo.direct_mc import (
DirectMonteCarloRun,
DirectMonteCarloConfig,
)
__all__ = ["DirectMonteCarloRun", "DirectMonteCarloConfig"]


@ -0,0 +1,14 @@
from typing import Sequence
from deepdog.direct_monte_carlo.direct_mc import DirectMonteCarloFilter
import numpy
class ComposedDMCFilter(DirectMonteCarloFilter):
def __init__(self, filters: Sequence[DirectMonteCarloFilter]):
self.filters = filters
def filter_samples(self, samples: numpy.ndarray) -> numpy.ndarray:
current_sample = samples
for filter in self.filters:
current_sample = filter.filter_samples(current_sample)
return current_sample


@ -0,0 +1,24 @@
from deepdog.direct_monte_carlo.direct_mc import DirectMonteCarloFilter
from typing import Callable
import numpy
class CostFunctionTargetFilter(DirectMonteCarloFilter):
def __init__(
self,
cost_function: Callable[[numpy.ndarray], numpy.ndarray],
target_cost: float,
):
"""
Filters dipoles by cost, only leaving dipoles with cost below target_cost
"""
self.cost_function = cost_function
self.target_cost = target_cost
def filter_samples(self, samples: numpy.ndarray) -> numpy.ndarray:
current_sample = samples
costs = self.cost_function(current_sample)
current_sample = current_sample[costs < self.target_cost]
return current_sample
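
A toy illustration of chaining these two filters; the cost function is invented for the example (real ones would wrap pdme calculations), and imports of the two classes above are elided since their module paths aren't shown in this diff:

import numpy

def toy_cost(samples: numpy.ndarray) -> numpy.ndarray:
    # invented cost: distance of each sample's first coordinate from zero
    return numpy.abs(samples[:, 0])

composed = ComposedDMCFilter([CostFunctionTargetFilter(toy_cost, target_cost=0.5)])
samples = numpy.random.default_rng(1234).normal(size=(1000, 3))
kept = composed.filter_samples(samples)  # only samples with cost below 0.5 survive
print(f"{len(kept)} of {len(samples)} samples kept")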

deepdog/direct_monte_carlo/direct_mc.py

@ -0,0 +1,435 @@
import re
import pathlib
import csv
import pdme.model
import pdme.measurement
import pdme.measurement.input_types
import pdme.subspace_simulation
import datetime
from typing import Tuple, Dict, NewType, Any, Sequence
from dataclasses import dataclass
import logging
import numpy
import numpy.random
import pdme.util.fast_v_calc
import multiprocessing
_logger = logging.getLogger(__name__)
ANTI_ZERO_SUCCESS_THRES = 0.1
@dataclass
class DirectMonteCarloResult:
successes: int
monte_carlo_count: int
likelihood: float
model_name: str
@dataclass
class DirectMonteCarloConfig:
monte_carlo_count_per_cycle: int = 10000
monte_carlo_cycles: int = 10
target_success: int = 100
max_monte_carlo_cycles_steps: int = 10
monte_carlo_seed: int = 1234
write_successes_to_file: bool = False
tag: str = ""
cap_core_count: int = 0 # 0 means cap at num cores - 1
chunk_size: int = 50
# chunk size handed to multiprocessing.Pool.imap_unordered in execute()
write_bayesrun_file: bool = True
bayesrun_file_timestamp: bool = True
skip_if_exists: bool = False
def get_filename(self) -> str:
"""
Generate a filename for the output of this run.
"""
# set starting execution timestamp
timestamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
if self.bayesrun_file_timestamp:
timestamp_str = f"{timestamp}-"
else:
timestamp_str = ""
filename = f"{timestamp_str}{self.tag}.realdata.fast_filter.bayesrun.csv"
_logger.debug(f"Got filename {filename}")
return filename
def get_filename_regex(self) -> str:
"""
Generate a regex for the output of this run.
"""
# having both timestamp and the hyphen separately optional is a bit of a hack
# too loose, but will never matter
pattern = rf"(?P<timestamp>\d{{8}}-\d{{6}})?-?{self.tag}\.realdata\.fast_filter\.bayesrun\.csv"
return pattern
# Aliasing dict as a generic data container
DirectMonteCarloData = NewType("DirectMonteCarloData", Dict[str, Any])
class DirectMonteCarloFilter:
"""
Abstract class for filtering out samples matching some criteria. Initialise with data as needed,
then filter out samples as needed.
"""
def filter_samples(self, samples: numpy.ndarray) -> numpy.ndarray:
raise NotImplementedError
class DirectMonteCarloRun:
"""
A direct Monte Carlo run over a set of models; execute() parallelises cycles with multiprocessing, and execute_no_multiprocessing() provides a single-threaded variant.
An encapsulation of the steps needed for a Bayes run.
Parameters
----------
model_name_pairs : Sequence[Tuple(str, pdme.model.DipoleModel)]
The models to evaluate, with names
filter : DirectMonteCarloFilter
The filter used to cut down Monte Carlo samples; measurement bounds now live inside filters.
config : DirectMonteCarloConfig
Configuration for the run; the fields below are its main knobs.
monte_carlo_count_per_cycle: int
The number of Monte Carlo iterations to use in a single cycle calculation.
monte_carlo_cycles: int
The number of cycles to use in each step.
Increasing monte_carlo_count_per_cycle increases memory usage (and runtime), while this increases runtime, allowing
control over memory use.
target_success: int
The number of successes to target before exiting early.
Should likely be ~100 but can go higher too.
max_monte_carlo_cycles_steps: int
The number of steps to use. Each step consists of monte_carlo_cycles cycles, each of which has monte_carlo_count_per_cycle iterations.
monte_carlo_seed: int
The seed to use for the RNG.
"""
def __init__(
self,
model_name_pairs: Sequence[Tuple[str, pdme.model.DipoleModel]],
filter: DirectMonteCarloFilter,
config: DirectMonteCarloConfig,
):
self.model_name_pairs = model_name_pairs
# self.measurements = measurements
# self.dot_inputs = [(measure.r, measure.f) for measure in self.measurements]
# self.dot_inputs_array = pdme.measurement.input_types.dot_inputs_to_array(
# self.dot_inputs
# )
self.config = config
self.filter = filter
# (
# self.lows,
# self.highs,
# ) = pdme.measurement.input_types.dot_range_measurements_low_high_arrays(
# self.measurements
# )
def _single_run(
self, model_name_pair: Tuple[str, pdme.model.DipoleModel], seed
) -> numpy.ndarray:
rng = numpy.random.default_rng(seed)
_, model = model_name_pair
# don't log here it's madness
# _logger.info(f"Executing for model {model_name}")
sample_dipoles = model.get_monte_carlo_dipole_inputs(
self.config.monte_carlo_count_per_cycle, -1, rng
)
current_sample = sample_dipoles
return self.filter.filter_samples(current_sample)
# for di, low, high in zip(self.dot_inputs_array, self.lows, self.highs):
# if len(current_sample) < 1:
# break
# vals = pdme.util.fast_v_calc.fast_vs_for_dipoleses(
# numpy.array([di]), current_sample
# )
# current_sample = current_sample[
# numpy.all((vals > low) & (vals < high), axis=1)
# ]
# return current_sample
def _wrapped_single_run(self, args: Tuple):
"""
single run wrapped up for multiprocessing call.
takes in a tuple of arguments corresponding to
(model_name_pair, seed, return_configs)
return_configs is a boolean, if true then will return tuple of (count, [matching configs])
if false, return (count, [])
"""
# here's where we do our work
model_name_pair, seed, return_configs = args
cycle_success_configs = self._single_run(model_name_pair, seed)
cycle_success_count = len(cycle_success_configs)
if return_configs:
return (cycle_success_count, cycle_success_configs)
else:
return (cycle_success_count, [])
def execute_no_multiprocessing(self) -> Sequence[DirectMonteCarloResult]:
count_per_step = (
self.config.monte_carlo_count_per_cycle * self.config.monte_carlo_cycles
)
seed_sequence = numpy.random.SeedSequence(self.config.monte_carlo_seed)
# core count etc. logic here
results = []
for model_name_pair in self.model_name_pairs:
step_count = 0
total_success = 0
total_count = 0
_logger.info(f"Working on model {model_name_pair[0]}")
# This is probably where multiprocessing logic should go
while (step_count < self.config.max_monte_carlo_cycles_steps) and (
total_success < self.config.target_success
):
_logger.debug(f"Executing step {step_count}")
for cycle_i, seed in enumerate(
seed_sequence.spawn(self.config.monte_carlo_cycles)
):
# here's where we do our work
cycle_success_configs = self._single_run(model_name_pair, seed)
cycle_success_count = len(cycle_success_configs)
if cycle_success_count > 0:
_logger.debug(
f"For cycle {cycle_i} received {cycle_success_count} successes"
)
# _logger.debug(cycle_success_configs)
if self.config.write_successes_to_file:
sorted_by_freq = numpy.array(
[
pdme.subspace_simulation.sort_array_of_dipoles_by_frequency(
dipole_config
)
for dipole_config in cycle_success_configs
]
)
dipole_count = numpy.array(cycle_success_configs).shape[1]
for n in range(dipole_count):
number_dipoles_to_write = self.config.target_success * 5
_logger.info(f"Limiting to {number_dipoles_to_write=}")
numpy.savetxt(
f"{self.config.tag}_{step_count}_{cycle_i}_dipole_{n}.csv",
sorted_by_freq[:number_dipoles_to_write, n],
delimiter=",",
)
total_success += cycle_success_count
_logger.debug(
f"At end of step {step_count} have {total_success} successes"
)
step_count += 1
total_count += count_per_step
results.append(
DirectMonteCarloResult(
successes=total_success,
monte_carlo_count=total_count,
likelihood=total_success / total_count,
model_name=model_name_pair[0],
)
)
return results
def execute(self) -> Sequence[DirectMonteCarloResult]:
filename = self.config.get_filename()
if self.config.skip_if_exists:
_logger.info(f"Checking if {filename} exists")
cwd = pathlib.Path.cwd()
if (cwd / filename).exists():
_logger.info(f"File {filename} exists, skipping")
return []
if self.config.bayesrun_file_timestamp:
_logger.info(
"Also need to check file endings because of possible past or current timestamps, check only occurs if writing timestamp is set"
)
pattern = self.config.get_filename_regex()
for file in cwd.iterdir():
match = re.match(pattern, file.name)
if match is not None:
_logger.info(f"Matched {file.name} to {pattern}")
_logger.info(f"File {filename} exists, skipping")
return []
_logger.info(
f"Finished checking against pattern {pattern}, hopefully didn't take too long!"
)
count_per_step = (
self.config.monte_carlo_count_per_cycle * self.config.monte_carlo_cycles
)
seed_sequence = numpy.random.SeedSequence(self.config.monte_carlo_seed)
# core count etc. logic here
core_count = multiprocessing.cpu_count() - 1 or 1
if (self.config.cap_core_count >= 1) and (
self.config.cap_core_count < core_count
):
core_count = self.config.cap_core_count
_logger.info(f"Using {core_count} cores")
results = []
with multiprocessing.Pool(core_count) as pool:
for model_name_pair in self.model_name_pairs:
_logger.info(f"Working on model {model_name_pair[0]}")
# This is probably where multiprocessing logic should go
step_count = 0
total_success = 0
total_count = 0
while (step_count < self.config.max_monte_carlo_cycles_steps) and (
total_success < self.config.target_success
):
step_count += 1
_logger.debug(f"Executing step {step_count}")
seeds = seed_sequence.spawn(self.config.monte_carlo_cycles)
raw_pool_results = list(
pool.imap_unordered(
self._wrapped_single_run,
[
(
model_name_pair,
seed,
self.config.write_successes_to_file,
)
for seed in seeds
],
self.config.chunk_size,
)
)
pool_results = sum(result[0] for result in raw_pool_results)
_logger.debug(f"Pool results: {pool_results}")
if self.config.write_successes_to_file:
_logger.info("Writing dipole results")
cycle_success_configs = numpy.concatenate(
[result[1] for result in raw_pool_results]
)
dipole_count = numpy.array(cycle_success_configs).shape[1]
max_number_dipoles_to_write = self.config.target_success * 5
_logger.debug(
f"Limiting to {max_number_dipoles_to_write=}, have {len(cycle_success_configs)}"
)
if len(cycle_success_configs):
sorted_by_freq = numpy.array(
[
pdme.subspace_simulation.sort_array_of_dipoles_by_frequency(
dipole_config
)
for dipole_config in cycle_success_configs[
:max_number_dipoles_to_write
]
]
)
for n in range(dipole_count):
dipole_filename = (
f"{self.config.tag}_{step_count}_dipole_{n}.csv"
)
_logger.debug(
f"Writing {min(len(cycle_success_configs), max_number_dipoles_to_write)} to {dipole_filename}"
)
numpy.savetxt(
dipole_filename,
sorted_by_freq[:, n],
delimiter=",",
)
else:
_logger.debug(
"Instructed to write results, but none obtained"
)
total_success += pool_results
total_count += count_per_step
_logger.debug(
f"At end of step {step_count} have {total_success} successes"
)
results.append(
DirectMonteCarloResult(
successes=total_success,
monte_carlo_count=total_count,
likelihood=total_success / total_count,
model_name=model_name_pair[0],
)
)
if self.config.write_bayesrun_file:
_logger.info(f"Going to write to file [{filename}]")
# row: Dict[str, Union[int, float, str]] = {}
row = {}
num_models = len(self.model_name_pairs)
success_weight = sum(
[
(
max(ANTI_ZERO_SUCCESS_THRES, res.successes)
/ res.monte_carlo_count
)
/ num_models
for res in results
]
)
for res in results:
row.update(
{
f"{res.model_name}_success": res.successes,
f"{res.model_name}_count": res.monte_carlo_count,
f"{res.model_name}_prob": (
max(ANTI_ZERO_SUCCESS_THRES, res.successes)
/ res.monte_carlo_count
)
/ (num_models * success_weight),
}
)
_logger.info(f"Writing row {row}")
fieldnames = list(row.keys())
with open(filename, "w", newline="") as outfile:
writer = csv.DictWriter(outfile, fieldnames=fieldnames, dialect="unix")
writer.writeheader()
writer.writerow(row)
return results
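
Putting the pieces together, a hypothetical wiring of a run; model stands in for any pdme.model.DipoleModel and some_filter for any DirectMonteCarloFilter, both constructed elsewhere:

config = DirectMonteCarloConfig(
    monte_carlo_count_per_cycle=10000,
    monte_carlo_cycles=10,
    target_success=100,
    tag="example_run",
)
run = DirectMonteCarloRun(
    model_name_pairs=[("example_model", model)],
    filter=some_filter,
    config=config,
)
# execute() writes <timestamp>-example_run.realdata.fast_filter.bayesrun.csv
# because write_bayesrun_file and bayesrun_file_timestamp default to True
for res in run.execute():
    print(res.model_name, res.successes, res.monte_carlo_count, res.likelihood)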


@ -0,0 +1,115 @@
from numpy import ndarray
from deepdog.direct_monte_carlo.direct_mc import DirectMonteCarloFilter
from typing import Sequence
import pdme.measurement
import pdme.measurement.input_types
import pdme.util.fast_nonlocal_spectrum
import pdme.util.fast_v_calc
import numpy
class SingleDotPotentialFilter(DirectMonteCarloFilter):
def __init__(self, measurements: Sequence[pdme.measurement.DotRangeMeasurement]):
self.measurements = measurements
self.dot_inputs = [(measure.r, measure.f) for measure in self.measurements]
self.dot_inputs_array = pdme.measurement.input_types.dot_inputs_to_array(
self.dot_inputs
)
(
self.lows,
self.highs,
) = pdme.measurement.input_types.dot_range_measurements_low_high_arrays(
self.measurements
)
def filter_samples(self, samples: ndarray) -> ndarray:
current_sample = samples
for di, low, high in zip(self.dot_inputs_array, self.lows, self.highs):
if len(current_sample) < 1:
break
vals = pdme.util.fast_v_calc.fast_vs_for_dipoleses(
numpy.array([di]), current_sample
)
current_sample = current_sample[
numpy.all((vals > low) & (vals < high), axis=1)
]
return current_sample
class SingleDotSpinQubitFrequencyFilter(DirectMonteCarloFilter):
def __init__(self, measurements: Sequence[pdme.measurement.DotRangeMeasurement]):
self.measurements = measurements
self.dot_inputs = [(measure.r, measure.f) for measure in self.measurements]
self.dot_inputs_array = pdme.measurement.input_types.dot_inputs_to_array(
self.dot_inputs
)
(
self.lows,
self.highs,
) = pdme.measurement.input_types.dot_range_measurements_low_high_arrays(
self.measurements
)
def filter_samples(self, samples: ndarray) -> ndarray:
current_sample = samples
for di, low, high in zip(self.dot_inputs_array, self.lows, self.highs):
if len(current_sample) < 1:
break
vals = pdme.util.fast_v_calc.fast_efieldxs_for_dipoleses(
numpy.array([di]), current_sample
)
# _logger.info(vals)
current_sample = current_sample[
numpy.all((vals > low) & (vals < high), axis=1)
]
# _logger.info(f"leaving with {len(current_sample)}")
return current_sample
class DoubleDotSpinQubitFrequencyFilter(DirectMonteCarloFilter):
def __init__(
self,
pair_phase_measurements: Sequence[pdme.measurement.DotPairRangeMeasurement],
):
self.pair_phase_measurements = pair_phase_measurements
self.dot_pair_inputs = [
(measure.r1, measure.r2, measure.f)
for measure in self.pair_phase_measurements
]
self.dot_pair_inputs_array = (
pdme.measurement.input_types.dot_pair_inputs_to_array(self.dot_pair_inputs)
)
(
self.pair_phase_lows,
self.pair_phase_highs,
) = pdme.measurement.input_types.dot_range_measurements_low_high_arrays(
self.pair_phase_measurements
)
def filter_samples(self, samples: ndarray) -> ndarray:
current_sample = samples
for pi, plow, phigh in zip(
self.dot_pair_inputs_array, self.pair_phase_lows, self.pair_phase_highs
):
if len(current_sample) < 1:
break
vals = pdme.util.fast_nonlocal_spectrum.signarg(
pdme.util.fast_nonlocal_spectrum.fast_s_spin_qubit_tarucha_nonlocal_dipoleses(
numpy.array([pi]), current_sample
)
)
current_sample = current_sample[
numpy.all(
((vals > plow) & (vals < phigh)) | ((vals < plow) & (vals > phigh)),
axis=1,
)
]
return current_sample


@ -0,0 +1,62 @@
"""
Probably should just include a way to handle the indexify function I reuse so much.
All about breaking an integer into a tuple of values from lists, which is useful because of how we do CHTC runs.
"""
import itertools
import typing
import logging
import math
_logger = logging.getLogger(__name__)
# from https://stackoverflow.com/questions/5228158/cartesian-product-of-a-dictionary-of-lists
def _dict_product(dicts):
"""
>>> list(dict_product(dict(number=[1,2], character='ab')))
[{'character': 'a', 'number': 1},
{'character': 'a', 'number': 2},
{'character': 'b', 'number': 1},
{'character': 'b', 'number': 2}]
"""
return list(dict(zip(dicts.keys(), x)) for x in itertools.product(*dicts.values()))
class Indexifier:
"""
The order of keys is very important, but collections.OrderedDict is no longer needed in python 3.7.
I think it's okay to rely on that.
"""
def __init__(self, list_dict: typing.Dict[str, typing.Sequence]):
self.dict = list_dict
self.product_dict = _dict_product(self.dict)
def indexify(self, n: int) -> typing.Dict[str, typing.Any]:
return self.product_dict[n]
def __len__(self) -> int:
weights = [len(v) for v in self.dict.values()]
return math.prod(weights)
def _indexify_indices(self, n: int) -> typing.Sequence[int]:
"""
legacy indexify from old scripts, copy-paste.
could be used like
>>> ret = {}
>>> for k, i in zip(self.dict.keys(), self._indexify_indices):
>>> ret[k] = self.dict[k][i]
>>> return ret
"""
weights = [len(v) for v in self.dict.values()]
N = math.prod(weights)
curr_n = n
curr_N = N
out = []
for w in weights[:-1]:
# print(f"current: {curr_N}, {curr_n}, {curr_n // w}")
curr_N = curr_N // w # should be int division anyway
out.append(curr_n // curr_N)
curr_n = curr_n % curr_N
return out
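
A small usage sketch of Indexifier with invented values; key order matters, since the last key's values cycle fastest in the cartesian product:

indexifier = Indexifier({"a": [1, 2], "b": ["x", "y", "z"]})
print(len(indexifier))         # 6 == 2 * 3
print(indexifier.indexify(4))  # {'a': 2, 'b': 'y'}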


@ -1,3 +1,3 @@
  from importlib.metadata import version
- __version__ = version('deepdog')
+ __version__ = version("deepdog")

deepdog/real_spectrum_run.py

@ -0,0 +1,442 @@
import pdme.inputs
import pdme.model
import pdme.measurement
import pdme.measurement.input_types
import pdme.measurement.oscillating_dipole
import pdme.util.fast_v_calc
import pdme.util.fast_nonlocal_spectrum
from typing import Sequence, Tuple, List, Dict, Union, Optional
import datetime
import csv
import multiprocessing
import logging
import numpy
# TODO: remove hardcode
CHUNKSIZE = 50
_logger = logging.getLogger(__name__)
def get_a_result_fast_filter_pairs(input) -> int:
(
model,
dot_inputs,
lows,
highs,
pair_inputs,
pair_lows,
pair_highs,
monte_carlo_count,
seed,
) = input
rng = numpy.random.default_rng(seed)
# TODO: A long term refactor is to pull the frequency stuff out from here. The None stands for max_frequency, which is unneeded in the actually useful models.
sample_dipoles = model.get_monte_carlo_dipole_inputs(
monte_carlo_count, None, rng_to_use=rng
)
current_sample = sample_dipoles
for di, low, high in zip(dot_inputs, lows, highs):
if len(current_sample) < 1:
break
vals = pdme.util.fast_v_calc.fast_vs_for_dipoleses(
numpy.array([di]), current_sample
)
current_sample = current_sample[numpy.all((vals > low) & (vals < high), axis=1)]
for pi, plow, phigh in zip(pair_inputs, pair_lows, pair_highs):
if len(current_sample) < 1:
break
vals = pdme.util.fast_nonlocal_spectrum.fast_s_nonlocal_dipoleses(
numpy.array([pi]), current_sample
)
current_sample = current_sample[
numpy.all(
((vals > plow) & (vals < phigh)) | ((vals < plow) & (vals > phigh)),
axis=1,
)
]
return len(current_sample)
def get_a_result_fast_filter_potential_pair_phase_only(input) -> int:
(
model,
pair_inputs,
pair_phase_lows,
pair_phase_highs,
monte_carlo_count,
seed,
) = input
rng = numpy.random.default_rng(seed)
# TODO: A long term refactor is to pull the frequency stuff out from here. The None stands for max_frequency, which is unneeded in the actually useful models.
sample_dipoles = model.get_monte_carlo_dipole_inputs(
monte_carlo_count, None, rng_to_use=rng
)
current_sample = sample_dipoles
for pi, plow, phigh in zip(pair_inputs, pair_phase_lows, pair_phase_highs):
if len(current_sample) < 1:
break
vals = pdme.util.fast_nonlocal_spectrum.signarg(
pdme.util.fast_nonlocal_spectrum.fast_s_nonlocal_dipoleses(
numpy.array([pi]), current_sample
)
)
current_sample = current_sample[
numpy.all(
((vals > plow) & (vals < phigh)) | ((vals < plow) & (vals > phigh)),
axis=1,
)
]
return len(current_sample)
def get_a_result_fast_filter_tarucha_spin_qubit_pair_phase_only(input) -> int:
(
model,
pair_inputs,
pair_phase_lows,
pair_phase_highs,
monte_carlo_count,
seed,
) = input
rng = numpy.random.default_rng(seed)
# TODO: A long term refactor is to pull the frequency stuff out from here. The None stands for max_frequency, which is unneeded in the actually useful models.
sample_dipoles = model.get_monte_carlo_dipole_inputs(
monte_carlo_count, None, rng_to_use=rng
)
current_sample = sample_dipoles
for pi, plow, phigh in zip(pair_inputs, pair_phase_lows, pair_phase_highs):
if len(current_sample) < 1:
break
###
# This should be abstracted out, but we're going to dump it here for time pressure's sake
###
# vals = pdme.util.fast_nonlocal_spectrum.signarg(
# pdme.util.fast_nonlocal_spectrum.fast_s_nonlocal_dipoleses(
# numpy.array([pi]), current_sample
# )
#
vals = pdme.util.fast_nonlocal_spectrum.signarg(
pdme.util.fast_nonlocal_spectrum.fast_s_spin_qubit_tarucha_nonlocal_dipoleses(
numpy.array([pi]), current_sample
)
)
current_sample = current_sample[
numpy.all(
((vals > plow) & (vals < phigh)) | ((vals < plow) & (vals > phigh)),
axis=1,
)
]
return len(current_sample)
def get_a_result_fast_filter(input) -> int:
model, dot_inputs, lows, highs, monte_carlo_count, seed = input
rng = numpy.random.default_rng(seed)
# TODO: A long term refactor is to pull the frequency stuff out from here. The None stands for max_frequency, which is unneeded in the actually useful models.
sample_dipoles = model.get_monte_carlo_dipole_inputs(
monte_carlo_count, None, rng_to_use=rng
)
current_sample = sample_dipoles
for di, low, high in zip(dot_inputs, lows, highs):
if len(current_sample) < 1:
break
vals = pdme.util.fast_v_calc.fast_vs_for_dipoleses(
numpy.array([di]), current_sample
)
current_sample = current_sample[numpy.all((vals > low) & (vals < high), axis=1)]
return len(current_sample)
class RealSpectrumRun:
"""
A bayes run given some real data.
Parameters
----------
measurements : Sequence[pdme.measurement.DotRangeMeasurement]
The dot inputs for this bayes run.
models_with_names : Sequence[Tuple(str, pdme.model.DipoleModel)]
The models to evaluate.
filename_slug : str
The filename slug to include.
If pair_measurements is not None, uses pair measurement method (and single measurements too).
If pair_phase_measurements is not None, ignores measurements and uses phase measurements _only_
This is lazy design on my part.
"""
def __init__(
self,
measurements: Sequence[pdme.measurement.DotRangeMeasurement],
models_with_names: Sequence[Tuple[str, pdme.model.DipoleModel]],
filename_slug: str,
monte_carlo_count: int = 10000,
monte_carlo_cycles: int = 10,
target_success: int = 100,
max_monte_carlo_cycles_steps: int = 10,
chunksize: int = CHUNKSIZE,
initial_seed: int = 12345,
cap_core_count: int = 0,
pair_measurements: Optional[
Sequence[pdme.measurement.DotPairRangeMeasurement]
] = None,
pair_phase_measurements: Optional[
Sequence[pdme.measurement.DotPairRangeMeasurement]
] = None,
) -> None:
self.measurements = measurements
self.dot_inputs = [(measure.r, measure.f) for measure in self.measurements]
self.dot_inputs_array = pdme.measurement.input_types.dot_inputs_to_array(
self.dot_inputs
)
if pair_measurements is not None:
self.pair_measurements = pair_measurements
self.use_pair_measurements = True
self.use_pair_phase_measurements = False
self.dot_pair_inputs = [
(measure.r1, measure.r2, measure.f)
for measure in self.pair_measurements
]
self.dot_pair_inputs_array = (
pdme.measurement.input_types.dot_pair_inputs_to_array(
self.dot_pair_inputs
)
)
elif pair_phase_measurements is not None:
self.use_pair_measurements = False
self.use_pair_phase_measurements = True
self.pair_phase_measurements = pair_phase_measurements
self.dot_pair_inputs = [
(measure.r1, measure.r2, measure.f)
for measure in self.pair_phase_measurements
]
self.dot_pair_inputs_array = (
pdme.measurement.input_types.dot_pair_inputs_to_array(
self.dot_pair_inputs
)
)
else:
self.use_pair_measurements = False
self.use_pair_phase_measurements = False
self.models = [model for (_, model) in models_with_names]
self.model_names = [name for (name, _) in models_with_names]
self.model_count = len(self.models)
self.monte_carlo_count = monte_carlo_count
self.monte_carlo_cycles = monte_carlo_cycles
self.target_success = target_success
self.max_monte_carlo_cycles_steps = max_monte_carlo_cycles_steps
self.csv_fields = []
self.compensate_zeros = True
self.chunksize = chunksize
for name in self.model_names:
self.csv_fields.extend([f"{name}_success", f"{name}_count", f"{name}_prob"])
# for now initialise priors as uniform.
self.probabilities = [1 / self.model_count] * self.model_count
timestamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
ff_string = "fast_filter"
self.filename = f"{timestamp}-{filename_slug}.realdata.{ff_string}.bayesrun.csv"
self.initial_seed = initial_seed
self.cap_core_count = cap_core_count
def go(self) -> None:
with open(self.filename, "a", newline="") as outfile:
writer = csv.DictWriter(outfile, fieldnames=self.csv_fields, dialect="unix")
writer.writeheader()
(
lows,
highs,
) = pdme.measurement.input_types.dot_range_measurements_low_high_arrays(
self.measurements
)
pair_lows = None
pair_highs = None
if self.use_pair_measurements:
(
pair_lows,
pair_highs,
) = pdme.measurement.input_types.dot_range_measurements_low_high_arrays(
self.pair_measurements
)
pair_phase_lows = None
pair_phase_highs = None
if self.use_pair_phase_measurements:
(
pair_phase_lows,
pair_phase_highs,
) = pdme.measurement.input_types.dot_range_measurements_low_high_arrays(
self.pair_phase_measurements
)
# define a new seed sequence for each run
seed_sequence = numpy.random.SeedSequence(self.initial_seed)
results = []
_logger.debug("Going to iterate over models now")
core_count = multiprocessing.cpu_count() - 1 or 1
if (self.cap_core_count >= 1) and (self.cap_core_count < core_count):
core_count = self.cap_core_count
_logger.info(f"Using {core_count} cores")
for model_count, (model, model_name) in enumerate(
zip(self.models, self.model_names)
):
_logger.debug(f"Doing model #{model_count}: {model_name}")
with multiprocessing.Pool(core_count) as pool:
cycle_count = 0
cycle_success = 0
cycles = 0
while (cycles < self.max_monte_carlo_cycles_steps) and (
cycle_success <= self.target_success
):
_logger.debug(f"Starting cycle {cycles}")
cycles += 1
current_success = 0
cycle_count += self.monte_carlo_count * self.monte_carlo_cycles
# generate a seed from the sequence for each core.
# note this needs to be inside the loop for monte carlo cycle steps!
# that way we get more stuff.
seeds = seed_sequence.spawn(self.monte_carlo_cycles)
if self.use_pair_measurements:
_logger.debug("using pair measurements")
current_success = sum(
pool.imap_unordered(
get_a_result_fast_filter_pairs,
[
(
model,
self.dot_inputs_array,
lows,
highs,
self.dot_pair_inputs_array,
pair_lows,
pair_highs,
self.monte_carlo_count,
seed,
)
for seed in seeds
],
self.chunksize,
)
)
elif self.use_pair_phase_measurements:
_logger.debug("using pair phase measurements")
_logger.debug("specifically using tarucha")
current_success = sum(
pool.imap_unordered(
get_a_result_fast_filter_tarucha_spin_qubit_pair_phase_only,
[
(
model,
self.dot_pair_inputs_array,
pair_phase_lows,
pair_phase_highs,
self.monte_carlo_count,
seed,
)
for seed in seeds
],
self.chunksize,
)
)
else:
current_success = sum(
pool.imap_unordered(
get_a_result_fast_filter,
[
(
model,
self.dot_inputs_array,
lows,
highs,
self.monte_carlo_count,
seed,
)
for seed in seeds
],
self.chunksize,
)
)
cycle_success += current_success
_logger.debug(f"current running successes: {cycle_success}")
results.append((cycle_count, cycle_success))
_logger.debug("Done, constructing output now")
row: Dict[str, Union[int, float, str]] = {}
successes: List[float] = []
counts: List[int] = []
for model_index, (name, (count, result)) in enumerate(
zip(self.model_names, results)
):
row[f"{name}_success"] = result
row[f"{name}_count"] = count
successes.append(max(result, 0.5))  # floor zero successes at 0.5 so a model's posterior never collapses to exactly zero
counts.append(count)
success_weight = sum(
[
(succ / count) * prob
for succ, count, prob in zip(successes, counts, self.probabilities)
]
)
new_probabilities = [
(succ / count) * old_prob / success_weight
for succ, count, old_prob in zip(successes, counts, self.probabilities)
]
self.probabilities = new_probabilities
for name, probability in zip(self.model_names, self.probabilities):
row[f"{name}_prob"] = probability
_logger.info(row)
with open(self.filename, "a", newline="") as outfile:
writer = csv.DictWriter(outfile, fieldnames=self.csv_fields, dialect="unix")
writer.writerow(row)
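A minimal sketch of the posterior update performed in go() above, assuming two hypothetical models with made-up success counts; the 0.5 floor mirrors the compensate-zeros behaviour in the code:

priors = [0.5, 0.5]
raw_successes = [40, 0]
counts = [10_000, 10_000]

successes = [max(r, 0.5) for r in raw_successes]  # floor zero successes, as in go()
success_weight = sum((s / c) * p for s, c, p in zip(successes, counts, priors))
posteriors = [(s / c) * p / success_weight for s, c, p in zip(successes, counts, priors)]
# posteriors ≈ [0.988, 0.012]: the model that passes the filter more often dominates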

166
deepdog/results/__init__.py Normal file

@ -0,0 +1,166 @@
import dataclasses
import re
import typing
import logging
import deepdog.indexify
import pathlib
import csv
from deepdog.results.read_csv import (
parse_bayesrun_row,
BayesrunModelResult,
parse_general_row,
GeneralModelResult,
)
from deepdog.results.filename import parse_file_slug
_logger = logging.getLogger(__name__)
FILENAME_REGEX = re.compile(
r"(?P<timestamp>\d{8}-\d{6})-(?P<filename_slug>.*)\.realdata\.fast_filter\.bayesrun\.csv"
)
# probably a better way but who cares
NO_TIMESTAMP_FILENAME_REGEX = re.compile(
r"(?P<filename_slug>.*)\.realdata\.fast_filter\.bayesrun\.csv"
)
SUBSET_SIM_FILENAME_REGEX = re.compile(
r"(?P<filename_slug>.*)-(?:no_adaptive_steps_)?(?P<num_ss_runs>\d+)-nc_(?P<n_c>\d+)-ns_(?P<n_s>\d+)-mmax_(?P<mmax>\d+)\.multi\.subsetsim\.csv"
)
@dataclasses.dataclass
class BayesrunOutputFilename:
timestamp: typing.Optional[str]
filename_slug: str
path: pathlib.Path
@dataclasses.dataclass
class BayesrunOutput:
filename: BayesrunOutputFilename
data: typing.Dict["str", typing.Any]
results: typing.Sequence[BayesrunModelResult]
@dataclasses.dataclass
class GeneralOutput:
filename: BayesrunOutputFilename
data: typing.Dict["str", typing.Any]
results: typing.Sequence[GeneralModelResult]
def _parse_string_output_filename(
filename: str,
) -> typing.Tuple[typing.Optional[str], str]:
if match := FILENAME_REGEX.match(filename):
groups = match.groupdict()
return (groups["timestamp"], groups["filename_slug"])
elif match := NO_TIMESTAMP_FILENAME_REGEX.match(filename):
groups = match.groupdict()
return (None, groups["filename_slug"])
else:
raise ValueError(f"Could not parse {filename} as a bayesrun output filename")
def _parse_output_filename(file: pathlib.Path) -> BayesrunOutputFilename:
filename = file.name
timestamp, slug = _parse_string_output_filename(filename)
return BayesrunOutputFilename(timestamp=timestamp, filename_slug=slug, path=file)
def _parse_ss_output_filename(file: pathlib.Path) -> BayesrunOutputFilename:
filename = file.name
match = SUBSET_SIM_FILENAME_REGEX.match(filename)
if not match:
raise ValueError(f"{filename} was not a valid subset sim output")
groups = match.groupdict()
return BayesrunOutputFilename(
filename_slug=groups["filename_slug"], path=file, timestamp=None
)
def read_subset_sim_file(
file: pathlib.Path, indexifier: typing.Optional[deepdog.indexify.Indexifier]
) -> GeneralOutput:
parsed_filename = tag = _parse_ss_output_filename(file)
out = GeneralOutput(filename=parsed_filename, data={}, results=[])
out.data.update(dataclasses.asdict(tag))
parsed_tag = parse_file_slug(parsed_filename.filename_slug)
if parsed_tag is None:
_logger.warning(
f"Could not parse {tag} against any matching regexes. Going to skip tag parsing"
)
else:
out.data.update(parsed_tag)
if indexifier is not None:
try:
job_index = parsed_tag["job_index"]
indexified = indexifier.indexify(int(job_index))
out.data.update(indexified)
except KeyError:
# This isn't really that important of an error, apart from the warning
_logger.warning(
f"Parsed tag to {parsed_tag}, and attempted to indexify but no job_index key was found. skipping and moving on"
)
with file.open() as input_file:
reader = csv.DictReader(input_file)
rows = [r for r in reader]
if len(rows) == 1:
row = rows[0]
else:
raise ValueError(f"Expected exactly one row in {file.name}, found {len(rows)}")
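# the None entry below marks a per-model column this caller doesn't need; parse_general_row skips it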
results = parse_general_row(
row, ("num_finished_runs", "num_runs", None, "estimated_likelihood")
)
out.results = results
return out
def read_output_file(
file: pathlib.Path, indexifier: typing.Optional[deepdog.indexify.Indexifier]
) -> BayesrunOutput:
parsed_filename = tag = _parse_output_filename(file)
out = BayesrunOutput(filename=parsed_filename, data={}, results=[])
out.data.update(dataclasses.asdict(tag))
parsed_tag = parse_file_slug(parsed_filename.filename_slug)
if parsed_tag is None:
_logger.warning(
f"Could not parse {tag} against any matching regexes. Going to skip tag parsing"
)
else:
out.data.update(parsed_tag)
if indexifier is not None:
try:
job_index = parsed_tag["job_index"]
indexified = indexifier.indexify(int(job_index))
out.data.update(indexified)
except KeyError:
# This isn't really that important of an error, apart from the warning
_logger.warning(
f"Parsed tag to {parsed_tag}, and attempted to indexify but no job_index key was found. skipping and moving on"
)
with file.open() as input_file:
reader = csv.DictReader(input_file)
rows = [r for r in reader]
if len(rows) == 1:
row = rows[0]
else:
raise ValueError(f"Expected exactly one row in {file.name}, found {len(rows)}")
results = parse_bayesrun_row(row)
out.results = results
return out
__all__ = ["read_output_file", "BayesrunOutput"]
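A quick sketch of what the two filename regexes above accept, using hypothetical filenames:

_parse_string_output_filename("20250226-215500-my_run-0.realdata.fast_filter.bayesrun.csv")
# -> ("20250226-215500", "my_run-0")
_parse_string_output_filename("my_run-0.realdata.fast_filter.bayesrun.csv")
# -> (None, "my_run-0"), via the timestamp-less fallback regex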

deepdog/results/filename.py Normal file

@ -0,0 +1,22 @@
import re
import typing
FILE_SLUG_REGEXES = [
re.compile(pattern)
for pattern in [
r"(?P<tag>\w+)-(?P<job_index>\d+)",
r"mock_tarucha-(?P<job_index>\d+)",
r"(?:(?P<mock>mock)_)?tarucha(?:_(?P<tarucha_run_id>\d+))?-(?P<job_index>\d+)",
r"(?P<tag>\w+)-(?P<included_dots>[\w,]+)-(?P<target_cost>\d*\.?\d+)-(?P<job_index>\d+)",
]
]
def parse_file_slug(slug: str) -> typing.Optional[typing.Dict[str, str]]:
for pattern in FILE_SLUG_REGEXES:
match = pattern.match(slug)
if match:
return match.groupdict()
else:
return None
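A sketch with hypothetical slugs; note the patterns are tried in order, so the generic first pattern claims any word-dash-digits slug before the tarucha-specific ones are consulted:

parse_file_slug("my_tag-3")  # -> {"tag": "my_tag", "job_index": "3"}
parse_file_slug("no match here")  # -> None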

141
deepdog/results/read_csv.py Normal file

@ -0,0 +1,141 @@
import typing
import re
import dataclasses
MODEL_REGEXES = [
re.compile(pattern)
for pattern in [
r"geom_(?P<xmin>-?\d+)_(?P<xmax>-?\d+)_(?P<ymin>-?\d+)_(?P<ymax>-?\d+)_(?P<zmin>-?\d+)_(?P<zmax>-?\d+)-orientation_(?P<orientation>free|fixedxy|fixedz)-dipole_count_(?P<avg_filled>\d+)_(?P<field_name>\w*)",
r"geom_(?P<xmin>-?\d+)_(?P<xmax>-?\d+)_(?P<ymin>-?\d+)_(?P<ymax>-?\d+)_(?P<zmin>-?\d+)_(?P<zmax>-?\d+)-magnitude_(?P<log_magnitude>\d*\.?\d+)-orientation_(?P<orientation>free|fixedxy|fixedz)-dipole_count_(?P<avg_filled>\d+)_(?P<field_name>\w*)",
r"geom_(?P<xmin>-?\d*\.?\d+)_(?P<xmax>-?\d*\.?\d+)_(?P<ymin>-?\d*\.?\d+)_(?P<ymax>-?\d*\.?\d+)_(?P<zmin>-?\d*\.?\d+)_(?P<zmax>-?\d*\.?\d+)-magnitude_(?P<log_magnitude>\d*\.?\d+)-orientation_(?P<orientation>free|fixedxy|fixedz)-dipole_count_(?P<avg_filled>\d+)_(?P<field_name>\w*)",
r"geom_(?P<xmin>-?\d+)_(?P<xmax>-?\d+)_(?P<ymin>-?\d+)_(?P<ymax>-?\d+)_(?P<zmin>-?\d+)_(?P<zmax>-?\d+)-magnitude_(?P<log_magnitude>-?\d*\.?\d+)-orientation_(?P<orientation>free|fixedxy|fixedz)-dipole_count_(?P<avg_filled>\d+)_(?P<field_name>\w*)",
r"geom_(?P<xmin>-?\d*\.?\d+)_(?P<xmax>-?\d*\.?\d+)_(?P<ymin>-?\d*\.?\d+)_(?P<ymax>-?\d*\.?\d+)_(?P<zmin>-?\d*\.?\d+)_(?P<zmax>-?\d*\.?\d+)-magnitude_(?P<log_magnitude>-?\d*\.?\d+)-orientation_(?P<orientation>free|fixedxy|fixedz)-dipole_count_(?P<avg_filled>\d+)_(?P<field_name>\w*)",
]
]
@dataclasses.dataclass
class BayesrunModelResult:
parsed_model_keys: typing.Dict[str, str]
success: int
count: int
@dataclasses.dataclass
class GeneralModelResult:
parsed_model_keys: typing.Dict[str, str]
result_dict: typing.Dict[str, str]
class BayesrunColumnParsed:
"""
class for parsing a bayesrun while pulling certain special fields out
"""
def __init__(self, groupdict: typing.Dict[str, str]):
self.column_field = groupdict["field_name"]
self.model_field_dict = {
k: v for k, v in groupdict.items() if k != "field_name"
}
self._groupdict_str = repr(groupdict)
def __str__(self):
return f"BayesrunColumnParsed[{self.column_field}: {self.model_field_dict}]"
def __repr__(self):
return f"BayesrunColumnParsed({self._groupdict_str})"
def __eq__(self, other):
if isinstance(other, BayesrunColumnParsed):
return (self.column_field == other.column_field) and (
self.model_field_dict == other.model_field_dict
)
return NotImplemented
def _parse_bayesrun_column(
column: str,
) -> typing.Optional[BayesrunColumnParsed]:
"""
Tries one by one all of a predefined list of regexes that I might have used in the past.
Returns the groupdict for the first match, or None if no match found.
"""
for pattern in MODEL_REGEXES:
match = pattern.match(column)
if match:
return BayesrunColumnParsed(match.groupdict())
else:
return None
def _batch_iterable_into_chunks(iterable, n=1):
"""
utility for batching bayesrun columns into fixed-size chunks (they usually appear in threes)
"""
for ndx in range(0, len(iterable), n):
yield iterable[ndx : min(ndx + n, len(iterable))]
def parse_general_row(
row: typing.Dict[str, str],
expected_fields: typing.Sequence[typing.Optional[str]],
) -> typing.Sequence[GeneralModelResult]:
results = []
batched_keys = _batch_iterable_into_chunks(list(row.keys()), len(expected_fields))
for model_keys in batched_keys:
parsed = [_parse_bayesrun_column(column) for column in model_keys]
values = [row[column] for column in model_keys]
result_dict = {}
parsed_keys = None
for expected_field, parsed_field, value in zip(expected_fields, parsed, values):
if expected_field is None:
continue
if parsed_field is None:
raise ValueError(
f"No viable column found for {expected_field=} in {model_keys=}"
)
if parsed_field.column_field != expected_field:
raise ValueError(
f"The column {parsed_field.column_field} does not match expected {expected_field}"
)
result_dict[expected_field] = value
if parsed_keys is None:
parsed_keys = parsed_field.model_field_dict
if parsed_keys is None:
raise ValueError(f"Somehow parsed keys is none here, for {row=}")
results.append(
GeneralModelResult(parsed_model_keys=parsed_keys, result_dict=result_dict)
)
return results
def parse_bayesrun_row(
row: typing.Dict[str, str],
) -> typing.Sequence[BayesrunModelResult]:
results = []
batched_keys = _batch_iterable_into_chunks(list(row.keys()), 3)
for model_keys in batched_keys:
parsed = [_parse_bayesrun_column(column) for column in model_keys]
values = [row[column] for column in model_keys]
if parsed[0] is None:
raise ValueError(f"no viable success column found for keys {model_keys}")
if parsed[1] is None:
raise ValueError(f"no viable count column found for keys {model_keys}")
if parsed[0].column_field != "success":
raise ValueError(f"The column {model_keys[0]} is not a success field")
if parsed[1].column_field != "count":
raise ValueError(f"The column {model_keys[1]} is not a count field")
parsed_keys = parsed[0].model_field_dict
success = int(values[0])
count = int(values[1])
results.append(
BayesrunModelResult(
parsed_model_keys=parsed_keys,
success=success,
count=count,
)
)
return results
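A sketch of parsing a single-model bayesrun row, assuming a hypothetical column name built from the first geom regex above; the prob column is read into the batch of three but not validated:

base = "geom_-20_20_-20_20_0_10-orientation_free-dipole_count_5"
row = {f"{base}_success": "40", f"{base}_count": "10000", f"{base}_prob": "0.98"}
results = parse_bayesrun_row(row)
# results[0].success == 40, results[0].count == 10000
# results[0].parsed_model_keys["orientation"] == "free"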

deepdog/subset_simulation/__init__.py Normal file

@ -0,0 +1,3 @@
from deepdog.subset_simulation.subset_simulation_impl import SubsetSimulation
__all__ = ["SubsetSimulation"]

deepdog/subset_simulation/subset_simulation_impl.py Normal file

@ -0,0 +1,623 @@
import logging
import multiprocessing
import numpy
import pdme.measurement
import pdme.measurement.input_types
import pdme.model
import pdme.subspace_simulation
from typing import Sequence, Tuple, Optional, Callable, Union, List
from dataclasses import dataclass
_logger = logging.getLogger(__name__)
@dataclass
class SubsetSimulationResult:
probs_list: Sequence[Tuple]
over_target_cost: Optional[float]
over_target_likelihood: Optional[float]
under_target_cost: Optional[float]
under_target_likelihood: Optional[float]
lowest_likelihood: Optional[float]
messages: Sequence[str]
@dataclass
class MultiSubsetSimulationResult:
child_results: Sequence[SubsetSimulationResult]
model_name: str
estimated_likelihood: float
arithmetic_mean_estimated_likelihood: float
num_children: int
num_finished_children: int
clean_estimate: bool
class SubsetSimulation:
def __init__(
self,
model_name_pair,
# actual_measurements: Sequence[pdme.measurement.DotMeasurement],
cost_function: Callable[[numpy.ndarray], numpy.ndarray],
n_c: int,
n_s: int,
m_max: int,
target_cost: Optional[float] = None,
level_0_seed: Union[int, Sequence[int]] = 200,
mcmc_seed: Union[int, Sequence[int]] = 20,
use_adaptive_steps=True,
default_phi_step=0.01,
default_theta_step=0.01,
default_r_step=0.01,
default_w_log_step=0.01,
default_upper_w_log_step=4,
num_initial_dmc_gens=1,
keep_probs_list=True,
dump_last_generation_to_file=False,
initial_cost_chunk_size=100,
initial_cost_multiprocess=True,
cap_core_count: int = 0, # 0 means cap at num cores - 1
):
name, model = model_name_pair
self.model_name = name
self.model = model
_logger.info(f"got model {self.model_name}")
# dot_inputs = [(meas.r, meas.f) for meas in actual_measurements]
# self.dot_inputs_array = pdme.measurement.input_types.dot_inputs_to_array(
# dot_inputs
# )
# _logger.debug(f"actual measurements: {actual_measurements}")
# self.actual_measurement_array = numpy.array([m.v for m in actual_measurements])
# def cost_function_to_use(dipoles_to_test):
# return pdme.subspace_simulation.proportional_costs_vs_actual_measurement(
# self.dot_inputs_array, self.actual_measurement_array, dipoles_to_test
# )
self.cost_function_to_use = cost_function
self.n_c = n_c
self.n_s = n_s
self.m_max = m_max
self.level_0_seed = level_0_seed
self.mcmc_seed = mcmc_seed
self.use_adaptive_steps = use_adaptive_steps
self.default_phi_step = (
default_phi_step * 1.73
) # this is a hack to fix a missing sqrt 3 in the proposal function code.
self.default_theta_step = default_theta_step
self.default_r_step = (
default_r_step * 1.73
) # this is a hack to fix a missing sqrt 3 in the proposal function code.
self.default_w_log_step = (
default_w_log_step * 1.73
) # this is a hack to fix a missing sqrt 3 in the proposal function code.
self.default_upper_w_log_step = default_upper_w_log_step
_logger.info("using params:")
_logger.info(f"\tn_c: {self.n_c}")
_logger.info(f"\tn_s: {self.n_s}")
_logger.info(f"\tm: {self.m_max}")
_logger.info(f"\t{num_initial_dmc_gens=}")
_logger.info(f"\t{mcmc_seed=}")
_logger.info(f"\t{level_0_seed=}")
_logger.info("let's do level 0...")
self.target_cost = target_cost
_logger.info(f"will stop at target cost {target_cost}")
self.keep_probs_list = keep_probs_list
self.dump_last_generations = dump_last_generation_to_file
self.initial_cost_chunk_size = initial_cost_chunk_size
self.initial_cost_multiprocess = initial_cost_multiprocess
self.cap_core_count = cap_core_count
self.num_dmc_gens = num_initial_dmc_gens
def _single_chain_gen(self, args: Tuple):
threshold_cost, stdevs, rng_seed, (c, s) = args
rng = numpy.random.default_rng(rng_seed)
return self.model.get_repeat_counting_mcmc_chain(
s,
self.cost_function_to_use,
self.n_s,
threshold_cost,
stdevs,
initial_cost=c,
rng_arg=rng,
)
def execute(self) -> SubsetSimulationResult:
probs_list = []
output_messages = []
# If we have n_s = 10 and n_c = 100, then our big N = 1000 and p = 1/10
# The DMC stage would normally generate 1000, then pick the best 100 and start counting prob = p/10.
# Let's say we want our DMC stage to go down to level 2.
# Then we need to filter out p^2, so our initial has to be N_0 = N / p = n_c * n_s^2
initial_dmc_n = self.n_c * (self.n_s**self.num_dmc_gens)
initial_level = (
self.num_dmc_gens - 1
) # This is perfunctory but let's label it here really explicitly
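# e.g. with n_c = 100, n_s = 10, num_initial_dmc_gens = 2:
# initial_dmc_n = 100 * 10**2 = 10000 samples and initial_level = 1,
# so direct MC alone covers levels 0 and 1 before any MCMC stage runs.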
_logger.info(f"Generating {initial_dmc_n} for DMC stage")
sample_dipoles = self.model.get_monte_carlo_dipole_inputs(
initial_dmc_n,
-1,
rng_to_use=numpy.random.default_rng(self.level_0_seed),
)
# _logger.debug(sample_dipoles)
# _logger.debug(sample_dipoles.shape)
_logger.debug("Finished dipole generation")
_logger.debug(
f"Using iterated multiprocessing cost function thing with chunk size {self.initial_cost_chunk_size}"
)
# core count etc. logic here
core_count = multiprocessing.cpu_count() - 1 or 1
if (self.cap_core_count >= 1) and (self.cap_core_count < core_count):
core_count = self.cap_core_count
_logger.info(f"Using {core_count} cores")
with multiprocessing.Pool(core_count) as pool:
# Do the initial DMC calculation in a multiprocessing
chunks = numpy.array_split(
sample_dipoles,
range(
self.initial_cost_chunk_size,
len(sample_dipoles),
self.initial_cost_chunk_size,
),
)
if self.initial_cost_multiprocess:
_logger.debug("Multiprocessing initial costs")
raw_costs = pool.map(self.cost_function_to_use, chunks)
else:
_logger.debug("Single process initial costs")
raw_costs = []
for chunk_idx, chunk in enumerate(chunks):
_logger.debug(f"doing chunk #{chunk_idx}")
raw_costs.append(self.cost_function_to_use(chunk))
costs = numpy.concatenate(raw_costs)
_logger.debug("finished initial dmc cost calculation")
# _logger.debug(f"costs: {costs}")
sorted_indexes = costs.argsort()[::-1]
# _logger.debug(costs[sorted_indexes])
# _logger.debug(sample_dipoles[sorted_indexes])
sorted_costs = costs[sorted_indexes]
sorted_dipoles = sample_dipoles[sorted_indexes]
all_dipoles = numpy.array(
[
pdme.subspace_simulation.sort_array_of_dipoles_by_frequency(samp)
for samp in sorted_dipoles
]
)
all_chains = list(zip(sorted_costs, all_dipoles))
for dmc_level in range(initial_level):
# if initial level is 1, we want to print out what the level 0 threshold would have been?
_logger.debug(f"Get the pseudo statistics for level {dmc_level}")
_logger.debug(f"Whole chain has length {len(all_chains)}")
pseudo_threshold_index = -(
self.n_c * (self.n_s ** (self.num_dmc_gens - dmc_level - 1))
)
_logger.debug(
f"Have a pseudo_threshold_index of {pseudo_threshold_index}, or {len(all_chains) + pseudo_threshold_index}"
)
pseudo_threshold_cost = all_chains[pseudo_threshold_index][0]  # negative index, matching the truncation below
_logger.info(
f"Pseudo-level {dmc_level} threshold cost {pseudo_threshold_cost}, at P = (1 / {self.n_s})^{dmc_level + 1}"
)
all_chains = all_chains[pseudo_threshold_index:]
long_mcmc_rng = numpy.random.default_rng(self.mcmc_seed)
mcmc_rng_seed_sequence = numpy.random.SeedSequence(self.mcmc_seed)
threshold_cost = all_chains[-self.n_c][0]
_logger.info(
f"Finishing DMC threshold cost {threshold_cost} at level {initial_level}, at P = (1 / {self.n_s})^{initial_level + 1}"
)
_logger.debug(f"Executing the MCMC with chains of length {len(all_chains)}")
# Now we move on to the MCMC part of the algorithm
# This is important, we want to allow some extra initial levels so we need to account for that here!
for i in range(self.num_dmc_gens, self.m_max):
_logger.info(f"Starting level {i}")
next_seeds = all_chains[-self.n_c :]
if self.dump_last_generations:
_logger.info("writing out csv file")
next_dipoles_seed_dipoles = numpy.array([n[1] for n in next_seeds])
for n in range(self.model.n):
_logger.info(f"{next_dipoles_seed_dipoles[:, n].shape}")
numpy.savetxt(
f"generation_{self.n_c}_{self.n_s}_{i}_dipole_{n}.csv",
next_dipoles_seed_dipoles[:, n],
delimiter=",",
)
next_seeds_as_array = numpy.array([s for _, s in next_seeds])
stdevs = self.get_stdevs_from_arrays(next_seeds_as_array)
_logger.info(f"got stdevs: {stdevs.stdevs}")
all_long_chains = []
for seed_index, (c, s) in enumerate(
next_seeds[:: len(next_seeds) // 20]
):
# chain = mcmc(s, threshold_cost, n_s, model, dot_inputs_array, actual_measurement_array, mcmc_rng, curr_cost=c, stdevs=stdevs)
# until new version gotta do
_logger.debug(
f"\t{seed_index}: doing long chain on the next seed"
)
long_chain = self.model.get_mcmc_chain(
s,
self.cost_function_to_use,
1000,
threshold_cost,
stdevs,
initial_cost=c,
rng_arg=long_mcmc_rng,
)
for _, chained in long_chain:
all_long_chains.append(chained)
all_long_chains_array = numpy.array(all_long_chains)
for n in range(self.model.n):
_logger.info(f"{all_long_chains_array[:, n].shape}")
numpy.savetxt(
f"long_chain_generation_{self.n_c}_{self.n_s}_{i}_dipole_{n}.csv",
all_long_chains_array[:, n],
delimiter=",",
)
if self.keep_probs_list:
for cost_index, cost_chain in enumerate(all_chains[: -self.n_c]):
probs_list.append(
(
(
(self.n_c * self.n_s - cost_index)
/ (self.n_c * self.n_s)
)
/ (self.n_s ** (i)),
cost_chain[0],
i + 1,
)
)
next_seeds_as_array = numpy.array([s for _, s in next_seeds])
stdevs = self.get_stdevs_from_arrays(next_seeds_as_array)
_logger.debug(f"got stdevs, begin: {stdevs.stdevs[:10]}")
_logger.debug("Starting the MCMC")
all_chains = []
seeds = mcmc_rng_seed_sequence.spawn(len(next_seeds))
pool_results = pool.imap_unordered(
self._single_chain_gen,
[
(threshold_cost, stdevs, rng_seed, test_seed)
for rng_seed, test_seed in zip(seeds, next_seeds)
],
chunksize=50,
)
# count for ergodicity analysis
samples_generated = 0
samples_rejected = 0
for rejected_count, chain in pool_results:
for cost, chained in chain:
try:
filtered_cost = cost[0]
except (IndexError, TypeError):
filtered_cost = cost
all_chains.append((filtered_cost, chained))
samples_generated += self.n_s
samples_rejected += rejected_count
_logger.debug("finished mcmc")
_logger.debug(f"{samples_rejected=} out of {samples_generated=}")
if samples_rejected * 2 > samples_generated:
reject_ratio = samples_rejected / samples_generated
rejectionmessage = f"On level {i}, rejected {samples_rejected} out of {samples_generated}, {reject_ratio=} is too high and may indicate ergodicity problems"
output_messages.append(rejectionmessage)
_logger.warning(rejectionmessage)
# _logger.debug(all_chains)
all_chains.sort(key=lambda c: c[0], reverse=True)
_logger.debug("finished sorting all_chains")
threshold_cost = all_chains[-self.n_c][0]
_logger.info(
f"current threshold cost: {threshold_cost}, at P = (1 / {self.n_s})^{i + 1}"
)
if (self.target_cost is not None) and (
threshold_cost < self.target_cost
):
_logger.info(
f"got a threshold cost {threshold_cost}, less than {self.target_cost}. will leave early"
)
cost_list = [c[0] for c in all_chains]
over_index = reverse_bisect_right(cost_list, self.target_cost)
winner = all_chains[over_index][1]
_logger.info(f"Winner obtained: {winner}")
shorter_probs_list = []
for cost_index, cost_chain in enumerate(all_chains):
if self.keep_probs_list:
probs_list.append(
(
(
(self.n_c * self.n_s - cost_index)
/ (self.n_c * self.n_s)
)
/ (self.n_s ** (i)),
cost_chain[0],
i + 1,
)
)
shorter_probs_list.append(
(
cost_chain[0],
(
(self.n_c * self.n_s - cost_index)
/ (self.n_c * self.n_s)
)
/ (self.n_s ** (i)),
)
)
# _logger.info(shorter_probs_list)
result = SubsetSimulationResult(
probs_list=probs_list,
over_target_cost=shorter_probs_list[over_index - 1][0],
over_target_likelihood=shorter_probs_list[over_index - 1][1],
under_target_cost=shorter_probs_list[over_index][0],
under_target_likelihood=shorter_probs_list[over_index][1],
lowest_likelihood=shorter_probs_list[-1][1],
messages=output_messages,
)
return result
# _logger.debug([c[0] for c in all_chains[-n_c:]])
_logger.info(f"doing level {i + 1}")
if self.keep_probs_list:
for cost_index, cost_chain in enumerate(all_chains):
probs_list.append(
(
((self.n_c * self.n_s - cost_index) / (self.n_c * self.n_s))
/ (self.n_s ** (self.m_max)),
cost_chain[0],
self.m_max + 1,
)
)
threshold_cost = all_chains[-self.n_c][0]
_logger.info(
f"final threshold cost: {threshold_cost}, at P = (1 / {self.n_s})^{self.m_max + 1}"
)
# for a in all_chains[-10:]:
# _logger.info(a)
# for prob, prob_cost in probs_list:
# _logger.info(f"\t{prob}: {prob_cost}")
probs_list.sort(key=lambda c: c[0], reverse=True)
min_likelihood = ((1) / (self.n_c * self.n_s)) / (self.n_s ** (self.m_max))
result = SubsetSimulationResult(
probs_list=probs_list,
over_target_cost=None,
over_target_likelihood=None,
under_target_cost=None,
under_target_likelihood=None,
lowest_likelihood=min_likelihood,
messages=output_messages,
)
return result
def get_stdevs_from_arrays(
self, array
) -> pdme.subspace_simulation.MCMCStandardDeviation:
# stdevs = get_stdevs_from_arrays(next_seeds_as_array, model)
if self.use_adaptive_steps:
stdev_array = []
count = array.shape[1]
for dipole_index in range(count):
selected = array[:, dipole_index]
pxs = selected[:, 0]
pys = selected[:, 1]
pzs = selected[:, 2]
thetas = numpy.arccos(pzs / self.model.pfixed)
phis = numpy.arctan2(pys, pxs)
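# the moments (px, py, pz) have fixed magnitude pfixed, so the spherical angles
# are recovered as theta = arccos(pz / pfixed) and phi = arctan2(py, px)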
rstdevs = numpy.maximum(
numpy.std(selected, axis=0)[3:6],
self.default_r_step / (self.n_s * 10),
)
frequency_stdevs = numpy.minimum(
numpy.maximum(
numpy.std(numpy.log(selected[:, -1])),
self.default_w_log_step / (self.n_s * 10),
),
self.default_upper_w_log_step,
)
stdev_array.append(
pdme.subspace_simulation.DipoleStandardDeviation(
p_theta_step=max(
numpy.std(thetas), self.default_theta_step / (self.n_s * 10)
),
p_phi_step=max(
numpy.std(phis), self.default_phi_step / (self.n_s * 10)
),
rx_step=rstdevs[0],
ry_step=rstdevs[1],
rz_step=rstdevs[2],
w_log_step=frequency_stdevs,
)
)
else:
default_stdev = pdme.subspace_simulation.DipoleStandardDeviation(
self.default_phi_step,
self.default_theta_step,
self.default_r_step,
self.default_r_step,
self.default_r_step,
self.default_w_log_step,
)
stdev_array = [default_stdev]
stdevs = pdme.subspace_simulation.MCMCStandardDeviation(stdev_array)
return stdevs
class MultiSubsetSimulations:
def __init__(
self,
model_name_pairs: Sequence[Tuple[str, pdme.model.DipoleModel]],
# actual_measurements: Sequence[pdme.measurement.DotMeasurement],
cost_function: Callable[[numpy.ndarray], numpy.ndarray],
num_runs: int,
n_c: int,
n_s: int,
m_max: int,
target_cost: float,
num_initial_dmc_gens: int = 1,
level_0_seed_seed: int = 200,
mcmc_seed_seed: int = 20,
use_adaptive_steps=True,
default_phi_step=0.01,
default_theta_step=0.01,
default_r_step=0.01,
default_w_log_step=0.01,
default_upper_w_log_step=4,
initial_cost_chunk_size=100,
cap_core_count: int = 0, # 0 means cap at num cores - 1
):
self.model_name_pairs = model_name_pairs
self.cost_function = cost_function
self.num_runs = num_runs
self.n_c = n_c
self.n_s = n_s
self.m_max = m_max
self.target_cost = target_cost # This is not optional here!
self.num_dmc_gens = num_initial_dmc_gens
self.level_0_seed_seed = level_0_seed_seed
self.mcmc_seed_seed = mcmc_seed_seed
self.use_adaptive_steps = use_adaptive_steps
self.default_phi_step = default_phi_step
self.default_theta_step = default_theta_step
self.default_r_step = default_r_step
self.default_w_log_step = default_w_log_step
self.default_upper_w_log_step = default_upper_w_log_step
self.initial_cost_chunk_size = initial_cost_chunk_size
self.cap_core_count = cap_core_count
def execute(self) -> Sequence[MultiSubsetSimulationResult]:
output: List[MultiSubsetSimulationResult] = []
for model_index, model_name_pair in enumerate(self.model_name_pairs):
ss_results = [
SubsetSimulation(
model_name_pair,
self.cost_function,
self.n_c,
self.n_s,
self.m_max,
self.target_cost,
num_initial_dmc_gens=self.num_dmc_gens,
level_0_seed=[model_index, run_index, self.level_0_seed_seed],
mcmc_seed=[model_index, run_index, self.mcmc_seed_seed],
use_adaptive_steps=self.use_adaptive_steps,
default_phi_step=self.default_phi_step,
default_theta_step=self.default_theta_step,
default_r_step=self.default_r_step,
default_w_log_step=self.default_w_log_step,
default_upper_w_log_step=self.default_upper_w_log_step,
keep_probs_list=False,
dump_last_generation_to_file=False,
initial_cost_chunk_size=self.initial_cost_chunk_size,
cap_core_count=self.cap_core_count,
).execute()
for run_index in range(self.num_runs)
]
output.append(coalesce_ss_results(model_name_pair[0], ss_results))
return output
def coalesce_ss_results(
model_name: str, results: Sequence[SubsetSimulationResult]
) -> MultiSubsetSimulationResult:
num_finished = sum(1 for res in results if res.under_target_likelihood is not None)
estimated_likelihoods = numpy.array(
[
res.under_target_likelihood
if res.under_target_likelihood is not None
else res.lowest_likelihood
for res in results
]
)
_logger.info(estimated_likelihoods)
geometric_mean_estimated_likelihoods = numpy.exp(
numpy.log(estimated_likelihoods).mean()
)
_logger.info(geometric_mean_estimated_likelihoods)
arithmetic_mean_estimated_likelihoods = estimated_likelihoods.mean()
result = MultiSubsetSimulationResult(
child_results=results,
model_name=model_name,
estimated_likelihood=geometric_mean_estimated_likelihoods,
arithmetic_mean_estimated_likelihood=arithmetic_mean_estimated_likelihoods,
num_children=len(results),
num_finished_children=num_finished,
clean_estimate=num_finished == len(results),
)
return result
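# Note: the geometric mean (exp of the mean log-likelihood) is reported as the
# headline estimate, with the arithmetic mean kept alongside for comparison.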
def reverse_bisect_right(a, x, lo=0, hi=None):
"""Return the index where to insert item x in list a, assuming a is sorted in descending order.
The return value i is such that all e in a[:i] have e >= x, and all e in
a[i:] have e < x. So if x already appears in the list, a.insert(i, x) will
insert just after the rightmost x already there.
Optional args lo (default 0) and hi (default len(a)) bound the
slice of a to be searched.
Essentially, the function returns the number of elements in a which are >= x.
>>> a = [8, 6, 5, 4, 2]
>>> reverse_bisect_right(a, 5)
3
>>> a[:reverse_bisect_right(a, 5)]
[8, 6, 5]
"""
if lo < 0:
raise ValueError("lo must be non-negative")
if hi is None:
hi = len(a)
while lo < hi:
mid = (lo + hi) // 2
if x > a[mid]:
hi = mid
else:
lo = mid + 1
return lo
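A worked sketch, with made-up numbers, of the per-chain probability bookkeeping used in execute() above: a chain's rank within its level sets the conditional probability, and each completed level contributes another factor of 1/n_s:

n_c, n_s, level = 100, 10, 2
cost_index = 250  # rank of this chain among the n_c * n_s chains at this level
likelihood = ((n_c * n_s - cost_index) / (n_c * n_s)) / (n_s ** level)
# -> ((1000 - 250) / 1000) / 100 = 0.0075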


@ -0,0 +1,231 @@
import pdme.inputs
import pdme.model
import pdme.measurement
import pdme.measurement.input_types
import pdme.measurement.oscillating_dipole
import pdme.util.fast_v_calc
import pdme.util.fast_nonlocal_spectrum
from typing import Sequence, Tuple, List, Dict, Union, Mapping
import datetime
import csv
import multiprocessing
import logging
import numpy
# TODO: remove hardcode
CHUNKSIZE = 50
_logger = logging.getLogger(__name__)
def get_a_result_fast_filter(input) -> int:
# (
# model,
# self.dot_inputs_array_dict,
# low_high_dict,
# self.monte_carlo_count,
# seed,
# )
model, dot_inputs_dict, low_high_dict, monte_carlo_count, seed = input
rng = numpy.random.default_rng(seed)
# TODO: A long term refactor is to pull the frequency stuff out from here. The None stands for max_frequency, which is unneeded in the actually useful models.
sample_dipoles = model.get_monte_carlo_dipole_inputs(
monte_carlo_count, None, rng_to_use=rng
)
current_sample = sample_dipoles
for temp in dot_inputs_dict.keys():
dot_inputs = dot_inputs_dict[temp]
lows, highs = low_high_dict[temp]
for di, low, high in zip(dot_inputs, lows, highs):
if len(current_sample) < 1:
break
vals = pdme.util.fast_v_calc.fast_vs_for_asymmetric_dipoleses(
numpy.array([di]), current_sample, temp
)
current_sample = current_sample[
numpy.all((vals > low) & (vals < high), axis=1)
]
return len(current_sample)
class TempAwareRealSpectrumRun:
"""
A bayes run given some real data, with potentially variable temperature.
Parameters
----------
measurements_dict : Dict[float, Sequence[pdme.measurement.DotRangeMeasurement]]
The dot inputs for this bayes run, in a dictionary indexed by temperatures
models_with_names : Sequence[Tuple[str, pdme.model.DipoleModel]]
The models to evaluate.
filename_slug : str
The filename slug to include in the output filename.
"""
def __init__(
self,
measurements_dict: Mapping[
float, Sequence[pdme.measurement.DotRangeMeasurement]
],
models_with_names: Sequence[Tuple[str, pdme.model.DipoleModel]],
filename_slug: str,
monte_carlo_count: int = 10000,
monte_carlo_cycles: int = 10,
target_success: int = 100,
max_monte_carlo_cycles_steps: int = 10,
chunksize: int = CHUNKSIZE,
initial_seed: int = 12345,
cap_core_count: int = 0,  # 0 means cap at num cores - 1
) -> None:
self.measurements_dict = measurements_dict
self.dot_inputs_dict = {
k: [(measure.r, measure.f) for measure in measurements]
for k, measurements in measurements_dict.items()
}
self.dot_inputs_array_dict = {
k: pdme.measurement.input_types.dot_inputs_to_array(dot_inputs)
for k, dot_inputs in self.dot_inputs_dict.items()
}
self.models = [model for (_, model) in models_with_names]
self.model_names = [name for (name, _) in models_with_names]
self.model_count = len(self.models)
self.monte_carlo_count = monte_carlo_count
self.monte_carlo_cycles = monte_carlo_cycles
self.target_success = target_success
self.max_monte_carlo_cycles_steps = max_monte_carlo_cycles_steps
self.csv_fields = []
self.compensate_zeros = True
self.chunksize = chunksize
for name in self.model_names:
self.csv_fields.extend([f"{name}_success", f"{name}_count", f"{name}_prob"])
# for now initialise priors as uniform.
self.probabilities = [1 / self.model_count] * self.model_count
timestamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
ff_string = "fast_filter"
self.filename = f"{timestamp}-{filename_slug}.realdata.{ff_string}.bayesrun.csv"
self.initial_seed = initial_seed
self.cap_core_count = cap_core_count
def go(self) -> None:
with open(self.filename, "a", newline="") as outfile:
writer = csv.DictWriter(outfile, fieldnames=self.csv_fields, dialect="unix")
writer.writeheader()
low_high_dict = {}
for temp, measurements in self.measurements_dict.items():
(
lows,
highs,
) = pdme.measurement.input_types.dot_range_measurements_low_high_arrays(
measurements
)
low_high_dict[temp] = (lows, highs)
# define a new seed sequence for each run
seed_sequence = numpy.random.SeedSequence(self.initial_seed)
results = []
_logger.debug("Going to iterate over models now")
core_count = multiprocessing.cpu_count() - 1 or 1
if (self.cap_core_count >= 1) and (self.cap_core_count < core_count):
core_count = self.cap_core_count
_logger.info(f"Using {core_count} cores")
for model_count, (model, model_name) in enumerate(
zip(self.models, self.model_names)
):
_logger.debug(f"Doing model #{model_count}: {model_name}")
with multiprocessing.Pool(core_count) as pool:
cycle_count = 0
cycle_success = 0
cycles = 0
while (cycles < self.max_monte_carlo_cycles_steps) and (
cycle_success <= self.target_success
):
_logger.debug(f"Starting cycle {cycles}")
cycles += 1
current_success = 0
cycle_count += self.monte_carlo_count * self.monte_carlo_cycles
# generate a seed from the sequence for each core.
# note this needs to be inside the loop for monte carlo cycle steps!
# that way we get more stuff.
seeds = seed_sequence.spawn(self.monte_carlo_cycles)
result_func = get_a_result_fast_filter
current_success = sum(
pool.imap_unordered(
result_func,
[
(
model,
self.dot_inputs_array_dict,
low_high_dict,
self.monte_carlo_count,
seed,
)
for seed in seeds
],
self.chunksize,
)
)
cycle_success += current_success
_logger.debug(f"current running successes: {cycle_success}")
results.append((cycle_count, cycle_success))
_logger.debug("Done, constructing output now")
row: Dict[str, Union[int, float, str]] = {}
successes: List[float] = []
counts: List[int] = []
for model_index, (name, (count, result)) in enumerate(
zip(self.model_names, results)
):
row[f"{name}_success"] = result
row[f"{name}_count"] = count
successes.append(max(result, 0.5))  # floor zero successes at 0.5 so a model's posterior never collapses to exactly zero
counts.append(count)
success_weight = sum(
[
(succ / count) * prob
for succ, count, prob in zip(successes, counts, self.probabilities)
]
)
new_probabilities = [
(succ / count) * old_prob / success_weight
for succ, count, old_prob in zip(successes, counts, self.probabilities)
]
self.probabilities = new_probabilities
for name, probability in zip(self.model_names, self.probabilities):
row[f"{name}_prob"] = probability
_logger.info(row)
with open(self.filename, "a", newline="") as outfile:
writer = csv.DictWriter(outfile, fieldnames=self.csv_fields, dialect="unix")
writer.writerow(row)
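A self-contained sketch of the sequential rejection filtering that get_a_result_fast_filter performs above, using synthetic values in place of the pdme spectrum calls: candidates are dropped as soon as any computed value falls outside its measured low/high window, and the survivor count is the success tally returned per chunk.

import numpy

rng = numpy.random.default_rng(0)
samples = rng.uniform(0, 1, size=(1000, 3))  # 1000 hypothetical parameter sets
for low, high in [(0.2, 0.9), (0.1, 0.8), (0.3, 0.7)]:  # one window per dot input
    if len(samples) < 1:
        break
    vals = samples.mean(axis=1)  # stand-in for the computed spectrum value
    samples = samples[(vals > low) & (vals < high)]
print(len(samples))  # the "success" count for this Monte Carlo chunk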

29
do.sh

@ -1,29 +0,0 @@
#!/usr/bin/env bash
# Do - The Simplest Build Tool on Earth.
# Documentation and examples see https://github.com/8gears/do
set -Eeuo pipefail # -e "Automatic exit from bash shell script on error" -u "Treat unset variables and parameters as errors"
build() {
echo "I am ${FUNCNAME[0]}ing"
poetry build
}
test() {
echo "I am ${FUNCNAME[0]}ing"
poetry run flake8 deepdog tests
poetry run mypy deepdog
poetry run pytest
}
htmlcov() {
poetry run pytest --cov-report=html
}
all() {
build && test
}
"$@" # <- execute the task
[ "$#" -gt 0 ] || printf "Usage:\n\t./do.sh %s\n" "($(compgen -A function | grep '^[^_]' | paste -sd '|' -))"

174
flake.lock generated Normal file

@ -0,0 +1,174 @@
{
"nodes": {
"flake-utils": {
"inputs": {
"systems": "systems"
},
"locked": {
"lastModified": 1710146030,
"narHash": "sha256-SZ5L6eA7HJ/nmkzGG7/ISclqe6oZdOZTNoesiInkXPQ=",
"owner": "numtide",
"repo": "flake-utils",
"rev": "b1d9ab70662946ef0850d488da1c9019f3a9752a",
"type": "github"
},
"original": {
"owner": "numtide",
"repo": "flake-utils",
"type": "github"
}
},
"flake-utils_2": {
"inputs": {
"systems": "systems_2"
},
"locked": {
"lastModified": 1705309234,
"narHash": "sha256-uNRRNRKmJyCRC/8y1RqBkqWBLM034y4qN7EprSdmgyA=",
"owner": "numtide",
"repo": "flake-utils",
"rev": "1ef2e671c3b0c19053962c07dbda38332dcebf26",
"type": "github"
},
"original": {
"owner": "numtide",
"repo": "flake-utils",
"type": "github"
}
},
"nix-github-actions": {
"inputs": {
"nixpkgs": [
"poetry2nixSrc",
"nixpkgs"
]
},
"locked": {
"lastModified": 1703863825,
"narHash": "sha256-rXwqjtwiGKJheXB43ybM8NwWB8rO2dSRrEqes0S7F5Y=",
"owner": "nix-community",
"repo": "nix-github-actions",
"rev": "5163432afc817cf8bd1f031418d1869e4c9d5547",
"type": "github"
},
"original": {
"owner": "nix-community",
"repo": "nix-github-actions",
"type": "github"
}
},
"nixpkgs": {
"locked": {
"lastModified": 1710703777,
"narHash": "sha256-M4CNAgjrtvrxIWIAc98RTYcVFoAgwUhrYekeiMScj18=",
"owner": "NixOS",
"repo": "nixpkgs",
"rev": "fc7885fbcea4b782142e06ce2d4d08cf92862004",
"type": "github"
},
"original": {
"owner": "NixOS",
"repo": "nixpkgs",
"type": "github"
}
},
"poetry2nixSrc": {
"inputs": {
"flake-utils": "flake-utils_2",
"nix-github-actions": "nix-github-actions",
"nixpkgs": [
"nixpkgs"
],
"systems": "systems_3",
"treefmt-nix": "treefmt-nix"
},
"locked": {
"lastModified": 1708589824,
"narHash": "sha256-2GOiFTkvs5MtVF65sC78KNVxQSmsxtk0WmV1wJ9V2ck=",
"owner": "nix-community",
"repo": "poetry2nix",
"rev": "3c92540611f42d3fb2d0d084a6c694cd6544b609",
"type": "github"
},
"original": {
"owner": "nix-community",
"repo": "poetry2nix",
"type": "github"
}
},
"root": {
"inputs": {
"flake-utils": "flake-utils",
"nixpkgs": "nixpkgs",
"poetry2nixSrc": "poetry2nixSrc"
}
},
"systems": {
"locked": {
"lastModified": 1681028828,
"narHash": "sha256-Vy1rq5AaRuLzOxct8nz4T6wlgyUR7zLU309k9mBC768=",
"owner": "nix-systems",
"repo": "default",
"rev": "da67096a3b9bf56a91d16901293e51ba5b49a27e",
"type": "github"
},
"original": {
"owner": "nix-systems",
"repo": "default",
"type": "github"
}
},
"systems_2": {
"locked": {
"lastModified": 1681028828,
"narHash": "sha256-Vy1rq5AaRuLzOxct8nz4T6wlgyUR7zLU309k9mBC768=",
"owner": "nix-systems",
"repo": "default",
"rev": "da67096a3b9bf56a91d16901293e51ba5b49a27e",
"type": "github"
},
"original": {
"owner": "nix-systems",
"repo": "default",
"type": "github"
}
},
"systems_3": {
"locked": {
"lastModified": 1681028828,
"narHash": "sha256-Vy1rq5AaRuLzOxct8nz4T6wlgyUR7zLU309k9mBC768=",
"owner": "nix-systems",
"repo": "default",
"rev": "da67096a3b9bf56a91d16901293e51ba5b49a27e",
"type": "github"
},
"original": {
"id": "systems",
"type": "indirect"
}
},
"treefmt-nix": {
"inputs": {
"nixpkgs": [
"poetry2nixSrc",
"nixpkgs"
]
},
"locked": {
"lastModified": 1708335038,
"narHash": "sha256-ETLZNFBVCabo7lJrpjD6cAbnE11eDOjaQnznmg/6hAE=",
"owner": "numtide",
"repo": "treefmt-nix",
"rev": "e504621290a1fd896631ddbc5e9c16f4366c9f65",
"type": "github"
},
"original": {
"owner": "numtide",
"repo": "treefmt-nix",
"type": "github"
}
}
},
"root": "root",
"version": 7
}

47
flake.nix Normal file

@ -0,0 +1,47 @@
{
description = "Application packaged using poetry2nix";
inputs.flake-utils.url = "github:numtide/flake-utils";
inputs.nixpkgs.url = "github:NixOS/nixpkgs";
inputs.poetry2nixSrc = {
url = "github:nix-community/poetry2nix";
inputs.nixpkgs.follows = "nixpkgs";
};
outputs = { self, nixpkgs, flake-utils, poetry2nixSrc }:
flake-utils.lib.eachDefaultSystem (system:
let
pkgs = nixpkgs.legacyPackages.${system};
poetry2nix = poetry2nixSrc.lib.mkPoetry2Nix { inherit pkgs; };
in {
packages = {
deepdogApp = poetry2nix.mkPoetryApplication {
projectDir = self;
python = pkgs.python39;
preferWheels = true;
};
deepdogEnv = poetry2nix.mkPoetryEnv {
projectDir = self;
python = pkgs.python39;
preferWheels = true;
overrides = poetry2nix.overrides.withDefaults (self: super: {
});
};
default = self.packages.${system}.deepdogEnv;
};
devShells.default = pkgs.mkShell {
inputsFrom = [ self.packages.${system}.deepdogEnv ];
buildInputs = [
pkgs.poetry
self.packages.${system}.deepdogEnv
self.packages.${system}.deepdogApp
pkgs.just
pkgs.nodejs
];
shellHook = ''
export DO_NIX_CUSTOM=1
'';
};
}
);
}


@ -1,9 +1,11 @@
apiVersion: v1
kind: Pod
spec:
imagePullSecrets:
- name: regcreds
containers: # list of containers that you want present for your build, you can define a default container in the Jenkinsfile
- name: python
image: python:3.8
- name: poetry
image: ghcr.io/dmallubhotla/poetry-image:1
command: ["tail", "-f", "/dev/null"] # this or any command that is basically a noop is required, this is so that you don't overwrite the entrypoint of the base container
imagePullPolicy: Always # use cache or pull image for agent
resources: # limits the resources your build container

60
justfile Normal file

@ -0,0 +1,60 @@
# execute default build
default: build
# builds the python module using poetry
build:
echo "building..."
poetry build
# print a message displaying whether nix is being used
checknix:
#!/usr/bin/env bash
set -euxo pipefail
if [[ "${DO_NIX_CUSTOM:=0}" -eq 1 ]]; then
echo "In an interactive nix env."
else
echo "Using poetry as runner, no nix detected."
fi
# run all tests
test: fmt
#!/usr/bin/env bash
set -euxo pipefail
if [[ "${DO_NIX_CUSTOM:=0}" -eq 1 ]]; then
echo "testing, using nix..."
flake8 deepdog tests
mypy deepdog
pytest
else
echo "testing..."
poetry run flake8 deepdog tests
poetry run mypy deepdog
poetry run pytest
fi
# format code
fmt:
#!/usr/bin/env bash
set -euxo pipefail
if [[ "${DO_NIX_CUSTOM:=0}" -eq 1 ]]; then
black .
else
poetry run black .
fi
find deepdog -type f -name "*.py" -exec sed -i -e 's/    /\t/g' {} \;
find tests -type f -name "*.py" -exec sed -i -e 's/    /\t/g' {} \;
# release the app, checking that our working tree is clean and ready for release, optionally takes target version
release version="":
#!/usr/bin/env bash
set -euxo pipefail
if [[ -n "{{version}}" ]]; then
./scripts/release.sh {{version}}
else
./scripts/release.sh
fi
htmlcov:
poetry run pytest --cov-report=html

1425
poetry.lock generated

File diff suppressed because it is too large

pyproject.toml

@ -1,18 +1,28 @@
[tool.poetry]
name = "deepdog"
version = "0.1.5"
version = "1.7.0"
description = ""
authors = ["Deepak Mallubhotla <dmallubhotla+github@gmail.com>"]
[tool.poetry.dependencies]
python = "^3.8,<3.10"
pdme = "^0.4.1"
python = ">=3.8.1,<3.10"
pdme = "^1.5.0"
numpy = "1.22.3"
scipy = "1.10"
tqdm = "^4.66.2"
[tool.poetry.dev-dependencies]
pytest = ">=6"
flake8 = "^4.0.1"
pytest-cov = "^3.0.0"
mypy = "^0.931"
pytest-cov = "^4.1.0"
mypy = "^0.971"
python-semantic-release = "^7.24.0"
black = "^22.3.0"
syrupy = "^4.0.8"
[tool.poetry.scripts]
probs = "deepdog.cli.probs:wrapped_main"
subset_sim_probs = "deepdog.cli.subset_sim_probs:wrapped_main"
[build-system]
requires = ["poetry-core>=1.0.0"]
@ -32,3 +42,14 @@ module = [
"scipy.optimize"
]
ignore_missing_imports = true
[[tool.mypy.overrides]]
module = [
"tqdm",
"tqdm.*"
]
ignore_missing_imports = true
[tool.semantic_release]
version_toml = "pyproject.toml:tool.poetry.version"
tag_format = "{version}"

3
renovate.json Normal file

@ -0,0 +1,3 @@
{
"$schema": "https://docs.renovatebot.com/renovate-schema.json"
}


@ -1,29 +0,0 @@
#!/usr/bin/env bash
set -Eeuo pipefail
if [ -z "$(git status --porcelain)" ]; then
# Working directory clean
branch_name=$(git symbolic-ref -q HEAD)
branch_name=${branch_name##refs/heads/}
branch_name=${branch_name:-HEAD}
poetry version patch
version=`sed 's/version = "\([0-9]*.[0-9]*.[0-9]*\)"/\1/p' -n <pyproject.toml`
read -p "Create commit for version $version? " -n 1 -r
echo # (optional) move to a new line
if [[ $REPLY =~ ^[Yy]$ ]]
then
# do dangerous stuff
echo "Creating a new patch"
git add pyproject.toml
git commit -m "Created version $version"
git tag -a "$version" -m "patch.sh created version $version"
git push --tags
else
echo "Surrendering, clean up by reverting pyproject.toml..."
exit 2
fi
else
echo "Can't create patch version, working tree unclean..."
exit 1
fi

52
scripts/release.sh Normal file

@ -0,0 +1,52 @@
#!/usr/bin/env bash
set -Eeuo pipefail
if [ -z "$(git status --porcelain)" ]; then
branch_name=$(git symbolic-ref -q HEAD)
branch_name=${branch_name##refs/heads/}
branch_name=${branch_name:-HEAD}
if [ $branch_name != "master" ]; then
echo "The current branch is not master!"
echo "I'd feel uncomfortable releasing from here..."
exit 3
fi
release_needed=false
if \
{ git log "$( git describe --tags --abbrev=0 )..HEAD" --format='%s' | cut -d: -f1 | sort -u | sed -e 's/([^)]*)//' | grep -q -i -E '^feat|fix|perf|refactor|revert$' ; } || \
{ git log "$( git describe --tags --abbrev=0 )..HEAD" --format='%s' | cut -d: -f1 | sort -u | sed -e 's/([^)]*)//' | grep -q -E '\!$' ; } || \
{ git log "$( git describe --tags --abbrev=0 )..HEAD" --format='%b' | grep -q -E '^BREAKING CHANGE:' ; }
then
release_needed=true
fi
if ! [ "$release_needed" = true ]; then
echo "No release needed..."
exit 0
fi
std_version_args=()
if [[ -n "${1:-}" ]]; then
std_version_args+=( "--release-as" "$1" )
echo "Parameter $1 was supplied, so we should use release-as"
else
echo "No release-as parameter specifed."
fi
# Working directory clean
echo "Doing a dry run..."
npx standard-version --dry-run "${std_version_args[@]}"
read -p "Does that look good? [y/N] " -n 1 -r
echo # (optional) move to a new line
if [[ $REPLY =~ ^[Yy]$ ]]
then
# do dangerous stuff
npx standard-version "${std_version_args[@]}"
git push --follow-tags origin master
else
echo "okay, never mind then..."
exit 2
fi
else
echo "Can't create release, working tree unclean..."
exit 1
fi


@ -0,0 +1,11 @@
const pattern = /(\[tool\.poetry\]\nname = "deepdog"\nversion = ")(?<vers>\d+\.\d+\.\d+)(")/mg;
module.exports.readVersion = function (contents) {
const result = pattern.exec(contents);
return result.groups.vers;
}
module.exports.writeVersion = function (contents, version) {
const newContents = contents.replace(pattern, `$1${version}$3`);
return newContents;
}


@ -0,0 +1,177 @@
# serializer version: 1
# name: test_basic_analysis
list([
dict({
'connors_geom-5height-orientation_fixedxy-pfixexp_3-dipole_count_2_likelihood': 0.1,
'connors_geom-5height-orientation_fixedxy-pfixexp_3-dipole_count_2_prob': 0.3333333333333333,
'connors_geom-5height-orientation_fixedz-pfixexp_3-dipole_count_2_likelihood': 0.1,
'connors_geom-5height-orientation_fixedz-pfixexp_3-dipole_count_2_prob': 0.3333333333333333,
'connors_geom-5height-orientation_free-pfixexp_3-dipole_count_2_likelihood': 0.1,
'connors_geom-5height-orientation_free-pfixexp_3-dipole_count_2_prob': 0.3333333333333333,
'dipole_frequency_1': 0.006029931414230269,
'dipole_frequency_2': 85436.78758379082,
'dipole_location_1': array([-4.76615152, -6.33160296, 5.29522808]),
'dipole_location_2': array([-4.72700391, -2.06478573, 6.52467702]),
'dipole_moment_1': array([ 860.14181416, -450.27082062, -239.60852996]),
'dipole_moment_2': array([ 908.18325588, -208.52681777, -362.93214244]),
}),
dict({
'connors_geom-5height-orientation_fixedxy-pfixexp_3-dipole_count_2_likelihood': 0.45,
'connors_geom-5height-orientation_fixedxy-pfixexp_3-dipole_count_2_prob': 0.3103448275862069,
'connors_geom-5height-orientation_fixedz-pfixexp_3-dipole_count_2_likelihood': 0.9,
'connors_geom-5height-orientation_fixedz-pfixexp_3-dipole_count_2_prob': 0.6206896551724138,
'connors_geom-5height-orientation_free-pfixexp_3-dipole_count_2_likelihood': 0.1,
'connors_geom-5height-orientation_free-pfixexp_3-dipole_count_2_prob': 0.06896551724137932,
'dipole_frequency_1': 102275.63477261562,
'dipole_frequency_2': 1755280.9783485082,
'dipole_location_1': array([ 4.71515397, -9.70362197, 5.43016546]),
'dipole_location_2': array([3.42476038, 3.88562934, 5.15034328]),
'dipole_moment_1': array([-502.60742674, -790.60222587, 349.7626267 ]),
'dipole_moment_2': array([-192.42708465, -434.81009148, -879.7226844 ]),
}),
dict({
'connors_geom-5height-orientation_fixedxy-pfixexp_3-dipole_count_2_likelihood': 0.7,
'connors_geom-5height-orientation_fixedxy-pfixexp_3-dipole_count_2_prob': 0.6631578947368421,
'connors_geom-5height-orientation_fixedz-pfixexp_3-dipole_count_2_likelihood': 0.1,
'connors_geom-5height-orientation_fixedz-pfixexp_3-dipole_count_2_prob': 0.18947368421052635,
'connors_geom-5height-orientation_free-pfixexp_3-dipole_count_2_likelihood': 0.7,
'connors_geom-5height-orientation_free-pfixexp_3-dipole_count_2_prob': 0.1473684210526316,
'dipole_frequency_1': 2896.799464036654,
'dipole_frequency_2': 9.980565189326681e-05,
'dipole_location_1': array([-4.97465789, 12.54716531, 6.06324588]),
'dipole_location_2': array([ 9.84518459, -11.1183876 , 7.35028226]),
'dipole_moment_1': array([997.67961917, 19.6376112 , 65.19004305]),
'dipole_moment_2': array([305.63093655, 440.57669389, 844.08643362]),
}),
dict({
'connors_geom-5height-orientation_fixedxy-pfixexp_3-dipole_count_2_likelihood': 0.1,
'connors_geom-5height-orientation_fixedxy-pfixexp_3-dipole_count_2_prob': 0.663157894736842,
'connors_geom-5height-orientation_fixedz-pfixexp_3-dipole_count_2_likelihood': 0.1,
'connors_geom-5height-orientation_fixedz-pfixexp_3-dipole_count_2_prob': 0.18947368421052635,
'connors_geom-5height-orientation_free-pfixexp_3-dipole_count_2_likelihood': 0.1,
'connors_geom-5height-orientation_free-pfixexp_3-dipole_count_2_prob': 0.1473684210526316,
'dipole_frequency_1': 1.4522667818288244,
'dipole_frequency_2': 2704.9795645301197,
'dipole_location_1': array([ 7.38183022, 16.6745801 , 7.10428414]),
'dipole_location_2': array([-8.15636906, -9.56609132, 6.34141559]),
'dipole_moment_1': array([-145.9924693 , 738.74936496, 657.97839986]),
'dipole_moment_2': array([-960.16113239, 104.96824669, -258.98314046]),
}),
dict({
'connors_geom-5height-orientation_fixedxy-pfixexp_3-dipole_count_2_likelihood': 0.9,
'connors_geom-5height-orientation_fixedxy-pfixexp_3-dipole_count_2_prob': 0.9465776293823038,
'connors_geom-5height-orientation_fixedz-pfixexp_3-dipole_count_2_likelihood': 0.1,
'connors_geom-5height-orientation_fixedz-pfixexp_3-dipole_count_2_prob': 0.030050083472454105,
'connors_geom-5height-orientation_free-pfixexp_3-dipole_count_2_likelihood': 0.1,
'connors_geom-5height-orientation_free-pfixexp_3-dipole_count_2_prob': 0.02337228714524208,
'dipole_frequency_1': 3827.2315421318913,
'dipole_frequency_2': 1.9301094166184413e-05,
'dipole_location_1': array([ 5.02067673, -0.9783039 , 6.1431897 ]),
'dipole_location_2': array([ 4.66628999, 10.80907459, 7.21771744]),
'dipole_moment_1': array([ 871.30659253, -299.17389491, -388.99846068]),
'dipole_moment_2': array([-189.87268624, 677.28285845, 710.79975568]),
}),
])
# ---
# name: test_bayesss_with_tighter_cost
list([
dict({
'connors_geom-5height-orientation_fixedxy-pfixexp_3-dipole_count_2_likelihood': 9.765625e-06,
'connors_geom-5height-orientation_fixedxy-pfixexp_3-dipole_count_2_prob': 0.33333333333333337,
'connors_geom-5height-orientation_fixedz-pfixexp_3-dipole_count_2_likelihood': 9.765625e-06,
'connors_geom-5height-orientation_fixedz-pfixexp_3-dipole_count_2_prob': 0.33333333333333337,
'connors_geom-5height-orientation_free-pfixexp_3-dipole_count_2_likelihood': 9.765625e-06,
'connors_geom-5height-orientation_free-pfixexp_3-dipole_count_2_prob': 0.33333333333333337,
'dipole_frequency_1': 0.006029931414230269,
'dipole_frequency_2': 85436.78758379082,
'dipole_location_1': array([-4.76615152, -6.33160296, 5.29522808]),
'dipole_location_2': array([-4.72700391, -2.06478573, 6.52467702]),
'dipole_moment_1': array([ 860.14181416, -450.27082062, -239.60852996]),
'dipole_moment_2': array([ 908.18325588, -208.52681777, -362.93214244]),
}),
dict({
'connors_geom-5height-orientation_fixedxy-pfixexp_3-dipole_count_2_likelihood': 0.0109375,
'connors_geom-5height-orientation_fixedxy-pfixexp_3-dipole_count_2_prob': 0.1044776119402985,
'connors_geom-5height-orientation_fixedz-pfixexp_3-dipole_count_2_likelihood': 0.03125,
'connors_geom-5height-orientation_fixedz-pfixexp_3-dipole_count_2_prob': 0.2985074626865672,
'connors_geom-5height-orientation_free-pfixexp_3-dipole_count_2_likelihood': 0.0625,
'connors_geom-5height-orientation_free-pfixexp_3-dipole_count_2_prob': 0.5970149253731344,
'dipole_frequency_1': 102275.63477261562,
'dipole_frequency_2': 1755280.9783485082,
'dipole_location_1': array([ 4.71515397, -9.70362197, 5.43016546]),
'dipole_location_2': array([3.42476038, 3.88562934, 5.15034328]),
'dipole_moment_1': array([-502.60742674, -790.60222587, 349.7626267 ]),
'dipole_moment_2': array([-192.42708465, -434.81009148, -879.7226844 ]),
}),
dict({
'connors_geom-5height-orientation_fixedxy-pfixexp_3-dipole_count_2_likelihood': 9.765625e-06,
'connors_geom-5height-orientation_fixedxy-pfixexp_3-dipole_count_2_prob': 7.291135021404688e-05,
'connors_geom-5height-orientation_fixedz-pfixexp_3-dipole_count_2_likelihood': 0.021875,
'connors_geom-5height-orientation_fixedz-pfixexp_3-dipole_count_2_prob': 0.4666326413699001,
'connors_geom-5height-orientation_free-pfixexp_3-dipole_count_2_likelihood': 0.0125,
'connors_geom-5height-orientation_free-pfixexp_3-dipole_count_2_prob': 0.5332944472798858,
'dipole_frequency_1': 2896.799464036654,
'dipole_frequency_2': 9.980565189326681e-05,
'dipole_location_1': array([-4.97465789, 12.54716531, 6.06324588]),
'dipole_location_2': array([ 9.84518459, -11.1183876 , 7.35028226]),
'dipole_moment_1': array([997.67961917, 19.6376112 , 65.19004305]),
'dipole_moment_2': array([305.63093655, 440.57669389, 844.08643362]),
}),
dict({
'connors_geom-5height-orientation_fixedxy-pfixexp_3-dipole_count_2_likelihood': 9.765625e-06,
'connors_geom-5height-orientation_fixedxy-pfixexp_3-dipole_count_2_prob': 7.291135021404688e-05,
'connors_geom-5height-orientation_fixedz-pfixexp_3-dipole_count_2_likelihood': 9.765625e-06,
'connors_geom-5height-orientation_fixedz-pfixexp_3-dipole_count_2_prob': 0.4666326413699001,
'connors_geom-5height-orientation_free-pfixexp_3-dipole_count_2_likelihood': 9.765625e-06,
'connors_geom-5height-orientation_free-pfixexp_3-dipole_count_2_prob': 0.5332944472798858,
'dipole_frequency_1': 1.4522667818288244,
'dipole_frequency_2': 2704.9795645301197,
'dipole_location_1': array([ 7.38183022, 16.6745801 , 7.10428414]),
'dipole_location_2': array([-8.15636906, -9.56609132, 6.34141559]),
'dipole_moment_1': array([-145.9924693 , 738.74936496, 657.97839986]),
'dipole_moment_2': array([-960.16113239, 104.96824669, -258.98314046]),
}),
dict({
'connors_geom-5height-orientation_fixedxy-pfixexp_3-dipole_count_2_likelihood': 0.175,
'connors_geom-5height-orientation_fixedxy-pfixexp_3-dipole_count_2_prob': 0.00012008361740869356,
'connors_geom-5height-orientation_fixedz-pfixexp_3-dipole_count_2_likelihood': 0.05625,
'connors_geom-5height-orientation_fixedz-pfixexp_3-dipole_count_2_prob': 0.24702915581216964,
'connors_geom-5height-orientation_free-pfixexp_3-dipole_count_2_likelihood': 0.15,
'connors_geom-5height-orientation_free-pfixexp_3-dipole_count_2_prob': 0.7528507605704217,
'dipole_frequency_1': 3827.2315421318913,
'dipole_frequency_2': 1.9301094166184413e-05,
'dipole_location_1': array([ 5.02067673, -0.9783039 , 6.1431897 ]),
'dipole_location_2': array([ 4.66628999, 10.80907459, 7.21771744]),
'dipole_moment_1': array([ 871.30659253, -299.17389491, -388.99846068]),
'dipole_moment_2': array([-189.87268624, 677.28285845, 710.79975568]),
}),
dict({
'connors_geom-5height-orientation_fixedxy-pfixexp_3-dipole_count_2_likelihood': 9.765625e-06,
'connors_geom-5height-orientation_fixedxy-pfixexp_3-dipole_count_2_prob': 4.9116305003549454e-08,
'connors_geom-5height-orientation_fixedz-pfixexp_3-dipole_count_2_likelihood': 0.0109375,
'connors_geom-5height-orientation_fixedz-pfixexp_3-dipole_count_2_prob': 0.11316396672817797,
'connors_geom-5height-orientation_free-pfixexp_3-dipole_count_2_likelihood': 0.028125,
'connors_geom-5height-orientation_free-pfixexp_3-dipole_count_2_prob': 0.886835984155517,
'dipole_frequency_1': 1.1715179359592061e-05,
'dipole_frequency_2': 0.0019103783276337497,
'dipole_location_1': array([-0.95736547, 1.09273812, 7.47158641]),
'dipole_location_2': array([ -3.18510322, -15.64493131, 5.81623624]),
'dipole_moment_1': array([-184.64961369, 956.56786553, 225.57136075]),
'dipole_moment_2': array([ -34.63395137, 801.17771816, -597.42342885]),
}),
dict({
'connors_geom-5height-orientation_fixedxy-pfixexp_3-dipole_count_2_likelihood': 9.765625e-06,
'connors_geom-5height-orientation_fixedxy-pfixexp_3-dipole_count_2_prob': 1.977090156727901e-10,
'connors_geom-5height-orientation_fixedz-pfixexp_3-dipole_count_2_likelihood': 9.765625e-06,
'connors_geom-5height-orientation_fixedz-pfixexp_3-dipole_count_2_prob': 0.00045552157211010855,
'connors_geom-5height-orientation_free-pfixexp_3-dipole_count_2_likelihood': 0.002734375,
'connors_geom-5height-orientation_free-pfixexp_3-dipole_count_2_prob': 0.9995444782301809,
'dipole_frequency_1': 999786.9069039805,
'dipole_frequency_2': 186034.67996840767,
'dipole_location_1': array([-5.59679125, 6.3411602 , 5.33602522]),
'dipole_location_2': array([-0.03412955, -6.83522954, 5.58551513]),
'dipole_moment_1': array([826.38270589, 491.81526944, 274.24325726]),
'dipole_moment_2': array([ 202.74745884, -656.07483714, -726.95204519]),
}),
])
# ---
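
The `_prob` values in these snapshot entries follow from the `_likelihood` values by a sequential Bayesian update: each entry reweights the previous entry's probabilities by the current per-model likelihoods and renormalizes. A minimal sketch of that arithmetic, assuming equal initial priors over the three models (numbers taken from the second entry above):

import numpy

priors = numpy.array([1 / 3, 1 / 3, 1 / 3])
likelihoods = numpy.array([0.0109375, 0.03125, 0.0625])
posteriors = priors * likelihoods / numpy.sum(priors * likelihoods)
print(posteriors)  # [0.10447761 0.29850746 0.59701493], matching the _prob columns

The same update applied to the third entry, using the second entry's probabilities as priors, reproduces its `_prob` values as well.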

View File

View File

@@ -0,0 +1,26 @@
import re
import deepdog.direct_monte_carlo


def test_config_check_self():
    config = deepdog.direct_monte_carlo.DirectMonteCarloConfig(
        tag="test_tag",
        bayesrun_file_timestamp=False,
    )
    expected_filename = "test_tag.realdata.fast_filter.bayesrun.csv"
    actual_filename = config.get_filename()
    assert actual_filename == expected_filename
    regex = config.get_filename_regex()
    assert re.match(regex, actual_filename) is not None


def test_config_check_self_with_timestamp():
    config = deepdog.direct_monte_carlo.DirectMonteCarloConfig(
        tag="test_tag",
        bayesrun_file_timestamp=True,
    )
    expected_filename_ending = "test_tag.realdata.fast_filter.bayesrun.csv"
    actual_filename = config.get_filename()
    assert actual_filename.endswith(expected_filename_ending)
    regex = config.get_filename_regex()
    assert re.match(regex, actual_filename) is not None
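
Taken together, the two tests pin down the output filename scheme: a fixed `<tag>.realdata.fast_filter.bayesrun.csv` suffix, optionally prefixed by a timestamp when `bayesrun_file_timestamp=True`. A minimal sketch of that scheme, not the actual `DirectMonteCarloConfig` internals (the timestamp format is an assumption, inferred from the filename-parsing test later in this diff):

import datetime

def build_filename(tag: str, with_timestamp: bool) -> str:
    # fixed suffix asserted by both tests above
    base = f"{tag}.realdata.fast_filter.bayesrun.csv"
    if not with_timestamp:
        return base
    # assumed YYYYMMDD-HHMMSS prefix, as in "20250226-204120-..."
    return f"{datetime.datetime.now():%Y%m%d-%H%M%S}-{base}"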

View File

@@ -0,0 +1,42 @@
import deepdog.direct_monte_carlo.cost_function_filter
import numpy


def test_px_cost_function_filter_example():
    dipoles_1 = [
        [1, 2, 3, 4, 5, 6, 7],
        [2, 3, 2, 5, 4, 7, 6],
    ]
    dipoles_2 = [
        [15, 9, 8, 7, 6, 5, 3],
        [30, 4, 4, 7, 3, 1, 4],
    ]
    dipoleses = numpy.array([dipoles_1, dipoles_2])

    def cost_function(dipoleses: numpy.ndarray) -> numpy.ndarray:
        # a sample's cost is the largest p_x component across its dipoles
        return dipoleses[:, :, 0].max(axis=-1)

    expected_costs = numpy.array([2, 30])
    numpy.testing.assert_array_equal(cost_function(dipoleses), expected_costs)

    filter = deepdog.direct_monte_carlo.cost_function_filter.CostFunctionTargetFilter(
        cost_function, 5
    )
    actual_filtered = filter.filter_samples(dipoleses)
    expected_filtered = numpy.array([dipoles_1])
    assert actual_filtered.size != 0
    numpy.testing.assert_array_equal(actual_filtered, expected_filtered)

    filter_stricter = (
        deepdog.direct_monte_carlo.cost_function_filter.CostFunctionTargetFilter(
            cost_function, 0.5
        )
    )
    actual_filtered_stricter = filter_stricter.filter_samples(dipoleses)
    assert actual_filtered_stricter.size == 0
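
The assertions above pin down the filter's contract: compute one scalar cost per sample, then keep only the samples whose cost falls below the target. A minimal sketch of that behavior, assuming a strict comparison (this is not the actual `CostFunctionTargetFilter` implementation):

import numpy

def keep_below_target(samples: numpy.ndarray, cost_function, target_cost: float) -> numpy.ndarray:
    costs = cost_function(samples)  # shape (n_samples,)
    return samples[costs < target_cost]

# With the arrays above, costs are [2, 30]: a target of 5 keeps only dipoles_1,
# and a target of 0.5 keeps nothing.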

View File

@@ -0,0 +1,137 @@
import pdme.measurement
import pdme.measurement.input_types
from pdme.model import (
    LogSpacedRandomCountMultipleDipoleFixedMagnitudeModel,
    LogSpacedRandomCountMultipleDipoleFixedMagnitudeXYModel,
    LogSpacedRandomCountMultipleDipoleFixedMagnitudeFixedOrientationModel,
)
import deepdog.direct_monte_carlo.dmc_filters
import numpy.random
import numpy.testing
import logging

_logger = logging.getLogger(__name__)


def fixed_z_model_func(
    xmin,
    xmax,
    ymin,
    ymax,
    zmin,
    zmax,
    wexp_min,
    wexp_max,
    pfixed,
    n_max,
    prob_occupancy,
):
    return LogSpacedRandomCountMultipleDipoleFixedMagnitudeFixedOrientationModel(
        xmin,
        xmax,
        ymin,
        ymax,
        zmin,
        zmax,
        wexp_min,
        wexp_max,
        pfixed,
        0,
        0,
        n_max,
        prob_occupancy,
    )


def get_model(orientation):
    model_funcs = {
        "fixedz": fixed_z_model_func,
        "free": LogSpacedRandomCountMultipleDipoleFixedMagnitudeModel,
        "fixedxy": LogSpacedRandomCountMultipleDipoleFixedMagnitudeXYModel,
    }
    model = model_funcs[orientation](
        -10,
        10,
        -17.5,
        17.5,
        5,
        7.5,
        -5,
        6.5,
        10**3,
        2,
        0.99999999,
    )
    model.n = 2
    model.rng = numpy.random.default_rng(1234)
    return (
        f"connors_geom-5height-orientation_{orientation}-pfixexp_{3}-dipole_count_{2}",
        model,
    )


def test_electric_field_x_dmc_filter():
    dipoles_raw = [
        [(1, 2, 3), (4, 5, 6), 1],
        [(-1, 5, 2), (6, 5, 4), 10],
    ]
    dipoles = [
        pdme.measurement.OscillatingDipole(numpy.array(d[0]), numpy.array(d[1]), d[2])
        for d in dipoles_raw
    ]
    _logger.debug(f"dipoles: {dipoles}")

    dot_inputs_raw = [
        ([-1, -1, 0], 1),
        ([-1, -1, 0], 2),
        ([-1, -1, 0], 3),
        ([-1, -1, 0], 4),
    ]
    dot_inputs_array = pdme.measurement.input_types.dot_inputs_to_array(dot_inputs_raw)
    _logger.debug(f"dot_inputs_array: {dot_inputs_array}")

    arrangement = pdme.measurement.OscillatingDipoleArrangement(dipoles)
    measurements = []
    for input in dot_inputs_raw:
        ex = sum(
            [
                dipole.s_electric_fieldx_at_position(*input)
                for dipole in arrangement.dipoles
            ]
        )
        ex_low = ex * 0.5
        ex_high = ex * 1.5
        meas = pdme.measurement.DotRangeMeasurement(ex_low, ex_high, input[0], input[1])
        measurements.append(meas)

    filter = deepdog.direct_monte_carlo.dmc_filters.SingleDotSpinQubitFrequencyFilter(
        measurements
    )
    samples = numpy.array(
        [
            [
                [1, 2, 3, 4, 5, 6, 1],
                [-1, 5, 2, 6, 5, 4, 10],
            ],
            [
                [10, 20, 30, 40, 50, 60, 1],
                [-1, 5, 2, 6, 5, 4, 1],
            ],
            [
                [1, 1, 1, 1, 1, 1, 1],
                [2, 2, 2, 2, 2, 2, 1],
            ],
        ]
    )
    # only the first sample should survive: it matches the arrangement that
    # generated the measurements
    expected = samples[0:1]
    filtered = filter.filter_samples(samples)
    assert len(filtered) != len(samples), "Should have filtered some out!"
    numpy.testing.assert_array_equal(
        filtered, expected, "The filter should have only returned the first one"
    )
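
One connection worth noting: `get_model` fixes the dipole moment magnitude at `pfixed = 10**3`, which is why the `dipole_moment_*` vectors in the snapshot data earlier in this diff all have Euclidean norm of roughly 1000. A quick check with one of those vectors:

import numpy

moment = numpy.array([871.30659253, -299.17389491, -388.99846068])
print(numpy.linalg.norm(moment))  # ~1000.0, i.e. pfixed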

View File

View File

@@ -0,0 +1,21 @@
import deepdog.indexify
import logging

_logger = logging.getLogger(__name__)


def test_indexifier():
    weight_dict = {"key_1": [1, 2, 3], "key_2": ["a", "b", "c"]}
    indexifier = deepdog.indexify.Indexifier(weight_dict)
    _logger.debug(f"setting up indexifier {indexifier}")

    assert indexifier.indexify(0) == {"key_1": 1, "key_2": "a"}
    assert indexifier.indexify(5) == {"key_1": 2, "key_2": "c"}
    assert len(indexifier) == 9


def test_indexifier_length_short():
    weight_dict = {"key_1": [1, 2, 3], "key_2": ["b", "c"]}
    indexifier = deepdog.indexify.Indexifier(weight_dict)
    _logger.debug(f"setting up indexifier {indexifier}")
    assert len(indexifier) == 6
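
These assertions are consistent with `Indexifier` enumerating the Cartesian product of the weight lists in row-major order; a sketch of that decoding under that assumption (not the library's actual implementation):

import itertools

weight_dict = {"key_1": [1, 2, 3], "key_2": ["a", "b", "c"]}
combos = list(itertools.product(*weight_dict.values()))
print(len(combos))  # 9, matching len(indexifier)
print(dict(zip(weight_dict.keys(), combos[5])))  # {'key_1': 2, 'key_2': 'c'}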

View File

View File

@@ -0,0 +1,75 @@
import deepdog.results.read_csv


def test_parse_groupdict():
    example_column_name = (
        "geom_-20_20_-10_10_0_5-orientation_free-dipole_count_100_success"
    )

    parsed = deepdog.results.read_csv._parse_bayesrun_column(example_column_name)
    assert parsed is not None
    expected = deepdog.results.read_csv.BayesrunColumnParsed(
        {
            "xmin": "-20",
            "xmax": "20",
            "ymin": "-10",
            "ymax": "10",
            "zmin": "0",
            "zmax": "5",
            "orientation": "free",
            "avg_filled": "100",
            "field_name": "success",
        }
    )
    assert parsed == expected


def test_parse_groupdict_with_magnitude():
    example_column_name = (
        "geom_-20_20_-10_10_0_5-magnitude_3.5-orientation_free-dipole_count_100_success"
    )

    parsed = deepdog.results.read_csv._parse_bayesrun_column(example_column_name)
    assert parsed is not None
    expected = deepdog.results.read_csv.BayesrunColumnParsed(
        {
            "xmin": "-20",
            "xmax": "20",
            "ymin": "-10",
            "ymax": "10",
            "zmin": "0",
            "zmax": "5",
            "orientation": "free",
            "avg_filled": "100",
            "log_magnitude": "3.5",
            "field_name": "success",
        }
    )
    assert parsed == expected


def test_parse_groupdict_with_negative_magnitude():
    example_column_name = "geom_-20_20_-10_10_0_5-magnitude_-3.5-orientation_free-dipole_count_100_success"

    parsed = deepdog.results.read_csv._parse_bayesrun_column(example_column_name)
    assert parsed is not None
    expected = deepdog.results.read_csv.BayesrunColumnParsed(
        {
            "xmin": "-20",
            "xmax": "20",
            "ymin": "-10",
            "ymax": "10",
            "zmin": "0",
            "zmax": "5",
            "orientation": "free",
            "avg_filled": "100",
            "log_magnitude": "-3.5",
            "field_name": "success",
        }
    )
    assert parsed == expected


# def test_parse_no_match_column_name():
#     parsed = deepdog.results.parse_bayesrun_column("There's nothing here")
#     assert parsed is None
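
The three cases above fix the column-name grammar: a `geom_` block of six integer bounds, an optional `-magnitude_` section (possibly negative, hence the third test), then the orientation, dipole count, and a trailing field name. An illustrative pattern consistent with all three examples (the real regex lives inside `deepdog.results.read_csv` and may differ):

import re

COLUMN_PATTERN = re.compile(
    r"^geom_(?P<xmin>-?\d+)_(?P<xmax>-?\d+)_(?P<ymin>-?\d+)_(?P<ymax>-?\d+)"
    r"_(?P<zmin>-?\d+)_(?P<zmax>-?\d+)"
    r"(?:-magnitude_(?P<log_magnitude>-?\d+(?:\.\d+)?))?"
    r"-orientation_(?P<orientation>[a-z]+)"
    r"-dipole_count_(?P<avg_filled>\d+)_(?P<field_name>\w+)$"
)

groups = COLUMN_PATTERN.match(
    "geom_-20_20_-10_10_0_5-magnitude_-3.5-orientation_free-dipole_count_100_success"
).groupdict()
assert groups["log_magnitude"] == "-3.5"
assert groups["field_name"] == "success"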

View File

@@ -0,0 +1,19 @@
import deepdog.results
import pytest


def test_parse_bayesrun_filename():
    valid1 = "20250226-204120-dot1-dot1-2-0.realdata.fast_filter.bayesrun.csv"
    timestamp, slug = deepdog.results._parse_string_output_filename(valid1)
    assert timestamp == "20250226-204120"
    assert slug == "dot1-dot1-2-0"

    valid2 = "dot1-dot1-2-0.realdata.fast_filter.bayesrun.csv"
    timestamp, slug = deepdog.results._parse_string_output_filename(valid2)
    assert timestamp is None
    assert slug == "dot1-dot1-2-0"

    with pytest.raises(ValueError):
        deepdog.results._parse_string_output_filename("not_a_valid_filename")
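
The filename grammar this test exercises is an optional `YYYYMMDD-HHMMSS-` timestamp prefix followed by `<slug>.realdata.fast_filter.bayesrun.csv`. A sketch of a pattern consistent with all three cases (illustrative only, not the regex `deepdog.results` actually uses):

import re

FILENAME_PATTERN = re.compile(
    r"^(?:(?P<timestamp>\d{8}-\d{6})-)?(?P<slug>.+)\.realdata\.fast_filter\.bayesrun\.csv$"
)

m = FILENAME_PATTERN.match("dot1-dot1-2-0.realdata.fast_filter.bayesrun.csv")
assert m is not None and m["timestamp"] is None and m["slug"] == "dot1-dot1-2-0"
assert FILENAME_PATTERN.match("not_a_valid_filename") is None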

View File

@@ -0,0 +1,10 @@
# serializer version: 1
# name: test_subset_simulation_multi_result_coalescing_easy_arithmetic
MultiSubsetSimulationResult(child_results=[SubsetSimulationResult(probs_list=(), over_target_cost=1, over_target_likelihood=1, under_target_cost=0.99, under_target_likelihood=0.8, lowest_likelihood=0.5, messages=[]), SubsetSimulationResult(probs_list=(), over_target_cost=1, over_target_likelihood=1, under_target_cost=0.99, under_target_likelihood=0.6, lowest_likelihood=0.01, messages=[])], model_name='test', estimated_likelihood=0.6928203230275509, arithmetic_mean_estimated_likelihood=0.7, num_children=2, num_finished_children=2, clean_estimate=True)
# ---
# name: test_subset_simulation_multi_result_coalescing_easy_geometric
MultiSubsetSimulationResult(child_results=[SubsetSimulationResult(probs_list=(), over_target_cost=1, over_target_likelihood=1, under_target_cost=0.99, under_target_likelihood=0.1, lowest_likelihood=0.5, messages=[]), SubsetSimulationResult(probs_list=(), over_target_cost=1, over_target_likelihood=1, under_target_cost=0.99, under_target_likelihood=0.001, lowest_likelihood=0.01, messages=[])], model_name='test', estimated_likelihood=0.010000000000000004, arithmetic_mean_estimated_likelihood=0.0505, num_children=2, num_finished_children=2, clean_estimate=True)
# ---
# name: test_subset_simulation_multi_result_coalescing_include_dirty
MultiSubsetSimulationResult(child_results=[SubsetSimulationResult(probs_list=(), over_target_cost=1, over_target_likelihood=1, under_target_cost=0.99, under_target_likelihood=0.8, lowest_likelihood=0.5, messages=[]), SubsetSimulationResult(probs_list=(), over_target_cost=1, over_target_likelihood=1, under_target_cost=0.99, under_target_likelihood=0.08, lowest_likelihood=0.01, messages=[]), SubsetSimulationResult(probs_list=(), over_target_cost=None, over_target_likelihood=None, under_target_cost=None, under_target_likelihood=None, lowest_likelihood=0.0001, messages=[])], model_name='test', estimated_likelihood=0.01856635533445112, arithmetic_mean_estimated_likelihood=0.29336666666666666, num_children=3, num_finished_children=2, clean_estimate=False)
# ---

View File

@@ -0,0 +1,92 @@
import deepdog.subset_simulation.subset_simulation_impl as impl
import numpy


def test_subset_simulation_multi_result_coalescing_include_dirty(snapshot):
    res1 = impl.SubsetSimulationResult(
        probs_list=(),
        over_target_cost=1,
        over_target_likelihood=1,
        under_target_cost=0.99,
        under_target_likelihood=0.8,
        lowest_likelihood=0.5,
        messages=[],
    )
    res2 = impl.SubsetSimulationResult(
        probs_list=(),
        over_target_cost=1,
        over_target_likelihood=1,
        under_target_cost=0.99,
        under_target_likelihood=0.08,
        lowest_likelihood=0.01,
        messages=[],
    )
    res3 = impl.SubsetSimulationResult(
        probs_list=(),
        over_target_cost=None,
        over_target_likelihood=None,
        under_target_cost=None,
        under_target_likelihood=None,
        lowest_likelihood=0.0001,
        messages=[],
    )

    combined = impl.coalesce_ss_results("test", [res1, res2, res3])
    assert combined == snapshot


def test_subset_simulation_multi_result_coalescing_easy_arithmetic(snapshot):
    res1 = impl.SubsetSimulationResult(
        probs_list=(),
        over_target_cost=1,
        over_target_likelihood=1,
        under_target_cost=0.99,
        under_target_likelihood=0.8,
        lowest_likelihood=0.5,
        messages=[],
    )
    res2 = impl.SubsetSimulationResult(
        probs_list=(),
        over_target_cost=1,
        over_target_likelihood=1,
        under_target_cost=0.99,
        under_target_likelihood=0.6,
        lowest_likelihood=0.01,
        messages=[],
    )

    combined = impl.coalesce_ss_results("test", [res1, res2])
    assert combined.arithmetic_mean_estimated_likelihood == 0.7
    assert combined == snapshot


def test_subset_simulation_multi_result_coalescing_easy_geometric(snapshot):
    res1 = impl.SubsetSimulationResult(
        probs_list=(),
        over_target_cost=1,
        over_target_likelihood=1,
        under_target_cost=0.99,
        under_target_likelihood=0.1,
        lowest_likelihood=0.5,
        messages=[],
    )
    res2 = impl.SubsetSimulationResult(
        probs_list=(),
        over_target_cost=1,
        over_target_likelihood=1,
        under_target_cost=0.99,
        under_target_likelihood=0.001,
        lowest_likelihood=0.01,
        messages=[],
    )

    combined = impl.coalesce_ss_results("test", [res1, res2])
    numpy.testing.assert_allclose(combined.estimated_likelihood, 0.01)
    assert combined == snapshot
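
The snapshot values these tests pin against are consistent with a simple rule: the coalesced `estimated_likelihood` behaves like the geometric mean of the children's `under_target_likelihood` values, while `arithmetic_mean_estimated_likelihood` is their plain average. Re-deriving the easy cases (illustrative arithmetic only, not the coalescing implementation):

import numpy

children = numpy.array([0.8, 0.6])
print(numpy.exp(numpy.log(children).mean()))  # ~0.6928203230275509
print(children.mean())  # 0.7

children = numpy.array([0.1, 0.001])
print(numpy.exp(numpy.log(children).mean()))  # ~0.01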