Compare commits
80 Commits
| SHA1 |
| --- |
| 71dc906a96 |
| 24c6e311c1 |
| 4dd3004a7b |
| 46f6b6cdf1 |
| c8435b4b2a |
| c2375e6f5c |
| a1b59cd18b |
| 53f8993f2b |
| 700f32ea58 |
| 3737252c4b |
| 6f79a49e59 |
| d962ecb11e |
| 7beca501bf |
| 5425ce1362 |
| 6a5c5931d4 |
| 36ff75576c |
| e76c619c8b |
| c881da2837 |
| 1a1ecc01ea |
| 9cfd484d7c |
| 09fad2e102 |
| 24ac65bf9c |
| 8fbae32111 |
| b1c01b25c8 |
| a14d9834e5 |
| 8d04803eb3 |
| 92b49fce7c |
| 8845b2875f |
| 72791f2d0f |
| d258cfbec7 |
| b3bf4cde97 |
| 60f29b0b2f |
| 093a3fb5c4 |
| dc1d2d45a3 |
| f0e2fa3da9 |
| 2581e722e6 |
| 62bd63bf9b |
| df4d0b5d15 |
| 5361dada8b |
| 29029c137a |
| fb018abeae |
| d28c190816 |
| 0262de060f |
| e25db1e0f6 |
| 8fdbe4d334 |
| 406a1485da |
| 6dc66b1c27 |
| f2b1a1dd3b |
| cb166a399d |
| 7108dd0111 |
| 2105754911 |
| f3ba4cbfd3 |
| e5f7085324 |
| 578481324b |
| bf8ac9850d |
| ab408b6412 |
| 4aa0a6f234 |
| f9646e3386 |
| 3b612b960e |
| b0ad4bead0 |
| 4b2e573715 |
| 12e6916ab2 |
| 1e76f63725 |
| 7aa5ad2eb9 |
| fe331bb544 |
| 03ac85a967 |
| 96589ff659 |
| e5b5809764 |
| 1407418c60 |
| 383b51c35d |
| 5b9123d128 |
| 2b1a1c21e4 |
| ea080ca1c7 |
| 028fe58561 |
| b6a41872d5 |
| 731dabd74d |
| 7950f19c2d |
| b27e504bbd |
| 33106ba772 |
| 3ae0783d00 |
.gitignore (vendored): 4 changes

@@ -143,3 +143,7 @@ dmypy.json
 cython_debug/
+
+*.csv
+local_scripts/
+.vscode
CHANGELOG.md: 126 changes

@@ -2,6 +2,132 @@

All notable changes to this project will be documented in this file. See [standard-version](https://github.com/conventional-changelog/standard-version) for commit guidelines.

## [1.7.0](https://gitea.deepak.science:2222/physics/deepdog/compare/1.6.0...1.7.0) (2025-02-27)

### Features

* adds configurable skip if file exists ([24c6e31](https://gitea.deepak.science:2222/physics/deepdog/commit/24c6e311c1d3067eb98cc60e6ca38d76373bf08e))

## [1.6.0](https://gitea.deepak.science:2222/physics/deepdog/compare/1.5.0...1.6.0) (2025-02-27)

### Features

* Adds ability to parse bayesruns without timestamps ([46f6b6c](https://gitea.deepak.science:2222/physics/deepdog/commit/46f6b6cdf15c67aedf0c871d201b8db320bccbdf))
* allows negative log magnitude strings in models ([c8435b4](https://gitea.deepak.science:2222/physics/deepdog/commit/c8435b4b2a6e4b89030f53b5734eb743e2003fb7))

## [1.5.0](https://gitea.deepak.science:2222/physics/deepdog/compare/1.4.0...1.5.0) (2024-12-30)

### Features

* add configurable max number of dipoles to write ([a1b59cd](https://gitea.deepak.science:2222/physics/deepdog/commit/a1b59cd18b30359328a09210d9393f211aab30c2))
* add configurable max number of dipoles to write ([53f8993](https://gitea.deepak.science:2222/physics/deepdog/commit/53f8993f2b155228fff5cbee84f10c62eb149a1f))

## [1.4.0](https://gitea.deepak.science:2222/physics/deepdog/compare/1.3.0...1.4.0) (2024-09-04)

### Features

* add subset sim probs command for bayes for subset simulation results ([c881da2](https://gitea.deepak.science:2222/physics/deepdog/commit/c881da28370a1e51d062e1a7edaa62af6eb98d0a))
* allows some better matching for single_dipole runs ([5425ce1](https://gitea.deepak.science:2222/physics/deepdog/commit/5425ce1362919af4cc4dbd5813df3be8d877b198))
* indexifier now has len ([d962ecb](https://gitea.deepak.science:2222/physics/deepdog/commit/d962ecb11e929de1d9aa458b5d8e82270eff0039))

### Bug Fixes

* update log file arg names in cli scripts ([6a5c593](https://gitea.deepak.science:2222/physics/deepdog/commit/6a5c5931d4fc849d0d6a0f2b971523a0f039d559))

## [1.3.0](https://gitea.deepak.science:2222/physics/deepdog/compare/1.2.1...1.3.0) (2024-05-20)

### Features

* add multi run to wrap multi model and repeat runs ([92b49fc](https://gitea.deepak.science:2222/physics/deepdog/commit/92b49fce7c86f14484deb1c4aaaa810a6f69c08a))
* adds a filter that works with cost functions ([8845b28](https://gitea.deepak.science:2222/physics/deepdog/commit/8845b2875f2c91c91dd3988fabda26400c59b2d7))
* improve initial cost calculation to allow multiprocessing, adds ability to specify a number of levels to do with direct mc instead of subset simulation ([09fad2e](https://gitea.deepak.science:2222/physics/deepdog/commit/09fad2e1024d9237a6a4f7931f51cb4c84b83bf8))

### Bug Fixes

* Adds ugly hack for stdevs for this uniform range to multiply by root3, proper fix would be in pdme ([b1c01b2](https://gitea.deepak.science:2222/physics/deepdog/commit/b1c01b25c8f2c3947be23f5b2c656c37437dab17))
* fix seeding to avoid recreating seed combinations across multi runs ([24ac65b](https://gitea.deepak.science:2222/physics/deepdog/commit/24ac65bf9c74c454fec826ca9de640fe095f5a17))

### [1.2.1](https://gitea.deepak.science:2222/physics/deepdog/compare/1.2.0...1.2.1) (2024-05-12)

## [1.2.0](https://gitea.deepak.science:2222/physics/deepdog/compare/1.1.0...1.2.0) (2024-05-09)

### Features

* adds additional matching regexes ([dc1d2d4](https://gitea.deepak.science:2222/physics/deepdog/commit/dc1d2d45a3e631c5efccce80f8a24fa87c6089e0))
* adds magnitude enabled parsing option ([f0e2fa3](https://gitea.deepak.science:2222/physics/deepdog/commit/f0e2fa3da9f5a5136908d691137a904fda4e3a9a))

## [1.1.0](https://gitea.deepak.science:2222/physics/deepdog/compare/1.0.1...1.1.0) (2024-05-03)

### Features

* allows disabling timestamps in directmc bayesrun files ([fb018ab](https://gitea.deepak.science:2222/physics/deepdog/commit/fb018abeae2adf4438a030140a6c905f11bb6bc1))
* removes legacy bayes run, technically breaking but just don't use them ([5361dad](https://gitea.deepak.science:2222/physics/deepdog/commit/5361dada8be4950b5157862f6a92254b543889c3))

### [1.0.1](https://gitea.deepak.science:2222/physics/deepdog/compare/1.0.0...1.0.1) (2024-05-02)

### Bug Fixes

* fixes issue of zero division error with no successes for anything ([e25db1e](https://gitea.deepak.science:2222/physics/deepdog/commit/e25db1e0f677e8d9a657fa1631305cc8f05ff9ff))

## [1.0.0](https://gitea.deepak.science:2222/physics/deepdog/compare/0.8.1...1.0.0) (2024-05-01)

### ⚠ BREAKING CHANGES

* allows new seed spec instead of cli arg, removes old cli arg

### Features

* adds additional file slug parsing ([2105754](https://gitea.deepak.science:2222/physics/deepdog/commit/2105754911c89bde9dcbea9866462225604a3524))
* Adds more powerful direct mc runs to sub for old real spectrum run ([f2b1a1d](https://gitea.deepak.science:2222/physics/deepdog/commit/f2b1a1dd3b3436e37d84f7843b9b2a202be4b51c))
* allows new seed spec instead of cli arg, removes old cli arg ([7108dd0](https://gitea.deepak.science:2222/physics/deepdog/commit/7108dd0111c7dfd6ec204df1d0058530cd3dcab9))

### Bug Fixes

* no longer throws error for overlapping keys, the warning should hopefully be enough? ([f3ba4cb](https://gitea.deepak.science:2222/physics/deepdog/commit/f3ba4cbfd36a9f08cdc4d8774a7f745f8c98bac3))

### [0.8.1](https://gitea.deepak.science:2222/physics/deepdog/compare/0.8.0...0.8.1) (2024-04-28)

## [0.8.0](https://gitea.deepak.science:2222/physics/deepdog/compare/0.7.10...0.8.0) (2024-04-28)

### ⚠ BREAKING CHANGES

* fixes the spin qubit frequency phase shift calculation which had an index problem

### Bug Fixes

* fixes the spin qubit frequency phase shift calculation which had an index problem ([f9646e3](https://gitea.deepak.science:2222/physics/deepdog/commit/f9646e33868e1a0da8ab663230c0c692ac25bb74))

### [0.7.10](https://gitea.deepak.science:2222/physics/deepdog/compare/0.7.9...0.7.10) (2024-04-28)

### Features

* adds cli probs ([4b2e573](https://gitea.deepak.science:2222/physics/deepdog/commit/4b2e57371546731137b011461849bb849d4d4e0f))
* better management of cli wrapper ([b0ad4be](https://gitea.deepak.science:2222/physics/deepdog/commit/b0ad4bead0d4762eb7f848f6e557f6d9b61200b9))

### [0.7.9](https://gitea.deepak.science:2222/physics/deepdog/compare/0.7.8...0.7.9) (2024-04-21)

### Features

* adds ability to write custom dmc filters ([ea080ca](https://gitea.deepak.science:2222/physics/deepdog/commit/ea080ca1c7068042ce1e0a222d317f785a6b05f4))
* adds tarucha phase calculation, using spin qubit precession rate noise ([3ae0783](https://gitea.deepak.science:2222/physics/deepdog/commit/3ae0783d00cbe6a76439c1d671f2cff621d8d0a8))

### [0.7.8](https://gitea.deepak.science:2222/physics/deepdog/compare/0.7.7...0.7.8) (2024-02-29)
README.md: 11 changes

@@ -5,7 +5,7 @@
[](https://jenkins.deepak.science/job/gitea-physics/job/deepdog/job/master/)
[badge images]

The DiPole DiaGnostic tool.

@@ -13,6 +13,13 @@ The DiPole DiaGnostic tool.

`poetry install` to start locally

Commit using [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/), and when commits are on master, release with `doo release`.
Commit using [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/), and when commits are on master, release with `just release`.

In general `just --list` has some of the useful stuff for figuring out what development tools there are.

Poetry as an installer is good, even better is using Nix (maybe with direnv to automatically pick up the `devShell` from `flake.nix`).
In either case `just` should handle actually calling things in a way that's agnostic to poetry as a runner or through nix.

### local scripts
The `local_scripts` folder allows for scripts to be run using this code, but that probably isn't the most auditable for actual usage.
The API is still only something I'm using, so there are no guarantees yet that it will be stable; overall, semantic versioning should help with API breaks.
deepdog/__init__.py

@@ -1,10 +1,7 @@
 import logging
 from deepdog.meta import __version__
-from deepdog.bayes_run import BayesRun
-from deepdog.bayes_run_simulpairs import BayesRunSimulPairs
 from deepdog.real_spectrum_run import RealSpectrumRun
 from deepdog.temp_aware_real_spectrum_run import TempAwareRealSpectrumRun
-from deepdog.bayes_run_with_ss import BayesRunWithSubspaceSimulation


 def get_version():
@@ -13,11 +10,8 @@ def get_version():

 __all__ = [
     "get_version",
-    "BayesRun",
-    "BayesRunSimulPairs",
     "RealSpectrumRun",
     "TempAwareRealSpectrumRun",
-    "BayesRunWithSubspaceSimulation",
 ]
deepdog/bayes_run.py (deleted)

@@ -1,281 +0,0 @@
import pdme.inputs
import pdme.model
import pdme.measurement.input_types
import pdme.measurement.oscillating_dipole
import pdme.util.fast_v_calc
import pdme.util.fast_nonlocal_spectrum
from typing import Sequence, Tuple, List
import datetime
import csv
import multiprocessing
import logging
import numpy


# TODO: remove hardcode
CHUNKSIZE = 50

# TODO: It's garbage to have this here duplicated from pdme.
DotInput = Tuple[numpy.typing.ArrayLike, float]


_logger = logging.getLogger(__name__)


def get_a_result(input) -> int:
    model, dot_inputs, lows, highs, monte_carlo_count, max_frequency, seed = input

    rng = numpy.random.default_rng(seed)
    sample_dipoles = model.get_monte_carlo_dipole_inputs(
        monte_carlo_count, max_frequency, rng_to_use=rng
    )
    vals = pdme.util.fast_v_calc.fast_vs_for_dipoleses(dot_inputs, sample_dipoles)
    return numpy.count_nonzero(pdme.util.fast_v_calc.between(vals, lows, highs))


def get_a_result_using_pairs(input) -> int:
    (
        model,
        dot_inputs,
        pair_inputs,
        local_lows,
        local_highs,
        nonlocal_lows,
        nonlocal_highs,
        monte_carlo_count,
        max_frequency,
    ) = input
    sample_dipoles = model.get_n_single_dipoles(monte_carlo_count, max_frequency)
    local_vals = pdme.util.fast_v_calc.fast_vs_for_dipoles(dot_inputs, sample_dipoles)
    local_matches = pdme.util.fast_v_calc.between(local_vals, local_lows, local_highs)
    nonlocal_vals = pdme.util.fast_nonlocal_spectrum.fast_s_nonlocal(
        pair_inputs, sample_dipoles
    )
    nonlocal_matches = pdme.util.fast_v_calc.between(
        nonlocal_vals, nonlocal_lows, nonlocal_highs
    )
    combined_matches = numpy.logical_and(local_matches, nonlocal_matches)
    return numpy.count_nonzero(combined_matches)


class BayesRun:
    """
    A single Bayes run for a given set of dots.

    Parameters
    ----------
    dot_inputs : Sequence[DotInput]
        The dot inputs for this bayes run.

    models_with_names : Sequence[Tuple(str, pdme.model.DipoleModel)]
        The models to evaluate.

    actual_model : pdme.model.DipoleModel
        The model which is actually correct.

    filename_slug : str
        The filename slug to include.

    run_count: int
        The number of runs to do.
    """

    def __init__(
        self,
        dot_positions: Sequence[numpy.typing.ArrayLike],
        frequency_range: Sequence[float],
        models_with_names: Sequence[Tuple[str, pdme.model.DipoleModel]],
        actual_model: pdme.model.DipoleModel,
        filename_slug: str,
        run_count: int = 100,
        low_error: float = 0.9,
        high_error: float = 1.1,
        monte_carlo_count: int = 10000,
        monte_carlo_cycles: int = 10,
        target_success: int = 100,
        max_monte_carlo_cycles_steps: int = 10,
        max_frequency: float = 20,
        end_threshold: float = None,
        chunksize: int = CHUNKSIZE,
    ) -> None:
        self.dot_inputs = pdme.inputs.inputs_with_frequency_range(
            dot_positions, frequency_range
        )
        self.dot_inputs_array = pdme.measurement.input_types.dot_inputs_to_array(
            self.dot_inputs
        )

        self.models = [model for (_, model) in models_with_names]
        self.model_names = [name for (name, _) in models_with_names]
        self.actual_model = actual_model

        self.n: int
        try:
            self.n = self.actual_model.n  # type: ignore
        except AttributeError:
            self.n = 1

        self.model_count = len(self.models)
        self.monte_carlo_count = monte_carlo_count
        self.monte_carlo_cycles = monte_carlo_cycles
        self.target_success = target_success
        self.max_monte_carlo_cycles_steps = max_monte_carlo_cycles_steps
        self.run_count = run_count
        self.low_error = low_error
        self.high_error = high_error

        self.csv_fields = []
        for i in range(self.n):
            self.csv_fields.extend(
                [
                    f"dipole_moment_{i+1}",
                    f"dipole_location_{i+1}",
                    f"dipole_frequency_{i+1}",
                ]
            )
        self.compensate_zeros = True
        self.chunksize = chunksize
        for name in self.model_names:
            self.csv_fields.extend([f"{name}_success", f"{name}_count", f"{name}_prob"])

        self.probabilities = [1 / self.model_count] * self.model_count

        timestamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
        self.filename = f"{timestamp}-{filename_slug}.bayesrun.csv"
        self.max_frequency = max_frequency

        if end_threshold is not None:
            if 0 < end_threshold < 1:
                self.end_threshold: float = end_threshold
                self.use_end_threshold = True
                _logger.info(f"Will abort early, at {self.end_threshold}.")
            else:
                raise ValueError(
                    f"end_threshold should be between 0 and 1, but is actually {end_threshold}"
                )

    def go(self) -> None:
        with open(self.filename, "a", newline="") as outfile:
            writer = csv.DictWriter(outfile, fieldnames=self.csv_fields, dialect="unix")
            writer.writeheader()

        for run in range(1, self.run_count + 1):

            # Generate the actual dipoles
            actual_dipoles = self.actual_model.get_dipoles(self.max_frequency)

            dots = actual_dipoles.get_percent_range_dot_measurements(
                self.dot_inputs, self.low_error, self.high_error
            )
            (
                lows,
                highs,
            ) = pdme.measurement.input_types.dot_range_measurements_low_high_arrays(
                dots
            )

            _logger.info(f"Going to work on dipole at {actual_dipoles.dipoles}")

            # define a new seed sequence for each run
            seed_sequence = numpy.random.SeedSequence(run)

            results = []
            _logger.debug("Going to iterate over models now")
            for model_count, model in enumerate(self.models):
                _logger.debug(f"Doing model #{model_count}")
                core_count = multiprocessing.cpu_count() - 1 or 1
                with multiprocessing.Pool(core_count) as pool:
                    cycle_count = 0
                    cycle_success = 0
                    cycles = 0
                    while (cycles < self.max_monte_carlo_cycles_steps) and (
                        cycle_success <= self.target_success
                    ):
                        _logger.debug(f"Starting cycle {cycles}")
                        cycles += 1
                        current_success = 0
                        cycle_count += self.monte_carlo_count * self.monte_carlo_cycles

                        # generate a seed from the sequence for each core.
                        # note this needs to be inside the loop for monte carlo cycle steps!
                        # that way we get more stuff.
                        seeds = seed_sequence.spawn(self.monte_carlo_cycles)

                        current_success = sum(
                            pool.imap_unordered(
                                get_a_result,
                                [
                                    (
                                        model,
                                        self.dot_inputs_array,
                                        lows,
                                        highs,
                                        self.monte_carlo_count,
                                        self.max_frequency,
                                        seed,
                                    )
                                    for seed in seeds
                                ],
                                self.chunksize,
                            )
                        )

                        cycle_success += current_success
                        _logger.debug(f"current running successes: {cycle_success}")
                    results.append((cycle_count, cycle_success))

            _logger.debug("Done, constructing output now")
            row = {
                "dipole_moment_1": actual_dipoles.dipoles[0].p,
                "dipole_location_1": actual_dipoles.dipoles[0].s,
                "dipole_frequency_1": actual_dipoles.dipoles[0].w,
            }
            for i in range(1, self.n):
                try:
                    current_dipoles = actual_dipoles.dipoles[i]
                    row[f"dipole_moment_{i+1}"] = current_dipoles.p
                    row[f"dipole_location_{i+1}"] = current_dipoles.s
                    row[f"dipole_frequency_{i+1}"] = current_dipoles.w
                except IndexError:
                    _logger.info(f"Not writing anymore, saw end after {i}")
                    break

            successes: List[float] = []
            counts: List[int] = []
            for model_index, (name, (count, result)) in enumerate(
                zip(self.model_names, results)
            ):

                row[f"{name}_success"] = result
                row[f"{name}_count"] = count
                successes.append(max(result, 0.5))
                counts.append(count)

            success_weight = sum(
                [
                    (succ / count) * prob
                    for succ, count, prob in zip(successes, counts, self.probabilities)
                ]
            )
            new_probabilities = [
                (succ / count) * old_prob / success_weight
                for succ, count, old_prob in zip(successes, counts, self.probabilities)
            ]
            self.probabilities = new_probabilities
            for name, probability in zip(self.model_names, self.probabilities):
                row[f"{name}_prob"] = probability
            _logger.info(row)

            with open(self.filename, "a", newline="") as outfile:
                writer = csv.DictWriter(
                    outfile, fieldnames=self.csv_fields, dialect="unix"
                )
                writer.writerow(row)

            if self.use_end_threshold:
                max_prob = max(self.probabilities)
                if max_prob > self.end_threshold:
                    _logger.info(
                        f"Aborting early, because {max_prob} is greater than {self.end_threshold}"
                    )
                    break
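The core of the deleted `BayesRun.go` loop is a discrete Bayes update over candidate models: each model's Monte Carlo success ratio plays the role of a likelihood, successes are floored at 0.5 so no posterior collapses to zero, and the posteriors become the priors for the next run. A minimal standalone sketch of that update, with hypothetical success and count numbers:

```python
# Hypothetical counts: model A matched 40 of 100000 samples, model B matched none.
raw_successes = [40, 0]
counts = [100000, 100000]
priors = [0.5, 0.5]  # uniform starting probabilities, as in BayesRun

# Floor successes at 0.5, as BayesRun does, so no posterior hits exactly zero.
successes = [max(s, 0.5) for s in raw_successes]
likelihoods = [s / c for s, c in zip(successes, counts)]

# Normalising constant: total prior-weighted likelihood across models.
evidence = sum(lik * p for lik, p in zip(likelihoods, priors))

# Posterior via Bayes rule; these become the priors for the next run.
posteriors = [lik * p / evidence for lik, p in zip(likelihoods, priors)]
print(posteriors)  # ~[0.988, 0.012]: strongly favours model A
```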
deepdog/bayes_run_simulpairs.py (deleted)

@@ -1,382 +0,0 @@
import pdme.inputs
import pdme.model
import pdme.measurement.input_types
import pdme.measurement.oscillating_dipole
import pdme.util.fast_v_calc
import pdme.util.fast_nonlocal_spectrum
from typing import Sequence, Tuple, List
import datetime
import csv
import multiprocessing
import logging
import numpy
import numpy.random


# TODO: remove hardcode
CHUNKSIZE = 50

# TODO: It's garbage to have this here duplicated from pdme.
DotInput = Tuple[numpy.typing.ArrayLike, float]


_logger = logging.getLogger(__name__)


def get_a_simul_result_using_pairs(input) -> numpy.ndarray:
    (
        model,
        dot_inputs,
        pair_inputs,
        local_lows,
        local_highs,
        nonlocal_lows,
        nonlocal_highs,
        monte_carlo_count,
        monte_carlo_cycles,
        max_frequency,
        seed,
    ) = input

    rng = numpy.random.default_rng(seed)
    local_total = 0
    combined_total = 0

    sample_dipoles = model.get_monte_carlo_dipole_inputs(
        monte_carlo_count, max_frequency, rng_to_use=rng
    )
    local_vals = pdme.util.fast_v_calc.fast_vs_for_dipoleses(dot_inputs, sample_dipoles)
    local_matches = pdme.util.fast_v_calc.between(local_vals, local_lows, local_highs)
    nonlocal_vals = pdme.util.fast_nonlocal_spectrum.fast_s_nonlocal_dipoleses(
        pair_inputs, sample_dipoles
    )
    nonlocal_matches = pdme.util.fast_v_calc.between(
        nonlocal_vals, nonlocal_lows, nonlocal_highs
    )
    combined_matches = numpy.logical_and(local_matches, nonlocal_matches)

    local_total += numpy.count_nonzero(local_matches)
    combined_total += numpy.count_nonzero(combined_matches)
    return numpy.array([local_total, combined_total])


class BayesRunSimulPairs:
    """
    A dual pairs-nonpairs Bayes run for a given set of dots.

    Parameters
    ----------
    dot_inputs : Sequence[DotInput]
        The dot inputs for this bayes run.

    models_with_names : Sequence[Tuple(str, pdme.model.DipoleModel)]
        The models to evaluate.

    actual_model : pdme.model.DipoleModel
        The model which is actually correct.

    filename_slug : str
        The filename slug to include.

    run_count: int
        The number of runs to do.
    """

    def __init__(
        self,
        dot_positions: Sequence[numpy.typing.ArrayLike],
        frequency_range: Sequence[float],
        models_with_names: Sequence[Tuple[str, pdme.model.DipoleModel]],
        actual_model: pdme.model.DipoleModel,
        filename_slug: str,
        run_count: int = 100,
        low_error: float = 0.9,
        high_error: float = 1.1,
        pairs_high_error=None,
        pairs_low_error=None,
        monte_carlo_count: int = 10000,
        monte_carlo_cycles: int = 10,
        target_success: int = 100,
        max_monte_carlo_cycles_steps: int = 10,
        max_frequency: float = 20,
        end_threshold: float = None,
        chunksize: int = CHUNKSIZE,
    ) -> None:
        self.dot_inputs = pdme.inputs.inputs_with_frequency_range(
            dot_positions, frequency_range
        )
        self.dot_inputs_array = pdme.measurement.input_types.dot_inputs_to_array(
            self.dot_inputs
        )

        self.dot_pair_inputs = pdme.inputs.input_pairs_with_frequency_range(
            dot_positions, frequency_range
        )
        self.dot_pair_inputs_array = (
            pdme.measurement.input_types.dot_pair_inputs_to_array(self.dot_pair_inputs)
        )

        self.models = [mod for (_, mod) in models_with_names]
        self.model_names = [name for (name, _) in models_with_names]
        self.actual_model = actual_model

        self.n: int
        try:
            self.n = self.actual_model.n  # type: ignore
        except AttributeError:
            self.n = 1

        self.model_count = len(self.models)
        self.monte_carlo_count = monte_carlo_count
        self.monte_carlo_cycles = monte_carlo_cycles
        self.target_success = target_success
        self.max_monte_carlo_cycles_steps = max_monte_carlo_cycles_steps
        self.run_count = run_count
        self.low_error = low_error
        self.high_error = high_error
        if pairs_low_error is None:
            self.pairs_low_error = self.low_error
        else:
            self.pairs_low_error = pairs_low_error
        if pairs_high_error is None:
            self.pairs_high_error = self.high_error
        else:
            self.pairs_high_error = pairs_high_error

        self.csv_fields = []
        for i in range(self.n):
            self.csv_fields.extend(
                [
                    f"dipole_moment_{i+1}",
                    f"dipole_location_{i+1}",
                    f"dipole_frequency_{i+1}",
                ]
            )
        self.compensate_zeros = True
        self.chunksize = chunksize
        for name in self.model_names:
            self.csv_fields.extend([f"{name}_success", f"{name}_count", f"{name}_prob"])

        self.probabilities_no_pairs = [1 / self.model_count] * self.model_count
        self.probabilities_pairs = [1 / self.model_count] * self.model_count

        timestamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
        self.filename_pairs = f"{timestamp}-{filename_slug}.simulpairs.yespairs.csv"
        self.filename_no_pairs = f"{timestamp}-{filename_slug}.simulpairs.noopairs.csv"

        self.max_frequency = max_frequency

        if end_threshold is not None:
            if 0 < end_threshold < 1:
                self.end_threshold: float = end_threshold
                self.use_end_threshold = True
                _logger.info(f"Will abort early, at {self.end_threshold}.")
            else:
                raise ValueError(
                    f"end_threshold should be between 0 and 1, but is actually {end_threshold}"
                )

    def go(self) -> None:
        with open(self.filename_pairs, "a", newline="") as outfile:
            writer = csv.DictWriter(outfile, fieldnames=self.csv_fields, dialect="unix")
            writer.writeheader()
        with open(self.filename_no_pairs, "a", newline="") as outfile:
            writer = csv.DictWriter(outfile, fieldnames=self.csv_fields, dialect="unix")
            writer.writeheader()

        for run in range(1, self.run_count + 1):

            # Generate the actual dipoles
            actual_dipoles = self.actual_model.get_dipoles(self.max_frequency)

            dots = actual_dipoles.get_percent_range_dot_measurements(
                self.dot_inputs, self.low_error, self.high_error
            )
            (
                lows,
                highs,
            ) = pdme.measurement.input_types.dot_range_measurements_low_high_arrays(
                dots
            )

            pair_lows, pair_highs = (None, None)
            pair_measurements = actual_dipoles.get_percent_range_dot_pair_measurements(
                self.dot_pair_inputs, self.pairs_low_error, self.pairs_high_error
            )
            (
                pair_lows,
                pair_highs,
            ) = pdme.measurement.input_types.dot_range_measurements_low_high_arrays(
                pair_measurements
            )

            _logger.info(f"Going to work on dipole at {actual_dipoles.dipoles}")

            # define a new seed sequence for each run
            seed_sequence = numpy.random.SeedSequence(run)

            results_pairs = []
            results_no_pairs = []
            _logger.debug("Going to iterate over models now")
            for model_count, model in enumerate(self.models):
                _logger.debug(f"Doing model #{model_count}")

                core_count = multiprocessing.cpu_count() - 1 or 1
                with multiprocessing.Pool(core_count) as pool:
                    cycle_count = 0
                    cycle_success_pairs = 0
                    cycle_success_no_pairs = 0
                    cycles = 0
                    while (cycles < self.max_monte_carlo_cycles_steps) and (
                        min(cycle_success_pairs, cycle_success_no_pairs)
                        <= self.target_success
                    ):
                        _logger.debug(f"Starting cycle {cycles}")

                        cycles += 1
                        current_success_pairs = 0
                        current_success_no_pairs = 0
                        cycle_count += self.monte_carlo_count * self.monte_carlo_cycles

                        # generate a seed from the sequence for each core.
                        # note this needs to be inside the loop for monte carlo cycle steps!
                        # that way we get more stuff.

                        seeds = seed_sequence.spawn(self.monte_carlo_cycles)
                        _logger.debug(f"Creating {self.monte_carlo_cycles} seeds")
                        current_success_both = numpy.array(
                            sum(
                                pool.imap_unordered(
                                    get_a_simul_result_using_pairs,
                                    [
                                        (
                                            model,
                                            self.dot_inputs_array,
                                            self.dot_pair_inputs_array,
                                            lows,
                                            highs,
                                            pair_lows,
                                            pair_highs,
                                            self.monte_carlo_count,
                                            self.monte_carlo_cycles,
                                            self.max_frequency,
                                            seed,
                                        )
                                        for seed in seeds
                                    ],
                                    self.chunksize,
                                )
                            )
                        )
                        current_success_no_pairs = current_success_both[0]
                        current_success_pairs = current_success_both[1]

                        cycle_success_no_pairs += current_success_no_pairs
                        cycle_success_pairs += current_success_pairs
                        _logger.debug(
                            f"(pair, no_pair) successes are {(cycle_success_pairs, cycle_success_no_pairs)}"
                        )
                    results_pairs.append((cycle_count, cycle_success_pairs))
                    results_no_pairs.append((cycle_count, cycle_success_no_pairs))

            _logger.debug("Done, constructing output now")
            row_pairs = {
                "dipole_moment_1": actual_dipoles.dipoles[0].p,
                "dipole_location_1": actual_dipoles.dipoles[0].s,
                "dipole_frequency_1": actual_dipoles.dipoles[0].w,
            }
            row_no_pairs = {
                "dipole_moment_1": actual_dipoles.dipoles[0].p,
                "dipole_location_1": actual_dipoles.dipoles[0].s,
                "dipole_frequency_1": actual_dipoles.dipoles[0].w,
            }
            for i in range(1, self.n):
                try:
                    current_dipoles = actual_dipoles.dipoles[i]
                    row_pairs[f"dipole_moment_{i+1}"] = current_dipoles.p
                    row_pairs[f"dipole_location_{i+1}"] = current_dipoles.s
                    row_pairs[f"dipole_frequency_{i+1}"] = current_dipoles.w
                    row_no_pairs[f"dipole_moment_{i+1}"] = current_dipoles.p
                    row_no_pairs[f"dipole_location_{i+1}"] = current_dipoles.s
                    row_no_pairs[f"dipole_frequency_{i+1}"] = current_dipoles.w
                except IndexError:
                    _logger.info(f"Not writing anymore, saw end after {i}")
                    break

            successes_pairs: List[float] = []
            successes_no_pairs: List[float] = []
            counts: List[int] = []
            for model_index, (
                name,
                (count_pair, result_pair),
                (count_no_pair, result_no_pair),
            ) in enumerate(zip(self.model_names, results_pairs, results_no_pairs)):

                row_pairs[f"{name}_success"] = result_pair
                row_pairs[f"{name}_count"] = count_pair
                successes_pairs.append(max(result_pair, 0.5))

                row_no_pairs[f"{name}_success"] = result_no_pair
                row_no_pairs[f"{name}_count"] = count_no_pair
                successes_no_pairs.append(max(result_no_pair, 0.5))

                counts.append(count_pair)

            success_weight_pair = sum(
                [
                    (succ / count) * prob
                    for succ, count, prob in zip(
                        successes_pairs, counts, self.probabilities_pairs
                    )
                ]
            )
            success_weight_no_pair = sum(
                [
                    (succ / count) * prob
                    for succ, count, prob in zip(
                        successes_no_pairs, counts, self.probabilities_no_pairs
                    )
                ]
            )
            new_probabilities_pair = [
                (succ / count) * old_prob / success_weight_pair
                for succ, count, old_prob in zip(
                    successes_pairs, counts, self.probabilities_pairs
                )
            ]
            new_probabilities_no_pair = [
                (succ / count) * old_prob / success_weight_no_pair
                for succ, count, old_prob in zip(
                    successes_no_pairs, counts, self.probabilities_no_pairs
                )
            ]
            self.probabilities_pairs = new_probabilities_pair
            self.probabilities_no_pairs = new_probabilities_no_pair
            for name, probability_pair, probability_no_pair in zip(
                self.model_names, self.probabilities_pairs, self.probabilities_no_pairs
            ):
                row_pairs[f"{name}_prob"] = probability_pair
                row_no_pairs[f"{name}_prob"] = probability_no_pair
            _logger.debug(row_pairs)
            _logger.debug(row_no_pairs)

            with open(self.filename_pairs, "a", newline="") as outfile:
                writer = csv.DictWriter(
                    outfile, fieldnames=self.csv_fields, dialect="unix"
                )
                writer.writerow(row_pairs)
            with open(self.filename_no_pairs, "a", newline="") as outfile:
                writer = csv.DictWriter(
                    outfile, fieldnames=self.csv_fields, dialect="unix"
                )
                writer.writerow(row_no_pairs)

            if self.use_end_threshold:
                max_prob = min(
                    max(self.probabilities_pairs), max(self.probabilities_no_pairs)
                )
                if max_prob > self.end_threshold:
                    _logger.info(
                        f"Aborting early, because {max_prob} is greater than {self.end_threshold}"
                    )
                    break
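Both deleted runners share the same reproducibility pattern: one `numpy.random.SeedSequence` per run, spawned into independent child seeds so each worker process gets its own statistically independent stream. A minimal sketch of that pattern; the worker body here is a stand-in for illustration, not deepdog code:

```python
import multiprocessing

import numpy


def count_heads(seed) -> int:
    # Stand-in worker: each process builds its own Generator from the
    # spawned child seed, so streams are independent yet repeatable.
    rng = numpy.random.default_rng(seed)
    return int(numpy.count_nonzero(rng.random(10_000) < 0.5))


if __name__ == "__main__":
    seed_sequence = numpy.random.SeedSequence(1)  # one sequence per run
    seeds = seed_sequence.spawn(10)  # one child seed per chunk of work
    with multiprocessing.Pool(2) as pool:
        total = sum(pool.imap_unordered(count_heads, seeds))
    print(total)  # identical across reruns with the same root seed
```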
deepdog/bayes_run_with_ss.py (deleted)

@@ -1,261 +0,0 @@
import deepdog.subset_simulation
import pdme.inputs
import pdme.model
import pdme.measurement.input_types
import pdme.measurement.oscillating_dipole
import pdme.util.fast_v_calc
import pdme.util.fast_nonlocal_spectrum
from typing import Sequence, Tuple, List, Optional
import datetime
import csv
import logging
import numpy
import numpy.typing


# TODO: remove hardcode
CHUNKSIZE = 50

# TODO: It's garbage to have this here duplicated from pdme.
DotInput = Tuple[numpy.typing.ArrayLike, float]


CLAMPING_FACTOR = 10

_logger = logging.getLogger(__name__)


class BayesRunWithSubspaceSimulation:
    """
    A single Bayes run for a given set of dots.

    Parameters
    ----------
    dot_inputs : Sequence[DotInput]
        The dot inputs for this bayes run.

    models_with_names : Sequence[Tuple(str, pdme.model.DipoleModel)]
        The models to evaluate.

    actual_model : pdme.model.DipoleModel
        The model which is actually correct.

    filename_slug : str
        The filename slug to include.

    run_count: int
        The number of runs to do.
    """

    def __init__(
        self,
        dot_positions: Sequence[numpy.typing.ArrayLike],
        frequency_range: Sequence[float],
        models_with_names: Sequence[Tuple[str, pdme.model.DipoleModel]],
        actual_model: pdme.model.DipoleModel,
        filename_slug: str,
        max_frequency: float = 20,
        end_threshold: float = None,
        run_count=100,
        chunksize: int = CHUNKSIZE,
        ss_n_c: int = 500,
        ss_n_s: int = 100,
        ss_m_max: int = 15,
        ss_target_cost: Optional[float] = None,
        ss_level_0_seed: int = 200,
        ss_mcmc_seed: int = 20,
        ss_use_adaptive_steps=True,
        ss_default_phi_step=0.01,
        ss_default_theta_step=0.01,
        ss_default_r_step=0.01,
        ss_default_w_log_step=0.01,
        ss_default_upper_w_log_step=4,
        ss_dump_last_generation=False,
        ss_initial_costs_chunk_size=100,
        write_output_to_bayesruncsv=True,
        use_timestamp_for_output=True,
    ) -> None:
        self.dot_inputs = pdme.inputs.inputs_with_frequency_range(
            dot_positions, frequency_range
        )
        self.dot_inputs_array = pdme.measurement.input_types.dot_inputs_to_array(
            self.dot_inputs
        )

        self.models_with_names = models_with_names
        self.models = [model for (_, model) in models_with_names]
        self.model_names = [name for (name, _) in models_with_names]
        self.actual_model = actual_model

        self.n: int
        try:
            self.n = self.actual_model.n  # type: ignore
        except AttributeError:
            self.n = 1

        self.model_count = len(self.models)

        self.csv_fields = []
        for i in range(self.n):
            self.csv_fields.extend(
                [
                    f"dipole_moment_{i+1}",
                    f"dipole_location_{i+1}",
                    f"dipole_frequency_{i+1}",
                ]
            )
        self.compensate_zeros = True
        self.chunksize = chunksize
        for name in self.model_names:
            self.csv_fields.extend([f"{name}_likelihood", f"{name}_prob"])

        self.probabilities = [1 / self.model_count] * self.model_count

        if use_timestamp_for_output:
            timestamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
            self.filename = f"{timestamp}-{filename_slug}.bayesrunwithss.csv"
        else:
            self.filename = f"{filename_slug}.bayesrunwithss.csv"
        self.max_frequency = max_frequency

        if end_threshold is not None:
            if 0 < end_threshold < 1:
                self.end_threshold: float = end_threshold
                self.use_end_threshold = True
                _logger.info(f"Will abort early, at {self.end_threshold}.")
            else:
                raise ValueError(
                    f"end_threshold should be between 0 and 1, but is actually {end_threshold}"
                )

        self.ss_n_c = ss_n_c
        self.ss_n_s = ss_n_s
        self.ss_m_max = ss_m_max
        self.ss_target_cost = ss_target_cost
        self.ss_level_0_seed = ss_level_0_seed
        self.ss_mcmc_seed = ss_mcmc_seed
        self.ss_use_adaptive_steps = ss_use_adaptive_steps
        self.ss_default_phi_step = ss_default_phi_step
        self.ss_default_theta_step = ss_default_theta_step
        self.ss_default_r_step = ss_default_r_step
        self.ss_default_w_log_step = ss_default_w_log_step
        self.ss_default_upper_w_log_step = ss_default_upper_w_log_step
        self.ss_dump_last_generation = ss_dump_last_generation
        self.ss_initial_costs_chunk_size = ss_initial_costs_chunk_size
        self.run_count = run_count

        self.write_output_to_csv = write_output_to_bayesruncsv

    def go(self) -> Sequence:

        if self.write_output_to_csv:
            with open(self.filename, "a", newline="") as outfile:
                writer = csv.DictWriter(
                    outfile, fieldnames=self.csv_fields, dialect="unix"
                )
                writer.writeheader()

        return_result = []

        for run in range(1, self.run_count + 1):

            # Generate the actual dipoles
            actual_dipoles = self.actual_model.get_dipoles(self.max_frequency)

            measurements = actual_dipoles.get_dot_measurements(self.dot_inputs)

            _logger.info(f"Going to work on dipole at {actual_dipoles.dipoles}")

            # define a new seed sequence for each run

            results = []
            _logger.debug("Going to iterate over models now")
            for model_count, model in enumerate(self.models_with_names):
                _logger.debug(f"Doing model #{model_count}, {model[0]}")
                subset_run = deepdog.subset_simulation.SubsetSimulation(
                    model,
                    self.dot_inputs,
                    measurements,
                    self.ss_n_c,
                    self.ss_n_s,
                    self.ss_m_max,
                    self.ss_target_cost,
                    self.ss_level_0_seed,
                    self.ss_mcmc_seed,
                    self.ss_use_adaptive_steps,
                    self.ss_default_phi_step,
                    self.ss_default_theta_step,
                    self.ss_default_r_step,
                    self.ss_default_w_log_step,
                    self.ss_default_upper_w_log_step,
                    initial_cost_chunk_size=self.ss_initial_costs_chunk_size,
                    keep_probs_list=False,
                    dump_last_generation_to_file=self.ss_dump_last_generation,
                )
                results.append(subset_run.execute())

            _logger.debug("Done, constructing output now")
            row = {
                "dipole_moment_1": actual_dipoles.dipoles[0].p,
                "dipole_location_1": actual_dipoles.dipoles[0].s,
                "dipole_frequency_1": actual_dipoles.dipoles[0].w,
            }
            for i in range(1, self.n):
                try:
                    current_dipoles = actual_dipoles.dipoles[i]
                    row[f"dipole_moment_{i+1}"] = current_dipoles.p
                    row[f"dipole_location_{i+1}"] = current_dipoles.s
                    row[f"dipole_frequency_{i+1}"] = current_dipoles.w
                except IndexError:
                    _logger.info(f"Not writing anymore, saw end after {i}")
                    break

            likelihoods: List[float] = []

            for (name, result) in zip(self.model_names, results):
                if result.over_target_likelihood is None:
                    if result.lowest_likelihood is None:
                        _logger.error(f"result {result} looks bad")
                        clamped_likelihood = 10**-15
                    else:
                        clamped_likelihood = result.lowest_likelihood / CLAMPING_FACTOR
                        _logger.warning(
                            f"got a none result, clamping to {clamped_likelihood}"
                        )
                else:
                    clamped_likelihood = result.over_target_likelihood
                likelihoods.append(clamped_likelihood)
                row[f"{name}_likelihood"] = clamped_likelihood

            success_weight = sum(
                [
                    likelihood * prob
                    for likelihood, prob in zip(likelihoods, self.probabilities)
                ]
            )
            new_probabilities = [
                likelihood * old_prob / success_weight
                for likelihood, old_prob in zip(likelihoods, self.probabilities)
            ]
            self.probabilities = new_probabilities
            for name, probability in zip(self.model_names, self.probabilities):
                row[f"{name}_prob"] = probability
            _logger.info(row)
            return_result.append(row)

            if self.write_output_to_csv:
                with open(self.filename, "a", newline="") as outfile:
                    writer = csv.DictWriter(
                        outfile, fieldnames=self.csv_fields, dialect="unix"
                    )
                    writer.writerow(row)

            if self.use_end_threshold:
                max_prob = max(self.probabilities)
                if max_prob > self.end_threshold:
                    _logger.info(
                        f"Aborting early, because {max_prob} is greater than {self.end_threshold}"
                    )
                    break

        return return_result
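Unlike `BayesRun`, this runner's likelihood comes from subset simulation rather than a success/count ratio, and a missing result is clamped instead of floored. A minimal sketch of just that clamping rule, pulled out as a standalone function; the function name is ours, not deepdog's:

```python
CLAMPING_FACTOR = 10


def clamped_likelihood(over_target_likelihood, lowest_likelihood) -> float:
    # Mirrors the fallback chain in BayesRunWithSubspaceSimulation.go:
    # use the real likelihood when present, else a clamped lowest
    # likelihood, else a tiny floor when the run produced nothing usable.
    if over_target_likelihood is not None:
        return over_target_likelihood
    if lowest_likelihood is None:
        return 10**-15  # the "looks bad" case from the original code
    return lowest_likelihood / CLAMPING_FACTOR


print(clamped_likelihood(None, 0.02))  # 0.002
print(clamped_likelihood(0.4, None))  # 0.4
```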
deepdog/cli/__init__.py (new file, 0 lines)

deepdog/cli/probs/__init__.py (new file, 5 lines)

@@ -0,0 +1,5 @@
from deepdog.cli.probs.main import wrapped_main

__all__ = [
    "wrapped_main",
]
deepdog/cli/probs/args.py (new file, 51 lines)

@@ -0,0 +1,51 @@
import argparse
import os


def parse_args() -> argparse.Namespace:
    def dir_path(path):
        if os.path.isdir(path):
            return path
        else:
            raise argparse.ArgumentTypeError(f"readable_dir:{path} is not a valid path")

    parser = argparse.ArgumentParser(
        "probs", description="Calculating probability from finished bayesrun"
    )
    parser.add_argument(
        "--log-file",
        type=str,
        help="A filename for logging to, if not provided will only log to stderr",
        default=None,
    )
    parser.add_argument(
        "--bayesrun-directory",
        "-d",
        type=dir_path,
        help="The directory to search for bayesrun files, defaulting to cwd if not passed",
        default=".",
    )
    parser.add_argument(
        "--indexify-json",
        help="A json file with the indexify config for parsing job indexes. Will skip if not present",
        default="",
    )
    parser.add_argument(
        "--coalesced-keys",
        type=str,
        help="A comma separated list of strings over which to coalesce data. By default coalesce over all fields within model names, ignore file level names",
        default="",
    )
    parser.add_argument(
        "--uncoalesced-outfile",
        type=str,
        help="output filename for uncoalesced data. If not provided, will not be written",
        default=None,
    )
    parser.add_argument(
        "--coalesced-outfile",
        type=str,
        help="output filename for coalesced data. If not provided, will not be written",
        default=None,
    )
    return parser.parse_args()
deepdog/cli/probs/dicts.py (new file, 178 lines)

@@ -0,0 +1,178 @@
import typing
from deepdog.results import BayesrunOutput
import logging
import csv
import tqdm

_logger = logging.getLogger(__name__)


def build_model_dict(
    bayes_outputs: typing.Sequence[BayesrunOutput],
) -> typing.Dict[
    typing.Tuple, typing.Dict[typing.Tuple, typing.Dict["str", typing.Any]]
]:
    """
    Maybe someday do something smarter with the coalescing and stuff but don't want to so i won't
    """
    # assume that everything is well formatted and the keys are the same across entire list and initialise list of keys.
    # model dict will contain a model_key: {calculation_dict} where each calculation_dict represents a single calculation for that model,
    # the uncoalesced version, keyed by the specific file keys
    model_dict: typing.Dict[
        typing.Tuple, typing.Dict[typing.Tuple, typing.Dict["str", typing.Any]]
    ] = {}

    _logger.info("building model dict")
    for out in tqdm.tqdm(bayes_outputs, desc="reading outputs", leave=False):
        for model_result in out.results:
            model_key = tuple(v for v in model_result.parsed_model_keys.values())
            if model_key not in model_dict:
                model_dict[model_key] = {}
            calculation_dict = model_dict[model_key]
            calculation_key = tuple(v for v in out.data.values())
            if calculation_key not in calculation_dict:
                calculation_dict[calculation_key] = {
                    "_model_key_dict": model_result.parsed_model_keys,
                    "_calculation_key_dict": out.data,
                    "success": model_result.success,
                    "count": model_result.count,
                }
            else:
                raise ValueError(
                    f"Got {calculation_key} twice for model_key {model_key}"
                )

    return model_dict


def write_uncoalesced_dict(
    uncoalesced_output_filename: typing.Optional[str],
    uncoalesced_model_dict: typing.Dict[
        typing.Tuple, typing.Dict[typing.Tuple, typing.Dict["str", typing.Any]]
    ],
):
    if uncoalesced_output_filename is None or uncoalesced_output_filename == "":
        _logger.warning("Not provided a uncoalesced filename, not going to try")
        return

    first_value = next(iter(next(iter(uncoalesced_model_dict.values())).values()))
    model_field_names = set(first_value["_model_key_dict"].keys())
    calculation_field_names = set(first_value["_calculation_key_dict"].keys())
    if not (set(model_field_names).isdisjoint(calculation_field_names)):
        _logger.info(f"Detected model field names {model_field_names}")
        _logger.info(f"Detected calculation field names {calculation_field_names}")
        _logger.warning(
            f"model field names {model_field_names} and calculation {calculation_field_names} have an overlap, which is possibly a problem"
        )
    collected_fieldnames = list(model_field_names)
    collected_fieldnames.extend(calculation_field_names)
    collected_fieldnames.extend(["success", "count"])
    _logger.info(f"Full uncoalesced fieldnames are {collected_fieldnames}")
    with open(uncoalesced_output_filename, "w", newline="") as uncoalesced_output_file:
        writer = csv.DictWriter(
            uncoalesced_output_file, fieldnames=collected_fieldnames
        )
        writer.writeheader()

        for model_dict in uncoalesced_model_dict.values():
            for calculation in model_dict.values():
                row = calculation["_model_key_dict"].copy()
                row.update(calculation["_calculation_key_dict"].copy())
                row.update(
                    {
                        "success": calculation["success"],
                        "count": calculation["count"],
                    }
                )
                writer.writerow(row)


def coalesced_dict(
    uncoalesced_model_dict: typing.Dict[
        typing.Tuple, typing.Dict[typing.Tuple, typing.Dict["str", typing.Any]]
    ],
    minimum_count: float = 0.1,
):
    """
    pass in uncoalesced dict
    the minimum_count field is what we use to make sure our probs are never zero
    """
    coalesced_dict = {}

    # we are already iterating so for no reason because performance really doesn't matter let's count the keys ourselves
    num_keys = 0

    # first pass coalesce
    for model_key, model_dict in uncoalesced_model_dict.items():
        num_keys += 1
        for calculation in model_dict.values():
            if model_key not in coalesced_dict:
                coalesced_dict[model_key] = {
                    "_model_key_dict": calculation["_model_key_dict"].copy(),
                    "calculations_coalesced": 0,
                    "count": 0,
                    "success": 0,
                }
            sub_dict = coalesced_dict[model_key]
            sub_dict["calculations_coalesced"] += 1
            sub_dict["count"] += calculation["count"]
            sub_dict["success"] += calculation["success"]

    # second pass do probability calculation

    prior = 1 / num_keys
    _logger.info(f"Got {num_keys} model keys, so our prior will be {prior}")

    total_weight = 0
    for coalesced_model_dict in coalesced_dict.values():
        model_weight = (
            max(minimum_count, coalesced_model_dict["success"])
            / coalesced_model_dict["count"]
        ) * prior
        total_weight += model_weight

    total_prob = 0
    for coalesced_model_dict in coalesced_dict.values():
        model_weight = (
            max(minimum_count, coalesced_model_dict["success"])
            / coalesced_model_dict["count"]
        )
        prob = model_weight * prior / total_weight
        coalesced_model_dict["prob"] = prob
        total_prob += prob

    _logger.debug(
        f"Got a total probability of {total_prob}, which should be close to 1 up to float/rounding error"
    )
    return coalesced_dict


def write_coalesced_dict(
    coalesced_output_filename: typing.Optional[str],
    coalesced_model_dict: typing.Dict[typing.Tuple, typing.Dict["str", typing.Any]],
):
    if coalesced_output_filename is None or coalesced_output_filename == "":
        _logger.warning("Not provided a uncoalesced filename, not going to try")
        return

    first_value = next(iter(coalesced_model_dict.values()))
    model_field_names = set(first_value["_model_key_dict"].keys())
    _logger.info(f"Detected model field names {model_field_names}")

    collected_fieldnames = list(model_field_names)
    collected_fieldnames.extend(["calculations_coalesced", "success", "count", "prob"])
    with open(coalesced_output_filename, "w", newline="") as coalesced_output_file:
        writer = csv.DictWriter(coalesced_output_file, fieldnames=collected_fieldnames)
        writer.writeheader()

        for model_dict in coalesced_model_dict.values():
            row = model_dict["_model_key_dict"].copy()
            row.update(
                {
                    "calculations_coalesced": model_dict["calculations_coalesced"],
                    "success": model_dict["success"],
                    "count": model_dict["count"],
                    "prob": model_dict["prob"],
                }
            )
            writer.writerow(row)
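`coalesced_dict` performs the same normalisation as the deleted runners, but over pooled results: each model's pooled success/count ratio is weighted by a uniform prior, with successes floored at `minimum_count` so no probability is exactly zero. A worked sketch of the same arithmetic on two hypothetical models:

```python
minimum_count = 0.1
pooled = {
    ("model_a",): {"success": 12, "count": 200000},
    ("model_b",): {"success": 0, "count": 200000},  # gets floored below
}

prior = 1 / len(pooled)  # uniform prior over model keys
weights = {
    key: (max(minimum_count, d["success"]) / d["count"]) * prior
    for key, d in pooled.items()
}
total_weight = sum(weights.values())
probs = {key: w / total_weight for key, w in weights.items()}
print(probs)  # ~{('model_a',): 0.992, ('model_b',): 0.008}
```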
deepdog/cli/probs/main.py (new file, 100 lines)

@@ -0,0 +1,100 @@
import logging
import argparse
import json
import deepdog.cli.probs.args
import deepdog.cli.probs.dicts
import deepdog.results
import deepdog.indexify
import pathlib
import tqdm
import tqdm.contrib.logging


_logger = logging.getLogger(__name__)


def set_up_logging(log_file: str):

    log_pattern = "%(asctime)s | %(levelname)-7s | %(name)s:%(lineno)d | %(message)s"
    if log_file is None:
        handlers = [
            logging.StreamHandler(),
        ]
    else:
        handlers = [logging.StreamHandler(), logging.FileHandler(log_file)]
    logging.basicConfig(
        level=logging.DEBUG,
        format=log_pattern,
        # it's okay to ignore this mypy error because who cares about logger handler types
        handlers=handlers,  # type: ignore
    )
    logging.captureWarnings(True)


def main(args: argparse.Namespace):
    """
    Main function with passed in arguments and no additional logging setup in case we want to extract out later
    """

    with tqdm.contrib.logging.logging_redirect_tqdm():
        _logger.info(f"args: {args}")

        try:
            if args.coalesced_keys:
                raise NotImplementedError(
                    "Currently not supporting coalesced keys, but maybe in future"
                )
        except AttributeError:
            # we don't care if this is missing because we don't actually want it to be there
            pass

        indexifier = None
        if args.indexify_json:
            with open(args.indexify_json, "r") as indexify_json_file:
                indexify_spec = json.load(indexify_json_file)
                indexify_data = indexify_spec["indexes"]
                if "seed_spec" in indexify_spec:
                    seed_spec = indexify_spec["seed_spec"]
                    indexify_data[seed_spec["field_name"]] = list(
                        range(seed_spec["num_seeds"])
                    )
                # _logger.debug(f"Indexifier data looks like {indexify_data}")
                indexifier = deepdog.indexify.Indexifier(indexify_data)

        bayes_dir = pathlib.Path(args.bayesrun_directory)
        out_files = [f for f in bayes_dir.iterdir() if f.name.endswith("bayesrun.csv")]
        _logger.info(
            f"Reading {len(out_files)} bayesrun.csv files in directory {args.bayesrun_directory}"
        )
        # _logger.info(out_files)
        parsed_output_files = [
            deepdog.results.read_output_file(f, indexifier)
            for f in tqdm.tqdm(out_files, desc="reading files", leave=False)
        ]

        # Refactor here to allow for arbitrary likelihood file sources
        _logger.info("building uncoalesced dict")
        uncoalesced_dict = deepdog.cli.probs.dicts.build_model_dict(parsed_output_files)

        if "uncoalesced_outfile" in args and args.uncoalesced_outfile:
            deepdog.cli.probs.dicts.write_uncoalesced_dict(
                args.uncoalesced_outfile, uncoalesced_dict
            )
        else:
            _logger.info("Skipping writing uncoalesced")

        _logger.info("building coalesced dict")
        coalesced = deepdog.cli.probs.dicts.coalesced_dict(uncoalesced_dict)

        if "coalesced_outfile" in args and args.coalesced_outfile:
            deepdog.cli.probs.dicts.write_coalesced_dict(
                args.coalesced_outfile, coalesced
            )
        else:
            _logger.info("Skipping writing coalesced")


def wrapped_main():
    args = deepdog.cli.probs.args.parse_args()
    set_up_logging(args.log_file)
    main(args)
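From the parsing logic in `main`, an indexify spec is a JSON object with an `indexes` mapping plus an optional `seed_spec` that expands into a list of seed values. A hypothetical spec shaped the way this code consumes it; the field names and values here are invented for illustration:

```python
# Hypothetical indexify spec, shown as the dict json.load would produce.
# main() copies "indexes" into indexify_data and then expands "seed_spec"
# into indexify_data["seed"] = list(range(100)).
example_spec = {
    "indexes": {
        "orientation": ["free", "fixedxy", "fixedz"],  # invented field/values
        "dipole_count": [1, 5, 10],  # invented field/values
    },
    "seed_spec": {"field_name": "seed", "num_seeds": 100},
}
```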
deepdog/cli/subset_sim_probs/__init__.py (new file, 5 lines)

@@ -0,0 +1,5 @@
from deepdog.cli.subset_sim_probs.main import wrapped_main

__all__ = [
    "wrapped_main",
]
52 deepdog/cli/subset_sim_probs/args.py Normal file
@ -0,0 +1,52 @@
import argparse
import os


def parse_args() -> argparse.Namespace:
    def dir_path(path):
        if os.path.isdir(path):
            return path
        else:
            raise argparse.ArgumentTypeError(f"readable_dir:{path} is not a valid path")

    parser = argparse.ArgumentParser(
        "subset_sim_probs",
        description="Calculating probability from finished subset sim run",
    )
    parser.add_argument(
        "--log-file",
        type=str,
        help="A filename for logging to, if not provided will only log to stderr",
        default=None,
    )
    parser.add_argument(
        "--results-directory",
        "-d",
        type=dir_path,
        help="The directory to search for bayesrun files, defaulting to cwd if not passed",
        default=".",
    )
    parser.add_argument(
        "--indexify-json",
        help="A json file with the indexify config for parsing job indexes. Will skip if not present",
        default="",
    )
    parser.add_argument(
        "--outfile",
        "-o",
        type=str,
        help="output filename for coalesced data. If not provided, will not be written",
        default=None,
    )
    confirm_outfile_overwrite_group = parser.add_mutually_exclusive_group()
    confirm_outfile_overwrite_group.add_argument(
        "--never-overwrite-outfile",
        action="store_true",
        help="If a duplicate outfile is detected, skip confirmation and automatically exit early",
    )
    confirm_outfile_overwrite_group.add_argument(
        "--force-overwrite-outfile",
        action="store_true",
        help="Skips checking for duplicate outfiles and overwrites",
    )
    return parser.parse_args()
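Assuming the parser is wired up to a subset_sim_probs console entry point (the diff itself only shows the parser), exercising it directly might look like the following sketch; the argv values are illustrative, and dir_path requires ./results to actually exist:

import sys

sys.argv = [
    "subset_sim_probs",
    "-d", "./results",
    "--indexify-json", "indexes.json",
    "-o", "coalesced_probs.csv",
    "--force-overwrite-outfile",
]
args = parse_args()
# args.results_directory == "./results"; args.force_overwrite_outfile is True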
136 deepdog/cli/subset_sim_probs/dicts.py Normal file
@ -0,0 +1,136 @@
import typing
from deepdog.results import GeneralOutput
import logging
import csv
import tqdm

_logger = logging.getLogger(__name__)


def build_model_dict(
    general_outputs: typing.Sequence[GeneralOutput],
) -> typing.Dict[
    typing.Tuple, typing.Dict[typing.Tuple, typing.Dict["str", typing.Any]]
]:
    """
    Maybe someday do something smarter with the coalescing and stuff, but I don't want to, so I won't.
    """
    # assume that everything is well formatted and the keys are the same across the entire list, and initialise the list of keys.
    # model_dict will contain a model_key: {calculation_dict} mapping, where each calculation_dict represents a single calculation for that model:
    # the uncoalesced version, keyed by the specific file keys
    model_dict: typing.Dict[
        typing.Tuple, typing.Dict[typing.Tuple, typing.Dict["str", typing.Any]]
    ] = {}

    _logger.info("building model dict")
    for out in tqdm.tqdm(general_outputs, desc="reading outputs", leave=False):
        for model_result in out.results:
            model_key = tuple(v for v in model_result.parsed_model_keys.values())
            if model_key not in model_dict:
                model_dict[model_key] = {}
            calculation_dict = model_dict[model_key]
            calculation_key = tuple(v for v in out.data.values())
            if calculation_key not in calculation_dict:
                calculation_dict[calculation_key] = {
                    "_model_key_dict": model_result.parsed_model_keys,
                    "_calculation_key_dict": out.data,
                    "num_finished_runs": int(
                        model_result.result_dict["num_finished_runs"]
                    ),
                    "num_runs": int(model_result.result_dict["num_runs"]),
                    "estimated_likelihood": float(
                        model_result.result_dict["estimated_likelihood"]
                    ),
                }
            else:
                raise ValueError(
                    f"Got {calculation_key} twice for model_key {model_key}"
                )

    return model_dict


def coalesced_dict(
    uncoalesced_model_dict: typing.Dict[
        typing.Tuple, typing.Dict[typing.Tuple, typing.Dict["str", typing.Any]]
    ],
):
    """
    Pass in an uncoalesced dict;
    the minimum_count field is what we use to make sure our probs are never zero.
    """
    coalesced_dict = {}

    # we are already iterating, and since performance really doesn't matter here, let's count the keys ourselves
    num_keys = 0

    # first pass: coalesce
    for model_key, model_dict in uncoalesced_model_dict.items():
        num_keys += 1
        for calculation in model_dict.values():
            if model_key not in coalesced_dict:
                coalesced_dict[model_key] = {
                    "_model_key_dict": calculation["_model_key_dict"].copy(),
                    "calculations_coalesced": 1,
                    "num_finished_runs": calculation["num_finished_runs"],
                    "num_runs": calculation["num_runs"],
                    "estimated_likelihood": calculation["estimated_likelihood"],
                }
            else:
                _logger.error(f"We shouldn't be here! Double key for {model_key=}")
                raise ValueError()

    # second pass: do the probability calculation

    prior = 1 / num_keys
    _logger.info(f"Got {num_keys} model keys, so our prior will be {prior}")

    total_weight = 0
    for coalesced_model_dict in coalesced_dict.values():
        model_weight = coalesced_model_dict["estimated_likelihood"] * prior
        total_weight += model_weight

    total_prob = 0
    for coalesced_model_dict in coalesced_dict.values():
        likelihood = coalesced_model_dict["estimated_likelihood"]
        prob = likelihood * prior / total_weight
        coalesced_model_dict["prob"] = prob
        total_prob += prob

    _logger.debug(
        f"Got a total probability of {total_prob}, which should be close to 1 up to float/rounding error"
    )
    return coalesced_dict


def write_coalesced_dict(
    coalesced_output_filename: typing.Optional[str],
    coalesced_model_dict: typing.Dict[typing.Tuple, typing.Dict["str", typing.Any]],
):
    if coalesced_output_filename is None or coalesced_output_filename == "":
        _logger.warning("Not provided a coalesced filename, not going to try")
        return

    first_value = next(iter(coalesced_model_dict.values()))
    model_field_names = set(first_value["_model_key_dict"].keys())
    _logger.info(f"Detected model field names {model_field_names}")

    collected_fieldnames = list(model_field_names)
    collected_fieldnames.extend(
        ["calculations_coalesced", "num_finished_runs", "num_runs", "prob"]
    )
    with open(coalesced_output_filename, "w", newline="") as coalesced_output_file:
        writer = csv.DictWriter(coalesced_output_file, fieldnames=collected_fieldnames)
        writer.writeheader()

        for model_dict in coalesced_model_dict.values():
            row = model_dict["_model_key_dict"].copy()
            row.update(
                {
                    "calculations_coalesced": model_dict["calculations_coalesced"],
                    "num_finished_runs": model_dict["num_finished_runs"],
                    "num_runs": model_dict["num_runs"],
                    "prob": model_dict["prob"],
                }
            )
            writer.writerow(row)
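The second pass in coalesced_dict is just Bayes with a flat prior: with K models each assigned prior 1/K, a model's posterior is its likelihood divided by the sum of all likelihoods, since the common prior factor cancels. A toy check with illustrative numbers:

# toy check of the flat-prior normalization used in coalesced_dict
likelihoods = {"model_a": 0.02, "model_b": 0.06}
prior = 1 / len(likelihoods)  # 0.5
total_weight = sum(lh * prior for lh in likelihoods.values())  # 0.04
probs = {k: lh * prior / total_weight for k, lh in likelihoods.items()}
# probs == {"model_a": 0.25, "model_b": 0.75}; the prior cancels out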
113 deepdog/cli/subset_sim_probs/main.py Normal file
@ -0,0 +1,113 @@
import logging
import argparse
import json

import deepdog.cli.subset_sim_probs.args
import deepdog.cli.subset_sim_probs.dicts
import deepdog.cli.util
import deepdog.results
import deepdog.indexify
import pathlib
import tqdm
import os
import tqdm.contrib.logging


_logger = logging.getLogger(__name__)


def set_up_logging(log_file: str):

    log_pattern = "%(asctime)s | %(levelname)-7s | %(name)s:%(lineno)d | %(message)s"
    if log_file is None:
        handlers = [
            logging.StreamHandler(),
        ]
    else:
        handlers = [logging.StreamHandler(), logging.FileHandler(log_file)]
    logging.basicConfig(
        level=logging.DEBUG,
        format=log_pattern,
        # it's okay to ignore this mypy error because who cares about logger handler types
        handlers=handlers,  # type: ignore
    )
    logging.captureWarnings(True)


def main(args: argparse.Namespace):
    """
    Main function with passed in arguments and no additional logging setup in case we want to extract out later
    """

    with tqdm.contrib.logging.logging_redirect_tqdm():
        _logger.info(f"args: {args}")

        if "outfile" in args and args.outfile:
            if os.path.exists(args.outfile):
                if args.never_overwrite_outfile:
                    _logger.warning(
                        f"Filename {args.outfile} already exists, and never want overwrite, so aborting."
                    )
                    return
                elif args.force_overwrite_outfile:
                    _logger.warning(f"Forcing overwrite of {args.outfile}")
                else:
                    # need to confirm
                    confirm_overwrite = deepdog.cli.util.confirm_prompt(
                        f"Filename {args.outfile} exists, overwrite?"
                    )
                    if not confirm_overwrite:
                        _logger.warning(
                            f"Filename {args.outfile} already exists and do not want overwrite, aborting."
                        )
                        return
                    else:
                        _logger.warning(f"Overwriting file {args.outfile}")

        indexifier = None
        if args.indexify_json:
            with open(args.indexify_json, "r") as indexify_json_file:
                indexify_spec = json.load(indexify_json_file)
                indexify_data = indexify_spec["indexes"]
                if "seed_spec" in indexify_spec:
                    seed_spec = indexify_spec["seed_spec"]
                    indexify_data[seed_spec["field_name"]] = list(
                        range(seed_spec["num_seeds"])
                    )
                # _logger.debug(f"Indexifier data looks like {indexify_data}")
                indexifier = deepdog.indexify.Indexifier(indexify_data)

        results_dir = pathlib.Path(args.results_directory)
        out_files = [
            f for f in results_dir.iterdir() if f.name.endswith("subsetsim.csv")
        ]
        _logger.info(
            f"Reading {len(out_files)} subsetsim.csv files in directory {args.results_directory}"
        )
        # _logger.info(out_files)
        parsed_output_files = [
            deepdog.results.read_subset_sim_file(f, indexifier)
            for f in tqdm.tqdm(out_files, desc="reading files", leave=False)
        ]

        # Refactor here to allow for arbitrary likelihood file sources
        _logger.info("building uncoalesced dict")
        uncoalesced_dict = deepdog.cli.subset_sim_probs.dicts.build_model_dict(
            parsed_output_files
        )

        _logger.info("building coalesced dict")
        coalesced = deepdog.cli.subset_sim_probs.dicts.coalesced_dict(uncoalesced_dict)

        if "outfile" in args and args.outfile:
            deepdog.cli.subset_sim_probs.dicts.write_coalesced_dict(
                args.outfile, coalesced
            )
        else:
            _logger.info("Skipping writing coalesced")


def wrapped_main():
    args = deepdog.cli.subset_sim_probs.args.parse_args()
    set_up_logging(args.log_file)
    main(args)
3 deepdog/cli/util/__init__.py Normal file
@ -0,0 +1,3 @@
from deepdog.cli.util.confirm import confirm_prompt

__all__ = ["confirm_prompt"]
23 deepdog/cli/util/confirm.py Normal file
@ -0,0 +1,23 @@
_RESPONSE_MAP = {
    "yes": True,
    "ye": True,
    "y": True,
    "no": False,
    "n": False,
    "nope": False,
    "true": True,
    "false": False,
}


def confirm_prompt(question: str) -> bool:
    """Prompt with the question and return True or False based on the response."""
    prompt = question + " [y/n]: "

    while True:
        choice = input(prompt).lower()

        if choice in _RESPONSE_MAP:
            return _RESPONSE_MAP[choice]
        else:
            print('Respond with "yes" or "no"')
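A minimal usage sketch of confirm_prompt (illustrative; it simply re-prompts until one of the mapped responses is entered):

if confirm_prompt("Overwrite 3 files?"):
    print("confirmed")
else:
    print("aborted")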
14 deepdog/direct_monte_carlo/compose_filter.py Normal file
@ -0,0 +1,14 @@
from typing import Sequence
from deepdog.direct_monte_carlo.direct_mc import DirectMonteCarloFilter
import numpy


class ComposedDMCFilter(DirectMonteCarloFilter):
    def __init__(self, filters: Sequence[DirectMonteCarloFilter]):
        self.filters = filters

    def filter_samples(self, samples: numpy.ndarray) -> numpy.ndarray:
        current_sample = samples
        for filter in self.filters:
            current_sample = filter.filter_samples(current_sample)
        return current_sample
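Because each stage of ComposedDMCFilter only ever sees the survivors of the previous stage, composing filters is the logical AND of their constraints. A runnable sketch with two toy filters (both classes are purely illustrative):

import numpy

class _KeepPositiveX(DirectMonteCarloFilter):
    # toy filter: keep samples whose first coordinate is positive
    def filter_samples(self, samples: numpy.ndarray) -> numpy.ndarray:
        return samples[samples[:, 0] > 0]

class _KeepSmallNorm(DirectMonteCarloFilter):
    # toy filter: keep samples inside the unit ball
    def filter_samples(self, samples: numpy.ndarray) -> numpy.ndarray:
        return samples[numpy.linalg.norm(samples, axis=1) < 1.0]

composed = ComposedDMCFilter([_KeepPositiveX(), _KeepSmallNorm()])
survivors = composed.filter_samples(numpy.random.default_rng(0).normal(size=(100, 2)))
# every survivor satisfies both conditions: x > 0 and |sample| < 1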
24 deepdog/direct_monte_carlo/cost_function_filter.py Normal file
@ -0,0 +1,24 @@
from deepdog.direct_monte_carlo.direct_mc import DirectMonteCarloFilter
from typing import Callable
import numpy


class CostFunctionTargetFilter(DirectMonteCarloFilter):
    def __init__(
        self,
        cost_function: Callable[[numpy.ndarray], numpy.ndarray],
        target_cost: float,
    ):
        """
        Filters dipoles by cost, only leaving dipoles with cost below target_cost
        """
        self.cost_function = cost_function
        self.target_cost = target_cost

    def filter_samples(self, samples: numpy.ndarray) -> numpy.ndarray:
        current_sample = samples

        costs = self.cost_function(current_sample)

        current_sample = current_sample[costs < self.target_cost]
        return current_sample
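A runnable toy example of CostFunctionTargetFilter, using a vectorized norm as the stand-in cost function (purely illustrative):

import numpy

toy_filter = CostFunctionTargetFilter(
    cost_function=lambda s: numpy.linalg.norm(s, axis=-1),  # one cost per sample
    target_cost=1.0,
)
samples = numpy.random.default_rng(0).normal(size=(100, 3))
kept = toy_filter.filter_samples(samples)  # only samples with norm < 1.0 remain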
@ -1,22 +1,30 @@
import re
import pathlib
import csv
import pdme.model
import pdme.measurement
import pdme.measurement.input_types
import pdme.subspace_simulation
from typing import Tuple, Sequence
import datetime
from typing import Tuple, Dict, NewType, Any, Sequence
from dataclasses import dataclass
import logging
import numpy
import numpy.random
import pdme.util.fast_v_calc
import multiprocessing

_logger = logging.getLogger(__name__)

ANTI_ZERO_SUCCESS_THRES = 0.1


@dataclass
class DirectMonteCarloResult:
    successes: int
    monte_carlo_count: int
    likelihood: float
    model_name: str


@dataclass
@ -28,6 +36,51 @@ class DirectMonteCarloConfig:
    monte_carlo_seed: int = 1234
    write_successes_to_file: bool = False
    tag: str = ""
    cap_core_count: int = 0  # 0 means cap at num cores - 1
    chunk_size: int = 50
    # chunk size of some kind
    write_bayesrun_file: bool = True
    bayesrun_file_timestamp: bool = True
    skip_if_exists: bool = False

    def get_filename(self) -> str:
        """
        Generate a filename for the output of this run.
        """
        # set starting execution timestamp
        timestamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")

        if self.bayesrun_file_timestamp:
            timestamp_str = f"{timestamp}-"
        else:
            timestamp_str = ""
        filename = f"{timestamp_str}{self.tag}.realdata.fast_filter.bayesrun.csv"
        _logger.debug(f"Got filename {filename}")
        return filename

    def get_filename_regex(self) -> str:
        """
        Generate a regex for the output of this run.
        """

        # having both the timestamp and the hyphen separately optional is a bit of a hack
        # too loose, but will never matter
        pattern = rf"(?P<timestamp>\d{{8}}-\d{{6}})?-?{self.tag}\.realdata\.fast_filter\.bayesrun\.csv"
        return pattern


# Aliasing dict as a generic data container
DirectMonteCarloData = NewType("DirectMonteCarloData", Dict[str, Any])


class DirectMonteCarloFilter:
    """
    Abstract class for filtering out samples matching some criteria. Initialise with data as needed,
    then filter out samples as needed.
    """

    def filter_samples(self, samples: numpy.ndarray) -> numpy.ndarray:
        raise NotImplementedError

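To make the two filename modes above concrete: with tag "run1", get_filename produces 20250227-153000-run1.realdata.fast_filter.bayesrun.csv when bayesrun_file_timestamp is set (timestamp value illustrative) and run1.realdata.fast_filter.bayesrun.csv otherwise, and the pattern from get_filename_regex matches both forms:

import re

# the same pattern get_filename_regex builds, written out for tag="run1"
pattern = r"(?P<timestamp>\d{8}-\d{6})?-?run1\.realdata\.fast_filter\.bayesrun\.csv"
assert re.match(pattern, "20250227-153000-run1.realdata.fast_filter.bayesrun.csv")
assert re.match(pattern, "run1.realdata.fast_filter.bayesrun.csv")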
class DirectMonteCarloRun:
@ -37,8 +90,8 @@ class DirectMonteCarloRun:

    Parameters
    ----------
    model_name_pair : Sequence[Tuple(str, pdme.model.DipoleModel)]
        The model to evaluate, with name.
    model_name_pairs : Sequence[Tuple(str, pdme.model.DipoleModel)]
        The models to evaluate, with names

    measurements: Sequence[pdme.measurement.DotRangeMeasurement]
        The measurements as dot ranges to use as the bounds for the Monte Carlo calculation.
@ -64,57 +117,95 @@ class DirectMonteCarloRun:

    def __init__(
        self,
        model_name_pair: Tuple[str, pdme.model.DipoleModel],
        measurements: Sequence[pdme.measurement.DotRangeMeasurement],
        model_name_pairs: Sequence[Tuple[str, pdme.model.DipoleModel]],
        filter: DirectMonteCarloFilter,
        config: DirectMonteCarloConfig,
    ):
        self.model_name, self.model = model_name_pair
        self.model_name_pairs = model_name_pairs

        self.measurements = measurements
        self.dot_inputs = [(measure.r, measure.f) for measure in self.measurements]
        # self.measurements = measurements
        # self.dot_inputs = [(measure.r, measure.f) for measure in self.measurements]

        self.dot_inputs_array = pdme.measurement.input_types.dot_inputs_to_array(
            self.dot_inputs
        )
        # self.dot_inputs_array = pdme.measurement.input_types.dot_inputs_to_array(
        #     self.dot_inputs
        # )

        self.config = config
        (
            self.lows,
            self.highs,
        ) = pdme.measurement.input_types.dot_range_measurements_low_high_arrays(
            self.measurements
        )
        self.filter = filter
        # (
        #     self.lows,
        #     self.highs,
        # ) = pdme.measurement.input_types.dot_range_measurements_low_high_arrays(
        #     self.measurements
        # )

    def _single_run(self, seed) -> numpy.ndarray:
    def _single_run(
        self, model_name_pair: Tuple[str, pdme.model.DipoleModel], seed
    ) -> numpy.ndarray:
        rng = numpy.random.default_rng(seed)

        sample_dipoles = self.model.get_monte_carlo_dipole_inputs(
        _, model = model_name_pair
        # don't log here it's madness
        # _logger.info(f"Executing for model {model_name}")

        sample_dipoles = model.get_monte_carlo_dipole_inputs(
            self.config.monte_carlo_count_per_cycle, -1, rng
        )

        current_sample = sample_dipoles
        for di, low, high in zip(self.dot_inputs_array, self.lows, self.highs):

            if len(current_sample) < 1:
                break
            vals = pdme.util.fast_v_calc.fast_vs_for_dipoleses(
                numpy.array([di]), current_sample
            )
        return self.filter.filter_samples(current_sample)
        # for di, low, high in zip(self.dot_inputs_array, self.lows, self.highs):

            current_sample = current_sample[
                numpy.all((vals > low) & (vals < high), axis=1)
            ]
        return current_sample
        #     if len(current_sample) < 1:
        #         break
        #     vals = pdme.util.fast_v_calc.fast_vs_for_dipoleses(
        #         numpy.array([di]), current_sample
        #     )

    def execute(self) -> DirectMonteCarloResult:
        step_count = 0
        total_success = 0
        total_count = 0
        #     current_sample = current_sample[
        #         numpy.all((vals > low) & (vals < high), axis=1)
        #     ]
        # return current_sample

    def _wrapped_single_run(self, args: Tuple):
        """
        single run wrapped up for multiprocessing call.

        takes in a tuple of arguments corresponding to
        (model_name_pair, seed, return_configs)

        return_configs is a boolean, if true then will return tuple of (count, [matching configs])
        if false, return (count, [])
        """
        # here's where we do our work

        model_name_pair, seed, return_configs = args
        cycle_success_configs = self._single_run(model_name_pair, seed)
        cycle_success_count = len(cycle_success_configs)

        if return_configs:
            return (cycle_success_count, cycle_success_configs)
        else:
            return (cycle_success_count, [])

    def execute_no_multiprocessing(self) -> Sequence[DirectMonteCarloResult]:

        count_per_step = (
            self.config.monte_carlo_count_per_cycle * self.config.monte_carlo_cycles
        )
        seed_sequence = numpy.random.SeedSequence(self.config.monte_carlo_seed)

        # core count etc. logic here

        results = []
        for model_name_pair in self.model_name_pairs:
            step_count = 0
            total_success = 0
            total_count = 0

            _logger.info(f"Working on model {model_name_pair[0]}")
            # This is probably where multiprocessing logic should go
            while (step_count < self.config.max_monte_carlo_cycles_steps) and (
                total_success < self.config.target_success
            ):
@ -122,13 +213,14 @@
                for cycle_i, seed in enumerate(
                    seed_sequence.spawn(self.config.monte_carlo_cycles)
                ):
                    cycle_success_configs = self._single_run(seed)
                    # here's where we do our work
                    cycle_success_configs = self._single_run(model_name_pair, seed)
                    cycle_success_count = len(cycle_success_configs)
                    if cycle_success_count > 0:
                        _logger.debug(
                            f"For cycle {cycle_i} received {cycle_success_count} successes"
                        )
                        _logger.debug(cycle_success_configs)
                        # _logger.debug(cycle_success_configs)
                        if self.config.write_successes_to_file:
                            sorted_by_freq = numpy.array(
                                [
@ -140,18 +232,204 @@
                            )
                            dipole_count = numpy.array(cycle_success_configs).shape[1]
                            for n in range(dipole_count):
                                number_dipoles_to_write = self.config.target_success * 5
                                _logger.info(f"Limiting to {number_dipoles_to_write=}")
                                numpy.savetxt(
                                    f"{self.config.tag}_{step_count}_{cycle_i}_dipole_{n}.csv",
                                    sorted_by_freq[:, n],
                                    sorted_by_freq[:number_dipoles_to_write, n],
                                    delimiter=",",
                                )
                    total_success += cycle_success_count
                _logger.debug(f"At end of step {step_count} have {total_success} successes")
                _logger.debug(
                    f"At end of step {step_count} have {total_success} successes"
                )
                step_count += 1
                total_count += count_per_step

        return DirectMonteCarloResult(
            results.append(
                DirectMonteCarloResult(
                    successes=total_success,
                    monte_carlo_count=total_count,
                    likelihood=total_success / total_count,
                    model_name=model_name_pair[0],
                )
            )
        return results

    def execute(self) -> Sequence[DirectMonteCarloResult]:

        filename = self.config.get_filename()
        if self.config.skip_if_exists:
            _logger.info(f"Checking if {filename} exists")
            cwd = pathlib.Path.cwd()
            if (cwd / filename).exists():
                _logger.info(f"File {filename} exists, skipping")
                return []
            if self.config.bayesrun_file_timestamp:
                _logger.info(
                    "Also need to check file endings because of possible past or current timestamps, check only occurs if writing timestamp is set"
                )
                pattern = self.config.get_filename_regex()
                for file in cwd.iterdir():
                    match = re.match(pattern, file.name)
                    if match is not None:
                        _logger.info(f"Matched {file.name} to {pattern}")
                        _logger.info(f"File {filename} exists, skipping")
                        return []
                _logger.info(
                    f"Finished checking against pattern {pattern}, hopefully didn't take too long!"
                )

        count_per_step = (
            self.config.monte_carlo_count_per_cycle * self.config.monte_carlo_cycles
        )
        seed_sequence = numpy.random.SeedSequence(self.config.monte_carlo_seed)

        # core count etc. logic here
        core_count = multiprocessing.cpu_count() - 1 or 1
        if (self.config.cap_core_count >= 1) and (
            self.config.cap_core_count < core_count
        ):
            core_count = self.config.cap_core_count
        _logger.info(f"Using {core_count} cores")

        results = []
        with multiprocessing.Pool(core_count) as pool:

            for model_name_pair in self.model_name_pairs:
                _logger.info(f"Working on model {model_name_pair[0]}")
                # This is probably where multiprocessing logic should go

                step_count = 0
                total_success = 0
                total_count = 0

                while (step_count < self.config.max_monte_carlo_cycles_steps) and (
                    total_success < self.config.target_success
                ):

                    step_count += 1

                    _logger.debug(f"Executing step {step_count}")

                    seeds = seed_sequence.spawn(self.config.monte_carlo_cycles)

                    raw_pool_results = list(
                        pool.imap_unordered(
                            self._wrapped_single_run,
                            [
                                (
                                    model_name_pair,
                                    seed,
                                    self.config.write_successes_to_file,
                                )
                                for seed in seeds
                            ],
                            self.config.chunk_size,
                        )
                    )

                    pool_results = sum(result[0] for result in raw_pool_results)

                    _logger.debug(f"Pool results: {pool_results}")

                    if self.config.write_successes_to_file:

                        _logger.info("Writing dipole results")

                        cycle_success_configs = numpy.concatenate(
                            [result[1] for result in raw_pool_results]
                        )

                        dipole_count = numpy.array(cycle_success_configs).shape[1]

                        max_number_dipoles_to_write = self.config.target_success * 5
                        _logger.debug(
                            f"Limiting to {max_number_dipoles_to_write=}, have {len(cycle_success_configs)}"
                        )

                        if len(cycle_success_configs):
                            sorted_by_freq = numpy.array(
                                [
                                    pdme.subspace_simulation.sort_array_of_dipoles_by_frequency(
                                        dipole_config
                                    )
                                    for dipole_config in cycle_success_configs[
                                        :max_number_dipoles_to_write
                                    ]
                                ]
                            )

                            for n in range(dipole_count):

                                dipole_filename = (
                                    f"{self.config.tag}_{step_count}_dipole_{n}.csv"
                                )
                                _logger.debug(
                                    f"Writing {min(len(cycle_success_configs), max_number_dipoles_to_write)} to {dipole_filename}"
                                )

                                numpy.savetxt(
                                    dipole_filename,
                                    sorted_by_freq[:, n],
                                    delimiter=",",
                                )
                        else:
                            _logger.debug(
                                "Instructed to write results, but none obtained"
                            )

                    total_success += pool_results
                    total_count += count_per_step
                    _logger.debug(
                        f"At end of step {step_count} have {total_success} successes"
                    )

                results.append(
                    DirectMonteCarloResult(
                        successes=total_success,
                        monte_carlo_count=total_count,
                        likelihood=total_success / total_count,
                        model_name=model_name_pair[0],
                    )
                )

        if self.config.write_bayesrun_file:

            _logger.info(f"Going to write to file [{filename}]")
            # row: Dict[str, Union[int, float, str]] = {}
            row = {}

            num_models = len(self.model_name_pairs)
            success_weight = sum(
                [
                    (
                        max(ANTI_ZERO_SUCCESS_THRES, res.successes)
                        / res.monte_carlo_count
                    )
                    / num_models
                    for res in results
                ]
            )

            for res in results:
                row.update(
                    {
                        f"{res.model_name}_success": res.successes,
                        f"{res.model_name}_count": res.monte_carlo_count,
                        f"{res.model_name}_prob": (
                            max(ANTI_ZERO_SUCCESS_THRES, res.successes)
                            / res.monte_carlo_count
                        )
                        / (num_models * success_weight),
                    }
                )
            _logger.info(f"Writing row {row}")
            fieldnames = list(row.keys())

            with open(filename, "w", newline="") as outfile:
                writer = csv.DictWriter(outfile, fieldnames=fieldnames, dialect="unix")
                writer.writeheader()
                writer.writerow(row)

        return results
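The bayesrun row written above clamps zero-success models to ANTI_ZERO_SUCCESS_THRES pseudo-successes, so no model ever gets a probability of exactly zero. A toy check with illustrative counts:

ANTI_ZERO_SUCCESS_THRES = 0.1

# toy numbers: model_a had 0 successes out of 1000, model_b had 4 out of 1000
raw = {"model_a": (0, 1000), "model_b": (4, 1000)}
num_models = len(raw)
success_weight = sum(
    (max(ANTI_ZERO_SUCCESS_THRES, s) / n) / num_models for s, n in raw.values()
)
probs = {
    k: (max(ANTI_ZERO_SUCCESS_THRES, s) / n) / (num_models * success_weight)
    for k, (s, n) in raw.items()
}
# probs sum to 1; model_a gets 0.1/4.1 ≈ 0.024 instead of exactly zero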
115 deepdog/direct_monte_carlo/dmc_filters.py Normal file
@ -0,0 +1,115 @@
from numpy import ndarray
from deepdog.direct_monte_carlo.direct_mc import DirectMonteCarloFilter
from typing import Sequence
import pdme.measurement
import pdme.measurement.input_types
import pdme.util.fast_nonlocal_spectrum
import pdme.util.fast_v_calc
import numpy


class SingleDotPotentialFilter(DirectMonteCarloFilter):
    def __init__(self, measurements: Sequence[pdme.measurement.DotRangeMeasurement]):
        self.measurements = measurements
        self.dot_inputs = [(measure.r, measure.f) for measure in self.measurements]

        self.dot_inputs_array = pdme.measurement.input_types.dot_inputs_to_array(
            self.dot_inputs
        )
        (
            self.lows,
            self.highs,
        ) = pdme.measurement.input_types.dot_range_measurements_low_high_arrays(
            self.measurements
        )

    def filter_samples(self, samples: ndarray) -> ndarray:
        current_sample = samples
        for di, low, high in zip(self.dot_inputs_array, self.lows, self.highs):

            if len(current_sample) < 1:
                break
            vals = pdme.util.fast_v_calc.fast_vs_for_dipoleses(
                numpy.array([di]), current_sample
            )

            current_sample = current_sample[
                numpy.all((vals > low) & (vals < high), axis=1)
            ]
        return current_sample


class SingleDotSpinQubitFrequencyFilter(DirectMonteCarloFilter):
    def __init__(self, measurements: Sequence[pdme.measurement.DotRangeMeasurement]):
        self.measurements = measurements
        self.dot_inputs = [(measure.r, measure.f) for measure in self.measurements]

        self.dot_inputs_array = pdme.measurement.input_types.dot_inputs_to_array(
            self.dot_inputs
        )
        (
            self.lows,
            self.highs,
        ) = pdme.measurement.input_types.dot_range_measurements_low_high_arrays(
            self.measurements
        )

    def filter_samples(self, samples: ndarray) -> ndarray:
        current_sample = samples
        for di, low, high in zip(self.dot_inputs_array, self.lows, self.highs):

            if len(current_sample) < 1:
                break
            vals = pdme.util.fast_v_calc.fast_efieldxs_for_dipoleses(
                numpy.array([di]), current_sample
            )
            # _logger.info(vals)

            current_sample = current_sample[
                numpy.all((vals > low) & (vals < high), axis=1)
            ]
        # _logger.info(f"leaving with {len(current_sample)}")
        return current_sample


class DoubleDotSpinQubitFrequencyFilter(DirectMonteCarloFilter):
    def __init__(
        self,
        pair_phase_measurements: Sequence[pdme.measurement.DotPairRangeMeasurement],
    ):
        self.pair_phase_measurements = pair_phase_measurements
        self.dot_pair_inputs = [
            (measure.r1, measure.r2, measure.f)
            for measure in self.pair_phase_measurements
        ]
        self.dot_pair_inputs_array = (
            pdme.measurement.input_types.dot_pair_inputs_to_array(self.dot_pair_inputs)
        )
        (
            self.pair_phase_lows,
            self.pair_phase_highs,
        ) = pdme.measurement.input_types.dot_range_measurements_low_high_arrays(
            self.pair_phase_measurements
        )

    def filter_samples(self, samples: ndarray) -> ndarray:
        current_sample = samples

        for pi, plow, phigh in zip(
            self.dot_pair_inputs_array, self.pair_phase_lows, self.pair_phase_highs
        ):
            if len(current_sample) < 1:
                break

            vals = pdme.util.fast_nonlocal_spectrum.signarg(
                pdme.util.fast_nonlocal_spectrum.fast_s_spin_qubit_tarucha_nonlocal_dipoleses(
                    numpy.array([pi]), current_sample
                )
            )
            current_sample = current_sample[
                numpy.all(
                    ((vals > plow) & (vals < phigh)) | ((vals < plow) & (vals > phigh)),
                    axis=1,
                )
            ]
        return current_sample
62 deepdog/indexify/__init__.py Normal file
@ -0,0 +1,62 @@
"""
Probably should just include a way to handle the indexify function I reuse so much.

All about breaking an integer into a tuple of values from lists, which is useful because of how we do CHTC runs.
"""
import itertools
import typing
import logging
import math

_logger = logging.getLogger(__name__)


# from https://stackoverflow.com/questions/5228158/cartesian-product-of-a-dictionary-of-lists
def _dict_product(dicts):
    """
    >>> list(dict_product(dict(number=[1,2], character='ab')))
    [{'character': 'a', 'number': 1},
     {'character': 'a', 'number': 2},
     {'character': 'b', 'number': 1},
     {'character': 'b', 'number': 2}]
    """
    return list(dict(zip(dicts.keys(), x)) for x in itertools.product(*dicts.values()))


class Indexifier:
    """
    The order of keys is very important, but collections.OrderedDict is no longer needed in python 3.7.
    I think it's okay to rely on that.
    """

    def __init__(self, list_dict: typing.Dict[str, typing.Sequence]):
        self.dict = list_dict
        self.product_dict = _dict_product(self.dict)

    def indexify(self, n: int) -> typing.Dict[str, typing.Any]:
        return self.product_dict[n]

    def __len__(self) -> int:
        weights = [len(v) for v in self.dict.values()]
        return math.prod(weights)

    def _indexify_indices(self, n: int) -> typing.Sequence[int]:
        """
        legacy indexify from old scripts, copy-paste.
        could be used like
        >>> ret = {}
        >>> for k, i in zip(self.dict.keys(), self._indexify_indices):
        >>>     ret[k] = self.dict[k][i]
        >>> return ret
        """
        weights = [len(v) for v in self.dict.values()]
        N = math.prod(weights)
        curr_n = n
        curr_N = N
        out = []
        for w in weights[:-1]:
            # print(f"current: {curr_N}, {curr_n}, {curr_n // w}")
            curr_N = curr_N // w  # should be int division anyway
            out.append(curr_n // curr_N)
            curr_n = curr_n % curr_N
        return out
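A short illustration of Indexifier, relying on dict key order being preserved (values illustrative):

ix = Indexifier({"count": [1, 2], "orientation": ["free", "fixedxy", "fixedz"]})
len(ix)  # 6, the size of the cartesian product
ix.indexify(4)  # {"count": 2, "orientation": "fixedxy"}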
@ -66,7 +66,7 @@ def get_a_result_fast_filter_pairs(input) -> int:
    return len(current_sample)


def get_a_result_fast_filter_pair_phase_only(input) -> int:
def get_a_result_fast_filter_potential_pair_phase_only(input) -> int:
    (
        model,
        pair_inputs,
@ -102,6 +102,50 @@ def get_a_result_fast_filter_pair_phase_only(input) -> int:
    return len(current_sample)


def get_a_result_fast_filter_tarucha_spin_qubit_pair_phase_only(input) -> int:
    (
        model,
        pair_inputs,
        pair_phase_lows,
        pair_phase_highs,
        monte_carlo_count,
        seed,
    ) = input

    rng = numpy.random.default_rng(seed)
    # TODO: A long term refactor is to pull the frequency stuff out from here. The None stands for max_frequency, which is unneeded in the actually useful models.
    sample_dipoles = model.get_monte_carlo_dipole_inputs(
        monte_carlo_count, None, rng_to_use=rng
    )

    current_sample = sample_dipoles

    for pi, plow, phigh in zip(pair_inputs, pair_phase_lows, pair_phase_highs):
        if len(current_sample) < 1:
            break

        ###
        # This should be abstracted out, but we're going to dump it here for time pressure's sake
        ###
        # vals = pdme.util.fast_nonlocal_spectrum.signarg(
        #     pdme.util.fast_nonlocal_spectrum.fast_s_nonlocal_dipoleses(
        #         numpy.array([pi]), current_sample
        #     )
        # )
        #
        vals = pdme.util.fast_nonlocal_spectrum.signarg(
            pdme.util.fast_nonlocal_spectrum.fast_s_spin_qubit_tarucha_nonlocal_dipoleses(
                numpy.array([pi]), current_sample
            )
        )
        current_sample = current_sample[
            numpy.all(
                ((vals > plow) & (vals < phigh)) | ((vals < plow) & (vals > phigh)),
                axis=1,
            )
        ]
    return len(current_sample)


def get_a_result_fast_filter(input) -> int:
    model, dot_inputs, lows, highs, monte_carlo_count, seed = input

@ -299,6 +343,7 @@ class RealSpectrumRun:
            seeds = seed_sequence.spawn(self.monte_carlo_cycles)

            if self.use_pair_measurements:
                _logger.debug("using pair measurements")
                current_success = sum(
                    pool.imap_unordered(
                        get_a_result_fast_filter_pairs,
@ -320,9 +365,11 @@ class RealSpectrumRun:
                    )
                )
            elif self.use_pair_phase_measurements:
                _logger.debug("using pair phase measurements")
                _logger.debug("specifically using tarucha")
                current_success = sum(
                    pool.imap_unordered(
                        get_a_result_fast_filter_pairs,
                        get_a_result_fast_filter_tarucha_spin_qubit_pair_phase_only,
                        [
                            (
                                model,
166 deepdog/results/__init__.py Normal file
@ -0,0 +1,166 @@
import dataclasses
import re
import typing
import logging
import deepdog.indexify
import pathlib
import csv
from deepdog.results.read_csv import (
    parse_bayesrun_row,
    BayesrunModelResult,
    parse_general_row,
    GeneralModelResult,
)
from deepdog.results.filename import parse_file_slug

_logger = logging.getLogger(__name__)

FILENAME_REGEX = re.compile(
    r"(?P<timestamp>\d{8}-\d{6})-(?P<filename_slug>.*)\.realdata\.fast_filter\.bayesrun\.csv"
)

# probably a better way but who cares
NO_TIMESTAMP_FILENAME_REGEX = re.compile(
    r"(?P<filename_slug>.*)\.realdata\.fast_filter\.bayesrun\.csv"
)


SUBSET_SIM_FILENAME_REGEX = re.compile(
    r"(?P<filename_slug>.*)-(?:no_adaptive_steps_)?(?P<num_ss_runs>\d+)-nc_(?P<n_c>\d+)-ns_(?P<n_s>\d+)-mmax_(?P<mmax>\d+)\.multi\.subsetsim\.csv"
)


@dataclasses.dataclass
class BayesrunOutputFilename:
    timestamp: typing.Optional[str]
    filename_slug: str
    path: pathlib.Path


@dataclasses.dataclass
class BayesrunOutput:
    filename: BayesrunOutputFilename
    data: typing.Dict["str", typing.Any]
    results: typing.Sequence[BayesrunModelResult]


@dataclasses.dataclass
class GeneralOutput:
    filename: BayesrunOutputFilename
    data: typing.Dict["str", typing.Any]
    results: typing.Sequence[GeneralModelResult]


def _parse_string_output_filename(
    filename: str,
) -> typing.Tuple[typing.Optional[str], str]:
    if match := FILENAME_REGEX.match(filename):
        groups = match.groupdict()
        return (groups["timestamp"], groups["filename_slug"])
    elif match := NO_TIMESTAMP_FILENAME_REGEX.match(filename):
        groups = match.groupdict()
        return (None, groups["filename_slug"])
    else:
        raise ValueError(f"Could not parse {filename} as a bayesrun output filename")


def _parse_output_filename(file: pathlib.Path) -> BayesrunOutputFilename:
    filename = file.name
    timestamp, slug = _parse_string_output_filename(filename)
    return BayesrunOutputFilename(timestamp=timestamp, filename_slug=slug, path=file)


def _parse_ss_output_filename(file: pathlib.Path) -> BayesrunOutputFilename:
    filename = file.name
    match = SUBSET_SIM_FILENAME_REGEX.match(filename)
    if not match:
        raise ValueError(f"{filename} was not a valid subset sim output")
    groups = match.groupdict()
    return BayesrunOutputFilename(
        filename_slug=groups["filename_slug"], path=file, timestamp=None
    )


def read_subset_sim_file(
    file: pathlib.Path, indexifier: typing.Optional[deepdog.indexify.Indexifier]
) -> GeneralOutput:

    parsed_filename = tag = _parse_ss_output_filename(file)
    out = GeneralOutput(filename=parsed_filename, data={}, results=[])

    out.data.update(dataclasses.asdict(tag))
    parsed_tag = parse_file_slug(parsed_filename.filename_slug)
    if parsed_tag is None:
        _logger.warning(
            f"Could not parse {tag} against any matching regexes. Going to skip tag parsing"
        )
    else:
        out.data.update(parsed_tag)
        if indexifier is not None:
            try:
                job_index = parsed_tag["job_index"]
                indexified = indexifier.indexify(int(job_index))
                out.data.update(indexified)
            except KeyError:
                # This isn't really that important of an error, apart from the warning
                _logger.warning(
                    f"Parsed tag to {parsed_tag}, and attempted to indexify but no job_index key was found. skipping and moving on"
                )

    with file.open() as input_file:
        reader = csv.DictReader(input_file)
        rows = [r for r in reader]
        if len(rows) == 1:
            row = rows[0]
        else:
            raise ValueError(f"Confused about having multiple rows in {file.name}")
    results = parse_general_row(
        row, ("num_finished_runs", "num_runs", None, "estimated_likelihood")
    )

    out.results = results

    return out


def read_output_file(
    file: pathlib.Path, indexifier: typing.Optional[deepdog.indexify.Indexifier]
) -> BayesrunOutput:

    parsed_filename = tag = _parse_output_filename(file)
    out = BayesrunOutput(filename=parsed_filename, data={}, results=[])

    out.data.update(dataclasses.asdict(tag))
    parsed_tag = parse_file_slug(parsed_filename.filename_slug)
    if parsed_tag is None:
        _logger.warning(
            f"Could not parse {tag} against any matching regexes. Going to skip tag parsing"
        )
    else:
        out.data.update(parsed_tag)
        if indexifier is not None:
            try:
                job_index = parsed_tag["job_index"]
                indexified = indexifier.indexify(int(job_index))
                out.data.update(indexified)
            except KeyError:
                # This isn't really that important of an error, apart from the warning
                _logger.warning(
                    f"Parsed tag to {parsed_tag}, and attempted to indexify but no job_index key was found. skipping and moving on"
                )

    with file.open() as input_file:
        reader = csv.DictReader(input_file)
        rows = [r for r in reader]
        if len(rows) == 1:
            row = rows[0]
        else:
            raise ValueError(f"Confused about having multiple rows in {file.name}")
    results = parse_bayesrun_row(row)

    out.results = results

    return out


__all__ = ["read_output_file", "BayesrunOutput"]
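Illustrative filenames accepted by the three module-level patterns above (slug and parameter values are made up for the example):

FILENAME_REGEX.match("20250227-153000-run_a-3.realdata.fast_filter.bayesrun.csv")
NO_TIMESTAMP_FILENAME_REGEX.match("run_a-3.realdata.fast_filter.bayesrun.csv")
SUBSET_SIM_FILENAME_REGEX.match("run_a-3-100-nc_50-ns_10-mmax_20.multi.subsetsim.csv")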
22 deepdog/results/filename.py Normal file
@ -0,0 +1,22 @@
import re
import typing


FILE_SLUG_REGEXES = [
    re.compile(pattern)
    for pattern in [
        r"(?P<tag>\w+)-(?P<job_index>\d+)",
        r"mock_tarucha-(?P<job_index>\d+)",
        r"(?:(?P<mock>mock)_)?tarucha(?:_(?P<tarucha_run_id>\d+))?-(?P<job_index>\d+)",
        r"(?P<tag>\w+)-(?P<included_dots>[\w,]+)-(?P<target_cost>\d*\.?\d+)-(?P<job_index>\d+)",
    ]
]


def parse_file_slug(slug: str) -> typing.Optional[typing.Dict[str, str]]:
    for pattern in FILE_SLUG_REGEXES:
        match = pattern.match(slug)
        if match:
            return match.groupdict()
    else:
        return None
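Note that the first pattern already claims most plain slugs, so the later patterns only see slugs the earlier ones reject; for example (slug values illustrative):

parse_file_slug("run_a-12")  # {"tag": "run_a", "job_index": "12"} via the first pattern
parse_file_slug("???")  # None: no pattern matches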
141 deepdog/results/read_csv.py Normal file
@ -0,0 +1,141 @@
import typing
import re
import dataclasses

MODEL_REGEXES = [
    re.compile(pattern)
    for pattern in [
        r"geom_(?P<xmin>-?\d+)_(?P<xmax>-?\d+)_(?P<ymin>-?\d+)_(?P<ymax>-?\d+)_(?P<zmin>-?\d+)_(?P<zmax>-?\d+)-orientation_(?P<orientation>free|fixedxy|fixedz)-dipole_count_(?P<avg_filled>\d+)_(?P<field_name>\w*)",
        r"geom_(?P<xmin>-?\d+)_(?P<xmax>-?\d+)_(?P<ymin>-?\d+)_(?P<ymax>-?\d+)_(?P<zmin>-?\d+)_(?P<zmax>-?\d+)-magnitude_(?P<log_magnitude>\d*\.?\d+)-orientation_(?P<orientation>free|fixedxy|fixedz)-dipole_count_(?P<avg_filled>\d+)_(?P<field_name>\w*)",
        r"geom_(?P<xmin>-?\d*\.?\d+)_(?P<xmax>-?\d*\.?\d+)_(?P<ymin>-?\d*\.?\d+)_(?P<ymax>-?\d*\.?\d+)_(?P<zmin>-?\d*\.?\d+)_(?P<zmax>-?\d*\.?\d+)-magnitude_(?P<log_magnitude>\d*\.?\d+)-orientation_(?P<orientation>free|fixedxy|fixedz)-dipole_count_(?P<avg_filled>\d+)_(?P<field_name>\w*)",
        r"geom_(?P<xmin>-?\d+)_(?P<xmax>-?\d+)_(?P<ymin>-?\d+)_(?P<ymax>-?\d+)_(?P<zmin>-?\d+)_(?P<zmax>-?\d+)-magnitude_(?P<log_magnitude>-?\d*\.?\d+)-orientation_(?P<orientation>free|fixedxy|fixedz)-dipole_count_(?P<avg_filled>\d+)_(?P<field_name>\w*)",
        r"geom_(?P<xmin>-?\d*\.?\d+)_(?P<xmax>-?\d*\.?\d+)_(?P<ymin>-?\d*\.?\d+)_(?P<ymax>-?\d*\.?\d+)_(?P<zmin>-?\d*\.?\d+)_(?P<zmax>-?\d*\.?\d+)-magnitude_(?P<log_magnitude>-?\d*\.?\d+)-orientation_(?P<orientation>free|fixedxy|fixedz)-dipole_count_(?P<avg_filled>\d+)_(?P<field_name>\w*)",
    ]
]


@dataclasses.dataclass
class BayesrunModelResult:
    parsed_model_keys: typing.Dict[str, str]
    success: int
    count: int


@dataclasses.dataclass
class GeneralModelResult:
    parsed_model_keys: typing.Dict[str, str]
    result_dict: typing.Dict[str, str]


class BayesrunColumnParsed:
    """
    class for parsing a bayesrun column while pulling certain special fields out
    """

    def __init__(self, groupdict: typing.Dict[str, str]):
        self.column_field = groupdict["field_name"]
        self.model_field_dict = {
            k: v for k, v in groupdict.items() if k != "field_name"
        }
        self._groupdict_str = repr(groupdict)

    def __str__(self):
        return f"BayesrunColumnParsed[{self.column_field}: {self.model_field_dict}]"

    def __repr__(self):
        return f"BayesrunColumnParsed({self._groupdict_str})"

    def __eq__(self, other):
        if isinstance(other, BayesrunColumnParsed):
            return (self.column_field == other.column_field) and (
                self.model_field_dict == other.model_field_dict
            )
        return NotImplemented


def _parse_bayesrun_column(
    column: str,
) -> typing.Optional[BayesrunColumnParsed]:
    """
    Tries one by one all of a predefined list of regexes that I might have used in the past.
    Returns the groupdict for the first match, or None if no match found.
    """
    for pattern in MODEL_REGEXES:
        match = pattern.match(column)
        if match:
            return BayesrunColumnParsed(match.groupdict())
    else:
        return None


def _batch_iterable_into_chunks(iterable, n=1):
    """
    utility for batching bayesrun files where columns appear in threes
    """
    for ndx in range(0, len(iterable), n):
        yield iterable[ndx : min(ndx + n, len(iterable))]


def parse_general_row(
    row: typing.Dict[str, str],
    expected_fields: typing.Sequence[typing.Optional[str]],
) -> typing.Sequence[GeneralModelResult]:
    results = []
    batched_keys = _batch_iterable_into_chunks(list(row.keys()), len(expected_fields))
    for model_keys in batched_keys:
        parsed = [_parse_bayesrun_column(column) for column in model_keys]
        values = [row[column] for column in model_keys]

        result_dict = {}
        parsed_keys = None
        for expected_field, parsed_field, value in zip(expected_fields, parsed, values):
            if expected_field is None:
                continue
            if parsed_field is None:
                raise ValueError(
                    f"No viable row found for {expected_field=} in {model_keys=}"
                )
            if parsed_field.column_field != expected_field:
                raise ValueError(
                    f"The column {parsed_field.column_field} does not match expected {expected_field}"
                )
            result_dict[expected_field] = value
            if parsed_keys is None:
                parsed_keys = parsed_field.model_field_dict

        if parsed_keys is None:
            raise ValueError(f"Somehow parsed keys is none here, for {row=}")
        results.append(
            GeneralModelResult(parsed_model_keys=parsed_keys, result_dict=result_dict)
        )
    return results


def parse_bayesrun_row(
    row: typing.Dict[str, str],
) -> typing.Sequence[BayesrunModelResult]:

    results = []
    batched_keys = _batch_iterable_into_chunks(list(row.keys()), 3)
    for model_keys in batched_keys:
        parsed = [_parse_bayesrun_column(column) for column in model_keys]
        values = [row[column] for column in model_keys]
        if parsed[0] is None:
            raise ValueError(f"no viable success row found for keys {model_keys}")
        if parsed[1] is None:
            raise ValueError(f"no viable count row found for keys {model_keys}")
        if parsed[0].column_field != "success":
            raise ValueError(f"The column {model_keys[0]} is not a success field")
        if parsed[1].column_field != "count":
            raise ValueError(f"The column {model_keys[1]} is not a count field")
        parsed_keys = parsed[0].model_field_dict
        success = int(values[0])
        count = int(values[1])
        results.append(
            BayesrunModelResult(
                parsed_model_keys=parsed_keys,
                success=success,
                count=count,
            )
        )
    return results
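As an illustration of the column naming the regexes expect, the first pattern matches a column like the following (geometry values invented for the example); parse_bayesrun_row then checks that the first two columns of each triple carry the "success" and "count" field names, and the third column of each triple is not inspected here:

col = "geom_-10_10_-10_10_0_5-orientation_free-dipole_count_1_success"
parsed = _parse_bayesrun_column(col)
# parsed.column_field == "success"
# parsed.model_field_dict includes xmin="-10", orientation="free", avg_filled="1"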
@ -1,9 +1,11 @@
|
||||
import logging
|
||||
import multiprocessing
|
||||
import numpy
|
||||
import pdme.measurement
|
||||
import pdme.measurement.input_types
|
||||
import pdme.model
|
||||
import pdme.subspace_simulation
|
||||
from typing import Sequence, Tuple, Optional
|
||||
from typing import Sequence, Tuple, Optional, Callable, Union, List
|
||||
|
||||
from dataclasses import dataclass
|
||||
|
||||
@ -18,47 +20,63 @@ class SubsetSimulationResult:
|
||||
under_target_cost: Optional[float]
|
||||
under_target_likelihood: Optional[float]
|
||||
lowest_likelihood: Optional[float]
|
||||
messages: Sequence[str]
|
||||
|
||||
|
||||
@dataclass
|
||||
class MultiSubsetSimulationResult:
|
||||
child_results: Sequence[SubsetSimulationResult]
|
||||
model_name: str
|
||||
estimated_likelihood: float
|
||||
arithmetic_mean_estimated_likelihood: float
|
||||
num_children: int
|
||||
num_finished_children: int
|
||||
clean_estimate: bool
|
||||
|
||||
|
||||
class SubsetSimulation:
|
||||
def __init__(
|
||||
self,
|
||||
model_name_pair,
|
||||
dot_inputs,
|
||||
actual_measurements: Sequence[pdme.measurement.DotMeasurement],
|
||||
# actual_measurements: Sequence[pdme.measurement.DotMeasurement],
|
||||
cost_function: Callable[[numpy.ndarray], numpy.ndarray],
|
||||
n_c: int,
|
||||
n_s: int,
|
||||
m_max: int,
|
||||
target_cost: Optional[float] = None,
|
||||
level_0_seed: int = 200,
|
||||
mcmc_seed: int = 20,
|
||||
level_0_seed: Union[int, Sequence[int]] = 200,
|
||||
mcmc_seed: Union[int, Sequence[int]] = 20,
|
||||
use_adaptive_steps=True,
|
||||
default_phi_step=0.01,
|
||||
default_theta_step=0.01,
|
||||
default_r_step=0.01,
|
||||
default_w_log_step=0.01,
|
||||
default_upper_w_log_step=4,
|
||||
num_initial_dmc_gens=1,
|
||||
keep_probs_list=True,
|
||||
dump_last_generation_to_file=False,
|
||||
initial_cost_chunk_size=100,
|
||||
initial_cost_multiprocess=True,
|
||||
cap_core_count: int = 0, # 0 means cap at num cores - 1
|
||||
):
|
||||
name, model = model_name_pair
|
||||
self.model_name = name
|
||||
self.model = model
|
||||
_logger.info(f"got model {self.model_name}")
|
||||
|
||||
self.dot_inputs_array = pdme.measurement.input_types.dot_inputs_to_array(
|
||||
dot_inputs
|
||||
)
|
||||
# dot_inputs = [(meas.r, meas.f) for meas in actual_measurements]
|
||||
# self.dot_inputs_array = pdme.measurement.input_types.dot_inputs_to_array(
|
||||
# dot_inputs
|
||||
# )
|
||||
# _logger.debug(f"actual measurements: {actual_measurements}")
|
||||
self.actual_measurement_array = numpy.array([m.v for m in actual_measurements])
|
||||
# self.actual_measurement_array = numpy.array([m.v for m in actual_measurements])
|
||||
|
||||
def cost_function_to_use(dipoles_to_test):
|
||||
return pdme.subspace_simulation.proportional_costs_vs_actual_measurement(
|
||||
self.dot_inputs_array, self.actual_measurement_array, dipoles_to_test
|
||||
)
|
||||
# def cost_function_to_use(dipoles_to_test):
|
||||
# return pdme.subspace_simulation.proportional_costs_vs_actual_measurement(
|
||||
# self.dot_inputs_array, self.actual_measurement_array, dipoles_to_test
|
||||
# )
|
||||
|
||||
self.cost_function_to_use = cost_function_to_use
|
||||
self.cost_function_to_use = cost_function
|
||||
|
||||
self.n_c = n_c
|
||||
self.n_s = n_s
|
||||
@ -68,16 +86,25 @@ class SubsetSimulation:
|
||||
self.mcmc_seed = mcmc_seed
|
||||
|
||||
self.use_adaptive_steps = use_adaptive_steps
|
||||
self.default_phi_step = default_phi_step
|
||||
self.default_phi_step = (
|
||||
default_phi_step * 1.73
|
||||
) # this is a hack to fix a missing sqrt 3 in the proposal function code.
|
||||
self.default_theta_step = default_theta_step
|
||||
self.default_r_step = default_r_step
|
||||
self.default_w_log_step = default_w_log_step
|
||||
self.default_r_step = (
|
||||
default_r_step * 1.73
|
||||
) # this is a hack to fix a missing sqrt 3 in the proposal function code.
|
||||
self.default_w_log_step = (
|
||||
default_w_log_step * 1.73
|
||||
) # this is a hack to fix a missing sqrt 3 in the proposal function code.
|
||||
self.default_upper_w_log_step = default_upper_w_log_step
|
||||
|
||||
_logger.info("using params:")
|
||||
_logger.info(f"\tn_c: {self.n_c}")
|
||||
_logger.info(f"\tn_s: {self.n_s}")
|
||||
_logger.info(f"\tm: {self.m_max}")
|
||||
_logger.info(f"\t{num_initial_dmc_gens=}")
|
||||
_logger.info(f"\t{mcmc_seed=}")
|
||||
_logger.info(f"\t{level_0_seed=}")
|
||||
_logger.info("let's do level 0...")
|
||||
|
||||
self.target_cost = target_cost
|
||||
@ -87,44 +114,91 @@ class SubsetSimulation:
|
||||
self.dump_last_generations = dump_last_generation_to_file
|
||||
|
||||
self.initial_cost_chunk_size = initial_cost_chunk_size
|
||||
self.initial_cost_multiprocess = initial_cost_multiprocess
|
||||
|
||||
self.cap_core_count = cap_core_count
|
||||
|
||||
self.num_dmc_gens = num_initial_dmc_gens
|
||||
|
||||
def _single_chain_gen(self, args: Tuple):
|
||||
threshold_cost, stdevs, rng_seed, (c, s) = args
|
||||
rng = numpy.random.default_rng(rng_seed)
|
||||
return self.model.get_repeat_counting_mcmc_chain(
|
||||
s,
|
||||
self.cost_function_to_use,
|
||||
self.n_s,
|
||||
threshold_cost,
|
||||
stdevs,
|
||||
initial_cost=c,
|
||||
rng_arg=rng,
|
||||
)
|
||||
|
||||
def execute(self) -> SubsetSimulationResult:
|
||||
|
||||
probs_list = []
|
||||
|
||||
output_messages = []
|
||||
|
||||
# If we have n_s = 10 and n_c = 100, then our big N = 1000 and p = 1/10
|
||||
# The DMC stage would normally generate 1000, then pick the best 100 and start counting prob = p/10.
|
||||
# Let's say we want our DMC stage to go down to level 2.
|
||||
# Then we need to filter out p^2, so our initial has to be N_0 = N / p = n_c * n_s^2
|
||||
initial_dmc_n = self.n_c * (self.n_s**self.num_dmc_gens)
		initial_level = (
			self.num_dmc_gens - 1
		)  # This is perfunctory but let's label it here really explicitly
		_logger.info(f"Generating {initial_dmc_n} for DMC stage")
		sample_dipoles = self.model.get_monte_carlo_dipole_inputs(
			self.n_c * self.n_s,
			initial_dmc_n,
			-1,
			rng_to_use=numpy.random.default_rng(self.level_0_seed),
		)
		# _logger.debug(sample_dipoles)
		# _logger.debug(sample_dipoles.shape)

		raw_costs = []
		_logger.debug("Finished dipole generation")
		_logger.debug(
			f"Using iterated cost function thing with chunk size {self.initial_cost_chunk_size}"
			f"Using iterated multiprocessing cost function thing with chunk size {self.initial_cost_chunk_size}"
		)

		for x in range(0, len(sample_dipoles), self.initial_cost_chunk_size):
			_logger.debug(f"doing chunk {x}")
			raw_costs.extend(
				self.cost_function_to_use(
					sample_dipoles[x : x + self.initial_cost_chunk_size]
				)
			)
		costs = numpy.array(raw_costs)
		# core count etc. logic here
		core_count = multiprocessing.cpu_count() - 1 or 1
		if (self.cap_core_count >= 1) and (self.cap_core_count < core_count):
			core_count = self.cap_core_count
		_logger.info(f"Using {core_count} cores")

		_logger.debug(f"costs: {costs}")
		with multiprocessing.Pool(core_count) as pool:

			# Do the initial DMC calculation in a multiprocessing

			chunks = numpy.array_split(
				sample_dipoles,
				range(
					self.initial_cost_chunk_size,
					len(sample_dipoles),
					self.initial_cost_chunk_size,
				),
			)
			if self.initial_cost_multiprocess:
				_logger.debug("Multiprocessing initial costs")
				raw_costs = pool.map(self.cost_function_to_use, chunks)
			else:
				_logger.debug("Single process initial costs")
				raw_costs = []
				for chunk_idx, chunk in enumerate(chunks):
					_logger.debug(f"doing chunk #{chunk_idx}")
					raw_costs.append(self.cost_function_to_use(chunk))
			costs = numpy.concatenate(raw_costs)
			_logger.debug("finished initial dmc cost calculation")
			# _logger.debug(f"costs: {costs}")
			sorted_indexes = costs.argsort()[::-1]

			_logger.debug(costs[sorted_indexes])
			_logger.debug(sample_dipoles[sorted_indexes])
			# _logger.debug(costs[sorted_indexes])
			# _logger.debug(sample_dipoles[sorted_indexes])

			sorted_costs = costs[sorted_indexes]
			sorted_dipoles = sample_dipoles[sorted_indexes]

			threshold_cost = sorted_costs[-self.n_c]

			all_dipoles = numpy.array(
				[
					pdme.subspace_simulation.sort_array_of_dipoles_by_frequency(samp)
@@ -132,10 +206,36 @@ class SubsetSimulation:
				]
			)
			all_chains = list(zip(sorted_costs, all_dipoles))
			for dmc_level in range(initial_level):
				# if initial level is 1, we want to print out what the level 0 threshold would have been?
				_logger.debug(f"Get the pseudo statistics for level {dmc_level}")
				_logger.debug(f"Whole chain has length {len(all_chains)}")
				pseudo_threshold_index = -(
					self.n_c * (self.n_s ** (self.num_dmc_gens - dmc_level - 1))
				)
				_logger.debug(
					f"Have a pseudo_threshold_index of {pseudo_threshold_index}, or {len(all_chains) + pseudo_threshold_index}"
				)
				pseudo_threshold_cost = all_chains[-pseudo_threshold_index][0]
				_logger.info(
					f"Pseudo-level {dmc_level} threshold cost {pseudo_threshold_cost}, at P = (1 / {self.n_s})^{dmc_level + 1}"
				)
				all_chains = all_chains[pseudo_threshold_index:]

			mcmc_rng = numpy.random.default_rng(self.mcmc_seed)
			long_mcmc_rng = numpy.random.default_rng(self.mcmc_seed)
			mcmc_rng_seed_sequence = numpy.random.SeedSequence(self.mcmc_seed)

			for i in range(self.m_max):
				threshold_cost = all_chains[-self.n_c][0]
				_logger.info(
					f"Finishing DMC threshold cost {threshold_cost} at level {initial_level}, at P = (1 / {self.n_s})^{initial_level + 1}"
				)
				_logger.debug(f"Executing the MCMC with chains of length {len(all_chains)}")

			# Now we move on to the MCMC part of the algorithm

			# This is important, we want to allow some extra initial levels so we need to account for that here!
			for i in range(self.num_dmc_gens, self.m_max):
				_logger.info(f"Starting level {i}")
				next_seeds = all_chains[-self.n_c :]

				if self.dump_last_generations:
@@ -158,7 +258,9 @@ class SubsetSimulation:
				):
					# chain = mcmc(s, threshold_cost, n_s, model, dot_inputs_array, actual_measurement_array, mcmc_rng, curr_cost=c, stdevs=stdevs)
					# until new version gotta do
					_logger.debug(f"\t{seed_index}: doing long chain on the next seed")
					_logger.debug(
						f"\t{seed_index}: doing long chain on the next seed"
					)

					long_chain = self.model.get_mcmc_chain(
						s,
@@ -167,7 +269,7 @@ class SubsetSimulation:
						threshold_cost,
						stdevs,
						initial_cost=c,
						rng_arg=mcmc_rng,
						rng_arg=long_mcmc_rng,
					)
					for _, chained in long_chain:
						all_long_chains.append(chained)
@@ -184,7 +286,10 @@ class SubsetSimulation:
				for cost_index, cost_chain in enumerate(all_chains[: -self.n_c]):
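					# bookkeeping: the fraction of this level's n_c * n_s samples ranked
					# at or below this chain, scaled by the level probability (1 / n_s)^i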
					probs_list.append(
						(
							((self.n_c * self.n_s - cost_index) / (self.n_c * self.n_s))
							(
								(self.n_c * self.n_s - cost_index)
								/ (self.n_c * self.n_s)
							)
							/ (self.n_s ** (i)),
							cost_chain[0],
							i + 1,
@@ -194,31 +299,42 @@ class SubsetSimulation:
				next_seeds_as_array = numpy.array([s for _, s in next_seeds])

				stdevs = self.get_stdevs_from_arrays(next_seeds_as_array)
				_logger.info(f"got stdevs: {stdevs.stdevs}")
				_logger.debug(f"got stdevs, begin: {stdevs.stdevs[:10]}")
				_logger.debug("Starting the MCMC")
				all_chains = []
				for seed_index, (c, s) in enumerate(next_seeds):
					# chain = mcmc(s, threshold_cost, n_s, model, dot_inputs_array, actual_measurement_array, mcmc_rng, curr_cost=c, stdevs=stdevs)
					# until new version gotta do
					_logger.debug(
						f"\t{seed_index}: getting another chain from the next seed"
					)
					chain = self.model.get_mcmc_chain(
						s,
						self.cost_function_to_use,
						self.n_s,
						threshold_cost,
						stdevs,
						initial_cost=c,
						rng_arg=mcmc_rng,

				seeds = mcmc_rng_seed_sequence.spawn(len(next_seeds))
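				# SeedSequence.spawn hands each chain an independent, reproducible
				# child seed, so every chain gets its own random stream no matter
				# which pool worker picks it up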
				pool_results = pool.imap_unordered(
					self._single_chain_gen,
					[
						(threshold_cost, stdevs, rng_seed, test_seed)
						for rng_seed, test_seed in zip(seeds, next_seeds)
					],
					chunksize=50,
				)

				# count for ergodicity analysis
				samples_generated = 0
				samples_rejected = 0

				for rejected_count, chain in pool_results:
					for cost, chained in chain:
						try:
							filtered_cost = cost[0]
						except (IndexError, TypeError):
							filtered_cost = cost
						all_chains.append((filtered_cost, chained))

					samples_generated += self.n_s
					samples_rejected += rejected_count

				_logger.debug("finished mcmc")
				_logger.debug(f"{samples_rejected=} out of {samples_generated=}")
				if samples_rejected * 2 > samples_generated:
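					# i.e. the rejection ratio exceeded 0.5, a rough heuristic
					# sign that the chains may not be mixing well at this level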
					reject_ratio = samples_rejected / samples_generated
					rejectionmessage = f"On level {i}, rejected {samples_rejected} out of {samples_generated}, {reject_ratio=} is too high and may indicate ergodicity problems"
					output_messages.append(rejectionmessage)
					_logger.warning(rejectionmessage)
				# _logger.debug(all_chains)

				all_chains.sort(key=lambda c: c[0], reverse=True)
@@ -228,7 +344,9 @@ class SubsetSimulation:
				_logger.info(
					f"current threshold cost: {threshold_cost}, at P = (1 / {self.n_s})^{i + 1}"
				)
				if (self.target_cost is not None) and (threshold_cost < self.target_cost):
				if (self.target_cost is not None) and (
					threshold_cost < self.target_cost
				):
					_logger.info(
						f"got a threshold cost {threshold_cost}, less than {self.target_cost}. will leave early"
					)
@@ -236,6 +354,8 @@ class SubsetSimulation:
					cost_list = [c[0] for c in all_chains]
					over_index = reverse_bisect_right(cost_list, self.target_cost)

					winner = all_chains[over_index][1]
					_logger.info(f"Winner obtained: {winner}")
					shorter_probs_list = []
					for cost_index, cost_chain in enumerate(all_chains):
						if self.keep_probs_list:
@@ -253,7 +373,10 @@ class SubsetSimulation:
						shorter_probs_list.append(
							(
								cost_chain[0],
								((self.n_c * self.n_s - cost_index) / (self.n_c * self.n_s))
								(
									(self.n_c * self.n_s - cost_index)
									/ (self.n_c * self.n_s)
								)
								/ (self.n_s ** (i)),
							)
						)
@@ -265,6 +388,7 @@ class SubsetSimulation:
						under_target_cost=shorter_probs_list[over_index][0],
						under_target_likelihood=shorter_probs_list[over_index][1],
						lowest_likelihood=shorter_probs_list[-1][1],
						messages=output_messages,
					)
					return result

@@ -285,8 +409,8 @@ class SubsetSimulation:
		_logger.info(
			f"final threshold cost: {threshold_cost}, at P = (1 / {self.n_s})^{self.m_max + 1}"
		)
		for a in all_chains[-10:]:
			_logger.info(a)
		# for a in all_chains[-10:]:
		# 	_logger.info(a)
		# for prob, prob_cost in probs_list:
		# 	_logger.info(f"\t{prob}: {prob_cost}")
		probs_list.sort(key=lambda c: c[0], reverse=True)
@@ -300,6 +424,7 @@ class SubsetSimulation:
			under_target_cost=None,
			under_target_likelihood=None,
			lowest_likelihood=min_likelihood,
			messages=output_messages,
		)
		return result

@@ -358,6 +483,116 @@ class SubsetSimulation:
		return stdevs


class MultiSubsetSimulations:
	def __init__(
		self,
		model_name_pairs: Sequence[Tuple[str, pdme.model.DipoleModel]],
		# actual_measurements: Sequence[pdme.measurement.DotMeasurement],
		cost_function: Callable[[numpy.ndarray], numpy.ndarray],
		num_runs: int,
		n_c: int,
		n_s: int,
		m_max: int,
		target_cost: float,
		num_initial_dmc_gens: int = 1,
		level_0_seed_seed: int = 200,
		mcmc_seed_seed: int = 20,
		use_adaptive_steps=True,
		default_phi_step=0.01,
		default_theta_step=0.01,
		default_r_step=0.01,
		default_w_log_step=0.01,
		default_upper_w_log_step=4,
		initial_cost_chunk_size=100,
		cap_core_count: int = 0,  # 0 means cap at num cores - 1
	):
		self.model_name_pairs = model_name_pairs
		self.cost_function = cost_function
		self.num_runs = num_runs
		self.n_c = n_c
		self.n_s = n_s
		self.m_max = m_max
		self.target_cost = target_cost  # This is not optional here!

		self.num_dmc_gens = num_initial_dmc_gens

		self.level_0_seed_seed = level_0_seed_seed
		self.mcmc_seed_seed = mcmc_seed_seed

		self.use_adaptive_steps = use_adaptive_steps
		self.default_phi_step = default_phi_step
		self.default_theta_step = default_theta_step
		self.default_r_step = default_r_step
		self.default_w_log_step = default_w_log_step
		self.default_upper_w_log_step = default_upper_w_log_step
		self.initial_cost_chunk_size = initial_cost_chunk_size
		self.cap_core_count = cap_core_count

	def execute(self) -> Sequence[MultiSubsetSimulationResult]:
		output: List[MultiSubsetSimulationResult] = []
		for model_index, model_name_pair in enumerate(self.model_name_pairs):
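			# seeding children with [model_index, run_index, seed_seed] gives each
			# (model, run) pair its own reproducible stream of random numbers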
			ss_results = [
				SubsetSimulation(
					model_name_pair,
					self.cost_function,
					self.n_c,
					self.n_s,
					self.m_max,
					self.target_cost,
					num_initial_dmc_gens=self.num_dmc_gens,
					level_0_seed=[model_index, run_index, self.level_0_seed_seed],
					mcmc_seed=[model_index, run_index, self.mcmc_seed_seed],
					use_adaptive_steps=self.use_adaptive_steps,
					default_phi_step=self.default_phi_step,
					default_theta_step=self.default_theta_step,
					default_r_step=self.default_r_step,
					default_w_log_step=self.default_w_log_step,
					default_upper_w_log_step=self.default_upper_w_log_step,
					keep_probs_list=False,
					dump_last_generation_to_file=False,
					initial_cost_chunk_size=self.initial_cost_chunk_size,
					cap_core_count=self.cap_core_count,
				).execute()
				for run_index in range(self.num_runs)
			]
			output.append(coalesce_ss_results(model_name_pair[0], ss_results))
		return output


def coalesce_ss_results(
	model_name: str, results: Sequence[SubsetSimulationResult]
) -> MultiSubsetSimulationResult:

	num_finished = sum(1 for res in results if res.under_target_likelihood is not None)

	estimated_likelihoods = numpy.array(
		[
			res.under_target_likelihood
			if res.under_target_likelihood is not None
			else res.lowest_likelihood
			for res in results
		]
	)

	_logger.info(estimated_likelihoods)
	geometric_mean_estimated_likelihoods = numpy.exp(
		numpy.log(estimated_likelihoods).mean()
	)
	_logger.info(geometric_mean_estimated_likelihoods)
	arithmetic_mean_estimated_likelihoods = estimated_likelihoods.mean()

	result = MultiSubsetSimulationResult(
		child_results=results,
		model_name=model_name,
		estimated_likelihood=geometric_mean_estimated_likelihoods,
		arithmetic_mean_estimated_likelihood=arithmetic_mean_estimated_likelihoods,
		num_children=len(results),
		num_finished_children=num_finished,
		clean_estimate=num_finished == len(results),
	)
	return result
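For intuition, a minimal standalone sketch of the coalescing arithmetic above, using made-up per-run likelihoods (the same values exercised by the coalescing snapshot tests later in this diff):

	import numpy

	likelihoods = numpy.array([0.1, 0.001])  # hypothetical per-run estimates
	geometric = numpy.exp(numpy.log(likelihoods).mean())  # 0.01: robust to one run's scale
	arithmetic = likelihoods.mean()  # 0.0505: dominated by the larger estimate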


def reverse_bisect_right(a, x, lo=0, hi=None):
	"""Return the index where to insert item x in list a, assuming a is sorted in descending order.
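The rest of reverse_bisect_right is cut off in this view. As a sketch only (an assumption mirroring bisect.bisect_right on a descending ordering, not necessarily the committed body), the documented behaviour looks like:

	def reverse_bisect_right_sketch(a, x, lo=0, hi=None):
		# first index i in the descending list a with a[i] < x
		if hi is None:
			hi = len(a)
		while lo < hi:
			mid = (lo + hi) // 2
			if a[mid] >= x:
				lo = mid + 1
			else:
				hi = mid
		return lo

This matches its use in execute() above, where over_index points at the first chain whose cost drops below target_cost.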
38	do.sh
@@ -1,38 +0,0 @@
#!/usr/bin/env bash
# Do - The Simplest Build Tool on Earth.
# Documentation and examples see https://github.com/8gears/do

set -Eeuo pipefail  # -e "Automatic exit from bash shell script on error" -u "Treat unset variables and parameters as errors"

build() {
	echo "I am ${FUNCNAME[0]}ing"
	poetry build
}

test() {
	echo "I am ${FUNCNAME[0]}ing"
	poetry run flake8 deepdog tests
	poetry run mypy deepdog
	poetry run pytest
}

fmt() {
	poetry run black .
	find . -not \( -path "./.*" -type d -prune \) -type f -name "*.py" -exec sed -i -e 's/    /\t/g' {} \;
}

release() {
	./scripts/release.sh
}

htmlcov() {
	poetry run pytest --cov-report=html
}

all() {
	build && test
}

"$@" # <- execute the task

[ "$#" -gt 0 ] || printf "Usage:\n\t./do.sh %s\n" "($(compgen -A function | grep '^[^_]' | paste -sd '|' -))"
145	flake.lock	generated
@@ -1,28 +1,33 @@
{
  "nodes": {
    "flake-utils": {
      "inputs": {
        "systems": "systems"
      },
      "locked": {
        "lastModified": 1648297722,
        "narHash": "sha256-W+qlPsiZd8F3XkzXOzAoR+mpFqzm3ekQkJNa+PIh1BQ=",
        "lastModified": 1710146030,
        "narHash": "sha256-SZ5L6eA7HJ/nmkzGG7/ISclqe6oZdOZTNoesiInkXPQ=",
        "owner": "numtide",
        "repo": "flake-utils",
        "rev": "0f8662f1319ad6abf89b3380dd2722369fc51ade",
        "rev": "b1d9ab70662946ef0850d488da1c9019f3a9752a",
        "type": "github"
      },
      "original": {
        "owner": "numtide",
        "repo": "flake-utils",
        "rev": "0f8662f1319ad6abf89b3380dd2722369fc51ade",
        "type": "github"
      }
    },
    "flake-utils_2": {
      "inputs": {
        "systems": "systems_2"
      },
      "locked": {
        "lastModified": 1653893745,
        "narHash": "sha256-0jntwV3Z8//YwuOjzhV2sgJJPt+HY6KhU7VZUL0fKZQ=",
        "lastModified": 1705309234,
        "narHash": "sha256-uNRRNRKmJyCRC/8y1RqBkqWBLM034y4qN7EprSdmgyA=",
        "owner": "numtide",
        "repo": "flake-utils",
        "rev": "1ed9fb1935d260de5fe1c2f7ee0ebaae17ed2fa1",
        "rev": "1ef2e671c3b0c19053962c07dbda38332dcebf26",
        "type": "github"
      },
      "original": {
@@ -31,29 +36,34 @@
        "type": "github"
      }
    },
    "nix-github-actions": {
      "inputs": {
        "nixpkgs": [
          "poetry2nixSrc",
          "nixpkgs"
        ]
      },
      "locked": {
        "lastModified": 1703863825,
        "narHash": "sha256-rXwqjtwiGKJheXB43ybM8NwWB8rO2dSRrEqes0S7F5Y=",
        "owner": "nix-community",
        "repo": "nix-github-actions",
        "rev": "5163432afc817cf8bd1f031418d1869e4c9d5547",
        "type": "github"
      },
      "original": {
        "owner": "nix-community",
        "repo": "nix-github-actions",
        "type": "github"
      }
    },
    "nixpkgs": {
      "locked": {
        "lastModified": 1655087213,
        "narHash": "sha256-4R5oQ+OwGAAcXWYrxC4gFMTUSstGxaN8kN7e8hkum/8=",
        "lastModified": 1710703777,
        "narHash": "sha256-M4CNAgjrtvrxIWIAc98RTYcVFoAgwUhrYekeiMScj18=",
        "owner": "NixOS",
        "repo": "nixpkgs",
        "rev": "37b6b161e536fddca54424cf80662bce735bdd1e",
        "type": "github"
      },
      "original": {
        "owner": "NixOS",
        "repo": "nixpkgs",
        "rev": "37b6b161e536fddca54424cf80662bce735bdd1e",
        "type": "github"
      }
    },
    "nixpkgs_2": {
      "locked": {
        "lastModified": 1655046959,
        "narHash": "sha256-gxqHZKq1ReLDe6ZMJSbmSZlLY95DsVq5o6jQihhzvmw=",
        "owner": "NixOS",
        "repo": "nixpkgs",
        "rev": "07bf3d25ce1da3bee6703657e6a787a4c6cdcea9",
        "rev": "fc7885fbcea4b782142e06ce2d4d08cf92862004",
        "type": "github"
      },
      "original": {
@@ -62,23 +72,27 @@
        "type": "github"
      }
    },
    "poetry2nix": {
    "poetry2nixSrc": {
      "inputs": {
        "flake-utils": "flake-utils_2",
        "nixpkgs": "nixpkgs_2"
        "nix-github-actions": "nix-github-actions",
        "nixpkgs": [
          "nixpkgs"
        ],
        "systems": "systems_3",
        "treefmt-nix": "treefmt-nix"
      },
      "locked": {
        "lastModified": 1654921554,
        "narHash": "sha256-hkfMdQAHSwLWlg0sBVvgrQdIiBP45U1/ktmFpY4g2Mo=",
        "lastModified": 1708589824,
        "narHash": "sha256-2GOiFTkvs5MtVF65sC78KNVxQSmsxtk0WmV1wJ9V2ck=",
        "owner": "nix-community",
        "repo": "poetry2nix",
        "rev": "7b71679fa7df00e1678fc3f1d1d4f5f372341b63",
        "rev": "3c92540611f42d3fb2d0d084a6c694cd6544b609",
        "type": "github"
      },
      "original": {
        "owner": "nix-community",
        "repo": "poetry2nix",
        "rev": "7b71679fa7df00e1678fc3f1d1d4f5f372341b63",
        "type": "github"
      }
    },
@@ -86,7 +100,72 @@
      "inputs": {
        "flake-utils": "flake-utils",
        "nixpkgs": "nixpkgs",
        "poetry2nix": "poetry2nix"
        "poetry2nixSrc": "poetry2nixSrc"
      }
    },
    "systems": {
      "locked": {
        "lastModified": 1681028828,
        "narHash": "sha256-Vy1rq5AaRuLzOxct8nz4T6wlgyUR7zLU309k9mBC768=",
        "owner": "nix-systems",
        "repo": "default",
        "rev": "da67096a3b9bf56a91d16901293e51ba5b49a27e",
        "type": "github"
      },
      "original": {
        "owner": "nix-systems",
        "repo": "default",
        "type": "github"
      }
    },
    "systems_2": {
      "locked": {
        "lastModified": 1681028828,
        "narHash": "sha256-Vy1rq5AaRuLzOxct8nz4T6wlgyUR7zLU309k9mBC768=",
        "owner": "nix-systems",
        "repo": "default",
        "rev": "da67096a3b9bf56a91d16901293e51ba5b49a27e",
        "type": "github"
      },
      "original": {
        "owner": "nix-systems",
        "repo": "default",
        "type": "github"
      }
    },
    "systems_3": {
      "locked": {
        "lastModified": 1681028828,
        "narHash": "sha256-Vy1rq5AaRuLzOxct8nz4T6wlgyUR7zLU309k9mBC768=",
        "owner": "nix-systems",
        "repo": "default",
        "rev": "da67096a3b9bf56a91d16901293e51ba5b49a27e",
        "type": "github"
      },
      "original": {
        "id": "systems",
        "type": "indirect"
      }
    },
    "treefmt-nix": {
      "inputs": {
        "nixpkgs": [
          "poetry2nixSrc",
          "nixpkgs"
        ]
      },
      "locked": {
        "lastModified": 1708335038,
        "narHash": "sha256-ETLZNFBVCabo7lJrpjD6cAbnE11eDOjaQnznmg/6hAE=",
        "owner": "numtide",
        "repo": "treefmt-nix",
        "rev": "e504621290a1fd896631ddbc5e9c16f4366c9f65",
        "type": "github"
      },
      "original": {
        "owner": "numtide",
        "repo": "treefmt-nix",
        "type": "github"
      }
    }
  },
80	flake.nix
@@ -1,63 +1,47 @@
{
  description = "Application packaged using poetry2nix";

  inputs.flake-utils.url = "github:numtide/flake-utils?rev=0f8662f1319ad6abf89b3380dd2722369fc51ade";
  inputs.nixpkgs.url = "github:NixOS/nixpkgs?rev=37b6b161e536fddca54424cf80662bce735bdd1e";
  inputs.poetry2nix.url = "github:nix-community/poetry2nix?rev=7b71679fa7df00e1678fc3f1d1d4f5f372341b63";
  inputs.flake-utils.url = "github:numtide/flake-utils";
  inputs.nixpkgs.url = "github:NixOS/nixpkgs";
  inputs.poetry2nixSrc = {
    url = "github:nix-community/poetry2nix";
    inputs.nixpkgs.follows = "nixpkgs";
  };

  outputs = { self, nixpkgs, flake-utils, poetry2nix }:
    {
      # Nixpkgs overlay providing the application
      overlay = nixpkgs.lib.composeManyExtensions [
        poetry2nix.overlay
        (final: prev: {
          # The application
          deepdog = prev.poetry2nix.mkPoetryApplication {
            overrides = final.poetry2nix.overrides.withDefaults (self: super: {
              # …
              # workaround https://github.com/nix-community/poetry2nix/issues/568
              pdme = super.pdme.overridePythonAttrs (old: {
                buildInputs = old.buildInputs or [ ] ++ [ final.python39.pkgs.poetry-core ];
              });
            });
            projectDir = ./.;
          };
          deepdogEnv = prev.poetry2nix.mkPoetryEnv {
            overrides = final.poetry2nix.overrides.withDefaults (self: super: {
              # …
              # workaround https://github.com/nix-community/poetry2nix/issues/568
              pdme = super.pdme.overridePythonAttrs (old: {
                buildInputs = old.buildInputs or [ ] ++ [ final.python39.pkgs.poetry-core ];
              });
            });
            projectDir = ./.;
          };
        })
      ];
    } // (flake-utils.lib.eachDefaultSystem (system:
  outputs = { self, nixpkgs, flake-utils, poetry2nixSrc }:
    flake-utils.lib.eachDefaultSystem (system:
      let
        pkgs = import nixpkgs {
          inherit system;
          overlays = [ self.overlay ];
        pkgs = nixpkgs.legacyPackages.${system};
        poetry2nix = poetry2nixSrc.lib.mkPoetry2Nix { inherit pkgs; };
      in {
        packages = {
          deepdogApp = poetry2nix.mkPoetryApplication {
            projectDir = self;
            python = pkgs.python39;
            preferWheels = true;
          };
      in
      {
        apps = {
          deepdog = pkgs.deepdog;
          deepdogEnv = poetry2nix.mkPoetryEnv {
            projectDir = self;
            python = pkgs.python39;
            preferWheels = true;
            overrides = poetry2nix.overrides.withDefaults (self: super: {
            });
          };

        defaultApp = pkgs.deepdog;
        devShell = pkgs.mkShell {
          default = self.packages.${system}.deepdogEnv;
        };
        devShells.default = pkgs.mkShell {
          inputsFrom = [ self.packages.${system}.deepdogEnv ];
          buildInputs = [
            pkgs.poetry
            pkgs.deepdogEnv
            pkgs.deepdog
            self.packages.${system}.deepdogEnv
            self.packages.${system}.deepdogApp
            pkgs.just
            pkgs.nodejs
          ];
          shellHook = ''
            export DO_NIX_CUSTOM=1
          '';
          packages = [ pkgs.nodejs-16_x ];
        };

      }));
}
    );
}
60	justfile	Normal file
@@ -0,0 +1,60 @@

# execute default build
default: build

# builds the python module using poetry
build:
	echo "building..."
	poetry build

# print a message displaying whether nix is being used
checknix:
	#!/usr/bin/env bash
	set -euxo pipefail
	if [[ "${DO_NIX_CUSTOM:=0}" -eq 1 ]]; then
		echo "In an interactive nix env."
	else
		echo "Using poetry as runner, no nix detected."
	fi

# run all tests
test: fmt
	#!/usr/bin/env bash
	set -euxo pipefail

	if [[ "${DO_NIX_CUSTOM:=0}" -eq 1 ]]; then
		echo "testing, using nix..."
		flake8 deepdog tests
		mypy deepdog
		pytest
	else
		echo "testing..."
		poetry run flake8 deepdog tests
		poetry run mypy deepdog
		poetry run pytest
	fi

# format code
fmt:
	#!/usr/bin/env bash
	set -euxo pipefail
	if [[ "${DO_NIX_CUSTOM:=0}" -eq 1 ]]; then
		black .
	else
		poetry run black .
	fi
	find deepdog -type f -name "*.py" -exec sed -i -e 's/    /\t/g' {} \;
	find tests -type f -name "*.py" -exec sed -i -e 's/    /\t/g' {} \;

# release the app, checking that our working tree is clean and ready for release, optionally takes target version
release version="":
	#!/usr/bin/env bash
	set -euxo pipefail
	if [[ -n "{{version}}" ]]; then
		./scripts/release.sh {{version}}
	else
		./scripts/release.sh
	fi

htmlcov:
	poetry run pytest --cov-report=html
844	poetry.lock	generated
File diff suppressed because it is too large

pyproject.toml
@@ -1,14 +1,15 @@
[tool.poetry]
name = "deepdog"
version = "0.7.8"
version = "1.7.0"
description = ""
authors = ["Deepak Mallubhotla <dmallubhotla+github@gmail.com>"]

[tool.poetry.dependencies]
python = ">=3.8.1,<3.10"
pdme = "^0.9.3"
pdme = "^1.5.0"
numpy = "1.22.3"
scipy = "1.10"
tqdm = "^4.66.2"

[tool.poetry.dev-dependencies]
pytest = ">=6"
@@ -19,6 +20,10 @@ python-semantic-release = "^7.24.0"
black = "^22.3.0"
syrupy = "^4.0.8"

[tool.poetry.scripts]
probs = "deepdog.cli.probs:wrapped_main"
subset_sim_probs = "deepdog.cli.subset_sim_probs:wrapped_main"

[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"
@@ -38,6 +43,13 @@ module = [
]
ignore_missing_imports = true

[[tool.mypy.overrides]]
module = [
	"tqdm",
	"tqdm.*"
]
ignore_missing_imports = true

[tool.semantic_release]
version_toml = "pyproject.toml:tool.poetry.version"
tag_format = "{version}"
scripts/release.sh
@@ -25,15 +25,22 @@ if [ -z "$(git status --porcelain)" ]; then
	exit 0
fi

std_version_args=()
if [[ -n "${1:-}" ]]; then
	std_version_args+=( "--release-as" "$1" )
	echo "Parameter $1 was supplied, so we should use release-as"
else
	echo "No release-as parameter specifed."
fi
# Working directory clean
echo "Doing a dry run..."
npx standard-version --dry-run
npx standard-version --dry-run "${std_version_args[@]}"
read -p "Does that look good? [y/N] " -n 1 -r
echo # (optional) move to a new line
if [[ $REPLY =~ ^[Yy]$ ]]
then
	# do dangerous stuff
	npx standard-version
	npx standard-version "${std_version_args[@]}"
	git push --follow-tags origin master
else
	echo "okay, never mind then..."
@@ -1,4 +1,4 @@
const pattern = /(\[tool\.poetry\]\nname = "deepdog"\nversion = ")(?<vers>\d+\.\d+\.\d)(")/mg;
const pattern = /(\[tool\.poetry\]\nname = "deepdog"\nversion = ")(?<vers>\d+\.\d+\.\d+)(")/mg;

module.exports.readVersion = function (contents) {
	const result = pattern.exec(contents);
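This standard-version updater change (the file's name isn't shown in this view) widens the patch segment from \d to \d+. The fix matters as soon as the patch number grows past one digit; a quick illustrative check, written in Python against a simplified fragment of the pattern with a hypothetical version string:

	import re

	old = re.compile(r'version = "(?P<vers>\d+\.\d+\.\d)"')
	new = re.compile(r'version = "(?P<vers>\d+\.\d+\.\d+)"')
	contents = 'version = "1.7.10"'
	assert old.search(contents) is None  # the single-digit pattern cannot match at all
	assert new.search(contents)["vers"] == "1.7.10"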
0	tests/direct_monte_carlo/__init__.py	Normal file
26	tests/direct_monte_carlo/test_config_filename.py	Normal file
@@ -0,0 +1,26 @@
import re
import deepdog.direct_monte_carlo


def test_config_check_self():
	config = deepdog.direct_monte_carlo.DirectMonteCarloConfig(
		tag="test_tag",
		bayesrun_file_timestamp=False,
	)
	expected_filename = "test_tag.realdata.fast_filter.bayesrun.csv"
	actual_filename = config.get_filename()
	assert actual_filename == expected_filename
	regex = config.get_filename_regex()
	assert re.match(regex, actual_filename) is not None


def test_config_check_self_with_timestamp():
	config = deepdog.direct_monte_carlo.DirectMonteCarloConfig(
		tag="test_tag",
		bayesrun_file_timestamp=True,
	)
	expected_filename_ending = "test_tag.realdata.fast_filter.bayesrun.csv"
	actual_filename = config.get_filename()
	assert actual_filename.endswith(expected_filename_ending)
	regex = config.get_filename_regex()
	assert re.match(regex, actual_filename) is not None

42	tests/direct_monte_carlo/test_cost_function_filter.py	Normal file
@@ -0,0 +1,42 @@
import deepdog.direct_monte_carlo.cost_function_filter
import numpy


def test_px_cost_function_filter_example():

	dipoles_1 = [
		[1, 2, 3, 4, 5, 6, 7],
		[2, 3, 2, 5, 4, 7, 6],
	]

	dipoles_2 = [
		[15, 9, 8, 7, 6, 5, 3],
		[30, 4, 4, 7, 3, 1, 4],
	]

	dipoleses = numpy.array([dipoles_1, dipoles_2])

	def cost_function(dipoleses: numpy.ndarray) -> numpy.ndarray:
		return dipoleses[:, :, 0].max(axis=-1)

	expected_costs = numpy.array([2, 30])

	numpy.testing.assert_array_equal(cost_function(dipoleses), expected_costs)

	filter = deepdog.direct_monte_carlo.cost_function_filter.CostFunctionTargetFilter(
		cost_function, 5
	)

	actual_filtered = filter.filter_samples(dipoleses)
	expected_filtered = numpy.array([dipoles_1])
	assert actual_filtered.size != 0
	numpy.testing.assert_array_equal(actual_filtered, expected_filtered)

	filter_stricter = (
		deepdog.direct_monte_carlo.cost_function_filter.CostFunctionTargetFilter(
			cost_function, 0.5
		)
	)

	actual_filtered_stricter = filter_stricter.filter_samples(dipoleses)
	assert actual_filtered_stricter.size == 0
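Reading the assertions above: the filter apparently keeps whole configurations whose scalar cost falls below the target (cost 2 < 5 passes, cost 30 does not, and the stricter target 0.5 rejects both). A shape sketch, illustrative only:

	# dipoleses has shape (2, 2, 7): (configurations, dipoles per configuration, 7 params)
	# cost_function reduces over dipoles: dipoleses[:, :, 0].max(axis=-1) -> shape (2,)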
137	tests/direct_monte_carlo/test_eletric_field_x_dmc_filter.py	Normal file
@@ -0,0 +1,137 @@
import pdme.measurement
import pdme.measurement.input_types
from pdme.model import (
	LogSpacedRandomCountMultipleDipoleFixedMagnitudeModel,
	LogSpacedRandomCountMultipleDipoleFixedMagnitudeXYModel,
	LogSpacedRandomCountMultipleDipoleFixedMagnitudeFixedOrientationModel,
)
import deepdog.direct_monte_carlo.dmc_filters
import numpy.random
import numpy.testing
import logging

_logger = logging.getLogger(__name__)


def fixed_z_model_func(
	xmin,
	xmax,
	ymin,
	ymax,
	zmin,
	zmax,
	wexp_min,
	wexp_max,
	pfixed,
	n_max,
	prob_occupancy,
):
	return LogSpacedRandomCountMultipleDipoleFixedMagnitudeFixedOrientationModel(
		xmin,
		xmax,
		ymin,
		ymax,
		zmin,
		zmax,
		wexp_min,
		wexp_max,
		pfixed,
		0,
		0,
		n_max,
		prob_occupancy,
	)


def get_model(orientation):
	model_funcs = {
		"fixedz": fixed_z_model_func,
		"free": LogSpacedRandomCountMultipleDipoleFixedMagnitudeModel,
		"fixedxy": LogSpacedRandomCountMultipleDipoleFixedMagnitudeXYModel,
	}
	model = model_funcs[orientation](
		-10,
		10,
		-17.5,
		17.5,
		5,
		7.5,
		-5,
		6.5,
		10**3,
		2,
		0.99999999,
	)
	model.n = 2
	model.rng = numpy.random.default_rng(1234)

	return (
		f"connors_geom-5height-orientation_{orientation}-pfixexp_{3}-dipole_count_{2}",
		model,
	)


def test_electric_field_x_dmc_filter():

	dipoles_raw = [
		[(1, 2, 3), (4, 5, 6), 1],
		[(-1, 5, 2), (6, 5, 4), 10],
	]
	dipoles = [
		pdme.measurement.OscillatingDipole(numpy.array(d[0]), numpy.array(d[1]), d[2])
		for d in dipoles_raw
	]

	_logger.debug(f"dipoles: {dipoles}")
	dot_inputs_raw = [
		([-1, -1, 0], 1),
		([-1, -1, 0], 2),
		([-1, -1, 0], 3),
		([-1, -1, 0], 4),
	]
	dot_inputs_array = pdme.measurement.input_types.dot_inputs_to_array(dot_inputs_raw)
	_logger.debug(f"dot_inputs_array: {dot_inputs_array}")

	arrangement = pdme.measurement.OscillatingDipoleArrangement(dipoles)
	measurements = []
	for input in dot_inputs_raw:
		ex = sum(
			[
				dipole.s_electric_fieldx_at_position(*input)
				for dipole in arrangement.dipoles
			]
		)
		ex_low = ex * 0.5
		ex_high = ex * 1.5
		meas = pdme.measurement.DotRangeMeasurement(ex_low, ex_high, input[0], input[1])
		measurements.append(meas)

	filter = deepdog.direct_monte_carlo.dmc_filters.SingleDotSpinQubitFrequencyFilter(
		measurements
	)

	samples = numpy.array(
		[
			[
				[1, 2, 3, 4, 5, 6, 1],
				[-1, 5, 2, 6, 5, 4, 10],
			],
			[
				[10, 20, 30, 40, 50, 60, 1],
				[-1, 5, 2, 6, 5, 4, 1],
			],
			[
				[1, 1, 1, 1, 1, 1, 1],
				[2, 2, 2, 2, 2, 2, 1],
			],
		]
	)

	expected = samples[
		0:1
	]  # only expect to see the first guy, because that's what generated our thing
	filtered = filter.filter_samples(samples)
	assert len(filtered) != len(samples), "Should have filtered some out!"
	numpy.testing.assert_array_equal(
		filtered, expected, "The filter should have only returned the first one"
	)
0	tests/indexify/__init__.py	Normal file
21	tests/indexify/test_indexify.py	Normal file
@@ -0,0 +1,21 @@
import deepdog.indexify
import logging

_logger = logging.getLogger(__name__)


def test_indexifier():
	weight_dict = {"key_1": [1, 2, 3], "key_2": ["a", "b", "c"]}
	indexifier = deepdog.indexify.Indexifier(weight_dict)
	_logger.debug(f"setting up indexifier {indexifier}")
	assert indexifier.indexify(0) == {"key_1": 1, "key_2": "a"}
	assert indexifier.indexify(5) == {"key_1": 2, "key_2": "c"}
	assert len(indexifier) == 9


def test_indexifier_length_short():
	weight_dict = {"key_1": [1, 2, 3], "key_2": ["b", "c"]}
	indexifier = deepdog.indexify.Indexifier(weight_dict)
	_logger.debug(f"setting up indexifier {indexifier}")

	assert len(indexifier) == 6
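The assertions pin down the (assumed) mixed-radix decoding scheme, most significant key first:

	# index 5 in the 3 x 3 grid above:
	# 5 // 3 == 1  ->  "key_1" picks [1, 2, 3][1] == 2
	# 5 %  3 == 2  ->  "key_2" picks ["a", "b", "c"][2] == "c"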
0	tests/results/__init__.py	Normal file
75	tests/results/test_column_results.py	Normal file
@@ -0,0 +1,75 @@
import deepdog.results.read_csv


def test_parse_groupdict():
	example_column_name = (
		"geom_-20_20_-10_10_0_5-orientation_free-dipole_count_100_success"
	)

	parsed = deepdog.results.read_csv._parse_bayesrun_column(example_column_name)
	assert parsed is not None
	expected = deepdog.results.read_csv.BayesrunColumnParsed(
		{
			"xmin": "-20",
			"xmax": "20",
			"ymin": "-10",
			"ymax": "10",
			"zmin": "0",
			"zmax": "5",
			"orientation": "free",
			"avg_filled": "100",
			"field_name": "success",
		}
	)
	assert parsed == expected


def test_parse_groupdict_with_magnitude():
	example_column_name = (
		"geom_-20_20_-10_10_0_5-magnitude_3.5-orientation_free-dipole_count_100_success"
	)

	parsed = deepdog.results.read_csv._parse_bayesrun_column(example_column_name)
	assert parsed is not None
	expected = deepdog.results.read_csv.BayesrunColumnParsed(
		{
			"xmin": "-20",
			"xmax": "20",
			"ymin": "-10",
			"ymax": "10",
			"zmin": "0",
			"zmax": "5",
			"orientation": "free",
			"avg_filled": "100",
			"log_magnitude": "3.5",
			"field_name": "success",
		}
	)
	assert parsed == expected


def test_parse_groupdict_with_negative_magnitude():
	example_column_name = "geom_-20_20_-10_10_0_5-magnitude_-3.5-orientation_free-dipole_count_100_success"

	parsed = deepdog.results.read_csv._parse_bayesrun_column(example_column_name)
	assert parsed is not None
	expected = deepdog.results.read_csv.BayesrunColumnParsed(
		{
			"xmin": "-20",
			"xmax": "20",
			"ymin": "-10",
			"ymax": "10",
			"zmin": "0",
			"zmax": "5",
			"orientation": "free",
			"avg_filled": "100",
			"log_magnitude": "-3.5",
			"field_name": "success",
		}
	)
	assert parsed == expected


# def test_parse_no_match_column_name():
# 	parsed = deepdog.results.parse_bayesrun_column("There's nothing here")
# 	assert parsed is None

19	tests/results/test_parse_filename.py	Normal file
@@ -0,0 +1,19 @@
import deepdog.results
import pytest


def test_parse_bayesrun_filename():
	valid1 = "20250226-204120-dot1-dot1-2-0.realdata.fast_filter.bayesrun.csv"

	timestamp, slug = deepdog.results._parse_string_output_filename(valid1)
	assert timestamp == "20250226-204120"
	assert slug == "dot1-dot1-2-0"

	valid2 = "dot1-dot1-2-0.realdata.fast_filter.bayesrun.csv"

	timestamp, slug = deepdog.results._parse_string_output_filename(valid2)
	assert timestamp is None
	assert slug == "dot1-dot1-2-0"

	with pytest.raises(ValueError):
		deepdog.results._parse_string_output_filename("not_a_valid_filename")
@@ -0,0 +1,10 @@
# serializer version: 1
# name: test_subset_simulation_multi_result_coalescing_easy_arithmetic
  MultiSubsetSimulationResult(child_results=[SubsetSimulationResult(probs_list=(), over_target_cost=1, over_target_likelihood=1, under_target_cost=0.99, under_target_likelihood=0.8, lowest_likelihood=0.5, messages=[]), SubsetSimulationResult(probs_list=(), over_target_cost=1, over_target_likelihood=1, under_target_cost=0.99, under_target_likelihood=0.6, lowest_likelihood=0.01, messages=[])], model_name='test', estimated_likelihood=0.6928203230275509, arithmetic_mean_estimated_likelihood=0.7, num_children=2, num_finished_children=2, clean_estimate=True)
# ---
# name: test_subset_simulation_multi_result_coalescing_easy_geometric
  MultiSubsetSimulationResult(child_results=[SubsetSimulationResult(probs_list=(), over_target_cost=1, over_target_likelihood=1, under_target_cost=0.99, under_target_likelihood=0.1, lowest_likelihood=0.5, messages=[]), SubsetSimulationResult(probs_list=(), over_target_cost=1, over_target_likelihood=1, under_target_cost=0.99, under_target_likelihood=0.001, lowest_likelihood=0.01, messages=[])], model_name='test', estimated_likelihood=0.010000000000000004, arithmetic_mean_estimated_likelihood=0.0505, num_children=2, num_finished_children=2, clean_estimate=True)
# ---
# name: test_subset_simulation_multi_result_coalescing_include_dirty
  MultiSubsetSimulationResult(child_results=[SubsetSimulationResult(probs_list=(), over_target_cost=1, over_target_likelihood=1, under_target_cost=0.99, under_target_likelihood=0.8, lowest_likelihood=0.5, messages=[]), SubsetSimulationResult(probs_list=(), over_target_cost=1, over_target_likelihood=1, under_target_cost=0.99, under_target_likelihood=0.08, lowest_likelihood=0.01, messages=[]), SubsetSimulationResult(probs_list=(), over_target_cost=None, over_target_likelihood=None, under_target_cost=None, under_target_likelihood=None, lowest_likelihood=0.0001, messages=[])], model_name='test', estimated_likelihood=0.01856635533445112, arithmetic_mean_estimated_likelihood=0.29336666666666666, num_children=3, num_finished_children=2, clean_estimate=False)
# ---
92	tests/subset_simulation/test_subset_simulation_coalescing.py	Normal file
@@ -0,0 +1,92 @@
import deepdog.subset_simulation.subset_simulation_impl as impl
import numpy


def test_subset_simulation_multi_result_coalescing_include_dirty(snapshot):
	res1 = impl.SubsetSimulationResult(
		probs_list=(),
		over_target_cost=1,
		over_target_likelihood=1,
		under_target_cost=0.99,
		under_target_likelihood=0.8,
		lowest_likelihood=0.5,
		messages=[],
	)

	res2 = impl.SubsetSimulationResult(
		probs_list=(),
		over_target_cost=1,
		over_target_likelihood=1,
		under_target_cost=0.99,
		under_target_likelihood=0.08,
		lowest_likelihood=0.01,
		messages=[],
	)

	res3 = impl.SubsetSimulationResult(
		probs_list=(),
		over_target_cost=None,
		over_target_likelihood=None,
		under_target_cost=None,
		under_target_likelihood=None,
		lowest_likelihood=0.0001,
		messages=[],
	)

	combined = impl.coalesce_ss_results("test", [res1, res2, res3])

	assert combined == snapshot


def test_subset_simulation_multi_result_coalescing_easy_arithmetic(snapshot):
	res1 = impl.SubsetSimulationResult(
		probs_list=(),
		over_target_cost=1,
		over_target_likelihood=1,
		under_target_cost=0.99,
		under_target_likelihood=0.8,
		lowest_likelihood=0.5,
		messages=[],
	)

	res2 = impl.SubsetSimulationResult(
		probs_list=(),
		over_target_cost=1,
		over_target_likelihood=1,
		under_target_cost=0.99,
		under_target_likelihood=0.6,
		lowest_likelihood=0.01,
		messages=[],
	)

	combined = impl.coalesce_ss_results("test", [res1, res2])

	assert combined.arithmetic_mean_estimated_likelihood == 0.7
	assert combined == snapshot


def test_subset_simulation_multi_result_coalescing_easy_geometric(snapshot):
	res1 = impl.SubsetSimulationResult(
		probs_list=(),
		over_target_cost=1,
		over_target_likelihood=1,
		under_target_cost=0.99,
		under_target_likelihood=0.1,
		lowest_likelihood=0.5,
		messages=[],
	)

	res2 = impl.SubsetSimulationResult(
		probs_list=(),
		over_target_cost=1,
		over_target_likelihood=1,
		under_target_cost=0.99,
		under_target_likelihood=0.001,
		lowest_likelihood=0.01,
		messages=[],
	)

	combined = impl.coalesce_ss_results("test", [res1, res2])

	numpy.testing.assert_allclose(combined.estimated_likelihood, 0.01)
	assert combined == snapshot
@@ -1,158 +0,0 @@
import deepdog
import logging
import logging.config

import numpy.random

from pdme.model import (
	LogSpacedRandomCountMultipleDipoleFixedMagnitudeModel,
	LogSpacedRandomCountMultipleDipoleFixedMagnitudeXYModel,
	LogSpacedRandomCountMultipleDipoleFixedMagnitudeFixedOrientationModel,
)


_logger = logging.getLogger(__name__)


def fixed_z_model_func(
	xmin,
	xmax,
	ymin,
	ymax,
	zmin,
	zmax,
	wexp_min,
	wexp_max,
	pfixed,
	n_max,
	prob_occupancy,
):
	return LogSpacedRandomCountMultipleDipoleFixedMagnitudeFixedOrientationModel(
		xmin,
		xmax,
		ymin,
		ymax,
		zmin,
		zmax,
		wexp_min,
		wexp_max,
		pfixed,
		0,
		0,
		n_max,
		prob_occupancy,
	)


def get_model(orientation):
	model_funcs = {
		"fixedz": fixed_z_model_func,
		"free": LogSpacedRandomCountMultipleDipoleFixedMagnitudeModel,
		"fixedxy": LogSpacedRandomCountMultipleDipoleFixedMagnitudeXYModel,
	}
	model = model_funcs[orientation](
		-10,
		10,
		-17.5,
		17.5,
		5,
		7.5,
		-5,
		6.5,
		10**3,
		2,
		0.99999999,
	)
	model.n = 2
	model.rng = numpy.random.default_rng(1234)

	return (
		f"connors_geom-5height-orientation_{orientation}-pfixexp_{3}-dipole_count_{2}",
		model,
	)


def test_basic_analysis(snapshot):

	dot_positions = [[0, 0, 0], [0, 1, 0]]

	freqs = [1, 10, 100]
	models = []

	orientations = ["free", "fixedxy", "fixedz"]
	for orientation in orientations:
		models.append(get_model(orientation))

	_logger.info(f"have {len(models)} models to look at")
	if len(models) == 1:
		_logger.info(f"only one model, name: {models[0][0]}")

	square_run = deepdog.BayesRunWithSubspaceSimulation(
		dot_positions,
		freqs,
		models,
		models[0][1],
		filename_slug="test",
		end_threshold=0.9,
		ss_n_c=5,
		ss_n_s=2,
		ss_m_max=10,
		ss_target_cost=150,
		ss_level_0_seed=200,
		ss_mcmc_seed=20,
		ss_use_adaptive_steps=True,
		ss_default_phi_step=0.01,
		ss_default_theta_step=0.01,
		ss_default_r_step=0.01,
		ss_default_w_log_step=0.01,
		ss_default_upper_w_log_step=4,
		ss_dump_last_generation=False,
		write_output_to_bayesruncsv=False,
		ss_initial_costs_chunk_size=1000,
	)
	result = square_run.go()

	assert result == snapshot


def test_bayesss_with_tighter_cost(snapshot):

	dot_positions = [[0, 0, 0], [0, 1, 0]]

	freqs = [1, 10, 100]
	models = []

	orientations = ["free", "fixedxy", "fixedz"]
	for orientation in orientations:
		models.append(get_model(orientation))

	_logger.info(f"have {len(models)} models to look at")
	if len(models) == 1:
		_logger.info(f"only one model, name: {models[0][0]}")

	square_run = deepdog.BayesRunWithSubspaceSimulation(
		dot_positions,
		freqs,
		models,
		models[0][1],
		filename_slug="test",
		end_threshold=0.9,
		ss_n_c=5,
		ss_n_s=2,
		ss_m_max=10,
		ss_target_cost=1.5,
		ss_level_0_seed=200,
		ss_mcmc_seed=20,
		ss_use_adaptive_steps=True,
		ss_default_phi_step=0.01,
		ss_default_theta_step=0.01,
		ss_default_r_step=0.01,
		ss_default_w_log_step=0.01,
		ss_default_upper_w_log_step=4,
		ss_dump_last_generation=False,
		write_output_to_bayesruncsv=False,
		ss_initial_costs_chunk_size=1,
	)
	result = square_run.go()

	assert result == snapshot