chore(release): 1.7.0

feat: adds configurable skip if file exists
chore(release): 1.6.0
2025-02-26 21:57:13 -06:00 · 2025-02-26 21:55:12 -06:00 · 2025-02-26 21:08:00 -06:00 · 2025-02-26 21:01:19 -06:00 · 2025-02-24 08:34:11 -06:00 · 2024-12-29 21:23:30 -06:00
34 changed files with 1603 additions and 1468 deletions
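
Note on the removed legacy runs: the three deleted classes (`BayesRun`, `BayesRunSimulPairs`, `BayesRunWithSubspaceSimulation`) all finish each run with the same normalised Bayes update over the candidate models. The sketch below restates that step in isolation, based on the deleted `go()` methods in this diff. The names `update_probabilities` and `zero_floor` are hypothetical and do not appear in deepdog; the 0.5 floor mirrors the `max(result, 0.5)` used by the removed Monte Carlo code.

```python
from typing import List, Sequence


def update_probabilities(
    successes: Sequence[float],
    counts: Sequence[int],
    priors: Sequence[float],
    zero_floor: float = 0.5,  # hypothetical parameter name; value taken from the removed code
) -> List[float]:
    """Return posterior model probabilities from Monte Carlo hit counts."""
    # Floor the hit count so a model with zero hits keeps a small, non-zero likelihood.
    likelihoods = [max(s, zero_floor) / c for s, c in zip(successes, counts)]
    # Normalising constant: total probability of the observed hits under the priors.
    evidence = sum(lik * p for lik, p in zip(likelihoods, priors))
    return [lik * p / evidence for lik, p in zip(likelihoods, priors)]


# Two models with equal priors; the first matched three times as often as the second.
posterior = update_probabilities([30, 10], [1000, 1000], [0.5, 0.5])
print(posterior)
```

Because the posterior of one run becomes the prior of the next, a single zero likelihood would permanently eliminate a model, which is presumably why the removed code floored the success count.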
--- a/.gitignore
+++ b/.gitignore
@@ -145,3 +145,5 @@ cython_debug/
 *.csv

 local_scripts/
+
+.vscode
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -2,6 +2,76 @@

 All notable changes to this project will be documented in this file. See [standard-version](https://github.com/conventional-changelog/standard-version) for commit guidelines.

+## [1.7.0](https://gitea.deepak.science:2222/physics/deepdog/compare/1.6.0...1.7.0) (2025-02-27)
+
+
+### Features
+
+* adds configurable skip if file exists ([24c6e31](https://gitea.deepak.science:2222/physics/deepdog/commit/24c6e311c1d3067eb98cc60e6ca38d76373bf08e))
+
+## [1.6.0](https://gitea.deepak.science:2222/physics/deepdog/compare/1.5.0...1.6.0) (2025-02-27)
+
+
+### Features
+
+* Adds ability to parse bayesruns without timestamps ([46f6b6c](https://gitea.deepak.science:2222/physics/deepdog/commit/46f6b6cdf15c67aedf0c871d201b8db320bccbdf))
+* allows negative log magnitude strings in models ([c8435b4](https://gitea.deepak.science:2222/physics/deepdog/commit/c8435b4b2a6e4b89030f53b5734eb743e2003fb7))
+
+## [1.5.0](https://gitea.deepak.science:2222/physics/deepdog/compare/1.4.0...1.5.0) (2024-12-30)
+
+
+### Features
+
+* add configurable max number of dipoles to write ([a1b59cd](https://gitea.deepak.science:2222/physics/deepdog/commit/a1b59cd18b30359328a09210d9393f211aab30c2))
+* add configurable max number of dipoles to write ([53f8993](https://gitea.deepak.science:2222/physics/deepdog/commit/53f8993f2b155228fff5cbee84f10c62eb149a1f))
+
+## [1.4.0](https://gitea.deepak.science:2222/physics/deepdog/compare/1.3.0...1.4.0) (2024-09-04)
+
+
+### Features
+
+* add subset sim probs command for bayes for subset simulation results ([c881da2](https://gitea.deepak.science:2222/physics/deepdog/commit/c881da28370a1e51d062e1a7edaa62af6eb98d0a))
+* allows some betetr matching for single_dipole runs ([5425ce1](https://gitea.deepak.science:2222/physics/deepdog/commit/5425ce1362919af4cc4dbd5813df3be8d877b198))
+* indexifier now has len ([d962ecb](https://gitea.deepak.science:2222/physics/deepdog/commit/d962ecb11e929de1d9aa458b5d8e82270eff0039))
+
+
+### Bug Fixes
+
+* update log file arg names in cli scripts ([6a5c593](https://gitea.deepak.science:2222/physics/deepdog/commit/6a5c5931d4fc849d0d6a0f2b971523a0f039d559))
+
+## [1.3.0](https://gitea.deepak.science:2222/physics/deepdog/compare/1.2.1...1.3.0) (2024-05-20)
+
+
+### Features
+
+* add multi run to wrap multi model and repeat runs ([92b49fc](https://gitea.deepak.science:2222/physics/deepdog/commit/92b49fce7c86f14484deb1c4aaaa810a6f69c08a))
+* adds a filter that works with cost functions ([8845b28](https://gitea.deepak.science:2222/physics/deepdog/commit/8845b2875f2c91c91dd3988fabda26400c59b2d7))
+* improve initial cost calculation to allow multiprocessing, adds ability to specify a number of levels to do with direct mc instead of subset simulation ([09fad2e](https://gitea.deepak.science:2222/physics/deepdog/commit/09fad2e1024d9237a6a4f7931f51cb4c84b83bf8))
+
+
+### Bug Fixes
+
+* Adds ugly hack for stdevs for this uniform range to multiply by root3, proper fix would be in pdme ([b1c01b2](https://gitea.deepak.science:2222/physics/deepdog/commit/b1c01b25c8f2c3947be23f5b2c656c37437dab17))
+* fix seeding to avoid recreating seed combinations across multi runs ([24ac65b](https://gitea.deepak.science:2222/physics/deepdog/commit/24ac65bf9c74c454fec826ca9de640fe095f5a17))
+
+### [1.2.1](https://gitea.deepak.science:2222/physics/deepdog/compare/1.2.0...1.2.1) (2024-05-12)
+
+## [1.2.0](https://gitea.deepak.science:2222/physics/deepdog/compare/1.1.0...1.2.0) (2024-05-09)
+
+
+### Features
+
+* adds additional matching regexes ([dc1d2d4](https://gitea.deepak.science:2222/physics/deepdog/commit/dc1d2d45a3e631c5efccce80f8a24fa87c6089e0))
+* adds magnitude enabled parsing option ([f0e2fa3](https://gitea.deepak.science:2222/physics/deepdog/commit/f0e2fa3da9f5a5136908d691137a904fda4e3a9a))
+
+## [1.1.0](https://gitea.deepak.science:2222/physics/deepdog/compare/1.0.1...1.1.0) (2024-05-03)
+
+
+### Features
+
+* allows disabling timestamps in directmc bayesrun files ([fb018ab](https://gitea.deepak.science:2222/physics/deepdog/commit/fb018abeae2adf4438a030140a6c905f11bb6bc1))
+* removes legacy bayes run, technically breaking but just don't use them ([5361dad](https://gitea.deepak.science:2222/physics/deepdog/commit/5361dada8be4950b5157862f6a92254b543889c3))
+
 ### [1.0.1](https://gitea.deepak.science:2222/physics/deepdog/compare/1.0.0...1.0.1) (2024-05-02)


--- a/deepdog/__init__.py
+++ b/deepdog/__init__.py
@@ -1,10 +1,7 @@
 import logging
 from deepdog.meta import __version__
-from deepdog.bayes_run import BayesRun
-from deepdog.bayes_run_simulpairs import BayesRunSimulPairs
 from deepdog.real_spectrum_run import RealSpectrumRun
 from deepdog.temp_aware_real_spectrum_run import TempAwareRealSpectrumRun
-from deepdog.bayes_run_with_ss import BayesRunWithSubspaceSimulation


 def get_version():
@@ -13,11 +10,8 @@ def get_version():

 __all__ = [
 	"get_version",
-	"BayesRun",
-	"BayesRunSimulPairs",
 	"RealSpectrumRun",
 	"TempAwareRealSpectrumRun",
-	"BayesRunWithSubspaceSimulation",
 ]


--- a/deepdog/bayes_run.py
+++ b/deepdog/bayes_run.py
@@ -1,281 +0,0 @@
-import pdme.inputs
-import pdme.model
-import pdme.measurement.input_types
-import pdme.measurement.oscillating_dipole
-import pdme.util.fast_v_calc
-import pdme.util.fast_nonlocal_spectrum
-from typing import Sequence, Tuple, List
-import datetime
-import csv
-import multiprocessing
-import logging
-import numpy
-
-
-# TODO: remove hardcode
-CHUNKSIZE = 50
-
-# TODO: It's garbage to have this here duplicated from pdme.
-DotInput = Tuple[numpy.typing.ArrayLike, float]
-
-
-_logger = logging.getLogger(__name__)
-
-
-def get_a_result(input) -> int:
-	model, dot_inputs, lows, highs, monte_carlo_count, max_frequency, seed = input
-
-	rng = numpy.random.default_rng(seed)
-	sample_dipoles = model.get_monte_carlo_dipole_inputs(
-		monte_carlo_count, max_frequency, rng_to_use=rng
-	)
-	vals = pdme.util.fast_v_calc.fast_vs_for_dipoleses(dot_inputs, sample_dipoles)
-	return numpy.count_nonzero(pdme.util.fast_v_calc.between(vals, lows, highs))
-
-
-def get_a_result_using_pairs(input) -> int:
-	(
-		model,
-		dot_inputs,
-		pair_inputs,
-		local_lows,
-		local_highs,
-		nonlocal_lows,
-		nonlocal_highs,
-		monte_carlo_count,
-		max_frequency,
-	) = input
-	sample_dipoles = model.get_n_single_dipoles(monte_carlo_count, max_frequency)
-	local_vals = pdme.util.fast_v_calc.fast_vs_for_dipoles(dot_inputs, sample_dipoles)
-	local_matches = pdme.util.fast_v_calc.between(local_vals, local_lows, local_highs)
-	nonlocal_vals = pdme.util.fast_nonlocal_spectrum.fast_s_nonlocal(
-		pair_inputs, sample_dipoles
-	)
-	nonlocal_matches = pdme.util.fast_v_calc.between(
-		nonlocal_vals, nonlocal_lows, nonlocal_highs
-	)
-	combined_matches = numpy.logical_and(local_matches, nonlocal_matches)
-	return numpy.count_nonzero(combined_matches)
-
-
-class BayesRun:
-	"""
-	A single Bayes run for a given set of dots.
-
-	Parameters
-	----------
-	dot_inputs : Sequence[DotInput]
-	The dot inputs for this bayes run.
-
-	models_with_names : Sequence[Tuple(str, pdme.model.DipoleModel)]
-	The models to evaluate.
-
-	actual_model : pdme.model.DipoleModel
-	The model which is actually correct.
-
-	filename_slug : str
-	The filename slug to include.
-
-	run_count: int
-	The number of runs to do.
-	"""
-
-	def __init__(
-		self,
-		dot_positions: Sequence[numpy.typing.ArrayLike],
-		frequency_range: Sequence[float],
-		models_with_names: Sequence[Tuple[str, pdme.model.DipoleModel]],
-		actual_model: pdme.model.DipoleModel,
-		filename_slug: str,
-		run_count: int = 100,
-		low_error: float = 0.9,
-		high_error: float = 1.1,
-		monte_carlo_count: int = 10000,
-		monte_carlo_cycles: int = 10,
-		target_success: int = 100,
-		max_monte_carlo_cycles_steps: int = 10,
-		max_frequency: float = 20,
-		end_threshold: float = None,
-		chunksize: int = CHUNKSIZE,
-	) -> None:
-		self.dot_inputs = pdme.inputs.inputs_with_frequency_range(
-			dot_positions, frequency_range
-		)
-		self.dot_inputs_array = pdme.measurement.input_types.dot_inputs_to_array(
-			self.dot_inputs
-		)
-
-		self.models = [model for (_, model) in models_with_names]
-		self.model_names = [name for (name, _) in models_with_names]
-		self.actual_model = actual_model
-
-		self.n: int
-		try:
-			self.n = self.actual_model.n  # type: ignore
-		except AttributeError:
-			self.n = 1
-
-		self.model_count = len(self.models)
-		self.monte_carlo_count = monte_carlo_count
-		self.monte_carlo_cycles = monte_carlo_cycles
-		self.target_success = target_success
-		self.max_monte_carlo_cycles_steps = max_monte_carlo_cycles_steps
-		self.run_count = run_count
-		self.low_error = low_error
-		self.high_error = high_error
-
-		self.csv_fields = []
-		for i in range(self.n):
-			self.csv_fields.extend(
-				[
-					f"dipole_moment_{i+1}",
-					f"dipole_location_{i+1}",
-					f"dipole_frequency_{i+1}",
-				]
-			)
-		self.compensate_zeros = True
-		self.chunksize = chunksize
-		for name in self.model_names:
-			self.csv_fields.extend([f"{name}_success", f"{name}_count", f"{name}_prob"])
-
-		self.probabilities = [1 / self.model_count] * self.model_count
-
-		timestamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
-		self.filename = f"{timestamp}-{filename_slug}.bayesrun.csv"
-		self.max_frequency = max_frequency
-
-		if end_threshold is not None:
-			if 0 < end_threshold < 1:
-				self.end_threshold: float = end_threshold
-				self.use_end_threshold = True
-				_logger.info(f"Will abort early, at {self.end_threshold}.")
-			else:
-				raise ValueError(
-					f"end_threshold should be between 0 and 1, but is actually {end_threshold}"
-				)
-
-	def go(self) -> None:
-		with open(self.filename, "a", newline="") as outfile:
-			writer = csv.DictWriter(outfile, fieldnames=self.csv_fields, dialect="unix")
-			writer.writeheader()
-
-		for run in range(1, self.run_count + 1):
-
-			# Generate the actual dipoles
-			actual_dipoles = self.actual_model.get_dipoles(self.max_frequency)
-
-			dots = actual_dipoles.get_percent_range_dot_measurements(
-				self.dot_inputs, self.low_error, self.high_error
-			)
-			(
-				lows,
-				highs,
-			) = pdme.measurement.input_types.dot_range_measurements_low_high_arrays(
-				dots
-			)
-
-			_logger.info(f"Going to work on dipole at {actual_dipoles.dipoles}")
-
-			# define a new seed sequence for each run
-			seed_sequence = numpy.random.SeedSequence(run)
-
-			results = []
-			_logger.debug("Going to iterate over models now")
-			for model_count, model in enumerate(self.models):
-				_logger.debug(f"Doing model #{model_count}")
-				core_count = multiprocessing.cpu_count() - 1 or 1
-				with multiprocessing.Pool(core_count) as pool:
-					cycle_count = 0
-					cycle_success = 0
-					cycles = 0
-					while (cycles < self.max_monte_carlo_cycles_steps) and (
-						cycle_success <= self.target_success
-					):
-						_logger.debug(f"Starting cycle {cycles}")
-						cycles += 1
-						current_success = 0
-						cycle_count += self.monte_carlo_count * self.monte_carlo_cycles
-
-						# generate a seed from the sequence for each core.
-						# note this needs to be inside the loop for monte carlo cycle steps!
-						# that way we get more stuff.
-						seeds = seed_sequence.spawn(self.monte_carlo_cycles)
-
-						current_success = sum(
-							pool.imap_unordered(
-								get_a_result,
-								[
-									(
-										model,
-										self.dot_inputs_array,
-										lows,
-										highs,
-										self.monte_carlo_count,
-										self.max_frequency,
-										seed,
-									)
-									for seed in seeds
-								],
-								self.chunksize,
-							)
-						)
-
-						cycle_success += current_success
-						_logger.debug(f"current running successes: {cycle_success}")
-					results.append((cycle_count, cycle_success))
-
-			_logger.debug("Done, constructing output now")
-			row = {
-				"dipole_moment_1": actual_dipoles.dipoles[0].p,
-				"dipole_location_1": actual_dipoles.dipoles[0].s,
-				"dipole_frequency_1": actual_dipoles.dipoles[0].w,
-			}
-			for i in range(1, self.n):
-				try:
-					current_dipoles = actual_dipoles.dipoles[i]
-					row[f"dipole_moment_{i+1}"] = current_dipoles.p
-					row[f"dipole_location_{i+1}"] = current_dipoles.s
-					row[f"dipole_frequency_{i+1}"] = current_dipoles.w
-				except IndexError:
-					_logger.info(f"Not writing anymore, saw end after {i}")
-					break
-
-			successes: List[float] = []
-			counts: List[int] = []
-			for model_index, (name, (count, result)) in enumerate(
-				zip(self.model_names, results)
-			):
-
-				row[f"{name}_success"] = result
-				row[f"{name}_count"] = count
-				successes.append(max(result, 0.5))
-				counts.append(count)
-
-			success_weight = sum(
-				[
-					(succ / count) * prob
-					for succ, count, prob in zip(successes, counts, self.probabilities)
-				]
-			)
-			new_probabilities = [
-				(succ / count) * old_prob / success_weight
-				for succ, count, old_prob in zip(successes, counts, self.probabilities)
-			]
-			self.probabilities = new_probabilities
-			for name, probability in zip(self.model_names, self.probabilities):
-				row[f"{name}_prob"] = probability
-			_logger.info(row)
-
-			with open(self.filename, "a", newline="") as outfile:
-				writer = csv.DictWriter(
-					outfile, fieldnames=self.csv_fields, dialect="unix"
-				)
-				writer.writerow(row)
-
-			if self.use_end_threshold:
-				max_prob = max(self.probabilities)
-				if max_prob > self.end_threshold:
-					_logger.info(
-						f"Aborting early, because {max_prob} is greater than {self.end_threshold}"
-					)
-					break
--- a/deepdog/bayes_run_simulpairs.py
+++ b/deepdog/bayes_run_simulpairs.py
@@ -1,382 +0,0 @@
-import pdme.inputs
-import pdme.model
-import pdme.measurement.input_types
-import pdme.measurement.oscillating_dipole
-import pdme.util.fast_v_calc
-import pdme.util.fast_nonlocal_spectrum
-from typing import Sequence, Tuple, List
-import datetime
-import csv
-import multiprocessing
-import logging
-import numpy
-import numpy.random
-
-
-# TODO: remove hardcode
-CHUNKSIZE = 50
-
-# TODO: It's garbage to have this here duplicated from pdme.
-DotInput = Tuple[numpy.typing.ArrayLike, float]
-
-
-_logger = logging.getLogger(__name__)
-
-
-def get_a_simul_result_using_pairs(input) -> numpy.ndarray:
-	(
-		model,
-		dot_inputs,
-		pair_inputs,
-		local_lows,
-		local_highs,
-		nonlocal_lows,
-		nonlocal_highs,
-		monte_carlo_count,
-		monte_carlo_cycles,
-		max_frequency,
-		seed,
-	) = input
-
-	rng = numpy.random.default_rng(seed)
-	local_total = 0
-	combined_total = 0
-
-	sample_dipoles = model.get_monte_carlo_dipole_inputs(
-		monte_carlo_count, max_frequency, rng_to_use=rng
-	)
-	local_vals = pdme.util.fast_v_calc.fast_vs_for_dipoleses(dot_inputs, sample_dipoles)
-	local_matches = pdme.util.fast_v_calc.between(local_vals, local_lows, local_highs)
-	nonlocal_vals = pdme.util.fast_nonlocal_spectrum.fast_s_nonlocal_dipoleses(
-		pair_inputs, sample_dipoles
-	)
-	nonlocal_matches = pdme.util.fast_v_calc.between(
-		nonlocal_vals, nonlocal_lows, nonlocal_highs
-	)
-	combined_matches = numpy.logical_and(local_matches, nonlocal_matches)
-
-	local_total += numpy.count_nonzero(local_matches)
-	combined_total += numpy.count_nonzero(combined_matches)
-	return numpy.array([local_total, combined_total])
-
-
-class BayesRunSimulPairs:
-	"""
-	A dual pairs-nonpairs Bayes run for a given set of dots.
-
-	Parameters
-	----------
-	dot_inputs : Sequence[DotInput]
-	The dot inputs for this bayes run.
-
-	models_with_names : Sequence[Tuple(str, pdme.model.DipoleModel)]
-	The models to evaluate.
-
-	actual_model : pdme.model.DipoleModel
-	The modoel for the model which is actually correct.
-
-	filename_slug : str
-	The filename slug to include.
-
-	run_count: int
-	The number of runs to do.
-	"""
-
-	def __init__(
-		self,
-		dot_positions: Sequence[numpy.typing.ArrayLike],
-		frequency_range: Sequence[float],
-		models_with_names: Sequence[Tuple[str, pdme.model.DipoleModel]],
-		actual_model: pdme.model.DipoleModel,
-		filename_slug: str,
-		run_count: int = 100,
-		low_error: float = 0.9,
-		high_error: float = 1.1,
-		pairs_high_error=None,
-		pairs_low_error=None,
-		monte_carlo_count: int = 10000,
-		monte_carlo_cycles: int = 10,
-		target_success: int = 100,
-		max_monte_carlo_cycles_steps: int = 10,
-		max_frequency: float = 20,
-		end_threshold: float = None,
-		chunksize: int = CHUNKSIZE,
-	) -> None:
-		self.dot_inputs = pdme.inputs.inputs_with_frequency_range(
-			dot_positions, frequency_range
-		)
-		self.dot_inputs_array = pdme.measurement.input_types.dot_inputs_to_array(
-			self.dot_inputs
-		)
-
-		self.dot_pair_inputs = pdme.inputs.input_pairs_with_frequency_range(
-			dot_positions, frequency_range
-		)
-		self.dot_pair_inputs_array = (
-			pdme.measurement.input_types.dot_pair_inputs_to_array(self.dot_pair_inputs)
-		)
-
-		self.models = [mod for (_, mod) in models_with_names]
-		self.model_names = [name for (name, _) in models_with_names]
-		self.actual_model = actual_model
-
-		self.n: int
-		try:
-			self.n = self.actual_model.n  # type: ignore
-		except AttributeError:
-			self.n = 1
-
-		self.model_count = len(self.models)
-		self.monte_carlo_count = monte_carlo_count
-		self.monte_carlo_cycles = monte_carlo_cycles
-		self.target_success = target_success
-		self.max_monte_carlo_cycles_steps = max_monte_carlo_cycles_steps
-		self.run_count = run_count
-		self.low_error = low_error
-		self.high_error = high_error
-		if pairs_low_error is None:
-			self.pairs_low_error = self.low_error
-		else:
-			self.pairs_low_error = pairs_low_error
-		if pairs_high_error is None:
-			self.pairs_high_error = self.high_error
-		else:
-			self.pairs_high_error = pairs_high_error
-
-		self.csv_fields = []
-		for i in range(self.n):
-			self.csv_fields.extend(
-				[
-					f"dipole_moment_{i+1}",
-					f"dipole_location_{i+1}",
-					f"dipole_frequency_{i+1}",
-				]
-			)
-		self.compensate_zeros = True
-		self.chunksize = chunksize
-		for name in self.model_names:
-			self.csv_fields.extend([f"{name}_success", f"{name}_count", f"{name}_prob"])
-
-		self.probabilities_no_pairs = [1 / self.model_count] * self.model_count
-		self.probabilities_pairs = [1 / self.model_count] * self.model_count
-
-		timestamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
-		self.filename_pairs = f"{timestamp}-{filename_slug}.simulpairs.yespairs.csv"
-		self.filename_no_pairs = f"{timestamp}-{filename_slug}.simulpairs.noopairs.csv"
-
-		self.max_frequency = max_frequency
-
-		if end_threshold is not None:
-			if 0 < end_threshold < 1:
-				self.end_threshold: float = end_threshold
-				self.use_end_threshold = True
-				_logger.info(f"Will abort early, at {self.end_threshold}.")
-			else:
-				raise ValueError(
-					f"end_threshold should be between 0 and 1, but is actually {end_threshold}"
-				)
-
-	def go(self) -> None:
-		with open(self.filename_pairs, "a", newline="") as outfile:
-			writer = csv.DictWriter(outfile, fieldnames=self.csv_fields, dialect="unix")
-			writer.writeheader()
-		with open(self.filename_no_pairs, "a", newline="") as outfile:
-			writer = csv.DictWriter(outfile, fieldnames=self.csv_fields, dialect="unix")
-			writer.writeheader()
-
-		for run in range(1, self.run_count + 1):
-
-			# Generate the actual dipoles
-			actual_dipoles = self.actual_model.get_dipoles(self.max_frequency)
-
-			dots = actual_dipoles.get_percent_range_dot_measurements(
-				self.dot_inputs, self.low_error, self.high_error
-			)
-			(
-				lows,
-				highs,
-			) = pdme.measurement.input_types.dot_range_measurements_low_high_arrays(
-				dots
-			)
-
-			pair_lows, pair_highs = (None, None)
-			pair_measurements = actual_dipoles.get_percent_range_dot_pair_measurements(
-				self.dot_pair_inputs, self.pairs_low_error, self.pairs_high_error
-			)
-			(
-				pair_lows,
-				pair_highs,
-			) = pdme.measurement.input_types.dot_range_measurements_low_high_arrays(
-				pair_measurements
-			)
-
-			_logger.info(f"Going to work on dipole at {actual_dipoles.dipoles}")
-
-			# define a new seed sequence for each run
-			seed_sequence = numpy.random.SeedSequence(run)
-
-			results_pairs = []
-			results_no_pairs = []
-			_logger.debug("Going to iterate over models now")
-			for model_count, model in enumerate(self.models):
-				_logger.debug(f"Doing model #{model_count}")
-
-				core_count = multiprocessing.cpu_count() - 1 or 1
-				with multiprocessing.Pool(core_count) as pool:
-					cycle_count = 0
-					cycle_success_pairs = 0
-					cycle_success_no_pairs = 0
-					cycles = 0
-					while (cycles < self.max_monte_carlo_cycles_steps) and (
-						min(cycle_success_pairs, cycle_success_no_pairs)
-						<= self.target_success
-					):
-						_logger.debug(f"Starting cycle {cycles}")
-
-						cycles += 1
-						current_success_pairs = 0
-						current_success_no_pairs = 0
-						cycle_count += self.monte_carlo_count * self.monte_carlo_cycles
-
-						# generate a seed from the sequence for each core.
-						# note this needs to be inside the loop for monte carlo cycle steps!
-						# that way we get more stuff.
-
-						seeds = seed_sequence.spawn(self.monte_carlo_cycles)
-						_logger.debug(f"Creating {self.monte_carlo_cycles} seeds")
-						current_success_both = numpy.array(
-							sum(
-								pool.imap_unordered(
-									get_a_simul_result_using_pairs,
-									[
-										(
-											model,
-											self.dot_inputs_array,
-											self.dot_pair_inputs_array,
-											lows,
-											highs,
-											pair_lows,
-											pair_highs,
-											self.monte_carlo_count,
-											self.monte_carlo_cycles,
-											self.max_frequency,
-											seed,
-										)
-										for seed in seeds
-									],
-									self.chunksize,
-								)
-							)
-						)
-						current_success_no_pairs = current_success_both[0]
-						current_success_pairs = current_success_both[1]
-
-						cycle_success_no_pairs += current_success_no_pairs
-						cycle_success_pairs += current_success_pairs
-						_logger.debug(
-							f"(pair, no_pair) successes are {(cycle_success_pairs, cycle_success_no_pairs)}"
-						)
-					results_pairs.append((cycle_count, cycle_success_pairs))
-					results_no_pairs.append((cycle_count, cycle_success_no_pairs))
-
-			_logger.debug("Done, constructing output now")
-			row_pairs = {
-				"dipole_moment_1": actual_dipoles.dipoles[0].p,
-				"dipole_location_1": actual_dipoles.dipoles[0].s,
-				"dipole_frequency_1": actual_dipoles.dipoles[0].w,
-			}
-			row_no_pairs = {
-				"dipole_moment_1": actual_dipoles.dipoles[0].p,
-				"dipole_location_1": actual_dipoles.dipoles[0].s,
-				"dipole_frequency_1": actual_dipoles.dipoles[0].w,
-			}
-			for i in range(1, self.n):
-				try:
-					current_dipoles = actual_dipoles.dipoles[i]
-					row_pairs[f"dipole_moment_{i+1}"] = current_dipoles.p
-					row_pairs[f"dipole_location_{i+1}"] = current_dipoles.s
-					row_pairs[f"dipole_frequency_{i+1}"] = current_dipoles.w
-					row_no_pairs[f"dipole_moment_{i+1}"] = current_dipoles.p
-					row_no_pairs[f"dipole_location_{i+1}"] = current_dipoles.s
-					row_no_pairs[f"dipole_frequency_{i+1}"] = current_dipoles.w
-				except IndexError:
-					_logger.info(f"Not writing anymore, saw end after {i}")
-					break
-
-			successes_pairs: List[float] = []
-			successes_no_pairs: List[float] = []
-			counts: List[int] = []
-			for model_index, (
-				name,
-				(count_pair, result_pair),
-				(count_no_pair, result_no_pair),
-			) in enumerate(zip(self.model_names, results_pairs, results_no_pairs)):
-
-				row_pairs[f"{name}_success"] = result_pair
-				row_pairs[f"{name}_count"] = count_pair
-				successes_pairs.append(max(result_pair, 0.5))
-
-				row_no_pairs[f"{name}_success"] = result_no_pair
-				row_no_pairs[f"{name}_count"] = count_no_pair
-				successes_no_pairs.append(max(result_no_pair, 0.5))
-
-				counts.append(count_pair)
-
-			success_weight_pair = sum(
-				[
-					(succ / count) * prob
-					for succ, count, prob in zip(
-						successes_pairs, counts, self.probabilities_pairs
-					)
-				]
-			)
-			success_weight_no_pair = sum(
-				[
-					(succ / count) * prob
-					for succ, count, prob in zip(
-						successes_no_pairs, counts, self.probabilities_no_pairs
-					)
-				]
-			)
-			new_probabilities_pair = [
-				(succ / count) * old_prob / success_weight_pair
-				for succ, count, old_prob in zip(
-					successes_pairs, counts, self.probabilities_pairs
-				)
-			]
-			new_probabilities_no_pair = [
-				(succ / count) * old_prob / success_weight_no_pair
-				for succ, count, old_prob in zip(
-					successes_no_pairs, counts, self.probabilities_no_pairs
-				)
-			]
-			self.probabilities_pairs = new_probabilities_pair
-			self.probabilities_no_pairs = new_probabilities_no_pair
-			for name, probability_pair, probability_no_pair in zip(
-				self.model_names, self.probabilities_pairs, self.probabilities_no_pairs
-			):
-				row_pairs[f"{name}_prob"] = probability_pair
-				row_no_pairs[f"{name}_prob"] = probability_no_pair
-			_logger.debug(row_pairs)
-			_logger.debug(row_no_pairs)
-
-			with open(self.filename_pairs, "a", newline="") as outfile:
-				writer = csv.DictWriter(
-					outfile, fieldnames=self.csv_fields, dialect="unix"
-				)
-				writer.writerow(row_pairs)
-			with open(self.filename_no_pairs, "a", newline="") as outfile:
-				writer = csv.DictWriter(
-					outfile, fieldnames=self.csv_fields, dialect="unix"
-				)
-				writer.writerow(row_no_pairs)
-
-			if self.use_end_threshold:
-				max_prob = min(
-					max(self.probabilities_pairs), max(self.probabilities_no_pairs)
-				)
-				if max_prob > self.end_threshold:
-					_logger.info(
-						f"Aborting early, because {max_prob} is greater than {self.end_threshold}"
-					)
-					break
--- a/deepdog/bayes_run_with_ss.py
+++ b/deepdog/bayes_run_with_ss.py
@@ -1,261 +0,0 @@
-import deepdog.subset_simulation
-import pdme.inputs
-import pdme.model
-import pdme.measurement.input_types
-import pdme.measurement.oscillating_dipole
-import pdme.util.fast_v_calc
-import pdme.util.fast_nonlocal_spectrum
-from typing import Sequence, Tuple, List, Optional
-import datetime
-import csv
-import logging
-import numpy
-import numpy.typing
-
-
-# TODO: remove hardcode
-CHUNKSIZE = 50
-
-# TODO: It's garbage to have this here duplicated from pdme.
-DotInput = Tuple[numpy.typing.ArrayLike, float]
-
-
-CLAMPING_FACTOR = 10
-
-_logger = logging.getLogger(__name__)
-
-
-class BayesRunWithSubspaceSimulation:
-	"""
-	A single Bayes run for a given set of dots.
-
-	Parameters
-	----------
-	dot_inputs : Sequence[DotInput]
-	The dot inputs for this bayes run.
-
-	models_with_names : Sequence[Tuple(str, pdme.model.DipoleModel)]
-	The models to evaluate.
-
-	actual_model : pdme.model.DipoleModel
-	The model which is actually correct.
-
-	filename_slug : str
-	The filename slug to include.
-
-	run_count: int
-	The number of runs to do.
-	"""
-
-	def __init__(
-		self,
-		dot_positions: Sequence[numpy.typing.ArrayLike],
-		frequency_range: Sequence[float],
-		models_with_names: Sequence[Tuple[str, pdme.model.DipoleModel]],
-		actual_model: pdme.model.DipoleModel,
-		filename_slug: str,
-		max_frequency: float = 20,
-		end_threshold: float = None,
-		run_count=100,
-		chunksize: int = CHUNKSIZE,
-		ss_n_c: int = 500,
-		ss_n_s: int = 100,
-		ss_m_max: int = 15,
-		ss_target_cost: Optional[float] = None,
-		ss_level_0_seed: int = 200,
-		ss_mcmc_seed: int = 20,
-		ss_use_adaptive_steps=True,
-		ss_default_phi_step=0.01,
-		ss_default_theta_step=0.01,
-		ss_default_r_step=0.01,
-		ss_default_w_log_step=0.01,
-		ss_default_upper_w_log_step=4,
-		ss_dump_last_generation=False,
-		ss_initial_costs_chunk_size=100,
-		write_output_to_bayesruncsv=True,
-		use_timestamp_for_output=True,
-	) -> None:
-		self.dot_inputs = pdme.inputs.inputs_with_frequency_range(
-			dot_positions, frequency_range
-		)
-		self.dot_inputs_array = pdme.measurement.input_types.dot_inputs_to_array(
-			self.dot_inputs
-		)
-
-		self.models_with_names = models_with_names
-		self.models = [model for (_, model) in models_with_names]
-		self.model_names = [name for (name, _) in models_with_names]
-		self.actual_model = actual_model
-
-		self.n: int
-		try:
-			self.n = self.actual_model.n  # type: ignore
-		except AttributeError:
-			self.n = 1
-
-		self.model_count = len(self.models)
-
-		self.csv_fields = []
-		for i in range(self.n):
-			self.csv_fields.extend(
-				[
-					f"dipole_moment_{i+1}",
-					f"dipole_location_{i+1}",
-					f"dipole_frequency_{i+1}",
-				]
-			)
-		self.compensate_zeros = True
-		self.chunksize = chunksize
-		for name in self.model_names:
-			self.csv_fields.extend([f"{name}_likelihood", f"{name}_prob"])
-
-		self.probabilities = [1 / self.model_count] * self.model_count
-
-		if use_timestamp_for_output:
-			timestamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
-			self.filename = f"{timestamp}-{filename_slug}.bayesrunwithss.csv"
-		else:
-			self.filename = f"{filename_slug}.bayesrunwithss.csv"
-		self.max_frequency = max_frequency
-
-		if end_threshold is not None:
-			if 0 < end_threshold < 1:
-				self.end_threshold: float = end_threshold
-				self.use_end_threshold = True
-				_logger.info(f"Will abort early, at {self.end_threshold}.")
-			else:
-				raise ValueError(
-					f"end_threshold should be between 0 and 1, but is actually {end_threshold}"
-				)
-
-		self.ss_n_c = ss_n_c
-		self.ss_n_s = ss_n_s
-		self.ss_m_max = ss_m_max
-		self.ss_target_cost = ss_target_cost
-		self.ss_level_0_seed = ss_level_0_seed
-		self.ss_mcmc_seed = ss_mcmc_seed
-		self.ss_use_adaptive_steps = ss_use_adaptive_steps
-		self.ss_default_phi_step = ss_default_phi_step
-		self.ss_default_theta_step = ss_default_theta_step
-		self.ss_default_r_step = ss_default_r_step
-		self.ss_default_w_log_step = ss_default_w_log_step
-		self.ss_default_upper_w_log_step = ss_default_upper_w_log_step
-		self.ss_dump_last_generation = ss_dump_last_generation
-		self.ss_initial_costs_chunk_size = ss_initial_costs_chunk_size
-		self.run_count = run_count
-
-		self.write_output_to_csv = write_output_to_bayesruncsv
-
-	def go(self) -> Sequence:
-
-		if self.write_output_to_csv:
-			with open(self.filename, "a", newline="") as outfile:
-				writer = csv.DictWriter(
-					outfile, fieldnames=self.csv_fields, dialect="unix"
-				)
-				writer.writeheader()
-
-		return_result = []
-
-		for run in range(1, self.run_count + 1):
-
-			# Generate the actual dipoles
-			actual_dipoles = self.actual_model.get_dipoles(self.max_frequency)
-
-			measurements = actual_dipoles.get_dot_measurements(self.dot_inputs)
-
-			_logger.info(f"Going to work on dipole at {actual_dipoles.dipoles}")
-
-			# define a new seed sequence for each run
-
-			results = []
-			_logger.debug("Going to iterate over models now")
-			for model_count, model in enumerate(self.models_with_names):
-				_logger.debug(f"Doing model #{model_count}, {model[0]}")
-				subset_run = deepdog.subset_simulation.SubsetSimulation(
-					model,
-					self.dot_inputs,
-					measurements,
-					self.ss_n_c,
-					self.ss_n_s,
-					self.ss_m_max,
-					self.ss_target_cost,
-					self.ss_level_0_seed,
-					self.ss_mcmc_seed,
-					self.ss_use_adaptive_steps,
-					self.ss_default_phi_step,
-					self.ss_default_theta_step,
-					self.ss_default_r_step,
-					self.ss_default_w_log_step,
-					self.ss_default_upper_w_log_step,
-					initial_cost_chunk_size=self.ss_initial_costs_chunk_size,
-					keep_probs_list=False,
-					dump_last_generation_to_file=self.ss_dump_last_generation,
-				)
-				results.append(subset_run.execute())
-
-			_logger.debug("Done, constructing output now")
-			row = {
-				"dipole_moment_1": actual_dipoles.dipoles[0].p,
-				"dipole_location_1": actual_dipoles.dipoles[0].s,
-				"dipole_frequency_1": actual_dipoles.dipoles[0].w,
-			}
-			for i in range(1, self.n):
-				try:
-					current_dipoles = actual_dipoles.dipoles[i]
-					row[f"dipole_moment_{i+1}"] = current_dipoles.p
-					row[f"dipole_location_{i+1}"] = current_dipoles.s
-					row[f"dipole_frequency_{i+1}"] = current_dipoles.w
-				except IndexError:
-					_logger.info(f"Not writing anymore, saw end after {i}")
-					break
-
-			likelihoods: List[float] = []
-
-			for (name, result) in zip(self.model_names, results):
-				if result.over_target_likelihood is None:
-					if result.lowest_likelihood is None:
-						_logger.error(f"result {result} looks bad")
-						clamped_likelihood = 10**-15
-					else:
-						clamped_likelihood = result.lowest_likelihood / CLAMPING_FACTOR
-					_logger.warning(
-						f"got a none result, clamping to {clamped_likelihood}"
-					)
-				else:
-					clamped_likelihood = result.over_target_likelihood
-				likelihoods.append(clamped_likelihood)
-				row[f"{name}_likelihood"] = clamped_likelihood
-
-			success_weight = sum(
-				[
-					likelihood * prob
-					for likelihood, prob in zip(likelihoods, self.probabilities)
-				]
-			)
-			new_probabilities = [
-				likelihood * old_prob / success_weight
-				for likelihood, old_prob in zip(likelihoods, self.probabilities)
-			]
-			self.probabilities = new_probabilities
-			for name, probability in zip(self.model_names, self.probabilities):
-				row[f"{name}_prob"] = probability
-			_logger.info(row)
-			return_result.append(row)
-
-			if self.write_output_to_csv:
-				with open(self.filename, "a", newline="") as outfile:
-					writer = csv.DictWriter(
-						outfile, fieldnames=self.csv_fields, dialect="unix"
-					)
-					writer.writerow(row)
-
-			if self.use_end_threshold:
-				max_prob = max(self.probabilities)
-				if max_prob > self.end_threshold:
-					_logger.info(
-						f"Aborting early, because {max_prob} is greater than {self.end_threshold}"
-					)
-					break
-
-		return return_result
--- a/deepdog/cli/probs/args.py
+++ b/deepdog/cli/probs/args.py
@ -13,7 +13,7 @@ def parse_args() -> argparse.Namespace:
 		"probs", description="Calculating probability from finished bayesrun"
 	)
 	parser.add_argument(
-		"--log_file",
+		"--log-file",
 		type=str,
 		help="A filename for logging to, if not provided will only log to stderr",
 		default=None,
--- a/deepdog/cli/probs/main.py
+++ b/deepdog/cli/probs/main.py
@ -72,6 +72,7 @@ def main(args: argparse.Namespace):
 			for f in tqdm.tqdm(out_files, desc="reading files", leave=False)
 		]

+		# Refactor here to allow for arbitrary likelihood file sources
 		_logger.info("building uncoalesced dict")
 		uncoalesced_dict = deepdog.cli.probs.dicts.build_model_dict(parsed_output_files)

--- a/deepdog/cli/subset_sim_probs/init.py
+++ b/deepdog/cli/subset_sim_probs/init.py
@ -0,0 +1,5 @@
+from deepdog.cli.subset_sim_probs.main import wrapped_main
+
+__all__ = [
+	"wrapped_main",
+]
--- a/deepdog/cli/subset_sim_probs/args.py
+++ b/deepdog/cli/subset_sim_probs/args.py
@ -0,0 +1,52 @@
+import argparse
+import os
+
+
+def parse_args() -> argparse.Namespace:
+	def dir_path(path):
+		if os.path.isdir(path):
+			return path
+		else:
+			raise argparse.ArgumentTypeError(f"readable_dir:{path} is not a valid path")
+
+	parser = argparse.ArgumentParser(
+		"subset_sim_probs",
+		description="Calculating probability from finished subset sim run",
+	)
+	parser.add_argument(
+		"--log-file",
+		type=str,
+		help="A filename for logging to, if not provided will only log to stderr",
+		default=None,
+	)
+	parser.add_argument(
+		"--results-directory",
+		"-d",
+		type=dir_path,
+		help="The directory to search for subsetsim files, defaulting to cwd if not passed",
+		default=".",
+	)
+	parser.add_argument(
+		"--indexify-json",
+		help="A json file with the indexify config for parsing job indexes. Will skip if not present",
+		default="",
+	)
+	parser.add_argument(
+		"--outfile",
+		"-o",
+		type=str,
+		help="output filename for coalesced data. If not provided, will not be written",
+		default=None,
+	)
+	confirm_outfile_overwrite_group = parser.add_mutually_exclusive_group()
+	confirm_outfile_overwrite_group.add_argument(
+		"--never-overwrite-outfile",
+		action="store_true",
+		help="If a duplicate outfile is detected, skip confirmation and automatically exit early",
+	)
+	confirm_outfile_overwrite_group.add_argument(
+		"--force-overwrite-outfile",
+		action="store_true",
+		help="Skips checking for duplicate outfiles and overwrites",
+	)
+	return parser.parse_args()
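
Reviewer note (illustrative sketch, not part of the patch): the two overwrite flags above sit in a mutually exclusive group, so argparse itself should reject a command line that passes both. The trimmed-down parser below reuses the flag names from the patch; `build_parser` is a hypothetical helper introduced only for this illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical helper: a cut-down version of the parser in args.py.
    parser = argparse.ArgumentParser("subset_sim_probs")
    parser.add_argument("--outfile", "-o", type=str, default=None)
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--never-overwrite-outfile", action="store_true")
    group.add_argument("--force-overwrite-outfile", action="store_true")
    return parser


# Passing one of the two flags parses normally.
args = build_parser().parse_args(["-o", "out.csv", "--force-overwrite-outfile"])

# Passing both makes argparse print a usage error and exit.
try:
    build_parser().parse_args(
        ["--never-overwrite-outfile", "--force-overwrite-outfile"]
    )
    conflict_rejected = False
except SystemExit:
    conflict_rejected = True
```

When neither flag is given, both attributes are `False`, which is the case where `main` falls through to the interactive confirmation prompt.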
--- a/deepdog/cli/subset_sim_probs/dicts.py
+++ b/deepdog/cli/subset_sim_probs/dicts.py
@ -0,0 +1,136 @@
+import typing
+from deepdog.results import GeneralOutput
+import logging
+import csv
+import tqdm
+
+_logger = logging.getLogger(__name__)
+
+
+def build_model_dict(
+	general_outputs: typing.Sequence[GeneralOutput],
+) -> typing.Dict[
+	typing.Tuple, typing.Dict[typing.Tuple, typing.Dict["str", typing.Any]]
+]:
+	"""
+	Maybe someday do something smarter with the coalescing and stuff but don't want to so i won't
+	"""
+	# assume that everything is well formatted and the keys are the same across entire list and initialise list of keys.
+	# model dict will contain a model_key: {calculation_dict} where each calculation_dict represents a single calculation for that model,
+	# the uncoalesced version, keyed by the specific file keys
+	model_dict: typing.Dict[
+		typing.Tuple, typing.Dict[typing.Tuple, typing.Dict["str", typing.Any]]
+	] = {}
+
+	_logger.info("building model dict")
+	for out in tqdm.tqdm(general_outputs, desc="reading outputs", leave=False):
+		for model_result in out.results:
+			model_key = tuple(v for v in model_result.parsed_model_keys.values())
+			if model_key not in model_dict:
+				model_dict[model_key] = {}
+			calculation_dict = model_dict[model_key]
+			calculation_key = tuple(v for v in out.data.values())
+			if calculation_key not in calculation_dict:
+				calculation_dict[calculation_key] = {
+					"_model_key_dict": model_result.parsed_model_keys,
+					"_calculation_key_dict": out.data,
+					"num_finished_runs": int(
+						model_result.result_dict["num_finished_runs"]
+					),
+					"num_runs": int(model_result.result_dict["num_runs"]),
+					"estimated_likelihood": float(
+						model_result.result_dict["estimated_likelihood"]
+					),
+				}
+			else:
+				raise ValueError(
+					f"Got {calculation_key} twice for model_key {model_key}"
+				)
+
+	return model_dict
+
+
+def coalesced_dict(
+	uncoalesced_model_dict: typing.Dict[
+		typing.Tuple, typing.Dict[typing.Tuple, typing.Dict["str", typing.Any]]
+	],
+):
+	"""
+	Pass in the uncoalesced dict; returns one entry per model key.
+	Each model's prob is its estimated_likelihood weighted by a uniform prior and normalised.
+	"""
+	coalesced_dict = {}
+
+	# we are already iterating, so count the keys ourselves (performance really doesn't matter here)
+	num_keys = 0
+
+	# first pass coalesce
+	for model_key, model_dict in uncoalesced_model_dict.items():
+		num_keys += 1
+		for calculation in model_dict.values():
+			if model_key not in coalesced_dict:
+				coalesced_dict[model_key] = {
+					"_model_key_dict": calculation["_model_key_dict"].copy(),
+					"calculations_coalesced": 1,
+					"num_finished_runs": calculation["num_finished_runs"],
+					"num_runs": calculation["num_runs"],
+					"estimated_likelihood": calculation["estimated_likelihood"],
+				}
+			else:
+				_logger.error(f"We shouldn't be here! Double key for {model_key=}")
+				raise ValueError(f"Double key for {model_key=}")
+
+	# second pass do probability calculation
+
+	prior = 1 / num_keys
+	_logger.info(f"Got {num_keys} model keys, so our prior will be {prior}")
+
+	total_weight = 0
+	for coalesced_model_dict in coalesced_dict.values():
+		model_weight = coalesced_model_dict["estimated_likelihood"] * prior
+		total_weight += model_weight
+
+	total_prob = 0
+	for coalesced_model_dict in coalesced_dict.values():
+		likelihood = coalesced_model_dict["estimated_likelihood"]
+		prob = likelihood * prior / total_weight
+		coalesced_model_dict["prob"] = prob
+		total_prob += prob
+
+	_logger.debug(
+		f"Got a total probability of {total_prob}, which should be close to 1 up to float/rounding error"
+	)
+	return coalesced_dict
+
+
+def write_coalesced_dict(
+	coalesced_output_filename: typing.Optional[str],
+	coalesced_model_dict: typing.Dict[typing.Tuple, typing.Dict["str", typing.Any]],
+):
+	if coalesced_output_filename is None or coalesced_output_filename == "":
+		_logger.warning("No coalesced output filename provided, not going to write")
+		return
+
+	first_value = next(iter(coalesced_model_dict.values()))
+	model_field_names = set(first_value["_model_key_dict"].keys())
+	_logger.info(f"Detected model field names {model_field_names}")
+
+	collected_fieldnames = list(model_field_names)
+	collected_fieldnames.extend(
+		["calculations_coalesced", "num_finished_runs", "num_runs", "prob"]
+	)
+	with open(coalesced_output_filename, "w", newline="") as coalesced_output_file:
+		writer = csv.DictWriter(coalesced_output_file, fieldnames=collected_fieldnames)
+		writer.writeheader()
+
+		for model_dict in coalesced_model_dict.values():
+			row = model_dict["_model_key_dict"].copy()
+			row.update(
+				{
+					"calculations_coalesced": model_dict["calculations_coalesced"],
+					"num_finished_runs": model_dict["num_finished_runs"],
+					"num_runs": model_dict["num_runs"],
+					"prob": model_dict["prob"],
+				}
+			)
+			writer.writerow(row)
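
Reviewer note (illustrative sketch, not part of the patch): the second pass of `coalesced_dict` amounts to a Bayes update with a uniform prior over model keys. The standalone function below shows that arithmetic on plain floats; `posterior_from_likelihoods` and the zero-weight guard are assumptions added for this illustration and do not exist in deepdog.

```python
from typing import Dict, Tuple


def posterior_from_likelihoods(
    likelihoods: Dict[Tuple, float],
) -> Dict[Tuple, float]:
    """Hypothetical helper: uniform prior, normalised posterior per model key."""
    if not likelihoods:
        raise ValueError("need at least one model key")
    prior = 1 / len(likelihoods)
    # Sum of likelihood * prior over all models: the normalising constant.
    total_weight = sum(value * prior for value in likelihoods.values())
    if total_weight == 0:
        # Guard added for the sketch: every likelihood being zero would divide by zero.
        raise ValueError("all likelihoods are zero, cannot normalise")
    return {
        key: value * prior / total_weight for key, value in likelihoods.items()
    }


# Two models, the second three times as likely as the first.
example = posterior_from_likelihoods({("model_a",): 0.2, ("model_b",): 0.6})
total_prob = sum(example.values())
```

Because the prior is uniform it cancels in the ratio, so the posterior depends only on the relative likelihoods; in this example the two models come out at roughly one quarter and three quarters.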
--- a/deepdog/cli/subset_sim_probs/main.py
+++ b/deepdog/cli/subset_sim_probs/main.py
@ -0,0 +1,113 @@
+import logging
+import argparse
+import json
+
+import deepdog.cli.subset_sim_probs.args
+import deepdog.cli.subset_sim_probs.dicts
+import deepdog.cli.util
+import deepdog.results
+import deepdog.indexify
+import pathlib
+import tqdm
+import os
+import tqdm.contrib.logging
+
+
+_logger = logging.getLogger(__name__)
+
+
+def set_up_logging(log_file: str):
+
+	log_pattern = "%(asctime)s | %(levelname)-7s | %(name)s:%(lineno)d | %(message)s"
+	if log_file is None:
+		handlers = [
+			logging.StreamHandler(),
+		]
+	else:
+		handlers = [logging.StreamHandler(), logging.FileHandler(log_file)]
+	logging.basicConfig(
+		level=logging.DEBUG,
+		format=log_pattern,
+		# it's okay to ignore this mypy error because who cares about logger handler types
+		handlers=handlers,  # type: ignore
+	)
+	logging.captureWarnings(True)
+
+
+def main(args: argparse.Namespace):
+	"""
+	Main function with passed in arguments and no additional logging setup in case we want to extract out later
+	"""
+
+	with tqdm.contrib.logging.logging_redirect_tqdm():
+		_logger.info(f"args: {args}")
+
+		if "outfile" in args and args.outfile:
+			if os.path.exists(args.outfile):
+				if args.never_overwrite_outfile:
+					_logger.warning(
+						f"Filename {args.outfile} already exists, and never want overwrite, so aborting."
+					)
+					return
+				elif args.force_overwrite_outfile:
+					_logger.warning(f"Forcing overwrite of {args.outfile}")
+				else:
+					# need to confirm
+					confirm_overwrite = deepdog.cli.util.confirm_prompt(
+						f"Filename {args.outfile} exists, overwrite?"
+					)
+					if not confirm_overwrite:
+						_logger.warning(
+							f"Filename {args.outfile} already exists and do not want overwrite, aborting."
+						)
+						return
+					else:
+						_logger.warning(f"Overwriting file {args.outfile}")
+
+		indexifier = None
+		if args.indexify_json:
+			with open(args.indexify_json, "r") as indexify_json_file:
+				indexify_spec = json.load(indexify_json_file)
+				indexify_data = indexify_spec["indexes"]
+				if "seed_spec" in indexify_spec:
+					seed_spec = indexify_spec["seed_spec"]
+					indexify_data[seed_spec["field_name"]] = list(
+						range(seed_spec["num_seeds"])
+					)
+				# _logger.debug(f"Indexifier data looks like {indexify_data}")
+				indexifier = deepdog.indexify.Indexifier(indexify_data)
+
+		results_dir = pathlib.Path(args.results_directory)
+		out_files = [
+			f for f in results_dir.iterdir() if f.name.endswith("subsetsim.csv")
+		]
+		_logger.info(
+			f"Reading {len(out_files)} subsetsim.csv files in directory {args.results_directory}"
+		)
+		# _logger.info(out_files)
+		parsed_output_files = [
+			deepdog.results.read_subset_sim_file(f, indexifier)
+			for f in tqdm.tqdm(out_files, desc="reading files", leave=False)
+		]
+
+		# Refactor here to allow for arbitrary likelihood file sources
+		_logger.info("building uncoalesced dict")
+		uncoalesced_dict = deepdog.cli.subset_sim_probs.dicts.build_model_dict(
+			parsed_output_files
+		)
+
+		_logger.info("building coalesced dict")
+		coalesced = deepdog.cli.subset_sim_probs.dicts.coalesced_dict(uncoalesced_dict)
+
+		if "outfile" in args and args.outfile:
+			deepdog.cli.subset_sim_probs.dicts.write_coalesced_dict(
+				args.outfile, coalesced
+			)
+		else:
+			_logger.info("Skipping writing coalesced")
+
+
+def wrapped_main():
+	args = deepdog.cli.subset_sim_probs.args.parse_args()
+	set_up_logging(args.log_file)
+	main(args)
--- a/deepdog/cli/util/init.py
+++ b/deepdog/cli/util/init.py
@ -0,0 +1,3 @@
+from deepdog.cli.util.confirm import confirm_prompt
+
+__all__ = ["confirm_prompt"]
--- a/deepdog/cli/util/confirm.py
+++ b/deepdog/cli/util/confirm.py
@ -0,0 +1,23 @@
+_RESPONSE_MAP = {
+	"yes": True,
+	"ye": True,
+	"y": True,
+	"no": False,
+	"n": False,
+	"nope": False,
+	"true": True,
+	"false": False,
+}
+
+
+def confirm_prompt(question: str) -> bool:
+	"""Prompt with the question and return True or False based on the response."""
+	prompt = question + " [y/n]: "
+
+	while True:
+		choice = input(prompt).strip().lower()
+
+		if choice in _RESPONSE_MAP:
+			return _RESPONSE_MAP[choice]
+		else:
+			print('Respond with "yes" or "no"')
--- a/deepdog/direct_monte_carlo/cost_function_filter.py
+++ b/deepdog/direct_monte_carlo/cost_function_filter.py
@ -0,0 +1,24 @@
+from deepdog.direct_monte_carlo.direct_mc import DirectMonteCarloFilter
+from typing import Callable
+import numpy
+
+
+class CostFunctionTargetFilter(DirectMonteCarloFilter):
+	def __init__(
+		self,
+		cost_function: Callable[[numpy.ndarray], numpy.ndarray],
+		target_cost: float,
+	):
+		"""
+		Filters dipoles by cost, only leaving dipoles with cost below target_cost
+		"""
+		self.cost_function = cost_function
+		self.target_cost = target_cost
+
+	def filter_samples(self, samples: numpy.ndarray) -> numpy.ndarray:
+		current_sample = samples
+
+		costs = self.cost_function(current_sample)
+
+		current_sample = current_sample[costs < self.target_cost]
+		return current_sample
--- a/deepdog/direct_monte_carlo/direct_mc.py
+++ b/deepdog/direct_monte_carlo/direct_mc.py
@ -1,3 +1,5 @@
+import re
+import pathlib
 import csv
 import pdme.model
 import pdme.measurement
@ -36,8 +38,35 @@ class DirectMonteCarloConfig:
 	tag: str = ""
 	cap_core_count: int = 0  # 0 means cap at num cores - 1
 	chunk_size: int = 50
-	write_bayesrun_file = True
 	# chunk size of some kind
+	write_bayesrun_file: bool = True
+	bayesrun_file_timestamp: bool = True
+	skip_if_exists: bool = False
+
+	def get_filename(self) -> str:
+		"""
+		Generate a filename for the output of this run.
+		"""
+		# set starting execution timestamp
+		timestamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
+
+		if self.bayesrun_file_timestamp:
+			timestamp_str = f"{timestamp}-"
+		else:
+			timestamp_str = ""
+		filename = f"{timestamp_str}{self.tag}.realdata.fast_filter.bayesrun.csv"
+		_logger.debug(f"Got filename {filename}")
+		return filename
+
+	def get_filename_regex(self) -> str:
+		"""
+		Generate a regex for the output of this run.
+		"""
+
+		# having both timestamp and the hyphen separately optional is a bit of a hack
+		# too loose, but will never matter
+		pattern = rf"(?P<timestamp>\d{{8}}-\d{{6}})?-?{re.escape(self.tag)}\.realdata\.fast_filter\.bayesrun\.csv"
+		return pattern


 # Aliasing dict as a generic data container
@ -144,15 +173,21 @@ class DirectMonteCarloRun:
 		single run wrapped up for multiprocessing call.

 		takes in a tuple of arguments corresponding to
-		(model_name_pair, seed)
+		(model_name_pair, seed, return_configs)
+
+		return_configs is a boolean: if true, returns a tuple of (count, [matching configs]);
+		if false, returns (count, [])
 		"""
 		# here's where we do our work

-		model_name_pair, seed = args
+		model_name_pair, seed, return_configs = args
 		cycle_success_configs = self._single_run(model_name_pair, seed)
 		cycle_success_count = len(cycle_success_configs)

-		return cycle_success_count
+		if return_configs:
+			return (cycle_success_count, cycle_success_configs)
+		else:
+			return (cycle_success_count, [])

 	def execute_no_multiprocessing(self) -> Sequence[DirectMonteCarloResult]:

@ -197,9 +232,11 @@ class DirectMonteCarloRun:
 							)
 							dipole_count = numpy.array(cycle_success_configs).shape[1]
 							for n in range(dipole_count):
+								number_dipoles_to_write = self.config.target_success * 5
+								_logger.info(f"Limiting to {number_dipoles_to_write=}")
 								numpy.savetxt(
 									f"{self.config.tag}_{step_count}_{cycle_i}_dipole_{n}.csv",
-									sorted_by_freq[:, n],
+									sorted_by_freq[:number_dipoles_to_write, n],
 									delimiter=",",
 								)
 					total_success += cycle_success_count
@ -221,8 +258,27 @@ class DirectMonteCarloRun:

 	def execute(self) -> Sequence[DirectMonteCarloResult]:

-		# set starting execution timestamp
-		timestamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
+		filename = self.config.get_filename()
+		if self.config.skip_if_exists:
+			_logger.info(f"Checking if {filename} exists")
+			cwd = pathlib.Path.cwd()
+			if (cwd / filename).exists():
+				_logger.info(f"File {filename} exists, skipping")
+				return []
+			if self.config.bayesrun_file_timestamp:
+				_logger.info(
+					"Also checking for files with past or current timestamps; this check only occurs if bayesrun_file_timestamp is set"
+				)
+				pattern = self.config.get_filename_regex()
+				for file in cwd.iterdir():
+					match = re.match(pattern, file.name)
+					if match is not None:
+						_logger.info(f"Matched {file.name} to {pattern}")
+						_logger.info(f"File {file.name} exists, skipping")
+						return []
+				_logger.info(
+					f"Finished checking against pattern {pattern}, hopefully didn't take too long!"
+				)

 		count_per_step = (
 			self.config.monte_carlo_count_per_cycle * self.config.monte_carlo_cycles
@ -258,15 +314,71 @@ class DirectMonteCarloRun:

 					seeds = seed_sequence.spawn(self.config.monte_carlo_cycles)

-					pool_results = sum(
+					raw_pool_results = list(
 						pool.imap_unordered(
 							self._wrapped_single_run,
-							[(model_name_pair, seed) for seed in seeds],
+							[
+								(
+									model_name_pair,
+									seed,
+									self.config.write_successes_to_file,
+								)
+								for seed in seeds
+							],
 							self.config.chunk_size,
 						)
 					)
+
+					pool_results = sum(result[0] for result in raw_pool_results)
+
 					_logger.debug(f"Pool results: {pool_results}")

+					if self.config.write_successes_to_file:
+
+						_logger.info("Writing dipole results")
+
+						cycle_success_configs = numpy.concatenate(
+							[result[1] for result in raw_pool_results]
+						)
+
+						dipole_count = numpy.array(cycle_success_configs).shape[1]
+
+						max_number_dipoles_to_write = self.config.target_success * 5
+						_logger.debug(
+							f"Limiting to {max_number_dipoles_to_write=}, have {len(cycle_success_configs)}"
+						)
+
+						if len(cycle_success_configs):
+							sorted_by_freq = numpy.array(
+								[
+									pdme.subspace_simulation.sort_array_of_dipoles_by_frequency(
+										dipole_config
+									)
+									for dipole_config in cycle_success_configs[
+										:max_number_dipoles_to_write
+									]
+								]
+							)
+
+							for n in range(dipole_count):
+
+								dipole_filename = (
+									f"{self.config.tag}_{step_count}_dipole_{n}.csv"
+								)
+								_logger.debug(
+									f"Writing {min(len(cycle_success_configs), max_number_dipoles_to_write)} to {dipole_filename}"
+								)
+
+								numpy.savetxt(
+									dipole_filename,
+									sorted_by_freq[:, n],
+									delimiter=",",
+								)
+						else:
+							_logger.debug(
+								"Instructed to write results, but none obtained"
+							)
+
 					total_success += pool_results
 					total_count += count_per_step
 					_logger.debug(
@ -284,9 +396,6 @@ class DirectMonteCarloRun:

 		if self.config.write_bayesrun_file:

-			filename = (
-				f"{timestamp}-{self.config.tag}.realdata.fast_filter.bayesrun.csv"
-			)
 			_logger.info(f"Going to write to file [{filename}]")
 			# row: Dict[str, Union[int, float, str]] = {}
 			row = {}
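
Reviewer note (illustrative sketch, not part of the patch): the `skip_if_exists` check relies on a filename pattern in which both the timestamp and its trailing hyphen are optional. The snippet below rebuilds that pattern outside the config class to show which names it accepts. `bayesrun_filename_pattern` is a hypothetical helper, the example filenames are made up, and escaping the tag with `re.escape` is a choice of this sketch so that tags containing regex metacharacters match literally.

```python
import re


def bayesrun_filename_pattern(tag: str) -> re.Pattern:
    # Hypothetical helper mirroring the shape of get_filename_regex.
    return re.compile(
        rf"(?P<timestamp>\d{{8}}-\d{{6}})?-?{re.escape(tag)}"
        r"\.realdata\.fast_filter\.bayesrun\.csv"
    )


pattern = bayesrun_filename_pattern("mock_tarucha-3")

# Made-up filenames: timestamped, untimestamped, and a different job index.
names = [
    "20250226-215713-mock_tarucha-3.realdata.fast_filter.bayesrun.csv",
    "mock_tarucha-3.realdata.fast_filter.bayesrun.csv",
    "mock_tarucha-30.realdata.fast_filter.bayesrun.csv",
]

# re.match anchors at the start of the name only, as in the patch.
matches = [name for name in names if pattern.match(name)]
timestamps = [pattern.match(name).group("timestamp") for name in matches]
```

The first two names match and the third does not, because the literal `.realdata` must follow the tag immediately. Since `re.match` does not anchor the end of the string, a name with extra trailing characters after `.csv` would also match, which is the looseness the patch's own comment acknowledges.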
--- a/deepdog/direct_monte_carlo/dmc_filters.py
+++ b/deepdog/direct_monte_carlo/dmc_filters.py
@ -54,109 +54,13 @@ class SingleDotSpinQubitFrequencyFilter(DirectMonteCarloFilter):
 			self.measurements
 		)

-	# oh no not this again
-	def fast_s_spin_qubit_tarucha_apsd_dipoleses(
-		self, dot_inputs: numpy.ndarray, dipoleses: numpy.ndarray
-	) -> numpy.ndarray:
-		"""
-		No error correction here baby.
-		"""
-
-		# We're going to annotate the indices on this class.
-		# Let's define some indices:
-		# A -> index of dipoleses configurations
-		# j -> within a particular configuration, indexes dipole j
-		# measurement_index -> if we have 100 frequencies for example, indexes which one of them it is
-		# If we need to use numbers, let's use A -> 2, j -> 10, measurement_index -> 9 for consistency with
-		# my other notes
-
-		# axes are [dipole_config_idx A, dipole_idx j, {px, py, pz}3]
-		ps = dipoleses[:, :, 0:3]
-		# axes are [dipole_config_idx A, dipole_idx j, {sx, sy, sz}3]
-		ss = dipoleses[:, :, 3:6]
-		# axes are [dipole_config_idx A, dipole_idx j, w], last axis is just 1
-		ws = dipoleses[:, :, 6]
-
-		# dot_index is either 0 or 1 for dot1 or dot2
-		# hopefully this adhoc grammar is making sense, with the explicit labelling of the values of the last axis in cartesian space
-		# axes are [measurement_idx, {dot_index}, {rx, ry, rz}] where the inner {dot_index} is gone
-		# [measurement_idx, cartesian3]
-		rs = dot_inputs[:, 0:3]
-		# axes are [measurement_idx]
-		fs = dot_inputs[:, 3]
-
-		# first operation!
-		# r1s has shape [measurement_idx, rxs]
-		# None inserts an extra axis so the r1s[:, None] has shape
-		# [measurement_idx, 1]([rxs]) with the last rxs hidden
-		#
-		# ss has shape [ A, j, {sx, sy, sz}3], so second term has shape [A, 1, j]([sxs])
-		# these broadcast from right to left
-		# [	 measurement_idx, 1, rxs]
-		# [A,	  1,			   j, sxs]
-		# resulting in [A, measurement_idx, j, cart3] sxs rxs are both cart3
-		diffses = rs[:, None] - ss[:, None, :]
-
-		# norms takes out axis 3, the last one, giving [A, measurement_idx, j]
-		norms = numpy.linalg.norm(diffses, axis=3)
-
-		# _logger.info(f"norms1: {norms1}")
-		# _logger.info(f"norms1 shape: {norms1.shape}")
-		#
-		# diffses1 (A, measurement_idx, j, xs)
-		# ps:  (A, j, px)
-		# result is (A, measurement_idx, j)
-		# intermediate_dot_prod = numpy.einsum("abcd,acd->abc", diffses1, ps)
-		# _logger.info(f"dot product shape: {intermediate_dot_prod.shape}")
-
-		# transpose makes it (j, measurement_idx, A)
-		# transp_intermediate_dot_prod = numpy.transpose(numpy.einsum("abcd,acd->abc", diffses1, ps) / (norms1**3))
-
-		# transpose of diffses has shape (xs, j, measurement_idx, A)
-		# numpy.transpose(diffses1)
-		# _logger.info(f"dot product shape: {transp_intermediate_dot_prod.shape}")
-
-		# inner transpose is (j, measurement_idx, A) * (xs, j, measurement_idx, A)
-		# next transpose puts it back to (A, measurement_idx, j, xs)
-		# p_dot_r_times_r_term = 3 * numpy.transpose(numpy.transpose(numpy.einsum("abcd,acd->abc", diffses1, ps) / (norms1**3)) * numpy.transpose(diffses1))
-		# _logger.info(f"p_dot_r_times_r_term: {p_dot_r_times_r_term.shape}")
-
-		# only x axis puts us at (A, measurement_idx, j)
-		# p_dot_r_times_r_term_x_only = p_dot_r_times_r_term[:, :, :, 0]
-		# _logger.info(f"p_dot_r_times_r_term_x_only.shape: {p_dot_r_times_r_term_x_only.shape}")
-
-		# now to complete the numerator we subtract the ps, which are (A, j, px):
-		# slicing off the end gives us (A, j), so we newaxis to get (A, 1, j)
-		# _logger.info(ps[:, numpy.newaxis, :, 0].shape)
-		alphses = (
-			(
-				3
-				* numpy.transpose(
-					numpy.transpose(
-						numpy.einsum("abcd,acd->abc", diffses, ps) / (norms**2)
-					)
-					* numpy.transpose(diffses)
-				)[:, :, :, 0]
-			)
-			- ps[:, numpy.newaxis, :, 0]
-		) / (norms**3)
-
-		bses = (
-			2
-			* numpy.pi
-			* ws[:, None, :]
-			/ ((2 * numpy.pi * fs[:, None]) ** 2 + 4 * ws[:, None, :] ** 2)
-		)
-
-		return numpy.einsum("...j->...", alphses * alphses * bses)
-
 	def filter_samples(self, samples: ndarray) -> ndarray:
 		current_sample = samples
 		for di, low, high in zip(self.dot_inputs_array, self.lows, self.highs):

 			if len(current_sample) < 1:
 				break
-			vals = self.fast_s_spin_qubit_tarucha_apsd_dipoleses(
+			vals = pdme.util.fast_v_calc.fast_efieldxs_for_dipoleses(
 				numpy.array([di]), current_sample
 			)
 			# _logger.info(vals)
--- a/deepdog/indexify/init.py
+++ b/deepdog/indexify/init.py
@ -31,10 +31,14 @@ class Indexifier:

 	def __init__(self, list_dict: typing.Dict[str, typing.Sequence]):
 		self.dict = list_dict
+		self.product_dict = _dict_product(self.dict)

 	def indexify(self, n: int) -> typing.Dict[str, typing.Any]:
-		product_dict = _dict_product(self.dict)
-		return product_dict[n]
+		return self.product_dict[n]
+
+	def __len__(self) -> int:
+		weights = [len(v) for v in self.dict.values()]
+		return math.prod(weights)

 	def _indexify_indices(self, n: int) -> typing.Sequence[int]:
 		"""
--- a/deepdog/results/init.py
+++ b/deepdog/results/init.py
@ -5,51 +5,38 @@ import logging
 import deepdog.indexify
 import pathlib
 import csv
+from deepdog.results.read_csv import (
+	parse_bayesrun_row,
+	BayesrunModelResult,
+	parse_general_row,
+	GeneralModelResult,
+)
+from deepdog.results.filename import parse_file_slug

 _logger = logging.getLogger(__name__)

-FILENAME_REGEX = r"(?P<timestamp>\d{8}-\d{6})-(?P<filename_slug>.*)\.realdata\.fast_filter\.bayesrun\.csv"
+FILENAME_REGEX = re.compile(
+	r"(?P<timestamp>\d{8}-\d{6})-(?P<filename_slug>.*)\.realdata\.fast_filter\.bayesrun\.csv"
+)

-MODEL_REGEXES = [
-	r"geom_(?P<xmin>-?\d+)_(?P<xmax>-?\d+)_(?P<ymin>-?\d+)_(?P<ymax>-?\d+)_(?P<zmin>-?\d+)_(?P<zmax>-?\d+)-orientation_(?P<orientation>free|fixedxy|fixedz)-dipole_count_(?P<avg_filled>\d+)_(?P<field_name>\w*)"
-]
+# probably a better way but who cares
+NO_TIMESTAMP_FILENAME_REGEX = re.compile(
+	r"(?P<filename_slug>.*)\.realdata\.fast_filter\.bayesrun\.csv"
+)

-FILE_SLUG_REGEXES = [
-	r"mock_tarucha-(?P<job_index>\d+)",
-	r"(?:(?P<mock>mock)_)?tarucha(?:_(?P<tarucha_run_id>\d+))?-(?P<job_index>\d+)",
-]
+
+SUBSET_SIM_FILENAME_REGEX = re.compile(
+	r"(?P<filename_slug>.*)-(?:no_adaptive_steps_)?(?P<num_ss_runs>\d+)-nc_(?P<n_c>\d+)-ns_(?P<n_s>\d+)-mmax_(?P<mmax>\d+)\.multi\.subsetsim\.csv"
+)


@dataclasses.dataclass
 class BayesrunOutputFilename:
-	timestamp: str
+	timestamp: typing.Optional[str]
 	filename_slug: str
 	path: pathlib.Path


-@dataclasses.dataclass
-class BayesrunColumnParsed:
-	"""
-	class for parsing a bayesrun while pulling certain special fields out
-	"""
-
-	def __init__(self, groupdict: typing.Dict[str, str]):
-		self.column_field = groupdict["field_name"]
-		self.model_field_dict = {
-			k: v for k, v in groupdict.items() if k != "field_name"
-		}
-
-	def __str__(self):
-		return f"BayesrunColumnParsed[{self.column_field}: {self.model_field_dict}]"
-
-
-@dataclasses.dataclass
-class BayesrunModelResult:
-	parsed_model_keys: typing.Dict[str, str]
-	success: int
-	count: int
-
-
@dataclasses.dataclass
 class BayesrunOutput:
 	filename: BayesrunOutputFilename
@ -57,88 +44,52 @@ class BayesrunOutput:
 	results: typing.Sequence[BayesrunModelResult]


-def _batch_iterable_into_chunks(iterable, n=1):
-	"""
-	utility for batching bayesrun files where columns appear in threes
-	"""
-	for ndx in range(0, len(iterable), n):
-		yield iterable[ndx : min(ndx + n, len(iterable))]
+@dataclasses.dataclass
+class GeneralOutput:
+	filename: BayesrunOutputFilename
+	data: typing.Dict["str", typing.Any]
+	results: typing.Sequence[GeneralModelResult]


-def _parse_bayesrun_column(
-	column: str,
-) -> typing.Optional[BayesrunColumnParsed]:
-	"""
-	Tries one by one all of a predefined list of regexes that I might have used in the past.
-	Returns the groupdict for the first match, or None if no match found.
-	"""
-	for pattern in MODEL_REGEXES:
-		match = re.match(pattern, column)
-		if match:
-			return BayesrunColumnParsed(match.groupdict())
+def _parse_string_output_filename(
+	filename: str,
+) -> typing.Tuple[typing.Optional[str], str]:
+	if match := FILENAME_REGEX.match(filename):
+		groups = match.groupdict()
+		return (groups["timestamp"], groups["filename_slug"])
+	elif match := NO_TIMESTAMP_FILENAME_REGEX.match(filename):
+		groups = match.groupdict()
+		return (None, groups["filename_slug"])
 	else:
-		return None
-
-
-def _parse_bayesrun_row(
-	row: typing.Dict[str, str],
-) -> typing.Sequence[BayesrunModelResult]:
-
-	results = []
-	batched_keys = _batch_iterable_into_chunks(list(row.keys()), 3)
-	for model_keys in batched_keys:
-		parsed = [_parse_bayesrun_column(column) for column in model_keys]
-		values = [row[column] for column in model_keys]
-		if parsed[0] is None:
-			raise ValueError(f"no viable success row found for keys {model_keys}")
-		if parsed[1] is None:
-			raise ValueError(f"no viable count row found for keys {model_keys}")
-		if parsed[0].column_field != "success":
-			raise ValueError(f"The column {model_keys[0]} is not a success field")
-		if parsed[1].column_field != "count":
-			raise ValueError(f"The column {model_keys[1]} is not a count field")
-		parsed_keys = parsed[0].model_field_dict
-		success = int(values[0])
-		count = int(values[1])
-		results.append(
-			BayesrunModelResult(
-				parsed_model_keys=parsed_keys,
-				success=success,
-				count=count,
-			)
-		)
-	return results
+		raise ValueError(f"Could not parse {filename} as a bayesrun output filename")


 def _parse_output_filename(file: pathlib.Path) -> BayesrunOutputFilename:
 	filename = file.name
-	match = re.match(FILENAME_REGEX, filename)
+	timestamp, slug = _parse_string_output_filename(filename)
+	return BayesrunOutputFilename(timestamp=timestamp, filename_slug=slug, path=file)
+
+
+def _parse_ss_output_filename(file: pathlib.Path) -> BayesrunOutputFilename:
+	filename = file.name
+	match = SUBSET_SIM_FILENAME_REGEX.match(filename)
 	if not match:
-		raise ValueError(f"{filename} was not a valid bayesrun output")
+		raise ValueError(f"{filename} was not a valid subset sim output")
 	groups = match.groupdict()
 	return BayesrunOutputFilename(
-		timestamp=groups["timestamp"], filename_slug=groups["filename_slug"], path=file
+		filename_slug=groups["filename_slug"], path=file, timestamp=None
 	)


-def _parse_file_slug(slug: str) -> typing.Optional[typing.Dict[str, str]]:
-	for pattern in FILE_SLUG_REGEXES:
-		match = re.match(pattern, slug)
-		if match:
-			return match.groupdict()
-	else:
-		return None
-
-
-def read_output_file(
+def read_subset_sim_file(
 	file: pathlib.Path, indexifier: typing.Optional[deepdog.indexify.Indexifier]
-) -> BayesrunOutput:
+) -> GeneralOutput:

-	parsed_filename = tag = _parse_output_filename(file)
-	out = BayesrunOutput(filename=parsed_filename, data={}, results=[])
+	parsed_filename = tag = _parse_ss_output_filename(file)
+	out = GeneralOutput(filename=parsed_filename, data={}, results=[])

 	out.data.update(dataclasses.asdict(tag))
-	parsed_tag = _parse_file_slug(parsed_filename.filename_slug)
+	parsed_tag = parse_file_slug(parsed_filename.filename_slug)
 	if parsed_tag is None:
 		_logger.warning(
 			f"Could not parse {tag} against any matching regexes. Going to skip tag parsing"
@ -163,8 +114,53 @@ def read_output_file(
 			row = rows[0]
 		else:
 			raise ValueError(f"Confused about having multiple rows in {file.name}")
-	results = _parse_bayesrun_row(row)
+	results = parse_general_row(
+		row, ("num_finished_runs", "num_runs", None, "estimated_likelihood")
+	)

 	out.results = results

 	return out
+
+
+def read_output_file(
+	file: pathlib.Path, indexifier: typing.Optional[deepdog.indexify.Indexifier]
+) -> BayesrunOutput:
+
+	parsed_filename = tag = _parse_output_filename(file)
+	out = BayesrunOutput(filename=parsed_filename, data={}, results=[])
+
+	out.data.update(dataclasses.asdict(tag))
+	parsed_tag = parse_file_slug(parsed_filename.filename_slug)
+	if parsed_tag is None:
+		_logger.warning(
+			f"Could not parse {tag} against any matching regexes. Going to skip tag parsing"
+		)
+	else:
+		out.data.update(parsed_tag)
+		if indexifier is not None:
+			try:
+				job_index = parsed_tag["job_index"]
+				indexified = indexifier.indexify(int(job_index))
+				out.data.update(indexified)
+			except KeyError:
+				# A missing job_index is not a serious error, so just warn and carry on
+				_logger.warning(
+					f"Parsed tag to {parsed_tag}, and attempted to indexify but no job_index key was found. skipping and moving on"
+				)
+
+	with file.open() as input_file:
+		reader = csv.DictReader(input_file)
+		rows = [r for r in reader]
+		if len(rows) == 1:
+			row = rows[0]
+		else:
+			raise ValueError(f"Confused about having multiple rows in {file.name}")
+	results = parse_bayesrun_row(row)
+
+	out.results = results
+
+	return out
+
+
+__all__ = ["read_output_file", "BayesrunOutput"]
--- a/deepdog/results/filename.py
+++ b/deepdog/results/filename.py
@ -0,0 +1,22 @@
+import re
+import typing
+
+
+FILE_SLUG_REGEXES = [
+	re.compile(pattern)
+	for pattern in [
+		r"(?P<tag>\w+)-(?P<job_index>\d+)",
+		r"mock_tarucha-(?P<job_index>\d+)",
+		r"(?:(?P<mock>mock)_)?tarucha(?:_(?P<tarucha_run_id>\d+))?-(?P<job_index>\d+)",
+		r"(?P<tag>\w+)-(?P<included_dots>[\w,]+)-(?P<target_cost>\d*\.?\d+)-(?P<job_index>\d+)",
+	]
+]
+
+
+def parse_file_slug(slug: str) -> typing.Optional[typing.Dict[str, str]]:
+	for pattern in FILE_SLUG_REGEXES:
+		match = pattern.match(slug)
+		if match:
+			return match.groupdict()
+	# No pattern matched the slug.
+	return None
--- a/deepdog/results/read_csv.py
+++ b/deepdog/results/read_csv.py
@ -0,0 +1,141 @@
+import typing
+import re
+import dataclasses
+
+MODEL_REGEXES = [
+	re.compile(pattern)
+	for pattern in [
+		r"geom_(?P<xmin>-?\d+)_(?P<xmax>-?\d+)_(?P<ymin>-?\d+)_(?P<ymax>-?\d+)_(?P<zmin>-?\d+)_(?P<zmax>-?\d+)-orientation_(?P<orientation>free|fixedxy|fixedz)-dipole_count_(?P<avg_filled>\d+)_(?P<field_name>\w*)",
+		r"geom_(?P<xmin>-?\d+)_(?P<xmax>-?\d+)_(?P<ymin>-?\d+)_(?P<ymax>-?\d+)_(?P<zmin>-?\d+)_(?P<zmax>-?\d+)-magnitude_(?P<log_magnitude>\d*\.?\d+)-orientation_(?P<orientation>free|fixedxy|fixedz)-dipole_count_(?P<avg_filled>\d+)_(?P<field_name>\w*)",
+		r"geom_(?P<xmin>-?\d*\.?\d+)_(?P<xmax>-?\d*\.?\d+)_(?P<ymin>-?\d*\.?\d+)_(?P<ymax>-?\d*\.?\d+)_(?P<zmin>-?\d*\.?\d+)_(?P<zmax>-?\d*\.?\d+)-magnitude_(?P<log_magnitude>\d*\.?\d+)-orientation_(?P<orientation>free|fixedxy|fixedz)-dipole_count_(?P<avg_filled>\d+)_(?P<field_name>\w*)",
+		r"geom_(?P<xmin>-?\d+)_(?P<xmax>-?\d+)_(?P<ymin>-?\d+)_(?P<ymax>-?\d+)_(?P<zmin>-?\d+)_(?P<zmax>-?\d+)-magnitude_(?P<log_magnitude>-?\d*\.?\d+)-orientation_(?P<orientation>free|fixedxy|fixedz)-dipole_count_(?P<avg_filled>\d+)_(?P<field_name>\w*)",
+		r"geom_(?P<xmin>-?\d*\.?\d+)_(?P<xmax>-?\d*\.?\d+)_(?P<ymin>-?\d*\.?\d+)_(?P<ymax>-?\d*\.?\d+)_(?P<zmin>-?\d*\.?\d+)_(?P<zmax>-?\d*\.?\d+)-magnitude_(?P<log_magnitude>-?\d*\.?\d+)-orientation_(?P<orientation>free|fixedxy|fixedz)-dipole_count_(?P<avg_filled>\d+)_(?P<field_name>\w*)",
+	]
+]
+
+
+@dataclasses.dataclass
+class BayesrunModelResult:
+	parsed_model_keys: typing.Dict[str, str]
+	success: int
+	count: int
+
+
+@dataclasses.dataclass
+class GeneralModelResult:
+	parsed_model_keys: typing.Dict[str, str]
+	result_dict: typing.Dict[str, str]
+
+
+class BayesrunColumnParsed:
+	"""
+	Parsed bayesrun column name, with the special field_name pulled out from the model fields.
+	"""
+
+	def __init__(self, groupdict: typing.Dict[str, str]):
+		self.column_field = groupdict["field_name"]
+		self.model_field_dict = {
+			k: v for k, v in groupdict.items() if k != "field_name"
+		}
+		self._groupdict_str = repr(groupdict)
+
+	def __str__(self):
+		return f"BayesrunColumnParsed[{self.column_field}: {self.model_field_dict}]"
+
+	def __repr__(self):
+		return f"BayesrunColumnParsed({self._groupdict_str})"
+
+	def __eq__(self, other):
+		if isinstance(other, BayesrunColumnParsed):
+			return (self.column_field == other.column_field) and (
+				self.model_field_dict == other.model_field_dict
+			)
+		return NotImplemented
+
+
+def _parse_bayesrun_column(
+	column: str,
+) -> typing.Optional[BayesrunColumnParsed]:
+	"""
+	Tries each of a predefined list of regexes, one by one, covering column formats that might have been used in the past.
+	Returns a BayesrunColumnParsed built from the first match, or None if no match is found.
+	"""
+	for pattern in MODEL_REGEXES:
+		match = pattern.match(column)
+		if match:
+			return BayesrunColumnParsed(match.groupdict())
+	# No pattern matched the column.
+	return None
+
+
+def _batch_iterable_into_chunks(iterable, n=1):
+	"""
+	utility for batching columns into consecutive chunks of n, e.g. bayesrun files where columns appear in threes
+	"""
+	for ndx in range(0, len(iterable), n):
+		yield iterable[ndx : min(ndx + n, len(iterable))]
+
+
+def parse_general_row(
+	row: typing.Dict[str, str],
+	expected_fields: typing.Sequence[typing.Optional[str]],
+) -> typing.Sequence[GeneralModelResult]:
+	results = []
+	batched_keys = _batch_iterable_into_chunks(list(row.keys()), len(expected_fields))
+	for model_keys in batched_keys:
+		parsed = [_parse_bayesrun_column(column) for column in model_keys]
+		values = [row[column] for column in model_keys]
+
+		result_dict = {}
+		parsed_keys = None
+		for expected_field, parsed_field, value in zip(expected_fields, parsed, values):
+			if expected_field is None:
+				continue
+			if parsed_field is None:
+				raise ValueError(
+					f"No viable column found for {expected_field=} in {model_keys=}"
+				)
+			if parsed_field.column_field != expected_field:
+				raise ValueError(
+					f"The column {parsed_field.column_field} does not match expected {expected_field}"
+				)
+			result_dict[expected_field] = value
+			if parsed_keys is None:
+				parsed_keys = parsed_field.model_field_dict
+
+		if parsed_keys is None:
+			raise ValueError(f"Somehow parsed keys is none here, for {row=}")
+		results.append(
+			GeneralModelResult(parsed_model_keys=parsed_keys, result_dict=result_dict)
+		)
+	return results
+
+
+def parse_bayesrun_row(
+	row: typing.Dict[str, str],
+) -> typing.Sequence[BayesrunModelResult]:
+
+	results = []
+	batched_keys = _batch_iterable_into_chunks(list(row.keys()), 3)
+	for model_keys in batched_keys:
+		parsed = [_parse_bayesrun_column(column) for column in model_keys]
+		values = [row[column] for column in model_keys]
+		if parsed[0] is None:
+			raise ValueError(f"no viable success row found for keys {model_keys}")
+		if parsed[1] is None:
+			raise ValueError(f"no viable count row found for keys {model_keys}")
+		if parsed[0].column_field != "success":
+			raise ValueError(f"The column {model_keys[0]} is not a success field")
+		if parsed[1].column_field != "count":
+			raise ValueError(f"The column {model_keys[1]} is not a count field")
+		parsed_keys = parsed[0].model_field_dict
+		success = int(values[0])
+		count = int(values[1])
+		results.append(
+			BayesrunModelResult(
+				parsed_model_keys=parsed_keys,
+				success=success,
+				count=count,
+			)
+		)
+	return results
--- a/deepdog/subset_simulation/subset_simulation_impl.py
+++ b/deepdog/subset_simulation/subset_simulation_impl.py
@ -1,9 +1,11 @@
 import logging
+import multiprocessing
 import numpy
 import pdme.measurement
 import pdme.measurement.input_types
+import pdme.model
 import pdme.subspace_simulation
-from typing import Sequence, Tuple, Optional
+from typing import Sequence, Tuple, Optional, Callable, Union, List

 from dataclasses import dataclass

@ -18,47 +20,63 @@ class SubsetSimulationResult:
 	under_target_cost: Optional[float]
 	under_target_likelihood: Optional[float]
 	lowest_likelihood: Optional[float]
+	messages: Sequence[str]
+
+
+@dataclass
+class MultiSubsetSimulationResult:
+	child_results: Sequence[SubsetSimulationResult]
+	model_name: str
+	estimated_likelihood: float
+	arithmetic_mean_estimated_likelihood: float
+	num_children: int
+	num_finished_children: int
+	clean_estimate: bool


 class SubsetSimulation:
 	def __init__(
 		self,
 		model_name_pair,
-		dot_inputs,
-		actual_measurements: Sequence[pdme.measurement.DotMeasurement],
+		# actual_measurements: Sequence[pdme.measurement.DotMeasurement],
+		cost_function: Callable[[numpy.ndarray], numpy.ndarray],
 		n_c: int,
 		n_s: int,
 		m_max: int,
 		target_cost: Optional[float] = None,
-		level_0_seed: int = 200,
-		mcmc_seed: int = 20,
+		level_0_seed: Union[int, Sequence[int]] = 200,
+		mcmc_seed: Union[int, Sequence[int]] = 20,
 		use_adaptive_steps=True,
 		default_phi_step=0.01,
 		default_theta_step=0.01,
 		default_r_step=0.01,
 		default_w_log_step=0.01,
 		default_upper_w_log_step=4,
+		num_initial_dmc_gens=1,
 		keep_probs_list=True,
 		dump_last_generation_to_file=False,
 		initial_cost_chunk_size=100,
+		initial_cost_multiprocess=True,
+		cap_core_count: int = 0,  # 0 means cap at num cores - 1
 	):
 		name, model = model_name_pair
 		self.model_name = name
 		self.model = model
 		_logger.info(f"got model {self.model_name}")

-		self.dot_inputs_array = pdme.measurement.input_types.dot_inputs_to_array(
-			dot_inputs
-		)
+		# dot_inputs = [(meas.r, meas.f) for meas in actual_measurements]
+		# self.dot_inputs_array = pdme.measurement.input_types.dot_inputs_to_array(
+		# 	dot_inputs
+		# )
 		# _logger.debug(f"actual measurements: {actual_measurements}")
-		self.actual_measurement_array = numpy.array([m.v for m in actual_measurements])
+		# self.actual_measurement_array = numpy.array([m.v for m in actual_measurements])

-		def cost_function_to_use(dipoles_to_test):
-			return pdme.subspace_simulation.proportional_costs_vs_actual_measurement(
-				self.dot_inputs_array, self.actual_measurement_array, dipoles_to_test
-			)
+		# def cost_function_to_use(dipoles_to_test):
+		# 	return pdme.subspace_simulation.proportional_costs_vs_actual_measurement(
+		# 		self.dot_inputs_array, self.actual_measurement_array, dipoles_to_test
+		# 	)

-		self.cost_function_to_use = cost_function_to_use
+		self.cost_function_to_use = cost_function

 		self.n_c = n_c
 		self.n_s = n_s
@ -68,16 +86,25 @@ class SubsetSimulation:
 		self.mcmc_seed = mcmc_seed

 		self.use_adaptive_steps = use_adaptive_steps
-		self.default_phi_step = default_phi_step
+		self.default_phi_step = (
+			default_phi_step * 1.73
+		)  # this is a hack to fix a missing sqrt 3 in the proposal function code.
 		self.default_theta_step = default_theta_step
-		self.default_r_step = default_r_step
-		self.default_w_log_step = default_w_log_step
+		self.default_r_step = (
+			default_r_step * 1.73
+		)  # this is a hack to fix a missing sqrt 3 in the proposal function code.
+		self.default_w_log_step = (
+			default_w_log_step * 1.73
+		)  # this is a hack to fix a missing sqrt 3 in the proposal function code.
 		self.default_upper_w_log_step = default_upper_w_log_step

 		_logger.info("using params:")
 		_logger.info(f"\tn_c: {self.n_c}")
 		_logger.info(f"\tn_s: {self.n_s}")
 		_logger.info(f"\tm: {self.m_max}")
+		_logger.info(f"\t{num_initial_dmc_gens=}")
+		_logger.info(f"\t{mcmc_seed=}")
+		_logger.info(f"\t{level_0_seed=}")
 		_logger.info("let's do level 0...")

 		self.target_cost = target_cost
@ -87,44 +114,91 @@ class SubsetSimulation:
 		self.dump_last_generations = dump_last_generation_to_file

 		self.initial_cost_chunk_size = initial_cost_chunk_size
+		self.initial_cost_multiprocess = initial_cost_multiprocess
+
+		self.cap_core_count = cap_core_count
+
+		self.num_dmc_gens = num_initial_dmc_gens
+
+	def _single_chain_gen(self, args: Tuple):
+		threshold_cost, stdevs, rng_seed, (c, s) = args
+		rng = numpy.random.default_rng(rng_seed)
+		return self.model.get_repeat_counting_mcmc_chain(
+			s,
+			self.cost_function_to_use,
+			self.n_s,
+			threshold_cost,
+			stdevs,
+			initial_cost=c,
+			rng_arg=rng,
+		)

 	def execute(self) -> SubsetSimulationResult:

 		probs_list = []

+		output_messages = []
+
+		# If we have n_s = 10 and n_c = 100, then our big N = 1000 and p = 1/10
+		# The DMC stage would normally generate 1000, then pick the best 100 and start counting prob = p/10.
+		# Let's say we want our DMC stage to go down to level 2.
+		# Then we need to filter out p^2, so our initial has to be N_0 = N / p = n_c * n_s^2
+		initial_dmc_n = self.n_c * (self.n_s**self.num_dmc_gens)
+		initial_level = (
+			self.num_dmc_gens - 1
+		)  # This is perfunctory but let's label it here really explicitly
+		_logger.info(f"Generating {initial_dmc_n} samples for DMC stage")
 		sample_dipoles = self.model.get_monte_carlo_dipole_inputs(
-			self.n_c * self.n_s,
+			initial_dmc_n,
 			-1,
 			rng_to_use=numpy.random.default_rng(self.level_0_seed),
 		)
 		# _logger.debug(sample_dipoles)
 		# _logger.debug(sample_dipoles.shape)

-		raw_costs = []
+		_logger.debug("Finished dipole generation")
 		_logger.debug(
-			f"Using iterated cost function thing with chunk size {self.initial_cost_chunk_size}"
+			f"Computing initial costs in chunks of size {self.initial_cost_chunk_size}"
 		)

-		for x in range(0, len(sample_dipoles), self.initial_cost_chunk_size):
-			_logger.debug(f"doing chunk {x}")
-			raw_costs.extend(
-				self.cost_function_to_use(
-					sample_dipoles[x : x + self.initial_cost_chunk_size]
-				)
-			)
-		costs = numpy.array(raw_costs)
+		# core count etc. logic here
+		core_count = multiprocessing.cpu_count() - 1 or 1
+		if (self.cap_core_count >= 1) and (self.cap_core_count < core_count):
+			core_count = self.cap_core_count
+		_logger.info(f"Using {core_count} cores")

-		_logger.debug(f"costs: {costs}")
+		with multiprocessing.Pool(core_count) as pool:
+
+			# Do the initial DMC cost calculation, optionally in the multiprocessing pool
+
+			chunks = numpy.array_split(
+				sample_dipoles,
+				range(
+					self.initial_cost_chunk_size,
+					len(sample_dipoles),
+					self.initial_cost_chunk_size,
+				),
+			)
+			if self.initial_cost_multiprocess:
+				_logger.debug("Multiprocessing initial costs")
+				raw_costs = pool.map(self.cost_function_to_use, chunks)
+			else:
+				_logger.debug("Single process initial costs")
+				raw_costs = []
+				for chunk_idx, chunk in enumerate(chunks):
+					_logger.debug(f"doing chunk #{chunk_idx}")
+					raw_costs.append(self.cost_function_to_use(chunk))
+			costs = numpy.concatenate(raw_costs)
+			_logger.debug("finished initial dmc cost calculation")
+			# _logger.debug(f"costs: {costs}")
 			sorted_indexes = costs.argsort()[::-1]

-		_logger.debug(costs[sorted_indexes])
-		_logger.debug(sample_dipoles[sorted_indexes])
+			# _logger.debug(costs[sorted_indexes])
+			# _logger.debug(sample_dipoles[sorted_indexes])

 			sorted_costs = costs[sorted_indexes]
 			sorted_dipoles = sample_dipoles[sorted_indexes]

-		threshold_cost = sorted_costs[-self.n_c]
-
 			all_dipoles = numpy.array(
 				[
 					pdme.subspace_simulation.sort_array_of_dipoles_by_frequency(samp)
@ -132,10 +206,36 @@ class SubsetSimulation:
 				]
 			)
 			all_chains = list(zip(sorted_costs, all_dipoles))
+			for dmc_level in range(initial_level):
+				# if initial level is 1, we want to print out what the level 0 threshold would have been?
+				_logger.debug(f"Get the pseudo statistics for level {dmc_level}")
+				_logger.debug(f"Whole chain has length {len(all_chains)}")
+				pseudo_threshold_index = -(
+					self.n_c * (self.n_s ** (self.num_dmc_gens - dmc_level - 1))
+				)
+				_logger.debug(
+					f"Have a pseudo_threshold_index of {pseudo_threshold_index}, or {len(all_chains) + pseudo_threshold_index}"
+				)
+				pseudo_threshold_cost = all_chains[pseudo_threshold_index][0]
+				_logger.info(
+					f"Pseudo-level {dmc_level} threshold cost {pseudo_threshold_cost}, at P = (1 / {self.n_s})^{dmc_level + 1}"
+				)
+				all_chains = all_chains[pseudo_threshold_index:]

-		mcmc_rng = numpy.random.default_rng(self.mcmc_seed)
+			long_mcmc_rng = numpy.random.default_rng(self.mcmc_seed)
+			mcmc_rng_seed_sequence = numpy.random.SeedSequence(self.mcmc_seed)

-		for i in range(self.m_max):
+			threshold_cost = all_chains[-self.n_c][0]
+			_logger.info(
+				f"Finishing DMC threshold cost {threshold_cost} at level {initial_level}, at P = (1 / {self.n_s})^{initial_level + 1}"
+			)
+			_logger.debug(f"Executing the MCMC with chains of length {len(all_chains)}")
+
+			# Now we move on to the MCMC part of the algorithm
+
+			# This is important, we want to allow some extra initial levels so we need to account for that here!
+			for i in range(self.num_dmc_gens, self.m_max):
+				_logger.info(f"Starting level {i}")
 				next_seeds = all_chains[-self.n_c :]

 				if self.dump_last_generations:
@ -158,7 +258,9 @@ class SubsetSimulation:
 					):
 						# chain = mcmc(s, threshold_cost, n_s, model, dot_inputs_array, actual_measurement_array, mcmc_rng, curr_cost=c, stdevs=stdevs)
 						# until new version gotta do
-					_logger.debug(f"\t{seed_index}: doing long chain on the next seed")
+						_logger.debug(
+							f"\t{seed_index}: doing long chain on the next seed"
+						)

 						long_chain = self.model.get_mcmc_chain(
 							s,
@ -167,7 +269,7 @@ class SubsetSimulation:
 							threshold_cost,
 							stdevs,
 							initial_cost=c,
-						rng_arg=mcmc_rng,
+							rng_arg=long_mcmc_rng,
 						)
 						for _, chained in long_chain:
 							all_long_chains.append(chained)
@ -184,7 +286,10 @@ class SubsetSimulation:
 					for cost_index, cost_chain in enumerate(all_chains[: -self.n_c]):
 						probs_list.append(
 							(
-							((self.n_c * self.n_s - cost_index) / (self.n_c * self.n_s))
+								(
+									(self.n_c * self.n_s - cost_index)
+									/ (self.n_c * self.n_s)
+								)
 								/ (self.n_s ** (i)),
 								cost_chain[0],
 								i + 1,
@ -194,31 +299,42 @@ class SubsetSimulation:
 				next_seeds_as_array = numpy.array([s for _, s in next_seeds])

 				stdevs = self.get_stdevs_from_arrays(next_seeds_as_array)
-			_logger.info(f"got stdevs: {stdevs.stdevs}")
+				_logger.debug(f"got stdevs, begin: {stdevs.stdevs[:10]}")
 				_logger.debug("Starting the MCMC")
 				all_chains = []
-			for seed_index, (c, s) in enumerate(next_seeds):
-				# chain = mcmc(s, threshold_cost, n_s, model, dot_inputs_array, actual_measurement_array, mcmc_rng, curr_cost=c, stdevs=stdevs)
-				# until new version gotta do
-				_logger.debug(
-					f"\t{seed_index}: getting another chain from the next seed"
-				)
-				chain = self.model.get_mcmc_chain(
-					s,
-					self.cost_function_to_use,
-					self.n_s,
-					threshold_cost,
-					stdevs,
-					initial_cost=c,
-					rng_arg=mcmc_rng,
+
+				seeds = mcmc_rng_seed_sequence.spawn(len(next_seeds))
+				pool_results = pool.imap_unordered(
+					self._single_chain_gen,
+					[
+						(threshold_cost, stdevs, rng_seed, test_seed)
+						for rng_seed, test_seed in zip(seeds, next_seeds)
+					],
+					chunksize=50,
 				)
+
+				# count for ergodicity analysis
+				samples_generated = 0
+				samples_rejected = 0
+
+				for rejected_count, chain in pool_results:
 					for cost, chained in chain:
 						try:
 							filtered_cost = cost[0]
 						except (IndexError, TypeError):
 							filtered_cost = cost
 						all_chains.append((filtered_cost, chained))
+
+					samples_generated += self.n_s
+					samples_rejected += rejected_count
+
 				_logger.debug("finished mcmc")
+				_logger.debug(f"{samples_rejected=} out of {samples_generated=}")
+				if samples_rejected * 2 > samples_generated:
+					reject_ratio = samples_rejected / samples_generated
+					rejectionmessage = f"On level {i}, rejected {samples_rejected} out of {samples_generated}, {reject_ratio=} is too high and may indicate ergodicity problems"
+					output_messages.append(rejectionmessage)
+					_logger.warning(rejectionmessage)
 				# _logger.debug(all_chains)

 				all_chains.sort(key=lambda c: c[0], reverse=True)
@ -228,7 +344,9 @@ class SubsetSimulation:
 				_logger.info(
 					f"current threshold cost: {threshold_cost}, at P = (1 / {self.n_s})^{i + 1}"
 				)
-			if (self.target_cost is not None) and (threshold_cost < self.target_cost):
+				if (self.target_cost is not None) and (
+					threshold_cost < self.target_cost
+				):
 					_logger.info(
 						f"got a threshold cost {threshold_cost}, less than {self.target_cost}. will leave early"
 					)
@ -236,6 +354,8 @@ class SubsetSimulation:
 					cost_list = [c[0] for c in all_chains]
 					over_index = reverse_bisect_right(cost_list, self.target_cost)

+					winner = all_chains[over_index][1]
+					_logger.info(f"Winner obtained: {winner}")
 					shorter_probs_list = []
 					for cost_index, cost_chain in enumerate(all_chains):
 						if self.keep_probs_list:
@ -253,7 +373,10 @@ class SubsetSimulation:
 						shorter_probs_list.append(
 							(
 								cost_chain[0],
-							((self.n_c * self.n_s - cost_index) / (self.n_c * self.n_s))
+								(
+									(self.n_c * self.n_s - cost_index)
+									/ (self.n_c * self.n_s)
+								)
 								/ (self.n_s ** (i)),
 							)
 						)
@ -265,6 +388,7 @@ class SubsetSimulation:
 						under_target_cost=shorter_probs_list[over_index][0],
 						under_target_likelihood=shorter_probs_list[over_index][1],
 						lowest_likelihood=shorter_probs_list[-1][1],
+						messages=output_messages,
 					)
 					return result

@ -285,8 +409,8 @@ class SubsetSimulation:
 		_logger.info(
 			f"final threshold cost: {threshold_cost}, at P = (1 / {self.n_s})^{self.m_max + 1}"
 		)
-		for a in all_chains[-10:]:
-			_logger.info(a)
+		# for a in all_chains[-10:]:
+		# 	_logger.info(a)
 		# for prob, prob_cost in probs_list:
 		# 	_logger.info(f"\t{prob}: {prob_cost}")
 		probs_list.sort(key=lambda c: c[0], reverse=True)
@ -300,6 +424,7 @@ class SubsetSimulation:
 			under_target_cost=None,
 			under_target_likelihood=None,
 			lowest_likelihood=min_likelihood,
+			messages=output_messages,
 		)
 		return result

@ -358,6 +483,116 @@ class SubsetSimulation:
 		return stdevs


+class MultiSubsetSimulations:
+	def __init__(
+		self,
+		model_name_pairs: Sequence[Tuple[str, pdme.model.DipoleModel]],
+		# actual_measurements: Sequence[pdme.measurement.DotMeasurement],
+		cost_function: Callable[[numpy.ndarray], numpy.ndarray],
+		num_runs: int,
+		n_c: int,
+		n_s: int,
+		m_max: int,
+		target_cost: float,
+		num_initial_dmc_gens: int = 1,
+		level_0_seed_seed: int = 200,
+		mcmc_seed_seed: int = 20,
+		use_adaptive_steps=True,
+		default_phi_step=0.01,
+		default_theta_step=0.01,
+		default_r_step=0.01,
+		default_w_log_step=0.01,
+		default_upper_w_log_step=4,
+		initial_cost_chunk_size=100,
+		cap_core_count: int = 0,  # 0 means cap at num cores - 1
+	):
+		self.model_name_pairs = model_name_pairs
+		self.cost_function = cost_function
+		self.num_runs = num_runs
+		self.n_c = n_c
+		self.n_s = n_s
+		self.m_max = m_max
+		self.target_cost = target_cost  # This is not optional here!
+
+		self.num_dmc_gens = num_initial_dmc_gens
+
+		self.level_0_seed_seed = level_0_seed_seed
+		self.mcmc_seed_seed = mcmc_seed_seed
+
+		self.use_adaptive_steps = use_adaptive_steps
+		self.default_phi_step = default_phi_step
+		self.default_theta_step = default_theta_step
+		self.default_r_step = default_r_step
+		self.default_w_log_step = default_w_log_step
+		self.default_upper_w_log_step = default_upper_w_log_step
+		self.initial_cost_chunk_size = initial_cost_chunk_size
+		self.cap_core_count = cap_core_count
+
+	def execute(self) -> Sequence[MultiSubsetSimulationResult]:
+		output: List[MultiSubsetSimulationResult] = []
+		for model_index, model_name_pair in enumerate(self.model_name_pairs):
+			ss_results = [
+				SubsetSimulation(
+					model_name_pair,
+					self.cost_function,
+					self.n_c,
+					self.n_s,
+					self.m_max,
+					self.target_cost,
+					num_initial_dmc_gens=self.num_dmc_gens,
+					level_0_seed=[model_index, run_index, self.level_0_seed_seed],
+					mcmc_seed=[model_index, run_index, self.mcmc_seed_seed],
+					use_adaptive_steps=self.use_adaptive_steps,
+					default_phi_step=self.default_phi_step,
+					default_theta_step=self.default_theta_step,
+					default_r_step=self.default_r_step,
+					default_w_log_step=self.default_w_log_step,
+					default_upper_w_log_step=self.default_upper_w_log_step,
+					keep_probs_list=False,
+					dump_last_generation_to_file=False,
+					initial_cost_chunk_size=self.initial_cost_chunk_size,
+					cap_core_count=self.cap_core_count,
+				).execute()
+				for run_index in range(self.num_runs)
+			]
+			output.append(coalesce_ss_results(model_name_pair[0], ss_results))
+		return output
+
+
+def coalesce_ss_results(
+	model_name: str, results: Sequence[SubsetSimulationResult]
+) -> MultiSubsetSimulationResult:
+
+	num_finished = sum(1 for res in results if res.under_target_likelihood is not None)
+
+	estimated_likelihoods = numpy.array(
+		[
+			res.under_target_likelihood
+			if res.under_target_likelihood is not None
+			else res.lowest_likelihood
+			for res in results
+		]
+	)
+
+	_logger.info(estimated_likelihoods)
+	geometric_mean_estimated_likelihoods = numpy.exp(
+		numpy.log(estimated_likelihoods).mean()
+	)
+	_logger.info(geometric_mean_estimated_likelihoods)
+	arithmetic_mean_estimated_likelihoods = estimated_likelihoods.mean()
+
+	result = MultiSubsetSimulationResult(
+		child_results=results,
+		model_name=model_name,
+		estimated_likelihood=geometric_mean_estimated_likelihoods,
+		arithmetic_mean_estimated_likelihood=arithmetic_mean_estimated_likelihoods,
+		num_children=len(results),
+		num_finished_children=num_finished,
+		clean_estimate=num_finished == len(results),
+	)
+	return result
+
+
 def reverse_bisect_right(a, x, lo=0, hi=None):
 	"""Return the index where to insert item x in list a, assuming a is sorted in descending order.

--- a/poetry.lock
+++ b/poetry.lock
@ -786,13 +786,13 @@ files = [

 [[package]]
 name = "pdme"
-version = "1.0.0"
+version = "1.5.0"
 description = "Python dipole model evaluator"
 optional = false
 python-versions = "<3.10,>=3.8.1"
 files = [
-    {file = "pdme-1.0.0-py3-none-any.whl", hash = "sha256:8fb8d1bf3d88f73118da5731332ae00c721b98daf53b225069e422af1a1a67f2"},
-    {file = "pdme-1.0.0.tar.gz", hash = "sha256:02cabf2e6fc2ddaf0871d0b3afcf265bca16520ee7bc1c74672be62f7a8390bd"},
+    {file = "pdme-1.5.0-py3-none-any.whl", hash = "sha256:1b4fa30ba98a336957b3029563552d73286a3a5f932809ac1330e65a1f61c363"},
+    {file = "pdme-1.5.0.tar.gz", hash = "sha256:cc0ac4ffab2994e08b4efde2991c6d9dccb2942c7e33c4be3b52e068366526d1"},
 ]

 [package.dependencies]
@ -1275,4 +1275,4 @@ testing = ["big-O", "jaraco.functools", "jaraco.itertools", "more-itertools", "p
 [metadata]
 lock-version = "2.0"
 python-versions = ">=3.8.1,<3.10"
-content-hash = "a28054e255cbd49396795127380c2b7a0cfd742b15cba2184322f3c4894ed041"
+content-hash = "85114054176aa164964acea6fdc085581ee7fc2f94c1cd03ad77611b82e52c79"
--- a/pyproject.toml
+++ b/pyproject.toml
@ -1,12 +1,12 @@
 [tool.poetry]
 name = "deepdog"
-version = "1.0.1"
+version = "1.7.0"
 description = ""
 authors = ["Deepak Mallubhotla <dmallubhotla+github@gmail.com>"]

 [tool.poetry.dependencies]
 python = ">=3.8.1,<3.10"
-pdme = "^1.0.0"
+pdme = "^1.5.0"
 numpy = "1.22.3"
 scipy = "1.10"
 tqdm = "^4.66.2"
@ -22,6 +22,7 @@ syrupy = "^4.0.8"

 [tool.poetry.scripts]
 probs = "deepdog.cli.probs:wrapped_main"
+subset_sim_probs = "deepdog.cli.subset_sim_probs:wrapped_main"

 [build-system]
 requires = ["poetry-core>=1.0.0"]
--- a/tests/direct_monte_carlo/__init__.py
+++ b/tests/direct_monte_carlo/__init__.py
--- a/tests/direct_monte_carlo/test_config_filename.py
+++ b/tests/direct_monte_carlo/test_config_filename.py
@ -0,0 +1,26 @@
+import re
+import deepdog.direct_monte_carlo
+
+
+def test_config_check_self():
+	config = deepdog.direct_monte_carlo.DirectMonteCarloConfig(
+		tag="test_tag",
+		bayesrun_file_timestamp=False,
+	)
+	expected_filename = "test_tag.realdata.fast_filter.bayesrun.csv"
+	actual_filename = config.get_filename()
+	assert actual_filename == expected_filename
+	regex = config.get_filename_regex()
+	assert re.match(regex, actual_filename) is not None
+
+
+def test_config_check_self_with_timestamp():
+	config = deepdog.direct_monte_carlo.DirectMonteCarloConfig(
+		tag="test_tag",
+		bayesrun_file_timestamp=True,
+	)
+	expected_filename_ending = "test_tag.realdata.fast_filter.bayesrun.csv"
+	actual_filename = config.get_filename()
+	assert actual_filename.endswith(expected_filename_ending)
+	regex = config.get_filename_regex()
+	assert re.match(regex, actual_filename) is not None
--- a/tests/direct_monte_carlo/test_cost_function_filter.py
+++ b/tests/direct_monte_carlo/test_cost_function_filter.py
@ -0,0 +1,42 @@
+import deepdog.direct_monte_carlo.cost_function_filter
+import numpy
+
+
+def test_px_cost_function_filter_example():
+
+	dipoles_1 = [
+		[1, 2, 3, 4, 5, 6, 7],
+		[2, 3, 2, 5, 4, 7, 6],
+	]
+
+	dipoles_2 = [
+		[15, 9, 8, 7, 6, 5, 3],
+		[30, 4, 4, 7, 3, 1, 4],
+	]
+
+	dipoleses = numpy.array([dipoles_1, dipoles_2])
+
+	def cost_function(dipoleses: numpy.ndarray) -> numpy.ndarray:
+		return dipoleses[:, :, 0].max(axis=-1)
+
+	expected_costs = numpy.array([2, 30])
+
+	numpy.testing.assert_array_equal(cost_function(dipoleses), expected_costs)
+
+	filter = deepdog.direct_monte_carlo.cost_function_filter.CostFunctionTargetFilter(
+		cost_function, 5
+	)
+
+	actual_filtered = filter.filter_samples(dipoleses)
+	expected_filtered = numpy.array([dipoles_1])
+	assert actual_filtered.size != 0
+	numpy.testing.assert_array_equal(actual_filtered, expected_filtered)
+
+	filter_stricter = (
+		deepdog.direct_monte_carlo.cost_function_filter.CostFunctionTargetFilter(
+			cost_function, 0.5
+		)
+	)
+
+	actual_filtered_stricter = filter_stricter.filter_samples(dipoleses)
+	assert actual_filtered_stricter.size == 0
--- a/tests/direct_monte_carlo/test_eletric_field_x_dmc_filter.py
+++ b/tests/direct_monte_carlo/test_eletric_field_x_dmc_filter.py
@ -0,0 +1,137 @@
+import pdme.measurement
+import pdme.measurement.input_types
+from pdme.model import (
+	LogSpacedRandomCountMultipleDipoleFixedMagnitudeModel,
+	LogSpacedRandomCountMultipleDipoleFixedMagnitudeXYModel,
+	LogSpacedRandomCountMultipleDipoleFixedMagnitudeFixedOrientationModel,
+)
+import deepdog.direct_monte_carlo.dmc_filters
+import numpy.random
+import numpy.testing
+import logging
+
+_logger = logging.getLogger(__name__)
+
+
+def fixed_z_model_func(
+	xmin,
+	xmax,
+	ymin,
+	ymax,
+	zmin,
+	zmax,
+	wexp_min,
+	wexp_max,
+	pfixed,
+	n_max,
+	prob_occupancy,
+):
+	return LogSpacedRandomCountMultipleDipoleFixedMagnitudeFixedOrientationModel(
+		xmin,
+		xmax,
+		ymin,
+		ymax,
+		zmin,
+		zmax,
+		wexp_min,
+		wexp_max,
+		pfixed,
+		0,
+		0,
+		n_max,
+		prob_occupancy,
+	)
+
+
+def get_model(orientation):
+	model_funcs = {
+		"fixedz": fixed_z_model_func,
+		"free": LogSpacedRandomCountMultipleDipoleFixedMagnitudeModel,
+		"fixedxy": LogSpacedRandomCountMultipleDipoleFixedMagnitudeXYModel,
+	}
+	model = model_funcs[orientation](
+		-10,
+		10,
+		-17.5,
+		17.5,
+		5,
+		7.5,
+		-5,
+		6.5,
+		10**3,
+		2,
+		0.99999999,
+	)
+	model.n = 2
+	model.rng = numpy.random.default_rng(1234)
+
+	return (
+		f"connors_geom-5height-orientation_{orientation}-pfixexp_{3}-dipole_count_{2}",
+		model,
+	)
+
+
+def test_electric_field_x_dmc_filter():
+
+	dipoles_raw = [
+		[(1, 2, 3), (4, 5, 6), 1],
+		[(-1, 5, 2), (6, 5, 4), 10],
+	]
+	dipoles = [
+		pdme.measurement.OscillatingDipole(numpy.array(d[0]), numpy.array(d[1]), d[2])
+		for d in dipoles_raw
+	]
+
+	_logger.debug(f"dipoles: {dipoles}")
+	dot_inputs_raw = [
+		([-1, -1, 0], 1),
+		([-1, -1, 0], 2),
+		([-1, -1, 0], 3),
+		([-1, -1, 0], 4),
+	]
+	dot_inputs_array = pdme.measurement.input_types.dot_inputs_to_array(dot_inputs_raw)
+	_logger.debug(f"dot_inputs_array: {dot_inputs_array}")
+
+	arrangement = pdme.measurement.OscillatingDipoleArrangement(dipoles)
+	measurements = []
+	for input in dot_inputs_raw:
+		ex = sum(
+			[
+				dipole.s_electric_fieldx_at_position(*input)
+				for dipole in arrangement.dipoles
+			]
+		)
+		ex_low = ex * 0.5
+		ex_high = ex * 1.5
+		meas = pdme.measurement.DotRangeMeasurement(ex_low, ex_high, input[0], input[1])
+		measurements.append(meas)
+
+	filter = deepdog.direct_monte_carlo.dmc_filters.SingleDotSpinQubitFrequencyFilter(
+		measurements
+	)
+
+	samples = numpy.array(
+		[
+			[
+				[1, 2, 3, 4, 5, 6, 1],
+				[-1, 5, 2, 6, 5, 4, 10],
+			],
+			[
+				[10, 20, 30, 40, 50, 60, 1],
+				[-1, 5, 2, 6, 5, 4, 1],
+			],
+			[
+				[1, 1, 1, 1, 1, 1, 1],
+				[2, 2, 2, 2, 2, 2, 1],
+			],
+		]
+	)
+
+	expected = samples[
+		0:1
+	]  # expect only the first sample: it matches the dipoles that generated the measurements
+	filtered = filter.filter_samples(samples)
+	assert len(filtered) != len(samples), "Should have filtered some out!"
+	numpy.testing.assert_array_equal(
+		filtered, expected, "The filter should have only returned the first one"
+	)
--- a/tests/indexify/test_indexify.py
+++ b/tests/indexify/test_indexify.py
@@ -10,3 +10,12 @@ def test_indexifier():
 	_logger.debug(f"setting up indexifier {indexifier}")
 	assert indexifier.indexify(0) == {"key_1": 1, "key_2": "a"}
 	assert indexifier.indexify(5) == {"key_1": 2, "key_2": "c"}
+	assert len(indexifier) == 9
+
+
+def test_indexifier_length_short():
+	weight_dict = {"key_1": [1, 2, 3], "key_2": ["b", "c"]}
+	indexifier = deepdog.indexify.Indexifier(weight_dict)
+	_logger.debug(f"setting up indexifier {indexifier}")
+
+	assert len(indexifier) == 6
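The two length assertions in this diff are consistent with an indexifier that enumerates the Cartesian product of the value lists, so its length is the product of the list lengths (3 * 2 = 6 in the short test). A minimal sketch of that idea follows. `SketchIndexifier` is a hypothetical stand-in, not the `deepdog.indexify.Indexifier` implementation, and the 3-by-3 dictionary is an assumption chosen to agree with the assertions in `test_indexifier`.

```python
import itertools
import math


class SketchIndexifier:
    """Hypothetical stand-in: maps a flat index to one combination of values."""

    def __init__(self, list_dict):
        self.list_dict = list_dict
        # Precompute every combination once; the last key varies fastest.
        self._combos = list(itertools.product(*list_dict.values()))

    def __len__(self):
        # The number of combinations is the product of the list lengths.
        return math.prod(len(values) for values in self.list_dict.values())

    def indexify(self, n):
        # Pair each key with the value chosen for it in combination n.
        return dict(zip(self.list_dict.keys(), self._combos[n]))


short = SketchIndexifier({"key_1": [1, 2, 3], "key_2": ["b", "c"]})
# Assumed dictionary: the one used by test_indexifier is not shown in this hunk.
square = SketchIndexifier({"key_1": [1, 2, 3], "key_2": ["a", "b", "c"]})
```

Computing the length from the list lengths avoids depending on the precomputed combinations, which keeps `len()` cheap even if the enumeration is later made lazy.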
--- a/tests/results/test_column_results.py
+++ b/tests/results/test_column_results.py
@@ -1,4 +1,4 @@
-import deepdog.results
+import deepdog.results.read_csv


 def test_parse_groupdict():
@@ -6,8 +6,9 @@ def test_parse_groupdict():
 		"geom_-20_20_-10_10_0_5-orientation_free-dipole_count_100_success"
 	)

-	parsed = deepdog.results._parse_bayesrun_column(example_column_name)
-	expected = deepdog.results.BayesrunColumnParsed(
+	parsed = deepdog.results.read_csv._parse_bayesrun_column(example_column_name)
+	assert parsed is not None
+	expected = deepdog.results.read_csv.BayesrunColumnParsed(
 		{
 			"xmin": "-20",
 			"xmax": "20",
@@ -23,6 +24,52 @@ def test_parse_groupdict():
 	assert parsed == expected


+def test_parse_groupdict_with_magnitude():
+	example_column_name = (
+		"geom_-20_20_-10_10_0_5-magnitude_3.5-orientation_free-dipole_count_100_success"
+	)
+
+	parsed = deepdog.results.read_csv._parse_bayesrun_column(example_column_name)
+	assert parsed is not None
+	expected = deepdog.results.read_csv.BayesrunColumnParsed(
+		{
+			"xmin": "-20",
+			"xmax": "20",
+			"ymin": "-10",
+			"ymax": "10",
+			"zmin": "0",
+			"zmax": "5",
+			"orientation": "free",
+			"avg_filled": "100",
+			"log_magnitude": "3.5",
+			"field_name": "success",
+		}
+	)
+	assert parsed == expected
+
+
+def test_parse_groupdict_with_negative_magnitude():
+	example_column_name = "geom_-20_20_-10_10_0_5-magnitude_-3.5-orientation_free-dipole_count_100_success"
+
+	parsed = deepdog.results.read_csv._parse_bayesrun_column(example_column_name)
+	assert parsed is not None
+	expected = deepdog.results.read_csv.BayesrunColumnParsed(
+		{
+			"xmin": "-20",
+			"xmax": "20",
+			"ymin": "-10",
+			"ymax": "10",
+			"zmin": "0",
+			"zmax": "5",
+			"orientation": "free",
+			"avg_filled": "100",
+			"log_magnitude": "-3.5",
+			"field_name": "success",
+		}
+	)
+	assert parsed == expected
+
+
 # def test_parse_no_match_column_name():
 # 	parsed = deepdog.results.parse_bayesrun_column("There's nothing here")
 # 	assert parsed is None
--- a/tests/results/test_parse_filename.py
+++ b/tests/results/test_parse_filename.py
@@ -0,0 +1,19 @@
+import deepdog.results
+import pytest
+
+
+def test_parse_bayesrun_filename():
+	valid1 = "20250226-204120-dot1-dot1-2-0.realdata.fast_filter.bayesrun.csv"
+
+	timestamp, slug = deepdog.results._parse_string_output_filename(valid1)
+	assert timestamp == "20250226-204120"
+	assert slug == "dot1-dot1-2-0"
+
+	valid2 = "dot1-dot1-2-0.realdata.fast_filter.bayesrun.csv"
+
+	timestamp, slug = deepdog.results._parse_string_output_filename(valid2)
+	assert timestamp is None
+	assert slug == "dot1-dot1-2-0"
+
+	with pytest.raises(ValueError):
+		deepdog.results._parse_string_output_filename("not_a_valid_filename")
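The test above exercises three cases: a filename with a timestamp prefix, one without, and one that does not parse at all. One way to get that behaviour is a single regular expression with an optional timestamp group, sketched below. The helper name `parse_output_filename` and the exact pattern are assumptions for illustration; the actual `deepdog.results._parse_string_output_filename` may be written differently.

```python
import re

# Assumed layout: optional "YYYYMMDD-HHMMSS-" prefix, then a slug containing
# no dots, then any dotted tags, ending in "bayesrun.csv".
_FILENAME_RE = re.compile(
    r"(?:(?P<timestamp>\d{8}-\d{6})-)?(?P<slug>[^.]+)\.(?:.*\.)?bayesrun\.csv"
)


def parse_output_filename(filename):
    """Hypothetical helper: return (timestamp or None, slug), else raise ValueError."""
    match = _FILENAME_RE.fullmatch(filename)
    if match is None:
        raise ValueError(f"could not parse {filename!r} as a bayesrun filename")
    # group("timestamp") is None when the optional prefix did not match.
    return match.group("timestamp"), match.group("slug")
```

Making the timestamp an optional group keeps a single code path for both filename styles, and raising `ValueError` on a failed match mirrors the `pytest.raises(ValueError)` expectation in the test.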
--- a/tests/subset_simulation/snapshots/test_subset_simulation_coalescing.ambr
+++ b/tests/subset_simulation/snapshots/test_subset_simulation_coalescing.ambr
@@ -0,0 +1,10 @@
+# serializer version: 1
+# name: test_subset_simulation_multi_result_coalescing_easy_arithmetic
+  MultiSubsetSimulationResult(child_results=[SubsetSimulationResult(probs_list=(), over_target_cost=1, over_target_likelihood=1, under_target_cost=0.99, under_target_likelihood=0.8, lowest_likelihood=0.5, messages=[]), SubsetSimulationResult(probs_list=(), over_target_cost=1, over_target_likelihood=1, under_target_cost=0.99, under_target_likelihood=0.6, lowest_likelihood=0.01, messages=[])], model_name='test', estimated_likelihood=0.6928203230275509, arithmetic_mean_estimated_likelihood=0.7, num_children=2, num_finished_children=2, clean_estimate=True)
+# ---
+# name: test_subset_simulation_multi_result_coalescing_easy_geometric
+  MultiSubsetSimulationResult(child_results=[SubsetSimulationResult(probs_list=(), over_target_cost=1, over_target_likelihood=1, under_target_cost=0.99, under_target_likelihood=0.1, lowest_likelihood=0.5, messages=[]), SubsetSimulationResult(probs_list=(), over_target_cost=1, over_target_likelihood=1, under_target_cost=0.99, under_target_likelihood=0.001, lowest_likelihood=0.01, messages=[])], model_name='test', estimated_likelihood=0.010000000000000004, arithmetic_mean_estimated_likelihood=0.0505, num_children=2, num_finished_children=2, clean_estimate=True)
+# ---
+# name: test_subset_simulation_multi_result_coalescing_include_dirty
+  MultiSubsetSimulationResult(child_results=[SubsetSimulationResult(probs_list=(), over_target_cost=1, over_target_likelihood=1, under_target_cost=0.99, under_target_likelihood=0.8, lowest_likelihood=0.5, messages=[]), SubsetSimulationResult(probs_list=(), over_target_cost=1, over_target_likelihood=1, under_target_cost=0.99, under_target_likelihood=0.08, lowest_likelihood=0.01, messages=[]), SubsetSimulationResult(probs_list=(), over_target_cost=None, over_target_likelihood=None, under_target_cost=None, under_target_likelihood=None, lowest_likelihood=0.0001, messages=[])], model_name='test', estimated_likelihood=0.01856635533445112, arithmetic_mean_estimated_likelihood=0.29336666666666666, num_children=3, num_finished_children=2, clean_estimate=False)
+# ---
--- a/tests/subset_simulation/test_subset_simulation_coalescing.py
+++ b/tests/subset_simulation/test_subset_simulation_coalescing.py
@@ -0,0 +1,92 @@
+import deepdog.subset_simulation.subset_simulation_impl as impl
+import numpy
+
+
+def test_subset_simulation_multi_result_coalescing_include_dirty(snapshot):
+	res1 = impl.SubsetSimulationResult(
+		probs_list=(),
+		over_target_cost=1,
+		over_target_likelihood=1,
+		under_target_cost=0.99,
+		under_target_likelihood=0.8,
+		lowest_likelihood=0.5,
+		messages=[],
+	)
+
+	res2 = impl.SubsetSimulationResult(
+		probs_list=(),
+		over_target_cost=1,
+		over_target_likelihood=1,
+		under_target_cost=0.99,
+		under_target_likelihood=0.08,
+		lowest_likelihood=0.01,
+		messages=[],
+	)
+
+	res3 = impl.SubsetSimulationResult(
+		probs_list=(),
+		over_target_cost=None,
+		over_target_likelihood=None,
+		under_target_cost=None,
+		under_target_likelihood=None,
+		lowest_likelihood=0.0001,
+		messages=[],
+	)
+
+	combined = impl.coalesce_ss_results("test", [res1, res2, res3])
+
+	assert combined == snapshot
+
+
+def test_subset_simulation_multi_result_coalescing_easy_arithmetic(snapshot):
+	res1 = impl.SubsetSimulationResult(
+		probs_list=(),
+		over_target_cost=1,
+		over_target_likelihood=1,
+		under_target_cost=0.99,
+		under_target_likelihood=0.8,
+		lowest_likelihood=0.5,
+		messages=[],
+	)
+
+	res2 = impl.SubsetSimulationResult(
+		probs_list=(),
+		over_target_cost=1,
+		over_target_likelihood=1,
+		under_target_cost=0.99,
+		under_target_likelihood=0.6,
+		lowest_likelihood=0.01,
+		messages=[],
+	)
+
+	combined = impl.coalesce_ss_results("test", [res1, res2])
+
+	assert combined.arithmetic_mean_estimated_likelihood == 0.7
+	assert combined == snapshot
+
+
+def test_subset_simulation_multi_result_coalescing_easy_geometric(snapshot):
+	res1 = impl.SubsetSimulationResult(
+		probs_list=(),
+		over_target_cost=1,
+		over_target_likelihood=1,
+		under_target_cost=0.99,
+		under_target_likelihood=0.1,
+		lowest_likelihood=0.5,
+		messages=[],
+	)
+
+	res2 = impl.SubsetSimulationResult(
+		probs_list=(),
+		over_target_cost=1,
+		over_target_likelihood=1,
+		under_target_cost=0.99,
+		under_target_likelihood=0.001,
+		lowest_likelihood=0.01,
+		messages=[],
+	)
+
+	combined = impl.coalesce_ss_results("test", [res1, res2])
+
+	numpy.testing.assert_allclose(combined.estimated_likelihood, 0.01)
+	assert combined == snapshot
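The snapshot values for these tests can be reproduced by hand: `estimated_likelihood` agrees with the geometric mean of the per-child likelihoods and `arithmetic_mean_estimated_likelihood` with their arithmetic mean. The sketch below checks that arithmetic. It rests on an inference from the snapshot rather than on the source of `coalesce_ss_results`: a child whose `under_target_likelihood` is `None` appears to contribute its `lowest_likelihood` instead. All function names here are hypothetical.

```python
import math


def child_likelihood(under_target_likelihood, lowest_likelihood):
    # Assumption inferred from the snapshot: an unfinished child (None)
    # falls back to its lowest_likelihood.
    if under_target_likelihood is not None:
        return under_target_likelihood
    return lowest_likelihood


def geometric_mean(values):
    # exp(mean(log(v))) equals the n-th root of the product of the values.
    return math.exp(sum(math.log(v) for v in values) / len(values))


def arithmetic_mean(values):
    return sum(values) / len(values)


# Inputs copied from the "easy_geometric" and "include_dirty" tests.
easy = [child_likelihood(0.1, 0.5), child_likelihood(0.001, 0.01)]
dirty = [
    child_likelihood(0.8, 0.5),
    child_likelihood(0.08, 0.01),
    child_likelihood(None, 0.0001),
]
```

The geometric mean is dominated by the smallest likelihood, which is why the `easy` pair gives roughly 0.01 while its arithmetic mean is 0.0505. Comparing with a tolerance, as the test does with `numpy.testing.assert_allclose`, avoids spurious failures from floating-point rounding.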
--- a/tests/test_bayes_run_with_ss.py
+++ b/tests/test_bayes_run_with_ss.py
@@ -1,158 +0,0 @@
-import deepdog
-import logging
-import logging.config
-
-import numpy.random
-
-from pdme.model import (
-	LogSpacedRandomCountMultipleDipoleFixedMagnitudeModel,
-	LogSpacedRandomCountMultipleDipoleFixedMagnitudeXYModel,
-	LogSpacedRandomCountMultipleDipoleFixedMagnitudeFixedOrientationModel,
-)
-
-
-_logger = logging.getLogger(__name__)
-
-
-def fixed_z_model_func(
-	xmin,
-	xmax,
-	ymin,
-	ymax,
-	zmin,
-	zmax,
-	wexp_min,
-	wexp_max,
-	pfixed,
-	n_max,
-	prob_occupancy,
-):
-	return LogSpacedRandomCountMultipleDipoleFixedMagnitudeFixedOrientationModel(
-		xmin,
-		xmax,
-		ymin,
-		ymax,
-		zmin,
-		zmax,
-		wexp_min,
-		wexp_max,
-		pfixed,
-		0,
-		0,
-		n_max,
-		prob_occupancy,
-	)
-
-
-def get_model(orientation):
-	model_funcs = {
-		"fixedz": fixed_z_model_func,
-		"free": LogSpacedRandomCountMultipleDipoleFixedMagnitudeModel,
-		"fixedxy": LogSpacedRandomCountMultipleDipoleFixedMagnitudeXYModel,
-	}
-	model = model_funcs[orientation](
-		-10,
-		10,
-		-17.5,
-		17.5,
-		5,
-		7.5,
-		-5,
-		6.5,
-		10**3,
-		2,
-		0.99999999,
-	)
-	model.n = 2
-	model.rng = numpy.random.default_rng(1234)
-
-	return (
-		f"connors_geom-5height-orientation_{orientation}-pfixexp_{3}-dipole_count_{2}",
-		model,
-	)
-
-
-def test_basic_analysis(snapshot):
-
-	dot_positions = [[0, 0, 0], [0, 1, 0]]
-
-	freqs = [1, 10, 100]
-	models = []
-
-	orientations = ["free", "fixedxy", "fixedz"]
-	for orientation in orientations:
-		models.append(get_model(orientation))
-
-	_logger.info(f"have {len(models)} models to look at")
-	if len(models) == 1:
-		_logger.info(f"only one model, name: {models[0][0]}")
-
-	square_run = deepdog.BayesRunWithSubspaceSimulation(
-		dot_positions,
-		freqs,
-		models,
-		models[0][1],
-		filename_slug="test",
-		end_threshold=0.9,
-		ss_n_c=5,
-		ss_n_s=2,
-		ss_m_max=10,
-		ss_target_cost=150,
-		ss_level_0_seed=200,
-		ss_mcmc_seed=20,
-		ss_use_adaptive_steps=True,
-		ss_default_phi_step=0.01,
-		ss_default_theta_step=0.01,
-		ss_default_r_step=0.01,
-		ss_default_w_log_step=0.01,
-		ss_default_upper_w_log_step=4,
-		ss_dump_last_generation=False,
-		write_output_to_bayesruncsv=False,
-		ss_initial_costs_chunk_size=1000,
-	)
-	result = square_run.go()
-
-	assert result == snapshot
-
-
-def test_bayesss_with_tighter_cost(snapshot):
-
-	dot_positions = [[0, 0, 0], [0, 1, 0]]
-
-	freqs = [1, 10, 100]
-	models = []
-
-	orientations = ["free", "fixedxy", "fixedz"]
-	for orientation in orientations:
-		models.append(get_model(orientation))
-
-	_logger.info(f"have {len(models)} models to look at")
-	if len(models) == 1:
-		_logger.info(f"only one model, name: {models[0][0]}")
-
-	square_run = deepdog.BayesRunWithSubspaceSimulation(
-		dot_positions,
-		freqs,
-		models,
-		models[0][1],
-		filename_slug="test",
-		end_threshold=0.9,
-		ss_n_c=5,
-		ss_n_s=2,
-		ss_m_max=10,
-		ss_target_cost=1.5,
-		ss_level_0_seed=200,
-		ss_mcmc_seed=20,
-		ss_use_adaptive_steps=True,
-		ss_default_phi_step=0.01,
-		ss_default_theta_step=0.01,
-		ss_default_r_step=0.01,
-		ss_default_w_log_step=0.01,
-		ss_default_upper_w_log_step=4,
-		ss_dump_last_generation=False,
-		write_output_to_bayesruncsv=False,
-		ss_initial_costs_chunk_size=1,
-	)
-	result = square_run.go()
-
-	assert result == snapshot
Author	SHA1	Message	Date
Author	SHA1	Message	Checks	Date
Deepak Mallubhotla	71dc906a96	chore(release): 1.7.0	passed (head, tag)	2025-02-26 21:57:13 -06:00
Deepak Mallubhotla	24c6e311c1	feat: adds configurable skip if file exists	passed (head)	2025-02-26 21:55:12 -06:00
Deepak Mallubhotla	4dd3004a7b	chore(release): 1.6.0	passed (head, tag)	2025-02-26 21:08:00 -06:00
Deepak Mallubhotla	46f6b6cdf1	feat: Adds ability to parse bayesruns without timestamps	passed (head)	2025-02-26 21:01:19 -06:00
Deepak Mallubhotla	c8435b4b2a	feat: allows negative log magnitude strings in models	passed (head)	2025-02-24 08:34:11 -06:00
Deepak Mallubhotla	c2375e6f5c	chore(release): 1.5.0	passed (head, tag)	2024-12-29 21:23:30 -06:00
Deepak Mallubhotla	a1b59cd18b	feat: add configurable max number of dipoles to write	passed (head)	2024-12-29 21:14:59 -06:00
Deepak Mallubhotla	53f8993f2b	feat: add configurable max number of dipoles to write	-	2024-12-29 21:13:34 -06:00
Deepak Mallubhotla	700f32ea58	chore(release): 1.4.0	passed (head, tag)	2024-09-04 13:58:56 -05:00
Deepak Mallubhotla	3737252c4b	log: adds additional logging of dipole count	passed (head)	2024-09-04 13:56:09 -05:00
Deepak Mallubhotla	6f79a49e59	log: adds additional logging of dipole count	passed (head)	2024-09-04 13:54:50 -05:00
Deepak Mallubhotla	d962ecb11e	feat: indexifier now has len	-	2024-08-26 03:34:57 -05:00
Deepak Mallubhotla	7beca501bf	fmt: ran formatter	-	2024-08-26 03:34:50 -05:00
Deepak Mallubhotla	5425ce1362	feat: allows some betetr matching for single_dipole runs	-	2024-08-26 03:31:15 -05:00
Deepak Mallubhotla	6a5c5931d4	fix: update log file arg names in cli scripts	-	2024-05-21 16:10:02 -05:00
Deepak Mallubhotla	36ff75576c	chore: removes redundant import	passed (head)	2024-05-21 15:55:25 -05:00
Deepak Mallubhotla	e76c619c8b	fmt: formatting changes	-	2024-05-21 15:54:55 -05:00
Deepak Mallubhotla	c881da2837	feat: add subset sim probs command for bayes for subset simulation results	failed (head)	2024-05-21 15:54:08 -05:00
Deepak Mallubhotla	1a1ecc01ea	chore: adds vscode to gitignore	-	2024-05-21 15:53:21 -05:00
Deepak Mallubhotla	9cfd484d7c	chore(release): 1.3.0	passed (tag, head)	2024-05-19 22:13:39 -05:00
Deepak Mallubhotla	09fad2e102	feat: improve initial cost calculation to allow multiprocessing, adds ability to specify a number of levels to do with direct mc instead of subset simulation	-	2024-05-19 22:11:50 -05:00
Deepak Mallubhotla	24ac65bf9c	fix: fix seeding to avoid recreating seed combinations across multi runs	-	2024-05-19 22:10:40 -05:00
Deepak Mallubhotla	8fbae32111	doc: some commenting and logging changes	-	2024-05-19 22:09:52 -05:00
Deepak Mallubhotla	b1c01b25c8	fix: Adds ugly hack for stdevs for this uniform range to multiply by root3, proper fix would be in pdme	-	2024-05-19 22:08:44 -05:00
Deepak Mallubhotla	a14d9834e5	doc: note on refactoring for subset sim probs	-	2024-05-19 22:01:42 -05:00
Deepak Mallubhotla	8d04803eb3	fmt: formatting, nicer log, removing comment	-	2024-05-19 02:29:59 -05:00
Deepak Mallubhotla	92b49fce7c	feat: add multi run to wrap multi model and repeat runs	-	2024-05-19 02:27:11 -05:00
Deepak Mallubhotla	8845b2875f	feat: adds a filter that works with cost functions	-	2024-05-19 02:26:00 -05:00
Deepak Mallubhotla	72791f2d0f	deps: update pdme	-	2024-05-19 02:25:29 -05:00
Deepak Mallubhotla	d258cfbec7	chore(release): 1.2.1	passed (head, tag)	2024-05-11 20:51:05 -05:00
Deepak Mallubhotla	b3bf4cde97	perf: precompile the magic regexes for probs parsing	-	2024-05-11 20:49:45 -05:00
Deepak Mallubhotla	60f29b0b2f	perf: avoid recalculating product dict in indexifier to improve performance for probs	-	2024-05-11 20:49:26 -05:00
Deepak Mallubhotla	093a3fb5c4	chore(release): 1.2.0	passed (head, tag)	2024-05-08 22:24:28 -05:00
Deepak Mallubhotla	dc1d2d45a3	feat: adds additional matching regexes	passed (head)	2024-05-08 22:23:57 -05:00
Deepak Mallubhotla	f0e2fa3da9	feat: adds magnitude enabled parsing option	-	2024-05-03 10:44:06 -05:00
Deepak Mallubhotla	2581e722e6	chore(release): 1.1.0	passed (head, tag)	2024-05-02 23:13:21 -05:00
Deepak Mallubhotla	62bd63bf9b	refactor: removes redundant calculation and uses pdme	-	2024-05-02 23:12:21 -05:00
Deepak Mallubhotla	df4d0b5d15	deps: upgrades pdme dep	-	2024-05-02 22:40:06 -05:00
Deepak Mallubhotla	5361dada8b	feat: removes legacy bayes run, technically breaking but just don't use them	-	2024-05-02 22:04:49 -05:00
Deepak Mallubhotla	29029c137a	deps: upgrades pdme	-	2024-05-02 18:17:33 -05:00
Deepak Mallubhotla	fb018abeae	feat: allows disabling timestamps in directmc bayesrun files	passed (head)	2024-05-01 21:40:53 -05:00