Ensemble generation#

The rethon package contains several methods and classes to generate RE ensembles using the different implementations.

There are four classes with the following class hierarchy:

  • EnsembleGenerator

    • SimpleEnsembleGenerator

    • GlobalREEnsembleGenerator

    • LocalREEnsembleGenerator

Source: You can download this notebook from here.

Generating ensembles with the EnsembleGenerator#

The ensemble generator provides different iterators and has to be instantiated with

  • a list of dialectical structures in a simplified format (arguments are represented as integer lists),

  • the size of the sentence pool,

  • a list of initial commitments,

  • a list of model-parameters,

  • a list of implementations of Position, DialecticalStructure and ReflectiveEquilibrium and

  • a switch indicating whether all branches of each model run should be created as well.

The iterator will loop through the cartesian product of the given model implementations, dialectical structures, different model parameters, initial commitments and, if desired, through all branches.

Let’s take a look at a simple example. The algorithm will take standard model parameters if we do not specify model parameters and it will take a standard implementation if we do not specify implementations. The simple example will use one dialectical structure and two initial commitments:

[1]:
from rethon import EnsembleGenerator

# the standard example with a sentence pool n=7
n = 7
arguments = [[1, 3],[1, 4],[1, 5],[1, -6], [2, -4],[2, 5],[2, 6],[2, 7]]
dia_structure_list = [arguments]
# two initial commitments
init_coms_1 = {3, 4, 5}
init_coms_2 = {3, 4, 5, 6, 7}
coms_list = [init_coms_1, init_coms_2]

ensemble_gen = EnsembleGenerator(arguments_list = dia_structure_list,
                                 n_sentence_pool = n,
                                 initial_commitments_list = coms_list)
# looping through the ensemble
for model_run in ensemble_gen.ensemble_iter():
    print(f"The model run ended with {model_run.state().last_commitments()} as its final commitments.")

print("New ensemble with all branches:")
# running all branches and looping through them:
ensemble_gen = EnsembleGenerator(arguments_list = dia_structure_list,
                                 n_sentence_pool = n,
                                 initial_commitments_list = coms_list,
                                 create_branches = True)
for model_run in ensemble_gen.ensemble_iter():
    print(f"The model run ended with {model_run.state().last_commitments()} as its final commitments.")
The model run ended with {1, 3, 4, 5, -6, -2} as its final commitments.
The model run ended with {1, 3, 4, 5, -6, -2} as its final commitments.
New ensemble with all branches:
The model run ended with {1, 3, 4, 5, -6, -2} as its final commitments.
The model run ended with {2, 5, 6, 7, -4, -1} as its final commitments.
The model run ended with {1, 3, 4, 5, -6, -2} as its final commitments.

The module provides several helper methods to produce random arguments, initial commitments and a variation of model parameters that can be used to generate more complex ensembles:

[2]:
from rethon import EnsembleGenerator
from tau.util import create_random_argument_list, random_position_as_set, create_random_arguments
from rethon.util import standard_model_params_varied_alphas

# Sentence pool
n=6
# Two implementations:
implementations = [{'tau_module_name': 'tau',
                    'position_class_name':'StandardPosition',
                    'dialectical_structure_class_name': 'DAGDialecticalStructure',
                    'rethon_module_name': 'rethon',
                    'reflective_equilibrium_class_name': 'StandardGlobalReflectiveEquilibrium'},
                   {'tau_module_name': 'tau',
                    'position_class_name':'SetBasedPosition',
                    'dialectical_structure_class_name': 'DAGSetBasedDialecticalStructure',
                    'rethon_module_name': 'rethon',
                    'reflective_equilibrium_class_name': 'GlobalSetBasedReflectiveEquilibrium'
                    }]

# Two random dialectical structures with 7 arguments each
n_dialectical_structures = 2
arguments_list = [create_random_arguments(n_sentences=n,
                                          n_arguments=7,
                                          n_max_premises=2) for i in range(n_dialectical_structures)]

# Standard model parameters with varied alpha weights
model_parameters_list = list(standard_model_params_varied_alphas(alpha_resolution =3))

# Two random initial commitments
n_positions = 2
coms_list = [random_position_as_set(n_sentences = n) for i in range(n_positions)]


ensemble_gen = EnsembleGenerator(arguments_list = arguments_list,
                                 n_sentence_pool = n,
                                 initial_commitments_list = coms_list,
                                 model_parameters_list = model_parameters_list,
                                 implementations = implementations)

model_runs = list(ensemble_gen.ensemble_iter())
print(f"Generated {len(model_runs)} ensembles.")
# looping through the ensemble
for model_run in model_runs:
    print(f"The model run ended with {model_run.state().last_commitments()} as its final commitments.")
Generated 24 ensembles.
The model run ended with {1, 2, 3, 5, -6, -4} as its final commitments.
The model run ended with {-1, -6, -5, -4, -3, -2} as its final commitments.
The model run ended with {1, 3, 5, -6, -4} as its final commitments.
The model run ended with {-1, -6, -5, -4, -3, -2} as its final commitments.
The model run ended with {1, 3, 5, -6, -4} as its final commitments.
The model run ended with {4, -1, -6, -5, -3, -2} as its final commitments.
The model run ended with {2, 3, 4, 5, -6} as its final commitments.
The model run ended with {6, -5, -4, -3, -2} as its final commitments.
The model run ended with {3, -6, 2} as its final commitments.
The model run ended with {6, -5, -4, -3, -2} as its final commitments.
The model run ended with {3, -6, 2, -5} as its final commitments.
The model run ended with {3, -6, -5, -4, -2} as its final commitments.
The model run ended with {1, 2, 3, 5, -6, -4} as its final commitments.
The model run ended with {-1, -6, -5, -4, -3, -2} as its final commitments.
The model run ended with {1, 3, 5, -6, -4} as its final commitments.
The model run ended with {-1, -6, -5, -4, -3, -2} as its final commitments.
The model run ended with {1, 3, 5, -6, -4} as its final commitments.
The model run ended with {4, -2, -6, -5, -3, -1} as its final commitments.
The model run ended with {2, 3, 4, 5, -6} as its final commitments.
The model run ended with {6, -5, -4, -3, -2} as its final commitments.
The model run ended with {-6, 2, 5} as its final commitments.
The model run ended with {3, -6, -5, -4, -2} as its final commitments.
The model run ended with {-6, 2, 3, -5} as its final commitments.
The model run ended with {3, -6, -5, -4, -2} as its final commitments.

Using json serialization to serialize the model runs#

By using the module rethon.util you can serialize the resulting model runs in the json format—for instance, in the following way:

[3]:
from rethon.util import rethon_dump
from os import getcwd, path

# serializing the ensemble JSON into a file
with open(file=path.join(getcwd(), 'ensemble_as_json.json'), mode='w') as output_file:
    rethon_dump({'ensemble': model_runs}, output_file, indent=4)

Generating ensemble data#

The method ensemble_items_iter can be used to generate data of your ensembles by defining data items that are generated for each model run. The method ensemble_items_iter will return a dictionary with these data items for each model run.

Data items can be defined by using the add_item method of EnsembleGenerator. The function expects a key for the data item and a function with EnsembleGenerator as one argument. The idea: The ensemble generator will call this function to generate the data item for each model run. The ensemble generator provides several methods that you can use to define a particular data item.

Let’s have a look at a simple example: We want to define a data item that stores the last commitments:

[4]:
from rethon import EnsembleGenerator

# our standard example with a sentence pool n=7
n = 7
arguments = [[1, 3],[1, 4],[1, 5],[1, -6], [2, -4],[2, 5],[2, 6],[2, 7]]
dia_structure_list = [arguments]
init_coms_1 = {3, 4, 5}
init_coms_2 = {3, 4, 5, 6, 7}
coms_list = [init_coms_1, init_coms_2]

ensemble_gen = EnsembleGenerator(arguments_list = dia_structure_list,
                                 n_sentence_pool = n,
                                 initial_commitments_list = coms_list)

# defining the function
def data_last_coms(ensemble_generator):
    return ensemble_generator.state().last_commitments()
# adding the data item
ensemble_gen.add_item('last_coms', data_last_coms)

# looping through the ensemble using the item_iter
for model_run_data_items in ensemble_gen.ensemble_items_iter():
    print(f"The model run produced the following data items:  {model_run_data_items}.")
The model run produced the following data items:  {'last_coms': {1, 3, 4, 5, -6, -2}}.
The model run produced the following data items:  {'last_coms': {2, 5, 6, 7, -4, -1}}.

You can use ananymous functions to add data fields in a compact way:

[5]:
# adding the data item
ensemble_gen.add_item('last_theory', lambda x: x.state().last_theory())

# looping through the ensemble using the item_iter
for model_run_data_items in ensemble_gen.ensemble_items_iter():
    print(f"The model run produced the following data items:  {model_run_data_items}.")
The model run produced the following data items:  {'last_coms': {1, 3, 4, 5, -6, -2}, 'last_theory': {1}}.
The model run produced the following data items:  {'last_coms': {1, 3, 4, 5, -6, -2}, 'last_theory': {1}}.

In a similar way, you have access to the RE instance itself and the dialectical structure of each model run. Refer to the api docs for more information.

Generating ensemble data with standard data items#

The classes SimpleEnsembleGenerator, GlobalREEnsembleGenerator, LocalREEnsembleGenerator have already a bunch of predefined data fields that can be extended by additional data items if you need them. (For details refer to the api-docs. [LINK NEEDS UPDATE])

  • SimpleEnsembleGenerator: Defines data fields that should work well with every implementation. Depending on whether branches are created, the generator will add fields for all fixed points for each model run.

  • GlobalREEnsembleGenerator: Extends SimpleEnsembleGenerator with additional data items that are available for globally searching RE processes.

  • LocalREEnsembleGenerator: Extends SimpleEnsembleGenerator with additional data items that are available for locally searching RE processes.

[6]:
from rethon import SimpleEnsembleGenerator

ensemble_gen = SimpleEnsembleGenerator(arguments_list = dia_structure_list,
                                 n_sentence_pool = n,
                                 initial_commitments_list = coms_list)
# looping through the ensemble using the item_iter
for model_run_data_items in ensemble_gen.ensemble_items_iter():
    print(f"The model run produced the following data items:  {model_run_data_items}.")
The model run produced the following data items:  {'model_name': 'GlobalNumpyReflectiveEquilibrium', 'ds': [[1, 3], [1, 4], [1, 5], [1, -6], [2, -4], [2, 5], [2, 6], [2, 7]], 'n_sentence_pool': 7, 'ds_arg_size': 8, 'ds_infer_dens': 0.26143928550824114, 'ds_n_consistent_complete_positions': 36, 'ds_mean_prem': 2, 'ds_variance_prem': 0, 'tau_truths': set(), 'principles': [(1, 4), (2, 4)], 'account_penalties': [0.0, 0.3, 1.0, 1.0], 'faithfulness_penalties': [0.0, 0.0, 1.0, 1.0], 'weight_account': 0.35, 'weight_systematicity': 0.55, 'weight_faithfulness': 0.1, 'init_coms': {3, 4, 5}, 'init_coms_size': 3, 'init_coms_n_tau_truths': 0, 'init_coms_n_tau_falsehoods': 0, 'init_coms_n_consistent_complete_positions': 6, 'init_coms_dia_consistent': True, 'init_coms_closed': False, 'fixed_point_coms': {1, 3, 4, 5, -6, -2}, 'fixed_point_coms_size': 6, 'fixed_point_coms_n_tau_truths': 0, 'fixed_point_coms_n_tau_falsehoods': 0, 'fixed_point_coms_closed': True, 'fixed_point_coms_consistent': True, 'fixed_point_coms_n_consistent_complete_positions': 2, 'fixed_point_theory': {1}, 'fixed_point_theory_closure': {1, 3, 4, 5, -6, -2}, 'init_coms_min_ax_bases': [{3, 4, 5}], 'n_init_coms_min_ax_base': 3, 'achievements_evolution': [0, 0.9942142852544784, 1.0, 1.0, 1.0, 1.0], 'fixed_point_dia_consistent': True, 'init_final_coms_simple_hamming': 3, 'init_final_coms_hamming': 3, 'init_final_coms_contradictions': 0, 'init_final_coms_expansions': 3, 'init_final_coms_contractions': 0, 'init_final_coms_identities': 4, 'random_choices': [], 'n_random_choices': 0, 'coms_evolution': [{3, 4, 5}, {1, 3, 4, 5, -6, -2}, {1, 3, 4, 5, -6, -2}], 'theory_evolution': [{1}, {1}, {1}], 'process_length': 6}.
The model run produced the following data items:  {'model_name': 'GlobalNumpyReflectiveEquilibrium', 'ds': [[1, 3], [1, 4], [1, 5], [1, -6], [2, -4], [2, 5], [2, 6], [2, 7]], 'n_sentence_pool': 7, 'ds_arg_size': 8, 'ds_infer_dens': 0.26143928550824114, 'ds_n_consistent_complete_positions': 36, 'ds_mean_prem': 2, 'ds_variance_prem': 0, 'tau_truths': set(), 'principles': [(1, 4), (2, 4)], 'account_penalties': [0.0, 0.3, 1.0, 1.0], 'faithfulness_penalties': [0.0, 0.0, 1.0, 1.0], 'weight_account': 0.35, 'weight_systematicity': 0.55, 'weight_faithfulness': 0.1, 'init_coms': {3, 4, 5, 6, 7}, 'init_coms_size': 5, 'init_coms_n_tau_truths': 0, 'init_coms_n_tau_falsehoods': 0, 'init_coms_n_consistent_complete_positions': 1, 'init_coms_dia_consistent': True, 'init_coms_closed': False, 'fixed_point_coms': {2, 5, 6, 7, -4, -1}, 'fixed_point_coms_size': 6, 'fixed_point_coms_n_tau_truths': 0, 'fixed_point_coms_n_tau_falsehoods': 0, 'fixed_point_coms_closed': True, 'fixed_point_coms_consistent': True, 'fixed_point_coms_n_consistent_complete_positions': 2, 'fixed_point_theory': {2}, 'fixed_point_theory_closure': {2, 5, 6, 7, -4, -1}, 'init_coms_min_ax_bases': [{3, 4, 5, 6, 7}], 'n_init_coms_min_ax_base': 5, 'achievements_evolution': [0, 0.9704897953734105, 0.986734693877551, 0.9908163265306122, 0.9918367346938776, 0.9918367346938776, 0.9918367346938776, 0.9918367346938776], 'fixed_point_dia_consistent': True, 'init_final_coms_simple_hamming': 5, 'init_final_coms_hamming': 4, 'init_final_coms_contradictions': 1, 'init_final_coms_expansions': 2, 'init_final_coms_contractions': 1, 'init_final_coms_identities': 3, 'random_choices': [1], 'n_random_choices': 1, 'coms_evolution': [{3, 4, 5, 6, 7}, {2, 3, 5, 6, 7, -4, -1}, {2, 5, 6, 7, -4, -1}, {2, 5, 6, 7, -4, -1}], 'theory_evolution': [{2, 3}, {2}, {2}, {2}], 'process_length': 8}.

Storing ensemble data as CSV files#

You can save the created data to csv files to analyse them later (e.g. by using pandas):

[7]:
from os import getcwd

ensemble_gen.ensemble_items_to_csv(
                              output_file_name = 'data_output.csv',
                              output_dir_name = getcwd(),
                              archive = True, # save the csv as archived tar.gz
                              save_preliminary_results = True, # will create preliminary csv-data sets
                              preliminary_results_interval = 50)

Complex scenarios of creating new data items#

The ensemble generator provides additional functionalities for defining data items. In particular, you can store objects in the generator that can be reused in the definition of your data items. If you have resource-intensive operations you might prefer to reuse calculated data.

Consider the following example: The algorithm will loop through all provided dialectical structures, initial conditions and model-parameter sets. If you define a data item that will calculate data on the dialectical structure alone, this data will be re-calculated for a particular dialectical structure for every model run that is based on that dialectical structure. In the case that you have 50 initial commitments and 50 different model-parameter sets, the same data is being calculated 2500 times.

Instead, you can calculate this data once for every dialectical structure, store it in the ensemble generator object and use this data in the definition of the data item. You can do this by simply overwriting a function of EnsembleGenerator:

[8]:
from rethon import EnsembleGenerator

class StoreNArgsEnsembleGenerator(EnsembleGenerator):

    # overwriting the method that initializes storing objects based on dialectical structures
    def init_tau_fields(self, dialectical_structure):
        super().init_tau_fields(dialectical_structure)
        # add the number of arguments as an object to the ensemble generator
        self.add_obj('ds_number_of_args', len(dialectical_structure.get_arguments()))

# the standard example with a sentence pool n=7
n = 7
arguments = [[1, 3],[1, 4],[1, 5],[1, -6], [2, -4],[2, 5],[2, 6],[2, 7]]
dia_structure_list = [arguments]
init_coms_1 = {3, 4, 5}
init_coms_2 = {3, 4, 5, 6, 7}
coms_list = [init_coms_1, init_coms_2]

# using the new class
ensemble_gen = StoreNArgsEnsembleGenerator(arguments_list = dia_structure_list,
                                           n_sentence_pool = n,
                                           initial_commitments_list = coms_list)

# adding the number of arguments as data item by accessing the stored object
ensemble_gen.add_item('ds_number_of_args', lambda x: x.get_obj('ds_number_of_args'))

# looping through the ensemble using the item_iter
for model_run_data_items in ensemble_gen.ensemble_items_iter():
    print(f"The model run produced the following data items:  {model_run_data_items}.")
The model run produced the following data items:  {'ds_number_of_args': 8}.
The model run produced the following data items:  {'ds_number_of_args': 8}.

In a similar way you can overwrite the following methods:

  • init_re_start_fields: Data items that are calculated based on the dialectical structure and the initial commitments alone.

  • init_re_final_fields: Data items that are calculated based on the dialectical structure and the resulting re process (including the evolution of the process).

  • init_ensemble_fields: Data items that are calculated based on the dialectical structure and the resulting re process with all its branches (works only if branching is switched on).