Evaluation#
The Evaluation class provides methods for analysing large chunks of data and storing that information in a combined table. It performs most of its operations concurrently and gives a performance advantage on machines with many CPUs. All measures described in Analysis & measurement can be applied to Evaluation objects.
Setting up an Evaluation object#
- class taupy.simulation.evaluation.Evaluation(*, debate_stages, list_of_positions=None, clustering_method=None, multiprocessing_settings={})[source]#
A class to collect measurement values for a simulation while storing shared information between evaluation functions (such as clusterings).
- Parameters:
debate_stages – An iterator containing the lists of debate stages for each simulation run.
list_of_positions – An iterator containing the lists of belief systems for each simulation run.
clustering_method – When evaluation functions that rely on position clustering are called, the clustering algorithm specified here will be used. Functions from taupy.analysis.clustering can be selected here, in particular leiden(), affinity_propagation(), and agglomerative_clustering().
multiprocessing_settings (dict) – Settings forwarded to multiprocessing. Should be options that are recognised by concurrent.futures.ProcessPoolExecutor.
- Variables:
data – A pandas.DataFrame containing the analysed data.
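A minimal construction sketch using the keyword arguments from the signature above. The two input variables are placeholders for data produced by a simulation experiment, and the max_workers entry is just one example of an option recognised by concurrent.futures.ProcessPoolExecutor:
from taupy.simulation.evaluation import Evaluation
from taupy.analysis.clustering import leiden

e = Evaluation(
    debate_stages=stages_per_simulation,         # one list of debate stages per run (placeholder)
    list_of_positions=positions_per_simulation,  # one list of belief systems per run (placeholder)
    clustering_method=leiden,                    # used by measures that rely on clustering
    multiprocessing_settings={"max_workers": 4}  # forwarded to ProcessPoolExecutor
)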
Viewing results#
All measurement functions from the evaluation module are configured to add
columns to a shared pandas.DataFrame
stored in
Evaluation.data
.
# Assume e is an Evaluation object constructed as shown above.
# View the DataFrame:
e.data
# Since e.data is a pandas DataFrame, all DataFrame operations can be used:
e.data.to_csv("myexport.csv")
An Evaluation.data table is structured like this:

|   |   | density | dispersion |
|---|---|---|---|
| 0 | 0 | 0.02324 | 0.29561402 |
| 0 | 1 | 0.07451 | 0.30156791 |
| 0 | 2 | 0.08462 | 0.30196067 |
| 0 | 3 | 0.09880 | 0.30971113 |
The first two columns indicate the pandas.MultiIndex for the table. The first column corresponds to the simulation number within the experiment, and the second column to the debate stage within the simulation. The remaining columns are inserted by the Evaluation class methods described below.
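Because the table carries this MultiIndex, standard pandas selection can be applied directly. A small sketch based on the example table above:
# All debate stages of simulation 0:
e.data.loc[0]
# A single debate stage (simulation 0, stage 2):
e.data.loc[(0, 2)]
# The density column across all simulations:
e.data["density"]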
A minimal example#
Suppose you have run an experiment with iterative argument introductions and want to analyse the density and pairwise dispersion of each debate stage.
# Imports (assuming these names are exported at the package top level):
from taupy import Position, strategies, experiment, Evaluation

# First, create 10 positions with strategy random
my_population = [Position(debate=None, introduction_strategy=strategies.random) for _ in range(10)]
# Run 4 simulations in an experiment:
my_experiments = experiment(
    n=4,
    simulations={"positions": my_population, "sentencepool": "p:10", "argumentlength": [2,3]},
    runs={"max_density": 0.8, "max_steps": 200}
)
# Create an Evaluation object
e = Evaluation(
    debate_stages=my_experiments,
    list_of_positions=[sim.positions for sim in my_experiments]
)
# Add a density column to the data
e.densities()
# Add a column with pairwise dispersion measurements to the data
e.dispersions()
The resulting e.data table is intended for further data analysis, such as statistics or plotting. These operations are performed outside of taupy, in modules such as numpy or seaborn.
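For example, a sketch of plotting the evaluated measures with seaborn, assuming the density column added above (the names argument of reset_index requires pandas ≥ 1.5):
import seaborn as sns
import matplotlib.pyplot as plt

# Move the MultiIndex into ordinary columns so they can be used for plotting.
df = e.data.reset_index(names=["simulation", "stage"])

# Plot the evolution of density over the debate stages, one line per simulation.
sns.lineplot(data=df, x="stage", y="density", hue="simulation")
plt.show()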
Adding data to an Evaluation object#
Shortcut functions#
These functions are shortcuts to the functions explained in more detail below.
- Evaluation.densities()[source]#
A shortcut function to directly add the densities to the evaluation DataFrame.
Measures that only analyse debate stages#
Measures that only analyse positions#
- Evaluation.position_analysis(*, function, configuration={})[source]#
A generic method to evaluate functions that work on positions, with multiprocessing. Examples are (see the shortcut functions as well):
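A usage sketch for this generic method. The measure below is a stand-in, not part of taupy, and its signature is an assumption for illustration; any function that taupy accepts for position analysis could be passed instead:
def my_measure(positions):
    # Stand-in measure: maps a list of positions to a single value.
    return len(positions)

# Apply the measure to the stored positions with multiprocessing.
e.position_analysis(function=my_measure, configuration={})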
Measures that rely on clustering#
- Evaluation.generate_clusters(*, clustering_settings={})[source]#
Apply the clustering algorithm selected in Evaluation.clustering_method to the stored debate stages and positions. The clusters are saved in the Evaluation.clusters list and can be accessed by functions that work on clusterings.
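A sketch of generating clusters before calling cluster-based measures, assuming the Evaluation object was constructed with a clustering_method such as leiden():
# Compute clusterings for every debate stage and store them in e.clusters.
# Additional options for the chosen algorithm can be passed via
# clustering_settings (left at its default here).
e.generate_clusters()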
- Evaluation.group_divergence(*, measure=<function normalised_hamming_distance>)[source]#
Calculate the group divergence between all positions stored in the Evaluation object and add a column to the data object. Raises an error if no clustering has been generated. See taupy.analysis.polarisation.group_divergence() for details.
- Evaluation.group_consensus(*, measure=<function normalised_hamming_distance>)[source]#
Calculate the group consensus between all positions stored in the Evaluation object and add a column to the data object. Raises an error if no clustering has been generated. See taupy.analysis.polarisation.group_consensus() for details.
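A usage sketch combining the two polarisation measures with the clustering step above; the default Hamming-based measure is kept here:
# Clusterings must exist before either measure is called.
e.generate_clusters()

# Each call adds a new column to e.data.
e.group_divergence()
e.group_consensus()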
- Evaluation.clusters_analysis(*, function, column_name='NAME', configuration={})[source]#
Generic multi-process function to apply a measure that works on the cluster structure of a simulation.
- Parameters:
function –
A function to be applied in multiprocessing. Here is a list of examples from different taupy submodules that work with this function:
Note that group_divergence() and group_consensus() are calculated with dedicated methods. This is because both functions rely on additional information not present in the clustering alone.
column_name (str) – Title of the column that is added to the Evaluation's data table. Should be indicative of the measure that was applied.
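A usage sketch for this generic cluster-based method. The measure below is a stand-in for any taupy function that works on the cluster structure; its signature is an assumption for illustration:
def number_of_groups(clustering):
    # Stand-in measure: maps one clustering to a single value.
    return len(clustering)

# Apply the measure to every stored clustering and add the results to e.data
# under the given column title.
e.clusters_analysis(function=number_of_groups, column_name="number of groups")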