Evaluation#
The Evaluation class provides methods for analysing large chunks of data and storing that information in a combined table. It performs most of its operations concurrently and gives a performance advantage on machines with many CPUs. All measures described in Analysis & measurement can be applied to Evaluation objects.
Setting up an Evaluation object#
- class taupy.simulation.evaluation.Evaluation(*, debate_stages, list_of_positions=None, clustering_method=None, multiprocessing_settings={})[source]#
A class to collect measurement values for a simulation while storing shared information between evaluation functions (such as clusterings).
- Parameters:
debate_stages – An iterator containing the lists of debate stages for each simulation run.
list_of_positions – An iterator containing the lists of belief systems for each simulation run.
clustering_method – When evaluation functions that rely on position clustering are called, the clustering algorithm specified here will be used. Functions from taupy.analysis.clustering can be selected here, in particular leiden(), affinity_propagation(), and agglomerative_clustering().
multiprocessing_settings (dict) – Settings forwarded to multiprocessing. Should be options that are recognised by concurrent.futures.ProcessPoolExecutor.
- Variables:
data – A pandas.DataFrame containing the analysed data.
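A minimal construction sketch using the keyword arguments from the signature above. The two input variables are placeholders for data produced by a simulation experiment, and the max_workers entry is just one example of an option recognised by concurrent.futures.ProcessPoolExecutor:
from taupy.simulation.evaluation import Evaluation
from taupy.analysis.clustering import leiden

e = Evaluation(
    debate_stages=stages_per_simulation,         # one list of debate stages per run (placeholder)
    list_of_positions=positions_per_simulation,  # one list of belief systems per run (placeholder)
    clustering_method=leiden,                    # used by measures that rely on clustering
    multiprocessing_settings={"max_workers": 4}  # forwarded to ProcessPoolExecutor
)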
Viewing results#
All measurement functions from the evaluation module are configured to add
columns to a shared pandas.DataFrame
stored in
Evaluation.data
.
# Assume e is an Evaluation object constructed as shown above.
# View the DataFrame:
e.data
# Since e.data is a pandas DataFrame, all DataFrame operations can be used:
e.data.to_csv("myexport.csv")
An Evaluation.data table is structured like this:

|   |   | density | dispersion |
|---|---|---|---|
| 0 | 0 | 0.02324 | 0.29561402 |
| 0 | 1 | 0.07451 | 0.30156791 |
| 0 | 2 | 0.08462 | 0.30196067 |
| 0 | 3 | 0.09880 | 0.30971113 |
The first two columns indicate the pandas.MultiIndex for the table. The first column corresponds to the simulation number within the experiment, and the second column to the debate stage within the simulation. The remaining columns are inserted by the Evaluation class methods described below.
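Because the table carries this MultiIndex, standard pandas selection can be applied directly. A small sketch based on the example table above:
# All debate stages of simulation 0:
e.data.loc[0]
# A single debate stage (simulation 0, stage 2):
e.data.loc[(0, 2)]
# The density column across all simulations:
e.data["density"]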
A minimal example#
Suppose you have run an experiment with iterative argument introductions and want to analyse the density and pairwise dispersion of each debate stage.
# Imports (assuming these names are exported at the package top level):
from taupy import Position, strategies, experiment, Evaluation

# First, create 10 positions with strategy random
my_population = [Position(debate=None, introduction_strategy=strategies.random) for _ in range(10)]
# Run 4 simulations in an experiment:
my_experiments = experiment(
    n=4,
    simulations={"positions": my_population, "sentencepool": "p:10", "argumentlength": [2,3]},
    runs={"max_density": 0.8, "max_steps": 200}
)
# Create an Evaluation object
e = Evaluation(
    debate_stages=my_experiments,
    list_of_positions=[sim.positions for sim in my_experiments]
)
# Add a density column to the data
e.densities()
# Add a column with pairwise dispersion measurements to the data
e.dispersions()
The resulting e.data table is intended for further data analysis, such as statistics or plotting. These operations are performed outside of taupy, in modules such as numpy or seaborn.
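For example, a sketch of plotting the evaluated measures with seaborn, assuming the density column added above (the names argument of reset_index requires pandas ≥ 1.5):
import seaborn as sns
import matplotlib.pyplot as plt

# Move the MultiIndex into ordinary columns so they can be used for plotting.
df = e.data.reset_index(names=["simulation", "stage"])

# Plot the evolution of density over the debate stages, one line per simulation.
sns.lineplot(data=df, x="stage", y="density", hue="simulation")
plt.show()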
Adding data to an Evaluation object#
Shortcut functions#
These functions are shortcuts to the functions explained in more detail below.
- Evaluation.densities()[source]#
A shortcut function to directly add the densities to the evaluation DataFrame.
Measures that only analyse debate stages#
Measures that only analyse positions#
- Evaluation.position_analysis(*, function, configuration={})[source]#
A generic method to evaluate functions that work on positions, with multiprocessing. Examples are (see the shortcut functions as well):
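A usage sketch for this generic method. The measure below is a stand-in, not part of taupy, and its signature is an assumption for illustration; any function that taupy accepts for position analysis could be passed instead:
def my_measure(positions):
    # Stand-in measure: maps a list of positions to a single value.
    return len(positions)

# Apply the measure to the stored positions with multiprocessing.
e.position_analysis(function=my_measure, configuration={})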
Measures that rely on clustering#
- Evaluation.generate_clusters(*, clustering_settings={})[source]#
Apply the clustering algorithm selected in Evaluation.clustering_method to the stored debate stages and positions. The clusters are saved in the Evaluation.clusters list and can be accessed by functions that work on clusterings.
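A sketch of generating clusters before calling cluster-based measures, assuming the Evaluation object was constructed with a clustering_method such as leiden():
# Compute clusterings for every debate stage and store them in e.clusters.
# Additional options for the chosen algorithm can be passed via
# clustering_settings (left at its default here).
e.generate_clusters()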
- Evaluation.group_divergence(*, measure=<function normalised_hamming_distance>)[source]#
Calculate the group divergence between all positions stored in the Evaluation object and add a column to the data object. Raises an error if no clustering has been generated. See taupy.analysis.polarisation.group_divergence() for details.
- Evaluation.group_consensus(*, measure=<function normalised_hamming_distance>)[source]#
Calculate the group consensus between all positions stored in the Evaluation object and add a column to the data object. Raises an error if no clustering has been generated. See taupy.analysis.polarisation.group_consensus() for details.
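A usage sketch combining the two polarisation measures with the clustering step above; the default Hamming-based measure is kept here:
# Clusterings must exist before either measure is called.
e.generate_clusters()

# Each call adds a new column to e.data.
e.group_divergence()
e.group_consensus()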
- Evaluation.clusters_analysis(*, function, column_name='NAME', configuration={})[source]#
Generic multi-process function to apply a measure that works on the cluster structure of a simulation.
- Parameters:
function –
A function to be applied in multiprocessing. Here is a list of examples from different taupy submodules that work with this function:
Note that group_divergence() and group_consensus() are calculated with dedicated methods. This is because both functions rely on additional information not present in the clustering alone.
column_name (str) – Title of the column that is added to the Evaluation's data table. Should be indicative of the measure that was applied.
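A usage sketch for this generic cluster-based method. The measure below is a stand-in for any taupy function that works on the cluster structure; its signature is an assumption for illustration:
def number_of_groups(clustering):
    # Stand-in measure: maps one clustering to a single value.
    return len(clustering)

# Apply the measure to every stored clustering and add the results to e.data
# under the given column title.
e.clusters_analysis(function=number_of_groups, column_name="number of groups")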