Commit 56d3bd76 authored by Mateusz Pawlik

Finalised upper bound part of experiments.

parents 47c77cf4 ab5444b2
# Tree Edit Distance Experiments
The experiments framework currently contains stand-alone tree edit distance (TED)
algorithms and tree similarity join algorithms.
Follow the instructions below to reproduce the environment and the experiments.
## ICDE 2019 Reproducibility
To reproduce the experiments of the ICDE 2019 submission, check out the `icde2019`
tag in both this repository and the Tree Similarity library repository.
Obtain the datasets from our
[Datasets repository](https://frosch.cosy.sbg.ac.at/mpawlik/ted-datasets).
Execute the experiments with all config files in the `configs/icde2019` directory.
Plot the results using the `src/plots/call_plot.sh` script.
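Running every config in `configs/icde2019` can be scripted. The sketch below builds the commands and makes some assumptions:

- It uses the join script invocation from the *TED Join* section.
- It treats every top-level `*.json` file in `configs/icde2019` as a join config. The `upperbound` subdirectory holds TED algorithm configs, which a different script runs.
- `build_join_commands` and `run_all` are hypothetical helpers, not part of the repository. They default to a dry run that only prints the commands.

```python
import shlex
import subprocess
from pathlib import Path

# Assumption: the join script and its options as documented in this README.
JOIN_SCRIPT = "src/join_algs/join_algs_experiments.py"


def build_join_commands(config_dir, dataset_path, service):
    """Build one join-experiment command per top-level JSON config (hypothetical)."""
    commands = []
    # glob("*.json") is not recursive, so configs in subdirectories such as
    # configs/icde2019/upperbound are skipped.
    for config in sorted(Path(config_dir).glob("*.json")):
        commands.append([
            "python3", JOIN_SCRIPT,
            "--config", str(config),
            "--dataset_path", dataset_path,
            "--service", service,
        ])
    return commands


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops at the first experiment that exits with an error.
            subprocess.run(cmd, check=True)
```

For example, `run_all(build_join_commands("configs/icde2019", "/path_to/ted-datasets/", "service"))` prints the commands. Pass `dry_run=False` to execute them.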
## Build the project
First clone the external libraries into the `external` subdirectory. Execute the
following from the project's root directory.
```bash
mkdir external
cd external
```
Clone the Timing library for runtime measurements.
```bash
git clone git@frosch.cosy.sbg.ac.at:wmann/common-code.git
```
Clone the Tree Similarity library with the algorithms (the `develop` branch
is currently the most recent).
```bash
git clone --branch develop https://github.com/DatabaseGroup/tree-similarity.git
```
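Before building, you can check that both libraries are where the project expects them. This sketch makes two assumptions:

- The clone commands above ran inside `external/` with their default target directories, `common-code` and `tree-similarity`. `external/tree-similarity/` is also the `algorithms_repository_path` used by the TED algorithms experiment script.
- `missing_libraries` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path

# Directory names that `git clone` creates when no target directory is given
# (assumption: the clones were not renamed).
EXPECTED_LIBRARIES = ["common-code", "tree-similarity"]


def missing_libraries(project_root="."):
    """Return the expected library directories that are missing in external/."""
    external = Path(project_root) / "external"
    return [name for name in EXPECTED_LIBRARIES if not (external / name).is_dir()]


if __name__ == "__main__":
    missing = missing_libraries()
    if missing:
        print("Missing in external/: " + ", ".join(missing))
    else:
        print("All external libraries are present.")
```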
Then execute the following from the project's root directory.
......@@ -20,16 +47,60 @@ cmake ..
make
```
## Prepare a PostgreSQL database for storing the results
Install [PostgreSQL](https://www.postgresql.org/).
Create a database using the SQL file `db/create_db.sql`.
Create a service file `~/.pg_service.conf` on the machine where you execute
the experiments. The service file holds the connection details of the database
where the results will be stored. An example service file looks as follows.
```ini
[ted-experiments]
host=mydb.sbg.ac.at
port=5432
user=ted
password=letmethrough
dbname=ted_experiments
```
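libpq resolves a service name by reading this INI-style file. To check which connection details a name such as `ted-experiments` maps to, you can parse the file yourself. This sketch makes two assumptions:

- It uses Python's standard `configparser` and reads only `~/.pg_service.conf`. libpq can also consult `PGSERVICEFILE` and a system-wide service file, which this sketch ignores.
- `read_pg_service` is a hypothetical helper.

```python
import configparser
from pathlib import Path


def read_pg_service(name, path=Path.home() / ".pg_service.conf"):
    """Return the connection parameters of one service section as a dict."""
    # interpolation=None keeps values such as passwords containing '%' verbatim.
    parser = configparser.ConfigParser(interpolation=None)
    # read() silently skips missing files, so report that case explicitly.
    if not parser.read(path):
        raise FileNotFoundError(path)
    if not parser.has_section(name):
        raise KeyError(f"service {name!r} not found in {path}")
    return dict(parser[name])
```

If the `psycopg2` driver is installed, it can connect through the same service with `psycopg2.connect(service="ted-experiments")`. libpq then resolves the service name itself.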
Executing experiments requires the dataset details to be present in the `dataset`
table. Visit our
[Datasets repository](https://frosch.cosy.sbg.ac.at/mpawlik/ted-datasets)
to learn how we obtain the datasets. To register a dataset in the `dataset` table,
run the `statistics/statistics.py` script with the `--service service` option,
where `service` is the name of your service in `~/.pg_service.conf`.
## Executing
We use [Python3](https://www.python.org/) to execute the experiments. Run the
commands below from the root directory of the repository.
### TED Join
The script `src/join_algs/join_algs_experiments.py` executes the tree similarity
join experiments.
It reads the experiment parameters from a JSON config file. Example config
files can be found in the `configs/icde2019` directory.
For example, run:
```bash
python3 src/join_algs/join_algs_experiments.py --config configs/icde2019/bolzano.json --dataset_path /path_to/ted-datasets/ --service service
```
### TED Algorithms
The script `src/ted_algs/ted_algs_experiments.py` executes the experiments with
the stand-alone tree edit distance algorithms.
It reads the experiment parameters from a JSON config file. Example config
files can be found in the `configs/icde2019/upperbound` directory.
For example, run:
```bash
python3 src/ted_algs/ted_algs_experiments.py --config configs/icde2019/upperbound/sentiment.json --dataset_path /path_to/ted-datasets/ --service service
```
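The experiment configs added in this commit, such as the DBLP and Sentiment ones, use three keys:

- `datasets`: paths relative to the datasets root.
- `thresholds`: the threshold values to run.
- `algorithms`: command-line flags of the C++ binary, such as `--apted`, `--tzd` and `--lg`.

This sketch shows how such a config could be expanded into one run per dataset and threshold. It makes two assumptions:

- Dataset paths are built by appending each dataset to the `--dataset_path` value, which the script's help text says must end with a trailing slash.
- `expand_config` is hypothetical and is not the experiment script's own logic.

```python
import itertools
import json


def expand_config(config_text, dataset_path):
    """Yield (dataset_file, threshold, algorithm_flags) for every run.

    Hypothetical expansion: one run per (dataset, threshold) pair, with all
    algorithm flags of the config passed together.
    """
    config = json.loads(config_text)
    # The script's help text asks for a trailing slash; this sketch concatenates.
    if not dataset_path.endswith("/"):
        raise ValueError("dataset_path must end with a trailing slash")
    for dataset, threshold in itertools.product(config["datasets"],
                                                config["thresholds"]):
        yield dataset_path + dataset, threshold, list(config["algorithms"])
```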
\ No newline at end of file
{
"datasets": [
"dblp/dblp_no_www_sorted.bracket"
],
"thresholds": [
10.0
],
"algorithms": [
"--apted", "--tzd", "--lg"
]
}
\ No newline at end of file
{
"datasets": [
"sentiment/sentiment_sorted.bracket"
],
"thresholds": [
5.0,10.0,15.0,20.0
],
"algorithms": [
"--apted", "--tzd", "--lg"
]
}
\ No newline at end of file
### Upperbound
python3 plot_experiments.py --config configs/upperbound/dblp_error.json --storeplot "./plots/dblp_error.pdf"
python3 plot_experiments.py --config configs/upperbound/dblp_runtime_k.json --storeplot "./plots/dblp_runtime_k_9305acf1.pdf"
python3 plot_experiments.py --config configs/upperbound/dblp_runtime.json --storeplot "./plots/dblp_runtime_9305acf1.pdf"
python3 plot_experiments.py --config configs/upperbound/dblp_ted.json --storeplot "./plots/dblp_ted.pdf"
python3 plot_experiments.py --config configs/upperbound/python_runtime_k.json --storeplot "./plots/python_runtime_k_9305acf1.pdf"
python3 plot_experiments.py --config configs/upperbound/python_runtime.json --storeplot "./plots/python_runtime_9305acf1.pdf"
python3 plot_experiments.py --config configs/upperbound/sentiment_error.json --storeplot "./plots/sentiment_error.pdf"
python3 plot_experiments.py --config configs/upperbound/sentiment_runtime_k.json --storeplot "./plots/sentiment_runtime_k_9305acf1.pdf"
python3 plot_experiments.py --config configs/upperbound/sentiment_runtime.json --storeplot "./plots/sentiment_runtime_9305acf1.pdf"
python3 plot_experiments.py --config configs/upperbound/sentiment_ted.json --storeplot "./plots/sentiment_ted.pdf"
python3 plot_experiments.py --config configs/upperbound/swissprot_runtime_k.json --storeplot "./plots/swissprot_runtime_k_9305acf1.pdf"
python3 plot_experiments.py --config configs/upperbound/swissprot_runtime.json --storeplot "./plots/swissprot_runtime_9305acf1.pdf"
python3 plot_experiments.py --config configs/upperbound/sentiment_runtime.json --storeplot "./plots/sentiment_runtime.pdf" --service ted-join
python3 plot_experiments.py --config configs/upperbound/sentiment_runtime_k.json --storeplot "./plots/sentiment_runtime_k.pdf" --service ted-join
python3 plot_experiments.py --config configs/upperbound/sentiment_error.json --storeplot "./plots/sentiment_error.pdf" --service ted-join
python3 plot_experiments.py --config configs/upperbound/dblp_runtime.json --storeplot "./plots/dblp_runtime.pdf" --service ted-join
python3 plot_experiments.py --config configs/upperbound/dblp_error.json --storeplot "./plots/dblp_error.pdf" --service ted-join
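The upper bound plot commands above differ only in the config name, so they could be generated. This sketch makes two assumptions:

- Each config `configs/upperbound/<name>.json` is plotted to `./plots/<name>.pdf` with `--service ted-join`, as in the last commands of the list.
- `plot_commands` is a hypothetical helper that only prints the commands.

```python
import shlex
from pathlib import Path


def plot_commands(config_dir="configs/upperbound", plot_dir="./plots",
                  service="ted-join"):
    """Build one plot_experiments.py call per JSON config (hypothetical)."""
    commands = []
    for config in sorted(Path(config_dir).glob("*.json")):
        commands.append([
            "python3", "plot_experiments.py",
            "--config", config.as_posix(),
            # The plot file name mirrors the config file name.
            "--storeplot", f"{plot_dir}/{config.stem}.pdf",
            "--service", service,
        ])
    return commands


if __name__ == "__main__":
    for cmd in plot_commands():
        print(shlex.join(cmd))
```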
### FPR
python3 plot_experiments.py --config configs/fpr/fpr_bolzano.json --storeplot "./plots/bolzano_fpr.pdf"
......
......@@ -10,7 +10,7 @@
"lines": [" ", " ", " "],
"tables": [
{
"table_name": "dblp_apted_x_avg_pair_tree_size_y_avg_runtime",
"table_name": "dblp_apted_x_avg_pair_tree_size_y_avg_runtime_k10",
"attributes": [
{
"attr_name": "avg(avg_runtime)"
......@@ -28,7 +28,7 @@
"name": "BSM"
},
{
"table_name": "dblp_labelguided_x_avg_pair_tree_size_y_avg_runtime_k10_9305acf",
"table_name": "dblp_labelguided_x_avg_pair_tree_size_y_avg_runtime_k10",
"attributes": [
{
"attr_name": "avg(avg_runtime)"
......
{
"title": "TED Value",
"print_title": "no",
"legend": "upper left",
"legend_font_size": 18,
"grid": "on",
"dataset_name": "DBLP",
"markers": ["+", "x", "."],
"colors": ["limegreen", "chocolate", "hotpink"],
"tables": [
{
"table_name": "dblp_touzetd_x_threshold_y_runtime_sum_9305acf1",
"attributes": [
{
"attr_name": "avg(sum_runtime)"
}
],
"name": "BSM"
},
{
"table_name": "dblp_labelguided_x_threshold_y_runtime_sum_9305acf1",
"attributes": [
{
"attr_name": "avg(sum_runtime)"
}
],
"name": "LGM"
}
],
"x_axis": {
"db_column": "ted_threshold",
"name": "Threshold",
"font_size": 18,
"ticks_font_size": 16
},
"y_axis": {
"name": "Runtime [ms]",
"font_size": 18,
"ticks_font_size": 16
}
}
\ No newline at end of file
{
"title": "TED Value",
"print_title": "no",
"legend": "upper left",
"legend_font_size": 18,
"grid": "on",
"dataset_name": "DBLP",
"markers": ["+", "x", "."],
"colors": ["limegreen", "chocolate", "hotpink"],
"lines": [" ", " ", " "],
"tables": [
{
"table_name": "dblp_apted_x_avg_pair_tree_size_y_ted",
"attributes": [
{
"attr_name": "avg(avg_ted_value)"
}
],
"name": "APTED"
},
{
"table_name": "dblp_labelguided_x_avg_pair_tree_size_y_ted_k10",
"attributes": [
{
"attr_name": "avg(avg_ted_value)"
}
],
"name": "LGM"
}
],
"x_axis": {
"db_column": "avg_pair_tree_size",
"name": "Tree Size",
"font_size": 18,
"ticks_font_size": 16
},
"y_axis": {
"name": "TED Value",
"font_size": 18,
"ticks_font_size": 16
}
}
\ No newline at end of file
{
"title": "Runtime",
"print_title": "no",
"legend": "upper left",
"legend_font_size": 18,
"grid": "on",
"dataset_name": "Python",
"markers": ["x", "."],
"colors": ["chocolate", "hotpink"],
"lines": [" ", " ", " "],
"tables": [
{
"table_name": "python_touzetd_x_avg_pair_tree_size_y_avg_runtime_k10",
"attributes": [
{
"attr_name": "avg(avg_runtime)"
}
],
"name": "BSM"
},
{
"table_name": "python_labelguided_x_avg_pair_tree_size_y_avg_runtime_k10_9305a",
"attributes": [
{
"attr_name": "avg(avg_runtime)"
}
],
"name": "LGM"
}
],
"x_axis": {
"db_column": "avg_pair_tree_size",
"name": "Tree Size",
"font_size": 18,
"ticks_font_size": 16
},
"y_axis": {
"name": "Runtime [ms]",
"scale": "log",
"font_size": 18,
"ticks_font_size": 16
}
}
{
"title": "TED Value",
"print_title": "no",
"legend": "upper left",
"legend_font_size": 18,
"grid": "on",
"dataset_name": "Python",
"markers": ["+", "x", "."],
"colors": ["limegreen", "chocolate", "hotpink"],
"tables": [
{
"table_name": "python_touzetd_x_threshold_y_runtime_sum_9305acf1",
"attributes": [
{
"attr_name": "avg(sum_runtime)"
}
],
"name": "BSM"
},
{
"table_name": "python_labelguided_x_threshold_y_runtime_sum_9305acf1",
"attributes": [
{
"attr_name": "avg(sum_runtime)"
}
],
"name": "LGM"
}
],
"x_axis": {
"db_column": "ted_threshold",
"name": "Threshold",
"font_size": 18,
"ticks_font_size": 16
},
"y_axis": {
"name": "Runtime [ms]",
"font_size": 18,
"ticks_font_size": 16
}
}
\ No newline at end of file
......@@ -10,7 +10,7 @@
"lines": [" ", " ", " "],
"tables": [
{
"table_name": "sentiment_apted_x_avg_pair_tree_size_y_avg_runtime",
"table_name": "sentiment_apted_x_avg_pair_tree_size_y_avg_runtime_k10",
"attributes": [
{
"attr_name": "avg(avg_runtime)"
......@@ -28,7 +28,7 @@
"name": "BSM"
},
{
"table_name": "sentiment_labelguided_x_avg_pair_tree_size_y_avg_runtime_k10_93",
"table_name": "sentiment_labelguided_x_avg_pair_tree_size_y_avg_runtime_k10",
"attributes": [
{
"attr_name": "avg(avg_runtime)"
......
......@@ -9,22 +9,13 @@
"colors": ["limegreen", "chocolate", "hotpink"],
"tables": [
{
"table_name": "sentiment_touzetd_x_threshold_y_runtime_sum_9305acf1",
"table_name": "sentiment_x_threshold_y_runtime_factor",
"attributes": [
{
"attr_name": "avg(sum_runtime/1000)"
"attr_name": "avg(runtime_factor)"
}
],
"name": "BSM"
},
{
"table_name": "sentiment_labelguided_x_threshold_y_runtime_sum_9305acf1",
"attributes": [
{
"attr_name": "avg(sum_runtime/1000)"
}
],
"name": "LGM"
"name": "BSM / LGM"
}
],
"x_axis": {
......@@ -36,7 +27,7 @@
"xmin": 0.0
},
"y_axis": {
"name": "Runtime [s]",
"name": "Runtime factor",
"font_size": 20,
"ticks_font_size": 20
}
......
{
"title": "TED Value",
"print_title": "no",
"legend": "upper left",
"legend_font_size": 18,
"grid": "on",
"dataset_name": "Sentiment",
"markers": ["+", "x", "."],
"colors": ["limegreen", "chocolate", "hotpink"],
"lines": [" ", " ", " "],
"tables": [
{
"table_name": "sentiment_apted_x_avg_pair_tree_size_y_ted",
"attributes": [
{
"attr_name": "avg(avg_ted_value)"
}
],
"name": "APTED"
},
{
"table_name": "sentiment_labelguided_x_avg_pair_tree_size_y_ted_k10",
"attributes": [
{
"attr_name": "avg(avg_ted_value)"
}
],
"name": "LGM"
}
],
"x_axis": {
"db_column": "avg_pair_tree_size",
"name": "Tree Size",
"font_size": 18,
"ticks_font_size": 16
},
"y_axis": {
"name": "TED Value",
"font_size": 18,
"ticks_font_size": 16
}
}
\ No newline at end of file
{
"title": "Runtime",
"print_title": "no",
"legend": "upper left",
"legend_font_size": 18,
"grid": "on",
"dataset_name": "Swissprot",
"markers": ["x", "."],
"colors": ["chocolate", "hotpink"],
"lines": [" ", " ", " "],
"tables": [
{
"table_name": "swissprot_touzetd_x_avg_pair_tree_size_y_avg_runtime_k10",
"attributes": [
{
"attr_name": "avg(avg_runtime)"
}
],
"name": "BSM"
},
{
"table_name": "swissprot_labelguided_x_avg_pair_tree_size_y_avg_runtime_k10_93",
"attributes": [
{
"attr_name": "avg(avg_runtime)"
}
],
"name": "LGM"
}
],
"x_axis": {
"db_column": "avg_pair_tree_size",
"name": "Tree Size",
"font_size": 18,
"ticks_font_size": 16
},
"y_axis": {
"name": "Runtime [ms]",
"scale": "log",
"font_size": 18,
"ticks_font_size": 16
}
}
{
"title": "TED Value",
"print_title": "no",
"legend": "upper left",
"legend_font_size": 18,
"grid": "on",
"dataset_name": "Swissprot",
"markers": ["+", "x", "."],
"colors": ["limegreen", "chocolate", "hotpink"],
"tables": [
{
"table_name": "swissprot_touzetd_x_threshold_y_runtime_sum_9305acf1",
"attributes": [
{
"attr_name": "avg(sum_runtime)"
}
],
"name": "BSM"
},
{
"table_name": "swissprot_labelguided_x_threshold_y_runtime_sum_9305acf1",
"attributes": [
{
"attr_name": "avg(sum_runtime)"
}
],
"name": "LGM"
}
],
"x_axis": {
"db_column": "ted_threshold",
"name": "Threshold",
"font_size": 18,
"ticks_font_size": 16
},
"y_axis": {
"name": "Runtime [ms]",
"font_size": 18,
"ticks_font_size": 16
}
}
\ No newline at end of file
REFRESH MATERIALIZED VIEW sentiment_apted_x_avg_pair_tree_size_y_avg_runtime;
REFRESH MATERIALIZED VIEW sentiment_apted_x_avg_pair_tree_size_y_avg_runtime_k10;
REFRESH MATERIALIZED VIEW sentiment_labelguided_x_avg_pair_tree_size_y_avg_runtime_k10;
REFRESH MATERIALIZED VIEW sentiment_touzetd_x_avg_pair_tree_size_y_avg_runtime_k10;
REFRESH MATERIALIZED VIEW sentiment_labelguided_x_threshold_y_sum_runtime;
REFRESH MATERIALIZED VIEW sentiment_touzetd_x_threshold_y_sum_runtime;
REFRESH MATERIALIZED VIEW sentiment_x_threshold_y_runtime_factor;
REFRESH MATERIALIZED VIEW sentiment_apted_x_avg_pair_tree_size_y_avg_ted_k10;
REFRESH MATERIALIZED VIEW sentiment_labelguided_x_avg_pair_tree_size_y_avg_ted_k10;
REFRESH MATERIALIZED VIEW sentiment_labelguided_x_avg_pair_tree_size_y_ted_error_k10;
REFRESH MATERIALIZED VIEW dblp_apted_x_avg_pair_tree_size_y_avg_runtime;
REFRESH MATERIALIZED VIEW dblp_apted_x_avg_pair_tree_size_y_avg_runtime_k10;
REFRESH MATERIALIZED VIEW dblp_labelguided_x_avg_pair_tree_size_y_avg_runtime_k10;
REFRESH MATERIALIZED VIEW dblp_touzetd_x_avg_pair_tree_size_y_avg_runtime_k10;
REFRESH MATERIALIZED VIEW sentiment_apted_x_pair_id_y_ted;
REFRESH MATERIALIZED VIEW sentiment_labelguided_x_pair_id_y_ted_k10;
REFRESH MATERIALIZED VIEW sentiment_touzetd_x_pair_id_y_ted_k10;
REFRESH MATERIALIZED VIEW dblp_apted_x_pair_id_y_ted;
REFRESH MATERIALIZED VIEW dblp_labelguided_x_pair_id_y_ted_k10;
REFRESH MATERIALIZED VIEW dblp_touzetd_x_pair_id_y_ted_k10;
\ No newline at end of file
REFRESH MATERIALIZED VIEW dblp_apted_x_avg_pair_tree_size_y_avg_ted_k10;
REFRESH MATERIALIZED VIEW dblp_labelguided_x_avg_pair_tree_size_y_avg_ted_k10;
REFRESH MATERIALIZED VIEW dblp_labelguided_x_avg_pair_tree_size_y_ted_error_k10;
\ No newline at end of file
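These scripts refresh materialized views, and many of the `table_name` entries in the plot configs name these views. This sketch runs such a script from Python, statement by statement, and makes two assumptions:

- It accepts any DB-API cursor.
- `refresh_views` is a hypothetical helper.

```python
def refresh_views(cursor, sql_text):
    """Execute each `REFRESH MATERIALIZED VIEW ...;` statement in sql_text.

    Splitting on ';' is sufficient here only because the scripts contain
    simple one-line statements without string literals or comments.
    Returns the names of the refreshed views in execution order.
    """
    refreshed = []
    for statement in sql_text.split(";"):
        statement = statement.strip()
        if not statement:
            continue  # skip the empty piece after the final ';'
        cursor.execute(statement)
        refreshed.append(statement.split()[-1])
    return refreshed
```

If `psycopg2` is installed, the following commits the refreshes when the `with` block exits without an error:

`with psycopg2.connect(service="ted-experiments") as conn, conn.cursor() as cur: refresh_views(cur, sql_text)`

Here `sql_text` is the content of one of the refresh scripts.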
......@@ -352,21 +352,12 @@ int main(int argc, char** argv) {
// --apted APTED apted_ted
// --tz Touzet - basic version touzet_ted
// --tzd Touzet - depth-based pruning touzet_ted_depth_pruning
// --tzs Touzet - keyroot nodes with set touzet_ted_kr_loop
// --tzl Touzet - keyroot nodes with loop touzet_ted_kr_set
// --tzse Touzet - keyroot nodes with set + e_max touzet_ted_kr_loop
// --tzle Touzet - keyroot nodes with loop + e_max touzet_ted_kr_set
// --lg LabelGuided greedy_ub_ted
bool alg_zs_is_set = false;
bool alg_apted_is_set = false;
bool alg_tz_is_set = false;
bool alg_tzd_is_set = false;
bool alg_tzs_is_set = false;
bool alg_tzl_is_set = false;
bool alg_tzse_is_set = false;
bool alg_tzle_is_set = false;
bool alg_lg_is_set = false;
bool alg_lg_depr_is_set = false;
// Output format
bool output_in_json = false;
......@@ -405,24 +396,9 @@ int main(int argc, char** argv) {
} else if (a == "--tzd") {
alg_tzd_is_set = true;
args_start_it += 1;
} else if (a == "--tzs") {
alg_tzs_is_set = true;
args_start_it += 1;
} else if (a == "--tzl") {
alg_tzl_is_set = true;
args_start_it += 1;
} else if (a == "--tzse") {
alg_tzse_is_set = true;
args_start_it += 1;
} else if (a == "--tzle") {
alg_tzle_is_set = true;
args_start_it += 1;
} else if (a == "--lg") {
}else if (a == "--lg") {
alg_lg_is_set = true;
args_start_it += 1;
} else if (a == "--lgdepr") {
alg_lg_depr_is_set = true;
args_start_it += 1;
} else if (a == "--one-by-one") {
// mechanism_to_execute = kOneByOne;
mp = MechanismParams(kOneByOne);
......@@ -513,34 +489,9 @@ int main(int argc, char** argv) {
execute_mechanism<Label, Touzet, &Touzet::touzet_ted_depth_pruning>(
trees_collection, mp, similarity_threshold, lp));
}
if (alg_tzs_is_set) {
experiment.algorithm_executions.emplace_back("TouzetKrLoop",
execute_mechanism<Label, Touzet, &Touzet::touzet_ted_kr_loop_no_e_max>(
trees_collection, mp, similarity_threshold, lp));
}
if (alg_tzl_is_set) {
experiment.algorithm_executions.emplace_back("TouzetKrSet",
execute_mechanism<Label, Touzet, &Touzet::touzet_ted_kr_set_no_e_max>(
trees_collection, mp, similarity_threshold, lp));
}
if (alg_tzse_is_set) {
experiment.algorithm_executions.emplace_back("TouzetKrLoopEmax",
execute_mechanism<Label, Touzet, &Touzet::touzet_ted_kr_loop_e_max>(
trees_collection, mp, similarity_threshold, lp));
}
if (alg_tzle_is_set) {
experiment.algorithm_executions.emplace_back("TouzetKrSetEmax",
execute_mechanism<Label, Touzet, &Touzet::touzet_ted_kr_set_e_max>(
trees_collection, mp, similarity_threshold, lp));
}
if (alg_lg_is_set) {
experiment.algorithm_executions.emplace_back("LabelGuided",
execute_mechanism<Label, LabelGuided, &LabelGuided::verify_bool>(
trees_collection, mp, similarity_threshold, lp));
}
if (alg_lg_depr_is_set) {
experiment.algorithm_executions.emplace_back("LabelGuidedDeprecated",
execute_mechanism<Label, LabelGuided, &LabelGuided::greedy_ub_ted_deprecated>(
execute_mechanism<Label, LabelGuided, &LabelGuided::greedy_ub_ted>(
trees_collection, mp, similarity_threshold, lp));
}
......
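This commit removes the `--tzs`, `--tzl`, `--tzse`, `--tzle` and `--lgdepr` options from the argument parsing of the C++ binary. An older experiment config that still lists them would therefore pass flags the parser no longer handles. A check like the one below could run before launching the binary. It makes two assumptions:

- The accepted flags are read off the comment table visible in this diff: `--apted`, `--tz`, `--tzd` and `--lg`. Any flags elided from the diff must be added by hand; for example, the Zhang-Shasha flag's spelling is not shown.
- `unsupported_flags` is a hypothetical helper.

```python
# Algorithm flags listed in the comment table of the C++ main() after this
# commit. Assumption: extend this set with flags elided from the diff.
SUPPORTED_ALGORITHM_FLAGS = {"--apted", "--tz", "--tzd", "--lg"}


def unsupported_flags(algorithms, supported=SUPPORTED_ALGORITHM_FLAGS):
    """Return the flags of a config's "algorithms" list that are not supported."""
    return [flag for flag in algorithms if flag not in supported]
```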
......@@ -22,7 +22,7 @@ from datetime import datetime
# TODO: Fix the paths depending on where is the script executed from.
# Everything is executed from the project's root.
binary_name = "build/ted-algs-experiments"
algorithms_repository_path = "external/tree-similarity-private/"
algorithms_repository_path = "external/tree-similarity/"
# execute a command and return stdout
def get_stdout_cmd(callargs):
......@@ -119,6 +119,11 @@ parser.add_argument(
dest='config_filename',
help="Path to experiments config file."
)
parser.add_argument(
type=str,
dest='dataset_path',
help="Path to the root directory of the datasets (with trailing slash)."
)
parser.add_argument(
type=str,
dest='service_name',
......@@ -130,6 +135,12 @@ parser.add_argument(