# Tree Edit Distance Experiments

Currently the experiments framework contains stand-alone tree edit distance
and tree similarity join algorithms.

Follow the instructions below to reproduce the environment and the experiments.

## ICDE 2019 Reproducibility

This repository contains experiments of our ICDE 2019 paper
[Effective Filters and Linear Time Verification for Tree Similarity Joins](http://eplus.uni-salzburg.at/obvusboa/download/pdf/4486886).

To reproduce the experiments of the ICDE 2019 paper, check out the tag
`icde2019` of this and
[Tree Similarity library](https://github.com/DatabaseGroup/tree-similarity/tree/develop)
repositories.

Obtain datasets from our
[Datasets repository](https://frosch.cosy.sbg.ac.at/mpawlik/ted-datasets).

Execute the experiments with all config files in the `configs/icde2019` directory.
See execution details below. You may need to modify the value of the
`--dataset_path` parameter when executing the experiments.

For the LGM Upper Bound and BSM verification experiments, certain views must be
present in the database. After executing all experiments, execute
`src/ted_algs/view_queries.sql` on the database holding the experiment results.

Plot the results by running the `src/plots/create_all_plots.sh` script from the
`src/plots/` directory.

## Build the project

After cloning the repository, clone the external libraries into the `external`
subdirectory.

```bash
mkdir external
cd external
```
Clone the Timing library for runtime measurements.

```bash
git clone git@frosch.cosy.sbg.ac.at:wmann/common-code.git
```

Clone the Tree Similarity library with the algorithms (the `develop` branch
is currently the most recent).

```bash
git clone --branch develop https://github.com/DatabaseGroup/tree-similarity.git
```

Then execute the following from the project's root directory.
```bash
mkdir build
cd build
cmake ..
make
```

## Prepare a PostgreSQL database for storing the results

Install [PostgreSQL](https://www.postgresql.org/).

Create a database using the SQL file ``db/create_db.sql``.

Create a service file ``~/.pg_service.conf`` on the machine where you execute
the experiments. The service file holds the connection details to the database
where the results will be stored. An example service file looks as follows.

```
[ted-exp]
host=mydb.sbg.ac.at
port=5432
user=ted
password=letmethrough
dbname=ted_experiments
```
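
Before starting long-running experiments, it can help to check that the service file is readable and that the service section is complete. The following is a minimal sketch, not part of this repository: the helper name `missing_service_keys` is hypothetical, and treating all five keys from the example above as required is an assumption (PostgreSQL itself accepts service entries with fewer keys, for instance when the password comes from elsewhere).

```python
import configparser

# Keys that appear in the example service file above.
# Treating all of them as required is an assumption of this sketch.
REQUIRED_KEYS = ("host", "port", "user", "password", "dbname")


def missing_service_keys(path, service):
    """Return the keys from REQUIRED_KEYS that the [service] section lacks.

    Raises KeyError if the file has no such section (or cannot be read).
    """
    # Disable interpolation so that a '%' in a password is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(path)
    if not parser.has_section(service):
        raise KeyError(f"no [{service}] section in {path}")
    return [key for key in REQUIRED_KEYS if not parser.has_option(service, key)]
```

Calling `missing_service_keys(Path.home() / ".pg_service.conf", "ted-exp")` (with `Path` imported from `pathlib`) returns an empty list when every key from the example is present.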

Executing experiments requires dataset details to be present in the `dataset`
table. Visit our
[Datasets repository](https://frosch.cosy.sbg.ac.at/mpawlik/ted-datasets)
to learn how we obtain datasets. Use the `--service service` option of the
`statistics/statistics.py` script to register a dataset in the `dataset` table.


## Executing

We use [Python3](https://www.python.org/) to execute the experiments.

### TED Join

The script `src/join_algs/join_algs_experiments.py` executes tree similarity
join experiments.

It uses a JSON config file to specify the experiment parameters. Example config
files can be found in the `configs/icde2019` directory.

An example experiment can be executed as follows.

```bash
python3 src/join_algs/join_algs_experiments.py --config configs/icde2019/bolzano.json --dataset_path /path_to/ted-datasets/ --service service
```
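
To run every config file in a directory, the single invocation above can be generated once per file. The sketch below only builds and prints the commands; it does not start any experiment. The helper names `build_commands` and `print_commands` are hypothetical and not part of this repository. Only the `--config`, `--dataset_path`, and `--service` options come from the example above.

```python
import shlex
from pathlib import Path


def build_commands(script, config_dir, dataset_path, service):
    """Return one python3 command (as an argument list) per *.json config.

    Only files directly inside config_dir are used; subdirectories
    are not searched.
    """
    commands = []
    for config in sorted(Path(config_dir).glob("*.json")):
        commands.append([
            "python3", script,
            "--config", str(config),
            "--dataset_path", dataset_path,
            "--service", service,
        ])
    return commands


def print_commands(commands):
    """Print each command as a shell-quoted line for inspection."""
    for cmd in commands:
        print(shlex.join(cmd))
```

After inspecting the printed commands, each argument list can be passed to `subprocess.run(cmd, check=True)` from the project's root directory to execute the experiments one after another.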

### TED Algorithms

The script `src/ted_algs/ted_algs_experiments.py` executes tree edit distance
algorithm experiments.

It uses a JSON config file to specify the experiment parameters. Example config
files can be found in the `configs/icde2019/upperbound` directory.

An example experiment can be executed as follows.

```bash
python3 src/ted_algs/ted_algs_experiments.py --config configs/icde2019/upperbound/sentiment.json --dataset_path /path_to/ted-datasets/ --service service
```