Mateusz Pawlik committed
# Tree Edit Distance similarity join - datasets scripts

This repository contains all resources to download and process the datasets for
tree similarity join experiments.

**We do not store the datasets**, only the scripts to obtain and prepare them.

## Dataset descriptions

Currently we support the following datasets:
- **bozen** - Bozen streets
- **dblp** - DBLP

The details about each dataset can be found in the corresponding README files.

## Repository organisation

Each dataset and its corresponding scripts belong to a separate directory with a name identifying the dataset.

## Expected output

Each output dataset must satisfy the following requirements:
- The output dataset must be a single text file with one tree per line.
- The trees must be in bracket notation.
- The trees must be sorted by size.
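
The requirements above can be checked mechanically. The following is a minimal sketch, not part of the repository's scripts. It assumes that bracket notation means the curly-brace form `{label{child}{child}}`, that labels contain no unescaped braces, that "size" is the number of nodes, and that the sort order is ascending. The function names `tree_size` and `check_dataset` are hypothetical.

```python
def tree_size(tree: str) -> int:
    """Return the node count of one tree in bracket notation.

    Assumes the curly-brace form, e.g. {a{b}{c}}, with no escaped braces
    inside labels. Raises ValueError if the brackets are unbalanced or
    the line holds anything other than exactly one tree.
    """
    depth = 0
    size = 0
    for ch in tree:
        if ch == "{":
            if depth == 0 and size > 0:
                raise ValueError("more than one tree on the line")
            depth += 1
            size += 1  # every opening bracket starts one node
        elif ch == "}":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced closing bracket")
        elif depth == 0:
            raise ValueError("text outside of the root brackets")
    if depth != 0:
        raise ValueError("unbalanced opening bracket")
    if size == 0:
        raise ValueError("empty line")
    return size


def check_dataset(lines) -> bool:
    """Return True if every line is one tree and sizes never decrease."""
    previous = 0
    for line in lines:
        size = tree_size(line.rstrip("\n"))
        if size < previous:
            return False
        previous = size
    return True
```

An open file object can be passed directly as `lines`, for example `check_dataset(open(path))`, where `path` points to an output dataset file.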