# Tree Edit Distance similarity join - datasets scripts
This repository contains all resources to download and process the datasets for
tree similarity join experiments.
**We do not store the datasets**, only the scripts to obtain and prepare them.
## Datasets description
Currently we support the following datasets:
- **bozen** - Bozen streets
- **dblp** - DBLP
The details about each dataset can be found in the corresponding README files.
## Repository organisation
Each dataset and its corresponding scripts live in a separate directory named after the dataset.
## Expected output
Each output dataset must satisfy the following requirements:
- The output dataset must be a single text file with one tree per line.
- The trees must be in bracket notation.
- The trees must be sorted by size.
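
The requirements above can be illustrated with a minimal Python sketch. It is not part of the repository's scripts; the function names `tree_size` and `sort_trees_by_size` are hypothetical. It assumes that bracket notation looks like `{a{b}{c}}`, that node labels contain no unescaped braces, that tree size means the number of nodes, and that the order is ascending.

```python
def tree_size(line: str) -> int:
    """Return the number of nodes in a bracket-notation tree.

    Assumption: every node opens with '{' and closes with '}', and a
    backslash escapes the character that follows it inside a label.
    """
    size = 0
    depth = 0
    escaped = False
    for ch in line.strip():
        if escaped:
            escaped = False  # this character is part of a label
            continue
        if ch == "\\":
            escaped = True
        elif ch == "{":
            size += 1
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced brackets")
    if depth != 0 or size == 0:
        raise ValueError("unbalanced brackets or empty tree")
    return size


def sort_trees_by_size(lines):
    """Drop blank lines and return the trees sorted by node count (ascending)."""
    trees = [line.strip() for line in lines if line.strip()]
    return sorted(trees, key=tree_size)


if __name__ == "__main__":
    sample = ["{a{b}{c}}", "{x}", "{p{q}}"]
    for tree in sort_trees_by_size(sample):
        print(tree)
```

Writing the returned list to a text file, one element per line, would yield a file in the shape described above.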