Mateusz Pawlik committed
# Tree Edit Distance similarity join - datasets scripts

This repository contains all resources to download and process the datasets for
tree similarity join experiments.

**We do not store the datasets**, only the scripts to obtain and prepare them.

## Datasets description

Currently we support the following datasets:
- **bozen** - Bozen streets
- **dblp** - DBLP

The details about each dataset can be found in the corresponding README files.

## Repository organisation

Each dataset and its corresponding scripts belong to a separate directory with a name identifying the dataset.

## Expected output

Each output dataset must satisfy the following requirements:
- The output dataset must be a single text file with one tree per line.
- The trees must be in bracket notation.
- The trees must be sorted by size.
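
The requirements above can be checked with a short script. The following is a minimal sketch, not part of this repository. It assumes that bracket notation means the curly-brace form (for example `{a{b}{c}}`), that tree size is the number of nodes, and that sorting is in ascending order. The function names `tree_size` and `is_valid_dataset` are hypothetical, and escaped braces inside labels are not handled.

```python
def tree_size(line):
    """Return the number of nodes of a tree in curly-brace bracket notation.

    Raises ValueError if the line is not a single well-formed tree.
    Assumption: labels contain no (escaped) braces.
    """
    depth = 0
    size = 0
    for ch in line:
        if ch == "{":
            if depth == 0 and size > 0:
                raise ValueError("more than one tree on the line")
            depth += 1
            size += 1  # every opening brace starts a new node
        elif ch == "}":
            depth -= 1
            if depth < 0:
                raise ValueError("closing brace without a matching opening brace")
        elif depth == 0:
            raise ValueError("text outside of the root brackets")
    if depth != 0:
        raise ValueError("unclosed opening brace")
    if size == 0:
        raise ValueError("empty line")
    return size


def is_valid_dataset(lines):
    """Check that every line is one tree and that sizes never decrease."""
    sizes = [tree_size(line.rstrip("\n")) for line in lines]
    return all(a <= b for a, b in zip(sizes, sizes[1:]))


if __name__ == "__main__":
    sample = ["{a}", "{a{b}}", "{a{b}{c}}"]
    print(is_valid_dataset(sample))
```

To check a real output file, the lines of the opened file can be passed directly, for example `is_valid_dataset(open(path))`, where `path` is the location of the prepared dataset.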