Commit c43603ce authored by Mateusz Pawlik

README.md: Added requirements of the expected output.

parent 125a2873
+14 −0
# Tree Edit Distance similarity join - datasets scripts

This repository contains all resources to download and process the datasets for
tree similarity join experiments.

**We do not store the datasets**, only the scripts to obtain and prepare them.

## Datasets description

Currently we support the following datasets:
- **bozen** - Bozen streets
- **dblp** - DBLP

The details about each dataset can be found in the corresponding README files.

## Repository organisation

Each dataset and its corresponding scripts belong to a separate directory with a name identifying the dataset.

## Expected output

Each output dataset must satisfy the following requirements:
- The output dataset must be a single text file with one tree per line.
- The trees must be in bracket notation.
- The trees must be sorted by size.
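
The requirements above can be checked mechanically. The following is a minimal sketch, not part of the repository's scripts. It assumes that bracket notation uses curly braces (for example `{a{b}{c}}`), that tree size means the number of nodes, that "sorted by size" means non-decreasing order, and that labels contain no escaped braces. The names `tree_size` and `check_dataset` are hypothetical.

```python
def tree_size(line: str) -> int:
    """Return the number of nodes of one bracket-notation tree.

    Assumption: every node is written as {label children...}, so the
    number of opening braces equals the number of nodes.
    Raises ValueError if the line is not exactly one well-formed tree.
    """
    depth = 0
    nodes = 0
    for ch in line:
        if ch == "{":
            if depth == 0 and nodes > 0:
                raise ValueError("more than one tree on the line")
            depth += 1
            nodes += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced closing bracket")
        elif depth == 0:
            raise ValueError("text outside the root brackets")
    if nodes == 0 or depth != 0:
        raise ValueError("empty line or unbalanced opening bracket")
    return nodes


def check_dataset(lines) -> int:
    """Check one-tree-per-line, bracket notation, and size ordering.

    Assumption: sizes must be non-decreasing from the first line to the last.
    Returns the number of trees checked.
    """
    previous = 0
    count = 0
    for number, line in enumerate(lines, start=1):
        try:
            size = tree_size(line.rstrip("\n"))
        except ValueError as error:
            raise ValueError(f"line {number}: {error}") from None
        if size < previous:
            raise ValueError(f"line {number}: trees are not sorted by size")
        previous = size
        count += 1
    return count
```

To check a prepared file, pass an open file object, for example `check_dataset(open(path, encoding="utf-8"))`, where `path` points to the output dataset.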