Commit 8e732b9b authored by Mateusz Pawlik

Tidied up the main README.

parent 739a838e
+11 −7
-# Tree Edit Distance similarity join - datasets scripts
+# Datasets for tree edit distance experiments

-This repository contains all resources to download and process the datasets for
-tree similarity join experiments.
+This repository contains all resources to acquire datasets for experimenting
+on tree edit distance algorithms.

 **We do not store the datasets**, only the scripts to obtain and prepare them.

 ## Datasets description

 Currently we support the following datasets:
-- **bozen** - Bozen streets
-- **dblp** - DBLP
-
-The details about each dataset can be found in the corresponding README files.
+- **Bolzano** - Residential addresses in the city of Bolzano.
+- **DBLP** - Bibliographic XML data.
+- **Python** - Abstract syntax trees of Python source code in JSON.
+- **Sentiment** - Semantic trees of movie reviews in the PennTreeBank format.
+- **Swissprot** - Protein sequence data in XML.
+
+The details about each dataset can be found in the README files in the
+datasets subdirectories.

## Repository organisation