Mateusz Pawlik / ted-datasets
Commit 8e732b9b, authored Oct 22, 2018 by Mateusz Pawlik
Tidied up the main README.
Parent: 739a838e
Changes: 1 (README.md)
-# Tree Edit Distance similarity join - datasets scripts
+# Datasets for tree edit distance experiments
-This repository contains all resources to download and process the datasets for
-tree similarity join experiments.
+This repository contains all resources to acquire datasets for experimenting
+on tree edit distance algorithms.
 **We do not store the datasets**, only the scripts to obtain and prepare them.
 ## Datasets description
 Currently we support the following datasets:
-- **bozen** - Bozen streets
-- **dblp** - DBLP
-The details about each dataset can be found in the corresponding README files.
+- **Bolzano** - Residential addresses in the city of Bolzano.
+- **DBLP** - Bibliographic XML data.
+- **Python** - Abstract syntax trees of Python source code in JSON.
+- **Sentiment** - Semantic trees of movie reviews in the PennTreeBank format.
+- **Swissprot** - Protein sequence data in XML.
+The details about each dataset can be found in the README files in the
+datasets subdirectories.
 ## Repository organisation
 ...