Mateusz Pawlik / ted-datasets

Commit 0412c8e9, authored Oct 22, 2018 by Mateusz Pawlik

Added statistics.

Parent: 8e732b9b

Changes: 1 (README.md)

@@ -17,6 +17,16 @@ Currently we support the following datasets:
The details about each dataset can be found in the README files in the
datasets subdirectories.

## Statistics

Dataset   | Number of trees | Avg. tree size | Min tree size | Max tree size | Number of distinct labels
----------|-----------------|----------------|---------------|---------------|--------------------------
Bolzano   | 299             | 166            | 2             | 2105          | 592
DBLP      | 3934134         | 25             | 8             | 2986          | 14664605
Python    | 150000          | 946            | 1             | 46481         | 3523697
Sentiment | 9645            | 37             | 3             | 103           | 19470
Swissprot | 556196          | 862            | 101           | 48286         | 11439467
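
The columns of the statistics table could be computed along the following lines. This is only a minimal sketch, not the script used in this repository. It assumes that trees are stored as strings in bracket notation such as `{a{b}{c}}`, that tree size means the number of nodes, and that labels never contain braces (escaped braces are not handled). The names `DatasetStats`, `node_labels`, and `dataset_stats` are hypothetical.

```python
from collections.abc import Iterable
from dataclasses import dataclass


@dataclass
class DatasetStats:
    num_trees: int
    avg_size: float
    min_size: int
    max_size: int
    distinct_labels: int


def node_labels(tree: str) -> list[str]:
    """Return the label of every node in a bracket-notation tree.

    Assumption: each node is written as {label<children>} and labels
    contain no '{' or '}' characters.
    """
    labels = []
    i = 0
    while i < len(tree):
        if tree[i] == "{":
            # The label runs from just after '{' up to the next brace.
            j = i + 1
            while j < len(tree) and tree[j] not in "{}":
                j += 1
            labels.append(tree[i + 1:j])
            i = j
        else:
            i += 1
    return labels


def dataset_stats(trees: Iterable[str]) -> DatasetStats:
    """Compute the table's columns for one dataset of trees."""
    sizes = []
    labels = set()
    for tree in trees:
        tree_labels = node_labels(tree)
        sizes.append(len(tree_labels))  # tree size = number of nodes
        labels.update(tree_labels)      # distinct labels across all trees
    if not sizes:
        raise ValueError("dataset contains no trees")
    return DatasetStats(
        num_trees=len(sizes),
        avg_size=sum(sizes) / len(sizes),
        min_size=min(sizes),
        max_size=max(sizes),
        distinct_labels=len(labels),
    )
```

For example, `dataset_stats(["{a{b}{c}}", "{a{d}}", "{x}"])` reports 3 trees with sizes between 1 and 3, an average size of 2.0, and 5 distinct labels. Whether the averages in the table were rounded or truncated is not stated, so the sketch returns the unrounded mean.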

## Repository organisation

Each dataset and its corresponding scripts live in a separate directory named after the dataset.