Commit be9379a5 authored by Mateusz Pawlik

Iteration over README files. Tried to unify.

parent 0412c8e9
+3 −0
@@ -19,6 +19,9 @@ datasets subdirectories.

## Statistics

The `statistics` subdirectory contains scripts to summarize the data. The table
below shows the statistics of our datasets.

Dataset   | Number of trees | Avg. tree size | Min tree size | Max tree size | Number of distinct labels
----------|-----------------|----------------|---------------|---------------|-------------------------- 
Bolzano   | 299             | 166            | 2             | 2105          | 592
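
The columns of the table can be reproduced from a bracket-notation file with a few lines of Python. The following is only a minimal sketch, not the script shipped in the `statistics` subdirectory. It assumes one tree per line, curly braces as delimiters, and a backslash as the escape character inside labels. The function names `tree_labels` and `summarize` are hypothetical.

```python
def tree_labels(tree: str) -> list[str]:
    """Return the node labels of one bracket-notation tree, in document order."""
    labels, current, i = [], None, 0
    while i < len(tree):
        ch = tree[i]
        if ch == "\\" and i + 1 < len(tree):
            # Assumption: a backslash escapes the next character of a label.
            if current is not None:
                current.append(tree[i + 1])
            i += 2
            continue
        if ch in "{}":
            # A brace ends the label that is currently being read, if any.
            if current is not None:
                labels.append("".join(current))
            # An opening brace starts a new node; a closing brace does not.
            current = [] if ch == "{" else None
        elif current is not None:
            current.append(ch)
        i += 1
    return labels


def summarize(trees: list[str]) -> dict:
    """Compute the per-dataset statistics listed in the table columns."""
    sizes, distinct = [], set()
    for tree in trees:
        labels = tree_labels(tree)
        sizes.append(len(labels))  # tree size = number of nodes
        distinct.update(labels)
    return {
        "trees": len(sizes),
        "avg_size": sum(sizes) / len(sizes),
        "min_size": min(sizes),
        "max_size": max(sizes),
        "distinct_labels": len(distinct),
    }


# Two small hand-written trees, not taken from any of the datasets.
sample = ["{a{b}{c}}", "{a{b{d}{e}}}"]
print(summarize(sample))
```

The averages in the table appear to be rounded to integers, whereas this sketch returns the unrounded mean.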
+28 −0
# Bolzano

Residential addresses in the city of Bolzano. A tree represents all addresses
of a street (root: street name, root-leaf path: address).

## Source

https://dbresearch.uni-salzburg.at/projects/pq-gram-ordered-labeled-trees/description.php

## Conversion details

The original trees are in the bracket notation. We only remove the tree
identifiers.

## Dependencies

- **wget**
- **unzip**
- **iconv**
- **sed**
- **awk**

## Steps

Execute the following to download and prepare the dataset.
```bash
./download_prepare.sh
```
+9 −3
# DBLP

## Short description

Computer science bibliography. It can be downloaded as a single large XML file.

## Source

https://dblp2.uni-trier.de/faq/How+can+I+download+the+whole+dblp+dataset.html

## Conversion details

We include all entries except “www” elements. These are small subtrees
that all match each other for thresholds ≥ 3, which blows up the join result
by billions of pairs.

- Each tag is a node with the tag's name as its label.
- Tag nesting is converted to parent-child relationships.
- A tag's *content* is a child node of the tag, with the *content* as its label.
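
The three rules above can be sketched in Python with the standard `xml.etree.ElementTree` module. This is an illustration, not the repository's conversion script. The helper names `escape` and `to_bracket`, the backslash escaping of braces, and the sample record are all assumptions. The sketch ignores XML attributes and text that follows a nested tag, and the full DBLP file additionally needs its DTD to resolve character entities.

```python
import xml.etree.ElementTree as ET


def escape(label: str) -> str:
    # Assumption: braces and backslashes inside labels are backslash-escaped.
    return label.replace("\\", "\\\\").replace("{", "\\{").replace("}", "\\}")


def to_bracket(element: ET.Element) -> str:
    """Serialize an XML element as a bracket-notation tree."""
    # Each tag is a node with the tag's name as its label.
    parts = ["{", escape(element.tag)]
    text = (element.text or "").strip()
    if text:
        # A tag's content becomes a child node labeled with that content.
        parts.append("{" + escape(text) + "}")
    for child in element:
        # Tag nesting becomes a parent-child relationship.
        parts.append(to_bracket(child))
    parts.append("}")
    return "".join(parts)


# Hypothetical record, written by hand for illustration.
record = ET.fromstring(
    "<article><author>A. Author</author><title>A Title</title></article>"
)
print(to_bracket(record))
```

For the hand-written record, the result is a root node `article` with the children `author` and `title`, each of which has a single child holding its text content.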
@@ -64,5 +70,5 @@ awk '!/{www{key{homepages/' dblp_sorted.bracket > dblp_no_www_sorted.bracket

## Troubleshooting

- Encoding
### Encoding
In case of an encoding error, follow the steps on this webpage: [https://www.thomas-krenn.com/de/wiki/Perl_warning_Setting_locale_failed_unter_Debian](https://www.thomas-krenn.com/de/wiki/Perl_warning_Setting_locale_failed_unter_Debian).
+6 −3
# Python Abstract Syntax Trees

## Short description
Abstract syntax trees of 150k Python programs, collected and curated by the
SRI Group of ETH Zuerich.

Abstract Syntax trees of 150k Python programs found on Github. Data provided by SRI Group of ETH Zuerich.
## Source

https://www.sri.inf.ethz.ch/py150

## Conversion details

@@ -13,7 +16,7 @@ Abstract Syntax trees of 150k Python programs found on Github. Data provided by

- **Python3**
  https://www.python.org/downloads/
- **argparse**
- **argparse** module of Python3
- **wget**
- **tar**
+4 −2
# Sentiment

## Short description
Syntax trees of movie ratings from the Stanford NLP Group.

Syntax trees of movie ratings.
## Source

https://nlp.stanford.edu/sentiment/

## Conversion details
