Commit f5ccdceb authored by Mateusz Pawlik

Documenting DBLP: RAM and runtime estimates.

parent 791e1282
@@ -28,51 +28,27 @@ billions of pair.
https://www.python.org/downloads/
- **wget**
- **gzip**
- **awk**
## Steps
**For repeatability, a specific version of the data is downloaded (hardcoded in ``download.sh`` and ``dblp-to-bracket.py``).**
Execute to download all necessary files.
```bash
./download.sh
```
## RAM requirements
Execute to convert the raw data file into bracket notation. **Takes around 10 minutes on an i5 laptop.**
```bash
python dblp_to_bracket.py
```
Processing the DBLP dataset currently requires **16 GB** of RAM.
**Execute to get a subset only.**
```bash
python random_lines.py 100000 dblp.bracket > dblp_random_100k.bracket
```
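As an illustration of how a fixed-size random subset can be drawn without loading the whole file into memory, here is a minimal sketch using reservoir sampling. It is not the code of ``random_lines.py``, whose implementation is not shown here; the function name `sample_lines` and the `seed` parameter are hypothetical.

```python
import random
from typing import Iterable, List, Optional


def sample_lines(lines: Iterable[str], k: int, seed: Optional[int] = None) -> List[str]:
    """Return up to k lines chosen uniformly at random (reservoir sampling).

    Hypothetical helper for illustration; not taken from random_lines.py.
    """
    rng = random.Random(seed)
    reservoir: List[str] = []
    for i, line in enumerate(lines):
        if i < k:
            # Fill the reservoir with the first k lines.
            reservoir.append(line)
        else:
            # Replace a stored line with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = line
    return reservoir
```

Passing an open file object as `lines` streams the input once, so only the `k` selected lines are held in memory. The output order is not the input order, which does not matter if the subset is sorted afterwards.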
## Steps
Execute to sort the dataset by tree size.
```bash
./sort_dataset.sh dblp.bracket
```
**For repeatability, a specific version of the data is downloaded (hardcoded in
``download.sh`` and ``dblp-to-bracket.py``).**
Execute to remove the homepage entries.
Execute the following to download and prepare the dataset.
```bash
sed '/{www{key{homepages/d' dblp_sorted.bracket > dblp_no_www_sorted.bracket
```
or
```bash
awk '!/{www{key{homepages/' dblp_sorted.bracket > dblp_no_www_sorted.bracket
./download_prepare.sh
```
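For readers without `sed` or `awk`, the homepage filter above can also be sketched in Python. This is only an equivalent of the one-liners under the assumption that a plain substring match on `{www{key{homepages` is what they perform; the names `HOMEPAGE_MARKER` and `drop_homepages` are hypothetical and not part of the repository.

```python
from typing import Iterable, Iterator

# Marker taken from the sed/awk commands; lines containing it are homepage entries.
HOMEPAGE_MARKER = "{www{key{homepages"


def drop_homepages(lines: Iterable[str]) -> Iterator[str]:
    """Yield every line that does not contain the homepage marker."""
    for line in lines:
        if HOMEPAGE_MARKER not in line:
            yield line
```

A typical use would be to open `dblp_sorted.bracket` for reading, pass the file object to `drop_homepages`, and write the yielded lines to `dblp_no_www_sorted.bracket`. Because the function is a generator, the file is processed line by line.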
## Estimated time
**(Optional)** Execute to delete all downloaded files. It leaves only the output dataset files.
```bash
./tidy-up.sh
```
On a 2.40 GHz Intel Xeon CPU, it takes around **15 minutes**.
## Troubleshooting
### Encoding
In case of an encoding error, follow the steps on this webpage: [https://www.thomas-krenn.com/de/wiki/Perl_warning_Setting_locale_failed_unter_Debian](https://www.thomas-krenn.com/de/wiki/Perl_warning_Setting_locale_failed_unter_Debian).
## Estimated time
Partially listed in Steps. Total time to be measured.
\ No newline at end of file
In case of an encoding error, follow the steps on this webpage: [https://www.thomas-krenn.com/de/wiki/Perl_warning_Setting_locale_failed_unter_Debian](https://www.thomas-krenn.com/de/wiki/Perl_warning_Setting_locale_failed_unter_Debian).
\ No newline at end of file