Commit 0bd99fec authored by Mateusz Pawlik

dblp: Added command for removing homepage entries.

parent 857a4f43
@@ -37,14 +37,19 @@ Execute to convert the raw data file into bracket notation. **Takes around 10 mi
Sample the dataset. **We perform a join on a subset only.**
**Execute to get a subset only.**
python 100000 dblp.bracket > dblp_random_100k.bracket
Execute to sort the dataset by tree size.
./ dblp_random_100k.bracket
./ dblp.bracket
Execute to remove the homepage entries.
sed '/{www{key{/d' dblp_sorted.bracket > dblp_no_www_sorted.bracket
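To illustrate what the `sed` filter above does, here is a minimal sketch on a tiny hand-made file. The three sample trees and the `sample*.bracket` file names are hypothetical and only mimic the bracket notation; they are not taken from the real dataset. The assumption is that each tree occupies one line, so deleting a matching line removes the whole homepage entry.

```shell
# Hypothetical three-line sample in bracket notation (not real dblp data).
printf '%s\n' \
  '{article{key{journals/x/A1}}{title{Some title}}}' \
  '{www{key{homepages/p/Person}}{title{Home Page}}}' \
  '{inproceedings{key{conf/y/B2}}{title{Another title}}}' > sample.bracket

# Same filter as above: delete every line containing the literal text "{www{key{".
sed '/{www{key{/d' sample.bracket > sample_no_www.bracket

# Report how many lines (trees) the filter removed.
before=$(wc -l < sample.bracket)
after=$(wc -l < sample_no_www.bracket)
echo "removed: $((before - after))"
```

Running the same line count on `dblp_sorted.bracket` and `dblp_no_www_sorted.bracket` gives a quick check of how many homepage entries were dropped.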
**(Optional)** Execute to delete all downloaded files, leaving only the output dataset files.