Mateusz Pawlik / ted-datasets
Commit 0bd99fec, authored May 02, 2018 by Mateusz Pawlik
dblp: Added command for removing homepage entries.
parent 857a4f43
Showing 1 changed file with 7 additions and 2 deletions
dblp/README.md (+7 -2)
@@ -37,14 +37,19 @@ Execute to convert the raw data file into bracket notation. **Takes around 10 mi
 python dblp_to_bracket.py
 ```

-Sample the dataset. **We perform a join on a subset only.**
+**Execute to get a subset only.**
 ```bash
 python random_lines.py 100000 dblp.bracket > dblp_random_100k.bracket
 ```

 Execute to sort the dataset by tree size.
 ```bash
-./sort_dataset.sh dblp_random_100k.bracket
+./sort_dataset.sh dblp.bracket
+```
+
+Execute to remove the homepage entries.
+```bash
+sed '/{www{key{/d' dblp_sorted.bracket > dblp_no_www_sorted.bracket
 ```

 **(Optional)** Execute to delete all downloaded files. It leaves only the output dataset files.
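As a quick sanity check of the new `sed` step, here is a minimal sketch on a made-up three-line sample. The file names `sample.bracket` and `sample_no_www.bracket` and the tree contents are hypothetical, not taken from the DBLP data; the sketch only assumes that each tree occupies one line and that homepage entries contain `{www{key{`, as the pattern in the commit suggests.

```shell
# Hypothetical sample in bracket notation, one tree per line.
printf '%s\n' \
  '{article{key{journals/x/A1}}{title{T1}}}' \
  '{www{key{homepages/b/B1}}{author{B}}}' \
  '{inproceedings{key{conf/y/C1}}{title{T2}}}' > sample.bracket

# Same filter as in the commit: delete every line that contains
# the literal text {www{key{ (braces are ordinary characters in a basic regex).
sed '/{www{key{/d' sample.bracket > sample_no_www.bracket

# Count the remaining trees; prints 2 for this sample.
grep -c '' sample_no_www.bracket
```

The pattern is unanchored, so it would also drop a line in which `{www{key{` appears below the root. If only root-level homepage entries should be removed, anchoring the pattern with `^` would restrict the match, again assuming one tree per line.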