Mateusz Pawlik
ted-datasets
Commit c4240f87, authored May 02, 2018 by Thomas Huetter
dblp/README.md: alternative dblp_no_www command
parent 0bd99fec
Showing 2 changed files with 6 additions and 1 deletion
dblp/README.md (+5 -0)
xmark/xmark_to_bracket.py (+1 -1)
dblp/README.md
@@ -51,6 +51,11 @@ Execute to remove the homepage entries.
 ```bash
 sed '/{www{key{/d' dblp_sorted.bracket > dblp_no_www_sorted.bracket
 ```
+or
+```bash
+awk '!/{www{key{/' dblp_sorted.bracket > dblp_no_www_sorted.bracket
+```
 **(Optional)** Execute to delete all downloaded files. It leaves only the output dataset files.
 ```bash
...
xmark/xmark_to_bracket.py
@@ -46,7 +46,7 @@ tree_id = 0
 for child in root:
     tree_id += 1
     # Printing simple progress.
-    if tree_id % 1000 0 == 0:
+    if tree_id % 1000 == 0:
         print("- Tree %s" % (tree_id))
     handler = XMarkContentHandler()
     lxml.sax.saxify(child, handler)
...
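Note on the dblp filter: both the `sed` and the `awk` command in dblp/README.md drop every line that contains the literal text `{www{key{`. For readers who prefer to stay in Python, as the xmark script does, the same filter could be sketched roughly as below. This is not part of the commit; the names `drop_www_entries` and `filter_file` are hypothetical, and the sketch assumes that each tree occupies exactly one line of the bracket file.

```python
from typing import Iterable, Iterator

# Literal substring that the sed/awk patterns in the README match.
WWW_MARKER = "{www{key{"


def drop_www_entries(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the lines that do not contain the homepage marker."""
    for line in lines:
        if WWW_MARKER not in line:
            yield line


def filter_file(src_path: str, dst_path: str) -> None:
    """Copy src_path to dst_path, skipping lines that contain the marker.

    Assumption: one tree per line, so dropping a line drops a whole entry.
    """
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        dst.writelines(drop_www_entries(src))
```

With the file names from the README, the call would be `filter_file("dblp_sorted.bracket", "dblp_no_www_sorted.bracket")`. Because the check is a plain substring test, it does not depend on how a particular `sed` or `awk` implementation treats `{` inside a regular expression.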