Commit c4240f87 authored by Thomas Huetter

dblp/README.md: alternative dblp_no_www command

parent 0bd99fec
@@ -51,6 +51,11 @@ Execute to remove the homepage entries.
 ```bash
 sed '/{www{key{/d' dblp_sorted.bracket > dblp_no_www_sorted.bracket
 ```
+or
+```bash
+awk '!/{www{key{/' dblp_sorted.bracket > dblp_no_www_sorted.bracket
+```
 **(Optional)** Execute to delete all downloaded files. It leaves only the output dataset files.
 ```bash
......
@@ -46,7 +46,7 @@ tree_id = 0
 for child in root:
     tree_id += 1
     # Printing simple progress.
-    if tree_id % 10000 == 0:
+    if tree_id % 1000 == 0:
         print("- Tree %s" % (tree_id))
     handler = XMarkContentHandler()
     lxml.sax.saxify(child, handler)
......
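Note on the README hunk: the `sed` and `awk` commands both drop every line that contains `{www{key{` and write the remaining lines to `dblp_no_www_sorted.bracket`. For readers who prefer to run this step from Python, here is a minimal sketch of the same filter. It assumes the pattern is meant as a literal substring match; the names `WWW_MARKER`, `drop_www_lines`, and `filter_file` are hypothetical and not part of the repository.

```python
from typing import Iterable, Iterator

# Assumption: the sed/awk pattern is intended as a literal substring match.
WWW_MARKER = "{www{key{"


def drop_www_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the lines that do not contain the homepage marker."""
    for line in lines:
        if WWW_MARKER not in line:
            yield line


def filter_file(src_path: str, dst_path: str) -> None:
    """Stream src_path to dst_path, skipping homepage (www) entries."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        dst.writelines(drop_www_lines(src))
```

Calling `filter_file("dblp_sorted.bracket", "dblp_no_www_sorted.bracket")` would mirror the shell commands. The file is read line by line, so the full dataset is never held in memory.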