Commit c4240f87 authored by Thomas Huetter

dblp/README.md: alternative dblp_no_www command

parent 0bd99fec
+5 −0
````diff
@@ -51,6 +51,11 @@ Execute to remove the homepage entries.
 ```bash
 sed '/{www{key{/d' dblp_sorted.bracket > dblp_no_www_sorted.bracket
 ```
+or
+```bash
+awk '!/{www{key{/' dblp_sorted.bracket > dblp_no_www_sorted.bracket
+```
+
 
 **(Optional)** Execute to delete all downloaded files. It leaves only the output dataset files.
 ```bash
````
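Both README commands keep every line except those containing the marker `{www{key{`. `sed '/{www{key{/d'` deletes matching lines, and `awk '!/{www{key{/'` prints only non-matching lines. Below is a minimal Python sketch of the same filter. It assumes the marker is matched as a plain substring. In the sed command this holds because `{` has no special meaning in a POSIX basic regular expression. `drop_www_entries` and `filter_file` are hypothetical helpers for illustration and are not part of the repository. The UTF-8 encoding is also an assumption.

```python
from typing import Iterable, Iterator

# Marker taken from the README commands. It is matched here as a plain
# substring (an assumption; in sed's basic regex syntax `{` is literal).
WWW_MARKER = "{www{key{"


def drop_www_entries(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the lines that do not contain WWW_MARKER.

    Same idea as `sed '/{www{key{/d'` (delete matching lines) and
    `awk '!/{www{key{/'` (print non-matching lines).
    """
    for line in lines:
        if WWW_MARKER not in line:
            yield line


def filter_file(src: str, dst: str) -> None:
    """Stream src to dst without the homepage lines (UTF-8 assumed)."""
    with open(src, encoding="utf-8") as fin:
        with open(dst, "w", encoding="utf-8") as fout:
            # Line-by-line streaming: the input never has to fit in memory.
            fout.writelines(drop_www_entries(fin))


if __name__ == "__main__":
    # Toy input lines for illustration only, not real dblp records.
    sample = ["{article{key{a}}}\n", "{www{key{b}}}\n", "{book{key{c}}}\n"]
    print("".join(drop_www_entries(sample)), end="")
```

Under the substring assumption, `filter_file("dblp_sorted.bracket", "dblp_no_www_sorted.bracket")` should produce essentially the same file as either shell command. Like `sed` and `awk`, the generator streams one line at a time.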
+1 −1
```diff
@@ -46,7 +46,7 @@ tree_id = 0
 for child in root:
     tree_id += 1
     # Printing simple progress.
-    if tree_id % 10000 == 0:
+    if tree_id % 1000 == 0:
         print("- Tree %s" % (tree_id))
     handler = XMarkContentHandler()
     lxml.sax.saxify(child, handler)
```
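This hunk only lowers the progress interval. The loop numbers the children of `root` from 1 and prints `- Tree <n>` whenever `n` is a multiple of the interval, so the message now appears every 1,000 trees instead of every 10,000. A minimal sketch of that counter pattern follows. `with_progress` and its `report` callback are hypothetical names introduced here. The standard library's `xml.etree.ElementTree` stands in for lxml only to keep the example dependency-free. The per-tree handler from the script is not reproduced.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def with_progress(
    items: Iterable[T],
    every: int = 1000,
    report: Callable[[str], object] = print,
) -> Iterator[Tuple[int, T]]:
    """Yield (tree_id, item) pairs with a 1-based tree_id, calling
    report("- Tree <id>") whenever tree_id is a multiple of `every`."""
    tree_id = 0
    for item in items:
        tree_id += 1
        # Same check as the script: with every=1000 (the new value) the
        # message appears ten times as often as with every=10000.
        if tree_id % every == 0:
            report("- Tree %s" % tree_id)
        yield tree_id, item


if __name__ == "__main__":
    # Toy document with 2,500 top-level children (not XMark data).
    root = ET.fromstring("<site>" + "<item/>" * 2500 + "</site>")
    for tree_id, child in with_progress(root, every=1000):
        pass  # per-tree processing would go here
```

Run as a script, the sketch prints `- Tree 1000` and `- Tree 2000`. With `every=10000`, the same 2,500-child input would print nothing. Passing `report` as a parameter makes the progress output easy to capture or silence.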