Mateusz Pawlik / ted-datasets
Commit e4873b59, authored Oct 23, 2018 by Mateusz Pawlik
Finalized Swissprot. Tested.
Parent: 4ec3f4f7
Changes: 2 files
swissprot/README.md
...
...
@@ -27,7 +27,8 @@ https://www.uniprot.org/downloads
## RAM requirements
-To be measured.
+The current way of processing Swissprot dataset requires **60GB** of RAM memory
+(60GB for conversion, 10GB for sorting).
## Steps
...
...
@@ -40,4 +41,5 @@ Execute the following to download and prepare the dataset.
## Estimated time
-To be measured.
+On an Intel Xeon 2.40GHz CPU, it takes around **50min**
+(including downloading).
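The commit does not show how the RAM and time figures above were obtained. As a rough illustration only, peak memory and elapsed time of a single processing step could be recorded along the following lines. The `measure` helper and the `build_list` stand-in are hypothetical and not part of the repository, and the sketch assumes Linux, where `ru_maxrss` is reported in kilobytes.

```python
import resource
import time


def measure(step, *args):
    """Run step(*args) and return (result, elapsed_seconds, peak_rss_kb)."""
    start = time.perf_counter()
    result = step(*args)
    elapsed = time.perf_counter() - start
    # ru_maxrss is the high-water mark of the whole process so far,
    # reported in kilobytes on Linux (assumption: Linux host).
    peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return result, elapsed, peak_kb


def build_list(n):
    # Hypothetical stand-in for a conversion or sorting step.
    return list(range(n))


result, elapsed, peak_kb = measure(build_list, 1_000_000)
print(f"{elapsed:.2f}s, peak RSS {peak_kb / 1024:.0f} MB")
```

Because `ru_maxrss` is a process-wide high-water mark, separate figures for conversion and sorting would require running each step in its own process.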
swissprot/swissprot_to_bracket.py
...
...
@@ -25,7 +25,7 @@ from lxml import etree
import lxml.sax
from xml.sax.handler import ContentHandler
-# This script converts DBLP from XML to bracket notation.
+# This script converts Swissprot from XML to bracket notation.
# NOTE: Filenames are hardcoded in this script.
...
...
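The body of `swissprot_to_bracket.py` is elided in this diff, so the following is only a minimal sketch of what converting XML to bracket notation with a SAX `ContentHandler` can look like, not the script's actual code. It uses the standard library's `xml.sax` parser rather than `lxml.sax` so that it runs without extra dependencies. The `BracketHandler` and `xml_to_bracket` names, the treatment of text as leaf nodes, the omission of attributes, and the backslash escaping of braces are all assumptions made for illustration.

```python
import xml.sax
from xml.sax.handler import ContentHandler


def escape(label):
    # Assumption: backslashes and braces inside labels are backslash-escaped.
    return label.replace("\\", "\\\\").replace("{", "\\{").replace("}", "\\}")


class BracketHandler(ContentHandler):
    """Builds a bracket-notation string such as {a{b}{c}} from SAX events."""

    def __init__(self):
        super().__init__()
        self.parts = []  # output fragments, joined at the end
        self.text = []   # character data buffered since the last tag

    def _flush_text(self):
        # Emit buffered non-whitespace text as a leaf node.
        text = "".join(self.text).strip()
        self.text = []
        if text:
            self.parts.append("{" + escape(text) + "}")

    def startElement(self, name, attrs):
        self._flush_text()
        self.parts.append("{" + escape(name))  # attributes are ignored here

    def characters(self, content):
        self.text.append(content)

    def endElement(self, name):
        self._flush_text()
        self.parts.append("}")

    def result(self):
        return "".join(self.parts)


def xml_to_bracket(xml_string):
    handler = BracketHandler()
    xml.sax.parseString(xml_string.encode("utf-8"), handler)
    return handler.result()


print(xml_to_bracket("<entry><name>P12345</name><gene/></entry>"))
# prints {entry{name{P12345}}{gene}}
```

Each opening tag emits `{label`, each closing tag emits `}`, so nesting in the XML maps directly onto nesting of braces.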