Mateusz Pawlik / ted-datasets

Commit 6e165826 authored May 02, 2018 by Thomas Huetter

fixed names in sentiment and bolzano streets

Parent: db8be93b
Showing 2 changed files with 5 additions and 5 deletions

bolzano-address-trees/download_prepare.sh  +3 -3
sentiment/download_prepare.sh  +2 -2
bolzano-address-trees/download_prepare.sh

@@ -23,15 +23,15 @@ cd original_data
 # prepare data for file L.trees
 # convert file into UTF-8 format | remove header | remove IDs | sort by number of nodes (equivalent to number of "{")
-iconv -f ISO-8859-1 -t "UTF-8" L.trees | tail -n +14 | sed 's/.*://' | awk '{print gsub("{","{"), $0}' | sort -n | cut -d ' ' -f2- > ../L_preprocessed.bracket
+iconv -f ISO-8859-1 -t "UTF-8" L.trees | tail -n +14 | sed 's/.*://' | awk '{print gsub("{","{"), $0}' | sort -n | cut -d ' ' -f2- > ../L_preprocessed.txt
 # prepare data for file R.trees
 # convert file into UTF-8 format | remove header | remove IDs | sort by number of nodes (equivalent to number of "{")
-iconv -f ISO-8859-1 -t "UTF-8" R.trees | tail -n +14 | sed 's/.*://' | awk '{print gsub("{","{"), $0}' | sort -n | cut -d ' ' -f2- > ../R_preprocessed.bracket
+iconv -f ISO-8859-1 -t "UTF-8" R.trees | tail -n +14 | sed 's/.*://' | awk '{print gsub("{","{"), $0}' | sort -n | cut -d ' ' -f2- > ../R_preprocessed.txt
 # prepare dataset with a single label
 # | remove non-bracket chars. | add single dummy label 'o' > save to file
 cat ../L_preprocessed.txt | sed 's/[^\{\}]//g' | sed 's/[\{]/\{o/g' > ../L_preprocessed_single_label.bracket
 # go back to the folder
-cd ..
\ No newline at end of file
+cd ..
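The `L.trees`/`R.trees` pipeline sorts trees with a decorate-sort-undecorate idiom: `awk` prefixes each line with the return value of `gsub` (the number of `{`, i.e. the number of nodes), `sort -n` orders on that prefix, and `cut` strips it again. The sketch below replays that pipeline, followed by the single-label step, on a hypothetical three-tree input. Assumptions not taken from the repository: the function names, the toy data, a 2-line header instead of the 13 lines skipped by `tail -n +14`, and no `iconv` step because the toy input is plain ASCII. It writes the regex as `/[{]/` because a bare `{` is not guaranteed to be read as a literal by every awk. Note that `sed 's/.*://'` is greedy and strips everything up to the last colon on a line, so the sketch assumes labels contain no colon.

```shell
#!/bin/sh
# Toy replay of the bolzano-address-trees preprocessing.
# Hypothetical (not from the repository): function names, toy input,
# HEADER_LINES=2 (the real script skips 13 lines), and no iconv step.

HEADER_LINES=2

# stdin: raw "ID:tree" lines after a header; stdout: trees sorted by node count
sort_by_node_count() {
  tail -n +"$((HEADER_LINES + 1))" |     # remove header
    sed 's/.*://' |                       # remove IDs (greedy: up to the LAST ':')
    awk '{print gsub(/[{]/, "{"), $0}' |  # prepend the number of "{" (= nodes)
    sort -n |                             # sort numerically on that prefix
    cut -d ' ' -f2-                       # drop the prefix, keep the whole tree
}

# stdin: bracket trees; stdout: same shapes with every label replaced by "o"
to_single_label() {
  sed 's/[^{}]//g' |   # remove non-bracket characters
    sed 's/[{]/{o/g'   # add the dummy label "o" after every "{"
}

workdir=$(mktemp -d)
printf '%s\n' '# header 1' '# header 2' \
  '1:{city{street one}{street two}}' '2:{city}' '3:{city{street three}}' \
  > "$workdir/L.trees"

# As in the fixed script: the second step reads the .txt file the first step writes.
sort_by_node_count < "$workdir/L.trees" > "$workdir/L_preprocessed.txt"
to_single_label < "$workdir/L_preprocessed.txt" > "$workdir/L_preprocessed_single_label.bracket"
cat "$workdir/L_preprocessed.txt" "$workdir/L_preprocessed_single_label.bracket"
rm -r "$workdir"
```

The final `cat` prints `{city}`, `{city{street three}}`, `{city{street one}{street two}}`, then `{o}`, `{o{o}}`, `{o{o}{o}}`, one per line. Before this commit the script wrote `L_preprocessed.bracket` but the single-label step read `L_preprocessed.txt`; the rename makes the two names agree.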
sentiment/download_prepare.sh

@@ -23,7 +23,7 @@ cd trees
 # prepare data for file L.trees
 # convert dev.txt and train.txt into UTF-8 format | replace ( by { | replace ) by } | remove whitespace before '{' | sort by number of nodes (equivalent to number of "{")
-iconv -f ISO-8859-1 -t "UTF-8" dev.txt train.txt | sed -e 's/(/{/g' | sed -e 's/)/}/g' | sed -E 's/[[:space:]]([{])/\1/g' | awk '{print gsub("{","{"), $0}' | sort -n | cut -d ' ' -f2- > ../sentiment.bracket
+iconv -f ISO-8859-1 -t "UTF-8" dev.txt train.txt | sed -e 's/(/{/g' | sed -e 's/)/}/g' | sed -E 's/[[:space:]]([{])/\1/g' | awk '{print gsub("{","{"), $0}' | sort -n | cut -d ' ' -f2- > ../sentiment_sorted.bracket
 # go back to the folder
-cd ..
\ No newline at end of file
+cd ..
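The sentiment pipeline first rewrites parenthesized trees into bracket notation (`(` to `{`, `)` to `}`, then drops the whitespace in front of each `{`) and then applies the same node-count sort, which is consistent with the new output name `sentiment_sorted.bracket`. The sketch below runs those stages on two hypothetical toy trees; the function name and the trees are assumptions, and `iconv` is omitted because the input is plain ASCII. `sed -E` is accepted by both GNU and BSD sed.

```shell
#!/bin/sh
# Toy replay of the conversion in sentiment/download_prepare.sh.
# Hypothetical (not from the repository): the function name, the two toy
# trees, and no iconv step (the toy input is plain ASCII).

# stdin: parenthesized trees; stdout: bracket trees sorted by node count
parens_to_sorted_brackets() {
  sed -e 's/(/{/g' |                      # replace ( by {
    sed -e 's/)/}/g' |                    # replace ) by }
    sed -E 's/[[:space:]]([{])/\1/g' |    # remove whitespace before "{"
    awk '{print gsub(/[{]/, "{"), $0}' |  # prepend the number of "{" (= nodes)
    sort -n |                             # sort numerically on that prefix
    cut -d ' ' -f2-                       # drop the prefix again
}

printf '%s\n' '(4 (3 Great) (2 (2 fun) (2 .)))' '(1 (1 Dull) (2 .))' |
  parens_to_sorted_brackets
```

This prints `{1{1 Dull}{2 .}}` (3 nodes) before `{4{3 Great}{2{2 fun}{2 .}}}` (5 nodes). Only the whitespace directly before a `{` is removed, so the space between a sentiment value and its word inside a leaf is kept.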
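Both scripts also change their final `cd ..`: the old version had no trailing newline (the `\ No newline at end of file` marker), the new one does. POSIX defines a line as ending in a newline, and `wc -l`, which counts newline characters, does not count an unterminated final line. A minimal sketch of a check, assuming a POSIX shell; the script name in the comment is hypothetical.

```shell
#!/bin/sh
# Report whether each file argument ends with a newline, e.g. (hypothetical name)
#   sh check_final_newline.sh bolzano-address-trees/download_prepare.sh sentiment/download_prepare.sh
# $(...) strips trailing newlines, so the substitution is empty when the
# last byte is "\n" (an empty file also passes).
ends_with_newline() {
  [ -z "$(tail -c 1 "$1")" ]
}

for f in "$@"; do
  if ends_with_newline "$f"; then
    echo "ok: $f"
  else
    echo "missing final newline: $f"
  fi
done
```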