Mateusz Pawlik / ted-datasets
Commit 6e165826
Authored May 02, 2018 by Thomas Huetter
fixed names in sentiment and bolzano streets
Parent: db8be93b
Changes: 2 files
bolzano-address-trees/download_prepare.sh
@@ -23,15 +23,15 @@ cd original_data
# prepare data for file L.trees
# convert file into UTF-8 format | remove header | remove IDs | sort by number of nodes (equivalent to number of "{")
- iconv -f ISO-8859-1 -t "UTF-8" L.trees | tail -n +14 | sed 's/.*://' | awk '{print gsub("{","{"), $0}' | sort -n | cut -d ' ' -f2- > ../L_preprocessed.bracket
+ iconv -f ISO-8859-1 -t "UTF-8" L.trees | tail -n +14 | sed 's/.*://' | awk '{print gsub("{","{"), $0}' | sort -n | cut -d ' ' -f2- > ../L_preprocessed.txt
# prepare data for file R.trees
# convert file into UTF-8 format | remove header | remove IDs | sort by number of nodes (equivalent to number of "{")
- iconv -f ISO-8859-1 -t "UTF-8" R.trees | tail -n +14 | sed 's/.*://' | awk '{print gsub("{","{"), $0}' | sort -n | cut -d ' ' -f2- > ../R_preprocessed.bracket
+ iconv -f ISO-8859-1 -t "UTF-8" R.trees | tail -n +14 | sed 's/.*://' | awk '{print gsub("{","{"), $0}' | sort -n | cut -d ' ' -f2- > ../R_preprocessed.txt
# prepare dataset with a single label
# | remove non-bracket chars. | add single dummy label 'o' > save to file
cat ../L_preprocessed.txt | sed 's/[^\{\}]//g' | sed 's/[\{]/\{o/g' > ../L_preprocessed_single_label.bracket
# go back to the folder
- cd ..
\ No newline at end of file
+ cd ..
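The sort and single-label steps in the script above can be sketched in isolation as follows. This is a minimal sketch, assuming one bracket-notation tree per line on standard input. The helper names `sort_by_node_count` and `single_label` and the sample trees are hypothetical and do not appear in the repository. `single_label` uses a simplified `sed` expression, not the exact one from the script.

```shell
#!/bin/sh

# Hypothetical helper: order trees by node count, smallest first.
# gsub() returns the number of "{" it replaced, which equals the number
# of nodes in bracket notation. The count is prepended to each line,
# used as the numeric sort key, and then cut off again.
sort_by_node_count() {
  awk '{ print gsub(/[{]/, "{"), $0 }' | sort -n | cut -d ' ' -f2-
}

# Hypothetical helper: keep only the tree shape and give every node
# the single dummy label "o".
single_label() {
  sed 's/[^{}]//g; s/{/{o/g'
}

# Made-up sample trees with 3, 1, and 2 nodes.
printf '%s\n' '{a{b}{c}}' '{x}' '{p{q}}' | sort_by_node_count
printf '%s\n' '{a{b}{c}}' | single_label
```

With the sample input, the first command prints `{x}`, `{p{q}}`, `{a{b}{c}}` on separate lines, and the second prints `{o{o}{o}}`. Because `cut -d ' ' -f2-` keeps everything after the first space, labels that contain spaces survive the round trip.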
sentiment/download_prepare.sh
@@ -23,7 +23,7 @@ cd trees
# prepare data for file L.trees
# convert dev.txt and train.txt into UTF-8 format | replace ( by { | replace ) by } | remove whitespace before '{' | sort by number of nodes (equivalent to number of "{")
- iconv -f ISO-8859-1 -t "UTF-8" dev.txt train.txt | sed -e 's/(/{/g' | sed -e 's/)/}/g' | sed -E 's/[[:space:]]([{])/\1/g' | awk '{print gsub("{","{"), $0}' | sort -n | cut -d ' ' -f2- > ../sentiment.bracket
+ iconv -f ISO-8859-1 -t "UTF-8" dev.txt train.txt | sed -e 's/(/{/g' | sed -e 's/)/}/g' | sed -E 's/[[:space:]]([{])/\1/g' | awk '{print gsub("{","{"), $0}' | sort -n | cut -d ' ' -f2- > ../sentiment_sorted.bracket
# go back to the folder
- cd ..
\ No newline at end of file
+ cd ..
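The parenthesis-to-brace conversion in the sentiment pipeline can be tried on a single line as follows. This is a minimal sketch, assuming input in parenthesized tree notation. The helper name `to_bracket` and the sample tree are hypothetical and not taken from the dataset.

```shell
#!/bin/sh

# Hypothetical helper: convert a parenthesized tree to bracket notation.
# The first sed runs in basic-regex mode, where "(" and ")" are literal.
# The second sed uses -E so that ([{]) is a capture group, and removes
# one whitespace character directly before each "{".
to_bracket() {
  sed -e 's/(/{/g' -e 's/)/}/g' | sed -E 's/[[:space:]]([{])/\1/g'
}

# Made-up sample tree.
printf '%s\n' '(3 (2 good) (4 movie))' | to_bracket
```

For the sample line this prints `{3{2 good}{4 movie}}`. Whitespace inside a label, such as the space in `2 good`, is left alone because only whitespace immediately before `{` is removed.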