Instructions
1. Go to https://www.newocr.com/
2. Upload one of the PDF files.
3. Click the “Preview” button.
4. Choose both English (first) and Amharic (second) as the “Recognition language(s)”.
5. Create an empty Google Docs document with the same name as the uploaded PDF file.
6. For each page of the uploaded PDF file (chosen using the drop-down menu):
a. select the first half of the left column of the page using the region selector.
b. click the “OCR” button and wait for the output (at the bottom of the page).
c. copy the OCR output and paste it into the Google Docs document.
d. repeat steps a, b and c for the second half of the left column (when pasting, append the output after the last line of the previous text).
e. repeat steps a, b, c and d for the right column of the page.
f. finally, edit the text as follows:
i. put each English word first (on its own line) and its Amharic meaning next (on a separate line). Additionally, delete any empty lines.
ii. do step i for the example sentences as well, but additionally add a terminal character (። or ? or !) after each Amharic sentence, depending on the terminal character used in the corresponding English sentence (. or ? or ! respectively). Don’t add any terminal character after phrases or words.
iii. check for any errors (in both English and Amharic characters) that the OCR may have introduced, and correct them using the PDF file as a reference. Additionally, remove the ፤, : or ፣ character between words in Amharic phrases and sentences, except when ፤ or ፣ is used to separate multiple Amharic meanings (words or phrases) for a given English word or phrase.
7. Download the document as a plain text (.txt) file.
8. Repeat steps 2 to 7 for each of the remaining PDF files.
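Most of the steps above are manual, but the mechanical parts of steps 6.f.i and 6.f.ii can also be re-checked on the .txt file from step 7. The following is a minimal Python sketch of such a post-pass, assuming the file already alternates English lines and their Amharic meanings as described in step 6.f.i. The names `TERMINALS`, `drop_empty_lines`, `add_amharic_terminals` and `fix_file` are hypothetical and not part of newocr.com or Google Docs. It treats any English line ending in ".", "?" or "!" as a sentence, so a word or abbreviation that happens to end in "." would get a spurious ። and still needs a manual check. Step 6.f.iii (OCR errors and separator removal) is deliberately left out, since deciding whether a ፤ or ፣ separates meanings needs human judgment.

```python
from pathlib import Path

# Hypothetical post-pass over the .txt file from step 7. It assumes the file
# already alternates lines as described in step 6.f.i:
# English line, its Amharic meaning, next English line, and so on.

# English terminal character -> Amharic terminal character (step 6.f.ii).
TERMINALS = {".": "።", "?": "?", "!": "!"}


def drop_empty_lines(text: str) -> list[str]:
    """Return the non-blank lines of `text`, stripped of surrounding whitespace."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def add_amharic_terminals(lines: list[str]) -> list[str]:
    """Append the matching terminal to each Amharic line whose English line
    ends in '.', '?' or '!', unless the Amharic line already has one.

    Heuristic (an assumption): an English line ending in one of those
    characters is treated as a sentence; other lines are left alone.
    """
    if len(lines) % 2 != 0:
        raise ValueError("expected English/Amharic line pairs (even line count)")
    amharic_endings = tuple(TERMINALS.values())
    result = []
    for english, amharic in zip(lines[0::2], lines[1::2]):
        terminal = TERMINALS.get(english[-1:])  # None if no terminal character
        if terminal and not amharic.endswith(amharic_endings):
            amharic += terminal
        result.extend([english, amharic])
    return result


def fix_file(path: str) -> None:
    """Rewrite a downloaded .txt file in place as UTF-8."""
    p = Path(path)
    # "utf-8-sig" also strips a leading byte-order mark, if the file has one.
    text = p.read_text(encoding="utf-8-sig")
    lines = add_amharic_terminals(drop_empty_lines(text))
    p.write_text("\n".join(lines) + "\n", encoding="utf-8")


if __name__ == "__main__":
    # Made-up sample entries for illustration, not taken from the dictionary.
    sample = "abandon\n\nመተው\nHe abandoned his car.\nመኪናውን ተወ\n"
    print("\n".join(add_amharic_terminals(drop_empty_lines(sample))))
```

On the sample, the empty line is dropped, "መተው" is left unchanged because "abandon" is a word, and "መኪናውን ተወ" becomes "መኪናውን ተወ።" because its English sentence ends in ".". The script only complements the manual edits; it does not replace the check against the PDF.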
NB:
The PDF files are available on Wikipedia under the “Creative Commons Attribution/ShareAlike License” at:
https://am.wikipedia.org/wiki/እንግሊዝኛ_አማርኛ_መዝገበ_ቃላት_1972