Instructions

Uploaded by

danielfeleke1996
Copyright
© All Rights Reserved

Procedures

1. Go to https://www.newocr.com/
2. Upload one of the pdf files.
3. Click the “Preview” button.
4. Choose both English (first) and Amharic (second) as “Recognition language(s)”.
5. Create an empty Google Docs document with the same name as the uploaded pdf file.
6. For each page of the uploaded pdf file (select the page using the drop-down menu):
   a. Select the first half of the left column of the page using the region selector.
   b. Click the “OCR” button and wait for the output (at the bottom of the page).
   c. Copy the OCR output and paste it into the Google Docs document.
   d. Repeat steps a, b and c for the second half of the left column (when pasting, append the output after the last line of the previous text).
   e. Repeat steps a, b, c and d for the right column of the page.
   f. Finally, edit the text as follows:
      i. Put each English word first (on one line) and the corresponding Amharic meaning next (on a separate line). Additionally, delete any empty line(s).
      ii. Do step i for the example sentences as well, but additionally add a terminal character (። or ? or !) after each Amharic sentence, depending on the terminal character used in the corresponding English sentence (. or ? or ! respectively). Don’t add any terminal character after phrases or words.
      iii. Check for any errors (in both English and Amharic characters) that the OCR may have made and correct them using the pdf file as a reference. Additionally, remove the ፤ or : or ፣ character between words in Amharic phrases and sentences, except when ፤ or ፣ is used to separate multiple Amharic meanings (words or phrases) for a given English word or phrase.
7. Download the document as a plain text (.txt) file.
8. Repeat steps 2-7 for the remaining pdf files.
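
The procedure above is manual, but the editing rules in step 6.f can be sketched in code to make them concrete. The following Python is only an illustration under stated assumptions: the function names are hypothetical, the Amharic sample text in the comments is made up, and a line is treated as a sentence simply because the English side ends in . or ? or !, which would misfire on an English entry that is an abbreviation ending in a period. The exception in step 6.f.iii (keeping ፤ or ፣ between multiple meanings) still needs a human decision, so the separator helper must only be applied to lines where removal is wanted.

```python
# Hypothetical helpers illustrating step 6.f; not part of the original procedure.

# English terminal character -> Amharic terminal character (step 6.f.ii).
TERMINALS = {".": "።", "?": "?", "!": "!"}


def drop_empty_lines(text: str) -> list[str]:
    """Step 6.f.i: keep only non-empty lines, trimmed of surrounding spaces."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def add_terminal(english: str, amharic: str) -> str:
    """Step 6.f.ii: end an Amharic sentence with the mark matching the English one.

    Words and phrases (English side has no terminal character) are returned
    without any terminal character.
    """
    english = english.rstrip()
    amharic = amharic.rstrip()
    if not english or english[-1] not in TERMINALS:
        return amharic
    # Assumption: any terminal mark the OCR already produced is replaced,
    # so the mark is never doubled.
    amharic = amharic.rstrip("።.?!").rstrip()
    return amharic + TERMINALS[english[-1]]


def strip_separators(amharic: str) -> str:
    """Step 6.f.iii: replace ፤ : ፣ between words with a single space.

    Do not call this on lines where ፤ or ፣ separates multiple meanings.
    """
    for ch in "፤:፣":
        amharic = amharic.replace(ch, " ")
    return " ".join(amharic.split())


if __name__ == "__main__":
    # Made-up sample pair, for illustration only.
    print(add_terminal("It is a cat.", "ድመት ነው"))
```

Even with such helpers, the comparison against the pdf file in step 6.f.iii remains a manual check.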

NB:
The pdf files are available on Wikipedia under the “Creative Commons Attribution/Share-Alike License” at:
https://am.wikipedia.org/wiki/እንግሊዝኛ_አማርኛ_መዝገበ_ቃላት_1972

The terms of the license can be found at:
https://creativecommons.org/licenses/by-sa/3.0/
