Yadav 2014
Abstract- A semantic network is a graphical notation for representing knowledge in the form of interconnected nodes and arcs. In this paper we propose a novel approach to constructing a semantic graph from a text document. Our approach considers all the nouns of a document and builds a semantic graph that represents the entire document. We think that our graph captures many properties of text documents and can be used for different applications in the field of text mining and NLP, such as keyword extraction and determining the nature of the document. Our approach to constructing a semantic graph is independent of any language. We performed an experimental analysis to validate our results for extracting the keywords of a document and deriving the nature of the graph. We present experimental results on the construction of the graph on the FIRE data set and present its application to keyword extraction and to commenting on the nature of a document.
Keywords- Semantic graph, Language Network, Keyword extraction, Nature of Document, Text mining, WordNet.
The result of this analysis can be used for various text mining applications. As a second objective, we discuss its importance for different text mining applications and present its application to two specific tasks: keyword extraction and finding the "nature of document".
Our paper is divided into five sections. Section II presents the related work done in the area of language networks. Section III is a theoretical description of our work, including all the steps to preprocess the document, and then presents a new approach for building a semantic graph from a text document. Section IV deals with experiments and results on two different data sets. The experiments validate the two applications of the language network on different data sets. Section V concludes our work and outlines future work; finally, Dataset-2 is attached in the Appendix.
II. RELATED WORK
A. Steps of Algorithms
Our approach takes a text file as input. First the file is preprocessed and tagged, and all nouns are extracted. The graph shows ontology-based relations among these nouns. We have used the WordNet ontology to find relations (synonymy, hypernymy/hyponymy and meronymy/holonymy) among the words. The algorithm for constructing the graph is as follows:
Step 1: Normalization and Stemming
First we normalize the text by transforming all capital letters to lowercase, so that two or more words are treated as the same if they differ only in capitalization. For example, "Nature" and "NATURE" are both transformed into "nature".
B. Proposed Algorithm
The proposed algorithm for construction of the semantic graph is described in Figure 1. Here a Relation-id is assigned only to distinguish edges for better analysis when needed; it does not carry the meaning of a weight. We have chosen Relation-ids 1, 2 and 3 for synonymy, member holonymy and hypernymy respectively.
In Section IV, we present exhaustive experimental work along with results for different applications of text mining. In these experiments we use a network analysis approach for identifying the nature of the graph and for keyword extraction.
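As a minimal sketch of the construction described in this section, the following Python fragment lowercases a word list and fills an upper-triangular relation matrix with Relation-ids. The lookup table `TOY_RELATIONS` is a hypothetical stand-in for WordNet queries, and the function names are assumptions made for illustration; it is not the authors' implementation.

```python
# Relation-ids as chosen in the text: 1 = synonymy, 2 = member holonymy, 3 = hypernymy.
SYNONYM, MEMBER_HOLONYM, HYPERNYM = 1, 2, 3

# Hypothetical stand-in for WordNet lookups (toy data, not taken from the paper).
TOY_RELATIONS = {
    frozenset({"flood", "deluge"}): SYNONYM,
    frozenset({"army", "soldier"}): MEMBER_HOLONYM,
    frozenset({"helicopter", "aircraft"}): HYPERNYM,
}


def normalize(words):
    """Lowercase every word so that 'Nature' and 'NATURE' both become 'nature'."""
    return [w.lower() for w in words]


def build_relation_matrix(wlist, relations=TOY_RELATIONS):
    """Return an n x n matrix whose upper triangle holds Relation-ids (0 = no relation)."""
    n = len(wlist)
    table = [[0] * n for _ in range(n)]
    for i in range(n):
        # The inner loop starts at j = i, so only the upper triangle is filled.
        for j in range(i, n):
            pair = frozenset({wlist[i], wlist[j]})
            table[i][j] = relations.get(pair, 0)
    return table


wlist = normalize(["Flood", "DELUGE", "Army", "soldier", "rain"])
table = build_relation_matrix(wlist)
```

With real data, the dictionary lookup would be replaced by a query against WordNet for each noun pair; the matrix layout stays the same.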
2014 International Conference on Issues and Challenges in Intelligent Computing Techniques (ICICT) 597
1. Preprocess the file: normalization and stemming.
2. Tag document terms using a POS tagger.
3. Extract nouns to obtain sets of concepts {C1, C2, C3, ..., Cn}.
4. Considering these concepts, build a list of words called wlist = {W1, W2, W3, ..., Wn}, with Wi ∈ Ci for 1 <= i <= n.
5. Construct a semantic graph, where wlist provides the vertices of the graph.
   5.1 Declare an n x n matrix table initialized with all 0 entries, and set i = j = 0.
   5.2 While (i < n) do
   5.3    for (j = i; j < n; j++) do
             If relation(Wi, Wj) is defined in WordNet, where Wi, Wj ∈ Ck {1 <= k <= n}
             then table[i][j] = Relation-id
          End for loop
          Increment the current candidate pointer i
       End while
Figure 1: Proposed Algorithm for construction of Semantic-Net
IV. EXPERIMENTS AND RESULTS
This section is organized in two parts: the first describes the dataset and tools used, and the second describes the experiments. The experiments conducted in our work are divided into three parts: construction of the semantic network for a text document, extraction of keywords from the semantic graph, and commenting on the nature of the document based on graph visualization.
... documents as described in the query relevance file (provided by FIRE). We tried to extract keywords from these relevant documents. Keywords were extracted by applying graph-theoretic measures such as degree, eccentricity and closeness centrality. The nodes with the highest values of these measures were considered keywords. We assume that if the extracted keywords are present in a query or are related to the query, then they have been correctly identified by the experiment. We performed experiments on 50 queries and found encouraging results. The result for documents related to Query 27 is shown in Table 1. The semantic graph for the selected documents is presented in Figure 2.
Query 27: Relation between India and China.
Query Description: Information about the relationship between India and China with regard to economy, diplomacy, science, technology and trade is relevant.
Documents selected: 1040913_frontpage_story_3751658.utf8, 1041016_nation_story_3889236.utf8, 1050207_nation_story_4346014.utf8, 1040909_opinion_story_3732586.utf8 and 1041006_opinion_story_3819642.utf8
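The keyword-extraction experiment in this section ranks nodes by graph-theoretic measures such as degree, eccentricity and closeness centrality. The sketch below computes these three measures with breadth-first search on a small undirected graph and returns the top-ranked nodes. The graph `TOY_GRAPH`, the helper names, and the choice to compute eccentricity and closeness within a node's own connected component are assumptions made for illustration, not details given in the paper.

```python
from collections import deque


def make_graph(nodes, edges):
    """Build a symmetric adjacency map {node: set(neighbors)}."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def bfs_distances(adj, source):
    """Shortest hop counts from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def node_measures(adj):
    """Degree, eccentricity and closeness per node (within its own component)."""
    measures = {}
    for node in adj:
        others = [d for n, d in bfs_distances(adj, node).items() if n != node]
        measures[node] = {
            "degree": len(adj[node]),
            "eccentricity": max(others) if others else 0,
            # Reachable-node count divided by total distance; 0.0 for isolated nodes.
            "closeness": len(others) / sum(others) if others else 0.0,
        }
    return measures


def top_keywords(adj, measure, k=3):
    """Nodes with the highest value of the measure; ties broken alphabetically."""
    scores = node_measures(adj)
    return sorted(adj, key=lambda n: (-scores[n][measure], n))[:k]


# Hypothetical toy graph, loosely themed on Query 27.
TOY_GRAPH = make_graph(
    ["india", "china", "trade", "economy", "market", "story"],
    [("india", "china"), ("india", "trade"), ("india", "economy"),
     ("china", "trade"), ("economy", "market")],
)
measures = node_measures(TOY_GRAPH)
by_degree = top_keywords(TOY_GRAPH, "degree", k=3)
```

Because closeness is computed per component here, nodes in very small components can score deceptively high; a real experiment would need to decide how to treat them.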
Table 1: Keyword Extraction for Documents related to Query 27
[Table body not recoverable. Column headings appear to include Eccentricity, Closeness Centrality, Betweenness Centrality and C.C., each listing nodes and values.]
Table 2: Keyword Extraction for Dataset-2 with different measures
[Table body not recoverable. Column headings appear to be No., Degree, Eccentricity, Closeness Centrality, Betweenness Centrality and TF.]
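The analysis of Dataset-2 in this section relies on three graph properties: the clustering coefficient of a node, the sizes of the connected components, and the set of zero-degree words. The sketch below computes all three on a small undirected graph. The toy graph, the function names and the `dominance` threshold used to call a document single-themed are hypothetical choices for illustration; the paper does not state a numeric threshold.

```python
def make_graph(nodes, edges):
    """Build a symmetric adjacency map {node: set(neighbors)}."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def clustering_coefficient(adj, node):
    """Fraction of pairs of a node's neighbors that are themselves connected."""
    neighbors = sorted(adj[node])
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if neighbors[j] in adj[neighbors[i]]
    )
    return 2.0 * links / (k * (k - 1))


def connected_components(adj):
    """Connected components found by depth-first search, largest first."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:
            u = stack.pop()
            comp.add(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        components.append(comp)
    return sorted(components, key=len, reverse=True)


def zero_degree_words(adj):
    """Words with no semantic relation to any other word in the document."""
    return sorted(n for n in adj if not adj[n])


def is_focused(adj, dominance=2.0):
    """True if the largest component is at least `dominance` times the second largest."""
    sizes = [len(c) for c in connected_components(adj)]
    return len(sizes) < 2 or sizes[0] >= dominance * sizes[1]


# Hypothetical toy graph with one dominating component, one small cluster, one isolated word.
G = make_graph(
    ["rescue", "operation", "forces", "team", "rain", "cloudburst", "day"],
    [("rescue", "operation"), ("rescue", "forces"), ("operation", "forces"),
     ("forces", "team"), ("rain", "cloudburst")],
)
sizes = [len(c) for c in connected_components(G)]
```

Under these assumptions, a document whose graph has two or three components of comparable size would make `is_focused` return False, which corresponds to the multi-thematic case described in the text.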
To assess the quality of the results, we compare the extracted keywords with those selected by human judgment (only 30 are listed here due to space limitations). On observing the results, we found that clustering coefficient provides the best results among all measures for Dataset-2: the number of words that match human judgment is higher than for the other measures. We can therefore infer that clustering coefficient plays a useful role in keyword extraction.
After analyzing the results, we can say that clustering coefficient not only considers nodes with high degree but also takes into account how densely a node's neighbors are connected. If the neighbors of a node are densely connected, it has a high clustering coefficient. This reflects transitivity in the relation, i.e. if x is connected to y and y is connected to z, there is a high probability that x is connected to z. Moreover, clustering coefficient also allows nodes with lower degree, which may be part of a smaller connected component of the graph, to be considered important nodes. This is a distinctive property of clustering coefficient; the other measures give more importance to nodes that are directly or indirectly connected to a large number of nodes, and thus give high rankings only to nodes belonging to large components.
Nature of Document:
By the nature of a document we mean a comment on what the text basically contains. We comment based on visualization of the semantic graph. We created a semantic graph for a document (prepared manually by taking news from different newspapers; see the Appendix), which is shown in Figure 2; the corresponding keyword extraction results are presented in Table 2. After reading and understanding this document, we can say that it contains information about the tragedy that happened recently in Uttarakhand, India, due to a cloudburst that resulted in floods and landslides. It also reports that the rescue operation was carried out by Indian forces, the army and the IAF with the help of choppers.
On observing the graph, we can say that it contains many components but that there is one dominating component (found using DFS/BFS or another search algorithm). This indicates that the text is focused on one topic.
From the semantic graph, we can say that it mainly informs about two areas. The first, the tragedy and its consequences, is shown through arc "A2", highlighted in red. The second theme, represented through the red arc "A1", has words like {search, operation, inspector, unit, forces, command, rank, police, team, crew etc.} and clearly points towards the rescue operation carried out by the forces. Arc A3, shown in blue, marks concepts with small lengths. These concepts also provide important information, such as {{rain, cloudbursts}, {emergencies, crisis}, {fleet, aircraft} etc.} (the arcs are drawn manually for better analysis).
We also considered some multi-thematic documents and observed that their graphs contain 2-3 large connected components of comparable size instead of one dominating component. We do not present these results due to space limitations. This indicates that the semantic graph can be used for identifying whether a document is focused around one theme or is multi-thematic. Further, we observed that 40% of the words have zero degree and are not important. These words need not be considered for indexing. Thus we have a type of semantic filtration that identifies stop words based on semantic importance. Almost all important words are in the largest component. This trend was also observed for all other FIRE documents on which we performed our experiment.
V. CONCLUSION AND FUTURE WORK
In this paper we presented a novel approach for constructing a semantic graph of text documents. Our approach considers all nouns of a document, converts them to concepts and automatically builds a graph from the extracted concepts. The nodes are the concepts, while the edges represent the semantic relations between concepts. The motivation is that we do not depend on measures like TF-IDF, which give more importance to words that are more frequent in the document. Our approach gives more importance to the semantic importance of the words present in the document, and can even be applied to small documents where TF-IDF cannot be used. We think that our graph captures many properties of text documents and can be used for different applications in the fields of text mining, NLP and computational linguistics. Here we present its application to keyword extraction and to commenting on the nature of a document. We have yet to do an in-depth analysis of our semantic graph. For commenting on the nature of the document, we find that clustering coefficient plays an important role: it covers not only the one big component but also small clusters. Further, we find that closeness centrality and eccentricity of a node provide good criteria for keyword extraction.
As this is new work, it requires further exploration. We intend to preprocess/post-process the graph to improve its effectiveness for different applications. Further, we think that the semantic graph has the potential to be applied in many NLP applications, such as query expansion, topic detection and text summarization.
REFERENCES
[1] R. Mihalcea, D. Radev, "Graph based natural language processing and information retrieval," Cambridge University Press, 2011.
[2] P. Dmitry, "Identifying the Pathways for Meaning Circulation using Text Network Analysis," Nodus Labs, 2011.
[3] B. Kang, V. Kim, S. Lee, "Exploiting Concept Clusters for Content based Information Retrieval," Information Sciences 179 (2-4), 2005, pp. 443-462.
[4] A. Sharan, M. Lata Joshi, A. Pandey, "Exploiting Ontology for Concept Based Information Retrieval," Information Systems for Indian Languages, Communications in Computer and Information Science, Volume 139, 2011, pp. 157-164.
[5] J. Liu, J. Wang, "Keyword Extraction Using Language Network," in Natural Language Processing and Knowledge Engineering, 2007.
[6] M. Bastian, S. Heymann and M. Jacomy, "Gephi: an open source software for exploring and manipulating networks," Proceedings of the Third International ICWSM Conference (2009), pp. 361-362.
[7] G.A. Miller, "WordNet: A Lexical Database for English," Communications of the ACM, Volume 38, 1995, pp. 39-41.
[8] Princeton University, "WordNet", Internet: www.http://wordnet.princeton.edu, Dec. 27, 2012 [July 26, 2013].
[9] S. Bird, E. Klein and E. Loper, "Natural Language Processing with Python", O'Reilly Media, 2009.
[10] B. Ulrik, "A Faster Algorithm for Betweenness Centrality," in Journal of Mathematical Sociology, pp. 163-177, 2001.
[11] L.C. Freeman, "Centrality in social networks, conceptual clarification," Social Networks 1, pp. 215-239, 1979.
[12] Wikimedia Foundation Inc., "Clustering coefficient", Internet: http://en.wikipedia.org/wiki/Clustering_coefficient, July 6, 2013 [July 26, 2013].
[13] ISICAL, "Forum for Information Retrieval and Evaluation", Internet: http://www.isical.ac.in/~clia/data.html [Aug. 23, 2013].
[14] T. Kristina, K. Dan, C. Manning, and Y. Singer, "Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network," HLT-NAACL 2003, pp. 252-259.
[15] G. Vishal and L. Gurpreet Singh, "Automatic keywords extraction for Punjabi language," IJCSI International Journal of Computer Science Issues, Vol. 8, Issue 5, No 3, September 2011.
[16] J. F. Sowa, "Semantic Network," http://www.jfsowa.com/pubs/semnet.html, Feb. 02, 2006 [Jul. 26, 2013].
Appendix
Dataset-2
"Uttarakhand-Flood Missing untraced till July 15 will be presumed dead: Bahuguna
Racing against time, the Uttarakhand government on Thursday decided that those missing in the flood-ravaged state will be presumed dead if they remain untraced till July 15 and asked officials to remain vigilant in the wake of warning of heavy rains over the next two days.
Chief Minister Vijay Bahuguna said the exact number of people missing after the tragedy is 3,064 and the deadline for finding them is July 15. Considering the magnitude of the crisis, the state Cabinet has decided that if the missing persons are not found by July 15, we will presume that they are dead and the process of paying compensation to their next of kin will begin, he said. With the MeT department issuing a warning of heavy rains at places in Kumaon region over the next two days, Bahuguna said that for the next 50 hours the administration needs to be highly vigilant, adding 250 National Disaster Response Force personnel have been deployed in these areas. Meanwhile, the Indian Air Force flew 70 civil administration personnel to the Kedarnath temple premises to clean the surroundings there. A team of seven mountaineers is also engaged in a combing operation in areas adjoining the shrine in search of bodies while over 50 members of a team of experts and volunteers is stationed in Kedarnath to clean the temple premises of tonnes of debris under which more bodies may be lying, an official said.
In Delhi, the government announced it will rebuild 10,000 houses and undertake other activities to develop infrastructure in all affected municipalities in the state. All affected municipalities and notified area councils in Uttarakhand can be covered under Rajiv Awas Yojana as a special case to support reconstruction of houses of the poor and reconstruct and redevelop these devastated houses, Union minister Girija Vyas said in Delhi.
Mass cremation of bodies in Kedarghati held up for the past few days started with 23 more consigned to flames at Gaurikund and Junglechatti last night, DIG Amit Sinha under whose supervision the exercise is being undertaken said. Mass cremation of bodies in Kedarghati held up for the past few days on Thursday started with 23 more consigned to flames, taking the number of bodies disposed of so far to 59 even as a team of experts worked on removal of debris and extricating bodies from under them at the Himalayan shrine. 23 more bodies were cremated at Gaurikund and Junglechatti on Wednesday night, DIG Amit Sinha under whose supervision the exercise is being undertaken told PTI. This takes the total number of bodies disposed of in Kedarghati to 59, he said. 36 bodies had been cremated earlier, the DIG said, adding the process is slow due to bad weather and the precautions being taken not to risk the lives of personnel engaged in the exercise. 50 other members of the team are searching for bodies in Gaurikund and Rambara areas. Despite continuing bad weather in affected areas amid a MeT department prediction of heavy rains in the next 48 hours at places, efforts were on to airdrop relief material in affected villages totally cut off after the calamity in the worst hit Rudraprayag, Chamoli and Uttarkashi districts. It is still raining intermittently in the area, he said. A team of seven mountaineers is engaged in a combing operation in areas adjoining the shrine in search of bodies while over 50 members of a team of experts and volunteers is stationed in Kedarnath to clean the temple premises of tonnes of debris under which more bodies may be lying, the official said.
Chief Minister Vijay Bahuguna has alerted the District Magistrates in Kumaon and Garhwal regions to be prepared to deal with any emergency in case of heavy rains and suspend all pilgrimage. The Indian Air Force (IAF) today flew 70 civil administration personnel to the Kedarnath temple premises to clean the surroundings after the area was ravaged by the recent flash floods in Uttarakhand. A total of 70 personnel have been inducted at Kedarnath for cleaning of temple surroundings by the IAF choppers, a Defence Ministry release said.
The eighth century shrine at Kedarnath withstood the cloudbursts and floods that swept away its neighbourhood and much of the town last month. After evacuating all the people stranded in the upper reaches of the hill state, the IAF is in the process of evacuating the locals who want to move out. The IAF has pulled out majority of its assets deployed in the relief operations in the state but is maintaining 10 choppers there including world's largest Mi-26 transport helicopter along with Mi-17s and ALH Dhruvs. In the last 24 hours, the Ministry said the IAF has evacuated around 310 and overall, so far, rescued 20,712 people using its helicopter fleet. The Army is continuing to maintain the troop level at 8,000 in the region for the relief operations. It has been maintaining this strength since June 18 when the armed forces were inducted into the state.
The Indian Air Force (IAF) has deployed 13 more aircraft for relief and rescue operations. Fifty-five helicopters have been pressed into service for rescue work. The IAF has also deployed its heavy lift Mi-26 helicopters for transporting fuel and heavy equipment required by the Border Road Organization (BRO) to clear roads closed due to landslide. New Delhi: Indian Air Force has airlifted over 18,000 persons and dropped more than 3 lakh kg of relief material in flood-hit Uttarakhand since June 17 in its biggest ever helicopter operation for rescue and relief in the state. IAF has airlifted a total of 18,424 persons, flying a total of 2,137 sorties and dropping/landing a total of 3,36,930 kgs of relief material and equipment, it said in a statement on Sunday.
operations for 'Op Rahat' that were undertaken since morning, a total of 749 persons were airlifted, flying a total of 93 sorties and dropped about 12,000 kgs of relief material and equipment, it said.
The Central Reserve Police Force (CRPF) on Saturday announced it will contribute one day's salary of its personnel to the Prime Minister's Relief Fund for the victims of Uttarakhand tragedy. CRPF Director General Pranay Sahay said the force will contribute over Rs 18 crore to support the victims of the massive calamity in the hill state, a spokesperson said in Gurgaon. The CRPF deeply commiserates with the victims of the tragedy that has struck Uttarakhand. There has been large scale destruction of property and loss of lives in this disaster. The victims are our own brethren. The CRPF rank and file joins the countrymen in conveying its deepest concern for the victims of the tragedy. The officers and men have decided to donate one day's salary to the Prime Minister's Relief Fund. The amount will be over Rs. 18 crore, Sahay added.
Personnel of ITBP have been extensively active in the rescue and salvage operations conducted jointly with the Indian Air Force (IAF) and Army units from the Central Command ever since the news was flashed about the flash floods at several places in the Uttarakhand. Rescue teams and police personnel have recovered 48 dead bodies from the River Ganga in Haridwar. We are keeping them in the morgue and documenting their details. We are also clicking their photographs, and flashing the details for identification, said Haridwar Senior Superintendent of Police Rajiv Swaroop. Central Army Commander Lt. Gen Anil Chait said on Friday that about 8,000 to 9,000 people are still stranded in Badrinath. Over 73,000 people have so far been evacuated from the flood and landslide-hit areas of Uttarakhand so far. Another 32,000 to 33,000 people are still to be evacuated, even as rescuers intensified their efforts to help those in distress in different inaccessible parts of the hill state. In Joshimath sector, the army has constructed a temporary bridge across Alaknanda River near Govindghat to facilitate Hemkunt Sahib pilgrims to cross over. A road link has been opened from Sonprayag to Gaurikund.
Meanwhile, BJP spokesperson Prakash Javadekar has said that party president Rajnath Singh has formed a disaster relief force for Uttarakhand, consisting of party workers, volunteers and people from all classes of the society. We are making special arrangements for those eager to work. It shall be a continuous process, Javadekar said. He also praised the efforts of the armed forces and the Indo-Tibetan Border Police, saying they were doing a laudable job. If we move down hill from Badrinath, towards Joshimath, there is a place that falls on route called Gobindghat. There are three isolated segments (spots), which are completely cut-off, in between Gobindghat and Badrinath. Some people are still stranded in these places, Lt. Gen Chait said."