Sequeira 17
Chapter 1
∗Corresponding Author Email: 2120027@my.ipleiria.pt
2 F. Rosa-Sequeira, V. Basto-Fernandes, and R.Z. Frantz
Abstract
Nowadays, companies have a software ecosystem composed of more than one application
to support their business processes. The field of Enterprise Application Integration (EAI)
offers a set of methods, techniques, and tools to integrate these applications synchronously
or asynchronously. In this chapter, we review integration approaches and integration
platforms available in the Cloud. We demonstrate the use of an integration platform by means
of a case study on an integration problem in research outcomes and technological information
management. The proposal addresses Portuguese and international science and research
outcomes information management, and the corresponding information systems.
Interoperability problems between information systems are presented. A business and
technological perspective is provided, including the conceptual analysis and modelling, an
integration solution based on a Domain-Specific Language (DSL), and the integration platform
used to execute the proposed solution. For illustrative purposes, the role and information
system needs of a research unit are assumed as the representative case.
Application Integration, Integration Platform-as-a-Service, Cloud Computing 3
1. Introduction
Organizations rely on information systems and software applications to support their
business activities. Interesting applications rarely live in isolation. Whether a sales applica-
tion must interface with an inventory application, a procurement application must connect
to an auction site, or a Personal Digital Assistant (PDA) or Personal Information Manager
(PIM) must synchronize with the corporate calendar server, it seems like any application can
be made better by integrating it with other applications [18]. Frequently, these applications
are legacy systems, packages purchased from third parties, or applications developed internally
to solve a particular problem. This usually results in heterogeneous software ecosystems
composed of applications that were usually not designed with integration in mind. Integration
is necessary chiefly because it allows two or more applications to be reused to support new
business processes, or because current business processes have to be optimized by interacting
with other applications within the software ecosystem. Enterprise Application Integration (EAI)
provides methodologies, techniques, and tools to design and implement integration solutions.
The goal of an EAI solution is to keep the data of a number of applications synchronized, or to
develop new functionality on top of them, so that the applications do not have to be changed
and are not disturbed by the integration solution [15].
Enterprises typically have hundreds of applications that are custom-built, acquired, part of
a legacy system, or a combination of these, operating in multiple tiers on different operating
systems and platforms. Some enterprises have dozens of websites, more than one instance of
SAP, and countless departmental solutions. Creating a single, big application to run a complete
business is next to impossible. Enterprise Resource Planning (ERP) systems have had some
success at creating larger-than-ever business applications. The reality, though, is that even
heavyweights like SAP and Oracle only perform a fraction of the business functions required in
a typical enterprise. This can easily be seen from the fact that ERP systems are among the most
popular integration points in today's enterprises. Unfortunately, enterprise integration is no
easy task. Software vendors offer EAI suites that provide cross-platform, cross-language
integration as well as the ability to interface with many popular packaged business
applications. However, this technical infrastructure addresses only a small portion of the
integration complexities. The true challenges of integration span far across business and
technical issues.
In general, current integration technologies do not allow developers to work at a high level
of abstraction; e.g., implementing solutions demands knowledge of programming APIs. This is a
limiting factor for development and maintenance, and it makes the solution dependent on the
integration platform. If solutions could be modeled in a platform-independent language and the
code needed to implement them generated automatically, they would be cross-platform, and the
costs of implementation, maintenance and evolution would possibly be reduced [20].
The rest of this chapter is organized as follows: Section 2 discusses four common integration
approaches; Section 3 introduces the cloud-based integration platforms ranked as “Leaders” in
the Magic Quadrant of Gartner, Inc.; Section 4 illustrates the use of the Guaraná Cloud
integration platform to solve an integration problem in the context of research outcomes
information management; Section 5 concludes this chapter.
2. Integration Approaches
Although application integration may occur on a single machine, often it does not; in fact,
some machines may be thousands of kilometers from each other, so almost all integration
solutions have to deal with a few fundamental challenges, one of them being network
unreliability and slowness. An integration solution has to transport data between computers
across networks. Compared to a process running on a single computer, distributed computing
has to be prepared to deal with a larger set of possible problems. Often, two systems to be
integrated are on different continents, and data between them has to travel through phone
lines, LAN segments, public networks, and satellite links. Delays or interruptions may occur at
each of these steps. Sending data across a network is multiple orders of magnitude slower than
making a local method call. Designing a widely distributed solution the same way one would
approach a single application could have disastrous performance implications. Integration
solutions need to transmit information between systems that use different programming
languages, operating platforms and data formats, and they need to be able to interface with all
these different technologies [18].
As stated by Gregor Hohpe and Bobby Woolf [18], application integration may be done in four
different ways, namely: File Transfer, Shared Database, Remote Procedure Calling, and
Messaging. In File Transfer, each application may play the role of producer or consumer,
producing files of shared data for others to consume or consuming what others have produced.
Shared Database is the integration approach in which a common database schema is shared in a
single physical database; since there is no duplicate storage, there is also no data transfer
between applications. In Remote Procedure Calling integration, applications expose some of
their functionality in a way that other applications can access it as a remote procedure, so
communication occurs in real time and synchronously. Finally, in Messaging integration, each
application is connected to a common messaging system, where it publishes its own messages and
reads messages from other applications. Since applications may read messages from that channel
some time after they have been published by other applications, the communication is
asynchronous; applications just have to agree on a channel and a message format [20]. In the
following sections we provide more information on each integration approach.
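To make the File Transfer style concrete, here is a minimal Python sketch, assuming a hypothetical customer list as the shared data and a JSON file in a local directory as the exchange medium; real solutions would also have to agree on naming conventions and on when files are produced and consumed.

```python
import json
import tempfile
from pathlib import Path


def export_customers(path: Path, customers: list) -> None:
    """Producer side: application A writes its shared data to a file."""
    # Write under a temporary name first and then rename, so that a consumer
    # never sees a half-written file.
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(customers), encoding="utf-8")
    tmp.replace(path)


def import_customers(path: Path) -> list:
    """Consumer side: application B reads the file whenever it runs."""
    if not path.exists():
        return []  # nothing has been produced yet
    return json.loads(path.read_text(encoding="utf-8"))


if __name__ == "__main__":
    shared_file = Path(tempfile.mkdtemp()) / "customers.json"
    export_customers(shared_file, [{"id": 1, "name": "ACME"}])
    print(import_customers(shared_file))  # [{'id': 1, 'name': 'ACME'}]
```

The two applications never call each other; they only share the file location and format. This is why the style tolerates applications that run at different times, but it also means that data may be stale between exports.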
[Figure: Application A and Application B with Shared Data.]
2.4. Messaging
Messaging integration, shown in Figure 4, can, like RPC, be used in organizations whose
applications were built independently, are written in different languages and run on distinct
platforms, and where functionality must be shared in response to events. However, unlike RPC
integration, Messaging integration can be asynchronous. As a reaction to problems common in
distributed systems (unavailable systems, failing network connections), messaging systems
transfer data packets frequently, reliably and immediately, yet asynchronously, using
customizable formats by means of adapters. An adapter is a piece of code, independent of the
application, which abstracts away the communication mechanism between the application and the
messaging system.
Having a system responsible for taking messages from one application and delivering them to
one or more others enables interoperability even when not all systems are up and running at
the same time. However, a sequence of messages may not be received in the order in which it
was sent, for instance because a message failed and the Message Bus had to resend it, or
because a message took longer than another to be created [18].
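The following Python sketch illustrates both properties under simplifying assumptions: an in-memory queue stands in for the messaging system (a real one would persist messages and run as a separate process), the sender numbers its messages, and the receiver restores their order with a Resequencer, one of the patterns in Hohpe and Woolf's catalogue [18]. The class names are hypothetical.

```python
import queue
from dataclasses import dataclass


@dataclass
class Message:
    seq: int   # sequence number assigned by the sender
    body: str  # payload in the agreed message format


class Channel:
    """In-memory stand-in for a messaging system channel."""

    def __init__(self) -> None:
        self._queue = queue.Queue()

    def publish(self, msg: Message) -> None:
        self._queue.put(msg)  # the sender does not wait for any receiver

    def drain(self) -> list:
        """Receiver side: take everything published so far, possibly much later."""
        items = []
        while True:
            try:
                items.append(self._queue.get_nowait())
            except queue.Empty:
                return items


class Resequencer:
    """Releases messages strictly in sequence-number order."""

    def __init__(self, first_seq: int = 0) -> None:
        self._next = first_seq
        self._pending = {}

    def accept(self, msg: Message) -> list:
        self._pending[msg.seq] = msg
        ready = []
        while self._next in self._pending:  # release every contiguous message
            ready.append(self._pending.pop(self._next))
            self._next += 1
        return ready


channel = Channel()
for seq in (1, 0, 2):  # message 0 arrives late, e.g. because it was resent
    channel.publish(Message(seq, f"event {seq}"))

# The receiver comes online only now and still gets the messages in order.
resequencer = Resequencer()
print([m.body for msg in channel.drain() for m in resequencer.accept(msg)])
# ['event 0', 'event 1', 'event 2']
```

Publishing and consuming are decoupled in time by the channel, while ordering is restored on the receiving side rather than assumed from the network.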
• some systems provide data “as is”, without any chance to request other data formats;
• outputs from one system have to be “worked on” before being used as inputs to other
systems;
Bearing in mind the advantages and disadvantages of the integration approaches presented in
previous sections, and the properties of this integration scenario, the messaging integration
approach was considered the most suitable and promising for an integration solution with
improved quality attributes such as reliability, scalability and availability [20].
provides a tool for testing the process and watching it run. It provides means to connect to
well-known applications, like Dropbox and Jira, and also standard connectors like FTP or HTTP.
3.2. Informatica
Informatica is a private company founded in 1993 in the U.S.A. Informatica provides a
drag-and-drop visual designer in a web interface and self-service wizards, connectivity to
applications in the cloud, as well as connectivity to on-premise applications and databases. It
also provides wizards, pre-configured templates and out-of-the-box mappings. Developers use the
design canvas to drag and drop data sources, targets, and advanced transformations. It allows
managing the state of orchestrations and of business process system-to-system interactions, be
they synchronous or asynchronous, long- or short-running. Informatica provides connectors for
well-known applications, like Dropbox, Google APIs, Jira, Microsoft SharePoint and Salesforce,
and standard protocols, like FTP, ODBC and REST, and also supports connector development.
3.3. JitterBit
Jitterbit was founded in 2003 in the U.S.A. JitterBit provides a graphical design interface
that allows the reuse of existing code and business logic, point-and-click connectivity,
drag-and-drop configuration, pre-built templates, and also infusing any application with
artificial intelligence. Solutions may be deployed 100% in the cloud, on-premise, or in a hybrid
setting. It also allows any application or code to be reused. JitterBit claims that data can be
moved across applications, and it offers real-time analytics over consolidated data, real-time
monitoring with alerts, and team permissions. It provides connections to many well-known
applications, like Dropbox, Gmail, Jira and Microsoft SQL Server, and also standard protocols,
like FTP, HTTP and ODBC.
3.4. MuleSoft
MuleSoft was founded in 2006 in the U.S.A. With its open-source tool, MuleSoft combines
cloud-hosted and on-premises integration. It enables the integration of software-as-a-service
and on-premises applications, and API management from creation to publishing, all on a single
platform. A repository of connectors, templates and APIs is available and may be enriched by
users. MuleSoft provides connectors for well-known applications, like Dropbox, Jira, Microsoft
SharePoint and Salesforce, and standard protocols, like FTP, HTTP and JDBC, and also supports
connector development.
3.5. Oracle
Oracle was founded in 1977 in the U.S.A. Oracle integrations are developed by point-and-click
in a browser-based visual designer, which allows APIs to be published for external consumption.
Users have access to connectors for all the Oracle SaaS applications they subscribe to, native
SaaS adapters to integrate with other cloud applications, and integration with on-premise
applications. It is possible to monitor transactions and key performance indicators, and also
to detect and diagnose errors. Customers have access to pre-built integrations to use as-is or
to customize, and also to a Cloud Marketplace where pre-built adapters and integrations are
traded. Developers have access to a set of connectors to well-known applications, like Gmail,
Microsoft SQL Server and SAP R/3, and also standard protocols, like FTP and REST.
3.6. SnapLogic
SnapLogic was founded in 2006 in the U.S.A. With its open-source tool, SnapLogic provides a
web-based user interface, a set of adapters, drag-and-drop development of integration flows,
and a set of patterns to be used by integrators. It integrates applications or data in the
cloud, on-premise and in hybrid settings. It guarantees delivery of data requests and
automatically monitors performance and data requests to ensure data delivery as well as
compliance with Service Level Agreements (SLAs), company policies, and legal and regulatory
requirements. Centralized, object-level, granular security and permissions allow integration to
be extended throughout customers' organizations. It provides connectors to ERP, CRM, identity
management, online storage, relational, columnar and key-value databases, and standard
technologies, like XML, REST and OAuth.
3.7. Guaraná
Guaraná technology arises from efforts to provide specific tools and notations that reduce the
time needed to design and implement solutions for integrating computer systems and business
information. It was inspired by the Model-Driven Engineering discipline [19], which shifts the
focus from source code to models. Models are abstractions that allow software engineers to
focus on the relevant aspects of a software system while ignoring irrelevant details. Behind
this discipline is the idea of raising the level of abstraction of the overall development
process, capturing systems as a collection of reusable models, separating business logic
descriptions from a particular platform implementation, and automating the implementation
phase.
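As a rough illustration of the model-driven idea, the Python sketch below describes an integration flow once, as plain data, and generates two different artifacts from it. The Task and Flow classes and both generators are hypothetical; they do not reproduce the Guaraná DSL or its code generators.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str  # identifier in the model, e.g. "T1"
    kind: str  # integration pattern, e.g. "Splitter" or "Filter"


@dataclass
class Flow:
    """Platform-independent model: tasks plus the slots that connect them."""
    tasks: list = field(default_factory=list)
    slots: list = field(default_factory=list)  # (source, target) name pairs


def to_dot(flow: Flow) -> str:
    """Target 1: a Graphviz DOT description for documentation."""
    lines = ["digraph flow {"]
    lines += [f'  {t.name} [label="{t.kind}"];' for t in flow.tasks]
    lines += [f"  {src} -> {dst};" for src, dst in flow.slots]
    lines.append("}")
    return "\n".join(lines)


def to_stubs(flow: Flow) -> str:
    """Target 2: skeleton Python functions for a hypothetical runtime."""
    out = []
    for t in flow.tasks:
        out.append(f"def {t.name.lower()}(message):  # {t.kind}")
        out.append("    raise NotImplementedError")
    return "\n".join(out)


model = Flow(
    tasks=[Task("T1", "Splitter"), Task("T2", "Filter")],
    slots=[("T1", "T2")],
)
print(to_dot(model))
print(to_stubs(model))
```

Under this separation, targeting another platform means writing another generator, while the model itself stays unchanged.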
Guaraná Cloud aims to provide software engineers with the optimum technology to integrate
traditional business resources (local applications, legacy systems, databases, files, web
services, etc.), Internet applications (Software as a Service, or SaaS) and Cloud platforms
(Platform as a Service, or PaaS). After logging in, users reach the Guaraná Cloud Dashboard, an
IDE where developers may create solutions from scratch or from templates, create templates, set
up configuration values, access tutorials, and watch for alerts. Solution development is done on
a “drag-and-drop” basis: users pick tasks and/or connectors, connect them, and set up some
values. There are connectors to well-known applications used in everyday work, like Dropbox,
Gmail, Jira, Sage One and Salesforce, as well as generic connectors, like HTTP and FTP.
Depending on the connectors and tasks, most of the time these settings are made just by
clicking; only in a few cases may developers have to type XSLT code or other parameter values.
Even when XSLT code is required, developers can rely on an internal tool that helps build it.
Developers can also use a visual debugging tool to find out where errors occur or simply to
check whether the solution is running as expected, or they can look into the logs, which helps
when the visual debugging tool does not provide enough information. Furthermore, users also have
access to statistics for better monitoring, and to log messages from solutions already running,
not only in debug mode.
4. Case Study
In this section we report on a case study that demonstrates the use of the Guaraná integration
platform. We introduce the integration problem to solve, the integration solution modeled using
Guaraná's domain-specific language, and its implementation using Guaraná Cloud.
2700 projects supported by FCT. Besides this, FCT also ensures international partnerships with
the U.S.A., the participation of the Portuguese scientific community in bilateral and
multilateral programs, and contributions to international scientific organizations like CERN,
ESA and EMBO [5]. Another important responsibility of FCT is to collect, organize, compile,
summarize, report and provide information on national research outcomes and activities by means
of electronic repositories and platforms.
4.2.2. ORCID
ORCID [10] is an open, non-profit, community-driven effort to create and maintain a registry of
unique researcher identifiers and a transparent method of linking research activities and
outputs to these identifiers. ORCID is unique in its ability to reach across disciplines,
research sectors and national boundaries. It is a hub that connects researchers and research
through the embedding of ORCID identifiers in key workflows, such as research profile
maintenance, manuscript submissions, grant applications, and patent applications.
Researchers may benefit from two core ORCID functions:
• a registry to obtain a unique identifier and then manage a record of activities;
• APIs that support system-to-system communication and authentication.
ORCID makes its code available under an open-source license and posts an annual public data
file under a Creative Commons Zero (CC0) waiver for free download. The ORCID Registry is
available free of charge to individuals, who may obtain an ORCID identifier, manage their record
of activities, and search for others in the Registry. Organizations may also use it: they can
become members, link their records to ORCID identifiers, update ORCID records, receive updates
from ORCID, and register their employees and students with ORCID identifiers.
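Since a research unit identifies its researchers by identifiers of this kind, it can be useful to validate them before querying external platforms. An ORCID iD consists of 16 characters in four hyphen-separated blocks, and the last character is a check character computed with the ISO 7064 MOD 11-2 algorithm, as described in ORCID's public documentation (0000-0002-1825-0097 is ORCID's own sample iD). A minimal Python sketch, with function names of our own choosing:

```python
import re

# Four blocks of four characters; only the final character may be "X".
ORCID_FORMAT = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")


def orcid_check_char(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """True if the iD is well formed and its check character matches."""
    if not ORCID_FORMAT.fullmatch(orcid):
        return False
    compact = orcid.replace("-", "")
    return orcid_check_char(compact[:15]) == compact[15]


print(is_valid_orcid("0000-0002-1825-0097"))  # True
print(is_valid_orcid("0000-0002-1825-0098"))  # False
```

A check like this only catches typing errors; whether the iD belongs to the intended researcher still has to be confirmed against the ORCID Registry.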
4.2.4. Scopus
Scopus [11] claims to be the largest abstract and citation database of peer-reviewed
literature: scientific journals, books and conference proceedings. Scopus features smart tools
to track, analyze and visualize research, delivering a comprehensive overview of the world’s
research output in the fields of science, technology, medicine, social sciences, and arts and
humanities. Scopus claims comprehensiveness, with twice as many titles and over 50% more
publishers listed than any other Abstracting and Indexing (A&I) database, and interdisciplinary
content that covers the research spectrum. Timely updates from thousands of peer-reviewed
journals, preliminary findings from millions of conference papers, and thorough analysis in an
expanding collection of books ensure that researchers have the most up-to-date and highest
quality interdisciplinary content available. Scopus claims to be the only leading database that
is updated daily rather than weekly. It indexes journals, books, open access journals,
conference papers and patents. Scopus supports data export to reference managers such as
Mendeley, RefWorks and EndNote. Besides this, a set of APIs is available to registered and
non-registered users, the latter having access only to basic metadata and basic search
functionality.
participate in these programs and services, etc.). Public availability of the Platform data on
the Internet gives greater transparency and reliability to the promotion activities of CNPq and
of the agencies that use it, strengthens exchanges between researchers and institutions, and is
an inexhaustible source of information for studies and research. Since its information is
recurrent and cumulative, it also plays an important role in preserving the memory of research
activity in Brazil.
For the sake of simplicity and brevity, and without loss of generality, this work assumes the
role and perspective of a research unit actor. A research unit needs, on a regular basis (at
least annually), to follow and assess its researchers' activities and outcomes, whose CVs,
activities and research outcomes are registered and updated in national funding agencies'
software platforms and international research production repositories.
and other types of scientific outcomes are reported and valid, classify the publications
according to quality ranks, aggregate, summarize and generate annual reports using software and
communication tools such as e-mail, spreadsheets and text editors. Turning this into an
automated process supported by integration software, requiring only a way to identify the
researchers of a research unit, would reduce or eliminate the need for manual and/or ad-hoc
procedures.
The EAI-based integration solution proposed in this chapter interacts with four main data
sources/applications to collect or publish data about researchers and their research outcomes:
“Local Research Unit Characterisation”, “Plataforma DeGóis”, “Scopus” and “CMS Application”. The
first data source consists of an XML file stored in a file system accessible via TCP/IP
protocols, containing the basic data about researchers needed to feed the integration solution.
Based on this data, the integration solution creates and sends requests for researchers' CVs to
“Plataforma DeGóis”. The integration solution aggregates the researchers' CVs into a
research-unit-scale XML document, collects additional information available on the “Scopus”
platform about the research outcomes referred to in the researchers' CVs (e.g., the number of
citations of a conference paper), and finally transforms the summarized data into an HTML
document that is sent to the CIIC-IPLeiria Joomla Content Management System (“CMS Application”).
All the integration tasks and interactions with the external applications are specified with
the Guaraná DSL and processed by the Guaraná integration engine introduced in previous chapters.
The data sources and data structures used in the integration solution are briefly and
graphically presented next. Figure 5 shows a graphical representation of Researchers.xsd, an
XML Schema document defining the structure of the researcher data that feeds the single input
port of the entire integration solution.
In the current integration solution, only Researcher elements whose Status attribute equals
Efetivo are considered. The IdDegois attribute must be filled in manually beforehand with each
researcher's DeGóis identifier, so that the integration solution can look for their CV on
“Plataforma DeGóis”. Figure 6 shows the researcher CV XML schema used by “Plataforma DeGóis”.
RESTful web service requests/responses are exchanged between the integration solution and the
DeGóis platform to search for/deliver a researcher CV identified by the IdDegois attribute.
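The selection rule above can be sketched in Python with the standard xml.etree.ElementTree module. Only the Researcher element and its Status and IdDegois attributes come from the text; the root element name, the Name attribute, the second status value and the DeGóis endpoint are placeholders, since Figure 5 and the real REST address are not reproduced here. Skipping researchers with an empty IdDegois is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

# Hypothetical sample document; see the assumptions listed above.
SAMPLE = """<Researchers>
  <Researcher Name="Ana" Status="Efetivo" IdDegois="1234"/>
  <Researcher Name="Rui" Status="Colaborador" IdDegois="5678"/>
  <Researcher Name="Eva" Status="Efetivo" IdDegois=""/>
</Researchers>"""

# Placeholder endpoint (reserved .example domain), not the real DeGóis URL.
DEGOIS_CV_URL = "https://degois.example/cv/{id}"


def cv_request_urls(xml_text: str) -> list:
    """Build one CV request URL per Efetivo researcher with an IdDegois."""
    urls = []
    for researcher in ET.fromstring(xml_text).iter("Researcher"):
        if researcher.get("Status") != "Efetivo":
            continue  # only Efetivo researchers are considered
        id_degois = (researcher.get("IdDegois") or "").strip()
        if not id_degois:
            continue  # IdDegois not filled in: skipped (sketch assumption)
        urls.append(DEGOIS_CV_URL.format(id=quote(id_degois, safe="")))
    return urls


print(cv_request_urls(SAMPLE))  # ['https://degois.example/cv/1234']
```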
Similar RESTful requests/responses are exchanged between the integration solution and the
Scopus platform in order to collect detailed data about researchers' production items, such as
the paper indexing ID, the paper's number of citations, etc. The XML schema adopted by the
Scopus platform for research production items is shown in Figure 7.
of data (starting with the Researchers.xml contents) will be enriched with data provided by
“Plataforma DeGóis” and “SCOPUS”. The expected output is a set of HTML files that are sent to a
“Joomla” CMS instance in the form of “Joomla” articles.
The workflow starts at entry port P1, which loads the Researchers.xml contents and then
periodically checks them for changes. Task T1 splits the data obtained from P1 into chunks, each
corresponding to one researcher. From now on, each “Researcher” is handled as a message.
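P1 and T1 can be approximated as follows, assuming that the entry port detects changes by hashing the document each time it is polled and that a scheduler calls poll() periodically; Guaraná's actual change-detection mechanism is not described here. The class and function names are hypothetical.

```python
import hashlib
import xml.etree.ElementTree as ET


class EntryPortP1:
    """P1 sketch: returns the document only when its contents have changed."""

    def __init__(self, read_source) -> None:
        self._read_source = read_source  # callable returning the XML text
        self._last_digest = None

    def poll(self):
        text = self._read_source()
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest == self._last_digest:
            return None  # unchanged since the previous poll
        self._last_digest = digest
        return text


def t1_split(xml_text: str) -> list:
    """T1 (Splitter): one message per Researcher element."""
    root = ET.fromstring(xml_text)
    # strip() drops the whitespace tail that pretty-printed files leave behind.
    return [ET.tostring(r, encoding="unicode").strip() for r in root.iter("Researcher")]


doc = '<Researchers><Researcher IdDegois="1"/><Researcher IdDegois="2"/></Researchers>'
port = EntryPortP1(lambda: doc)
print(len(t1_split(port.poll())))  # 2
print(port.poll())                 # None
```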
Task T2 filters out researchers whose Status attribute differs from Efetivo. Messages are then
replicated at T3; one copy is used to build a “DeGóis” query, which Solicitor Port P2 forwards to
“Plataforma DeGóis”. Solicitor Port P2 then gets a reply from “Plataforma DeGóis”, which was
queried for a researcher CV using the researcher's IdDegois. Still within task T3, the messages
are merged, so the solution keeps running with the same number of messages it had right before
task T3. Task T4 changes the message schema so that the message reaching task T5 can hold new
information coming from “Scopus”. For example, the XML attribute ScopusId is added to the
publication item XML elements to hold the publications' ScopusId retrieved from “Scopus”. Task T5
retrieves information from “Scopus” to be associated with the researchers' CV information, as
previously described.
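A simplified Python sketch of T3 to T5 follows. The calls to “Plataforma DeGóis” and “Scopus” are replaced by stub functions supplied by the caller, and the CV, Publication and Title names are assumptions standing in for the schemas of Figures 6 and 7. T4 adds an empty ScopusId attribute and T5 then fills it in.

```python
import copy
import xml.etree.ElementTree as ET
from typing import Callable, Optional


def t3_enrich_with_cv(researcher: ET.Element,
                      fetch_cv: Callable[[str], ET.Element]) -> ET.Element:
    """T3: replicate the message, query DeGóis, merge the reply into the copy."""
    merged = copy.deepcopy(researcher)        # copy that receives the reply
    query_id = researcher.get("IdDegois")     # query built from the message
    merged.append(fetch_cv(query_id))         # stand-in for Solicitor Port P2
    return merged                             # still one message per researcher


def t4_add_scopus_slots(researcher: ET.Element) -> ET.Element:
    """T4: extend the schema so that publications can hold a ScopusId."""
    for pub in researcher.iter("Publication"):
        pub.set("ScopusId", "")
    return researcher


def t5_fill_scopus_ids(researcher: ET.Element,
                       lookup: Callable[[str], Optional[str]]) -> ET.Element:
    """T5: ask Scopus (here, a lookup function) for each publication's ID."""
    for pub in researcher.iter("Publication"):
        pub.set("ScopusId", lookup(pub.get("Title", "")) or "")
    return researcher


def fake_degois(id_degois: str) -> ET.Element:
    """Stub for Plataforma DeGóis: returns a one-publication CV."""
    return ET.fromstring(f'<CV><Publication Title="Paper {id_degois}"/></CV>')


fake_scopus = {"Paper 42": "123456"}.get  # stub for Scopus lookups

msg = ET.fromstring('<Researcher Status="Efetivo" IdDegois="42"/>')
msg = t3_enrich_with_cv(msg, fake_degois)
msg = t5_fill_scopus_ids(t4_add_scopus_slots(msg), fake_scopus)
print(ET.tostring(msg, encoding="unicode"))
```

Keeping the reply inside the researcher's own message is what lets the solution leave T3 with the same number of messages it had before.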
From here onwards, information about researchers does not need to be treated individually. Task
T6 reunites the messages with information about each researcher into a single message, for
processing at research unit granularity. Task T7 replicates this single message into five copies,
which will be used to produce five distinct HTML output documents containing research-unit-scale
indicators per research item type (projects, papers, organized events, awards, advanced
training, news).
Tasks T8, T10, T12, T14 and T16 (Slimmer tasks) perform message cleansing, preserving only the
information related to the specific research indicator to be calculated/processed. Finally,
tasks T9, T11, T13, T15 and T17 perform message transformation, more precisely, the
transformation of XML data into HTML documents corresponding to the five different categories of
research item types. The output of these tasks (five HTML documents) is forwarded through Exit
Ports P4-P8 to the “CMS Application” as HTML CMS articles, and is immediately made accessible by
the “Joomla” CMS instance.
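A Slimmer and a translator can be sketched as below. In Guaraná the translation is written in XSLT; here, as a swapped-in technique, Python's ElementTree and html modules build the HTML fragment directly. The Project and Paper element names and the Title attribute are assumptions, and delivery of the result to Joomla is not shown.

```python
import copy
import html
import xml.etree.ElementTree as ET


def slim(unit: ET.Element, keep_tag: str) -> ET.Element:
    """Slimmer (T8, T10, ...): keep only the items of one research item type."""
    slimmed = ET.Element(unit.tag)
    for item in unit.iter(keep_tag):
        slimmed.append(copy.deepcopy(item))
    return slimmed


def to_html(slimmed: ET.Element, heading: str, label_attr: str) -> str:
    """Translator (T9, T11, ...): render the remaining items as an HTML list."""
    rows = [f"  <li>{html.escape(item.get(label_attr, ''))}</li>" for item in slimmed]
    return "\n".join([f"<h2>{html.escape(heading)}</h2>", "<ul>", *rows, "</ul>"])


unit = ET.fromstring(
    "<ResearchUnit>"
    "<Researcher><Project Title='EAI &amp; Cloud'/><Paper Title='DSL'/></Researcher>"
    "</ResearchUnit>"
)
print(to_html(slim(unit, "Project"), "Projects", "Title"))
```

Escaping the item text keeps characters such as “&” from breaking the generated article markup.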
5. Conclusions
Enterprise Application Integration (EAI) is a well-established research field that provides
methodologies, techniques and tools to design and implement integration solutions. Companies
rely on EAI to reuse the applications available within their software ecosystems to support
their business processes. There are currently several open-source integration platforms
available to help companies design and implement their integration solutions. The open-source
integration platform community drew inspiration from the work of Hohpe and Woolf [18], which
means these platforms support the catalogue of integration patterns documented by these authors
and follow the messaging-based integration style. In this chapter, Guaraná integration tools
were studied and used to design and implement an innovative integration solution targeted at
science and research outcomes information management. Guaraná was chosen due to its advantages
with respect to some quality attributes of integration solutions, with emphasis on platform
independence.
Figure 10. Integration Solution generated report integrated with Joomla CMS [20].
Guaraná consists of a domain-specific language and a set of tools, most notably a cloud-based
editor and runtime system. Whereas the Guaraná DSL can be used independently of engineering
tools to design integration solutions and provides full support for the integration patterns
documented by Hohpe and Woolf [18], Guaraná Cloud is an integrated development environment
available on demand in the cloud, with limited support for the DSL.
In this chapter, research outcomes information management at the research unit, institutional
and national levels was presented, as well as the overall research outcomes management
ecosystems. Information producers, consumers, sources and platforms were addressed, with a focus
on interoperability problems and information systems integration complexity. First, the Guaraná
DSL was studied and used to model an integration solution to the science and research outcomes
information management integration problem. Then, Guaraná Cloud was studied and used to
implement the model as an executable integration solution.
Although the Guaraná DSL is both a simple and a rich modeling language, there is still a gap
between the language and the cloud tool support for designing integration solutions. Guaraná
Cloud has concentrated its efforts on providing an extensible list of application adapters,
which allow communication with the integrated applications, but has devoted less attention to
supporting more of the DSL's building blocks, such as the different kinds of tasks. This makes
implementing the model more difficult when the integration solution model has to be adapted
because a building block is missing in Guaraná Cloud.
Guaraná Cloud is a recent integration platform and is still under development. Considering
this, it is important to: a) improve the DSL support by supporting new kinds of tasks; b) perform
more thorough testing and fix bugs to improve the reliability of the integrated development
environment; c) address the lack of documentation, an important problem, by providing examples,
tutorials, reports, and online help; d) provide role-based access control or a similar
mechanism, preventing or allowing a user to use another user's credentials and access their
solutions; e) allow solutions to be copied or moved between servers when a user is a member of
more than one group and has access to more than one server; f) keep a history, version control,
or a mechanism to download and upload all or part of an integration solution; g) allow users to
create their own connectors according to their needs.
Regarding the integration solution developed in this chapter, it is important to highlight that
it can be improved by aggregating other data sources/applications from which more scientific
information could be extracted to enrich the web pages generated by the integration solution.
For example, in the future, the Brazilian WebQualis could provide information on the ranking of
each publication according to the Brazilian system; the ISI Web of Science could provide the
Journal Citation Reports (JCR) impact factor for journal publications; and other reports could
be generated by the integration solution to enrich the analysis of CIIC-IPLeiria activity
available on the website.
Acknowledgements
Fernando Rosa-Sequeira and Vitor Basto-Fernandes’ work was supported by Portuguese Republic
national funds through FCT - Portuguese Foundation for Science and Technology, I.P., under the
project UID/CEC/4524/2016. Rafael Z. Frantz’s work was supported by the CAPES postdoctoral
programme, grant 88881.119518/2016-01.
References
[1] Brazilian Lattes Curriculum Platform. http://lattes.cnpq.br, 2017.
[18] G. Hohpe and B. Woolf. Enterprise Integration Patterns: Designing, Building, and
Deploying Messaging Solutions. Addison-Wesley, 2003.