
Name-VISHAL KUMAR ANAND

M.Tech (CSE)

Assignment – 104: Object Oriented Technology

Unit I

(1) What is OOP?

ANS- Object-oriented programming (OOP) is a computer programming model that organizes software design around data, or objects, rather than functions and logic. An object can be defined as a data field that has unique attributes and behavior.

OOP focuses on the objects that developers want to manipulate rather than the logic
required to manipulate them. This approach to programming is well-suited for
programs that are large, complex and actively updated or maintained. This includes
programs for manufacturing and design, as well as mobile applications; for example,
OOP can be used for manufacturing system simulation software.

The organization of an object-oriented program also makes the method beneficial to collaborative development, where projects are divided into groups. Additional benefits of OOP include code reusability, scalability and efficiency.

The first step in OOP is to collect all of the objects a programmer wants to manipulate
and identify how they relate to each other -- an exercise known as data modeling.

Examples of an object can range from physical entities, such as a human being who is
described by properties like name and address, to small computer programs, such
as widgets.

Once an object is known, it is labeled with a class of objects that defines the kind of
data it contains and any logic sequences that can manipulate it. Each distinct logic
sequence is known as a method. Objects communicate through well-defined interfaces called messages.

The structure, or building blocks, of object-oriented programming include the following:

 Classes are user-defined data types that act as the blueprint for individual objects,
attributes and methods.

 Objects are instances of a class created with specifically defined data. Objects can correspond to real-world objects or abstract entities. When a class is defined initially, the description is the only object that is defined.

 Methods are functions that are defined inside a class and describe the behaviors of an object. Each method contained in a class definition starts with a reference to an instance object. Additionally, the subroutines contained in an object are called instance methods. Programmers use methods for reusability or to keep functionality encapsulated inside one object at a time.

 Attributes are defined in the class template and represent the state of an object.
Objects will have data stored in the attributes field. Class attributes belong to the
class itself.
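To make these building blocks concrete, here is a minimal Java sketch (the Car class, its fields and its methods are invented purely for illustration) showing a class as a blueprint, attributes holding state, methods describing behavior, and an object created from the class:

public class Car {

    // attributes: they hold the state of each object
    private String model;
    private int speed;

    public Car(String model) {
        this.model = model;
        this.speed = 0;
    }

    // a method: it describes a behavior of the object
    public void accelerate(int amount) {
        this.speed += amount;
    }

    public String describe() {
        return model + " travelling at " + speed + " km/h";
    }

    public static void main(String[] args) {
        // an object: an instance of the Car class with its own data
        Car car = new Car("Roadster");
        car.accelerate(50);
        System.out.println(car.describe());
    }
}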

Object-oriented programming is based on the following principles:

 Encapsulation. This principle states that all important information is contained inside an object and only select information is exposed. The implementation and state of each object are privately held inside a defined class. Other objects do not have access to this class or the authority to make changes. They are only able to call a list of public functions or methods. This characteristic of data hiding provides greater program security and avoids unintended data corruption.

 Abstraction. Objects only reveal internal mechanisms that are relevant for the use
of other objects, hiding any unnecessary implementation code. The derived class
can have its functionality extended. This concept can help developers more easily
make additional changes or additions over time.

 Inheritance. Classes can reuse code from other classes. Relationships and
subclasses between objects can be assigned, enabling developers to reuse common
logic while still maintaining a unique hierarchy. This property of OOP forces a
more thorough data analysis, reduces development time and ensures a higher level
of accuracy.

 Polymorphism. Objects are designed to share behaviors and they can take on
more than one form. The program will determine which meaning or usage is
necessary for each execution of that object from a parent class, reducing the need
to duplicate code. A child class is then created, which extends the functionality of
the parent class. Polymorphism allows different types of objects to pass through
the same interface.
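Because inheritance is the one principle above that does not get its own question later in this assignment, here is a minimal Java sketch (the Animal and Dog classes are invented for illustration) of a subclass reusing and overriding superclass behavior:

class Animal {
    // behavior defined once in the superclass and inherited by subclasses
    String speak() {
        return "...";
    }
}

class Dog extends Animal {
    // the subclass reuses everything from Animal and overrides one method
    @Override
    String speak() {
        return "Woof";
    }
}

public class InheritanceDemo {
    public static void main(String[] args) {
        Animal pet = new Dog();          // a Dog referenced as an Animal
        System.out.println(pet.speak()); // prints "Woof": the overriding method is called
    }
}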

While Simula is credited as being the first object-oriented programming language, many other programming languages are used with OOP today. But some programming languages pair with OOP better than others. For example, programming languages considered pure OOP languages treat everything as objects. Other programming languages are designed primarily for OOP, but with some procedural processes included.

For example, popular pure OOP languages include:


 Ruby

 Scala

 JADE

 Emerald

Programming languages designed primarily for OOP include:

 Java

 Python

 C++

Other programming languages that pair with OOP include:

 Visual Basic .NET

 PHP

 JavaScript
Benefits of OOP include:

 Modularity. Encapsulation enables objects to be self-contained, making troubleshooting and collaborative development easier.

 Reusability. Code can be reused through inheritance, meaning a team does not
have to write the same code multiple times.

 Productivity. Programmers can construct new programs quicker through the use
of multiple libraries and reusable code.

 Easily upgradable and scalable. Programmers can implement system functionalities independently.
 Interface descriptions. Descriptions of external systems are simple, due to the message passing techniques that are used for object communication.

 Security. Using encapsulation and abstraction, complex code is hidden, software maintenance is easier and internet protocols are protected.

 Flexibility. Polymorphism enables a single function to adapt to the class it is placed in. Different objects can also pass through the same interface.
Criticism of OOP
The object-oriented programming model has been criticized by developers for
multiple reasons. The largest concern is that OOP overemphasizes the data
component of software development and does not focus enough on
computation or algorithms. Additionally, OOP code may be more complicated
to write and take longer to compile.

Alternative methods to OOP include:

 Functional programming. This includes languages such as Erlang and Scala, which are used for telecommunications and fault-tolerant systems.

 Structured or modular programming. This includes languages such as PHP and C#.

 Imperative programming. This alternative to OOP focuses on function rather than models and includes C++ and Java.

 Declarative programming. This programming method involves statements on what the task or desired outcome is but not how to achieve it. Languages include Prolog and Lisp.

 Logical programming. This method, which is based mostly in formal logic and uses languages such as Prolog, contains a set of sentences that express facts or rules about a problem domain. It focuses on tasks that can benefit from rule-based logical queries.
Most advanced programming languages enable developers to combine
models, because they can be used for different programming methods. For
example, JavaScript can be used for OOP and functional programming.

Developers who are working with OOP and microservices can address
common microservices issues by applying the principles of OOP.

(2) Class vs struct

Ans- Structs are lightweight versions of classes. Structs are value types and can be used to create objects that behave like built-in types.

Structs share many features with classes but have the following limitations compared with classes:

 A struct cannot have a default constructor (a constructor without parameters) or a destructor.
 Structs are value types and are copied on assignment.
 Structs are value types while classes are reference types.
 Structs can be instantiated without using the new operator.
 A struct cannot inherit from another struct or class, and it cannot be the base of a class. All structs inherit directly from System.ValueType, which inherits from System.Object.
 A struct cannot be a base class, so struct types cannot be abstract and are always implicitly sealed.
 The abstract and sealed modifiers are not allowed, and struct members cannot be protected or protected internal.
 Function members in a struct cannot be abstract or virtual, and the override modifier is allowed only for methods that override those inherited from System.ValueType.
 A struct does not allow instance field declarations to include variable initializers, but static fields of a struct may include variable initializers.
 A struct can implement interfaces.
 A struct can be used as a nullable type and can be assigned a null value.

When to use struct or classes?

To answer this question, we should have a good understanding of the differences:

1. Structs are value types, allocated either on the stack or inline in containing types. Classes are reference types, allocated on the heap and garbage-collected.

2. Allocations and de-allocations of value types are in general cheaper than allocations and de-allocations of reference types. However, assignments of large reference types are cheaper than assignments of large value types.

3. In structs, each variable contains its own copy of the data (except in the case of ref and out parameter variables), and an operation on one variable does not affect another variable. In classes, two variables can contain a reference to the same object, and an operation on one variable can affect the other.

Given this, a struct should be used only when you are sure that:

 It logically represents a single value, like primitive types (int, double, etc.).
 It is immutable.
 It should not be boxed and un-boxed frequently.

In all other cases, you should define your types as classes.

e.g. struct:

struct Location
{
    public int x, y;

    public Location(int x, int y)
    {
        this.x = x;
        this.y = y;
    }
}

Location a = new Location(20, 20);
Location b = a;
a.x = 100;
System.Console.WriteLine(b.x);

The output will be 20. The value of "b" is a copy of "a", so "b" is unaffected by the change to "a.x". But if Location were defined as a class, the output would be 100 because "a" and "b" would reference the same object.

(3) Define data abstraction

Ans- Data abstraction is a principle of data modeling theory that emphasizes the clear separation between the external interface of objects and the internal handling and manipulation of their data. In many programming languages, interfaces (or abstract classes) provide the abstraction, and their concrete implementations provide the implementation.

This abstraction allows a much simpler API for the system's models, mostly objects of different classes, even while they have a complex internal structure. This separation enables APIs
to remain essentially unchanged while the implementation of the API can improve over
time. The net result is that the ecosystem built around the APIs does not break with
every iterative refinement of the implementation.

Interfaces are a construct of object-oriented programming languages that defines the external behavior of data structures without defining how the data is stored. Each interface outlines the basic operations allowed on the data type. It becomes the responsibility of any data type that implements the interface to provide that functionality.

For example, the map type is an interface that offers a lookup functionality. This interface
can be implemented by a binary search tree or by a list of two-element tuples. Either
implementation can offer the lookup method and act the same for any user of
the map data type.
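To illustrate this idea, the following Java sketch (the Lookup interface and both implementing classes are invented names, not the standard java.util.Map API) defines one external interface and two interchangeable implementations, so any user of Lookup is insulated from how the data is actually stored:

import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// the abstraction: callers see only a lookup operation
interface Lookup {
    String find(String key);
}

// one implementation: backed by a balanced search tree
class TreeLookup implements Lookup {
    private final TreeMap<String, String> data = new TreeMap<>();

    void put(String key, String value) {
        data.put(key, value);
    }

    public String find(String key) {
        return data.get(key);
    }
}

// another implementation: backed by a plain list of key/value pairs
class ListLookup implements Lookup {
    private final List<Map.Entry<String, String>> data = new ArrayList<>();

    void put(String key, String value) {
        data.add(new AbstractMap.SimpleEntry<>(key, value));
    }

    public String find(String key) {
        for (Map.Entry<String, String> entry : data) {
            if (entry.getKey().equals(key)) {
                return entry.getValue();
            }
        }
        return null;
    }
}

Either implementation can be handed to code that only knows the Lookup interface, which is exactly the separation between interface and implementation described above.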

(4) What is encapsulation?

Ans- In object-oriented programming languages, the notion of encapsulation refers to the bundling of data, along with the methods that operate on that data, into a single unit. Many programming languages use encapsulation frequently in the form of classes. A class is a program-code template that allows developers to create an object that has both variables (data) and behaviors (functions or methods). A class is an example of encapsulation in computer science in that it consists of data and methods that have been bundled into a single unit.
Encapsulation may also refer to a mechanism of restricting the direct access to some components
of an object, such that users cannot access state values for all of the variables of a particular
object. Encapsulation can be used to hide both data members and data functions or methods
associated with an instantiated class or object.
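As a brief illustration (the BankAccount class and its members are invented for this example), the following Java sketch bundles data and the methods that operate on it into one unit, and restricts direct access to the internal state:

public class BankAccount {

    // the state is hidden: no other class can modify it directly
    private double balance;

    public BankAccount(double openingBalance) {
        this.balance = openingBalance;
    }

    // access goes through public methods, which can enforce rules on the data
    public void deposit(double amount) {
        if (amount > 0) {
            this.balance += amount;
        }
    }

    public double getBalance() {
        return this.balance;
    }
}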

(5) Define polymorphism?

ANS- Object-Oriented Programming has different concepts that allow developers to build logical code. One of these concepts is polymorphism. But what is polymorphism? Let’s discuss.
Polymorphism is one of the core concepts of object-oriented
programming (OOP) and describes situations in which something occurs
in several different forms. In computer science, it describes the concept
that you can access objects of different types through the same interface.
Each type can provide its own independent implementation of this
interface.
To know whether an object is polymorphic, you can perform a simple
test. If the object successfully passes multiple is-a or instanceof tests,
it’s polymorphic. As described in our post about inheritance, all Java
classes extend the class Object. Due to this, all objects in Java are
polymorphic because they pass at least two instanceof checks.

Different types of polymorphism


Java supports 2 types of polymorphism:

 static or compile-time
 dynamic

Static polymorphism
Java, like many other OOP languages, allows you to implement multiple methods within the same class that use the same name, as long as they use different sets of parameters. This is called method overloading and represents a static form of polymorphism.
The parameter sets have to differ in at least one of the following three
criteria:

 They need to have a different number of parameters, for example one method accepting 2 and another one accepting 3 parameters.
 The types of the parameters need to be different, for example one method accepting a String and another one accepting a Long.
 They need to expect the parameters in a different order. For example, one method accepts a String and a Long and another one accepts a Long and a String. This kind of overloading is not recommended because it makes the API difficult to understand.

In most cases, each of these overloaded methods provides a different but very similar functionality.
Due to the different sets of parameters, each method has a
different signature. That signature allows the compiler to identify which
method to call and binds it to the method call. This approach is called
static binding or static polymorphism.
Let’s take a look at an example.
A simple example for static polymorphism
Let’s use the same CoffeeMachine project as we used in the previous
posts of this series. You can clone it
at https://github.com/thjanssen/Stackify-OopInheritance.
The BasicCoffeeMachine class implements two methods with the
name brewCoffee. The first one accepts one parameter of
type CoffeeSelection. The other method accepts two parameters,
a CoffeeSelection, and an int.
public class BasicCoffeeMachine {

    // ...

    public Coffee brewCoffee(CoffeeSelection selection) throws CoffeeException {
        switch (selection) {
            case FILTER_COFFEE:
                return brewFilterCoffee();
            default:
                throw new CoffeeException(
                    "CoffeeSelection [" + selection + "] not supported!");
        }
    }

    // overloaded version: same method name, different parameter list
    public List<Coffee> brewCoffee(CoffeeSelection selection, int number) throws CoffeeException {
        List<Coffee> coffees = new ArrayList<>(number);
        for (int i = 0; i < number; i++) {
            coffees.add(brewCoffee(selection));
        }
        return coffees;
    }

    // ...
}

When you call one of these methods, the provided set of parameters
identifies the method which has to be called.
In the following code snippet, we’ll call the method only with
a CoffeeSelection object. At compile time, the Java compiler binds this
method call to the brewCoffee(CoffeeSelection selection) method.
BasicCoffeeMachine coffeeMachine = createCoffeeMachine();

coffeeMachine.brewCoffee(CoffeeSelection.FILTER_COFFEE);
If we change this code and call the brewCoffee method with
a CoffeeSelection object and an int, the compiler binds the method call to
the other brewCoffee(CoffeeSelection selection, int number) method.
BasicCoffeeMachine coffeeMachine = createCoffeeMachine();

List<Coffee> coffees = coffeeMachine.brewCoffee(CoffeeSelection.ESPRESSO, 2);

Dynamic polymorphism
This form of polymorphism doesn’t allow the compiler to determine the
executed method. The JVM needs to do that at runtime.
Within an inheritance hierarchy, a subclass can override a method of its
superclass, enabling the developer of the subclass to customize or
completely replace the behavior of that method.
Doing so also creates a form of polymorphism. Both methods
implemented by the super- and subclasses share the same name and
parameters. However, they provide different functionality.
Let’s take a look at another example from the CoffeeMachine project.
Method overriding in an inheritance hierarchy
The BasicCoffeeMachine class is the superclass of
the PremiumCoffeeMachine class.
Both classes provide an implementation of
the brewCoffee(CoffeeSelection selection) method.
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class BasicCoffeeMachine extends AbstractCoffeeMachine {

    protected Map<CoffeeSelection, CoffeeBean> beans;
    protected Grinder grinder;
    protected BrewingUnit brewingUnit;

    public BasicCoffeeMachine(Map<CoffeeSelection, CoffeeBean> beans) {
        super();
        this.beans = beans;
        this.grinder = new Grinder();
        this.brewingUnit = new BrewingUnit();
        this.configMap.put(CoffeeSelection.FILTER_COFFEE, new Configuration(30, 480));
    }

    public List<Coffee> brewCoffee(CoffeeSelection selection, int number) throws CoffeeException {
        List<Coffee> coffees = new ArrayList<>(number);
        for (int i = 0; i < number; i++) {
            coffees.add(brewCoffee(selection));
        }
        return coffees;
    }

    public Coffee brewCoffee(CoffeeSelection selection) throws CoffeeException {
        switch (selection) {
            case FILTER_COFFEE:
                return brewFilterCoffee();
            default:
                throw new CoffeeException("CoffeeSelection [" + selection + "] not supported!");
        }
    }

    private Coffee brewFilterCoffee() {
        Configuration config = configMap.get(CoffeeSelection.FILTER_COFFEE);

        // grind the coffee beans
        GroundCoffee groundCoffee = this.grinder.grind(
            this.beans.get(CoffeeSelection.FILTER_COFFEE), config.getQuantityCoffee());

        // brew a filter coffee
        return this.brewingUnit.brew(
            CoffeeSelection.FILTER_COFFEE, groundCoffee, config.getQuantityWater());
    }

    public void addBeans(CoffeeSelection selection, CoffeeBean newBeans) throws CoffeeException {
        CoffeeBean existingBeans = this.beans.get(selection);
        if (existingBeans != null) {
            if (existingBeans.getName().equals(newBeans.getName())) {
                existingBeans.setQuantity(existingBeans.getQuantity() + newBeans.getQuantity());
            } else {
                throw new CoffeeException("Only one kind of beans supported for each CoffeeSelection.");
            }
        } else {
            this.beans.put(selection, newBeans);
        }
    }
}

import java.util.Map;

public class PremiumCoffeeMachine extends BasicCoffeeMachine {

    public PremiumCoffeeMachine(Map<CoffeeSelection, CoffeeBean> beans) {
        // call constructor in superclass
        super(beans);
        // add configuration to brew espresso
        this.configMap.put(CoffeeSelection.ESPRESSO, new Configuration(8, 28));
    }

    private Coffee brewEspresso() {
        Configuration config = configMap.get(CoffeeSelection.ESPRESSO);

        // grind the coffee beans
        GroundCoffee groundCoffee = this.grinder.grind(
            this.beans.get(CoffeeSelection.ESPRESSO), config.getQuantityCoffee());

        // brew an espresso
        return this.brewingUnit.brew(
            CoffeeSelection.ESPRESSO, groundCoffee, config.getQuantityWater());
    }

    @Override
    public Coffee brewCoffee(CoffeeSelection selection) throws CoffeeException {
        if (selection == CoffeeSelection.ESPRESSO)
            return brewEspresso();
        else
            return super.brewCoffee(selection);
    }
}

If you read our post about the OOP concept inheritance, you already
know the two implementations of the brewCoffee method.
The BasicCoffeeMachine only supports
the CoffeeSelection.FILTER_COFFEE. The brewCoffee method of
the PremiumCoffeeMachine class adds support
for CoffeeSelection.ESPRESSO.
If the method gets called with any other CoffeeSelection, it uses the keyword super to delegate the call to the superclass.
Late binding
Sometimes, you want to use an inheritance hierarchy in your project. To
do this, you must answer the question, which method will the JVM call?
The answer manifests during runtime because it depends on the object
on which the method gets called. The type of the reference, which you
can see in your code, is irrelevant. You need to distinguish three general
scenarios:

1. Your object is of the type of the superclass and gets referenced as the superclass. So, in the example of this post, a BasicCoffeeMachine object gets referenced as a BasicCoffeeMachine
2. Your object is of the type of the subclass and gets referenced
as the subclass. In the example of this post,
a PremiumCoffeeMachine object gets referenced as
a PremiumCoffeeMachine
3. Your object is of the type of the subclass and gets referenced
as the superclass. In the CoffeeMachine example,
a PremiumCoffeeMachine object gets referenced as
a BasicCoffeeMachine

Let’s delve a bit further …


Superclass referenced as the superclass
The first scenario is pretty simple. When you instantiate
a BasicCoffeeMachine object and store it in a variable of
type BasicCoffeeMachine, the JVM will call the brewCoffee method on
the BasicCoffeeMachine class. So, you can only brew
a CoffeeSelection.FILTER_COFFEE.
// create a Map of available coffee beans
Map<CoffeeSelection, CoffeeBean> beans = new HashMap<>();
beans.put(CoffeeSelection.FILTER_COFFEE,
    new CoffeeBean("My favorite filter coffee bean", 1000));

// instantiate a new CoffeeMachine object
BasicCoffeeMachine coffeeMachine = new BasicCoffeeMachine(beans);

Coffee coffee = coffeeMachine.brewCoffee(CoffeeSelection.FILTER_COFFEE);

Subclass referenced as the subclass


The second scenario is similar. But this time, we instantiate
a PremiumCoffeeMachine and reference it as a PremiumCoffeeMachine.
In this case, the JVM calls the brewCoffee method of
the PremiumCoffeeMachine class, which adds support
for CoffeeSelection.ESPRESSO.
// create a Map of available coffee beans
Map<CoffeeSelection, CoffeeBean> beans = new HashMap<>();
beans.put(CoffeeSelection.FILTER_COFFEE,
    new CoffeeBean("My favorite filter coffee bean", 1000));
beans.put(CoffeeSelection.ESPRESSO,
    new CoffeeBean("My favorite espresso bean", 1000));

// instantiate a new CoffeeMachine object
PremiumCoffeeMachine coffeeMachine = new PremiumCoffeeMachine(beans);

Coffee coffee = coffeeMachine.brewCoffee(CoffeeSelection.ESPRESSO);

Subclass referenced as the superclass


This is the most interesting scenario and the main reason why we
explain dynamic polymorphism in such detail.
When you instantiate a PremiumCoffeeMachine object and assign it to
the BasicCoffeeMachine coffeeMachine variable, it still is
a PremiumCoffeeMachine object. It just looks like a
BasicCoffeeMachine.
The compiler doesn’t see that in the code, and you can only use the
methods provided by the BasicCoffeeMachine class. But if you call
the brewCoffee method on the coffeeMachine variable, the JVM knows
that it is an object of type PremiumCoffeeMachine and executes the
overridden method. This is called late binding.
// create a Map of available coffee beans
Map<CoffeeSelection, CoffeeBean> beans = new HashMap<>();
beans.put(CoffeeSelection.FILTER_COFFEE,
    new CoffeeBean("My favorite filter coffee bean", 1000));

// instantiate a new CoffeeMachine object
BasicCoffeeMachine coffeeMachine = new PremiumCoffeeMachine(beans);

Coffee coffee = coffeeMachine.brewCoffee(CoffeeSelection.ESPRESSO);

Summary
Polymorphism is one of the core concepts in OOP languages and
describes the concept wherein you can use different classes with the
same interface. Each of these classes can provide its own
implementation of the interface.
Java supports two kinds of polymorphism. You can overload a method
with different sets of parameters. This is called static polymorphism
because the compiler statically binds the method call to a specific
method.
Within an inheritance hierarchy, a subclass can override a method of its
superclass. If you instantiate the subclass, the JVM will always call the
overridden method, even if you cast the subclass to its superclass. That
is called dynamic polymorphism.

UNIT II
1. Methodological issues?
Ans- Bone research is a dynamic area of scientific investigation that usually encompasses multiple disciplines. Virtually all basic cellular research, clinical research
and epidemiologic research rely on statistical concepts and methodology for
inference. This paper discusses common issues and suggested solutions concerning
the application of statistical thinking in bone research, particularly in clinical and
epidemiological investigations. The issues are sample size estimation, biases and
confounders, analysis of longitudinal data, categorization of continuous data, selection
of significant variables, over-fitting, P-values, false positive finding, confidence
interval, and Bayesian inference. It is hoped that by adopting the suggested measures
the scientific quality of bone research can improve.


Keywords: Statistical methods, P-value, Confidence interval, Bayesian inference, Collider bias

1. Introduction
Bone research commonly involves multifaceted studies. These studies may range
from basic cellular experiments, clinical trials to epidemiological investigations. Most
of these studies come down to 3 broad aims: assessing difference (ie, effect),
association, and prediction. Do cells with one version of a gene express more of an
enzyme than cells with another version? Does a new drug reduce the risk of fracture
compared with placebo? Among hundreds of risk factors in a cohort study, which
factors are associated with fracture? Can a new prediction model based on Caucasian
populations be used for fracture risk assessment in Asian populations? The answer to
these questions invariably involves statistical thinking.

Indeed, every stage of a research project – from study design, data collection, data
analysis, to data reporting – involves statistical consideration. Statistical models and
null hypothesis significance testing are powerful methods to discover laws and trends
underlying observational data, and to help make accurate inference. Test of hypothesis
can also help researchers to make decision of accepting or rejecting a null hypothesis,
contributing to the scientific progress. Thus, reviewers and readers alike expect
researchers to apply appropriate statistical models to obtain useful information from
the data for creating new knowledge.

However, misuse of statistical methods has been common in biomedical research [1],
and the problem is still persistent [2,3]. In the 1960s, a review of 149 studies from
popular medical journals revealed that less than 30% of studies were
methodologically ‘acceptable’ [4]. About 2 decades later, a review of 196 clinical
trials on rheumatoid arthritis found that 76% of the conclusions or abstracts contained
‘doubtful or invalid statements’ [5]. In a recent systematic review of published studies
in orthopedic journals, 17% of studies had conclusions that were not consistent with the results presented, and 39% of studies should have applied a different analytical method [6]. While the majority of statistical errors were minor, about 17% of errors could compromise the study conclusions [6]. Apart from errors, there are deep
concerns about the abuse of statistical methods that lead to misinterpretation of data
and retraction of published studies. The bone research community has recently come
to terms with a high profile retraction of papers by a bone researcher [7]. The misuse
of statistical methods and misinterpretation of statistical analysis partly contribute to
the problem of irreproducibility of research findings [8,9].
The recognition of the lack of reproducibility in biomedical research [[10], [11], [12]]
has led to several discussions on how to improve the quality of bone research
publications [[13], [14], [15]]. As an editor and expert reviewer for several bone and
medical journals over the past 25 years, I have identified major areas that need
improvement, namely, reporting of study design, data analysis, and interpretation of
P-values. In this article, I focus on the most common issues that appear repeatedly in
the bone research literature, and then suggest possible solutions. My aim is to help
bone research colleagues in providing relevant ideas and methods that are required to
improve the reproducibility and accurate inference of their work.

1.1. Sample size

The founder of modern statistics, Karl Pearson, once said that "the utility of all
science consists alone in its method, not its material" [16]. Although the same method
can be used in different studies, it is the details of methodological activities that define
the quality of the work. The description of details and activities of study design can be
found in several guidelines such as CONSORT [17] for clinical trials, STROBE [18]
for observational studies, and ARRIVE [19] for animal studies.
One important point of these guidelines is the description of sample size estimation.
As a norm, studies with inadequate sample size have low sensitivity (eg, power) to
uncover a true association. It is not widely appreciated that underpowered studies
often produce statistically significant and exaggerated findings, but the findings have
low probability of reproducibility [20].
Therefore, a clear explanation of sample size estimation and rationale, including
primary outcome, expected effect size, type I and type II error, greatly help readers to
assess the reliability of study findings [21]. Unfortunately, many bone science authors
do not report how they arrived at the sample size. Moreover, most laboratory studies
are based on a small number of animals, but there is no quantitative justification of the
sample size [22]. As a result, it is very difficult to interpret a study’s observed effect
size in the absence of a hypothesized effect size that underlined the estimation of
sample size.

1.2. Biases and confounders

In uncontrolled and non-randomized studies, the association between exposure and outcome can be misled by biases and confounders. The list of biases and confounders is extensive [23], and these biases are almost always present in uncontrolled studies.
Among the list of biases, selection bias is a major threat. Selection bias can arise in
studies where participants were drawn from a sample that is very different from the
general population, and as a result, it may distort the true association between
exposure and outcome. The diagram below (Fig. 1) shows a hypothetical association
between an exposure and an outcome in a population with a correlation coefficient
being r = −0.29 (P < 0.0001; left panel); however, if a subset of the population was
selected for analysis (right panel) then the association is no longer statistically
significant (r = −0.05; P = 0.72). Thus, studies in subgroups of patients or non-representative samples have a high risk of reaching a wrong conclusion.
Fig. 1. Illustration of selection bias. There was a significant association between exposure and outcome in the population (left panel), but if a subset of individuals (red box) were selected from the population, the association can become statistically non-significant (right panel).
Confounding is a common threat to the validity of conclusions from observational
studies. A confounder is defined as a variable that causes or influences both the
exposure and outcome (Fig. 2, left panel). For instance, an association between low
levels of physical activity and bone mineral density could be confounded by
advancing age (i.e., a confounder). In osteoporosis research, confounding variables
such as age, gender, comorbidities, and frailty could account for the observed
association between bisphosphonates and mortality in observational studies [24].

Fig. 2. Illustration of confounding variable and collider variable. A confounder is a variable that causes both the exposure and outcome variables. A collider is a variable that is caused by both the exposure and outcome variables. A regression model can be used to adjust for the effect of a confounder, but it should not be used to adjust for the effect of a collider.
Collider bias [25] is another threat to the validity of observational studies. A variable
is considered a ‘collider’ if it is caused by both the exposure and the outcome. It
should be noted that collider is different from confounder, which is defined as a
variable that is the cause of both exposure and outcome (Fig. 2, right panel). For
example, both fracture (outcome) and respiratory failure (exposure) can cause patients
to be hospitalized, and in this case, hospitalization is the potential collider. The effect
of collider bias is nicely illustrated by the spurious association between single
nucleotide polymorphisms (SNP) and sex [26]. In this analysis, none of the 694 SNPs
for height, as expected, was associated with sex (ie, the outcome) in a bivariate
analysis; however, when height (ie, the collider) was added to the model, 222 SNPs
were significantly associated with sex [26]. This example highlights that in association analysis, adjusting for a factor that is causally related to the outcome can yield biologically meaningless but statistically significant associations.
Regression-based adjustment is a powerful method to adjust for the effect of
confounding variables, and helps the inference to be more accurate. However,
regression adjustment for a collider can yield a spurious association between exposure
and outcome [26]. Some researchers have the tendency to adjust for all variables
available with the intention to obtain the most unbiased association. For instance,
some authors used weight, height, body mass index, and age in a regression model.
Such an agnostic approach of adjustment may be counterproductive, because it runs
the risk of over-adjustment and over-fitting, not to mention the problem of
multicollinearity (ie, correlation among predictor variables). Not all associations
require regression adjustment, and appropriate adjustment requires a careful
consideration based on substantive knowledge. For instance, adjustment is not
necessary for a covariate that does not induce the causal relationship between
exposure and outcome [27].

1.3. Longitudinal data

In prospective cohort studies, individuals are repeatedly measured over time, enabling the examination of the individual evolution of an outcome. The analysis of data from this
type of study design is challenging, because (i) measurements within an individual are
correlated, (ii) the duration between visits is different between individuals, and (iii)
there are missing data. Some authors applied the analysis of variance to analyze such
a longitudinal dataset, but this method cannot handle the difference in follow-up
duration and missing data. If the within-subject correlation is not properly accounted
for, it can lead to false positive findings and wrong confidence intervals [28].
Researchers are suggested to consider more modern methods such as generalized
estimating equations [29] and the linear mixed effects model [30]. A major strength of
these modern methods is that they can handle missing data while still accounting for
variability within and between individuals.
Another common problem associated with longitudinal data analysis is the
determination of rate of change for an individual. For studies that measure bone
mineral density (BMD) before (denoted by x0) and after (x1) intervention, most
researchers would calculate the percentage change as the difference between 2
measurements over the baseline measurement, ie, (x1−x0)/x0×100, and then use the
percentage change as a dependent variable for further analyses. Although this measure
seems straightforward, it is not symmetric [31] and can result in misleading results
[32]. A better and symmetric quantification of change should use the mean of 2
measurements as the denominator, ie, (x1−x0)/mean(x0,x1)×100. For testing hypotheses concerning differences between treatments in before-after studies that involve a continuous outcome variable, the analysis of covariance is considered a standard method [33].
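As a small worked example with invented numbers: if BMD falls from x0 = 1.00 to x1 = 0.80 g/cm², the conventional measure gives (0.80 − 1.00)/1.00 × 100 = −20%, while the reverse change from 0.80 back to 1.00 gives +25%. The symmetric measure gives (0.80 − 1.00)/0.90 × 100 ≈ −22.2% in one direction and +22.2% in the other, which is why it is the preferred quantification of change.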

1.4. Categorization of continuous variable

It is not uncommon to read bone research papers where the authors categorize
continuous variables such as bone mineral density (BMD) into 2 distinct groups (eg,
"osteporosis" and "non-osteoporosis"), or 3 groups (eg, osteoporosis, osteopenia, and
normal), and then use the categorized variable as an exposure or an outcome for
further analyses. While the World Health Organization’s recommended BMD
classification [34] is appropriate for clinical/diagnostic purposes, it is a bad practice
for scientific purpose.
It has been repeatedly shown that such a categorization is unnecessary and can distort
an association [35]. Apart from the risk of misclassification, the obvious problem with
categorization of continuous variables is the loss of information. In the case of
dichotomization, for example, all individuals above (or below) the cut-point are treated equally, yet their prognoses could be vastly different. The loss of information becomes more severe as the number of categories is reduced.
Categorization also reduces the efficiency of adjustment for confounders. In linear models, a categorized confounding variable removes only about 67% of the confounding effect that would be removed by using the variable in its continuous form [36].
For scientific purposes, it is recommended that investigators do not categorize
continuous variables in an analysis of association. Some continuous variables may
exhibit a non-normal distribution, and in this case, it is instructive to consider more
appropriate analyses such as spline regression or non-parametric smoother, and not to
categorize continuous data.

1.5. Selection of ‘significant’ variables

In many studies, the aim is to identify a set of predictor variables that are
independently associated with a continuous outcome (in multiple linear regression) or
a binary outcome (in multiple logistic regression). In the presence of hundreds or
thousands of variables of interest, the number of possible sets of variables (or models)
can be very large. For instance, a study with 30 variables can generate at least
2^30 = 1,073,741,824 possible models, and determining which models are associated
with an outcome is quite a challenge.

Many researchers have traditionally used stepwise regression to select the ‘best
model’. While stepwise regression is a popular method for selecting a relevant set of
variables, it has serious deficiencies [37]. It is not widely appreciated that stepwise
regression does not necessarily come up with the best model if there are redundant
predictors. Consequently, variables that are truly associated with the outcome may not
be identified by stepwise regression, because they do not reach statistical significance,
while non-associated variables may be identified to be significant [38]. As a result, the
model identified by stepwise regression is poorly reproducible.
For selection of relevant predictors, investigators are strongly suggested to consider
more robust methods such as Bayesian model averaging [39,40] or LASSO [41]
which has been shown to perform better than the stepwise regression. Still, the models
identified by these methods are only suggestive in nature. Statistical algorithms do not
have substantive knowledge about a clinical or biological problem that we researchers
have. Therefore, the best models must be guided by substantive knowledge, not just
by statistical method-driven model selection.

1.6. Over-fitting
A multivariable statistical model always runs the risk of being over-fitted, in the sense that the model is unnecessarily complex. When over-fitting happens, the model is not valid because it tries to explain the random noise in the data rather than the association between variables. As a result, an over-fitted model may fit the data at hand very well, but it fits poorly for a new and independent dataset.

Over-fitting often happens when the number of parameters in the model is greater
than the number of events. There is a rule of thumb that each predictor in a
multivariable model requires at least 10 events [42], but recent research has shown
that this rule of thumb is simplistic. Theoretical studies show that the number of
events in a multivariable prediction model is determined by (i) the incidence of
disease, (ii) the number of risk factors, (iii) the proportion of variance explained, and
(iv) shrinkage factor [43].
Modern methods such as LASSO [41] or ridge regression [44] can help reduce over-
fitting. In particular, LASSO is a method that shrinks the model coefficients toward 0
by imposing a constraint on the sum of the absolute values of the parameter estimates. This constraint can
help eliminate non-important predictors in the model, and hence reduce the over-
fitting.
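In its usual penalized form for linear regression, the LASSO estimates are the coefficients that minimize the residual sum of squares plus an L1 penalty:

minimize over β: Σi (yi − β0 − Σj xij βj)^2 + λ Σj |βj|

where λ ≥ 0 controls the amount of shrinkage; the larger λ is, the more coefficients are forced to exactly 0, which is how unimportant predictors are removed from the model.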

1.7. P-values

Much of scientific inference boils down to the interpretation of P-value. Since its
inception in the 1920s, P-value has been ubiquitous in the scientific literature, such
that it is sometimes considered a "passport for publication". Readers of biomedical
research literature may have noticed that the interpretation of P-value in most papers
was largely dichotomized into "significant" vs "non-significant", with P = 0.05 being
the commonest threshold for declaring a discovery. In some not-so-extreme cases,
researchers reach a conclusion of effect based on a finding with P = 0.04, but readily
dismiss a result with P = 0.06 as a null effect. However, it is not widely appreciated
that P-values vary greatly between samples [45], such that the deletion or addition
of a single observation can change the statistical significance of a finding. Therefore,
the simple classification of finding into "significant" and "non-significant" based on
the threshold of 0.05 is not encouraged. The conclusion of an effect should be based
on full evidence, not limited to the levels of statistical significance alone.
P-value is a result of null hypothesis significant testing (NHST). However, few
practicing scientists realize that NHST is the hybridization of 2 approaches: test of
significance and test of hypothesis. This hybridization has generated a lot of confusion
and misinterpretation of P-values. It is thus instructive to have a brief review of the
thinking underlying the NHST approach.

In the paradigm of significance testing, a null hypothesis is proposed, then a test statistic (eg, t-test, chi-squared test) is computed from the observed data. An index,
called P-value, representing the deviation between the test statistic and the null
hypothesis is derived, with lower values being a signal of the degree of implausibility
of the null hypothesis. The proponent of this significance testing approach, Sir Ronald
Fisher, suggested that a finding with P-value of 0.05 or lower is considered
statistically significant. In his own words: "The value for which P = 0.05, or 1 in 20, is
1.96 or nearly 2; it is convenient to take this point as a limit in judging whether a
deviation ought to be considered significant or not" [46]. Fisher also suggested that researchers should report exact P-values (eg, P = 0.031, not P < 0.04).
In the paradigm of hypothesis testing, a null hypothesis and an alternative hypothesis
are proposed to assess 2 mutually exclusive theories about the population of interest.
Two long-term rates of erroneous decisions are then defined prior to conducting data
collection: (i) the probability of a false positive finding that will be made when the
null hypothesis is true (also referred to as type I error or α); and (ii) the probability of
a false negative finding that will be made when the null hypothesis is false (ie, type II
error or β). Traditionally, researchers set α=5% and β = 20% in most studies. After the
data have been collected and distilled into a test statistic, the test result is then
compared with a theoretical cut-off value associated with type I error. If the test result
is smaller than the cut-off value, then the null hypothesis is accepted; otherwise the
null hypothesis is rejected. The hypothesis testing approach, developed by Jerzy
Neyman and Egon Pearson in the 1930s, was designed so that “in the long run of
experience, we shall not be too often wrong” [47].
NHST is the marriage between Fisher’s significance testing and Neyman-Pearson’s
hypothesis testing approaches [48]. In NHST, P-value is compared with type I error
rate α to reject (when P ≤ α) or accept (when P > α) the null hypothesis. As can be
seen, this is actually a mis-marriage of 2 different approaches, because the P-value
from significance testing is a local measure of evidence for a specific study, but the
type I error and type II error from hypothesis testing are global measures from
independent studies taken as a totality.
This mis-marriage has generated a lot of misconceptions about P-values [49,50]. Most
researchers interpret P-value as the probability of null hypothesis (eg, no effect, no
association), and consequently 1 minus P-value is implicitly viewed as the probability
that the alternative hypothesis (eg, presence of effect, association) is true; however,
such an unconditional interpretation is wrong. Actually, P-value is the probability of
obtaining results as extreme as the observed results when the null hypothesis is true –
it is a conditional probability. Thus, if an effect size has P = 0.06, it means that when the null hypothesis is true, a value of the effect size as extreme as or more extreme than what was observed occurs in 6% of all samples; it does not mean that the null hypothesis is true
in 6% of all samples. In other words, the effect size observed, or smaller, occurs in 1 –
P = 94% of all samples under the assumption that the null hypothesis of no effect is
true.
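Stated as a conditional probability, the definition is: P-value = Pr(a test statistic as extreme as or more extreme than the observed one | the null hypothesis is true); it is not Pr(the null hypothesis is true | the observed data), which is what many readers intuitively want it to be.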
Because the P-value threshold of 0.05 is traditionally considered ‘statistically
significant’, and statistical significance is associated with a greater chance of
publication, some researchers have engaged in questionable research practices such as "P-hacking" [51]. P-hacking is a practice of manipulating data, consciously or subconsciously, until it produces a desired P-value. Such practices include multiple subgroup analyses of an outcome, categorization of continuous data, data transformation, and selective choice of statistical tests. By manipulating data in such ways, a completely null dataset can produce a statistically significant result 61% of the time [51].

1.8. Multiple testing, large sample size, and false discovery rate

In recent years, national registries have provided researchers with opportunities to test
hundreds or thousands of hypotheses, with many more tests being unreported. As a
norm, the more one searches, the more one discovers unexpected and false findings. It
can be shown that the probability of false positive findings is an exponential function
of the number of hypothesis tests. For instance, at the alpha level of 5%, in a study testing for associations between 50 risk factors and an outcome there is a 92% probability that the study will find at least one ‘significant’ association, even if there is no association between any of the risk factors and the outcome. In genomic research, the P-value threshold of 5 × 10^−8 has become a standard for common-variant genome wide association studies, but there is no such threshold for registry-based research. Researchers using registry-based data are advised to take measures (such as Bonferroni’s procedure or Tukey’s test) to adjust P-values for multiple testing so that the nominal P-value is less than 0.05, and to report the false discovery rate [52].
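The 92% figure follows from the probability of obtaining at least one false positive among k independent tests conducted at level α:

Pr(at least one false positive) = 1 − (1 − α)^k = 1 − (1 − 0.05)^50 ≈ 1 − 0.077 ≈ 0.92

(assuming the 50 tests are independent; with correlated tests the probability is somewhat lower, but still far above 0.05).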
Studies with very large sample size pose serious challenges in the inference of
association. For a given effect size, P-value is a reflection of sample size, in the sense
that studies with very large sample size almost always reject the null hypothesis. In
the 1950s, Lindley showed that a statistically significant finding from a study with
very large sample size may represent strong evidence for the null effect, and this is
later known as "Lindley’s Paradox" [53]. For example, an observed proportion of
49.9% is consistent with the null hypothesis of 50.0% (P = 0.95) when the sample size
is 1000 individuals; however, when the sample size is 1,000,000, P = 0.045, which is evidence against the null hypothesis at the α level of 0.05. In other words, studies with very
large sample size are very likely to find small P-values, but their evidence against the
null hypothesis is very weak.
The implication is that the level of 5% may not be applicable to large sample size
studies. Researchers need to adjust the observed P-value in large sample size studies.
Good proposed a simple adjustment called Q, or standardized P-value [54]: Q = P × √(n/100), where P is the actual P-value and n is the sample size. Thus, when n = 100, the standardized P-value Q is the same as the observed P-value. Good suggested
that Q > 1 can be interpreted as support for the null hypothesis. Thus
for n = 1,000,000 and P = 0.045, Q = 4.5, which is an evidence for the null hypothesis.
Another solution is to set an ‘optimal’ α level based on a hypothesized effect size and
cost of errors [55].
Many researchers mistake the P-value for a false discovery rate. According to this
view, a finding with P = 0.05 is equivalent to a false discovery rate of 5%. However,
such an interpretation is also wrong. It can actually be shown that in the agnostic
scenario a finding of P = 0.05 is equivalent to a false discovery rate of at least 30%
[56]. It can also be shown that a P-value of 0.001 corresponds to a false discovery rate
of 1.8% [57]. Thus, there is a call that the routine P-value should be lowered to 0.005
[58] or 0.001 [9] to minimize the false discovery rate. The implication of these considerations is that researchers should not regard any result with P > 0.005 as evidence of discovery.

1.9. Confidence interval

Researchers are almost always interested in knowing the size of an effect or magnitude of association, which is not conveyed by the P-value. Confidence interval
provides likely values of effect size within an interval (usually taken as 95%) that are
compatible with a study’s observed data. Thus, confidence interval is a very useful
complementary information pertaining to the practical significance of findings. For
instance, a study testing the effect of supplementation of vitamins C and E during
pregnancy concluded that the supplementation "does not reduce the risk of death or
other serious outcomes in their infants" [59]. However, actual data showed that the
relative risk of death or serious outcome (relative risk 0.79; 95% confidence interval,
0.61 to 1.02) clearly favored the supplementations group, even though P = 0.20.
Some researchers tend to mistakenly interpret the confidence interval as a test of significance. In this view, a 95% confidence interval that does not include the null hypothesis value is interpreted as statistically significant, whereas a 95% confidence interval that includes the null hypothesis value is considered statistically non-significant. However, a confidence interval is a result of estimation, and it should not be interpreted within the framework of significance testing. Accordingly, a confidence interval from 0.61 to 1.02 should be interpreted to mean that the data are compatible with anything from a 39% reduction in risk to a 2% increase in risk. Thus, it has been suggested that confidence intervals be renamed "compatibility intervals" [60].
While reporting confidence intervals has been almost a norm in clinical research
papers, it is still not widely adopted in animal research. Investigators in basic as well
as translational research are suggested to report confidence interval for key measures
in their papers.

1.10. Bayesian inference

A 95% confidence interval (CI) from a to b is sometimes interpreted as meaning there is a probability of 95% that the true value lies between a and b; however, this interpretation is strictly incorrect. The actual interpretation of a confidence interval requires a mental exercise: if the study were repeated an infinite number of times with different samples, and a 95% CI were obtained each time, then 95% of the intervals would contain the true value. That interpretation is based on the frequentist school of
inference. Admittedly, it is not easy to comprehend the true meaning of CI.
The statement that ‘there is a probability of 95% that the true value lies
between a and b’ can only be derived from a Bayesian analysis. A Bayesian analysis
uses the Bayes’ theorem to synthesize the prior information of an effect and the
existing data to produce the posterior probability of an effect [61]. The posterior
probability can directly provide the kind of answer that researchers want to
have: given the observed data, what is the probability that there is an
effect/association. Just as patients would like to know what is the probability of
having a disease after seeing a test result, researchers want to know what is the
probability of an effect after seeing result of a test statistic. P-value cannot answer that
question; Bayesian analysis can.
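In its simplest form, Bayes' theorem combines the prior probability of a hypothesis with the likelihood of the observed data:

Pr(hypothesis | data) = Pr(data | hypothesis) × Pr(hypothesis) / Pr(data)

so the posterior probability of an effect is proportional to how likely the observed data are under that effect, weighted by the prior probability assigned to the effect before seeing the data.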
Bayesian analysis allows the reporting of direct probability statements about any
magnitude of difference that is of clinical interest [62,63]. For instance, a meta-
analysis of 8 randomized controlled trials showed that supplements of calcium and
vitamin D (CaD) reduced the risk of fracture in both community dwelling and
institutionalized individuals [64]. Using a Bayesian analysis [65], we showed that there was a 95% chance that the risk ratio of fracture associated with CaD supplements
ranges between 0.68 and 1.02. Moreover, there is a 44% probability that CaD
supplements reduce fracture risk by at least 15% [65]. Sometimes, P-value based
results are not necessarily consistent with a Bayesian analysis. For instance, based on
the frequentist inference, the effect of alendronate on hip fractures may be interpreted
as statistically non-significant at the alpha level of 5%; however, result of a Bayesian
analysis indicated that there is a 90% probability that alendronate reduced fracture risk
by at least 20% [66]. Although the Bayesian school of inference has been suggested as
a paradigm of inference in the 21st century [67], its application in the medical
research is still modest. The low level of uptake of Bayesian methods in medical
research is partly due to the difficulty in choosing prior distributions that capture a
reasonable amount of background knowledge. Many researchers used expert opinions
for determining prior distribution, but this can create many biased problems.
Nevertheless, in most cases, prior distributions can be generated from previously
published data or from probability distributions that reflect a range of background
knowledge about an association: non-informative, sceptical to optimistic [68]. Bone
researchers are encouraged to consider Bayesian analysis and interpretation more
often in their studies.

2. Conclusions
Statistical errors can arise in every phase of a study, from experimental design, data
analysis to interpretation (Table 1). Data are products of experiment, and data quality
is a consequence of experimental design. Good experimental design, whether it is
animal study or clinical trial, is essential for generating high quality data. For a well-
designed study with high quality data, simple statistical methods suffice in most cases,
and the chance of statistical errors is low. Data can be adjusted, but study design
cannot be reversed. Therefore, it is very important that issues concerning study design
(eg, sample size, control, matching, blocking, randomization, measurements) should
be considered at the beginning of a research project to minimize subsequent errors.
Table 1. List of common issues and suggested solutions.
Issue Suggested solution

Lack of sample size justification Provide a statement of sample size estimation,


including hypothesized effect size, type I and type II
error.

Confounders and biases Regression adjustment, but be aware of over


adjustment and unnecessary adjustment.

Data-dependent categorization of continuous Avoid categorization of continuous data. Use spline


data regression or non-parametric smoother.

Dichotomization of P-values into "significance" Avoid dichotomization of P-value. Report actual P-


and "non-significance" based on the threshold values. Consider P < 0.001 or P < 0.005 as a threshold
of P = 0.05 for discovery declaration.

Selection of ‘significant’ variables Avoid stepwise regression. Consider LASSO and


Bayesian Model Averaging methods.

Over-fitting: number of predictors is greater Consider LASSO and ridge regression analysis
than the number of events
Issue Suggested solution

Analysis of variance for longitudinal data Consider linear mixed effects model as an alternative to
repeated measures analysis of variance.

Multiple tests of hypothesis Consider adjustment for multiple tests of hypothesis.


Consider false discovery reporting [77].

P-value in very large sample size study Consider Good’s adjustment [54].

Interpretation of different P-values as different Avoid. Report confidence intervals.


effect sizes

Quantification of uncertainty of effect size Consider Bayesian analysis.

Although the focus of this article is on bone research, the errors identified here are
also discussed in other areas of research [[69], [70], [71]]. Most of these errors come
down to the practice of null hypothesis significance testing and P-value, which is the
subject of intense debate among methodologists and practicing scientists [72]. It is
recognized that the P-value overstates the evidence for an association, and that its
arbitrary threshold of 0.05 is a major source of falsely interpreted true positive results.
About 25% of all findings with P < 0.05, if viewed in a scientifically agnostic light,
can be regarded as either meaningless [73] or as nothing more than chance findings
[74]. There have been calls to ban P-value in scientific inference [75,76]. However, it
is likely that the P-value is here to stay. Although P-value does not convey the truth, it
is a useful measure that helps distinguish between noise and signal in the world of
uncertainty. What is needed is the interpretation of P-value should be contextualized
within a study and biological plausibility. It is hoped that this review helps improve
statistical literacy along all phases of research.

(3)Explain Phases of the waterfall model ?


Ans - Classical waterfall model is the basic software development life
cycle model. It is very simple but idealistic. Earlier this model was very popular
but nowadays it is not used. But it is very important because all the other
software development life cycle models are based on the classical waterfall
model.
Classical waterfall model divides the life cycle into a set of phases. This model
considers that one phase can be started after completion of the previous phase.
That is the output of one phase will be the input to the next phase. Thus the
development process can be considered as a sequential flow in the waterfall.
Here the phases do not overlap with each other. The different sequential
phases of the classical waterfall model are shown in the below figure:

Let us now learn about each of these phases in bri


brief details:
1. Feasibility Study:: The main goal of this phase is to determine whether it
would be financially and technically feasible to develop the software.
The feasibility study involves understanding the problem and then determine
the various possible strategies to solve the problem. These different
identified solutions are analyzed based on their benefits and drawbacks, The
best solution is chosen and all the other phases are carried out as per this
solution strategy.
2. Requirements analysis and specification: The aim of the requirement
analysis and specification phase is to understand the exact requirements of
the customer and document them properly. This phase consists of two
different activities.
 Requirement gathering and analysis: Firstly all the requirements
regarding the software are gathered from the customer and then the
gathered requirements are analyzed. The goal of the analysis part is to
remove incompleteness (an incomplete requirement is one in which some
parts of the actual requirements have been omitted) and inconsistencies
(inconsistent requirement is one in which some part of the requirement
contradicts with some other part).
 Requirement specification: These analyzed requirements are
documented in a software requirement specification (SRS) document.
SRS document serves as a contract between development team and
customers. Any future dispute between the customers and the developers
can be settled by examining the SRS document.
3. Design: The aim of the design phase is to transform the requirements
specified in the SRS document into a structure that is suitable for
implementation in some programming language.
4. Coding and Unit testing: In coding phase software design is translated into
source code using any suitable programming language. Thus each designed
module is coded. The aim of the unit testing phase is to check whether each
module is working properly or not.
5. Integration and System testing: Integration of different modules are
undertaken soon after they have been coded and unit tested. Integration of
various modules is carried out incrementally over a number of steps. During
each integration step, previously planned modules are added to the partially
integrated system and the resultant system is tested. Finally, after all the
modules have been successfully integrated and tested, the full working
system is obtained and system testing is carried out on this.
System testing consists three different kinds of testing activities as described
below :
 Alpha testing: Alpha testing is the system testing performed by the
development team.
 Beta testing: Beta testing is the system testing performed by a friendly
set of customers.
 Acceptance testing: After the software has been delivered, the customer
performed the acceptance testing to determine whether to accept the
delivered software or to reject it.
6. Maintenance: Maintenance is the most important phase of a software life
cycle. The effort spent on maintenance is the 60% of the total effort spent to
develop a full software. There are basically three types of maintenance :
 Corrective Maintenance: This type of maintenance is carried out to
correct errors that were not discovered during the product development
phase.
 Perfective Maintenance: This type of maintenance is carried out to
enhance the functionalities of the system based on the customer’s
request.
 Adaptive Maintenance: Adaptive maintenance is usually required for
porting the software to work in a new environment such as work on a new
computer platform or with a new operating system.
Advantages of Classical Waterfall Model
Classical waterfall model is an idealistic model for software development. It is
very simple, so it can be considered as the basis for other software
development life cycle models. Below are some of the major advantages of this
SDLC model:
 This model is very simple and is easy to understand.
 Phases in this model are processed one at a time.
 Each stage in the model is clearly defined.
 This model has very clear and well understood milestones.
 Process, actions and results are very well documented.
 Reinforces good habits: define-before- design,
design-before-code.
 This model works well for smaller projects and projects where requirements
are well
understood.
Drawbacks of Classical Waterfall Model
Classical waterfall model suffers from various shortcomings, basically we can’t
use it in real projects, but we use other software development lifecycle models
which are based on the classical waterfall model. Below are some major
drawbacks of this model:
 No feedback path: In classical waterfall model evolution of software from
one phase to another phase is like a waterfall. It assumes that no error is
ever committed by developers during any phases. Therefore, it does not
incorporate any mechanism for error correction.
 Difficult to accommodate change requests: This model assumes that all
the customer requirements can be completely and correctly defined at the
beginning of the project, but actually customers’ requirements keep on
changing with time. It is difficult to accommodate any change requests after
the requirements specification phase is complete.
 No overlapping of phases: This model recommends that new phase can
start only after the completion of the previous phase. But in real projects, this
can’t be maintained. To increase the efficiency and reduce the cost, phases
may overlap.

(4).What are the maturity levels in CMM ?

ANS- Capability Maturity Model (CMM) is a methodology used to develop,


refine maturity of an organizations software development process. It is
developed by SIE in mid 1980. It is a process improvement approach.
To assess an organization against a scale of 5 process maturity levels. It Deals
with the what processes should be implemented & not so much with the how
processes should be implemented. Each maturity level comprises a predefined
set of process areas called KDA (Key Process Area), these KDA – Goals,
Commitment, Ability, measurement, verification.
Levels of Capability Maturity Model (CMM) are as following below.
1. Level One : Initial – Work is performed informally.
A software development organization at this level is characterized by AD HOC
activities (organization is not planned in advance.).
2. Level Two : Repeatable – Work is planned and tracked.
This level of software development organization has a basic and consistent
project management processes to TRACK COST, SCHEDULE, AND
FUNCTIONALITY. The process is in place to repeat the earlier successes on
projects with similar applications.
3. Level Three : Defined – Work is well defined.
At this level the software process for both management and engineering
activities are DEFINED AND DOCUMENTED.
4. Level Four : Managed – Work is quantitatively controlled.
 Software Quality management – Management can effectively control the
software development effort using precise measurements. At this level,
organization set a quantitative quality goal for both software process and
software maintenance.
 Quantitative Process Management – At this maturity level, The
performance of processes is controlled using statistical and other
quantitative techniques, and is quantitatively predictable.
5. Level Five : Optimizing – Work is Based Upon Continuous Improvement.
The key characteristic of this level is focusing on CONTINUOUSLY
IMPROVING PROCESS performance.
Key features are:
 Process change management
 Technology change management
 Defect prevention

UNIT IV

(1)What are object oriented databases and there examples ?


Ans - An object-oriented database (OODBMS) or object database management system
(ODBMS) is a database that is based on object-oriented programming (OOP). The data is
represented and stored in the form of objects. OODBMS are also called object databases or
object-oriented database management systems.

A database is a data storage. A software system that is used to manage databases is called a
database management system (DBMS). There are many types of database management
systems such as hierarchical, network, relational, object-oriented, graph, and document. Learn
more here, Types of Database Management Systems.

In this article, we will discuss what object-oriented databases are and why they are useful.

Object-Oriented Database

Object database management systems (ODBMSs) are based on objects in object-oriented


programing (OOP). In OOP, an entity is represented as an object and objects are stored in
memory. Objects have members such as fields, properties, and methods. Objects also have a
life cycle that includes the creation of an object, use of an object, and deletion of an object. OOP
has key characteristics, encapsulation, inheritance, and polymorphism. Today, there are many
popular OOP languages such as C++, Java, C#, Ruby, Python, JavaScript, and Perl.

The idea of object databases was originated in 1985 and today has become common for
various common OOP languages, such as C++, Java, C#, Smalltalk, and LISP. Common
examples are Smalltalk is used in GemStone, LISP is used in Gbase, and COP is used in
Vbase.

Object databases are commonly used in applications that require high performance,
calculations, and faster results. Some of the common applications that use object databases are
real-time systems, architectural & engineering for 3D modeling, telecommunications, and
scientific products, molecular science, and astronomy.

Advantages of Object Databases

ODBMS provide persistent storage to objects. Imagine creating objects in your program and
saving them as it is in a database and reading back from the database.

In a typical relational database, the program data is stored in rows and columns. To store and
read that data and convert it into program objects in memory requires reading data, loading data
into objects, and storing it in memory. Imagine creating a class in your program and saving it as
it is in a database, reading back and start using it again.

Object databases bring permanent persistent to objects. Objects can be stored in persistent
storage forever.

In typical RDBMS, there is a layer of object-relational mapping that maps database schemas
with objects in code. Reading and mapping an object database data to the objects is direct
without any API or OR tool. Hence faster data access and better performance.

Some object database can be used in multiple languages. For example, Gemstone database
supports C++, Smalltalk and Java programming languages.

Drawbacks of Object Databases

 Object databases are not as popular as RDBMS. It is difficult to find object DB


developers.
 Not many programming language support object databases.
 RDBMS have SQL as a standard query language. Object databases do not have a
standard.
 Object databases are difficult to learn for non-programmers.
Popular Object Databases

Here is a list of some of the popular object databases and their features.

Cache

InterSystems’s Caché is a high-performance object database. Caché database engine is a set


of services including data storage, concurrency management, transactions, and process
management. You can think of the Caché engine as a powerful database toolkit.

Caché is also a full-featured relational database. All the data within a Caché database is
available as true relational tables and can be queried and modified using standard SQL via
ODBC, JDBC, or object methods. Caché is one of the fastest, most reliable, and most scalable
relational databases.

Cache offers the following features,

 The ability to model data as objects (each with an automatically created and
synchronized native relational representation) while eliminating both the impedance
mismatch between databases and object-oriented application environments as well as
reducing the complexity of relational modeling,
 A simpler, object-based concurrency model
 User-defined data types
 The ability to take advantage of methods and inheritance, including polymorphism, within
the database engine
 Object-extensions for SQL to handle object identity and relationships
 The ability to intermix SQL and object-based access within a single application, using
each for what they are best suited
 Control over the physical layout and clustering used to store data in order to ensure the
maximum performance for applications

Cache offers a broad set of tools, which include,

 ObjectScript, the language in which most of Caché is written.


 Native implementations of SQL, MultiValue, and Basic.
 A well-developed, built-in security model
 A suite of technologies and tools that provide rapid development for database and web
applications
 Native, object-based XML and web services support
 Device support (such as files, TCP/IP, printers)
 Automatic interoperability via Java, JDBC, ActiveX, .NET, C++, ODBC, XML, SOAP,
Perl, Python, and more
 Support for common Internet protocols: POP3, SMTP, MIME, FTP, and so on
 A reusable user portal for your end users
 Support for analyzing unstructured data
 Support for Business Intelligence (BI)
 Built-in testing facilities

ConceptBase

ConceptBase.cc is a multi-user deductive database system with an object-oriented (data, class,


metaclass, meta-metaclass, etc.) makes it a powerful tool for metamodeling and engineering of
customized modeling languages. The system is accompanied by a highly configurable graphical
user interface that builds upon the logic-based features of the ConceptBase.cc server.

ConceptBase.cc is developed by the ConceptBase Team at University of Skövde (HIS) and the
University of Aachen (RWTH). ConceptBase.cc is available for Linux, Windows, and Mac OS-X.
There is also a pre-configured virtual appliance that contains the executable system plus its
sources plus the tools to compile them. The system is distributed under a FreeBSD-style
license.

Db4o

b4o is the world's leading open-source object database for Java and .NET. Leverage fast native
object persistence, ACID transactions, query-by-example, S.O.D.A object query API, automatic
class schema evolution, small size.

ObjectDB Object Database

ObjectDB is a powerful Object-Oriented Database Management System (ODBMS). It is


compact, reliable, easy to use and extremely fast. ObjectDB provides all the standard database
management services (storage and retrieval, transactions, lock management, query processing,
etc.) but in a way that makes development easier and applications faster.

 ObjectDB Database Key Features


 100% pure Java Object-Oriented Database Management System (ODBMS).
 No proprietary API - managed only by standard Java APIs (JPA 2 / JDO 2).
 Extremely fast - faster than any other JPA / JDO product.
 Suitable for database files ranging from kilobytes to terabytes.
 Supports both Client-Server mode and Embedded mode.
 Single JAR with no external dependencies.
 The database is stored as a single file.
 Advanced querying and indexing capabilities.
 Effective in heavy loaded multi-user environments.
 Can easily be embedded in applications of any type and size.
 Tested with Tomcat, Jetty, GlassFish, JBoss, and Spring.

ObjectDatabase++

ObjectDatabase++ (ODBPP) is an embeddable object-oriented database designed for server


applications that require minimal external maintenance. It is written in C++ as a real-time ISAM
level database with the ability to auto recover from system crashes while maintaining database
integrity.

Objectivity/DB

Objectivity/DB is a scalable, high performance, distributed Object Database (ODBMS). It is


extremely good at handling complex data, where there are many types of connections between
objects and many variants.

Objectivity/DB runs on 32 or 64-bit processors running Linux, Mac OS X, UNIX (Oracle Solaris)
or Windows.

There are C++, C#, Java and Python APIs.

All platform and language combinations are interoperable. For example, objects stored by a
program using C++ on Linux can be read by a C# program on Windows and a Java program on
Mac OS X.

Objectivity/DB generally runs on POSIX filesystems, but there are plugins that can be modified
for other storage infrastructure.

Objectivity/DB client programs can be configured to run on a standalone laptop, networked


workgroups, large clusters or in grids or clouds with no changes to the application code.
ObjectStore

ObjectStore is an enterprise object-oriented database management system for C++ and Java.

ObjectStore delivers multi-fold performance improvement by eliminating the middleware


requirement to map and convert application objects into flat relational rows by directly persisting
objects within an application into an object store

ObjectStore eliminates need to flatten complex data for consumption in your application logic
reducing the overhead of using a translation layer that converts complex objects into flat
objects, dramatically improving performance and often entirely eliminating the need to manage
a relational database system

ObjectStore is OO storage that directly integrates with Java or C++ applications and treats
memory and persistent storage as one – improving the performance of application logic while
fully maintaining ACID compliance against the transactional and distributed load.

Versant Object Database

Versant Object-Oriented Database is an object database that supports native object persistence
and used to build complex and high-performance data management systems.

Key Benefits

 Real-time analytical performance


 Big Data management
 Cut development time by up to 40%
 Significantly lower total ownership cost
 High availability

WakandaDB
WakandaDB is an object database and provides a native REST API to access interconnected
DataClasses defined in Server-Side JavaScript. WakandaDB is the server within Wakanda
which includes a dedicated, but not mandatory, Ajax Framework, and a dedicated IDE.

Object-relational Databases

Object-relational database (ORD), or object-relational database management systems


(ORDBMS) are databases that support both objects and relational database features. OR
databases are relational database management systems with the support of an object-oriented
database model. That means, the entities are represented as objects and classes and OOP
features such as inheritance are supported in database schemas and in the query language.

PostgreSQL is the most popular pure ORDBMS. Some popular databases including Microsoft
SQL Server, Oracle, and IBM DB2 also support objects and can be considered as ORDBMS.

(2)Advantages & disadvantages of object oriented databases ?


Ans- The object-oriented database model ties related packages together. In other words, a data
set and all its attributes are combined with an object. In this way, all of the information is directly
available. Instead of distributing everything across different tables, then, the data can be retrieved in
one package. Alongside the attributes, methods are also stored in the objects. This is where the
databases’ proximity to object-oriented programming languages becomes clear. As in
programming, each object has certain activities that it can carry out.

In turn, objects are brought together in classes. Or, to put it more accurately: an object is a concrete
unit in an abstract class. This generates a hierarchy of classes and subclasses. Within such a
construct, subclasses assume the properties of higher-level classes and expand on these with their
own attributes. At the same time, objects in one class can also be connected with other classes. This
breaks up the strict hierarchy and makes sure that the objects are interlinked. Simple objects can be
combined with complex objects.

To address the various objects, the corresponding database management system automatically
assigns a one-off identification to each unit. In this way, objects can be easily retrieved again after
they have been saved.

Let’s look at an example. Assume we are saving the concrete object of a bicycle as an object-
oriented unit with all of its properties and methods. It is red, you can ride it, it has a saddle, and so
on. This object is then simultaneously part of the class ‘Bicycles’. Inside the same class, for
example, we might also find a blue and a green bicycle. The class ‘Bicycles’ is in turn a subcategory
of ‘Vehicles’, which also contains ‘Cars’. At the same time, however, the object also has a
connection to the class ‘Leisure activities’. If we retrieve our object via its unique ID, all of its related
attributes and methods are directly available.
Relational vs. object-oriented databases

For a long time, relational databases have been the standard in web and software development. In
this model, information is stored in interconnected tables. Here, too, the links between complex
pieces of information with different components can be stored and retrieved. With an object
database, though, all of the unit’s components are available immediately. That also means that the
data sets can be much more complex. With a relational database we tend to try to accommodate
simple information. The more complex the data set becomes, the more extensive the connections,
obstructing the database.

Advantages and disadvantages of the object-oriented database model

The choice of database type heavily depends on the individual application. When working
with object-oriented programming languages, like Java for example, an object database is
advantageous. The objects of the source code can simply be incorporated into the database. If we
turn to a relational database, which is fairly common, complex objects are more difficult to integrate
into the table construct.

One disadvantage of OODBs is poor adoption among the community. Although the model has
been established since the 1980s, up to now, very few database management systems have taken
to object databases. The community that works with the model is correspondingly small. Most
developers, therefore, prefer to turn to the more widely spread, well-documented and highly
developed relational databases.

Although complexity of OODBs is one of its advantages, it can present a few disadvantages in
certain situations. The complexity of the objects means that complex queries and operations can be
undertaken much more quickly than in relational models. However, if the operations are simple, the
complex structure still has to be used resulting in a loss of speed.

Advantages Disadvantages

Complex data sets can be saved and retrieved Object databases are not widely adopted.
quickly and easily.

Object IDs are assigned automatically. In some situations, the high complexity can cause
performance problems.

Works well with object-oriented programming


languages.

(4)The architecture of oop databases ?


Ans- 1. The Need for Object-Oriented Databases
The increased emphasis on process integration is a driving force for the adoption of objectoriented
database systems. For example: • The Computer Integrated Manufacturing (CIM) area is focusing
heavily on using object-8oriented database technology as the process integration framework.
Architecture of Object-Oriented Databases - 2 - • Advanced office automation systems use object-
oriented database systems to handle hypermedia data. • Hospital patient care tracking systems use
object-oriented database technologies for ease of use. • Computer Aided Design (CAD), Computer Aided
Manufacturing (CAM) and Computer Aided Software Engineering (CASE) applications use object-
oriented database to manage the complex graphical data. All of these applications are characterized by
having to manage complex, highly interrelated information, which was difficult to manage in RDBMS. To
combat the limitations of RDBMS and meet the challenge of the increasing rise of the internet and the
Web, programmers developed object-oriented databases in 1980. 2. Object Oriented Usage Areas:
Generally, an object database is a good choice when you have the following three factors: business
need, high performance, and complex data. 2.1 High Performance With complex data, ODBMS will run
anywhere from 10 to 1000 times faster than an RDBMS. The range of this performance advantage
depends on the complexity of the data and the access patterns for the data. The reason for high
performance is that the Object Oriented Databases are optimized for the traversals related to complex
data. They also do not have any "impedance mismatch" when it comes to using object programming
languages such as Java and C++. 2.2 Complex data: The complex data is often characterized by: • A lack
of unique, natural identification. • A large number of many-to-many relationships. • Access using
traversals. • Frequent use of type codes such as those found in the relational schema. Architecture of
Object-Oriented Databases - 3 - For Example: the following clothing database uses the complex database
and objects. 3. Overview of Object Oriented Database Management Systems An OODBMS is the result of
combining object oriented programming principles with database management principles. Object
oriented programming concepts such as encapsulation, polymorphism and inheritance are enforced as
well as database management concepts such as the ACID Properties (Atomicity, Consistency, Isolation
and Durability) which lead to system integrity, support for an ad hoc query language and secondary
storage management systems which allow for managing very large amounts of data. Features that are
common in the RDBMS world such as transactions, the ability to handle large amounts of data, indexes,
deadlock detection, backup and restoration features and data recovery mechanisms also exist in the
OODBMS world. Architecture of Object-Oriented Databases - 4 - The Object Oriented Database
specifically lists the following features as mandatory for a system to support before it can be called an
OODBMS: Complex Objects, Object Identity, Encapsulation, Types and Classes, Class or Type Hierarchies
, Overriding, Overloading and Late Binding Computational Completeness, Extensibility,
Persistence,Secondary Storage Management , Concurrency, Recovery and an Ad Hoc Query Facility. A
primary feature of an OODBMS is that accessing objects in the database is done in a transparent manner
such that interaction with persistent objects is no different from interacting with in-memory objects.
This is very different from using RDBMSs in that there is no need to interact via a query sub-language
like SQL nor is there a reason to use a Call Level Interface such as ODBC, ADO or JDBC. Database
operations typically involve obtaining a database root from the the OODBMS which is usually a data
structure like a graph, vector, hash table, or set and traversing it to obtain objects to create, update or
delete from the database. When a client requests an object from the database, the object is transferred
from the database into the application's cache where it can be used either as a transient value that is
disconnected from its representation in the database (updates to the cached object do not affect the
object in the database) or it can be used as a mirror of the version in the database in that updates to the
object are reflected in the database and changes to object in the database require that the object is
refetched from the OODBMS. 4. Comparisons of OODBMSs to RDBMSs According to Mary Loomis, the
architect of the Versant OODBMS: • "Relational database design is really a process of trying to figure out
how to represent real-world objects within the confines of tables in such a way that good performance
results and preserving data integrity is possible.” • “Object database design is quite different. For the
most part, object database design is a fundamental part of the overall application design process. The
object classes used by the programming language are the classes used by the ODBMS. Because their
Architecture of Object-Oriented Databases - 5 - models are consistent, there is no need to transform the
program’s object model to something unique for the database manager." There are concepts in the
relational database model that are similar to those in the object database model. The equality of the
various concepts in RDBMS and OODBMS is as: • Relation or Table Class. • Tuple An instance of a class. •
Column in a Tuple Class Attribute. 5. Characteristics of Object-Oriented Databases in Depth Object-
oriented database technology is a marriage of object-oriented programming and database technologies.
Figure below illustrates how these programming and database concepts have come together to provide
object-oriented databases. The most significant characteristic of object-oriented database technology is
that it combines object-oriented programming with database technology to provide an integrated
application development system. There are many advantages to including the definition of operations
with the definition of data: Architecture of Object-Oriented Databases - 6 - • The defined operations
apply ubiquitously and are not dependent on the particular database application running at the
moment. • The data types can be extended to support complex data such as multi-media by defining
new object classes that have operations to support the new kinds of information. Other strengths of
object-oriented modeling are well known. For example, inheritance allows one to develop solutions to
complex problems incrementally by defining new objects in terms of previously defined objects.
Polymorphism and dynamic binding allow one to define operations for one object and then to share the
specification of the operation with other objects. These objects can further extend this operation to
provide behaviors that are unique to those objects. Dynamic binding determines at runtime which of
these operations is actually executed, depending on the class of the object requested to perform the
operation All of these capabilities come together synergistically to provide significant productivity
advantages to database application developers. A unique characteristic of objects is that they have an
identity that is independent of the state of the object. For example, if one has a car object and we
remodel the car and change its appearance, the engine, the transmission, and the tires so that it looks
entirely different, it would still be recognized as the same object we had originally. Within an object-
oriented database, one can always ask the question, “is this the same object I had previously?”,
assuming one remembers the object’s identity. Object-identity allows objects to be related as well as
shared within a distributed computing network. 6. Architectures using object Database Products: 6.1
Stand-alone architecture If you are using C++ or Java in a stand-alone application and have the need for
a database that provides high performance on complex data, it is difficult to beat an ODBMS. The reason
is two-fold: Architecture of Object-Oriented Databases - 7 - 1. With an ODBMS, you have only one model
to manage, the model that your object programming language uses. See the diagram below, which
shows the same model 2being used in the database and the application. There is also no need to
program any mapping between the data in the database and the data in the application. 2. An ODBMS
gives you excellent performance on object models. This means either you can get extreme performance
on complex data or you can use less expensive hardware than you might need with a relational DBMS.
Architecture of Object-Oriented Databases - 8 - Some examples of stand-alone applications: • Web sites
that do not use any existing data. • Programming Tool. • Design Tools. • Multimedia Tools. • Catalogs on
CDs. • Embedded applications in general. 6.2 Architecture with existing data sources Object databases
can be a way of staging data for your C++ or Java applications. This example shows two existing data
sources that have data in non-object formats (flat file and relational, for example). The non-object data
is mapped into object models and stored in the object database. This object database now holds some
part of the existing data and perhaps some of its own data that did not exist previously. At some later
time, the object application can obtain this data Architecture of Object-Oriented Databases - 9 - and tap
the high performance that an object database provides. This performance is a result of having the same
model in the object database as is used by the object application. 6.2.1 Middle-tier architecture This
middle-tier architecture allows the existing database to be the "database of record." At the same time, it
also protects the existing database from direct Internet traffic and provides a high performance engine
to interact with the Internet traffic. 6.2.2 Object-relational mapping Architecture of Object-Oriented
Databases - 10 - If we want to take advantage of using an ODBMS in the middle tier and have one or
more existing relational databases, we will need to map data from a relational format to an object
format. This mapping of data can become complex. If we would code this mapping yourself, the amount
of code devoted to mapping often becomes 30 to 40 percent of your total code. Unfortunately, that 30
to 40 percent is only mapping, not helping to solving our business problem. It also adds to the possible
defects we may need to fix. So, if we need to map, consider using an object-relational mapping product.
They will save our time and reduce the number of defects in our mapping. An object-relational mapping
product is shown at the left of figure of middle tier architecture. It handles the mapping and has a cache
much like an ODBMS. Data is mapped to the ODBMS and from the ODBMS based on the application
needs. The ODBMS provides high-speed performance for the Internet. 7. Current Users of ODBMSes:
The following information was gleaned from the ODBMS Facts website. • The Chicago Stock Exchange
manages stock trades via a Versant ODBMS. • Radio Computing Services is the world's largest radio
software company. Its product, Selector, automates the needs of the entire radio station -- from the
music library, to the newsroom, to the sales department. RCS uses the POET ODBMS because it enabled
RCS to integrate and organize various elements, regardless of data types, in a single program
environment. • The Objectivity/DB ODBMS is used as a data repository for system component naming,
satellite mission planning data, and orbital management data deployed by Motorola in The Iridium
System. • The ObjectStore ODBMS is used in SouthWest Airline's Home Gate to provide self service to
travelers through the Internet. Architecture of Object-Oriented Databases - 11 - • Ajou University
Medical Center in South Korea uses InterSystems' Cachè ODBMS to support all hospital functions
including mission-critical departments such as pathology, laboratory, blood bank, pharmacy, and X-ray. •
The Large Hadron Collider at CERN in Switzerland uses an Objectivity DB. The database is currently being
tested in the hundreds of terabytes at data rates up to 35 MB/second. • As of November, 2000, the
Stanford Linear Accelerator Center (SLAC) stored 169 terabytes of production data using Objectivity/DB.
The production data is distributed across several hundred processing nodes and over 30 on-line servers.
8. Commercial ODBMSes • Polyhedra • Poet • Gemstone • Eye-DB • Objectivity/DB • Jasmine- CA OO
DB • Itasca • Metakit- The Structured database which fits into the palm of your hand. • Conte Xt • XDb •
ObjectDB 9. Freely Available ODBMSes • SHORE Web Page • FTP Access to Shore • Lincks • Texas
Persistent Store Library • MinneStore(tm) version 2 - "Libre" OO DB for various Smalltalk
implementations. Architecture of Object-Oriented Databases - 12 - • IronDOC (See also the Old IronDOC
Page) provides database-like operators to robustly manage components and roll back changes. The
author, , was apparently involved with the creation of the OpenDoc component scheme, particularly the
Bento structured storage system. Note that McCusker is now apparently working on an OpenDoc-like
system called Bolide. 10. Conclusion: The main objective of this paper is evaluation of the current state
of OODBMS products and architecture. This paper was written with two specific purposes. The first was
to enumerate and describe the many features that are provided by an object-oriented database. This
information should be useful to those readers who are new to object-oriented databases, and even to
those new to object oriented technologies and/or database technologies. A second objective was to
provide readers a starting point for performing more in-depth OODBMS Architecture evaluations. There
is a misunderstanding between the users that shifting from existing database to object-oriented
database is a tedious task.. But the fact is that it can be easily shifted to object oriented database using
Relational Mapping and Three – Tier Architecture as discussed above. 1. The Need for Object-Oriented
Databases
UNIT V

(1)What is distributed object ?


Ans - In distributed computing, distributed objects are objects (in the sense of object-
[citation needed]

oriented programming) that are distributed across different address spaces, either in
different processes on the same computer, or even in multiple computers connected via a network,
but which work together by sharing data and invoking methods. This often involves location
transparency, where remote objects appear the same as local objects. The main method
of distributed object communication is with remote method invocation, generally by message-
passing: one object sends a message to another object in a remote machine or process to perform
some task. The results are sent back to the calling object.
Distributed objects were popular in the late 1990s and early 2000s, but have since fallen out of
favor.[1]
The term may also generally refer to one of the extensions of the basic object concept used in the
context of distributed computing, such as replicated objects or live distributed objects.

 Replicated objects are groups of software components (replicas) that run a distributed multi-
party protocol to achieve a high degree of consistency between their internal states, and that
respond to requests in a coordinated manner. Referring to the group of replicas jointly as
an object reflects the fact that interacting with any of them exposes the same externally visible
state and behavior.
 Live distributed objects (or simply live objects)[2] generalize the replicated object concept to
groups of replicas that might internally use any distributed protocol, perhaps resulting in only a
weak consistency between their local states. Live distributed objects can also be defined as
running instances of distributed multi-party protocols, viewed from the object-oriented
perspective as entities that have distinct identity, and that can encapsulate distributed state and
behavior.

Local vs. distributed objects[edit]


Local and distributed objects differ in many respects.[3][4] Here are some of them:

1. Life cycle : Creation, migration and deletion of distributed objects is different from local
objects
2. Reference : Remote references to distributed objects are more complex than simple pointers
to memory addresses
3. Request Latency : A distributed object request is orders of magnitude slower than local
method invocation
4. Object Activation : Distributed objects may not always be available to serve an object request
at any point in time
5. Parallelism : Distributed objects may be executed in parallel.
6. Communication : There are different communication primitives available for distributed
objects requests
7. Failure : Distributed objects have far more points of failure than typical local objects.
8. Security : Distribution makes them vulnerable to attack.
Examples[edit]
The RPC facilities of the cross platform se
serialization protocol, Cap'n Proto amount to a distributed
object protocol. Distributed object method calls can be executed (chained, in a single network
request, if needs be) through
ugh interface references/
references/capabilities.[5]
Distributed objects are implemented
mented in Objective-C using the Cocoa API with the NSConnection
class and supporting objects.
Distributed objects are used in Java RMI
RMI.
CORBA lets one build distributed mixed object systems.
DCOM is a framework for distributed objects on the Microsoft platform.
DDObjects is a framework for distributed
tributed objects using Borland Delphi.
Jt is a framework for distributed components using a messaging paradigm.
JavaSpaces is a Sun specification for a distributed, shared memory (spa
(space based)
Pyro is a framework for distributed objects using the Python programming language.
language
Distributed Ruby (DRb) is a framework for distributed objects using the Ruby programming
language.

(2)Compile – time versus runtime objects ?


Ans-

is the period when the programming code (such as C#, Java, C, Python)
Python is converted
to the machine code (i.e. binary code)
code). Runtime is the period of time when a
program is running and generally occurs after compile timetime.
We use high-level
level programming languages such as Java to write a program. The
instructions or source code written using high
high-level
level language is required to get
ge
converted to machine code for a computer to understand. During compile time,
the source code is translated to a byte code like from .java to .class. During compile
time the compiler check for the syntax, semantic, and type of the code.

3.1. Inputs and Outputs


Inputs and outputs during compile time are the following:

 Inputs – Source code, dependent files, interfaces, libraries required for


successful compilation of the code
 Outputs – On successful compilation, a complied code (assembly
code or relocatable object code), otherwise compile time error messages

3.2. Errors
During compile time errors occur because of syntax and semantic. The syntax
error occurs because of the wrong syntax of the written code. Semantic errors occur in
reference to variable, function, type declarations and type checking.

4. Runtime
A program’s life cycle is a runtime when the program is in execution. Following are the
different types of runtime errors:

 Division by zero – when a number is divided by zero (0)


 Dereferencing a null pointer – when a program attempts to access memory with a
NULL
 Running out of memory – when a computer has no memory to allocate to
programs

5. Differences
The following table shows a comparison between compile time and runtime.
6. Conclusion
In this article, we discussed an overview of the compile-time and runtime. First, we
discussed the stages of translation of source code to machine code for java and C#.
We then talked about the differences between compiler time and runtime. To
conclude, knowing about the translation stages, is beneficial in understanding the
source of errors for a computer program.

(4)What are CORBA Object refrences ?


Ans- A CORBA object is a virtual entity in the sense that it does not exist on its own, but
rather is brought to life when, using the reference to that CORBA object, the client
application requests an operation on that object. The reference to the CORBA object is
called an object reference. The object reference is the only means by which a CORBA
object can be addressed and manipulated in an Oracle Tuxedo system. For more information
about object references, see Creating CORBA Server Applications in the Oracle Tuxedo
online documentation.

When the client or server application issues a request on an object via an object reference,
the Oracle Tuxedo server application instantiates the object specified by the object
reference, if the object is not already active in memory. (Note that a request always maps
to a specific operation invocation on an object.)
Instantiating an object typically involves the server application initializing the object’s state,
which may include having the object’s state read from durable storage, such as a database.
The object contains all the data necessary to do the following:
•Execute the object’s operations.
•Store the object’s state in durable storage when the object is no longer needed.

How a CORBA Object Comes into Being


The data that makes up a CORBA object may have its origin as a record in a database. The
record in the database is the persistent, or durable, state of the object. This record becomes
accessible via a CORBA object in an Oracle Tuxedo domain when the following sequence has
occurred:
The server application’s factory creates a reference for the object. The object reference
1.includes information about how to locate the record in the database.
Using the object reference created by the factory, the client application issues a request
2.on the object.
The object is instantiated. The object is instantiated by the TP Framework by invoking
3.the Server::create_servant method, which exists in the Server object.
The Oracle Tuxedo domain invokes the activate_object operation on the object, which
4.causes the record containing state to be read into memory.
Whereas a language object exists only within the boundaries of the execution of the
application, a CORBA object may exist across processes and machine systems. The Oracle
Tuxedo system provides the mechanism for constructing an object and for making that
object accessible to the application.
The Oracle Tuxedo CORBA server application programmer is responsible for writing the code
that initializes an object’s state and the code that handles that object’s state after the object
is no longer active in the application. If the object has data in durable storage, this code
includes the operations that read from and write to durable storage. For more information
about developing server applications, see Creating CORBA Server Applications in the Oracle
Tuxedo online documentation.

Components of a CORBA Object


CORBA objects typically have the following components, shown in the figure that follows:
•An ID, also known as an object ID, or OID
•An interface, which specifies the CORBA object’s data and operations
The sections that follow describe each of these object components in detail.
The Object ID
The object ID (OID) associates an object with its state, such as a database record, and
identifies the instance of the object. When the factory creates an object reference, the
factory assigns an OID that may be based on parameters that are passed to the factory in
the request for the object reference.
The server application programmer must create the factories used in the Oracle
Tuxedo client/server application. The programmer is responsible for writing the code
that assigns OIDs. Factories, and examples of creating them, are discussed
Note:in Creating CORBA Server Applications.
The Oracle Tuxedo system can determine how to instantiate the object by using the
following information:
•The OID
•Addressing data in the object reference
•The group ID in the object reference
The Object Interface
The object’s interface, described in the application’s OMG IDL file, identifies the set of data
and operations that can be performed on an object. For example, the interface for a
university teller object would identify:
The data types associated with the object, such as a teller ID, cash in the teller’s drawer;
•and the data managed by the object, such as an account.
The operations that can be performed on that object, such as obtaining an account’s
•current balance, debiting an account, or crediting an account.
One distinguishing characteristic of a CORBA object is the run-time separation of the
interface definition from its data and operations. In a CORBA system, a CORBA object’s
interface definition may exist in a component called the Interface Repository. The data and
operations are specified by the interface definition, but the data and operations exist in the
server application process when the object is activated.
The Object’s Data
The object’s data includes all of the information that is specific to an object class or an
object instance. For example, within the context of a university application, a typical object
might be a teller. The data of the teller could be:
•An ID
•The amount of cash in the teller’s drawer
The number of transactions the teller has processed during a given interval, such as a day
•or month
You can encapsulate the object’s data in an efficient way, such as by combining the object’s
data in a structure to which you can get access by means of an attribute. Attributes are a
conventional way to differentiate the object’s data from its operations.
The Object’s Operations
The object’s operations are the set of routines that can perform work using the object’s
data. For example, some of the operations that perform functions using teller object might
include:
•get_balance()
•credit()
•debit()
In a CORBA system, the body of code you write for an object’s operations is sometimes
called the object implementation, which is explained in the next section.

Where an Object Gets Its Operations


As explained in the preceding section, the data that makes up a CORBA object may exist in
a record in a database. Alternatively, the data could be established for a CORBA object only
when the object is active in memory. This section explains how to write operations for a
CORBA object and how to make the operations a part of the object.
The operations you write for a given CORBA object are also known as the
object’s implementation. You can think of the implementation as the code that provides the
behavior of the object. When you create an Oracle Tuxedo CORBA client/server application,
one of the steps you take is to compile the application’s OMG IDL file. The OMG IDL file
contains statements that describe the application’s interfaces and the operations that can be
performed on those interfaces.
If you are implementing your server application in C++, one of the several files optionally
produced by the IDL compiler is a template for the implementation file. The template for
the implementation file contains default constructors and method signatures for your
application’s objects. The implementation file is where you write the code that implements
an object; that is, this file contains the business logic of the operations for a given interface.
The Oracle Tuxedo system implements an interface as a CORBA object. The IDL compiler
also produces other files, which get built into the Oracle Tuxedo CORBA client and server
application, that make sure that the implementation you write for a given object gets
properly connected to the correct object data during run time.
This is where the notion of a servant comes in. A servant is an instance of the object class;
that is, a servant is an instance of the method code you wrote for each operation in the
implementation file. When the Oracle Tuxedo CORBA client and server applications are
running, and a client request arrives in the server application for an object that is not active
-- that is, the object is not in memory -- the following events occur:
If no servant is currently available for the needed object, the Oracle Tuxedo system
1.invokes the Server::create_servant method on the Server object.
The Server::create_servant method is entirely user-written. The code that you write for
the Server::create_servant method instantiates the servant needed for the request. Your
code can use the interface name, which is passed as a parameter to
the Server::create_servant method, to determine the type of servant that the Oracle Tuxedo
domain creates.
The servant that the Oracle Tuxedo domain creates is a specific servant object instance (it is
not a CORBA object), and this servant contains an executable version of the operations you
wrote earlier that implement the CORBA object needed for the request.
2. The Oracle Tuxedo domain passes control to the servant and, if you have implemented it, invokes the servant’s activate_object method. Invoking the activate_object method gives life to the CORBA object, as follows:
a. You write the code for the activate_object method. The parameter to the activate_object method is the string value of the object ID for the object to be activated. You may use the object ID as a key for determining how to initialize the object.
b. You initialize the CORBA object’s data, which may involve reading state data from durable storage, such as from a record in a database.
c. The servant’s operations become bound to the data, and the combination of those operations and the data establishes the activated CORBA object.
After steps a, b, and c are completed, the CORBA object is said to be activated.
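The following is a minimal sketch of these two user-written pieces, continuing the hypothetical TellerImpl servant shown earlier. The repository ID string, the helper load_balance_from_database, and the exact signatures are assumptions; the real base classes and method signatures come from the Oracle Tuxedo framework headers and the generated skeleton, so treat this as an illustration rather than working Tuxedo code.

// Sketch only: a user-written Server::create_servant and an optional
// activate_object body for the hypothetical TellerImpl servant. The Server
// class and Tobj_ServantBase are declared by the Oracle Tuxedo framework
// headers; TellerImpl is the assumed servant class from the earlier sketch.
#include <cstring>

Tobj_ServantBase* Server::create_servant(const char* interface_name)
{
    // The interface name passed in by the Oracle Tuxedo system tells us
    // which servant class to instantiate for the incoming request.
    if (std::strcmp(interface_name, "IDL:Teller:1.0") == 0) {
        return new TellerImpl();
    }
    return 0;   // no servant for an unrecognized interface
}

void TellerImpl::activate_object(const char* object_id)
{
    // The string object ID is the key for initializing the object's data,
    // for example by reading the teller's record from durable storage.
    balance_ = load_balance_from_database(object_id);   // hypothetical helper
}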
Implementing the activate_object method on an object is optional. For more information
about when you want to implement this operation on an object, see Creating CORBA Server
Applications in the Oracle Tuxedo online documentation.
Note: A servant is not a CORBA object. In fact, the servant is represented as a language object. The server performs operations on an object via its servant.
For more information about creating object implementations, see Creating CORBA Server
Applications in the Oracle Tuxedo online documentation.

How Object Invocation Works


Since CORBA objects are meant to function in a distributed environment, OMG has defined
an architecture for how object invocation works. A CORBA object can be invoked in one of
two ways:
•By means of generated client stubs and skeletons -- sometimes referred to as stub-style invocation.
•By means of the dynamic invocation interface -- referred to as dynamic invocation.
Creating CORBA Client Applications describes how dynamic invocation works. This section
describes stub-style invocation, which is simpler to use than dynamic invocation.
When you compile your application’s OMG IDL file, one file that the compiler generates is a
source file called the client stub. The client stub maps OMG IDL operation definitions for an
object type to the operations in the CORBA server application that the Oracle Tuxedo
system invokes to satisfy a request. The client stub contains code generated during the
client application build process that is used in sending the request to the server application.
Programmers should never modify the client stub code.
Another file produced by the IDL compiler is the skeleton, which is also a source file. The
skeleton contains code used for operation invocations on each interface specified in the OMG
IDL file. The skeleton is a map that points to the appropriate code in the CORBA object
implementation that can satisfy the client request. The skeleton is connected to both the
object implementation and the Oracle Tuxedo Object Request Broker.
The following figure shows the client application, the client stub, the skeleton, and the
CORBA object implementation:

[Figure: the client application, the client stub, the skeleton, and the CORBA object implementation -- not reproduced here]

When a client application sends a request, the request is implemented as an operation on the client stub. When the client stub receives the request, the client stub sends the request
to the Object Request Broker (ORB), which then sends the request through the Oracle
Tuxedo system to the skeleton. The ORB interoperates with the TP Framework and the
Portable Object Adapter (POA) to locate the correct skeleton and object implementation.
For more information about generating client stubs and skeletons, see Creating CORBA
Client Applications and Oracle Tuxedo ATMI C Function Reference in the Oracle Tuxedo
online documentation.
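As an illustration of stub-style invocation from the client’s point of view, here is a minimal sketch. It assumes a hypothetical Teller interface, a client stub header generated by the IDL compiler (the name Teller_c.h is an assumption), and that the client has already obtained an object reference, for example from a factory; the bootstrap and factory calls are omitted.

// Sketch only: invoking an operation through the generated client stub.
// The call looks like an ordinary C++ method call; the stub and the ORB
// route the request through the skeleton to the object implementation.
#include <iostream>
#include "Teller_c.h"   // hypothetical client stub header from the IDL compiler

void show_balance(Teller_ptr teller)
{
    CORBA::Double balance = teller->get_balance();  // remote invocation via the stub
    std::cout << "Current balance: " << balance << std::endl;
}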

(5) Synchronization and blocking in remote objects ?


Ans- Logically, blocking in a remote object is simple. Suppose that client A calls a
synchronized method of a remote object. To make access to remote objects always look exactly the same as access to local objects, it would be necessary to block A in the client-side stub
that implements the object's interface and to which A has direct access. Likewise, another
client on a different machine would need to be blocked locally as well before its request can
be sent to the server. The consequence is that we need to synchronize different clients at
different machines. As we discussed in Chap. 6, distributed synchronization can be fairly
complex.
An alternative approach would be to allow blocking only at the server. In principle, this
works fine, but problems arise when a client crashes while its invocation is being handled by
the server. As we discussed in Chap. 8, we may require relatively sophisticated protocols to
handle this situation, and these protocols may significantly affect the overall performance of
remote method invocations.

Therefore, the designers of Java RMI have chosen to restrict blocking on remote
objects only to the proxies (Wollrath et al., 1996). This means that threads in the same
process will be prevented from concurrently accessing the same remote object, but
threads in different processes will not. Obviously, these synchronization semantics are
tricky: at the syntactic level (i.e., when reading source code) we may see a nice, clean design. Only when the distributed application is actually executed may unanticipated behavior be observed that should have been dealt with at design time.
Due to the differing failure modes of local and remote objects, distributed wait and
notification requires a more sophisticated protocol between the entities involved (so that,
for example, a client crash does not cause a remote object to be locked forever), and as
such, cannot be easily fitted into the local threading model in Java. Hence, a client can use
notify and wait methods on a remote reference, but that client must be aware that such
actions will not involve the actual remote object, only the local proxy (stub) for the remote
object.
