OBGD ONA User Manual R-Studio v1.1.6
using R-studio
1 HOW TO USE THIS BOOK
Each section provides example code for conducting organizational network analyses. Below,
you will find an explanation of how to use this code.
To execute the code noted in some sections, you first need to execute the code given in
previous sections. For example, in order to add vertex attributes to a loaded network data file
(4.3.3), you first need to install/load relevant R packages (4.3.1), and load the socio-matrix
(4.3.2).
In-text highlights
Code boxes
2 INTRODUCING OUR CASE ORGANIZATION
To get hands-on experience in organizational network analysis (ONA), we will mimic a
situation in which you work as a member of a consultancy team that is asked to advise a case
organization. This case organization is experiencing problems related to its organizational
network. Throughout this book, it is our mission to make sense of the network data of this
company and, ultimately, to help increase the effectiveness of the company.
The case company is a medium-sized call center, employing exactly 99 employees and leaders,
and it is located in Singapore. There are three hierarchical layers in the organization, namely
(1) Vice President (VP), (2) first-line managers, and (3) call-center employees. Each first-line
manager supervises between 10 and 15 call-center employees, and first-line managers are
subsequently supervised by the VP. The company comprises 8 departments. The first
department is the “Management team” and consists of the VP and the first-line supervisors.
The remaining 7 departments are called DEP9-DEP15 and consist of call-center employees
(CE1-CE91). The company relies on lateral communication within the management team to
ensure integration between different departments. The organizational chart is shown in
Figure 1. Please note that LS1 has switched to DEP15.
Figure 1 - Organizational chart

VP operations (VP)

Department   Supervisor   Call-center employees
DEP9         LS2          EC1-EC12
DEP10        LS3          EC13-EC27
DEP11        LS4          EC28-EC40
DEP12        LS5          EC41-EC55
DEP13        LS6          EC56-EC67
DEP14        LS7          EC68-EC81
DEP15        LS1          EC82-EC91
In the upcoming chapters, we will analyze the organizational network of this company. For
this purpose, we have received the following data from the VP:
1) Network data on who calls whom in the call center to discuss work-related topics. This
file is named “company_network_data.csv”.
2) Information on organizational members’ roles and department affiliations. This file
supplements the network data and helps to identify who is who in the network data
file. This file is called “company_attribute_data.csv”.
3) Information on how closely individuals work together in the company. This file is called
“company_edge_closeness.csv”.
3 ORGANIZATIONAL NETWORK ANALYSIS (ONA)
3.1 What is an organizational network (and why should we care)?
Employees in organizations typically do not work in isolation. Instead, they form relationships
or ‘ties’ with their colleagues for a variety of reasons. Sometimes employees connect to other
employees because they want to get their advice, coordinate work, or exchange best
practices. At other times, employees connect because they share a field of interest and want
to discuss this subject with each other. An organizational network captures these
interpersonal ties between employees and, thus, indicates who is connected to whom in the
organization. Such organizational networks can be visualized as a network ‘plot’ or ‘network
graph’, which displays all the employees that work in the organization as dots and connects
the employees that share a tie with a line. The dots are called ‘nodes’ or ‘vertices’ and the
lines are the ‘ties’ or ‘edges’. Different colors or shapes are typically used to help distinguish
between nodes from different departments, hierarchical layers, or work fields. By looking at
the network graph, we thus get a good overview of how nodes are connected within and
across important groups in the organization. Figure 2 provides an example network plot.
Organizational networks can make or break an organization. Many organizations have been
successful because their employees have built broad networks that allow cross-fertilization of
ideas and company-wide coordination among different groups. In these organizations,
networks have enabled innovation and efficiency. In other organizations, organizational
networks are a source of inefficiency and ineffectiveness. What sometimes happens in these
organizations is that employees that should be working with each other, in fact, do not work
with each other. These employees will then fail to coordinate their work, align work schedules
or exchange information on their work progress. As a result, work will be delayed, deadlines
will be missed, and costly rework is inevitable due to coordination problems.
In addition to collecting data on who works with whom, we also need to collect background
information on the individuals in the organization, such as their department affiliation,
functional roles, etc. Later on, we can then use such information to diagnose, for example,
connectivity within and across departments. Also, we sometimes collect information on the
nature of specific ties between individuals. If a questionnaire approach is used, we can use
follow-up questions and ask individuals to rate the closeness or effectiveness of each of their
ties in the organization on a scale ranging from 1 (not at all) to 5 (to a very high extent). It is
important to follow privacy and ethical rules when network data is collected or used. Following
these rules will, in almost all cases, require researchers to ask employees for consent, consult
an institutional review board (IRB), and aggregate/anonymize data.
1 For an example study using e-mail data to analyze organizational networks, see Kleinbaum (2013).
2 For an example study using a questionnaire approach, see de Vries et al. (2014).
3.2.2.1 Socio-matrix
In the second step, we need to structure data in a way that it can be analyzed with ONA.
Specifically, the collected information on who interacts with whom needs to be structured as
a big matrix, with the rows and the columns representing the individuals in the organization.
This matrix is called an adjacency matrix or a “socio-matrix”. The rows in a socio-matrix
indicate the “egos” in the network (i.e., the employees in the organization) and the columns
indicate the “alters” (i.e., the potential partners of the egos). If an ego has a tie with an alter,
this is indicated by placing a “1” in the cell of the matrix where the row of the ego intersects
the column of the alter. If an ego and an alter do not have a tie, this is indicated by a “0”. A
socio-matrix should be “square”, meaning that the number of egos (rows) should match the
number of alters (columns). Also, the lists of egos and alters should include the exact same
individuals and should be ordered in the same way (e.g., alphabetically, based on team
affiliation, etc.) to avoid running into errors when conducting ONAs. An example socio-matrix
is given in Table 1. In this matrix, the names of the employees
(John, Sally, Sue, and Rob) are displayed as both the row and the column headers. The 1s in
the table indicate the presence of a tie between an ego and an alter, so you can tell that Sally
has ties with John and Sue. The 0s indicate the absence of a tie, as is the case between Sally,
on the one hand, and Rob, on the other hand. The diagonal cells in a socio-matrix always
display 0s, as it is impossible for individuals to have ties with themselves.
Table 1 - Socio-matrix
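As a rough illustration, a socio-matrix like the one described for Table 1 can be written down in R as follows. Sally's ties with John and Sue, and the absent tie between Sally and Rob, come from the text; the remaining cells (here, a tie between Sue and Rob) are made-up values for illustration only. The last lines check the requirements discussed above: the matrix is square, rows and columns list the same people in the same order, and the diagonal contains only 0s.

```r
# Toy socio-matrix; rows are egos, columns are alters.
# Only Sally's row/column follows the text; the Sue-Rob tie is an assumed value.
people <- c("John", "Sally", "Sue", "Rob")
socio <- matrix(
  c(0, 1, 0, 0,   # John
    1, 0, 1, 0,   # Sally: ties with John and Sue, none with Rob
    0, 1, 0, 1,   # Sue
    0, 0, 1, 0),  # Rob
  nrow = 4, byrow = TRUE,
  dimnames = list(people, people)
)

nrow(socio) == ncol(socio)                   # square
identical(rownames(socio), colnames(socio))  # same people, same order
all(diag(socio) == 0)                        # no self-ties
socio["Sally", ]                             # Sally's ties as an ego
```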
The first column should be called “vertex.names” and provide the labels or names of the egos
in the organization. Subsequent columns provide information on additional background
characteristics, such as egos’ functions or hierarchical levels (see Table 2). Importantly, the
attribute list and socio-matrix should be ordered in the same way. For example, an individual
in row 56 (as an ego) of the socio-matrix, should also be in row 56 of the attribute list.
Table 2 – Vertex attribute list
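The ordering requirement can be checked before any analysis is run. The sketch below uses the four example employees; the department numbers and the object names attribute_list, shuffled, and realigned are assumptions made for this illustration. It compares the first column of the attribute list with the row names of the socio-matrix and shows one way to realign a list that is in a different order.

```r
people <- c("John", "Sally", "Sue", "Rob")
socio <- matrix(0, 4, 4, dimnames = list(people, people))

# Vertex attribute list: first column "vertex.names", then one column per attribute.
# The department numbers are made-up values.
attribute_list <- data.frame(
  vertex.names = c("John", "Sally", "Sue", "Rob"),
  dep = c(1, 1, 2, 2)
)

# TRUE only if row i of the attribute list describes the ego in row i of the socio-matrix
identical(as.character(attribute_list$vertex.names), rownames(socio))

# An attribute list in a different order can be realigned with match()
shuffled  <- attribute_list[c(3, 1, 4, 2), ]
realigned <- shuffled[match(rownames(socio), shuffled$vertex.names), ]
as.character(realigned$vertex.names)
```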
Edge-attribute information needs to be structured as a matrix (see Table 3), similar to the
socio-matrix shown in Table 1. Whereas the socio-matrix provides information on the
presence of ties between employees, the edge-attributes matrix provides details on the
attributes of these ties. Table 1 indicates that John and Sally have a tie with each other, and
Table 3 indicates that this tie is of average closeness (i.e., a score of 3 on a 1-5 scale).
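To make the relation between the two matrices concrete, the sketch below places a toy socio-matrix next to a toy edge-attribute matrix. Only the John-Sally closeness score of 3 comes from the text; all other values are assumptions. A simple consistency check is that a closeness score appears exactly in those cells where the socio-matrix records a tie.

```r
people <- c("John", "Sally", "Sue", "Rob")

# Presence of ties (1 = tie, 0 = no tie); partly assumed values
socio <- matrix(c(0, 1, 0, 0,
                  1, 0, 1, 0,
                  0, 1, 0, 1,
                  0, 0, 1, 0),
                nrow = 4, byrow = TRUE, dimnames = list(people, people))

# Closeness of those ties on a 1-5 scale (0 = no tie); only John-Sally = 3 is from the text
closeness <- matrix(c(0, 3, 0, 0,
                      3, 0, 5, 0,
                      0, 5, 0, 2,
                      0, 0, 2, 0),
                    nrow = 4, byrow = TRUE, dimnames = list(people, people))

# Every tie should carry a closeness score, and every non-tie should not
all((closeness > 0) == (socio == 1))
closeness["John", "Sally"]
```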
4 GETTING STARTED WITH ONA IN R-STUDIO
4.1 Why use R-studio for ONA?
There are several software packages that can be used to conduct ONA, including Pajek, Gephi,
UCINET, NodeXL, and R-studio. Here, we rely on R-studio, the programming environment for
the R language, for the following reasons. First of all, R is a general statistical programming
language. Mastering it will therefore not only allow you to conduct ONA, it will also enable
you to learn other types of analyses that you might want to conduct in the future (e.g.,
structural equation modeling, multilevel regressions, etc.). Second, R is open-source
software, which has enabled many software developers to contribute dedicated modules or
‘packages’ for conducting, among other things, sophisticated ONA. Among these packages are
“statnet” and “sna”, which we will primarily rely on in this book. We will use these packages in
combination with more general statistical or data management packages included in R-studio,
thereby benefiting from R-studio’s versatility. Third, because of the popularity of R-studio,
there are a lot of online resources to help you get started.
After installing these files, run R-studio and you should get an opening screen resembling
Figure 3. In this screen, you see the “console,” which is used to enter code into R-studio, as
well as tabs on the right for “Environment” and “Packages”. The environment tab displays the
data objects that are loaded into R-studio. Later on you will load a socio-matrix into R-studio.
That data will then be displayed in the environment as a data object under the label you have
assigned to it. The packages tab displays the software modules that have been installed in R-
studio. Packages that have been ticked are loaded and can be used.
Figure 3 - R-studio opening screen
1) First, we have to load the packages that enable R-studio to recognize the network
data. To be exact, we need to load the packages “statnet” and “sna”.
2) Second, we need to load the socio-matrix in R-studio.
3) Third, we need to tell R-studio which row and column in the socio-matrix (step 2)
belongs to which ego and alter. In other words, we have to load the vertex attribute
list (i.e., characteristics of the employees in the network, such as their names, function,
team affiliation, etc.) and attach it to the socio-matrix.
4) Fourth, for some analyses, we need to specify the nature of the tie between two nodes.
We could, for example, add information on how closely two individuals are connected
to each other on a 1 to 5 scale (1 = not close at all, 5 = very close). To add this
information, we need to load the edge attribute information.
We will discuss these steps, and associated R-studio code, in depth below.
Figure 4 - Installing packages via the menu structure
For the console option, it is good practice to start with opening a new “script” file. A script file
allows you to save all the code that you entered in an R-studio session. This will enable you to
replicate your actions at a later time and rerun analyses, data manipulations, etc. To create a
new script, click on File > New file > R Script (or simply press CTRL + Shift + N). The new script
file is then automatically opened and the cursor is placed in line 1 of the script. Here you can
start entering code. Please save your script file at regular intervals. To install the packages
statnet and sna, please enter the following code in your R script:
install.packages("statnet")
install.packages("sna")
Then select these code lines and press CTRL + ENTER to execute them. Once this code is
executed, the packages are installed and added to your R library. Installation is needed only
once: if you exit R-studio and open it again, the packages will still be in your R library.
Installed packages are not loaded automatically, however. Hence, you have to load all relevant
packages each time you start up R-studio. To load a pre-installed package from the library,
add the following code to your script file and again press CTRL + ENTER to run it:
library("statnet")
library("sna")
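If you are unsure whether the packages are already in your library, a check along the following lines can be placed at the top of a script. This is a sketch rather than part of the original workflow; the object names needed, installed, and missing_packages are chosen for this illustration. It relies on the base R function requireNamespace, which reports whether a package can be found without attaching it.

```r
# Packages this manual relies on
needed <- c("statnet", "sna")

# requireNamespace() returns TRUE when a package is available, without attaching it
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
missing_packages <- needed[!installed]

if (length(missing_packages) > 0) {
  message("Still to be installed: ", paste(missing_packages, collapse = ", "))
  # install.packages(missing_packages)   # uncomment to install them
} else {
  library("statnet")
  library("sna")
}
```

The install.packages line is left commented out so that running the script never triggers a download unintentionally.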
Enter
getwd()
in the script file, followed by CTRL + ENTER, to discover the current working directory. If you
are satisfied with the current working directory’s location, simply copy your data files to this
directory. If not, you can change the working directory by executing the following code:
setwd("X:/ONA")
This will change the working directory to the directory specified in the quotation marks. After
changing the directory, copy the data to the new working directory (this is not done
automatically). Note that R-studio expects forward slashes (i.e., “/”) in the location of the
working directory on Windows, macOS, and Linux alike. A Windows path copied with backward
slashes (i.e., “\”) only works if each backward slash is doubled (i.e., “\\”).
Once the data has been copied into the working directory, we can load the data. Usually,
network data is saved in the comma-delimited format (.csv). The socio-matrix network data
of our case organization is, for example, stored in the file “company_network_data.csv”.
Please do not work on this file in Microsoft Excel, as this may change the format of the data
file and make it incompatible with R-studio. We can load that data as a data object in the R-
studio environment by adding the following code to the script:
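A minimal sketch of this step, pieced together from the description that follows (object label network_data, no header row). The stand-in file is an assumption that only serves to let the sketch run without the case data; it is not part of the data set of the case organization.

```r
csv_file <- "company_network_data.csv"

# Stand-in (assumption): if the case file is not in the working directory,
# write a small 4 x 4 socio-matrix without headers to a temporary file instead.
if (!file.exists(csv_file)) {
  csv_file <- tempfile(fileext = ".csv")
  toy <- matrix(c(0, 1, 0, 0,
                  1, 0, 1, 0,
                  0, 1, 0, 1,
                  0, 0, 1, 0), nrow = 4, byrow = TRUE)
  write.table(toy, csv_file, sep = ",", row.names = FALSE, col.names = FALSE)
}

# header = FALSE (header=F in short): the first line of the file is data, not column names
network_data <- read.csv(csv_file, header = FALSE)
dim(network_data)   # the number of rows should equal the number of columns
```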
Executing this code will load the “company_network_data.csv” file and store it in the R-studio
environment as a data object that is labeled “network_data”. We can use whatever label we
like. As is common for network data, the .csv file does not contain row or column names and
R-studio needs to be informed about that in the code. This is done by adding header=F. This
tells R-studio that the first line of the data file already contains tie information rather than
column names: the file excludes any attribute information about the nodes in the network
(such as names).
Next, we need to specify that the “network_data” object represents a network, as this will
allow us to use this data in subsequent organizational network analyses. To do so, we inform
R-studio that the data stored under the label network_data is, in fact, a socio-matrix (rather
than a ‘normal’ matrix). We can do this by using network():
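A minimal sketch of this step. The label network_data_sociomatrix and directed=FALSE follow the text; wrapping the data in as.matrix and stating matrix.type="adjacency" are assumptions added so that network() reads the input as a socio-matrix rather than as an edge list. A toy data object stands in for network_data here.

```r
library(network)  # comes with statnet

# Toy stand-in for the data object loaded with read.csv (assumption)
network_data <- as.data.frame(matrix(c(0, 1, 0, 0,
                                       1, 0, 1, 0,
                                       0, 1, 0, 1,
                                       0, 0, 1, 0), nrow = 4, byrow = TRUE))

# Declare the data to be the socio-matrix of an undirected network
network_data_sociomatrix <- network(as.matrix(network_data),
                                    matrix.type = "adjacency",
                                    directed = FALSE)

network.size(network_data_sociomatrix)       # number of vertices
network.edgecount(network_data_sociomatrix)  # number of edges
```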
To check whether R-studio now correctly recognizes our network data, we can simply enter
the name of the network data object and press CTRL + ENTER:
network_data_sociomatrix
The R-studio output will then give you the summary statistics of the network data that was
loaded, as displayed below. The output will list the number of employees in the network (i.e.,
number of vertices; 99 in our case), the number of ties between these employees (i.e., number
of edges; 210 in our case), and whether the network is directed or not. It is good practice to
carefully review this information before you proceed further.
network_data_sociomatrix
Network attributes:
vertices = 99
directed = FALSE
hyper = FALSE
loops = FALSE
multiple = FALSE
bipartite = FALSE
total edges= 210
missing edges= 0
non-missing edges= 210
No edge attributes
After completing these steps, we now have successfully loaded the socio-matrix in R-studio.
In the next steps, we will add the vertex and edge attributes to this socio-matrix.
We will start with discussing how we can load vertex attributes from an external file. This file
is located on Nestor and is called “company_attribute_data.csv”. This file has 99 rows (plus
one row for the column header), with one row for each individual within the network we
loaded in the previous step. Moreover, there are two columns in the file. The first column
specifies an anonymized name (called “vertex.names”) of each employee, and the second
column specifies the department affiliation of employees (with a number, called “dep”). We
first need to load the .csv file as a data object and assign a label to it (we will use “attributes”),
as we did for the socio-matrix:
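A minimal sketch of this step, based on the file layout described above (one header row, columns “vertex.names” and “dep”). The stand-in file with four rows is an assumption so that the sketch also runs without the case data.

```r
attribute_file <- "company_attribute_data.csv"

# Stand-in (assumption): write a tiny attribute list when the case file is absent
if (!file.exists(attribute_file)) {
  attribute_file <- tempfile(fileext = ".csv")
  writeLines(c("vertex.names,dep", "VP,1", "LS1,2", "CE1,9", "CE2,9"), attribute_file)
}

# header = TRUE: the first line holds the column headers
attributes <- read.csv(attribute_file, header = TRUE)
names(attributes)   # column headers of the attribute list
nrow(attributes)    # one row per individual in the network
```

Unlike the socio-matrix file, this file does contain a header row, which is why header is set to TRUE here.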
Then we need to attach the information in the data object “attributes” to the socio-matrix
that we labeled “network_data_sociomatrix” using set.vertex.attribute as follows:
set.vertex.attribute(network_data_sociomatrix,names(attributes),
attributes)
We can check if the attribute was added correctly by executing the code:
list.vertex.attributes(network_data_sociomatrix)
[1] "dep" "na" "vertex.names"
This will list the labels of the attributes that we added to the socio-matrix, as taken from the
column headers of the attributes file. In our case, two attributes were added, namely
“vertex.names” and “dep”. The list may also mention a vertex attribute called “na”, which
does not stem from the attributes file. This “na” vertex attribute can be ignored. The
“vertex.names” attribute is created by default when you load a socio-matrix into R-studio.
However, until we load the vertex attributes, this attribute does not yet hold the employees’
names.
We can then display vertex attribute values by using get.vertex.attribute followed by the
name of the attribute, as shown below. R-studio displays the vertex attributes “vertex.names”
and “dep” in the wide format (from left to right). The number placed in brackets indicates the
position of the first value on that line. For example, [52] next to the name label
CE44 indicates that CE44 is in the 52nd position in the socio-matrix. CE45 is in the 53rd position,
CE46 in the 54th position, etc.
get.vertex.attribute(network_data_sociomatrix,"vertex.names")
[1] VP LS1 LS2 LS3 LS4 LS5 LS6 LS7 CE1 CE2 CE3 CE4 CE5 CE6 CE7 CE8 CE9
[18] CE10 CE11 CE12 CE13 CE14 CE15 CE16 CE17 CE18 CE19 CE20 CE21 CE22 CE23 CE24 CE25 CE26
[35] CE27 CE28 CE29 CE30 CE31 CE32 CE33 CE34 CE35 CE36 CE37 CE38 CE39 CE40 CE41 CE42 CE43
[52] CE44 CE45 CE46 CE47 CE48 CE49 CE50 CE51 CE52 CE53 CE54 CE55 CE56 CE57 CE58 CE59 CE60
[69] CE61 CE62 CE63 CE64 CE65 CE66 CE67 CE68 CE69 CE70 CE71 CE72 CE73 CE74 CE75 CE76 CE77
[86] CE78 CE79 CE80 CE81 CE82 CE83 CE84 CE85 CE86 CE87 CE88 CE89 CE90 CE91
99 Levels: CE1 CE10 CE11 CE12 CE13 CE14 CE15 CE16 CE17 CE18 CE19 CE2 CE20 CE21 CE22 ... VP
get.vertex.attribute(network_data_sociomatrix,"dep")
[1] 1 2 3 4 5 6 7 8 9 9 9 9 9 9 9 9 9 9 9 9 10 10 10 10 10 10 10 10 10
[30] 10 10 10 10 10 10 11 11 11 11 11 11 11 11 11 11 11 11 11 12 12 12 12 12 12 12 12 12 12
[59] 12 12 12 12 12 13 13 13 13 13 13 13 13 13 13 13 13 14 14 14 14 14 14 14 14 14 14 14 14
[88] 14 14 15 15 15 15 15 15 15 15 15 15
There are two ways to load edge attribute data, method 1 and method 2. In method 1, we use
the edge-attribute file to directly load both the socio-matrix and edge attributes. In method 2,
we first load the socio-matrix (from a separate file) and then add the edge-attributes to that
socio-matrix. Which method we need depends on the number of edge attributes we want to
add. If there is only one edge attribute (e.g., only closeness), we can use method 1. Method 1,
however, does not work when multiple edge attributes need to be added (e.g., in case we
want to add tie closeness and tie effectiveness as edge attributes). If we used method 1
in this case, edge attributes would be overwritten. Hence, if we need to add multiple edge
attributes, we should use method 2. Using method 2, we can add as many edge attributes as
we like. Please note that it is also possible to start with method 1 to load the socio-matrix and
the first edge attribute (e.g., tie closeness) and then use method 2 to add additional edge
attributes (e.g., tie effectiveness). Below, we discuss how we can use method 1 and 2 to load
a single edge attribute, called closeness.
Method 1 involves loading the edge attribute file directly as a socio-matrix. We first load the
edge attribute .csv file with read.csv and header=F and assign a label to it (we will use the
label “edge”). We then have to tell R-studio that we want to load a network socio-matrix from
a matrix that contains valued (i.e., in our case, values range from 0 to 5) rather than binary
data (i.e., 0 or 1). In addition, we have to tell R-studio what to do with the valued data. In our
case, the values denote closeness scores and, thus, should be stored as an edge attribute
called “closeness”. By specifying ignore.eval=FALSE and names.eval="closeness", R-studio
will do exactly that: it retains the values (rather than converting them into binary data) and
saves them as an edge attribute labeled “closeness”. As before, we need to specify that the
network data is undirected by adding directed=FALSE to the code.
network_data_sociomatrix<-as.network.matrix(edge,directed=FALSE,
ignore.eval=FALSE,names.eval="closeness")
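Method 1 can be tried out on a small scale before it is applied to the case data. In the sketch below, the 4 x 4 valued matrix and the label toy_net are assumptions; the generic as.network is used, which hands a matrix on to as.network.matrix, and matrix.type="adjacency" is stated explicitly so the input is read as a socio-matrix.

```r
library(network)

# Toy valued matrix (assumption): 0 = no tie, 1-5 = closeness of an existing tie
edge <- matrix(c(0, 3, 0, 0,
                 3, 0, 5, 0,
                 0, 5, 0, 2,
                 0, 0, 2, 0), nrow = 4, byrow = TRUE)

# ignore.eval = FALSE keeps the cell values; names.eval names the edge attribute
toy_net <- as.network(edge, matrix.type = "adjacency", directed = FALSE,
                      ignore.eval = FALSE, names.eval = "closeness")

network.edgecount(toy_net)     # non-zero cells become ties
list.edge.attributes(toy_net)  # includes "closeness"
toy_net %e% "closeness"        # one score per tie
```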
In method 2, we attach the edge attribute information to an already loaded socio-matrix (for
example, a socio-matrix loaded by using method 1). This requires us to complete the following
steps. We first need to load the edge attribute .csv file and assign a label to it (we will use the
label “edge”) using read.csv with header=F. Then, we need to attach the information from
the edge file to the socio-matrix we already loaded and have labeled
“network_data_sociomatrix”. We can use set.edge.value for this. We label the edge
attribute we want to add “closeness”:
network_data_sociomatrix<-set.edge.value(network_data_sociomatrix,
'closeness',edge)
Afterwards, we should check whether the edge closeness information was attached correctly
to the socio-matrix by requesting some statistics (average, minimum, maximum score) using
summary, as shown below. In the summary code, we indicate the label of the socio-matrix,
followed by %e% and the name of the edge attribute we want to inspect. The %e% tells R-
studio that we are referring to an edge attribute embedded within
“network_data_sociomatrix” (for vertex attributes, you can use %v%).
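A minimal sketch of this check on a toy network built with method 2; the matrices and the label toy_net are assumptions. For the case data, the description above suggests a call of the form summary(network_data_sociomatrix %e% "closeness").

```r
library(network)

# Toy socio-matrix and matching closeness matrix (assumed values)
socio <- matrix(c(0, 1, 0, 0,
                  1, 0, 1, 0,
                  0, 1, 0, 1,
                  0, 0, 1, 0), nrow = 4, byrow = TRUE)
edge  <- matrix(c(0, 3, 0, 0,
                  3, 0, 5, 0,
                  0, 5, 0, 2,
                  0, 0, 2, 0), nrow = 4, byrow = TRUE)

# Method 2: load the socio-matrix first, then attach the edge attribute
toy_net <- network(socio, matrix.type = "adjacency", directed = FALSE)
toy_net <- set.edge.value(toy_net, "closeness", edge)

# Minimum, mean, and maximum of the attached closeness scores
summary(toy_net %e% "closeness")
```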
In addition, we can display the raw closeness scores in the socio-matrix by using
get.edge.attribute. R-studio will then display the edge attribute scores for all existing ties.
In our case, there are 210 edges in the network, so R-studio correspondingly displays a list
comprising 210 edge attribute scores.
get.edge.attribute(network_data_sociomatrix,"closeness")
[1] 2 4 2 4 1 2 2 5 4 5 4 3 2 1 4 5 2 5 1 5 5 2 3 5 4 5 4 2 5 3 5 4 5 1 5 4 4 3 2 1 2 4 4 5 4 3 2
[48] 3 3 2 4 3 1 5 1 2 3 2 1 3 1 3 2 4 2 2 3 4 1 5 5 4 4 4 1 1 4 3 4 3 1 4 1 3 2 3 5 3 5 1 5 2 2 3
[95] 4 5 5 5 2 1 2 4 2 3 5 2 2 3 4 4 2 5 3 5 5 2 5 2 5 5 5 4 3 3 5 5 1 4 5 1 1 5 3 3 1 2 3 5 5 3 1
[142] 5 2 2 1 1 4 2 3 2 3 2 3 5 4 5 4 5 4 5 1 2 1 2 1 3 3 1 4 5 5 5 1 3 1 3 4 4 5 2 5 1 3 2 3 5 5 4
[189] 4 3 2 2 5 4 2 5 2 1 5 1 1 5 5 5 5 3 4 2 3 5
Finally, we can request the summary statistic for the socio-matrix (as we did in 4.3.2) to check
if closeness is now added as an edge attribute to the network. As shown below, the summary
output now lists “closeness” as an edge attribute, as well as the vertex attributes that we
added in 4.3.3.
network_data_sociomatrix
Network attributes:
vertices = 99
directed = FALSE
hyper = FALSE
loops = FALSE
multiple = FALSE
bipartite = FALSE
total edges= 210
missing edges= 0
non-missing edges= 210
5 NETWORK DATA MANIPULATION IN R-STUDIO
5.1 Accessing network data in R-studio
Before we start analyzing the network data, let us first shortly explore how we can access
network data that includes vertex and edge attribute information.
list.vertex.attributes(network_data_sociomatrix)
[1] "dep" "vertex.names"
list.edge.attributes(network_data_sociomatrix)
[1] "closeness"
Once we know the names of the vertex and edge attributes, we can display the raw values of
these attributes by using get.vertex.attribute and get.edge.attribute, followed by
the label of the network data object and attribute, as shown below for the attribute “dep”.
This will provide a list of the attribute scores of the egos in the network. Again, the order of
the attribute values corresponds to the order of egos in the socio-matrix, such that the person
who is in position 1 in the attribute list is also in position 1 of the socio-matrix.
get.vertex.attribute(network_data_sociomatrix,"dep")
[1] 1 2 3 4 5 6 7 8 9 9 9 9 9 9 9 9 9 9 9 9 10 10 10 10 10 10 10 10 10
[30] 10 10 10 10 10 10 11 11 11 11 11 11 11 11 11 11 11 11 11 12 12 12 12 12 12 12 12 12 12
[59] 12 12 12 12 12 13 13 13 13 13 13 13 13 13 13 13 13 14 14 14 14 14 14 14 14 14 14 14 14
[88] 14 14 15 15 15 15 15 15 15 15 15 15
as.sociomatrix(network_data_sociomatrix)["CE64","LS4"]
[1] 0
If we want to refer to row and column numbers (instead of using the row and column headers,
as we did above), we need to complete a couple of additional steps. First, we need to discover
which individual is in which row (as an ego). For this purpose, we can retrieve information on
the vertex.names values and positions in the network file, as shown below.
get.vertex.attribute(network_data_sociomatrix,"vertex.names")
[1] VP LS1 LS2 LS3 LS4 LS5 LS6 LS7 CE1 CE2 CE3 CE4 CE5 CE6 CE7 CE8 CE9
[18] CE10 CE11 CE12 CE13 CE14 CE15 CE16 CE17 CE18 CE19 CE20 CE21 CE22 CE23 CE24 CE25 CE26
[35] CE27 CE28 CE29 CE30 CE31 CE32 CE33 CE34 CE35 CE36 CE37 CE38 CE39 CE40 CE41 CE42 CE43
[52] CE44 CE45 CE46 CE47 CE48 CE49 CE50 CE51 CE52 CE53 CE54 CE55 CE56 CE57 CE58 CE59 CE60
[69] CE61 CE62 CE63 CE64 CE65 CE66 CE67 CE68 CE69 CE70 CE71 CE72 CE73 CE74 CE75 CE76 CE77
[86] CE78 CE79 CE80 CE81 CE82 CE83 CE84 CE85 CE86 CE87 CE88 CE89 CE90 CE91
Based on this output, we know that “CE64” is in position 72 of the socio-matrix. This means
that this person, correspondingly, is in row 72 as an ego and column 72 as an alter (remember
that rows and columns of a socio-matrix have equal lengths and are ordered in the same way).
Now, suppose that we want to know if “CE64” has a tie with “LS4”. From inspecting the
vertex.names values and positions, we know that “LS4” is in position 5 in the socio-matrix. We
already knew that “CE64” is in position 72. Combined, this tells us that the information on
whether or not “CE64” and “LS4” have a tie is located in the cell of the socio-matrix where row
72 meets column 5. To access this information, we can use the code shown below. We find
that the cell on the intersection of row 72 and column 5 indicates a “0”, which means that
there is no tie between “CE64” and “LS4”.
as.sociomatrix(network_data_sociomatrix)[72,5]
[1] 0
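Instead of reading positions off the printed list, the position of a person can also be looked up with which. The sketch below does this on a toy network of five people; the toy ties and the labels toy_net, ego, and alter are assumptions made for this illustration.

```r
library(network)

# Toy network (assumption): VP tied to LS1, LS1 tied to CE1 and CE2, CE3 isolated
people <- c("VP", "LS1", "CE1", "CE2", "CE3")
socio <- matrix(0, 5, 5, dimnames = list(people, people))
socio["VP", "LS1"]  <- socio["LS1", "VP"]  <- 1
socio["LS1", "CE1"] <- socio["CE1", "LS1"] <- 1
socio["LS1", "CE2"] <- socio["CE2", "LS1"] <- 1
toy_net <- network(socio, matrix.type = "adjacency", directed = FALSE)

# Position of each person in the socio-matrix, looked up by name
ego   <- which(toy_net %v% "vertex.names" == "CE2")
alter <- which(toy_net %v% "vertex.names" == "LS1")

as.sociomatrix(toy_net)[ego, alter]   # same cell as ["CE2", "LS1"]
```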
Sometimes, we want to know whether an individual has ties with multiple alters from a larger
group, such as his or her team or department. For example, we might want to know if the VP
of the case organization has ties with other members of the management team (i.e., LS1 –
LS7). To get this information, we should retrieve the values of the cells where the row of the
VP (row 1) intersects with the columns of the other leaders (columns 2:8). To do this, we can
refer to the row number and the range of columns in the socio-matrix:
as.sociomatrix(network_data_sociomatrix)[1,2:8]
LS1 LS2 LS3 LS4 LS5 LS6 LS7
1 1 1 1 1 1 1
Alternatively, we can access this information by referring to the row and column names, using
the code below. In this code, we need to use c(), followed by a list of the column names in
quotation marks and separated by commas, to refer to multiple columns simultaneously. It is
not possible to directly refer to a range of column names (e.g., LS1:LS7); this is only possible
when using row or column numbers.
as.sociomatrix(network_data_sociomatrix)["VP",
c("LS1","LS2","LS3","LS4","LS5","LS6","LS7")]
LS1 LS2 LS3 LS4 LS5 LS6 LS7
1 1 1 1 1 1 1
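When the names follow a regular pattern, typing them out can be avoided by generating them with the base R function paste0. The sketch below is an illustration on a plain toy matrix with assumed values; the same index can be used on as.sociomatrix(network_data_sociomatrix).

```r
# Toy socio-matrix (assumption): the VP is tied to all seven supervisors
people <- c("VP", paste0("LS", 1:7), paste0("CE", 1:3))
socio <- matrix(0, length(people), length(people), dimnames = list(people, people))
socio["VP", paste0("LS", 1:7)] <- 1
socio[paste0("LS", 1:7), "VP"] <- 1

# Generates the names "LS1", "LS2", ..., "LS7"
manager_names <- paste0("LS", 1:7)
socio["VP", manager_names]
```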
Table 4 shows additional examples of how to obtain information on ties by referring to specific
(ranges) of rows and columns using as.sociomatrix. Here “x” denotes the label of the
network data object.
If we want to calculate the number of ties for certain individuals, we need to refer to the rows
of these individuals in the socio-matrix. For example, we might be interested in how many ties
certain managers have with all other members of the company. We know that managers are
in positions 1:8 in the network. Hence, we can use the following code to calculate the total
number of ties of each manager:
rowSums((as.sociomatrix(network_data_sociomatrix)[1:8,]))
VP LS1 LS2 LS3 LS4 LS5 LS6 LS7
21 12 20 12 20 23 15 19
We can also calculate how many ties certain individuals have with (a group of) certain other
individuals. For example, we can calculate how many ties each manager has with alters located
in DEP9 (i.e., CE1 – CE12). In that case, we need to refer to the rows of the egos and the
columns of alters in the socio-matrix. Managers are located in rows 1-8 (as egos), while CE1 –
CE12 are located in columns 9-20 (as alters). Hence, to calculate managers’ ties with members
from DEP9, we can use the code specified below. Based on the output, we can conclude that
only manager “LS2” has direct ties with members of DEP9. This makes sense, considering that
“LS2” is the supervisor of DEP9 (see Figure 1).
rowSums((as.sociomatrix(network_data_sociomatrix)[1:8,9:20]))
VP LS1 LS2 LS3 LS4 LS5 LS6 LS7
0 0 12 0 0 0 0 0
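Row and column positions do not have to be typed in by hand; they can also be derived from an attribute. The sketch below illustrates this on a toy matrix with six people. All values are assumptions, and the vector dep stands in for the values that network_data_sociomatrix %v% "dep" would return.

```r
# Toy data (assumption): three managers, two members of DEP9, one member of DEP10
people <- c("VP", "LS1", "LS2", "CE1", "CE2", "CE3")
dep    <- c(1, 2, 3, 9, 9, 10)

socio <- matrix(0, 6, 6, dimnames = list(people, people))
socio["VP", c("LS1", "LS2")]  <- 1
socio[c("LS1", "LS2"), "VP"]  <- 1
socio["LS2", c("CE1", "CE2")] <- 1
socio[c("CE1", "CE2"), "LS2"] <- 1

managers <- which(people %in% c("VP", "LS1", "LS2"))  # rows of the egos
dep9     <- which(dep == 9)                            # columns of the alters

# Number of ties each manager has with members of DEP9
rowSums(socio[managers, dep9, drop = FALSE])
```

Selecting by attribute keeps the code valid when the order or size of a department changes, whereas hard-coded ranges such as 9:20 would have to be updated.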
Similar to the socio-matrix, a row in the attributes data object corresponds to a certain
individual. For example, the first row of the attributes file provides information on the VP.
Furthermore, the order of individuals in the attribute file is consistent with that of the socio-
matrix, such that the VP should also be in the first row of socio-matrix (as an ego). Unlike the
socio-matrix, however, the columns represent variables (rather than alters, as in the socio-
matrix). These variables represent individuals’ attributes. To learn more about the variables
included in the attributes data object, we can use head followed by the label of the attributes
data object. This will display the variable names (i.e., vertex attributes in our case) and the
values for the first six egos. When applied to the attributes data object “attributes”, R-studio
returns the names and first six values of the attributes “vertex.names” and “dep”:
head(attributes)
vertex.names dep
1 VP 1
2 LS1 2
3 LS2 3
4 LS3 4
5 LS4 5
6 LS5 6
To display the attributes of only certain individuals, we can refer to the rows of these
individuals within the matrix of the attributes data object. For example, to display the
attributes of the DEP9 members, we can refer to rows 9:20 (see code below). We can also
display a selection of attributes by referring to the column numbers or labels of these attributes.
The second example (shown below) details the code for displaying only the attribute “dep”
for members of DEP9. Note that we do not need to convert the attributes data object to a
matrix (i.e., by using as.matrix), because a data object loaded with read.csv can already be
indexed by rows and columns.
attributes[9:20,]
vertex.names dep
9 CE1 9
10 CE2 9
11 CE3 9
12 CE4 9
13 CE5 9
14 CE6 9
15 CE7 9
16 CE8 9
17 CE9 9
18 CE10 9
19 CE11 9
20 CE12 9
attributes[9:20,"dep"]
[1] 9 9 9 9 9 9 9 9 9 9 9 9
Finally, we can display all values of an attribute (or any variable within any data object) by
referring to the label of the data object followed by $ and the label of the variable. To display
the attribute “dep”, for example, we can use the following code.
attributes$dep
[1] 1 2 3 4 5 6 7 8 9 9 9 9 9 9 9 9 9 9 9 9 10 10 10 10 10 10 10 10 10
[30] 10 10 10 10 10 10 11 11 11 11 11 11 11 11 11 11 11 11 11 12 12 12 12 12 12 12 12 12 12
[59] 12 12 12 12 12 13 13 13 13 13 13 13 13 13 13 13 13 14 14 14 14 14 14 14 14 14 14 14 14
[88] 14 14 15 15 15 15 15 15 15 15 15 15
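The indexing options described above can be tried out on a small stand-in table. The sketch below is only an illustration: the data object “attributes_demo” and its values are made up and are not part of the case data.

```r
# Toy stand-in for the attributes data object (hypothetical values).
attributes_demo <- data.frame(
  vertex.names = c("VP", "LS1", "LS2", "CE1", "CE2", "CE3", "CE4"),
  dep = c(1, 2, 3, 9, 9, 9, 9),
  stringsAsFactors = FALSE
)

# head() shows the first six rows by default; n changes that number.
first_rows <- head(attributes_demo)
first_three <- head(attributes_demo, n = 3)

# [rows, columns]: rows 4 to 7, all columns.
dep9_members <- attributes_demo[4:7, ]

# [rows, "label"]: rows 4 to 7, only the column "dep".
dep9_dep <- attributes_demo[4:7, "dep"]

# $ returns every value of one column.
all_dep <- attributes_demo$dep

print(dep9_members)
print(dep9_dep)
```

Leaving the part before or after the comma empty selects all rows or all columns, respectively.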
In our dataset, the supervisors have the vertex.names: “VP” and “LS1-LS7”. To create a new
socio-matrix that excludes individuals that have these vertex.names values, we
correspondingly use the following code:
As noted above, which allows us to select certain cases. In order to work properly, which
needs to be followed by the name of the variable on which we want to filter
(network_data_sociomatrix %v% "vertex.names" in our case), followed by the value that
should be included or excluded (here, cases should be excluded if their vertex.names value is
“VP” or “LS1”-“LS7”). In specifying the value for filtering, != means “does not equal”, while ==
means “equals”. Moreover, which accepts multiple criteria. We can combine criteria
by using the & or | signs. The | sign means “OR”, while & means “AND”. After executing this
code, we can request the summary for the new, filtered socio-matrix that we labeled
“Network_data_workers”. As shown below, 91 vertices remain in the new socio-matrix
(compared to 99 in the original socio-matrix). Hence, we have successfully omitted the 8
supervisors from the socio-matrix.
Network_data_workers
Network attributes:
vertices = 91
directed = FALSE
hyper = FALSE
loops = FALSE
multiple = FALSE
bipartite = FALSE
total edges= 88
missing edges= 0
non-missing edges= 88
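The filtering logic of which, != and & can be illustrated without any network package. The sketch below assumes a short, made-up vector of vertex names (“vertex_names”); it only shows how the positions to keep are determined and is not the original filtering code of this section.

```r
# Toy vertex names (hypothetical, much smaller than the case organization).
vertex_names <- c("VP", "LS1", "LS2", "CE1", "CE2", "CE3")

# != keeps entries that do NOT equal the value; & requires every condition to hold.
keep_positions <- which(vertex_names != "VP" &
                        vertex_names != "LS1" &
                        vertex_names != "LS2")

# %in% is a shorter way to test against several values at once.
keep_positions_alt <- which(!(vertex_names %in% c("VP", "LS1", "LS2")))

print(keep_positions)
print(vertex_names[keep_positions])
```

With the case data, the same logical test would be applied to network_data_sociomatrix %v% "vertex.names", and the resulting positions could be handed to get.inducedSubgraph, as this manual does when it removes the VP in the cutpoint check.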
Alternatively, we could plot the filtered socio-matrix and inspect if supervisors have been
omitted. For this purpose, we can use gplot. This results in Figure 5.
gplot(Network_data_workers,displaylabels=TRUE)
As expected, the supervisors are omitted in Figure 5. Also, there are now a lot of isolates in
the organizational network (employees that have no ties). For improving the visualization of
the network, we can choose to omit these isolates and then plot the network again. We can
use isolates to identify isolates and, subsequently, delete.vertices to remove the
identified isolates from the filtered socio-matrix (see Figure 6):
isolates(Network_data_workers)
delete.vertices(Network_data_workers,isolates(Network_data_workers))
gplot(Network_data_workers,displaylabels=TRUE)
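To see what “identifying and removing isolates” means at the level of the socio-matrix, the following sketch does the same thing by hand in base R. The matrix “toy_matrix” and its members are made up for illustration.

```r
# Toy undirected socio-matrix (hypothetical): A-B and B-C are tied, D and E have no ties.
members <- c("A", "B", "C", "D", "E")
toy_matrix <- matrix(0, nrow = 5, ncol = 5, dimnames = list(members, members))
toy_matrix["A", "B"] <- toy_matrix["B", "A"] <- 1
toy_matrix["B", "C"] <- toy_matrix["C", "B"] <- 1

# An isolate has no ties at all, so its row sum is 0.
isolate_positions <- which(rowSums(toy_matrix) == 0)

# Keep only the rows and columns of individuals with at least one tie.
has_ties <- rowSums(toy_matrix) > 0
toy_without_isolates <- toy_matrix[has_ties, has_ties]

print(names(isolate_positions))
print(toy_without_isolates)
```

Selecting with a logical vector (has_ties) also works when there are no isolates at all, in which case the matrix is simply returned unchanged.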
2) We then need to create a new socio-matrix from the filtered edge-attribute file. The
new socio-matrix file should assign a “1” to ties that score 3 or higher on the edge
attribute “closeness”. Closeness information should be available for all included ties.
3) We need to reinsert the vertex attributes to the new socio-matrix created in step 2.
For step 1, we can use the code shown below. Do not forget to include header=F, because R-studio
will otherwise treat the first row of the file as column headers, which will
cause the subsequent analyses to fail. After we have loaded the .csv file, we need to replace all
values <3 with a 0 in “edge_closeness_filtered” using the code shown below:
edge_closeness_filtered<-read.csv("company_edge_closeness.csv", header=F)
edge_closeness_filtered[edge_closeness_filtered<3]<-0
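The second line above relies on logical indexing: the condition inside the square brackets selects all cells below the threshold, and the assignment overwrites them. A minimal sketch with a made-up 3 x 3 closeness matrix (“closeness_demo”) shows the effect:

```r
# Toy closeness matrix (hypothetical scores from 0 to 5; 0 means "no tie").
closeness_demo <- matrix(c(0, 4, 2,
                           4, 0, 5,
                           2, 5, 0),
                         nrow = 3, byrow = TRUE)

# Logical indexing: every cell below the threshold is overwritten with 0.
closeness_filtered_demo <- closeness_demo
closeness_filtered_demo[closeness_filtered_demo < 3] <- 0

# A binary version marks the remaining ties with 1.
binary_demo <- (closeness_filtered_demo > 0) * 1

print(closeness_filtered_demo)
print(binary_demo)
```

In this toy example the tie with a closeness score of 2 disappears, while the ties with scores of 4 and 5 are retained.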
In the next step, we load the filtered data as a network using the code we discussed in 4.3.4.
Because we now load valued rather than binary network data (i.e., values range from 0 to 5),
we have to tell R-studio what to do with the valued data. In our case, the values denote
closeness scores and, thus, should be stored as an edge attribute. By specifying
ignore.eval=FALSE and names.eval="closeness", R-studio will do exactly that: it retains
the values (rather than converting them into binary data) and saves the values as an edge
attribute labeled “closeness”. As before, we need to specify that the network data is
undirected by adding directed=FALSE to the code.
network_data_close<-
as.network.matrix(edge_closeness_filtered,directed=FALSE,ignore.eval=FALSE,
names.eval="closeness")
We can check the new network data by requesting summary statistics, simply by typing in the
label of the new socio-matrix and pressing CTRL+ENTER. This results in the information shown
below. You will notice that the number of vertices has remained the same (99), which is
correct, since we did not filter out any vertices. However, the number of edges between these
vertices has reduced from 210 to 180 (see below). Apparently, 30 ties were filtered out
because their closeness scores were below the threshold of 3.
network_data_close
Network attributes:
vertices = 99
directed = FALSE
hyper = FALSE
loops = FALSE
multiple = FALSE
bipartite = FALSE
total edges= 180
missing edges= 0
non-missing edges= 180
5.3.1 Focusing on ego-networks of individual employees
Until now, we focused on the whole organization. In many cases, however, we are not
interested in the organizational network as a whole, but want to focus on the network of
specific employees (or specific teams, as we will discuss in 5.3.2). To focus on a specific
individual, we will need to extract the edges from the larger network that are relevant for
understanding the individual under study (i.e., the focal individual or ‘node’). We thus need to
create a so-called “ego network” for the focal node. An ego network “consists of a focal node
("ego") and the nodes to whom ego has ties with (these are called "alters") plus the ties, if
any, among the alters.”3
To extract an ego network, we need to omit all vertices from the organizational network that
have no direct ties with the focal node. Extracting an ego network therefore requires us to do
more than simply filtering out certain vertices or edges (as we did in 5.2 and 5.3). To create an
ego network, we need to find out which individuals have ties with the focal person and then
create a network that only includes these vertices, as well as the focal node.
Let’s suppose we want to zoom in on the VP’s ego network in our case organization. As a first
step, we need to add a vertex attribute to the socio-matrix that indicates which individuals
have a tie with the VP. Later on, we can then exclude vertices that have no ties with the VP,
and thus have no place in the VP’s ego network. Here’s the procedure for doing this in R-studio.
First, we need to create a new column in the attributes file that indicates if an individual
has a tie with the VP or not. Specifically, we need a new column called
“ties_with_VP” in the data object “attributes”, which we refer to as
attributes$ties_with_VP. This column does not exist yet; R-studio creates it as soon as we
assign values to it. We subsequently need to tell R-studio which values (from which
data source) we want to use to populate this new column. In our case, the new column should
indicate whether the individual to which each row belongs (please note that each row refers to
a different individual in the attributes file) has a tie with the VP (1=yes, 0=no). Information on
whether individuals have ties with the VP is captured in the socio-matrix that we labeled
“network_data_sociomatrix”. To access this data, we need to refer to the column in the
socio-matrix that specifies whether individuals have a tie with the alter “VP”. Importantly, that
information is located in the column with the name “VP”. To refer to this location in the
socio-matrix, we use the [x,y] format, where x refers to the rows in the matrix and y refers to the
columns in the matrix (please see 5.1.2 for details). In our case, we specify
network_data_sociomatrix[,"VP"] and use the data that is located in this column to
populate our new column “ties_with_VP” using the code shown below:
attributes$ties_with_VP<-network_data_sociomatrix[,"VP"]
After inspecting this new column (by executing the code: attributes$ties_with_VP), we
learn that the VP has a value of 0 in the new column. This means that the VP will be excluded
from his or her own ego network when we would filter based on the “ties_with_VP” attribute.
To fix that, we can assign a value of 1 to the VP’s cell in the “ties_with_VP” attribute within
the attributes data object:
attributes$ties_with_VP[attributes$vertex.names=="VP"]<-1
3 http://www.analytictech.com/networks/egonet.htm
Once we have added the “ties_with_VP” attribute to the attributes data object, we use
set.vertex.attribute to attach the updated attributes data object to the socio-matrix
(see 4.3.3). Afterwards, we can use get.inducedSubgraph to create a new socio-matrix that
excludes vertices that have no ties with the VP (see below). We label this socio-matrix
“ego_network_VP”. In the final step, we plot this ego network using gplot. We should then
end up with Figure 7.
set.vertex.attribute(network_data_sociomatrix,names(attributes),
attributes)
ego_network_VP<-get.inducedSubgraph(network_data_sociomatrix,
which(network_data_sociomatrix %v% "ties_with_VP" == 1))
gplot(ego_network_VP,displaylabels=TRUE)
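The three steps of the ego network procedure (read the ego’s column, add the ego itself, keep only the selected vertices) can also be traced on a plain matrix. The sketch below uses a made-up five-person socio-matrix (“toy_matrix”) and base R only; it mirrors the logic of the code above rather than replacing it.

```r
# Toy undirected socio-matrix (hypothetical): VP is tied to LS1 and LS2,
# LS1 is tied to CE1, and CE2 is tied to CE1 only.
members <- c("VP", "LS1", "LS2", "CE1", "CE2")
toy_matrix <- matrix(0, nrow = 5, ncol = 5, dimnames = list(members, members))
ties <- rbind(c("VP", "LS1"), c("VP", "LS2"), c("LS1", "CE1"), c("CE1", "CE2"))
for (i in seq_len(nrow(ties))) {
  toy_matrix[ties[i, 1], ties[i, 2]] <- 1
  toy_matrix[ties[i, 2], ties[i, 1]] <- 1
}

# Step 1: the column "VP" tells us who has a tie with the VP (1 = yes, 0 = no).
ties_with_VP <- toy_matrix[, "VP"]

# Step 2: the VP has no tie with him- or herself, so we set that cell to 1 by hand.
ties_with_VP[names(ties_with_VP) == "VP"] <- 1

# Step 3: keep only the rows and columns of the ego and the alters.
in_ego_network <- which(ties_with_VP == 1)
ego_matrix <- toy_matrix[in_ego_network, in_ego_network]

print(rownames(ego_matrix))
print(ego_matrix)
```

CE1 and CE2 drop out because they have no direct tie with the VP, even though CE1 can reach the VP through LS1.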
Let’s practice with calculating the team ego network for “department 9” (i.e., DEP9) in the
case organization that we have been analyzing up to this point. First, we need to figure out
who works in this department. We can do this by displaying the attribute data object we
loaded before. Execute the code attributes and you get the list shown below4. Looking at
this output, it becomes clear that DEP9 has 12 members, CE1 to CE12. In addition, from other
company records, we know that DEP9 is supervised by LS2 (see Figure 1). Please note that
these members are located in rows 3 & 9-20 of the attributes file. Importantly, the rows and
columns of the socio-matrix mimic this order of individuals in the attributes list. Remember
from 3.2 that the rows in the socio-matrix indicate the “ego” (i.e., the employees) and the
columns of the socio-matrix specify their “alters” (i.e., their partners). An ego has a tie with an
alter when the socio-matrix shows a 1 in the cell where the row of the ego meets the column
of the alter. Translating this to our case organization, this means that an individual has a tie
with DEP9 when he/she has at least one non-zero value in columns 3 & 9-20. These are the
individuals we need to include in DEP9’s ego network.
attributes
vertex.names dep
1 VP 1
2 LS1 2
3 LS2 3
4 LS3 4
5 LS4 5
6 LS5 6
7 LS6 7
8 LS7 8
9 CE1 9
10 CE2 9
11 CE3 9
12 CE4 9
13 CE5 9
14 CE6 9
15 CE7 9
16 CE8 9
17 CE9 9
18 CE10 9
19 CE11 9
20 CE12 9
...
Now that we know how to locate partners (i.e., alters) of DEP9, we can calculate a new
attribute that indicates whether an individual has a tie with DEP9. To create this new attribute,
we follow the same procedure we discussed in 5.3.1 and add a new column to the
attributes data object. We label this column “ties_with_team” and refer to it as
attributes$ties_with_team. We calculate this attribute as the number of ties an
employee has with DEP9 members. To do so, we simply sum up the 1s that are listed in
columns 3 & 9-20 for each employee using rowSums. The c(3,9:20) code allows us to refer
to columns 3 and 9:20 in a single code line (see 5.1.2). We then assign the value from
the rowSums function to the new attribute:
attributes$ties_with_team<-
rowSums((network_data_sociomatrix[,c(3,9:20)]),na.rm = F)
4 See 4.3.3 for information on how to load the attributes file.
Next, we assign a 1 to the new “ties_with_team” attribute for all the team members of DEP9,
because we want to make sure that they will be included in the ego network (even when they
did not interact with their fellow team members):
attributes$ties_with_team[attributes$vertex.names=="CE1"]<-1
attributes$ties_with_team[attributes$vertex.names=="CE2"]<-1
attributes$ties_with_team[attributes$vertex.names=="CE3"]<-1
attributes$ties_with_team[attributes$vertex.names=="CE4"]<-1
attributes$ties_with_team[attributes$vertex.names=="CE5"]<-1
attributes$ties_with_team[attributes$vertex.names=="CE6"]<-1
attributes$ties_with_team[attributes$vertex.names=="CE7"]<-1
attributes$ties_with_team[attributes$vertex.names=="CE8"]<-1
attributes$ties_with_team[attributes$vertex.names=="CE9"]<-1
attributes$ties_with_team[attributes$vertex.names=="CE10"]<-1
attributes$ties_with_team[attributes$vertex.names=="CE11"]<-1
attributes$ties_with_team[attributes$vertex.names=="CE12"]<-1
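The twelve assignments above can also be written as a single line. The sketch below is one possible shorthand, shown on a made-up table (“attributes_demo”); it assumes that the team members are labeled “CE1” to “CE12”, as in the attributes list.

```r
# Toy attributes table (hypothetical values).
attributes_demo <- data.frame(
  vertex.names = c("VP", "LS2", paste0("CE", 1:12), "CE13"),
  ties_with_team = 0,
  stringsAsFactors = FALSE
)

# paste0 builds the labels "CE1" ... "CE12" in one step.
team_members <- paste0("CE", 1:12)

# %in% is TRUE for every row whose vertex.names value appears in team_members.
attributes_demo$ties_with_team[attributes_demo$vertex.names %in% team_members] <- 1

print(attributes_demo)
```

Only the twelve listed members receive a 1; the rows of “VP”, “LS2” and “CE13” keep their original value.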
Subsequently, we attach the updated attributes data object to the socio-matrix (see code
below). We can then create the DEP9 ego network by defining a new network that includes
only vertices from the organizational network that have a score of at least 1 (>= 1) on the
“ties_with_team” attribute. As before, we use get.inducedSubgraph for this purpose.
set.vertex.attribute(network_data_sociomatrix,names(attributes),
attributes)
ego_network_team<-get.inducedSubgraph(network_data_sociomatrix,
which(network_data_sociomatrix %v% "ties_with_team" >= 1))
gplot(ego_network_team,displaylabels=TRUE)
Finally, we can plot the ego network of DEP9 using gplot. The result should look like Figure 8.
Based on this figure, we can conclude that DEP9 has a highly centralized network, as all ties
are mediated by the supervisor (LS2). Also, all external team ties are mediated by LS2.
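A note on thresholds such as the “at least 1” filter used for this ego network: it is safer to write the threshold as a number than as a quoted value. The sketch below uses made-up tie counts (“tie_counts”) to show what can go wrong with a quoted threshold, because R then compares the values as text.

```r
# Toy tie counts (hypothetical).
tie_counts <- c(0, 1, 2, 10)

# Numeric threshold: the comparison is done on numbers.
numeric_result <- tie_counts >= 2

# Quoted threshold: R converts the counts to text first and compares them
# character by character, so "10" is treated as smaller than "2".
text_result <- tie_counts >= "2"

print(numeric_result)
print(text_result)
```

With a quoted threshold of "1", the text comparison happens to give the same answer as the numeric one for whole, non-negative counts, but this no longer holds for other thresholds.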
6 CALCULATING NETWORK STATISTICS
6.1 Organization-level descriptives
It is good practice to get a ‘feel’ for the general properties of an organization’s overall network,
once we have successfully loaded the data in R-studio. There are five network properties that
give a good overview of a network: size, density, centralization, fragmentation, and degree of
separation. The following sections will explain what these properties represent on a
conceptual level and how to calculate these network statistics using R-studio.
network.size(network_data_sociomatrix)
[1] 99
Alternatively, we can type in the name of the socio-matrix and execute the code. This will give
us overall statistics on the organization’s network, including the number of vertices, number
of edges, and whether the network is directed or not. Using this approach, we again find that
the network labeled “network_data_sociomatrix” comprises 99 vertices:
network_data_sociomatrix
Network attributes:
vertices = 99
directed = FALSE
hyper = FALSE
loops = FALSE
multiple = FALSE
bipartite = FALSE
total edges= 210
missing edges= 0
non-missing edges= 210
No edge attributes
summary(network_data_sociomatrix,print.adj = FALSE)
Network attributes:
vertices = 99
directed = FALSE
hyper = FALSE
loops = FALSE
multiple = FALSE
bipartite = FALSE
total edges = 210
missing edges = 0
non-missing edges = 210
density = 0.04329004
Vertex attributes:
dep:
integer valued attribute
99 values
ties_with_team:
numeric valued attribute
attribute summary:
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0000 0.0000 0.0000 0.3333 0.0000 12.0000
ties_with_VP:
numeric valued attribute
attribute summary:
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0000 0.0000 0.0000 0.2222 0.0000 1.0000
vertex.names:
character valued attribute
99 valid vertex names
Edge attributes:
closeness:
integer valued attribute
210 values
gden(network_data_sociomatrix)
[1] 0.04329004
We obtain a network density score of .0433, which means that 4.33% of all possible network
ties are actually activated in the organization. Based on this information, it is possible to
conclude that employees, in general, are not heavily connected with each other. There is a
good indication that at least some parts of the organization maintain little to no contact with
each other. This provides a good reason for conducting follow-up analyses (e.g., by looking at
the ego networks of teams or individuals) to determine which parts that might be.
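The density score can be checked by hand. The sketch below assumes the usual definition of density for an undirected network without loops (observed ties divided by the number of possible ties) and uses the vertex and edge counts reported above; the variable names are our own.

```r
n_vertices <- 99   # vertices reported for network_data_sociomatrix
n_edges <- 210     # edges reported for network_data_sociomatrix

# An undirected network without loops has n * (n - 1) / 2 possible ties.
possible_ties <- n_vertices * (n_vertices - 1) / 2

density_by_hand <- n_edges / possible_ties

print(possible_ties)
print(density_by_hand)
```

The result (210 out of 4851 possible ties) corresponds to the value returned by gden.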
have many direct ties with each other. Instead, employees connect with each other through a
central intermediate person. In highly decentralized networks, on the other hand, employees
connect directly with each other without going through an intermediary person.
Centralization has advantages and disadvantages. High centralization can enable efficiency
and oversight, as employees do not have to spend much of their time on interacting with other
individuals (they can just talk to one or a few central persons). Moreover, the central person
is in a unique position to develop a ‘big picture’ of what is going on in the organization and,
subsequently, to distribute that knowledge throughout the organization. After all, this person
is connected to a large number of employees. On the negative side, the central persons may
become overloaded and employees may miss the opportunity to engage in open problem
solving and debate with fellow colleagues. Hence, innovation and creativity may decrease in
centralized organizations. Whether the benefits outweigh the detriments of centralization
depends on the organization’s strategy and goals.
We can use centralization to calculate the degree to which ties between employees are
mediated by (a) central person(s). This will return a value ranging between 0 (the network is
highly decentralized; everyone has direct ties with everyone else in the organization) to a
theoretical maximum of 1 (all ties are mediated by one single central person).
centralization(network_data_sociomatrix,degree)
[1] 0.1953503
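To get a feeling for where this number comes from, we can redo the calculation by hand. The sketch below assumes that the score follows Freeman’s definition of degree centralization (the summed differences between the highest degree and everyone’s degree, divided by the largest possible sum); this is our assumption, although the arithmetic does reproduce the value shown above. It uses the vertex count, edge count, and highest degree score (23) reported in this manual.

```r
n_vertices <- 99    # vertices in network_data_sociomatrix
n_edges <- 210      # edges in network_data_sociomatrix
max_degree <- 23    # highest degree centrality reported for the network

# Each undirected tie adds 1 to the degree of two individuals.
sum_of_degrees <- 2 * n_edges

# Numerator: how far every individual falls short of the most central one.
sum_of_differences <- n_vertices * max_degree - sum_of_degrees

# Denominator: the largest possible value, reached by a star-shaped network
# in which one person is tied to everyone and nobody else is tied to each other.
max_sum_of_differences <- (n_vertices - 1) * (n_vertices - 2)

degree_centralization <- sum_of_differences / max_sum_of_differences
print(degree_centralization)
```

A score of roughly .20 therefore means that the network is about one fifth of the way between a fully even distribution of ties and a perfect star.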
components(network_data_sociomatrix)
[1] 1
of 1 indicates that all employees are directly connected with each other, as they are separated
by one tie at the most (their direct tie) to any other individual. A maximum degree of
separation of 10, on the other hand, indicates that at least some individuals need to go
through 9 other individuals (connecting to friends of friends of friends) to reach some of their
colleagues. Besides the maximum, we can also calculate the average degree of separation.
This indicates how many intermediate colleagues an employee, on average, needs to go
through in order to connect to a specific other employee in the organization.
There could be problems in the organization if we find a very low or a very high average and
maximum degree of separation. When the degree of separation is very low, employees may
have so many direct contacts that they may suffer from “collaboration overload”. When that
happens, individuals spend so much time collaborating with each other that they are left with
little time to finish their actual work in the organization. By contrast, when the degree of
separation is very high, it could take very long before important information and know-how
reaches all individuals in the organization, because such information and know-how has to go
through many intermediary contacts. Again, we may need to execute follow-up organizational
network analyses to get to the bottom of this.
When the organization network comprises one component (i.e., there is no fragmentation,
see 6.1.4), we can directly calculate the degree of separation. To do so, we first create a new
matrix that indicates the degree of separation between all possible dyads in the overall
organization. This can be done by using geodist, followed by the label of the organization’s
socio-matrix. This will store a matrix called “gdist” in the data object we defined (here:
“degree_of_separation”). Next, we can calculate the average and maximum degree of
separation using mean and max:
degree_of_separation<-geodist(network_data_sociomatrix)
mean(degree_of_separation$gdist)
[1] 2.913988
max(degree_of_separation$gdist)
[1] 5
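What a matrix of degrees of separation looks like can be shown on a tiny example. The sketch below computes shortest path lengths for a made-up chain of five people with a standard shortest-path routine (Floyd-Warshall) in base R; it illustrates the idea and is not the routine that geodist uses internally.

```r
# Toy undirected chain A - B - C - D - E (hypothetical).
members <- c("A", "B", "C", "D", "E")
toy_matrix <- matrix(0, nrow = 5, ncol = 5, dimnames = list(members, members))
for (i in 1:4) {
  toy_matrix[i, i + 1] <- 1
  toy_matrix[i + 1, i] <- 1
}

# Start from direct ties (distance 1); unconnected pairs start at Inf.
distances <- ifelse(toy_matrix == 1, 1, Inf)
diag(distances) <- 0

# Floyd-Warshall: allow each individual k in turn to act as an intermediary.
for (k in seq_along(members)) {
  via_k <- outer(distances[, k], distances[k, ], "+")
  distances <- pmin(distances, via_k)
}

max_separation <- max(distances)
mean_with_diagonal <- mean(distances)
mean_without_diagonal <- mean(distances[row(distances) != col(distances)])

print(distances)
print(c(max_separation, mean_with_diagonal, mean_without_diagonal))
```

Note that taking the mean over the whole matrix also averages in the zeros on the diagonal (each person’s distance to him- or herself), which lowers the average somewhat; the last line of the calculation shows the mean without the diagonal for comparison.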
When the overall network comprises several components, it is not possible to directly
calculate the degree of separation. This is because subgroups are not connected and it is, thus,
impossible for employees to reach all other employees in the organization. Calculating the
overall degree of separation then does not make sense. We can, however, calculate the
degree of separation within the largest component in the organization:
network_largest_component<-
component.largest(network_data_sociomatrix,result="graph")
degree_of_separation_component<-geodist(network_largest_component)
max(degree_of_separation_component$gdist)
[1] 5
mean(degree_of_separation_component$gdist)
[1] 2.913988
Here, the degree of separation of the largest component equals the degree of separation in
the overall network, because there is only one component in the overall network.
The code for calculating the degree centrality scores of individuals in the network labeled
“network_data_sociomatrix” is shown below. Based on the outcome from this analysis, we can
conclude that the first individual listed in the network has one of the highest degree centrality
scores (i.e., 21 ties). This makes sense, considering that this is the VP of the organization. The
following 7 individuals also have relatively high degree centrality scores, and the individual in
position 6 even has the highest score in the network (i.e., 23 ties). This is also to be expected,
because these individuals are the department supervisors. Unexpectedly, however, there are
also a number of operational employees with a high degree centrality (e.g., the persons in
positions 36-48) with centrality scores of 12 to 13 ties.
degree(network_data_sociomatrix,gmode="graph")
[1] 21 12 20 12 20 23 15 19 1 1 1 1 1 1 1 1 1 1 1 2 1 1 14 1 5 1 1 2 1 1 1 1 1
[34] 1 1 12 13 13 13 13 13 13 13 13 13 13 12 13 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2
[67] 2 1 1 1 1 1 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 8 1 1 1 1 1 1 1 1 1
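Reading the highest score from a long list of numbers is error-prone, so it can help to let R do the ranking. The sketch below copies the first eight degree scores from the output above and labels them with the names found in the first eight rows of the attributes list; the object names are our own.

```r
# Degree scores of the first eight individuals, copied from the output above.
degree_scores <- c(VP = 21, LS1 = 12, LS2 = 20, LS3 = 12,
                   LS4 = 20, LS5 = 23, LS6 = 15, LS7 = 19)

# Sort from most to least connected.
ranked <- sort(degree_scores, decreasing = TRUE)

# which.max returns the position (and name) of the highest score.
most_connected <- names(which.max(degree_scores))

print(ranked)
print(most_connected)
```

The same two functions can be applied to the full vector returned by degree, and likewise to the closeness and betweenness scores.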
closeness(network_data_sociomatrix,gmode="graph")
[1] 0.5505618 0.4000000 0.5212766 0.4666667 0.4851485 0.5297297 0.5077720 0.4537037 0.3438596 0.3438596
[11] 0.3438596 0.3438596 0.3438596 0.3438596 0.3438596 0.3438596 0.3438596 0.3438596 0.3438596 0.3698113
[21] 0.3344710 0.3344710 0.5000000 0.3344710 0.3426573 0.3344710 0.3192182 0.3391003 0.2558747 0.2558747
[31] 0.2558747 0.3192182 0.3192182 0.3192182 0.3192182 0.3402778 0.3414634 0.3414634 0.3414634 0.3414634
[41] 0.3414634 0.3414634 0.3414634 0.3414634 0.3414634 0.3414634 0.3402778 0.3414634 0.3475177 0.3475177
[51] 0.3475177 0.3475177 0.3475177 0.3475177 0.3475177 0.3475177 0.3475177 0.3475177 0.3475177 0.3475177
[61] 0.3475177 0.3475177 0.3475177 0.3576642 0.3576642 0.3576642 0.3576642 0.3563636 0.3563636 0.3563636
[71] 0.3563636 0.3563636 0.3576642 0.3576642 0.3576642 0.3130990 0.3130990 0.3130990 0.3130990 0.3130990
[81] 0.3130990 0.3130990 0.3130990 0.3130990 0.3130990 0.3130990 0.3130990 0.3130990 0.3130990 0.4900000
[91] 0.2865497 0.2865497 0.2865497 0.2865497 0.2865497 0.2865497 0.2865497 0.2865497 0.2865497
Looking at the results, we see that the first person listed in the network (the VP) has the
highest closeness centrality score (.55), followed by the individuals in positions 6 (.53) and 3 (.52).
betweenness(network_data_sociomatrix,gmode="graph")
[1] 1.155262e+03 8.384286e+02 1.151929e+03 4.764286e+02 1.106512e+03 1.448929e+03 3.719286e+02 1.267000e+03 0.000000e+00
[10] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
[19] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 1.025333e+03 0.000000e+00 2.880000e+02 0.000000e+00 0.000000e+00
[28] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
[37] 8.333333e-02 8.333333e-02 8.333333e-02 8.333333e-02 8.333333e-02 8.333333e-02 8.333333e-02 8.333333e-02 8.333333e-02
[46] 8.333333e-02 0.000000e+00 8.333333e-02 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
[55] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
[64] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
[73] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
[82] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 2.983333e+02
[91] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
It appears that the individual in position 6 (LS5) has the highest betweenness centrality score
(i.e., a score of 1449), even surpassing the VP, who had the highest closeness
centrality score.
cutpoints(network_data_sociomatrix, mode="graph")
[1] 1 2 3 4 5 6 8 23 25
To check whether the removal of one of these cutpoints indeed results in more fragmentation
in the organization, we can use the following procedure. First, we create a new
subnetwork (using get.inducedSubgraph) that excludes the identified cutpoint. In this case, we
will remove the person in position 1 (the VP). Next, we assess the number of components in
the organizational network including the VP and compare it to the number of components in the
subnetwork. As shown below, the number of components increases from
1 to 6 when the VP is dropped from the organizational network.
Network_excluding_VP = get.inducedSubgraph(network_data_sociomatrix,
which(network_data_sociomatrix %v% "vertex.names" != "VP"))
components(network_data_sociomatrix)
[1] 1
components(Network_excluding_VP)
[1] 6
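The same before-and-after comparison can be traced on a small matrix. The sketch below counts components with a simple breadth-first search in base R; the helper count_components and the four-person matrix are made up for illustration and are not functions or data from the packages used in this manual.

```r
# Hypothetical helper: counts the components of an undirected socio-matrix.
count_components <- function(adjacency) {
  n <- nrow(adjacency)
  visited <- rep(FALSE, n)
  n_components <- 0
  for (start in seq_len(n)) {
    if (visited[start]) next
    # An unvisited individual starts a new component.
    n_components <- n_components + 1
    queue <- start
    visited[start] <- TRUE
    while (length(queue) > 0) {
      current <- queue[1]
      queue <- queue[-1]
      # Everyone tied to the current individual belongs to the same component.
      neighbours <- which(adjacency[current, ] > 0 & !visited)
      visited[neighbours] <- TRUE
      queue <- c(queue, neighbours)
    }
  }
  n_components
}

# Toy network (hypothetical): H is tied to A and B, and A is tied to C.
members <- c("H", "A", "B", "C")
toy_matrix <- matrix(0, nrow = 4, ncol = 4, dimnames = list(members, members))
toy_matrix["H", "A"] <- toy_matrix["A", "H"] <- 1
toy_matrix["H", "B"] <- toy_matrix["B", "H"] <- 1
toy_matrix["A", "C"] <- toy_matrix["C", "A"] <- 1

components_with_H <- count_components(toy_matrix)
components_without_H <- count_components(toy_matrix[-1, -1, drop = FALSE])

print(c(components_with_H, components_without_H))
```

Removing H splits the toy network into two parts (A with C, and B on its own), which is exactly what makes H a cutpoint.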
To calculate this, we add 8 new columns to the attributes data object, which we call
“ties_with_TMT” to “ties_with_DEP15”. We then populate these columns with the number of ties
an individual has within the respective department. For example, “ties_with_TMT” should
note how many ties a person has with TMT members. For this purpose, we will use rowSums
and calculate for each row the sum of the columns that correspond to the alters within the
different teams. Please remember from 3.2 that rows represent the ‘egos’ in a socio-matrix
and the columns represent the ‘alters’, so by calculating the row sum for specific columns, we
actually calculate egos’ sum of ties with certain alters (i.e., those that correspond to the specified
column numbers). By inspecting the attributes data object, we know that the VP and LS1-7 (as
alters) are positioned in columns 1:8. These columns thus specify if an individual has a tie with
TMT members or not (0=no, 1=yes). Correspondingly, by taking the sum of columns 1:8 for
each row, we get a value that indicates how many ties a person has with TMT members. The
column ranges of different departments should not overlap, because a tie would otherwise be
counted twice: column 9 belongs to CE1 and is therefore part of DEP9 (columns 9:20). We
then repeat this for the other departments as follows:
attributes$ties_with_TMT<-rowSums((network_data_sociomatrix[,1:8]),na.rm = F)
attributes$ties_with_DEP9<-rowSums((network_data_sociomatrix[,9:20]),na.rm = F)
attributes$ties_with_DEP10<-rowSums((network_data_sociomatrix[,21:35]),na.rm = F)
attributes$ties_with_DEP11<-rowSums((network_data_sociomatrix[,36:48]),na.rm = F)
attributes$ties_with_DEP12<-rowSums((network_data_sociomatrix[,49:63]),na.rm = F)
attributes$ties_with_DEP13<-rowSums((network_data_sociomatrix[,64:75]),na.rm = F)
attributes$ties_with_DEP14<-rowSums((network_data_sociomatrix[,76:89]),na.rm = F)
attributes$ties_with_DEP15<-rowSums((network_data_sociomatrix[,90:99]),na.rm = F)
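Typing column ranges by hand works, but it is easy to make a mistake in one of the boundaries. As a possible alternative, the sketch below derives the columns of each department from a department vector, so that every column is used exactly once. The six-person matrix, the vector “dep”, and the object names are made up for illustration.

```r
# Toy data (hypothetical): six individuals in three departments.
members <- c("VP", "LS1", "CE1", "CE2", "CE3", "CE4")
dep <- c(1, 1, 9, 9, 10, 10)
toy_matrix <- matrix(0, nrow = 6, ncol = 6, dimnames = list(members, members))
ties <- rbind(c(1, 2), c(2, 3), c(2, 4), c(1, 5), c(5, 6), c(3, 4))
toy_matrix[ties] <- 1          # fill cell [ego, alter] for every listed tie
toy_matrix[ties[, 2:1]] <- 1   # mirror the ties, because the network is undirected

# split() groups the column numbers by department, so no column can be
# assigned to two departments by accident.
columns_by_dep <- split(seq_along(dep), dep)

# For each department, sum the ties every ego has with the alters in it.
ties_by_dep <- sapply(columns_by_dep, function(cols) {
  rowSums(toy_matrix[, cols, drop = FALSE])
})
colnames(ties_by_dep) <- paste0("ties_with_DEP", names(columns_by_dep))

# Check: the department counts add up to each individual's total ties.
total_ties <- rowSums(ties_by_dep)

print(ties_by_dep)
print(total_ties)
```

Comparing total_ties with rowSums(toy_matrix) is a quick way to verify that no tie has been counted twice or left out.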
Based on the newly calculated attributes, we can then calculate each individual’s total ties
within the organization, again using rowSums:
attributes$total_ties<-rowSums(attributes[,c("ties_with_TMT",
"ties_with_DEP9",
"ties_with_DEP10",
"ties_with_DEP11",
"ties_with_DEP12",
"ties_with_DEP13",
"ties_with_DEP14",
"ties_with_DEP15")])
Next, we calculate individuals’ ties with direct colleagues inside their own department. We
use ifelse for this purpose and specify that individuals’ score on “attributes$internal_ties”
should be derived from the column “attributes$ties_with_TMT” if the individual is part of the
management team (i.e., has a dep value below 9), from the column “attributes$ties_with_DEP9”
if the individual works in DEP9, from the column “attributes$ties_with_DEP10” if the individual
works in DEP10, etc.:
attributes$internal_ties<-ifelse(attributes$dep<9,attributes$ties_with_TMT,
ifelse(attributes$dep == 9,attributes$ties_with_DEP9,
ifelse(attributes$dep == 10,attributes$ties_with_DEP10,
ifelse(attributes$dep == 11,attributes$ties_with_DEP11,
ifelse(attributes$dep == 12,attributes$ties_with_DEP12,
ifelse(attributes$dep == 13,attributes$ties_with_DEP13,
ifelse(attributes$dep == 14,attributes$ties_with_DEP14,
ifelse(attributes$dep == 15,attributes$ties_with_DEP15,
NA))))))))
Finally, we calculate the number of external_ties (i.e., ties between the individual and alters
from other departments) as a function of the individual’s total ties and internal ties:
attributes$external_ties<-(attributes$total_ties-attributes$internal_ties)
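The split into internal and external ties can also be obtained without a chain of ifelse statements. The sketch below is one possible alternative on made-up data: it builds a matrix that marks which pairs of individuals share a department and uses it to keep only the internal ties. Unlike the code above, it treats every distinct dep value as its own department, so the management team would first need a common dep value.

```r
# Toy data (hypothetical): five individuals in two departments.
members <- c("LS1", "CE1", "CE2", "CE3", "CE4")
dep <- c(9, 9, 9, 10, 10)
toy_matrix <- matrix(0, nrow = 5, ncol = 5, dimnames = list(members, members))
ties <- rbind(c(1, 2), c(1, 3), c(1, 4), c(4, 5))
toy_matrix[ties] <- 1          # fill cell [ego, alter] for every listed tie
toy_matrix[ties[, 2:1]] <- 1   # mirror the ties, because the network is undirected

# same_dep[i, j] is TRUE when individuals i and j work in the same department.
same_dep <- outer(dep, dep, "==")

total_ties <- rowSums(toy_matrix)
internal_ties <- rowSums(toy_matrix * same_dep)   # ties inside the own department
external_ties <- total_ties - internal_ties       # ties with other departments

print(data.frame(vertex.names = members, internal_ties, external_ties, total_ties))
```

In the toy data, LS1 has two internal ties (CE1 and CE2) and one external tie (CE3).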
Once we complete the calculation of the internal, external, and total ties of all individuals in
the network, we can print a list from the attributes data object with each individual’s score:
attributes[,c("vertex.names","internal_ties","external_ties","total_ties")]
vertex.names internal_ties external_ties total_ties
1 VP 7 14 21
2 LS1 1 11 12
3 LS2 7 14 21
4 LS3 5 7 12
5 LS4 5 15 20
6 LS5 6 17 23
7 LS6 6 9 15
8 LS7 4 15 19
9 CE1 0 1 1
10 CE2 0 1 1
...
In addition, we can calculate the minimum, maximum, and average scores across individuals
on the new attribute variables. This allows us to compare a specific individual’s score to the
average and determine his or her network position against that of the average employee:
summary(attributes$total_ties)
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.000 1.000 1.000 4.253 2.000 23.000
summary(attributes$internal_ties)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.000 0.000 0.000 2.172 1.000 12.000
summary(attributes$external_ties)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.000 1.000 1.000 2.081 1.000 17.000
Tie diversity = $1 - \sum_{i=1}^{k} p_i^2$,
where $p_i$ is the proportion of total ties with the ith group in the organization and k represents
the total number of groups.
In our case organization, we can determine individuals’ tie diversity by calculating whether
their ties are equally distributed across all 8 departments in the organization. We can use the
“total_ties” and “ties_with_TMT” to “ties_with_DEP15” columns from the attributes data object
for this. When implementing Blau’s diversity index in R-studio, we should end up with the code
shown below. This code adds a new column to the attributes data object, called “tie_diversity”.
attributes$tie_diversity=1-(
(attributes$ties_with_TMT/attributes$total_ties)^2+
(attributes$ties_with_DEP9/attributes$total_ties)^2+
(attributes$ties_with_DEP10/attributes$total_ties)^2+
(attributes$ties_with_DEP11/attributes$total_ties)^2+
(attributes$ties_with_DEP12/attributes$total_ties)^2+
(attributes$ties_with_DEP13/attributes$total_ties)^2+
(attributes$ties_with_DEP14/attributes$total_ties)^2+
(attributes$ties_with_DEP15/attributes$total_ties)^2)
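To see how the index behaves, it helps to apply it to a few simple cases. The sketch below wraps the calculation (1 minus the sum of the squared tie proportions, as in the code above) in a small helper function; the function name blau_index and the tie counts are made up for illustration.

```r
# Blau's index for one individual: 1 minus the sum of squared tie proportions.
blau_index <- function(tie_counts) {
  proportions <- tie_counts / sum(tie_counts)
  1 - sum(proportions^2)
}

# Hypothetical tie counts across four groups.
all_in_one_group <- blau_index(c(4, 0, 0, 0))    # no diversity
two_equal_groups <- blau_index(c(1, 1, 0, 0))    # ties split evenly over two groups
four_equal_groups <- blau_index(c(2, 2, 2, 2))   # ties split evenly over all four groups

print(c(all_in_one_group, two_equal_groups, four_equal_groups))
```

The index is 0 when all ties fall within one group and reaches its maximum of 1 - 1/k when ties are spread evenly over all k groups (.75 for four groups, .875 for the eight groups in our case organization). Please note that the index cannot be calculated for an individual without any ties, because this would require dividing by zero.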
When we inspect our newly calculated tie diversity attribute, we can see that a lot of
individuals have a score of 0. This means that they only have ties within one department
(probably their own department). A few individuals, on the other hand, do have high tie
diversity. The individual labeled “LS6” has the most diverse collection of ties, with a tie
diversity score of .61 (see below).
attributes[,c("vertex.names", "tie_diversity")]
vertex.names tie_diversity
1 VP 0.5578231
2 LS1 0.2916667
3 LS2 0.5578231
4 LS3 0.5694444
5 LS4 0.5100000
6 LS5 0.5028355
7 LS6 0.6133333
8 LS7 0.4099723
9 CE1 0.0000000
10 CE2 0.0000000
11 CE3 0.0000000
12 CE4 0.0000000
13 CE5 0.0000000
14 CE6 0.0000000
15 CE7 0.0000000
16 CE8 0.0000000
17 CE9 0.0000000
18 CE10 0.0000000
19 CE11 0.0000000
20 CE12 0.5000000
...
In our case organization, we asked individual employees to indicate how closely they work
with each of their alters on a scale ranging from 1 (not close at all) to 5 (very close). With this
information we can calculate how close an individual is, on average, with his or her alters. A
high average closeness value indicates that the individual has built deep and meaningful ties
with his or her alters. The information on tie closeness is stored in the file
“company_edge_closeness.csv”. To add this information as an edge attribute to the socio-
matrix, we can use the code below. This will first load the tie closeness information as a data
object called “edge”. Please note that we load this data using as.matrix with header=F to
make sure that the matrix with edge information is completely square and that the number
of rows and columns correspond to that of the socio-matrix. In the next line of code, we add
the edge information to the socio-matrix “network_data_sociomatrix” as an edge attribute
called “closeness”. For this purpose, we use set.edge.value (see 4.3.4).
network_data_sociomatrix<-
set.edge.value(network_data_sociomatrix,'closeness',edge)
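The loading step described above can be illustrated with a small, self-contained sketch. It does not use the case file; instead it writes a hypothetical 3x3 closeness matrix to a temporary file and reads it back, to show why header=F matters for keeping the matrix square. The object names ending in "_toy" and "_bad" are illustrative assumptions.

```r
# Hypothetical 3x3 closeness matrix (0 = no tie); not the case data
toy <- matrix(c(0, 4, 2,
                4, 0, 0,
                2, 0, 0), nrow = 3, byrow = TRUE)

# Write the values to a temporary .csv file without row or column names
tmp <- tempfile(fileext = ".csv")
write.table(toy, tmp, sep = ",", row.names = FALSE, col.names = FALSE)

# header = FALSE treats the first line as data, so the matrix stays square
edge_toy <- as.matrix(read.csv(tmp, header = FALSE))
dim(edge_toy)

# header = TRUE would consume the first line as column names and drop a row
edge_bad <- as.matrix(read.csv(tmp, header = TRUE))
dim(edge_bad)
```

Comparing the two dim() results shows that only the header=F version has as many rows as columns.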
Next, we calculate the sum of all closeness scores for each row, using rowSums. Remember,
each row represents one ‘ego’ in the organizational network, so the row sum for one particular
row represents the overall score of one employee. For example, the row sum of the first row
corresponds to the overall closeness score of the VP of the organization. To get to an average
closeness score, we divide the overall closeness score by the individual’s total number of ties
(which we calculated in 6.2.2.1 as “attributes$total_ties”). We add the result to our attributes
data object as a new column labeled “avg_closeness”:
attributes$avg_closeness<-
rowSums(as.sociomatrix(network_data_sociomatrix,attrname="closeness")[,])/
attributes$total_ties
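To make the arithmetic concrete, here is a minimal base-R sketch on a hypothetical four-person closeness matrix (the names A to D and all values are invented for illustration). It assumes that a cell value of 0 means "no tie", so the number of ties per ego can be counted as the number of non-zero cells in a row.

```r
# Hypothetical closeness ratings (0 = no tie, 1-5 = closeness of the tie)
people <- c("A", "B", "C", "D")
closeness_toy <- matrix(c(0, 5, 3, 1,
                          5, 0, 0, 0,
                          3, 0, 0, 4,
                          1, 0, 4, 0),
                        nrow = 4, byrow = TRUE,
                        dimnames = list(people, people))

# Number of ties per ego: count the non-zero cells in each row
total_ties_toy <- rowSums(closeness_toy > 0)

# Average closeness: row sum of closeness scores divided by the number of ties
avg_closeness_toy <- rowSums(closeness_toy) / total_ties_toy
avg_closeness_toy
```

Note that an ego without any ties would produce NaN here (0 divided by 0), so isolates may need separate handling.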
We can then use the newly calculated avg_closeness column in the attributes data object to
discover individuals’ network quality. In addition, we can compare individuals’ scores to the
mean average closeness score within the organization using summary:
summary(attributes$avg_closeness)
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.000 2.692 3.077 3.313 4.000 5.000
attributes[,c("vertex.names","avg_closeness")]
vertex.names avg_closeness
1 VP 3.238095
2 LS1 3.666667
3 LS2 3.095238
4 LS3 2.416667
5 LS4 3.000000
6 LS5 3.086957
7 LS6 3.000000
8 LS7 3.842105
9 CE1 3.000000
10 CE2 2.000000
11 CE3 1.000000
12 CE4 2.000000
13 CE5 4.000000
14 CE6 4.000000
15 CE7 5.000000
16 CE8 4.000000
17 CE9 3.000000
18 CE10 2.000000
19 CE11 3.000000
20 CE12 3.000000
...
summary(degree(network_data_sociomatrix,gmode="graph"))
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.000 1.000 1.000 4.242 2.000 23.000
summary(closeness(network_data_sociomatrix,gmode="graph"))
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.2559 0.3131 0.3427 0.3468 0.3475 0.5506
summary(betweenness(network_data_sociomatrix,gmode="graph"))
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.00 0.00 0.00 95.24 0.00 1448.93
attributes$degree<-degree(network_data_sociomatrix,gmode="graph")
attributes$closeness<-closeness(network_data_sociomatrix,gmode="graph")
attributes$betweenness<-betweenness(network_data_sociomatrix,gmode="graph")
attributes
vertex.names dep degree closeness betweenness
1 VP 1 21 0.5505618 1.155262e+03
2 LS1 2 12 0.4000000 8.384286e+02
3 LS2 3 20 0.5212766 1.151929e+03
4 LS3 4 12 0.4666667 4.764286e+02
5 LS4 5 20 0.4851485 1.106512e+03
6 LS5 6 23 0.5297297 1.448929e+03
7 LS6 7 15 0.5077720 3.719286e+02
...
Once these attributes have been added to the attributes data object, we can add this
information as vertex attributes, as we have done in 4.3.3. We can also export attributes as a
.csv file using write.csv. This will save the .csv file in the working directory.
Please note that the .csv format uses commas to delimit the columns in the table. To
access this data, we can import it in Microsoft Excel by going to the “Data” menu tab and then
selecting “From text”. Subsequently, select the .csv file on your computer and import it as a
comma-delimited file.
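The export step can be sketched as follows. This is a minimal example under stated assumptions: the small data frame, the file name "attributes_toy.csv", and the use of a temporary folder (rather than the working directory) are all illustrative choices, made so that the example runs without touching your project files.

```r
# Hypothetical excerpt of an attributes data object
attributes_toy <- data.frame(vertex.names = c("VP", "LS1", "CE1"),
                             degree = c(21, 12, 1))

# Write to a temporary folder; a bare file name would instead be
# saved in the working directory (see getwd)
out_file <- file.path(tempdir(), "attributes_toy.csv")
write.csv(attributes_toy, out_file, row.names = FALSE)

# Read the file back to confirm that the columns were exported as expected
check <- read.csv(out_file)
check
```

Setting row.names=FALSE leaves out the running row index, which write.csv would otherwise export as an extra first column.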
6.3 Department-level/team-level descriptives
Finally, we might be interested in calculating network statistics for teams or departments.
Based on this information, we can diagnose the networks of departments or teams and, if
necessary, develop interventions to enhance team or departmental effectiveness.
When evaluating departments or teams using ONA, we have to distinguish between the
internal and external network of the department or team. The internal network comprises all
connections between members of the respective department or team. Hence, the internal
network indicates how employees of the same team or department connect with each other
to complete work. The external network, on the other hand, comprises all connections
between members of the respective department or team with members from other
departments or teams. Hence, the external network reflects how the team or department
works with other groups in the organization to complete organization-wide tasks and to
exchange information across boundaries.
In the following sections, we will calculate department-level network statistics for the
departments in our case organization. Before we do this, we will recode the department
identifier in the attributes file so that all managers belong to team 1 (i.e., the top management
team). We label the new identifier “dep_recoded” and add it to the attributes:
attributes$dep_recoded<-attributes$dep
attributes$dep_recoded[attributes$dep_recoded<9]<-1
Internal group density = \frac{\sum_{i=1}^{g} C_{DM}(n_i)}{g_s (g_s - 1)},
where the numerator represents the sum of internal ties of all individual department
members g (denoted as C_{DM}(n_i)), and the denominator represents the theoretical maximum
number of ties that can emerge within the department (i.e., when all department members
have ties with each other) in a department with a total of g_s members. Each department
member can, in theory, maintain ties with anyone except him or herself. Hence, we calculate
the maximum number of ties of an individual department member as g_s - 1 (i.e., the
department size, excluding the focal department member). If we then multiply this by the
number of members that work in the department (i.e., g_s (g_s - 1)), we end up with the
theoretical maximum number of ties that can emerge in the department as a whole.
To implement this formula in R-Studio, we need the dplyr package. This package provides a
number of data manipulation options and can be used to summarize individual-level data by
group membership. As such, we can use this package to calculate department-level statistics,
such as group density. The first step is to install and load this package:
install.packages("dplyr")
library("dplyr")
In dplyr, we can use group_by, in combination with summarize, to specify that we want to
calculate group-level statistics (or in our case, department-level statistics). We can calculate
the average (mean), maximum value (max), minimum value (min), median value (median),
standard deviation (sd), and variance (var) of the scores of employees that work in the same
department, as identified by the grouping variable. In addition, we can count the number of
observations (i.e., employees) that share the same identifier value using n().
In the present case, we use dplyr to implement the internal group density formula and,
subsequently, to calculate a new data table that is labelled “internal_group_density”:
internal_group_density<-attributes %>%
group_by(dep_recoded) %>%
summarize(internal_group_density=(sum(internal_ties)/((n()*(n()-1)))))
This code produces a table that comprises 8 rows (i.e., one row per department) and 2
columns. The first column specifies the department number and lists the values of our
grouping variable (“dep_recoded”). The second column is called “internal_group_density” and
is calculated as the sum of all internal ties within a department, divided by the theoretical
maximum number of ties within the department. Once we have calculated these values, we convert
the table to a data frame using as.data.frame and then display it in R-Studio.
internal_group_density<-as.data.frame(internal_group_density)
internal_group_density
dep_recoded internal_group_density
1 1 0.7321429
2 9 0.0000000
3 10 0.0952381
4 11 0.9871795
5 12 0.0000000
6 13 0.0000000
7 14 0.0000000
8 15 0.0000000
Looking at the table, it is apparent that DEP11 has by far the highest internal density score,
followed by the TMT and DEP10. The remaining departments have an internal group density
score of 0, which makes sense because there are no internal ties within these departments.
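The density calculation can be checked by hand on a small example. The sketch below uses base R (tapply) as a stand-in for the dplyr pipeline, so it runs without extra packages; the two toy departments and their tie counts are hypothetical.

```r
# Hypothetical data: department 1 has 4 members, department 9 has 3 members
toy <- data.frame(dep_recoded   = c(1, 1, 1, 1, 9, 9, 9),
                  internal_ties = c(3, 2, 2, 1, 0, 0, 0))

# Sum of internal ties and number of members per department
tie_sum  <- tapply(toy$internal_ties, toy$dep_recoded, sum)
dep_size <- tapply(toy$internal_ties, toy$dep_recoded, length)

# Internal group density: sum of internal ties / (g_s * (g_s - 1))
internal_density_toy <- tie_sum / (dep_size * (dep_size - 1))
internal_density_toy
```

For department 1 this gives 8 / (4 * 3), and for department 9 it gives 0 / (3 * 2), which mirrors the pattern of a dense team next to a department without internal ties.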
Internal group centralization = \frac{\sum_{i=1}^{g} [C_{DM}(n^{*}) - C_{DM}(n_i)]}{g_s (g_s - 1)},
where the numerator represents the sum of the difference between the number of internal
ties of each team member g (denoted as 𝐶𝐷𝑀 (𝑛𝑖 )) and the number of internal ties of the
department member who maintains the most ties within the department (denoted as
𝐶𝐷𝑀 (𝑛∗ )). The denominator reflects the theoretical maximum number of ties within the team,
similar to what we used to calculate internal group density.
An internal group centralization score can range from 0 to 1. A score near the scale’s maximum
value of 1 suggests that a single department member has a disproportionately large number
of internal ties relative to other department members. This means that this central
department member takes the lead in coordinating work, exchanging information, etc. within
the department (i.e., high internal group centralization). Conversely, a score near the
theoretical minimum of 0 indicates that all department members maintain an equal number
of internal ties and, thus, hold equal roles in the coordination of tasks, exchange of
information, etc. within the department (i.e., low internal group centralization).
To determine internal group centralization, we first calculate the difference between each
employee and the most-connected colleague within his or her department. To do that, we use
dplyr to add a variable that is called “internal_ties_diff” to our vertex attributes file. We
calculate this variable by subtracting the value listed in the column “internal_ties” from the
maximum value observed within the respective individual’s department. Afterwards, we
convert the attributes to a data frame using as.data.frame. Dplyr automatically converts
the file to a table format, which could cause problems later on when left unchanged.
attributes<-as.data.frame(attributes)
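The difference score described above can be sketched in base R with ave, which returns the group maximum for every row. This is an illustrative stand-in, not the manual's own code: the toy data are hypothetical, and the dplyr form mentioned in the comment is an assumption about how the step could be written with group_by and mutate.

```r
# Hypothetical data: a star-shaped team of 4 and a chain-shaped team of 3
toy <- data.frame(dep_recoded   = c(1, 1, 1, 1, 10, 10, 10),
                  internal_ties = c(3, 1, 1, 1, 2, 1, 1))

# Department maximum minus each member's own number of internal ties
# (assumed dplyr equivalent: group_by(dep_recoded) %>%
#    mutate(internal_ties_diff = max(internal_ties) - internal_ties))
dep_max <- ave(toy$internal_ties, toy$dep_recoded, FUN = max)
toy$internal_ties_diff <- dep_max - toy$internal_ties

# Internal group centralization: sum of differences / (g_s * (g_s - 1))
diff_sum <- tapply(toy$internal_ties_diff, toy$dep_recoded, sum)
dep_size <- tapply(toy$internal_ties_diff, toy$dep_recoded, length)
centralization_toy <- diff_sum / (dep_size * (dep_size - 1))
centralization_toy
```

In the star-shaped team, the three peripheral members each fall 2 ties short of the central member, so the numerator is 6 and the denominator is 4 * 3.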
We can subsequently calculate internal group centralization by taking the group-level sum of
the “internal_ties_diff” column and dividing this value by the theoretical maximum number
of possible internal ties within the department:
internal_group_centralization<-attributes %>%
group_by(dep_recoded) %>%
summarize(internal_group_centralization=sum(internal_ties_diff)/(n()*(n()-1)))
In the final step, we convert the resulting internal group centralization scores to a data frame
and then combine it with the data on internal group density using left_join. We can then display
both the departments’ density and centralization scores by referring to the label of the
combined data frame (i.e., “dep_statistics_internal” in our case):
internal_group_centralization<-as.data.frame(internal_group_centralization)
dep_statistics_internal<-left_join(internal_group_density,
internal_group_centralization,by="dep_recoded")
dep_statistics_internal
dep_recoded internal_group_density internal_group_centralization
1 1 0.7321429 0.26785714
2 9 0.0000000 0.00000000
3 10 0.0952381 0.33333333
4 11 0.9871795 0.01282051
5 12 0.0000000 0.00000000
6 13 0.0000000 0.00000000
7 14 0.0000000 0.00000000
8 15 0.0000000 0.00000000
Based on our results, we can conclude that both the TMT (dep_recoded = 1) and DEP10 have
fairly centralized internal networks. That means that there are probably one or a few members
with a lot of ties to other members in those departments, while other members in these
departments do not maintain a lot of ties with each other. In other words, the flow of
information is heavily controlled by a few central department members.
Importantly, the theoretical maximum is calculated differently for external group density, as
compared with internal group density. For determining the maximum number of external ties,
we take the size of the organization and subtract the size of the respective department. We
end up with the number of potential external ties that each of the respective department’s
members, in theory, can maintain. Our case organization has 99 employees. Hence, each individual
member of the TMT can, at the most, maintain 99-8 (i.e., the size of the TMT) = 91 external
ties. We then multiply this number by the size of the respective department to arrive at the
theoretical maximum number of external ties that can be maintained by the focal department
as a whole (91*8 = 728 in our case). Note that we do not subtract 1 from this value (as we did for
internal density), because department members are not part of their own external network.
The code below illustrates how to do this in R-Studio:
external_group_density<-attributes %>%
group_by(dep_recoded) %>%
summarize(external_group_density=(sum(external_ties)/((n()*(99-n())))))
By executing the above stated code, we create a data table that lists the external group density
score of each department in our case organization. We need to convert this table to a data
frame, to avoid problems later on when we merge that data with other data. Subsequently,
we can display departments’ values by referring to the label of the data frame and pressing
CTRL + ENTER. The results below indicate that the TMT has the highest external group density,
compared to the other departments in the case organization.
external_group_density<-as.data.frame(external_group_density)
external_group_density
dep_recoded external_group_density
1 1 0.14010989
2 9 0.01245211
3 10 0.01031746
4 11 0.01162791
5 12 0.01190476
6 13 0.01819923
7 14 0.01176471
8 15 0.01910112
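The external density formula can be verified on a small example. The sketch below is a base-R illustration with tapply instead of dplyr; the organization of 10 people, the three departments, and the tie counts are hypothetical, and "org_size" is an assumed name that takes the place of the hard-coded value 99.

```r
# Hypothetical organization of 10 people in 3 departments
org_size <- 10
toy <- data.frame(dep_recoded   = c(1, 1, 9, 9, 9, 10, 10, 10, 10, 10),
                  external_ties = c(3, 1, 1, 1, 0, 1, 1, 0, 0, 0))

# Sum of external ties and number of members per department
tie_sum  <- tapply(toy$external_ties, toy$dep_recoded, sum)
dep_size <- tapply(toy$external_ties, toy$dep_recoded, length)

# External group density: sum of external ties / (g_s * (N - g_s))
external_density_toy <- tie_sum / (dep_size * (org_size - dep_size))
external_density_toy
```

Storing the organization size in a variable, rather than typing the number into the formula, makes it easier to reuse the code for an organization of a different size.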
We can calculate external group centralization by using a slightly modified version of the
procedure we used for calculating internal group centralization (see 6.3.1.2). For calculating
external group centralization, we need to modify the formula so that it calculates the
difference between each department member’s external ties and the member in the
department that holds most external ties:
attributes<-as.data.frame(attributes)
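A minimal base-R sketch of the external difference score and the resulting centralization is given below. It mirrors the internal procedure, with ave supplying the department maximum; the toy data and the organization size of 10 are hypothetical, and the variable name "org_size" is an assumption.

```r
# Hypothetical excerpt: two departments within an organization of 10 people
org_size <- 10
toy <- data.frame(dep_recoded   = c(1, 1, 9, 9, 9),
                  external_ties = c(3, 1, 1, 1, 0))

# Department maximum minus each member's own number of external ties
dep_max <- ave(toy$external_ties, toy$dep_recoded, FUN = max)
toy$external_ties_diff <- dep_max - toy$external_ties

# External group centralization: sum of differences / (g_s * (N - g_s))
diff_sum <- tapply(toy$external_ties_diff, toy$dep_recoded, sum)
dep_size <- tapply(toy$external_ties_diff, toy$dep_recoded, length)
external_centralization_toy <- diff_sum / (dep_size * (org_size - dep_size))
external_centralization_toy
```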
In the next step, we calculate for each department the sum of its members’ difference scores
and divide that by the theoretical maximum number of external ties that the respective
department can maintain. The theoretical maximum is calculated as before (see 6.3.2.1):
external_group_centralization<-attributes %>%
group_by(dep_recoded) %>%
summarize(external_group_centralization=sum(external_ties_diff)/((n()*
(99-n()))))
In the final step, we convert our result to a data frame and display the results. Based on the
results, we can see that DEP10 scores highest in terms of external group centralization.
external_group_centralization<-as.data.frame(external_group_centralization)
external_group_centralization
dep_recoded external_group_centralization
1 1 0.046703297
2 9 0.010536398
3 10 0.084920635
4 11 0.000000000
5 12 0.000000000
6 13 0.004789272
7 14 0.000000000
8 15 0.070786517
As a final step, we can merge the data frames with information on departments’ internal and
external networks using left_join. This file can come in handy when examining what kind of
combinations of internal and external network structures departments use to execute work.
dep_statistics_internal<-left_join(internal_group_density,
internal_group_centralization,by="dep_recoded")
dep_statistics_external<-left_join(external_group_density,
external_group_centralization, by="dep_recoded")
dep_statistics<-left_join(dep_statistics_internal,dep_statistics_external,
by="dep_recoded")
dep_statistics
dep_recoded internal_group_density internal_group_centralization
1 1 0.7321429 0.26785714
2 9 0.0000000 0.00000000
3 10 0.0952381 0.33333333
4 11 0.9871795 0.01282051
5 12 0.0000000 0.00000000
6 13 0.0000000 0.00000000
7 14 0.0000000 0.00000000
8 15 0.0000000 0.00000000
external_group_density external_group_centralization
1 0.14010989 0.046703297
2 0.01245211 0.010536398
3 0.01031746 0.084920635
4 0.01162791 0.000000000
5 0.01190476 0.000000000
6 0.01819923 0.004789272
7 0.01176471 0.000000000
8 0.01910112 0.070786517
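For readers who want to see what the join does without loading dplyr, the sketch below uses the base-R function merge on two small hypothetical tables. It is an illustration of the matching logic, not a replacement for the code above; the values are rounded toy numbers.

```r
# Hypothetical department tables; note the different row order in the second
internal_toy <- data.frame(dep_recoded = c(1, 9, 10),
                           internal_group_density = c(0.73, 0.00, 0.10))
external_toy <- data.frame(dep_recoded = c(10, 1, 9),
                           external_group_density = c(0.01, 0.14, 0.02))

# Rows are matched on the shared key, not on their position in the table;
# all.x = TRUE keeps every row of the first table, as a left join does
dep_statistics_toy <- merge(internal_toy, external_toy,
                            by = "dep_recoded", all.x = TRUE)
dep_statistics_toy
```

One difference to keep in mind is that merge sorts the result by the key column, whereas left_join keeps the row order of the first table.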
7 PLOTTING ORGANIZATIONAL NETWORKS
To visualize an organizational network, we can make a network plot. A network plot is a
collection of dots and lines that provides a schematic overview of who has ties with whom in
the organization. The dots represent the nodes in the network (i.e., the members of
our organization), and the lines represent the presence of ties between nodes. A
line between two nodes indicates that the two nodes have a tie with each other.
Below, we will discuss how to create network plots with varying layouts.
gplot(network_data_sociomatrix,usearrows=FALSE)
This standard plot does not give much information, except for a display of the overall structure
of the organizational network. From the plot, we can already assess that the network is
relatively centralized, with a few nodes controlling much of the information flow in the
organization. However, we do not know who the central nodes are because the network plot
excludes node labels. Hence, the next step is to add node labels.
If labels are stored as a vertex attribute called “vertex.names”, we can simply add
displaylabels=T to the code and R-studio will use vertex.names as node labels, as shown in
Figure 10. We can further change the size of the nodes by specifying vertex.cex, as well as
the size of the labels by specifying label.cex. In the code shown below, we specify that the
labels should be rescaled by a factor of 0.60, while nodes are rescaled by a factor of 0.90.
Note: Be sure to load both the “sna” and “statnet” packages, as gplot might otherwise cause errors.
gplot(network_data_sociomatrix,displaylabels=T,
label.cex=0.60,
vertex.cex=0.90,
usearrows=FALSE)
If we do not want to use vertex.names for node labels, we need to define a new data object
that provides alternative node labels. Next, we need to add a piece of code to gplot to use
this data object for the node labels. For example, if we want to display employees’ department
affiliation as node labels (instead of vertex.names), we first define a data object called “dep”
that is derived from the vertex attribute “dep”. Subsequently, we refer to this data object by
adding label=dep to gplot. This will display nodes’ department numbers as node labels.
dep <-get.vertex.attribute(network_data_sociomatrix,"dep")
gplot(network_data_sociomatrix,label=dep,displaylabels=T,label.cex=0.70,
usearrows=FALSE)
Besides adding node labels, we can also change the way in which nodes are positioned in the
network plot. There are several options available for this in gplot. We can choose a
random layout in which nodes are placed randomly in the network plot. This is done by adding
mode="random" to the gplot code:
gplot(network_data_sociomatrix,displaylabels=T,label.cex=0.70,
usearrows=FALSE,mode="random")
For obvious reasons, the random positioning does not produce the most interpretable
network plot (see Figure 11). Indeed, to maximize interpretability, we need to prevent
nodes from being caught in between a lot of ties. When nodes are in the middle of ties, their labels
will not be readable and the overall network plot may look sloppy. To resolve this, we can
consider positioning nodes in a big circle. This prevents nodes from being surrounded by the
lines of ties (Figure 12). Here’s how to implement such a circle-positioning design:
gplot(network_data_sociomatrix,displaylabels=T,label.cex=0.70,
usearrows=FALSE,mode="circle")
Figure 12 - Network plot with circular positioning of nodes
The circular positioning improves the interpretation somewhat, but it is still not optimal. There
are still a lot of line crossings and lines differ in length. To resolve this, we need to place nodes
that are densely connected to each other close together, while positioning nodes that are
not connected (directly or indirectly) further apart in the network plot. One way to do this is
by using the “Fruchterman-Reingold” algorithm. This algorithm works as a force field in which
nodes gravitate towards each other to the degree to which they are directly and indirectly
connected with each other. When applying this positioning algorithm to a network plot, we
end up with a tailor-made layout that is specific to the organization under study. To use the
Fruchterman-Reingold algorithm, we add mode="fruchtermanreingold" to gplot. This is, however,
already the default layout in gplot, so the argument can also be omitted.
gplot(network_data_sociomatrix,displaylabels=T,label.cex=0.70,
usearrows=FALSE,mode="fruchtermanreingold")
To vary node sizes according to node centrality, we first need to create a data object with
information on employees’ centrality scores. In this case, we are interested in degree
centrality and we will calculate a data object that is called “degree” (for details, see 6.2.1.1).
Next, we specify that the vertex size should depend on the individual’s centrality score, as
specified in “degree”. This is done by adding vertex.cex=degree to gplot. Upon inspection,
this code results in extremely large nodes for central members of the organization. The node
sizes of employees with lower centrality scores, on the other hand, are fine. Hence, we
need to apply a non-linear transformation to the node sizes; one that reduces the node sizes
of central employees while leaving the node sizes of other members relatively unchanged. To
do this, we can take the square root, natural logarithm, or common logarithm of node
centrality. Here, we use the common logarithm, as this gives the best result (Figure 13). We
add a 0.50 intercept to the node size value, to make sure that the minimum node size is 0.50
(the lowest degree in our network is 1, and the common logarithm of 1 is 0):
degree<-degree(network_data_sociomatrix,gmode="graph")
gplot(network_data_sociomatrix,displaylabels=T,
label.cex=0.60,
vertex.cex=(log10(degree))+0.5,
usearrows=FALSE)
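To see why a transformation is needed, the sketch below compares the candidate transformations on a few hypothetical degree scores (1, 2, 12, and 23). It is a base-R illustration only; the object names are assumptions and no plot is drawn.

```r
# Hypothetical degree scores, from a peripheral node to a highly central node
degree_toy <- c(1, 2, 12, 23)

# Candidate node sizes under different transformations
sizes <- data.frame(raw   = degree_toy,
                    sqrt  = sqrt(degree_toy),
                    ln    = log(degree_toy) + 0.5,
                    log10 = log10(degree_toy) + 0.5)
round(sizes, 2)
```

With the common logarithm, all node sizes in this example fall between 0.5 and 2, whereas the raw scores would make the most central node 23 times as large as the least central one. Note that a node with a degree of 0 would yield -Inf under a logarithm, so this approach relies on every node having at least one tie.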
7.2.2 Varying node shapes in network plots
Node shapes can be used to display nodes’ group affiliations. In our case organization, we can,
for example, assign a 3-sided triangle to members of the management team, a 4-sided square
to department DEP9, a 5-sided pentagon to DEP10, a 6-sided hexagon to DEP11, etc. This will
help to visualize within and between-departmental ties in the organization.
To vary node shapes, we first need to recode the vertex.attribute “dep”, such that all
managers have a department number of “1” (in the original attributes file, managers had
different department numbers). To do so, we duplicate the vertex.attribute “dep” as a new
data object called “dep_nodes”. Then, we recode the department values of the managers (dep
1-8) to “1” in the “dep_nodes” data object. We keep the department values of other
employees unchanged. We subsequently convert this data object by using as.factor:
dep_nodes<-attributes$dep
dep_nodes[dep_nodes<9]<-1
dep_nodes<-as.factor(dep_nodes)
Next, we need to specify the shapes we want to apply in our network plot. We have 8 groups
in our organization (one management team, and 7 departments). Hence, we need 8 different
shapes to display department affiliation in the network plot. An enclosed shape has a
minimum of 3 sides (i.e., a triangle). Every next shape adds one side. Considering that we need
8 shapes, our network plot thus needs to include triangles (3 sides), squares (4 sides),
pentagons (5 sides), hexagons (6 sides), heptagons (7 sides), octagons (8 sides), nonagons (9
sides), and decagons (10 sides). In other words, our shapes need to vary between 3 and 10
sides. We inform R-studio of this by creating a new data object (called “sides”) that includes
information on the range of shapes we are looking to use in our network plot.
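The lookup of shapes by department can be illustrated in base R. The sketch below assumes that “sides” is simply the integer sequence from 3 to 10 (the manual's own definition is not reproduced here), and it uses a handful of hypothetical department codes rather than the case data.

```r
# Hypothetical department codes for six nodes (values below 9 are managers)
dep_toy <- c(1, 3, 9, 10, 15, 15)
dep_toy[dep_toy < 9] <- 1

# Converting to a factor gives each department a position: 1, 9, 10, 15
dep_nodes_toy <- as.factor(dep_toy)

# Assumed definition of the shape range: 3 sides up to 10 sides
sides_toy <- 3:10

# Indexing by a factor uses the position of each level, not its label
sides_toy[dep_nodes_toy]
```

Because the factor is used as a position index, the first department level is mapped to triangles, the second to squares, and so on, regardless of the department numbers themselves. This is why the department identifier is converted with as.factor first.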
Once we have created the department identifier (“dep_nodes”) and specified the range of sides of the
accompanying shapes (“sides”), we can add this to the gplot code. We use vertex.sides to
specify that the number of sides is taken from the data object “sides” and should be
contingent on the department affiliation, as stored in “dep_nodes”. The result is shown in
Figure 14. As you will notice, it is hard to distinguish between, for example, nonagons and
decagons. Hence, varying shapes is only useful when only a few groups need to be
represented.
gplot(network_data_sociomatrix,displaylabels=T,
label.cex=0.60,
vertex.sides=sides[dep_nodes],
usearrows=FALSE)
Figure 14 - Network plot with varying node shapes
To implement this in R-studio, we first need to create a data object that specifies whether a
node is a leader or an employee. Similar to what we did in the previous section, we create this
data object by taking information from the attributes data object. We call this new data object
“hierarchical_role”. In our attributes list, leaders have department codes 1:8 and employees
have department codes 9:15. To correctly code “hierarchical_role”, we thus recode values 1:8
into 1 (=leader) and 9:15 into 2 (=employee):
hierarchical_role<-attributes$dep
hierarchical_role[hierarchical_role<9]<-1
hierarchical_role[hierarchical_role>8]<-2
hierarchical_role<-as.factor(hierarchical_role)
In the following step, we specify the color palette that we want to use. In our case, we want to
display leader nodes in red and employee nodes in blue. To specify this palette, we create a
data object that we call “color_pallet” that lists the colors red and blue:
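A minimal sketch of this step is shown below. It assumes that the palette is a plain character vector with the color names "red" and "blue" in that order, and it uses a short hypothetical role vector; the "_toy" object names are illustrative.

```r
# Hypothetical roles for five nodes: 1 = leader, 2 = employee
role_toy <- as.factor(c(1, 1, 2, 2, 2))

# Assumed palette: first entry for leaders, second entry for employees
color_pallet_toy <- c("red", "blue")

# Indexing the palette by the factor returns one color per node
node_colors_toy <- color_pallet_toy[role_toy]
node_colors_toy
```

The order of the colors matters: the first color is assigned to the first factor level (leaders) and the second color to the second level (employees).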
We then use vertex.col in gplot to specify that the node color that R picks from our palette
should depend on the node’s “hierarchical_role”. This will assign the same color to nodes that
have the same “hierarchical_role” value and, of course, different colors to nodes that
have different hierarchical roles (see Figure 15).
gplot(network_data_sociomatrix,
usearrows=FALSE,
displaylabels=T,
label.cex=0.60,
vertex.cex=(log10(degree))+0.5,
vertex.sides=sides[dep_nodes],
vertex.col=color_pallet[hierarchical_role])
To vary line thickness depending on an edge attribute, we first need to make sure that the
edge attribute is attached to the socio-matrix. If not, we should load the .csv file that houses
the edge information and attach it using set.edge.value (see 4.3.4). Subsequently, we create
a new data object called “closeness” that lists the edge attribute “closeness”:
network_data_sociomatrix<-set.edge.value(network_data_sociomatrix,
'closeness',edge)
closeness<-network_data_sociomatrix %e% "closeness"
In the final step, we specify with edge.lwd that the line thickness should depend on the
closeness value of a tie. Here, we multiply the closeness score by a factor of 0.20 to prevent
lines in the network plot from becoming overly thick. The result is shown in Figure 16.
gplot(network_data_sociomatrix,
usearrows=FALSE,
displaylabels=T,
label.cex=0.60,
vertex.cex=(log10(degree))+0.5,
vertex.sides=sides[dep_nodes],
vertex.col=color_pallet[hierarchical_role],
edge.lwd = 0.2*closeness)
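The effect of the scaling factor can be checked with a short calculation. The sketch below simply applies the factor of 0.20 to the five possible closeness ratings; the object names are illustrative.

```r
# The five possible closeness ratings (1 = not close at all, 5 = very close)
closeness_toy <- 1:5

# Line widths after rescaling by a factor of 0.20
edge_width_toy <- 0.2 * closeness_toy
edge_width_toy
```

The weakest ties are thus drawn with a width of 0.2 and the closest ties with a width of 1, which keeps the differences visible without cluttering the plot.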
8 INDEX R-COMMANDS
as.matrix, 15, 20, 38, 54
as.network.matrix, 15, 24
as.sociomatrix, 17, 18, 19, 20, 39
betweenness, 34, 40
centralization, 31
closeness, 8, 14, 15, 16, 22, 23, 24, 34, 38, 39, 40, 54, 55
component.largest, 32
components, 31, 35
cutpoints, 34, 35
degree, 33, 40, 51
delete.vertices, 23
gden, 30
geodist, 32
get.inducedSubgraph, 21, 26, 28, 35
get.vertex.attribute, 14, 17, 18, 20, 48
getwd, 11
gplot, 22, 23, 26, 28, 47, 48, 49, 50, 51, 52, 53, 54, 55
ifelse, 36
install.packages, 11
isolates, 23
library, 11
list.edge.attributes, 17
list.vertex.attributes, 14, 17
network.size, 29
read.csv, 12, 13, 15, 24, 38, 54
rowSums, 19, 20, 27, 35, 36, 39
set.edge.value, 15, 38, 54
set.vertex.attribute, 13, 14, 26, 28
setwd, 11
summary, 12, 15, 16, 19, 21, 24, 29, 30, 37, 39, 40
write.csv, 40