Weber Introducing blockmodeling to input-output analysis
Michael Weber
Donau-City-Strasse 1
A-1220 Vienna, Austria
michael.weber@ec3.at
Abstract. Complexity raises uncertainty and uncertainty calls for distinct actions to
reduce complexity and to establish an overview (i.e. order). This paper reviews
recent developments in a branch of research that aims at gaining insight into the
economic units. This might eventually benefit application fields of methods of input-
blockmodeling’ might enhance the instruments of input-output analysts that are geared
processes for value generation that both are extensively supported by information
Economic units are subject to these developments and due to the division of labour also
labour rises, so does the need of coordination for economic units and, consequently, the
set of (actual and potential) relations that are or could be established for the purpose of
collaboration (Weber & Fröschl, 2006). Evidently, this rising complexity as well as the
these high-dimensional data also influence the analysis of economic relations that
challenge for the analysts that have to deal with these intricate and vast amounts of
relational data. Methods and procedures are needed that help to condense data to
presented to input-output analysts that might also serve as a starting point for
methods for clustering relational data is discussed, starting with a general introduction
including definitions of types of equivalence relations and ideal blocks. Algorithms for
the generation of ‘optimal’ partitions of vertices and edges for binary and valued graphs
insightful procedure that facilitates the analysis and interpretation particularly for graphs
(i.e. relational data) with a relatively high number of vertices, e.g. input-output data.
sociology that was conducted primarily in the 1970s, recent research on methods for this
purpose in the field of social network analysis focuses on a specific approach that
simultaneously clusters the vertices of a graph and partitions the relations between the
White et al., 1976) – evolved over the years starting from an indirect clustering
procedure that builds clusters based on structural (Lorrain & White, 1971; Burt, 1976)
and later even regular (White & Reitz, 1983) equivalence of vertices to an approach that
incorporates indirect and in particular also direct methods for building blockmodels
(Batagelj et al., 1992a; 1992b) with generalized concepts of equivalence (Doreian et al.,
identified in cases where every (directed) relation of vertex v can be matched to any
that for every edge (v,x) and/or (x,v) a corresponding edge (w,x) and/or (x,w) exists with
relation of w to a vertex y ∈ V. That is, for an edge (v,x) and/or (x,v), there also exists
the edge (w,y) and/or (y,w), with v w. Both types of equivalence can be seen as a
starting point for the definition of groups of vertices. For instance, the vertices x and y
as well as the vertices v and w could be grouped and then the groups could be related to
each other.
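
To make the definition of structural equivalence concrete, the following Python sketch checks whether two vertices have identical ties to all other vertices. The function name, the 0/1 matrix encoding and the decision to ignore ties between v and w themselves are assumptions made for illustration; they are not taken from the cited definitions.

```python
def structurally_equivalent(adj, v, w):
    """Return True if v and w have the same ties to every other vertex x.

    adj[i][j] == 1 means there is a directed edge (i, j).
    Assumption: ties between v and w themselves are ignored.
    """
    others = [x for x in range(len(adj)) if x not in (v, w)]
    same_outgoing = all(adj[v][x] == adj[w][x] for x in others)  # (v,x) vs (w,x)
    same_incoming = all(adj[x][v] == adj[x][w] for x in others)  # (x,v) vs (x,w)
    return same_outgoing and same_incoming


# Hypothetical graph with vertices v=0, w=1, x=2, y=3:
# v and w both point to x, and y points to both v and w.
adj = [
    [0, 0, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [1, 1, 0, 0],
]
print(structurally_equivalent(adj, 0, 1))
print(structurally_equivalent(adj, 2, 3))
```

In this toy graph v and w can be grouped, whereas x and y cannot be grouped under structural equivalence because only y has outgoing ties to v and w.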
In recent years, a direct method for computing blockmodels was (further) elaborated
that is called ‘generalized blockmodeling’ (Doreian et al., 2004; Doreian et al., 2005;
building upon Batagelj et al., 1992a; 1992b; Doreian et al., 1994; Batagelj, 1997).
enables the separate treatment and classification of the relations of a vertex of a graph to
its ancestor or successor vertices and therefore provides a way to cluster ancestor and
successor vertices as well as to partition the edges between these vertices differently.
This is an idea that was also discussed by Borgatti & Everett (1992). It enables
blockmodeling of graphs with disjoint sets of ancestor and successor vertices – so-called
‘two-mode data’ – as well as of graphs with one set of vertices (‘one-mode data’) with
each vertex having an ancestor and a successor role. A central challenge for direct,
successor vertices with only a variable subset of the corresponding set of vertices
disjoint sets and their relations with one set encompassing the observation units
(vertices) while the other set contains the characteristics of the observation units. Using
matrix with the dimension of the number of observation units times the number of
observation units. This distance matrix can be seen as one-mode and forms the basis
With this in mind, the distinction between indirect and direct blockmodeling can be
explained. Indirect blockmodeling approaches (e.g. Breiger et al., 1975) identify certain
attributes for each vertex – usually structural characteristics such as degree, reachability
of nodes or distance indices to successors or ancestors – that form the basis for the
retransform the data back to a one-mode structure (a distance matrix). Based on the
distance matrix, conventional clustering methods are then employed. The direct
methods for blockmodeling (commencing with Batagelj et al., 1992a), however, avoid
certain types of blocks, that vary according to the type of equivalence chosen, an
investigated empirical blockmodel, i.e. the blocks obtained by grouping the vertices,
and thus re-arranging the columns and rows of the empirical adjacency matrix, to the
confirmatory approach that additionally implies the specific alignment of some or all
block types. This enables the testing of hypotheses on the structure of the relational
data. With both variations of direct blockmodeling the data is matched to an ideal
(groups of vertices) and blocks entirely ‘data-driven’, i.e. in a way that suits the data
best, and generates hypotheses about the block structure without any model pre-
specification.
of an ideal blockmodel as it confines the block types admissible. These block types can
be understood as templates for the ideal blocks of the model. Block types for structural
equivalence are null blocks or complete blocks, while for regular equivalence the
standard block types are null blocks and regular blocks. However, in a generalized
blockmodel, additional block types can be defined (Doreian et al., 1994; Batagelj,
The relationships among the block types presented are visualised in Figure 1. It can be seen
that the definition of block types constitutes a certain hierarchy. Among the block types
for regular equivalence, the row-dominant and/or column-functional and the column-
dominant and/or the row-functional blocks permit the finest level of distinction.
Therefore, these four types build the most specific level of the block type hierarchy.
According to their definitions, the block types row-dominant and row-functional as well
the hierarchy in Figure 1 is formed by the block types column-regular and row-regular.
These types either appear independently from the most specific level as their definition
is more general, or they are derived from the corresponding types from the finest level
block meets the requirements for row-regularity. However, a block that is classified as
the most general block type in the hierarchy is the regular block. It can directly be
found in the third (finest) level of the hierarchy. For instance, the joint classification of a
regular and row-regular, but also regular. In this context, the complete block, a special
case of the regular block that is used for the identification of structural equivalence, has
regular. A further special case of the basic block typology in Figure 1 is the null block,
procedure as described in formula (1) (Batagelj et al., 1992a) for direct one-mode
blockmodeling.
$$Z(C) = \sum_{C_i, C_j \in C} z(C_i, C_j) \qquad (1)$$
The objective function Z(C) measures the discrepancy between the investigated
empirical blockmodel R(C) and the ideal model B(C), with defining the
determined by the pre-specified ideal blockmodel. Initially, the ideal model is only
defined in terms of the number of clusters and the block types allowed. Then, by
assigning vertices to clusters, ideal blocks of the same dimension as the
empirical blocks are obtained, formed in accordance with the admissible block types.
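
As a rough illustration of how a clustering induces empirical blocks and an objective value, the following Python sketch evaluates Z(C) for a binary one-mode graph under structural equivalence (block types null and complete). It assumes that the discrepancy of a block is the number of cells deviating from the closer of the two ideal blocks and that diagonal cells are treated like any other cell; the function names and the sample matrix are hypothetical.

```python
def empirical_block(adj, rows, cols):
    """Cells of the adjacency matrix relating cluster `rows` to cluster `cols`."""
    return [[adj[v][w] for w in cols] for v in rows]


def z(block):
    """Minimal discrepancy of a binary block to an ideal null or complete block."""
    cells = [cell for row in block for cell in row]
    deviation_from_null = sum(cells)                   # ones that should be zeros
    deviation_from_complete = len(cells) - sum(cells)  # zeros that should be ones
    return min(deviation_from_null, deviation_from_complete)


def Z(adj, clustering):
    """Objective value: sum of block discrepancies over all pairs of clusters."""
    return sum(
        z(empirical_block(adj, ci, cj)) for ci in clustering for cj in clustering
    )


# Hypothetical graph: vertices 0 and 1 both point to vertices 2 and 3.
adj = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
print(Z(adj, [[0, 1], [2, 3]]))  # clustering that matches the structure
print(Z(adj, [[0, 2], [1, 3]]))  # clustering that mixes the two roles
```

The first clustering yields only null and complete blocks, so no cell deviates from the ideal model; the second clustering produces one deviating cell in each of its four blocks.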
the sum of discrepancies z(Ci,Cj) between each empirical block r(Ci,Cj) ∈ R(C) and its
thus, the objective function, is based on the set of feasible ideal blocks B(Ci,Cj). In the
corresponding ideal block b is simply derived from the direct comparison of these two
exploratory scenario the discrepancy z(Ci,Cj) is calculated with reference to the set of
feasible ideal blocks B(Ci,Cj), i.e. z(Ci,Cj) is defined as the minimal discrepancy
between the empirical block r(Ci,Cj) and all ideal blocks b ∈ B(Ci,Cj) in line with the
(2)
Two examples for calculating the discrepancy are given below. The suggestions
calculates the sum of the absolute deviations between the cell values rvw of the empirical
block and the corresponding cell values bvw in the ideal block. This implies equal
(4)
According to Formula (4), the discrepancy is determined either as the sum of the cell
values rvw in the empirical block (if the ideal block b is null) or as the sum of (i) the
product of the number of zero columns and the total number of rows and (ii) the product
of the number of zero rows and the total number of columns (if the block type for b is
regular).
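
The two cases described for Formula (4) can be sketched in Python as follows, assuming binary blocks given as lists of rows. The function names are hypothetical, and selecting the block type with the smaller discrepancy only mirrors the exploratory definition of z(Ci,Cj) under the assumption that null and regular are the admissible block types.

```python
def null_discrepancy(block):
    """Sum of the cell values of the empirical block (ideal block is null)."""
    return sum(sum(row) for row in block)


def regular_discrepancy(block):
    """Zero columns times number of rows plus zero rows times number of columns."""
    n_rows, n_cols = len(block), len(block[0])
    zero_rows = sum(1 for row in block if not any(row))
    zero_cols = sum(1 for j in range(n_cols) if not any(row[j] for row in block))
    return zero_cols * n_rows + zero_rows * n_cols


def best_block_type(block):
    """Block type (null or regular) with the minimal discrepancy, and that value."""
    candidates = {
        "null": null_discrepancy(block),
        "regular": regular_discrepancy(block),
    }
    name = min(candidates, key=candidates.get)
    return name, candidates[name]


print(best_block_type([[1, 0, 0], [0, 0, 0]]))
print(best_block_type([[1, 0], [0, 1]]))
```

In the first block a single cell deviates from the null block, whereas one zero row and two zero columns make it a poor regular block. The second block has a tie in every row and every column and is therefore perfectly regular.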
Building upon the definition of discrepancy, the optimization problem can be described
vertices in a way that the objective function Z(C) is minimized (Formula (5)).
(5)
smaller than the corresponding value of the current clustering C. Doreian et al. (1994)
recommend two transformations in order to find new clusterings, either the shifting of a
vertex from one group of vertices to another or the exchange of two vertices between
two groups. This procedure is repeated until the objective function value reaches a
minimum. To avoid local optima, this routine is usually repeated several times with
discover structure and to validate hypotheses on a certain production system that can
condense in the relational data that reach beyond the analytical potential of methods
such as triangulation¹ or simple graph-theoretic ratios. This is especially true if the
blockmodel classifies vertices and partitions the relations by differentiating the role of
the vertices as ancestors and successors. For this reason, a direct blockmodeling
approach is required that can deal with two-mode data, i.e. data that relates different
In a two-mode blockmodel, the definition of the objective function for the optimization
(6)
In contrast to the one-mode approach, the ancestor and the successor vertices are grouped
in separate cluster sets. CZ contains the set of (empirical) groups of ancestor vertices,
whereas CS comprises the set of (empirical) groups of successor vertices. C is the tuple
of CZ and CS. The cardinalities of the groups of vertices k = |CZ| and h = |CS| are defined
by the pre-specified ideal blockmodel. For determining the discrepancy between the
1 Triangulation is used for identifying the production hierarchy (production stages) by allowing for the transaction
volume of the (weighted) relational data in input-output analysis. It enables (quantitative) structural analysis, yet with
the drawback of an unstable order of production stages if the production relations are non-linear. This instability rises
with decreasing linearity of the relational data. It might also happen that (directly) unrelated vertices are strung
2 An example of such vertices is the distinction between the relations of a vertex to its ancestors and its successors.
discrepancy , the contributions to the objective function are calculated
(7)
The optimization problem for the two-mode approach can be described as follows. The
vertices of a graph G(V,E) that is made up of V = (V1,V2) and e = (v,w) ∈ E with v ∈ V1,
(8)
Eventually, the two-mode approach applies the same iterative optimization procedure as
one-mode blockmodeling.
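
The iterative procedure (shifting a vertex to another group or exchanging two vertices between groups, repeated from several random starting clusterings) can be sketched for the two-mode case as follows. This is a minimal Python illustration under stated assumptions, not the implementation used in this paper: it uses the sample data of Table 3, assumes structural equivalence (block types null and complete) with k = h = 3, forbids empty clusters, and accepts the first improving move. All function names are hypothetical.

```python
import random

ROWS, COLS = "abcdef", "ABCDE"
MATRIX = [  # sample data of Table 3
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 0, 0],
    [1, 0, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [1, 0, 0, 1, 0],
]


def objective(matrix, row_labels, col_labels, k, h):
    """Sum over all blocks of the deviations from the closer of null/complete."""
    ones = [[0] * h for _ in range(k)]
    size = [[0] * h for _ in range(k)]
    for i, row in enumerate(matrix):
        for j, cell in enumerate(row):
            ones[row_labels[i]][col_labels[j]] += cell
            size[row_labels[i]][col_labels[j]] += 1
    return sum(
        min(ones[a][b], size[a][b] - ones[a][b]) for a in range(k) for b in range(h)
    )


def random_labels(n, n_clusters, rng):
    """Random cluster labels in which every cluster is non-empty."""
    labels = list(range(n_clusters))
    labels += [rng.randrange(n_clusters) for _ in range(n - n_clusters)]
    rng.shuffle(labels)
    return labels


def neighbours(labels, n_clusters):
    """Clusterings reachable by one shift or one exchange."""
    n = len(labels)
    for i in range(n):  # shift vertex i, unless its cluster would become empty
        if labels.count(labels[i]) > 1:
            for c in range(n_clusters):
                if c != labels[i]:
                    yield labels[:i] + [c] + labels[i + 1:]
    for i in range(n):  # exchange vertices i and j from different clusters
        for j in range(i + 1, n):
            if labels[i] != labels[j]:
                new = labels[:]
                new[i], new[j] = new[j], new[i]
                yield new


def local_search(matrix, k, h, rng):
    rows = random_labels(len(matrix), k, rng)
    cols = random_labels(len(matrix[0]), h, rng)
    best = objective(matrix, rows, cols, k, h)
    improved = True
    while improved:
        improved = False
        moves = [(r, cols) for r in neighbours(rows, k)]
        moves += [(rows, c) for c in neighbours(cols, h)]
        for r, c in moves:
            value = objective(matrix, r, c, k, h)
            if value < best:  # accept the first improving move
                rows, cols, best, improved = r, c, value, True
                break
    return best, rows, cols


def blockmodel(matrix, k, h, restarts=200, seed=0):
    """Repeat the local search from random starts and keep the best result."""
    rng = random.Random(seed)
    best = None
    for _ in range(restarts):
        result = local_search(matrix, k, h, rng)
        if best is None or result[0] < best[0]:
            best = result
        if best[0] == 0:  # a perfect fit cannot be improved
            break
    return best


def clusters(names, labels):
    groups = {}
    for name, label in zip(names, labels):
        groups.setdefault(label, set()).add(name)
    return {frozenset(group) for group in groups.values()}


value, row_labels, col_labels = blockmodel(MATRIX, k=3, h=3)
print(value, clusters(ROWS, row_labels), clusters(COLS, col_labels))
```

Because the sample data contain exactly three distinct row patterns and three distinct column patterns, an objective value of zero can only be reached by grouping identical rows and identical columns.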
about the structure of the relational data, i.e. the block types that are admissible as well
as the number of clusters. However, if one lacks this a priori information, approaches
have to be pursued that help to identify an ideal blockmodel and hence facilitate
exploratory blockmodeling. Brusco & Steinley (2006) present such an approach that
does not require any prior structural knowledge and that provides a basis for generating
a blockmodel for both one- and two-mode relational data. Their approach builds upon a
matrix that aims at partitioning the relational data into homogeneous blocks. Thereby,
clusters are not directly identified, but can – for instance – be derived from the
visualisation of the results. The same is true for the determination of equivalence
relations or block types of the blocks that were formed. Unfortunately, the
bound method is unsatisfactory for graphs that have more than 40 vertices, as Brusco
& Steinley note by referring to alternative, heuristic methods that might improve the
above were therefore implemented in R (R, 2007) to enable the explorative designation
adjacency matrix that consists of the row vertices V1 = {a, b, c, d, e, f} and the column
Since the structural equivalence of this adjacency matrix is to be examined, the algorithm
optimizes with reference to the (ideal) block types null and complete. Three row clusters and
three column clusters are assumed for this demonstration. (It
has to be noted that there could also be different numbers for row- and column-clusters
clustering as shown on the left-hand side of Figure 2, the algorithm produces a result
that perfectly corresponds to an ideal blockmodel for structural equivalence (right-hand
form complete blocks and hence are structurally equivalent, while the
demonstrating the approach. Larger and more complex data would make it more
difficult to find the ideal number of row- and column-clusters and would also involve a
Evidently, domains that deal with valued relational data (weighted graphs) such as
an appropriate parameter (e.g. greater than 0 or the mean weight of the edges) for the
transformation of valued to binary relational data without influencing the solution
with strictly binary data (Breiger et al., 1975) can facilitate the analysis of valued data
(significant) loss of information such as Žiberna (2007) or Weber & Denk (2007). By
develops two alternative approaches to blockmodel valued relational data. These are (i)
mode approach. It extends the equivalence relations and consequently the block type
definitions by replacing the stipulations for 1 with analogous stipulations for the
parameter m and by allowing for a function f of the cell values in a block that meets the
parameter m. Examples for these altered block type definitions can be found in Table 4;
the functions f that could be used for the valued blocks are, for instance, min(),
blockmodeling concerns the calculation of discrepancy between the empirical and the
ideal block types. Žiberna (2007) provides a distinct discrepancy criterion for each of
the altered block types. For an f-regular block, for instance, the discrepancy is derived
from the deviations of the function f evaluated for each column and each row to the
parameter m. More specifically, for each pair of a row vertex vk and a column vertex vl
discrepancy of the block is then computed as the sum of the pairwise maxima of dk. and d.l
over all pairs (vk,vl) for which the function f of the investigated row k of the investigated
column l is greater than the parameter m. In essence, these are the main adaptations of
compared to binary blockmodeling – a new issue has cropped up, viz. the determination
within a block such as the sum of squared deviations from the mean or the sum of
absolute deviations from the median. Moreover, the definition of the ideal block types
has to be adapted in a way that stresses the homogeneity (equality) of the main
characteristic of the block type in question (Table 4). This redefinition entails the
amendment of the discrepancy criterion for the block types as well. In comparison to
both f-value blockmodeling and binary blockmodeling for valued data, no additional
parameters are needed for homogeneity blockmodeling and there is virtually no loss of
blockmodeling and binary blockmodeling of valued data. Still, this method is a work in
progress as Žiberna (2007) notes. Thus, he suggests that the result of homogeneity
make use of block type definitions that allow for the mutual impact of vertices and
therefore capture the magnitude of the ‘inbound’ and ‘outbound’ effects of vertices
Žiberna’s proposals, empirical studies that evaluate the expected benefits (and
vertices can be interpreted as a set of commodities (labelled ‘a’, …, ‘j’), while the
(successor) vertices can be seen as a set of activities (labelled ‘A’, …, ‘H’). Evidently,
the edges display the valued relations between these two sets of vertices. Starting from
the valued adjacency matrix on the left-hand side, homogeneous blocks are identified by
using the sum of absolute deviations from the block median as the measure of within-block
regular block the maximum cell value should be the same for each row and column.
Experiments with different numbers of row- and column-clusters revealed the suitability
complete block was identified that is made up of three commodities with equal
importance for the block activity. The remaining blocks are max-regular with different
levels of cell values (relation weights) and discrepancies from the ideal max-regular
blocks.
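
The within-block measure used here, the sum of absolute deviations from the block median, can be illustrated on the data of Figure 3 with the following Python sketch. The row and column clusters are an assumption read off the reordered matrix in Figure 3 and are not stated explicitly in the text; the function names are hypothetical, and the more specific criterion for max-regular blocks is not implemented.

```python
from statistics import median

COLS = "ABCDEFGH"
MATRIX = {  # valued adjacency matrix of Figure 3 (before blockmodeling)
    "a": [0, 26, 0, 18, 0, 4, 0, 35],
    "b": [0, 0, 45, 3, 49, 0, 47, 0],
    "c": [0, 1, 0, 2, 44, 0, 48, 0],
    "d": [63, 5, 18, 0, 21, 0, 21, 58],
    "e": [55, 4, 17, 9, 0, 0, 18, 62],
    "f": [60, 7, 20, 5, 17, 0, 19, 56],
    "g": [33, 23, 0, 27, 0, 4, 0, 30],
    "h": [56, 8, 18, 6, 19, 0, 16, 0],
    "i": [0, 2, 53, 1, 46, 0, 44, 0],
    "j": [31, 24, 0, 22, 0, 4, 0, 34],
}

# Assumed clusters, read off the reordered matrix of Figure 3.
ROW_CLUSTERS = ["agj", "defh", "bci"]
COL_CLUSTERS = ["CEG", "F", "BD", "AH"]


def block_values(rows, cols):
    """All cell values of the block relating the given rows to the given columns."""
    return [MATRIX[r][COLS.index(c)] for r in rows for c in cols]


def abs_dev_from_median(values):
    """Sum of absolute deviations from the median (within-block variability)."""
    m = median(values)
    return sum(abs(v - m) for v in values)


for rows in ROW_CLUSTERS:
    for cols in COL_CLUSTERS:
        print(rows, cols, abs_dev_from_median(block_values(rows, cols)))
```

The block relating the commodities a, g and j to activity F contains three identical values and therefore shows no within-block variability, which agrees with the complete block described above.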
(‘symmetric’) input-output tables (for instance tables with the dimensions commodities
x commodities), i.e. the vertex set V1 can be equal to the vertex set V2 (V1 = V2) or – to
put it differently – there is only one vertex set with two differing roles per vertex.
5. Conclusion
designed to gain structural information from relational data, in order to facilitate the
and reduce the complexity of economic relations that rises with the increasing
computer systems and could therefore be a starting point for advanced algorithms, but
also the designation of blockmodels could be supported by specific routines and the
blockmodeling methods for the clustering and partitioning of relational data constitutes
an interesting branch of research that provides an attempt to deal with the complexity
Acknowledgement
I would like to express my sincere gratitude to Michaela Denk for her comprehensive
References
Batagelj, V., Ferligoj, A., & Doreian, P. (1992a) Direct and indirect methods for
Batagelj, V., Doreian, P., & Ferligoj, A. (1992b) An optimizational approach to regular
Borgatti, S.P., & Everett, M.G. (1992) Regular blockmodels of multiway, multimode
Brusco, M., & Steinley, D. (2006) Inducing a blockmodel structure of two-mode binary
468–477.
Burt, R.S. (1976) Positions in networks, Social Forces, 55, pp. 93–122.
Doreian, P., Batagelj, V., & Ferligoj, A. (1994) Partitioning networks based on
Doreian, P., Batagelj, V., & Ferligoj, A. (2004) Generalized blockmodeling of two-
Oldenbourg). In German.
Lorrain, F., & White, H.C. (1971) Structural equivalence of individuals in social
Weber, M., & Fröschl, K. (2006) Das Semantic Web als Innovation in der
White, D.R., & Reitz, K.P. (1983) Graph and semigroup homomorphisms on networks
White, H.C., Boorman, S.A., & Breiger, R.L. (1976) Social structure from multiple
[Figure 1: Hierarchy of block types: regular block, null block, row-regular block, column-regular block, row-dominant block, row-functional block, column-dominant block, column-functional block]
C1 C2 C3 C4 C5
column- row-
C1 complete regular null
regular regular
row- column-
C2 null complete null
functional regular
row-
C3 regular null null null
functional
row- row- row-
C4 null null
regular regular functional
column- row- row-
C5 null null
regular regular functional
Table 3: Sample data for two-mode blockmodeling
A B C D E
a 0 0 0 0 0
b 0 0 1 0 1
c 0 0 0 0 0
d 1 0 0 1 0
e 0 0 1 0 1
f 1 0 0 1 0
Table 4: Examples for altered block types according to Žiberna (2007)
[Block types named: row-dominant, row-functional, column-dominant, column-functional]
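[Figure 2: Adjacency matrix with the initial clustering (left; row order a, e, c, d, b, f) and the resulting blockmodel (right; row order a, c, d, f, e, b)]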
Before blockmodeling (left-hand side):
   A  B  C  D  E  F  G  H
a  0 26  0 18  0  4  0 35
b  0  0 45  3 49  0 47  0
c  0  1  0  2 44  0 48  0
d 63  5 18  0 21  0 21 58
e 55  4 17  9  0  0 18 62
f 60  7 20  5 17  0 19 56
g 33 23  0 27  0  4  0 30
h 56  8 18  6 19  0 16  0
i  0  2 53  1 46  0 44  0
j 31 24  0 22  0  4  0 34

After blockmodeling (right-hand side):
   C  E  G  F  B  D  A  H
a  0  0  0  4 26 18  0 35
g  0  0  0  4 23 27 33 30
j  0  0  0  4 24 22 31 34
d 18 21 21  0  5  0 63 58
e 17  0 18  0  4  9 55 62
f 20 17 19  0  7  5 60 56
h 18 19 16  0  8  6 56  0
b 45 49 47  0  0  3  0  0
c  0 44 48  0  1  2  0  0
i 53 46 44  0  2  1  0  0
Figure 3: Valued adjacency matrix before and after valued two-mode blockmodeling