Igann Sparse: Bridging Sparsity and Interpretability With Non-Linear Insight
Research in Progress
Abstract
Feature selection is a critical component in predictive analytics that significantly affects the prediction
accuracy and interpretability of models. Intrinsic methods for feature selection are built directly into
model learning, providing a fast and attractive option for large amounts of data. Machine learning
algorithms such as penalized regression models (e.g., the lasso) are the most common choice for
in-built feature selection. However, they fail to capture non-linear relationships, which ultimately limits
their ability to predict outcomes in intricate datasets. In this paper, we propose IGANN Sparse, a novel
machine learning model from the family of generalized additive models, which promotes sparsity through a
non-linear feature selection process during training. This ensures interpretability through improved model
sparsity without sacrificing predictive performance. Moreover, IGANN Sparse serves as an exploratory
tool for information systems researchers to unveil important non-linear relationships in domains that
are characterized by complex patterns. Our ongoing research is directed at a thorough evaluation of
the IGANN Sparse model, including user studies that assess how well users of the model can
benefit from the reduced number of features. This will allow for a deeper understanding of the interactions
between linear vs. non-linear modeling, number of selected features, and predictive performance.
1 Introduction
Predictive analytics is a crucial methodological stream of research in the field of information systems (IS)
that deals with the creation of empirical prediction models (Shmueli and Koppius, 2011). By leveraging
advanced machine learning (ML) techniques, researchers can uncover patterns and relationships within
large datasets, enabling them to anticipate future events, user behaviors, and system performance (Kühl
et al., 2021). In addition to its practical utility, predictive analytics also plays an important role in theory
development and testing, as well as relevance assessment (Shmueli and Koppius, 2011).
In recent years, the IS community has increasingly recognized the importance of ensuring that prediction
models not only provide high predictive performance, but are also comprehensible for explanation
purposes (Bauer, Zahn, and Hinz, 2023; Kim et al., 2023). There are two primary approaches
for ensuring model explainability: inherently interpretable models and post-hoc explainability methods.1
Inherently interpretable models, such as shallow decision trees, linear models, and generalized additive
1 It should be noted that the terms interpretability and explainability can have different meanings from a psychological perspective
(cf. Broniatowski, 2021). In this paper, however, we will concentrate on a technical differentiation (cf. Kraus et al., 2023).
models (GAMs) (Hastie and Tibshirani, 1986), are designed to be human-readable without requiring
additional explanation. These models often employ techniques such as linearity and sparsity to enhance
their interpretability (Rudin, 2019). On the other hand, post-hoc explainability tools like Shapley values
(Lundberg and Lee, 2017) or LIME (Ribeiro, Singh, and Guestrin, 2016) aim to approximate the complex
behavior of non-interpretable models (Esteva et al., 2019). However, these post-hoc methods should be
applied cautiously, as they may not fully capture the intricate workings of the original models. For tabular
data, the choice between the two approaches is often guided by the marginal performance gains achievable
with complex models over interpretable ones (Rudin, 2019; Zschech et al., 2022).
GAMs and their various extensions base their final prediction on a number of independent functions that
map the input features to the output space of the model, where they are summed up to generate the final
prediction (Hastie and Tibshirani, 1986). Each of these functions usually only processes an individual
feature or a single interaction between two features, which makes it possible to visualize the function after
training and to gain insights into the effect that a feature has on the model output.
Although GAMs have proven to be very powerful, they lack an important property, which limits their
applicability in high-dimensional feature spaces (i.e., datasets characterized by a large number of input
features): they do not easily allow the creation of sparse models, i.e., models that base their predictions
on only a few input features. This is the case because GAMs typically are trained in an iterative fashion
(through backfitting or gradient boosting), where a feature is selected in each iteration to minimize the
remaining loss of the model (Hastie and Tibshirani, 1986; Nori et al., 2019). Although very powerful, this
iterative approach makes the implementation of additional sparsity constraints difficult. Consequently,
linear feature selection methods are commonly used; however, data can contain non-linear relations,
which simple models such as linear regression are unable to detect.
Only recently, novel neural network based GAMs have been proposed which do not rely on the iteration
over features (e.g., Kraus et al., 2023; Yang, Zhang, and Sudjianto, 2021). This paper extends the
Interpretable Generalized Additive Neural Network (IGANN) framework with a focus on sparsity and
interpretability. It also positions IGANN Sparse as a powerful exploratory tool for uncovering
non-linear dependencies in data that traditional GAMs may not readily reveal.
Our contributions are fourfold: First, we propose a novel approach to fast training of sparse neural networks
using extreme learning machines. Second, we incorporate this method into the IGANN model, resulting
in the introduction of IGANN Sparse.2 Third, we validate that IGANN Sparse maintains comparable
predictive accuracy to its non-sparse counterparts while significantly reducing the number of features,
thereby improving interpretability. Finally, we demonstrate the utility of IGANN Sparse in non-linear
feature selection, establishing its role in exploratory data analysis and interpretative modeling.
Our work has implications for both predictive analytics research and practice. We address the issue of
feature selection, which is a common step during pre-processing of machine learning pipelines. To achieve
this, we expand IGANN, a model capable of learning non-linear relations in data, by presenting a sparse
IGANN version. Furthermore, our approach has implications for IS researchers, as training a sparse model
for feature selection is a promising way to keep the logic of prediction models comprehensible. Therefore,
our model offers a promising tool for empirical IS studies that are concerned with popular research
questions related to predictive model building, such as predicting purchase behavior, price dynamics, user
satisfaction or technology acceptance (Kühl et al., 2021; Shmueli and Koppius, 2011).
The remainder of this paper is structured as follows: In Section 2, we review related work. We propose
the new sparse model in Section 3. We evaluate this model in the experiments described in Section 4,
present the results in Section 5, and discuss them in Section 6.
2 Related Work
When dealing with high-dimensional data, it is often beneficial to use methods that decrease the number
of features impacting the final prediction. Sparsity improves model interpretability because the human
mind is not capable of processing the number of information units (i.e., features) that an ML model can
process at a time. This limitation lies around 7 ± 2 information units (Miller, 1956; Rudin, 2019). There
are a number of techniques to achieve this, such as compression, which decreases the model size;
principal component analysis (PCA), which projects the data onto mutually orthogonal components that
capture the most information in the fewest dimensions; and feature selection (Gui et al., 2017). While compression and
PCA lead to model inputs which humans are unable to comprehend intuitively, feature selection methods
simply reduce the number of features that an ML model bases its prediction on. This allows researchers
and decision-makers to fully comprehend the model behavior (Gui et al., 2017).
One way of determining the relevance of a feature is to use a linear model to estimate its predictive
power with regard to the target variable. The lasso, used as a feature selector, does this by fitting a linear
regression with L1 regularization (Hastie, Tibshirani, et al., 2009). Thereby, coefficients are pushed to
zero, resulting in a sparse model in which most of the input features can be discarded. These methods,
however, fall short when features naturally have a non-linear effect. For instance, in the context
of health analytics, neither a body temperature that is too low nor one that is too high is healthy; linear
feature selectors can thus easily ignore such an oftentimes powerful feature.
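As a toy illustration of this failure mode (our own example, not from the cited work): a symmetric, U-shaped effect such as body temperature yields a near-zero linear correlation with the target, so a lasso selector discards the feature.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n = 1000
temp = rng.normal(37.0, 1.0, n)        # body temperature, U-shaped effect
other = rng.normal(0.0, 1.0, n)        # second, irrelevant feature
y = (temp - 37.0) ** 2 + 0.1 * rng.normal(size=n)  # risk rises in both directions

X = np.column_stack([temp, other])
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize as usual before the lasso

selector = Lasso(alpha=0.1).fit(X, y)
print(selector.coef_)  # the temperature coefficient is (near) zero
```

Despite temperature being the only truly predictive feature, its L1-penalized coefficient collapses to (near) zero, so it would be dropped.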
Increases in the complexity of neural networks and of models such as gradient-boosted decision trees have
improved predictive power at the expense of interpretability, leading to so-called black-box models. To
tackle this challenge, recently proposed methods have kept some of the core innovations from black-box
models, but altered core parts to obtain fully interpretable models, such as GAMs (e.g., Kraus et al., 2023;
Nori et al., 2019).
IGANN, a novel ML model belonging to the GAM family, fits shape functions using a boosted ensemble
of neural networks, where each network represents an extreme learning machine (ELM), as illustrated in
Figure 1. ELMs are simple feed-forward neural networks that use a faster learning method than gradient-
based algorithms (Huang, Q.-Y. Zhu, and Siew, 2006). In detail, the training only includes updating the
weights of the output layer and, thus, is equal to fitting a linear model. Overall, IGANN has been shown to
produce smooth shape functions that can be easily comprehended by users. Furthermore, the way in which
IGANN trains the networks makes it an interesting choice to introduce model sparsity in a non-linear
fashion, which we describe in the following.
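A minimal sketch of the ELM idea (for brevity, a generic ELM; IGANN's ELMs additionally process each feature in a separate sub-network, as shown in Figure 1): the hidden weights are drawn at random and frozen, so training reduces to fitting a linear model on the hidden activations.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))               # 200 samples, 3 features
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]         # non-linear target

k = 50
W = rng.normal(size=(X.shape[1], k))        # random input weights, never trained
b = rng.normal(size=k)                      # random biases, never trained
H = np.tanh(X @ W + b)                      # fixed non-linear hidden activations

# "Training" the ELM = fitting only the output layer, a linear problem on H
out = Ridge(alpha=1.0).fit(H, y)
print(round(out.score(H, y), 3))            # high R^2 despite the non-linearity
```

Because only the last layer is fitted, training is as fast as a (regularized) linear regression while the random non-linear features still capture the non-linear relation.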
3 IGANN Sparse
As described above, IGANN uses a sequence of ELMs to compute the GAM. In the following, we
make use of this model choice to incorporate a sparsity-layer into the first ELM, which allows the model
to select the most important (potentially non-linear) features. Figure 1 illustrates this basic idea.
For a fixed number of inputs $x^{(1)}, x^{(2)}, \ldots, x^{(m)}$, the ELM maps each input onto $k$ non-linear hidden
activations, which we call $h_1^{(1)}, \ldots, h_k^{(1)}, \ldots, h_1^{(m)}, \ldots, h_k^{(m)}$. We denote the vectors that store the respective
hidden activations by $h^{(i)}$, i.e., $h^{(i)} = [h_1^{(i)}, \ldots, h_k^{(i)}]$, and let $h$ represent all hidden activations, i.e., $h =
[h^{(1)}, \ldots, h^{(m)}]$.
In the traditional IGANN model, the ELM is now trained by solving the linear problem
$$\min_{\beta} \; L\big(h^{\top} \beta, \, y\big), \qquad (1)$$
where $\beta$ denotes the coefficients in the last layer, $y$ the true target variable, and $L$ describes the loss
function, such as mean squared error or cross-entropy. As can be seen in Equation 1, the ELM thus merely
solves a linear problem, yet the non-linear activations allow it to capture highly non-linear effects (Kraus
et al., 2023).
Figure 1. First ELM from IGANN Sparse with three features as input which includes the sparsity-layer.
Each feature is processed by a sub-network of the whole ELM. For input x(1) , the green part of the model
highlights the corresponding sub-network.
This work makes use of this characteristic by introducing a sparsity-layer. Given the non-linear activations
$h^{(1)}, \ldots, h^{(m)}$, where each $h^{(i)}$ represents a block of $k$ values, it is critical to ensure that the model remains
interpretable and avoids overfitting by using only the most important blocks of activations. To achieve
this, we introduce a sparsity-inducing step using the best-subset selection approach (J. Zhu et al., 2022).
Considering each block h(i) as an individual subset, the best-subset selection aims to find the subset
of blocks which, when used in the ELM model, results in the optimal balance between model fit and
complexity. Mathematically, the problem can be extended from Equation 1 as
$$\min_{S, \, \beta_S} \; L\big(h_S^{\top} \beta_S, \, y\big), \qquad (2)$$
where $S$ denotes the selected subset of blocks from $h^{(1)}, \ldots, h^{(m)}$, and $h_S$ represents the hidden activations
corresponding to this subset. The objective is to minimize the Bayesian Information Criterion (BIC), which
is defined in its standard form as
$$\mathrm{BIC} = |S| \, k \, \ln(n) - 2 \ln(\hat{L}),$$
where $n$ denotes the number of samples, $|S| \, k$ the number of active coefficients, and $\hat{L}$ the maximized
likelihood of the model restricted to $h_S$.
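To make the selection step concrete, the following is a hedged sketch that uses a simple greedy forward search as a stand-in for the best-subset solver of J. Zhu et al. (2022); the block structure and the Gaussian form of the BIC follow the description above, while the function names and the greedy strategy are our simplification, not the actual IGANN Sparse implementation.

```python
import numpy as np

def gaussian_bic(H_S, y, n_params):
    """Gaussian BIC: n * ln(RSS / n) + n_params * ln(n)."""
    n = len(y)
    if H_S.shape[1] == 0:
        rss = np.sum((y - y.mean()) ** 2)
    else:
        beta, *_ = np.linalg.lstsq(H_S, y, rcond=None)
        rss = np.sum((y - H_S @ beta) ** 2)
    return n * np.log(rss / n) + n_params * np.log(n)

def greedy_block_selection(blocks, y):
    """Forward-select whole blocks of hidden activations by BIC improvement."""
    k = blocks[0].shape[1]
    selected, remaining = [], list(range(len(blocks)))

    def bic_of(idx):
        H_S = np.hstack([blocks[i] for i in idx]) if idx else np.empty((len(y), 0))
        return gaussian_bic(H_S, y, len(idx) * k)

    best = bic_of(selected)
    while remaining:
        cand = min(remaining, key=lambda i: bic_of(selected + [i]))
        if bic_of(selected + [cand]) >= best:
            break  # no block improves the BIC any further
        selected.append(cand)
        remaining.remove(cand)
        best = bic_of(selected)
    return sorted(selected)

# Toy data: feature 1 acts quadratically, feature 2 linearly, feature 3 is noise.
rng = np.random.default_rng(0)
n = 300
x1, x2, x3 = rng.normal(size=(3, n))
y = x1 ** 2 + x2 + 0.3 * rng.normal(size=n)
# One block of k=5 random tanh activations per feature, as in an ELM.
blocks = [np.tanh(np.outer(x, rng.normal(size=5)) + rng.normal(size=5))
          for x in (x1, x2, x3)]
sel = greedy_block_selection(blocks, y)
print(sel)  # the blocks for x1 and x2 should appear
```

Because selection operates on whole blocks of activations rather than individual coefficients, a feature is either fully in or fully out of the model, which is what yields non-linear feature selection.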
4 Experiment Design
4.1 Datasets and Pre-Processing
Our experiments are based on common, publicly available benchmark datasets presented in Table 1. The
number of categorical features ("cat") in this table is measured after one-hot encoding. For pre-processing,
we removed columns such as IDs as well as categorical features with more than 25 distinct values, as is done
in similar experiments (e.g., Zschech et al., 2022). Furthermore, the standard scaler from scikit-learn
is used for all numerical features. The data is split using 5-fold cross-validation to evaluate the model's
performance.
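The described pre-processing can be sketched as follows (the toy dataset and its column names are placeholders, not one of the benchmark datasets from Table 1):

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

def drop_unusable(df, max_card=25):
    """Drop ID-like columns and categoricals with more than max_card values."""
    drop = [c for c in df.columns if c.lower() == "id"]
    for c in df.select_dtypes(include="object").columns:
        if df[c].nunique() > max_card:
            drop.append(c)
    return df.drop(columns=drop)

rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "id": np.arange(n),                       # ID column -> removed
    "age": rng.normal(40.0, 10.0, n),         # numerical feature -> scaled
    "plan": rng.choice(["a", "b", "c"], n),   # low-cardinality categorical
    "note": [f"note-{i}" for i in range(n)],  # >25 distinct values -> removed
})
y = (df["age"] > 40.0).astype(int)

X = drop_unusable(df)
pre = ColumnTransformer([
    ("num", StandardScaler(), ["age"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["plan"]),
])
model = Pipeline([("pre", pre), ("clf", LogisticRegression())])
scores = cross_val_score(model, X, y, cv=5)  # 5-fold cross-validation
print(scores.mean())
```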
Classification
Dataset                                  Samples   num   cat
college (Mukti, 2022)                      1,000     4    10
churn (IBM, 2019)                          7,043     3    37
credit (Fair Isaac Corporation, 2018)     10,459    21    16
income (Kohavi, 1996)                     32,561     6    59
bank (Moro, Cortez, and Rita, 2014)       45,211     6    41
airline (Klein, 2020)                    103,904    18     6
recidivism (Angwin et al., 2016)           7,214     7     4

Regression
Dataset                                  Samples   num   cat
bike (Fanaee-T and Gama, 2014)            17,379     7     5
wine (Cortez et al., 2009)                 4,898    11     0
productivity (Imran et al., 2019)          1,197     9    26
insurance (Lantz, 2015)                    1,338     3     6
crimes (Redmond and Baveja, 2002)          1,994   100     0
farming (Sidhu, 2021)                      3,893     7     3
house (Pace and Barry, 1997)              20,640     8     0

Table 1. Overview of selected datasets covering classification (y ∈ {0, 1}) and regression (y ∈ R) tasks.
Samples describes the number of observations (rows) recorded in the dataset. Numerical (num) and
categorical (cat) features give the number of input columns representing numerical or categorical values,
respectively. Cat features are counted after one-hot encoding.
4.2 Experiments
Our first experiment compares the prediction quality of our sparse model versus an unconstrained IGANN
model. This experiment assesses the trade-off between the quality of prediction and the level of sparsity.
Our second experiment tests the performance of the IGANN Sparse model for feature selection by
comparing it to a lasso model. The main evaluation metrics are the number of selected features, as well
as the area under the receiver operating characteristic curve (AUROC) for classification and the root
mean squared error (RMSE) for regression.
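These two metrics can be computed with scikit-learn as follows (the numbers are toy values for illustration, not results from our experiments):

```python
import numpy as np
from sklearn.metrics import mean_squared_error, roc_auc_score

# Classification: AUROC from predicted probabilities (closer to 1 is better).
y_true_clf = np.array([0, 0, 1, 1])
y_prob = np.array([0.1, 0.4, 0.35, 0.8])
auroc = roc_auc_score(y_true_clf, y_prob)
print(auroc)  # 0.75: three of the four positive/negative pairs are ranked correctly

# Regression: RMSE (lower is better).
y_true_reg = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])
rmse = np.sqrt(mean_squared_error(y_true_reg, y_pred))
print(rmse)
```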
Both experiments are repeated 20 times with different random states for both the data split and model
training, in order to obtain a statistical distribution of the results while maintaining reproducibility. The
statistical analysis is done using a Wilcoxon signed-rank test (Neuhäuser, 2011). The test assumes the
null hypothesis of similar model performance. For the comparison in experiment one, we consider model
performance similar within a tolerance of one standard deviation, as a sparser model with comparable
performance is preferable in fields where comprehensibility is required (Rudin, 2019). The statistical
analysis for the second experiment, comparing IGANN Sparse and the lasso as feature selectors, is
conducted without tolerance.
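A minimal sketch of this paired test with SciPy, using synthetic scores rather than our actual results:

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)
# 20 repetitions x 5 folds = 100 paired scores per model (synthetic here)
scores_full = rng.normal(0.85, 0.01, size=100)
scores_sparse = scores_full + rng.normal(0.005, 0.003, size=100)

# H0: both models perform similarly; a small p-value rejects H0
stat, p_value = wilcoxon(scores_sparse, scores_full)
print(p_value)
```

The test is paired, i.e., it compares the two models run by run on the same splits, which matches the repeated-runs design described above.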
5 Results
Table 2 shows the performance of IGANN Full and IGANN Sparse across 20 runs with 5-fold cross-
validation. For both experiments, the results of the Wilcoxon tests are highlighted in the tables. In only
three cases did the sparse model select 75 % or more of the datasets' total features; in one case, it selected
as few as 4 %.
The sparse model is considered superior if it performs within one standard deviation of its non-sparse
counterpart. Under this criterion, the comparison between the predictive performance of IGANN Full and
IGANN Sparse shows that IGANN Sparse is significantly better for 10 out of 14 datasets at p ≤ 0.01, and
for 11 out of 14 datasets at p ≤ 0.05. Each of the statistical tests is based on 100 observations.
Figure 2 exemplarily shows the performance for two datasets (college and credit) for varying numbers of
selected features. Already with as few as four features, we find very promising predictive performance,
which does not further improve when using more features.
As a feature selector, IGANN Sparse performed better than the lasso in 9 out of 14 cases. In three of our
seven classification datasets and in two of our seven regression datasets, the lasso performed better than
IGANN Sparse. This makes IGANN Sparse the better feature selector in the majority of datasets for both
classification and regression tasks.
Classification
Dataset      IGANN Full        IGANN Sparse         # Features
             AUROC ± SD        AUROC ± SD
college      0.863 ± 0.022     0.852 ± 0.025∗∗      72.0 %
churn        0.722 ± 0.012     0.711 ± 0.013∗∗      51.5 %
credit       0.731 ± 0.009     0.725 ± 0.016∗∗      44.8 %
income       0.775 ± 0.006∗∗   0.706 ± 0.024        51.1 %
bank         0.587 ± 0.005     0.584 ± 0.006∗∗      90.1 %
airline      0.933 ± 0.002∗∗   0.929 ± 0.002        64.6 %
recidivism   0.685 ± 0.010     0.680 ± 0.014∗∗      51.4 %

Regression
Dataset        IGANN Full        IGANN Sparse         # Features
               RMSE ± SD         RMSE ± SD
bike           0.766 ± 0.006     0.768 ± 0.007∗∗      80.4 %
wine           0.901 ± 0.015     0.914 ± 0.016∗       34.8 %
productivity   0.896 ± 0.032∗∗   0.960 ± 0.038        62.3 %
insurance      0.706 ± 0.020     0.707 ± 0.020∗∗      68.0 %
crimes         0.771 ± 0.026     0.778 ± 0.026∗∗       4.0 %
farming        0.815 ± 0.020     0.822 ± 0.020∗∗      56.0 %
house          0.733 ± 0.006     0.735 ± 0.005∗∗      78.6 %
Table 2. Performance comparison on the classification and regression datasets, showing AUROC and
RMSE results with standard deviations (SD) of the standard IGANN model compared to the sparse model,
together with the average percentage of features selected out of the input features described in Table 1,
including both categorical and numerical features. For classification, values closer to 1 are better; for
regression, lower values are better. Significant differences according to the Wilcoxon signed-rank test are
marked with ∗∗ (p ≤ 0.01) and ∗ (p ≤ 0.05) next to the respectively better model. We consider the sparser
model to be better if it performs within one standard deviation of the full model, due to its easier
comprehensibility.