
IGANN SPARSE: BRIDGING SPARSITY AND

INTERPRETABILITY WITH NON-LINEAR INSIGHT

Research in Progress

Stoecker, Theodor Felix, FAU Erlangen-Nürnberg, Nürnberg, Germany, theo.stoecker@fau.de


Hambauer, Nico, FAU Erlangen-Nürnberg, Nürnberg, Germany, nico.hambauer@fau.de
Zschech, Patrick, Leipzig University, Leipzig, Germany, patrick.zschech@uni-leipzig.de
Kraus, Mathias, University of Regensburg, Regensburg, Germany, mathias.kraus@ur.de
arXiv:2403.11363v1 [cs.LG] 17 Mar 2024

Abstract
Feature selection is a critical component in predictive analytics that significantly affects the prediction
accuracy and interpretability of models. Intrinsic methods for feature selection are built directly into
model learning, providing a fast and attractive option for large amounts of data. Machine learning
algorithms such as penalized regression models (e.g., lasso) are the most common choice when it comes
to built-in feature selection. However, they fail to capture non-linear relationships, which ultimately affects
their ability to predict outcomes in intricate datasets. In this paper, we propose IGANN Sparse, a novel
machine learning model from the family of generalized additive models, which promotes sparsity through a
non-linear feature selection process during training. This ensures interpretability through improved model
sparsity without sacrificing predictive performance. Moreover, IGANN Sparse serves as an exploratory
tool for information systems researchers to unveil important non-linear relationships in domains that
are characterized by complex patterns. Our ongoing research is directed at a thorough evaluation of
the IGANN Sparse model, including user studies that allow us to assess how well users of the model can
benefit from the reduced number of features. This will allow for a deeper understanding of the interactions
between linear vs. non-linear modeling, number of selected features, and predictive performance.

Keywords: Machine Learning, Explainable AI, Model Interpretability, Model Sparsity

1 Introduction
Predictive analytics is a crucial methodological stream of research in the field of information systems (IS)
that deals with the creation of empirical prediction models (Shmueli and Koppius, 2011). By leveraging
advanced machine learning (ML) techniques, researchers can uncover patterns and relationships within
large datasets, enabling them to anticipate future events, user behaviors, and system performance (Kühl
et al., 2021). In addition to its practical utility, predictive analytics also plays an important role in theory
development and testing, as well as relevance assessment (Shmueli and Koppius, 2011).
In recent years, the IS community has increasingly recognized the importance of ensuring that prediction
models not only provide high predictive performance, but are also comprehensible for explanation
purposes (Bauer, Zahn, and Hinz, 2023; Kim et al., 2023). There are basically two primary approaches
for ensuring model explainability: inherently interpretable models and post-hoc explainability methods.1
Inherently interpretable models, such as shallow decision trees, linear models, and generalized additive
models (GAMs) (Hastie and Tibshirani, 1986), are designed to be human-readable without requiring
additional explanation. These models often employ techniques like linearity and sparsity to enhance
their interpretability (Rudin, 2019). On the other hand, post-hoc explainability tools like Shapley values
(Lundberg and Lee, 2017) or LIME (Ribeiro, Singh, and Guestrin, 2016) aim to approximate the complex
behavior of non-interpretable models (Esteva et al., 2019). However, these post-hoc methods should be
applied cautiously, as they may not fully capture the intricate workings of the original models. For tabular
data, the choice between the two approaches is often guided by the marginal performance gains achievable
with complex models over interpretable ones (Rudin, 2019; Zschech et al., 2022).

1 It should be noted that the terms interpretability and explainability can have different meanings from a psychological perspective
(cf. Broniatowski, 2021). In this paper, however, we will concentrate on a technical differentiation (cf. Kraus et al., 2023).

Thirty-Second European Conference on Information Systems (ECIS 2024), Paphos, Cyprus 1

Stoecker et al. / IGANN Sparse

GAMs and their various extensions base their final prediction on a number of independent functions that
map the input features to the output space of the model, where they are summed up to generate the final
prediction (Hastie and Tibshirani, 1986). Each of these functions usually processes only an individual
feature or a single interaction between two features, which makes it possible to visualize the function after
training and to gain insights into the effect that a feature has on the model output.
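The additive structure described above can be illustrated with a minimal sketch. The shape functions, feature names, and intercept below are hypothetical and hand-written for illustration; in a real GAM they would be learned from data.

```python
import math

# Hypothetical shape functions, one per feature (illustrative, not learned).
shape_functions = {
    "temperature": lambda t: (t - 37.0) ** 2,            # U-shaped effect
    "age": lambda a: 0.02 * a,                           # linear effect
    "heart_rate": lambda r: math.log1p(abs(r - 70.0)),   # saturating effect
}
INTERCEPT = -1.0  # assumed constant offset


def gam_predict(sample):
    """Return the additive prediction and each feature's contribution."""
    contributions = {
        name: f(sample[name]) for name, f in shape_functions.items()
    }
    return INTERCEPT + sum(contributions.values()), contributions


prediction, parts = gam_predict(
    {"temperature": 39.0, "age": 50, "heart_rate": 70.0}
)
print(prediction, parts)
```

Because the prediction is a plain sum, each entry of `parts` can be read, or plotted over a range of inputs, as the isolated effect of one feature.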
Although GAMs have proven to be very powerful, they lack an important property that limits their
applicability to high-dimensional feature spaces (i.e., datasets characterized by a large number of input
features): They do not easily allow the creation of sparse models, i.e., models that base their predictions
on only a few input features. This is the case because GAMs typically are trained in an iterative fashion
(through backfitting or gradient boosting), where a feature is selected in each iteration to minimize the
remaining loss of the model (Hastie and Tibshirani, 1986; Nori et al., 2019). Although very powerful, this
iterative approach makes the implementation of additional sparsity constraints difficult. Consequently,
linear feature selection methods are commonly used; however, data can contain non-linear relations, which
simple models such as linear regression are unable to detect.
Only recently, novel neural network based GAMs have been proposed which do not rely on the iteration
over features (e.g., Kraus et al., 2023; Yang, Zhang, and Sudjianto, 2021). This paper extends the
Interpretable Generalized Additive Neural Network (IGANN) framework with a focus on sparsity and
interpretability. It also positions the IGANN Sparse model as a powerful exploratory tool for exploring
non-linear dependencies in data that traditional GAMs may not readily uncover.
Our contributions are fourfold: First, we propose a novel approach to fast training of sparse neural networks
using extreme learning machines. Second, we incorporate this method into the IGANN model, resulting
in the introduction of IGANN Sparse.2 Third, we validate that IGANN Sparse maintains comparable
predictive accuracy to its non-sparse counterparts while significantly reducing the number of features,
thereby improving interpretability. Finally, we demonstrate the utility of IGANN Sparse in non-linear
feature selection, establishing its role in exploratory data analysis and interpretative modeling.
Our work has implications for both predictive analytics research and practice. We address the issue of
feature selection, which is a common step during pre-processing of machine learning pipelines. To achieve
this, we expand IGANN, a model capable of learning non-linear relations in data, by presenting a sparse
IGANN version. Furthermore, our approach has implications for IS researchers, as training a sparse model
for feature selection is a promising way to keep the logic of prediction models comprehensible. Therefore,
our model offers a promising tool for empirical IS studies that are concerned with popular research
questions related to predictive model building, such as predicting purchase behavior, price dynamics, user
satisfaction or technology acceptance (Kühl et al., 2021; Shmueli and Koppius, 2011).
The remainder of the paper is structured as follows: In Section 2, we introduce related work. We propose
the new sparse model in Section 3. We test this model in the experiments described in Section 4,
followed by a presentation and discussion of the results in Sections 5 and 6, respectively.

2 Implementation on GitHub: https://github.com/MathiasKraus/igann


2 Conceptual Background and Related Work


2.1 Sparse Prediction Models

When dealing with high-dimensional data, it is often beneficial to use methods that decrease the number
of features impacting the final prediction. Sparsity improves model interpretability because the human
mind is not capable of processing the number of information units (i.e., features) that an ML model can
process at a time. This limit lies at around 7 ± 2 information units (Miller, 1956; Rudin, 2019). There
are a number of techniques, such as compression, where the model size is decreased; principal component
analysis (PCA), which finds mutually orthogonal vectors in order to represent the most information in
the fewest features; and feature selection (Gui et al., 2017). While compression and
PCA lead to model inputs which humans are unable to comprehend intuitively, feature selection methods
simply reduce the number of features that an ML model bases its prediction on. This allows researchers
and decision-makers to fully comprehend the model behavior (Gui et al., 2017).
One way of determining the relevance of a feature is to use a linear model to determine its predictive
power with regard to the target variable. Lasso, used as a feature selector, does this by fitting a linear
regression with L1 regularization (Hastie, Tibshirani, et al., 2009). Thereby, coefficients are pushed to
zero, resulting in a sparse model in which most of the input features can be discarded. These methods,
however, fall short in cases where features naturally have a non-linear effect. For instance, in the context
of health analytics, a body temperature that is either too low or too high is unhealthy; thus, linear feature
selectors can easily overlook the importance of such an often powerful feature.
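The body temperature example can be made concrete with a small sketch. It uses the Pearson correlation as a simple proxy for the signal a linear selector can see; the temperature grid and the quadratic risk score are assumptions made purely for illustration and do not come from any dataset in this paper.

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Synthetic body temperatures from 35.0 to 39.0 degrees Celsius (assumed grid).
temperature = [35.0 + 0.1 * i for i in range(41)]
# Hypothetical risk score: grows with the squared distance from 37 degrees.
risk = [(t - 37.0) ** 2 for t in temperature]

# Signal visible to a linear model on the raw feature.
linear_signal = pearson(temperature, risk)
# Signal after a non-linear transformation of the same feature.
nonlinear_signal = pearson([abs(t - 37.0) for t in temperature], risk)
print(f"raw feature: {linear_signal:.3f}, |t - 37|: {nonlinear_signal:.3f}")
```

The raw feature shows essentially no linear association with the target even though the target is fully determined by it, which is the situation in which a linear selector would tend to discard the feature.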

2.2 Interpretable Generalized Additive Neural Network (IGANN)

Increases in the complexity of neural networks and of models such as gradient boosted decision trees have
improved predictive power at the expense of interpretability, leading to so-called black-box models. To
tackle this challenge, recently proposed methods have kept some of the core innovations from black-box
models, but altered core parts to obtain fully interpretable models, such as GAMs (e.g., Kraus et al., 2023;
Nori et al., 2019).
IGANN, a novel ML model belonging to the GAM family, fits shape functions using a boosted ensemble
of neural networks, where each network represents an extreme learning machine (ELM), as illustrated in
Figure 1. ELMs are simple feed-forward neural networks that use a faster learning method than gradient-
based algorithms (Huang, Q.-Y. Zhu, and Siew, 2006). In detail, the training only includes updating the
weights of the output layer and, thus, is equal to fitting a linear model. Overall, IGANN has been shown to
produce smooth shape functions that can be easily comprehended by users. Furthermore, the way in which
IGANN trains the networks makes it an interesting choice to introduce model sparsity in a non-linear
fashion, which we describe in the following.
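The ELM training principle described above (random hidden layer, only the output weights are fitted) can be sketched with NumPy as follows. The tanh activation, the normal initialization, the number of hidden units, and the toy target are assumptions for illustration; the actual IGANN implementation, including its boosting procedure, may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression task with a purely non-linear effect: y = x^2.
x = np.linspace(-2.0, 2.0, 200)
y = x ** 2

# Hidden layer: k random weights and biases, drawn once and never trained.
k = 20
w = rng.normal(size=k)
b = rng.normal(size=k)
hidden = np.tanh(np.outer(x, w) + b)          # shape (200, k)

# Training the ELM amounts to ordinary least squares on the hidden activations.
design = np.column_stack([np.ones_like(x), hidden])
beta, *_ = np.linalg.lstsq(design, y, rcond=None)
elm_mse = float(np.mean((design @ beta - y) ** 2))

# Baseline: a plain linear model on the raw input.
linear_design = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(linear_design, y, rcond=None)
linear_mse = float(np.mean((linear_design @ coef - y) ** 2))

print(f"ELM MSE: {elm_mse:.4f}, linear MSE: {linear_mse:.4f}")
```

Both fits use the same least-squares solver; the only difference is that the ELM solves the linear problem on non-linear activations instead of on the raw input.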

3 IGANN Sparse
As described above, IGANN uses a sequence of ELMs to compute the GAM. In the following, we
make use of this model choice to incorporate a sparsity-layer in the first ELM, which allows the model to select the
most important features, including those with non-linear effects. Figure 1 illustrates this basic idea.
For a fixed number of inputs $x^{(1)}, x^{(2)}, \dots, x^{(m)}$, the ELM maps each input onto $k$ non-linear hidden
activations, which we call $h_1^{(1)}, \dots, h_k^{(1)}, \dots, h_1^{(m)}, \dots, h_k^{(m)}$. We denote the vectors that store the respective
hidden activations by $h^{(i)}$, i.e., $h^{(i)} = [h_1^{(i)}, \dots, h_k^{(i)}]$, and let $h$ represent all hidden activations, i.e.,
$h = [h^{(1)}, \dots, h^{(m)}]$.
In the traditional IGANN model, the ELM is now trained by solving the linear problem

$$\min_{\beta} \; \mathcal{L}(y, \beta^T h) \tag{1}$$


where $\beta$ denotes the coefficients in the last layer, $y$ the true target variable, and $\mathcal{L}$ describes the loss
function, such as mean squared error or cross-entropy. As can be seen in Equation 1, the ELM thus merely
solves a linear problem, yet the non-linear activations allow it to capture highly non-linear effects (Kraus
et al., 2023).

Figure 1. First ELM from IGANN Sparse with three features as input, including the sparsity-layer.
Each feature is processed by a sub-network of the whole ELM. For input $x^{(1)}$, the green part of the model
highlights the corresponding sub-network.

This work makes use of this characteristic by introducing a sparsity-layer. Given the non-linear activations
$h^{(1)}, \dots, h^{(m)}$, where each $h^{(i)}$ represents a block of $k$ values, it is critical to ensure that the model remains
interpretable and avoids overfitting by using only the most important blocks of activations. To achieve
this, we introduce a sparsity-inducing step using the best-subset selection approach (J. Zhu et al., 2022).
Considering each block $h^{(i)}$ as an individual subset, best-subset selection aims to find the subset
of blocks which, when used in the ELM model, results in the optimal balance between model fit and
complexity. Mathematically, the problem can be extended from Equation 1 as

$$\min_{\beta, S} \; \mathcal{L}(y, \beta^T h_S), \tag{2}$$

where $S$ denotes the selected subset of blocks from $h^{(1)}, \dots, h^{(m)}$, and $h_S$ represents the hidden activations
corresponding to this subset. The objective is to minimize the Bayesian Information Criterion (BIC), which
is defined as

$$\mathrm{BIC} = |S| \ln(n) - 2 \ln(\hat{L}), \tag{3}$$


where $S$ is the set of selected blocks, $|S|$ is the number of blocks (corresponding to features) selected by
the model, $n$ is the number of observations, and $\hat{L}$ is the observed value of the negative loss function for
the model. By using BIC in the selection process, only the blocks that add significant explanatory power
to the model’s predictions are retained, resulting in a sparser and more interpretable model.
With this approach, the majority of blocks in $h$ can be set to zero, leaving only those blocks that contribute
most significantly to the model’s predictive power. This results in a sparse representation of the hidden
activations.
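The block-wise selection in Equations 2 and 3 can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: exhaustive enumeration stands in for the dedicated best-subset solver cited above and is only feasible for a handful of features, the Gaussian log-likelihood (up to an additive constant) is assumed for $\ln(\hat{L})$, and the synthetic data, tanh activation, and block size are made up for the example.

```python
from itertools import combinations

import numpy as np

rng = np.random.default_rng(0)
n, m, k = 400, 4, 3          # observations, features, activations per feature

# Synthetic data (assumption): only features 0 and 2 influence the target.
X = rng.normal(size=(n, m))
y = X[:, 0] ** 2 + np.sin(X[:, 2]) + 0.1 * rng.normal(size=n)

# One block of k random, untrained hidden activations per feature: h^(i).
W = rng.normal(size=(m, k))
B = rng.normal(size=(m, k))
blocks = [np.tanh(np.outer(X[:, i], W[i]) + B[i]) for i in range(m)]


def bic_for_subset(subset):
    """Fit beta on the chosen blocks and return |S| ln(n) - 2 ln(L_hat)."""
    columns = [np.ones((n, 1))] + [blocks[i] for i in subset]
    h_s = np.hstack(columns)
    beta, *_ = np.linalg.lstsq(h_s, y, rcond=None)
    rss = float(np.sum((y - h_s @ beta) ** 2))
    # Gaussian log-likelihood up to an additive constant (assumption).
    log_likelihood = -0.5 * n * np.log(rss / n)
    return len(subset) * np.log(n) - 2.0 * log_likelihood


# Exhaustive search stands in for a dedicated best-subset solver.
candidates = [s for r in range(m + 1) for s in combinations(range(m), r)]
best_subset = min(candidates, key=bic_for_subset)
print("selected feature blocks:", best_subset)
```

Selecting or dropping a whole block of $k$ activations at once is what ties the sparsity to features rather than to individual hidden units. Because the hidden weights are random, uninformative blocks can occasionally survive the BIC penalty, whereas the informative blocks are expected to be retained.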

4 Experiment Design
4.1 Datasets and Pre-Processing

Our experiments are based on common, publicly available benchmark datasets presented in Table 1. The
number of categorical features (“cat”) in this table is measured after one-hot encoding. For preprocessing,
we removed columns such as IDs as well as categorical features with more than 25 distinct values, as is done
in similar experiments (e.g., Zschech et al., 2022). Furthermore, the standard scaler from scikit-learn
is used for all numerical features. The data is split using 5-fold cross-validation to evaluate the model’s
performance.
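The scaling and splitting steps can be sketched as below. To stay dependency-light, the sketch uses plain NumPy as a stand-in for scikit-learn's standard scaler and cross-validation utilities. Fitting the scaling statistics on the training folds only is an assumption, since the text does not state where the scaler is fitted, and the feature matrix is hypothetical.

```python
import numpy as np


def five_fold_indices(n_samples, seed):
    """Yield (train_idx, test_idx) pairs for a shuffled 5-fold split."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), 5)
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train_idx, test_idx


def standardize(train, test):
    """Z-score both parts using the mean and std of the training part."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std[std == 0.0] = 1.0            # guard against constant columns
    return (train - mean) / std, (test - mean) / std


# Hypothetical numerical feature matrix with 103 rows and 3 columns.
X_num = np.random.default_rng(1).normal(loc=5.0, scale=2.0, size=(103, 3))

fold_sizes = []
for train_idx, test_idx in five_fold_indices(len(X_num), seed=0):
    X_train, X_test = standardize(X_num[train_idx], X_num[test_idx])
    fold_sizes.append(len(test_idx))

print(fold_sizes)
```

Changing `seed` changes the shuffling and therefore the folds, which is one way to obtain repeated runs with different random states for the data split.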


Classification
Dataset                                 Samples   num   cat
college (Mukti, 2022)                     1,000     4    10
churn (IBM, 2019)                         7,043     3    37
credit (Fair Isaac Corporation, 2018)    10,459    21    16
income (Kohavi, 1996)                    32,561     6    59
bank (Moro, Cortez, and Rita, 2014)      45,211     6    41
airline (Klein, 2020)                   103,904    18     6
recidivism (Angwin et al., 2016)          7,214     7     4

Regression
Dataset                                 Samples   num   cat
bike (Fanaee-T and Gama, 2014)           17,379     7     5
wine (Cortez et al., 2009)                4,898    11     0
productivity (Imran et al., 2019)         1,197     9    26
insurance (Lantz, 2015)                   1,338     3     6
crimes (Redmond and Baveja, 2002)         1,994   100     0
farming (Sidhu, 2021)                     3,893     7     3
house (Pace and Barry, 1997)             20,640     8     0

Table 1. Overview of selected datasets covering classification (y ∈ {0, 1}) and regression (y ∈ ℝ) tasks.
Samples describes the number of observations recorded in the dataset (number of rows). Numerical
(num) features and categorical (cat) features are the number of input columns representing numerical or
categorical values, respectively. Categorical features are counted after one-hot encoding.

4.2 Experiments

Our first experiment compares the prediction quality of our sparse model with that of an unconstrained IGANN
model. This experiment assesses the trade-off between the quality of prediction and the level of sparsity.
Our second experiment tests the performance of the IGANN Sparse model for feature selection by
comparing it to a lasso model used as a feature selector. The main evaluation metrics are the number of
selected features, together with the area under the receiver operating characteristic curve (AUROC) for
classification and the root mean squared error (RMSE) for regression.
Both experiments are repeated 20 times with different random states for both the data split and
model training, in order to obtain a distribution of the results while maintaining their
reproducibility. The statistical analysis is done using a Wilcoxon test (Neuhäuser, 2011). The null
hypothesis (H0) of the test is that of similar model performance. For the comparison in experiment one, we consider
model performance to be similar within a tolerance of one standard deviation, as a sparser model with comparable
performance is better in fields where comprehensibility is required (Rudin, 2019). The statistical analysis
for the second experiment, which compares IGANN Sparse and lasso as feature selectors, will be conducted
without tolerance.

5 Results
Table 2 shows the performance of IGANN Full and IGANN Sparse across 20 runs with 5-fold cross-
validation. For both experiments, the results of the Wilcoxon tests are highlighted in the tables. In only
three cases did the sparse model select 75 % or more of a dataset’s total features; in one
case, it selected as few as 4 %.
The sparse model is considered superior if its performance is within one standard deviation of that of its
non-sparse counterpart. The comparison between the predictive performance of IGANN Full and IGANN Sparse
shows that for 10 out of 14 datasets IGANN Sparse is significantly better at p ≤ 0.01. Each of the statistical
tests is based on 100 observations. Moreover, IGANN Sparse is significantly better in 11 out of 14 datasets at p ≤ 0.05.
Figure 2 shows, by way of example, the performance for two datasets (college and credit) for varying numbers of
selected features. Already with as few as four features, we find very promising predictive performance,
which is not further improved by using more features.
As a feature selector, IGANN Sparse performed better than lasso in 9 out of 14 cases. In three of our seven
classification datasets and in two of our seven regression datasets, lasso performed better than IGANN
Sparse, making IGANN Sparse the better feature selector in the majority of datasets for both classification
and regression tasks.


Classification (AUROC ± SD)
Dataset       IGANN Full         IGANN Sparse       Selected features
college       0.863 ± 0.022      0.852 ± 0.025∗∗    72.0 %
churn         0.722 ± 0.012      0.711 ± 0.013∗∗    51.5 %
credit        0.731 ± 0.009      0.725 ± 0.016∗∗    44.8 %
income        0.775 ± 0.006∗∗    0.706 ± 0.024      51.1 %
bank          0.587 ± 0.005      0.584 ± 0.006∗∗    90.1 %
airline       0.933 ± 0.002∗∗    0.929 ± 0.002      64.6 %
recidivism    0.685 ± 0.010      0.680 ± 0.014∗∗    51.4 %

Regression (RMSE ± SD)
Dataset       IGANN Full         IGANN Sparse       Selected features
bike          0.766 ± 0.006      0.768 ± 0.007∗∗    80.4 %
wine          0.901 ± 0.015      0.914 ± 0.016∗     34.8 %
productivity  0.896 ± 0.032∗∗    0.960 ± 0.038      62.3 %
insurance     0.706 ± 0.020      0.707 ± 0.020∗∗    68.0 %
crimes        0.771 ± 0.026      0.778 ± 0.026∗∗    4.0 %
farming       0.815 ± 0.020      0.822 ± 0.020∗∗    56.0 %
house         0.733 ± 0.006      0.735 ± 0.005∗∗    78.6 %

Table 2. Performance comparison on classification and regression datasets, showing AUROC and RMSE
results as well as standard deviations (SD) of the standard IGANN model compared to the sparse model,
together with the average percentage of selected features out of the input features described in Table 1, including
both categorical and numerical features. For classification, values closer to 1 are better, and for
regression, lower values are better. Significant differences using the Wilcoxon signed rank test are marked
with ∗∗ (p ≤ 0.01) and ∗ (p ≤ 0.05) at the respectively better model. We consider the sparser model to be better
if it performs within one standard deviation of the full model, due to its easier comprehensibility.
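The one-standard-deviation tolerance can be illustrated on the reported means. The sketch below is a simplification: it applies the tolerance to the averaged values copied from Table 2 and uses the SD of the full model as the tolerance, whereas the reported significance markers come from Wilcoxon tests on the individual runs. The function name and data layout are hypothetical.

```python
# (dataset, full mean, full SD, sparse mean), copied from Table 2.
classification = [  # AUROC, higher is better
    ("college", 0.863, 0.022, 0.852),
    ("churn", 0.722, 0.012, 0.711),
    ("credit", 0.731, 0.009, 0.725),
    ("income", 0.775, 0.006, 0.706),
    ("bank", 0.587, 0.005, 0.584),
    ("airline", 0.933, 0.002, 0.929),
    ("recidivism", 0.685, 0.010, 0.680),
]
regression = [  # RMSE, lower is better
    ("bike", 0.766, 0.006, 0.768),
    ("wine", 0.901, 0.015, 0.914),
    ("productivity", 0.896, 0.032, 0.960),
    ("insurance", 0.706, 0.020, 0.707),
    ("crimes", 0.771, 0.026, 0.778),
    ("farming", 0.815, 0.020, 0.822),
    ("house", 0.733, 0.006, 0.735),
]


def within_one_sd(full, sd, sparse, higher_is_better):
    """True if the sparse model is at most one full-model SD worse."""
    gap = full - sparse if higher_is_better else sparse - full
    return gap <= sd


comparable, not_comparable = [], []
for rows, higher_is_better in ((classification, True), (regression, False)):
    for name, full, sd, sparse in rows:
        target = comparable if within_one_sd(full, sd, sparse, higher_is_better) else not_comparable
        target.append(name)

print(len(comparable), "within tolerance; outside:", not_comparable)
```

On the reported means, this simple check places 11 of the 14 datasets within the tolerance, with income, airline, and productivity outside it.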

[Figure 2 plot: two panels, College and Credit; x-axis: number of features selected; y-axis: AUROC]

Figure 2. Impact of the number of features selected on model performance for the college and credit
datasets as measured using the AUROC score (closer to 1 is better).

6 Discussion & Future Work


This research introduced IGANN Sparse, a novel model designed for sparsity and interpretability in
predictive analytics projects. We validated its usage as both a predictive model and a feature selector on a
variety of datasets. Despite resulting in much sparser models with as few as 4 % of the input features,
we showed that IGANN Sparse obtains competitive results in more than 75 % of our tested datasets.
Additionally, IGANN Sparse outperformed traditional feature selectors in the majority of cases. In future
work, we plan to include further state-of-the-art (black-box) models for a broader comparison, as well as
more diverse datasets with different properties. In this way, we aim to better understand our model’s merits
and limitations and explore its generalizability to different types of data, tasks, and contexts. Further, we
want to analyse the effect of including pairwise interactions to potentially improve predictive performance.
A key aspect of IGANN Sparse is its role as an exploratory tool for IS researchers, particularly capable
of revealing non-linear relationships within data, an area not fully explored by existing methods. This
capability positions IGANN Sparse not only as a predictive or feature-reducing model, but also as a means
for deeper data understanding. Future research can leverage the exploratory strengths of IGANN Sparse
in conjunction with domain-specific knowledge to validate theoretical models or to hypothesize new
relationships. This could be especially transformative in interdisciplinary IS research, where the fusion of
methodological robustness with domain expertise is essential for innovation.


Our assessment of interpretability primarily considered feature reduction from a mathematical perspective.
While this is a critical element (Wang et al., 2023), it alone does not capture the full spectrum of
model transparency. Interpretability extends beyond numerical simplicity to the model’s ability to convey
understanding and actionable insights to researchers and decision makers. To this end, we plan to conduct
user studies focused on evaluating the practical interpretability of IGANN Sparse, aiming to capture
comprehensive, qualitative feedback from domain experts on its effectiveness and applicability in real-
world scenarios. These studies are designed to explore how well different stakeholders can grasp the
underlying model mechanics, apply its predictions to complex settings, and trust its recommendations in
their professional environment.
In conclusion, our research shows that simplification in prediction models does not require compromising
performance. IGANN Sparse shows the potential of machine learning models that can achieve the critical
balance between simplicity and accuracy that is decisive to practical understanding and application. We
believe that the principles and results of this study will make a valuable contribution to the IS discourse
and inspire further research in interpretable modeling.

References
Angwin, J., J. Larson, S. Mattu, and L. Kirchner (2016). “Machine bias.” In: Ethics of Data and Analytics.
Auerbach Publications, pp. 254–264.
Bauer, K., M. von Zahn, and O. Hinz (2023). “Expl(AI)ned: The Impact of Explainable Artificial
Intelligence on Users’ Information Processing.” Information Systems Research. ISSN: 1047-7047.
Broniatowski, D. A. (2021). Psychological foundations of explainability and interpretability in artificial
intelligence. Tech. rep. NIST IR 8367. Gaithersburg, MD: National Institute of Standards and
Technology (U.S.).
Cortez, P., A. Cerdeira, F. Almeida, T. Matos, and J. Reis (2009). “Modeling wine preferences by data
mining from physicochemical properties.” Decision Support Systems 47 (4). Smart Business Networks:
Concepts and Empirical Evidence, 547–553. ISSN: 0167-9236.
Esteva, A., A. Robicquet, B. Ramsundar, V. Kuleshov, M. DePristo, K. Chou, C. Cui, G. Corrado, S.
Thrun, and J. Dean (2019). “A guide to deep learning in healthcare.” Nature Medicine 25 (1), 24–29.
ISSN: 1546-170X.
Fair Isaac Corporation (2018). Explainable Machine Learning Challenge.
Fanaee-T, H. and J. Gama (2014). “Event labeling combining ensemble detectors and background
knowledge.” Progress in Artificial Intelligence 2 (2), 113–127.
Gui, J., Z. Sun, S. Ji, D. Tao, and T. Tan (2017). “Feature Selection Based on Structured Sparsity: A
Comprehensive Study.” IEEE Transactions on Neural Networks and Learning Systems 28 (7), 1490–1507.
Hastie, T. and R. Tibshirani (1986). “Generalized Additive Models.” Statistical Science 1 (3), 297–310.
Hastie, T., R. Tibshirani, and J. H. Friedman (2009). The elements of statistical learning:
data mining, inference, and prediction. Vol. 2. Springer.
Huang, G.-B., Q.-Y. Zhu, and C.-K. Siew (2006). “Extreme learning machine: Theory and applications.”
Neurocomputing 70 (1-3), 489–501.
IBM (2019). Telco customer churn. URL: https://community.ibm.com/community/user/businessanalytics/blogs/steven-macko/2019/07/11/telco-customer-churn-1113.
Imran, A. A., M. N. Amin, M. R. I. Rifat, and S. Mehreen (2019). “Deep Neural Network Approach for
Predicting the Productivity of Garment Employees.” In: 2019 6th International Conference on Control,
Decision and Information Technologies (CoDIT). IEEE.
Kim, B. R., K. Srinivasan, S. H. Kong, J. H. Kim, C. S. Shin, and S. Ram (2023). “ROLEX: A Novel
Method for Interpretable Machine Learning Using Robust Local Explanations.” Management Information
Systems Quarterly 47 (3), 1303–1332. ISSN: 0276-7783/2162-9730.


Klein, T. (2020). Airline Passenger Satisfaction. URL: https://www.kaggle.com/datasets/teejmahal20/airline-passenger-satisfaction.
Kohavi, R. (1996). “Scaling up the Accuracy of Naive-Bayes Classifiers: A Decision-Tree Hybrid.” In:
Proceedings of the Second International Conference on Knowledge Discovery and Data Mining.
KDD’96. Portland, Oregon: AAAI Press, pp. 202–207.
Kraus, M., D. Tschernutter, S. Weinzierl, and P. Zschech (2023). “Interpretable generalized additive neural
networks.” European Journal of Operational Research. ISSN: 0377-2217.
Kühl, N., R. Hirt, L. Baier, B. Schmitz, and G. Satzger (2021). “How to Conduct Rigorous Supervised
Machine Learning in Information Systems Research: The Supervised Machine Learning Report Card.”
Communications of the Association for Information Systems 48 (1), 589–615. ISSN: 15293181.
Lantz, B. (2015). Machine learning with R: Learn how to use R to apply powerful machine learning
methods and gain an insight into real-world applications. Packt Publ.
Lundberg, S. M. and S.-I. Lee (2017). “A unified approach to interpreting model predictions.” Advances
in neural information processing systems 30.
Miller, G. A. (1956). “The magical number seven, plus or minus two: Some limits on our capacity for
processing information.” Psychological review 63 (2), 81.
Moro, S., P. Cortez, and P. Rita (2014). “A data-driven approach to predict the success of bank
telemarketing.” Decision Support Systems 62, 22–31.
Mukti, S. S. J. (2022). Go To College Dataset. URL: https://www.kaggle.com/datasets/saddamazyazy/go-to-college-dataset.
Neuhäuser, M. (2011). “Wilcoxon–Mann–Whitney Test.” In: International Encyclopedia of Statistical
Science. Ed. by M. Lovric. Berlin, Heidelberg: Springer Berlin Heidelberg, pp. 1656–1658.
Nori, H., S. Jenkins, P. Koch, and R. Caruana (2019). “InterpretML: A Unified Framework for Machine
Learning Interpretability.” arXiv: 1909.09223.
Pace, K. R. and R. Barry (1997). “Sparse spatial autoregressions.” Statistics & Probability Letters 33 (3),
291–297.
Redmond, M. and A. Baveja (2002). “A data-driven software tool for enabling cooperative information
sharing among police departments.” European Journal of Operational Research 141 (3), 660–678.
Ribeiro, M. T., S. Singh, and C. Guestrin (2016). “"Why should I trust you?" Explaining the predictions
of any classifier.” In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge
discovery and data mining, pp. 1135–1144.
Rudin, C. (2019). “Stop explaining black box machine learning models for high stakes decisions and use
interpretable models instead.” Nature Machine Intelligence 1 (5), 206–215. ISSN: 2522-5839.
Shmueli, G. and O. R. Koppius (2011). “Predictive Analytics in Information Systems Research.” MIS Quarterly
35 (3), 553. ISSN: 02767783. DOI: 10.2307/23042796.
Sidhu, G. S. (2021). Crab Age Prediction. DOI: 10.34740/KAGGLE/DSV/2834512. URL: https://www.kaggle.com/dsv/2834512.
Wang, C., B. Han, B. Patel, and C. Rudin (2023). “In Pursuit of Interpretable, Fair and Accurate Machine
Learning for Criminal Recidivism Prediction.” Journal of Quantitative Criminology 39 (2), 519–581.
ISSN: 1573-7799.
Yang, Z., A. Zhang, and A. Sudjianto (2021). “GAMI-Net: An explainable neural network based on
generalized additive models with structured interactions.” Pattern Recognition 120, 108192. ISSN:
0031-3203.
Zhu, J., X. Wang, L. Hu, J. Huang, K. Jiang, Y. Zhang, S. Lin, and J. Zhu (2022). abess: A Fast Best
Subset Selection Library in Python and R. arXiv: 2110.09697.
Zschech, P., S. Weinzierl, N. Hambauer, S. Zilker, and M. Kraus (2022). “GAM(e) Change or Not? An
Evaluation of Interpretable Machine Learning Models Based on Additive Model Constraints.” In:
Proceedings of the 30th European Conference on Information Systems (ECIS).
