This article was downloaded by: [18.9.61.111] On: 29 May 2023, At: 23:43

Publisher: Institute for Operations Research and the Management Sciences (INFORMS)
INFORMS is located in Maryland, USA

INFORMS Journal on Computing

Publication details, including instructions for authors and subscription information:
http://pubsonline.informs.org

Machine Learning Methods for Data-Driven Demand Estimation and Assortment Planning Considering Cross-Selling and Substitutions
Zhen-Yu Chen, Zhi-Ping Fan, Minghe Sun

To cite this article:

Zhen-Yu Chen, Zhi-Ping Fan, Minghe Sun (2023) Machine Learning Methods for Data-Driven Demand Estimation and Assortment Planning Considering Cross-Selling and Substitutions. INFORMS Journal on Computing 35(1):158-177. https://doi.org/10.1287/ijoc.2022.1251

Full terms and conditions of use: https://pubsonline.informs.org/Publications/Librarians-Portal/PubsOnLine-Terms-and-Conditions

This article may be used only for the purposes of research, teaching, and/or private study. Commercial use
or systematic downloading (by robots or other automatic processes) is prohibited without explicit Publisher
approval, unless otherwise noted. For more information, contact permissions@informs.org.

The Publisher does not warrant or guarantee the article’s accuracy, completeness, merchantability, fitness
for a particular purpose, or non-infringement. Descriptions of, or references to, products or publications, or
inclusion of an advertisement in this article, neither constitutes nor implies a guarantee, endorsement, or
support of claims made of that product, publication, or service.

Copyright © 2022, INFORMS

INFORMS JOURNAL ON COMPUTING
Vol. 35, No. 1, January–February 2023, pp. 158–177
https://pubsonline.informs.org/journal/ijoc ISSN 1091-9856 (print), ISSN 1526-5528 (online)

Machine Learning Methods for Data-Driven Demand Estimation and Assortment Planning Considering Cross-Selling and Substitutions
Zhen-Yu Chen (a, *), Zhi-Ping Fan (a), Minghe Sun (b)

(a) School of Business Administration, Northeastern University, Shenyang 110169, China; (b) Carlos Alvarez College of Business, The University of Texas at San Antonio, San Antonio, Texas 78249
* Corresponding author
Contact: zychen@mail.neu.edu.cn, https://orcid.org/0000-0003-4158-3938 (Z-YC); zpfan@mail.neu.edu.cn, https://orcid.org/0000-0001-6778-4637 (Z-PF); minghe.sun@utsa.edu, https://orcid.org/0000-0001-8503-9761 (MS)

Received: December 14, 2020
Revised: August 18, 2021; February 21, 2022; July 4, 2022; September 11, 2022; September 21, 2022
Accepted: September 27, 2022
Published Online in Articles in Advance: November 16, 2022
https://doi.org/10.1287/ijoc.2022.1251
Copyright: © 2022 INFORMS

Abstract. This study develops machine learning methods for the data-driven demand estimation and assortment planning problem by addressing three subproblems, that is, demand forecasting simultaneously considering cross-selling and substitutions, estimation of the cross-selling and substitution effects, and assortment optimization. These three subproblems are transformed into three sequentially related machine learning problems: collective demand forecasting, demand inference for cross-selling and substitutions, and assortment rule mining. For collective demand forecasting, related product features are introduced to consider both the cross-selling and substitution effects, and a collaborative coordinate descent method with a good convergence property is developed to make distributed demand forecasting and a global update of related product features. Using the results, demand inference adopts transfer and semisupervised learning methods to tackle the challenge of missing data in quantifying the cross-selling and substitution effects. For assortment rule mining, the assortment rules bridge the gap between prediction and optimization, and the developed heuristics obtain the best assortment using the prior knowledge discovered in demand inference. The computational results on a real-world database and a semisynthetic database show that collective demand forecasting obtained far better results than the standard demand forecasting methods and some popular graph learning methods, and the developed heuristics identified much better assortments than those obtained with the baseline methods.

History: Accepted by Ram Ramesh, Area Editor for Data Science and Machine Learning.
Funding: This work was supported by the construction base project of discipline innovation and talent
introduction plan of Chinese higher educational institutions (111 project) [Grant B16009] and the
National Natural Science Foundation of China [Grant 72031002].
Supplemental Material: The online appendices are available at https://doi.org/10.1287/ijoc.2022.1251.

Keywords: data analytics • machine learning • assortment planning • demand forecasting • data-driven optimization

1. Introduction
Assortment planning, the problem of selecting a set of products to offer to customers to maximize profit, involves a wide variety of related problems such as category management, inventory management, product line design, and allocations of online advertising spaces and social resources (Belloni et al. 2008, Kök et al. 2015). Retailers periodically update their assortments to meet seasonal requirements, follow fashion trends, and respond to changes in customer preferences to achieve competitive advantages. With the development of e-commerce, many new problems, such as category management of local and overseas warehouses for online retailers, the location and category management of groceries for online-to-offline firms, and product line design for customized products, can be tackled through assortment planning.

Despite extensive studies, some issues about the three important components of assortment planning, that is, demand modeling, substitutions, and assortment optimization, still need to be investigated. Substitutions refer to the act of selecting an alternative when the first choice of a customer is unavailable (Shin et al. 2015). Previous studies on substitutions are usually based on strong assumptions about demand models that limit the scope and applicability of the existing methods. For exogenous demand models (Smith and Agrawal 2000, Kök and Fisher 2007), as far as is known, almost all previous studies assume that the substitution probability between two products is homogeneous, that is, unchanged over time, across levels of product popularity, and under promotions and other external factors. It is necessary to extend from homogeneous to heterogeneous
substitution probabilities. Moreover, almost all previous studies assume that only one substitution is attempted if the first choice of a customer is not available, and a second substitution choice is not considered. For utility-based demand models, such as multinomial logit models (MNLs) (Mahajan and van Ryzin 2001, Chan et al. 2020), nested logit models (NLs) (Wan et al. 2018), and mixed MNLs (Şen et al. 2018), additional structural assumptions are made (Farias et al. 2013). For example, the assumptions of mutually independent utilities across products and of proportional substitution in MNLs (Mahajan and van Ryzin 2001), the improper division of nests in NLs, and improper assumptions about the distributions of parameters across individuals in mixed MNLs may prevent those models from capturing the complex and network-dependent characteristics of substitution behavior. Beyond field experiments, predictive and prescriptive analytics have the potential to provide new ways to validate the assumptions mentioned previously and to quantify the substitution effects more accurately in a general situation.

Cross-selling, one of the main customer relationship management strategies, aims at selling additional products or services related to a previous purchase (Kamakura et al. 2003). Cross-selling closely influences customers' choice decisions and firms' profits. Moreover, cross-selling and substitutions have opposite and complementary effects on the product variety in an assortment. For example, some successful e-commerce firms such as Amazon and JD.com take advantage of the long tail phenomenon across a full range of products, whereas some retailers such as Costco and VIP.com steer the substituted demands of customers toward a small range of special offerings. These two types of firms show the great potential of cross-selling and substitutions to generate profits. Furthermore, some firms try to develop a moderate category management strategy that makes tradeoffs between cross-selling and substitutions. Thus, neither cross-selling nor substitutions should be ignored in assortment decisions. However, cross-selling effects are largely neglected in studies of assortment planning. Bai et al. (2015) deployed association rule analysis to identify frequent item sets with complementary items and incorporated two measures derived from the item sets into an optimization model for the optimal product assortment. Cachon and Kök (2007) investigated the impact of cross-category interactions on assortment planning and found that assortment planning that ignores the cross-selling effect never finds optimal solutions, and the resulting assortments offer both less variety and higher prices than optimal. However, these two studies did not consider substitutions. In a later study, Feng et al. (2018) defined the concepts of substitutability and complementarity in discrete choice models and studied how these concepts are reflected in choice modeling. As far as is known, no studies have simultaneously considered cross-selling and substitutions in assortment planning.

Another issue in demand estimation and assortment planning concerns methodology. When predictive and prescriptive methods are used jointly, it is usually natural to conduct them sequentially and separately as two phases with different objectives. The maximization of expected profits by different optimization methods is independent of the preceding demand modeling, for example, likelihood maximization for exogenous demand models and MNLs. Separating these two phases leads to suboptimal policies (Liyanage and Shanthikumar 2005). Instead of the two-phase process, Ban and Rudin (2019) proposed two single-phase machine learning algorithms for the data-driven newsvendor problem. Huber et al. (2019) investigated three approaches to the data-driven newsvendor problem and presented a single-phase solution method based on quantile regression (QR). Oroojlooyjadid et al. (2020) proposed a single-phase algorithm based on deep learning that optimizes the order quantities and an (r, Q) inventory policy. However, these single-phase methods do not consider substitution, which is a fundamental assumption in assortment planning and one of the focuses of this study. How to fuse demand estimation and assortment optimization in a framework with an identical objective and/or constraints remains an open question.

The key issues that have not yet been well addressed include (1) simultaneously considering the cross-selling and substitution effects on demands in forecasting models; (2) leveraging machine learning methods to estimate the cross-selling and substitution effects without relying on strong assumptions; and (3) fusing the tasks of prediction and optimization. In this study, the three issues are addressed by transforming them into three new machine learning problems, that is, collective demand forecasting, demand inference for cross-selling and substitutions, and assortment rule mining. Demand forecasting considers the complementary and competitive relationships among products, quantified as cross-selling and substitution effects. It is formulated as a collective demand forecasting problem in which the demands of related products are introduced as features of the forecasting models to account for these relationships, and a collaborative coordinate descent method (CCDM) is developed to solve the problem. The demand inference method simultaneously quantifies the cross-selling and substitution effects by making inferences from the collective demand forecasting model and by introducing transfer and semisupervised learning methods to tackle the challenge of missing data. The assortment rules are obtained from demand inference, and the best combination of assortment rules is obtained by the proposed greedy heuristics. The sequential relationship among these three problems is illustrated in Figure 1.
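The paragraph above describes forecasting in which the demands of related products enter each product's forecasting model as features, so every forecast depends on the others. The following Python sketch illustrates only that interdependence. It is not the authors' CCDM. It assumes a linear model whose coefficients are already known, and the function `collective_forecast` and the arrays `base` and `coupling` are hypothetical names filled with made-up values. The sketch repeatedly re-forecasts all products from their neighbors' current forecasts until the forecast vector stops changing, which is a plain Jacobi-style fixed-point iteration.

```python
import numpy as np


def collective_forecast(base, coupling, tol=1e-10, max_iter=1000):
    """Jointly forecast interrelated product demands by fixed-point iteration.

    Hypothetical linear stand-in for a collective forecasting model:
        y_hat[j] = base[j] + sum_k coupling[j, k] * y_hat[k]
    base[j]:        product j's forecast from its own features only (assumed known)
    coupling[j, k]: assumed effect of product k's forecast on product j,
                    > 0 for complements (cross-selling), < 0 for substitutes
    The iteration converges when the spectral radius of `coupling` is below 1.
    """
    base = np.asarray(base, dtype=float)
    coupling = np.asarray(coupling, dtype=float)
    if np.any(np.diag(coupling) != 0):
        raise ValueError("a product should not be its own neighbor")
    y_hat = base.copy()  # start from isolated (non-collective) forecasts
    for iteration in range(1, max_iter + 1):
        # Re-forecast every product from the current forecasts of its neighbors.
        y_next = base + coupling @ y_hat
        if np.max(np.abs(y_next - y_hat)) < tol:
            return y_next, iteration
        y_hat = y_next
    raise RuntimeError("no convergence; check the spectral radius of coupling")


# Three hypothetical products: A and B are substitutes, A and C are complements.
base = [100.0, 80.0, 40.0]
coupling = [[0.0, -0.2, 0.1],
            [-0.3, 0.0, 0.0],
            [0.25, 0.0, 0.0]]
y_hat, n_iter = collective_forecast(base, coupling)
print(np.round(y_hat, 2), n_iter)
```

Because the toy model is linear, its fixed point also solves $(I - M)\hat{y} = b$ for the coupling matrix $M$ and base vector $b$, which gives a direct check. Convergence holds here because every row of absolute weights in $M$ sums to at most 0.3, so the spectral radius is below 1. In the paper's setting, the coefficients are learned rather than assumed, for example by solving the profit-driven problem in Equation (1), and the models may be nonlinear.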

Figure 1. (Color online) Framework for Data-Driven Demand Estimation and Assortment Planning Using Machine Learning Methods

This study makes the following three major contributions to the fields of demand estimation and assortment planning. (1) A novel framework is proposed for demand estimation and assortment planning using machine learning methods. This framework consists of three connected phases. Each phase makes its own distinct contribution, and all the phases share an identical objective, which corrects the decision bias (suboptimality) caused by mismatched objectives in prediction and optimization. (2) This is the first study to consider both cross-selling and substitutions in assortment planning. The complementary or competitive relationships among products are represented by positive or negative correlations of their demands. Behind these correlations, cross-selling and substitutions are two opposing forces that simultaneously affect product demands. Incorporating cross-selling and substitution effects has the potential to improve the performance of demand estimation and assortment optimization. (3) The concept of an assortment rule is introduced, and revised greedy heuristics are developed for assortment rule mining. The profit improvement under cross-selling and substitutions is adopted as a measure to decide which products to add to or drop from an existing assortment. Under some assumptions, the revised greedy heuristics can obtain the optimal solution, whereas the existing greedy heuristics usually cannot.

This study also makes the following three contributions to the fields of demand forecasting and graph learning. (1) The collective demand forecasting problem extends collective regression to implicit networks, for example, product networks, and is able to incorporate the correlations among nodes in networks. (2) Collective demand forecasting connects prediction, inference, and optimization and leads to actionable predictions and explainable assortment decisions. (3) The proposed CCDM is a new collective classification/regression method on graphs with a good convergence property. This study also contributes to the field of demand estimation considering substitutions and is the first study to introduce machine learning methods to this field. Specifically, the estimation of cross-selling and substitution effects is transformed into a demand inference problem, missing data is identified as the central difficulty in demand inference, and semisupervised and transfer learning methods are adopted to overcome this difficulty.

The rest of this paper is organized as follows. Section 2 reviews the relevant literature. Section 3 describes the development of the collective demand forecasting method. Section 4 discusses demand inference for cross-selling and substitutions. Assortment rule mining is presented in Section 5. The data and experimental design are described in Section 6. Section 7 reports the computational results and discusses the managerial implications. Conclusions are given and future research directions are outlined in Section 8. Supplemental materials are provided in the online appendices.

2. Literature Review
Most studies in assortment planning presume that the parameters in the underlying demand models are known and focus on devising optimal decisions (Rusmevichientong et al. 2010). Unlike these studies, data-driven assortment planning incorporates demand learning as an important component. Studies in data-driven assortment planning usually adopt a framework with two sequential tasks, that is, prediction and optimization. Predictive analysis methods are usually used to build deterministic demand models, for example, regression (Kök and Fisher 2007), or stochastic demand models (Smith and Agrawal 2000) to estimate the parameters (Rusmevichientong et al. 2010) or to update the parameters in known distributions (Caro and Gallien 2007, Talebian et al. 2014). Demand models for assortment planning can be categorized as exogenous demand models with deterministic demands (Kök and Fisher 2007, Fisher and Vaidyanathan 2014) and with stochastic demands (Smith and Agrawal 2000, Talebian et al. 2014, Hübner et al. 2016); utility-based models such as MNLs (Mahajan and van Ryzin 2001, Rusmevichientong et al. 2010, Topaloglu 2013, Chan et al. 2020), NLs (Wan et al. 2018), mixed MNLs (Şen et al. 2018), and multistage choice models (Liu et al. 2020); locational choice models (Gaur and Honhon 2006); and nonparametric choice models (Farias et al. 2013). This study differs from these
existing studies in two ways. First, collective demand forecasting adopts the profit maximization objective instead of the commonly used likelihood maximization or mean squared error minimization objective. Second, collective demand forecasting considers the correlations among product demands, which are linked to the estimation of the cross-selling and substitution effects, whereas exogenous demand models usually estimate the substitution probability independently.

Collective demand forecasting is related to collective classification/regression and, more broadly, to graph learning. Collective classification/regression is performed on a set of interconnected entities that form a network (Sen et al. 2008), and statistical relational learning (SRL) methods (Nickel et al. 2016, Moore and Neville 2017) have been developed for this task. SRL methods can be classified as local methods, which involve iterative computations of local classifiers using the relational and sometimes the nonrelational features, and global methods, which optimize a global objective function (Sen et al. 2008). Because they involve iterative processes, these methods face convergence problems, including cascading inference errors (Mcdowell et al. 2009), weight dependence and inability to converge for some global methods (Sen et al. 2008), and dependence on the ordering of neighbors for some local methods (Sen et al. 2008). In recent years, graph learning has become popular for capturing complex relationships in networks such as social media and the Internet of Things (Xia et al. 2021). Representative methods in graph learning include matrix factorization, random walk, and graph neural networks (Xia et al. 2021). As far as is known, almost none of the existing graph learning methods can consider the interrelated predicted demands of the products; they can only use the historical demands of related products as features to improve classification/regression performance. This study contributes to the literature by developing a novel collective regression and graph learning method, the CCDM, which has a good convergence property, to make data-driven prediction and optimization for the demand estimation and assortment planning problem.

Two types of substitutions with different mechanisms, assortment-based and stockout-based substitutions, are usually considered (Kök et al. 2015, Chan et al. 2020). Substitutions influence demand estimation and thus are usually embedded into demand models. Demand estimation methods considering substitutions vary with the demand models. Distinguishing a customer's primary and secondary demands in the observed transaction data is a difficult task. Some studies viewed this problem as a missing data problem. These studies used the expectation-maximization (EM) algorithm to iteratively estimate the latent variables and the decision variables, such as the substitution probability matrix in exogenous demand models (Kök and Fisher 2007), the preference vector and the arrival rates in MNLs (Vulcano et al. 2012), and the preference vector and the interproduct-group similarity in NLs (Lee and Eun 2016). Other methods used for demand estimation considering substitutions include maximum likelihood estimation (Fisher and Vaidyanathan 2014), Markov chain Monte Carlo simulation (Wan et al. 2018), and the least squares method minimizing the total squared errors (Kök and Fisher 2007).

The proposed demand inference bridges prediction and optimization with a basic idea different from those of the existing studies. Moreover, demand inference can simultaneously estimate the cross-selling and substitution effects.

Assortment optimization is usually formulated as mathematical programming models with high computational complexity (Hübner and Kuhn 2012, Shin et al. 2015). Researchers have developed various approaches to solve these problems (Hübner and Kuhn 2012, Shin et al. 2015, Şen et al. 2018, Chan et al. 2020). Kök et al. (2015), Hübner and Kuhn (2012), and Shin et al. (2015) provided comprehensive reviews of related topics on assortment optimization. Greedy heuristics are widely used in assortment optimization (Belloni et al. 2008). Some of these studies ranked products by their profit contributions, either considering (Kök and Fisher 2007, Fisher and Vaidyanathan 2014) or not considering (Çömez-Dolgan et al. 2021) substitutions, and others chose the popular assortment based on the purchase probabilities of the products (Çömez-Dolgan et al. 2021). Substitutions make the demands for the products nonlinearly dependent, make the profit contribution of each product difficult to measure, and make the assortment problem complex to solve optimally (Fisher and Vaidyanathan 2014). The improved greedy heuristics use profit changes, including those coming from cross-selling and substitution effects, to completely measure the profit contribution of a product, whereas the existing greedy heuristics cannot simultaneously incorporate cross-selling and substitution effects into the demand functions. This difference facilitates the analysis of optimality and enables the improved heuristics to approach the optimal solution.

Some topics in predictive and prescriptive analytics, including demand forecasting, uplift modeling, and data-driven prediction and optimization, are related to the framework and methods proposed in this study. Demand forecasting, a fundamental component of operations management, has been extensively studied, especially with the rapid development of machine learning (Feng and Shanthikumar 2018). However, machine learning methods are rarely used for demand estimation in assortment planning because estimating the substitution effect is more difficult for standard machine learning methods than for utility-based models and diverse choice models. Uplift modeling is an effective tool when combined with
machine learning methods to measure the effectiveness of management actions such as marketing campaigns, and it is widely used in advertisement and coupon targeting, customer churn management, and personalized medicine (Gubela et al. 2020, Olaya et al. 2020, Zhang et al. 2021). Demand inference is related to, but different from, uplift modeling. Demand inference focuses on the estimation of cross-selling and substitution effects, which is beyond the scope of existing uplift modeling methods. Moreover, transfer learning and semisupervised learning are introduced to deal with the missing data problem in demand inference. By contrast, uplift modeling implicitly assumes data from randomized experiments (Zhang et al. 2021), and thus the missing data problem in observational studies has not been considered in uplift modeling.

Recently, methods in data-driven prediction and optimization have become hot topics in predictive and prescriptive analytics. These methods can be categorized as joint and separated prediction and optimization, also known as single-phase and two-phase methods, respectively. Studies on joint prediction and optimization usually adopt empirical risk minimization methods (Ban and Rudin 2019, Huber et al. 2019, Oroojlooyjadid et al. 2020) to fuse prediction and optimization in a unified model. Studies such as smart prediction-then-optimization (Elmachtoub and Grigas 2021) and predictive prescription using machine learning methods as weights in sample average approximation (Bertsimas and Kallus 2020) pave new ways for separated prediction and optimization with sound theoretical properties. However, neither single-phase nor two-phase methods consider the substitution effect, and they cannot be (easily) extended to incorporate substitution. To fill this research gap, this study develops a three-phase, that is, prediction-inference-optimization, framework. This framework can, whereas the existing methods in data-driven prediction and optimization cannot, estimate the cross-selling and substitution effects in an assortment with interconnected products and make explainable assortment decisions. Moreover, the three-phase framework combines the flexibility of two-phase methods with the low decision bias of single-phase methods.

3. Demand Forecasting for Assortment Planning
Profit-driven demand forecasting without considering cross-selling and substitutions, an important phase of data-driven demand estimation and assortment planning, is presented first. A collective demand forecasting method is then proposed. The major notation used in this work is presented in Online Appendix A. Product-level data, including historical demand data with time stamps, demand-dependent data such as prices, and backordering and holding costs, are required by the proposed methods.

3.1. Profit-Driven Demand Forecasting for Assortment Planning Without Considering Cross-Selling and Substitutions
Following Gaur and Honhon (2006), Hübner and Kuhn (2012), and Smith and Agrawal (2000), the joint decision of assortment and inventory levels can be described as a capacitated expected profit maximization problem with an objective function similar to that of a newsvendor problem (Gotoh and Takano 2007). From the machine learning perspective, this problem can be written in the following form of structural risk minimization:

$$\min_{f}\; -c\,P\big(y_i, f(X_i)\big) + \Omega(f) = -\frac{c}{n}\sum_{i=1}^{n}\Big(p_i f(X_i) - (p_i + \tilde{s}_i)\big(y_i - f(X_i)\big)^{+} - h_i\big(f(X_i) - y_i\big)^{+}\Big) + \Omega(f), \qquad (1)$$

where, for observation $i$, $p_i$ is the sales margin, $\tilde{s}_i$ is the unit shortage cost, $h_i$ is the unit holding cost, $X_i$ is the feature vector derived from historical demand and demand-dependent data, $y_i$ is the true demand, and $f(X_i)$ is the function representing the demand to be learned by Equation (1); $c$ is a regularization parameter, $P(y_i, f(X_i))$ is the profit function, $\Omega(f)$ is a regularization term (Hastie et al. 2010), $n$ is the number of observations in the data set, and $(z)^{+} = \max(z, 0)$. Correspondingly, observation $i$ consists of $(X_i, y_i)$. The data for a product may be recorded in one or more observations. This study uses multiple observations per product, obtained by moving the time window over the product's historical demand data. The objective function in Equation (1) is compared with those of some assortment optimization problems in Online Appendix F.

The problem in Equation (1) differs from the standard and extended QR problems (Ban and Rudin 2019, Huber et al. 2019, Oroojlooyjadid et al. 2020) because the pinball loss function in QR is replaced by the negative of the profit function. When support vector regression (SVR) and neural networks (NNs) are used to build the function $f(X_i)$ in Equation (1), two profit-driven machine learning methods, that is, profit-driven support vector regression (P-SVR) and profit-driven neural networks (P-NN), are developed, with formulations subtly different from those of the corresponding nonparametric QR. Online Appendix C provides detailed P-SVR and P-NN formulations.

The negative of the first term in Equation (1) is called the net profit index (NP). The average of the backordering and holding costs, corresponding to the first term in Equation (1) excluding $p_i f(X_i)$, is called the cost index (CI). These two indices, NP and CI, are used as criteria to evaluate forecasting performance. In practice, the step-ahead sizes in the prediction can be set according to
the predetermined lead time. For example, when the lead time is one week, the label $y_i$ is chosen as the demand in the week after next.

3.2. Collective Demand Forecasting
Cross-selling and substitutions are two important factors affecting the sales of products. However, they have opposite yet complementary effects on assortment planning. An assortment with sufficient breadth of product categories and sufficient depth of products in each category can promote cross-selling and improve profits. Substitution effects, by contrast, help improve profits by narrowing the depth of products in a product category and lowering the in-stock rate (Kök et al. 2015). Therefore, cross-selling and substitution effects must be considered simultaneously in demand models for assortment planning.

Demand forecasting that considers cross-selling and substitution effects is not the forecasting of independent products on which conventional methods focus but rather collective demand forecasting for interrelated products. Cross-selling and substitution effects arise from the complementary and competitive relationships among products. These relationships make the products interact with each other to form a product network. In the network, the nodes represent the products, and an edge is established between two nodes if the demand of one product influences that of the other. Thus, the product network reflects the aggregate cross-selling and substitution behavior of a large number of customers.

3.2.1. Collective Demand Forecasting Problem. The collective demand forecasting problem can be treated as a collective regression problem, that is, regression within a product network. By extending the idea of local methods in collective classification (Sen et al. 2008), the product network is used to construct relational features. The problem is then written as the additive model in Equation (2) to learn the demand of each observation $i$:

$$f_i(x_i, y_i^r) = b + w_1^T h_1(x_i) + w_2^T h_2(y_i^r), \qquad (2)$$

where $x_i$ is the nonrelational feature vector widely used in standard demand forecasting models, $y_i^r$ is the network (relational) feature vector incorporating the demands of the products related to the product of observation $i$, $b$ is the intercept, $(w_1, w_2)$ are the weights of the inputs, and $(h_1, h_2)$ are the basis functions (Hastie et al. 2010) transforming the inputs, for example, the linear and sigmoid transformation functions in linear regression and NNs. For collective demand forecasting, the composite vector of $x_i$ and $y_i^r$, that is, $[x_i \; y_i^r]$, replaces the feature vector $X_i$ in Equation (1). The model in Equation (2) is a generalized additive model (Hastie et al. 2010), and the terms $w_1^T h_1(x_i)$ and $w_2^T h_2(y_i^r)$ can be represented as linear or nonlinear basis expansions of $x_i$ and $y_i^r$, for example, those of linear regression, NNs, and SVR. The unknown parameters $(b, w_1, w_2)$ can then be obtained by solving the structural risk minimization problem in Equation (1), in particular by solving the P-SVR or P-NN formulations described in Online Appendix C.

In this study, the network feature is limited to a vector describing the first-order neighbors, and the networked data are limited to product networks. The extensions of the network feature to graphs and of product networks to other types of implicit networks are discussed in Section 8. The two steps of collective demand forecasting are described in the next two sections.

3.2.2. Initialization of the Network Feature Vector. The construction of $y_i^r$ is a key issue in the whole process and is crucial to the success of collective demand forecasting. As with the classification of entities in collective classification, the interdependence of demands in collective demand forecasting makes the construction of $y_i^r$ an iterative process. This construction has two key steps: identifying the relationships among products to determine the initial values of the elements of $y_i^r$, and developing an efficient algorithm with good convergence properties to find the optimal values of these elements.

For the first step, the element $y_{ij}^r$ of $y_i^r$ is the demand of product $j$ if the demand of the product corresponding to observation $i$ is related to that of product $j$, and zero otherwise. In this way, constructing the initial values of the elements of $y_i^r$ becomes a feature selection problem. Because network feature vectors are used, the number of features is always large, possibly far larger than the number of observations. Regularization methods are powerful and flexible for high-dimensional feature selection problems in which the number of features is as large as the number of observations. For ultrahigh-dimensional feature selection problems, in which the number of features is far larger than the number of observations, the sure independence screening framework has the sure screening property, that is, all the important variables survive with probability tending to one (Fan and Lv 2008). In this study, correlation learning with the sure screening property (Fan and Lv 2008) is adopted to rank and select features, that is, related products, according to the correlations of their demands with the demand of the target product. The initial value of $y_{ij}^r$ may be the lagged demand, the moving average of historical demands, or the forecasted demand of product $j$ obtained from the standard demand forecasting models discussed in Section 3.1 if the correlation between the demand of product $j$ and that of the target product exceeds a threshold value, and it is zero otherwise.
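The initialization step of Section 3.2.2 can be illustrated with a minimal sketch of marginal correlation screening followed by thresholding. It is not the authors' implementation, and several choices are assumptions: products are ranked by the absolute Pearson correlation of their demand histories with the target product's history (sure independence screening ranks by correlation magnitude, and using the magnitude also keeps negatively correlated substitutes), the threshold value is arbitrary, and the lagged (last observed) demand is used as the initial value of $y_{ij}^r$. The function names `screen_related` and `initial_network_features` and the demand data are hypothetical.

```python
import math
from typing import Dict, List

# Hypothetical sketch of network-feature initialization by correlation
# screening; the threshold and the use of |corr| are assumptions.


def pearson(a: List[float], b: List[float]) -> float:
    """Sample Pearson correlation; returns 0.0 if either series is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def screen_related(target: str, history: Dict[str, List[float]],
                   threshold: float) -> List[str]:
    """Rank other products by |corr| with the target; keep those above threshold."""
    scores = {
        j: abs(pearson(history[target], series))
        for j, series in history.items()
        if j != target
    }
    ranked = sorted(scores, key=lambda j: scores[j], reverse=True)
    return [j for j in ranked if scores[j] > threshold]


def initial_network_features(target: str, history: Dict[str, List[float]],
                             threshold: float) -> Dict[str, float]:
    """Initial y^r_ij: lagged (last observed) demand of product j if it
    passes the screen, and 0 otherwise."""
    selected = set(screen_related(target, history, threshold))
    return {
        j: (series[-1] if j in selected else 0.0)
        for j, series in history.items()
        if j != target
    }


# Hypothetical weekly demand histories.
history = {
    "A": [10, 12, 14, 16, 18],  # target product
    "B": [5, 6, 7, 8, 9],       # moves with A
    "C": [20, 18, 16, 14, 12],  # moves against A
    "D": [7, 3, 9, 2, 8],       # roughly unrelated to A
}
print(screen_related("A", history, threshold=0.5))
print(initial_network_features("A", history, threshold=0.5))
```

In this toy history, B moves with A and C moves against A, so both pass the screen and start from their last observed demands, whereas D's correlation with A is close to zero and its element of $y_i^r$ is initialized to 0.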

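The additive structure of Equation (2) can be illustrated with a minimal sketch. It assumes identity basis functions $h_1$ and $h_2$ (the linear-regression case) and takes the weights as already learned rather than fitting them with P-SVR or P-NN; the numbers and the helper names `collective_forecast` and `network_effect` are hypothetical. Because the model is additive, changing only the network feature vector $y_i^r$ leaves $b + w_1^T h_1(x_i)$ untouched, so the change in the forecast comes from the network term alone.

```python
from typing import Sequence

# Hypothetical sketch of the additive collective model in Equation (2),
# assuming identity basis functions h1 and h2 and fixed weights.


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(ai * bi for ai, bi in zip(a, b))


def collective_forecast(b: float, w1: Sequence[float], w2: Sequence[float],
                        x: Sequence[float], y_r: Sequence[float]) -> float:
    """f_i(x_i, y_i^r) = b + w1^T x_i + w2^T y_i^r with identity h1, h2."""
    return b + dot(w1, x) + dot(w2, y_r)


def network_effect(w2: Sequence[float], y_r_old: Sequence[float],
                   y_r_new: Sequence[float]) -> float:
    """Forecast change when only the network features change.

    The intercept and the nonrelational term cancel, leaving
    w2^T (y_r_new - y_r_old).
    """
    return dot(w2, [new - old for new, old in zip(y_r_new, y_r_old)])


# Hypothetical inputs: price and promotion as nonrelational features,
# and the demands of three related products as the network features.
b = 10.0
w1 = [-2.0, 4.0]         # weights on price and promotion
w2 = [0.5, -0.25, 0.0]   # one positive and one negative weight on related demand
x = [3.0, 1.0]
y_r = [8.0, 12.0, 0.0]   # current demands of the related products

base = collective_forecast(b, w1, w2, x, y_r)
# Drop related product 2: its element changes from d = 12 to 0.
y_r_dropped = [8.0, 0.0, 0.0]
after = collective_forecast(b, w1, w2, x, y_r_dropped)
print(base, after, after - base, network_effect(w2, y_r, y_r_dropped))
# prints: 9.0 12.0 3.0 3.0
```

In this linear sketch, a negative weight on a related product behaves like a substitution effect: removing that product's demand from $y_i^r$ raises the target forecast by $0.25 \times 12 = 3$. This sign reading is specific to the identity basis assumed here.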
3.2.3. CCDM for Network Feature Vector Construction and Demand Forecasting. For the second step, an improved coordinate descent method (CDM), the CCDM, is developed to obtain simultaneously the optimal values of the elements of $y_i^r$ and the demand forecasting results. Properly dividing the observations in the data is crucial to making the standard CDM suitable for this problem. The observations are divided into blocks such that the observations in a block are unrelated to each other and are assumed to be independent and identically distributed, whereas the correlations are taken into account by iteratively updating the network features. In practice, some strategies are needed to balance the number of products in different blocks. In the simplest strategy, all observations in a block belong to the same product. The demands of the observations in a block are learned separately from those in other blocks, just like the coordinate vector in an iteration of the CDM. Moreover, the division makes parallel computation easy to implement and can accelerate convergence. If the block sizes vary considerably, an asynchronous update mechanism, in which the local processors run concurrently without synchronization, can be adopted to avoid waiting times.

Through the division, the problem in Equation (1) can be rewritten as the sum of $m$ subproblems, where subproblem $\upsilon$ learns the demand of observation $i$ in block $\upsilon$ as shown in Equation (3):

$$f_i^{\upsilon(t+1)}(x_i^{\upsilon}, y_i^{\upsilon r(t)}) = b^{\upsilon} + (w_1^{\upsilon})^T h_1^{\upsilon}(x_i^{\upsilon}) + (w_2^{\upsilon})^T h_2^{\upsilon}(y_i^{\upsilon r(t)}), \qquad (3)$$

where $f_i^{\upsilon(t+1)}$ is the forecasted demand of observation $i$ in block $\upsilon$ at iteration $t+1$, $y_i^{\upsilon r(t)}$ is the network feature vector globally updated at iteration $t$, $(w_1^{\upsilon}, w_2^{\upsilon})$ are the weights of the inputs in block $\upsilon$, and $(h_1^{\upsilon}, h_2^{\upsilon})$ are the basis functions in block $\upsilon$. In the following, $y_i^{\upsilon r}$ and $f_i^{\upsilon}$ denote the convergent values of $y_i^{\upsilon r(t)}$ and $f_i^{\upsilon(t+1)}$, respectively.

The global updating strategy of $y_i^{\upsilon r(t)}$ is as follows. For subproblem $\upsilon$ in Equation (3), the elements of $y_i^{\upsilon r(t)}$ are latent variables determined by the other $m - 1$ subproblems. In the central processor layer of the CCDM, a central processor coordinates the computations of the $m$ local processors, each of which handles one subproblem. As in the CDM (Bertsekas 2016), a greedy (random) strategy picks the best (a random) observation $i^*$ one at a time to guarantee convergence, and the corresponding element of $y_i^{\upsilon r(t)}$ is updated when the products corresponding to $i^*$ and $i$ are related in the product network. Across the local and central processor layers of the CCDM, the parallel computations and the element updates of $y_i^{\upsilon r(t)}$ are repeated until the stopping criterion is satisfied. The product demands $f_i^{\upsilon}$ and the network feature vector $y_i^{\upsilon r}$ are therefore determined through the collaboration of the local and global processors. Thus, the method is called the CCDM. The CCDM is given as Algorithm 1 in Online Appendix D, and its modules and information flow are illustrated in Figure A1 of Online Appendix B. The complexity of Algorithm 1 depends on the number of iterations and on the complexity of the machine learning models used to solve each subproblem.

3.2.4. Properties of the CCDM. Because each subproblem in Equation (3) is solved in the same way, that is, using the CDM, the findings in Bertsekas (2016) can be applied to obtain the convergence results of Algorithm 1 stated in Proposition 1.

Proposition 1. Suppose the objective function of each subproblem $\upsilon$, for $\upsilon = 1, \ldots, m$, in Equation (3) is continuously differentiable and its minimum is uniquely attained. Then the sequence $\{f_i^{\upsilon(t)}\}$ generated by Algorithm 1 converges to a stationary point $f_i^{\upsilon *}$.

The profit function contained in Equation (1) is not differentiable at the origin. However, the assumption in Proposition 1 can be satisfied for P-SVR and P-NN because the objective function of the dual of P-SVR, and the objective functions of P-NN after smooth approximation of the profit function, are continuously differentiable. Moreover, because the objective function of P-SVR is convex, the method proposed by Luo and Tseng (1992) can be applied to obtain global convergence results for Algorithm 1.

Proposition 2. For P-SVR, the sequence $\{f_i^{\upsilon(t)}\}$ generated by Algorithm 1 converges globally to an optimal solution $f_i^{\upsilon *}$. The convergence rate is at least linear; that is, the inequality in (4) holds for a constant $0 < \mu < 1$ and an iteration number $k_0$:

$$\lVert f_i^{\upsilon(k+1)} - f_i^{\upsilon *} \rVert \le \mu \lVert f_i^{\upsilon(k)} - f_i^{\upsilon *} \rVert, \quad \forall k \ge k_0. \qquad (4)$$

Proof. See Online Appendix E.

4. Demand Inference for Cross-Selling and Substitutions
Demand inference for cross-selling and substitutions is the second phase of data-driven demand estimation and assortment planning. For this phase, the measurement of the cross-selling and substitution effects in demand inference is presented first, the challenges in demand inference are discussed next, and the improved demand inference with transfer and semisupervised learning methods is then described.

4.1. Measurement of the Cross-Selling and Substitution Effects
Collective demand forecasting usually cannot directly produce quantitative results for the cross-selling and substitution
effects. The reason is that conventional nonparametric predictive methods, for either regression or classification, do not aim to measure the effect on a dependent variable of changes in the independent variables that result from an action or event, for example, revising the assortment or a specific product going out of stock. To address this issue, demand inference for cross-selling and substitutions is introduced.

The observable data correspond to states such as the current and previous assortments and inventory levels and the known demands of the products, whereas the unobservable data correspond to states such as varying assortments and inventory levels and the unknown demands of the products. Demand inference uses the model trained in collective demand forecasting to directly forecast the unknown demands of the unobservable data and then quantifies the differences in demand as the assortments and inventory levels change; these differences measure the cross-selling and substitution effects.

Let $y_i^{\upsilon r}$ and $y_i^{\upsilon r'}$ represent the forecasted demands under two different assortments with different inventory levels, indexed by $r$ and $r'$, obtained by collective demand forecasting for the observable and/or unobservable data. The change in the forecasted demand of observation $i$ in block $\upsilon$ between these two assortments with different inventory levels is given by

$$\Delta f_{ij}^{\upsilon}(y_i^{\upsilon r'}, y_i^{\upsilon r}) = f_i^{\upsilon}(x_i^{\upsilon}, y_i^{\upsilon r'}) - f_i^{\upsilon}(x_i^{\upsilon}, y_i^{\upsilon r}) = (w_2^{\upsilon})^T \left( h_2^{\upsilon}(y_i^{\upsilon r'}) - h_2^{\upsilon}(y_i^{\upsilon r}) \right), \qquad (5)$$

where $j$ indexes the related products of the product of observation $i$ for which the assortment and inventory levels change from $r$ to $r'$; $\Delta f_{ij}^{\upsilon}(y_i^{\upsilon r'}, y_i^{\upsilon r})$ measures the cross-selling and substitution effects between observation $i$ and its related product $j$; $y_i^{\upsilon r} = \{y_{ij}^{\upsilon r} \mid j = 1, \ldots, \hat{n}\}$; $\hat{n}$ is the number of products; and $f_i^{\upsilon}$, $y_i^{\upsilon r}$, and $y_i^{\upsilon r'}$, without the superscript $(t)$ or $(t+1)$ used in Equation (3), are the convergent solution of the CCDM. This study mainly focuses on the cross-selling and substitution relationships between two products, called the one-to-one effects (see Procedure 1 in Online Appendix D), that is, the effects when one product at a time is added to or dropped from an assortment or stocks out in a future time period. Moreover, like other exogenous demand models, this study assumes that the cross-selling and substitution effects are independent and additive; that is, the effects between a group of products and a single product, called the multiple-to-one effects, are the sum of the effects between each product in the group and the single product. The relaxation of this assumption is discussed in Sections 5 and 8.

In Equation (5), the changes in assortments and inventory levels are reflected in $y_i^{\upsilon r}$ and $y_i^{\upsilon r'}$, indexed by $r$ and $r'$. For example, for assortment-based substitutions, when product $j$, a related product of the product of observation $i$, changes status, that is, is added to (dropped from) an assortment, the corresponding element of $y_i^{\upsilon r}$ changes from $y_{ij}^{\upsilon r} = 0$ ($y_{ij}^{\upsilon r} = d$ with $d > 0$) to $y_{ij}^{\upsilon r'} = d$ with $d > 0$ ($y_{ij}^{\upsilon r'} = 0$). The quantity $d$ equals the forecasted demand $f_j^{\upsilon}$ in the CCDM. The algorithmic process of demand inference for the one-to-one effects considering assortment-based substitutions is given in Procedure 1 in Online Appendix D and illustrated in Figure A2 in Online Appendix B.

Unlike assortment-based substitutions, when product $j$ stocks out, the corresponding element of $y_i^{\upsilon r}$ changes from $y_{ij}^{\upsilon r} = d$ with $d > 0$ to $y_{ij}^{\upsilon r'} = d'$ with $d > d' \ge 0$. For stockout-based substitutions, the quantity $d'$, which represents an adjusted inventory level, lies in the interval $d' \in [0, f_j^{\upsilon})$.

4.2. Challenges in Demand Inference
The proposed demand inference differs from causal inference because its purpose is to identify the association, rather than the causality, among product demands. Specifically, demand inference can contain simultaneity, for example, the undirected and exchangeable cross-selling and substitution relationships, and homophily, for example, the effects of products' common characteristics or consumers' intrinsic tastes, whereas causal inference must exclude them (Chen et al. 2019). Apart from simultaneity and homophily, the missing data problem is the major challenge faced by demand inference, and it is further divided into the following three subproblems. An example illustrating these three subproblems is presented in Table 1.

(1) Missing historical data. The first subproblem involves the unavailability of the historical demands of some products for certain periods because of a change in the assortment or a short product lifetime. When products are excluded from the assortment, their historical demands in that period cannot be observed, so part of the features is missing. When products have a short lifetime, almost all their historical demands and the corresponding features are unavailable, and the products can be viewed as new products. This subproblem hinders both the demand forecasting and the demand inference phases, so strategies such as missing data imputation (Little and Rubin 2020), instance transfer (Yang et al. 2020), and matrix completion (Candès and Plan 2010) should be applied before these two phases.

(2) Missing labels. Unlike the missing features discussed above, the second subproblem involves missing labels, that is, the partially unavailable demands of the products under varying product assortments at different inventory levels, due to the limited number of observable product assortments. This subproblem introduces covariate shift (Simester et al. 2020) from the complete to the observable demands in the network features and
produces biases in inferring the unobservable demands with demand forecasting models trained on the observable demands.

(3) Missing control variables. The third subproblem involves unobservable features that cannot be controlled for, so their effects cannot be excluded from the estimation of the cross-selling and substitution effects. For example, a simultaneous promotion of two products may increase the sales of both, but the concurrent increase in sales may be wrongly identified as a cross-selling effect if the promotion is not included as a control variable. This subproblem can also be tackled by controlling the covariate shift (Simester et al. 2020). If the underlying distribution of the demands used to train the model is kept close to the true distribution under varying product assortments at different inventory levels, the observations can be viewed as approximately randomly chosen, the demands are approximately independent of the unobservable features, and the effects of the unobservable features can be reduced.

Table 1. Three Subproblems of the Missing Data Problem

                                     Nonrelational features                          Network features
Subproblem                 Product   Price  Promotion  Lagged    Lagged    Lagged    Related    Related    Related
                           (label)                     demand 1  demand 2  demand 3  product 1  product 2  product 3
Missing historical data    1 (6)     3.9    0          NA        NA        NA        2          13         0
                           2 (15)    2.1    0          28        11        NA        2          0          0
                           3 (32)    5.9    1          32        NA        30        0          13         24
Missing labels             1 (NA)    3.9    1          8         3         7         0          13         0
                           2 (NA)    2.1    0          28        11        14        1          0          0
                           3 (NA)    5.9    1          32        25        30        0          13         0
Missing control variables  1 (6)     3.9    NA         8         3         7         2          13         0
                           2 (15)    2.1    NA         28        11        14        2          0          0
                           3 (32)    5.9    NA         32        25        30        0          13         24
Note. NA, missing data.

4.3. Demand Inference with Transfer and Semisupervised Learning
Almost all existing assortment optimization methods with exogenous demand and discrete choice models overlook the three subproblems described previously. In this study, transfer learning and semisupervised learning are used to deal with the missing data problem in demand inference. Specifically, instance transfer methods (Yang et al. 2020) can forecast the demands of new products (subproblem 1) and incorporate more demand values into the corresponding elements of the network feature vector $y_i^{\upsilon r}$ to correct the distribution divergence (subproblems 2 and 3), and semisupervised learning methods (Zhou and Li 2007, Engelen and Hoos 2020) have the potential to deal with covariate shift (subproblems 2 and 3) and to improve the performance of demand inference on missing labels.

Inferring the demands of products under different assortments with different inventory levels is viewed as a semisupervised learning problem in which the unobservable demands are treated as unlabeled data to be labeled by semisupervised learning methods. The smoothness assumption (Engelen and Hoos 2020), the most widely recognized assumption in semisupervised learning, can usually be satisfied because the observable and unobservable demands under different assortments with different inventory levels are likely to vary smoothly. The smoothness assumption justifies the use of semisupervised learning methods in demand inference.

Besides the set of observations $\{y_i, X_i\}_{i=1}^{n}$ used for demand forecasting in Section 3.1, let $\{y_{si}, X_{si}\}_{si=1}^{n_s}$ be a set of $n_s$ labeled observations with known demands $y_{si}$ transferred from other domains, and let $\{X_{ui}\}_{ui=1}^{n_u}$ be a set of $n_u$ unlabeled observations whose unobservable demands $\hat{y}_{ui}$ are to be estimated in demand inference. Using transfer and semisupervised learning, the structural risk minimization problem in Equation (1) becomes

$$\min_f \; -c_1 P(y_i, f(X_i)) - c_2 P(y_{si}, f(X_{si})) - c_3 P(\hat{y}_{ui}^{(t)}, f(X_{ui})) + \Omega(f), \qquad (6)$$

where $\hat{y}_{ui}^{(t)}$ is the value of $\hat{y}_{ui}$ at iteration $t \ge 0$, and $c_1$, $c_2$, and $c_3$ are weights that balance the contributions of the three types of observations to the structural risk.

The demand inference process for the ensemble of the three types of observations in Equation (6) includes three main steps.

(1) Learn labels of labeled data by collective demand forecasting without unlabeled observations ($c_3 = 0$). With equal weights $c_1 = c_2$, the problem in Equation (6) is equivalent to that in Equation (1) with $n + n_s$ observations and can use the same base learners, for example, P-SVR and P-NN, and the same CCDM method. In the general case with $c_1 \ne c_2$, the base learners P-SVR and P-NN need to be modified, but the same CCDM method can be used only if each block contains both of the first two types of observations to train the base learners. The base learners, that is, the modified P-SVR and
P-NN, for transfer and semisupervised learning are described in Online Appendix C.

(2) Infer labels of unlabeled data. The unobservable demands $\hat{y}_{ui}^{(0)}$ under different assortments with different inventory levels are inferred as discussed in Section 4.1. The change in the demand of an observation caused by changing the status of a related product is measured using Equation (5).

(3) Modify labels of unlabeled data by semisupervised learning. With the inferred demands $\hat{y}_{ui}^{(0)}$ as initial values, semisupervised learning methods are used to solve the problem in Equation (6) and modify $\hat{y}_{ui}^{(t)}$ for $t > 0$. This study adopts three semisupervised learning methods: cotraining (Zhou and Li 2007, Engelen and Hoos 2020), transductive regression (Cortes and Mohri 2007), and EM for self-training (Nigam et al. 2000, Moore and Neville 2017, Engelen and Hoos 2020). Cotraining and EM for self-training modify $\hat{y}_{ui}^{(t)}$ iteratively, that is, with $t \ge 1$, whereas transductive regression obtains the results directly, that is, with $t = 1$. Cotraining uses multiple base learners, or a single base learner with different parameters, to modify part of the labels of the observations in alternating order, whereas the other two methods usually use a single base learner to modify the labels of all observations at once. The changes in demand obtained in the second step are then modified accordingly.

5. Assortment Rule Mining
This study focuses on the problem of joint assortment and inventory decisions using machine learning methods and develops an assortment rule mining system consisting of four main components, shown in Figure 1, to generate the assortment with specific inventory levels. After the concept of an assortment rule is introduced, the greedy heuristic for assortment optimization is developed, and the joint decisions on assortment and inventory levels are discussed. The core issue is to use the prior knowledge discovered from demand inference, that is, the demand and profit changes inferred under different assortments with different inventory levels, to make assortment decisions.

5.1. Concept of Assortment Rule
Demand inference leads to if-then rule-like results, that is, assortment rules. Two examples of these rules are as follows. (1) If product A is added to an assortment, then the profit change from the current to the updated assortment is $\Delta ON_A$. (2) If product B is dropped from an assortment and the inventory level of product C is adjusted downward from $d$ to $d'$, then the profit change from the current to the updated assortment and inventory levels is $\Delta ON_{BC}$. It is therefore natural to define a new type of rule, the assortment rule, to store and describe the discovered knowledge and to develop heuristics for assortment rule mining.

Definition 1 (Assortment Rules). An assortment rule is defined as the combination of an item set, consisting of the products to drop from or add to an assortment, and an index set, including measures of the assortment, and is denoted by

$$[I^1, \ldots, I^{\tilde{k}}] \leftarrow (IT_1, \Delta q_1) \wedge \cdots \wedge (IT_L, \Delta q_L). \qquad (7)$$

In Equation (7), the rule body, that is, the right-hand side, is the item set with $L$ tuples, and the rule head, that is, the left-hand side, is the index set with $\tilde{k}$ indexes. Each tuple $z \in \{1, \ldots, L\}$ in the item set has two elements: the item $IT_z$ and the change $\Delta q_z$ in its inventory level from the current state to the next possible future state. The first index $I^1$ is the change in the objective function of assortment optimization, such as a profit improvement. The other indexes are restrictions, such as the storage capacity, service level, and financial budget, at the next possible future state. For example, the if-then rules in the two examples above are written as the assortment rules $[I^1 = \Delta ON_A] \leftarrow (A, -f_A^{\upsilon})$ and $[I^1 = \Delta ON_{BC}] \leftarrow (B, f_B^{\upsilon}) \wedge (C, f_C^{\upsilon} - d')$. As a bridge between demand estimation and assortment optimization, assortment rules improve both the explainability of "black-box" demand forecasting methods and the efficiency of the assortment optimization algorithms described in the next two sections.

5.2. Assortment Rule Mining Considering Assortment-Based Substitutions
The definition of assortment rules in Equation (7) is a general formulation in which different objectives and restrictions can be embodied in its left-hand side. Specifically, when assortment-based substitutions are considered, the assortment optimization problem with a profit maximization objective and a capacity constraint is studied. When the profit of the current assortment is used as a baseline, the profit maximization problem can be reformulated as the following profit improvement maximization problem, which identifies the products to drop from and add to the current assortment:

$$\max_{q} \sum_{i=1}^{\hat{n}} \left[ P\{y_i, f_i^{\upsilon}(x_i^{\upsilon}, y_i^{\upsilon r'}(q))\} - P\{y_i, f_i^{\upsilon}(x_i^{\upsilon}, y_i^{\upsilon r})\} \right], \qquad (8)$$

$$\text{s.t.} \; \sum_{i=1}^{\hat{n}} q_i \le K, \quad q = [q_1, q_2, \ldots, q_{\hat{n}}], \quad q_i \in \{0, 1\} \text{ for } i = 1, 2, \ldots, \hat{n}, \qquad (9)$$

where $q$ is the vector of 0-1 decision variables, that is, the assortment of the $\hat{n}$ products, and $K$ is the capacity, that is, the maximum number of products in the assortment. For this problem, the objective function is
the maximization of the change in profit when the current assortment $r$ changes to the new assortment $r'$, and the decision variables $q$ determine the new assortment $r'$ and the values of the network feature vector $y_i^{\upsilon r'}$. The corresponding element of the network feature vector becomes $y_{ij}^{\upsilon r'} = 0$ if $q_j = 0$ and remains unchanged otherwise. For comparison, the details of the standard assortment optimization problem are given in Online Appendix F.

A greedy heuristic is developed to select the best assortment for the optimization problem in Equations (8) and (9). Because demand inference obtains the demand and profit changes after considering cross-selling and substitutions, the assortment rules obtained by demand inference can be used directly to rank products or item sets and to generate their greedy combinations. The two steps of the proposed greedy heuristic are described by comparing it with the existing greedy heuristics and by providing a theoretical analysis.

5.2.1. Comparisons of Different Greedy Heuristics. There are two differences between the proposed and the existing greedy heuristics. First, the profit maximization objective of the existing greedy heuristics is replaced by the profit improvement maximization objective in the improved greedy heuristic. Second, the substitution effect in the existing greedy heuristics is extended to the cross-selling and substitution effects in the improved greedy heuristic. The calculation of the profit improvement maximization objective is described first.

Two scenarios, one with a product dropped from and the other with a product added to the current assortment, are discussed separately. The products are assumed to have the same prediction errors and thus the same backorder and holding costs; hence, the backorder and holding costs in Equation (1) need not be considered. For a product dropped from the current assortment, the original profit contribution of product $j$ is $p_j f_j^{\upsilon}(x_j^{\upsilon}, y_j^{\upsilon r})$, and the increased profit in the new assortment, that is, the difference between the substitution and the cross-selling effects resulting from product $j$, multiplied by $p_i$, is $\sum_{\upsilon=1}^{m} \sum_{i=1}^{M_{\upsilon}} p_i \Delta f_{ij}^{\upsilon}(\hat{y}_i^{\upsilon r'}, y_i^{\upsilon r})$. The profit improvement of product $j$ in this scenario is then

$$\Delta ON_j = \sum_{\upsilon=1}^{m} \sum_{i=1}^{M_{\upsilon}} p_i \Delta f_{ij}^{\upsilon}(\hat{y}_i^{\upsilon r'}, y_i^{\upsilon r}) - p_j f_j^{\upsilon}(x_j^{\upsilon}, y_j^{\upsilon r}).$$

For a product added to the current assortment, the original profit is zero, and the increased profit equals the profit of the newly added product plus the difference between the cross-selling and the substitution effects resulting from product $j$ in the new assortment, multiplied by $p_i$. Thus, the profit improvement of product $j$ in this scenario is

$$\Delta ON_j = p_j f_j^{\upsilon}(x_j^{\upsilon}, y_j^{\upsilon r}) + \sum_{\upsilon=1}^{m} \sum_{i=1}^{M_{\upsilon}} p_i \Delta f_{ij}^{\upsilon}(\hat{y}_i^{\upsilon r'}, y_i^{\upsilon r}).$$

Greedy heuristics that rank products by their profits without considering substitutions (Çömez-Dolgan et al. 2021), by their purchase probabilities (Çömez-Dolgan et al. 2021), or by their own profits plus the profits resulting from substitutions (Kök and Fisher 2007, Fisher and Vaidyanathan 2014) cannot obtain the optimal assortment: in the additive demand functions of these existing studies, the cross-selling effect is neglected if it is not added and is double-counted if it is added. The improved greedy heuristic accounts for both the cross-selling and the substitution effects.

An example in Table 2 shows the differences between the proposed and the existing greedy heuristics. Five products with undirected and exchangeable cross-selling and substitution effects are considered, corresponding to the first scenario discussed previously (dropping products). The purpose is to reduce the current assortment to meet the capacity constraint with capacity $K = 2$. The body of the table contains, for each pair of products, the difference between the substitution and cross-selling effects multiplied by $p_i$, with the two individual effects given in parentheses. The profit change $\Delta ON_j$ resulting from each product $j$, at the bottom of the table, is the difference between the sum of the cross-selling and substitution effects multiplied by $p_i$ (Sum) and the profit of the target product (Profit). The total profit (TP) on the right side of the table is the sum of Profit and the total substitution effects of each product multiplied by $p_i$. Based on $\Delta ON_j$, products 1, 4, and 3 are chosen to drop, and the new assortment consists of products 2 and 5, with a total profit improvement of 19.9. Based on TP, however, the new assortment consists of products 2 and 1, with a total profit improvement of −3.

A qualitative analysis of the results in Table 2 is also provided. A product with a large TP may simultaneously have small cross-selling opportunities and a large probability of being substituted by other products and thus may not be chosen for the assortment (see product 1 in Table 2). Conversely, a product with a small TP may simultaneously have large cross-selling opportunities and a small probability of being substituted by other products and thus may be chosen for the assortment (see product 5 in Table 2). Similar results hold when new products are added to the current assortment.

5.2.2. Theoretical Analysis. The profit improvement $\Delta ON_j$ of a product in an assortment rule, discussed previously, is used to rank the products. The ranking results are used directly to generate the greedy product (item set) combinations. Let $\Lambda_J$, with $J$ products, represent greedy product combination $J$ for $J = 1, \ldots, \hat{n}$. The greedy product combinations are constructed such that the product in $\Lambda_1$ has the largest profit improvement among all products, and the product with the largest profit improvement among all those not in $\Lambda_J$ is added to $\Lambda_J$ to form $\Lambda_{J+1}$. The greedy product combinations constructed in this way have the following properties.

Property 1. Suppose (1) the products in the best greedy product combination do not have positive relations to the products chosen to drop, and the products chosen

Table 2. Example Showing the Differences Between the Proposed and Existing Greedy Heuristics

ID | 1 | 2 | 3 | 4 | 5 | Profit | TP
1 | 0.0 | 12.0 (18.0–6.0) | 0.0 | 0.0 | 3.5 (4.0–0.5) | 3.4 | 25.4
2 | 12.0 (18.0–6.0) | 0.0 | 3.2 (4.2–1.0) | 5.1 (5.2–0.1) | −13.8 (3.2–17.0) | 7.0 | 37.6
3 | 0.0 | 3.2 (4.2–1.0) | 0.0 | 0.0 | 0.0 | 1.3 | 5.5
4 | 0.0 | 5.1 (5.2–0.1) | 0.0 | 0.0 | 2.0 (5.5–3.5) | 1.5 | 12.2
5 | 3.5 (4.0–0.5) | −13.8 (3.2–17.0) | 0.0 | 2.0 (5.5–3.5) | 0.0 | 2.5 | 15.2
Sum | 15.5 | 6.5 | 3.2 | 7.1 | −8.3 | |
Profit | 3.4 | 7.0 | 1.3 | 1.5 | 2.5 | |
$\Delta ON_j$ | 12.1 | −0.5 | 1.9 | 5.9 | −10.8 | |

to drop do not have negative relations to products outside the best greedy product combination; and (2) the products in a greedy product combination have independent and additive cross-selling and substitution effects. Let

$$\Lambda_1 = \{IT_1\}, \ldots, \Lambda_{\hat{n}^*} = \{IT_1, \ldots, IT_{\hat{n}^*}\}, \ldots, \Lambda_{\hat{n}} = \{IT_1, \ldots, IT_{\hat{n}^*}, \ldots, IT_{\hat{n}}\} \tag{10}$$

represent the greedy product combinations arranged in descending order of the profit improvement $\Delta ON_J$, that is, the profit improvement resulting from the last product added to $\Lambda_J$. Starting with $\Lambda_1$, the total profit contribution of the products in the greedy product combination first increases and then decreases. The total profit contribution of the greedy product combination $\Lambda_{\hat{n}^*}$, with $\Delta ON_{\hat{n}^*} \ge 0$ and $\Delta ON_{\hat{n}^*+1} < 0$, is the maximum among all greedy product combinations from $\Lambda_1$ to $\Lambda_{\hat{n}}$.

Using this property, the profit improvement $\Delta ON_j$ of each product is used to identify the best greedy product combination. The values of $\Delta q_z$ in the corresponding assortment rules then determine the products to drop from or add to the assortment. If the resulting assortment violates the capacity constraint, the products with the smallest profit decrease are dropped from the assortment one at a time until the capacity constraint is satisfied. Suppose the errors in demand forecasting and inference are not considered and the two assumptions in Property 1 hold. Then the proposed greedy heuristic can find the optimal assortment, that is, the combination of products that generates the largest profit under the capacity constraint.

The two assumptions in Property 1 hold under some conditions. For example, the demands of the products in the best greedy product combination may not be related to each other when the assortment size is far smaller than the number of products. Likewise, the cross-selling and substitution effects resulting from the products in the greedy product combinations from $\Lambda_1$ to $\Lambda_{\hat{n}}$ are linearly additive when combinational effects such as product bundling do not exist. However, the proposed greedy heuristic may not find optimal solutions in general.

When the first assumption in Property 1 does not hold, the example in Table 3 gives the worst case. In this case, all the positive cross-selling and substitution effects resulting from the products in the best greedy product combination come from the interrelations among the products to drop from the current assortment. The entries in the body and the bottom of the table have the same meanings as those in Table 2. Based on $\Delta ON_j$, products 1, 3, and 4 are chosen to drop. As highlighted in Table 3, all the profit improvements, that is, the positive cross-selling and substitution effects multiplied by $p_i$, come from the other products to drop, and thus cannot be realized. The true total profit change from dropping the chosen products is then negative once the profit contributions of the related products are excluded. Keeping the current assortment without a profit change is therefore the best choice. Hence, the difference between the largest total profit changes of the optimal and the final assortments is no more than that of the assortment found by the proposed greedy heuristic.

Similarly, the other worst case occurs when all the negative cross-selling and substitution effects resulting from the products outside the best greedy product combination come from the interrelations among these products and the products chosen to drop from the current assortment. Suppose all the profit improvements, that is, the negative cross-selling and substitution effects multiplied by $p_i$, come from the products chosen to drop. Then the products outside the best greedy product combination should be incorporated in the combination. Summarizing these two worst cases gives the following optimality gap.

Property 2. Suppose the products in a greedy product combination have independent and additive cross-selling and substitution effects. Let $q^{AS}$ be the true

Table 3. Example Showing the Worst Case in Which the First Assumption in Property 1 Is Violated

Products | 1 | 2 | 3 | 4 | 5
1 | 0.0 | −0.5 | 35.6 | 24.9 | 0.0
2 | −0.5 | 0.0 | −3.2 | 0.0 | 1.8
3 | 35.6 | −3.2 | 0.0 | 11.8 | 0.0
4 | 24.9 | 0.0 | 11.8 | 0.0 | 0.0
5 | 0.0 | 1.8 | 0.0 | 0.0 | 0.0
Sum | 60.0 | −1.9 | 44.2 | 36.7 | 1.8
Profit | 10.4 | 2.0 | 5.3 | 1.5 | 2.5
$\Delta ON_j$ | 49.6 | −3.9 | 38.9 | 35.2 | −0.7
Note. The best result for each measure is highlighted in bold.

optimal solution of the assortment optimization problem with a capacity constraint, $q^{GH}$ be the final solution found by the proposed greedy heuristic, and $P(\cdot)$ be the profit function as defined in Equation (1). Then the optimality gap, represented by $P(q^{AS}) - P(q^{GH})$, satisfies

$$0 \le P(q^{AS}) - P(q^{GH}) \le \max\left(\sum_{j \in \Lambda_{\hat{n}^*}} \Delta ON_j,\ \sum_{j \notin \Lambda_{\hat{n}^*}} \left|\Delta ON_j\right|\right). \tag{11}$$

To narrow the optimality gap, a reranking strategy is adopted. It iteratively modifies the products in the combination chosen by the greedy heuristic to reduce the interrelations of the chosen products. For the products chosen to drop from the current assortment, the cross-selling and substitution effects between each chosen product in the greedy product combination $\Lambda_{\hat{n}^*}$ and the remaining products in the assortment are excluded. The greedy heuristic is then used to rerank the products and find a new best greedy product combination. This process continues until the greedy product combination no longer changes.

In the proposed greedy heuristic, the cross-selling and substitution effects resulting from a greedy product combination $\Lambda_{\hat{n}^*}$ are assumed to be the sum of the effects resulting from the individual products in the combination, that is, $\sum_{j \in \Lambda_{\hat{n}^*}} \sum_{\upsilon=1}^{m} \sum_{i=1}^{M_\upsilon} \Delta f_{ij}^{\upsilon}(\hat{y}_i^{\upsilon r'}, y_i^{\upsilon r})$. In general, the effects are nonlinear and nonadditive. They can be decomposed into the following form by functional analysis of variance decomposition (Hastie et al. 2010):

$$\sum_{j=1}^{\hat{n}} \sum_{\upsilon=1}^{m} \sum_{i=1}^{M_\upsilon} \Delta f_{ij}^{\upsilon}(y_i^{\upsilon r'}, y_i^{\upsilon r}) + \sum_{j'=1}^{\hat{n}} \sum_{j=1}^{\hat{n}} \sum_{\upsilon=1}^{m} \sum_{i=1}^{M_\upsilon} \Delta f_{ijj'}^{\upsilon}(y_i^{\upsilon r'}, y_i^{\upsilon r}) + \delta, \tag{12}$$

where $\Delta f_{ijj'}^{\upsilon}(\cdot)$ denotes the cross-selling and substitution effects resulting from the combinations of products $j$ and $j'$, and $\delta$ represents the sum of the effects of all high-order interactions.

Property 3. Suppose the products remaining in the best greedy product combination have no positive relations to those chosen to drop, and those outside the best greedy product combination do not have negative relations to those chosen to drop. Let $\Lambda_{jj'}$ be the greedy product combination in which the cross-selling and substitution effects resulting from products $j$ and $j'$ are positive. Then,

$$0 \le P(q^{AS}) - P(q^{GH}) \le \sum_{j' \in \Lambda_{jj'}} \sum_{j \in \Lambda_{jj'}} \sum_{\upsilon=1}^{m} \sum_{i=1}^{M_\upsilon} p_i\,\Delta f_{ijj'}^{\upsilon}(y_i^{\upsilon r'}, y_i^{\upsilon r}) + P(\delta), \tag{13}$$

where $P(\delta)$ represents the sum of the profits from all high-order interactions. In Equation (13), the upper bound is achieved only if $\Lambda_{\hat{n}^*} \subseteq \Lambda_{jj'}$ and the products in the set $\Lambda_{jj'} \setminus \Lambda_{\hat{n}^*}$ do not have cross-selling and substitution effects.

Two extensions would further narrow the optimality gap. Collective demand forecasting needs to incorporate the high-order neighbors in the network features in Equation (2). Demand inference needs to include the multiple-to-one effects without the independent and additive assumptions. These issues are important topics for future work.

5.3. Assortment Rule Mining Considering Assortment-Based and Stockout-Based Substitutions

Assortment rule mining that simultaneously considers assortment-based and stockout-based substitutions deals with the joint decision problem of assortment optimization and inventory levels. The profit improvements under different assortments with different inventory levels can be measured to determine the best assortment with the best inventory levels. This problem has the same objective and constraint as those in Equations (8) and (9) but has different decision variables $q$ and $y = [y_1, y_2, \ldots, y_{\hat{n}}]$.

Two differences exist between the greedy heuristic considering only assortment-based substitutions and that considering both types of substitutions. The first difference is in the parameter $d'$ used in demand inference. When a target product is out of stock, $d'$ is not a single fixed value as in Procedure 1 but a value in an interval, that is, $d' \in [0, f_j^{\upsilon})$. Thus, a grid search is conducted for each product to determine the value of $d'$ producing the best profit improvement, for example, using $d' = 2^{\tau} f_j^{\upsilon}$ for $\tau = 0, -1, \ldots, -10$. The value of $d'$ is then fine-tuned around the selected value with a smaller step size.

The other difference is in the calculation of the profit improvement in Equation (8). When the inventory level of product $j$ decreases from $f_j^{\upsilon}$ to $d'$, its profit improvement is $\sum_{\upsilon=1}^{m}\sum_{i=1}^{M_\upsilon} p_i\,\Delta f_{ij}^{\upsilon}(\hat{y}_i^{\upsilon r'}, y_i^{\upsilon r}) - p_j\{f_j^{\upsilon}(x_j^{\upsilon}, y_j^{\upsilon r}) - d'\}$. When its inventory level increases from zero to $d'$, the profit improvement is $p_j d' + \sum_{\upsilon=1}^{m}\sum_{i=1}^{M_\upsilon} p_i\,\Delta f_{ij}^{\upsilon}(\hat{y}_i^{\upsilon r'}, y_i^{\upsilon r})$. The complete process for assortment rule mining with both types of substitutions is shown in the schematic diagram of the framework in Figure 1.

6. Data and Experimental Design

A real-world database and a semisynthetic database are used in this study. The real-world database is from a large global retail chain. It contains four data sets: Customers, Products, Sales, and Inventories. They hold customer profiles, product categories, customer-level sales records, and product-level inventory records, respectively. The Customers data set has the detailed profiles of 10,281 customers from 25 stores of the retail chain in North America. The Products data set contains information such as the brand, product name, and SKU (stock keeping unit) of 1,559 products in 110 subcategories, 45

categories, and three product families. The Sales data set contains 269,720 transaction records over a period of 102 weeks. Each record has the sales volume, sales value, and sales profit of a product purchased by a customer at a specific time and store. The Inventories data set contains the historical inventory data and inventory cost of each product, with 11,952 records over the same period.

In the computational experiments, the purpose of demand forecasting is to predict the sales volume of each product in the next week. A holdout validation approach is used, following the longitudinal framework for the experimental setting of dynamic forecasting problems (Sett et al. 2017). The data are divided along the time dimension into a training set, a validation set, and a testing set.

For standard demand forecasting, each of the training, validation, and testing sets has 1,559 observations, one per product. The input of each observation covers a time period of 52 weeks, and the output covers the 1 week immediately following the input period. For the observations in the training, validation, and testing sets, the input period starts at weeks 46, 47, and 48 and terminates at weeks 97, 98, and 99, respectively.

For collective demand forecasting, a moving window method is used to generate the observations with the temporal features in the nonrelational feature vector. Specifically, there are 1,559 blocks, each corresponding to a single product. Each of the training, validation, and testing sets in a block has 46 observations, each with a different time period. For the 46 observations in the training set, the input periods start from week 1 to week 46 and terminate from week 52 to week 97. Their outputs are the weeks immediately following the input periods. In both input and output data, each observation in the validation (testing) set starts and terminates one week later than the corresponding observation in the training (validation) set. The ways of dividing the data into the training, validation, and testing sets for standard and collective demand forecasting are illustrated in Figures A3 and A4 of Online Appendix B, respectively. Different division strategies of the blocks can be used in the CCDM to potentially improve the performance of the CCDM and the assortment decisions.

The features of the input data for collective demand forecasting include the following: the lagged weekly sales volumes with a time window of $T = 52$ weeks, the current price, the season of the target time period, and the sales volumes of the related products of each product. Using data warehousing tools, the lagged sales volume, the price, and the season are extracted from the Sales data set, and the lagged sales volume is aggregated over customers. The sales and holding cost of each product in the target time period are extracted from the Sales and Inventories data sets, respectively, to obtain the unit profit and the unit holding cost. The shortage cost $\tilde{s}_i$ is not considered in the experiment (Gaur and Honhon 2006, Gotoh and Takano 2007).

Semisynthetic simulations using real-world data and a presumed data generation process are used to generate unobservable labels of unlabeled data (Hill 2011, Chen et al. 2019). In the experiments, the simulated true demands are assumed to be linearly or nonlinearly correlated to four quantities: the historical demands of the product, its current price, the current season, and the current demands of the related products. A linear function is used to simulate the linear data generation process. A nonlinear function with a structure similar to that of feedforward NNs is used to simulate the nonlinear process. The descriptions of the semisynthetic database and the linear/nonlinear simulation data, especially the linear and nonlinear functions, are presented in Online Appendix G.

Four criteria, namely the mean absolute error (MAE), the mean squared error (MSE), the CI, and the NP, are used to measure the performance of the demand forecasting methods. Furthermore, an additional criterion is designed to measure profit improvements in assortment rule mining: the average predicted profit lift (ProfitLift). It is the difference between the predicted maximum profits per product with and without using demand inference and assortment rule mining, that is, with and without considering the cross-selling and substitution effects. In demand inference, the labeling performance for unlabeled data is evaluated through the influence of labeling unlabeled data on the labeled data (Zhou and Li 2007). Unlabeled data are used in semisupervised learning according to Equation (6). The MAE, MSE, CI, and NP on the labeled data then serve as measures for demand inference. The details of the evaluation method are provided in Online Appendix G.

In the experiments, the whole retail chain is considered with a full assortment, and the purpose of assortment rule mining for assortment-based substitutions is to identify the products to drop. The methods can be naturally extended to store-level assortment planning, in which new products can be added to the assortment. Due to the page limit, results of the store-level assortment planning are not presented.

7. Experimental Results

This section reports the computational results of the proposed method on the real-world and semisynthetic databases. Comparisons between the proposed and baseline methods are then given. Some managerial implications are finally provided. Matlab 2014a is used to conduct the computations. The desktop computer used for the computations has two Intel Core i7 processors with a quad-core 3.40-GHz clock speed and 16 GB of RAM.
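The holdout scheme of Section 6 reduces to a small amount of week-index arithmetic. The Python sketch below illustrates it; it is not the authors' Matlab code. The following are assumptions introduced here:

- weeks are numbered 1 to 102;
- each observation is a tuple (first input week, last input week, output week);
- each validation/testing observation is the corresponding training/validation observation shifted by one week;
- the names `standard_windows` and `collective_windows` are hypothetical.

```python
# Hypothetical sketch of the week indices used by the holdout split in Section 6.
# Assumptions: weeks are numbered 1..102, each observation is a tuple
# (first input week, last input week, output week), and every validation/testing
# observation is its training/validation counterpart shifted by one week.

WINDOW = 52      # input length T, in weeks
N_WEEKS = 102    # length of the Sales history


def standard_windows():
    """Standard forecasting: one 52-week input window per split (same for every product)."""
    splits = {}
    for name, first in (("train", 46), ("valid", 47), ("test", 48)):
        last = first + WINDOW - 1
        splits[name] = (first, last, last + 1)  # output is the following week
    return splits


def collective_windows(n_obs=46):
    """Collective forecasting: moving-window observations within one product block."""
    splits = {}
    for shift, name in enumerate(("train", "valid", "test")):
        windows = []
        for k in range(n_obs):
            first = 1 + k + shift          # training inputs start at weeks 1..46
            last = first + WINDOW - 1      # training inputs end at weeks 52..97
            windows.append((first, last, last + 1))
        splits[name] = windows
    return splits


if __name__ == "__main__":
    print(standard_windows())
    # {'train': (46, 97, 98), 'valid': (47, 98, 99), 'test': (48, 99, 100)}
    blocks = collective_windows()
    print(blocks["train"][0], blocks["train"][-1], blocks["test"][-1])
    # (1, 52, 53) (46, 97, 98) (48, 99, 100)
    assert blocks["test"][-1][2] <= N_WEEKS  # every output week lies inside the data
```

Under these assumptions, the last testing output is week 100, so both schemes stay within the 102-week Sales history. Each output week follows its own input window, so no target week leaks into its inputs.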

7.1. Performance of Collective Demand Forecasting on the Real-World Database

Three methods, namely P-SVR, profit-driven linear kernel support vector regression (P-LKSVR), and P-NN, are used for standard demand forecasting. They provide the initial values for collective demand forecasting and also serve as baseline methods for comparison. The features, including the lagged weekly sales volumes, the price, and the season, are inputs of these three methods. The free parameters are empirically set to optimal values using the validation set. They include a regularization parameter for each of the three methods and a kernel parameter for P-SVR (see Online Appendix C). The four performance measures are calculated, and the results are shown in Table 4. As shown in Table 4, P-SVR obtained the best MAE and MSE, P-LKSVR obtained the best CI, and P-NN obtained the best NP.

Ridge regression (Ridge) (Fan and Lv 2008) with a regularization parameter $c = 10$ is used to obtain the associations among the products. Products whose regression coefficients have absolute values larger than the threshold $\theta = 0.01$ are identified as the related products of the target product. The sales volumes of the related products of each product are added as features for collective demand forecasting. P-SVR, P-LKSVR, and P-NN are also used as base learners (regression models) in the CCDM. Moreover, three initialization methods are used in the CCDM:

- the results of the corresponding base learners in standard demand forecasting (S1);
- the one-week lagged demand (S2);
- the moving average of the historical demands in the last three weeks (S3).

The collective demand forecasting results with different base learners and initial values are shown in Table 5. As the results in Tables 4 and 5 show, collective demand forecasting with P-SVR and P-LKSVR performed far better than standard demand forecasting. These results show that considering the relations among the products and incorporating the network features are crucial for improving demand forecasting performance. The performance of the CCDM is stable across initialization methods. Moreover, the CCDM with S2 and P-SVR obtained slightly better results than those with S1/S3 and P-SVR. The combination of S2 and P-SVR is therefore chosen as the initialization method and the regression model in the CCDM for the subsequent demand inference. In addition, the performance improvement of the CCDM with P-NN is hindered mainly by the high dimension and small sample of the data used in the CCDM. NN is often sensitive to such data, whereas SVR is usually robust (Hastie et al. 2010).

Besides the three standard demand forecasting methods, collective demand forecasting is compared with four other baseline methods:

- the semisupervised recurrent NN (SS-RNN) (Moore and Neville 2017);
- the graph convolutional NN (GCNN) (Kipf and Welling 2017);
- the modified MNL;
- the newsvendor models with a negative binomial distribution (NBD) (Smith and Agrawal 2000), a data-driven NBD (DD-NBD), a Poisson distribution (Poisson) (Topaloglu 2013), and a normal approximation of Poisson (Norm) (Mahajan and van Ryzin 2001).

The first two methods are popular graph learning methods. The standard MNL and its variants are widely used for demand estimation in assortment planning. The newsvendor models are representative methods for stochastic demand modeling. The implementation details of the baseline methods are provided in Online Appendix H.

Table 6 shows the results of collective demand forecasting (CDF) using the combination of S2 and P-SVR, together with those of the four baseline methods. The values of MAE, MSE, and CI of collective demand forecasting are all substantially lower than those of the baseline methods, and its NP is substantially larger. Moreover, due to poor forecasting performance, the four newsvendor models obtained negative values for NP. That is, if orders were placed according to these models, the profits would be less than the sum of the backordering and holding costs.

The effects of the feature selection methods with different threshold values on forecasting performance are examined, and the results are given in Table A2 of Online Appendix H. Moreover, a sensitivity analysis is conducted on how the length of the historical data and the step-ahead size in the prediction affect the computational results. The results are given in Figures A6–A9 in Online Appendix H.

7.2. Results of Demand Inference and Assortment Rule Mining on the Real-World Database

Three demand inference methods, that is, cotraining, transductive regression (Transductive), and EM for self-training, are used to obtain the demands under different

Table 4. Results of the Three Standard Demand Forecasting Methods

Methods | MAE (mean) | MAE (SD) | MSE (mean) | MSE (SD) | CI (mean) | CI (SD) | NP (mean) | NP (SD)
P-SVR | 6.0538 | 1.7252 | 57.2604 | 35.9981 | 9.8020 | 2.5916 | 1.2812 | 1.8273
P-LKSVR | 7.4748 | 1.4442 | 89.9497 | 36.7539 | 6.2415 | 3.9392 | 2.4973 | 1.9163
P-NN | 7.1497 | 0.6718 | 78.2157 | 15.6838 | 6.9203 | 1.2605 | 2.7880 | 1.0535
Note. SD, standard deviation. The best result for each measure is highlighted in bold.

Table 5. Results of Collective Demand Forecasting with Different Base Learners and Initial Values

Initial values | Methods | MAE (mean) | MAE (SD) | MSE (mean) | MSE (SD) | CI (mean) | CI (SD) | NP (mean) | NP (SD)
S1 | P-SVR | 3.6088 | 1.4503 | 24.1754 | 23.1849 | 3.0132 | 3.9392 | 5.7879 | 2.9546
S2 | P-SVR | 3.0756 | 0.8944 | 18.0994 | 19.1427 | 2.8141 | 3.9392 | 6.2411 | 3.1084
S3 | P-SVR | 4.1307 | 1.1171 | 27.5710 | 23.9608 | 3.4650 | 3.9392 | 5.3526 | 2.4572
S1 | P-LKSVR | 5.1348 | 1.1072 | 47.7592 | 24.2391 | 4.1362 | 3.9392 | 4.4970 | 2.3994
S2 | P-LKSVR | 5.3508 | 1.0833 | 50.2629 | 24.9573 | 4.3756 | 3.9392 | 4.3111 | 2.3382
S3 | P-LKSVR | 4.9579 | 1.2374 | 43.8858 | 24.5492 | 5.7838 | 3.9392 | 4.6772 | 2.7754
S1 | P-NN | 7.2831 | 1.0676 | 87.9214 | 26.7699 | 4.4043 | 2.1361 | 2.6328 | 1.4885
S2 | P-NN | 7.2818 | 1.0642 | 87.9024 | 26.6939 | 4.4029 | 2.1373 | 2.6349 | 1.4925
S3 | P-NN | 10.0899 | 1.1635 | 155.9741 | 36.4731 | 4.6765 | 2.1983 | 0.1828 | 0.4181
Note. SD, standard deviation. The best result for each measure is highlighted in bold.

assortments with different inventory levels for the real-world database. These assortments and inventory levels are used to rebuild the network feature vector $y_i^{r'}$. The newly labeled data $(x_i^{\upsilon}, y_i^{\upsilon r'}; \hat{f}_{ij}^{\upsilon})$ obtained with Procedure 1 are merged with the original training data to form a new training set. The three methods then proceed as follows:

- Transductive: a new cycle of training is conducted on the new training set.
- EM: a series of trainings is conducted iteratively on the updated training set.
- Cotraining: two P-SVRs with different parameters are used as the base learners to modify, in alternating order, the labels of the observations with high NP on the validation set.

The updated values of the four measures obtained with the three methods on the labeled observations are presented in Table 7. Transductive obtained the best MAE and NP, cotraining obtained the best MSE and CI, and EM performed the worst on the real-world database. The comparisons of the results in Tables 5 and 7 show that the newly added data help improve the forecasting performance and that the demand inference results are at least acceptable.

The proposed greedy heuristic (Greedy) and the greedy heuristic combined with the reranking strategy (Greedy + reranking) are applied for assortment-based substitutions and for both types of substitutions. They yield the best greedy product combination, the inventory levels, and thus the profit lift values. As a benchmark, the existing greedy heuristic (Fisher and Vaidyanathan 2014) is modified by using the cross-selling and substitution effects in the demand function. The values of ProfitLift and $\hat{n}^*$ for these approaches are shown in Table 8. The value of NP in the first row of Table 6 is the baseline for the profit improvements. As shown in Table 8, the proposed greedy heuristics obtained higher ProfitLift than the existing greedy heuristic. They also chose a much smaller number of products to drop and, correspondingly, obtained a larger assortment. Moreover, considering the stockout-based substitutions and adopting the reranking strategy slightly improve ProfitLift.

The effects of the threshold values and feature selection methods on assortment rule mining and profit improvements are investigated, and the results are shown in Table A3 in Online Appendix H. Moreover, Tables A4–A6 in Online Appendix H report summary statistics of the identified products in the best greedy product combination considering assortment-based substitutions. They show wide product variety, that is, the identified products are well distributed among the product subcategories. The implementation issues and results of the missing historical data problem are described in Online Appendix H.

7.3. Results on the Semisynthetic Database

Semisynthetic simulations are used to evaluate the performance of demand inference more accurately and to examine the estimated profit improvements. The results of

Table 6. Comparisons of Results of CDF and Baseline Methods

Methods | MAE (mean) | MAE (SD) | MSE (mean) | MSE (SD) | CI (mean) | CI (SD) | NP (mean) | NP (SD)
CDF | 3.0756 | 0.8944 | 18.0994 | 19.1427 | 2.8141 | 3.9392 | 6.2411 | 3.1084
SS-RNN | 7.4405 | 1.4026 | 92.173 | 52.1495 | 6.3358 | 2.1478 | 2.5154 | 1.3370
GCNN | 7.3605 | 0.5306 | 82.0551 | 13.3042 | 7.4036 | 1.1247 | 2.6272 | 1.0449
MNL | 9.5615 | 1.2347 | 154.8190 | 39.3881 | 8.2620 | 1.7993 | 0.7158 | 1.1776
NBD | 19.5833 | 2.3745 | 456.2348 | 109.9037 | 8.4334 | 1.0274 | −8.0004 | 1.0271
DD-NBD | 19.3866 | 2.1861 | 447.9664 | 97.8925 | 8.3431 | 0.9495 | −7.8159 | 0.9363
Poisson | 14.6725 | 1.984 | 311.2979 | 73.1646 | 6.9546 | 0.6243 | −4.1326 | 1.1794
Norm | 19.3509 | 2.0685 | 457.2154 | 99.4258 | 8.6057 | 0.8451 | −7.9713 | 0.7960

Table 7. Results of Demand Inference for Assortment-Based Substitutions

Methods | MAE (mean) | MAE (SD) | MSE (mean) | MSE (SD) | CI (mean) | CI (SD) | NP (mean) | NP (SD)
Cotraining | 2.8889 | 0.8962 | 16.0139 | 18.7385 | 2.1646 | 1.2107 | 6.3952 | 3.1337
Transductive | 2.8781 | 0.9159 | 16.3637 | 18.8276 | 2.3848 | 1.3674 | 6.3975 | 3.1365
EM | 3.0696 | 0.9076 | 17.9298 | 19.0104 | 2.8535 | 1.6208 | 6.2452 | 3.1102

demand forecasting and inference on the linear and non comparison of the results in the seventh and the eighth
linear simulation data are shown in Table A6 in Online rows of Table 9 shows that the proposed greedy heuris
Appendix H, and the comparisons of different data-driven tic exhibits a far better performance than the greedy
assortment planning methods are given here. heuristic in Çömez-Dolgan et al. (2021). These results
To position the profit contribution of each phase in provide evidence that (1) the proposed method is supe
the overall performance improvement, multiple differ rior to these baseline methods on the simulation data,
ent combinations of the proposed and existing methods are considered. (1) The proposed methods are used for the first two phases, and the greedy heuristic in Fisher and Vaidyanathan (2014) is used for assortment optimization. (2) The proposed collective demand forecasting, the method for substitution rate estimation in Kök and Fisher (2007), and the greedy heuristic in Fisher and Vaidyanathan (2014) are used for the three phases, respectively. (3 and 4) The modified MNL is used for the first two phases, and the proposed greedy heuristic and the greedy heuristic using the purchase probabilities of the products (Çömez-Dolgan et al. 2021), respectively, are used for the last phase. (5) The multi-item newsvendor model (Smith and Agrawal 2000) is used to obtain the final decisions directly. The implementation details of the baseline methods are provided in Online Appendix H.
The profits obtained by demand forecasting (NP1) and assortment optimization (NP2) for the proposed methods and for the different combinations of methods on the linear and nonlinear simulation data are shown in Table 9. Because the methods for one or two of the three phases are held fixed in each comparison, any profit improvement must come from the method(s) used for the remaining phase(s). For example, (1) a comparison of the results in the first four rows with those in the fifth (sixth) row of Table 9 shows that the proposed greedy heuristic with the reranking strategy (combined with cotraining) performs far better than the greedy heuristic in Fisher and Vaidyanathan (2014) (combined with the substitution rate estimation in Kök and Fisher (2007)); and (2) a

(2) all three phases of the proposed methods contribute to the overall performance improvement, and (3) the reranking strategy and, correspondingly, the first assumption in Property 1 have very small effects on the profit improvement.

7.4. Managerial Implications
The experimental results yield the following two managerial implications.
(1) Both cross-selling and substitutions have profound effects on assortment planning. Neglecting the cross-selling effects leads to overestimated profits and to suboptimal assortments. The best tradeoff between the cross-selling and substitution effects produces the best assortment. High and low levels of the cross-selling and substitution effects lead to four combinations. Products with high cross-selling and low substitution effects should be included in the assortment, whereas those with low cross-selling and high substitution effects should be excluded from it. When the cross-selling effects are neglected, products with both high cross-selling and high substitution effects are likely to be excluded from the assortment, whereas those with both low cross-selling and low substitution effects are likely to be included in it. As a result, suboptimal assortments may be obtained, with increased demand forecasting errors, overestimated or even negative profits, and possibly narrow assortment depth. The comparisons between the proposed method and the baseline methods on the real-world and semisynthetic databases, presented in Tables 8 and 9, provide evidence for this.
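The product-level rule in managerial implication (1) can be restated as a small decision function. The sketch below is illustrative only: the per-product effect scores, the `ProductEffects` class, `quadrant_decision`, and the two thresholds are hypothetical and are not defined by the authors, who quantify the effects within their three-phase procedure rather than by thresholding. The two mixed combinations map to an explicit include or exclude decision; the remaining two (both effects high, or both low) are the cases the text flags as likely to be misjudged when cross-selling is ignored, so they are returned as needing a data-driven tradeoff.

```python
from dataclasses import dataclass


@dataclass
class ProductEffects:
    """Hypothetical per-product summary of estimated effects (not defined in the paper)."""
    name: str
    cross_selling: float  # larger value = stronger cross-selling effect
    substitution: float   # larger value = stronger substitution effect


def quadrant_decision(p: ProductEffects, cs_threshold: float, sub_threshold: float) -> str:
    """Map a product to one of the four high/low combinations described in the text."""
    high_cs = p.cross_selling >= cs_threshold
    high_sub = p.substitution >= sub_threshold
    if high_cs and not high_sub:
        return "include"  # high cross-selling, low substitution
    if not high_cs and high_sub:
        return "exclude"  # low cross-selling, high substitution
    # Both high or both low: no fixed rule; the tradeoff between the two effects
    # has to be quantified from data (e.g., in the assortment optimization phase).
    return "evaluate"


if __name__ == "__main__":
    products = [
        ProductEffects("A", 0.8, 0.1),
        ProductEffects("B", 0.1, 0.9),
        ProductEffects("C", 0.9, 0.9),
        ProductEffects("D", 0.1, 0.1),
    ]
    for p in products:
        print(p.name, quadrant_decision(p, cs_threshold=0.5, sub_threshold=0.5))
```

With both thresholds at 0.5, products A to D are labeled include, exclude, evaluate, and evaluate, respectively.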

Table 8. Profit Lifts and the Number of Chosen Products n̂* for Assortment-Based and Stockout-Based Substitutions Based on the Results of Collective Demand Forecasting and Demand Inference

Assortment optimization          Substitutions             Profit lift   n̂*
Greedy                           Assortment                36.4669       245
Greedy + reranking               Assortment                37.7454       290
Greedy                           Assortment and stockout   37.7847       249
Greedy + reranking               Assortment and stockout   39.0051       299
Fisher and Vaidyanathan (2014)   Assortment                31.4586       740
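The four greedy rows of Table 8 form a 2 × 2 layout (reranking on or off; stockout-based substitutions modeled or not), so their profit lifts can be split into the gain of each factor over the plain greedy cell plus an interaction term. This is an illustrative reading of the table under the assumption that simple differences are meaningful, not an analysis reported by the authors; the names `GREEDY_LIFT`, `FV2014_LIFT`, and `two_by_two_effects` are hypothetical.

```python
# Profit lifts transcribed from Table 8, keyed by
# (reranking used, stockout-based substitutions included).
GREEDY_LIFT = {
    (False, False): 36.4669,  # Greedy, assortment-based substitutions
    (True, False): 37.7454,   # Greedy + reranking, assortment-based
    (False, True): 37.7847,   # Greedy, assortment- and stockout-based
    (True, True): 39.0051,    # Greedy + reranking, assortment- and stockout-based
}
FV2014_LIFT = 31.4586  # Fisher and Vaidyanathan (2014), assortment-based only


def two_by_two_effects(lift: dict) -> dict:
    """Gain of each factor over the plain-greedy cell, their joint gain, and the interaction."""
    base = lift[(False, False)]
    rerank = lift[(True, False)] - base
    stockout = lift[(False, True)] - base
    joint = lift[(True, True)] - base
    effects = {
        "reranking": rerank,
        "stockout": stockout,
        "joint": joint,
        # Zero would mean the two individual gains simply add up.
        "interaction": joint - rerank - stockout,
    }
    return {name: round(value, 4) for name, value in effects.items()}


if __name__ == "__main__":
    print(two_by_two_effects(GREEDY_LIFT))
    # Best greedy configuration versus the Fisher and Vaidyanathan (2014) heuristic.
    print(round(GREEDY_LIFT[(True, True)] - FV2014_LIFT, 4))
```

On these values, reranking adds about 1.28 and stockout-based substitutions about 1.32 to the profit lift, the two together add about 2.54 (an interaction of about −0.06), and the best greedy configuration exceeds the Fisher and Vaidyanathan (2014) heuristic by about 7.55 while choosing 299 rather than 740 products.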

Table 9. Results of the Proposed Methods and Baseline Methods on the Linear and Nonlinear Simulation Data

Phase 1   NP1       Phase 2                 Phase 3                          Substitutions             NP2
CDF       3.4943    Cotraining              Greedy                           Assortment                3.7161
CDF       3.4943    Cotraining              Greedy + reranking               Assortment                3.7111
CDF       3.4943    Cotraining              Greedy                           Assortment and stockout   3.7161
CDF       3.4943    Cotraining              Greedy + reranking               Assortment and stockout   3.7111
CDF       3.4943    Cotraining              Fisher and Vaidyanathan (2014)   Assortment                −37.0178
CDF       3.4943    Kök and Fisher (2007)   Fisher and Vaidyanathan (2014)   Assortment                −4.1640
MNL       −2.0548   MNL                     Greedy                           Assortment                −8.6553
MNL       −2.0548   MNL                     Çömez-Dolgan et al. (2021)       Assortment                −16.6808
NBD       NA        Kök and Fisher (2007)   Multi-item newsvendor            Assortment                −6.7893
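The comparison logic used with Table 9 (hold the methods of some phases fixed and attribute the NP2 change to the phase that differs) can be made mechanical. The sketch below transcribes the NP2 column; row labels are shortened and written in ASCII here, and `Row`, `shared_settings`, and `np2_gap` are hypothetical helpers. Attributing a gap entirely to the differing phase assumes the phases do not interact, which the table alone cannot confirm.

```python
from typing import NamedTuple


class Row(NamedTuple):
    phase1: str
    phase2: str
    phase3: str
    substitutions: str
    np2: float


# Table 9 rows in printed order (NP1 omitted; labels shortened here).
TABLE9 = [
    Row("CDF", "Cotraining", "Greedy", "Assortment", 3.7161),
    Row("CDF", "Cotraining", "Greedy + reranking", "Assortment", 3.7111),
    Row("CDF", "Cotraining", "Greedy", "Assortment and stockout", 3.7161),
    Row("CDF", "Cotraining", "Greedy + reranking", "Assortment and stockout", 3.7111),
    Row("CDF", "Cotraining", "Fisher-Vaidyanathan 2014", "Assortment", -37.0178),
    Row("CDF", "Kok-Fisher 2007", "Fisher-Vaidyanathan 2014", "Assortment", -4.1640),
    Row("MNL", "MNL", "Greedy", "Assortment", -8.6553),
    Row("MNL", "MNL", "Comez-Dolgan 2021", "Assortment", -16.6808),
    Row("NBD", "Kok-Fisher 2007", "Multi-item newsvendor", "Assortment", -6.7893),
]
FIELDS = ("phase1", "phase2", "phase3", "substitutions")


def shared_settings(i: int, j: int) -> list:
    """Settings that rows i and j (1-based, as printed) hold fixed."""
    a, b = TABLE9[i - 1], TABLE9[j - 1]
    return [f for f in FIELDS if getattr(a, f) == getattr(b, f)]


def np2_gap(i: int, j: int) -> float:
    """NP2 of row i minus NP2 of row j, rounded to the table's four decimals."""
    return round(TABLE9[i - 1].np2 - TABLE9[j - 1].np2, 4)


if __name__ == "__main__":
    # Phase 3 differs (greedy vs. Fisher and Vaidyanathan 2014); phases 1-2 fixed.
    print(shared_settings(1, 5), np2_gap(1, 5))
    # Only the reranking strategy (encoded in phase 3) differs.
    print(shared_settings(2, 1), np2_gap(2, 1))
    # Phases 1-2 differ (CDF + cotraining vs. MNL + MNL); greedy phase 3 fixed.
    print(shared_settings(1, 7), np2_gap(1, 7))
```

For instance, rows 1 and 5 share Phases 1 and 2 and differ by about 40.73 in NP2; rows 1 and 7 share only the greedy Phase 3 (and the substitution type) and differ by about 12.37; and switching reranking on (row 2 vs. row 1) changes NP2 by only −0.005.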

(2) Computational results show that the cross-selling and substitution effects, reflected in ∆f_{ijυ}, may be negative and may differ across products with different prices at different time points. Thus, data analytics is necessary to quantify the heterogeneous cross-selling and substitution effects.

8. Conclusions
From the machine learning perspective, the demand estimation and assortment planning problem is decomposed into three subproblems: collective demand forecasting, demand inference for cross-selling and substitutions, and assortment rule mining (Figure 1). Two profit-driven machine learning methods are used as the basic models in collective demand forecasting, and a CCDM with a good convergence property is developed to solve this subproblem. Procedures are developed for demand inference and assortment rule mining that use the results of collective demand forecasting to obtain the best assortment.
A real-world database and a semisynthetic database are used to evaluate the performance of the proposed methods. The computational results on the real-world database show that (1) collective demand forecasting can substantially outperform the standard demand forecasting methods, some popular graph learning methods, MNL, and the single-item newsvendor models; (2) transductive regression and cotraining in demand inference obtain the best results among the three semisupervised learning methods; and (3) combining the greedy heuristic with the reranking strategy can significantly improve performance. The computational results on the semisynthetic database generated with linear and nonlinear functions show that (1) the proposed methods can perform far better than multiple different combinations of the proposed and existing methods; (2) all three phases of the proposed methods contribute to the overall performance improvement; and (3) the first assumption in Property 1 has small effects on the performance improvement on small data sets.
Several future research directions are outlined. (1) Collective demand forecasting can be extended, as a general graph learning method and as a phase of data-driven optimization, to applications in which the dependent variables for prediction are correlated with each other in a network. Such applications are common in marketing, revenue, and supply chain management. (2) The one-substitution assumption and the independent and additive substitution assumption for multiple unavailable products limit the accuracy with which the cross-selling and substitution effects can be measured. Multiple-to-one and multiple-to-multiple effects may exist, but no evidence for these effects is available in theory or practice. It is necessary to use particular data, such as product bundling sales data, to conduct elaborate analyses and to develop new methods, such as representation and manifold learning, to deal with the possible dimension explosion. (3) The proposed methods may be extended to shelf space allocation problems, hierarchical demand estimation and assortment optimization problems with product depth and breadth constraints, and joint pricing and assortment decision problems with stochastic lead time. (4) For seasonal goods or goods without historical demand data, dynamic assortment optimization needs to be considered by capturing the tradeoff between exploration (learning) and exploitation (profit maximization) (Caro and Gallien 2007, Rusmevichientong et al. 2010). The combination of data-driven and dynamic assortment optimization, the use of observational and experimental data, and the use of supervised, semisupervised, and reinforcement learning methods are interesting problems and may be promising directions for efficiently providing more accurate and convincing results for assortment planning.

Acknowledgments
The authors thank the associate editor and three anonymous reviewers for valuable suggestions.

References
Bai X, Bhattacharjee S, Boylu F, Gopal R (2015) Growth projections and assortment planning of commodity products across multiple stores: A data mining and optimization approach. INFORMS J. Comput. 27(4):619–635.
Ban GY, Rudin C (2019) The big data newsvendor: Practical insights from machine learning. Oper. Res. 67(1):90–108.
Belloni A, Freund R, Selove M, Simester D (2008) Optimizing product line designs: Efficient methods and comparisons. Management Sci. 54(9):1544–1552.
Bertsekas DP (2016) Nonlinear Programming, 3rd ed. (Athena Scientific, Belmont, MA).
Bertsimas D, Kallus N (2020) From predictive to prescriptive analytics. Management Sci. 66(3):1005–1507.
Cachon GP, Kök AG (2007) Category management and coordination in retail assortment planning. Management Sci. 53(6):934–951.
Candès EJ, Plan Y (2010) Matrix completion with noise. Proc. IEEE 98(6):925–936.
Caro F, Gallien J (2007) Dynamic assortment with demand learning for seasonal consumer goods. Management Sci. 53(2):276–292.
Chan R, Li Z, Matsypura D (2020) Assortment optimisation problem: A distribution-free approach. Omega 95:102083.
Chen ZY, Fan ZP, Sun M (2019) Individual-level social influence identification in social media: A learning-simulation coordinated method. Eur. J. Oper. Res. 273(3):1005–1015.
Çömez-Dolgan N, Fescioglu-Unver N, Cephe E, Şen A (2021) Capacitated strategic assortment planning under explicit demand substitution. Eur. J. Oper. Res. 294(3):1120–1138.
Cortes C, Mohri M (2007) On transductive regression. Platt J, Koller D, Singer Y, Roweis S, eds. Advances in Neural Information Processing Systems, vol. 20 (MIT Press, Cambridge, MA), 305–312.
Elmachtoub AN, Grigas P (2021) Smart “predict, then optimize”. Management Sci. 68(1):9–26.
Engelen JV, Hoos HH (2020) A survey on semi-supervised learning. Machine Learn. 109(2):373–440.
Fan J, Lv J (2008) Sure independence screening for ultrahigh dimensional feature space. J. Royal Statist. Soc. B 70(5):849–911.
Farias VF, Jagabathula S, Shah D (2013) A nonparametric approach to modeling choice with limited data. Management Sci. 59(2):305–322.
Feng Q, Shanthikumar G (2018) How research in production and operations management may evolve in the era of big data. Production Oper. Management 27(9):1670–1684.
Feng G, Li X, Wang Z (2018) On substitutability and complementarity in discrete choice models. Oper. Res. Lett. 46(1):141–146.
Fisher ML, Vaidyanathan R (2014) A demand estimation procedure for retail assortment optimization with results from implementations. Management Sci. 60(10):2401–2415.
Gaur V, Honhon D (2006) Assortment planning and inventory decisions under a locational choice model. Management Sci. 52(10):1528–1543.
Gotoh JY, Takano Y (2007) Newsvendor solutions via conditional value-at-risk minimization. Eur. J. Oper. Res. 179(1):80–96.
Gubela RM, Lessmann S, Jaroszewicz S (2020) Response transformation and profit decomposition for revenue uplift modeling. Eur. J. Oper. Res. 283(2):647–661.
Hastie T, Tibshirani R, Friedman J (2010) The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed. (Springer, New York).
Hill JL (2011) Bayesian nonparametric modeling for causal inference. J. Comput. Graphical Statist. 20(1):217–240.
Huber J, Müller S, Fleischmann M, Stuckenschmidt H (2019) A data-driven newsvendor problem: From data to decision. Eur. J. Oper. Res. 278(3):904–915.
Hübner AH, Kuhn H (2012) Retail category management: A state-of-the-art review of quantitative research and software applications in assortment and shelf space management. Omega 40(2):199–209.
Hübner AH, Kuhn H, Kühn S (2016) An efficient algorithm for capacitated assortment planning with stochastic demand and substitution. Eur. J. Oper. Res. 250(2):505–520.
Kamakura WA, Wedel M, de Rosa F, Mazzon JA (2003) Cross-selling through database marketing: A mixed data factor analyzer for data augmentation and prediction. Internat. J. Res. Marketing 20(1):45–65.
Kipf TN, Welling M (2017) Semi-supervised classification with graph convolutional networks. Proc. 5th Internat. Conf. on Learn. Representations. OpenReview.net.
Kök AG, Fisher ML (2007) Demand estimation and assortment optimization under substitution: Methodology and application. Oper. Res. 55(6):1001–1021.
Kök AG, Fisher ML, Vaidyanathan R (2015) Assortment planning: Review of literature and industry practice. Agrawal N, Smith S, eds. Retail Supply Chain Management. International Series in Operations Research and Management Science, vol. 223 (Springer, Boston, MA), 175–236.
Lee H, Eun Y (2016) Estimating primary demand for a heterogeneous-groups product category under hierarchical consumer choice model. IISE Trans. 48(6):541–554.
Little R, Rubin DB (2020) Statistical Analysis with Missing Data, 3rd ed. (Wiley, Hoboken, NJ).
Liu N, Ma Y, Topaloglu H (2020) Assortment optimization under the multinomial logit model with sequential offerings. INFORMS J. Comput. 32(3):835–853.
Liyanage LH, Shanthikumar JG (2005) A practical inventory control policy using operational statistics. Oper. Res. Lett. 33(4):341–348.
Luo ZQ, Tseng P (1992) On the convergence of coordinate descent method for convex differentiable minimization. J. Optim. Theory Appl. 72(1):7–35.
Mahajan S, van Ryzin G (2001) Stocking retail assortments under dynamic consumer substitution. Oper. Res. 49(3):334–351.
McDowell LK, Gupta KM, Aha DW (2009) Cautious collective classification. J. Machine Learn. Res. 10(18):2777–2836.
Moore J, Neville J (2017) Deep collective inference. Singh S, Markovitch S, eds. Proc. 31st AAAI Conf. on Artificial Intelligence (AAAI Press, Palo Alto, CA), 2364–2372.
Nickel M, Murphy K, Tresp V, Gabrilovich E (2016) A review of relational machine learning for knowledge graphs. Proc. IEEE 104(1):11–33.
Nigam K, McCallum A, Thrun S, Mitchell T (2000) Text classification from labeled and unlabeled documents using EM. Machine Learn. 39(2/3):103–134.
Olaya D, Coussement K, Verbeke W (2020) A survey and benchmarking study of multitreatment uplift modeling. Data Mining Knowledge Discovery 34:273–308.
Oroojlooyjadid A, Snyder LV, Takáč M (2020) Applying deep learning to the newsvendor problem. IISE Trans. 52(4):444–463.
Rusmevichientong P, Shen ZJM, Shmoys DB (2010) Dynamic assortment optimization with a multinomial logit choice model and capacity constraint. Oper. Res. 58(6):1666–1680.
Şen A, Atamtürk A, Kaminsky P (2018) Technical Note—A conic integer optimization approach to the constrained assortment problem under the mixed multinomial logit model. Oper. Res. 66(4):994–1003.
Sen P, Namata G, Bilgic M, Getoor L, Galligher B, Eliassi-Rad T (2008) Collective classification in network data. AI Magazine 29(3):93–106.
Sett N, Basu S, Nandi S, Singh SR (2017) Temporal link prediction in multi-relational network. World Wide Web (Bussum) 21(2):395–419.
Shin H, Park S, Lee E, Benton WC (2015) A classification of the literature on the planning of substitutable products. Eur. J. Oper. Res. 246(3):686–699.
Simester D, Timoshenko A, Zoumpoulis SI (2020) Targeting prospective customers: Robustness of machine-learning methods to typical data challenges. Management Sci. 66(6):2495–2522.
Smith SA, Agrawal N (2000) Management of multi-item retail inventory systems with demand substitution. Oper. Res. 48(1):50–64.
Talebian M, Boland N, Savelsbergh M (2014) Pricing to accelerate demand learning in dynamic assortment planning for perishable products. Eur. J. Oper. Res. 237(2):555–565.
Topaloglu H (2013) Joint stocking and product offer decisions under the multinomial logit model. Production Oper. Management 22(5):1182–1199.
Vulcano G, van Ryzin G, Ratliff R (2012) Estimating primary demand for substitutable products from sales transaction data. Oper. Res. 60(2):313–334.
Wan M, Huang Y, Zhao L, Deng T, Fransoo JC (2018) Demand estimation under multi-store multi-product substitution in high density traditional retail. Eur. J. Oper. Res. 266(1):99–111.
Xia F, Sun K, Yu S, Aziz A, Wan L, Pan S, Liu H (2021) Graph learning: A survey. IEEE Trans. Artificial Intelligence 2(2):109–127.
Yang Q, Zhang Y, Dai W, Pan S (2020) Transfer Learning (Cambridge University Press, Cambridge, UK).
Zhang W, Li J, Liu L (2021) A unified survey of treatment effect heterogeneity modelling and uplift modelling. ACM Comput. Surveys 54(8):162.
Zhou ZH, Li M (2007) Semi-supervised regression with co-training-style algorithms. IEEE Trans. Knowledge Data Engrg. 19(11):1479–1493.

