Technical Specifications: 10.1.1 Data Management
Technical Specifications
10.1 Technical Specifications of Data Platform
RfP for appointment of AI/ML Powered Business Intelligence Analytics Solution Provider for APDCL
18 The proposed solution should have a single interface for data integration & manipulation such as tabulation, statistical analysis, econometric modeling and multi-dimensional analysis
19 The proposed solution should have the ability to easily capture and display performance information such as real time, CPU time, memory use, input/output, and record count data, with the ability to display this information as a table or as a graph.
20 The proposed solution shall be able to join data from multiple sources and support concurrent processing of multiple source data streams, without writing procedural code
21 The proposed solution shall facilitate data profiling based on dynamic, user-defined validation rules and support identification of user-defined ‘events’ to trigger alerts (through email reports) to authorities
22 The proposed solution shall support In‐memory data handling
23 The proposed solution shall support correction logic for Indian names, addresses, phone numbers, identification IDs and other identification proof documents and demographic details
24 The proposed solution should be able to create networks based on both transaction and relationship-based data, and create nodes and links among the entities specified
25 The proposed solution should make it possible to identify common entity types which are super hubs, i.e. appear in the majority of transactions, and treat them separately as needed
26 The Data Analytics OEM should be a Leader for recent three consecutive years in Gartner Magic Quadrant for Data Integration and Data Quality Reports.
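For illustration only, the super-hub requirement in item 25 above can be sketched in Python. The sketch assumes each transaction is available as a set of entity identifiers and that "majority" means more than a configurable share of transactions. The function name `find_super_hubs`, the `min_share` parameter and the sample data are hypothetical and are not part of this RfP.

```python
from collections import Counter


def find_super_hubs(transactions, min_share=0.5):
    """Return entities that appear in more than `min_share` of all transactions.

    `transactions` is an iterable of collections of entity identifiers
    (an assumed representation, chosen only for this sketch).
    """
    transactions = list(transactions)
    if not transactions:
        return set()
    counts = Counter()
    for entities in transactions:
        counts.update(set(entities))  # count each entity once per transaction
    cutoff = min_share * len(transactions)
    return {entity for entity, n in counts.items() if n > cutoff}


# Hypothetical sample data: bank_X appears in 3 of 4 transactions.
transactions = [
    {"consumer_A", "bank_X"},
    {"consumer_B", "bank_X"},
    {"consumer_C", "bank_X", "agent_1"},
    {"consumer_A", "agent_1"},
]
super_hubs = find_super_hubs(transactions)
```

Entities returned by such a check could then be excluded from, or handled separately in, network construction so that they do not link otherwise unrelated consumers.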
1 The proposed solution should contain a sophisticated and GUI based predictive modeling and analytical workbench.
2 The proposed solution should enable identification of suspicious consumer profiles through a judicious mix of anomaly detection, business rules, predictive modeling and network analytics
3 The proposed solution should help analysts to visualize complex networks of relationships between entities, such as people, places / locations, things and events, over time and across multiple dimensions
4 The proposed solution should help analysts identify entity relationships that aren’t obvious, traverse and query complex relationships, and uncover patterns and communities interactively
5 The proposed solution should have in-built modules for analysis of variance, multivariate analysis and statistical algorithms to build prediction models such as Linear, Logistic, Non-Linear and Quantile regression models, Generalized Linear models, Predictive partial least squares and Decision trees
6 The proposed solution should provide in-built features and advanced techniques for the analyst to detect rare events, anomalies and outliers and/or influence points to help determine, capture or remove them from downstream analysis such as predictive models
7 The proposed solution should have in-built modules for modern machine learning algorithms to build predictive models, such as random forests, gradient boosting, artificial neural networks, support vector machines and factorization machines
8 The proposed solution should provide a rich set of data mining algorithms that can be used for classification, regression, clustering, detection of outliers and anomalies, feature extraction, association analysis, and attribute ranking.
9 The proposed solution should support clustering of entities that are either user-defined or statistically chosen as best clusters, along with strategies for encoding class variables into the analysis
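As an illustration of the outlier detection called for in item 6 above, the following minimal Python sketch applies the interquartile-range rule, one common technique that this RfP does not mandate. The function name `iqr_outliers`, the multiplier `k=1.5` and the sample readings are assumptions made for the example.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _median, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


# Hypothetical monthly consumption readings for one consumer.
monthly_units = [210, 205, 198, 220, 215, 207, 990, 212]
flagged = iqr_outliers(monthly_units)
```

Flagged values could be reviewed, capped or removed before the data is passed to downstream predictive models.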
10 The proposed solution should have the flexibility of high-performance imputation of missing values in features using different statistical options such as mean, pseudo-median, user-specified values or a random draw from the non-missing values
11 The proposed solution should support automated algorithms which will help the end-users to run multiple algorithms at a time and hence compare the results between them.
12 The proposed solution should enable automated model assessment and scoring, and generate the associated model performance statistics and code for model scoring
13 The proposed solution should allow user to compare different predictive models on the basis of different test statistics, and select the best model for deployment automatically
14 The proposed solution should provide multiple methods to visualize data mining models and provide the user with sufficient levels of understanding and trust
15 The proposed solution should support processing, trend analysis and forecast modeling of data points, including exponential smoothing and the handling of missing data and outlier data, on all data sets before trend analysis / modelling
16 The proposed solution should support profile matching through user-defined (configurable) business rules through ad-hoc querying across multiple fields of consumer-wise information from in-house and external agency data
17 The proposed solution should support Time Series and scenario (“What-If”) analysis for dependent variables.
18 The proposed solution should enable rule based / cluster analysis for profile grouping and profile matching
19 The proposed solution should allow Analysts and Investigators to make use of a fraud intelligence repository, populated with information on the performance of past models and scenarios, to improve the accuracy of current predictive models. It should be able to define risk based on different levels such as relationships with entities, financial / non-financial transactions & events etc.
20 The proposed solution should allow alerts to be generated whenever flagged entities, or entities with a high risk rating, have financial / non-financial transactions or some level of activity
21 The proposed solution should support detection of patterns so that criteria for various thresholds can be reviewed and modified.
22 The proposed solution should provide in-built feature of detailing rule robustness through measures of true positive, false positive and false negative as visual diagnostics
23 The proposed solution should provide an extensive list of prebuilt rule operators that would be available to the analyst for detailed rule-model specification
24 The proposed solution shall support identification of common patterns / factors / profile characteristics that could enable selection of criteria for selection of Business Audit cases
25 The proposed solution shall support analysis of variance, Multivariate analysis of variance and repeated measurements and Linear and non-linear mixed models.
26 The proposed solution shall provide a rich set of data mining algorithms that can be used for classification, regression, clustering, detection of outliers and anomalies, feature extraction, association analysis, and attribute ranking.
27 The proposed solution shall support detection of patterns from the transaction data set over a defined time period for particular individuals / groups
28 The proposed solution shall support automated algorithms which will help the end-users to run multiple algorithms at a time and hence compare the results between them.
29 The proposed solution shall support processing of data-points for exponential smoothing, missing data and outlier data on all data-sets before trend analysis / modelling
30 The proposed solution shall support profile matching through user-defined (configurable) business rules through ad-hoc querying across multiple fields of consumer-wise information (across application, usage, payments and other data)
31 The proposed solution shall allow identification of localities / regions where high numbers of risky consumer profiles are detected.
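The rule robustness measures named in item 22 above (true positive, false positive and false negative) can be derived by comparing the consumers a rule alerted on with those later confirmed by investigation. The Python sketch below is illustrative only. The function name `rule_diagnostics`, the consumer identifiers and the added precision and recall ratios are assumptions and are not requirements of this RfP.

```python
def rule_diagnostics(alerted, confirmed):
    """Compare alerted consumers against confirmed cases for one rule."""
    alerted, confirmed = set(alerted), set(confirmed)
    tp = len(alerted & confirmed)   # alerted and confirmed
    fp = len(alerted - confirmed)   # alerted but not confirmed
    fn = len(confirmed - alerted)   # confirmed but missed by the rule
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall}


# Hypothetical outcome of one rule over a review period.
diag = rule_diagnostics(
    alerted={"C1", "C2", "C3", "C4"},
    confirmed={"C2", "C3", "C5"},
)
```

Counts of this kind are the inputs a visual diagnostic would plot when an analyst reviews or adjusts a rule threshold.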
32 The proposed solution shall support analysis of voluminous data to identify patterns and determine risk rating / payment reliability of a particular consumer
33 The proposed solution should be capable of payments matching, i.e. reconciliation of payments made by consumers with those recorded in the customer account statements.
34 The proposed solution should be capable of calculating network analytics and relationships among consumers with a known risky / suspicious entity, thus enabling risk by association.
35 The proposed solution shall have capability to identify cases for consumers for whom the usage is fluctuating at a very fast rate. Such consumers are likely to be part of fraudulent activity.
36 The proposed solution shall have capability to estimate the liability of the consumer to pay or any other sum payable and accordingly revise the risk rating.
37 The proposed solution shall support assessment of impact of an NTL on the revenue, workload on a particular office location / region, impact on a particular group
38 The proposed solution should enable analysts to be able to carry out Collection optimization based on projections of default as against resources to collect / follow up / investigate at ward / circle level
39 The proposed solution shall support detection of patterns so that criteria for various thresholds can be reviewed and modified.
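One simple way to flag the rapidly fluctuating usage described in item 35 above is the coefficient of variation (standard deviation divided by mean) of each consumer's readings. The Python sketch below is a minimal illustration under that assumption. The function name `fluctuating_consumers`, the `cv_threshold` of 0.5 and the sample readings are hypothetical, and a production solution may use a different measure.

```python
import statistics


def fluctuating_consumers(usage_by_consumer, cv_threshold=0.5):
    """Return {consumer: coefficient of variation} where it exceeds the threshold."""
    flagged = {}
    for consumer, readings in usage_by_consumer.items():
        if len(readings) < 2:
            continue  # a spread cannot be computed from fewer than two readings
        mean = statistics.fmean(readings)
        if mean == 0:
            continue  # avoid division by zero for consumers with no usage
        cv = statistics.stdev(readings) / mean
        if cv > cv_threshold:
            flagged[consumer] = round(cv, 3)
    return flagged


# Hypothetical monthly usage (units) for two consumers.
usage = {
    "C100": [200, 210, 195, 205],
    "C200": [50, 400, 30, 380],
}
flagged_usage = fluctuating_consumers(usage)
```

The threshold would itself be one of the criteria that items 21 and 39 expect analysts to review and modify.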
19 EDW system should support the following aspects of performance:
• Concurrency
• Competing workloads
• Reporting, real time, OLAP, advanced analytics, etc.
• Intraday data loads
• Thousands of users
• Ad hoc queries
20 EDW should have the capability to scale a data warehouse easily, efficiently, quickly, and cost-effectively by adding units of computing power/disks/capacity, without undermining the linear increase in performance and capacity per incremental resource, keeping in mind both vertical and horizontal scaling.
21 The system should have auditing and logging features for every action by a user or the system itself, without affecting performance of the system.
22 System Licensing should not have Limitation on number of Servers and CPU Cores
23 System Licensing should not have hardware infrastructure dependency and should be open to use on any environment, i.e. cloud, in-house etc.
24 The analytical database should have inbuilt geospatial data analysis capability
25 Platform should support Docker and container technology so that it can be launched standalone and run as a container on any Linux server
26 Platform should support an infra-automation framework like Kubernetes so that high availability of the platform can be maintained with containers
27 Platform should provide a query federation feature so that external data like Parquet and ORC can be queried from this platform and joined with internal tables, without moving external data inside the platform.
28 The platform should provide inbuilt data exploration capabilities so that analysts and data scientists can explore the data with native inbuilt functions and users need not write code.
29 Platform should allow machine learning models to be exported and imported in a standard format like PMML, so that ML models developed/trained on other platforms can be used in this platform and vice versa.
30 Platform should support accessing and querying open data formats like Parquet and ORC on external storage like NFS/S3/GCS/SAN etc. without moving data into this platform
31 Platform should support querying other standard SQL databases without copying data from other databases
32 Platform should be able to integrate with open-source machine learning models and frameworks like TensorFlow so that deep learning forecasting can be done within the data platform without moving data out of the platform.
33 Platform should comply with standard data security norms like FIPS 140-2 or equivalent
34 Platform should support Format-Preserving Encryption (FPE) for data security purposes.
10.3 ALERT AND INVESTIGATION FRAMEWORK
1 The proposed solution should incorporate two key aspects: one is an advanced analytics solution which provides dashboards and reporting at aggregated level for key officers and stakeholders, and the other should be an investigative workbench which also enables operational stakeholders (officers, enforcement department etc.) to take action on the alerts that emerge for a consumer, assess the evidence around the same, and take a decision on further action on the alert.
2 The proposed solution should provide in-built features for Alert and Event Management with: governance, audit and compliance; prioritized queuing model; enrichment; scenario-fired event model, including scenario context; manual alert creation and routing; alert domains; and custom disposition actions.
3 The proposed solution should provide a built-in functionality of alert based investigation and alert exploration and triage, in which alerts are reviewed to determine the probability that they represent suspicious behavior and are evaluated for their importance
4 The proposed solution should provide built in features to apply an appropriate disposition of the alert, such as closing, suppressing, moving to another queue (such as high or low priority), linking to a different object and sending the alert information to an external system after a decision is reached about how to handle the alert
5 The proposed solution should provide an option to automatically disposition the alert when a new alerting event arrives. For example, the alert may be automatically assigned to an investigative team
6 The proposed solution should make it possible to identify not just the entity against which the alert was created, but also related alerts, related entities and their interlinkages.
7 The proposed solution should provide facility to define rules and set threshold-based alerts for the same on the data used for query and analysis supported by solution
8 The proposed solution search feature should allow user to select/deselect entities of interest like Name, ID, PAN, TIN etc. to narrow down the search results for enhanced understanding
9 The proposed solution shall have capability to generate scores for different entities based on the formulae and select alerts with higher scores.
10 The proposed solution should have built-in capabilities to assign Alerts/Cases to different strategies and queue for assignment from drop-down
11 The proposed solution should be web-enabled
12 The proposed solution should provide analytical capabilities such as Correlations, Regression using predefined ontologies, Network Plot, Decision Trees, Scenario Analysis, Statistical Analysis
13 The proposed solution should have capability to enable officers to segregate cases separately for defaulters and inconsistent usage
14 The proposed solution shall enable assessment of registration details for determining / modifying risk profile rating and to detect fraudulent consumers
15 The proposed solution should have optimization capability to help in allocation of alerts for different wards / circle members as per value at risk / capacity to investigate and probability of recovery
16 The proposed solution shall provide facility to define rules and set threshold-based alerts for the same on the data used for query and analysis
17 The proposed solution should allow Networks to be visualized based on a temporal view, so that the chronology of events is depicted through a time slider
18 The proposed solution should be able to activate / deactivate / reactivate / override specific alerts / rules as per need in real time.
19 The proposed solution should be able to decide appropriate treatment for the alerts depending on the rules under which they were flagged (route to investigator etc.) as per data event
20 The proposed solution should allow for Historical rule activity (alerts generated) and performance (final disposition) to be identifiable which can be analyzed in order to improve rules
21 The proposed solution should be able to assign a unique case number to each item scored and actioned by the Solution or out sorted for analyst review.
22 The proposed solution should have capability to visualize the network related to alerts or risk networks. The networks as well as other alert reports should be capable of color coding to highlight risky / high risk entities.
23 The proposed solution should allow investigator to look at the concise view of the network, as well as, based on need, grow the network along a timeline or across entities to incorporate a larger level of entities.
24 The proposed solution should enable analysts to be able to add comments and make notes in alerts / investigations
25 The proposed solution should identify the risks associated with a given set of consumers
26 The proposed solution should route alerts to the appropriate investigator / person/group
27 The proposed solution should have its Fraud Analytics solution integrated with a Social Network Analysis solution
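The prioritized queuing model in item 2 and the score-based selection in item 9 above can be illustrated with a small priority queue. The Python sketch below assumes each alert already carries a numeric score, with higher meaning more urgent. The names `build_queue` and `next_alert`, the alert fields and the scores are hypothetical and do not describe any particular product.

```python
import heapq


def build_queue(alerts):
    """Build a heap from which the highest-scoring alert is served first."""
    heap = []
    for order, alert in enumerate(alerts):
        # heapq is a min-heap, so the score is negated; `order` breaks ties
        # in arrival order and keeps the alert dicts from being compared.
        heapq.heappush(heap, (-alert["score"], order, alert))
    return heap


def next_alert(heap):
    """Pop and return the highest-priority alert still in the queue."""
    return heapq.heappop(heap)[2]


# Hypothetical scored alerts.
alerts = [
    {"id": "A1", "score": 0.42},
    {"id": "A2", "score": 0.91},
    {"id": "A3", "score": 0.67},
]
queue = build_queue(alerts)
served = [next_alert(queue)["id"] for _ in range(len(alerts))]
```

In practice the popped alert would then be routed to an investigator or group according to the disposition and routing rules described in items 4, 19 and 26.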
1 The Data Analytics OEM should be a Leader for recent three consecutive years in Gartner Magic
Quadrant for Data Science and Machine Learning Reports.
2 Aggregates transactional data into time series format and identifies and accounts for missing values.
3 Accumulates time-stamped data into any time interval (hours, weeks, months, etc.) for forecasting and sophisticated modeling techniques.
4 Automatic outlier detection with provision for selecting the number of outliers, percentage of outliers and sensitivity levels.
5 Trend analysis, Seasonality and intermittent series tests.
6 Missing value imputation using various interpolation and extrapolation methods.
7 Utilize comprehensive modeling techniques such as ARIMA, Neural Network, UCM, Generalized Linear Models, and GAMS etc.
8 The load forecasting should be able to handle two stage modeling for residuals. The forecasted residual should be able to be added back to the forecast load to generate the two-stage forecast load.
9 Special days and effects management: prebuilt effects management with a diagnostic process workflow to move from a naïve load forecasting model, add effects and see improvements in a stepwise method with a holdout sample.
10 The effects that the solution should be configured to include -
a. Recency effect
b. Weekend effect
c. Holiday effect
11 Statistics should include prebuilt measurements such as -
a. MAE, MAPE, and ME for annual energy, annual peak load, daily energy, daily peak load, monthly energy, monthly peak load and hourly load
b. Ability to include macro-economic variables and also the cross effect of macro-economic variables.
12 The tool should have automatic generation of model selection lists and choice of automation level for all three forecasting steps: model selection, model parameter estimation and forecast generation.
13 The tool should have Hierarchical Load Forecasting and Temporal Load Forecasting. It should utilize comprehensive modeling techniques including exponential smoothing models, ARIMA, ARIMAX, Neural Network, Unobserved Component Models and Generalized Linear Models
14 The load forecasting should have pre-built two stage modeling techniques for residual forecasting that adds back to the forecast load to generate the two-stage forecast load using Generalized Linear Modelling, ARIMAX Modelling and Neural Network
15 Option for uploading a manual demand forecast shall also be provided by the Analytics solution.
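The accuracy measures listed in item 11 above (ME, MAE and MAPE) can be computed from paired actual and forecast values. The Python sketch below is illustrative. It assumes the error is defined as actual minus forecast, a sign convention this RfP does not specify, and the function name `forecast_errors` and the sample loads are hypothetical.

```python
def forecast_errors(actual, forecast):
    """Return mean error, mean absolute error and mean absolute percentage error."""
    if not actual or len(actual) != len(forecast):
        raise ValueError("actual and forecast must be non-empty and equal in length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    # Assumed sign convention: error = actual - forecast.
    errors = [a - f for a, f in zip(actual, forecast)]
    n = len(errors)
    me = sum(errors) / n
    mae = sum(abs(e) for e in errors) / n
    mape = 100 * sum(abs(e) / abs(a) for e, a in zip(errors, actual)) / n
    return {"ME": me, "MAE": mae, "MAPE": mape}


# Hypothetical hourly load (MW): actual versus forecast.
actual_load = [100, 200, 400]
forecast_load = [110, 190, 400]
metrics = forecast_errors(actual_load, forecast_load)
```

The same function could be applied to each aggregation level named in item 11 (hourly load, daily energy, monthly peak load and so on) once the series has been accumulated to that interval.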
10.5 USER ADMINISTRATION
1 The proposed solution should control access to applications, modules and functions based on user roles and privileges.
2 The proposed solution should control access to applications, modules and functions based on user security.
3 The system should have auditing and Logging Features for every action by user or system itself without affecting performance of the system.
4 Proposed system shall support role-based access and interfaces for all types of users and centralized administration.
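As a minimal illustration of the role-based access control described in items 1 and 4 above, the Python sketch below maps roles to privileges and checks a requested privilege against a user's roles. The role names, privilege names and the function `is_allowed` are hypothetical examples and are not prescribed by this RfP.

```python
# Hypothetical role-to-privilege mapping, maintained by a central administrator.
ROLE_PRIVILEGES = {
    "analyst": {"view_dashboard", "run_model"},
    "investigator": {"view_dashboard", "disposition_alert"},
    "admin": {"view_dashboard", "run_model", "disposition_alert", "manage_users"},
}


def is_allowed(user_roles, privilege, role_privileges=ROLE_PRIVILEGES):
    """Return True if any of the user's roles grants the requested privilege."""
    return any(
        privilege in role_privileges.get(role, set())
        for role in user_roles
    )


can_run = is_allowed(["analyst"], "run_model")
can_manage = is_allowed(["analyst"], "manage_users")
```

Unknown roles grant nothing in this sketch, so access is denied by default unless a role explicitly carries the privilege.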
10.6 Security
1 The proposed solution design should ensure the integrity and security of the data and application, with integrated security measures covering system, users and data sources.
2 Security: Adequate security features to ensure only authorized secure access.
3 The solution should employ the latest industry standard security tools and features required to secure the platform, solution, access and protect from any unauthorized access.
4 Security audit for the solution may be performed by APDCL.