@@ -102,6 +102,10 @@ the API when needed are retrieved from the ``BIGML_USERNAME`` and
environment, any attempt to download the information will raise an error
asking the user to set these variables.
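As a rough illustration of that behavior, a missing-credentials check could look like the sketch below. This is hypothetical helper code, not the bindings' actual implementation, and it assumes the second credential variable is ``BIGML_API_KEY``:

```python
import os

def check_bigml_credentials():
    # Sketch: detect missing credential variables before attempting
    # any download from the API.
    missing = [name for name in ("BIGML_USERNAME", "BIGML_API_KEY")
               if not os.environ.get(name)]
    if missing:
        raise RuntimeError(
            "Cannot access the API. Please set these environment "
            "variables: %s" % ", ".join(missing))
```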

+If a connection with no ``storage`` information is provided, then the models
+will never be stored in your local file system, and will be retrieved from
+BigML's API each time the local model is instantiated.
+
Ensembles and composite objects, like Fusions, need more than one resource
to be downloaded and stored locally for the class to work. In this case,
the class needs all the component models,
@@ -2144,10 +2148,11 @@ should be applied and after that, both the prediction and the anomaly score
should be computed and added to the initial data. The ``Pipeline`` class
will help us do that.

-Fist, we instantiate the ``Pipeline`` object by providing the models
+First, we instantiate the ``Pipeline`` object by providing the models
that we want it to use and a name for it:

.. code-block:: python
+
    from bigml.pipeline import Pipeline
    local_pipeline = Pipeline(["model/5143a51a37203f2cf7020351",
                               "anomaly/5143a51a37203f2cf7027551"],
@@ -2164,6 +2169,7 @@ model's prediction and the anomaly's score. All of them will be added to the
original input data.

.. code-block:: python
+
    local_pipeline.execute([{"plasma glucose": 130, "bmi": 3},
                            {"age": 26, "plasma glucose": 70}])

That could produce a result such as
@@ -2178,6 +2184,7 @@ the API connection info and/or a ``cache_get`` function to be used when
resources are stored in memory caches.

.. code-block:: python
+
    from bigml.pipeline import Pipeline
    local_pipeline = Pipeline(["model/5143a51a37203f2cf7020351",
                               "anomaly/5143a51a37203f2cf7027551"],
@@ -2197,6 +2204,7 @@ a ``.zip`` file whose name is the name of the ``Pipeline`` and will
be placed in the ``output_directory`` given by the user:

.. code-block:: python
+
    from bigml.pipeline import Pipeline
    local_pipeline = Pipeline(["model/5143a51a37203f2cf7020351",
                               "anomaly/5143a51a37203f2cf7027551"],
@@ -2209,6 +2217,120 @@ In this example, we will find a ``my_export_dir/my new pipeline.zip`` file
in the current directory. The file contains a ``my new pipeline`` folder where
the four JSONs for the two datasets and two models are stored.

+Local batch predictions
+-----------------------
+
+As explained in the ``101s`` provided in the `Quick Start <#quick_start>`_
+section, batch predictions for a list of inputs can be obtained by iterating
+the single predictions discussed in each different local model. However,
+we've also provided a homogeneous ``batch_predict`` method in the following
+local objects:
+
+- SupervisedModel
+- Anomaly
+- Cluster
+- PCA
+- TopicModel
+
+which can receive the following parameters:
+
+- input_data_list: This can be a list of input data, expressed as
+                   dictionaries containing ``field_name: field_value``
+                   pairs, or a Pandas DataFrame.
+- outputs: A dictionary that can contain ``output_fields`` and/or
+           ``output_headers`` information. Each one is defined by
+           default as the list of prediction keys to be added to the
+           inputs and the list of headers to be used as keys in the
+           output. E.g., for a supervised learning model, the default
+           if no information is provided would be equivalent to
+           ``{"output_fields": ["prediction", "probability"],
+           "output_headers": ["prediction", "probability"]}``, and
+           both the prediction and the associated probability would
+           be added to the input data.
+- ``**kwargs``: Any other parameter allowed in the ``.predict`` method
+                can be added to the batch prediction too. For instance,
+                we could apply an operating kind to a supervised model
+                batch prediction by passing
+                ``operating_kind="probability"`` as an argument.
+
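The defaulting described for ``outputs`` can be sketched as follows. This is a hypothetical helper written for illustration (not necessarily how the bindings implement it), showing how missing ``output_headers`` could fall back to ``output_fields``, and both to the supervised model defaults:

```python
def resolve_outputs(outputs=None):
    # Sketch: fields default to the supervised model's prediction keys,
    # and headers default to whatever the fields end up being.
    outputs = dict(outputs or {})
    fields = outputs.get("output_fields", ["prediction", "probability"])
    headers = outputs.get("output_headers", fields)
    return {"output_fields": fields, "output_headers": headers}
```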
+Let's write some examples. If we are reading data from a CSV, we can use the
+``csv`` library and pass the list of inputs as an array to an anomaly detector.
+
+.. code-block:: python
+
+    import csv
+
+    from bigml.anomaly import Anomaly
+
+    input_data_list = []
+    with open("my_input_data.csv") as handler:
+        reader = csv.DictReader(handler)
+        for row_dict in reader:
+            input_data_list.append(row_dict)
+
+    local_anomaly = Anomaly("anomaly/5143a51a37203f2cf7027551")
+    scored_data_list = local_anomaly.batch_predict(input_data_list)
+
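To see the structure that the reading loop above produces, here is a self-contained variant using an in-memory file instead of ``my_input_data.csv`` (illustrative only; note that ``csv.DictReader`` yields string values, with the header row as keys):

```python
import csv
import io

# In-memory stand-in for a CSV file on disk
csv_text = "plasma glucose,bmi\n130,3\n70,2\n"
reader = csv.DictReader(io.StringIO(csv_text))
input_data_list = [dict(row) for row in reader]
# Each row becomes a {field_name: field_value} dictionary,
# e.g. {"plasma glucose": "130", "bmi": "3"}
```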
+Or if we are using a Pandas ``DataFrame`` instead to read the data, we could
+also use the DataFrame directly as the input argument:
+
+.. code-block:: python
+
+    import pandas as pd
+
+    from bigml.anomaly import Anomaly
+    dataframe = pd.read_csv("my_input_data.csv")
+
+    local_anomaly = Anomaly("anomaly/5143a51a37203f2cf7027551")
+    scored_dataframe = local_anomaly.batch_predict(dataframe)
+
+Now, let's add some complexity and use a supervised model. We'd like to
+add both the predicted value and the associated probability, but we'd also
+like to use an ``operating point`` when predicting. The operating point
+requires specifying a positive class, the kind of metric to compare
+(probability or confidence) and the threshold to use. We also want the
+prediction to be added to the input data using the key ``sm_prediction``.
+In this case, the code would be similar to
+
+.. code-block:: python
+
+    import pandas as pd
+
+    from bigml.supervised import SupervisedModel
+    dataframe = pd.read_csv("my_input_data.csv")
+
+    local_supervised = SupervisedModel("ensemble/5143a51a37203f2cf7027551")
+    operating_point = {"positive_class": "yes",
+                       "kind": "probability",
+                       "threshold": 0.7}
+    predicted_dataframe = local_supervised.batch_predict(
+        dataframe,
+        outputs={"output_headers": ["sm_prediction", "probability"]},
+        operating_point=operating_point)
+
+and the result would be like the one below:
+
+.. code-block:: python
+
+    >>> predicted_dataframe
+         pregnancies  plasma glucose  ...  sm_prediction  probability
+    0              6             148  ...           true      0.95917
+    1              1              85  ...          false      0.99538
+    2              8             183  ...           true      0.93701
+    3              1              89  ...          false      0.99452
+    4              0             137  ...           true      0.90622
+    ..           ...             ...  ...            ...          ...
+    195            1             117  ...          false      0.90906
+    196            5             123  ...          false      0.97179
+    197            2             120  ...          false      0.99300
+    198            1             106  ...          false      0.99452
+    199            2             155  ...          false      0.51737
+
+    [200 rows x 11 columns]
+
+
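The thresholding rule an operating point implies can be sketched in plain Python. This is a hypothetical illustration of the rule as described above, not the bindings' implementation: the positive class is predicted only when its metric reaches the threshold; otherwise the best of the remaining classes wins.

```python
def apply_operating_point(probabilities, positive_class, threshold):
    # Sketch: predict the positive class only if its probability
    # reaches the threshold; otherwise fall back to the most likely
    # of the remaining classes.
    if probabilities.get(positive_class, 0) >= threshold:
        return positive_class
    rest = {label: prob for label, prob in probabilities.items()
            if label != positive_class}
    return max(rest, key=rest.get)

prediction = apply_operating_point({"true": 0.6, "false": 0.4}, "true", 0.7)
# prediction is "false": 0.6 does not reach the 0.7 threshold
```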
Local predictions with shared models
------------------------------------