Commit a78de7c

Fixing locale check and adding documentation on local batch_predict method
1 parent a61305d commit a78de7c

4 files changed, with 134 additions and 6 deletions.

HISTORY.rst

Lines changed: 7 additions & 0 deletions
@@ -3,6 +3,13 @@
 History
 -------
 
+8.2.2 (2022-09-29)
+------------------
+
+- Fixing locale check.
+- Documenting the new ``.batch_predict`` method added to local models to
+  homogenize local batch predictions and accept Pandas' DataFrame as input.
+
 8.2.1 (2022-09-23)
 ------------------

bigml/util.py

Lines changed: 3 additions & 4 deletions
@@ -213,12 +213,11 @@ def locale_synonyms(main_locale, locale_alias):
         return False
     alternatives = LOCALE_SYNONYMS[language_code]
     if isinstance(alternatives[0], str):
-        return main_locale in alternatives and locale_alias in alternatives
+        return locale_alias in alternatives
     result = False
     for subgroup in alternatives:
-        if main_locale in subgroup:
-            result = locale_alias in subgroup
-            break
+        result = locale_alias in subgroup
+        break
     return result
 
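
The behavior change in this hunk can be illustrated with a small, self-contained sketch. The ``LOCALE_SYNONYMS`` table below is hypothetical (the real one in ``bigml/util.py`` is not shown in the diff and may differ), and the derivation of ``language_code`` from the first two characters of ``main_locale`` is an assumption, since that line falls outside the hunk. Only the lines visible in the diff are taken from the source.

```python
# Hypothetical synonym table, for illustration only.
LOCALE_SYNONYMS = {
    "es": ["es_ES", "es_ES.UTF8", "es-es"],       # flat list of strings
    "en": [["en_US", "en_US.UTF8", "en-us"],      # list of subgroups
           ["en_GB", "en_GB.UTF8", "en-gb"]],
}


def _alternatives(main_locale):
    """Look up the synonyms for a locale (language_code rule is assumed)."""
    language_code = main_locale[0:2]
    return LOCALE_SYNONYMS.get(language_code)


def locale_synonyms_old(main_locale, locale_alias):
    """Logic of the lines marked '-' in the diff."""
    alternatives = _alternatives(main_locale)
    if alternatives is None:
        return False
    if isinstance(alternatives[0], str):
        return main_locale in alternatives and locale_alias in alternatives
    result = False
    for subgroup in alternatives:
        if main_locale in subgroup:
            result = locale_alias in subgroup
            break
    return result


def locale_synonyms_new(main_locale, locale_alias):
    """Logic of the lines marked '+' in the diff."""
    alternatives = _alternatives(main_locale)
    if alternatives is None:
        return False
    if isinstance(alternatives[0], str):
        return locale_alias in alternatives
    result = False
    for subgroup in alternatives:
        # The break is unconditional, so only the first subgroup is checked.
        result = locale_alias in subgroup
        break
    return result


for main, alias in [("es_MX", "es-es"), ("en_US", "en-us"), ("en_GB", "en-gb")]:
    print(main, alias,
          locale_synonyms_old(main, alias),
          locale_synonyms_new(main, alias))
```

Under these assumptions, the new version no longer requires ``main_locale`` itself to appear in the synonym list, so ``("es_MX", "es-es")`` changes from ``False`` to ``True``. For nested tables, the loop as rendered here inspects only the first subgroup, so with this hypothetical table ``("en_GB", "en-gb")`` changes from ``True`` to ``False``; whether that matters depends on the real contents of ``LOCALE_SYNONYMS``.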

bigml/version.py

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-__version__ = '8.2.1'
+__version__ = '8.2.2'

docs/local_resources.rst

Lines changed: 123 additions & 1 deletion
@@ -102,6 +102,10 @@ the API when needed are retrieved from the ``BIGML_USERNAME`` and
 environment, any attempt to download the information will raise a condition
 asking the user to set these variables.
 
+If a connection with no ``storage`` information is provided, then the models
+will never be stored in your local file system, and will be retrieved from
+BigML's API each time the local model is instantiated.
+
 Ensembles and composite objects, like Fusions, need more than one resource
 to be downloaded and stored locally for the class to work. In this case,
 the class needs all the component models,
@@ -2144,10 +2148,11 @@ should be applied and after that, both the prediction and the anomaly score
 should be computed and added to the initial data. The ``Pipeline`` class
 will help us do that.
 
-Fist, we instantiate the ``Pipeline`` object by providing the models
+First, we instantiate the ``Pipeline`` object by providing the models
 that we want it to use and a name for it:
 
 .. code-block:: python
+
     from bigml.pipeline import Pipeline
     local_pipeline = Pipeline(["model/5143a51a37203f2cf7020351",
                                "anomaly/5143a51a37203f2cf7027551"],
@@ -2164,6 +2169,7 @@ model's prediction and the anomaly's score. All of them will be added to the
 original input data.
 
 .. code-block:: python
+
     local_pipeline.execute([{"plasma glucose": 130, "bmi":3},
                             {"age":26, "plasma glucose": 70}])
     """That could produce a result such as
@@ -2178,6 +2184,7 @@ the API connection info and/or a ``cache_get`` function to be used when
 resources are stored in memory caches.
 
 .. code-block:: python
+
     from bigml.pipeline import Pipeline
     local_pipeline = Pipeline(["model/5143a51a37203f2cf7020351",
                                "anomaly/5143a51a37203f2cf7027551"],
@@ -2197,6 +2204,7 @@ a ``.zip`` file whose name is the name of the ``Pipeline`` and will
 be placed in the ``output_directory`` given by the user:
 
 .. code-block:: python
+
     from bigml.pipeline import Pipeline
     local_pipeline = Pipeline(["model/5143a51a37203f2cf7020351",
                                "anomaly/5143a51a37203f2cf7027551"],
@@ -2209,6 +2217,120 @@ In this example, we wil find a ``my_export_dir/my new pipeline.zip`` file
 in the current directory. The file contains a ``my new pipeline`` folder where
 the four JSONs for the two datasets and two models are stored.
 
+Local batch predictions
+-----------------------
+
+As explained in the ``101s`` provided in the `Quick Start <#quick_start>`_
+section, batch predictions for a list of inputs can be obtained by iterating
+the single predictions discussed in each different local model. However,
+we've also provided a homogeneous ``batch_predict`` method in the following
+local objects:
+
+
+- SupervisedModel
+- Anomaly
+- Cluster
+- PCA
+- TopicModel
+
+which can receive the following parameters:
+
+- input_data_list: This can be a list of input data, expressed as a
+                   dictionary containing ``field_name: field_value`` pairs or
+                   a Pandas' DataFrame.
+- outputs: That's a dictionary that can contain ``output_fields``
+           and/or ``output_headers`` information. Each one is
+           defined by default as the list of prediction keys to be
+           added to the inputs and the list of headers to be used
+           as keys in the output. E.g., for a supervised learning
+           model, the default if no information is provided would
+           be equivalent to ``{"output_fields": ["prediction",
+           "probability"], "output_headers": ["prediction",
+           "probability"]}`` and both the prediction and the
+           associated probability would be added to the input data.
+- **kwargs: Any other parameters allowed in the ``.predict`` method
+            could be added to the batch prediction too. For instance,
+            we could add the operating kind to a supervised model
+            batch prediction using ``operating_kind="probability"`` as
+            argument.
+
+
+Let's write some examples. If we are reading data from a CSV, we can use the
+``csv`` library and pass the list of inputs as an array to an anomaly detector.
+
+.. code-block:: python
+
+    import csv
+
+    from bigml.anomaly import Anomaly
+
+    input_data_list = []
+    with open("my_input_data.csv") as handler:
+        reader = csv.DictReader(handler)
+        for row_dict in reader:
+            input_data_list.append(row_dict)
+
+    local_anomaly = Anomaly("anomaly/5143a51a37203f2cf7027551")
+    scored_data_list = local_anomaly.batch_predict(input_data_list)
+
+Or if we are using a Pandas' ``DataFrame`` instead to read the data, we could
+also use the DataFrame directly as input argument:
+
+.. code-block:: python
+
+    import pandas as pd
+
+    from bigml.anomaly import Anomaly
+    dataframe = pd.read_csv("my_input_data.csv")
+
+    local_anomaly = Anomaly("anomaly/5143a51a37203f2cf7027551")
+    scored_dataframe = local_anomaly.batch_predict(dataframe)
+
+Now, let's add some complexity and use a supervised model. We'd like to
+add both the predicted value and the associated probability but we'd like
+to use an ``operating point`` when predicting. The operating point requires
+specifying a positive class, the kind of metric to compare (probability or
+confidence) and the threshold to use. We also want the prediction to
+be added to the input data using the key ``sm_prediction``. In this case, the
+code would be similar to
+
+.. code-block:: python
+
+    import pandas as pd
+
+    from bigml.supervised import SupervisedModel
+    dataframe = pd.read_csv("my_input_data.csv")
+
+    local_supervised = SupervisedModel("ensemble/5143a51a37203f2cf7027551")
+    operating_point = {"positive_class": "yes",
+                       "kind": "probability",
+                       "threshold": 0.7}
+    predicted_dataframe = local_supervised.batch_predict(
+        dataframe,
+        outputs={"output_headers": ["sm_prediction", "probability"]},
+        operating_point=operating_point)
+
+and the result would be like the one below:
+
+.. code-block:: python
+
+    >>> predicted_dataframe
+         pregnancies  plasma glucose  ... sm_prediction  probability
+    0              6             148  ...          true      0.95917
+    1              1              85  ...         false      0.99538
+    2              8             183  ...          true      0.93701
+    3              1              89  ...         false      0.99452
+    4              0             137  ...          true      0.90622
+    ..           ...             ...  ...           ...          ...
+    195            1             117  ...         false      0.90906
+    196            5             123  ...         false      0.97179
+    197            2             120  ...         false      0.99300
+    198            1             106  ...         false      0.99452
+    199            2             155  ...         false      0.51737
+
+    [200 rows x 11 columns]
+
+
 Local predictions with shared models
 ------------------------------------
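
The way ``outputs`` maps prediction keys to output headers, as described in the added documentation, can be sketched without the library. Everything in this block is hypothetical: ``batch_predict_sketch`` and ``fake_predict`` are stand-ins written for illustration, not the BigML implementation, and the rule that ``output_headers`` falls back to ``output_fields`` when omitted is an assumption.

```python
def batch_predict_sketch(predict, input_data_list, outputs=None, **kwargs):
    """Hypothetical stand-in for the documented batch prediction behavior.

    Calls ``predict`` on every input row and copies the selected prediction
    keys into a copy of the row under the requested headers.
    """
    outputs = outputs or {}
    # Defaults mirror the supervised-model example given in the docs.
    output_fields = outputs.get("output_fields", ["prediction", "probability"])
    # Assumption: headers default to the field names when not provided.
    output_headers = outputs.get("output_headers", output_fields)
    results = []
    for input_data in input_data_list:
        prediction = predict(input_data, **kwargs)  # extra kwargs pass through
        row = dict(input_data)                      # leave the input untouched
        for field, header in zip(output_fields, output_headers):
            row[header] = prediction[field]
        results.append(row)
    return results


def fake_predict(input_data, threshold=0.5):
    """Toy predictor used only to make the sketch runnable."""
    score = min(float(input_data["plasma glucose"]) / 200, 1.0)
    return {"prediction": score >= threshold, "probability": score}


rows = [{"plasma glucose": 148}, {"plasma glucose": 85}]
scored = batch_predict_sketch(
    fake_predict,
    rows,
    outputs={"output_headers": ["sm_prediction", "probability"]},
    threshold=0.7)
print(scored)
```

In this sketch the ``prediction`` value lands under the ``sm_prediction`` key, the original rows are left unmodified, and the ``threshold`` keyword is forwarded to the predictor in the same spirit as the ``**kwargs`` described in the parameter list.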
