Merged
10 changes: 9 additions & 1 deletion pgml-docs/docs/user_guides/training/preprocessing.md
@@ -29,7 +29,7 @@ There are 3 steps to preprocessing data:
These preprocessing steps may be specified on a per-column basis to the [train()](/user_guides/training/overview/) function. By default, PostgresML does minimal preprocessing on training data, and will raise an error during analysis if NULL values are encountered without a preprocessor. All types other than `TEXT` are treated as quantitative variables and cast to floating point representations before passing them to the underlying algorithm implementations.

```postgresql title="pgml.train()"
-select pgml.train(
+SELECT pgml.train(
project_name => 'preprocessed_model',
task => 'classification',
relation_name => 'weather_data',
@@ -52,6 +52,14 @@ In some cases, it may make sense to use multiple steps for a single column. For
!!! note
TEXT is used in this document to also refer to VARCHAR and CHAR(N) types.

## Predicting with Preprocessors

A model trained with preprocessors should be given a Postgres tuple for prediction rather than a `FLOAT4[]`. A tuple may contain values of different types (like `TEXT` and `BIGINT`), while an ARRAY may contain only a single type. Wrap the values in parentheses to create a Postgres tuple.

```postgresql title="pgml.predict()"
SELECT pgml.predict('preprocessed_model', ('jan', 'nimbus', 0.5, 7));
```
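
As a small sketch of the tuple syntax described above: in PostgreSQL the parenthesized form is shorthand for a `ROW(...)` constructor whenever the list has more than one expression. The first statement below is plain PostgreSQL and only compares the two spellings; the second reuses the project name and values from the example above, and assumes such a trained model exists.

```postgresql
-- Plain PostgreSQL: the explicit ROW(...) constructor and the parenthesized
-- form build the same row value (casts added so the comparison is unambiguous).
SELECT ROW('jan'::TEXT, 'nimbus'::TEXT, 0.5, 7)
     = ('jan'::TEXT, 'nimbus'::TEXT, 0.5, 7) AS same_tuple;

-- Same prediction call as above, spelled with the explicit ROW keyword.
-- Assumes the 'preprocessed_model' project has already been trained.
SELECT pgml.predict('preprocessed_model', ROW('jan', 'nimbus', 0.5, 7));
```

The explicit keyword can be easier to spot in longer queries, but either spelling passes the same mixed-type tuple.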

## Categorical encodings
Encoding categorical variables is an O(N log(M)) operation, where N is the number of rows and M is the number of distinct categories.

1 change: 1 addition & 0 deletions pgml-docs/mkdocs.yml
@@ -127,6 +127,7 @@ nav:
- Training:
- Training Overview: user_guides/training/overview.md
- Algorithm Selection: user_guides/training/algorithm_selection.md
- Preprocessing Data: user_guides/training/preprocessing.md
- Hyperparameter Search: user_guides/training/hyperparameter_search.md
- Joint Optimization: user_guides/training/joint_optimization.md
- Predictions: