Add clustering algorithms #795

Merged: 1 commit, Jun 30, 2023
2 changes: 1 addition & 1 deletion .github/workflows/package-extension.yml
@@ -4,7 +4,7 @@ on:
   workflow_dispatch:
     inputs:
       packageVersion:
-        default: "2.6.0"
+        default: "2.7.0"

 jobs:
   build:
2 changes: 1 addition & 1 deletion pgml-dashboard/Cargo.lock

Generated file; diff not rendered.

2 changes: 1 addition & 1 deletion pgml-dashboard/Cargo.toml
@@ -1,6 +1,6 @@
 [package]
 name = "pgml-dashboard"
-version = "2.6.0"
+version = "2.7.0"
 edition = "2021"
 authors = ["PostgresML <team@postgresml.org>"]
 license = "MIT"
2 changes: 1 addition & 1 deletion pgml-dashboard/content/docs/guides/setup/developers.md
@@ -127,7 +127,7 @@ SELECT pgml.version();
 postgres=# select pgml.version();
 version
 -------------------
-2.6.0
+2.7.0
 (1 row)
 ```

@@ -217,7 +217,7 @@ SELECT pgml.version();
 postgres=# select pgml.version();
 version
 -------------------
-2.6.0
+2.7.0
 (1 row)
 ```

@@ -2,7 +2,7 @@

 We currently support regression and classification algorithms from [scikit-learn](https://scikit-learn.org/), [XGBoost](https://xgboost.readthedocs.io/), and [LightGBM](https://lightgbm.readthedocs.io/).

-## Algorithms
+## Supervised Algorithms

 ### Gradient Boosting
 Algorithm | Regression | Classification
@@ -54,6 +54,18 @@ Algorithm | Regression | Classification
 `kernel_ridge` | [KernelRidge](https://scikit-learn.org/stable/modules/generated/sklearn.kernel_ridge.KernelRidge.html) | -
 `gaussian_process` | [GaussianProcessRegressor](https://scikit-learn.org/stable/modules/generated/sklearn.gaussian_process.GaussianProcessRegressor.html) | [GaussianProcessClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.gaussian_process.GaussianProcessClassifier.html)

+## Unsupervised Algorithms
+
+### Clustering
+
+|Algorithm | Reference |
+|---|-------------------------------------------------------------------------------------------------------------------|
+`affinity_propagation` | [AffinityPropagation](https://scikit-learn.org/stable/modules/generated/sklearn.cluster.AffinityPropagation.html)
+`birch` | [Birch](https://scikit-learn.org/stable/modules/generated/sklearn.cluster.Birch.html)
+`kmeans` | [K-Means](https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html)
+`mini_batch_kmeans` | [MiniBatchKMeans](https://scikit-learn.org/stable/modules/generated/sklearn.cluster.MiniBatchKMeans.html)
+
+
 ## Comparing Algorithms

 Any of the above algorithms can be passed to our `pgml.train()` function using the `algorithm` parameter. If the parameter is omitted, linear regression is used by default.
42 changes: 29 additions & 13 deletions pgml-dashboard/src/models.rs
@@ -57,6 +57,7 @@ impl Project {
             "summarization" => Ok("rouge_ngram_f1"),
             "translation" => Ok("bleu"),
             "text_generation" | "text2text" => Ok("perplexity"),
+            "cluster" => Ok("silhouette"),
             task => Err(anyhow::anyhow!("Unhandled task: {}", task)),
         }
     }
@@ -68,6 +69,7 @@ impl Project {
             "summarization" => Ok("Rouge Ngram F<sup>1</sup>"),
             "translation" => Ok("Bleu"),
             "text_generation" | "text2text" => Ok("Perplexity"),
+            "cluster" => Ok("silhouette"),
             task => Err(anyhow::anyhow!("Unhandled task: {}", task)),
         }
     }
@@ -544,7 +546,7 @@ impl Model {
 pub struct Snapshot {
     pub id: i64,
     pub relation_name: String,
-    pub y_column_name: Vec<String>,
+    pub y_column_name: Option<Vec<String>>,
     pub test_size: f32,
     pub test_sampling: Option<String>,
     pub status: String,
@@ -686,28 +688,42 @@ impl Snapshot {
         }
     }

-    pub fn features<'a>(&'a self) -> Option<Vec<&'a serde_json::Map<String, serde_json::Value>>> {
+    pub fn features(&self) -> Option<Vec<&serde_json::Map<String, serde_json::Value>>> {
         match self.columns() {
-            Some(columns) => Some(
-                columns
-                    .into_iter()
-                    .filter(|column| {
-                        !self
-                            .y_column_name
-                            .contains(&column["name"].as_str().unwrap().to_string())
-                    })
-                    .collect(),
-            ),
+            Some(columns) => {
+                if self.y_column_name.is_none() {
+                    return Some(columns.into_iter().collect());
+                }
+
+                Some(
+                    columns
+                        .into_iter()
+                        .filter(|column| {
+                            !self
+                                .y_column_name
+                                .as_ref()
+                                .unwrap()
+                                .contains(&column["name"].as_str().unwrap().to_string())
+                        })
+                        .collect(),
+                )
+            }
             None => None,
         }
     }

-    pub fn labels<'a>(&'a self) -> Option<Vec<&'a serde_json::Map<String, serde_json::Value>>> {
+    pub fn labels(&self) -> Option<Vec<&serde_json::Map<String, serde_json::Value>>> {
+        if self.y_column_name.is_none() {
+            return Some(Vec::new());
+        }
+
         self.columns().map(|columns| {
             columns
                 .into_iter()
                 .filter(|column| {
                     self.y_column_name
+                        .as_ref()
+                        .unwrap()
                         .contains(&column["name"].as_str().unwrap().to_string())
                 })
                 .collect()
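The `features()` and `labels()` changes above split columns by whether their name appears in the now-optional `y_column_name`. The following is a minimal, self-contained sketch of that behavior, not the dashboard's code: `SnapshotSketch` is a hypothetical stand-in whose columns are plain names rather than `serde_json` maps, and it uses a `match` on the `Option` in place of the `is_none()` check followed by `unwrap()`.

```rust
/// Hypothetical, simplified stand-in for the dashboard's `Snapshot`.
/// Columns are plain names here instead of `serde_json` maps.
struct SnapshotSketch {
    columns: Vec<String>,
    y_column_name: Option<Vec<String>>,
}

impl SnapshotSketch {
    /// Columns that are not labels; every column when no labels are set.
    fn features(&self) -> Vec<&String> {
        match &self.y_column_name {
            None => self.columns.iter().collect(),
            Some(labels) => self
                .columns
                .iter()
                .filter(|name| !labels.contains(*name))
                .collect(),
        }
    }

    /// Columns that are labels; empty when no labels are set.
    fn labels(&self) -> Vec<&String> {
        match &self.y_column_name {
            None => Vec::new(),
            Some(labels) => self
                .columns
                .iter()
                .filter(|name| labels.contains(*name))
                .collect(),
        }
    }
}

fn main() {
    let columns = vec!["a".to_string(), "b".to_string(), "target".to_string()];

    // Supervised snapshot: one label column.
    let supervised = SnapshotSketch {
        columns: columns.clone(),
        y_column_name: Some(vec!["target".to_string()]),
    };
    println!("features: {:?}", supervised.features());
    println!("labels: {:?}", supervised.labels());

    // Clustering snapshot: no label columns at all.
    let clustering = SnapshotSketch {
        columns,
        y_column_name: None,
    };
    println!("features: {:?}", clustering.features());
    println!("labels: {:?}", clustering.labels());
}
```

Matching on the `Option` keeps the "no labels" case and the filtering case in one expression, so no `unwrap()` is needed; whether that trade-off suits the real `Snapshot` type is a judgment call for the maintainers.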
14 changes: 10 additions & 4 deletions pgml-dashboard/templates/content/dashboard/panels/snapshot.html
@@ -73,9 +73,11 @@ <h2><span class="material-symbols-outlined">bubble_chart</span>Features</h2>
 %>
 <h3><%= name %>&nbsp;<code><%= feature["pg_type"].as_str().unwrap() | upper %></code></h3>
 <figure id="<%= name_machine %>_distribution"></figure>
-<% for y_column_name in snapshot.y_column_name.iter() { %>
+<% if snapshot.y_column_name.as_ref().is_some() { %>
+<% for y_column_name in snapshot.y_column_name.as_ref().unwrap().iter() { %>
 <figure id="<%= name_machine %>_correlation_<%= y_column_name %>"></figure>
 <% } %>
+<% } %>
 <% } %>
 </section>

@@ -102,11 +104,13 @@ <h3><%= name %>&nbsp;<code><%= feature["pg_type"].as_str().unwrap() | upper %></
 renderModel(<%= model.id %>, <%= model.key_metric(project).unwrap() %>, [0, 1]);
 <% } %>

-<% for y_column_name in snapshot.y_column_name.iter() { %>
+<% if snapshot.y_column_name.as_ref().is_some() { %>
+<% for y_column_name in snapshot.y_column_name.as_ref().unwrap().iter() { %>
 setTimeout(renderDistribution, delay, "<%= y_column_name %>", <%= y_column_name %>_samples, NaN);
 setTimeout(renderOutliers, delay, "<%= y_column_name %>", <%= y_column_name %>_samples, <%= snapshot.target_stddev(y_column_name) %>)
 <% } %>
-
+<% } %>
+
 var delay = 600;

 <% for feature in snapshot.features().unwrap().iter() {
@@ -116,9 +120,11 @@ <h3><%= name %>&nbsp;<code><%= feature["pg_type"].as_str().unwrap() | upper %></
 delay += 200;

 setTimeout(renderDistribution, delay, "<%= name_machine %>", <%= name_machine %>_samples, NaN);
-<% for y_column_name in snapshot.y_column_name.iter() { %>
+<% if snapshot.y_column_name.as_ref().is_some() { %>
+<% for y_column_name in snapshot.y_column_name.as_ref().unwrap().iter() { %>
 setTimeout(renderCorrelation, delay, "<%= name_machine %>", "<%= y_column_name %>", <%= name_machine %>_samples, <%= y_column_name %>_samples);
 <% } %>
+<% } %>
 <% } %>
 }
 renderCharts();
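The template hunks above wrap each loop over `y_column_name` in an `is_some()` check followed by `as_ref().unwrap()`. In plain Rust, the same "iterate the labels if there are any" behavior can be written as `.iter().flatten()` on the `Option`. The sketch below illustrates this with a hypothetical helper, `correlation_figure_ids`, which builds ids in the `<name_machine>_correlation_<y_column_name>` pattern used by the template. It is an illustration only; whether the same expression fits inside the template syntax is untested here.

```rust
/// Hypothetical helper: builds the ids of the correlation figures for one
/// feature. `y_column_name` mirrors the `Option<Vec<String>>` field in the diff.
fn correlation_figure_ids(feature: &str, y_column_name: &Option<Vec<String>>) -> Vec<String> {
    y_column_name
        .iter() // yields the inner Vec once for Some, never for None
        .flatten() // yields each label name inside that Vec
        .map(|label| format!("{}_correlation_{}", feature, label))
        .collect()
}

fn main() {
    // Supervised case: one figure id per label column.
    let labels = Some(vec!["target".to_string()]);
    println!("{:?}", correlation_figure_ids("age", &labels));

    // Clustering case: no labels, so no figure ids are produced.
    println!("{:?}", correlation_figure_ids("age", &None));
}
```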