
Commit 9284cf1

Added a tutorial for 9 classes - draft 1
1 parent 7cbee43 commit 9284cf1

File tree

1 file changed: README.md (75 additions, 4 deletions)

```diff
@@ -47,7 +47,8 @@
 - [Fill-Mask](#fill-mask)
 - [Vector Database](#vector-database)
 - [LLM Fine-tuning](#llm-fine-tuning)
-  - [Text Classification](#llm-fine-tuning-text-classification)
+  - [Text Classification - 2 classes](#text-classification-2-classes)
+  - [Text Classification - 9 classes](#text-classification-9-classes)
 <!-- - [Regression](#regression)
 - [Classification](#classification) -->
```

```diff
@@ -878,7 +879,7 @@ In this section, we will provide a step-by-step walkthrough for fine-tuning a La
 
 2. Obtain a Hugging Face API token to push the fine-tuned model to the Hugging Face Model Hub. Follow the instructions on the [Hugging Face website](https://huggingface.co/settings/tokens) to get your API token.
 
-## LLM Fine-tuning Text Classification
+## Text Classification 2 Classes
 
 ### 1. Loading the Dataset
 
```

```diff
@@ -1245,7 +1246,77 @@ SELECT pgml.tune(
 
 By following these steps, you can effectively restart training from a previously trained model, allowing for further refinement and adaptation of the model based on new requirements or insights. Adjust parameters as needed for your specific use case and dataset.
 
-## Conclusion
 
-By following these steps, you can leverage PostgresML to seamlessly integrate fine-tuning of Language Models for text classification directly within your PostgreSQL database. Adjust the dataset, model, and hyperparameters to suit your specific requirements.
```

In place of the removed Conclusion, the commit adds the tutorial section below.
## Text Classification 9 Classes

### 1. Load and Shuffle the Dataset

In this section, we begin by loading the FinGPT sentiment analysis dataset with the `pgml.load_dataset` function. The records are then exposed through a shuffled view (`pgml.fingpt_sentiment_shuffled_view`), which randomizes their order. Shuffling prevents biases introduced by the original data ordering from leaking into training.

```sql
-- Load the dataset
SELECT pgml.load_dataset('FinGPT/fingpt-sentiment-train');

-- Create a shuffled view
CREATE VIEW pgml.fingpt_sentiment_shuffled_view AS
SELECT * FROM pgml."FinGPT/fingpt-sentiment-train" ORDER BY RANDOM();
```
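As a quick optional check that the load succeeded, you can peek at a few rows. The dataset exposes an `input` (text) column and an `output` (label) column, the same columns referenced by `dataset_args` in the fine-tuning call later on:

```sql
-- Optional: inspect a few rows; "input" holds the text, "output" the sentiment label
SELECT input, output
FROM pgml.fingpt_sentiment_shuffled_view
LIMIT 5;
```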
### 2. Explore Class Distribution

Once the dataset is loaded and shuffled, we examine how the sentiment classes are distributed. Querying the shuffled view returns the number of instances for each sentiment class, giving a clear picture of the dataset and any class imbalance it contains.

```sql
-- Explore class distribution
SELECT
    output,
    COUNT(*) AS class_count
FROM pgml.fingpt_sentiment_shuffled_view
GROUP BY output
ORDER BY output;
```
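If it helps to see the imbalance as percentages rather than raw counts, a small variation of the same query (plain PostgreSQL, using a window function over the aggregate) does the trick:

```sql
-- Optional: show each class as a share of all records
SELECT
    output,
    COUNT(*) AS class_count,
    ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER (), 2) AS pct_of_total
FROM pgml.fingpt_sentiment_shuffled_view
GROUP BY output
ORDER BY class_count DESC;
```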
### 3. Create Training and Test Views

To facilitate training, we create separate views for training and testing. The training view (`pgml.fingpt_sentiment_train_view`) contains 80% of the shuffled dataset and is what the model learns from, while the test view (`pgml.fingpt_sentiment_test_view`) holds the remaining 20% and serves as a held-out set for evaluating the model's performance.

```sql
-- Create a view for training data (e.g., 80% of the shuffled records)
CREATE VIEW pgml.fingpt_sentiment_train_view AS
SELECT *
FROM pgml.fingpt_sentiment_shuffled_view
LIMIT (SELECT COUNT(*) * 0.8 FROM pgml.fingpt_sentiment_shuffled_view);

-- Create a view for test data (remaining 20% of the shuffled records)
CREATE VIEW pgml.fingpt_sentiment_test_view AS
SELECT *
FROM pgml.fingpt_sentiment_shuffled_view
OFFSET (SELECT COUNT(*) * 0.8 FROM pgml.fingpt_sentiment_shuffled_view);
```
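Before kicking off a long fine-tuning run, an optional sanity check confirms that the two views roughly reflect the intended 80/20 split:

```sql
-- Optional sanity check: row counts of the train and test views
SELECT
    (SELECT COUNT(*) FROM pgml.fingpt_sentiment_train_view) AS train_rows,
    (SELECT COUNT(*) FROM pgml.fingpt_sentiment_test_view)  AS test_rows;
```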
### 4. Fine-Tune the Model for 9 Classes

In the final section, we kick off fine-tuning with the `pgml.tune` function. The model is configured internally for sentiment analysis with 9 classes. Because `test_size => 0.2` and `test_sampling => 'last'`, training runs on the first 80% of the train view and is evaluated during training on its last 20%; the separate test view is reserved for measuring accuracy after training completes. Note that `hub_private_repo: true` pushes the fine-tuned model to a private Hugging Face repository, which requires the `hub_token`.

```sql
-- Fine-tune the model for 9 classes and push it to a private Hugging Face Hub repo
SELECT pgml.tune(
    'fingpt_sentiement',
    task => 'text-classification',
    relation_name => 'pgml.fingpt_sentiment_train_view',
    model_name => 'distilbert-base-uncased',
    test_size => 0.2,
    test_sampling => 'last',
    hyperparams => '{
        "training_args": {
            "learning_rate": 2e-5,
            "per_device_train_batch_size": 16,
            "per_device_eval_batch_size": 16,
            "num_train_epochs": 5,
            "weight_decay": 0.01,
            "hub_token": "YOUR_HUB_TOKEN",
            "push_to_hub": true,
            "hub_private_repo": true
        },
        "dataset_args": { "text_column": "input", "class_column": "output" }
    }'
);
```
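Once training finishes and the model has been pushed to the Hub, you can sanity-check it against held-out rows from the test view. The sketch below uses `pgml.transform` and assumes the fine-tuned model is available to the database at `YOUR_USERNAME/fingpt_sentiement` (a placeholder for your own Hub repository; since the repo is private, you may need to make it public or otherwise grant the database access):

```sql
-- Sketch: classify a few held-out examples with the fine-tuned model.
-- "YOUR_USERNAME/fingpt_sentiement" is a placeholder for your own Hub repository.
SELECT
    input,
    output AS true_label,
    pgml.transform(
        task   => '{"task": "text-classification",
                    "model": "YOUR_USERNAME/fingpt_sentiement"}'::JSONB,
        inputs => ARRAY[input]
    ) AS prediction
FROM pgml.fingpt_sentiment_test_view
LIMIT 5;
```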
