
Commit 66c65c8

README updates
1 parent 9284cf1 commit 66c65c8

File tree: 1 file changed (12 additions, 2 deletions)

README.md

Lines changed: 12 additions & 2 deletions
@@ -981,7 +981,7 @@ OFFSET (SELECT COUNT(*) * 0.8 FROM pgml.imdb_shuffled_view);
 
 ### 5. Fine-Tuning the Language Model
 
-Now, fine-tune the Language Model for text classification using the created training view. In the following sections, you will see a detailed explanation of different parameters used during fine-tuning.
+Now, fine-tune the Language Model for text classification using the created training view. In the following sections, you will see a detailed explanation of the different parameters used during fine-tuning. The fine-tuned model is pushed periodically to your public Hugging Face Hub account. A new repository will be created under your username using your project name (`imdb_review_sentiment` in this case). You can also choose to push the model to a private repository by setting `hub_private_repo: true` in the training arguments.
 
 ```sql
 SELECT pgml.tune(
@@ -1236,7 +1236,7 @@ SELECT pgml.tune(
     "per_device_eval_batch_size": 16,
     "num_train_epochs": 1,
     "weight_decay": 0.01,
-    "hub_token": "",
+    "hub_token": "YOUR_HUB_TOKEN",
     "push_to_hub": true
   },
   "dataset_args": { "text_column": "text", "class_column": "class" }
@@ -1246,6 +1246,16 @@ SELECT pgml.tune(
 
 By following these steps, you can effectively restart training from a previously trained model, allowing for further refinement and adaptation of the model based on new requirements or insights. Adjust parameters as needed for your specific use case and dataset.
 
+
+## 8. Hugging Face Hub vs. PostgresML as Model Repository
+We use the Hugging Face Hub as the primary repository for fine-tuned Large Language Models (LLMs). Leveraging the HF Hub offers several advantages:
+
+* The HF repository serves as the platform for pushing incremental updates to the model during training. In the event of any disruption to the database connection, you can resume training from where it left off.
+* If you prefer to keep the model private, you can push it to a private repository within the Hugging Face Hub by setting the parameter `hub_private_repo` to `true`; this ensures the model is not publicly accessible.
+* The `pgml.transform` function, designed around models from the Hugging Face Hub, can be reused without any modifications.
+
+However, in certain scenarios, pushing the model to a central repository and pulling it for inference may not be the most suitable approach. To address this, we save all the model weights and additional artifacts, such as tokenizer configurations and vocabulary, in the `pgml.files` table at the end of training. Note that, as of this writing, hooks to use models directly from `pgml.files` in the `pgml.transform` function have not been implemented. We welcome Pull Requests (PRs) from the community to enhance this functionality.
+
 
 ## Text Classification 9 Classes
 
 ### 1. Load and Shuffle the Dataset
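
The `pgml.files` storage mentioned in the diff above can be inspected directly from SQL. The table name comes from the text; its column layout is not documented there, so this sketch deliberately avoids naming columns and should be checked against the actual schema (for example with `\d pgml.files` in psql) before use.

```sql
-- Sketch only: pgml.files is named in the README text above; its schema is
-- an assumption here -- inspect it with \d pgml.files before relying on it.
SELECT *
FROM pgml.files
LIMIT 5;
```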
