Releases: allenai/OLMo

v0.6.0

19 Dec 17:03

What's new

Added 🎉

  • A set of annealing configs.
  • A constant_with_warmup learning rate schedule.
  • A one_in_eight configuration for activation checkpointing.
  • The tokenizer now ships in the source tree instead of being downloaded from Hugging Face.
  • Improved support for GCS.
  • torch.compile() now compiles each block individually rather than the whole model (see the sketch after this list).
  • Support for torch.compile() with dynamic=True.
  • torch.compile() is now reset after every evaluation, because evaluation invalidates the compiled versions.
  • More in-loop evaluation tasks to pick from, mostly for scaling-law experiments.
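A minimal sketch of the per-block compilation described above, assuming the model exposes its transformer blocks as a ModuleList named blocks (a hypothetical attribute, not necessarily the repo's):

    import torch

    def compile_blocks(model: torch.nn.Module) -> torch.nn.Module:
        # Compile each transformer block on its own, with dynamic shapes,
        # instead of wrapping the whole model in a single torch.compile() call.
        for i, block in enumerate(model.blocks):
            model.blocks[i] = torch.compile(block, dynamic=True)
        return model

Per-block compilation keeps any recompilation local to a single block, and dynamic=True is meant to avoid recompiles when sequence lengths change.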

Commits

b41634f One more hint for what's going on.
d74e835 A little more help for getting started
24ce0ca Merge pull request #756 from allenai/MoreCheckpoints2
69d1e4e Note about and link to Huggingface
3c6d515 Merge pull request #754 from allenai/MoreCheckpoints
645587e Merge branch 'main' of https://github.com/allenai/LLM
a346674 Fix link
4f0d7d1 We use safetensors now.
a6e6e2b Remove links that don't work
e6f6b45 Remove obsolete docs
0d14158 Merge pull request #745 from allenai/improve-documentation
1048c16 Merge pull request #750 from allenai/dave/annealing_peteish_v2
767047c Merge pull request #749 from allenai/mattj/legalwhammy2-augusta
9c677c9 Merge pull request #748 from allenai/oeeval-ladder-testtrain
7e81a6c Merge pull request #739 from allenai/peteish13-augusta
31c385f Merge pull request #742 from allenai/GoogleStorage
afd728f Merge pull request #738 from allenai/annealing_peteish_v2_neweval
837a4ff Merge pull request #687 from allenai/kylel/config-diff

v0.5.1

17 Oct 21:32

What's new

Added 🎉

  • Added the ability to try loading the latest checkpoint from the save folder using --try_load_latest_save.
  • Added support for flash attention and gradient checkpointing to hf_olmo (see the sketch after this list).
  • Added effective_n_kv_heads to OLMoConfig for hacky vLLM support.
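A hedged usage sketch of the new hf_olmo gradient-checkpointing support, going through the standard Hugging Face API; the checkpoint name is illustrative:

    from transformers import AutoModelForCausalLM

    # trust_remote_code pulls in the hf_olmo wrapper published with the model.
    model = AutoModelForCausalLM.from_pretrained(
        "allenai/OLMo-7B", trust_remote_code=True
    )
    model.gradient_checkpointing_enable()  # now supported for hf_olmo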

Commits

6889991 Merge pull request #735 from allenai/true-version-0.5.1
4d81b1b Merge pull request #733 from allenai/version-0.5.1
76ad758 Merge pull request #724 from allenai/shanea/lumi-24.03-2
885bc22 Merge pull request #721 from allenai/hey-my-first-pr-to-olmo
aa1863e Merge pull request #725 from allenai/shanea/fix-build-errors
59360be add missing function
d2b655a Merge pull request #720 from allenai/shanea/set-device-early
47f8f5a Merge pull request #719 from allenai/shanea/hf-olmo-gradient-checkpointing
0b92077 Merge pull request #718 from allenai/ot-fix-mmlu-bpb
ca81901 Merge pull request #717 from allenai/shanea/try-load-latest-save-2
46f06cb Merge pull request #712 from allenai/ot-fix-oe-eval-bpb

v0.5.0

27 Aug 02:00

What's new

  • Fixed conversion to HuggingFace model for DDP-trained models.
  • Added support for remote source and destination for HuggingFace model conversion.

Added 🎉

  • Added support for document masking via flash-attn during training with --data.generate_doc_lengths.
  • Added config options for model.norm_after, model.scale_emb_init, and auxiliary_loss_multiplier (used with zloss).
  • Added scripts for running experiments on qk_norm, norm reordering, and zloss.
  • Added the model.rope_theta configuration option.
  • Added the model.embedding_layer_norm configuration option for adding a layer norm to the embeddings.
  • Added the model.emb_init_std configuration option to override the standard deviation used to initialize the embeddings.
  • Added a downstream eval task for requests dumped from oe-eval tasks.
  • Added the CosLinearEnvelope scheduler, which is a pointwise product of a cosine schedule and a linear decay (see the sketch after this list).
  • Added the ability to save outputs of submodules for debugging purposes.
  • Versioned the Dolma FLAN data mix change in named_data_mix.py.
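A minimal sketch of the CosLinearEnvelope idea as described above; the function name, warmup handling, and final-LR floor are assumptions, not the repo's implementation:

    import math

    def cos_linear_envelope(step: int, max_steps: int, peak_lr: float) -> float:
        # Pointwise product of a cosine schedule (1 -> 0) and a linear
        # decay (1 -> 0); warmup is omitted for brevity.
        frac = min(step / max_steps, 1.0)
        cosine = 0.5 * (1.0 + math.cos(math.pi * frac))
        linear = 1.0 - frac
        return peak_lr * cosine * linear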

Changed ⚠️

  • Changed the default distributed training strategy from single-GPU to FSDP (see the sketch after this list).
  • Fixed the behavior of effective_memmap_dtype to prevent unrecognized dtypes from being parsed as uint16.
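A minimal sketch of what the FSDP default means in practice, assuming a standard PyTorch setup launched with torchrun; MyModel is a placeholder, not the repo's training code:

    import torch
    import torch.distributed as dist
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    class MyModel(torch.nn.Module):  # stand-in for the real model
        def __init__(self) -> None:
            super().__init__()
            self.linear = torch.nn.Linear(16, 16)

    dist.init_process_group("nccl")  # assumes torchrun set the env vars
    torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())
    model = FSDP(MyModel().cuda())  # parameters are sharded across ranks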

Fixed ✅

  • Fixed restarting a training run in later epochs so that we no longer need to set the flag --epoch=INT.
  • Swapped in the correct FLAN data mix.
  • Fixed a bug where the attention norm, when applied before the attention block, was modifying the residual stream.
  • Fixed OLMo.from_checkpoint() so that it correctly loads olmo_core and torch_new style checkpoints.
  • Fixed preserve_rng_state being incorrectly set to False when doing gradient checkpointing with dropout.

Commits

cee1a5d Merge pull request #710 from allenai/version-dolma-flan-change
213a639 Merge pull request #711 from allenai/epwalsh/fix-unbound-qkv
4575d40 Fix Conversion Issues + add support for remote upload. (#694)
78d79a5 Merge pull request #709 from allenai/shanea/debugging-docs
9147889 Merge pull request #685 from allenai/ot-oe-eval-requests
6cdc4cc Merge pull request #698 from allenai/shanea/compare-model-state
e5217cf Merge pull request #705 from allenai/dave/checkpoint_style_naming
f4b386e Merge pull request #704 from allenai/shanea/fix-olmo-1.7-batch-size
1e71ce3 Merge pull request #547 from allenai/shanea/add-olmo-1.7-7b-to-readme
6c4d53f Merge pull request #702 from chrisc36/main
0bc7f6c Merge pull request #690 from allenai/shanea/trace-model-outputs-2
4332c32 Merge pull request #691 from allenai/dave/cosine_linear_envelope
6587ddb Merge pull request #674 from allenai/dave/flan_data_mix
7d63fe0 Merge pull request #671 from allenai/s3_unshard_to_hf
c322b9a Merge pull request #686 from allenai/fix-from-checkpoint
c482df7 Merge pull request #680 from allenai/shanea/fix-incorrect-attn-norm
3e30710 Merge pull request #629 from allenai/epwalsh/amberish
4e00460 Add support for document masking during training (#661)
b45002e make epoch logging less confusing
1b7d275 Fix restarts in later epochs (#670)
345edc6 Merge branch 'main' of https://github.com/allenai/LLM
66d2be7 Revert "Update Beaker image"
0757223 Merge pull request #649 from allenai/ModelLadder
90b3889 Merge pull request #660 from allenai/fix_convert_olmo_to_hf
dfb7212 Merge pull request #616 from allenai/chameleon
d627c94 Merge pull request #665 from allenai/ddp-ckpt-fix
ab63296 Improving memmap type parser (#663)
b55fb5f Merge pull request #662 from allenai/tiny-olmo-config-fix
56d1fe0 Merge pull request #657 from allenai/shanea/lumi-torch2.3-3
26c2d53 Merge pull request #648 from allenai/shanea/default-fsdp-strategy
65f1fff Merge pull request #656 from jeqcho/patch-1
20b82f8 Merge pull request #653 from allenai/shanea/olmo-v0.4.0

v0.4.0

11 Jul 21:52

What's new

Added 🎉

  • Added a clipping fix to the Optimizer class to make it work with FSDP no_shard and DDP.
  • Added tests comparing grad-norm differences between the torch optimizer and clipping and the OLMo optimizer and clipping, on both CPU and GPU.
  • Exposed the memmap dtype in the data config.
  • Added support for DDP training.
  • Added caching to disk of HF datasets used in downstream evals.
  • Added FLOPs logging.
  • Added configs for the OLMo tiny set of models.
  • Added the configuration field optimizer.record_update_metrics, which defaults to False, but when set to True will trigger AdamW to collect the step size norm and absolute max for each parameter.
  • Added the configuration field optimizer.selective_updates, which defaults to False, but when set to True will tell the optimizer to skip updating the parameter and state when the corresponding gradient is 0 (see the sketch after this list).
  • Added olmo_data, a package holding data files like tokenizers.
  • Added the ability to load tokenizers from olmo_data package data.
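A minimal sketch of the selective-updates idea described above, using a plain SGD stand-in rather than the repo's AdamW:

    import torch

    def selective_step(param: torch.nn.Parameter, lr: float) -> None:
        # Skip the update entirely (parameter and optimizer state) when
        # the gradient is all zeros, e.g. for untouched embedding rows.
        if param.grad is None or not param.grad.any():
            return
        param.data.add_(param.grad, alpha=-lr)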

Changed ⚠️

  • Added the original legacy unsharding implementation back as the default. The new shared-memory implementation can be used by passing use_legacy_shared_mem_impl to unshard.py.
  • Refactored weight initialization. IMPORTANT: this does not maintain backwards compatibility with older configs; the jobs will still run, but may produce different outputs.
  • Changed the behavior of the Lion optimizer to only record the update cosine similarity when optimizer.record_update_metrics is True in order to be consistent with the API.
  • Added HF datasets into olmo_data, and changed downstream eval to load from the package.

Fixed ✅

  • Changed from ignored_index to ignore_index for cross_entropy_loss when flash-attn>=2.5.8.
  • Made hf_olmo support AutoModelForCausalLM and similar HF methods again.

Commits

d423c11 Merge pull request #652 from allenai/shanea/update-to-torch2.3
b10ab4b Merge pull request #651 from allenai/shanea/lumi-torch2.3-2
a101b31 Merge pull request #646 from allenai/shanea/hf-datasets-from-package
429a752 Merge pull request #647 from allenai/shanea/fix-tokenizer-break
bc60b8a Add option to skip optim steps for 0 grad params (#636)
cbc7c25 Merge pull request #645 from allenai/shanea/tokenizer-package-data
1b2658b Add option to record step size metrics from AdamW (#605)
a3e2ea7 multiple epoch fix
a1f118a Merge pull request #628 from allenai/olmo-tiny
d7994c8 Fix Z-loss calculation (#634)
a5539f4 Merge pull request #631 from allenai/shanea/hf-olmo-auto-model
d72a262 Merge pull request #626 from allenai/shanea/inspect-train-data-improvements
2417b11 Make olmo-core checkpointer more robust on weka (#624)
ddc8847 Merge pull request #612 from allenai/ddp
41ed20a Merge pull request #623 from allenai/shanea/hf-save-to-disk-2
a33caa9 Merge pull request #604 from allenai/WandbDiff
e5d63a3 Merge pull request #619 from allenai/shanea/add-olmo-1.7-7b-checkpoints
e207df7 Officially add OLMo-core as a dependency (#615)
72159ae Merge pull request #614 from allenai/shanea/pass-include-instance-metadata
c2cedbc Merge pull request #607 from allenai/rewrite-init
578234d Merge pull request #611 from allenai/shanea/hf-get-tokenizer-from-config-2
de43ee8 Merge pull request #610 from allenai/shanea/hf-get-tokenizer-from-config
2639279 Merge pull request #594 from NeuralFabricAI/lx/expose-data-dtype
9e89408 Create sensible filenames
02a8a58 Merge pull request #603 from allenai/shanea/unshard-without-passing-type
ae84d47 Merge pull request #602 from allenai/no_shard_ddp_clip
40210bb Merge pull request #599 from allenai/train-olmo-large
55c1e2f Merge pull request #601 from allenai/no_shard_ddp_clip
5789cfe Merge pull request #593 from allenai/shanea/inspect-train-data-no-indices
eafd154 Merge pull request #579 from MLgdg/main
652c745 Merge pull request #590 from allenai/shanea/update-readme-to-olmo-1.7
8ec2809 Merge pull request #589 from allenai/shanea/update-main-readme-hf
6e714b8 Merge pull request #588 from allenai/shanea/hf-olmo-docs-auto-methods
65d5575 Merge pull request #587 from allenai/shanea/storage-cleaner-improvemnts
0bddfe0 Merge pull request #585 from allenai/shanea/add-hf-docs
e6430a0 Merge pull request #582 from allenai/shanea/hybrid-shard-as-no-shard
c29787a Merge pull request #569 from allenai/Muennighoff/fix-torchv
7a462c5 Merge pull request #580 from allenai/shanea/update-ignore-index-kwarg
4f917fb Merge pull request #575 from allenai/shanea/add-weka
5c721cc Fix GPU tests CI (#574)
467adcc Merge remote-tracking branch 'origin/train-olmo-large'
4b2d12e Merge pull request #565 from allenai/readme
ccc49fd Merge pull request #564 from allenai/shanea/add-new-hf-converter
b17abd0 Merge pull request #512 from liaoleo/main
295d309 Merge pull request #561 from allenai/shanea/delay-device-mesh-import
4e8746d Merge pull request #562 from allenai/shanea/re-add-easy-legacy-unshard-impl
f38de95 Merge pull request #558 from allenai/shanea/release-v0.3.0
829f1d6 Merge pull request #520 from allenai/add-ce-loss-metric

v0.3.0

25 Apr 19:23

What's new

Added 🎉

  • Added support for Grouped Query Attention (see the sketch after this list).
  • Added commonsense_qa and social_iqa downstream evaluation tasks.
  • Made it possible to read from http/https the same way we read from s3/r2.
  • Added MMLU multiple choice (A/B/C/D) 5-shot variant downstream tasks.
  • A tokenizer patch.
  • Added an option to specify the number of model replicas when using hybrid sharding.
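A minimal sketch of the key grouped-query-attention operation: each key/value head serves a group of query heads. Shapes and names here are illustrative assumptions, not the repo's code:

    import torch

    def expand_kv_heads(kv: torch.Tensor, n_heads: int, n_kv_heads: int) -> torch.Tensor:
        # kv: (batch, n_kv_heads, seq_len, head_dim). Repeat each KV head
        # across its group of n_heads // n_kv_heads query heads.
        assert n_heads % n_kv_heads == 0
        return kv.repeat_interleave(n_heads // n_kv_heads, dim=1)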

Changed ⚠️

  • Renamed Olmo to OLMo everywhere in the codebase.
  • Disabled automatic garbage collection during training; instead, we collect manually at regular intervals to avoid ranks getting out of sync with their own GC (see the sketch after this list).
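A minimal sketch of that manual-GC scheme; the interval shown is an assumed placeholder (a commit below makes it configurable):

    import gc

    gc.disable()  # turn off automatic collection on every rank
    GC_INTERVAL = 100  # assumed interval, in training steps

    def maybe_collect(step: int) -> None:
        # Collect on the same step on all ranks, so no rank pauses for a
        # collection while the others wait at a collective operation.
        if step % GC_INTERVAL == 0:
            gc.collect()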

Removed 👋

  • Removed AMDLayerNorm, since the original layer norm bug has been fixed and we don't need this workaround anymore.
  • Removed OLMoParallelBlock.

Fixed ✅

  • Don't log garbage on nodes that aren't rank 0.
  • Don't crash in the HF code when referring to a tokenizer in a local file.
  • Pointed official training scripts to publicly available URLs.
  • Corrected the resize_token_embeddings method in the OLMoForCausalLM class to properly update the token embeddings when resizing the vocabulary.
  • Changed the tie_weights method to a no-op, as weight tying is handled in olmo/model.py.
  • Fixed the size calculation for QK layer norm.
  • Fixed a pipeline test failure caused by a bug in transformers version 4.39.1.
  • Made hf_olmo compatible with transformers versions >=4.40.0.

Commits

3b16e21 Merge pull request #556 from allenai/shanea/make-hf-olmo-support-new-transformers
ccf7bf0 Merge pull request #555 from allenai/shanea/wandb-cancel-failure-bypass
7be71cd use correct PG when collecting metrics with HYBRID shard (#551)
06786a7 Merge pull request #548 from allenai/shanea/fix-olmo-name-hf
4ed135e Merge pull request #540 from allenai/shanea/hybrid-sharding-num-groups-2
2eae988 Merge pull request #546 from allenai/shanea/add-olmo-1.7-7b-checkpoints
d2afcaa Add cfg option --scheduler.warmup_min_lr (#542)
9d40898 Merge pull request #537 from allenai/AkshitaB-tokenizer-patch
62c7954 Merge pull request #536 from allenai/shanea/storage-cleaner-wandb-path-from-checkpoint
657a55e Merge pull request #494 from allenai/shanea/storage-cleaner-move-entry
9a0a84a Merge pull request #527 from allenai/PublicTrainingData
0de5fdc Merge pull request #501 from djliden/dl/fix-embedding-resize
4792f94 Adds a new experimental sharded checkpointer from OLMo-core (#532)
1c12980 make garbage collection interval configurable (#533)
db2dee2 Merge pull request #503 from djliden/dl/hf-weight-tying
8fad649 Merge pull request #534 from allenai/shanea/fix-transformer-cache-position-regression
71f7014 Merge pull request #528 from allenai/add-mmlu-mc-5shot
8472d0b Merge pull request #521 from allenai/davidbrandfonbrener-patch-1
194012a Merge pull request #523 from allenai/davidbrandfonbrener-patch-2
8949bd8 Added deprecation for memmap (#517)
83cc8b1 Merge pull request #464 from allenai/olmo7-ablations
f8aef84 Merge pull request #509 from allenai/epwalsh/manual-gc
0ac82a9 Merge pull request #508 from allenai/RunDataloader
74de51d Merge pull request #414 from allenai/mitchish65-2
417af0e Merge pull request #504 from allenai/add-csqa-siqa
666da70 Patch other S3 methods with 404 detection fix
0b6e28c Fix checking HTTP status code for boto3 responses
0b835a8 Merge pull request #500 from allenai/shanea/expose-official-checkpoints
50da7a4 Add work-arounds for new-style checkpointing issues
6d42d7a Fix hang when training is canceled
7eb7f3d Merge pull request #455 from gahdritz/main
ed47c29 Merge pull request #453 from hxdtest/only_rank0_log_metrics
ad8198e Merge pull request #495 from allenai/add-basic-math
1511fed Merge pull request #487 from allenai/fix-mmlu-prompt-bug
c2840e4 Merge pull request #493 from allenai/shanea/storage-cleaner-move-improvements
658f7cc Merge pull request #466 from allenai/rename
eb5b2da Merge pull request #490 from allenai/RemoveAMDLN
752353b Merge pull request #488 from allenai/shanea/optimize-unsharding-2

v0.2.5

07 Mar 00:31

What's new

Fixed ✅

  • Fixed the default value of the --tokenizer argument to scripts/prepare_tulu_data.py to be an absolute path rather than a relative one, so the script can be run from other directories.
  • Added the option to directly pass input embeddings to OLMo and OLMoForCausalLM.
  • Added support for Python 3.8.
  • Added code to throw an error if output_attentions is set to True in a forward call to OLMoForCausalLM, since this functionality hasn't been implemented yet.
  • Fixed running with data-loading workers on LUMI.

Added 🎉

  • Added an output_hidden_states argument and associated functionality to OLMo and OLMoForCausalLM to return model intermediate hidden states (see the usage sketch after this list).
  • Added MMLU downstream evaluation tasks, with prompt variations.
  • Added support for PyTorch v2.2.
  • Added the ability to show logs from all ranks.
  • Added an option for QKV clipping.
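A hedged usage sketch of the new output_hidden_states argument, following the Hugging Face convention it mirrors; the checkpoint name and output layout are assumptions:

    import torch
    import hf_olmo  # registers OLMo with the HF auto classes
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("allenai/OLMo-7B")
    model = AutoModelForCausalLM.from_pretrained("allenai/OLMo-7B")

    inputs = tok("OLMo is", return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    # out.hidden_states: per-layer intermediate activations (assumed layout)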

Changed ⚠️

  • Refactored torch.load monkey patching for legacy checkpoint unsharding, in anticipation of an unsharding implementation change.

Commits

c499632 Add option for QKV clipping (#489)
31d8528 Pull checkpoint patch from mitchish-gqa-2
03d7643 Merge pull request #486 from allenai/shanea/monkey-patch-ctx-manager
fd3a57b Merge pull request #483 from allenai/shanea/storage-cleaner-unshard-improvements
1d264e4 Merge pull request #481 from allenai/WorkersOnLumi
70ad30c Merge pull request #480 from allenai/Firehose
493c0b8 Add MMLU prompt variants (#484)
cb711e2 Add support for PyTorch v2.2 (#476)
67d24f5 Merge pull request #468 from allenai/mmlu-downstream
0c58bee Fix bug when clipping is disabled
922db6a Only run the profiler through a single cycle (#463)
37ca789 Merge pull request #462 from allenai/epwalsh/fsdp-wrap-patch
cc36709 Add attn bias arg to HF wrapper (#458)
7f7abbb Merge pull request #451 from sarahwie/main
9fd9130 Add support for Python 3.8 (#448)
d9c0993 Require Python>=3.9 for now
97296e6 Merge pull request #442 from allenai/shanea/add-input-embedding-arg
3be4c1e add link to W&B logs for 1B run
d7d4de4 Add link to OLMo-7B-Twin-2T W&B logs
cf12108 Update README.md (#429)
15af668 freeze official configs for reproductions (#421)
7739fe1 Add link to W&B logs for OLMo-7B
80db5e3 Fix default value of --tokenizer
6765317 Add link to paper in README badge

v0.2.4

02 Feb 18:40

What's new

Fixed ✅

  • Fixed an issue with the HuggingFace integration where we were inadvertently using a feature that was introduced in Python 3.10, causing an error for older Python versions.

Commits

8a3f2d8 Fix HF integration for Python < 3.10 (#426)
49c8647 Use temp branding GIF for logo (for now) (#419)

v0.2.3

31 Jan 18:36

What's new

Commits

98c115c Bump version to v0.2.3 for release
0e53b33 specify dependencies in pyproject.toml (#418)
18e5dad update PyPI release process
141cc94 Merge pull request #415 from allenai/readme-inf
2587240 Merge pull request #417 from allenai/Muennighoff/ckpt
a5a01a2 Merge pull request #416 from allenai/nol_rdme
98425a5 Merge pull request #413 from allenai/shanea/storage-cleaner-s3-upload-cleanup
3053bfa Update install instructions in README
f36ac42 Merge pull request #410 from allenai/epwalsh/fine-tune-with-label-masking
dcae8e8 Merge pull request #411 from allenai/epwalsh/lr-schedule-tokens
45ed078 Add more mcli configs
905359e fix bug with saving unsharded checkpoint
3e3df71 Merge pull request #409 from allenai/epwalsh/tulu-fine-tune
a2e1d13 Merge pull request #368 from allenai/mitchish-lumi
5a735dd Merge pull request #350 from allenai/mitchish
df19554 Merge pull request #388 from allenai/mitchish65
23eb949 Train a few steps after time limit reached (#362)
ac1aee1 Merge pull request #408 from allenai/NixLogz
6da42cf ensure we save checkpoint at end of loop
568a3d8 Merge pull request #406 from allenai/hf-olmo-loading
3c51402 Merge pull request #407 from allenai/shanea/storage-cleaner-avoid-redundant-copy
53217d2 Merge pull request #405 from allenai/shanea/storage-cleaner-fix-upload-path
5eb26aa Merge pull request #404 from allenai/shanea/storage-cleaner-minor-fixes
87ed747 backwards compat fix
1c13e5f Merge pull request #403 from allenai/shanea/storage-cleaner-fix-max-archive-size
685d11b Merge pull request #400 from allenai/shanea/storage-cleaner-wandb
5bdccc3 Merge pull request #402 from allenai/shanea/storage-cleaner-is-run-improvement
75d6738 Merge pull request #401 from allenai/shanea/storage-cleaner-is-file-no-key
0475f3a Make logo a little smaller
1184050 Add logo to README
e2d77c4 Ephemeral checkpoints (#397)
6f2abfb Merge pull request #399 from allenai/shane/storage-cleaner-fix-s3-upload
f8beb5b Merge pull request #398 from allenai/shanea/storage-cleaner-move-run
185d7e2 Move remaining top-level mkd docs into docs folder (#395)
5d03d38 Merge pull request #396 from allenai/shanea/storage-cleaner-delete-temp-files
fe49693 Merge pull request #382 from allenai/shanea/storage-cleaner-unsharding-legacy
1ede949 Merge pull request #381 from allenai/shanea/storage-cleaner-unsharding-2
9cc7154 update some links to new repo (#394)

v0.2.2

11 Dec 05:58

What's new

Commits

364e21e Merge pull request #393 from allenai/hf-olmo-auto-map

v0.2.1

11 Dec 00:11

What's new

Commits

ad3e676 missing readme
9fa23b4 Merge pull request #392 from allenai/hf-bug-fix
