buyaojinchang/calvin_world

RLVR-World: Training World Models with Reinforcement Learning

Project Page | Paper | Hugging Face

This is the official code base for the paper RLVR-World: Training World Models with Reinforcement Learning.

Give it a star 🌟 if you find our work useful!

🔥 News

  • 🚩 2025.05.26: We release all models and datasets.
  • 🚩 2025.05.21: We open-source our training code.
  • 🚩 2025.05.21: Our paper is released on arXiv.

📋 TL;DR

We pioneer training world models through reinforcement learning with verifiable rewards (RLVR):

  • World models across various modalities (in particular, language and video) are unified under a sequence modeling formulation;
  • Task-specific prediction metrics serve as verifiable rewards directly optimized by RL.
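
To make the second point concrete, here is a minimal sketch of how a prediction metric can act as a verifiable reward for a language world model. It is an illustration only, not the code from this repository: the JSON state format, the function names (`binary_reward`, `field_accuracy_reward`, `reward_from_text`), and the per-field accuracy metric are all assumptions chosen for the example. The two metrics loosely mirror the "binary reward" and "task-specific reward" variants named in the model list.

```python
import json
from typing import Callable

_MISSING = object()  # sentinel so a missing key never equals a real value


def binary_reward(predicted: dict, target: dict) -> float:
    """Return 1.0 only when the predicted state matches the target exactly."""
    return 1.0 if predicted == target else 0.0


def field_accuracy_reward(predicted: dict, target: dict) -> float:
    """Hypothetical task-specific metric: fraction of target fields predicted correctly."""
    if not target:
        return 1.0 if not predicted else 0.0
    correct = sum(1 for key, value in target.items()
                  if predicted.get(key, _MISSING) == value)
    return correct / len(target)


def reward_from_text(completion: str, target: dict,
                     metric: Callable[[dict, dict], float]) -> float:
    """Score a generated token sequence (here: a JSON string) against the true next state.

    Unparseable or non-object outputs receive zero reward, which keeps the
    reward verifiable without any learned judge.
    """
    try:
        predicted = json.loads(completion)
    except json.JSONDecodeError:
        return 0.0
    if not isinstance(predicted, dict):
        return 0.0
    return metric(predicted, target)


if __name__ == "__main__":
    # Made-up next-state example for a text game.
    target_state = {"door": "open", "lamp": "off", "score": 3}
    completion = json.dumps({"door": "open", "lamp": "on", "score": 3})
    print(reward_from_text(completion, target_state, binary_reward))
    print(reward_from_text(completion, target_state, field_accuracy_reward))
```

In an RL loop, a scalar like this would be computed for each sampled completion and passed to the policy-gradient update; the graded metric gives partial credit where the binary one gives none.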

[Figure: concept]

🤗 Models and Datasets

At the moment, we provide the following models and datasets:

| Modality | Type        | Domain             | Name                                               |
|----------|-------------|--------------------|----------------------------------------------------|
| Language | Dataset     | Text game          | bytesized32-world-model-cot                        |
| Language | World model | Text game          | bytesized32-world-model-sft                        |
| Language | World model | Text game          | bytesized32-world-model-rlvr-binary-reward         |
| Language | World model | Text game          | bytesized32-world-model-rlvr-task-specific-reward  |
| Language | Dataset     | Web navigation     | webarena-world-model-cot                           |
| Language | World model | Web navigation     | webarena-world-model-sft                           |
| Language | World model | Web navigation     | webarena-world-model-rlvr                          |
| Video    | Tokenizer   | Robot manipulation | rt1-frame-tokenizer                                |
| Video    | World model | Robot manipulation | rt1-world-model-single-step-base                   |
| Video    | World model | Robot manipulation | rt1-world-model-single-step-rlvr                   |
| Video    | Tokenizer   | Robot manipulation | rt1-compressive-tokenizer                          |
| Video    | World model | Robot manipulation | rt1-world-model-multi-step-base                    |
| Video    | World model | Robot manipulation | rt1-world-model-multi-step-rlvr                    |
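
As a rough sketch, the artifacts listed above could be fetched with `snapshot_download` from the `huggingface_hub` package. The owner namespace is not stated in this README, so `HF_OWNER` below is a placeholder you must replace with the namespace shown on the Hugging Face page; the `ARTIFACTS` mapping, the helper names, and the `checkpoints` directory are likewise assumptions for illustration. Tokenizers are assumed to be hosted as model repositories.

```python
from pathlib import Path

# Placeholder: replace with the actual Hugging Face namespace hosting the artifacts.
HF_OWNER = "your-org"

# Artifact name -> Hugging Face repo type ("model" or "dataset").
ARTIFACTS = {
    "bytesized32-world-model-cot": "dataset",
    "bytesized32-world-model-sft": "model",
    "bytesized32-world-model-rlvr-binary-reward": "model",
    "bytesized32-world-model-rlvr-task-specific-reward": "model",
    "webarena-world-model-cot": "dataset",
    "webarena-world-model-sft": "model",
    "webarena-world-model-rlvr": "model",
    "rt1-frame-tokenizer": "model",
    "rt1-world-model-single-step-base": "model",
    "rt1-world-model-single-step-rlvr": "model",
    "rt1-compressive-tokenizer": "model",
    "rt1-world-model-multi-step-base": "model",
    "rt1-world-model-multi-step-rlvr": "model",
}


def repo_id(name: str, owner: str = HF_OWNER) -> str:
    """Build the '<owner>/<name>' identifier, rejecting names not in the list."""
    if name not in ARTIFACTS:
        raise KeyError(f"unknown artifact: {name}")
    return f"{owner}/{name}"


def download(name: str, owner: str = HF_OWNER, local_root: str = "checkpoints") -> str:
    """Download one artifact into <local_root>/<name> and return the local path."""
    # Imported lazily so the helpers above work without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id(name, owner),
        repo_type=ARTIFACTS[name],
        local_dir=str(Path(local_root) / name),
    )
```

For example, `download("webarena-world-model-cot")` would place the web navigation dataset under `checkpoints/webarena-world-model-cot`, provided `HF_OWNER` points at the right namespace.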

💬 Evaluating Language World Models

See lang_wm:

  • Text game state prediction
  • Web page state prediction
  • Application: Model predictive control for web agents

🎇 Evaluating Video World Models

See vid_wm:

  • Robot manipulation trajectory prediction
  • Application: Real2Sim policy evaluation

🎥 Showcases

[Figure: showcase]

🚀 Release Progress

  • Video world model with RLVR
  • Pre-trained & post-trained video world model weights
  • Real2sim policy evaluation with video world models
  • Text game SFT data
  • Web page SFT data
  • Language world model on text games with RLVR
  • Language world model on web pages with RLVR
  • Post-trained language world model weights
  • Web agents with language world models

📜 Citation

If you find this project useful, please cite our paper as:

@article{wu2025rlvr,
    title={RLVR-World: Training World Models with Reinforcement Learning}, 
    author={Jialong Wu and Shaofeng Yin and Ningya Feng and Mingsheng Long},
    journal={arXiv preprint arXiv:2505.13934},
    year={2025},
}

🤝 Contact

If you have any questions, please contact wujialong0229@gmail.com.

💡 Acknowledgement

We sincerely thank the following GitHub repositories, whose codebases we build upon:
