chinese-simplified · GitHub Topics · GitHub

chinese-simplified

Here are 236 public repositories matching this topic...

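MNBVC (Massive Never-ending BT Vast Chinese corpus) is an ultra-large-scale Chinese corpus, positioned against the 40T of data used to train ChatGPT. The MNBVC dataset covers not only mainstream culture but also niche subcultures, and even text written in "Martian script" (火星文). It includes plain-text Chinese data in every form: news, essays, novels, books, magazines, papers, scripts, forum posts, wiki, classical poetry, song lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and more.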

  • Updated Feb 18, 2025

Fetched URL: http://github.com/topics/chinese-simplified
