AI-Ready Text Extractor for Git Repos | CLI tool for dataset prep, summaries, reverse engineering & bundling
Gittxt is an open-source tool that transforms GitHub repositories into LLM-compatible datasets.
Perfect for developers, data scientists, and AI engineers, Gittxt helps you extract and structure .txt, .json, and .md content into clean, analyzable formats for use in:
- Prompt engineering
- Fine-tuning & retrieval
- Codebase summarization
- Open-source LLM workflows
Large Language Models often expect input in very specific formats. Many tools (e.g., ChatGPT, Gemini, Ollama) struggle with arbitrary GitHub URLs, complex folders, or non-text assets.
Gittxt bridges this gap by:
- Extracting all usable text from a repo
- Organizing it for easy ingestion by LLMs
- Offering structured .txt, .json, .md, and .zip outputs
- Giving you full control with filtering, formatting, and plugin support
- Text extractor for code, docs, config files
- Output: .txt, .json, .md, .zip
- CLI and plugin system (FastAPI, Streamlit)
- AI-ready summaries (OpenAI / Ollama)
- Reverse engineer .txt/.json reports back into repo structure
- .gittxtignore support
- Async scanning for large projects
- Works offline and in constrained compute environments
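To make the idea of an ignore file concrete, here is a minimal Python sketch of ignore-pattern filtering. It is not Gittxt's implementation: it assumes .gittxtignore holds one gitignore-style glob per line, which this README does not confirm, and it uses the standard-library fnmatch module as a stand-in matcher. The helper names `parse_ignore_lines` and `is_ignored` are hypothetical.

```python
import fnmatch
from typing import Iterable, List


def parse_ignore_lines(text: str) -> List[str]:
    """Return the non-empty, non-comment patterns from an ignore file's text."""
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(line)
    return patterns


def is_ignored(rel_path: str, patterns: Iterable[str]) -> bool:
    """True if the path, its file name, or any parent directory matches a pattern.

    ASSUMPTION: gitignore-like glob semantics, approximated with fnmatch.
    """
    parts = rel_path.split("/")
    # Candidates: every path prefix ("docs", "docs/img", ...) plus the bare file name.
    candidates = ["/".join(parts[: i + 1]) for i in range(len(parts))]
    candidates.append(parts[-1])
    for pattern in patterns:
        pat = pattern.rstrip("/")  # treat "node_modules/" like "node_modules"
        if any(fnmatch.fnmatch(c, pat) for c in candidates):
            return True
    return False
```

Check the Gittxt docs for the pattern syntax the tool actually accepts before relying on this behavior.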
outputs/
├── txt/   # Plain text report
├── json/  # Structured metadata
├── md/    # Markdown-formatted summary
└── zip/   # Bundled results + manifest
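One possible way to consume the layout above from Python is to load every file in the json/ folder. This is a sketch under assumptions: it relies only on the directory names shown above, assumes each file under json/ is valid JSON, and makes no claim about the report schema. The function name `load_json_reports` is hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Dict


def load_json_reports(outputs_dir: str = "outputs") -> Dict[str, Any]:
    """Load every *.json file under <outputs_dir>/json, keyed by file stem."""
    reports: Dict[str, Any] = {}
    json_dir = Path(outputs_dir) / "json"
    if not json_dir.is_dir():
        # No scan has been run yet, or JSON output was not requested.
        return reports
    for path in sorted(json_dir.glob("*.json")):
        with path.open(encoding="utf-8") as fh:
            reports[path.stem] = json.load(fh)
    return reports
```

Calling `load_json_reports()` after a scan returns a dictionary you can inspect before deciding what to pass to an LLM.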
pip install gittxt
gittxt scan https://github.com/sandy-sp/gittxt --output-format txt,json --lite --zip
gittxt re outputs/project.md -o ./restored
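If you want to drive the scan from a script rather than a shell, a thin wrapper around the CLI is one option. The sketch below uses only the subcommand and flags that appear in the command above (`scan`, `--output-format`, `--lite`, `--zip`). The helper names `build_scan_command` and `run_scan` are hypothetical and are not part of Gittxt.

```python
import shlex
import subprocess
from typing import List, Sequence


def build_scan_command(
    repo_url: str,
    formats: Sequence[str] = ("txt", "json"),
    lite: bool = True,
    zip_bundle: bool = True,
) -> List[str]:
    """Assemble the `gittxt scan` invocation as an argument list."""
    cmd = ["gittxt", "scan", repo_url, "--output-format", ",".join(formats)]
    if lite:
        cmd.append("--lite")
    if zip_bundle:
        cmd.append("--zip")
    return cmd


def run_scan(repo_url: str) -> None:
    """Run the scan; requires `pip install gittxt` so the CLI is on PATH."""
    cmd = build_scan_command(repo_url)
    print("Running:", shlex.join(cmd))
    # check=True raises CalledProcessError if the CLI exits with a non-zero status.
    subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting issues with repository URLs.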
Try the hosted version (no install required!)
Launch Streamlit App
- Use it to build structured input for LLMs
- Ideal for prompt chaining, document agents, code summarization
- Helps transform messy repos into single-file, AI-consumable reports
All CLI flags, plugins, formats, and filters are documented here:
Explore Gittxt Docs
Gittxt supports modular plugins:
- gittxt-api: Run via FastAPI backend
- gittxt-streamlit: Interactive dashboard
Install & run with:
gittxt plugin install gittxt-streamlit
gittxt plugin run gittxt-streamlit
Created by Sandeep Paidipati, Gittxt was born out of a need to:
- Quickly preview and summarize GitHub repos with LLMs
- Avoid manual copying, filtering, and converting files
- Create AI-ready datasets for learning and experimentation
- Star this repo if it helped you
- Share it with your dev/AI community
- Contact me for collaboration or sponsorship
MIT License © Sandeep Paidipati
Gittxt - Get Text from Git - Optimized for AI