github_setup_guide

This document provides a comprehensive guide for setting up a GitHub repository for a Data Science Docker project, including prerequisites, repository setup, initial commits, and PyCharm integration. It outlines a structured workflow for daily development, feature development, experiment tracking, and release preparation, along with guidelines for commit messages and repository maintenance. Key steps include creating a GitHub repository, initializing a local Git repository, configuring Git user information, and establishing a branch strategy.
Copyright © All Rights Reserved
GitHub Setup for Data Science Docker Project

Quick Reference Guide for Version Control Setup & Daily Workflow

Prerequisites Checklist

Git installed locally

GitHub account created

SSH keys configured (recommended) or Personal Access Token

PyCharm Git integration enabled

Docker Data Science project created

Phase 1: Repository Setup


1. Create GitHub Repository

Set up a new repository on GitHub:

Go to GitHub.com → New Repository


Repository name: my-ds-project (match your local folder)

Description: Brief project description

Set to Public or Private as needed


Do NOT initialize with a README, .gitignore, or license (you will create these locally, and a pre-populated remote would reject your first push)
Click "Create repository"

Keep the GitHub page open - you'll need the repository URL

2. Initialize Local Git Repository

In your project directory, initialize Git:

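    cd /path/to/my-ds-project
    git init
    git branch -M main

GitHub uses 'main' as its default branch name. Depending on your Git version and configuration, git init may still create 'master', so git branch -M main renames the local branch to match.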
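
To confirm the rename took effect, a minimal sketch along these lines can be run. The scratch directory created with mktemp is an assumption for illustration only; in practice you would run the last command inside your real project folder.

```bash
set -eu

# Hypothetical scratch location; substitute your own project directory.
project_dir="$(mktemp -d)/my-ds-project"
mkdir -p "$project_dir"
cd "$project_dir"

git init -q
# Rename whatever branch `git init` created (often "master") to "main".
git branch -M main

# Show which branch HEAD points at; this works even before the first commit.
current_branch="$(git symbolic-ref --short HEAD)"
echo "Current branch: $current_branch"
```

If the output names anything other than main, the rename did not run in the directory you expected.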


3. Create Comprehensive .gitignore

Create a .gitignore file for Data Science projects:

    # Python
    __pycache__/
    *.py[cod]
    *$py.class
    *.so
    .Python
    env/
    venv/
    ENV/
    env.bak/
    venv.bak/

    # Jupyter Notebook
    .ipynb_checkpoints
    */.ipynb_checkpoints/*

    # Data files (add specific paths as needed)
    data/raw/*
    data/processed/*
    *.csv
    *.xlsx
    *.parquet
    !data/.gitkeep

    # Models and outputs
    models/*.pkl
    models/*.joblib
    outputs/
    results/

    # Docker
    .docker/

    # Environment files
    .env
    .env.local

    # IDE
    .vscode/
    .idea/
    *.swp
    *.swo
    *~

    # OS
    .DS_Store
    Thumbs.db

    # Logs
    logs/
    *.log
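
Ignore rules are easy to get subtly wrong, so it can help to ask Git directly whether a path would be ignored. The sketch below uses git check-ignore against a small subset of the patterns above in a throwaway repository. The helper name ignore_status and the sample paths are hypothetical.

```bash
set -eu

repo="$(mktemp -d)"
git init -q "$repo"
cd "$repo"

# A small subset of the patterns from the .gitignore in this guide.
cat > .gitignore <<'EOF'
data/raw/*
data/processed/*
*.csv
!data/.gitkeep
.env
EOF

# Print "ignored" or "not ignored" for a path; the path does not need to exist.
ignore_status() {
    if git check-ignore -q "$1"; then
        echo "ignored"
    else
        echo "not ignored"
    fi
}

for path in data/raw/sales.csv data/.gitkeep .env src/train.py; do
    printf '%-22s %s\n' "$path" "$(ignore_status "$path")"
done
```

In a real project, git check-ignore -v <path> additionally reports which pattern matched, which is useful when a file is unexpectedly excluded.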

4. Create README.md

Document your project with a comprehensive README:

    # My Data Science Project

    ## Overview
    Brief description of what this project does

    ## Setup Instructions

    ### Prerequisites
    - Docker
    - Docker Compose
    - Git

    ### Quick Start
    ```bash
    git clone https://github.com/yourusername/my-ds-project.git
    cd my-ds-project
    docker-compose up -d
    ```
    Access Jupyter Lab at http://localhost:8888

    ## Project Structure
    ```
    ├── data/
    │   ├── raw/          # Original data
    │   ├── processed/    # Cleaned data
    │   └── external/     # External datasets
    ├── notebooks/        # Jupyter notebooks
    ├── src/              # Source code
    ├── tests/            # Unit tests
    ├── docker-compose.yml
    ├── Dockerfile
    └── requirements.txt
    ```

    ## Usage
    [Add specific instructions for your project]

    ## Contributing
    [Add contribution guidelines]

    ## License
    [Add license information]

5. Configure Git User Information

Set up your Git identity (if not already done):

    git config --global user.name "Your Name"
    git config --global user.email "your.email@example.com"

⚠️ Use the same email as your GitHub account
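
If you use different identities for different projects, the same settings can be applied to a single repository by omitting --global. This is a minimal sketch in a throwaway repository. The name and email are the placeholders from the guide, not real values.

```bash
set -eu

repo="$(mktemp -d)"
git init -q "$repo"
cd "$repo"

# Without --global, the values are stored in this repository's .git/config only.
git config user.name "Your Name"
git config user.email "your.email@example.com"

# Read back the value Git will stamp on commits made in this repository.
effective_email="$(git config --get user.email)"
echo "Commits in this repository will use: $effective_email"
```

Repository-level settings take precedence over global ones, so this also works as an override.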


Phase 2: Initial Commit and Push

6. Stage and Commit Initial Files

Add and commit your initial project structure:

    git add .
    git status  # Review what will be committed
    git commit -m "Initial commit: Docker DS project setup

    - Add Dockerfile with Python 3.11 and DS libraries
    - Add docker-compose.yml for easy container management
    - Add requirements.txt with core DS packages
    - Add comprehensive .gitignore for DS projects
    - Add project structure and README"

Write descriptive commit messages that explain the 'what' and 'why'
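
As an alternative to a quoted multi-line string, git commit accepts several -m flags: the first becomes the subject line and each further one becomes a separate paragraph of the body. The sketch below shows this in a scratch repository. The demo identity and the shortened body text are assumptions for illustration.

```bash
set -eu

repo="$(mktemp -d)"
git init -q "$repo"
cd "$repo"
git config user.name "Demo User"          # placeholder identity for the scratch repo
git config user.email "demo@example.com"

echo "# My Data Science Project" > README.md
git add README.md

# First -m is the subject; the second -m becomes the body paragraph.
git commit -q \
    -m "Initial commit: Docker DS project setup" \
    -m "Add project structure and README."

subject="$(git log -1 --format=%s)"
body="$(git log -1 --format=%b)"
echo "Subject: $subject"
echo "Body:    $body"
```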

7. Connect to GitHub Repository


Link your local repository to GitHub:

    git remote add origin https://github.com/yourusername/my-ds-project.git
    # OR if using SSH:
    # git remote add origin git@github.com:yourusername/my-ds-project.git
    git remote -v  # Verify the remote is set correctly

⚠️ Replace 'yourusername' with your actual GitHub username
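
If you add the HTTPS remote first and later set up SSH keys, the URL can be changed in place rather than removing and re-adding the remote. This is a minimal sketch in a scratch repository, using the guide's placeholder username. No network access is needed because nothing is fetched or pushed.

```bash
set -eu

repo="$(mktemp -d)"
git init -q "$repo"
cd "$repo"

# 'yourusername' is the placeholder from the guide, not a real account.
git remote add origin https://github.com/yourusername/my-ds-project.git

# Switch the existing remote from HTTPS to SSH.
git remote set-url origin git@github.com:yourusername/my-ds-project.git

origin_url="$(git remote get-url origin)"
echo "origin -> $origin_url"
```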

8. Push to GitHub

Upload your project to GitHub:

git push -u origin main

The -u flag sets up tracking between the local and remote branches, so future pushes need only git push.
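
To see what tracking means in practice, the following sketch pushes to a local bare repository that stands in for GitHub (an assumption so the example runs offline) and then asks Git which remote branch main follows.

```bash
set -eu

work="$(mktemp -d)"
git init -q --bare "$work/remote.git"     # local stand-in for the GitHub repository
git init -q "$work/local"
cd "$work/local"
git config user.name "Demo User"          # placeholder identity for the scratch repo
git config user.email "demo@example.com"

echo "demo" > README.md
git add README.md
git commit -q -m "Initial commit"
git branch -M main

git remote add origin "$work/remote.git"
git push -q -u origin main

# Ask Git which remote branch the current branch now tracks.
upstream="$(git rev-parse --abbrev-ref --symbolic-full-name '@{u}')"
echo "main tracks: $upstream"
```

Once an upstream is recorded, plain git push and git pull know where to send and fetch commits.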

Phase 3: PyCharm Git Integration

9. Enable VCS in PyCharm

Activate version control in PyCharm:

VCS → Enable Version Control Integration


Select "Git" from dropdown
Click OK
PyCharm should detect your existing Git repository

You'll see Git options appear in the VCS menu and toolbar

10. Configure GitHub Integration

Connect PyCharm to your GitHub account:

File → Settings → Version Control → GitHub (on macOS: PyCharm → Settings → Version Control → GitHub)


Click "+" to add account
Log in via token (recommended) or GitHub credentials
Test connection
Click OK

11. Test PyCharm Git Operations

Verify Git integration works:

Make a small change to README.md

Notice file appears in "Local Changes" (VCS tool window)


Right-click file → Git → Commit File
Write commit message and commit

VCS → Git → Push (or Ctrl+Shift+K)

Check GitHub to confirm the change appears online


Phase 4: Branch Strategy Setup

12. Set Up Development Branch

Create a development branch for ongoing work:

    git checkout -b develop
    git push -u origin develop

Or in PyCharm: VCS → Git → Branches → New Branch

Keep 'main' for stable releases, use 'develop' for active development

Setup Validation Checklist

GitHub repository created and accessible

Local Git repository initialized

Initial files committed and pushed

PyCharm VCS integration working

Main and develop branches created

📋 Daily/Weekly Workflow Reminders

Follow these patterns as your project evolves:

🔄 Daily Development Cycle


Start day → Pull latest changes → Work on features → Commit frequently → Push at end of day

Commands:

    git checkout develop
    git pull origin develop
    # ... work on your code ...
    git add .
    git commit -m "Add: [description]"
    git push origin develop
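
Before pushing at the end of the day, it can be useful to check how far the local branch has moved relative to the remote. The sketch below counts commits ahead and behind the upstream. It uses a local bare repository in place of GitHub, and the helper name ahead_behind is hypothetical.

```bash
set -eu

work="$(mktemp -d)"
git init -q --bare "$work/remote.git"     # local stand-in for the GitHub repository
git init -q "$work/local"
cd "$work/local"
git config user.name "Demo User"          # placeholder identity for the scratch repo
git config user.email "demo@example.com"

git commit -q --allow-empty -m "Initial commit"
git branch -M develop
git remote add origin "$work/remote.git"
git push -q -u origin develop

# Print "<ahead> <behind>" for the current branch relative to its upstream.
ahead_behind() {
    git fetch -q origin
    git rev-list --left-right --count 'HEAD...@{u}' | tr '\t' ' '
}

# Simulate a day's work that has not been pushed yet.
git commit -q --allow-empty -m "Add: local work not yet pushed"
echo "ahead/behind: $(ahead_behind)"
```

A non-zero second number means the remote has commits you do not have yet, which is the case where pulling first avoids a rejected push.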

🔀 Feature Development
Create feature branch → Develop → Test → Merge back to develop

Commands:
    git checkout develop
    git checkout -b feature/data-preprocessing
    # ... develop feature ...
    git add .
    git commit -m "Implement data preprocessing pipeline"
    git checkout develop
    git merge feature/data-preprocessing
    git push origin develop
    git branch -d feature/data-preprocessing
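
A plain git merge will fast-forward when develop has not moved, which leaves no trace that a feature branch existed. One optional variation, not part of the original workflow, is merging with --no-ff so every feature lands as an explicit merge commit. This is a sketch in a scratch repository with a hypothetical placeholder file.

```bash
set -eu

repo="$(mktemp -d)"
git init -q "$repo"
cd "$repo"
git config user.name "Demo User"          # placeholder identity for the scratch repo
git config user.email "demo@example.com"

git commit -q --allow-empty -m "Initial commit"
git branch -M develop

git checkout -q -b feature/data-preprocessing
echo "# preprocessing code goes here" > preprocess.py   # hypothetical file
git add preprocess.py
git commit -q -m "Implement data preprocessing pipeline"

git checkout -q develop
# --no-ff forces a merge commit even though a fast-forward would be possible.
git merge -q --no-ff -m "Merge feature/data-preprocessing into develop" \
    feature/data-preprocessing
git branch -q -d feature/data-preprocessing

# A merge commit has two parents; a fast-forwarded commit would have one.
parent_count="$(git log -1 --format=%P | wc -w | tr -d ' ')"
echo "Parents of latest commit on develop: $parent_count"
```

Whether the extra merge commits are worth it is a team preference: they make the history easier to read by feature, at the cost of a less linear log.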

📊 Experiment Tracking
Create experiment branches for different approaches

Commands:

    git checkout -b experiment/lstm-model
    # ... run experiments ...
    git add notebooks/lstm_experiment.ipynb
    git commit -m "Experiment: LSTM model with attention

    Results:
    - Accuracy: 92.5%
    - Loss: 0.234
    - Training time: 45min

    Next: Try with different hyperparameters"
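
Once several experiment branches exist, a one-line summary per branch makes them easier to compare. The sketch below lists every branch under experiment/ with its latest commit subject. The second branch name and its commit message are hypothetical, and list_experiments is an illustrative helper.

```bash
set -eu

repo="$(mktemp -d)"
git init -q "$repo"
cd "$repo"
git config user.name "Demo User"          # placeholder identity for the scratch repo
git config user.email "demo@example.com"

git commit -q --allow-empty -m "Initial commit"
git branch -M main

# Two experiment branches, each with one summarising commit.
git checkout -q -b experiment/lstm-model
git commit -q --allow-empty -m "Experiment: LSTM model with attention"
git checkout -q main
git checkout -q -b experiment/xgboost      # hypothetical second experiment
git commit -q --allow-empty -m "Experiment: XGBoost baseline"
git checkout -q main

# One line per experiment branch: branch name, then latest commit subject.
list_experiments() {
    git for-each-ref --format='%(refname:short) | %(contents:subject)' \
        refs/heads/experiment
}

list_experiments
```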

Release Preparation
When ready for a release, merge develop to main

Commands:

    git checkout main
    git pull origin main
    git merge develop
    git tag -a v1.0.0 -m "Release version 1.0.0"
    git push origin main --tags

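Before pushing a release, it may be worth confirming that the tag is annotated and points at the commit you expect. This is a minimal sketch in a scratch repository; the empty initial commit stands in for real project history.

```bash
set -eu

repo="$(mktemp -d)"
git init -q "$repo"
cd "$repo"
git config user.name "Demo User"          # placeholder identity for the scratch repo
git config user.email "demo@example.com"

git commit -q --allow-empty -m "Initial commit"
git branch -M main

git tag -a v1.0.0 -m "Release version 1.0.0"

# "tag" means annotated; a lightweight tag would report "commit".
tag_type="$(git cat-file -t v1.0.0)"
# On the tagged commit itself, describe prints just the tag name.
described="$(git describe --tags)"

echo "v1.0.0 object type: $tag_type"
echo "HEAD is described as: $described"
```

Note that --tags pushes every local tag. To publish only the release tag, git push origin v1.0.0 can be used instead.
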
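🛠️ Handling Data Files

Remember: Never commit large data files directly! Use Git LFS or external storage

For large files, consider:

    # Set up Git LFS (one-time setup; requires the git-lfs package to be installed)
    git lfs install
    # Track large files
    git lfs track "*.csv"
    git lfs track "*.parquet"
    git lfs track "models/*.pkl"
    # Add .gitattributes
    git add .gitattributes

Note: the .gitignore in this guide excludes *.csv, *.parquet, and models/*.pkl. Ignored files cannot be added, so remove the matching patterns from .gitignore for any files you want Git LFS to track.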
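
To decide which files need LFS or external storage, a quick size scan of the working tree can help. The sketch below lists files above a threshold, skipping Git's own metadata. The helper name find_large_files, the dummy files, and the 1 MiB demo threshold are assumptions; GitHub itself warns about files larger than 50 MB and rejects files larger than 100 MB.

```bash
set -eu

# List files larger than the given size in MiB, skipping the .git directory.
find_large_files() {
    dir="$1"
    limit_kb=$(( $2 * 1024 ))
    find "$dir" -type f ! -path '*/.git/*' -size +"${limit_kb}k" | sort
}

# Scratch directory with one large and one small dummy file.
demo="$(mktemp -d)"
mkdir -p "$demo/data/raw"
head -c 3145728 /dev/zero > "$demo/data/raw/big.parquet"   # 3 MiB of zeros
echo "small" > "$demo/notes.txt"

# A 1 MiB threshold keeps the demo fast; use 50 to mirror GitHub's warning level.
find_large_files "$demo" 1
```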

📝 Commit Message Guidelines


Use clear, descriptive commit messages
Format:

    # Type: Brief description (50 chars or less)
    #
    # Longer explanation if needed (wrap at 72 chars)
    #
    # Types: Add, Update, Fix, Remove, Refactor, Experiment

Examples:
- "Add: Initial data preprocessing pipeline"
- "Fix: Handle missing values in feature engineering"
- "Update: Improve model accuracy from 85% to 92%"
- "Experiment: Test XGBoost vs Random Forest"
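
The format above can be checked mechanically. This is a minimal sketch of a validator for the subject line, covering the allowed types and the 50-character limit. The function name valid_subject is hypothetical, and the check is one possible reading of the guideline rather than an established tool.

```bash
set -eu

# Return 0 if the subject is "Type: description", uses an allowed type,
# and is at most 50 characters long; return 1 otherwise.
valid_subject() {
    subject="$1"
    [ "${#subject}" -le 50 ] || return 1
    case "$subject" in
        "Add: "?*|"Update: "?*|"Fix: "?*|"Remove: "?*|"Refactor: "?*|"Experiment: "?*)
            return 0 ;;
        *)
            return 1 ;;
    esac
}

for s in "Add: Initial data preprocessing pipeline" "fixed stuff"; do
    if valid_subject "$s"; then
        echo "ok:     $s"
    else
        echo "reject: $s"
    fi
done
```

A function like this could be called from a commit-msg hook in .git/hooks, which receives the path to the message file as its first argument.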

🔍 Regular Maintenance
Keep your repository clean and organized

Weekly: Review and clean up old branches


Monthly: Update dependencies in requirements.txt
Quarterly: Update documentation and README

Before releases: Run full test suite
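
For the weekly branch clean-up, Git can report which branches are already fully merged and therefore safe to delete. The sketch below lists branches merged into develop, excluding main and develop themselves. The helper name merged_branches and the example branch names are hypothetical.

```bash
set -eu

repo="$(mktemp -d)"
git init -q "$repo"
cd "$repo"
git config user.name "Demo User"          # placeholder identity for the scratch repo
git config user.email "demo@example.com"

git commit -q --allow-empty -m "Initial commit"
git branch -M main
git branch develop
git branch feature/done                    # already contained in develop
git checkout -q -b feature/wip
git commit -q --allow-empty -m "Add: unfinished work"
git checkout -q develop

# List branches fully merged into the given branch, excluding main and itself.
merged_branches() {
    git branch --merged "$1" --format='%(refname:short)' \
        | grep -v -x -e main -e "$1" || true
}

echo "Safe to delete:"
merged_branches develop
```

Each listed branch can then be removed with git branch -d, which refuses to delete anything that is not actually merged.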

Pro Tip: Always pull before you push, and commit often with meaningful messages!
