0% found this document useful (0 votes)
230 views22 pages

4-Channel YOLO Training Guide For RGB+IR Drone Detection

This document provides a comprehensive guide for training a 4-channel YOLO model for drone detection using RGB and infrared (IR) images. It covers environment setup, dataset preparation, modifications to the YOLO architecture for 4-channel input, and a custom training loop. The guide also includes code snippets for implementing the necessary classes and functions to facilitate the training process.

Uploaded by

Gamerithms
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
230 views22 pages

4-Channel YOLO Training Guide For RGB+IR Drone Detection

This document provides a comprehensive guide for training a 4-channel YOLO model for drone detection using RGB and infrared (IR) images. It covers environment setup, dataset preparation, modifications to the YOLO architecture for 4-channel input, and a custom training loop. The guide also includes code snippets for implementing the necessary classes and functions to facilitate the training process.

Uploaded by

Gamerithms
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 22

4-Channel YOLO Training Guide for RGB+IR Drone Detection

Step 1: Environment Setup

python

# Install required packages


!pip install ultralytics
!pip install opencv-python
!pip install pillow
!pip install matplotlib
!pip install torch torchvision

# Import necessary libraries


import torch
import torch.nn as nn
import cv2
import numpy as np
import os
from pathlib import Path
import yaml
from ultralytics import YOLO
from ultralytics.models.yolo.detect import DetectionTrainer
from ultralytics.nn.modules import Conv, C2f, SPPF, Detect
import matplotlib.pyplot as plt

Step 2: Dataset Preparation

2.1 Dataset Structure


Organize your dataset as follows:
dataset/
├── images/
│ ├── train/
│ │ ├── rgb_image1.jpg
│ │ ├── ir_image1.jpg
│ │ └── ...
│ └── val/
│ ├── rgb_image1.jpg
│ ├── ir_image1.jpg
│ └── ...
└── labels/
├── train/
│ ├── rgb_image1.txt
│ └── ...
└── val/
├── rgb_image1.txt
└── ...

2.2 Create 4-Channel Dataset Class


python
import torch
from torch.utils.data import Dataset
from PIL import Image
import numpy as np

class RGBIRDataset(Dataset):
def __init__(self, rgb_dir, ir_dir, label_dir, img_size=640, augment=False):
self.rgb_dir = Path(rgb_dir)
self.ir_dir = Path(ir_dir)
self.label_dir = Path(label_dir)
self.img_size = img_size
self.augment = augment

# Get list of RGB images (assuming RGB and IR have same names)
self.rgb_files = list(self.rgb_dir.glob('*.jpg')) + list(self.rgb_dir.glob('*.png'))

def __len__(self):
return len(self.rgb_files)

def __getitem__(self, idx):


# Load RGB image
rgb_path = self.rgb_files[idx]
rgb_img = cv2.imread(str(rgb_path))
rgb_img = cv2.cvtColor(rgb_img, cv2.COLOR_BGR2RGB)

# Load corresponding IR image


ir_path = self.ir_dir / rgb_path.name
ir_img = cv2.imread(str(ir_path), cv2.IMREAD_GRAYSCALE)

# Resize images
rgb_img = cv2.resize(rgb_img, (self.img_size, self.img_size))
ir_img = cv2.resize(ir_img, (self.img_size, self.img_size))

# Combine RGB + IR to create 4-channel image


# Shape: (height, width, 4) -> (4, height, width)
combined_img = np.dstack((rgb_img, ir_img)) # (H, W, 4)
combined_img = np.transpose(combined_img, (2, 0, 1)) # (4, H, W)
combined_img = combined_img.astype(np.float32) / 255.0

# Load labels
label_path = self.label_dir / (rgb_path.stem + '.txt')
labels = []
if label_path.exists():
with open(label_path, 'r') as f:
for line in f.readlines():
labels.append(list(map(float, line.strip().split())))
return torch.from_numpy(combined_img), torch.tensor(labels)

# Create dataset yaml configuration


dataset_config = {
'path': '/content/dataset',
'train': 'images/train',
'val': 'images/val',
'nc': 1, # number of classes (adjust based on your classes)
'names': ['drone'] # class names (adjust based on your classes)
}

with open('/content/dataset.yaml', 'w') as f:


yaml.dump(dataset_config, f)

Step 3: Modify YOLO Architecture for 4-Channel Input

3.1 Custom Conv Layer for 4-Channel Input

python

class Conv4Channel(nn.Module):
"""Modified Conv layer to handle 4-channel input"""
def __init__(self, c1, c2, k=1, s=1, p=None, g=1, d=1, act=True):
super().__init__()
self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p, d), groups=g, dilation=d, bias=False)
self.bn = nn.BatchNorm2d(c2)
self.act = nn.SiLU() if act is True else (act if isinstance(act, nn.Module) else nn.Ide

def forward(self, x):


return self.act(self.bn(self.conv(x)))

def autopad(k, p=None, d=1):


"""Auto-pad to 'same' shape outputs."""
if d > 1:
k = d * (k - 1) + 1 if isinstance(k, int) else [d * (x - 1) + 1 for x in k]
if p is None:
p = k // 2 if isinstance(k, int) else [x // 2 for x in k]
return p

 

3.2 Custom YOLO Model Class


python

from ultralytics.nn.modules import Conv, C2f, SPPF, Detect


import torch.nn as nn

class YOLO4Channel(nn.Module):
def __init__(self, cfg='yolov8n.yaml', ch=4, nc=1):
super().__init__()

# Load base YOLOv8 model


base_model = YOLO('yolov8n.pt')
self.model = base_model.model

# Modify first conv layer to accept 4 channels


old_conv = self.model.model[0].conv
self.model.model[0].conv = nn.Conv2d(
in_channels=4, # Change from 3 to 4
out_channels=old_conv.out_channels,
kernel_size=old_conv.kernel_size,
stride=old_conv.stride,
padding=old_conv.padding,
bias=False
)

# Copy weights from RGB channels and initialize IR channel


with torch.no_grad():
# Copy RGB weights
self.model.model[0].conv.weight[:, :3, :, :] = old_conv.weight
# Initialize IR channel weights (average of RGB channels)
self.model.model[0].conv.weight[:, 3:4, :, :] = old_conv.weight.mean(dim=1, keepdim

def forward(self, x):


return self.model(x)

 

Step 4: Custom Trainer Class


python

from ultralytics.models.yolo.detect import DetectionTrainer


from ultralytics.data.build import build_dataloader
from ultralytics.utils import LOGGER

class RGBIR_Trainer(DetectionTrainer):
def build_dataset(self, img_path, mode="train", batch=None):
"""Build custom dataset for RGB+IR training"""
gs = max(int(self.model.stride.max() if self.model else 0), 32)
return RGBIRDataset(
rgb_dir=f"{self.data['path']}/images/{mode}",
ir_dir=f"{self.data['path']}/images/{mode}", # Assuming IR images are in same fold
label_dir=f"{self.data['path']}/labels/{mode}",
img_size=self.args.imgsz,
augment=mode == "train"
)

def get_dataloader(self, dataset_path, batch_size=16, rank=0, mode="train"):


"""Create custom dataloader"""
dataset = self.build_dataset(dataset_path, mode)
return torch.utils.data.DataLoader(
dataset,
batch_size=batch_size,
shuffle=mode == "train",
num_workers=2,
collate_fn=self.collate_fn
)

@staticmethod
def collate_fn(batch):
"""Custom collate function for batch processing"""
imgs = []
labels = []
for img, label in batch:
imgs.append(img)
if len(label) > 0:
labels.append(label)

imgs = torch.stack(imgs, 0)
return imgs, labels

 

Step 5: Training Setup and Execution

5.1 Fixed Training Approach


python

# First, let's create a working directory structure


import os
os.makedirs('/content/dataset/images/train', exist_ok=True)
os.makedirs('/content/dataset/images/val', exist_ok=True)
os.makedirs('/content/dataset/labels/train', exist_ok=True)
os.makedirs('/content/dataset/labels/val', exist_ok=True)

# Create a simplified 4-channel model approach


def create_4channel_yolo_model():
"""Create YOLO model modified for 4-channel input"""
# Load base model
model = YOLO('yolov8n.pt')

# Access the actual PyTorch model


pytorch_model = model.model

# Get the first convolutional layer


first_conv = pytorch_model[0].conv

# Create new conv layer with 4 input channels


new_conv = nn.Conv2d(
in_channels=4,
out_channels=first_conv.out_channels,
kernel_size=first_conv.kernel_size,
stride=first_conv.stride,
padding=first_conv.padding,
bias=False
)

# Initialize weights
with torch.no_grad():
# Copy RGB weights (first 3 channels)
new_conv.weight[:, :3, :, :] = first_conv.weight.clone()
# Initialize IR channel (4th channel) as average of RGB channels
new_conv.weight[:, 3:4, :, :] = first_conv.weight.mean(dim=1, keepdim=True)

# Replace the first conv layer


pytorch_model[0].conv = new_conv

return model

# Create the modified model


model = create_4channel_yolo_model()
print("4-channel YOLO model created successfully!")
5.2 Custom Training Loop (Recommended Approach)
python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from pathlib import Path
import numpy as np

# Custom dataset class for RGB+IR


class RGBIRDataset(torch.utils.data.Dataset):
def __init__(self, images_dir, labels_dir, img_size=640):
self.images_dir = Path(images_dir)
self.labels_dir = Path(labels_dir)
self.img_size = img_size

# Find all RGB images (assuming naming convention: *_rgb.jpg and *_ir.jpg)
self.image_files = []
for rgb_file in self.images_dir.glob('*_rgb.*'):
ir_file = self.images_dir / rgb_file.name.replace('_rgb', '_ir')
if ir_file.exists():
self.image_files.append((rgb_file, ir_file))

print(f"Found {len(self.image_files)} RGB-IR image pairs")

def __len__(self):
return len(self.image_files)

def __getitem__(self, idx):


rgb_path, ir_path = self.image_files[idx]

# Load RGB image


rgb_img = cv2.imread(str(rgb_path))
rgb_img = cv2.cvtColor(rgb_img, cv2.COLOR_BGR2RGB)
rgb_img = cv2.resize(rgb_img, (self.img_size, self.img_size))

# Load IR image
ir_img = cv2.imread(str(ir_path), cv2.IMREAD_GRAYSCALE)
ir_img = cv2.resize(ir_img, (self.img_size, self.img_size))

# Combine RGB + IR
combined = np.dstack([rgb_img, ir_img]) # Shape: (H, W, 4)
combined = combined.transpose(2, 0, 1) # Shape: (4, H, W)
combined = combined.astype(np.float32) / 255.0

# Load labels
label_path = self.labels_dir / (rgb_path.stem.replace('_rgb', '') + '.txt')
labels = []
if label_path.exists():
with open(label_path, 'r') as f:
for line in f.readlines():
if line.strip():
labels.append([float(x) for x in line.strip().split()])

return torch.from_numpy(combined), np.array(labels) if labels else np.array([])

# Training function
def train_4channel_yolo(model, train_dataset, val_dataset, epochs=50, batch_size=8):
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Training on device: {device}")

# Move model to device


model.model.to(device)
model.model.train()

# Create data loaders


def collate_fn(batch):
imgs = torch.stack([item[0] for item in batch])
labels = [item[1] for item in batch]
return imgs, labels

train_loader = DataLoader(
train_dataset,
batch_size=batch_size,
shuffle=True,
collate_fn=collate_fn,
num_workers=2
)

val_loader = DataLoader(
val_dataset,
batch_size=batch_size,
shuffle=False,
collate_fn=collate_fn,
num_workers=2
)

# Optimizer
optimizer = torch.optim.AdamW(model.model.parameters(), lr=0.001, weight_decay=0.0005)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)

# Training loop
for epoch in range(epochs):
model.model.train()
total_loss = 0
num_batches = 0
for batch_idx, (images, targets) in enumerate(train_loader):
images = images.to(device)

optimizer.zero_grad()

# Forward pass
try:
# For YOLO models, we need to handle the loss calculation differently
# This is a simplified approach - you might need to adapt based on your specifi
outputs = model.model(images)

# Simple loss calculation (you'll need to implement proper YOLO loss)


# For now, we'll use a placeholder
loss = torch.tensor(0.5, requires_grad=True).to(device) # Placeholder

loss.backward()
optimizer.step()

total_loss += loss.item()
num_batches += 1

if batch_idx % 10 == 0:
print(f'Epoch {epoch+1}/{epochs}, Batch {batch_idx}/{len(train_loader)}, Lo

except Exception as e:
print(f"Error in batch {batch_idx}: {e}")
continue

scheduler.step()
avg_loss = total_loss / max(num_batches, 1)
print(f'Epoch {epoch+1}/{epochs} completed. Average Loss: {avg_loss:.4f}')

# Save checkpoint
if (epoch + 1) % 10 == 0:
checkpoint_path = f'/content/checkpoint_epoch_{epoch+1}.pt'
torch.save(model.model.state_dict(), checkpoint_path)
print(f'Checkpoint saved: {checkpoint_path}')

# Example usage
print("Setting up datasets...")
train_dataset = RGBIRDataset('/content/dataset/images/train', '/content/dataset/labels/train')
val_dataset = RGBIRDataset('/content/dataset/images/val', '/content/dataset/labels/val')

# Start training
if len(train_dataset) > 0:
print("Starting training...")
train_4channel_yolo(model, train_dataset, val_dataset, epochs=50, batch_size=4)
else:
print("No training data found. Please check your dataset structure.")

5.3 Alternative: Using Ultralytics with Custom Preprocessing


python
# If you want to use the built-in Ultralytics training with some modifications
import yaml
from ultralytics import YOLO
import torch.nn as nn

def setup_ultralytics_training():
"""Setup training using Ultralytics framework with modifications"""

# Create dataset configuration


dataset_config = {
'path': '/content/dataset',
'train': 'images/train',
'val': 'images/val',
'nc': 1, # Number of classes
'names': ['drone'] # Class names
}

with open('/content/dataset.yaml', 'w') as f:


yaml.dump(dataset_config, f)

# Load and modify model


model = YOLO('yolov8n.pt')

# Modify first layer for 4 channels


first_conv = model.model[0].conv
new_conv = nn.Conv2d(4, first_conv.out_channels,
first_conv.kernel_size, first_conv.stride,
first_conv.padding, bias=False)

with torch.no_grad():
new_conv.weight[:, :3] = first_conv.weight
new_conv.weight[:, 3:4] = first_conv.weight.mean(dim=1, keepdim=True)

model.model[0].conv = new_conv

return model

# For this approach, you'll need to preprocess your data to 4-channel format first
def preprocess_dataset_to_4channel():
"""Convert RGB+IR pairs to 4-channel images"""
import glob

train_rgb_files = glob.glob('/content/dataset/images/train/*_rgb.*')
val_rgb_files = glob.glob('/content/dataset/images/val/*_rgb.*')

def process_files(rgb_files, output_dir):


os.makedirs(output_dir, exist_ok=True)

for rgb_path in rgb_files:


ir_path = rgb_path.replace('_rgb', '_ir')

if os.path.exists(ir_path):
# Load images
rgb = cv2.imread(rgb_path)
rgb = cv2.cvtColor(rgb, cv2.COLOR_BGR2RGB)
ir = cv2.imread(ir_path, cv2.IMREAD_GRAYSCALE)

# Resize to same size


rgb = cv2.resize(rgb, (640, 640))
ir = cv2.resize(ir, (640, 640))

# Combine to 4-channel
combined = np.dstack([rgb, ir])

# Save as numpy array


base_name = os.path.basename(rgb_path).replace('_rgb', '')
output_file = os.path.join(output_dir, base_name.replace('.jpg', '.npy'))
np.save(output_file, combined)

process_files(train_rgb_files, '/content/dataset/processed/train')
process_files(val_rgb_files, '/content/dataset/processed/val')

print("Dataset preprocessing completed!")

# If using this approach:


# preprocess_dataset_to_4channel()
# model = setup_ultralytics_training()
# model.train(data='/content/dataset.yaml', epochs=50, imgsz=640, batch=4)

Step 6: Quick Test and Validation


python

def test_4channel_model():
"""Test if the 4-channel model works correctly"""

# Create dummy 4-channel input


dummy_input = torch.randn(1, 4, 640, 640) # Batch=1, Channels=4, H=640, W=640

# Test model
model = create_4channel_yolo_model()
model.model.eval()

try:
with torch.no_grad():
output = model.model(dummy_input)
print("✅ 4-channel model test passed!")
print(f"Input shape: {dummy_input.shape}")
print(f"Output type: {type(output)}")
return True
except Exception as e:
print(f"❌ Model test failed: {e}")
return False

# Run test
test_4channel_model()

Step 6: Data Preprocessing Helper Functions


python
def preprocess_rgbir_image(rgb_path, ir_path, size=640):
"""Preprocess RGB and IR images for training"""
# Load RGB image
rgb = cv2.imread(rgb_path)
rgb = cv2.cvtColor(rgb, cv2.COLOR_BGR2RGB)
rgb = cv2.resize(rgb, (size, size))

# Load IR image
ir = cv2.imread(ir_path, cv2.IMREAD_GRAYSCALE)
ir = cv2.resize(ir, (size, size))
ir = np.expand_dims(ir, axis=2) # Add channel dimension

# Combine
combined = np.concatenate([rgb, ir], axis=2) # Shape: (640, 640, 4)
combined = combined.astype(np.float32) / 255.0
combined = np.transpose(combined, (2, 0, 1)) # Shape: (4, 640, 640)

return combined

def visualize_4channel_data(rgb_path, ir_path):


"""Visualize RGB and IR data side by side"""
fig, axes = plt.subplots(1, 3, figsize=(15, 5))

# RGB image
rgb = cv2.imread(rgb_path)
rgb = cv2.cvtColor(rgb, cv2.COLOR_BGR2RGB)
axes[0].imshow(rgb)
axes[0].set_title('RGB Image')
axes[0].axis('off')

# IR image
ir = cv2.imread(ir_path, cv2.IMREAD_GRAYSCALE)
axes[1].imshow(ir, cmap='hot')
axes[1].set_title('IR Image')
axes[1].axis('off')

# Combined visualization (RGB with IR overlay)


combined = rgb.copy()
ir_colored = cv2.applyColorMap(ir, cv2.COLORMAP_HOT)
ir_colored = cv2.cvtColor(ir_colored, cv2.COLOR_BGR2RGB)
overlay = cv2.addWeighted(combined, 0.7, ir_colored, 0.3, 0)
axes[2].imshow(overlay)
axes[2].set_title('RGB + IR Overlay')
axes[2].axis('off')
plt.tight_layout()
plt.show()

Step 7: Model Evaluation and Inference

python

def evaluate_model(model, test_loader):


"""Evaluate the trained model"""
model.eval()
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

with torch.no_grad():
for batch_idx, (data, targets) in enumerate(test_loader):
data = data.to(device)
outputs = model(data)

# Process outputs for visualization


# Implementation depends on your specific needs

def inference_4channel(model, rgb_path, ir_path):


"""Run inference on RGB+IR image pair"""
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)
model.eval()

# Preprocess images
combined_img = preprocess_rgbir_image(rgb_path, ir_path)
combined_img = torch.from_numpy(combined_img).unsqueeze(0).to(device)

# Run inference
with torch.no_grad():
results = model(combined_img)

return results

Step 8: Training Execution Commands


python

# Mount Google Drive (if using Drive for data storage)


from google.colab import drive
drive.mount('/content/drive')

# Copy dataset from Drive (adjust path as needed)


!cp -r "/content/drive/MyDrive/drone_dataset" "/content/dataset"

# Create the model and start training


model = create_4channel_model()

# If using the ultralytics approach:


# model.train(
# data='/content/dataset.yaml',
# epochs=100,
# imgsz=640,
# batch=16,
# device=0 if torch.cuda.is_available() else 'cpu'
# )

print("Training completed!")

Important Notes:
1. Data Format: Ensure RGB and IR images have the same filename structure
2. Memory Management: 4-channel processing requires more GPU memory

3. Batch Size: Reduce batch size if you encounter OOM errors


4. Preprocessing: Normalize IR channel appropriately (0-255 or 0-1)

5. Loss Function: You may need to implement custom loss calculation for the training loop
6. Validation: Implement proper validation metrics for drone detection

Troubleshooting:
OOM Errors: Reduce batch size and image size

Slow Training: Use mixed precision training with torch.cuda.amp

Poor Convergence: Adjust learning rate and add data augmentation

Channel Mismatch: Ensure all preprocessing maintains 4-channel format

You might also like

pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy