
Intelligent Object Sorting using a Deep Reinforcement Learning Robot & Computer Vision

This repository holds the project files for the 'Practical Course Robotics: WS21-22' presented at Universität Stuttgart.

* The idea is to use a deep reinforcement learning (DRL) algorithm for robot object tending.
* As a proof of concept, DRL algorithms are benchmarked on OpenAI Gym's 'FetchReach-v1' environment.
* DDPG is the best agent against PPO & TD3, using training rewards as the metric.
* A new 'gym'-wrapped 'rai' environment (env.) is designed using 'SolidWorks'.
* As solving the env. directly takes >4M episodes, the task is broken into parts to solve it faster.
* Wrapped functions are used to solve these sub-tasks.
* One of these functions moves the robot point-to-point using the trained agent.
* A camera is used to build an object-tending strategy that maps each coloured object to its matching coloured bin.
* This strategy is then executed in the env. to tend the objects using the robot.

Proof of Concept

  1. OpenAI Gym Environments,

    • 'FetchReach-v1': The best agent is DDPG.

    • The DDPG agent is benchmarked against the PPO and TD3 agents on training rewards.

Repository Setup Instructions

  1. Clone & build rai from GitHub following its installation instructions.

  2. Clone this repository.

    git clone --recursive https://github.com/KanishkNavale/robotics-lab-project
  3. Add these lines to the .bashrc file

    # Misc. Alias
    alias python='python3'
    alias pip='pip3'
    
    # RAI Paths
    export PATH="$HOME/rai/bin:$PATH"
    export PYTHONPATH="${PYTHONPATH}:/usr/local/lib/rai"
    
    # Practical Robotics Lab Project Package
    export PYTHONPATH="${PYTHONPATH}:$HOME/robotics-lab-project/"
  4. Source the modified .bashrc file

    source ~/.bashrc
  5. Install the Python package prerequisites

    cd $HOME/robotics-lab-project
    pip install -r requirements.txt

1. Engineering the Deep Deterministic Policy Gradient (DDPG) Algorithm

About: The Deep Deterministic Policy Gradient (DDPG) agent is an off-policy algorithm and can be thought of as DQN for continuous action spaces. It learns a policy (the actor) and a Q-function (the critic). The policy is deterministic, and its parameters are updated by applying the chain rule to the learnt Q-function (expected reward). The Q-function is updated based on the Bellman equation, as in Q-learning. (Source & Further Reading)
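A minimal PyTorch-style sketch of the two updates described above, assuming hypothetical actor/critic networks, target copies, optimizers, and a sampled mini-batch (these names are illustrative, not the repository's API):

    import torch
    import torch.nn.functional as F

    def ddpg_update(actor, critic, target_actor, target_critic,
                    actor_opt, critic_opt, batch, gamma=0.99, tau=0.005):
        state, action, reward, next_state, done = batch

        # Critic: regress Q(s, a) onto the Bellman target, as in Q-learning.
        with torch.no_grad():
            target_q = reward + gamma * (1.0 - done) * target_critic(
                next_state, target_actor(next_state))
        critic_loss = F.mse_loss(critic(state, action), target_q)
        critic_opt.zero_grad()
        critic_loss.backward()
        critic_opt.step()

        # Actor: ascend the learnt Q-function via the chain rule
        # (deterministic policy gradient).
        actor_loss = -critic(state, actor(state)).mean()
        actor_opt.zero_grad()
        actor_loss.backward()
        actor_opt.step()

        # Polyak-average the target networks towards the online networks.
        for net, tgt in ((actor, target_actor), (critic, target_critic)):
            for p, tp in zip(net.parameters(), tgt.parameters()):
                tp.data.mul_(1.0 - tau).add_(tau * p.data)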

Figures: Vanilla DDPG Agent | DDPG Agent + Parametric Exploration Noise + PER

2. Outcomes: Using Prioritized Experience Replay Buffer + Parametric Exploration Noise

  • Parameter space noise allows reinforcement learning algorithms to explore by perturbing parameters instead of actions, often leading to significantly improved exploration performance. (Source)

  • Prioritized Experience Replay (PER) is a type of experience replay in reinforcement learning in which transitions with high expected learning progress, as measured by the magnitude of their temporal-difference (TD) error, are replayed and learnt from more frequently. (Source) A minimal sketch of both ideas follows below.

Figures: Without Parametric Noise | Overview | With PER + Parametric Noise
  • Result: The DDPG Agent is 5 times better (metric: training rewards) with PER & Parametric Exploration.
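A minimal sketch of both ideas under stated assumptions (the function names are illustrative, and PyTorch is assumed for the parameter-noise part; this is not the repository's implementation):

    import numpy as np
    import torch

    # PER: replay probability grows with the magnitude of the TD error.
    def per_probabilities(td_errors, alpha=0.6, eps=1e-6):
        priorities = (np.abs(np.asarray(td_errors)) + eps) ** alpha
        return priorities / priorities.sum()

    def sample_indices(td_errors, batch_size=64):
        probs = per_probabilities(td_errors)
        return np.random.choice(len(probs), size=batch_size, p=probs)

    # Parameter space noise: perturb a copy of the actor's weights (e.g.
    # once per episode) and act with the perturbed copy, instead of adding
    # noise to every action.
    @torch.no_grad()
    def perturb_actor(actor, noisy_actor, stddev=0.1):
        for p, q in zip(actor.parameters(), noisy_actor.parameters()):
            q.copy_(p + stddev * torch.randn_like(p))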

3. Training DDPG Agent for Point-to-Point Robot Trajectory

Figures: Training Profile | Testing Profile
  • The objective is to reach a random target position using the DDPG agent.
  • For each play step in a game (sketched after this list),
    • Build: state = Current Robot TCP(x, y, z) | Target Location P(x, y, z)
    • Compute: action = actor.choose_noisy_action(state)
    • Get: next_state, reward, done = env.step(action)
  • The DDPG agent is optimized to maximize the reward for each play step over the games.
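A hedged sketch of that loop, where 'env' and 'agent' stand in for the repository's rai environment and DDPG agent (assumed names and methods, mirroring the bullets above):

    for game in range(1000):
        state = env.reset()              # TCP(x, y, z) | target P(x, y, z)
        done = False
        while not done:
            action = agent.choose_noisy_action(state)
            next_state, reward, done = env.step(action)
            agent.remember(state, action, reward, next_state, done)
            agent.optimize()             # maximize reward over play steps
            state = next_state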

4. Vision-based Pose Detection

  • Object pose is computed by processing point cloud and RGB data, as sketched below.
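A minimal sketch of such a pipeline, assuming an RGB image aligned with a per-pixel point cloud (H x W x 3); the HSV thresholds and the function name are assumptions, not the repository's code:

    import cv2
    import numpy as np

    def detect_object_pose(rgb, point_cloud, lower_hsv, upper_hsv):
        # Segment the coloured object with an HSV threshold.
        hsv = cv2.cvtColor(rgb, cv2.COLOR_BGR2HSV)
        mask = cv2.inRange(hsv, lower_hsv, upper_hsv)

        # Pixel centroid (u, v) of the colour blob via image moments.
        m = cv2.moments(mask)
        if m["m00"] == 0:
            return None                  # colour not found
        u = int(m["m10"] / m["m00"])
        v = int(m["m01"] / m["m00"])

        # Look up the matching point-cloud entry for world coordinates.
        x, y, z = point_cloud[v, u]
        return (u, v), (x, y, z)

    # e.g. a rough HSV range for blue:
    # detect_object_pose(rgb, pc, np.array([100, 150, 50]), np.array([130, 255, 255]))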

5. Logging the Process Data

  • The object data is saved in .json format, along with the processed image.

    [
        {
            "Object ID": 0,
            "Camera Coordinates [u, v]": [
                320,
                169
            ],
            "World Coordinates [x, y, z]": [
                -0.0022170670613970446,
                -0.00854486748731096,
                1.0097603467432426
            ],
            "Color": "red"
        },
        {
            "Object ID": 1,
            "Camera Coordinates [u, v]": [
                306,
                179
            ],
            "World Coordinates [x, y, z]": [
                0.04528890767445167,
                0.02470116320227714,
                1.0080491988625047
            ],
            "Color": "blue"
        }
    ]
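A minimal sketch of how such records could be written, assuming hypothetical detection tuples of pixel centroid, world coordinates, and colour:

    import json

    def log_objects(detections, path="objects.json"):
        records = [
            {
                "Object ID": i,
                "Camera Coordinates [u, v]": [int(u), int(v)],
                "World Coordinates [x, y, z]": [float(x), float(y), float(z)],
                "Color": color,
            }
            for i, ((u, v), (x, y, z), color) in enumerate(detections)
        ]
        with open(path, "w") as f:
            json.dump(records, f, indent=4)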

6. Object Sorting Process

  • The processed data is dumped in the folder and used to tend each object to its colour-matched bin (a minimal sketch follows).
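A hedged sketch of that sorting loop; the bin poses, the 'robot' object, and its methods are all assumptions, not the repository's actual API:

    import json

    BIN_POSITIONS = {"red": (0.4, 0.3, 1.0), "blue": (0.4, -0.3, 1.0)}  # placeholder poses

    with open("objects.json") as f:
        detections = json.load(f)

    for obj in detections:
        pick = obj["World Coordinates [x, y, z]"]
        place = BIN_POSITIONS[obj["Color"]]
        robot.move_to(pick)    # point-to-point move via the trained DDPG agent
        robot.grasp()
        robot.move_to(place)
        robot.release()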

7. Running the App

cd main
python main.py

Developers
