
Lane Marking Segmentation

Summer 2019 Internship Project | Kevin Tan & Julian Jiang


Outline

● Project Description

● Background Research

● Experiments

● Results

● Conclusion

● Internship Highlights
Project Description
Existing Problem

● Currently, the training data for our lane detection model is very archaic, consisting only of points that make up a polyline.
● This kind of training data is not semantically rich and thus limits the theoretical capabilities of our lane detection model:
a. Dashed lanes are labeled in the same way as solid lanes.
b. The individual dashes that make up a dashed lane are not extracted.
c. Important, non-lane-line road markings like crosswalks and arrows are completely ignored.
Case Study 1

Aforementioned problems:

1. Dashed lanes are labeled in the same way as solid lanes.
2. The individual dashes that make up a dashed lane are not extracted.
3. Important, non-lane-line road markings are completely ignored.

Examples of our lane detection model's output (problem 3 highlighted)


Case Study 2

Aforementioned problems:

1. Dashed lanes are labeled in the same way as solid lanes.
2. The individual dashes that make up a dashed lane are not extracted.
3. Important, non-lane-line road markings are completely ignored.

Examples of our lane detection model's output (problems 1 & 2 highlighted)


Task Description

I was tasked with designing and building a semi-automatic lane-marking labeling tool to help overcome the deficiencies in our current lane detection model by improving the quality of our training data.

● We would be able to perceive important road markings, like this merge arrow.
● We wouldn't misclassify this as solid when it is clearly made up of 2 parts.
● We would have a notion of thickness to better gauge perspective.

Examples of how our lane detection model will improve with a new type of data
Background Research
Region Grow

This technique is the simplest and most naive image segmentation method. It only works when the foreground object's pixel values are close to each other and, as a collection, far from the background pixels' values, for some measure of pixel-value distance.
Naive Region Grow

● Pick a threshold T and a seed point S.
● Create a queue Q and a set A, each initially containing S.
● While Q is not empty:
○ Dequeue item C from Q.
○ For all the neighbors D of C, if D is in the image bounds, D is not already in A, and I(D) is within T of I(S):
■ Add D to A and enqueue D into Q.
○ Otherwise do nothing.
● At the end of this process, the set A contains all the foreground object's pixel locations.
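A minimal Python sketch of this procedure, assuming a 2-D grayscale numpy image and 4-connectivity (both choices are illustrative):

```python
from collections import deque

import numpy as np

def region_grow(image, seed, threshold):
    """Naive region grow: collect all pixels reachable from `seed`
    whose intensity is within `threshold` of the seed's intensity."""
    h, w = image.shape
    seed_value = int(image[seed])
    in_region = np.zeros((h, w), dtype=bool)  # the set A
    in_region[seed] = True
    queue = deque([seed])                     # the queue Q
    while queue:
        r, c = queue.popleft()
        # 4-connected neighbors of the current pixel C
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < h and 0 <= nc < w and not in_region[nr, nc]
                    and abs(int(image[nr, nc]) - seed_value) <= threshold):
                in_region[nr, nc] = True
                queue.append((nr, nc))
    return in_region  # boolean mask of the foreground object's pixels
```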
Graphs

Recall that a graph is a mathematical structure that encodes the information content of a collection of interrelated objects. The simplest kind of graph is an unweighted, undirected graph; let G be this kind of graph.

Then we say that G = (V, E) where:

● V is the vertex set (e.g. {a, b, c})
● E is the edge set (e.g. {{a, b}, {b, c}, {c, a}})
Direction and Weights

Graphs can also be directed and weighted; each of these two properties will change a graph G's mathematical representation.

Directed: G = (V, E), except E is now a set of tuples instead of sets.

Weighted: G = (V, E, w) with a weight function w: E -> R
Graph Cuts

No matter what kind of graph you have, there's a notion of a cut, or binary partition, of your graph.
Cut Values

For each graph cut, you separate the vertices of your graph into two disjoint sets. There is a notion of the value of a cut: for unweighted graphs this is just the number of edges cut, and for weighted graphs it is the sum of the weights of the cut edges.
Minimum Cuts

It's sometimes of practical importance to compute the minimum-value cut in a graph (e.g. for segmentation). There are a number of efficient algorithms to compute a minimum cut given a graph.
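For example, the Stoer-Wagner algorithm computes a global minimum cut of a weighted, undirected graph. A minimal sketch using networkx (the library choice and toy graph are assumptions, not part of the project):

```python
import networkx as nx

# A tiny weighted, undirected graph on V = {a, b, c, d}
G = nx.Graph()
G.add_edge("a", "b", weight=3)
G.add_edge("a", "c", weight=1)
G.add_edge("b", "c", weight=1)
G.add_edge("c", "d", weight=3)

cut_value, (side1, side2) = nx.stoer_wagner(G)
# cut_value == 2: separating {a, b} from {c, d} cuts only
# the two weight-1 edges, the cheapest possible partition.
print(cut_value, side1, side2)
```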
Graph Cuts in Image Segmentation I

1. A graph is constructed from the image, where every pixel has a corresponding graph node and is connected to its 4 or 8 cardinal neighbors. The weights are determined in such a way that pixels that should be grouped together have high edge weights and pixels that should not be together have low edge weights.
2. A minimum cut is computed to extract a foreground object.
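A sketch of step 1, assuming a grayscale image, 4-connectivity, a networkx graph, and a Gaussian intensity-similarity weight (a common choice, but not the only one; sigma is a free parameter):

```python
import networkx as nx
import numpy as np

def pixel_graph(image, sigma=10.0):
    """Build a 4-connected graph over a 2-D grayscale image. Similar
    pixels get high edge weights; dissimilar pixels get low weights,
    so a minimum cut prefers boundaries between dissimilar regions."""
    h, w = image.shape
    G = nx.Graph()
    for r in range(h):
        for c in range(w):
            # Connect each pixel to its right and down neighbors.
            for nr, nc in ((r, c + 1), (r + 1, c)):
                if nr < h and nc < w:
                    diff = float(image[r, c]) - float(image[nr, nc])
                    G.add_edge((r, c), (nr, nc),
                               weight=float(np.exp(-diff**2 / (2 * sigma**2))))
    return G
```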
Graph Cuts in Image Segmentation II

Here are some examples of how we can determine the weights that connect graph nodes. We can operate on a number of different features like distance, intensity, color, and texture.
Adding in User Input

Because we have a human in the loop during annotation, we don’t want a static
algorithm. Instead, we want something that can take in user input to change the
extraction result. Our previous formulation doesn’t really allow for that.

The solution is to use an s-t graph.


s-t (source-sink) graph formulation of images + graph cuts
Behind the Scenes: User Interaction

1. In order to change the graph cut result, we need to change the weights connecting the nodes of our graph.
2. Thus, user interaction must be changing the weights.
3. Green lines connect nodes more strongly to the source; they increase the source-side weight values.
4. Red lines connect nodes more strongly to the sink; they increase the sink-side weight values.

The remaining weights are determined from the previous methods (location, color similarity, etc.).
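A sketch of how the scribbles could translate into terminal edges, reusing the pixel graph above (the function names and the "very large" seed weight are illustrative; production tools often use dedicated max-flow libraries instead of networkx):

```python
import networkx as nx

BIG = 1e9  # effectively infinite: seeded pixels must not switch sides

def add_user_seeds(G, fg_pixels, bg_pixels):
    """Attach source/sink terminals to a pixel graph whose edges carry
    a 'weight' attribute. Green scribbles bind pixels to the source
    (foreground); red scribbles bind pixels to the sink (background)."""
    for p in fg_pixels:
        G.add_edge("s", p, weight=BIG)  # green line
    for p in bg_pixels:
        G.add_edge(p, "t", weight=BIG)  # red line
    return G

# The s-t minimum cut then yields the foreground/background split:
# cut_value, (fg_side, bg_side) = nx.minimum_cut(G, "s", "t", capacity="weight")
```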
Minimizing User Input: GrabCut

As an even better improvement upon interactive graph cut segmentation, we can take just a bounding box, estimate the color distributions of the foreground and background objects, and then initialize the weights of our graph more intelligently.

This requires:

● A Markov Random Field (MRF)
● Energy functionals
● A Gaussian Mixture Model (GMM)
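OpenCV ships an implementation of GrabCut; a minimal sketch of driving it with a bounding box (the file name and rectangle are placeholders):

```python
import cv2
import numpy as np

img = cv2.imread("frame.jpg")              # placeholder input image
mask = np.zeros(img.shape[:2], np.uint8)
bgd_model = np.zeros((1, 65), np.float64)  # background GMM parameters
fgd_model = np.zeros((1, 65), np.float64)  # foreground GMM parameters
rect = (50, 50, 200, 100)                  # user bounding box: x, y, w, h

cv2.grabCut(img, mask, rect, bgd_model, fgd_model, 5, cv2.GC_INIT_WITH_RECT)

# Keep definite and probable foreground pixels as the extraction result.
fg = np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 1, 0).astype("uint8")
extracted = img * fg[:, :, np.newaxis]
```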
Limitations of Minimum Cuts in Image Seg.

The weight of any given cut in a graph is roughly proportional to the number of edges in the cut set. Thus, minimum cut techniques used for image segmentation and foreground extraction will prefer to remove as few edges as possible and thereby have a shrinking bias.

This bias means that they will tend to give us a smaller foreground object than we actually want. There are a variety of ways to combat this, including normalized cuts and alternative formulations (e.g. Markov Random Fields).
Experiments
Python Server

I didn't know JavaScript or any web programming at first, so I chose to work in Python to realize my annotation tool. However, this led to many different problems:

● At first I tried to build a server from scratch using Python's http module, and ran into a lot of issues regarding CORS and preflight requests, because the API was too low level.
● Then I rebuilt my server using Python's flask module, which solved some issues, but data transmission was still slow and painful.
● Then Jinwen told me that I had to implement an airtight RESTful API for my server that managed cached images and intermediary state.
● Then Old Huang told me that my server had to be able to respond to many different people at the same time.
● After considering my options, I chose to give up.

But not without giving it a good hard try!! I still learned a lot from the experience:

● Writing servers
● Debugging in Python
● Getting familiar with OpenCV
● Bug-free optimizations for my functions that I eventually copied into my current tool
Then I embarked on a long and hard journey to learn web development.
Version 1
Version 2
Version 3
But I held it together in the end
End Product
Results
Original 1
Original 2
Original 3
Conclusion
What has been achieved

A fully functional prototype for a fine-grain lane marking annotation tool with:

● The agency to input a rectangle and get back an extraction result
● The ability to refine the extraction result with minimal correction lines
● A smart bounding box feature for quick and accurate rectangle creation
● Ergonomic and intuitive keyboard shortcuts
● A local cache for the trichromatic annotation results that makes image switching fast and responsive
● Descriptive documentation comments for all important functions
What remains to be achieved

● An undo feature that allows users to revert mistakes!
● Enabling fully automatic rectangle generation (for use as GrabCut input) by incorporating our current lane detection model, finding the contours in the mask image [cv.findContours], and seeding the input canvas with a bunch of rectangles for the annotator to use [cv.minAreaRect] (see the sketch after this list).
● Porting the vanilla JavaScript version of the annotation tool into React before bringing it online for our annotators to use.
● Writing some graph cut based image segmentation functions better at retrieving thin and elongated objects, compiling them to WebAssembly, and adding them to the arsenal of pre-existing features.
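A sketch of the automatic rectangle generation step, assuming an OpenCV 4-style cv.findContours return value and a binary mask from the lane detection model:

```python
import cv2

def seed_rectangles(mask):
    """Turn a binary lane-detection mask into rotated rectangles that
    can seed the annotation canvas as GrabCut inputs."""
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    # Each rect is ((center_x, center_y), (width, height), angle).
    return [cv2.minAreaRect(c) for c in contours]
```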
Vision for the future

● Adding a repository of special-purpose neural networks to not only speed up lane marking annotation but also make possible efficient fine-grain labeling of other objects like pedestrians, passenger vehicles, and trucks.
● Constructing a pipeline whereby every annotation result from every annotator improves the neural networks in the central repository, so the annotation tool can "grow as it goes."
Internship Highlights

● Chilling with my mentor Julian
● Going to Taihu on our 3rd day of work!
● Seeing our trucks in action!
● Hanging out with the other interns!
● Getting hit in the face by ShengLiang even though we were on the same team!!
● Hanging out with my roommate Kaili!
● Lmao, same expression XD
