Real-Time Augmentation of A Children's Card Game
Jordan Rabet
Stanford University
jrabet@stanford.edu
Abstract—Still today, trading card games make up a sizable part of the entertainment industry. However, their tired paper form is being rapidly taken over by digital equivalents. In this paper, we propose a way to reinvigorate interest in physical trading card games by making them more interactive through real-time 3D augmentation. While our ideal application would run on an AR headset such as the HoloLens, we build a proof-of-concept application on a commercially available Android tablet which is able to augment existing Pokemon trading cards. We show that using a modern CPU in conjunction with a GPU, this task is entirely tractable on a mobile device with few restrictions.

I. INTRODUCTION

Many trading card games are based on the idea of cards representing monsters or other entities which are used to fight one another. Playing one prominent example of such a game, Yu-Gi-Oh, is often portrayed in media as being accompanied by holograms representing those monsters and their actions, which make the game more fun and engaging to play. The goal of this project is to build a mobile application which is able to augment an existing trading card game in a similar fashion. While the ideal target for such an application would be an augmented reality headset, the application was developed for a commercially available Android tablet (specifically, the Nvidia SHIELD tablet we have been provided) because of the current lack of such headsets as consumer products.

2) The scene was controlled: players had to place their cards on a paper mat supplied with the game. Additionally, the camera had to be mounted above said mat using the supplied camera stand. The camera had to be immobile during the duration of the game.
3) The game ran on the PlayStation 3, a powerful non-mobile device, more powerful than modern tablets.
4) The camera used (the PlayStation Eye) was designed for computer vision purposes, and was therefore capable of streaming uncompressed video at high framerates (up to 120 frames per second).

A more recent, similar example is "Drakerz", a game released in 2014 for the PC. While it does not require a playmat, it still requires that the camera be immobile and mounted above the playing field. It also runs only on modern computers.

B. Contributions

The main contribution of this project is building an end-to-end system which is able to detect, classify, track and augment multiple commercially available trading cards at once in real time on a mobile device. It mostly differs from previous comparable applications in that the camera need not be fixed, that the required computations were optimized for a mobile device, and that the game cards being augmented were not designed for the purpose of augmentation, making the task more challenging.
C. Card classifier
Image classification was not the focus of this project, so in the interest of time few of our resources were put into making it. As such, the chosen classifier is not particularly well adapted to the problem; it was mostly made as a proof of concept, as well as to take advantage of the geometry consistency test's result in the detector and to make the final augmenter's output look nicer without having to manually pick the types of the detected cards.

Fig. 3. Sample run of the detection pipeline. From top to bottom: original image, color-filtered image, edge map, "border map", filtered connected components (each color represents a different component).

Fig. 4. Sample product of a connected component by the edge map. Left: original edge map. Right: product of the edge map by the connected component border.
The method which was implemented is a simple version of a bag-of-words classifier. First, we extract SURF features from training images in order to cluster them into a visual vocabulary. Then, we train a 1-versus-all SVM for each of the target images based on its histogram response to the full visual vocabulary. With that done, classification can be performed by taking the rectified card candidate query image extracted by the detector, extracting SURF features from it, matching those features to the vocabulary, computing the histogram response and running it through all the SVMs. We then get a score indicating our confidence in the query image matching each of the given training images.
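For illustration, a sketch of such a bag-of-words classifier written against OpenCV 2.4-era components (SurfFeatureDetector, BOWKMeansTrainer, BOWImgDescriptorExtractor, CvSVM) could look as follows. This is not our actual implementation: the vocabulary size, the SURF Hessian threshold and identifiers such as BowCardClassifier are assumptions, and training of the per-card SVMs is omitted for brevity.

```cpp
#include <opencv2/core/core.hpp>
#include <opencv2/features2d/features2d.hpp>
#include <opencv2/nonfree/features2d.hpp>  // SURF lives in the nonfree module
#include <opencv2/ml/ml.hpp>
#include <vector>

// Illustrative bag-of-words card classifier (assumed structure, OpenCV 2.4 API).
struct BowCardClassifier {
    cv::Mat vocabulary;          // visual words (k-means cluster centers)
    std::vector<CvSVM*> svms;    // one 1-versus-all SVM per training card, assumed
                                 // trained offline on histogram responses.

    // Step 1: cluster SURF descriptors from all training images into a vocabulary.
    void buildVocabulary(const std::vector<cv::Mat>& trainImages, int vocabSize = 200) {
        cv::SurfFeatureDetector detector(400.0);      // Hessian threshold (assumed)
        cv::SurfDescriptorExtractor extractor;
        cv::BOWKMeansTrainer trainer(vocabSize);
        for (size_t i = 0; i < trainImages.size(); ++i) {
            std::vector<cv::KeyPoint> kp;
            cv::Mat desc;
            detector.detect(trainImages[i], kp);
            extractor.compute(trainImages[i], kp, desc);
            if (!desc.empty()) trainer.add(desc);
        }
        vocabulary = trainer.cluster();
    }

    // Step 2: histogram response of one image to the visual vocabulary.
    cv::Mat histogram(const cv::Mat& image) const {
        cv::SurfFeatureDetector detector(400.0);
        cv::Ptr<cv::DescriptorExtractor> extractor(new cv::SurfDescriptorExtractor());
        cv::Ptr<cv::DescriptorMatcher> matcher(new cv::FlannBasedMatcher());
        cv::BOWImgDescriptorExtractor bow(extractor, matcher);
        bow.setVocabulary(vocabulary);
        std::vector<cv::KeyPoint> kp;
        cv::Mat hist;
        detector.detect(image, kp);
        bow.compute(image, kp, hist);
        return hist;  // 1 x vocabSize row vector
    }

    // Step 3: score a rectified card candidate against every per-card SVM;
    // the caller keeps the cards whose score clears a confidence threshold.
    std::vector<float> score(const cv::Mat& queryCard) const {
        cv::Mat hist = histogram(queryCard);
        std::vector<float> scores;
        for (size_t i = 0; i < svms.size(); ++i)
            // returnDFVal = true: signed distance to the decision boundary.
            scores.push_back(svms[i]->predict(hist, true));
        return scores;
    }
};
```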
We take all n cards which have a confidence score above a certain threshold, and then run a geometry consistency test on them, by matching SURF features from the query image with those from the training images directly and computing a homography using RANSAC. Finally, the training image whose homography is computed with the highest number of inliers is chosen as the one which matches the query image. Doing this geometry consistency test is especially useful because it gives us the query image's absolute orientation, which in practice is necessary information for our pose estimator.
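The geometry consistency test can be sketched as below. Again, this is an illustration rather than our exact implementation; the Lowe ratio (0.7) and the 3-pixel RANSAC reprojection threshold are assumed values.

```cpp
#include <opencv2/core/core.hpp>
#include <opencv2/features2d/features2d.hpp>
#include <opencv2/nonfree/features2d.hpp>
#include <opencv2/calib3d/calib3d.hpp>
#include <vector>

// Match SURF features between the query card and each shortlisted training
// image, fit a homography with RANSAC, and keep the candidate with the most
// inliers. Returns the index of the best candidate, or -1 on failure.
int geometryConsistencyTest(const cv::Mat& query,
                            const std::vector<cv::Mat>& candidates,
                            cv::Mat& bestHomography) {
    cv::SURF surf(400.0);  // Hessian threshold (assumed)
    std::vector<cv::KeyPoint> qKp;
    cv::Mat qDesc;
    surf(query, cv::Mat(), qKp, qDesc);

    cv::FlannBasedMatcher matcher;
    int bestIndex = -1, bestInliers = 0;

    for (size_t c = 0; c < candidates.size(); ++c) {
        std::vector<cv::KeyPoint> tKp;
        cv::Mat tDesc;
        surf(candidates[c], cv::Mat(), tKp, tDesc);

        // Ratio test to prune ambiguous matches.
        std::vector<std::vector<cv::DMatch> > knn;
        matcher.knnMatch(qDesc, tDesc, knn, 2);
        std::vector<cv::Point2f> src, dst;
        for (size_t i = 0; i < knn.size(); ++i) {
            if (knn[i].size() == 2 && knn[i][0].distance < 0.7f * knn[i][1].distance) {
                src.push_back(qKp[knn[i][0].queryIdx].pt);
                dst.push_back(tKp[knn[i][0].trainIdx].pt);
            }
        }
        if (src.size() < 4) continue;  // a homography needs at least 4 matches

        std::vector<unsigned char> inlierMask;
        cv::Mat H = cv::findHomography(src, dst, CV_RANSAC, 3.0, inlierMask);
        int inliers = cv::countNonZero(inlierMask);
        if (inliers > bestInliers) {
            bestInliers = inliers;
            bestIndex = static_cast<int>(c);
            bestHomography = H;
        }
    }
    return bestIndex;
}
```

The homography returned for the winning candidate is what encodes the query card's absolute orientation relative to the reference image.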
As previously mentioned, classification is not the focus of this project; as a result, this approach is fairly slow and not immediately scalable to large numbers of card types. In practice, given the high number of individual Pokemon (over 700, each of which requires its own 3D model, textures, animations, sounds...) and of individual Pokemon cards (on the order of 10,000), one could imagine hosting the classifier as a service on a remote server to which the client would send query images, and which would in return send information on the card accompanied by the assets necessary for augmentation.
D. Card tracker
The goal of the card tracker is to determine the movement of a card between two frames so that neither the detector nor the classifier has to be run again each frame, which is of course desirable in order to achieve good performance and smooth augmentation. The tracker works separately for each card in order to make independent card movement possible, though, for performance reasons, tasks are batched together across cards where possible. The tracker is first initialized for a card when it receives an initial position from the card detector. When that happens, the tracker detects Shi-Tomasi "Good Features to Track" [4] as implemented in OpenCV's goodFeaturesToTrack method and saves those which are located within the initial quadrilateral estimate.
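As a rough illustration of this initialization step (the maximum corner count, quality level and minimum distance are placeholders, not the values used in our implementation):

```cpp
#include <opencv2/core/core.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <vector>

// Detect Shi-Tomasi corners over the grayscale frame, then keep only those
// falling inside the card detector's initial quadrilateral estimate.
std::vector<cv::Point2f> initCardFeatures(const cv::Mat& grayFrame,
                                          const std::vector<cv::Point2f>& cardQuad) {
    std::vector<cv::Point2f> corners;
    cv::goodFeaturesToTrack(grayFrame, corners,
                            /*maxCorners=*/200,
                            /*qualityLevel=*/0.01,
                            /*minDistance=*/5.0);

    std::vector<cv::Point2f> cardFeatures;
    for (size_t i = 0; i < corners.size(); ++i) {
        // >= 0: the point is inside the quad or on its boundary.
        if (cv::pointPolygonTest(cardQuad, corners[i], /*measureDist=*/false) >= 0)
            cardFeatures.push_back(corners[i]);
    }
    return cardFeatures;
}
```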
When a new frame is received, the first thing done by the tracker is computing KLT optical flow for the card's features. This is done to get an initial estimate of the motion between the two frames: since our target is planar, a homography is computed between the frames using those feature matches (with RANSAC for outlier resilience), and this homography is then applied to the card's four corners. Unfortunately, doing just this is not enough: this kind of optical-flow based tracking is extremely prone to drift, especially in low-resolution and noisy environments. While it might be able to maintain a good estimate of the card's location, the card's shape slowly changes over time, which is extremely problematic given that an accurate card shape is essential to good pose estimation.
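A minimal sketch of this per-frame initial estimate, assuming OpenCV's calcOpticalFlowPyrLK, findHomography and perspectiveTransform; the thresholds are illustrative only:

```cpp
#include <opencv2/core/core.hpp>
#include <opencv2/video/tracking.hpp>
#include <opencv2/calib3d/calib3d.hpp>
#include <vector>

// KLT optical flow on the card's features, a RANSAC homography between the
// two point sets, and that homography applied to the card's four corners.
bool estimateCardMotion(const cv::Mat& prevGray, const cv::Mat& currGray,
                        std::vector<cv::Point2f>& features,      // updated in place
                        std::vector<cv::Point2f>& cardCorners) { // 4 corners, updated in place
    std::vector<cv::Point2f> nextFeatures;
    std::vector<unsigned char> status;
    std::vector<float> err;
    cv::calcOpticalFlowPyrLK(prevGray, currGray, features, nextFeatures, status, err);

    // Keep only the features that the KLT tracker actually found again.
    std::vector<cv::Point2f> src, dst;
    for (size_t i = 0; i < status.size(); ++i) {
        if (status[i]) {
            src.push_back(features[i]);
            dst.push_back(nextFeatures[i]);
        }
    }
    if (src.size() < 4) return false;  // not enough matches for a homography

    std::vector<unsigned char> inliers;
    cv::Mat H = cv::findHomography(src, dst, CV_RANSAC, 3.0, inliers);
    if (H.empty()) return false;

    // Move the card's four corners with the estimated inter-frame homography.
    std::vector<cv::Point2f> newCorners;
    cv::perspectiveTransform(cardCorners, newCorners, H);

    cardCorners = newCorners;
    features = dst;
    return true;
}
```

As discussed above, this estimate is only a starting point: applying such homographies frame after frame accumulates drift, which is why it is followed by the quad refitting step.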
Due to this, the optical flow tracking is only used as an initial estimate each frame. Once this initial estimate is done, it is used as a search region for the actual new position of the card. A quad of slightly larger size is used as the region of interest in an edge map (again computed using the Canny edge detector), which is then fed to the same quad fitting algorithm as the one previously described in the card detector, with a few differences. First, the edge map has to be filtered. Instead of relying on color (which can be unreliable, partly due to glare) to isolate the border, we filter out the card's contents by only keeping the left-most and right-most pixels on each row, as well as the top-most and bottom-most pixels on each column. This filtering can be done efficiently and in practice gives completely usable results, as can be seen in Figure 5. Additionally, instead of having the quad fitting algorithm look for the quadrilateral with the largest area, it instead looks for the one that minimizes the cumulative distance between the old and new quads. In case the tracker is unable to fit a quad, it sticks with the initial estimate based on optical flow. An overview of the card tracker can be found in Figure 6.

Fig. 5. Sample product of the edge map border filter. Left: original edge map. Right: border-filtered edge map.
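The border filter itself is straightforward; a sketch of it, assuming an 8-bit single-channel Canny edge map already cropped to the search region, could look like this:

```cpp
#include <opencv2/core/core.hpp>

// For every row keep only the left-most and right-most edge pixels, and for
// every column keep only the top-most and bottom-most ones, discarding the
// edges produced by the card's artwork.
cv::Mat filterBorderEdges(const cv::Mat& edgeMap) {
    cv::Mat border = cv::Mat::zeros(edgeMap.size(), CV_8UC1);

    for (int y = 0; y < edgeMap.rows; ++y) {
        const unsigned char* row = edgeMap.ptr<unsigned char>(y);
        int left = -1, right = -1;
        for (int x = 0; x < edgeMap.cols; ++x) {
            if (row[x]) { if (left < 0) left = x; right = x; }
        }
        if (left >= 0) {
            border.at<unsigned char>(y, left) = 255;
            border.at<unsigned char>(y, right) = 255;
        }
    }
    for (int x = 0; x < edgeMap.cols; ++x) {
        int top = -1, bottom = -1;
        for (int y = 0; y < edgeMap.rows; ++y) {
            if (edgeMap.at<unsigned char>(y, x)) { if (top < 0) top = y; bottom = y; }
        }
        if (top >= 0) {
            border.at<unsigned char>(top, x) = 255;
            border.at<unsigned char>(bottom, x) = 255;
        }
    }
    return border;
}
```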
Another problem which the tracker has to deal with is sudden camera movements, both big and small, which throw off the KLT tracker. Indeed, a problem with the approach presented above is that if the optical flow tracker's initial estimate is too far off, then the quad fitting algorithm will fail to recover from it. While smooth movements typically result in good initial estimates, quick camera jerks (typically unintentional) can ruin the tracker's accuracy. In order to deal with this, we first attempt to detect those instances by computing the variance of the distance between corresponding card corners as estimated by the optical flow tracker.
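As an illustration, that variance check could look like the sketch below. How the variance is thresholded and how the tracker reacts to a detected jerk are not detailed here, so the flag-above-a-threshold behaviour and the threshold value are assumptions.

```cpp
#include <opencv2/core/core.hpp>
#include <vector>
#include <cmath>

// Compare how far each of the four corners moved according to the optical-flow
// estimate, and flag the frame when the variance of those distances is
// abnormally large (a rigid, smooth motion moves all corners consistently).
bool isLikelyCameraJerk(const std::vector<cv::Point2f>& oldCorners,
                        const std::vector<cv::Point2f>& newCorners,
                        double varianceThreshold = 50.0) {  // placeholder value
    std::vector<double> dist(oldCorners.size());
    double mean = 0.0;
    for (size_t i = 0; i < oldCorners.size(); ++i) {
        double dx = newCorners[i].x - oldCorners[i].x;
        double dy = newCorners[i].y - oldCorners[i].y;
        dist[i] = std::sqrt(dx * dx + dy * dy);
        mean += dist[i];
    }
    mean /= dist.size();

    double variance = 0.0;
    for (size_t i = 0; i < dist.size(); ++i)
        variance += (dist[i] - mean) * (dist[i] - mean);
    variance /= dist.size();

    return variance > varianceThreshold;
}
```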
TABLE I. Performance of the main loop. Performance recorded when tracking 4 cards.

V. CONCLUSION
With this paper, we demonstrated that it is entirely possible to augment a trading card game in real time despite the absence of specially designed markers on the cards, of a fixed camera and of heavy computation capabilities. We paid close attention to code and algorithm optimization in order to get the system running smoothly on a mobile device, the Nvidia SHIELD tablet. We leveraged both the CPU and the GPU for computation, and found that our card tracker is robust enough to handle users moving cards while the camera is also moving independently. We made suggestions for ways to expand on the project, both purely technical and more feature-oriented.
Fig. 13. An example of the tracker keeping up with a variety of user card movements.

REFERENCES

[1] Jun Rekimoto and Yuji Ayatsuka, CyberCode: Designing Augmented Reality Environments with Visual Tags, 2001.
[2] G. Schweighofer and A. Pinz, Robust Pose Estimation from a Planar Target, 2006.
[3] Shiqi Li and Chi Xu, Efficient Lookup Table Based Camera Pose Estimation for Augmented Reality, 2011.
[4] Jianbo Shi and Carlo Tomasi, Good Features to Track, 1994.
[5] Pokepedia, Pokepedia list of Pokemon cards. http://www.pokepedia.net/