
See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/308855851

Wi-Fi fingerprinting in the real world - RTLS@UM at the EvAAL competition

Conference Paper · October 2015


DOI: 10.1109/IPIN.2015.7346967

4 authors:

Adriano Moreira, University of Minho
Maria João Nicolau, University of Minho
Filipe Meneses, University of Minho
António Duarte Costa, University of Minho

All content following this page was uploaded by Filipe Meneses on 15 January 2020.



2015 International Conference on Indoor Positioning and Indoor Navigation (IPIN), 13-16 October 2015, Banff, Alberta, Canada

Wi-Fi Fingerprinting in the Real World - RTLS@UM at the EvAAL Competition
Adriano Moreira (1), Maria João Nicolau (1), Filipe Meneses (1,2), António Costa (1)
(1) Algoritmi Research Centre; (2) Centro de Computação Gráfica (CCG)
University of Minho
Portugal
{adriano.moreira, joao.nicolau, antonio.costa}@algoritmi.uminho.pt; filipe.meneses@ccg.pt

Abstract: Research and development around indoor positioning and navigation is capturing the attention of an increasing number of research groups and labs around the world. Among the several techniques being proposed for indoor positioning, solutions based on Wi-Fi fingerprinting are the most popular since they exploit existing WLAN infrastructures to support software-only positioning, tracking and navigation applications. Despite the enormous research efforts in this domain, and despite the existence of some commercial products based on Wi-Fi fingerprinting, it is still difficult to compare the performance, in the real world, of the several existing solutions. The EvAAL competition, hosted by the IPIN 2015 conference, contributed to filling this gap. This paper describes the experience of the RTLS@UM team in participating in track 3 of that competition.

Keywords: indoor positioning; Wi-Fi fingerprinting; competition; benchmarking

I. INTRODUCTION
In the old days, we used to think of location mostly as something that expresses a person's location as an address or as a pair of coordinates in a well-known referential. It was used mostly to locate (in an absolute referential such as, for example, a postal address) and to guide persons (for example, using a GPS receiver). Today, location information has a broader usage and is considered an important feature. Many applications use location information to increase the quality of the information provided to the end users, personalizing content and providing access to location-based information. Outdoors, the usage of a GPS receiver has become almost a standard since it is available worldwide, it is free for the users and provides the location in an absolute and universal referential. Acquiring the location of a user or a device indoors is a much more challenging task due to the lack of a universal solution.

In recent years, many researchers have searched for adequate solutions to implement indoor positioning systems. Current research challenges include, among other topics, a standard representation for indoor maps, a universal referential to represent the indoor space (for example, as coordinates or as a symbolic space created by a set of divisions in different floors and buildings), and a universal way to acquire the location with high accuracy.

Many different technologies have been used to build indoor positioning solutions, including short-range beacons, optical communications, inertial systems, etc. Wi-Fi fingerprinting is one of the most popular since Wi-Fi networks are, nowadays, installed almost everywhere, therefore supporting the deployment of indoor location systems in buildings without the need for specific infrastructure.

Wi-Fi fingerprinting is based on measuring the intensity of the received radio signals (RSSI - Received Signal Strength Indicator) of the Access Points that are available in a place (the fingerprint) and comparing it with a previously built radio map [1]. The radio map is a database that contains a list of fingerprints and the corresponding real locations. The radio map is often built by hand, by manually collecting and annotating fingerprints in all the spaces that are covered by the indoor positioning system (offline phase). The location of a device is determined by computing the similarity between a fingerprint collected by the device (online phase) and the fingerprints contained in the radio map.

Due to its infrastructure-free characteristics, the research community is intensively studying Wi-Fi fingerprinting, and a large number of solutions have been proposed to solve, or minimize, some of the problems associated with this technique. Among those problems are the large effort (time and human effort) required to build and maintain high quality radio maps for large spaces, the way different devices perceive the radio signals, the effects of multipath and fading, the dynamic nature of the spaces with frequent layout changes and people moving around, the lack of indoor maps, and the limited precision and/or accuracy of the positioning estimation algorithms. Despite the huge effort made over the last few years, and despite the significant improvements observed in minimizing some of the issues identified above, it continues to be difficult to compare the performance of different solutions from different research teams.

This paper describes the experience of the RTLS@UM team in the EvAAL competition, where a set of participants subjected their Wi-Fi fingerprinting solutions to a competitive benchmarking test. The EvAAL competition, its aim and general rules are described next. Section II introduces the datasets that supported track 3 of the competition and presents their analysis. The positioning estimation approaches used by our team are described in section III, followed by a discussion about how the most relevant parameters were adjusted for the competition. Section V presents the final results obtained by our team in the competition, as well as a summary of the other competitors' results.
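
The fingerprint matching idea outlined in this introduction can be illustrated with a minimal sketch. It is not the method used in the competition: the radio map contents, the `MISSING_RSSI` value and the function names are all assumptions made for illustration, and a plain Euclidean distance with a 1-nearest-neighbour rule stands in for the similarity computation.

```python
import math

# Hypothetical radio map built in the offline phase: each entry pairs a
# fingerprint (AP id -> RSSI in dBm) with the location where it was collected.
RADIO_MAP = [
    ({"ap1": -40, "ap2": -70}, (0.0, 0.0)),
    ({"ap1": -75, "ap2": -45}, (10.0, 0.0)),
]

MISSING_RSSI = -100  # assumed stand-in for an AP absent from a fingerprint


def distance(fp_a, fp_b):
    """Euclidean distance between two fingerprints over the union of their APs."""
    aps = set(fp_a) | set(fp_b)
    return math.sqrt(sum(
        (fp_a.get(ap, MISSING_RSSI) - fp_b.get(ap, MISSING_RSSI)) ** 2
        for ap in aps
    ))


def locate(fp, radio_map):
    """Online phase: return the location of the most similar stored fingerprint."""
    _, location = min(radio_map, key=lambda entry: distance(fp, entry[0]))
    return location


estimated = locate({"ap1": -42, "ap2": -72}, RADIO_MAP)
```
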
978-1-4673-8402-5/15/$31.00 ©2015 IEEE
A. The EvAAL competition at IPIN 2015
Evaluating Ambient Assisted Living (AAL) systems is a challenge due to the complexity of such systems and the variety of solutions adopted. EvAAL (Evaluating AAL Systems Through Competitive Benchmarking) is a research program created to evaluate pervasive and ubiquitous systems by comparing working AAL/AmI (Ambient Intelligence) technology solutions in a controlled environment. EvAAL is supported by the AALOA association [2] and its first edition was organized in 2011 by the universAAL project [3].

The 2015 edition of the EvAAL competition was hosted by IPIN'2015 (International Conference on Indoor Positioning and Indoor Navigation), and it consisted of two on-site tracks and one off-site track [4]. In track 1 (on-site), "Smartphone based positioning", competitors could use any sensor available on smartphones to accurately estimate online their position inside a large, public indoor area. In track 2 (also on-site), "Foot-mounted pedestrian dead reckoning positioning", competitors could use MEMS sensors (inertial, compass and pressure sensors) mounted on the feet to locate the user inside the same large indoor area. In the off-site track 3, entitled "Wi-Fi fingerprinting in large environments", the competitors had access to a large Wi-Fi fingerprinting database to which they could apply their algorithms to estimate the real position associated with a set of previously collected fingerprints. In this track, all the competitors were required to apply their algorithms to the same database, therefore enabling direct comparison of the achieved precision and accuracy.

Track 3 of the competition uses the multi-building and multi-floor UJIIndoorLoc Database [5] to compare indoor location methodologies [4]. The participants had access to three different datasets: one for training, one for validation of their solutions (it included the real position of the device and thus it was possible to compare the estimated position with the ground truth), and a final testing database containing a set of fingerprints collected at unknown locations, used for comparing the performance of the solutions of the different competitors.

B. Names and conventions
Throughout this paper, the following names and conventions are used:

T – The training dataset
V – The validation dataset
U – The final testing dataset
R – The radio map used for positioning estimation
fp_i – Denotes fingerprint i of a radio map
fp_0 – Denotes a test fingerprint (unknown position)
AP_in – Denotes the nth strongest AP in fingerprint fp_i
rssi_ij – Denotes the RSSI value of the ith AP in fp_j
k – The number of neighbours in k-nearest neighbours approaches
p – The estimated position (pair of coordinates)
f – The estimated floor
b – The estimated building

II. THE DATASETS
Track 3 of the EvAAL competition is based on the processing of real world data provided by the organizers in the form of three datasets: a training dataset (T), a validation dataset (V) and a final test dataset (U). The first two datasets (T and V) are publicly available from the UJIIndoorLoc Database [5] and their characteristics are described in detail in [6]. The final test dataset (U) was distributed through e-mail to the competition participants one and a half months before the deadline for submitting the results (see section V). The basic characteristics of these datasets, as described in [6], are summarized next, along with new information that we extracted from the datasets and considered relevant for the competition.

A. Dataset description and statistics
The data in the training, validation and final test datasets were collected in three buildings of the University Jaume I, Spain, with each building having 4 or 5 floors [6]. Each record (sample) is described by a vector, where the first 520 dimensions represent the measured RSSI values of the visible Wi-Fi Access Points (AP), and the remaining 9 dimensions represent the coordinates, floor, building, space, relative position, user ID, device ID and timestamp, describing where, by whom and when the sample was collected.

Table I shows a summary of the most relevant characteristics of each one of the three datasets. It must be emphasized that the validation dataset (V) was provided to enable competitors to tune their position estimation algorithms, and that the final test dataset (U) was the one for which the competitors had to estimate the true position of the users. Therefore, the samples in the final test dataset naturally include much less information than the other two datasets: they omit, for example, the building, floor and coordinates associated with each fingerprint.

There are, however, some data missing from the validation and final test datasets that would help in estimating the users' positions without making the process unrealistic, such as the information about the users (see section II-B). It is also strange that the validation dataset only refers to 13 distinct places.

Regarding the devices used for collecting the data, 9 of the 11 devices used to collect data for the validation dataset are new compared to the training dataset, while 5 of the 7 devices used to collect data for the final test dataset are also new.

TABLE I. MAIN CHARACTERISTICS OF THE THREE DATASETS.
                     Training   Validation   Final test
Samples              19 937     1 111        5 179
Distinct users       18         NA           NA
Distinct devices     16         11           7
Distinct buildings   3          3            NA
Distinct floors      13         13           NA
Distinct places      735        13           NA
Distinct positions   933        1 074        NA
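
The record layout described in section II-A (520 RSSI dimensions followed by 9 metadata dimensions) can be sketched as follows. The field names in `META_FIELDS` and the function name are assumptions chosen for illustration; the text lists the nine trailing fields only informally, and the order used here simply follows that list.

```python
N_APS = 520  # number of RSSI dimensions per record, as stated in the text

# Assumed names for the 9 trailing dimensions ("coordinates" counts as two).
META_FIELDS = (
    "longitude", "latitude", "floor", "building", "space",
    "relative_position", "user_id", "device_id", "timestamp",
)


def split_record(record):
    """Split one 529-dimension sample into (rssi_values, metadata)."""
    expected = N_APS + len(META_FIELDS)
    if len(record) != expected:
        raise ValueError(f"expected {expected} values, got {len(record)}")
    rssi = list(record[:N_APS])
    meta = dict(zip(META_FIELDS, record[N_APS:]))
    return rssi, meta


# Synthetic record used only to exercise the function.
example = [-80] * N_APS + [1.0, 2.0, 3, 1, 101, 2, 5, 13, 1371713733]
rssi, meta = split_record(example)
```

In the final test dataset most of these metadata fields carry no usable information, so only the RSSI part, the device ID and the timestamp would be meaningful there.
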
Data for the three datasets were collected over different periods of time. For the training dataset, users collected data on 6 days within the period running from 30 May to 20 June 2013. For the validation dataset, users collected data on 9 days within the period running from 19 September to 8 October 2013. For the final test dataset, users collected data on 4 days divided into two periods: 2 days around the end of November 2013, and 2 days at the end of March 2015. This means that some of the testing data was collected more than 19 months after the data collected for the radio map (training data). The collection periods and distribution of the collected samples per day are shown in Table II and Figure 1, respectively.

TABLE II. OTHER CHARACTERISTICS OF THE THREE DATASETS.
                                         Training      Validation    Final test
Collection period (from)                 30 May 2013   19 Sep 2013   29 Nov 2013
Collection period (to)                   20 Jun 2013   8 Oct 2013    31 Mar 2015
Number of days                           6             9             4
Number of observed APs                   465           367           270
APs common to training                   -             312           246
New APs, not observed in training        -             55            24
New APs, not in training or validation   -             -             0
Invalid samples                          76            0             0
Strange samples (RSSI > -15 dBm)         291           0             0

The number of observed Access Points also differs across the three datasets: 465 APs in the training dataset, 367 in the validation dataset, and 270 in the final test dataset. We also observed that the final test dataset includes 24 APs that are not observed in the training dataset (see Table II). On the other hand, all the APs observed in the final test dataset were observed in at least one of the training or validation datasets. This information was used later in our approach to support the building of the radio map, in order to maximize the position estimation performance (see section III-A).

We also observed that the training dataset includes 76 samples where not a single AP was observed. These "invalid" samples are useless and can be removed from the radio map. No invalid samples were detected in the validation and final test datasets.

Across the training dataset we found a set of 291 samples with strange RSSI values (values higher than -15 dBm, some equal to 0 dBm). The authors of the dataset refer to this characteristic of the data in [6], but its causes were not identified.

The analysis of the training and validation datasets also suggests that some APs were relocated during the data collection process, or that some observations of mobile hotspots were also included in the samples. This suspicion is supported by the fact that some APs were observed in locations that are too far apart, as illustrated in Fig. 2. As shown in Fig. 2, there are 13 APs that were observed in locations more than 200 meters apart. Since one part of the final test dataset was collected more than 19 months later than the training dataset, it is possible that many more APs were relocated during that period. However, that cannot be confirmed since, naturally, the final test dataset does not include the coordinates of the sampling points.

Fig. 2 Maximum distance between the oldest sampling point and any subsequent sampling point where the same AP was observed.
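
The quantity plotted in Fig. 2 can be computed along the following lines. This is only a sketch under stated assumptions: the sample representation (a timestamp, planar coordinates in meters, and the set of observed AP identifiers) and the function name are hypothetical, and the 200 meter threshold is the one quoted in the text.

```python
import math
from collections import defaultdict


def max_displacement_per_ap(samples):
    """For each AP, return the largest distance between its oldest sampling
    point and any later sampling point where the same AP was observed.

    samples: iterable of (timestamp, (x, y), observed_ap_ids).
    """
    first_seen = {}                 # AP id -> coordinates of oldest observation
    max_dist = defaultdict(float)   # AP id -> largest distance found so far
    for _, pos, aps in sorted(samples, key=lambda s: s[0]):
        for ap in aps:
            if ap not in first_seen:
                first_seen[ap] = pos
                max_dist[ap] = 0.0
            else:
                d = math.dist(first_seen[ap], pos)
                max_dist[ap] = max(max_dist[ap], d)
    return dict(max_dist)


# Synthetic observations: AP "b" shows up 300 m away from where it was first seen.
observations = [
    (1, (0.0, 0.0), {"a", "b"}),
    (2, (3.0, 4.0), {"a"}),
    (3, (300.0, 0.0), {"b"}),
]
spread = max_displacement_per_ap(observations)
suspects = sorted(ap for ap, d in spread.items() if d > 200.0)
```

APs flagged this way are candidates for relocated APs or mobile hotspots; the check needs ground-truth coordinates, so it cannot be run on the final test dataset.
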
Fig. 1 Number of samples collected per day for the three datasets.

B. EvAAL datasets vs. real datasets
As detailed in the previous section, each record of the EvAAL final testing dataset U contains the RSSI values of the observed APs, the device ID and a timestamp. Its 5179 records are associated with seven different device IDs.

Initially, our expectation was that the device ID field could identify the unique user carrying the device to locate. A simple analysis was conducted on the validation dataset V to validate this assumption. Samples were separated into vectors, one for each device ID, and the vectors were sorted in ascending order of the timestamp. Most of the vectors reveal only one valid time sequence, that is, a trajectory that is compatible with the path of a pedestrian. In some cases more than one sequence can be identified, separated by a time interval greater than 3
minutes. However, it was possible to observe some samples with the same device ID and timestamp in very different locations. This only occurred with device ID 20. To illustrate this situation, the sequence of speed values, computed from consecutive samples, was built. The histogram of those speed values, for the samples associated with device ID 20, is shown in Fig. 3. Clearly, there are values that far exceed the normal speed of a pedestrian.

Fig. 3 Histogram of estimated speed values for device ID 20 (final test dataset – U).

Based on this analysis we conclude that the device ID field included in all EvAAL datasets does not uniquely identify a user, but instead the type of device. It is, therefore, quite likely that more than one user collected data, simultaneously, using similar devices. That is, the device ID field in the datasets maps directly to the hardware model and software version of the devices used in the data collection process, but not to a specific device, and there is no guarantee that only one device of each type was used at a time.

This fact represented a major inconvenience for us, as some of our location estimation algorithms use historical information to improve their accuracy. If a user is at a given position at a given instant of time, it is possible to compute the maximum displacement in the next time interval by considering his/her velocity and direction. One simple way is to consider a predefined maximum speed for a pedestrian inside a building, in all directions, as a theoretical limit. However, a more complex estimation of velocity and direction can be derived from known previous positions.

The software we used in the competition already includes algorithms that use historical information. These algorithms are based on PKNN (Predicted K Nearest Neighbours) [7]. PKNN uses recent past information about users to improve the accuracy of the localization algorithm. The main idea is based on the assumption that a user cannot travel a large distance in a short time. Thus, firstly, a maximum distance allowed in a certain time interval is defined. When the distance between the position calculated by the localization algorithm and the position of the same device in the previous time instant (last known position) exceeds the maximum distance allowed, the algorithm calculates a next possible position for the device based on the movement observed in its previous positions. This position is called Next Possible Position and is used to correct the position initially computed by the location algorithm.

Since our analysis showed that unique users cannot be identified in the EvAAL datasets, we could not use all our algorithms, particularly those that use historical information. We argue that a realistic dataset includes information about the target device, like a serial number or MAC address, therefore enabling the use of historical data to improve the positioning accuracy.

III. POSITION ESTIMATION
For the participation in the EvAAL/IPIN 2015 competition, the RTLS@UM team adopted an approach that encompasses a process to build the radio map and two methods to estimate the positions associated with each one of the samples in the final test dataset (each team was allowed to submit up to 5 attempts with the estimated positions – see section V).

For building the radio map, minimal processing of the training and validation datasets was performed, mainly aiming at reducing the consequences of using multiple devices to collect the data.

For position estimation, two alternative approaches were used: (i) a hierarchical approach, based on filtering, majority rules and k-Nearest Neighbour estimation, where the building, floor, and coordinates are estimated one at a time; (ii) a "flat" approach based on Weighted k-Nearest Neighbour estimation. The first approach, with three variants, was used to generate 3 of the 5 final attempts, while the second approach, with different values for one parameter, was used to generate the other 2 final attempts.

A. Building the radio map
The process to build the radio map exploits the analysis described in section II, and tries to overcome the limitations of the training dataset and the problem associated with the multitude of devices that were used to collect the samples [8].

Since the final test dataset (U) includes samples where new APs are observed, compared to the set of APs observed in the training dataset (see Table II), the final radio map (R) used for estimating the positions associated with the final test samples was built by joining the samples of the training and validation datasets: R = T ∪ V. This decision involved some risk, since we could not compare the results obtained through the use of this joint radio map with the results obtained by using the training dataset alone. Moreover, for tuning our positioning estimation algorithms, we had to use a radio map based solely on the training dataset. Invalid samples and samples with strange RSSI values were not removed.

In order to deal with the diversity of devices that were used to collect the samples, we implemented a solution inspired by the method proposed by Laoudias et al. in [9]. The basic idea proposed in [9] is to compute the histogram of all the RSSI values observed by a particular device, and to fit that histogram to those of the other (similar) devices through a simple shifting operation (translation in the RSSI axis).

In our solution, we designed a device normalization procedure where we started by computing a representative
RSSI value for each device type d, rssi_d, defined as the mean of all the RSSI values collected by those devices. The mean, RSSI_D, of all representative RSSI values was then used as a reference value to correct all RSSI values from all devices by shifting them by the quantity RSSI_D - rssi_d. After correction, the representative RSSI values for all devices become equal.

The above procedure was tested while tuning our positioning estimation algorithms (approach 1), where the training dataset (T) was used as the radio map, and the validation dataset (V) was used for testing. The obtained results (see section IV) show a marginal gain of this procedure in reducing the impact of using different devices to collect the samples.

B. Position estimation - Approach 1: Filtering, majority rules, and k-Nearest Neighbours
In our approach 1, the estimation process starts by estimating the correct building (b), then estimates the correct floor (f) within that building and, finally, estimates the coordinates (p) associated with a given fingerprint (fp_0). Some of these processes are similar to the approach we described in [10].

Building estimation:
Given a fingerprint fp_0, the corresponding building (b) is estimated as follows:
1. Take AP_01, the strongest AP observed in fp_0.
2. Build R’, a subset of the radio map R, with all the samples where the strongest AP is AP_01 (filtering).
3. If R’ is an empty set, repeat steps 1 and 2 for the 2nd, 3rd, ..., strongest AP in fp_0.
4. Count the number of samples in R’ associated with each building and set b to the most frequent building (majority rule).

This procedure has a significant advantage in terms of computing effort, compared to the procedure associated with our approach 2, since no similarities between fingerprints need to be computed.

For estimating the floor (f) corresponding to a given fingerprint fp_0, three alternative procedures have been used.

Floor estimation – variant 1:
1. Build R’, a subset of R, with all the samples where the building is b (the building estimated in the previous step) (filtering).
2. Build R’’, a subset of R’, with all the samples where the strongest AP is equal to AP_01, AP_02 or AP_03 (filtering).
3. If #(R’’) < n, then R’’ = R’, where #(.) denotes the cardinality of a set, and n is a parameter.
4. Compute the similarity, S(), between fp_0 and all the fingerprints in R’’.
5. Take the k_1 samples in R’’ that are the most similar to fp_0.
6. Count the number of samples, from within the k_1, associated with each floor, and set f to the most frequent floor (majority rule).

In step 4 above, the similarity function S() is the Manhattan distance defined as:

\[ S(fp_1, fp_2) = \frac{1}{N} \times \sum_{i=1}^{N} \left| rssi_{i1} - rssi_{i2} \right| - 2 \times C \qquad (1) \]

where N is the total number of APs observed in fp_1 and/or fp_2, and C is the number of APs that were observed in both fp_1 and fp_2 (common APs). For missing APs, in fp_1 or fp_2, a default RSSI value was used.

Floor estimation – variant 2:
The second procedure used to estimate the floor corresponding to a given fingerprint fp_0 is similar to procedure 1, except that step 2 is now defined as:
2. Build R’’, a subset of R’, with all the samples where (RSSI_01 - ∆RSSI) ≤ RSSI_i1 ≤ (RSSI_01 + ∆RSSI) (filtering).

In other words, R’’ is made of the samples in R’ where the RSSI value of the strongest AP (RSSI_i1) is similar, within ∆RSSI, to the RSSI value of the strongest AP in fp_0. In our experiments we used ∆RSSI=12.

Floor estimation – variant 3:
The third procedure used to estimate the floor is similar to procedure 2, except that a simple 1-nearest neighbour estimation function now replaces steps 5 and 6, that is, the estimated floor f is the floor associated with the sample most similar to fp_0:
5. Set f to the floor associated with the most similar fingerprint.

Coordinates estimation:
The procedure used to estimate the coordinates associated with a given fingerprint fp_0 is:
1. Build R’’’, a subset of R’’ (from the floor estimation procedure), with all the samples where the floor is f (the floor estimated in the previous step) (filtering).
2. Compute the similarity, S(), between fp_0 and all the fingerprints in R’’’.
3. Take the k_2 samples in R’’’ that are the most similar to fp_0.
4. Compute the estimated coordinates as the centroid of the k_2 samples.

Combining the building estimation procedure with the three variants for estimating the floor and with the above procedure to estimate the coordinates leads to three alternative solutions. Results for these three solutions are presented in sections IV and V.
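
The hierarchical procedure of approach 1 (building by majority rule, floor by variant 1, coordinates by centroid) can be sketched as below. This is an illustrative reading of the steps above, not the team's software: the sample representation (dicts with `fp`, `building`, `floor` and `pos` keys), all function names and the handling of empty fingerprints are assumptions. Smaller values of S are treated as "more similar", which is what a distance-based function implies. The defaults for `DEF_RSSI`, `n`, `k1` and `k2` follow the values that appear in the tuning tables of section IV.

```python
from collections import Counter

DEF_RSSI = -90  # default RSSI (dBm) used for an AP missing from one fingerprint


def similarity(fp1, fp2):
    """Equation (1): mean absolute RSSI difference minus 2 x common APs."""
    aps = set(fp1) | set(fp2)            # the N APs seen in fp1 and/or fp2
    if not aps:
        return 0.0                       # assumption: two empty fingerprints
    common = len(set(fp1) & set(fp2))    # C
    total = sum(abs(fp1.get(ap, DEF_RSSI) - fp2.get(ap, DEF_RSSI)) for ap in aps)
    return total / len(aps) - 2 * common


def strongest(fp, n=1):
    """Identifiers of the n strongest APs in fp, strongest first."""
    return sorted(fp, key=fp.get, reverse=True)[:n]


def majority(values):
    return Counter(values).most_common(1)[0][0]


def estimate_building(fp0, radio_map):
    for ap in strongest(fp0, len(fp0)):  # steps 1-3: fall back to weaker APs
        subset = [s for s in radio_map if s["fp"] and strongest(s["fp"])[0] == ap]
        if subset:
            return majority(s["building"] for s in subset)  # step 4
    return None


def estimate_floor_v1(fp0, radio_map, b, n=50, k1=50):
    r1 = [s for s in radio_map if s["building"] == b]                   # step 1
    top3 = set(strongest(fp0, 3))
    r2 = [s for s in r1 if s["fp"] and strongest(s["fp"])[0] in top3]   # step 2
    if len(r2) < n:                                                     # step 3
        r2 = r1
    nearest = sorted(r2, key=lambda s: similarity(fp0, s["fp"]))[:k1]   # steps 4-5
    return majority(s["floor"] for s in nearest), r2                    # step 6


def estimate_coordinates(fp0, r2, f, k2=7):
    r3 = [s for s in r2 if s["floor"] == f]
    nearest = sorted(r3, key=lambda s: similarity(fp0, s["fp"]))[:k2]
    xs = [s["pos"][0] for s in nearest]
    ys = [s["pos"][1] for s in nearest]
    return (sum(xs) / len(xs), sum(ys) / len(ys))  # centroid of the k2 samples


# Tiny synthetic radio map used only to exercise the functions.
radio_map = [
    {"fp": {"a": -40, "b": -70}, "building": 0, "floor": 1, "pos": (0.0, 0.0)},
    {"fp": {"a": -45, "b": -65}, "building": 0, "floor": 1, "pos": (2.0, 0.0)},
    {"fp": {"c": -50, "a": -80}, "building": 0, "floor": 2, "pos": (10.0, 10.0)},
    {"fp": {"d": -30}, "building": 1, "floor": 0, "pos": (100.0, 100.0)},
]
fp0 = {"a": -42, "b": -68}
b = estimate_building(fp0, radio_map)
f, r2 = estimate_floor_v1(fp0, radio_map, b, n=2, k1=2)
p = estimate_coordinates(fp0, r2, f, k2=2)
```

Variants 2 and 3 would change only the filter in step 2 and the final vote, respectively.
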
C. Position estimation - Approach 2: Weighted k-Nearest Neighbours
The second approach used for position estimation is based on WKNN (Weighted K-Nearest Neighbours) [11]. As in the KNN algorithm [1], a similarity function is used to compare the test fingerprint fp_0 with the fingerprints fp_i in the radio map. In this approach, the best results were achieved using the Euclidean distance as the similarity function. The k neighbours with the lowest Euclidean distances (K-Nearest Neighbours) are selected. However, in the WKNN algorithm the position of the device is not calculated using the centroid of the k nearest neighbours' coordinates, as in the KNN algorithm. Instead, weights are assigned to the k nearest neighbours and a weighted average is used to determine the coordinates (latitude and longitude) of the device. The weight given to each neighbour depends on the value of its Euclidean distance to fp_0. Higher weights are given to the neighbours with lower distances, on the assumption that they are closer to the device to locate. Fig. 4 illustrates the idea behind the use of the WKNN algorithm, compared to a KNN solution.

Fig. 4 Example of locating a device with k=3 using KNN (left) and WKNN (right) algorithms.

Before applying the estimation algorithm, the training dataset T is first loaded into a memory map M = map (rp, [fp_s, list<fp_1 … fp_n>]), structured in order to facilitate and speed up the computation of the k-nearest neighbours. The structure of M is depicted in Fig. 5. An entry in M is created for each reference point rp_i in the T dataset, indexed by its coordinates rp_i (lat, long). The value associated with each entry consists of a fingerprint summary fp_s and a list of all known fingerprint samples fp_i for that position. The summary value fp_s can be pre-computed by applying a summary function to the list of sample values. Several distinct summary functions can be used, for instance the average function.

Fig. 5 Memory view of T dataset.

ALGORITHM I. OBTAINING THE K-NEAREST NEIGHBOURS.
Input: fp_0
Input: M = map ( rp, [fp_s, list<fp_1 … fp_n>] )

For each rp_i in M
    Let $d_i = \min_{\forall fp_j \in list<fp_1 .. fp_n>} Distance(fp_0, fp_j)$
    Let $N = N \cup \{ (rp_i, d_i) \}$
End for

Return smallest k values of N

The location estimation algorithm has two phases. In the first phase (see Algorithm I), the similarity between the input fingerprint fp_0 and all the fingerprints in the map M is computed. The similarity function is one of many possible distance functions, as described in section IV. The best results were, however, achieved with the simple Euclidean distance. Distance can either be computed only against the summary fp_s, or against the entire list of samples associated with a given position. The result is a set of reference points, ordered by the similarity value. The first k reference points of this set are the k-nearest neighbours and are taken to compute the weighted centroid in the second phase.

Algorithm I returns k reference points corresponding to k distinct indoor positions. It is not the same as computing the distance to all values in T because, in that case, more than one sample of the same position could be returned. In fact, for each position, the algorithm first computes the minimal distance over all known fingerprint samples for that position. In this way, each position can only contribute with its best distance value. A simpler alternative is to compute the distance to a pre-computed summary of all samples. This is computationally less expensive and may be useful and desirable when time is important. In this competition time is not an issue, and the best results can be obtained by testing all samples for each position.

In the second phase, the k nearest neighbours, obtained in the first phase, are used as input to estimate the building, the floor and the coordinates. The estimated floor f and building b are taken directly from the nearest of the k neighbours (1-nearest neighbour). Regarding position, the estimated coordinates are computed as a weighted centroid of the k neighbours' coordinates. The weights are obtained as the inverse of the squared Euclidean distance between fingerprints. The algorithm is described in Algorithm II.

ALGORITHM II - WEIGHTED CENTROID OF K NEAREST REFERENCE POINTS.
Input: fp_0
Input: N_0 … N_{k-1}, the k nearest neighbours

Let $d_i = EuclideanDistance(fp_0, fp_{N_i})$, for all N_i neighbours
Let $w_i = \frac{1}{(d_i)^2}$
Set $p.lat = \frac{\sum_{i=0}^{k-1} w_i * N_i.lat}{\sum_{i=0}^{k-1} w_i} \wedge p.long = \frac{\sum_{i=0}^{k-1} w_i * N_i.long}{\sum_{i=0}^{k-1} w_i}$
Set f = N_0.f
Set b = N_0.b

Return the position p, the floor f and the building b

In fact the Euclidean distances between fingerprints were already computed in the first phase, and the k-nearest neighbours used in this phase are already sorted by distance. The algorithm needs only to compute the weighting factors and
apply them to the latitude and longitude coordinates of the k candidate positions. The algorithm returns the estimated position, as well as the estimated floor and building.

IV. TUNING THE ALGORITHMS

The estimation algorithms described in the previous section were tested with the provided datasets (T and V), and the values of their parameters were adjusted to minimize the mean error (see section V for a definition of the EvAAL metric). In these tests, the training dataset (T) was used to build the radio map (R), and the validation dataset (V) was used for testing. We now present some of the most relevant results obtained during this tuning phase. Whenever referring to the mean error, the metric defined by equation (2) in section V was used.

A. Initial results for approach 1

One of the first experiments we carried out aimed at evaluating the effectiveness of the device normalization procedure described in section III-A. Fig. 6 shows the cumulative distribution of the errors (EvAAL metric) with and without device normalization, when using approach 1, variant 3. While the mean error decreases slightly with device normalization (from 7,59 to 7,45 meters), the overall gain, in this case, is marginal.

For building estimation, which uses a parameter-less algorithm, the obtained results were very good. With or without device normalization, the algorithm estimated the correct building 100% of the time.

Floor estimation is a challenging task, and three alternative procedures were tested. Variant 1 involves three parameters: n (Step 3.), the default RSSI value (defRSSI) (Step 4.), and k1 (Step 5.). Several combinations of values for these parameters were tested. The value of n has very little impact on the performance, since it is used only in a few cases, as long as it is not smaller than k1. For defRSSI and k1, similar results were obtained for defRSSI values between -85 and -95, and for k1 values between 20 and 60 (-90 and 40, respectively, were used in the final calculations) (see Tables III and IV).

TABLE III. FLOOR HIT RATE (f hr) AND EVAAL MEAN ERROR (p error) FOR SEVERAL VALUES OF THE PARAMETER defRSSI (Variant 1, n=k1=50; k2=7).
defRSSI    f hr     p error
-85        92,8%    8,14
-90        94,0%    7,73
-95        93,8%    7,96
-100       93,1%    8,10
-105       93,2%    8,29

TABLE IV. FLOOR HIT RATE (f hr) AND EVAAL MEAN ERROR (p error) FOR SEVERAL VALUES OF THE PARAMETERS n AND k1 (defRSSI=-90).
             Variant 1           Variant 2           Variant 3
             f hr     p error    f hr     p error    f hr     p error
n=k1=20      92,6%    7,88       91,7%    7,44       91,9%    7,46
n=k1=30      92,8%    7,89       91,9%    7,52       91,9%    7,52
n=k1=40      94,3%    7,70       91,6%    7,56       91,6%    7,56
n=k1=50      94,0%    7,73       91,9%    7,56       91,9%    7,45
n=k1=60      93,7%    7,74       91,4%    7,54       91,4%    7,54

Variant 2 involves an additional parameter: ∆RSSI. In this case, the value of this parameter was optimized for the optimum values of defRSSI and k1 previously obtained. Values of ∆RSSI between 10 and 16 were found to provide similar results for both the floor hit rate and the coordinate errors. The value of 12 was used in subsequent calculations.

Estimating the coordinates depends on the value of the k2 parameter. Values of this parameter between 5 and 9 were shown to provide similar results, with k2=7 leading to slightly better results.

All the results presented above show that the proposed algorithms are not particularly sensitive to the values of their several parameters. However, the differences observed in these experiments, with T and V, might be different with other datasets.

B. Initial results for approach 2

Regarding approach 2, the task of parameter tuning also played a very important role in the process. The set of parameters and values considered is presented in Table V. At first we considered a simple KNN algorithm, but the performance of WKNN was, as expected, much better than that of simple KNN. The three most relevant parameters in the WKNN algorithm are the number of nearest neighbours to consider (k value), the similarity function to use (distance function) and the weighting function. Five values of k (k=1..5), four distance functions and two weighting factors were considered. Best results were obtained for k=3 and k=4, the Euclidean distance function and a weight factor equal to the inverse of the squared Euclidean distance. The results obtained are presented and discussed in the next section.
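To make the three tuned parameters concrete, the following is a minimal Python sketch of a WKNN estimator in the spirit of Algorithms I and II, with the distance and weighting functions passed in as arguments. It is not the authors' implementation. The `RefPoint` and `Estimate` records, the dictionary layout of the radio map, the `eps` guard against zero distances, the skipping of zero-denominator terms in the Canberra distance, and the choice to weight with the same distance used for ranking are all assumptions made for illustration. The four distance functions and two weighting functions follow the definitions listed in Table V.

```python
import math
from typing import Callable, Dict, List, NamedTuple, Sequence, Tuple

Fingerprint = Sequence[float]
Distance = Callable[[Fingerprint, Fingerprint], float]


class RefPoint(NamedTuple):
    """Hypothetical reference point record; field names are assumptions."""
    lat: float
    lon: float
    floor: int
    building: int


class Estimate(NamedTuple):
    """Hypothetical result record returned by wknn()."""
    lat: float
    lon: float
    floor: int
    building: int


def manhattan(p: Fingerprint, q: Fingerprint) -> float:
    return sum(abs(a - b) for a, b in zip(p, q))


def euclidean(p: Fingerprint, q: Fingerprint) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def chebyshev(p: Fingerprint, q: Fingerprint) -> float:
    return max(abs(a - b) for a, b in zip(p, q))


def canberra(p: Fingerprint, q: Fingerprint) -> float:
    # Terms with a zero denominator are skipped (an assumption, to avoid 0/0).
    return sum(abs(a - b) / (abs(a) + abs(b))
               for a, b in zip(p, q) if abs(a) + abs(b) > 0)


def inverse(d: float, eps: float = 1e-9) -> float:
    # eps guards against an exact fingerprint match (assumption).
    return 1.0 / max(d, eps)


def inverse_squared(d: float, eps: float = 1e-9) -> float:
    return 1.0 / max(d, eps) ** 2


def k_nearest(fp0: Fingerprint,
              radio_map: Dict[RefPoint, List[Fingerprint]],
              k: int,
              distance: Distance) -> List[Tuple[float, RefPoint]]:
    """First phase: one distance per reference point, taken as the minimum
    over all samples stored for it, then the k smallest are kept."""
    scored = [(min(distance(fp0, fp) for fp in samples), rp)
              for rp, samples in radio_map.items()]
    scored.sort(key=lambda pair: pair[0])
    return scored[:k]


def wknn(fp0: Fingerprint,
         radio_map: Dict[RefPoint, List[Fingerprint]],
         k: int = 3,
         distance: Distance = euclidean,
         weight: Callable[[float], float] = inverse_squared) -> Estimate:
    """Second phase: weighted centroid of the k neighbours; floor and
    building are copied from the single nearest neighbour."""
    neighbours = k_nearest(fp0, radio_map, k, distance)
    weights = [weight(d) for d, _ in neighbours]
    total = sum(weights)
    lat = sum(w * rp.lat for w, (_, rp) in zip(weights, neighbours)) / total
    lon = sum(w * rp.lon for w, (_, rp) in zip(weights, neighbours)) / total
    nearest = neighbours[0][1]
    return Estimate(lat, lon, nearest.floor, nearest.building)
```

Under these assumptions, the parameter combinations described above correspond to calling `wknn` with each value of `k`, each of the four distance functions and each of the two weighting functions, and comparing the resulting mean errors on the validation dataset.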
Fig. 6 Distribution of the errors with and without device normalization (Variant 3; n=k1=50; defRSSI=-90; k2=7).

TABLE V. PARAMETER TUNING FOR APPROACH 2.
Parameter              Values considered
k - neighbours         1, 2, 3, 4, 5
Similarity function    Manhattan: $d(p, q) = \sum_{i=1}^{n} |p_i - q_i|$
                       Euclidean: $d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$
                       Chebyshev: $d(p, q) = \max_{i=1}^{n} (|p_i - q_i|)$
                       Canberra: $d(p, q) = \sum_{i=1}^{n} \frac{|p_i - q_i|}{|p_i| + |q_i|}$
Weighting function     $1/d_i$, $1/(d_i)^2$
Default RSSI value     -85, -90, -95, -100, -105
b, f estimation        by nearest neighbour, by vote

Two other parameters were tuned that are not really tied to the WKNN algorithm in use: the floor/building estimation strategy and the RSSI value to use when an AP is not detected (defRSSI). The most relevant decision is what to do with absent values of RSSI, set to 100 by default in the training set T that was provided. Since they enter directly in the distance formulas, they have to be "not far" from weak signal values. The range considered was -85, -90, -95, -100 and -105, considering that all values below -80 are really weak ones. Best results were achieved with -95, as illustrated in Table VI.

TABLE VI. DEFAULT RSSI VALUE USING 1-NN.
defRSSI    b hr     f hr     p error
-85        99,2%    90,5%    11,26
-90        99,6%    91,7%    9,80
-95        99,7%    91,2%    9,19
-100       99,6%    90,9%    9,75
-105       99,5%    89,6%    10,49

Regarding floor and building estimation, the first attempt was to use only the nearest neighbour information, but we also considered the majority rule, with and without weights. In this approach, the values of floor and building from the k neighbours are taken as votes for the right ones. This strategy, however, results in lower hit rates.

With those values fixed, we decided to evaluate all other possible combinations of the three most relevant parameters (k, distance and weighting functions). Results are shown in Table VII. The first obvious observation from the results in this table is that the Euclidean distance provides better results for all k values and for both weighting factors. It is also clear that using the inverse of the distance as a weighting factor is, in all cases, the worst option. Since the observed differences are significant, other weighting functions should be tested. Finally, some of the distance functions are not adequate for WKNN, since the results get worse when k increases. That is the case of Manhattan and Canberra. A function of degree two, like Euclidean, is more adequate.

TABLE VII. MEAN ERROR FOR DIFFERENT k, SIMILARITY AND WEIGHTING FUNCTIONS IN WKNN.
k      Weight         Manhattan    Euclidean    Chebyshev    Canberra
k=1    $1/d_i$        11,16        9,19         16,41        10,71
k=1    $1/(d_i)^2$    11,16        9,19         16,41        10,71
k=2    $1/d_i$        12,38        9,24         15,22        11,96
k=2    $1/(d_i)^2$    12,26        9,10         15,11        11,84
k=3    $1/d_i$        12,70        8,97         14,41        12,31
k=3    $1/(d_i)^2$    12,46        8,77         14,25        12,04
k=4    $1/d_i$        12,88        9,17         14,13        12,74
k=4    $1/(d_i)^2$    12,54        8,86         13,92        12,39
k=5    $1/d_i$        13,26        9,48         14,23        13,43
k=5    $1/(d_i)^2$    12,82        9,07         13,96        12,96

After those experiments, we concluded that we should use WKNN with k=3 and k=4, Euclidean distance as the similarity function and $1/(d_i)^2$ as the weighting factor. The last test was done only to check if the majority rule could enhance the hit rates for the building and floor. Results are presented in Table VIII.

TABLE VIII. FLOOR AND BUILDING ESTIMATION STRATEGIES WITH WKNN.
                 Building Hit Rate    Floor Hit Rate
                 k=3      k=4         k=3      k=4
Nearest k        99,7%    99,7%       91,2%    91,2%
Majority of k    99,4%    99,4%       91,1%    91,1%

Best results are achieved by taking the values of floor and building from the nearest of the neighbours and avoiding the voting procedure. This result is explained by the way we compute the k neighbours. All k neighbours correspond to distinct positions, as explained before. Voting would make more sense if applied to a set of nearest samples, without restrictions.

Table IX presents the best results achieved for approaches 1 and 2. Those settings were used to produce result01 to result05 from the final test dataset U. In the last one, and only in the last one, we also decided not to merge T with V. The goal was to see if the merging could produce visible effects on competition results.

TABLE IX. FINAL VALIDATION RESULTS FOR APPROACHES 1 AND 2 (n=k1=50; k2=7; defRSSI=-90 for variants 1-3, defRSSI=-95 for WkNN; ∆RSSI=12).
Algorithm    Building hit rate    Floor hit rate    Mean Error
Variant 1    100%                 94,0%             7,73
Variant 2    100%                 91,9%             7,56
Variant 3    100%                 91,9%             7,45
W3NN         99,7%                91,2%             8,58
W4NN         99,7%                91,2%             8,68

V. FINAL COMPETITION RESULTS

According to the competition rules, each team could submit up to 5 different result sets. The best among these 5 attempts would be used to rank the competitors.

The only metric defined by the competition rules for evaluating the submitted results was the mean error. The mean error was calculated considering the precision in detecting the correct building and floor, and the accuracy in
estimating the correct position within the floor, and was defined as the mean of the individual errors calculated as:

E = Distance(Ri,Ei) + pn1 × bFail + pn2 × fFail (2)

with Distance(Ri,Ei) being the Euclidean distance between the real position and the estimated position, and pn1 and pn2 the penalties associated with incorrect building and floor estimations, respectively. Variable bFail takes the value of 1 for incorrect building estimations and 0 otherwise, while fFail takes the absolute difference between the estimated floor and the correct floor. Values for pn1 and pn2 were set to 50 and 4 meters, respectively.
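As an illustration of equation (2), the per-sample error and its mean can be sketched in Python as follows. This is not the organizers' evaluation code. The `Position` record and its field names are hypothetical, and the coordinates are assumed to be planar and expressed in meters so that a plain Euclidean distance applies. The penalty values of 50 and 4 meters are the ones stated above.

```python
import math
from typing import Iterable, NamedTuple, Tuple


class Position(NamedTuple):
    """Hypothetical position record; x and y are assumed to be in meters."""
    x: float
    y: float
    floor: int
    building: int


def evaal_error(real: Position, est: Position,
                pn1: float = 50.0, pn2: float = 4.0) -> float:
    """Individual error E of equation (2) for one test sample."""
    distance = math.hypot(real.x - est.x, real.y - est.y)
    b_fail = 1 if real.building != est.building else 0  # 1 on wrong building
    f_fail = abs(real.floor - est.floor)                # floors of difference
    return distance + pn1 * b_fail + pn2 * f_fail


def mean_error(pairs: Iterable[Tuple[Position, Position]]) -> float:
    """Mean of the individual errors over (real, estimated) pairs."""
    errors = [evaal_error(real, est) for real, est in pairs]
    return sum(errors) / len(errors)
```

For example, an estimate that is 5 meters away horizontally and two floors off in the correct building would score 5 + 4 × 2 = 13 meters under this sketch, and the same estimate placed in the wrong building would score 63 meters.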
Our team submitted three result sets (V1 to V3) obtained with Approach 1 (A1) and two result sets (V4 and V5) calculated using Approach 2 (A2). For results V1 to V4, the radio map used was built by joining T and V. For V5, only the T dataset was used. Table X presents the final results achieved by the RTLS@UM team, as computed by the competition organizers. Figure 7 shows the cumulative distribution of the errors for the 5 results.

TABLE X. FINAL RTLS@UM RESULTS.
                         A1, V1    A1, V2    A1, V3    A2, V4    A2, V5
Building hit rate        100%      100%      100%      100%      100%
Floor hit rate           92,76%    93,74%    90,17%    89,38%    87,10%
Floor +/-1 hit rate      99,96%    99,79%    99,83%    99,92%    100%
B&F hit rate             92,76%    93,74%    90,17%    89,38%    87,10%
Mean error if success    6,39      5,71      5,62      5,99      6,03
Mean error               6,79      6,20      6,39      6,52      6,77
Median error             4,88      4,57      4,63      5,22      5,80

Fig. 7 Distribution of the errors for each one of the 5 result sets (V1 to V5).

All five result sets achieved 100% success in estimating the correct building, and an overall performance between 87,1% and 93,7% when considering the floor success rate.

The mean error, already considering the building and floor penalties, varied between 6,20 and 6,79 meters. These results, in particular regarding the mean error, are better than those achieved with the validation dataset. Also, while Approach 1, Variant 3 produced the best result (mean error) with the validation dataset, Variant 2 produced the best results with the final test dataset, both for the floor hit rate and for the mean error. The results achieved with the 5 attempts, while different, mainly in the floor hit rate, are very similar, as shown in Figure 7.

Regarding the competition, these results were the best among the four competitors. Table XI summarizes the best results achieved by the four teams that participated in the competition. RTLS@UM achieved the best results in the median error and mean error metrics. The second best result was achieved by the ICSL team, with a mean error of 7,7 meters. Among these results, the floor hit rate achieved by the HFTS team is remarkable.

TABLE XI. BEST RESULTS ACHIEVED BY THE DIFFERENT TEAMS.
                     MOSAIC    HFTS      RTLS@UM    ICSL
Building hit rate    98,65%    100%      100%       100%
Floor hit rate       93,86%    96,25%    93,74%     86,93%
Mean error           11,64     8,49      6,20       7,67
Median error         6,7       7,0       4,6        5,9

VI. CONCLUSIONS AND FUTURE WORK

This paper described the participation of the RTLS@UM team in the EvAAL/IPIN 2015 competition, starting with the analysis of the provided datasets, followed by the presentation of our approach and estimation algorithms, and completed with the results obtained with the validation and final test datasets.

Now that the competition has ended, it is clear to our team that this was a very enriching experience. Comparing our techniques to those of other research groups, under exactly the same conditions, and based on the largest datasets available for this purpose, improved our competencies in the field. Moreover, the diversity of techniques proposed by the four teams, all significantly different, suggests that a hybrid technique might lead to significant improvements in Wi-Fi fingerprinting positioning.

The way our team addressed this challenge called for the use of several complementary techniques. In the end, it is clear to us that other combinations of these techniques need to be explored, such as combining the filtering and ranking techniques for building and floor estimation with the WKNN technique for estimating the coordinates.

From the obtained results, it is also clear that floor detection is one of the major challenges in Wi-Fi fingerprinting. The results obtained by the HFTS team are very promising and deserve further research.

In the particular case of these datasets, more could have been done about building the radio map, including filtering out the invalid samples and the strange RSSI values. A study of the impact of these samples and values on the overall performance of the estimation algorithms should be done.
For future editions of this competition, or for other similar competitions, it would also be positive to include, in each sample, information about the specific user that collected the data samples. This would enable the use of estimation algorithms that exploit historical information.

ACKNOWLEDGEMENTS

Research group supported by FEDER Funds through the COMPETE and National Funds through FCT Fundação para a Ciência e a Tecnologia under the project PEst-UID/CEC/00319/2013.
