Wi-Fi Fingerprinting in the Real World - RTLS@UM at the EvAAL Competition
All content following this page was uploaded by Filipe Meneses on 15 January 2020.
Abstract - Research and development around indoor positioning and navigation is capturing the attention of an increasing number of research groups and labs around the world. Among the several techniques being proposed for indoor positioning, solutions based on Wi-Fi fingerprinting are the most popular since they exploit existing WLAN infrastructures to support software-only positioning, tracking and navigation applications. Despite the enormous research efforts in this domain, and despite the existence of some commercial products based on Wi-Fi fingerprinting, it is still difficult to compare the performance, in the real world, of the several existing solutions. The EvAAL competition, hosted by the IPIN 2015 conference, contributed to filling this gap. This paper describes the experience of the RTLS@UM team in participating in track 3 of that competition.

Keywords - indoor positioning; Wi-Fi fingerprinting; competition; benchmarking

I. INTRODUCTION

In the old days, we used to think of location mostly as something that expresses a person's location as an address or as a pair of coordinates in a well-known reference system. It was used mostly to locate (in an absolute reference system such as, for example, a postal address) and to guide people (for example, using a GPS receiver). Today, location information has a broader usage and is considered an important feature. Many applications use location information to increase the quality of the information provided to the end users, personalizing content and providing access to location-based information.

Outdoors, the usage of a GPS receiver has become almost a standard since it is available worldwide, it is free for the users and it provides the location in an absolute and universal reference system. Acquiring the location of a user or a device indoors is a much more challenging task due to the absence of a universal solution.

In recent years, many researchers have searched for adequate solutions to implement indoor positioning systems. Current research challenges include, among other topics, a standard representation for indoor maps, a universal reference system to represent the indoor space (for example, as coordinates or as a symbolic space created by a set of divisions in different floors and buildings), and a universal way to acquire the location with high accuracy.

Many different technologies have been used to build indoor positioning solutions, including short-range beacons, optical communications, inertial systems, etc. Wi-Fi fingerprinting is one of the most popular since Wi-Fi networks are, nowadays, installed almost everywhere, therefore supporting the deployment of indoor location systems in buildings without the need for specific infrastructures.

Wi-Fi fingerprinting is based on measuring the intensity of the received radio signals (RSSI - Received Signal Strength Indicator) of the Access Points that are available in a place (fingerprint) and on comparing it with a previously built radio map [1]. The radio map is a database that contains a list of fingerprints and the corresponding real locations. The radio map is often built by hand, by manually collecting and annotating fingerprints in all the spaces that are covered by the indoor positioning system. The location of a device is determined by computing the similarity between a fingerprint collected by the device (online phase) and the fingerprints contained in the radio map (offline phase).

Due to its infrastructure-free characteristics, the research community is intensively studying Wi-Fi fingerprinting, and a large number of solutions have been proposed to solve, or minimize, some of the problems associated with this technique. Among those problems are the large effort (time and human effort) required to build and maintain high quality radio maps for large spaces, the way different devices perceive the radio signals, the effects of multipath and fading, the dynamic nature of the spaces with frequent layout changes and people moving around, the lack of indoor maps, and the limited precision and/or accuracy of the positioning estimation algorithms. Despite the huge effort made over the last few years, and despite the significant improvements observed in minimizing some of the issues identified above, it continues to be difficult to compare the performance of different solutions from different research teams.

This paper describes the experience of the RTLS@UM team in the EvAAL competition, where a set of participants subjected their Wi-Fi fingerprinting solutions to a competitive benchmarking test. The EvAAL competition, its aim and general rules are described next. Section II introduces the datasets that supported track 3 of the competition and presents their analysis. The positioning estimation approaches used by our team are described in section III, followed by a discussion about how the most relevant parameters were adjusted for the competition. Section V presents the final results obtained by our team in the competition, as well as a summary of the other competitors' results.
978-1-4673-8402-5/15/$31.00 ©2015 IEEE
2015 International Conference on Indoor Positioning and Indoor Navigation (IPIN), 13-16 October 2015, Banff, Alberta, Canada
A. The EvAAL competition at IPIN 2015

Evaluating Ambient Assisted Living (AAL) systems is a challenge due to the complexity of such systems and the variety of solutions adopted. EvAAL (Evaluating AAL Systems Through Competitive Benchmarking) is a research program created to evaluate pervasive and ubiquitous systems by comparing working AAL/AmI (Ambient Intelligence) technology solutions in a controlled environment. EvAAL is supported by the AALOA association [2] and its first edition was organized in 2011 by the universAAL project [3].

The 2015 edition of the EvAAL competition was hosted by IPIN'2015 (International Conference on Indoor Positioning and Indoor Navigation), and it consisted of two on-site tracks and one off-site track [4]. In track 1 (on-site), “Smartphone based positioning”, competitors could use any sensor available on smartphones to accurately estimate their position online inside a large, public indoor area. In track 2 (also on-site), “Foot-mounted pedestrian dead reckoning positioning”, competitors could use MEMS sensors (inertial, compass and pressure sensors) mounted on the feet to locate the user inside the same large indoor area. In the off-site track 3, entitled “Wi-Fi fingerprinting in large environments”, the competitors had access to a large Wi-Fi fingerprinting database to which they could apply their algorithms to estimate the real position associated with a set of previously collected fingerprints. In this track, all the competitors were required to apply their algorithms to the same database, therefore enabling direct comparison of the achieved precision and accuracy.

Track 3 of the competition uses the multi-building and multi-floor UJIIndoorLoc Database [5] to compare indoor location methodologies [4]. The participants had access to three different datasets: one for training, one for validation of their solutions (it included the real position of the device and thus it was possible to compare the estimated position with the ground truth), and a final testing database containing a set of fingerprints collected at unknown locations, used for comparing the performance of the solutions of the different competitors.

B. Names and conventions

Throughout this paper, the following names and conventions are used:

T – The training dataset
V – The validation dataset
U – The final testing dataset
R – The radio map used for positioning estimation
fp_i – Denotes fingerprint i of a radio map
fp_0 – Denotes a test fingerprint (unknown position)
AP_in – Denotes the nth strongest AP in fingerprint fp_i
rssi_ij – Denotes the RSSI value of the ith AP in fp_j
k – The number of neighbours in k-nearest neighbours approaches
p – The estimated position (pair of coordinates)
f – The estimated floor
b – The estimated building

II. THE DATASETS

Track 3 of the EvAAL competition is based on the processing of real world data provided by the organizers in the form of three datasets: a training dataset (T), a validation dataset (V) and a final test dataset (U). The first two datasets (T and V) are publicly available from the UJIIndoorLoc Database [5] and their characteristics are described in detail in [6]. The final test database (U) was distributed through e-mail to the competition participants one and a half months before the deadline for submitting the results (see section V). The basic characteristics of these databases, as described in [6], are summarized next, along with new information that we extracted from the databases and considered relevant for the competition.

A. Dataset description and statistics

The data in the training, validation and final test databases were collected in three buildings of the University Jaume I, Spain, with each building having 4 or 5 floors [6]. Each record (sample) is described by a vector, where the first 520 dimensions represent the measured RSSI values of the visible Wi-Fi Access Points (AP), and the remaining 9 dimensions represent the coordinates, floor, building, space, relative position, user ID, device ID and timestamp, that is, where, by whom and when the sample was collected.

Table I shows a summary of the most relevant characteristics of each one of the three datasets. It must be emphasized that the validation dataset (V) was provided to enable competitors to tune their position estimation algorithms, and that the final test dataset (U) was the one for which the competitors had to estimate the true position of the users. Therefore, the samples in the final test dataset naturally include much less information than those in the other two datasets; for example, they omit the building, floor and coordinates associated with each fingerprint.

There are, however, some data missing from the validation and final test datasets that would help in estimating the users' positions without making the process unrealistic, such as the information about the users (see section II-B). It is also strange that the validation dataset only refers to 13 distinct places.

Regarding the devices used for collecting the data, 9 of the 11 devices used to collect data for the validation dataset are new compared to the training dataset, while 5 of the 7 devices used to collect data for the final test dataset are also new.

TABLE I. MAIN CHARACTERISTICS OF THE THREE DATASETS.
                     Training   Validation   Final test
Samples              19 937     1 111        5 179
Distinct users       18         NA           NA
Distinct devices     16         11           7
Distinct buildings   3          3            NA
Distinct floors      13         13           NA
Distinct places      735        13           NA
Distinct positions   933        1 074        NA
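The record layout given in the dataset description (520 RSSI values followed by 9 attributes) can be illustrated with a minimal sketch. The attribute names, their order, the `NOT_DETECTED` marker value and the synthetic record are assumptions made for illustration only; they are not taken from the datasets themselves.

```python
N_APS = 520  # RSSI dimensions per record, as stated in the text

# Illustrative labels for the 9 trailing dimensions; the names and their
# order are assumptions that follow the prose description.
ATTRIBUTES = ["x", "y", "floor", "building", "space",
              "relative_position", "user_id", "device_id", "timestamp"]

NOT_DETECTED = 100  # assumed marker for an AP that was not heard


def split_record(record):
    """Split one 529-value record into (rssi_values, attributes dict)."""
    expected = N_APS + len(ATTRIBUTES)
    if len(record) != expected:
        raise ValueError(f"expected {expected} values, got {len(record)}")
    rssi = list(record[:N_APS])
    attrs = dict(zip(ATTRIBUTES, record[N_APS:]))
    return rssi, attrs


# Synthetic record: three audible APs and made-up attribute values.
record = [NOT_DETECTED] * N_APS + [12.5, 48.0, 2, 1, 106, 2, 7, 20, 1371713733]
record[0], record[7], record[519] = -60, -75, -90

rssi, attrs = split_record(record)
visible = [i for i, v in enumerate(rssi) if v != NOT_DETECTED]
```

In this sketch `visible` lists the indices of the APs that were actually heard, which is the part of the vector that carries the fingerprint.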
Data for the three datasets were collected over different periods of time. For the training dataset, users collected data on 6 days within the period running from 30 May to 20 June 2013. For the validation dataset, users collected data on 9 days within the period running from 19 September to 8 October 2013. For the final test dataset, users collected data on 4 days divided into two periods: 2 days around the end of November 2013, and 2 days at the end of March 2015. This means that some of the testing data was collected more than 19 months after the data collected for the radio map (training data). The collection periods and the distribution of the collected samples per day are shown in Table II and Figure 1, respectively.
TABLE II. OTHER CHARACTERISTICS OF THE THREE DATASETS.
                                         Training      Validation    Final test
Collection period (from)                 30 May 2013   19 Sep 2013   29 Nov 2013
Collection period (to)                   20 Jun 2013   8 Oct 2013    31 Mar 2015
Number of days                           6             9             4
Number of observed APs                   465           367           270
APs common to training                   -             312           246
New APs, not observed in training        -             55            24
New APs, not in training or validation   -             -             0
Invalid samples                          76            0             0
Strange samples (RSSI > -15 dBm)         291           0             0

The number of observed Access Points is also different across the three datasets: 465 APs in the training dataset, 367 in the validation dataset, and 270 in the final test dataset. We also observed that the final test dataset includes 24 APs that are not observed in the training dataset (see Table II). On the other hand, all the APs observed in the final test dataset were observed in at least one of the training or validation datasets. This information was used later in our approach to support the building of the radio map, in order to maximize the position estimation performance (see section III-A).

We also observed that the training dataset includes 76 samples where not a single AP was observed. These “invalid” samples are useless and can be removed from the radio map. No invalid samples were detected in the validation and final test datasets.

Across the training dataset we found a set of 291 samples with strange RSSI values (values higher than -15 dBm, some equal to 0 dBm). The authors of the dataset refer to this characteristic of the data in [6], but its causes were not identified.

The analysis of the training and validation datasets also suggests that some APs were relocated during the data collection process, or that some observations of mobile hotspots were also included in the samples. This suspicion is supported by the fact that some APs were observed in locations that are too far apart. As shown in Fig. 2, there are 13 APs that were observed in locations more than 200 meters apart. Since one part of the final test dataset was collected more than 19 months later than the training dataset, it is possible that many more APs were relocated during that period. However, that cannot be confirmed since, naturally, the final test dataset does not include the coordinates of the sampling points.

Fig. 2 Maximum distance between the oldest sampling point and any subsequent sampling point where the same AP was observed.
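The two data-quality checks described above (samples with no AP observed, and samples with RSSI values above -15 dBm) can be sketched as follows. The `NOT_DETECTED` marker value and the toy samples are assumptions for illustration; only the -15 dBm threshold comes from the text.

```python
NOT_DETECTED = 100           # assumed marker for "AP not observed"
STRANGE_THRESHOLD_DBM = -15  # threshold quoted in the text


def classify_sample(rssi_values):
    """Label a fingerprint as 'invalid', 'strange' or 'ok'."""
    observed = [v for v in rssi_values if v != NOT_DETECTED]
    if not observed:
        return "invalid"   # not a single AP was observed
    if any(v > STRANGE_THRESHOLD_DBM for v in observed):
        return "strange"   # implausibly strong reading, e.g. 0 dBm
    return "ok"


# Made-up fingerprints over four APs.
samples = [
    [100, 100, 100, 100],   # no AP heard
    [-60, 100, -82, 100],   # ordinary fingerprint
    [0, -70, 100, 100],     # contains a 0 dBm reading
    [-14, 100, 100, -90],   # contains a reading above -15 dBm
]
labels = [classify_sample(s) for s in samples]
counts = {label: labels.count(label) for label in ("invalid", "strange", "ok")}
```

Counting the labels over a whole dataset would give figures comparable to the "Invalid samples" and "Strange samples" rows of Table II, provided the marker assumption holds for the data at hand.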
Fig. 1 Number of samples collected per day for the three datasets.

B. EvAAL datasets vs. real datasets

As has been detailed in the previous section, each record of the EvAAL final testing dataset U contains the RSSI values of the observed APs, the device ID and a timestamp. Its 5179 records are associated with seven different device IDs.

Initially, our expectation was that the device ID field could identify the unique user carrying the device to locate. A simple analysis was conducted on the validation dataset V to validate this assumption. Samples were separated into vectors, one for each device ID, and the vectors were sorted in ascending order of the timestamp. Most of the vectors reveal only one valid time sequence, that is, a trajectory that is compatible with the path of a pedestrian. In some cases more than one sequence can be identified, separated by a time interval greater than 3
minutes. However, it was possible to observe some samples with the same device ID and timestamp in very different locations. This only occurred with device ID 20. To illustrate this situation, the sequence of speed values, computed from consecutive samples, was built. The histogram of those speed values, for the samples associated with device ID 20, is shown in Fig. 3. Clearly, there are values that far exceed the normal speed of a pedestrian.

Fig. 3 Histogram of estimated speed values for device ID 20 (final test dataset – U).

Based on this analysis we conclude that the device ID field included in all EvAAL datasets does not uniquely identify a user, but instead the type of device. It is, therefore, quite likely that more than one user collected data, simultaneously, using similar devices. That is, the device ID field in the datasets maps directly to the hardware model and software version of the devices used in the data collection process, but not to a specific device, and there is definitely no guarantee that only one device of each type was in use at any given time.

This fact represented a major inconvenience for us as some of our location estimation algorithms use historical information to improve their accuracy. If a user is at a given position at a given instant of time, it is possible to compute the maximum displacement in the next time interval by considering his/her velocity and direction. One simple way is to consider a predefined maximum speed for a pedestrian inside a building, in all directions, as a theoretical limitation. However, a more complex estimation of velocity and direction can be derived from known previous positions.

The software we used in the competition already includes algorithms that use historical information. These algorithms are based on PKNN (Predicted K Nearest Neighbours) [7]. PKNN uses recent past information about users to improve the accuracy of the localization algorithm. The main idea is based on the assumption that a user cannot travel a large distance in a short time. Thus, firstly, a maximum distance allowed in a certain time interval is defined. When the distance between the position calculated by the localization algorithm and the position of the same device in the previous time instant (last known position) exceeds the maximum distance allowed, the algorithm calculates a next possible position for the device based on the movement observed across its previous positions. This position is called Next Possible Position and is used to correct the position initially computed by the location algorithm.

Since the analysis showed that we do not have access to the identification of unique users in the EvAAL datasets, we could not use all our algorithms, particularly those that use historical information. We argue that a realistic dataset includes information about the target device, like a serial number or MAC address, therefore enabling the use of historical data to improve the positioning accuracy.

III. POSITION ESTIMATION

For the participation in the EvAAL/IPIN 2015 competition, the RTLS@UM team adopted an approach that encompasses a process to build the radio map and two methods to estimate the positions associated with each one of the samples in the final test dataset (each team was allowed to submit up to 5 attempts with the estimated positions – see section V).

For building the radio map, minimal processing of the training and validation datasets was performed, mainly aimed at reducing the consequences of using multiple devices to collect the data.

For position estimation, two alternative approaches were used: (i) a hierarchical approach, based on filtering, majority rules and k-Nearest Neighbour estimation, where the building, floor, and coordinates are estimated one at a time; (ii) a “flat” approach based on Weighted k-Nearest Neighbour estimation. The first approach, with three variants, was used to generate 3 of the 5 final attempts, while the second approach, with different values for one parameter, was used to generate the other 2 final attempts.

A. Building the radio map

The process to build the radio map exploits the analysis described in section II, and tries to overcome the limitations of the training dataset and the problem associated with the multitude of devices that were used to collect the samples [8].

Since the final test dataset (U) includes samples where new APs are observed, compared to the set of APs observed in the training dataset (see Table II), the final radio map (R) used for estimating the positions associated with the final test samples was built by joining the samples of the training and validation datasets: R = T ∪ V. This decision involved some risk, since we could not compare the results obtained through the use of this joint radio map with the results obtained by using the training dataset alone. Moreover, for tuning our positioning estimation algorithms, we had to use a radio map based solely on the training dataset. Invalid samples and samples with strange RSSI values were not removed.

In order to deal with the diversity of devices that were used to collect the samples, we implemented a solution inspired by the method proposed by Laoudias et al. in [9]. The basic idea proposed in [9] is to compute the histogram of all the RSSI values observed by a particular device, and to fit that histogram to those of the other (similar) devices through a simple shifting operation (translation in the RSSI axis).

In our solution, we designed a device normalization procedure where we started by computing a representative
RSSI value for each device type d, rssi_d, defined as the mean of all the RSSI values collected by those devices. The mean, RSSI_D, of all representative RSSI values was then used as a reference value to correct all RSSI values from all devices by shifting them by the quantity RSSI_D - rssi_d. After correction, the representative RSSI values for all devices become equal.

The above procedure was tested while tuning our positioning estimation algorithms (approach 1), where the training dataset (T) was used as the radio map, and the validation dataset (V) was used for testing. The obtained results (see section IV) show a marginal gain of this procedure in reducing the impact of using different devices to collect the samples.

6. Count the number of samples, from within the k_1, associated with each floor, and set f to the most frequent floor (majority rule).

In step 4 above, the similarity function S() is the Manhattan distance defined as:

$$S(fp_1, fp_2) = \frac{1}{N} \times \sum_{i=1}^{N} \left| rssi_{i1} - rssi_{i2} \right| - 2 \times C \qquad (1)$$

where N is the total number of APs observed in fp_1 and/or fp_2, and C is the number of APs that were observed in both fp_1 and fp_2 (common APs). For missing APs, in fp_1 or fp_2, a default RSSI value was used.
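The device normalization, the similarity function of equation (1) and the majority rule of step 6 can be sketched together as below. This is an illustrative sketch, not the team's implementation: the sample layout `(device_id, floor, fingerprint)`, the function names, the default RSSI of -100 dBm and all data values are assumptions.

```python
from collections import Counter
from statistics import mean

DEFAULT_RSSI = -100  # assumed default for an AP missing from one fingerprint


def normalize_devices(samples):
    """Shift RSSI values so every device type ends up with the same mean.

    samples: list of (device_id, floor, fingerprint), fingerprint = {ap: rssi}.
    """
    per_device = {}
    for device, _, fp in samples:
        per_device.setdefault(device, []).extend(fp.values())
    # rssi_d: representative (mean) RSSI of each device type
    rssi_d = {d: mean(v) for d, v in per_device.items() if v}
    rssi_ref = mean(list(rssi_d.values()))  # RSSI_D: mean of the representatives
    return [
        (d, floor, {ap: v + (rssi_ref - rssi_d[d]) for ap, v in fp.items()})
        for d, floor, fp in samples
    ]


def similarity(fp1, fp2, default=DEFAULT_RSSI):
    """Equation (1); lower values mean more similar fingerprints."""
    all_aps = set(fp1) | set(fp2)   # N is the size of this set
    common = set(fp1) & set(fp2)    # C is the size of this set
    if not all_aps:
        raise ValueError("both fingerprints are empty")
    total = sum(abs(fp1.get(ap, default) - fp2.get(ap, default)) for ap in all_aps)
    return total / len(all_aps) - 2 * len(common)


def majority_floor(fp0, radio_map, k1):
    """Step 6: most frequent floor among the k1 most similar fingerprints."""
    ranked = sorted(radio_map, key=lambda s: similarity(fp0, s[2]))
    return Counter(floor for _, floor, _ in ranked[:k1]).most_common(1)[0][0]


# Two made-up device types whose readings differ by a constant 10 dB offset.
raw = [("A", 1, {"x": -50, "y": -70}), ("B", 1, {"x": -60, "y": -80})]
normalized = normalize_devices(raw)
score = similarity(normalized[0][2], normalized[1][2])

# Made-up radio map used only to exercise the majority rule.
radio_map = [
    ("A", 1, {"x": -52, "y": -63}),
    ("A", 1, {"x": -55, "y": -58, "z": -90}),
    ("A", 2, {"y": -85, "z": -40}),
    ("A", 2, {"z": -45}),
]
floor = majority_floor({"x": -50, "y": -60}, radio_map, k1=3)
```

After normalization the two toy fingerprints coincide, so the Manhattan term vanishes and only the common-AP bonus remains in `score`.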
The location estimation algorithm has two phases. In the first phase (see Algorithm I), the similarity between the input fingerprint fp_0 and all the fingerprints in the map M is

Return smallest k values of N

In fact the Euclidean distances between fingerprints were already computed in the first phase, and the k-nearest neighbours used in this phase are already sorted by distance. The algorithm needs only to compute the weighting factors and

ALGORITHM II - WEIGHTED CENTROID OF K NEAREST REFERENCE POINTS.
Input: fp_0
Input: N_0 … N_k-1, the k nearest neighbors
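A two-phase weighted k-nearest-neighbour estimate of the kind outlined above can be sketched as follows. The surviving text does not state the weighting factors, so the inverse-distance weights used here are an assumption, as are the function names, the -100 dBm stand-in for missing APs and the toy radio map.

```python
import math

MISSING_RSSI = -100  # assumed stand-in for an AP that was not heard


def k_nearest(fp0, radio_map, k):
    """Phase 1: return the k (distance, position) pairs closest to fp0."""
    scored = []
    for fp, position in radio_map:
        aps = set(fp0) | set(fp)
        d = math.sqrt(sum(
            (fp0.get(ap, MISSING_RSSI) - fp.get(ap, MISSING_RSSI)) ** 2
            for ap in aps
        ))
        scored.append((d, position))
    scored.sort(key=lambda pair: pair[0])  # nearest first
    return scored[:k]


def weighted_centroid(neighbours, eps=1e-6):
    """Phase 2: centroid of the neighbour positions, weighted by 1 / distance.

    The inverse-distance weights are an assumption of this sketch.
    """
    weights = [1.0 / (d + eps) for d, _ in neighbours]
    total = sum(weights)
    x = sum(w * pos[0] for w, (_, pos) in zip(weights, neighbours)) / total
    y = sum(w * pos[1] for w, (_, pos) in zip(weights, neighbours)) / total
    return x, y


# Made-up radio map: (fingerprint, (x, y) position in meters).
radio_map = [
    ({"a": -50}, (0.0, 0.0)),
    ({"a": -60}, (10.0, 0.0)),
    ({"a": -90}, (100.0, 50.0)),
]
neighbours = k_nearest({"a": -55}, radio_map, k=2)
estimate = weighted_centroid(neighbours)
```

Because the test fingerprint is equally distant from the first two reference points, the estimate falls midway between them; the second phase reuses the distances already computed in the first.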
The mean error, already considering the building and floor penalties, varied between 6,20 and 6,79 meters. These results, in particular the mean error, are better than those achieved with the validation dataset. Also, while Approach 1 – Variant 3 produced the best result (mean error) with the validation dataset, Variant 2 produced the best results with the final test dataset, both for the floor hit rate and for the mean error. The results achieved with the 5 attempts, while different, mainly in the floor hit rate, are very similar, as shown in Figure 7.

Regarding the competition, these results were the best among the four competitors. Table XI summarizes the best results achieved by the four teams that participated in the competition. RTLS@UM achieved the best results in two metrics: median error and mean error. The second best result was achieved by the ICSL team, with a mean error of 7,7 meters. Among these results, the floor hit rate achieved by the HFTS team is remarkable.

TABLE XI. BEST RESULTS ACHIEVED BY THE DIFFERENT TEAMS.
                    MOSAIC    HFTS      RTLS@UM   ICSL
Building hit rate   98,65%    100%      100%      100%
Floor hit rate      93,86%    96,25%    93,74%    86,93%
Mean error (m)      11,64     8,49      6,20      7,67
Median error (m)    6,7       7,0       4,6       5,9

From the obtained results, it is also clear that floor detection is one of the major challenges in Wi-Fi fingerprinting. The results obtained by the HFTS team are very promising and deserve further research.

In the particular case of these datasets, more could have been done when building the radio map, including filtering out the invalid samples and the strange RSSI values. A study of the impact of these samples and values on the overall performance of the estimation algorithms should be carried out.
For future editions of this competition, or for other similar competitions, it would also be positive to include, in each sample, information about the specific user that collected the data samples. This would enable the use of estimation algorithms that exploit historical information.

ACKNOWLEDGEMENTS

Research group supported by FEDER Funds through the COMPETE and National Funds through FCT (Fundação para a Ciência e a Tecnologia) under the project PEst-UID/CEC/00319/2013.

REFERENCES

[1] P. Bahl and V. N. Padmanabhan, "RADAR: An in-building RF-based user location and tracking system," in Proc. IEEE INFOCOM 2000, Nineteenth Annual Joint Conference of the IEEE Computer and Communications Societies, Tel Aviv, Israel, pp. 775-784, 2000.
[2] AALOA Association (http://aaloa.org)
[3] universAAL project website (http://universaal.org/index.php/en/)
[4] EvAAL Competition (http://evaal.aaloa.org)
[5] UJIIndoorLoc DB (http://www.geotec.uji.es/ujiindoorloc-database/)
[6] J. Torres-Sospedra, R. Montoliu, A. Martínez-Usó, T. J. Arnau, J. P. Avariento, M. Benedito-Bordonau, and J. Huerta, "UJIIndoorLoc: A New Multi-building and Multi-floor Database for WLAN Fingerprint-based Indoor Localization Problems," in Proceedings of the Fifth International Conference on Indoor Positioning and Indoor Navigation, 2014.
[7] B. Li, J. Salter, A. G. Dempster, and C. Rizos, "Indoor positioning techniques based on wireless LAN," School of Surveying and Spatial Information Systems, UNSW, Sydney, Australia, Tech. Rep., 2006.
[8] G. Lui, T. Gallagher, B. Li, A. G. Dempster, and C. Rizos, "Differences in RSSI readings made by different Wi-Fi chipsets: A limitation of WLAN localization," in Localization and GNSS (ICL-GNSS), 2011 International Conference on, pp. 53-57, 2011.
[9] C. Laoudias, R. Piche, and C. G. Panayiotou, "Device Self-Calibration in Location Systems using Signal Strength Histograms," Journal of Location Based Services, 7(3), pp. 165-181, 2013.
[10] N. Marques, F. Meneses, and A. Moreira, "Combining similarity functions and majority rules for multi-building, multi-floor, WiFi Positioning," in IPIN 2012, pp. 1-9, 2012.
[11] S. Khodayari, M. Maleki, and E. Hamedi, "A rss-based fingerprinting method for positioning based on historical data," in Performance Evaluation of Computer and Telecommunication Systems (SPECTS), 2010 International Symposium on, pp. 306-310, 2010.