Commit 32a2451

Merge branch 'wbISO3' of https://github.com/jnmclarty/pandas into jnmclarty-wbISO3
Conflicts: doc/source/whatsnew/v0.15.1.txt
2 parents f2c9390 + 6a7ff40 commit 32a2451

4 files changed, +294 -70 lines changed

doc/source/remote_data.rst

Lines changed: 59 additions & 0 deletions
@@ -143,6 +143,12 @@ World Bank
 `World Bank's World Development Indicators <http://data.worldbank.org>`__
 by using the ``wb`` I/O functions.
 
+Indicators
+~~~~~~~~~~
+
+Every World Bank indicator is accessible, either by exploring the World Bank site
+or by using the included ``search`` function.
+
 For example, if you wanted to compare the Gross Domestic Products per capita in
 constant dollars in North America, you would use the ``search`` function:
 
@@ -254,3 +260,56 @@ populations in rich countries tend to use cellphones at a higher rate:
 Skew: -2.314 Prob(JB): 1.35e-26
 Kurtosis: 11.077 Cond. No. 45.8
 ==============================================================================
+
+Country Codes
+~~~~~~~~~~~~~
+
+.. versionadded:: 0.15.1
+
+The ``country`` argument accepts a string or list of mixed
+`two <http://en.wikipedia.org/wiki/ISO_3166-1_alpha-2>`__ or `three <http://en.wikipedia.org/wiki/ISO_3166-1_alpha-3>`__ character
+ISO country codes, as well as dynamic `World Bank exceptions <http://data.worldbank.org/node/18>`__ to the ISO standards.
+
+For a list of the hard-coded country codes (used solely for error handling logic) see ``pandas.io.wb.country_codes``.
+
+Problematic Country Codes & Indicators
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. note::
+
+   The World Bank's country list and indicators are dynamic. As of 0.15.1,
+   :func:`wb.download()` is more flexible. To achieve this, the warning
+   and exception logic changed.
+
+The World Bank converts some country codes
+in its response, which makes error checking by pandas difficult.
+Retired indicators still persist in the search.
+
+Given the new flexibility of 0.15.1, improved error handling by the user
+may be necessary for fringe cases.
+
+To help identify issues:
+
+There are at least 4 kinds of country codes:
+
+1. Standard (2/3 character ISO) - returns data, will warn and error properly.
+2. Non-standard (WB Exceptions) - returns data, but will falsely warn.
+3. Blank - silently missing from the response.
+4. Bad - causes the entire response from WB to fail, always exception inducing.
+
+There are at least 3 kinds of indicators:
+
+1. Current - Returns data.
+2. Retired - Appears in search results, yet won't return data.
+3. Bad - Will not return data.
+
+Use the ``errors`` argument to control warnings and exceptions. Setting
+``errors`` to ``ignore`` or ``warn`` won't stop failed responses (i.e., 100% bad
+indicators, or a single "bad" (#4 above) country code).
+
+See docstrings for more info.
+
+
+
+
+
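
Note on the ``errors`` argument: the documentation in this hunk says that ``errors`` controls warnings and exceptions, and that unrecognised codes may still be valid World Bank exceptions. A rough sketch of that policy is below. It is not the code in ``pandas/io/wb.py``; the helper ``report_bad_codes`` and the ``KNOWN_CODES`` set are hypothetical, and the ``'raise'`` value is assumed from the "warn/raise" wording in the whatsnew entry.

```python
import warnings

# Hypothetical stand-in for a hard-coded list such as pandas.io.wb.country_codes.
KNOWN_CODES = {"CA", "CAN", "MX", "MEX", "US", "USA"}


def report_bad_codes(codes, errors="warn"):
    """Report unrecognised country codes according to `errors`.

    Assumed policy values: 'ignore', 'warn', 'raise'.
    The codes themselves are always returned unchanged.
    """
    if errors not in ("ignore", "warn", "raise"):
        raise ValueError("errors must be 'ignore', 'warn' or 'raise'")
    # Accept a single string as well as a list, as the docs describe.
    if isinstance(codes, str):
        codes = [codes]
    unknown = [c for c in codes if c.upper() not in KNOWN_CODES]
    if unknown:
        msg = "Unrecognised country codes: {0}".format(", ".join(unknown))
        if errors == "raise":
            raise ValueError(msg)
        if errors == "warn":
            warnings.warn(msg, UserWarning)
    # Unknown codes are passed through: they may be valid WB exceptions
    # (kind 2 in the list above), so only the remote response can decide.
    return list(codes)
```

With this shape, ``errors='ignore'`` silences the local check only; a request that the World Bank rejects outright would still fail later, which matches the caveat in the documentation text.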

doc/source/whatsnew/v0.15.1.txt

Lines changed: 4 additions & 3 deletions
@@ -19,6 +19,7 @@ users upgrade to this version.
 
 API changes
 ~~~~~~~~~~~
+
 - ``groupby`` with ``as_index=False`` will not add erroneous extra columns to
   result (:issue:`8582`):
 
@@ -74,12 +75,13 @@ Enhancements
 ~~~~~~~~~~~~
 
 - Added option to select columns when importing Stata files (:issue:`7935`)
-
 - Qualify memory usage in ``DataFrame.info()`` by adding ``+`` if it is a lower bound (:issue:`8578`)
-
 - Raise errors in certain aggregation cases where an argument such as ``numeric_only`` is not handled (:issue:`8592`).
 
 
+- Added support for 3-character ISO and non-standard country codes in :func:`io.wb.download()` (:issue:`8482`)
+- :ref:`World Bank data requests <remote_data.wb>` now will warn/raise based on an ``errors`` argument, as well as a list of hard-coded country codes and the World Bank's JSON response. In prior versions, the error messages didn't look at the World Bank's JSON response. Problem-inducing inputs were simply dropped prior to the request. The issue was that many good countries were cropped in the hard-coded approach. All countries will work now, but some bad countries will raise exceptions because some edge cases break the entire response.
+
 .. _whatsnew_0151.performance:
 
 Performance
@@ -91,7 +93,6 @@ Performance
 Experimental
 ~~~~~~~~~~~~
 
-
 .. _whatsnew_0151.bug_fixes:
 
 Bug Fixes
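
Note on the JSON response: the whatsnew entry says error reporting now takes the World Bank's JSON response into account. The sketch below shows one way such a response could be inspected. The payload shape (a one-element list holding a ``message`` list of ``key``/``value`` pairs) is an assumption used for illustration and is not taken from this commit; ``extract_wb_messages`` is a hypothetical helper, so check the shape against the live API before relying on it.

```python
import json


def extract_wb_messages(raw):
    """Return error messages found in a WB-style JSON body, or [] if none.

    Assumes an error body looks like
    [{"message": [{"key": ..., "value": ...}]}]; this shape is an assumption.
    """
    payload = json.loads(raw)
    messages = []
    # Only look at the first element, and only if it is a JSON object.
    if isinstance(payload, list) and payload and isinstance(payload[0], dict):
        for item in payload[0].get("message", []):
            messages.append("{0}: {1}".format(item.get("key"), item.get("value")))
    return messages


# Hypothetical bodies: one error response, one normal two-part response.
bad_body = ('[{"message": [{"id": "120", "key": "Invalid value", '
            '"value": "The provided parameter value is not valid"}]}]')
good_body = '[{"page": 1, "pages": 1}, [{"value": "1.0"}]]'

print(extract_wb_messages(bad_body))
print(extract_wb_messages(good_body))
```

A caller could raise ``ValueError`` when the returned list is non-empty, which is the exception type the new tests in this commit catch.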

pandas/io/tests/test_wb.py

Lines changed: 74 additions & 17 deletions
@@ -14,41 +14,98 @@ class TestWB(tm.TestCase):
     @slow
     @network
     def test_wdi_search(self):
-        raise nose.SkipTest
-
-        expected = {u('id'): {2634: u('GDPPCKD'),
-                              4649: u('NY.GDP.PCAP.KD'),
-                              4651: u('NY.GDP.PCAP.KN'),
-                              4653: u('NY.GDP.PCAP.PP.KD')},
-                    u('name'): {2634: u('GDP per Capita, constant US$, '
-                                        'millions'),
-                                4649: u('GDP per capita (constant 2000 US$)'),
-                                4651: u('GDP per capita (constant LCU)'),
-                                4653: u('GDP per capita, PPP (constant 2005 '
+
+        expected = {u('id'): {6716: u('NY.GDP.PCAP.KD'),
+                              6718: u('NY.GDP.PCAP.KN'),
+                              6720: u('NY.GDP.PCAP.PP.KD')},
+                    u('name'): {6716: u('GDP per capita (constant 2005 US$)'),
+                                6718: u('GDP per capita (constant LCU)'),
+                                6720: u('GDP per capita, PPP (constant 2011 '
                                         'international $)')}}
-        result = search('gdp.*capita.*constant').ix[:, :2]
+        result = search('gdp.*capita.*constant').loc[6716:,['id','name']]
         expected = pandas.DataFrame(expected)
         expected.index = result.index
         assert_frame_equal(result, expected)
 
     @slow
     @network
     def test_wdi_download(self):
-        raise nose.SkipTest
 
-        expected = {'GDPPCKN': {(u('United States'), u('2003')): u('40800.0735367688'), (u('Canada'), u('2004')): u('37857.1261134552'), (u('United States'), u('2005')): u('42714.8594790102'), (u('Canada'), u('2003')): u('37081.4575704003'), (u('United States'), u('2004')): u('41826.1728310667'), (u('Mexico'), u('2003')): u('72720.0691255285'), (u('Mexico'), u('2004')): u('74751.6003347038'), (u('Mexico'), u('2005')): u('76200.2154469437'), (u('Canada'), u('2005')): u('38617.4563629611')}, 'GDPPCKD': {(u('United States'), u('2003')): u('40800.0735367688'), (u('Canada'), u('2004')): u('34397.055116118'), (u('United States'), u('2005')): u('42714.8594790102'), (u('Canada'), u('2003')): u('33692.2812368928'), (u('United States'), u('2004')): u('41826.1728310667'), (u('Mexico'), u('2003')): u('7608.43848670658'), (u('Mexico'), u('2004')): u('7820.99026814334'), (u('Mexico'), u('2005')): u('7972.55364129367'), (u('Canada'), u('2005')): u('35087.8925933298')}}
+        # Test a bad indicator with double (US), triple (USA),
+        # standard (CA, MX), non standard (KSV),
+        # duplicated (US, US, USA), and unknown (BLA) country codes
+
+        # ...but NOT a crash inducing country code (World bank strips pandas
+        # users of the luxury of laziness, because they create their
+        # own exceptions, and don't clean up legacy country codes.
+        # ...but NOT a retired indicator (User should want it to error.)
+
+        cntry_codes = ['CA', 'MX', 'USA', 'US', 'US', 'KSV', 'BLA']
+        inds = ['NY.GDP.PCAP.CD','BAD.INDICATOR']
+
+        expected = {'NY.GDP.PCAP.CD': {('Canada', '2003'): 28026.006013044702, ('Mexico', '2003'): 6601.0420648056606, ('Canada', '2004'): 31829.522562759001, ('Kosovo', '2003'): 1969.56271307405, ('Mexico', '2004'): 7042.0247834044303, ('United States', '2004'): 41928.886136479705, ('United States', '2003'): 39682.472247320402, ('Kosovo', '2004'): 2135.3328465238301}}
         expected = pandas.DataFrame(expected)
-        result = download(country=['CA', 'MX', 'US', 'junk'], indicator=['GDPPCKD',
-                                   'GDPPCKN', 'junk'], start=2003, end=2005)
+        expected.sort(inplace=True)
+        result = download(country=cntry_codes, indicator=inds,
+                          start=2003, end=2004, errors='ignore')
+        result.sort(inplace=True)
         expected.index = result.index
         assert_frame_equal(result, pandas.DataFrame(expected))
 
+    @slow
+    @network
+    def test_wdi_download_w_retired_indicator(self):
+
+        cntry_codes = ['CA', 'MX', 'US']
+        # Despite showing up in the search feature, and being listed online,
+        # the api calls to GDPPCKD don't work in their own query builder, nor
+        # pandas module. GDPPCKD used to be a common symbol.
+        # This test is written to ensure that error messages to pandas users
+        # continue to make sense, rather than a user getting some missing
+        # key error, cause their JSON message format changed. If
+        # World bank ever finishes the deprecation of this symbol,
+        # this nose test should still pass.
+
+        inds = ['GDPPCKD']
+
+        try:
+            result = download(country=cntry_codes, indicator=inds,
+                              start=2003, end=2004, errors='ignore')
+            # If for some reason result actually ever has data, it's cause WB
+            # fixed the issue with this ticker. Find another bad one.
+        except ValueError as e:
+            raise nose.SkipTest("No indicators returned data: {0}".format(e))
+
+        # if it ever gets here, it means WB unretired the indicator.
+        # even if they dropped it completely, it would still get caught above
+        # or the WB API changed somehow in a really unexpected way.
+        if len(result) > 0:
+            raise nose.SkipTest("Invalid results")
+
+    @slow
+    @network
+    def test_wdi_download_w_crash_inducing_countrycode(self):
+
+        cntry_codes = ['CA', 'MX', 'US', 'XXX']
+        inds = ['NY.GDP.PCAP.CD']
+
+        try:
+            result = download(country=cntry_codes, indicator=inds,
+                              start=2003, end=2004, errors='ignore')
+        except ValueError as e:
+            raise nose.SkipTest("No indicators returned data: {0}".format(e))
+
+        # if it ever gets here, it means the country code XXX got used by WB
+        # or the WB API changed somehow in a really unexpected way.
+        if len(result) > 0:
+            raise nose.SkipTest("Invalid results")
+
     @slow
     @network
     def test_wdi_get_countries(self):
         result = get_countries()
         self.assertTrue('Zimbabwe' in list(result['name']))
-
+        self.assertTrue(len(result) > 100)
 
 if __name__ == '__main__':
     nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'],