
Commit 8ad9598

committed
Merge remote branch 'jreback/dtypes'
* jreback/dtypes:
  ENH: allow propagation and coexistence of numeric dtypes (closes GH pandas-dev#622)
  construction of multi numeric dtypes with other types in a dict
  validated get_numeric_data returns correct dtypes
  added blocks attribute (and as_blocks() method) to DataFrame that returns a dict of dtype -> homogeneous Frame
  added keyword 'raise_on_error' to astype, which can be set to False to exclude non-numeric columns
  fixed merging to correctly merge on multiple dtypes with blocks (e.g. float64 and float32 in other merger)
  changed implementation of get_dtype_counts() to use .blocks
  revised DataFrame.convert_objects to use blocks to be more efficient
  added Dtype printing to show on default with a Series
  added convert_dates='coerce' option to convert_objects, to force conversions to datetime64[ns]
  where can upcast integer to float as needed (on inplace ops, pandas-dev#2793)
  added fully cythonized support for int8/int16
  no support for float16 (it can exist, but no cython methods for it)
2 parents eb505fd + 166a80d commit 8ad9598

37 files changed, +9634 / -3178 lines changed

RELEASE.rst

Lines changed: 38 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -22,6 +22,42 @@ Where to get it
2222
* Binary installers on PyPI: http://pypi.python.org/pypi/pandas
2323
* Documentation: http://pandas.pydata.org
2424

25+
pandas 0.10.2
26+
=============
27+
28+
**Release date:** 2013-??-??
29+
30+
**New features**
31+
32+
- Allow mixed dtypes (e.g. ``float32/float64/int32/int16/int8``) to coexist in DataFrames and propagate in operations
33+
34+
**Improvements to existing features**
35+
36+
- added ``blocks`` attribute to DataFrames, to return a dict of dtypes to homogeneously dtyped DataFrames
37+
- added keyword ``convert_numeric`` to ``convert_objects()`` to try to convert object dtypes to numeric types
38+
- ``convert_dates`` in ``convert_objects`` can now be ``coerce``, which will return a datetime64[ns] dtype
39+
with non-convertibles set as ``NaT``; if every value would become ``NaT`` (e.g. all strings), the original object dtype is preserved
40+
- Series print output now includes the dtype by default
41+
42+
**API Changes**
43+
44+
- Numeric dtypes that are explicitly specified are no longer automatically upcast to ``int64`` or ``float64`` (GH622_ and GH797_)
45+
- Guarantee that ``convert_objects()`` for Series/DataFrame always returns a copy
46+
- groupby operations will respect dtypes for numeric float operations (float32/float64); other types will be operated on,
47+
and will try to cast back to the input dtype (e.g. if an int is passed, as long as the output doesn't have nans,
48+
then an int will be returned)
49+
- backfill/pad/take/diff/ohlc will now support ``float32/int16/int8`` operations
50+
- Integer block types will upcast as needed in where operations (GH2793_)
51+
52+
**Bug Fixes**
53+
54+
- Fix segfault when calling ``fillna`` with ``method='pad'`` or ``method='backfill'`` on an empty DataFrame (GH2778_)
55+
56+
.. _GH622: https://github.com/pydata/pandas/issues/622
57+
.. _GH797: https://github.com/pydata/pandas/issues/797
58+
.. _GH2778: https://github.com/pydata/pandas/issues/2778
59+
.. _GH2793: https://github.com/pydata/pandas/issues/2793
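The release notes above say that backfill/pad now support ``float32/int16/int8`` operations. As a rough illustration of a dtype-preserving forward fill (``pad``), here is a NumPy sketch; ``pad_1d`` is a hypothetical helper, not the cythonized routine added in this commit, and it assumes a 1-D floating-point array.

```python
import numpy as np


def pad_1d(values):
    """Forward-fill NaNs in a 1-D float array, keeping its dtype.

    Hypothetical helper for illustration only.
    """
    positions = np.arange(len(values))
    # For each position, the index of the most recent non-NaN value at or
    # before it. NaN slots contribute 0, so leading NaNs point at position 0
    # (itself NaN) and therefore stay NaN.
    valid = np.where(np.isnan(values), 0, positions)
    last_valid = np.maximum.accumulate(valid)
    # Fancy indexing returns an array of the same dtype as ``values``.
    return values[last_valid]


data = np.array([np.nan, 1.0, np.nan, np.nan, 4.0, np.nan], dtype="float32")
filled = pad_1d(data)
print(filled.dtype)  # float32
print(filled)
```

Because the fill is done by indexing rather than arithmetic, no promotion step is involved and the ``float32`` dtype survives unchanged.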
60+
2561
pandas 0.10.1
2662
=============
2763

@@ -36,6 +72,7 @@ pandas 0.10.1
3672
- Restored inplace=True behavior returning self (same object) with
3773
deprecation warning until 0.11 (GH1893_)
3874
- ``HDFStore``
75+
3976
- refactored HDFStore to deal with non-table stores as objects, will allow future enhancements
4077
- removed keyword ``compression`` from ``put`` (replaced by keyword
4178
``complib`` to be consistent across library)
@@ -49,7 +86,7 @@ pandas 0.10.1
4986
- support data column indexing and selection, via ``data_columns`` keyword in append
5087
- support write chunking to reduce memory footprint, via ``chunksize``
5188
keyword to append
52-
- support automagic indexing via ``index`` keywork to append
89+
- support automagic indexing via ``index`` keyword to append
5390
- support ``expectedrows`` keyword in append to inform ``PyTables`` about
5491
the expected tablesize
5592
- support ``start`` and ``stop`` keywords in select to limit the row

doc/source/dsintro.rst

Lines changed: 90 additions & 24 deletions
@@ -450,15 +450,101 @@ DataFrame:
450450
df.xs('b')
451451
df.ix[2]
452452
453-
Note if a DataFrame contains columns of multiple dtypes, the dtype of the row
454-
will be chosen to accommodate all of the data types (dtype=object is the most
455-
general).
456-
457453
For a more exhaustive treatment of more sophisticated label-based indexing and
458454
slicing, see the :ref:`section on indexing <indexing>`. We will address the
459455
fundamentals of reindexing / conforming to new sets of labels in the
460456
:ref:`section on reindexing <basics.reindexing>`.
461457

458+
DataTypes
459+
~~~~~~~~~
460+
461+
.. _dsintro.column_types:
462+
463+
The main types stored in pandas objects are float, int, boolean, ``datetime64[ns]``,
464+
and object. The convenient ``dtypes`` attribute returns a Series with the data type of
465+
each column.
466+
467+
.. ipython:: python
468+
469+
df['integer'] = 1
470+
df['int32'] = df['integer'].astype('int32')
471+
df['float32'] = Series([1.0]*len(df),dtype='float32')
472+
df['timestamp'] = Timestamp('20010102')
473+
df.dtypes
474+
475+
If a DataFrame contains columns of multiple dtypes, any operation that combines
476+
them into a single array (such as selecting a row or taking ``.values``) will choose a dtype that accommodates all of the data types (``dtype=object`` is the most
477+
general).
478+
479+
The related method ``get_dtype_counts`` will return the number of columns of
480+
each type:
481+
482+
.. ipython:: python
483+
484+
df.get_dtype_counts()
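For intuition only, each column can be modelled as a NumPy array whose dtype name is what ``dtypes`` reports; ``get_dtype_counts`` then amounts to tallying those names. The sketch assumes a plain dict called ``columns`` as a stand-in for a DataFrame (a hypothetical simplification, not pandas' internal layout).

```python
from collections import Counter

import numpy as np

# Hypothetical stand-in for a DataFrame: column name -> 1-D NumPy array.
columns = {
    "integer": np.array([1, 1, 1], dtype="int64"),
    "count": np.array([4, 5, 6], dtype="int64"),
    "int32": np.array([1, 1, 1], dtype="int32"),
    "float32": np.array([1.0, 1.0, 1.0], dtype="float32"),
    "flag": np.array([True, False, True]),
    "timestamp": np.array(["2001-01-02"] * 3, dtype="datetime64[ns]"),
    "label": np.array(["a", "b", "c"], dtype=object),
}

# Analogue of ``df.dtypes``: the dtype name of each column.
dtypes = {name: str(values.dtype) for name, values in columns.items()}

# Analogue of ``df.get_dtype_counts()``: how many columns share each dtype.
dtype_counts = Counter(dtypes.values())

print(dtypes)
print(dict(dtype_counts))
```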
485+
486+
Numeric dtypes will propagate and can coexist in DataFrames (starting in v0.10.2).
487+
If a dtype is passed (either directly via the ``dtype`` keyword, a passed ``ndarray``,
488+
or a passed ``Series``), then it will be preserved in DataFrame operations. Furthermore, different numeric dtypes will **NOT** be combined. The following example will give you a taste.
489+
490+
.. ipython:: python
491+
492+
df1 = DataFrame(randn(8, 1), columns = ['A'], dtype = 'float32')
493+
df1
494+
df1.dtypes
495+
df2 = DataFrame(dict( A = Series(randn(8),dtype='float16'),
496+
B = Series(randn(8)),
497+
C = Series(np.array(randn(8),dtype='uint8')) ))
498+
df2
499+
df2.dtypes
500+
501+
# here you get some upcasting
502+
df3 = df1.reindex_like(df2).fillna(value=0.0) + df2
503+
df3
504+
df3.dtypes
505+
506+
# this is lowest-common-denominator upcasting (meaning you get the dtype which can accommodate all of the types)
507+
df3.values.dtype
508+
509+
Upcasting is always according to the **numpy** rules. If two different dtypes are involved in an operation, then the more *general* one will be used as the result of the operation.
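NumPy exposes these promotion rules directly through ``np.promote_types`` (two dtypes) and ``np.result_type`` (any number of dtypes). The snippet checks a few of the combinations that appear in the example above; it is a sketch for exploring the rules on plain arrays, and pandas operations may add their own handling on top (for instance around missing values).

```python
import numpy as np

# Pairwise promotion: the dtype that can safely hold values of both inputs.
print(np.promote_types("float32", "float16"))  # float32
print(np.promote_types("float32", "uint8"))    # float32
print(np.promote_types("int32", "float32"))    # float64 (float32 cannot hold every int32 exactly)
print(np.promote_types("int16", "float32"))    # float32

# Element-wise arithmetic between arrays follows the same rules.
a = np.zeros(3, dtype="float32")
b = np.zeros(3, dtype="float16")
print((a + b).dtype)  # float32

# Lowest common denominator over many dtypes, similar in spirit to ``.values``.
print(np.result_type("float32", "float16", "float64", "uint8"))  # float64
print(np.result_type("float64", "int64", object))                # object
```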
510+
511+
DataType Conversion
512+
~~~~~~~~~~~~~~~~~~~
513+
514+
You can use the ``astype`` method to convert dtypes from one to another. These *always* return a copy.
515+
In addition, ``convert_objects`` will attempt a *soft* conversion of any *object* dtypes, meaning that if all the objects in a Series are of the same type, the Series
516+
will have that dtype.
517+
518+
.. ipython:: python
519+
520+
df3
521+
df3.dtypes
522+
523+
# conversion of dtypes
524+
df3.astype('float32').dtypes
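NumPy's ``ndarray.astype`` has the same always-copy behaviour by default (its ``copy`` argument defaults to ``True``), which makes it a convenient way to check these semantics on plain arrays. This sketch does not exercise pandas itself.

```python
import numpy as np

a = np.array([1.5, 2.5, 3.5], dtype="float64")

# Converting to a different dtype produces a new array.
b = a.astype("float32")
b[0] = 100.0

# Even converting to the *same* dtype copies by default (copy=True).
c = a.astype("float64")
c[1] = -1.0

print(a)                       # the original is untouched
print(b.dtype, c.dtype)        # float32 float64
print(np.shares_memory(a, c))  # False
```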
525+
526+
To force conversion of object values to numbers, pass ``convert_numeric=True``.
527+
This will convert strings and numbers alike to numbers where possible; values that cannot be converted will be set to ``np.nan``.
528+
To force conversion to ``datetime64[ns]``, pass ``convert_dates='coerce'``.
529+
This will convert any datetime-like object to a date, forcing other values to ``NaT``.
530+
531+
.. ipython:: python
532+
533+
# mixed type conversions
534+
df3['D'] = '1.'
535+
df3['E'] = '1'
536+
df3.convert_objects(convert_numeric=True).dtypes
537+
538+
# same, but specific dtype conversion
539+
df3['D'] = df3['D'].astype('float16')
540+
df3['E'] = df3['E'].astype('int32')
541+
df3.dtypes
542+
543+
# forcing date coercion
544+
s = Series([datetime(2001,1,1,0,0), 'foo', 1.0, 1, Timestamp('20010104'), '20010105'],dtype='O')
545+
s
546+
s.convert_objects(convert_dates='coerce')
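The two forced conversions can be approximated element by element. The helpers ``force_numeric`` and ``coerce_dates`` are hypothetical and only sketch the idea behind ``convert_numeric=True`` and ``convert_dates='coerce'``; they are not pandas' implementation. Assumptions: only ``datetime`` objects and ISO-8601 strings count as datetime-like (so ``'2001-01-05'`` is used rather than ``'20010105'``), and, unlike the real option described in the release notes, an all-``NaT`` result is not replaced by the original objects.

```python
from datetime import datetime

import numpy as np


def force_numeric(values):
    """Hypothetical helper: convert each value to float64, NaN on failure."""
    out = np.empty(len(values), dtype="float64")
    for i, value in enumerate(values):
        try:
            out[i] = float(value)
        except (TypeError, ValueError):
            out[i] = np.nan  # unconvertible values (e.g. 'foo', None) become NaN
    return out


def coerce_dates(values):
    """Hypothetical helper: convert values to datetime64[ns], NaT on failure.

    Only ``datetime`` objects and ISO-8601 strings are treated as
    datetime-like; every other value is forced to NaT.
    """
    out = np.empty(len(values), dtype="datetime64[ns]")
    for i, value in enumerate(values):
        try:
            if isinstance(value, (datetime, str)):
                out[i] = np.datetime64(value, "ns")
            else:
                out[i] = np.datetime64("NaT")
        except ValueError:
            out[i] = np.datetime64("NaT")  # unparseable strings become NaT
    return out


numbers = force_numeric(["1.", "1", 2, "foo", None])
dates = coerce_dates([datetime(2001, 1, 1), "foo", 1.0, 1, "2001-01-05"])

print(numbers)
print(dates)
```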
547+
462548
Data alignment and arithmetic
463549
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
464550

@@ -633,26 +719,6 @@ You can also disable this feature via the ``expand_frame_repr`` option:
633719
reset_option('expand_frame_repr')
634720
635721
636-
DataFrame column types
637-
~~~~~~~~~~~~~~~~~~~~~~
638-
639-
.. _dsintro.column_types:
640-
641-
The four main types stored in pandas objects are float, int, boolean, and
642-
object. A convenient ``dtypes`` attribute return a Series with the data type of
643-
each column:
644-
645-
.. ipython:: python
646-
647-
baseball.dtypes
648-
649-
The related method ``get_dtype_counts`` will return the number of columns of
650-
each type:
651-
652-
.. ipython:: python
653-
654-
baseball.get_dtype_counts()
655-
656722
DataFrame column attribute access and IPython completion
657723
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
658724

doc/source/indexing.rst

Lines changed: 28 additions & 0 deletions
@@ -304,6 +304,34 @@ so that the original data can be modified without creating a copy:
304304
305305
df.mask(df >= 0)
306306
307+
Upcasting Gotchas
308+
~~~~~~~~~~~~~~~~~
309+
310+
Performing indexing operations on ``integer`` type data can easily upcast the data to ``floating``, because integer dtypes cannot represent ``NaN``.
311+
The dtype of the input data will be preserved in cases where ``nans`` are not introduced (coming soon).
312+
313+
.. ipython:: python
314+
315+
dfi = df.astype('int32')
316+
dfi['E'] = 1
317+
dfi
318+
dfi.dtypes
319+
320+
casted = dfi[dfi>0]
321+
casted
322+
casted.dtypes
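The reason integer data gets upcast is that no NumPy integer dtype can hold ``NaN``. A plain NumPy sketch of the same masking (``ints`` and ``mask`` are illustrative names, not pandas code) shows both the promotion and the error raised when trying to store ``NaN`` in place:

```python
import numpy as np

ints = np.array([3, -1, 0, 7], dtype="int32")
mask = ints > 0

# Out-of-place: positions where mask is False are filled with NaN,
# and NaN is a float, so the result is promoted to float64.
masked = np.where(mask, ints, np.nan)
print(masked.dtype)  # float64

# In-place: the int32 array cannot store NaN at all.
raised = False
try:
    ints[~mask] = np.nan
except ValueError as exc:
    raised = True
    print("cannot store NaN in int32:", exc)
```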
323+
324+
Float dtypes, on the other hand, are unchanged.
325+
326+
.. ipython:: python
327+
328+
df2 = df.copy()
329+
df2['A'] = df2['A'].astype('float32')
330+
df2.dtypes
331+
332+
casted = df2[df2>0]
333+
casted
334+
casted.dtypes
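Because floating-point dtypes already have a ``NaN`` value, masking does not need to change their dtype. The NumPy analogue (again an illustrative sketch rather than pandas code) keeps ``float32`` both for in-place assignment and for ``np.where`` with a ``float32`` fill value:

```python
import numpy as np

floats = np.array([0.5, -1.25, 2.0], dtype="float32")
keep = floats > 0

# In-place: float32 can store NaN, so the array keeps its dtype.
masked = floats.copy()
masked[~keep] = np.nan

# Out-of-place: a float32 fill value means nothing forces promotion.
selected = np.where(keep, floats, np.float32("nan"))

print(masked.dtype, selected.dtype)  # float32 float32
```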
307335
308336
Take Methods
309337
~~~~~~~~~~~~

doc/source/v0.10.2.txt

Lines changed: 95 additions & 0 deletions
@@ -0,0 +1,95 @@
1+
.. _whatsnew_0102:
2+
3+
v0.10.2 (February ??, 2013)
4+
---------------------------
5+
6+
This is a minor release from 0.10.1 and includes many new features and
7+
enhancements along with a large number of bug fixes. There are also a number of
8+
important API changes that long-time pandas users should pay close attention
9+
to.
10+
11+
API changes
12+
~~~~~~~~~~~
13+
14+
Numeric dtypes will propagate and can coexist in DataFrames. If a dtype is passed (either directly via the ``dtype`` keyword, a passed ``ndarray``, or a passed ``Series``), then it will be preserved in DataFrame operations. Furthermore, different numeric dtypes will **NOT** be combined. The following example will give you a taste.
15+
16+
**Dtype Specification**
17+
18+
.. ipython:: python
19+
20+
df1 = DataFrame(randn(8, 1), columns = ['A'], dtype = 'float32')
21+
df1
22+
df1.dtypes
23+
df2 = DataFrame(dict( A = Series(randn(8),dtype='float16'), B = Series(randn(8)), C = Series(randn(8),dtype='uint8') ))
24+
df2
25+
df2.dtypes
26+
27+
# here you get some upcasting
28+
df3 = df1.reindex_like(df2).fillna(value=0.0) + df2
29+
df3
30+
df3.dtypes
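One way to picture why different numeric dtypes are not combined: this release also adds a ``blocks`` attribute that returns a dict mapping each dtype to a homogeneously typed DataFrame. The NumPy sketch (hypothetical column names, not the pandas implementation) groups columns by dtype into such blocks and contrasts that with stacking every column into one array, which forces a single common dtype.

```python
from collections import defaultdict

import numpy as np

# Hypothetical columns with mixed numeric dtypes.
columns = {
    "A": np.arange(4, dtype="float32"),
    "B": np.linspace(0.0, 1.0, 4),  # float64
    "C": np.arange(4, dtype="uint8"),
    "D": np.ones(4, dtype="float32"),
}

# Group column names by dtype, loosely mirroring the ``blocks`` idea.
groups = defaultdict(list)
for name, values in columns.items():
    groups[str(values.dtype)].append(name)

# Each group stacks into one homogeneous 2-D array that keeps its dtype.
blocks = {
    dtype: np.column_stack([columns[name] for name in names])
    for dtype, names in groups.items()
}

for dtype, block in blocks.items():
    print(dtype, groups[dtype], block.dtype, block.shape)

# Stacking *all* columns together instead forces a single common dtype.
combined = np.column_stack(list(columns.values()))
print(combined.dtype)  # float64
```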
31+
32+
**Dtype conversion**
33+
34+
.. ipython:: python
35+
36+
# this is lowest-common-denominator upcasting (meaning you get the dtype which can accommodate all of the types)
37+
df3.values.dtype
38+
39+
# conversion of dtypes
40+
df3.astype('float32').dtypes
41+
42+
# mixed type conversions
43+
df3['D'] = '1.'
44+
df3['E'] = '1'
45+
df3.convert_objects(convert_numeric=True).dtypes
46+
47+
# same, but specific dtype conversion
48+
df3['D'] = df3['D'].astype('float16')
49+
df3['E'] = df3['E'].astype('int32')
50+
df3.dtypes
51+
52+
# forcing date coercion
53+
s = Series([datetime(2001,1,1,0,0), 'foo', 1.0, 1,
54+
Timestamp('20010104'), '20010105'],dtype='O')
55+
s.convert_objects(convert_dates='coerce')
56+
57+
**Upcasting Gotchas**
58+
59+
Performing indexing operations on integer type data can easily upcast the data to floating point, because integer dtypes cannot represent ``NaN``.
60+
The dtype of the input data will be preserved in cases where ``nans`` are not introduced (coming soon).
61+
62+
.. ipython:: python
63+
64+
dfi = df3.astype('int32')
65+
dfi['D'] = dfi['D'].astype('int64')
66+
dfi
67+
dfi.dtypes
68+
69+
casted = dfi[dfi>0]
70+
casted
71+
casted.dtypes
72+
73+
Float dtypes, on the other hand, are unchanged.
74+
75+
.. ipython:: python
76+
77+
df4 = df3.copy()
78+
df4['A'] = df4['A'].astype('float32')
79+
df4.dtypes
80+
81+
casted = df4[df4>0]
82+
casted
83+
casted.dtypes
84+
85+
New features
86+
~~~~~~~~~~~~
87+
88+
**Enhancements**
89+
90+
**Bug Fixes**
91+
92+
See the `full release notes
93+
<https://github.com/pydata/pandas/blob/master/RELEASE.rst>`__ or issue tracker
94+
on GitHub for a complete list.
95+

doc/source/whatsnew.rst

Lines changed: 2 additions & 0 deletions
@@ -16,6 +16,8 @@ What's New
1616

1717
These are new features and improvements of note in each release.
1818

19+
.. include:: v0.10.2.txt
20+
1921
.. include:: v0.10.1.txt
2022

2123
.. include:: v0.10.0.txt
