- Pydoop now features a brand-new, more Pythonic MapReduce API (see the word count sketch after this list)
- The path module offers many new functions that serve as the HDFS-aware counterparts of those in os.path (see the example after this list)
- Added the pydoop submit app for job submission (see the example invocation after this list)
- The pipes backend (except for the performance-critical serialization section) has been reimplemented in pure Python
- An alternative (optional) JPype HDFS backend is available (currently slower than the one based on libhdfs)
- Extension modules do not require Boost.Python anymore
- Added support for Hadoop 2.4.1, 2.5.2 and 2.6.0
- Removed support for Hadoop 0.20.2 and CDH3
- YARN is now fully supported
- Added support for CDH 4.4.0 and CDH 4.5.0
- Added support for Hadoop 2.2.0
- Added support for Hadoop 1.2.1
- Added support for CDH 4.3.0
- Added a walk() method to hdfs instances, which works similarly to os.walk() from Python's standard library (see the sketch after this list)
- The Hadoop version parser is now more flexible. It should be able to parse version strings for all CDH releases, including older ones (note that most of them are not supported)
- Pydoop script can now handle modules whose file name has no extension
- Fixed “unable to load native-hadoop library” problem (thanks to Liam Slusser)
- Fixed a bug that caused the pipes runner to incorrectly preprocess command line options
- Fixed several bugs triggered by using a local file system as the default file system for Hadoop. This happens when you set a file: path as the value of fs.default.name in core-site.xml. For instance:

      <property>
        <name>fs.default.name</name>
        <value>file:///var/hadoop/data</value>
      </property>