Spark MLlib Debugging Notes, Part 2: AttributeError: 'DataFrame' object has no attribute 'map'

 
avgAge.collect()
Out[6]:
[Row(home='Mechelen', mean=53.0),
 Row(home='Leuven', mean=42.0),
 Row(home='Brussels', mean=33.5)]
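Because a SchemaRDD is also a kind of RDD, every RDD operation you have learned so far (transformations and actions alike) can be used on it, and you can read a single field from a row with row.fieldname, as follows: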
In [7]:

print(avgAge
      .map(lambda row: "Average age in {0} is {1} years"
                        .format(row.home, row.mean))
      .reduce(lambda x, y: x + "\n" + y))
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-7-564345f0ca34> in <module>()
----> 1 print(avgAge
      2       .map(lambda row: "Average age in {0} is {1} years"
      3                         .format(row.home, row.mean))
      4       .reduce(lambda x, y: x + "\n" + y))

~/Downloads/spyder/lib/python3.6/site-packages/pyspark/sql/dataframe.py in __getattr__(self, name)
   1180         if name not in self.columns:
   1181             raise AttributeError(
-> 1182                 "'%s' object has no attribute '%s'" % (self.__class__.__name__, name))
   1183         jc = self._jdf.apply(name)
   1184         return Column(jc)

AttributeError: 'DataFrame' object has no attribute 'map'

Analysis:

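In Spark 2.0 and later, a DataFrame has no map method. As the traceback shows, DataFrame.__getattr__ treats an unknown attribute name as a column lookup, and because map is not a column of avgAge, it raises AttributeError. The fix is to convert the DataFrame to an RDD and map that: avgAge.rdd.map(...). Prior to Spark 2.0, avgAge.map was an alias for avgAge.rdd.map; from Spark 2.0 on, you must call .rdd explicitly.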
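
The fix can be sketched roughly as follows. This is a minimal example, not the original tutorial's code: the helper names describe, summarize and demo are made up for illustration, and the sample rows simply repeat the avgAge output shown earlier. It assumes pyspark 2.0 or later is installed when demo() is called.

```python
def describe(row):
    """Format one row; works for any object exposing .home and .mean."""
    return "Average age in {0} is {1} years".format(row.home, row.mean)


def summarize(df):
    """Go through df.rdd explicitly, since DataFrame.map is gone in Spark 2.0+."""
    return df.rdd.map(describe).reduce(lambda x, y: x + "\n" + y)


def demo():
    """Hypothetical driver: rebuilds a small avgAge-like DataFrame and prints it."""
    # Imported here so the helpers above can be loaded without pyspark.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .master("local[1]")
             .appName("rdd-map-sketch")
             .getOrCreate())
    # Sample data copied from the avgAge.collect() output above.
    avgAge = spark.createDataFrame(
        [("Mechelen", 53.0), ("Leuven", 42.0), ("Brussels", 33.5)],
        ["home", "mean"],
    )
    print(summarize(avgAge))
    spark.stop()
```

Call demo() from a session where pyspark is available. Note that RDD.reduce expects a commutative and associative function, and string concatenation is not commutative, so the order of the printed lines is not guaranteed. If the order matters, "\n".join(df.rdd.map(describe).collect()) is a possible alternative.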

Reposted from blog.csdn.net/m0_37870649/article/details/81607473