Pandas not reindexing properly with NaN

Latecomer :

I am having trouble reindexing a pandas dataframe after dropping NaN values.

I am trying to extract dicts in a df column to another df, then join those values back to the original df in the corresponding rows.

import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4, 5],
                   'col2': [np.nan, np.nan, {'aa': 11, 'bb': 22}, {'aa': 33, 'bb': 44}, {'aa': 55, 'bb': 66}]})
df

    col1 col2
0   1    NaN
1   2    NaN
2   3    {'aa': 11, 'bb': 22}
3   4    {'aa': 33, 'bb': 44}
4   5    {'aa': 55, 'bb': 66}

The desired end result is:

    col1    aa      bb
0   1       NaN     NaN
1   2       NaN     NaN
2   3       11      22
3   4       33      44
4   5       55      66

If I convert col2 with .tolist() and pass the result to pd.DataFrame, the dicts are not unpacked:

pd.DataFrame(df['col2'].tolist())

0   NaN
1   NaN
2   {'aa': 11, 'bb': 22}
3   {'aa': 33, 'bb': 44}
4   {'aa': 55, 'bb': 66}

If I use dropna() first, the dicts are unpacked, but the original index is lost:

pd.DataFrame(df['col2'].dropna().tolist())

    aa  bb
0   11  22
1   33  44
2   55  66

If I try to reindex the result to the index of the original df, the row data end up at the wrong index positions:

pd.DataFrame(df['col2'].dropna().tolist()).reindex(df.index)

    aa  bb
0   11.0    22.0
1   33.0    44.0
2   55.0    66.0
3   NaN     NaN
4   NaN     NaN

The real data varies, so there is no way to know in advance how many NaN values there will be or where they fall in the column.

Any help is very much appreciated.

YOBEN_S :

IIUC, you can fix your code by passing the index that is left after dropna() when you build the new DataFrame:

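# dropna() keeps the original labels (2, 3, 4) of the rows that hold a dict
s = df.col2.dropna()
# build the unpacked frame on those labels, then join it back on the index
df = df.join(pd.DataFrame(s.tolist(), index=s.index))
df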
Out[103]: 
   col1                  col2    aa    bb
0     1                   NaN   NaN   NaN
1     2                   NaN   NaN   NaN
2     3  {'aa': 11, 'bb': 22}  11.0  22.0
3     4  {'aa': 33, 'bb': 44}  33.0  44.0
4     5  {'aa': 55, 'bb': 66}  55.0  66.0
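
The reindex attempt in the question misaligns because pd.DataFrame(list) creates a fresh 0..n-1 index, so reindex(df.index) matches the labels 0, 1, 2 rather than the original rows 2, 3, 4. The sketch below repeats the fix above and, as an optional extra step that the answer itself does not take, drops col2 so the result has the column layout shown in the question. The names `expanded` and `result` are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': [1, 2, 3, 4, 5],
    'col2': [np.nan, np.nan,
             {'aa': 11, 'bb': 22}, {'aa': 33, 'bb': 44}, {'aa': 55, 'bb': 66}],
})

# Keep only the rows that hold a dict; dropna() preserves their labels.
s = df['col2'].dropna()

# Build the unpacked frame on those same labels instead of a fresh 0..n-1 index.
expanded = pd.DataFrame(s.tolist(), index=s.index)

# join() aligns on index labels, so rows without a dict get NaN in aa and bb.
# Dropping col2 first is an assumption about the wanted layout.
result = df.drop(columns='col2').join(expanded)
print(result)
```

Because the unpacked columns contain NaN after the join, pandas stores aa and bb as floats, as in the output above.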
