For a list of numbers ranging from x to y that may contain NaN, how can I normalise between 0 and 1, ignoring the NaN values (they stay as NaN)?
Typically I would use MinMaxScaler (ref page) from sklearn.preprocessing, but this cannot handle NaN and recommends imputing the values based on the mean, median, etc. It doesn't offer the option to ignore all the NaN values.
Consider a pd.Series s:
import numpy as np
import pandas as pd
s = pd.Series(np.random.choice([3, 4, 5, 6, np.nan], 100))
s.hist()
Option 1
Min Max Scaling
new = s.sub(s.min()).div((s.max() - s.min()))
new.hist()
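Why this works: pandas reductions such as min() and max() skip NaN by default, while arithmetic propagates it, so the NaN entries stay NaN and everything else lands in [0, 1]. Here is a minimal sketch of the same idea wrapped in a function. The name min_max_ignore_nan is made up for illustration, and returning 0 for a constant series is my own assumption about how to avoid dividing by zero, not something from the question.

```python
import numpy as np
import pandas as pd


def min_max_ignore_nan(values):
    """Scale values to [0, 1]; NaN entries stay NaN (hypothetical helper)."""
    s = pd.Series(values, dtype="float64")
    lo, hi = s.min(), s.max()  # pandas skips NaN here by default
    span = hi - lo
    if not span > 0:
        # Constant or all-NaN input: avoid 0/0.
        # Assumption: map constant values to 0.0; NaN stays NaN.
        return s * 0.0
    return (s - lo) / span


example = pd.Series([3, 4, np.nan, 6, 5])
scaled = min_max_ignore_nan(example)
print(scaled.tolist())
```

The NaN at position 2 is still NaN afterwards, and the smallest and largest observed values map to 0 and 1.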
NOT WHAT OP ASKED FOR
I put the next two options in because I wanted to.
Option 2
sigmoid
sigmoid = lambda x: 1 / (1 + np.exp(-x))
new = sigmoid(s.sub(s.mean()))
new.hist()
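A small check of what the sigmoid option does, on a hand-made series rather than the random one above (the values are just an assumption for illustration). Note that, unlike min-max scaling, the result lies strictly inside (0, 1): the observed minimum and maximum do not map to exactly 0 and 1.

```python
import numpy as np
import pandas as pd

s = pd.Series([3, 4, np.nan, 6, 5], dtype="float64")

# mean() skips NaN, so the series is centred on its non-missing values
centered = s - s.mean()

# Logistic sigmoid; NaN propagates through np.exp and the division
squashed = 1 / (1 + np.exp(-centered))

print(squashed.tolist())
```

Values equally far below and above the mean come out symmetric around 0.5.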
Option 3
tanh (hyperbolic tangent)
new = np.tanh(s.sub(s.mean())).add(1).div(2)
new.hist()
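Worth noting: the rescaled tanh is the same curve as the sigmoid with its input doubled, since (tanh(x) + 1) / 2 = 1 / (1 + exp(-2x)). So Option 3 is effectively a steeper Option 2. A quick numerical check of that identity, again on a hand-made series chosen only for illustration:

```python
import numpy as np
import pandas as pd

s = pd.Series([3, 4, np.nan, 6, 5], dtype="float64")
centered = s - s.mean()  # mean() skips NaN

via_tanh = (np.tanh(centered) + 1) / 2
via_sigmoid = 1 / (1 + np.exp(-2 * centered))

# equal_nan=True treats the NaN positions as matching
same = np.allclose(via_tanh.to_numpy(), via_sigmoid.to_numpy(), equal_nan=True)
print(same)
```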