I want to print the values of the outliers (the green points) in my boxplot, but I don't know how.
This is my code:
import matplotlib.pyplot as plt
# derivation holds the data being plotted
flierprops = dict(marker='o', markerfacecolor='green', markersize=2,
                  linestyle='none')
plt.boxplot(derivation, vert=False, flierprops=flierprops)
Thanks for helping me!
IIUC:
There are n points beyond the upper and lower whiskers, and you want to print the values of those points from your dataset.
Generally, outliers can be visualised as the values outside the upper and lower whiskers of a box plot. The upper and lower whiskers can be defined in a number of ways.
One method is:
Lower: Q1 - k * IQR
Upper: Q3 + k * IQR
where k is generally 1.5 (also the default for the whis argument of matplotlib's boxplot), and the IQR (interquartile range) is defined as:
IQR = Q3 - Q1 = qn(0.75) - qn(0.25)
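The definitions above can be sketched with NumPy roughly as follows. The function name tukey_fences and the sample array are made up for illustration and are not part of the original answer. The sketch relies on np.percentile's default linear interpolation, which is one of several quartile conventions.

```python
import numpy as np

def tukey_fences(values, k=1.5):
    """Return the (lower, upper) limits Q1 - k*IQR and Q3 + k*IQR."""
    # np.percentile uses linear interpolation between data points by default
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Hypothetical sample, not the dataset used in this answer
sample = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100], dtype=float)
lower, upper = tukey_fences(sample)

# Values beyond either limit are the points a box plot would flag
outliers = sample[(sample < lower) | (sample > upper)]
print(lower, upper, outliers)
```

The boolean mask catches both the lower and the upper outliers in one step. Drop one side of the condition if only one tail is of interest.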
Take the following dataset as an example (truncated here):
array([ 65.46329369, 91.64897781, 96.85666088, 60.18189851,
30.53996122, 55.12666144, 63.00161253, 29.97804178,
...,
47.98458963, 37.69556267, 44.26758617, 58.60869412,
150. , 155. , 160. , 165. ,
170. , 175. ])
To extract the outliers (we'll focus on the upper outliers), we first need the quartiles Q1 and Q3, which can be read from the 25% and 75% rows of:
pandas.Series(data).describe()
Output:
count 106.000000
mean 62.569111
std 29.698729
min 0.000000
25% 46.934198
50% 57.002615
75% 69.516237
max 175.000000
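Determine the approximate whisker limits (matplotlib draws each whisker at the most extreme data point that still lies within these limits):
>>> iqr = 69.516 - 46.934
>>> upper = 69.516 + 1.5 * iqr
>>> lower = 46.934 - 1.5 * iqr
>>> round(iqr, 3), round(upper, 3), round(lower, 3)
(22.582, 103.389, 13.061)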
Extract (examine) the values of the upper outliers:
>>> data[data > upper]
array([150., 155., 160., 165., 170., 175.])
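As an alternative to computing the limits by hand, matplotlib can report the points it would draw as fliers. The sketch below uses matplotlib.cbook.boxplot_stats, the helper that computes box plot statistics. The function name boxplot_outliers and the sample list are hypothetical, and this is not presented as the approach the original answer had in mind.

```python
import numpy as np
from matplotlib import cbook

def boxplot_outliers(values, whis=1.5):
    """Return the points matplotlib would mark as fliers for 1-D data."""
    # boxplot_stats returns one dict per dataset; a 1-D input yields one dict
    stats = cbook.boxplot_stats(np.asarray(values, dtype=float), whis=whis)[0]
    return np.sort(stats["fliers"])

# Hypothetical sample, not the dataset used in this answer
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
flagged = boxplot_outliers(sample)
print(flagged)
```

If the plot already exists, the dictionary returned by plt.boxplot has a 'fliers' entry holding the plotted line objects. For a horizontal plot (vert=False), the outlier values should be available from get_xdata() on those objects.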