annotate flys-artifacts/src/main/java/de/intevation/flys/artifacts/math/Outlier.java @ 3565:b136113dad53

FixA: Only evict only one(!) data point as outlier before recalculating the function. flys-artifacts/trunk@5163 c6561f87-3c4e-4783-a992-168aeb5c3f6f
author Sascha L. Teichmann <sascha.teichmann@intevation.de>
date Tue, 31 Jul 2012 16:46:36 +0000
parents e01b9d1bc941
children
rev   line source
2645
4f7d1ea38404 Added simple Grubb's outlier test.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents:
diff changeset
1 package de.intevation.flys.artifacts.math;
4f7d1ea38404 Added simple Grubb's outlier test.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents:
diff changeset
2
3565
b136113dad53 FixA: Only evict only one(!) data point as outlier before recalculating the function.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3564
diff changeset
3 import java.util.List;
b136113dad53 FixA: Only evict only one(!) data point as outlier before recalculating the function.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3564
diff changeset
4
2646
c11da3540b70 Checked in out dated version of outlier test.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 2645
diff changeset
5 import org.apache.commons.math.MathException;
c11da3540b70 Checked in out dated version of outlier test.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 2645
diff changeset
6
3565
b136113dad53 FixA: Only evict only one(!) data point as outlier before recalculating the function.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3564
diff changeset
7 import org.apache.commons.math.distribution.TDistributionImpl;
b136113dad53 FixA: Only evict only one(!) data point as outlier before recalculating the function.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3564
diff changeset
8
2645
4f7d1ea38404 Added simple Grubb's outlier test.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents:
diff changeset
9 import org.apache.commons.math.stat.descriptive.moment.Mean;
4f7d1ea38404 Added simple Grubb's outlier test.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents:
diff changeset
10 import org.apache.commons.math.stat.descriptive.moment.StandardDeviation;
4f7d1ea38404 Added simple Grubb's outlier test.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents:
diff changeset
11
2646
c11da3540b70 Checked in out dated version of outlier test.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 2645
diff changeset
12 import org.apache.log4j.Logger;
c11da3540b70 Checked in out dated version of outlier test.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 2645
diff changeset
13
2645
4f7d1ea38404 Added simple Grubb's outlier test.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents:
diff changeset
14 public class Outlier
4f7d1ea38404 Added simple Grubb's outlier test.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents:
diff changeset
15 {
3564
e01b9d1bc941 FixA: Corrected the formulas of Grubbs' test for outliers. Still a bit broken.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3011
diff changeset
16 public static final double EPSILON = 1e-5;
e01b9d1bc941 FixA: Corrected the formulas of Grubbs' test for outliers. Still a bit broken.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3011
diff changeset
17
3011
ab81ffd1343e FixA: Reactivated rewrite of the outlier checks.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 2646
diff changeset
18 public static final double DEFAULT_ALPHA = 0.05;
ab81ffd1343e FixA: Reactivated rewrite of the outlier checks.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 2646
diff changeset
19
2646
c11da3540b70 Checked in out dated version of outlier test.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 2645
diff changeset
20 private static Logger log = Logger.getLogger(Outlier.class);
c11da3540b70 Checked in out dated version of outlier test.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 2645
diff changeset
21
3565
b136113dad53 FixA: Only evict only one(!) data point as outlier before recalculating the function.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3564
diff changeset
22 protected Outlier() {
2645
4f7d1ea38404 Added simple Grubb's outlier test.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents:
diff changeset
23 }
4f7d1ea38404 Added simple Grubb's outlier test.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents:
diff changeset
24
3565
b136113dad53 FixA: Only evict only one(!) data point as outlier before recalculating the function.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3564
diff changeset
25 public static Integer findOutlier(List<Double> values) {
b136113dad53 FixA: Only evict only one(!) data point as outlier before recalculating the function.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3564
diff changeset
26 return findOutlier(values, DEFAULT_ALPHA);
3011
ab81ffd1343e FixA: Reactivated rewrite of the outlier checks.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 2646
diff changeset
27 }
ab81ffd1343e FixA: Reactivated rewrite of the outlier checks.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 2646
diff changeset
28
3565
b136113dad53 FixA: Only evict only one(!) data point as outlier before recalculating the function.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3564
diff changeset
29 public static Integer findOutlier(List<Double> values, double alpha) {
3564
e01b9d1bc941 FixA: Corrected the formulas of Grubbs' test for outliers. Still a bit broken.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3011
diff changeset
30 boolean debug = log.isDebugEnabled();
e01b9d1bc941 FixA: Corrected the formulas of Grubbs' test for outliers. Still a bit broken.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3011
diff changeset
31
e01b9d1bc941 FixA: Corrected the formulas of Grubbs' test for outliers. Still a bit broken.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3011
diff changeset
32 if (debug) {
e01b9d1bc941 FixA: Corrected the formulas of Grubbs' test for outliers. Still a bit broken.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3011
diff changeset
33 log.debug("outliers significance: " + alpha);
e01b9d1bc941 FixA: Corrected the formulas of Grubbs' test for outliers. Still a bit broken.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3011
diff changeset
34 }
e01b9d1bc941 FixA: Corrected the formulas of Grubbs' test for outliers. Still a bit broken.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3011
diff changeset
35
e01b9d1bc941 FixA: Corrected the formulas of Grubbs' test for outliers. Still a bit broken.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3011
diff changeset
36 alpha = 1d - alpha;
e01b9d1bc941 FixA: Corrected the formulas of Grubbs' test for outliers. Still a bit broken.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3011
diff changeset
37
3565
b136113dad53 FixA: Only evict only one(!) data point as outlier before recalculating the function.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3564
diff changeset
38 int N = values.size();
3564
e01b9d1bc941 FixA: Corrected the formulas of Grubbs' test for outliers. Still a bit broken.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3011
diff changeset
39
3565
b136113dad53 FixA: Only evict only one(!) data point as outlier before recalculating the function.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3564
diff changeset
40 if (debug) {
b136113dad53 FixA: Only evict only one(!) data point as outlier before recalculating the function.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3564
diff changeset
41 log.debug("Values to check: " + N);
b136113dad53 FixA: Only evict only one(!) data point as outlier before recalculating the function.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3564
diff changeset
42 }
2646
c11da3540b70 Checked in out dated version of outlier test.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 2645
diff changeset
43
3565
b136113dad53 FixA: Only evict only one(!) data point as outlier before recalculating the function.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3564
diff changeset
44 if (N < 3) {
b136113dad53 FixA: Only evict only one(!) data point as outlier before recalculating the function.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3564
diff changeset
45 return null;
b136113dad53 FixA: Only evict only one(!) data point as outlier before recalculating the function.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3564
diff changeset
46 }
3564
e01b9d1bc941 FixA: Corrected the formulas of Grubbs' test for outliers. Still a bit broken.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3011
diff changeset
47
3565
b136113dad53 FixA: Only evict only one(!) data point as outlier before recalculating the function.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3564
diff changeset
48 Mean mean = new Mean();
b136113dad53 FixA: Only evict only one(!) data point as outlier before recalculating the function.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3564
diff changeset
49 StandardDeviation std = new StandardDeviation();
3564
e01b9d1bc941 FixA: Corrected the formulas of Grubbs' test for outliers. Still a bit broken.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3011
diff changeset
50
3565
b136113dad53 FixA: Only evict only one(!) data point as outlier before recalculating the function.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3564
diff changeset
51 for (Double value: values) {
b136113dad53 FixA: Only evict only one(!) data point as outlier before recalculating the function.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3564
diff changeset
52 double v = value.doubleValue();
b136113dad53 FixA: Only evict only one(!) data point as outlier before recalculating the function.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3564
diff changeset
53 mean.increment(v);
b136113dad53 FixA: Only evict only one(!) data point as outlier before recalculating the function.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3564
diff changeset
54 std .increment(v);
b136113dad53 FixA: Only evict only one(!) data point as outlier before recalculating the function.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3564
diff changeset
55 }
b136113dad53 FixA: Only evict only one(!) data point as outlier before recalculating the function.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3564
diff changeset
56
b136113dad53 FixA: Only evict only one(!) data point as outlier before recalculating the function.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3564
diff changeset
57 double m = mean.getResult();
b136113dad53 FixA: Only evict only one(!) data point as outlier before recalculating the function.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3564
diff changeset
58 double s = std.getResult();
b136113dad53 FixA: Only evict only one(!) data point as outlier before recalculating the function.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3564
diff changeset
59
b136113dad53 FixA: Only evict only one(!) data point as outlier before recalculating the function.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3564
diff changeset
60 if (debug) {
b136113dad53 FixA: Only evict only one(!) data point as outlier before recalculating the function.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3564
diff changeset
61 log.debug("mean: " + m);
b136113dad53 FixA: Only evict only one(!) data point as outlier before recalculating the function.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3564
diff changeset
62 log.debug("std dev: " + s);
b136113dad53 FixA: Only evict only one(!) data point as outlier before recalculating the function.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3564
diff changeset
63 }
b136113dad53 FixA: Only evict only one(!) data point as outlier before recalculating the function.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3564
diff changeset
64
b136113dad53 FixA: Only evict only one(!) data point as outlier before recalculating the function.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3564
diff changeset
65 double maxZ = -Double.MAX_VALUE;
b136113dad53 FixA: Only evict only one(!) data point as outlier before recalculating the function.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3564
diff changeset
66 int iv = -1;
b136113dad53 FixA: Only evict only one(!) data point as outlier before recalculating the function.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3564
diff changeset
67 for (int i = N-1; i >= 0; --i) {
b136113dad53 FixA: Only evict only one(!) data point as outlier before recalculating the function.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3564
diff changeset
68 double v = values.get(i).doubleValue();
b136113dad53 FixA: Only evict only one(!) data point as outlier before recalculating the function.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3564
diff changeset
69 double z = Math.abs(v - m);
b136113dad53 FixA: Only evict only one(!) data point as outlier before recalculating the function.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3564
diff changeset
70 if (z > maxZ) {
b136113dad53 FixA: Only evict only one(!) data point as outlier before recalculating the function.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3564
diff changeset
71 maxZ = z;
b136113dad53 FixA: Only evict only one(!) data point as outlier before recalculating the function.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3564
diff changeset
72 iv = i;
2646
c11da3540b70 Checked in out dated version of outlier test.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 2645
diff changeset
73 }
2645
4f7d1ea38404 Added simple Grubb's outlier test.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents:
diff changeset
74 }
4f7d1ea38404 Added simple Grubb's outlier test.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents:
diff changeset
75
3565
b136113dad53 FixA: Only evict only one(!) data point as outlier before recalculating the function.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3564
diff changeset
76 if (Math.abs(s) < EPSILON) {
b136113dad53 FixA: Only evict only one(!) data point as outlier before recalculating the function.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3564
diff changeset
77 return null;
b136113dad53 FixA: Only evict only one(!) data point as outlier before recalculating the function.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3564
diff changeset
78 }
2645
4f7d1ea38404 Added simple Grubb's outlier test.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents:
diff changeset
79
3565
b136113dad53 FixA: Only evict only one(!) data point as outlier before recalculating the function.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3564
diff changeset
80 maxZ /= s;
b136113dad53 FixA: Only evict only one(!) data point as outlier before recalculating the function.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3564
diff changeset
81
b136113dad53 FixA: Only evict only one(!) data point as outlier before recalculating the function.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3564
diff changeset
82 TDistributionImpl tdist = new TDistributionImpl(N-2);
b136113dad53 FixA: Only evict only one(!) data point as outlier before recalculating the function.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3564
diff changeset
83
b136113dad53 FixA: Only evict only one(!) data point as outlier before recalculating the function.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3564
diff changeset
84 double t;
b136113dad53 FixA: Only evict only one(!) data point as outlier before recalculating the function.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3564
diff changeset
85
b136113dad53 FixA: Only evict only one(!) data point as outlier before recalculating the function.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3564
diff changeset
86 try {
b136113dad53 FixA: Only evict only one(!) data point as outlier before recalculating the function.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3564
diff changeset
87 t = tdist.inverseCumulativeProbability(alpha/(N+N));
b136113dad53 FixA: Only evict only one(!) data point as outlier before recalculating the function.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3564
diff changeset
88 }
b136113dad53 FixA: Only evict only one(!) data point as outlier before recalculating the function.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3564
diff changeset
89 catch (MathException me) {
b136113dad53 FixA: Only evict only one(!) data point as outlier before recalculating the function.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3564
diff changeset
90 log.error(me);
b136113dad53 FixA: Only evict only one(!) data point as outlier before recalculating the function.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3564
diff changeset
91 return null;
b136113dad53 FixA: Only evict only one(!) data point as outlier before recalculating the function.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3564
diff changeset
92 }
b136113dad53 FixA: Only evict only one(!) data point as outlier before recalculating the function.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3564
diff changeset
93
b136113dad53 FixA: Only evict only one(!) data point as outlier before recalculating the function.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3564
diff changeset
94 t *= t;
b136113dad53 FixA: Only evict only one(!) data point as outlier before recalculating the function.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3564
diff changeset
95
b136113dad53 FixA: Only evict only one(!) data point as outlier before recalculating the function.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3564
diff changeset
96 double za = ((N-1)/Math.sqrt(N))*Math.sqrt(t/(N-2d+t));
b136113dad53 FixA: Only evict only one(!) data point as outlier before recalculating the function.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3564
diff changeset
97
b136113dad53 FixA: Only evict only one(!) data point as outlier before recalculating the function.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3564
diff changeset
98 if (debug) {
b136113dad53 FixA: Only evict only one(!) data point as outlier before recalculating the function.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3564
diff changeset
99 log.debug("max: " + maxZ + " crit: " + za);
b136113dad53 FixA: Only evict only one(!) data point as outlier before recalculating the function.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3564
diff changeset
100 }
b136113dad53 FixA: Only evict only one(!) data point as outlier before recalculating the function.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3564
diff changeset
101
b136113dad53 FixA: Only evict only one(!) data point as outlier before recalculating the function.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3564
diff changeset
102 return maxZ > za
b136113dad53 FixA: Only evict only one(!) data point as outlier before recalculating the function.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3564
diff changeset
103 ? Integer.valueOf(iv)
b136113dad53 FixA: Only evict only one(!) data point as outlier before recalculating the function.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents: 3564
diff changeset
104 : null;
2645
4f7d1ea38404 Added simple Grubb's outlier test.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents:
diff changeset
105 }
4f7d1ea38404 Added simple Grubb's outlier test.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents:
diff changeset
106 }
4f7d1ea38404 Added simple Grubb's outlier test.
Sascha L. Teichmann <sascha.teichmann@intevation.de>
parents:
diff changeset
107 // vim:set ts=4 sw=4 si et sta sts=4 fenc=utf8 :

http://dive4elements.wald.intevation.org