annotate flys-artifacts/src/main/java/de/intevation/flys/artifacts/math/Outlier.java @ 3727:b81f328da582
Removed code duplication.
flys-artifacts/trunk@5399 c6561f87-3c4e-4783-a992-168aeb5c3f6f
author | Sascha L. Teichmann <sascha.teichmann@intevation.de> |
---|---|
date | Sat, 08 Sep 2012 12:58:58 +0000 |
parents | b136113dad53 |
children |
All annotated lines were committed by Sascha L. Teichmann <sascha.teichmann@intevation.de>. Changesets referenced in the rev column:

rev | changeset | parents | description |
---|---|---|---|
2645 | 4f7d1ea38404 | | Added simple Grubb's outlier test. |
2646 | c11da3540b70 | 2645 | Checked in out dated version of outlier test. |
3011 | ab81ffd1343e | 2646 | FixA: Reactivated rewrite of the outlier checks. |
3564 | e01b9d1bc941 | 3011 | FixA: Corrected the formulas of Grubbs' test for outliers. Still a bit broken. |
3565 | b136113dad53 | 3564 | FixA: Only evict only one(!) data point as outlier before recalculating the function. |

rev | line source |
---|---|
2645 | 1 package de.intevation.flys.artifacts.math; |
2645 | 2 |
3565 | 3 import java.util.List; |
3565 | 4 |
2646 | 5 import org.apache.commons.math.MathException; |
2646 | 6 |
3565 | 7 import org.apache.commons.math.distribution.TDistributionImpl; |
3565 | 8 |
2645 | 9 import org.apache.commons.math.stat.descriptive.moment.Mean; |
2645 | 10 import org.apache.commons.math.stat.descriptive.moment.StandardDeviation; |
2645 | 11 |
2646 | 12 import org.apache.log4j.Logger; |
2646 | 13 |
2645 | 14 public class Outlier |
2645 | 15 { |
3564 | 16 public static final double EPSILON = 1e-5; |
3564 | 17 |
3011 | 18 public static final double DEFAULT_ALPHA = 0.05; |
3011 | 19 |
2646 | 20 private static Logger log = Logger.getLogger(Outlier.class); |
2646 | 21 |
3565 | 22 protected Outlier() { |
2645 | 23 } |
2645 | 24 |
3565 | 25 public static Integer findOutlier(List<Double> values) { |
3565 | 26 return findOutlier(values, DEFAULT_ALPHA); |
3011 | 27 } |
3011 | 28 |
3565 | 29 public static Integer findOutlier(List<Double> values, double alpha) { |
3564 | 30 boolean debug = log.isDebugEnabled(); |
3564 | 31 |
3564 | 32 if (debug) { |
3564 | 33 log.debug("outliers significance: " + alpha); |
3564 | 34 } |
3564 | 35 |
3564 | 36 alpha = 1d - alpha; |
3564 | 37 |
3565 | 38 int N = values.size(); |
3564 | 39 |
3565 | 40 if (debug) { |
3565 | 41 log.debug("Values to check: " + N); |
3565 | 42 } |
2646 | 43 |
3565 | 44 if (N < 3) { |
3565 | 45 return null; |
3565 | 46 } |
3564 | 47 |
3565 | 48 Mean mean = new Mean(); |
3565 | 49 StandardDeviation std = new StandardDeviation(); |
3564 | 50 |
3565 | 51 for (Double value: values) { |
3565 | 52 double v = value.doubleValue(); |
3565 | 53 mean.increment(v); |
3565 | 54 std .increment(v); |
3565 | 55 } |
3565 | 56 |
3565 | 57 double m = mean.getResult(); |
3565 | 58 double s = std.getResult(); |
3565 | 59 |
3565 | 60 if (debug) { |
3565 | 61 log.debug("mean: " + m); |
3565 | 62 log.debug("std dev: " + s); |
3565 | 63 } |
3565 | 64 |
3565 | 65 double maxZ = -Double.MAX_VALUE; |
3565 | 66 int iv = -1; |
3565 | 67 for (int i = N-1; i >= 0; --i) { |
3565 | 68 double v = values.get(i).doubleValue(); |
3565 | 69 double z = Math.abs(v - m); |
3565 | 70 if (z > maxZ) { |
3565 | 71 maxZ = z; |
3565 | 72 iv = i; |
2646 | 73 } |
2645 | 74 } |
2645 | 75 |
3565 | 76 if (Math.abs(s) < EPSILON) { |
3565 | 77 return null; |
3565 | 78 } |
2645 | 79 |
3565 | 80 maxZ /= s; |
3565 | 81 |
3565 | 82 TDistributionImpl tdist = new TDistributionImpl(N-2); |
3565 | 83 |
3565 | 84 double t; |
3565 | 85 |
3565 | 86 try { |
3565 | 87 t = tdist.inverseCumulativeProbability(alpha/(N+N)); |
3565 | 88 } |
3565 | 89 catch (MathException me) { |
3565 | 90 log.error(me); |
3565 | 91 return null; |
3565 | 92 } |
3565 | 93 |
3565 | 94 t *= t; |
3565 | 95 |
3565 | 96 double za = ((N-1)/Math.sqrt(N))*Math.sqrt(t/(N-2d+t)); |
3565 | 97 |
3565 | 98 if (debug) { |
3565 | 99 log.debug("max: " + maxZ + " crit: " + za); |
3565 | 100 } |
3565 | 101 |
3565 | 102 return maxZ > za |
3565 | 103 ? Integer.valueOf(iv) |
3565 | 104 : null; |
2645 | 105 } |
2645 | 106 } |
2645 | 107 // vim:set ts=4 sw=4 si et sta sts=4 fenc=utf8 : |
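
For readers following lines 65-104 of the listing, here is a minimal, dependency-free sketch of the two quantities the method compares: the Grubbs statistic (largest absolute deviation from the mean divided by the sample standard deviation) and the critical value built from a squared Student-t quantile. It is not the project's code. The class name GrubbsSketch and its method names are hypothetical, and the t quantile is passed in by the caller rather than computed, since the listing obtains it from commons-math. In the textbook two-sided test the quantile is taken at the upper alpha/(2N) tail with N-2 degrees of freedom. The listing instead evaluates alpha/(N+N) after setting alpha = 1d - alpha, which is possibly what the message of rev 3564 means by "Still a bit broken"; that reading is an assumption.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class; not part of the flys-artifacts sources.
class GrubbsSketch {

    /** Arithmetic mean of the values. */
    static double mean(List<Double> values) {
        double sum = 0d;
        for (double v : values) {
            sum += v;
        }
        return sum / values.size();
    }

    /** Sample standard deviation (n - 1 in the denominator). */
    static double sampleStdDev(List<Double> values) {
        double m = mean(values);
        double sq = 0d;
        for (double v : values) {
            sq += (v - m) * (v - m);
        }
        return Math.sqrt(sq / (values.size() - 1));
    }

    /** Index of the value with the largest absolute deviation from the mean. */
    static int farthestIndex(List<Double> values) {
        double m = mean(values);
        double max = -Double.MAX_VALUE;
        int index = -1;
        for (int i = 0; i < values.size(); ++i) {
            double z = Math.abs(values.get(i) - m);
            if (z > max) {
                max = z;
                index = i;
            }
        }
        return index;
    }

    /** Grubbs statistic G = max |x_i - mean| / s. */
    static double statistic(List<Double> values) {
        int i = farthestIndex(values);
        return Math.abs(values.get(i) - mean(values)) / sampleStdDev(values);
    }

    /**
     * Critical value ((n-1)/sqrt(n)) * sqrt(t^2 / (n - 2 + t^2)).
     * The caller supplies t, the Student-t quantile for n - 2 degrees of freedom.
     */
    static double criticalValue(int n, double t) {
        double t2 = t * t;
        return ((n - 1) / Math.sqrt(n)) * Math.sqrt(t2 / (n - 2d + t2));
    }

    public static void main(String[] args) {
        List<Double> xs = Arrays.asList(1.0, 2.0, 3.0, 4.0, 100.0);
        // Assumed table value: t quantile at 1 - 0.05/(2*5) = 0.995 with 3 degrees of freedom.
        double t = 5.841;
        double g = statistic(xs);
        double crit = criticalValue(xs.size(), t);
        System.out.println("G = " + g + ", critical = " + crit
            + ", outlier index = " + (g > crit ? farthestIndex(xs) : -1));
    }
}
```

As in the listing, a caller would evict at most the single index reported and then recompute the statistic on the remaining values. A standard deviation near zero would need the same guard as line 76.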