annotate flys-artifacts/src/main/java/de/intevation/flys/artifacts/math/Outlier.java @ 4187:21f4e4b79121 (Mercurial repository dive4elements/river)
Refactor GaugeDischargeCurveFacet to be able to set a facet name
To add another output to the GaugeDischargeCurveArtifact, two facet instances
with different names are needed. Therefore GaugeDischargeCurveFacet was
extended so that the facet name can be set in its constructor.
| field | value |
|---|---|
| author | Björn Ricks <bjoern.ricks@intevation.de> |
| date | Fri, 19 Oct 2012 13:25:49 +0200 |
| parents | b136113dad53 |
| children | |
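For reference, this is the standard two-sided form of Grubbs' test that `findOutlier` is meant to implement. With $N$ values $x_i$, their mean $\bar{x}$ (`m` in the code), the bias-corrected sample standard deviation $s$ (`s`) and the significance level $\alpha$, the test statistic ($G$, held in `maxZ` after the division by `s`) and its critical value ($G_{\text{crit}}$, held in `za`; `t` holds $t^2$ after `t *= t`) are:

```latex
G = \frac{\max_{i} \lvert x_i - \bar{x} \rvert}{s},
\qquad
G_{\text{crit}} = \frac{N-1}{\sqrt{N}} \sqrt{\frac{t^2}{N-2+t^2}},
\qquad
t = t_{1-\alpha/(2N),\,N-2}

% Worked example: N = 10, \alpha = 0.05
t_{0.9975,\,8} \approx 3.833,
\qquad
G_{\text{crit}} \approx \frac{9}{\sqrt{10}} \sqrt{\frac{14.69}{22.69}} \approx 2.290
```

The value farthest from the mean is reported as an outlier when $G > G_{\text{crit}}$.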
```java
package de.intevation.flys.artifacts.math;

import java.util.List;

import org.apache.commons.math.MathException;

import org.apache.commons.math.distribution.TDistributionImpl;

import org.apache.commons.math.stat.descriptive.moment.Mean;
import org.apache.commons.math.stat.descriptive.moment.StandardDeviation;

import org.apache.log4j.Logger;

/**
 * Grubbs' test for detecting a single outlier in a sample.
 */
public class Outlier
{
    /** Standard deviations below this are treated as zero (no outlier). */
    public static final double EPSILON = 1e-5;

    /** Default significance level of the test. */
    public static final double DEFAULT_ALPHA = 0.05;

    private static Logger log = Logger.getLogger(Outlier.class);

    protected Outlier() {
    }

    public static Integer findOutlier(List<Double> values) {
        return findOutlier(values, DEFAULT_ALPHA);
    }

    /**
     * Two-sided Grubbs' test at significance level alpha.
     *
     * @return the index of the value with the largest absolute deviation
     *         from the mean if it is a significant outlier, otherwise null.
     */
    public static Integer findOutlier(List<Double> values, double alpha) {
        boolean debug = log.isDebugEnabled();

        if (debug) {
            log.debug("outliers significance: " + alpha);
        }

        int N = values.size();

        if (debug) {
            log.debug("Values to check: " + N);
        }

        // The test needs at least N - 2 = 1 degree of freedom.
        if (N < 3) {
            return null;
        }

        Mean mean = new Mean();
        StandardDeviation std = new StandardDeviation(); // bias-corrected (N - 1)

        for (Double value: values) {
            double v = value.doubleValue();
            mean.increment(v);
            std .increment(v);
        }

        double m = mean.getResult();
        double s = std.getResult();

        if (debug) {
            log.debug("mean: " + m);
            log.debug("std dev: " + s);
        }

        // Find the value with the largest absolute deviation from the mean.
        double maxZ = -Double.MAX_VALUE;
        int iv = -1;
        for (int i = N-1; i >= 0; --i) {
            double v = values.get(i).doubleValue();
            double z = Math.abs(v - m);
            if (z > maxZ) {
                maxZ = z;
                iv = i;
            }
        }

        // (Nearly) constant data: no outlier.
        if (Math.abs(s) < EPSILON) {
            return null;
        }

        // Grubbs' statistic G = max |x_i - mean| / s.
        maxZ /= s;

        TDistributionImpl tdist = new TDistributionImpl(N-2);

        double t;

        try {
            // Upper critical value of Student's t with N - 2 degrees of
            // freedom at significance alpha/(2N) (two-sided test).
            t = tdist.inverseCumulativeProbability(1d - alpha/(N+N));
        }
        catch (MathException me) {
            log.error(me);
            return null;
        }

        t *= t;

        // Critical value of G.
        double za = ((N-1)/Math.sqrt(N))*Math.sqrt(t/(N-2d+t));

        if (debug) {
            log.debug("max: " + maxZ + " crit: " + za);
        }

        return maxZ > za
            ? Integer.valueOf(iv)
            : null;
    }
}
// vim:set ts=4 sw=4 si et sta sts=4 fenc=utf8 :
```

Changesets that introduced the lines of this file, all by Sascha L. Teichmann <sascha.teichmann@intevation.de>:

| rev | changeset | parent | description |
|---|---|---|---|
| 2645 | 4f7d1ea38404 | | Added simple Grubb's outlier test. |
| 2646 | c11da3540b70 | 2645 | Checked in out dated version of outlier test. |
| 3011 | ab81ffd1343e | 2646 | FixA: Reactivated rewrite of the outlier checks. |
| 3564 | e01b9d1bc941 | 3011 | FixA: Corrected the formulas of Grubbs' test for outliers. Still a bit broken. |
| 3565 | b136113dad53 | 3564 | FixA: Only evict only one(!) data point as outlier before recalculating the function. |
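The following is a self-contained sketch, not part of the repository, that mirrors `findOutlier` without Commons Math or log4j so the test can be run on its own. Assumptions: the class and all method names (`GrubbsSketch`, `nz`, `betaCf`, `tUpperQuantile`, `criticalValue`, `removeOutliers`, ...) are hypothetical; the Student-t quantile is obtained by bisection on a CDF built from the regularized incomplete beta function (Numerical Recipes-style Lanczos and continued-fraction approximations) instead of `TDistributionImpl`; and `removeOutliers` only illustrates the "evict one data point, then recalculate" loop from changeset 3565 by recomputing mean and standard deviation, since the function that FLYS actually refits is not part of this file.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, self-contained sketch of a two-sided Grubbs' test (no Commons Math).
final class GrubbsSketch {
    // ln Gamma(x) for x > 0 (Lanczos approximation as in Numerical Recipes "gammln").
    static double lnGamma(double x) {
        final double[] cof = {76.18009172947146, -86.50532032941677, 24.01409824083091,
            -1.231739572450155, 0.1208650973866179e-2, -0.5395239384953e-5};
        double y = x, tmp = x + 5.5;
        tmp -= (x + 0.5) * Math.log(tmp);
        double ser = 1.000000000190015;
        for (double c : cof) ser += c / ++y;
        return -tmp + Math.log(2.5066282746310005 * ser / x);
    }
    // Keeps a Lentz denominator away from zero.
    static double nz(double v) {
        return Math.abs(v) < 1e-300 ? 1e-300 : v;
    }

    // Continued fraction for I_x(a, b) (modified Lentz, as in Numerical Recipes "betacf").
    static double betaCf(double a, double b, double x) {
        double c = 1.0, d = 1.0 / nz(1.0 - (a + b) * x / (a + 1.0)), h = d;
        for (int m = 1; m <= 300; m++) {
            int m2 = 2 * m;
            double aa = m * (b - m) * x / ((a - 1.0 + m2) * (a + m2));     // even step
            d = 1.0 / nz(1.0 + aa * d);
            c = nz(1.0 + aa / c);
            h *= d * c;
            aa = -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2)); // odd step
            d = 1.0 / nz(1.0 + aa * d);
            c = nz(1.0 + aa / c);
            double del = d * c;
            h *= del;
            if (Math.abs(del - 1.0) < 1e-14) break;
        }
        return h;
    }

    // Regularized incomplete beta function I_x(a, b).
    static double incompleteBeta(double a, double b, double x) {
        if (x <= 0.0) return 0.0;
        if (x >= 1.0) return 1.0;
        double bt = Math.exp(lnGamma(a + b) - lnGamma(a) - lnGamma(b)
            + a * Math.log(x) + b * Math.log(1.0 - x));
        return x < (a + 1.0) / (a + b + 2.0)
            ? bt * betaCf(a, b, x) / a
            : 1.0 - bt * betaCf(b, a, 1.0 - x) / b;
    }
    // P(T > t) for Student's t with nu degrees of freedom, t >= 0.
    static double tUpperTail(double t, double nu) {
        return 0.5 * incompleteBeta(nu / 2.0, 0.5, nu / (nu + t * t));
    }

    // The t >= 0 with P(T > t) = p, for 0 < p < 0.5, found by bisection.
    static double tUpperQuantile(double p, double nu) {
        double lo = 0.0, hi = 1.0;
        while (tUpperTail(hi, nu) > p) hi *= 2.0;
        for (int i = 0; i < 100; i++) {
            double mid = 0.5 * (lo + hi);
            if (tUpperTail(mid, nu) > p) lo = mid; else hi = mid;
        }
        return 0.5 * (lo + hi);
    }

    // Two-sided Grubbs' critical value for n values at significance alpha.
    static double criticalValue(int n, double alpha) {
        double t = tUpperQuantile(alpha / (2.0 * n), n - 2), t2 = t * t;
        return (n - 1) / Math.sqrt(n) * Math.sqrt(t2 / (n - 2 + t2));
    }

    // Index of the value farthest from the mean if it is a significant outlier, else null.
    static Integer findOutlier(List<Double> values, double alpha) {
        int n = values.size();
        if (n < 3) return null;
        double mean = 0.0, ss = 0.0;
        for (double v : values) mean += v / n;
        for (double v : values) ss += (v - mean) * (v - mean);
        double s = Math.sqrt(ss / (n - 1)); // bias-corrected sample standard deviation
        if (s < 1e-5) return null;          // (nearly) constant data, same EPSILON as Outlier
        int iv = -1; double maxDev = -1.0;
        for (int i = n - 1; i >= 0; --i) {  // backwards, like Outlier: ties go to the highest index
            double dev = Math.abs(values.get(i) - mean);
            if (dev > maxDev) { maxDev = dev; iv = i; }
        }
        return maxDev / s > criticalValue(n, alpha) ? Integer.valueOf(iv) : null;
    }

    // Evicts one outlier at a time and re-runs the test on the remaining values.
    static List<Double> removeOutliers(List<Double> values, double alpha) {
        List<Double> work = new ArrayList<>(values);
        Integer idx;
        while ((idx = findOutlier(work, alpha)) != null) {
            work.remove(idx.intValue()); // by position; remove(Integer) would remove by value
        }
        return work;
    }
}
```

With the ten values 10.0, 10.1, 9.9, 10.2, 9.8, 10.0, 10.1, 9.9, 10.0 and 15.0, the statistic G is about 2.84 and exceeds the critical value of about 2.29, so index 9 is evicted. The nine remaining values give G of about 1.63, below their critical value of about 2.21, so the loop stops. Testing and evicting one point per pass matters because a single large outlier inflates `s` and can mask or distort the test of the next most extreme value.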