annotate flys-artifacts/src/main/java/de/intevation/flys/artifacts/math/Outlier.java @ 4478:6153c50f78cf

WaterLineArtifact: Added callcontext-parameter to interfaces getWaterLine. Update all implementations. The change was done to be able to compute the extreme values during getWaterLine to access data needed in CrossSectionProfile Diagrams.
author Felix Wolfsteller <felix.wolfsteller@intevation.de>
date Tue, 13 Nov 2012 14:46:44 +0100
parents b136113dad53
children
changesets
2645 4f7d1ea38404 Added simple Grubb's outlier test.
     Sascha L. Teichmann <sascha.teichmann@intevation.de>
     parents:
2646 c11da3540b70 Checked in out dated version of outlier test.
     Sascha L. Teichmann <sascha.teichmann@intevation.de>
     parents: 2645
3011 ab81ffd1343e FixA: Reactivated rewrite of the outlier checks.
     Sascha L. Teichmann <sascha.teichmann@intevation.de>
     parents: 2646
3564 e01b9d1bc941 FixA: Corrected the formulas of Grubbs' test for outliers. Still a bit broken.
     Sascha L. Teichmann <sascha.teichmann@intevation.de>
     parents: 3011
3565 b136113dad53 FixA: Only evict only one(!) data point as outlier before recalculating the function.
     Sascha L. Teichmann <sascha.teichmann@intevation.de>
     parents: 3564

rev   line source
2645     1 package de.intevation.flys.artifacts.math;
2645     2
3565     3 import java.util.List;
3565     4
2646     5 import org.apache.commons.math.MathException;
2646     6
3565     7 import org.apache.commons.math.distribution.TDistributionImpl;
3565     8
2645     9 import org.apache.commons.math.stat.descriptive.moment.Mean;
2645    10 import org.apache.commons.math.stat.descriptive.moment.StandardDeviation;
2645    11
2646    12 import org.apache.log4j.Logger;
2646    13
2645    14 public class Outlier
2645    15 {
3564    16     public static final double EPSILON = 1e-5;
3564    17
3011    18     public static final double DEFAULT_ALPHA = 0.05;
3011    19
2646    20     private static Logger log = Logger.getLogger(Outlier.class);
2646    21
3565    22     protected Outlier() {
2645    23     }
2645    24
3565    25     public static Integer findOutlier(List<Double> values) {
3565    26         return findOutlier(values, DEFAULT_ALPHA);
3011    27     }
3011    28
3565    29     public static Integer findOutlier(List<Double> values, double alpha) {
3564    30         boolean debug = log.isDebugEnabled();
3564    31
3564    32         if (debug) {
3564    33             log.debug("outliers significance: " + alpha);
3564    34         }
3564    35
3564    36         alpha = 1d - alpha;
3564    37
3565    38         int N = values.size();
3564    39
3565    40         if (debug) {
3565    41             log.debug("Values to check: " + N);
3565    42         }
2646    43
3565    44         if (N < 3) {
3565    45             return null;
3565    46         }
3564    47
3565    48         Mean mean = new Mean();
3565    49         StandardDeviation std = new StandardDeviation();
3564    50
3565    51         for (Double value: values) {
3565    52             double v = value.doubleValue();
3565    53             mean.increment(v);
3565    54             std .increment(v);
3565    55         }
3565    56
3565    57         double m = mean.getResult();
3565    58         double s = std.getResult();
3565    59
3565    60         if (debug) {
3565    61             log.debug("mean: " + m);
3565    62             log.debug("std dev: " + s);
3565    63         }
3565    64
3565    65         double maxZ = -Double.MAX_VALUE;
3565    66         int iv = -1;
3565    67         for (int i = N-1; i >= 0; --i) {
3565    68             double v = values.get(i).doubleValue();
3565    69             double z = Math.abs(v - m);
3565    70             if (z > maxZ) {
3565    71                 maxZ = z;
3565    72                 iv = i;
2646    73             }
2645    74         }
2645    75
3565    76         if (Math.abs(s) < EPSILON) {
3565    77             return null;
3565    78         }
2645    79
3565    80         maxZ /= s;
3565    81
3565    82         TDistributionImpl tdist = new TDistributionImpl(N-2);
3565    83
3565    84         double t;
3565    85
3565    86         try {
3565    87             t = tdist.inverseCumulativeProbability(alpha/(N+N));
3565    88         }
3565    89         catch (MathException me) {
3565    90             log.error(me);
3565    91             return null;
3565    92         }
3565    93
3565    94         t *= t;
3565    95
3565    96         double za = ((N-1)/Math.sqrt(N))*Math.sqrt(t/(N-2d+t));
3565    97
3565    98         if (debug) {
3565    99             log.debug("max: " + maxZ + " crit: " + za);
3565   100         }
3565   101
3565   102         return maxZ > za
3565   103             ? Integer.valueOf(iv)
3565   104             : null;
2645   105     }
2645   106 }
2645   107 // vim:set ts=4 sw=4 si et sta sts=4 fenc=utf8 :
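
The arithmetic in findOutlier can be followed more easily on a small example. The following is a minimal sketch only, not part of the repository: the class GrubbsSketch and all of its method names are hypothetical, it uses plain Java instead of commons-math, and it takes the t quantile as a parameter instead of computing it with TDistributionImpl. It assumes the sample standard deviation (division by n - 1). The listing above derives its quantile from (1 - alpha)/(2N), while the usual two-sided form of Grubbs' test uses the quantile for alpha/(2N) with N - 2 degrees of freedom; the sketch takes no position on that and simply accepts whatever quantile it is given.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class; not part of flys-artifacts.
final class GrubbsSketch {

    /** Arithmetic mean of the values. */
    static double mean(List<Double> values) {
        double sum = 0d;
        for (double v : values) {
            sum += v;
        }
        return sum / values.size();
    }

    /** Sample standard deviation (assumption: divides by n - 1). */
    static double sampleStdDev(List<Double> values) {
        double m = mean(values);
        double sq = 0d;
        for (double v : values) {
            sq += (v - m) * (v - m);
        }
        return Math.sqrt(sq / (values.size() - 1));
    }

    /** Index of the value farthest from the mean, scanning backwards as the listing does; -1 if empty. */
    static int mostExtremeIndex(List<Double> values) {
        double m = mean(values);
        double max = -Double.MAX_VALUE;
        int idx = -1;
        for (int i = values.size() - 1; i >= 0; --i) {
            double z = Math.abs(values.get(i) - m);
            if (z > max) {
                max = z;
                idx = i;
            }
        }
        return idx;
    }

    /** Test statistic: largest absolute deviation from the mean, divided by the standard deviation. */
    static double grubbsStatistic(List<Double> values) {
        int idx = mostExtremeIndex(values);
        return Math.abs(values.get(idx) - mean(values)) / sampleStdDev(values);
    }

    /** Critical value for n samples, given a t quantile with n - 2 degrees of freedom. */
    static double criticalValue(int n, double t) {
        double t2 = t * t;
        return ((n - 1) / Math.sqrt(n)) * Math.sqrt(t2 / (n - 2d + t2));
    }

    public static void main(String[] args) {
        List<Double> values = Arrays.asList(1d, 2d, 3d, 4d, 100d);
        // 5.841 is the tabulated t quantile for a tail probability of 0.005 and 3 degrees of freedom,
        // i.e. alpha = 0.05 divided by 2N with N = 5.
        double crit = criticalValue(values.size(), 5.841);
        double g = grubbsStatistic(values);
        System.out.println("G = " + g + ", crit = " + crit
            + ", candidate index = " + mostExtremeIndex(values));
    }
}
```

Like the listing, the sketch singles out one candidate at a time. A caller that wants to remove several outliers would evict that one value and run the check again on the remaining data, which is what the message of changeset 3565 describes.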

http://dive4elements.wald.intevation.org