annotate flys-backend/contrib/shpimporter/utils.py @ 4935:c0a58558b817 dami

Importer:
- Handle regular expressions for attribute names
- Convert Strings to UTF-8
- Add regular expressions for hws_points values
author Andre Heinecke <aheinecke@intevation.de>
date Thu, 31 Jan 2013 12:23:41 +0100
parents 1f6e544f7a7f
children 174fbaa3d344
rev   line source
2798
5a654f2e35bc Added a python tool to import shapefiles into database.
Ingo Weinzierl <ingo.weinzierl@intevation.de>
parents:
diff changeset
1 import os
4874
b1d7e600b43b (importer) Add utility function to convert paths to utf-8
Andre Heinecke <aheinecke@intevation.de>
parents: 3654
diff changeset
2 import sys
3654
59ca5dab2782 Shape importer: use python's OptionParse to read user specific configuration from command line.
Ingo Weinzierl <ingo.weinzierl@intevation.de>
parents: 2798
diff changeset
3 from shpimporter import DEBUG, INFO, ERROR
2798
5a654f2e35bc Added a python tool to import shapefiles into database.
Ingo Weinzierl <ingo.weinzierl@intevation.de>
parents:
diff changeset
4
5a654f2e35bc Added a python tool to import shapefiles into database.
Ingo Weinzierl <ingo.weinzierl@intevation.de>
parents:
diff changeset
5 SHP='.shp'
5a654f2e35bc Added a python tool to import shapefiles into database.
Ingo Weinzierl <ingo.weinzierl@intevation.de>
parents:
diff changeset
6
5a654f2e35bc Added a python tool to import shapefiles into database.
Ingo Weinzierl <ingo.weinzierl@intevation.de>
parents:
diff changeset
7 def findShapefiles(path):
5a654f2e35bc Added a python tool to import shapefiles into database.
Ingo Weinzierl <ingo.weinzierl@intevation.de>
parents:
diff changeset
8     shapes = []
5a654f2e35bc Added a python tool to import shapefiles into database.
Ingo Weinzierl <ingo.weinzierl@intevation.de>
parents:
diff changeset
9
5a654f2e35bc Added a python tool to import shapefiles into database.
Ingo Weinzierl <ingo.weinzierl@intevation.de>
parents:
diff changeset
10     for root, dirs, files in os.walk(path):
5a654f2e35bc Added a python tool to import shapefiles into database.
Ingo Weinzierl <ingo.weinzierl@intevation.de>
parents:
diff changeset
11         if len(files) == 0:
5a654f2e35bc Added a python tool to import shapefiles into database.
Ingo Weinzierl <ingo.weinzierl@intevation.de>
parents:
diff changeset
12             continue
5a654f2e35bc Added a python tool to import shapefiles into database.
Ingo Weinzierl <ingo.weinzierl@intevation.de>
parents:
diff changeset
13
3654
59ca5dab2782 Shape importer: use python's OptionParse to read user specific configuration from command line.
Ingo Weinzierl <ingo.weinzierl@intevation.de>
parents: 2798
diff changeset
14         DEBUG("Processing directory '%s' with %i files " % (root, len(files)))
2798
5a654f2e35bc Added a python tool to import shapefiles into database.
Ingo Weinzierl <ingo.weinzierl@intevation.de>
parents:
diff changeset
15
5a654f2e35bc Added a python tool to import shapefiles into database.
Ingo Weinzierl <ingo.weinzierl@intevation.de>
parents:
diff changeset
16         for f in files:
5a654f2e35bc Added a python tool to import shapefiles into database.
Ingo Weinzierl <ingo.weinzierl@intevation.de>
parents:
diff changeset
17             idx = f.find(SHP)
5a654f2e35bc Added a python tool to import shapefiles into database.
Ingo Weinzierl <ingo.weinzierl@intevation.de>
parents:
diff changeset
18             if (idx+len(SHP)) == len(f):
5a654f2e35bc Added a python tool to import shapefiles into database.
Ingo Weinzierl <ingo.weinzierl@intevation.de>
parents:
diff changeset
19                 shapes.append((f.replace(SHP, ''), root + "/" + f))
5a654f2e35bc Added a python tool to import shapefiles into database.
Ingo Weinzierl <ingo.weinzierl@intevation.de>
parents:
diff changeset
20
5a654f2e35bc Added a python tool to import shapefiles into database.
Ingo Weinzierl <ingo.weinzierl@intevation.de>
parents:
diff changeset
21     return shapes
5a654f2e35bc Added a python tool to import shapefiles into database.
Ingo Weinzierl <ingo.weinzierl@intevation.de>
parents:
diff changeset
22
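The suffix test in `findShapefiles` relies on `str.find`, which returns -1 when `.shp` is absent, so a three-character file name without the extension still satisfies `idx + len(SHP) == len(f)`. Names that contain `.shp` twice are skipped because `find` reports the first occurrence. A minimal sketch of an alternative, assuming Python 3 and using the hypothetical name `find_shapefiles` (not part of the importer), checks the suffix with `str.endswith` instead:

```python
import os

SHP = ".shp"


def find_shapefiles(path):
    """Return (name, full_path) tuples for every *.shp file below path."""
    shapes = []
    for root, _dirs, files in os.walk(path):
        for f in files:
            # endswith() only matches a real trailing ".shp"
            if f.endswith(SHP):
                # Strip only the trailing extension, not every ".shp" in the name
                shapes.append((f[:-len(SHP)], os.path.join(root, f)))
    return shapes
```

`os.path.join` is used here in place of the hard-coded `"/"` separator; whether the importer needs that portability is not stated in the source.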
4935
c0a58558b817 Importer: - Handle regular expressions for attribute names
Andre Heinecke <aheinecke@intevation.de>
parents: 4887
diff changeset
23 def getUTF8(string):
c0a58558b817 Importer: - Handle regular expressions for attribute names
Andre Heinecke <aheinecke@intevation.de>
parents: 4887
diff changeset
24     """
c0a58558b817 Importer: - Handle regular expressions for attribute names
Andre Heinecke <aheinecke@intevation.de>
parents: 4887
diff changeset
25     Tries to convert the string to a UTF-8 encoding by first checking if it
c0a58558b817 Importer: - Handle regular expressions for attribute names
Andre Heinecke <aheinecke@intevation.de>
parents: 4887
diff changeset
26     is UTF-8 and then trying cp1252
c0a58558b817 Importer: - Handle regular expressions for attribute names
Andre Heinecke <aheinecke@intevation.de>
parents: 4887
diff changeset
27     """
c0a58558b817 Importer: - Handle regular expressions for attribute names
Andre Heinecke <aheinecke@intevation.de>
parents: 4887
diff changeset
28     try:
c0a58558b817 Importer: - Handle regular expressions for attribute names
Andre Heinecke <aheinecke@intevation.de>
parents: 4887
diff changeset
29         return unicode.encode(unicode(string, "UTF-8"), "UTF-8")
c0a58558b817 Importer: - Handle regular expressions for attribute names
Andre Heinecke <aheinecke@intevation.de>
parents: 4887
diff changeset
30     except UnicodeDecodeError:
c0a58558b817 Importer: - Handle regular expressions for attribute names
Andre Heinecke <aheinecke@intevation.de>
parents: 4887
diff changeset
31         # Probably European Windows names so let's try again
c0a58558b817 Importer: - Handle regular expressions for attribute names
Andre Heinecke <aheinecke@intevation.de>
parents: 4887
diff changeset
32         return unicode.encode(unicode(string, "cp1252"), "UTF-8")
c0a58558b817 Importer: - Handle regular expressions for attribute names
Andre Heinecke <aheinecke@intevation.de>
parents: 4887
diff changeset
33
4874
b1d7e600b43b (importer) Add utility function to convert paths to utf-8
Andre Heinecke <aheinecke@intevation.de>
parents: 3654
diff changeset
34 def getUTF8Path(path):
b1d7e600b43b (importer) Add utility function to convert paths to utf-8
Andre Heinecke <aheinecke@intevation.de>
parents: 3654
diff changeset
35     """
b1d7e600b43b (importer) Add utility function to convert paths to utf-8
Andre Heinecke <aheinecke@intevation.de>
parents: 3654
diff changeset
36     Tries to convert path to UTF-8 by first checking the file system encoding
b1d7e600b43b (importer) Add utility function to convert paths to utf-8
Andre Heinecke <aheinecke@intevation.de>
parents: 3654
diff changeset
37     and trying the default Windows encoding afterwards.
b1d7e600b43b (importer) Add utility function to convert paths to utf-8
Andre Heinecke <aheinecke@intevation.de>
parents: 3654
diff changeset
38     Returns a UTF-8 encoded byte string or raises a UnicodeDecodeError
b1d7e600b43b (importer) Add utility function to convert paths to utf-8
Andre Heinecke <aheinecke@intevation.de>
parents: 3654
diff changeset
39     """
b1d7e600b43b (importer) Add utility function to convert paths to utf-8
Andre Heinecke <aheinecke@intevation.de>
parents: 3654
diff changeset
40     try:
b1d7e600b43b (importer) Add utility function to convert paths to utf-8
Andre Heinecke <aheinecke@intevation.de>
parents: 3654
diff changeset
41         return unicode.encode(unicode(path, sys.getfilesystemencoding()), "UTF-8")
b1d7e600b43b (importer) Add utility function to convert paths to utf-8
Andre Heinecke <aheinecke@intevation.de>
parents: 3654
diff changeset
42     except UnicodeDecodeError:
4887
1f6e544f7a7f Importer: Use cp1252 instead of latin-9 to guess filename encodings
Andre Heinecke <aheinecke@intevation.de>
parents: 4884
diff changeset
43         # Probably European Windows names so let's try again
1f6e544f7a7f Importer: Use cp1252 instead of latin-9 to guess filename encodings
Andre Heinecke <aheinecke@intevation.de>
parents: 4884
diff changeset
44         return unicode.encode(unicode(path, "cp1252"), "UTF-8")
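
`getUTF8` and `getUTF8Path` share one pattern: decode with a preferred encoding and, on `UnicodeDecodeError`, retry with cp1252 before re-encoding as UTF-8. The listing is Python 2 code (the `unicode` built-in does not exist in Python 3). A rough Python 3 equivalent, assuming the input arrives as `bytes` and using the hypothetical names `to_utf8` and `path_to_utf8` (neither is part of the importer), might look like this:

```python
import sys


def to_utf8(raw, first_guess="utf-8", fallback="cp1252"):
    """Decode raw bytes with first_guess, then fallback; return UTF-8 bytes."""
    try:
        text = raw.decode(first_guess)
    except UnicodeDecodeError:
        # Probably a Windows-encoded name, so retry with the fallback codec.
        # This can still raise UnicodeDecodeError, because cp1252 leaves a
        # few byte values (0x81, for example) undefined.
        text = raw.decode(fallback)
    return text.encode("utf-8")


def path_to_utf8(raw_path):
    """Same fallback, but try the file system encoding first."""
    return to_utf8(raw_path, first_guess=sys.getfilesystemencoding())
```

The order of the attempts matters: valid UTF-8 input that contains non-ASCII characters would also decode under cp1252 without error, but to the wrong characters, so the stricter codec has to be tried first.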

http://dive4elements.wald.intevation.org