Repository: dive4elements/river (Mercurial)
annotate flys-backend/contrib/shpimporter/utils.py @ 4935:c0a58558b817 dami
Importer: - Handle regular expressions for attribute names
- Convert Strings to UTF-8
- Add regular expressions for hws_points values
author | Andre Heinecke <aheinecke@intevation.de> |
---|---|
date | Thu, 31 Jan 2013 12:23:41 +0100 |
parents | 1f6e544f7a7f |
children | 174fbaa3d344 |
Changesets that last changed lines 1-6:

rev | changeset | parent | author | summary |
---|---|---|---|---|
2798 | 5a654f2e35bc | | Ingo Weinzierl <ingo.weinzierl@intevation.de> | Added a python tool to import shapefiles into database. |
3654 | 59ca5dab2782 | 2798 | Ingo Weinzierl <ingo.weinzierl@intevation.de> | Shape importer: use python's OptionParse to read user specific configuration from command line. |
4874 | b1d7e600b43b | 3654 | Andre Heinecke <aheinecke@intevation.de> | (importer) Add utility function to convert paths to utf-8 |

Lines 1-6 (imports and the `SHP` constant), each prefixed with the revision that last changed it:

```
2798: import os
4874: import sys
3654: from shpimporter import DEBUG, INFO, ERROR
2798:
2798: SHP='.shp'
2798:
```
Lines 7-22 (`findShapefiles`), each prefixed with the revision that last changed it: 2798 (5a654f2e35bc), except line 14, which was last changed in 3654 (59ca5dab2782).

```
2798: def findShapefiles(path):
2798:     shapes = []
2798:
2798:     for root, dirs, files in os.walk(path):
2798:         if len(files) == 0:
2798:             continue
2798:
3654:         DEBUG("Processing directory '%s' with %i files " % (root, len(files)))
2798:
2798:         for f in files:
2798:             idx = f.find(SHP)
2798:             if (idx+len(SHP)) == len(f):
2798:                 shapes.append((f.replace(SHP, ''), root + "/" + f))
2798:
2798:     return shapes
2798:
```
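The suffix test in `findShapefiles` relies on `str.find`, which returns `-1` when `.shp` is absent and otherwise reports the first occurrence. As a result, any three-character file name without `.shp` (for example `abc`, because `-1 + 4 == 3`) is reported as a shapefile, while `a.shp.shp` is missed; `f.replace(SHP, '')` also removes every `.shp`, not only the suffix. A minimal Python 3 sketch of the same walk using `str.endswith` and `os.path.join` might look like this. The name `find_shapefiles` is hypothetical, and the standard `logging` module stands in for the importer's `DEBUG` helper as an assumption.

```python
import logging
import os

SHP = ".shp"
log = logging.getLogger(__name__)  # assumption: stands in for shpimporter.DEBUG


def find_shapefiles(path):
    """Return (layer_name, full_path) tuples for every *.shp file below path.

    Hypothetical Python 3 rewrite of findShapefiles: str.endswith does the
    suffix test, os.path.join builds the path, and only the trailing '.shp'
    is stripped from the layer name.
    """
    shapes = []
    for root, dirs, files in os.walk(path):
        if not files:
            continue
        log.debug("Processing directory '%s' with %i files", root, len(files))
        for f in files:
            if f.endswith(SHP):
                shapes.append((f[:-len(SHP)], os.path.join(root, f)))
    return shapes
```

Slicing with `f[:-len(SHP)]` keeps the rest of the name intact, so `a.shp.shp` yields the layer name `a.shp`. Like the original, the match is case-sensitive, so `RIVERS.SHP` would still be skipped.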
Lines 23-33 (`getUTF8`), all last changed in 4935 (c0a58558b817, parent 4887), the revision this page annotates:

```
4935: def getUTF8(string):
4935:     """
4935:     Tries to convert the string to a UTF-8 encoding by first checking if it
4935:     is UTF-8 and then trying cp1252
4935:     """
4935:     try:
4935:         return unicode.encode(unicode(string, "UTF-8"), "UTF-8")
4935:     except UnicodeDecodeError:
4935:         # Probably European Windows names so lets try again
4935:         return unicode.encode(unicode(string, "cp1252"), "UTF-8")
4935:
```
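`getUTF8` tries a strict UTF-8 decode first and falls back to cp1252 only when that raises `UnicodeDecodeError`. The order matters: nearly every byte sequence decodes under cp1252, so trying it first would silently turn genuine UTF-8 such as `b'\xc3\xa4'` (ä) into the two characters `Ã¤`. The code above is Python 2 (it uses the `unicode` builtin). A minimal Python 3 sketch of the same fallback, assuming the input is `bytes` and the caller wants UTF-8 `bytes` back, could look like this; `to_utf8` is a hypothetical name.

```python
def to_utf8(data):
    """Return the bytes in data re-encoded as UTF-8.

    Hypothetical Python 3 counterpart of getUTF8: decode as UTF-8 first and
    fall back to cp1252 (Windows Western European) only if that fails.
    """
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8; assume a Western European Windows encoding.
        text = data.decode("cp1252")
    return text.encode("utf-8")


if __name__ == "__main__":
    print(to_utf8(b"Stra\xc3\x9fe"))  # already UTF-8, prints b'Stra\xc3\x9fe'
    print(to_utf8(b"Stra\xdfe"))      # cp1252 0xDF is "ß", prints b'Stra\xc3\x9fe'
```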
Lines 34-44 (`getUTF8Path`). Lines 34-42 were last changed in 4874 (b1d7e600b43b); lines 43-44 were last changed in 4887 (1f6e544f7a7f, parent 4884) by Andre Heinecke <aheinecke@intevation.de>: "Importer: Use cp1252 instead of latin-9 to guess filename encodings".

```
4874: def getUTF8Path(path):
4874:     """
4874:     Tries to convert path to utf-8 by first checking the filesystemencoding
4874:     and trying the default windows encoding afterwards.
4874:     Returns a valid UTF-8 encoded unicode object or throws a UnicodeDecodeError
4874:     """
4874:     try:
4874:         return unicode.encode(unicode(path, sys.getfilesystemencoding()), "UTF-8")
4874:     except UnicodeDecodeError:
4887:         # Probably European Windows names so lets try again
4887:         return unicode.encode(unicode(path, "cp1252"), "UTF-8")
```
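`getUTF8Path` applies the same idea to file names, decoding with `sys.getfilesystemencoding()` first and cp1252 second. Two details are worth noting. In Python 2, `unicode.encode(..., "UTF-8")` returns a `str` (a byte string), not a `unicode` object as the docstring says. And the fallback can still raise: cp1252 leaves a few byte values undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D), so on a UTF-8 system a name containing one of them fails both decodes, which matches the documented `UnicodeDecodeError`. A Python 3 sketch follows; `path_to_utf8` and its `fs_encoding` parameter are hypothetical, and it assumes the path arrives as `bytes`.

```python
import sys


def path_to_utf8(path, fs_encoding=None):
    """Return the bytes path re-encoded as UTF-8.

    Hypothetical Python 3 counterpart of getUTF8Path: try the file system
    encoding first, then cp1252. Raises UnicodeDecodeError if neither fits.
    """
    if fs_encoding is None:
        fs_encoding = sys.getfilesystemencoding()
    try:
        text = path.decode(fs_encoding)
    except UnicodeDecodeError:
        # Probably a name written on a Western European Windows system.
        text = path.decode("cp1252")
    return text.encode("utf-8")


if __name__ == "__main__":
    print(path_to_utf8(b"D\xfcsseldorf.shp", "utf-8"))  # cp1252 0xFC is "ü"
    try:
        path_to_utf8(b"\x81.shp", "utf-8")  # 0x81 is undefined in cp1252
    except UnicodeDecodeError as exc:
        print("undecodable:", exc.reason)
```

The explicit `fs_encoding` argument only exists to make the behaviour reproducible; leaving it out uses the interpreter's file system encoding, as the original does.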