comparison ppgen.py @ 7:8b2f8f439817

Improves: ding parser. * Strips greater-than and less-than signs (< and >) from the beginning and end of words when reading a ding dictionary. Words enclosed in those characters seem to be variants. This affects about 100 to 200 words on the de side of de-en 1.7.
author Bernhard Reiter <bernhard@intevation.de>
date Tue, 21 Feb 2017 14:14:08 +0100
parents 81f75c9aac84
children 200c2c3c5f67
--- ppgen.py  6:81f75c9aac84
+++ ppgen.py  7:8b2f8f439817
@@ -100,10 +100,10 @@
 # languages are separated by " :: "
 p = line.partition(" :: ")
 languageEntry = p[0] if useLeft else p[2]

 for word in splitter.split(languageEntry):
-    word = word.strip('(",.)\'!:;').rstrip('/')
+    word = word.strip('(",.)\'!:;<>').rstrip('/')
     if len(word) > 2 and not word[0] in '[{/':
         dset.add(word)

 #TODO: check for very common words and remove them?
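The effect of the changed line can be tried in isolation. The following is a minimal sketch, not the code from ppgen.py: the helper names `clean` and `collect`, the whitespace splitter, and the sample entry are assumptions made for illustration (the real `splitter` is defined outside the lines shown here, and the sample entry is not taken from the ding dictionary). Only the two strip character sets and the length/prefix filter are copied from the comparison.

```python
import re

# Character sets passed to str.strip() before and after this changeset.
OLD_CHARS = '(",.)\'!:;'
NEW_CHARS = OLD_CHARS + '<>'

# Assumption: a plain whitespace splitter stands in for ppgen.py's
# real `splitter`, which is not part of the lines shown.
SPLITTER = re.compile(r'\s+')


def clean(word, chars):
    """Strip `chars` from both ends of `word`, then any trailing slashes."""
    return word.strip(chars).rstrip('/')


def collect(entry, chars):
    """Return the set of cleaned words kept from one language entry."""
    dset = set()
    for word in SPLITTER.split(entry):
        word = clean(word, chars)
        # Same filter as in the comparison: keep words longer than two
        # characters that do not start with '[', '{' or '/'.
        if len(word) > 2 and word[0] not in '[{/':
            dset.add(word)
    return dset


# Hypothetical entry; not taken from the de-en dictionary.
sample = '<Haus> (n) /weg/'
old_words = collect(sample, OLD_CHARS)
new_words = collect(sample, NEW_CHARS)
```

With the old character set the variant marker survives and `<Haus>` is stored as a separate word; with the new set it collapses to `Haus`. Note that `str.strip` removes any run of the listed characters from both ends, so a token such as `<Haus>,` is also reduced to `Haus`.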