caleb farley - 3 months ago
Python Question

Writing a program to print hapaxes from a string

A hapax is a word that occurs only once in a string. My code sort of works. At first it got the first hapax. Then I changed the string I put in, and it got the first and the last hapax, but not the second one. Here's my current code:

def hapax(stringz):
    w = ''
    l = stringz.split()
    for x in l:
        w = ''
        l.remove(x)
        for y in l:
            w += y
        if w.find(x) == -1:
            print(x)


hapax('yo i went jogging then yo i went joggin tuesday wednesday')


All I got was:

then
wednesday

Answer

String Module:

Use the string module to get the list of punctuation characters, then replace them with an ordinary for loop. Demo:

>>> import string
>>> string.punctuation
'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
>>> 

More Pythonic: see "How to replace punctuation in a string python?"
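In Python 3 the punctuation can be removed without an explicit loop. The following is a minimal sketch, assuming Python 3, where `str.maketrans` accepts a third argument listing the characters to delete. The helper name `strip_punctuation` is hypothetical and only used for illustration.

```python
import string


def strip_punctuation(text):
    # Build a translation table that maps nothing and deletes every
    # character listed in string.punctuation (Python 3 only).
    table = str.maketrans("", "", string.punctuation)
    return text.translate(table)


print(strip_punctuation("some punctuation ? I and & "))
```

Only the punctuation characters are deleted; the surrounding whitespace is left untouched, so a later `split()` still separates the words correctly.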


Algo:

  1. Remove punctuation from the input text using the string module.
  2. Convert the text to lower case.
  3. Split the input text into words and update a dictionary of word counts.
  4. Iterate over the dictionary items and collect the words whose count is 1 (the hapaxes).
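The four steps above can be sketched compactly for Python 3. This is only an illustration under stated assumptions: the function name `find_hapaxes` is hypothetical, and `collections.Counter` is swapped in for a hand-updated dictionary because it does the counting in one call.

```python
import string
from collections import Counter


def find_hapaxes(text):
    # Step 1: delete punctuation characters (Python 3 str.maketrans).
    cleaned = text.translate(str.maketrans("", "", string.punctuation))
    # Step 2: ignore case differences.
    cleaned = cleaned.lower()
    # Step 3: split on whitespace and count each word.
    counts = Counter(cleaned.split())
    # Step 4: keep the words seen exactly once, in first-seen order
    # (dict ordering is guaranteed from Python 3.7 on).
    return [word for word, n in counts.items() if n == 1]


print(find_hapaxes("yo i went jogging then yo i went jogging tuesday wednesday"))
```

Returning the list instead of printing it makes the function easier to reuse and to test.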

code:

# Note: this is Python 2 code (print statements, string.maketrans).
import string
import collections

def hapax(text):
    # Remove punctuation from the input text.
    text = text.translate(string.maketrans("",""), string.punctuation)
    print "Debug 1- After remove Punctuation:", text

    # Ignore case: treat lower, upper and mixed case as the same word.
    text = text.lower()
    print "Debug 2- After converted to Lower case:", text

    # Create a default dictionary: key is the word, value is its count.
    word_count = collections.defaultdict(int)
    print "Debug 3- Collection Default Dictionary:", word_count

    # Split the text and update the result dictionary.
    for word in text.split():
        if word:  # Defensive check; split() already drops whitespace.
            word_count[word] += 1

    print "Debug 4- Word and its count:", word_count

    # Collect the words whose count is 1.
    hapax_words = list()
    for word, value in word_count.items():
        if value == 1:
            hapax_words.append(word)

    print "Debug 5- Final Hapax words:", hapax_words


hapax('yo i went jogging then yo i went jogging tuesday wednesday some punctuation ? I and & ')

Output:

$ python 2.py 
Debug 1- After remove Punctuation: yo i went jogging then yo i went jogging tuesday wednesday some punctuation  I and  
Debug 2- After converted to Lower case: yo i went jogging then yo i went jogging tuesday wednesday some punctuation  i and  
Debug 3- Collection Default Dictionary: defaultdict(<type 'int'>, {})
Debug 4- Word and its count: defaultdict(<type 'int'>, {'and': 1, 'then': 1, 'yo': 2, 'i': 3, 'tuesday': 1, 'punctuation': 1, 'some': 1, 'wednesday': 1, 'jogging': 2, 'went': 2})
Debug 5- Final Hapax words: ['and', 'then', 'tuesday', 'punctuation', 'some', 'wednesday']
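
As for why the attempt in the question skips words: it removes items from `l` while iterating over it, so the loop steps over whichever element slides into the removed slot, and it tests `x` against one concatenated string, so `joggin` is found inside `jogging`. Below is a minimal sketch of the same idea without those two problems, assuming Python 3 and no punctuation or case handling. The name `hapax_simple` is hypothetical.

```python
def hapax_simple(stringz):
    words = stringz.split()
    result = []
    for i, x in enumerate(words):
        # Build the list of all other words without mutating `words`.
        others = words[:i] + words[i + 1:]
        # Compare whole words, not substrings of a joined string.
        if x not in others:
            result.append(x)
    return result


for word in hapax_simple('yo i went jogging then yo i went joggin tuesday wednesday'):
    print(word)
```

With the question's input this yields `jogging`, `then`, `joggin`, `tuesday` and `wednesday`, since `joggin` and `jogging` are different words. The approach is quadratic in the number of words, so the dictionary-based counting is preferable for long texts.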