supersonic_ht - 5 months ago
Python Question

How to input a line word by word in Python?

I have multiple files, each containing a single line of about 10M numbers. I want to check each file and print 0 for each file that contains a repeated number and 1 for each file that doesn't.

I am using a list for counting frequency. Because of the large amount of numbers per line I want to update the frequency after accepting each number and break as soon as I find a repeated number. While this is simple in C, I have no idea how to do this in Python.

How do I input a line in a word-by-word manner without storing (or taking as input) the whole line?

EDIT: I also need a way to do this from live input rather than a file.

Answer

Read the line, split it, and copy the resulting list into a set. If the size of the set is less than the size of the list, the file contains repeated elements.

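with open('filename', 'r') as f:
    for line in f:
        numbers = line.split()
        # A set drops duplicates, so equal sizes mean every number is unique
        print(1 if len(set(numbers)) == len(numbers) else 0)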

To read the file word by word, try this:

import itertools

def readWords(file_object):
    word = ""
    # iter() with a sentinel keeps calling read(1) until it returns "" at EOF (text mode)
    for ch in iter(lambda: file_object.read(1), ""):
        if ch.isspace():
            if word: # In case of multiple spaces
                yield word
                word = ""
            continue
        word += ch
    if word:
        yield word # Handles last word before EOF

Then you can do:

with open('filename', 'r') as f:
    seen = set()
    for num in (int(word) for word in readWords(f)):
        if num in seen:  # Already stored in the set, so this number is a repeat
            break
        seen.add(num)

This method should also work for streams, because it reads only one character at a time and yields one whitespace-delimited word at a time from the input stream.
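Putting the pieces together, here is a minimal sketch of the whole check with an early exit. The names read_words and all_unique are made up for this illustration, and io.StringIO stands in for a real file or stream. It assumes the input is opened in text mode and contains only whitespace-separated integers.

```python
import io


def read_words(stream):
    """Yield whitespace-separated words, reading one character at a time."""
    word = ""
    # read(1) returns "" at end of input, which stops the iteration
    for ch in iter(lambda: stream.read(1), ""):
        if ch.isspace():
            if word:  # skip runs of consecutive whitespace
                yield word
                word = ""
        else:
            word += ch
    if word:
        yield word  # last word before end of input


def all_unique(stream):
    """Return 0 as soon as a number repeats, or 1 if none does."""
    seen = set()
    for word in read_words(stream):
        num = int(word)
        if num in seen:
            return 0  # stop here; the rest of the input is never read
        seen.add(num)
    return 1


print(all_unique(io.StringIO("3 1 4 1 5 9")))  # prints 0 (1 repeats)
print(all_unique(io.StringIO("2 7 18 28\n")))  # prints 1
```

For live input, passing sys.stdin instead of an open file should work the same way, for example print(all_unique(sys.stdin)). On a terminal the input typically only arrives after each Enter, so the check reacts line by line rather than keystroke by keystroke.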


After giving this answer, I've updated this method quite a bit. Have a look here:

https://gist.github.com/smac89/bddb27d975c59a5f053256c893630cdc
