Arda Nalbant - 7 months ago
Python Question

Python - Group Sequential Array Members

I want to edit my text like this:

arr = []
# arr is full of tokenized words from my text


For example:

"Abraham Lincoln Hotel is very beautiful place and i want to go there with
Barbara Palvin. Also there are stores like Adidas ,Nike , Reebok."


I want my arr2 to look like:

arr2[0] = "Abraham Lincoln Hotel"
arr2[1] = "Barbara Palvin"
arr2[2] = "Adidas"
arr2[3] = "Nike"
arr2[4] = "Reebok"


Edit: Basically I want to detect proper names by using istitle() and isalpha() in a for statement like:

for word in arr:
    if word.istitle() and word.isalpha():
        arr2.append(word)  # word looks like a proper name


In the example, consecutive words are appended to the same arr2 entry until the next word does not start with an uppercase letter:

arr2[0] = arr[0] + " " + arr[1] + " " + arr[2]
# "Abraham Lincoln Hotel"

Answer

Is this what you are asking?

sentence = "Abraham Lincoln Hotel is very beautiful place and i want to go there with Barbara Palvin. Also there are stores like Adidas ,Nike , Reebok."

chars = ".!?,"                                   # Characters you want to remove from the words in the array

table = str.maketrans(chars, " " * len(chars))   # Create a table for replacing characters
sentence = sentence.translate(table)             # Replace characters with spaces

arr = sentence.split()                           # Split the string into an array wherever whitespace occurs

print(arr)

The output is:

['Abraham',
 'Lincoln',
 'Hotel',
 'is',
 'very',
 'beautiful',
 'place',
 'and',
 'i',
 'want',
 'to',
 'go',
 'there',
 'with',
 'Barbara',
 'Palvin',
 'Also',
 'there',
 'are',
 'stores',
 'like',
 'Adidas',
 'Nike',
 'Reebok']

Note about this code: any character listed in the chars variable is replaced by a space before splitting, so it never appears in the strings in the array. The comments in the code explain each step.
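As a small illustration of the replacement step, here is a minimal sketch that wraps it in a helper. The function name tokenize and its chars parameter are hypothetical, chosen only for this example:

```python
def tokenize(text, chars=".!?,"):
    """Replace every character in `chars` with a space, then split on whitespace."""
    # Map each unwanted character to a single space.
    table = str.maketrans(chars, " " * len(chars))
    # split() with no arguments collapses runs of whitespace,
    # so the extra spaces do not produce empty strings.
    return text.translate(table).split()


tokens = tokenize("Adidas ,Nike , Reebok.")
print(tokens)  # ['Adidas', 'Nike', 'Reebok']
```

Because the punctuation is turned into spaces rather than deleted, words separated only by a comma (such as "Adidas ,Nike") are still split apart.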

To remove the non-names just do this:

import string
new_arr = []

for i in arr:
    if i[0] in string.ascii_uppercase:
        new_arr.append(i)

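This code will include ALL words that start with a capital letter, including words such as "Also" that are capitalized only because they begin a sentence.

To fix that, keep the sentence-ending punctuation in the words by changing chars to: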

chars = ","

And change the above code to:

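import string
new_arr = []
end = ".!?"

for index, word in enumerate(arr):
    # A word starts a sentence if the previous word ends with ".", "!" or "?".
    # The very first word has no previous word and is kept.
    starts_sentence = index > 0 and arr[index - 1][-1] in end
    if word[0] in string.ascii_uppercase and not starts_sentence:
        new_arr.append(word)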

And that will output:

['Abraham', 
'Lincoln', 
'Hotel', 
'Barbara', 
'Palvin.', 
'Adidas', 
'Nike',
'Reebok.']
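
The output above still carries trailing periods and lists every word separately, while the question asks for consecutive names to be joined ("Abraham Lincoln Hotel"). A minimal sketch of one way to do that grouping follows. It is not part of the original answer: the function name group_proper_names is hypothetical, it tokenizes with a regular expression instead of translate(), and it assumes that a comma or a sentence end always closes a name and that a capitalized word right after ".", "!" or "?" is not a name.

```python
import re


def group_proper_names(text, sentence_end=".!?"):
    """Join runs of consecutive capitalized words into single names."""
    # Words and punctuation marks become separate tokens,
    # e.g. "Adidas ,Nike" -> ["Adidas", ",", "Nike"].
    tokens = re.findall(r"[A-Za-z]+|[.!?,]", text)

    names = []      # finished names
    current = []    # words of the name being built
    previous = None
    for token in tokens:
        # A word directly after sentence-ending punctuation is assumed
        # to be capitalized only because it starts a sentence.
        starts_sentence = previous is not None and previous in sentence_end
        if token.isalpha() and token.istitle() and not starts_sentence:
            current.append(token)
        elif current:
            # Any other token (lowercase word, punctuation) closes the name.
            names.append(" ".join(current))
            current = []
        previous = token
    if current:
        names.append(" ".join(current))
    return names


sentence = ("Abraham Lincoln Hotel is very beautiful place and i want to go "
            "there with Barbara Palvin. Also there are stores like "
            "Adidas ,Nike , Reebok.")
print(group_proper_names(sentence))
# ['Abraham Lincoln Hotel', 'Barbara Palvin', 'Adidas', 'Nike', 'Reebok']
```

This heuristic has limits: a real name that begins any sentence other than the first is skipped, and istitle() rejects mixed-case names such as "McDonald".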