I want to edit my text like this:
arr = []
# arr is full of tokenized words from my text:
# "Abraham Lincoln Hotel is very beautiful place and i want to go there with
# Barbara Palvin. Also there are stores like Adidas ,Nike , Reebok."
arr2 = []
# arr2 should hold "Abraham Lincoln Hotel" and "Barbara Palvin"
for i in range(len(arr) - 2):
    if arr[i].istitle() and arr[i].isalpha():
        arr2.append(arr[i] + " " + arr[i + 1] + " " + arr[i + 2])
        # Abraham Lincoln Hotel
Is this what you are asking?
sentence = "Abraham Lincoln Hotel is very beautiful place and i want to go there with Barbara Palvin. Also there are stores like Adidas ,Nike , Reebok."
chars = ".!?,"  # Characters you want to remove from the words in the array
table = str.maketrans(chars, " " * len(chars))  # Create a table for replacing characters
sentence = sentence.translate(table)  # Replace characters with spaces
arr = sentence.split()  # Split the string into an array wherever a space occurs
print(arr)
The output is:
['Abraham', 'Lincoln', 'Hotel', 'is', 'very', 'beautiful', 'place', 'and', 'i', 'want', 'to', 'go', 'there', 'with', 'Barbara', 'Palvin', 'Also', 'there', 'are', 'stores', 'like', 'Adidas', 'Nike', 'Reebok']
Note about this code: any character in the chars variable is replaced with a space, so it will not appear in the strings in the array. The explanation is in the code comments.
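As a minimal sketch, the same replace-then-split step can be wrapped in a small helper so the set of stripped characters is easy to change. The function name `tokenize` and its `chars` parameter are made up for this illustration; they are not part of the code above.

```python
def tokenize(sentence, chars=".!?,"):
    """Replace every character in `chars` with a space, then split on whitespace."""
    # str.maketrans maps each character in `chars` to a space
    table = str.maketrans(chars, " " * len(chars))
    return sentence.translate(table).split()


print(tokenize("Adidas ,Nike , Reebok."))       # ['Adidas', 'Nike', 'Reebok']
print(tokenize("Adidas ,Nike , Reebok.", ","))  # ['Adidas', 'Nike', 'Reebok.']
```

Passing a shorter `chars` string, as in the second call, keeps the sentence-ending punctuation attached to the words.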
To remove the non-names, just do this:
import string
new_arr = []
for i in arr:
    if i[0] in string.ascii_uppercase:  # Keep words whose first letter is a capital
        new_arr.append(i)
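This code will include ALL words that start with a capital letter, including ordinary words that begin a sentence, such as "Also".
To fix that, keep the sentence-ending punctuation in the words by changing chars to:
chars = ","
And change the above code to: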
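import string
new_arr = []
end = ".!?"
for b, i in enumerate(arr):
    # A capitalized word is skipped when the previous word ended a sentence
    starts_sentence = b > 0 and arr[b - 1][-1] in end
    if i[0] in string.ascii_uppercase and not starts_sentence:
        new_arr.append(i)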
And that will output:
['Abraham', 'Lincoln', 'Hotel', 'Barbara', 'Palvin.', 'Adidas', 'Nike', 'Reebok.']
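If the goal is to end up with whole names such as "Abraham Lincoln Hotel" rather than single words, one possible approach is to join runs of neighbouring capitalized words. The sketch below is an assumption about what is wanted, not a definitive solution: the helper name `group_names` and the constant `SENTENCE_END` are hypothetical, only ASCII letters are treated as words, and any punctuation mark or lowercase word closes the current name. It reuses the heuristic from above, so a capitalized word right after ".", "!" or "?" is skipped, which means a name that opens a later sentence would lose its first word.

```python
import re

SENTENCE_END = {".", "!", "?"}


def group_names(sentence):
    """Join runs of neighbouring capitalized words into single names."""
    # Words and punctuation marks become separate tokens, e.g. 'Palvin', '.'
    tokens = re.findall(r"[A-Za-z]+|[.!?,]", sentence)
    names, current, prev = [], [], None
    for tok in tokens:
        # A capitalized word extends the run unless it directly follows . ! ?
        if tok[0].isupper() and prev not in SENTENCE_END:
            current.append(tok)
        elif current:
            # Any other token (lowercase word or punctuation) closes the run
            names.append(" ".join(current))
            current = []
        prev = tok
    if current:
        names.append(" ".join(current))
    return names


sentence = ("Abraham Lincoln Hotel is very beautiful place and i want to go "
            "there with Barbara Palvin. Also there are stores like "
            "Adidas ,Nike , Reebok.")
print(group_names(sentence))
# ['Abraham Lincoln Hotel', 'Barbara Palvin', 'Adidas', 'Nike', 'Reebok']
```

Because punctuation is tokenized separately, the trailing periods seen in 'Palvin.' and 'Reebok.' do not end up in the result.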