k-five k-five - 20 days ago 4x
Perl Question

How to number each unique word only once, without losing lines and paragraphs?

First of all, I am learning both English and programming, and I am still a beginner. I tried many methods for learning English (self-study), and in the end I made up my own method.

I collected a lot of short stories and I read them day after day. This is the method I still use.

A week ago I started to learn Perl one-liners, and they have been very useful to me.

Then I came across this one-liner:

perl -pe '$q=0; s/(\w+)/++$q.".".$1/ge'

for content:

just another

perl hacker

hacking perl code

it becomes:

1.just 2.another

1.perl 2.hacker

1.hacking 2.perl 3.code

Okay, after I saw this Perl one-liner I got an idea.

For example, I read short stories like these:

  1. Morning

    He wakes up. He sees the sun rise. He brushes his teeth. His teeth are white. He puts on his clothes. His shirt is blue. His
    shoes are yellow. His pants are brown. He goes downstairs. He gets a
    bowl. He pours milk and cereal. He eats. He gets the newspaper. He

  2. First Day of School

    He goes to class. There is an empty seat in front. He sits in the seat. He looks around. There are different
    people. He says "hi" to the girl next to him. She smiles. The teacher
    comes in. She closes the door. Everyone is silent. The first day of
    school begins.

  3. Water on the Floor

    She is thirsty. She gets a glass of water. She begins to walk. She drops the glass. There is water on the floor. The
    puddle is big. She gets a mop. She wipes the water off. The floor is
    clean. She gets another glass of water. She drinks it. She is happy.

  4. Babysitting

    Casey wants a new car. She needs money. She decides to babysit. She takes care of the child. She feeds him lunch. She reads
    him a story. The story is funny. The child laughs. Casey likes him.
    The child's mom comes home. The child kisses Casey. Casey leaves. She
    will babysit him again.

  5. A Doctor

    Sam is a doctor. He takes care of people. He smiles at them. He gives them medicine. He gives stickers to the younger
    patients. The younger patients like him. They see him when they are
    sick. He makes them feel better. This makes him happy. He loves his
    job. He goes home proud.

As you can see, these stories are very easy. But they are not easy for someone who has just started to learn English.

So my idea is this. I want a script, maybe in bash or Perl (I think Perl is better), that can read the many short stories I have and, for each unique word, number the word in place.

For example, for the text above I want something like this:

1.He 2.wakes 3.up. He 4.sees 5.the 6.sun 7.rise. He 8.brushes 9.his 10.teeth 11.are 12.white He 13.puts 14.on his 15.clothes. ... so on.

Here the first "He" has not appeared before, so it gets number 1.

Every later "He" is ignored until the end of the content.
The script then does the same for the second word: if it has not appeared before, it gets a number; otherwise it is ignored.

Also, the paragraphs and line breaks must be kept, because I print the result on paper and read it daily.

To make this idea usable by everyone else, I need a database of every word the script has parsed, so that after, for example, 100 short stories I can see which words I have read.

I would also use this database to skip the already-seen words in each new short story that I want to read.

Why am I doing this? Because it helps me to know which words I have read and which words I have not. It could also be a good method for others, so that they can learn English more easily. Please help me to develop this idea. If you see something bad in it, or if you know of a similar idea that has already been done, please tell me.

In summary, I want the content with each word numbered only once (the first time it appears).

Sorry guys, but I want to print the content without losing the paragraphs. Please see the photo:

my homework

As you can see, I have to mark the new words in each new short story by hand so that I can study them later. The script must print the paragraphs as they are, with the new words numbered, so that I can save the result and then print it for reading on paper.

I want to use it in this form:
$ script my_context.txt > new_context.txt

Then I can print it out.

I am sorry for any mistakes in my writing. If you do not understand my idea, please comment and I will explain it in more detail.

Thanks a lot!


awk to the rescue!

$ awk -v RS=" +" -v ORS=" " '{key=$0;gsub(/[^A-Za-z]/,"",key); 
                              if(key in a)print $0;
                              else{a[key];print ++c"."$0}}' file

1.He 2.wakes 3.up. He 4.sees 5.the 6.sun 7.rise. He 8.brushes 9.his 10.teeth 11.are 12.white He 13.puts 14.on his 15.clothes. 16.His 17.shirt 18.is 19.blue. His 20.shoes are 21.yellow. His 22.pants are 23.brown. He 24.goes 25.downstairs. He 26.gets 27.a 28.bowl. He 29.pours 30.some 31.milk 32.and 33.cereal. He 34.eats. He gets the 35.newspaper. He 36.reads.
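
If the line breaks and blank lines have to survive exactly as they are, one option is to let awk read the text line by line and walk through each line one word at a time. This is only a sketch under a few assumptions: the function name number_new_words is made up for illustration, a "word" is a run of ASCII letters (so "child's" is split at the apostrophe), and matching is still case-sensitive.

```shell
# Number the first occurrence of each word, keeping every line and
# blank line exactly where it was. Reads the text on standard input.
number_new_words() {
  awk '{
    line = $0; out = ""
    # Walk the line one alphabetic run at a time.
    while (match(line, /[A-Za-z]+/)) {
      word = substr(line, RSTART, RLENGTH)
      out = out substr(line, 1, RSTART - 1)   # spaces and punctuation before the word
      if (!(word in seen)) {                  # first time this word appears
        seen[word]
        out = out (++count) "."
      }
      out = out word
      line = substr(line, RSTART + RLENGTH)   # continue after the word
    }
    print out line                            # leftover text; print adds the newline
  }'
}

# Small demo: two lines separated by a blank line.
printf 'He wakes up. He sees\n\nthe sun. He wakes.\n' | number_new_words
```

The demo prints "1.He 2.wakes 3.up. He 4.sees", then an empty line, then "5.the 6.sun. He wakes.", so the paragraph layout is unchanged. To use it as in the question, put the function in a script and run it as: number_new_words < my_context.txt > new_context.txt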

You can also make it case-insensitive by normalizing the key, for example with tolower(key), in the same way that I strip the non-alphabetic characters from it.
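
For the word database mentioned in the question, a plain text file with one word per line may be enough. The following is a rough sketch and not a finished tool. It assumes a made-up function name number_with_db, a word list file whose path you pass as the first argument, lowercase keys so that "He" and "he" count as the same word, and numbering that restarts at 1 for every story.

```shell
# Usage: number_with_db WORDS_FILE < story.txt > numbered_story.txt
# WORDS_FILE is a hypothetical plain-text list, one lowercase word per line.
number_with_db() {
  touch "$1"                                   # make sure the list exists
  awk -v db="$1" '
    BEGIN {
      # Load every word that earlier stories already introduced.
      while ((getline w < db) > 0) seen[w]
      close(db)
    }
    {
      line = $0; out = ""
      while (match(line, /[A-Za-z]+/)) {
        word = substr(line, RSTART, RLENGTH)
        key = tolower(word)                    # case-insensitive lookup key
        out = out substr(line, 1, RSTART - 1)
        if (!(key in seen)) {
          seen[key]
          out = out (++count) "."
          print key >> db                      # remember it for the next story
        }
        out = out word
        line = substr(line, RSTART + RLENGTH)
      }
      print out line
    }'
}

# Demo with a throwaway word list shared by two "stories".
words=$(mktemp)
printf 'He wakes up.\n' | number_with_db "$words"
printf 'She wakes up. He sleeps.\n' | number_with_db "$words"
rm -f "$words"
```

The first call prints "1.He 2.wakes 3.up." and the second prints "1.She wakes up. He 2.sleeps.", because "wakes", "up" and "he" are already in the list by then. Keeping the list as a text file also means you can open it at any time to see which words you have met so far.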