neoasr - 3 months ago
PowerShell Question

Need to extract sets of 4-digit numbers from epub or text files in batch

I have hundreds of epub files. I need to extract dates from the text (years only, such as 1947 or 1987) together with the file names.
I mean the output should say which file contains which years,
for example: epub01 contains 1995 1945 1986.
epub02 contains 1926 1946 1948.
If anyone can provide a PowerShell script, or a script that runs in an Ubuntu terminal, that would be great.

I have epub files, but I can extract them to text files myself, so a script for text files is fine.

Answer

I can only provide a script for text files. You can read them with the Get-Content cmdlet and use a regex to grab the values. The regex contains a negative lookbehind and a negative lookahead to ensure each match is exactly four digits:

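# Read the whole file as a single string
$content = Get-Content 'your_file' -Raw
# Avoid the name $matches: it is an automatic variable that the -match operator overwrites
$yearMatches = [regex]::Matches($content, '(?<!\d)(\d{4})(?!\d)')
$yearMatches | ForEach-Object {
    # Group 1 holds the four digits
    $_.Groups[1].Value
}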

Regex explanation:

(?<!\d)(\d{4})(?!\d)

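(?<!\d) is a negative lookbehind: the match must not be preceded by a digit.
(\d{4}) captures exactly four digits in group 1.
(?!\d) is a negative lookahead: the match must not be followed by a digit.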
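
The question asks for one line per file across many files, so here is a rough sketch of how the same regex could be applied to a whole folder of text files. The function names Get-FourDigitNumbers and Get-YearReport, the .txt extension, and the folder passed in are assumptions for illustration and do not come from the original answer. Repeated years are collapsed so that each is listed once per file.

```powershell
# Hypothetical helper: returns the distinct 4-digit numbers found in one text file.
function Get-FourDigitNumbers {
    param([Parameter(Mandatory)][string]$Path)

    $content = Get-Content -LiteralPath $Path -Raw
    if (-not $content) { return }   # empty file: nothing to match

    [regex]::Matches($content, '(?<!\d)\d{4}(?!\d)') |
        ForEach-Object { $_.Value } |
        Select-Object -Unique       # keep first occurrence of each number
}

# Hypothetical helper: emits "<file name> contains <numbers>" for every .txt file in a folder.
function Get-YearReport {
    param([Parameter(Mandatory)][string]$Directory)

    Get-ChildItem -LiteralPath $Directory -Filter '*.txt' -File | ForEach-Object {
        $years = Get-FourDigitNumbers -Path $_.FullName
        '{0} contains {1}' -f $_.BaseName, ($years -join ' ')
    }
}

# Example call (the folder name is an assumption):
# Get-YearReport -Directory '.\texts'
```

Note that the regex matches any standalone 4-digit number, not only years, so page numbers or other values of that length would appear in the output as well.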
