Minary Minary - 4 months ago 28
Python Question

Extract lines from (oddly delimited) txt file and write to new file with Python

I have .txt files (one per image) that are formatted as shown below. However, the delimiter used in the files is very strange, and I cannot figure out how to extract the information I am interested in.

ExifTool Version Number : 10.20
File Name : R0010023.tiff
Directory : C:/gtag/wf1313
File Size : 46 MB
File Modification Date/Time : 2016:07:07 20:57:38+01:00
File Access Date/Time : 2016:07:07 20:57:38+01:00
File Creation Date/Time : 2016:07:04 21:18:17+01:00
File Permissions : rw-rw-rw-
File Type : TIFF
File Type Extension : tif
MIME Type : image/tiff
Exif Byte Order : Little-endian (Intel, II)
Image Width : 4928
Image Height : 3264
Bits Per Sample : 8 8 8
Compression : PackBits
Photometric Interpretation : RGB
Image Description :
Make : RICOH IMAGING COMPANY, LTD.
Camera Model Name : GR II
Strip Offsets : (Binary data 558 bytes, use -b option to extract)
Orientation : Horizontal (normal)
Samples Per Pixel : 3
Rows Per Strip : 51
Strip Byte Counts : (Binary data 447 bytes, use -b option to extract)
X Resolution : 72
Y Resolution : 72
Planar Configuration : Chunky
Resolution Unit : inches
Software : GR Firmware Ver 01.02
Modify Date : 2016:06:21 13:09:52
XMP Toolkit : Image::ExifTool 10.20
Compressed Bits Per Pixel : 3.2
Flash Fired : False
Flash Function : False
Flash Red Eye Mode : False
Flash Return : No return detection
Interoperability Index : R98 - DCF basic file (sRGB)
Y Cb Cr Positioning : Centered
Y Cb Cr Sub Sampling : YCbCr4:2:0 (2 2)
Copyright :
Exposure Time : 1/1250
F Number : 6.3
ISO : 100
Sensitivity Type : Standard Output Sensitivity
Exif Version : 0230
Date/Time Original : 2016:06:21 13:09:52
Create Date : 2016:06:21 13:09:52
Components Configuration : Y, Cb, Cr, -
Aperture Value : 6.3
Brightness Value : 8.6
Exposure Compensation : 0
Max Aperture Value : 2.8
Metering Mode : Multi-segment
Light Source : Shade
Maker Note Type : Rdc
Firmware Version : 1.02
Recording Format : JPEG
Exposure Program : Manual
Drive Mode : Single-frame
White Balance : Shade
White Balance Fine Tune : 0 0
Focus Mode : Manual
Auto Bracketing : Off
Macro Mode : Off
Flash Mode : Off
Flash Exposure Comp : 0
Manual Flash Output : Full
Full Press Snap : Off
Dynamic Range Expansion : Off
Noise Reduction : Weak
Image Effects : Standard
Vignetting : Off
Toning Effect : Off
Hue Adjust : Off
Focal Length : 18.3 mm
AF Area X Position 1 : 632
AF Area Y Position 1 : 418
AF Area X Position : 2435
AF Area Y Position : 1610
AF Status : In Focus
AF Area Mode : Auto
Sensor Width : 4928
Sensor Height : 3264
Cropped Image Width : 4928
Cropped Image Height : 3264
Wide Adapter : Not Attached
Color Temp Kelvin : 0
Crop Mode 35mm : Off
ND Filter : Off
WB Bracket Shot Number : 0
User Comment :
Flashpix Version : 0100
Color Space : sRGB
Exif Image Width : 4928
Exif Image Height : 3264
Exposure Mode : Manual
Focal Length In 35mm Format : 28 mm
Scene Capture Type : Standard
Contrast : Normal
Saturation : Normal
Sharpness : Normal
Owner Name :
Serial Number : (00000000)14100511
Lens Info : 18.3mm f/2.8
Lens Make : RICOH IMAGING COMPANY, LTD.
Lens Model : GR LENS
GPS Version ID : 2.3.0.0
GPS Latitude Ref : xxxx
GPS Longitude Ref : xxxx
GPS Altitude Ref : Above Sea Level
GPS Time Stamp : 12:09:52
GPS Img Direction Ref : True North
GPS Img Direction : 228.21
GPS Date Stamp : 2016:06:21
GPS Pitch : 0.79
GPS Roll : 0.41
PrintIM Version : 0300
Aperture : 6.3
Flash : Off, Did not fire
GPS Altitude : 91.7 m Above Sea Level
GPS Date/Time : 2016:06:21 12:09:52Z
GPS Latitude : xx deg xx' x.xx" N
GPS Longitude : x deg x' xx.xx" W
GPS Position : xx deg xx' x.xx" N, x deg x' xx.xx" W
Image Size : 4928x3264
Megapixels : 16.1
Scale Factor To 35 mm Equivalent: 1.5
Shutter Speed : 1/1250
Circle Of Confusion : 0.020 mm
Field Of View : 65.5 deg
Focal Length : 18.3 mm (35 mm equivalent: 28.0 mm)
Hyperfocal Distance : 2.71 m
Light Value : 15.6


If I try the following calls, this is what is returned:

sfile = open("R001.txt", "r")
sfile.readline(1)


'E'

sfile.readline(2)


'xi'

sfile.readline(3)


'fTo'

sfile.readline(4)


'ol V'

sfile.readline(5)


'ersio'

And so on and so forth. Could anyone enlighten me as to how to deal with a file of this type?

What I am interested in is extracting several lines,

File Name, GPS Longitude, GPS Latitude, etcetera.

I would be very grateful for any help.

Regards
Joel

EDIT/UPDATE

Thank you so much for the comments! I am really grateful!

I have the following now,

import glob
file_list = glob.glob("*.txt")

for file_ in file_list:
    saved_lines = []
    sfile = open(file_, "r")
    lines = sfile.readlines() #array of all lines
    for line in lines:
        for text in ['File Name', 'GPS Longitude', 'GPS Latitude', 'GPS Altitude', 'GPS Img Direction', 'GPS Pitch', 'GPS Roll']:
            if text in line:
                saved_lines.append(line)
    parsed = "".join(saved_lines) #reassemble the file
    with open("parsed.txt", "a") as ofile: #write your output
        ofile.write(parsed)

dict={}
sfile = open("R0010022.txt", "r")
list = sfile.readlines()
for i in list:
    dict[i.split(':')[0]] = ''.join(i.split(':')[1:])


The challenge I am facing now is that I need to format the data in the following format (to be able to import it into a program I would like to use),

"#image latitude longitude altitude yaw pitch roll"
"R001.JPG xx.xxxx y.yyyy zzz.zz 319.9 8.2 -2.1"
"R002.JPG xx.xxxx y.yyyy zzz.zz 319.4 10.1 3.6"


So one line per image with the data above.

Creating a dictionary as above is a good first step (I think). The dictionary is difficult to call, though, as each member of the dictionary has a different number of spaces after the member name. That is, File Name-----------------------:... etcetera.

Is there a way to look up a member, excluding the spaces?

If I can do that it should be possible to group each image, and then write each group to separate lines in a .csv or .txt file.

Answer

When you call readline(1) you receive 1 character of the first line. When you then call readline(2) you receive the next 2 characters of the first line, and so on. Once a line is used up, the next call continues on the following line.

Call readline() with no arguments and you'll get the whole line.
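A small sketch of that difference, using io.StringIO as a stand-in for the opened file so it runs without R001.txt on disk (the two sample lines are copied from the dump in the question):

```python
import io

# Stand-in for open("R001.txt", "r"); StringIO behaves like a text file.
sample = u"ExifTool Version Number : 10.20\nFile Name : R0010023.tiff\n"
sfile = io.StringIO(sample)

first = sfile.readline(1)     # at most 1 character of the current line
second = sfile.readline(2)    # the next 2 characters of the same line
rest = sfile.readline()       # no argument: the remainder of the line, with '\n'
next_line = sfile.readline()  # the whole second line

print(repr(first), repr(second), repr(rest), repr(next_line))
```

The number passed to readline() is a limit on how many characters to read, not a line number, which is why the calls in the question return ever longer fragments.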

If you want several lines you could use readlines(), which returns a list of strings containing all lines in the text file. You can then pick out the lines you need as you would from any other list.
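As a rough sketch of how that could look for the fields in the question: split each line at the first colon and strip the padding, so the field name can be compared exactly. The helper name pick_fields and the WANTED tuple are made up for this example, and the sample text reuses the placeholder values from the question.

```python
import io

# Hypothetical selection of fields; adjust to the ones you need.
WANTED = ("File Name", "GPS Latitude", "GPS Longitude")

def pick_fields(lines, wanted=WANTED):
    """Return {field name: value} for the wanted fields only."""
    found = {}
    for line in lines:
        # partition() splits at the first ':' only, so values that contain
        # colons themselves (timestamps, for example) stay intact.
        key, sep, value = line.partition(":")
        key = key.strip()  # drop the padding between the name and the ':'
        if sep and key in wanted:
            found[key] = value.strip()
    return found

# Stand-in for open("R0010023.txt", "r"), with padded names as in the real file.
sample = io.StringIO(
    u"File Name                       : R0010023.tiff\n"
    u"GPS Latitude Ref                : xxxx\n"
    u"GPS Time Stamp                  : 12:09:52\n"
    u"GPS Latitude                    : xx deg xx' x.xx\" N\n"
)
lines = sample.readlines()
fields = pick_fields(lines)
print(fields)
```

Because the stripped name is compared exactly, "GPS Latitude Ref" is not mistaken for "GPS Latitude", which a plain substring test would do.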

For more information, read the Python docs.
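For the one-line-per-image format asked for in the update, something along these lines might work once the fields are in a dictionary with stripped names. This is only a sketch under several assumptions: the helper names dms_to_decimal and format_row are invented here, yaw is assumed to correspond to GPS Img Direction, and the latitude and longitude values are made up for illustration because the question masks the real ones.

```python
import re

def dms_to_decimal(text):
    """Convert text like 51 deg 30' 36.00" N to signed decimal degrees."""
    match = re.match(r"""(\d+) deg (\d+)' ([\d.]+)" ([NSEW])""", text)
    if match is None:
        raise ValueError("unrecognised coordinate: %r" % (text,))
    degrees, minutes, seconds, ref = match.groups()
    value = int(degrees) + int(minutes) / 60.0 + float(seconds) / 3600.0
    # South and West are negative by the usual convention.
    return -value if ref in "SW" else value

def format_row(fields):
    """Build one 'image latitude longitude altitude yaw pitch roll' line."""
    altitude = fields["GPS Altitude"].split()[0]  # keep the number, drop the unit text
    return " ".join([
        fields["File Name"],
        "%.4f" % dms_to_decimal(fields["GPS Latitude"]),
        "%.4f" % dms_to_decimal(fields["GPS Longitude"]),
        altitude,
        fields["GPS Img Direction"],  # assumed to be the yaw
        fields["GPS Pitch"],
        fields["GPS Roll"],
    ])

fields = {
    "File Name": "R0010023.tiff",
    "GPS Latitude": "51 deg 30' 36.00\" N",  # made-up value
    "GPS Longitude": "0 deg 7' 30.00\" W",   # made-up value
    "GPS Altitude": "91.7 m Above Sea Level",
    "GPS Img Direction": "228.21",
    "GPS Pitch": "0.79",
    "GPS Roll": "0.41",
}
row = format_row(fields)
print(row)
```

Calling format_row() once per input file and writing each result followed by a newline would give one line per image.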