Ashish Dahiya - 2 months ago
Python Question

How to open and read a file in Instabase?

I am trying to read a file in an Instabase notebook (Python).

I tried the standard Python way, but it says the file is not found:

f = open('/instabase/demo/fs/Instabase%20Drive/files/datasets/hadoop.log')
data = f.read()
print data

Answer

Instabase provides ib.open() for opening files mounted on Instabase.

For example:

[1] To read an entire file

# open a file (this would work for files < 10 MB)
f = ib.open('/instabase/demo/fs/Instabase%20Drive/files/datasets/hadoop.log')
data = f.read() # reads the entire content
print(data)
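
The csv example later in this answer uses the handle in a with statement, so the whole-file read can be wrapped the same way to make sure the handle is closed. The following is a minimal sketch, not Instabase code: read_whole_file is a hypothetical helper name, and the built-in open stands in for ib.open so the snippet runs outside a notebook. It assumes ib.open returns a file-like context manager.

```python
import os
import tempfile


def read_whole_file(path, opener=open):
    """Read an entire file through `opener` and close the handle afterwards.

    `opener` is any callable returning a file-like context manager.
    In an Instabase notebook you would pass ib.open (assumption: it
    behaves like the built-in open for this purpose).
    """
    with opener(path) as handle:
        return handle.read()


# Local demonstration: the built-in open stands in for ib.open.
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
    tmp.write("INFO start\nWARN disk\n")
    demo_path = tmp.name

demo_text = read_whole_file(demo_path)
os.remove(demo_path)
print(demo_text)
```

In a notebook the call would presumably look like read_whole_file(path, opener=ib.open).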

[2] To read a large file (chunked reading is supported)

# open a file
f = ib.open('/instabase/demo/fs/Instabase%20Drive/files/datasets/hadoop.log')
while True:
    data = f.read(1024)  # reads at most 1 KB
    if not data:  # assuming the standard file interface: an empty read means end of file
        break
    print("==== chunk begin =====")
    print(data)
    print("==== chunk end =====")
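
The chunked read above can also be wrapped in a generator, which keeps the loop logic in one place. This is a sketch under one assumption: that the handle's read(n) returns an empty value at end of file, as built-in Python file objects do. That has not been checked against Instabase here. iter_chunks is a hypothetical helper name, and io.BytesIO stands in for the handle returned by ib.open.

```python
import io


def iter_chunks(handle, chunk_size=1024):
    """Yield successive chunks from a file-like object until it is exhausted."""
    while True:
        chunk = handle.read(chunk_size)
        if not chunk:  # empty bytes/str signals end of file
            break
        yield chunk


# io.BytesIO stands in for the handle returned by ib.open(...).
demo_handle = io.BytesIO(b"abcdefghij")
chunks = list(iter_chunks(demo_handle, chunk_size=4))
print(chunks)
```

Only one chunk is held in memory at a time, so the same loop should suit files too large to read in one call.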

[3] The Instabase file handle implements the standard Python file interface, so it can also be used with other libraries such as csv, pandas, and scipy.

Example 1: with csv

import csv
with ib.open('instabase/demo/fs/My%20S3/datasets/csv/invoice_details.csv') as file:
    reader = csv.reader(file)
    for row in reader:
        print(', '.join(row))

Example 2: with pandas

import pandas as pd
with ib.open('instabase/demo/fs/My%20S3/datasets/csv/invoice_details.csv') as file:
    data = pd.read_csv(file)
    print(data)

Example 3: with scipy

import scipy.io  # importing scipy alone does not guarantee scipy.io is loaded
with ib.open('path/ocr.mat') as f:
    ocr_data = scipy.io.loadmat(f)
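
Because standard Python file objects can be iterated line by line, a log file can in principle be processed without loading it whole. The sketch below assumes the Instabase handle supports that iteration, which the answer implies but does not show. count_first_tokens is a hypothetical helper, the log lines are made up for illustration, and io.StringIO stands in for the handle returned by ib.open.

```python
import io
from collections import Counter


def count_first_tokens(handle):
    """Count the first whitespace-separated token of each non-empty line."""
    counts = Counter()
    for line in handle:  # file objects yield one line per iteration
        tokens = line.split()
        if tokens:
            counts[tokens[0]] += 1
    return counts


# io.StringIO stands in for the handle returned by ib.open(...).
demo_log = io.StringIO("INFO start\nWARN disk\n\nINFO done\n")
level_counts = count_first_tokens(demo_log)
print(level_counts)
```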