Michael Perdue - 5 months ago
Python Question

Pandas: opening a specific .txt file stored in a zip file on an FTP site

I need to concatenate, into a single frame, every file named like

produkt_monat_Monatswerte_18910101_20110331_00003.txt

from each of the zip files on this FTP site.

This is the code that I am using so far:

import pandas as pd
from pandas.io.parsers import *
import glob
import requests
from zipfile import ZipFile
import urllib.request as ur


years = 'produkt_monat_Monatswerte_*.txt'

names = pd.DataFrame()
for year in years:
    path ="ftp://ftp-cdc.dwd.de/pub/CDC/observations_germany/climate/monthly/kl/historical/monatswerte_?????_????????_????????_hist.zip").read()
    frame = pd.read_csv(path, names=columns)

    frame['year'] = year
    names = names.concat(frame, ignore_index=True)


and it is giving me the following error:

File "<ipython-input-25-d57a1d77ecc6>", line 5
path ="ftp://ftp-cdc.dwd.de/pub/CDC/observations_germany/climate/monthly/kl/historical/monatswerte_?????_????????_????????_hist.zip")

Answer

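The problem is that pandas can't pick a particular inner file out of a zip archive that holds several files. You have to download the archive yourself, open it with zipfile, and hand the inner file to pandas. Try the following code: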

import pandas as pd
from ftplib import FTP
from zipfile import ZipFile
from io import BytesIO

# login to FTP
f_root = 'ftp-cdc.dwd.de'
zips_path = '/pub/CDC/observations_germany/climate/monthly/kl/historical/'
ftp = FTP(f_root)
ftp.login()
ftp.cwd(zips_path)

# get all zip paths
paths = [p[0] for p in ftp.mlsd('.') if p[0].endswith('.zip')]

# read one zip file into an in-memory buffer
buf = BytesIO()
ftp.retrbinary("RETR " + paths[0], buf.write)

# find the inner data file and read it into a DataFrame
z = ZipFile(buf)
inner = [name for name in z.namelist() if name.startswith('produkt')][0]
df = pd.read_csv(BytesIO(z.read(inner)), sep=';', encoding="cp1252")

You can then loop over all the zip files and combine the resulting frames using pd.concat.
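
Here is a rough sketch of that loop. It assumes each archive contains exactly one inner file whose name starts with produkt, and that ftp and paths are set up as in the code above. The helper names read_inner_csv, download and read_all, and the extra source_zip column, are made up for illustration and are not part of pandas or ftplib.

```python
from io import BytesIO
from zipfile import ZipFile

import pandas as pd


def read_inner_csv(zip_bytes, prefix="produkt"):
    """Read the first inner file whose name starts with `prefix` into a DataFrame."""
    with ZipFile(BytesIO(zip_bytes)) as z:
        inner = next(n for n in z.namelist() if n.startswith(prefix))
        return pd.read_csv(BytesIO(z.read(inner)), sep=";", encoding="cp1252")


def download(ftp, path):
    """Fetch one remote file into memory using an already logged-in FTP object."""
    buf = BytesIO()
    ftp.retrbinary("RETR " + path, buf.write)
    return buf.getvalue()


def read_all(ftp, paths):
    """Download every zip in `paths` and stack the inner files into one frame."""
    frames = []
    for path in paths:
        frame = read_inner_csv(download(ftp, path))
        frame["source_zip"] = path  # hypothetical column: remembers the origin
        frames.append(frame)
    # Concatenate once at the end rather than growing a frame inside the loop.
    return pd.concat(frames, ignore_index=True)
```

With the ftp object and paths list from above, this would be called as combined = read_all(ftp, paths). Collecting the frames in a list and calling pd.concat once is usually cheaper than concatenating on every iteration.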

By the way, I'm using Python 3, so you may need to modify some imports if you are using Python 2.

Cheers!
