I am supposed to find out whether a given file is a media file, not from its extension but from its header information. So I opened some
\00\00\00 ftypqt \00qt \00\00\00\00\00\00\00\00\00\00\00\00\00\00\00wide\00\CF\E1mdat\00\00\00wide\00\00\00\00mdat\00\00\00\00\00\00\00\00\E0\00\00\00\00\FF\A6\00\00\00\00\00\00 \00\00\00\008\00\00\82X\00\00\00@\80\00\87\F4N\CD
\F7\00\80\004\8D\00Z\A2\00\84p\00\9D\8F\00\B6\A5\00\CDt\00\DF\00\ED\8F\007\004\8C\00A\9D\00\00\00udta\00\00\00\00\00\00\00Wudta\00\00\00hinv7.6\00\00\00@hnti\00\00\008rtp sdp b=AS:265
a/A - z/Z and 0-9
can do paragraph-oriented operations.
Emacs Lisp is a good choice if you need sophisticated
string or pattern matching capabilities.
As you said, you are supposed to determine whether a given file is a media file without relying on its extension. I can offer you two possibilities:
Magic number in file
Have a look at the Wikipedia definition of a magic number for files:
Magic numbers are common in programs across many operating systems. Magic numbers implement strongly typed data and are a form of in-band signaling to the controlling program that reads the data type(s) at program run-time. Many files have such constants that identify the contained data. Detecting such constants in files is a simple and effective way of distinguishing between many file formats and can yield further run-time information.
To read it in Python:
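Here is a minimal sketch of that idea, not a complete detector. The signature table and the names `SIGNATURES`, `sniff_bytes` and `sniff_file` are my own choices for illustration, and the table only covers a handful of well-known formats. The `ftyp` entry at offset 4 is the marker visible at the start of your .mov dump.

```python
# Illustrative, incomplete signature table: (byte offset, magic bytes, label).
SIGNATURES = [
    (0, b"RIFF", "RIFF container (AVI or WAV)"),
    (0, b"OggS", "Ogg"),
    (0, b"ID3", "MP3 with ID3v2 tag"),
    (0, b"\x1a\x45\xdf\xa3", "Matroska/WebM"),
    (4, b"ftyp", "ISO base media (MP4/QuickTime)"),
    (0, b"\x89PNG\r\n\x1a\n", "PNG"),
    (0, b"\xff\xd8\xff", "JPEG"),
    (0, b"GIF8", "GIF"),
]


def sniff_bytes(header):
    """Return the label of the first matching signature, or None."""
    for offset, magic, label in SIGNATURES:
        if header[offset:offset + len(magic)] == magic:
            return label
    return None


def sniff_file(path, size=16):
    """Read the first `size` bytes of `path` in binary mode and sniff them."""
    with open(path, "rb") as fh:  # "rb": never decode the header as text
        return sniff_bytes(fh.read(size))
```

Calling `sniff_file("movie.mov")` (a hypothetical file name) would compare the first 16 bytes against the table. A return value of `None` only means the format is not in this small table, not that the file is not a media file.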
Use a tool that does this (it also reads the header):
Extract metadata from files like this:
hachoir-metadata extracts metadata from multimedia files: music, picture, video, but also archives. It supports most common file formats:
Archives: bzip2, gzip, zip, tar
Audio: MPEG audio ("MP3"), WAV, Sun/NeXT audio, Ogg/Vorbis (OGG), MIDI, AIFF, AIFC, Real audio (RA)
Image: BMP, CUR, EMF, ICO, GIF, JPEG, PCX, PNG, TGA, TIFF, WMF, XCF
Misc: Torrent
Program: EXE
Video: ASF format (WMV video), AVI, Matroska (MKV), Quicktime (MOV), Ogg/Theora, Real media (RM)
$ hachoir-metadata pacte_des_gnous.avi
Common:
- Duration: 4 min 25 sec
- Comment: Has audio/video index (248.9 KB)
- MIME type: video/x-msvideo
- Endian: Little endian
Video stream:
- Image width: 600
- Image height: 480
- Bits/pixel: 24
- Compression: DivX v4 (fourcc:"divx")
- Frame rate: 30.0
Audio stream:
- Channel: stereo
- Sample rate: 22.1 KHz
- Compression: MPEG Layer 3
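If you want to use that report from a script, one option is to parse the text and look at the MIME type. This is only a sketch: it assumes the tool prints one "Section:" heading or one "- key: value" entry per line, as in the example report, and the function names `parse_metadata_report` and `looks_like_media` are hypothetical helpers, not part of hachoir.

```python
def parse_metadata_report(text):
    """Group '- key: value' lines under their 'Section:' headings."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("$"):
            continue  # skip blank lines and the echoed shell command
        if line.startswith("- "):
            # Split on the first ': ' only, so values may contain colons.
            key, _, value = line[2:].partition(": ")
            if current is not None:
                sections[current][key] = value
        elif line.endswith(":"):
            current = line[:-1]
            sections[current] = {}
    return sections


def looks_like_media(sections):
    """Treat audio/*, video/* and image/* MIME types as media."""
    mime = sections.get("Common", {}).get("MIME type", "")
    return mime.startswith(("audio/", "video/", "image/"))
```

For the report shown above, `parse_metadata_report(...)["Common"]["MIME type"]` would be `video/x-msvideo`, so `looks_like_media` would treat the file as media.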