Erlang walk in AVI file
Warning: long post ahead ... but mostly code :)
This post contains absolutely no ideas or thoughts: it is just a recap of my attempt to read the AVI file format (or rather the RIFF file format, as I do not parse AVI data, only the document structure). Let's jump straight into the code with this simple module header!
-module(avir).
-compile([export_all]).
-include_lib("kernel/include/file.hrl").

% print a debug message, indented according to the nesting level
dbg(Level, Template, Args) ->
    Indent = lists:flatten(lists:duplicate(Level, " ")),
    io:format(Indent ++ Template, Args).

go() ->
    go("test.avi").

go(Filename) ->
    {ok, #file_info{size=Size}} = file:read_file_info(Filename),
    {ok, IODev} = file:open(Filename, [read, binary]),
    % walk the whole file (offsets 0 to Size), starting at nesting level 0
    {ok, Parts} = walk_data(0, [], IODev, 0, Size),
    ok = file:close(IODev),
    {ok, Parts}.
So, dbg is a crappy little function to print debug messages ... yeah, the old-fashioned way, but it's so simple for just a post! go is the main entry point and calls the 'real' code: the approach is to call the walk_data function, which builds and returns a list of AVI structures (the first parameter is the nesting level, used to print messages with a meaningful indentation, and the second one is an accumulator for the recursion to come).
I mainly used this short document: AVI is a (nested) sequence of two kinds of structures, either LIST or CHUNK. More precisely, a mandatory RIFF-AVI LIST comes first, followed by any number of (optional) RIFF-AVIX LISTs. Let's walk those structures:
walk_data(Level, Parts, File, From, To) when From < To ->
    case chunk_or_list(File, From) of
        avichunk ->
            {ok, Part, NextPos} = walk_chunk(Level, File, From, To),
            walk_data(Level, [Part|Parts], File, NextPos, To);
        avilist ->
            {ok, Part, NextPos} = walk_list(Level, File, From, To),
            walk_data(Level, [Part|Parts], File, NextPos, To);
        Error ->
            {error, "maybe unexpected EOF", Error}
    end;
walk_data(_Level, Parts, _File, _From, _To) ->
    {ok, lists:reverse(Parts)}.

chunk_or_list(File, Pos) ->
    case file:pread(File, Pos, 4) of
        {ok, <<"RIFF">>} ->
            avilist;
        {ok, <<"LIST">>} ->
            avilist;
        {ok, _FourCC} ->
            avichunk;
        eof ->
            eof
    end.
The walk is straightforward: go from position From to To, accumulating the results in reverse order and reversing them at the end (I love this [Head|Tail] list notation ... was Prolog the first to use it?). chunk_or_list reads the first 4 bytes (the FourCC header) to guess the kind of the next structure in the file (CHUNK or LIST); that structure is then loaded, and the walk continues right after it.
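To see the same idea without a file handle, here is a minimal sketch that works on an in-memory binary instead of file:pread. kind/1 makes the same RIFF/LIST versus chunk decision as chunk_or_list, and walk/1 accumulates {Kind, Id} pairs head-first and reverses them at the end, like walk_data. The module name riff_walk_sketch and its functions are hypothetical, it only walks one nesting level, and it assumes the pad byte after odd-sized data (explained further down) is present.

```erlang
-module(riff_walk_sketch).
-export([kind/1, walk/1]).

%% Same decision as chunk_or_list/2, made on the first bytes of a binary:
%% a RIFF or LIST id starts a list, any other FourCC starts a chunk.
kind(<<"RIFF", _/binary>>) -> avilist;
kind(<<"LIST", _/binary>>) -> avilist;
kind(<<_FourCC:4/binary, _/binary>>) -> avichunk;
kind(_) -> eof.

%% Walk the structures found at one nesting level of Bin and return
%% {Kind, Id} pairs (Id = first four bytes of each header) in file order.
walk(Bin) ->
    walk(Bin, []).

walk(<<>>, Acc) ->
    %% Acc was built head-first, so restore file order.
    {ok, lists:reverse(Acc)};
walk(<<Id:4/binary, Size:32/little-unsigned, Rest/binary>> = Bin, Acc) ->
    Pad = Size rem 2,  % odd-sized data is followed by one pad byte
    case Rest of
        <<_Data:Size/binary, _:Pad/binary, Next/binary>> ->
            walk(Next, [{kind(Bin), Id} | Acc]);
        _ ->
            {error, truncated_data}
    end;
walk(_Partial, _Acc) ->
    {error, truncated_header}.
```

For example, walking <<"LIST", 4:32/little, "hdrl", "JUNK", 3:32/little, "abc", 0, "00dc", 2:32/little, "xy">> gives {ok, [{avilist, <<"LIST">>}, {avichunk, <<"JUNK">>}, {avichunk, <<"00dc">>}]}.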
walk_list(Level, File, From, To) ->
    case read_list_header(File, From) of
        {ok, AviList={avilist, List, FourCC, DataPos, DataSize}, NextPos} ->
            dbg(Level, "read list header (pos=~p, next=~p): List=~p FourCC=~p~n", [From, NextPos, List, FourCC]),
            {ok, SubPart} = case FourCC of
                <<"movi">> ->
                    dbg(Level, "... skipping list FourCC=~p...~n", [FourCC]),
                    {ok, []};
                _ ->
                    walk_data(Level + 1, [], File, DataPos, DataPos + DataSize)
            end,
            {ok, {AviList, SubPart}, NextPos};
        eof ->
            dbg(Level, "end of file~n", []),
            eof
    end.

read_list_header(File, Pos) ->
    case file:pread(File, [{Pos, 4}, {Pos + 4, 4}, {Pos + 8, 4}]) of
        {ok, [List, <<Size:4/little-unsigned-integer-unit:8>>, FourCC]} ->
            {ok, {avilist, List, FourCC, Pos + 12, Size - 4}, Pos + 8 + Size};
        {ok, [eof, eof, eof]} ->
            eof;
        _ ->
            {error, "no list header to read, but not empty data~n"}
    end.
To walk a LIST: read the header (remember that the 4-byte FourCC field is counted in the list size, hence Size - 4 for the nested data), walk the nested data (this reuses walk_data, one level deeper), and return the LIST representation: a 2-tuple with the header first (it could be a record) and then the list of sub-parts. There is also a quick hack that skips the movi list, which holds the actual audio/video data, because my test file is rather big. Walking a CHUNK is much the same.
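The size arithmetic in read_list_header is easy to get wrong, so here is a small sketch that parses a LIST (or RIFF) header at offset Pos of an in-memory binary and returns the same shape as read_list_header. The module riff_list_sketch and the function list_header/2 are hypothetical names, and rejecting any id other than LIST or RIFF is a simplification of this sketch.

```erlang
-module(riff_list_sketch).
-export([list_header/2]).

%% Parse the LIST (or RIFF) header found at byte offset Pos of Bin and
%% return it in the same shape as read_list_header/2.
list_header(Bin, Pos) ->
    case Bin of
        <<_:Pos/binary, Id:4/binary, Size:32/little-unsigned, FourCC:4/binary, _/binary>>
          when Id =:= <<"LIST">>; Id =:= <<"RIFF">> ->
            %% Size counts the 4-byte FourCC plus the nested data, so:
            DataPos = Pos + 12,        % nested data starts after id, size and FourCC
            DataSize = Size - 4,       % ... and excludes the FourCC
            NextPos = Pos + 8 + Size,  % next sibling starts after id + size + Size bytes
            {ok, {avilist, Id, FourCC, DataPos, DataSize}, NextPos};
        _ ->
            {error, not_a_list_header}
    end.
```

For the 16-byte list <<"LIST", 8:32/little, "hdrl", "abcd">> at offset 0, this gives DataPos 12, DataSize 4 and NextPos 16, i.e. exactly the end of the list.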
walk_chunk(Level, File, From, To) ->
    case read_chunk_header(File, From) of
        {ok, AviChunk={avichunk, FourCC, DataPos, DataSize}, NextPos} ->
            % for stream data chunks, FourCC = <<_StreamNumber:2/binary, _DataType:2/binary>>
            dbg(Level, "read chunk header (pos=~p, next=~p): FourCC=~p DataSize=~p~n", [From, NextPos, FourCC, DataSize]),
            chunk_spy(FourCC, File, DataPos, DataSize),
            {ok, AviChunk, NextPos};
        eof ->
            dbg(Level, "end of file~n", []),
            eof
    end.
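read_chunk_header(File, Pos) ->
    case file:pread(File, [{Pos, 4}, {Pos + 4, 4}]) of
        {ok, [FourCC, <<Size:4/little-unsigned-integer-unit:8>>]} ->
            % chunk data is padded to an even length: skip one pad byte when Size is odd
            NextPos = Pos + 8 + Size + (Size rem 2),
            {ok, {avichunk, FourCC, Pos + 8, Size}, NextPos};
        {ok, [eof, eof]} ->
            eof;
        _ ->
            {error, "no chunk header to read, but not empty data~n"}
    end.

% Placeholder (not part of the original code) so the module compiles on its own;
% the real chunk_spy/4 is the subject of a later post.
chunk_spy(_FourCC, _File, _DataPos, _DataSize) ->
    ok.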
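The padding rule for chunks (RIFF pads chunk data to an even length, and the pad byte is not counted in Size) can be checked without a file. This sketch parses a chunk header at offset Pos of an in-memory binary and skips one pad byte whenever Size is odd; riff_chunk_sketch and chunk_header/2 are hypothetical names.

```erlang
-module(riff_chunk_sketch).
-export([chunk_header/2]).

%% Parse a chunk header at byte offset Pos of Bin. Chunk data is padded
%% to an even length, so when Size is odd one pad byte (not counted in
%% Size) sits between this chunk's data and the next header.
chunk_header(Bin, Pos) ->
    case Bin of
        <<_:Pos/binary, FourCC:4/binary, Size:32/little-unsigned, _/binary>> ->
            NextPos = Pos + 8 + Size + (Size rem 2),
            {ok, {avichunk, FourCC, Pos + 8, Size}, NextPos};
        _ ->
            {error, no_chunk_header}
    end.
```

With <<"JUNK", 3:32/little, "abc", 0, "00dc", 2:32/little, "xy">>, the 3-byte JUNK chunk at offset 0 puts the next header at offset 12 (not 11), where the 00dc chunk is found.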
Reading a CHUNK is similar to reading a LIST, without nested data. This part also went wrong on my first attempt: I found on this page that CHUNK data is padded to a word (2-byte) boundary (grr). But that's all it takes to read a well-formed RIFF file. And for those wondering about the chunk_spy function, keep reading this blog :)
Post imported from WordPress.
Please refer to the original post for earlier comments.