[ Table Of Contents ][ Answer Guy Current Index ] greetings   Meet the Gang   1   2   3   4   5   6   7 [ Index of Past Answers ]

(?) The Answer Gang (!)

By Jim Dennis, Ben Okopnik, Dan Wilder, Breen, Chris, and... (meet the Gang) ... the Editors of Linux Gazette... and You!
Send questions (or interesting answers) to The Answer Gang for possible publication (but read the guidelines first)

(?) How to optimize space usage for multiple files on multiple CDs

From Michiel van Leening

Answered By Thomas Adam (The LG Weekend Mechanic), Ben Okopnik, Karl-Heinz Herrmann

With our kind thanks to Michiel for granting publication permission :) -- Heather

Hi Gang,

I've got a bunch of files which will roughly fill up 12 CD-Rs. My question is, how can I best optimize the space used for these files, so as to use as few CDs as possible?

(!) [Thomas] Assuming that you are not fussed as to which files appear on which CD-R, and assuming that these files reside in a single common directory, then you could write a script that takes the largest files first.
(!) [Ben] Actually, that's an NP-complete problem (bin packing), i.e. you can't compute the exact optimum before the Universe ends. <grin> Sad but true. However, there are a number of "close enough" approximations you can do. I've never heard of anyone scripting this kind of a thing, although I would imagine every CS student out there has played with the concept.
Here's a reasonable one: sort your files by size. Grab the largest one, subtract that from MAX_SIZE; now, find the largest file that will fit in the remaining space. Iterate until the remaining space is smaller than your smallest file. Repeat the above process for every one of your CDs. You're done - and quickly, too, since the initial sort dominates the cost. :)
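The greedy pass Ben outlines can be sketched in bash (Thomas's language of choice). This is a hypothetical illustration: the 650 MB capacity, the file names, and the sizes are made-up example data, not the script from the thread, and files larger than one CD are silently skipped.

```shell
#!/bin/bash
# Greedy "largest first fit" packing: sort files largest-first, then
# fill each CD with the largest remaining files that still fit.
CD_SIZE=$((650 * 1024 * 1024))

# Example data (hypothetical names and byte sizes).
names=(a.iso b.tar c.tar d.tgz e.zip)
sizes=(500000000 300000000 300000000 200000000 40000000)

# Indices of the files, sorted by size, largest first.
mapfile -t order < <(
    for i in "${!sizes[@]}"; do echo "${sizes[$i]} $i"; done |
        sort -rn | awk '{print $2}')

declare -a placed   # placed[i]=1 once file i is assigned to a disc
disc=0
while :; do
    remaining=$CD_SIZE
    burned=0
    for i in "${order[@]}"; do
        [ -n "${placed[$i]}" ] && continue
        if (( sizes[i] <= remaining )); then
            echo "CD$disc: ${names[$i]} (${sizes[$i]} bytes)"
            (( remaining -= sizes[i] ))
            placed[$i]=1
            burned=1
        fi
    done
    (( burned )) || break   # a pass that placed nothing means we're done
    (( disc++ ))
done
```

With the example sizes above this packs the 500 MB and 40 MB files onto the first disc, the two 300 MB files onto the second, and the 200 MB file onto the third.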

(?) I've tried just looping through all the files, and as long as the sum of the already-seen files stays below the CD size limit, the file is added to a list.

(!) [Ben] Erm... a 20MB file followed by a 631MB file, and you've got a CD with only 20MB on it. Not a reasonable compromise. Even if you sort them first, a 326MB file followed by a 325MB one just misses fitting together, and you've wasted half a CD.

(?) As soon as a file goes above the size limit, it starts a new list of filenames. These lists are then used to burn the CDs with either mkisofs or Gcombust.

(!) [Thomas] Is the limit 650 MB? If so:

See attached fill-it-up.bash.txt

(?) I'd actually like a program to figure out the best spread of files over the CDs (e.g. CD1 has files 3, 5, 8; CD2 has files 1, 2, 4; and CD3 has files 6, 7, 9) so as to minimize wasted space. What I've programmed myself has the disadvantage that if the next file in line is big enough to cross the size limit, it is put on the next CD, thereby wasting space on the current CD.

(!) [Thomas] I'll work on this at home :-) Might even put it in my LWM article next month :-)

(?) Now I could go and program such a thing, find out afterwards that something like this exists, and tear my hair out in frustration. So, does a program exist that can do this? Maybe someone has a script lying around. Of course, intensive Googling for a week hasn't turned up anything.

Thanks for any input on the matter.

(!) [Ben] As I've mentioned, I haven't heard of anything (although any CompSci department has seen thousands of attempts), but it shouldn't be all that hard to script an approximation. The algorithm I suggested would only take a few lines of Perl, and even a shell script wouldn't be all that bad.
(!) [Thomas] Hey Ben... you and I could work on this one :-) Whadaya reckon -- I'm sticking to bash!!
(!) [Ben] <dryly> The actual scripting is left as an exercise to the student. Gotta let folks do _something_ on their own, dude! :)
Now, all I've got to do is figure out the purpose of your script, above... what it does is obvious; why you'd care about the result (if there _was_ any) isn't.
(!) [K.-H.] Actually there is already a script which served me quite nicely. It's already tied in with cdrecord and mkisofs and seems to handle special files correctly (i.e. using cpio or something to dump them).

(?) That's exactly what I was looking for. Thanks a lot.

(!) [Ben] Cool; I figured somebody somewhere did it. Interestingly enough, he uses pretty much the algorithm I suggested:
    @todo = sort { ${$b}[2] <=> ${$a}[2] } @todo;

    foreach $file (@todo) {
        if ($thissize + ${$file}[2] < $remaining) {
            $thissize += ${$file}[2];
            push(@{$thisdo[${$file}[0]]}, ${$file}[1]);
            ...
It's not a bad approximation; the only degenerate case is where you have few or no small files and lots of large (> MAX_SIZE/2) ones. If you have an average file spread, it works well.

This page edited and maintained by the Editors of Linux Gazette Copyright © 2002
Published in issue 80 of Linux Gazette July 2002
HTML script maintained by Heather Stern of Starshine Technical Services, http://www.starshine.org/
