
Splitting Large Files on Microsoft Windows

After finishing my ABAP programming course I wanted to upload the Netweaver-trial software to Telekom media-center. Telekom media-center lets you store up to 25 GB of data in the cloud, which is more storage than many USB sticks offer. The Netweaver-trial software is about 3.5 GB, but media-center (or Windows) does not allow uploading a single file larger than 2 GB. On Linux this is no problem: just use split, which is part of coreutils.

I searched Google for a simple split program for DOS/Windows, which I knew I had once used, but I didn't find it. The one program I did try failed. Testing the others would have wasted too much time, and their sources were of unknown reputation, so I was not willing to expose the Windows PC to unknown downloads. Fabrice Bellard's tcc came to the rescue: at 386K it downloads quickly. As I was already in "programming-mode", I wrote my own splitter. Here it is:

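#include <stdio.h>
#include <string.h>

/* Maximum size of each part in bytes; small enough to fit in a 32-bit long */
#define FILESZ        1500000000L

int main (int argc, char *argv[]) {
        FILE *in, *out=NULL;
        long s = FILESZ, i = 0;  /* s starts "full", so the first byte opens part 001 */
        char fname[FILENAME_MAX];
        int c;
        fname[0] = '\0';

        if (argc <= 1) return 1;
        /* leave room for the ".NNN" suffix and the terminating NUL */
        if (strlen(argv[1]) + 5 > sizeof(fname)) return 1;

        printf("Splitting %s\n", argv[1]);
        if ((in = fopen(argv[1],"rb")) == NULL) return 2;

        while ((c = fgetc(in)) != EOF) {
                //printf("%x\n",c);
                if (++s >= FILESZ) {  /* current part is full: start the next one */
                        s = 0;
                        if (out != NULL) fclose(out);
                        sprintf(fname,"%s.%03ld",argv[1],++i);
                        if (i > 50) return 3;  /* safety limit: at most 50 parts */
                        printf("New split file %s\n",fname);
                        if ((out = fopen(fname,"wb")) == NULL) return 4;
                }
                fputc(c,out);
        }

        return 0;
}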

It is also available as klmsplit.c on GitHub. The code above relies on the fact that open files are flushed and closed automatically when the program ends.

The performance of this code is not good: every byte is read and written individually, which causes high CPU load. As stated above, this was a quick solution to get the job done.
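One way to reduce that per-byte overhead would be to copy the data in larger blocks with fread and fwrite. The following is only a minimal sketch of that idea, not the code from klmsplit.c: the function name split_file, the 64 KiB buffer size BUFSZ, and the limit of 999 parts are assumptions chosen for illustration. It keeps the same naming scheme for the parts (.001, .002, ...).

```c
#include <stdio.h>
#include <string.h>

#define BUFSZ 65536  /* assumed block size: 64 KiB per fread/fwrite call */

/* Hypothetical helper: split `path` into parts of at most `part_size` bytes,
 * named path.001, path.002, ...
 * Returns the number of parts written, or -1 on error. */
int split_file(const char *path, long part_size)
{
        static char buf[BUFSZ];
        char fname[FILENAME_MAX];
        FILE *in, *out = NULL;
        long left = 0;          /* bytes that still fit into the current part */
        int parts = 0, ok = 1;
        size_t n, off, chunk;

        /* leave room for the ".NNN" suffix and the terminating NUL */
        if (part_size <= 0 || strlen(path) + 5 > sizeof(fname)) return -1;
        if ((in = fopen(path, "rb")) == NULL) return -1;

        while (ok && (n = fread(buf, 1, sizeof(buf), in)) > 0) {
                for (off = 0; ok && off < n; off += chunk) {
                        if (left == 0) {        /* no part open yet, or it is full */
                                if (out != NULL) fclose(out);
                                out = NULL;
                                if (parts >= 999) { ok = 0; break; }  /* keep suffix at 3 digits */
                                sprintf(fname, "%s.%03d", path, ++parts);
                                out = fopen(fname, "wb");
                                if (out == NULL) { ok = 0; break; }
                                left = part_size;
                        }
                        /* write as much of the block as fits into the current part */
                        chunk = n - off;
                        if ((long)chunk > left) chunk = (size_t)left;
                        if (fwrite(buf + off, 1, chunk, out) != chunk) ok = 0;
                        left -= (long)chunk;
                }
        }
        if (ferror(in)) ok = 0;
        if (out != NULL && fclose(out) != 0) ok = 0;
        fclose(in);
        return ok ? parts : -1;
}
```

A main function could call it as split_file(argv[1], 1500000000L) and report an error if the result is negative. Unlike the quick version above, this sketch closes its files explicitly so that write errors on the last part are noticed.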