9th June 2020

Splitting and anti-merging vCard files

Sometimes a vCard file needs to be split into smaller files, or its records need to be protected against being merged by another application.

1. Splitting. The Perl script below splits the input into as many files as required. Output files are named adr.1.vcf, adr.2.vcf, etc. The command line option "-n" sets the number of card records per file; the default is 950. The script is available as palmadrsplit on GitHub:

use strict;
use Getopt::Std;

my %opts;
getopts('n:',\%opts);
my ($i,$k) = (1,0);     # current file number, number of cards seen so far
my $n = ( defined($opts{'n'}) ? $opts{'n'} : 950 );     # cards per file
die("-n must be a positive integer") unless ($n =~ /^\d+$/ && $n > 0);

open(F,">adr.$i.vcf") || die("Cannot open adr.$i.vcf for writing");
while (<>) {
        if (/BEGIN:VCARD/) {    # next address record
                if ($k > 0 && $k % $n == 0) {   # current file already holds $n cards
                        close(F) || die("Cannot close adr.$i.vcf");
                        ++$i;   # next file number
                        open(F,">adr.$i.vcf") || die("Cannot open adr.$i.vcf for writing");
                }
                ++$k;   # count this card
        }
        print F $_;
}
close(F) || die("Cannot close adr.$i.vcf");

This is required for Google Contacts, because Google does not allow importing more than 1,000 records per day; see Quotas for Google Services.
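To check that every output file stays below the import limit, the records per file can be counted. The following is only a minimal sketch: count_cards is a hypothetical helper, not part of palmadrsplit, and it assumes each record starts with a BEGIN:VCARD line.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of palmadrsplit):
# counts the records in one file by counting BEGIN:VCARD lines.
sub count_cards {
        my ($file) = @_;
        open(my $fh, '<', $file) or die("Cannot open $file for reading: $!");
        my $count = 0;
        while (my $line = <$fh>) {
                ++$count if ($line =~ /^BEGIN:VCARD/);
        }
        close($fh) or die("Cannot close $file");
        return $count;
}

# Print one line per file named on the command line
for my $file (@ARGV) {
        printf("%s: %d\n", $file, count_cards($file));
}
```

Called with the output files as arguments, for example adr.1.vcf adr.2.vcf, it prints the number of records in each file.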

2. Anti-Merge. A script that prevents unwanted merging is available as palmantimerge on GitHub. The overall logic is as follows: Read the entire vCard file and put each card, delimited by BEGIN:VCARD and END:VCARD, into a hash. Each hash entry is a list of vCards. The hash key is the N: entry, i.e., the concatenation of last name and first name. Once everything is hashed, walk through the hash. Entries whose list contains just one card are output as is. Where the list contains more than one card, these cards would otherwise be merged, so the N: part of each is modified using the ORG: field.

use strict;
my @singleCard = ();    # all info between BEGIN:VCARD and END:VCARD
my ($name) = "";        # N: part, i.e., lastname semicolon firstname
my ($clashes,$line,$org) = (0,"","");
my %allCards = ();      # each entry is list of single cards belonging to same first and lastname, so hash of array of array

while (<>) {
        if (/BEGIN:VCARD/) {
                ($name,@singleCard) = ("", ());
                push @singleCard, $_;
        } elsif (/END:VCARD/) {
                push @singleCard, $_;
                push @{ $allCards{$name} }, [ @singleCard ];
        } else {
                push @singleCard, $_;
                $name = $_ if (/^N:/);
        }
}

for $name (keys %allCards) {
        $clashes = $#{$allCards{$name}};
        for my $sglCrd (@{$allCards{$name}}) {
                if ($clashes == 0) {
                        for $line (@{$sglCrd}) { print $line; }
                } else {
                        $org = "";
                        for $line (@{$sglCrd}) {
                                $org = $1 if ($line =~ /^ORG:([ \-\+\w]+)/);
                        }
                        for $line (@{$sglCrd}) {
                                $line =~ s/;/ \/${org}\/;/ if ($line =~ /^N:/);
                                print $line;
                        }
                }
        }
}

If the combination of first name and last name is not unique, "/organization/" is appended to the last name. For example, two records for Peter Miller, one at ABC-Corp and one at XYZ-Corp, are written as N:Miller /ABC-Corp/;Peter and N:Miller /XYZ-Corp/;Peter.
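The renaming step can be tried in isolation on a single card. This is a minimal sketch that reuses the two regular expressions from the script above; rename_card is a hypothetical helper introduced only for illustration, and the card data is made up.

```perl
use strict;
use warnings;

# Hypothetical helper: takes the lines of one vCard and returns a copy
# where the ORG: value is inserted before the first semicolon of the N: line.
sub rename_card {
        my (@lines) = @_;
        my $org = "";
        for my $line (@lines) {
                $org = $1 if ($line =~ /^ORG:([ \-\+\w]+)/);
        }
        for my $line (@lines) {
                $line =~ s/;/ \/${org}\/;/ if ($line =~ /^N:/);
        }
        return @lines;
}

# Made-up sample card
my @card = ("BEGIN:VCARD\n", "N:Miller;Peter\n", "ORG:ABC-Corp\n", "END:VCARD\n");
print rename_card(@card);
```

Note that a card without an ORG: field ends up with an empty marker, i.e., N:Miller //;Peter, so two clashing cards that both lack an organization would still share the same name.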

This way Simple Mobile Tools Contacts will not merge records that should stay separate. The corresponding issue on GitHub is #446.