Finding Duplicates in a String Using Bitwise Operations – in c/c++

In this tutorial i am going to show you how to find duplicates in a string using bitwise operations. This method has advantage over the other methods in that it takes O(n) time and O(1) extra or auxiliary space.

Before we start, let’s do a quick review of the bitwise operators we will be using here:

Bitwise OperatorNameUsage
<<left shiftShift all bits to the left based on the given value
&bitwise ANDCompare two values and return true if the two values are true or return false if one value is true and the other value is false. For example: 1 & 0 is 0; 1 & 1 is 1; 0 & 0 is 0; 0 & 1 is 0
|bitwise ORCompare two values and return true if one of the two values is true; return true if both values are true; or return false if the two values are false. For example: 1 & 0 is 1; 1 & 1 is 1; 0 & 0 is 0; 0 & 1 is 1

Now let me describe the logic of how we are going to find duplicates in a string using bitwise operations.

So let’s say we are given a string ‘finding’ (initialized as A[] = “finding”) as our input and we are expected to print ‘i is a duplicate’ and ‘n is a duplicate’.

So what we want to do is have another variable H of datatype say long int which can hold 4 byte of data i.e 32 bits. What i want to achieve with variable H is store each character of the string “finding” as bit in variable H. I am choosing long int because i need to store up to 26 letters of the alphabet in H (in case our string contains all letters of alphabets), so since i am sure that long int takes 4 bytes of storage, i am safer using it than using int which i am not sure whether it is taking 2 or 4 bytes (because that really depends on the compiler you are using).

I believe you know that bits are represented as 1s’ and 0s’. So as i go through each character in the string using a loop i will retrieve each character’s ascii value (assume that our input is in small letters so the ascii value will ranges from 97 to 122), use the bitwise leftshift operator to move all the bits on the left of our variable H for as much as the ascii value for that particular character. Once i find the bit position for that character in H, i do a bitwise & operation to ascertain if the bit in that position has been turned on before now (on is 1, off is 0; we will initially set all bits in H to 0 by doing this: long int H = 0). So if the bit in that position has been set to 1 before now, we will simply print: so so character is a duplicate, otherwise we will set it to 1.

But observe that if i want to use the ascii codes of lower case characters for the leftshift operation then our long int datatype for variable H will not be sufficient: we will need more than 4 bytes for storage. For instance say we retrieve the character “f” which has the ascii value of 102, if we have to leftshift 102 times from the starting position (i.e the least significant bit), we

This is our 4 bytes of memory for variable H. Least significant bit is the starting point

will go pass 32 places already. So the way to go around this is instead of leftshift for the no. of times of the ascii value returned for each character, i will subtract each ascii value returned by 97 so that the start position for leftshifting can be 0 for the character “a”, 1 for the character “b”, 2 for the character “c”, …, 5 for the character “f”, … , 25 for the character “z”; from 0 to 25 total space needed is 26 and what we have available is 32; that should suffix.

Code in c

#include <stdio.h>
#include <stdlib.h>
void Find_Duplicates(char *A){
    long int H;//variable to set each character's bit
    int x=0,y=0,i;

    for(i=0;A[i]!='\0';i++){//loop through the given string
        x = 1;//Start from least significant bit
        x = x << A[i] - 97;//leftshift by this no. of times

        if((x & H) > 0){//perform bitwise AND (if set before - duplicate)
            printf("%c is a duplicate", A[i]);
            printf("\n");
        }else{
            H = H | x;//perform bitwise OR (if not set before - set now)
        }
    }
}
int main(){
    char A[] = "finding";//our given string

    Find_Duplicates(A);
}

Code in c++

#include <bits/stdc++.h>

using namespace std;

void Find_Duplicates(char *A){
    long int H;//variable to set each character's bit
    int x=0,y=0,i;

    for(i=0;A[i]!='\0';i++){//loop through the given string
        x = 1;//Start from least significant bit
        x = x << A[i] - 97;//leftshift by this no. of times

        if((x & H) > 0){//perform bitwise AND (if set before - duplicate)
            cout << A[i] << " is a duplicate" << endl;
        }else{
            H = H | x;//perform bitwise OR (if not set before - set now)
        }
    }
}
int main(){
    char A[] = "finding";//our given string

    Find_Duplicates(A);
}
Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

You May Also Like