Thursday, September 20, 2012

Find max sub array in a given array

This is also known as Kadane's algorithm (here). 

To quote from wikipedia: This problem was posed as a simplified model for maximum likelihood of patter matching in digitized images.

In Computer Science this is a task to find a contiguous subarray within a 1-D array of numbers (containing at least 1 +tive number), which has the largest sum.

Here are a couple of approaches to solve this problem. The brute force method is to iterate over the array twice and sum up the contiguous cells until you find the block with the maximum sum. However the complexity of this algorithm O(n2).

Brute force method

int FindMaxSum( int[] array, int nSize, int& nMaxStartIndex, int& nMaxEndIndex) {
    int nMaxSum = INT_MIN;
    nMaxStartIndex = 0;
    nMaxEndIndex = nSize > ? nSize - 1 : 0;

    if( nSize == 2 ) {
        nMaxSum = array[ 0 ] + array[ 1 ];
    }
    else if( nSize == 1 ) {
        nMaxSum = array[ 0 ];
    }

    else if( nSize == 0 ) {
        nMaxSum = 0;
    }
    else {
        int nCurrentSum = 0;
        for( int i = 2 ; i < nSize; ++i ) {
            for( int j = 0 ; j < nSize - i; j++ ) {
                for( int k = j; k < k + i; k++ )
                    nCurrentSum += array[ k ];
                if( nMaxSum < nCurrentSum ) {
                    nMaxSum = nCurrentSum;
                    nMaxStartIndex = j;
                    nMaxEndIndex = k;
                }
            }

        }

    }
    
    return nMaxSum;
}

O(n) Algorithm

Obviously the above O(n2) is not preferred, that is why Jay Kadane came up with an approach which solves this in O(n) time. Here is a C++ implementation of the same.

int FindMaxSum( int[] array, int nSize, int& nMaxStartIndex, int& nMaxEndIndex) {
    int nMaxSum = INT_MIN;
    int nCurrSum = 0;
    int nCurrStartIndex = 0;

    for( int nCurrEndIndex = 0 ; nCurrEndIndex < nSize; ++nCurrEndIndex ) {
        nCurrSum += array[ nCurrEndIndex ];
         if( nMaxSum < nCurrSum ) {
           nMaxSum        = nCurrSum;
           nMaxStartIndex = nCurrStartIndex;
           nMaxEndIndex   = nCurrEndIndex;
        }

        if( nCurrSum < 0 ) {
            nCurrSum = 0;
            nCurrStartIndex = nCurrEndIndex + 1;
        }

    }

    return nMaxSum;
}

I will soon update this thread to extend Kadane's algorithm to a 2-D array.

Wednesday, September 19, 2012

How to find whether a string is an anagram

First things first, what is an anagram? Anagram is a word, phrase, name or a sentence formed by rearranging the letters of an another word, phrase, name of sentence.The key rule here is that all letters of the name/phrase/word/sentence should be used ONCE and only ONCE. For example
  • cinema and iceman.
  • Longest Anagram ever created is here.
  • Stone age and Stage one.
  • A chevrolet and Love the car
  • A carton of cigarettes and I got a taste for cancer.
Some more examples are here.

Anyways back to Computer Science, how to write an algorithm, which can check whether a given word/phrase/sentence is an anagram or not.
The rule for an anagram is that both the strings should be formed using the exactly the same letters as shown above in the example.

Lets take cinema and iceman
In the first word cinema,
c occurs once, i occurs once and so do the rest of the letters i.e., n, e, m and a. Now observe that in the anagram word iceman:
i occurs once, i occurs once and so do the rest of the letters i.e, e, m, a and n. The key takeaway is that all these letters occur only once. If the second letter uses a letter, which is not present in the first, then it is not an anagram.

Two approaches:
  1. A simple approach would be to sort both the words/phrase/sentence and compare for equality. If they are equal then they are anagrams otherwise not.
  2. Another approach is to  check if the two strings have identical counts for each unique char.
The approach I am proposing is similar to the second approach above. It is:
  1. Take the first string and create a binary sorted tree out of it. Note the data that the node will contain would be the letter and a count, which basically is used to keep track of occurrence of that particular character.
  2. Once the tree is populated with the first string, now add the second string also in the BST. The only difference is that if while populating the data, you end up in a situation where a new node has to be created, then you know it is not an anagram.
  3. There is one more case and which is that the unique letters are same, however their counts are different. So while adding the second string to the BST, if you run into a existing letter, decrement that nodes letter count.
  4. If we exhaust all the elements of the second tree as well, do a tree-traversal using any mechanism (pre/post/in/level), the requirement is that the letter count should be zero for it to an anagram.
class Node {
    public:
        Node( wchar_t wchElement ) : m_wchElement( wchElement ),
                                     m_nCount( 1 ),
                                     m_pLeft( NULL ),
                                     m_pRight( NULL ) {};
        ~Node() {
            delete m_pLeft;
            m_pLeft = NULL;
            delete m_pRight;
            m_pRight = NULL;
        };

        int  GetCount() const       { return m_nCount;};
        void IncrementCount()       { ++m_nCount;};
        void DecrementCount()       { --m_nCount;};

        wchar_t GetData() const     { return m_wchElement;};
        Node* GetLeft() const       { return m_pLeft;};
        Node* GetRight() const      { return m_pRight;};

        void  SetLeft( Node* pNode) { m_pLeft = pNode;};
        void  SetRight( Node* pNode){ m_pRight = pRight;};
    private:
        //hidden constructor
        Node();
        wchar_t m_wchElement;
        int     m_nCount;
        Node*   m_pLeft;
        Node*   m_pRight;
};

class BST {
    public:
        BST(): m_pRoot( NULL ) {}
        ~BST() {
            //function to empty the tree, left as an exercise
            //for the reader
        }

        Node* Add( wchar_t wchElement, bool bIncrementCount = true)
        bool  IsAnagram( wchar_t* pszString1, wchar_t* pszString2 );
        bool  PreOrder( Node* pNode );
    private:
        Node* m_pRoot;
};

bool BST::Add( wchar_t wchElement, bool bIncrementCount=/*true*/) {
    //Note: When adding the first string
    //the return values are of no use
    //they will come into use while adding the second string
    //i.e, when bIncrementCount is false.
    bool bAnagram = true;
    if( !m_pRoot ) {
        m_pRoot = new Node( wchElement );
        return bAnagram;
    }

    Node* pRoot = m_pRoot;
    while( pRoot ) {
        if( pRoot->GetData() == wchElement ) {
            bIncrementCount ? pRoot->IncrementCount():
                              pRoot->DecrementCount();
        }
        elseif( pRoot->GetData() < wchElement ) {
            if( pRoot->GetLeft() ) {
                pRoot = pRoot->GetLeft();
            }
            else {
                if( nIncrementCount ) {
                    pRoot->SetLeft( new Node( wchElement ) );
                }
                else {
                    //we are adding the second string
                    //and we have found a unique element
                    //which is not in the first string
                    //so this is not a anagram.
                    bAnagram = false;
                    break;
                }
            }
        }
        else {
            //greater so operate on the right sub-tree
            if( pRoot->GetRight() ) {
                pRoot = pRoot->GetRight();
            }
            else {
                if( bIncrementCount ) {
                    pRoot->SetRight( new Node( wchElement ) );
                }
                else {
                    //we are adding the second string
                    //and we have found a unique element
                    //which is not in the first string
                    //so this is not a anagram.
                    bAnagram = false;
                    break;
                }
            }
        }
    }

    return bAnagram;
}

bool BST::IsAnagram( wchar_t *pszString1, wchar_t *pszString2 ) {
    wchar_t *pStart = pszString1;
    while( *pStart ) {
        Add( *pStart++, true );
    }

    //add the pszString2, while adding the next string
    //we will get to know any character which is unique
    //in the 2nd string.
    //however there may be strings like abcd and abcc, or
    //like xxxx and xxx. For such cases while populating the
    //BST we will not come to know, so for such cases we will later
    //use one of the traversal techniques to see how many elements
    //still have a non-zero count. If they have then it is not an 
    //anagram
    pStart = pszString2;
    while( *pStart ) {
        if( !Add( *pStart++, false ) )
            return false;
    }
    //Here we will use one of the traversals techniques to traverse
    //all the nodes and check the count at each node
    //if we find any node having non-zero count, then we know for 
    //sure that it is not an anagram.

    return PreOrder( m_pRoot );
}

//rLR
bool BST::PreOrder( Node* pNode ) {
    if( !pNode )
        return true;
    if( pNode->GetCount() != 0 )
        return false;
    if( PreOrder( pNode->GetLeft() ) )
        return PreOrder( pNode->GetRight() );
    else
        return false;
}

int wmain( int argc, wchar_t** argv ) {
     wchar_t *pszString1 = L"cinema";
     wchar_t *pszString2 = L"iceman";

     BST oBST;
     if( oBST.IsAnagram( pszString1, pszString2 ) ) {
         std::wcout << L"Is Anagram";
     }
     else {
         std::wcout << L"NOT is Anagram";
     }

     std::wcout << std::endl;

     return 0;
}

I agree there is scope of improvement in the above code, however it basically demonstrates a way to find whether strings/words/phrases are anagrams. The above code can be modified to read the two strings from files, or from user input, or from any other input mechanism.

Tuesday, September 18, 2012

Algorithm to replace all occurrence of a space with %20

This particular algorithm is pretty common. This is used whenever you enter an URL containing spaces, browsers will automatically convert the spaces to %20 to process the request further. Here is an attempt for the same.

The basic logic is to iterate over the list and find the count of spaces, since each space needs to be replaced with %20, which is 3 characters, which basically means additional 2 more bytes per space. So if there are 10 spaces, then the extra memory needed is 10*2 = 20 bytes. The way to solve this is:
1. So the new length = existinglength + (numberofspaces*2)
2. Allocate memory for the new string
3. Start scanning the string from the back.
4. If you hit a space, set the string[ newlength -2 ] = '%', string[ newlength - 1 ] = 2 and string[newlength]=0, else just set the character encountered.

Why backwards? Let take the example of:

String: "http://www.myurl.com/my page/your query?value=this is it.
Number of spaces: 4
Original length: 56
New length: 56 + 4*2= 56 + 8 = 64.

Here is the algorithm:

wchar_t* Replace( wchar_t* pszText ) {
    if( !pszText ) 
        return;

    //find the length and the number of spaces
    size_t nLength = 0;
    size_t nNumberOfSpaces = 0;
  
    while( *( pszText + nLength ) ) {
        if( *( pszText + nLength ) == L' ' )
            ++nNumberOfSpaces;
        ++nLength;
    }

    size_t nNewLength = nLength + (nNumberOfSpaces * 2);
    wchar* pNewText = new wchar_t[ nNewLength + 1 ];
    wmemset( pNewText, L'\0', nNewLength );

    size_t srcIndex = 0;
    size_t destIndex = 0;

    while( *(pszText + srcIndex) ) {
        if( *(pszText+srcIndex) == L' ' ) {
            *(pszText+destIndex++) = L'%';
            *(pszText+destIndex++) = L'2';
            *(pszText+destIndex++) = L'0';
        }
        else {
            *(pszText+destIndex++) = *(pszText + srcIndex++);
        }
    }
         return pszNewText;
}

Note it is the callers responsibility to free up the returned memory space.

Saturday, September 15, 2012

Algorithm to reverse a string in place

This one ought to be simple. The logic is:
1. Get a pointer to the end.
2. Keep a temp pointer
3. Reverse the first and the last pointers
4. increment the pointer which is starting from the start and decrement the pointer from the end.

here goes:

void Reverse( char* pszString ) {
    if( !pszString )
        return;

    char* pEnd = pszString;
    while( *++pEnd ) {;}
    //this goes beyond the end
    --pEnd;

    while( pszString < pEnd ) {
        pTemp = *pszString;
        *pszString++ = *pEnd;
        *pEnd-- = pTemp;
    }


The thing to note is that while testing the above if you call the function as
Reverse( L"myname" ), then it will not work and you will get first chance access violation on account of access denied. The reason is because such strings are stored in the read only area. So you might want to create a string on the heap using the new or the malloc and then call this function. The complexity of the above would be O(n) and no space complexity

By the way the same can be done recursively as well. However recursion will eat up stack. I will update this thread later with a recursive version as well soon.

Friday, September 14, 2012

Algorithm to find whether a string contains unique characters

There are multiple ways to do this. The brute force method is to compare each character with the next character and the moment there is a duplicate, return with a false. The brute force method is O(n2) for time complexity but does not require any overhead for space. Here is the brute force algorithm:

bool HasUnique( wchar_t* pszString ) {

    //In this we take the first character
    //and compare it with the next character
    //and so on till we reach the end
    //then we start with the next character
    //and compare it again with each character
    if( !pszString )
        return true;

    size_t nLen = wcslen( pszString );
    for( size_t i = 0 ; i < nLen ; ++i ) {
        for( size_t j = i + 1; j < nLen ; ++j ) {
            if( pszString[ i ] == pszString[ j ] )
                return false;
        }
    }

    return true;
}


However the above is not optimal, given that in today's computers space is not expensive, however
computing time is still important. So a better approach could be to use some sort of mechanism, where each unique element encountered is remembered, and the moment we encounter a scenario, where we try to re-remember an already remembered element, we would know that the string (or dataset) is not unique. There are various approaches for this, the simplest being using a boolean array of 256 elements. 256 because this is the range of ASCII. However this approach will fail for unicode elements. Anyways here is this algorithm:

bool HasUnique( char* pszString ) {
    bool bStatus[ 256 ] = { false };
    
    if( !pszString )
        return true;

    size_t size   = strlen( pszString );
    int nASCIIVal = 0;
    for( size_t i = 0 ; i < size ; ++i ) {
        nASCIIVal = pszString[ i ];
        if( bStatus[ nASCIIVal ] )
            return false;
        
        bStatus[ nASCIIVal ] = true;
    }

    return true;
}


The above algo can be improved and made generic such that it works for all kind of elements by simply putting this in a map. Basically put the element encountered in the map, if it is not there, if it is already there, then you know that the string does not contain unique elements.
This can be extended to solve many similar problems like, find all duplicates, count the occurrence of each character/element in an array

I am going to write my own code to do the same. The code to 

//class to wrap the Node for a tree.
class Node {
    public:
        Node( int nData ) : 

            m_pLeft( NULL ), m_pRight( NULL ), m_nData( nData ) {};
        ~Node() {};
        Node*    GetLeft()    const    { return m_pLeft;    };
        Node*    GetRight()   const    { return m_pRight;   };
        int      GetData()    const    { return m_nData;    };

        void    SetLeft( Node* pNode )  { m_pLeft  = pNode; };
        void    SetRight( Node* pNode ) { m_pRight = pNode; };
        void    SetData( int nData )    { m_nData  = nData; };
    protected:
        Node() : m_pLeft( NULL ), m_pRight( NULL ), m_nData( 0 ) {};
        Node* m_pLeft;
        Node* m_pRight;
        int   m_nData;   
};//class Node


The idea is to add elements in a binary sorted tree and if while inserting elements we run across duplicates, we know the entries are not unique.

class Tree {
public:
    Tree(): m_pRoot( NULL ) {};
    ~Tree();
    Node* Add( int nData, bool& bUnique );
protected:
    Node* m_pRoot;
};

Node* Tree::Add( int nData, bool& bUnique ) {
    Node* pNode = new Node( nData );
    bUnique = true;
    if( !m_pRoot ) {
        m_pRoot = pNode;

        return m_pRoot;
    }

    Node* pRoot = m_pRoot;
    while( pRoot ) {
        if( pRoot->GetData() == nData ) {
            //found duplicate
            bUnique = false;
        }
        else if( pRoot->GetData() < nData ) {
            //now check if there is a left child
            if( pRoot->GetLeft() ) {
                pRoot = pRoot->GetLeft();
            }
            else {
                pRoot->SetLeft( pNode );
            }
        }
        else {
            //now check if there is a right child
            if( pRoot->GetRight() ) {
                pRoot = pRoot->GetRight();
            }
            else {
                pRoot->SetRight( pNode );
            }
        }

    }
}

So you see this function can be used to create a binary sorted tree and while inserting data itself we will know whether the element being inserted is unique or not. The complexity of this is: O( log n).

Wednesday, September 12, 2012

How do you find whether a number is power of 2.

An interesting question is how do find whether a number is a power of 2?

Well it is simple actually. Lets take for example the number 256, which is 2 to the power 8. Lets write the binary representation of 256

2^8 = 256 = 1 0000 0000
2^7 = 128 = 0 1000 0000
2^6 = 64   = 0 0100 0000
2^5 = 32   = 0 0010 0000
2^4 = 16   = 0 0001 0000
2^3 = 8     = 0 0000 1000
2^2 = 4     = 0 0000 0100
2^1 = 2     = 0 0000 0010
2^0 = 1     = 0 0000 0001


The commonality is that all these numbers have only a single bit set. So our requirement for a number to be power of 2 is It should have only 1 bit set (to 1).

Now the next challenge is how to convert this into a C code. Well you could write some algorithm to
iterate over the bits and make sure there is only a single bit set. It would work but it requires writing a loop and doing some bit manipulations to iterate over the binary number.

What about the following approach:
We basically need to ensure that only a single bit is set. Lets take 128
128 = 0 1000 0000
127 = 0 0111 1111
if we AND these two numbers we will get 0 0000 0000, similarly
256 = 1 0000 0000
255 = 0 1111 1111
if we AND these two numbers we will get 0 0000 0000.
You can try this for other power of 2 numbers. So now we are seeing an approach which is
for a number to be power of 2 if we AND the number and number - 1 its result should be 0
bool bIsPowerOfTwo = number & (number - 1 ) == 0 ? true : false;

This will work, however a better code is:
bool bIsPowerOfTwo = number && ( number & (number - 1) == 0 );