Tuesday, September 18, 2012

Algorithm to replace all occurrence of a space with %20

This particular algorithm is pretty common. This is used whenever you enter an URL containing spaces, browsers will automatically convert the spaces to %20 to process the request further. Here is an attempt for the same.

The basic logic is to iterate over the list and find the count of spaces, since each space needs to be replaced with %20, which is 3 characters, which basically means additional 2 more bytes per space. So if there are 10 spaces, then the extra memory needed is 10*2 = 20 bytes. The way to solve this is:
1. So the new length = existinglength + (numberofspaces*2)
2. Allocate memory for the new string
3. Start scanning the string from the back.
4. If you hit a space, set the string[ newlength -2 ] = '%', string[ newlength - 1 ] = 2 and string[newlength]=0, else just set the character encountered.

Why backwards? Let take the example of:

String: "http://www.myurl.com/my page/your query?value=this is it.
Number of spaces: 4
Original length: 56
New length: 56 + 4*2= 56 + 8 = 64.

Here is the algorithm:

wchar_t* Replace( wchar_t* pszText ) {
    if( !pszText ) 
        return;

    //find the length and the number of spaces
    size_t nLength = 0;
    size_t nNumberOfSpaces = 0;
  
    while( *( pszText + nLength ) ) {
        if( *( pszText + nLength ) == L' ' )
            ++nNumberOfSpaces;
        ++nLength;
    }

    size_t nNewLength = nLength + (nNumberOfSpaces * 2);
    wchar* pNewText = new wchar_t[ nNewLength + 1 ];
    wmemset( pNewText, L'\0', nNewLength );

    size_t srcIndex = 0;
    size_t destIndex = 0;

    while( *(pszText + srcIndex) ) {
        if( *(pszText+srcIndex) == L' ' ) {
            *(pszText+destIndex++) = L'%';
            *(pszText+destIndex++) = L'2';
            *(pszText+destIndex++) = L'0';
        }
        else {
            *(pszText+destIndex++) = *(pszText + srcIndex++);
        }
    }
         return pszNewText;
}

Note it is the callers responsibility to free up the returned memory space.

2 comments:

Taher said...

Was looking forward to your next article. Funny how seemingly simple things have so many tricks that can be applied.

Since you mentioned URLs here, spaces could be in the minority. One could record the offsets of the spaces in the first loop. That way you can jump from space to space, and do large bulk transfers (memcpy for e.g. for non-space characters (in the else clause).

munsingh said...

Sure. An algorithm a day keeps the "doctor" way is my motto now.

Yep I agree with your comments. My next optimized algorithm was to use memcpy. Trying to figure out right now how do various c rt implement mempcy. The basic one is a simple loop, however I ran into this article on google:
Optimizing Memcpy.

Will post an optimized version, soon.