String to integer conversion: edge-case handling


I want to convert a number given in string format to an integer. I've found multiple solutions for this, like atoi() or strtol(). However, if the input cannot be converted to an integer, as in strtol("junk", &endptr, 10), I just get back the integer 0. This conflicts with cases where my input actually is the number "0".

Is there a function that can handle this edge case by returning a pointer to an integer instead of an integer, so that in a case like the one above I'd just get NULL?

abelenky (best answer)

If a conversion succeeds (e.g., if the value "0" is parsed), then strtol sets the second argument, endptr, to point just past the last converted character, so it ends up greater than the first argument.

If no conversion can be performed, then strtol sets endptr to point to the start of the string, i.e. endptr becomes equal to the first argument.

Demonstrated in this program:

#include <stdio.h>
#include <stdlib.h>   //needed for strtol

int main(void) {
    const char *text = "junk";
    char *endptr = NULL;

    long answer = strtol(text, &endptr, 10);   //strtol returns a long

    if ( endptr > text )
    {
        printf("Number was converted: %ld\n", answer);
    }
    else
    {
        printf("No number could be found\n");
    }

    return 0;
}

JustaNobody
#include <stdio.h>

int main(void) {
    char x[] = "0";
    int y;
    int t = sscanf(x, "%d", &y);   //number of items successfully assigned
    if (t == 1) {
        printf("Converted: %d\n", y);
    }
    else {
        printf("Not converted\n");
    }
    return 0;
}

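Here, sscanf returns 1 if the number was converted, because it returns the number of items successfully assigned. Any other return value (0, or EOF for empty input) means that nothing was converted.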

Andreas Wenzel

The function strtol will return 0 if

  • the string was successfully converted to 0, or
  • the conversion failed.

In order to distinguish these two cases, you will have to inspect the value of endptr, which points to the first character that was not converted.

If endptr points to the start of the string specified in the first argument of strtol, then no characters were successfully converted, which means that the conversion failed.

If endptr does not point to the start of the string, then at least one character was successfully converted. Therefore, one could consider the conversion to have been successful.

On the other hand, if you call

strtol( "6junk", &endptr, 10 );

then the function will return 6 and endptr will point to the character j, which is the first character that was not converted. Depending on the situation, you may want to consider the conversion successful, or you may want to consider it a failure because the entire string was not converted. For example, if you ask the user to enter an integer and the user enters 6junk, then you will probably want to reject the input as invalid, even though the first character was successfully converted.

For this reason, you may want to test whether endptr points to the end of the string (i.e. to the terminating null character), in order to determine whether the entire string was successfully converted. Here is an example:

#include <stdlib.h>
#include <stdbool.h>

//This function will return true if the entire string was
//successfully converted to a long integer, otherwise it will
//return false.
bool convert_string_to_long( char str[], long *num )
{
    long result;
    char *endptr;

    //attempt to convert string to number
    result = strtol( str, &endptr, 10 );
    if ( endptr == str )
    {
        return false;
    }

    //test whether the entire string was converted
    if ( *endptr != '\0' )
    {
        return false;
    }

    //everything went ok, so pass the result
    *num = result;
    return true;
}

However, the code above is inconsistent: it accepts leading whitespace characters (strtol skips them), but rejects trailing whitespace characters. Therefore, instead of testing whether endptr points to the terminating null character, it would probably be better to inspect the remaining characters and to reject the string only if at least one of them is not a whitespace character, like this:

#include <stdlib.h>
#include <ctype.h>
#include <stdbool.h>

//This function will return true if the entire string was
//successfully converted to a long integer, otherwise it will
//return false.
bool convert_string_to_long( char str[], long *num )
{
    long result;
    char *endptr;

    //attempt to convert string to number
    result = strtol( str, &endptr, 10 );
    if ( endptr == str )
    {
        return false;
    }

    //verify that there are no unconverted characters, or that if
    //such characters do exist, that they are all whitespace
    //characters
    for ( ; *endptr != '\0'; endptr++ )
    {
        if ( !isspace( (unsigned char)*endptr ) )
        {
            return false;
        }
    }

    //everything went ok, so pass the result
    *num = result;
    return true;
}

Another issue is that the user may enter a value that is outside the range of representable values of a long int (i.e. too high or too low). In that case, strtol sets errno to ERANGE. We can detect this by setting errno to 0 before calling strtol and checking afterwards whether errno has been set to ERANGE. Here is an example:

#include <stdlib.h>
#include <ctype.h>
#include <stdbool.h>
#include <errno.h>

//This function will return true if the entire string was
//successfully converted to a long integer, otherwise it will
//return false.
bool convert_string_to_long( char str[], long *num )
{
    long result;
    char *endptr;

    //attempt to convert string to number
    errno = 0;
    result = strtol( str, &endptr, 10 );
    if ( endptr == str )
    {
        return false;
    }

    //verify that no range error occurred
    if ( errno == ERANGE )
    {
        return false;
    }

    //verify that there are no unconverted characters, or that if
    //such characters do exist, that they are all whitespace
    //characters
    for ( ; *endptr != '\0'; endptr++ )
    {
        if ( !isspace( (unsigned char)*endptr ) )
        {
            return false;
        }
    }

    //everything went ok, so pass the result
    *num = result;
    return true;
}

Here is a complete working example program:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <stdbool.h>
#include <errno.h>

//forward declarations
bool convert_string_to_long( char str[], long *num );
void get_line_from_user( const char prompt[], char buffer[], int buffer_size );

int main( void )
{
    //repeat forever
    for (;;)
    {
        char line[200];
        long num;

        //get input from user
        get_line_from_user(
            "Please enter an integer: ",
            line, sizeof line
        );

        //attempt to convert input to a number
        if ( convert_string_to_long( line, &num ) )
        {
            printf( "Input was successfully converted to: %ld\n", num );
        }
        else
        {
            printf( "Input was invalid!\n" );
        }
    }
}

//This function will return true if the entire string was
//successfully converted to a long integer, otherwise it will
//return false.
bool convert_string_to_long( char str[], long *num )
{
    long result;
    char *endptr;

    //attempt to convert string to number
    errno = 0;
    result = strtol( str, &endptr, 10 );
    if ( endptr == str )
    {
        return false;
    }

    //verify that no range error occurred
    if ( errno == ERANGE )
    {
        return false;
    }

    //verify that there are no unconverted characters, or that if
    //such characters do exist, that they are all whitespace
    //characters
    for ( ; *endptr != '\0'; endptr++ )
    {
        if ( !isspace( (unsigned char)*endptr ) )
        {
            return false;
        }
    }

    //everything went ok, so pass the result
    *num = result;
    return true;
}

//This function will read exactly one line of input from the
//user. It will remove the newline character, if it exists. If
//the line is too long to fit in the buffer, then the function
//will automatically reprompt the user for input. On failure,
//the function will never return, but will print an error
//message and call "exit" instead.
void get_line_from_user( const char prompt[], char buffer[], int buffer_size )
{
    for (;;) //infinite loop, equivalent to while(1)
    {
        char *p;

        //prompt user for input
        fputs( prompt, stdout );

        //attempt to read one line of input
        if ( fgets( buffer, buffer_size, stdin ) == NULL )
        {
            printf( "Error reading from input!\n" );
            exit( EXIT_FAILURE );
        }

        //attempt to find newline character
        p = strchr( buffer, '\n' );

        //make sure that entire line was read in (i.e. that
        //the buffer was not too small to store the entire line)
        if ( p == NULL )
        {
            int c;

            //a missing newline character is ok if the next
            //character is a newline character or if we have
            //reached end-of-file (for example if the input is
            //being piped from a file or if the user enters
            //end-of-file in the terminal itself)
            if ( (c=getchar()) != '\n' && !feof(stdin) )
            {
                if ( ferror(stdin) )
                {
                    printf( "Error reading from input!\n" );
                    exit( EXIT_FAILURE );
                }

                printf( "Input was too long to fit in buffer!\n" );

                //discard remainder of line
                do
                {
                    c = getchar();

                    if ( ferror(stdin) )
                    {
                        printf( "Error reading from input!\n" );
                        exit( EXIT_FAILURE );
                    }

                } while ( c != '\n' && c != EOF );

                //reprompt user for input by restarting loop
                continue;
            }
        }
        else
        {
            //remove newline character by overwriting it with
            //null character
            *p = '\0';
        }

        //input was ok, so break out of loop
        break;
    }
}

This program has the following behavior:

Please enter an integer: junk
Input was invalid!
Please enter an integer: 6junk
Input was invalid!
Please enter an integer: 60000000000000000000
Input was invalid!
Please enter an integer: 6
Input was successfully converted to: 6

As you can see, the number 60000000000000000000, which is too large to be represented as a long int on most platforms, was correctly rejected.