
Re: Convert native character string to ASCII array of integers

Subject: Re: Convert native character string to ASCII array of integers
From: Tomás Ó hÉilidhe
Date: Fri, 28 Mar 2008 06:09:28 -0700 PDT
Newsgroups: comp.lang.c

Richard Heathfield:

> #include <string.h>
>
> void MakeASCII(unsigned char *pos,char const *pc)
> {
>   const char *bcs =
>     " !\"#$%&'()*+,-./0123456789:;<=>?@"
>     "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
>     "[\\]^_`"
>     "abcdefghijklmnopqrstuvwxyz"
>     "{|}~";
>   const char *cur = NULL;
>
>   while(*pc != '\0')
>   {
>     cur = strchr(bcs, *pc);
>     if(cur != NULL)
>     {
>       *pos++ = (cur - bcs) + 32;
>     }
>     else
>     {
>       *pos++ = *pc;
>     }
>     ++pc;
>   }
>   *pos = '\0';
>
> }
>
> If you hit performance issues with that one, consider this alternative:
>
> #include <string.h>
> #include <limits.h>
>
> void MakeASCII(unsigned char *pos,char const *pc)
> {
>   const char *bcs =
>     " !\"#$%&'()*+,-./0123456789:;<=>?@"
>     "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
>     "[\\]^_`"
>     "abcdefghijklmnopqrstuvwxyz"
>     "{|}~";
>   static char att[UCHAR_MAX + 1] = {0};
>
>   const char *cur = bcs;
>   int i = 0;
>
>   if(att[' '] != 32) /* do we need to set up the array? */
>   {
>     /* defaults */
>     for(i = 0; i < UCHAR_MAX + 1; i++)
>     {
>       att[i] = (char)i;
>     }
>
>     /* known ASCII characters */
>     i = 32;
>     while(*cur != '\0')
>     {
>       att[*cur++] = i++;
>     }
>   }
>
>   while((*pos++ = att[(unsigned char)*pc++]) != '\0')
>   {
>     continue;
>   }
>
> }


Very nice, the look-up method hadn't crossed my mind.
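
One thing to watch with a table like this is that plain char may be signed, so indexing with a raw char could go negative for characters outside the basic set. Below is a minimal sketch of the look-up idea that indexes through unsigned char throughout. The name MakeASCIITable and the "ready" flag are my own inventions for illustration, not part of Richard's code.

```c
#include <limits.h>  /* UCHAR_MAX */
#include <stddef.h>  /* size_t    */

/* Hypothetical name; a sketch of the look-up approach, not a drop-in. */
void MakeASCIITable(unsigned char *pos, char const *pc)
{
    static char const bcs[] =
      " !\"#$%&'()*+,-./0123456789:;<=>?@"
      "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
      "[\\]^_`"
      "abcdefghijklmnopqrstuvwxyz"
      "{|}~";
    static unsigned char att[UCHAR_MAX + 1];
    static int ready = 0;   /* assumed flag: has the table been built yet? */
    size_t i;

    if (!ready)
    {
        /* default: unknown characters pass through unchanged */
        for (i = 0; i < sizeof att; i++)
            att[i] = (unsigned char)i;

        /* known characters: position in bcs, offset by 32 (ASCII space) */
        for (i = 0; bcs[i] != '\0'; i++)
            att[(unsigned char)bcs[i]] = (unsigned char)(32 + i);

        ready = 1;
    }

    /* att[0] is 0, so the terminator is copied and ends the loop */
    while ((*pos++ = att[(unsigned char)*pc++]) != '\0')
        continue;
}
```

Calling it with the source string "Az~" should leave 65, 122, 126, 0 in the output buffer, whatever the native character set is.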

If we can be sure that all characters will be valid ASCII characters
then we can do the following:

#include <string.h>    /* strchr */
#include <stdio.h>     /* puts   */

typedef char OctetStorage;

void MakeASCII(OctetStorage *pos,char const *pc)
{
    static char const ascii[] =
      " !\"#$%&'()*+,-./0123456789:;<=>?@"
      "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
      "[\\]^_`"
      "abcdefghijklmnopqrstuvwxyz"
      "{|}~";


    for ( ; *pc; ++pos, ++pc)
       *pos = strchr(ascii,*pc) - ascii + 32;   /* 32 is ASCII space */

    *pos = 0;
}

int main(void)
{
    char hello[] = "hello";

    MakeASCII(hello,hello);   /* converts in place */

    puts(hello);              /* readable only where the native set is ASCII */

    return 0;
}

I wasn't sure whether I could replace:

    for ( ; *pc; ++pos, ++pc)
       *pos = strchr(ascii,*pc) - ascii + 32;

with:

    while (*pc) *pos++ = strchr(ascii,*pc++) - ascii + 32;

I thought there might be a sequence point violation if pos and pc
point to the same object.
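
As far as I can tell, the compact form is safe even when both pointers refer to the same buffer. pos and pc are separate pointer objects, each modified only once in the expression, and the character is read through *pc only to work out the value that is then stored through *pos. Here is a minimal sketch of that form, assuming every input character appears in the table so that strchr never returns NULL. The name MakeASCIICompact is made up for this example.

```c
#include <string.h>  /* strchr */

/* Hypothetical name; compact variant of the loop, usable in place. */
void MakeASCIICompact(char *pos, char const *pc)
{
    static char const ascii[] =
      " !\"#$%&'()*+,-./0123456789:;<=>?@"
      "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
      "[\\]^_`"
      "abcdefghijklmnopqrstuvwxyz"
      "{|}~";

    /* Assumption: *pc is always found in ascii. The old character is read
       to compute the new value before anything is stored through pos. */
    while (*pc) *pos++ = (char)(strchr(ascii, *pc++) - ascii + 32);

    *pos = 0;
}
```

Run in place on a buffer holding "hello", this should leave the codes 104, 101, 108, 108, 111 followed by the terminating 0.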
