Convert-wchar

Converts multibyte strings (mbs) into wide character strings.

TODO: Remove this module. Replace it either with another implementation (see iconv(3)) or with an external process that converts local character sets into UTF-8. Consider the 32-bit Unicode support in C11.

Summary
Convert-wchar: Converts multibyte strings (mbs) into wide character strings.
Copyright: This program is free software.
Files
C-kern/api/string/convertwchar.h: Header file of Convert-wchar.
C-kern/string/convertwchar.c: Implementation file of Convert-wchar.
Types
struct convert_wchar_t: Exports convert_wchar_t.
Functions
test
unittest_string_convertwchar: Tests convert_wchar_t.
convert_wchar_t: Supports conversion into wchar_t from multibyte characters.
len: Stores the size (in bytes) of the unconverted mb character sequence.
next: Points to the unconverted mb character sequence of len bytes.
internal_state: Internal state used in conversion.
lifetime
convert_wchar_INIT: Static initializer, same as init_convertwchar.
convert_wchar_FREE: Static initializer.
init_convertwchar: Inits convert_wchar_t with a pointer to a mbs string.
initcopy_convertwchar: Copies state from source to dest.
free_convertwchar: Frees resources associated with convert_wchar_t.
query
currentpos_convertwchar: Returns the position in the multibyte string at which the next conversion starts.
read
next_convertwchar: Converts the next complete mb character sequence into a wide character.
skip_convertwchar: Skips the next char_count characters.
peek_convertwchar: Converts the next char_count characters without modifying conv.
inline implementation
Macros
free_convertwchar: Implements convert_wchar_t.free_convertwchar as a no-op.
currentpos_convertwchar: Implements convert_wchar_t.currentpos_convertwchar.
next_convertwchar: Implements convert_wchar_t.next_convertwchar.

Copyright

This program is free software.  You can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for more details.

Author

© 2011 Jörg Seebohn

Files

C-kern/api/string/convertwchar.h

Header file of Convert-wchar.

C-kern/string/convertwchar.c

Implementation file of Convert-wchar.

Types

struct convert_wchar_t

typedef struct convert_wchar_t convert_wchar_t

Exports convert_wchar_t.

test

unittest_string_convertwchar

extern int unittest_string_convertwchar(void)

Tests convert_wchar_t.

convert_wchar_t

struct convert_wchar_t

Supports conversion into wchar_t from multibyte characters.  This object holds the state needed to convert a multibyte character sequence into the corresponding sequence of wide characters.

Summary
len: Stores the size (in bytes) of the unconverted mb character sequence.
next: Points to the unconverted mb character sequence of len bytes.
internal_state: Internal state used in conversion.
lifetime
convert_wchar_INIT: Static initializer, same as init_convertwchar.
convert_wchar_FREE: Static initializer.
init_convertwchar: Inits convert_wchar_t with a pointer to a mbs string.
initcopy_convertwchar: Copies state from source to dest.
free_convertwchar: Frees resources associated with convert_wchar_t.
query
currentpos_convertwchar: Returns the position in the multibyte string at which the next conversion starts.
read
next_convertwchar: Converts the next complete mb character sequence into a wide character.
skip_convertwchar: Skips the next char_count characters.
peek_convertwchar: Converts the next char_count characters without modifying conv.

len

size_t len

Stores size (in bytes) of the unconverted mb character sequence.

next

const char * next

Points to unconverted mb character sequence of len bytes.

internal_state

mbstate_t internal_state

Internal state used in conversion.

lifetime

convert_wchar_INIT

#define convert_wchar_INIT(
   string_len,
   string
) { .len = (string_len), .next = (string) }

Static initializer, same as init_convertwchar.

Parameters

string_len: Describes the length in bytes of string.  It is of type (size_t).
string: Points to a multibyte character sequence of type (const char*).

Example usage

const char * mbs = "..." ;
wchar_t      wchar ;
convert_wchar_t wconv = convert_wchar_INIT(strlen(mbs), mbs) ;
/* End of input also returns 0, with wchar set to 0, so test wchar as well. */
while (0 == next_convertwchar(&wconv, &wchar) && wchar != 0) { ... }

convert_wchar_FREE

#define convert_wchar_FREE convert_wchar_INIT(0,
)

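Static initializer for the freed state.  It expands to convert_wchar_INIT with a length of 0, so the converter has no remaining input.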

init_convertwchar

extern int init_convertwchar(/*out*/convert_wchar_t *conv,
size_t input_len,
const char *input_string)

Inits convert_wchar_t with a pointer to a mbs string.

initcopy_convertwchar

extern int initcopy_convertwchar(/*out*/convert_wchar_t * restrict dest,
const convert_wchar_t * restrict source)

Copies state from source to dest.  This works only if destination memory does not overlap with source memory.

free_convertwchar

extern int free_convertwchar(convert_wchar_t *conv)

Frees resources associated with convert_wchar_t.

query

currentpos_convertwchar

extern const char * currentpos_convertwchar(convert_wchar_t *conv)

Returns the position in the multibyte string at which the next conversion starts.

read

next_convertwchar

extern int next_convertwchar(convert_wchar_t *conv,
wchar_t *next_wchar)

Converts the next complete mb character sequence into a wide character.  The resulting wide character is stored at *next_wchar.  After an error, no further conversion should be attempted because the internal state is undefined.  The end of input is signaled by returning 0 with *next_wchar set to 0.

Returns

0: Conversion was successful and *next_wchar contains a valid wide character.
EILSEQ: Conversion encountered an illegal or incomplete mb character sequence.
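
The return values above can be illustrated with a function form of the conversion step.  This is a minimal sketch, not the module's code: sketch_conv_t and sketch_next are hypothetical names, and the struct only mirrors the three documented fields.  It relies on the standard mbrtowc, which reports errors as (size_t)-1 or (size_t)-2; both are larger than any valid len.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-in for convert_wchar_t (same three documented fields). */
typedef struct sketch_conv_t {
   size_t       len ;            /* bytes not yet converted */
   const char * next ;           /* start of the unconverted bytes */
   mbstate_t    internal_state ; /* shift state carried between calls */
} sketch_conv_t ;

/* Converts one character; the name is illustrative only. */
static int sketch_next(sketch_conv_t * conv, wchar_t * next_wchar)
{
   assert(conv != NULL && next_wchar != NULL) ;

   if (conv->len == 0) {
      *next_wchar = 0 ;          /* end of input: success with a 0 character */
      return 0 ;
   }

   size_t bytes = mbrtowc(next_wchar, conv->next, conv->len, &conv->internal_state) ;

   /* (size_t)-1 (illegal) and (size_t)-2 (incomplete) both exceed len */
   if (bytes > conv->len) return EILSEQ ;

   conv->len  -= bytes ;
   conv->next += bytes ;
   return 0 ;
}
```

Because end of input and success share the return value 0, a caller has to test the returned character against 0 to leave its loop.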

skip_convertwchar

extern int skip_convertwchar(convert_wchar_t *conv,
size_t char_count)

Skips the next char_count characters.  A character may be represented as more than one byte.

Returns

0: The input pointer was moved forward until char_count characters were skipped.
EILSEQ: Conversion encountered an illegal or incomplete mb character sequence.
ENODATA: The input pointer was moved forward until the end of input was reached, but the number of skipped characters was less than char_count.
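
The three outcomes can be sketched as a loop over mbrtowc.  The names sketch_conv_t and sketch_skip are hypothetical, and counting an embedded '\0' as one character is an assumption of this sketch; the module's real implementation may differ.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-in for convert_wchar_t (same three documented fields). */
typedef struct sketch_conv_t {
   size_t       len ;
   const char * next ;
   mbstate_t    internal_state ;
} sketch_conv_t ;

/* Skips char_count characters, each of which may span several bytes. */
static int sketch_skip(sketch_conv_t * conv, size_t char_count)
{
   assert(conv != NULL) ;

   for (size_t i = 0 ; i < char_count ; ++i) {
      if (conv->len == 0) return ENODATA ;   /* input ended too early */

      /* a NULL destination makes mbrtowc measure the character only */
      size_t bytes = mbrtowc(NULL, conv->next, conv->len, &conv->internal_state) ;
      if (bytes > conv->len) return EILSEQ ; /* illegal or incomplete sequence */

      if (bytes == 0) bytes = 1 ;            /* assumption: '\0' is one character */
      conv->len  -= bytes ;
      conv->next += bytes ;
   }

   return 0 ;
}
```

In this sketch the input pointer stays at the end of input after ENODATA, which matches the description above.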

peek_convertwchar

extern int peek_convertwchar(const convert_wchar_t *conv,
size_t char_count,
wchar_t *wchar_array)

Converts the next char_count characters without modifying conv.  If an error occurs before all characters are converted, wchar_array is only partially initialized.  Before returning, conv is restored to the state it had before the call.

Returns

0: wchar_array contains char_count valid characters.  If the input string contains fewer than char_count characters, the missing characters are set to L'\0'.
EILSEQ: Conversion encountered an illegal or incomplete mb character sequence.
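
One way to get this behavior is to convert from a local copy of the state, so the caller's object is never touched.  The sketch below takes that approach; the text above describes restoring conv instead, so this is a swapped-in technique and not necessarily how the module does it.  The names sketch_conv_t and sketch_peek are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-in for convert_wchar_t (same three documented fields). */
typedef struct sketch_conv_t {
   size_t       len ;
   const char * next ;
   mbstate_t    internal_state ;
} sketch_conv_t ;

/* Converts up to char_count characters without advancing *conv. */
static int sketch_peek(const sketch_conv_t * conv, size_t char_count, wchar_t * wchar_array)
{
   assert(conv != NULL && wchar_array != NULL) ;

   sketch_conv_t copy = *conv ;   /* all changes happen on this copy */

   for (size_t i = 0 ; i < char_count ; ++i) {
      if (copy.len == 0) {
         wchar_array[i] = L'\0' ; /* pad missing characters with L'\0' */
         continue ;
      }

      size_t bytes = mbrtowc(&wchar_array[i], copy.next, copy.len, &copy.internal_state) ;
      if (bytes > copy.len) return EILSEQ ; /* array is partially filled */

      if (bytes == 0) bytes = 1 ; /* assumption: '\0' is one character */
      copy.len  -= bytes ;
      copy.next += bytes ;
   }

   return 0 ;
}
```

Copying is cheap here because the state consists of a length, a pointer, and an mbstate_t.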

Macros

free_convertwchar

#define free_convertwchar(conv) (0)

Implements convert_wchar_t.free_convertwchar as a no op.

currentpos_convertwchar

#define currentpos_convertwchar(conv) ((conv)->next)

Implements convert_wchar_t.currentpos_convertwchar.

next_convertwchar

#define next_convertwchar(conv, next_wchar) \
   (__extension__ ({ \
      size_t bytes = (size_t)((conv)->len \
         ? mbrtowc(next_wchar, (conv)->next, (conv)->len, &(conv)->internal_state) \
         : (unsigned)(*(next_wchar) = 0)) ; \
      int _result_ ; \
      if (bytes > (conv)->len) { \
         _result_ = EILSEQ ; \
      } else { \
         (conv)->len  -= bytes ; \
         (conv)->next += bytes ; \
         _result_ = 0 ; \
      } \
      _result_ ; \
   }))

Implements convert_wchar_t.next_convertwchar.
