MDN Library

Function Overview
Module List
Details of Modules

Function Overview

The MDN library (libmdn) is a group of modules that provide various processing with respect to multilingual domain name conversion. This library provides the following features.

Encoding (code set) conversion
Normalization of character strings
Analysis and reassembly of DNS messages
Matching, removal and addition of ZLD (Zero Level Domain)
Local encoding identification
Loading of client configuration files

Encoding (code set) conversion

Converts character string encoding and returns the result. Inside the MDN library, character strings are all handled as UTF-8 encoding. This module supports the following conversions.

Conversion from certain encoding methods to UTF-8
Conversion from UTF-8 to certain encoding methods

Encoding is roughly divided into the following two types.

Encoding used by applications (shift JIS, EUC, etc.)
Special encoding designed to be used for multilingual domain names (UTF-5, RACE etc.)

This module uses the iconv() utility for the first encoding conversion process and implements a unique conversion function for the latter encoding conversion.

Normalization of character strings

Normalizes given character strings. The following standard normalization functions are supported.

Converts lowercase ASCII to uppercase ASCII
Converts uppercase ASCII to lowercase ASCII
Converts lowercase to uppercase as specified in UnicodeData.txt
Converts uppercase to lowercase as specified in UnicodeData.txt
Unicode Normalization Form C
Unicode Normalization Form KC
Converts single-byte Japanese katakana to double-byte katakana
Converts double-byte minus symbols to single-byte hyphens
Converts Japanese periods (。) and double-byte periods (．) to periods (.)

Analysis and assembly of DNS messages

In the DNS proxy server (dnsproxy), encoded domain names included in DNS messages sent from the client are converted and normalized and the result is sent to the DNS server. This process is comprised of the following functions:

Analyzes DNS messages and extracts domain names
Re-constructs DNS messages using converted domain names

Matching, removal and addition of ZLD (Zero Level Domain)

To identify multilingual domain names, the following ZLD-related functions are provided:

Of the multiple number of ZLDs, finds the one that matches the domain name
Removes the ZLD portion from the domain name
Adds ZLD to the domain name

Local encoding identification

Automatically identifies the local encoding (code set) used by the application program. Basically, the application locale information is used, though the local encoding (code set) can also be specified using an environmental variable.

Loading of client configuration file

When the resolver library linked by the application is used to perform conversion or normalization, the encoding and normalization method to be used is described in the configuration file. A function is provided to load this file.

Module list

The MDN library consists of the following modules.

brace module: Conversion module for the proposed BRACE encoding domain name encoding method
converter module: Conversion module for character string encoding (code set)
debug module: Utility module for debug output
dn module: Extraction/compression module for domain names inside DNS messages
lace module: Conversion module for the proposed LACE encoding domain name encoding method
localencoding module: Guesses which encoding is used by the application
log module: Controls MDN library log output processing
msgheader module: Analyzes the header of the DNS message
msgtrans module: Converts the DNS message at the DNS proxy server
normalizer module: Normalizes character strings
race module: Conversion module for the proposed RACE encoding domain name encoding method
res module: Conversion module for proposed RACE domain name encoding method
resconf module: Provides an interface to perform encoding conversion or normalization of domain names by the resolver library or application
result module: Handles the result code returned by each library function
selectiveencode module: Finds domain names that include non-ASCII characters
strhash module: Implements a hash table that uses character strings as keys
translator module: Converts domain name according to the specified parameters
unicode module: Obtains various Unicode character properties
unormalize module: Performs standard normalization defined by Unicode
utf5 module: Loads the configuration file used by the resolver library or application during encoding conversion and normalization of domain names
utf8 module: Performs basic processing of UTF-8 encoding character strings
util module: Provides common functions used by other modules
ZLDrule module: Matches domain names and ZLD

The following diagram shows the invoking relationship of modules. debug and log modules called by most modules and util modules that store common functions are omitted in the diagram.

Details of Modules

The specifications of all modules included in MDN library are explained below. First, return values of functions commonly used by the modules are explained and then each module is discussed in detail.

Values returned by API functions

Almost all API functions of the MDN library return values of mdn_result_t, which is an enumeration type value. The values and their meanings are explained below.

mdn_success: Processing was successful.
mdn_notfound: The target of search processing could not be found.
mdn_invalid_encoding: Incorrect conversion of encoded input character string.
mdn_invalid_syntax: Incorrect file format.
mdn_invalid_name: Specified name is incorrect.
mdn_invalid_message: Entered DNS message is incorrect.
mdn_buffer_overflow: Insufficient buffer to store result.
mdn_noentry: Specified item does not exist.
mdn_nomemory: Memory allocation failed.
mdn_nofile: Specified file does not exist.
mdn_nomapping: Conversion could not be performed correctly because a character in the encoded character string (code set) does not exist in the target conversion character set.
mdn_context_required: Indicates that context information is required to correctly convert uppercase characters to lowercase characters.
mdn_failure: Indicates that an error occurred that does not fall into any of the above categories.

brace module

The brace module performs conversion between UTF-8 and the proposed BRACE encoding of multilingual domain names. This module is implemented as a low-order converter module, and is not directly called by the application. When converter module is requested in association with BRACE encoding conversion, this module is indirectly called.

This module provides the following API functions.

mdn__brace_open

mdn_result_t
mdn__brace_open(mdn_converter_t ctx, mdn_converter_dir_t dir)

Opens conversion context used for BRACE encoding. Actually, this does not do anything.

Always returns mdn_success.

mdn__brace_close

mdn_result_t
mdn__brace_close(mdn_converter_t ctx, mdn_converter_dir_t dir)

Closes conversion context used for BRACE encoding. Actually, this does not do anything.

Always returns mdn_success.

mdn__brace_convert

mdn_result_t
mdn__brace_convert(mdn_converter_t ctx, mdn_converter_dir_t dir,
	const char *from, char *to, size_t tolen)

Performs bidirectional conversion of BRACE and UTF-8 encoded character strings. The from input character string is converted and the result is written in the area specified by to and tolen. When dir is mdn_converter_l2u, BRACE strings are converted to UTF-8 encoding and when dir is mdn_converter_u2l, UTF-8 strings are converted to BRACE encoding.

One of the following values is returned:
mdn_success
mdn_buffer_overflow
mdn_invalid_encoding
mdn_nomemory

converter module

converter module converts character string encoding (code set). Because the MDN library uses UTF-8 character strings for internal processing, this module performs bidirectional conversion between the local encoding method and UTF-8.

Support is currently provided for the following encoding methods.

iconv()encoding support
The iconv() function provides general code set conversion functions and encoding support. The encoding methods supported by iconv() are implementation-dependent; in that regard, refer to the documentation included with iconv() for information on which encoding is actually available.
UTF-5
Proposed multilingual domain name encoding method. For details, refer to draft-jseng-utf5-01.txt
RACE
Proposed multilingual domain name encoding method. For details, refer to draft-ietf-idn-race-02.txt
BRACE
Proposed multilingual domain name encoding method. For details, refer to draft-ietf-idn-brace-00.txt
LACE
Proposed multilingual domain name encoding method. For details, refer to draft-ietf-idn-lace-00.txt

The converter module is specially designed for encoding conversion of domain names and is not suitable for general encoding conversion. For example, UTF-5, RACE, BRACE, and LACE encoding provide special handling of the delimiting periods used in domain names.

The converter module employs the "code conversion context" concept. When perform bidirectional conversion between a specific encoding method and UTF-8, first the code conversion context of that encoding is created. For actual code conversion, the encoding is not directly specified; instead this code conversion context is specified. The code conversion context is mdn_converter_t and is defined as the following opaque type.

typedef struct mdn_converter *mdn_converter_t;

This module provides the following API functions.

mdn_converter_initialize

mdn_result_t
mdn_converter_initialize(void)

Initializes the module. This function is always called before calling other API functions of this module.

One of the following values is returned:
mdn_success
mdn_nomemory

mdn_converter_create

mdn_result_t
mdn_converter_create(const char *name, mdn_converter_t *ctxp,
        int delayedopen)

Creates the code conversion context used for conversion between the local encoding specified by name and UTF-8, then initializes and stores it in the area specified by ctxp.

Currently provides UTF-5, RACE, BRACE, and LACE conversion functions. For encoding methods other than those listed above, conversion is performed using the iconv() utility provided with the system. In such a case, when this function is invoked iconv_open() is called. When delayedopen is true, calling of iconv_open() is delayed until the character string is actually converted.

In addition, mdn_converter_register can be also used to add new local encoding methods.

One of the following values is returned:
mdn_success
mdn_invalid_name
mdn_nomemory
mdn_failure

mdn_converter_destroy

void
mdn_converter_destroy(mdn_converter_t ctx)

Deletes the code conversion context created by mdn_converter_create and releases the allocated memory.

mdn_converter_convert

mdn_result_t
mdn_converter_convert(mdn_converter_t ctx,
	mdn_converter_dir_t dir, const char *from,
	char *to, size_t tolen)

Uses the code conversion context created by mdn_converter_create to perform code conversion of character strings and stores the result in to. tolen is the length of to. dir is used to specify the direction of conversion.

mdn_converter_l2u specifies the conversion method used to convert from local encoding to UTF-8.
mdn_converter_u2l the conversion method used to convert from UTF-8 to the local encoding method.

Unlike iconv(), when status-dependent encoding such as ISO-2022-JP is used, the status that is in effect when the function is called the first time is not maintained when this function is called the next time. Conversion starts from the initial status each time.

One of the following values is returned:
mdn_success
mdn_buffer_overflow
mdn_invalid_encoding
mdn_invalid_name
mdn_nomemory
mdn_failure

mdn_converter_localencoding

char *
mdn_converter_localencoding(mdn_converter_t ctx)

Returns the local encoding name of the code conversion context ctx.

mdn_converter_isasciicompatible

int
mdn_converter_isasciicompatible(mdn_converter_t ctx)

Returns whether or not the local encoding of the code conversion context ctxis ASCII-compatible. When the local encoding method is ASCII-compatible encoding, a value other than 0 is returned and if not, 1 is returned.

ASCII-compatible encoding consists of only alphenumeric characters and hyphens, meaning it is not possible to differentiate between domain names encoded using this encoding and standard ASCII domain names. Specifically, RACE encoding is of this type. These types of encoding are not generally used for local encoding by applications but are strong candidates for the encoding used to express domain names in the DNS protocol (because conventional DNS servers can be used without modification).

mdn_converter_addalias

mdn_result_t
mdn_converter_addalias(const char *alias_name, const char *real_name)

Used to register the alias alias_name for the encoding name real_name . Registered aliases can be specified in the name argument of mdn_converter_create.

One of the following values is returned:
mdn_success
mdn_nomemory

mdn_converter_aliasfile

mdn_result_t
mdn_converter_aliasfile(const char *path)

Loads the file specified by the path variable and registers the alias in accordance with the contents of the file.

The file path is a text file consisting of the following simple format.

     Alias   Formal name

In addition, comment lines begin with #.

One of the following values is returned:
mdn_success
mdn_nofile
mdn_invalid_syntax
mdn_nomemory

mdn_converter_resetalias

mdn_result_t
mdn_converter_resetalias(void)

Resets aliases registered using mdn_converter_addalias or mdn_converter_aliasfile to the initial default status (where no aliases are registered).

One of the following values is returned:
mdn_success
mdn_nomemory

mdn_converter_register

mdn_result_t
mdn_converter_register(const char *name,
	mdn_converter_openproc_t open,
	mdn_converter_closeproc_t close,
	mdn_converter_convertproc_t convert,
	int ascii_compatible)

Adds the encoding conversion function between the name local encoding method and UTF-8. open, close, and convert are used as pointers to processing functions such as conversion. 1 specifies ascii_compatible local encoding, 0 that local encoding is not ASCII compatible.

One of the following values is returned:
mdn_success and
mdn_nomemory

debug module

The debug module is a utility module for debug output.

This module provides the following API functions.

mdn_debug_hexstring

char *
mdn_debug_hexstring(const char *s, int maxbytes)

Returns a hexidecimal character string of s length. maxbytes indicates the maximum length expressed and when s exceeds that length, ... is appended to the string at that point.

The memory area allocated for the returned character string is used for the static variable held by this function and is in effect until the function is called the next time.

mdn_debug_xstring

char *
mdn_debug_xstring(const char *s, int maxbytes)

Of the s character strings, returns in \x{HH} format those character strings 128 bytes or larger. maxbytes indicates the maximum length expressed and when s exceeds this, ... is appended to the string at that point.

The memory area allocated for the returned character string is used for the static variable held by this function and is in effect until the function is called the next time.

mdn_debug_hexdata

char *
mdn_debug_hexdata(const char *s, int length, int maxlength)

Returns the length of byte row s in hexadecimal character strings. maxbytes indicates the maximum length expressed and when length exceeds this, ... is appended to the string at that point.

The memory area allocated for the returned character string is used for the static variable held by this function and is in effect until the function is called the next time.

mdn_debug_hexdump

void
mdn_debug_hexdump(const char *s, int length)

The standard error output is comprised of a hexidecimal dump of length of byte row s.

dn module

The dn module expands or compresses domain names in DNS messages. This provides the functional equivalent of res_comp and res_expand in the resolver library.

This module was designed under the assumption that it would only used by only other modules in the libary.

When a domain name is compressed, context information of type mdn__dn_t is used, as shown below:

#define MDN_DN_NPTRS	64
typedef struct {
	const unsigned char *msg;
	int cur;
	int offset[MDN_DN_NPTRS];
} mdn__dn_t;

This module provides the following API functions.

mdn__dn_expand

mdn_result_t
mdn__dn_expand(const char *msg, size_t msglen,
	const char *compressed, char *expanded,
	size_t buflen, size_t *complenp)

Expands the compressed domain name in DNS message msg of length msglen and stores the result in expanded. buflen is the size of expanded. Also, the length of compressed is stored in *complenp.

One of the following values is returned:
mdn_success
mdn_buffer_overflow
mdn_invalid_message

mdn__dn_initcompress

void
mdn__dn_initcompress(mdn__dn_t *ctx, const char *msg)

Initializes context information ctx for domain name compression. This function must be called before calling mdn__dn_compress. msg is the leading address in a DNS message where the compressed domain name is stored.

mdn__dn_compress

mdn_result_t
mdn__dn_compress(const char *name, char *sptr, size_t length,
	mdn__dn_t *ctx, size_t *complenp)

Compresses the domain name indicated by name and stores it in the location indicated by sptr. length is the length of available space sptr. When compression is performed, the previously compressed domain name information in ctx is referenced. The length of the compressed domain name is placed in complenp and also the information necessary for compression is added to ctx.

One of the following values is returned:
mdn_success
mdn_buffer_overflow
mdn_invalid_name

lace module

The lace module performs conversion between UTF-8 and the proposed LACE multilingual domain name encoding method. This module is implemented as a low-order converter module, and is not directly called by the application. When the converter module is requested for conversion with LACE encoding, this module is indirectly called.

This module provides the following API functions.

mdn__lace_open

mdn_result_t
mdn__lace_open(mdn_converter_t ctx, mdn_converter_dir_t dir)

Opens conversion context with LACE encoding. Actually, this does not do anything.

Always returns mdn_success.

mdn__lace_close

mdn_result_t
mdn__lace_close(mdn_converter_t ctx, mdn_converter_dir_t dir)

Closes conversion context with LACE encoding. Actually, this does not do anything.

Always returns mdn_success.

mdn__lace_convert

mdn_result_t
mdn__lace_convert(mdn_converter_t ctx, mdn_converter_dir_t dir,
	const char *from, char *to, size_t tolen)

Provides bidirectional conversion between LACE character strings and UTF-8 character strings. The from input character string is converted and the result is written in the area specified by to and tolen. When dir is mdn_converter_l2u, LACE encoding is converted to UTF-8 encoding. When it is mdn_converter_u2l, UTF-8 encoding is converted to LACE encoding.

One of the following values is returned:
mdn_success
mdn_buffer_overflow
mdn_invalid_encoding
mdn_nomemory

localencoding module

The localencoding module uses locale information to guess the encoding used by the application.

This module provides the following API functions.

mdn_localencoding_name

const char *
mdn_localencoding_name(void)

Guesses the type of encoding used by the application (the name passed to mdn_converter_create())and returns it based on the current locale information.

To guess the type of encoding, nl_langinfo() is used if it is available in the the system and if not, setlocale() or environmental variable information is used. In the latter case, the corrent encoding name may not be obtained.

When MDN_LOCAL_CODESET environmental variable is defined in order to deal with situations in which the correct encoding cannot be guessed from the locale information or the application is operating using different encoding than that the locale, this module returns the value of that variable as the encoding name regardless of the application locale.

log module

log module controls MDN library log output. A standard error output log is written by default. It can, however, be changed to another output method by registering the handler.

The log level can be set as well. The following five log levels are defined.

enum {
	mdn_log_level_fatal   = 0,
	mdn_log_level_error   = 1,
	mdn_log_level_warning = 2,
	mdn_log_level_info    = 3,
	mdn_log_level_trace   = 4,
	mdn_log_level_dump    = 5
};

This module provides the following API functions.

mdn_log_fatal

void
mdn_log_fatal(const char *fmt, ...)

Outputs a fatal level log. This level is used when a fatal error occurs that causes problems such as when program execution cannot be performed. Arguments are specified using the same format as printf.

mdn_log_error

void
mdn_log_error(const char *fmt, ...)

Outputs the error level log. This level is used when an error occurs that is not fatal. Arguments are specified using the same format as printf.

mdn_log_warning

void
mdn_log_warning(const char *fmt, ...)

Outputs a warning level log. This level is used to display a warning message. Arguments are specified using the same format as printf.

mdn_log_info

void
mdn_log_info(const char *fmt, ...)

Outputs info level log. This level is not used for errors but instead to output other potentially useful information. Arguments are specified using the same format as printf.

mdn_log_trace

void
mdn_log_trace(const char *fmt, ...)

Outputs the trace level log. This level is used to output API function trace information. Generally, this log does not need to be recorded for purposes other than debugging the library. The arguments are specified using the same format as printf.

mdn_log_dump

void
mdn_log_dump(const char *fmt, ...)

Outputs the dump level log. This level is used to output additional packet data dump for debugging. Generally, this level of log does not need to be recorded for purposes other than debugging the library. The arguments are specified using the same format as for printf.

mdn_log_setlevel

void
mdn_log_setlevel(int level)

Sets the level of log output. Logs higher than the set level are not output. When the log level is not specified with this function, the integer value set to the MDN_LOG_LEVEL environmental variable is used.

mdn_log_getlevel

int
mdn_log_getlevel(void)

Obtains and returns the integer value for the current level of log output.

mdn_log_setproc

void
mdn_log_setproc(mdn_log_proc_t proc)

Used to set the log output handler. proc is a pointer to the handler function. When the handler is not specified or NULL is specified for proc, a standard error log is output.

The mdn_log_proc_t handler type is defined as follows.

typedef void  (*mdn_log_proc_t)(int level, const char *msg);

The log level is passed to level and the message character string that should be displayed is passed to msg.

msgheader module

msgheader module analyses and assembles the DNS message header.

Analyzed header information is placed in the following structure. Since each field corresponds to a field of DNS message header, the explanation is omitted here.

typedef struct mdn_msgheader {
	unsigned int id;
	int qr;
	int opcode;
	int flags;
	int rcode;
	unsigned int qdcount;
	unsigned int ancount;
	unsigned int nscount;
	unsigned int arcount;
} mdn_msgheader_t;

This module provides the following API functions.

mdn_msgheader_parse

mdn_result_t
mdn_msgheader_parse(const char *msg, size_t msglen,
	mdn_msgheader_t *parsed)

Analyses the DNS message headers indicated by msg and msglen and stores the information in the structure indicated by parsed.

One of the following values is returned:
mdn_success
mdn_invalid_message

mdn_msgheader_unparse

mdn_result_t
mdn_msgheader_unparse(mdn_msgheader_t *parsed,
	char *msg, size_t msglen)

This function performs reverse processing of mdn_msgheader_parse, in which the DNS message header is structured from the structure data specified by parsed, after which it is stored in the area specified by msg and msglen.

One of the following values is returned:
mdn_success
mdn_buffer_overflow

mdn_msgheader_getid

unsigned int
mdn_msgheader_getid(const char *msg)

Extracts the ID from the DNS message specified by msg and returns it. This function is only useful for extracting the ID without analyzing the entire header. Since this function assumes the data indicated by msg is longer than the DNS message header length, always call the function after confirmation at the calling side.

mdn_msgheader_setid

void
mdn_msgheader_setid(char *msg, unsigned int id)

Sets the ID specified by id in the DNS message specified by msg. Since this function also assumes that the data indicated by msg is longer than the DNS message header length, always call the function after confirmation at the calling side.

msgtrans module

The msgtrans module provides a large portion of DNS message conversion processing performed by the DNS proxy server. This module is implemented as a high-order module for many other modules including the converter module and normalizer module.

Message conversion processing by the DNS proxy server is briefly explained below.

Conversion of a message from a client to the DNS server is as follows.

Request message received from client is analyzed and ZLD and encoding at the client side are determined.
Based on the determination result, ZLD are removed from domain names and encoding is converted to UTF-8.
Normalization processing is performed.
The encoding is converted from UTF-8 to the encoding method used by the DNS server side and ZLD are added.
The above processing is performed on all domain names included in the message and the conversion results are collectively placed in the DNS message format and then sent to the DNS server.

Conversion of messages from the DNS server to the client is as follows.

The reply message received from the DNS server is analyzed and removal of ZLD and conversion to UTF-8 encoding are performed on all domain names included in the message.
Encoding is converted to the client side encoding and ZLD are added.
The conversion results are collectively placed in the DNS message format and then sent to the client.

As explained above, various parameters with respect to ZLD at the client/server side and encoding are necessary for DNS message conversion. When specifying those parameters for API functions, it is troublesome to specify them using different arguments for various functions. To avoid this, the following structure can be used to pass the parameters collectively.

typedef struct mdn_msgtrans_param {
	int use_local_rule;
	mdn_zldrule_t local_rule;
	mdn_converter_t local_converter;
	mdn_converter_t local_alt_converter;
	char *local_zld;
	mdn_converter_t target_converter;
	mdn_converter_t target_alt_converter;
	char *target_zld;
	mdn_normalizer_t normalizer;
} mdn_msgtrans_param_t;

use_local_rule specifies the ZLD and encoding determination method fr the message at the input side.

When the value is true, matching processing is performed on the ZLD and encoding as specified by local_rule and the domain names included in the message, and the matches are used when converting the request message from the client to the DNS server; in this case, the judgement results are assigned to local_converter and local_zld.

On the other hand, if local_rule is false, the ZLD and encoding specified by local_converter and local_zld are used as is when converting the request message sent from DNS server to the client; in this case, the value of local_rule is not used. Regardless of the value of use_local_rule, local_alt_converter defines the alternate encoding method used to encode the message at the input side. When there is no alternate encoding, NULL is specified.

target_converter and target_zld are used to specify the output side encoding and ZLD. target_alt_converter is alternately used with target_converter when the conversion by target_converter to the output side encoding fails because the domain name to be converted includes characters that do not exist in the output side character set. Note that the encoding corresponding to local_alt_converter and target_alt_converter must be ASCII-compatible encoding, respectively.

normalizer specifies normalization method.

This module provides the following API functions.

mdn_msgtrans_translate

mdn_result_t
mdn_msgtrans_translate(mdn_msgtrans_param_t *param,
	const char *msg, size_t msglen,
	char *outbuf, size_t outbufsize,
	size_t *outmsglenp)

Converts the DNS messages specified by msg and msglen according to the conversion parameter param and stores the result in the area indicated by outbuf and outbufsize. The message length of the conversion result is stored in outmsglenp.

One of the following values is returned:
mdn_success
mdn_invalid_message
mdn_invalid_encoding
mdn_buffer_overflow

normalizer module

normalizer module normalizes character string. The following normalization methods are currently provided. In addition, API used to additionally register new normalization method is provided.

ascii-uppercase
Converts ASCII lowercase to uppercase
ascii-lowercase
Converts ASCII uppercase to lowercase
unicode-uppercase
Converts lowercase to uppercase in accordance with the lowercase/uppercase mapping described in Case Mappings that prescribes character properties of Unicode.
unicode-lowercase
Converts uppercase to lowercase in accordance with the same above document.
unicode-form-c
Normalizes characters in accordance with Normaliztion form C described in Unicode Normalization Forms that prescribes normalization method of Unicode.
unicode-form-kc
Normalizes characters in accordance with Unicode Normalization Form KC described in the above same document.
ja-kana-fullwidth
Converts Japanese single-byte katakana to double-byte katakana.
ja-fullwidth
Same as ja-kana-fullwidth. This is kept for compatibility with the previous version and may be eliminated in the future version. Use ja-kana-fullwidth.
ja-alnum-halfwidth
Converts Japanese double-byte alphanumeric characters and double-byte minus symbol to single-byte characters
ja-compose-voiced-sound
Converts Japanese double-byte katakana and following voiced consonant mark (゛) and circle attached to certain katakana (゜) to one katakana character attached with voiced consonant mark or circle.
ja-minus-hack
Converts Japanese double-byte minus symbol (－) to hyphen(-).
ja-delimiter-hack
Converts Japanese period (。) and double-byte period (．) to period (.).

The last ja-delimiter-hack is to assume Japanese period and double-byte period as the period that is the separator of domain name. This is originally provided to reduce steps or mistakes when user enters multilingual domain names. However, depending on browser, there are problems that domain names without period are recognized as keyword not domain name and also this method is supposed to exceed the scope of normalization of domain names, therefore, use of this normalization method should be avoided as much as possible.

More than one normalization methods can be used and they are applied in the order they were specified.

normalizer module uses the concept "normalization context". Prior to normalization, a normalization context is created and the normalization method to be used is registered in the context. For actual normalization procesesing, not the normalization method but this normalization context is specified. The type of normalization context is mdn_normalizer_t type and defined as the following opaque type.

typedef struct mdn_normalizer *mdn_normalizer_t;

This module provides the following API functions.

mdn_normalizer_initialize

mdn_result_t
mdn_normalizer_initialize(void)

Initializes module. Make sure to call this function before calling other API function of this module.

One of the following values is returned:
mdn_success
mdn_nomemory

mdn_normalizer_create

mdn_result_t
mdn_normalizer_create(mdn_normalizer_t *ctxp)

Creates an empty content for normalization and stores it in the area specified by ctxp. The returned content is empty and does not contain any normalization methods. To add normalization method, mdn_normalizer_add is used.

One of the following values is returned:
mdn_success
mdn_nomemory

mdn_normalizer_destroy

void
mdn_normalizer_destroy(mdn_normalizer_t ctx)

Deletes the normalization context created by mdn_normalizer_create and releases the allocated memory.

mdn_normalizer_add

mdn_result_t
mdn_normalizer_add(mdn_normalizer_t ctx, const char *scheme_name)

Adds the normalization method specified by scheme_name in the normalization context created by mdn_normalizer_create. More than one normalization methods can be specified in one context.

One of the following values is returned:
mdn_success
mdn_invalid_name
mdn_nomemory

mdn_normalizer_normalize

mdn_result_t
mdn_normalizer_normalize(mdn_normalizer_t ctx,
	const char *from, char *to, size_t tolen)

Applies the normalization method specified by ctx to the character strings encoded by UTF-8 from and writes the result in the area specified by to and tolen. When more than one normalization method is included in ctx, they are applied in the order they were added by mdn_normalizer_add.

One of the following values is returned:
mdn_success
mdn_invalid_encoding
mdn_nomemory

mdn_normalizer_register

mdn_result_t
mdn_normalizer_register(const char *scheme_name,
	mdn_normalizer_proc_t proc)

New normalization methods are registered in scheme_name.proc is a pointer to the processing function of that normalization method.

One of the following values is returned:
mdn_success
mdn_nomemory

race module

The race module performs conversion between UTF-8 and the proposed RACE multilingual domain name method. This module is implemented as a low-order module of converter module and is not directly called by the application. When converter module is requested for conversion with RACE encoding, this module is indirectly called.

This module provides the following API functions.

mdn__race_open

mdn_result_t
mdn__race_open(mdn_converter_t ctx, mdn_converter_dir_t dir)

Opens conversion context with RACE encoding. Actually, this does not do anything.

Always returns mdn_success.

mdn__race_close

mdn_result_t
mdn__race_close(mdn_converter_t ctx, mdn_converter_dir_t dir)

Closes conversion context with RACE encoding. Actually, this does not do anything.

Always returns mdn_success.

mdn__race_convert

mdn_result_t
mdn__race_convert(mdn_converter_t ctx, mdn_converter_dir_t dir,
	const char *from, char *to, size_t tolen)

Performs bidirectional conversion between RACE-encoded and UTF-8 encoded character strings. Converts the from input character string and writes the result in the area specified by to and tolen. When dir is mdn_converter_l2u, RACE encoding is converted to UTF-8 encoding. When it is mdn_converter_u2l, UTF-8 encoding is converted to RACE encoding.

One of the following values is returned:
mdn_success
mdn_buffer_overflow
mdn_invalid_encoding
mdn_nomemory

res module

The res module provides high level APIs used when multilingual domain names are processed at the client side (by the resolver library or an application) i.e. when domain name encoding conversion or normalization is performed. This module is designed on the assumption that it will be used together with resconf module, which is explained below.

Using APIs provided by the module, it is not necessary to directly call converter module or normalizer module function.

This module provides the following API functions.

mdn_res_localtoucs

mdn_result_t
mdn_res_localtoucs(mdn_resconf_t conf, const char *local_name,
	char *ucs_name, size_t ucs_name_len)

Converts local_name domain name character strings expressed in the local encoding used by the application to UTF-8 and stores the result in ucs_name. ucs_name_len is used to specify the size of the area secured for ucs_name beforehand.

conf is the client configuration context returned by resconf module. When conf is NULL, conversion is not performed and the contents of local_name is copied to ucs_name as is.

Conversion from local encoding to UTF-8 is performed when the local_name domain name is a valid conventional ASCII domain name (that is, it consists of alphanumeric characters, hyphens and periods), alternate encoding is set in the client configuraiton context conf, and conversion to UTF-8 from the alternate encoding is attempted and failed before conversion of the local encoding is performed. Because of this, even if mdn_res_ucstolocal could not convert the given domain name to the local encoding and converted to the alternate encoding, if it is given to this function, the correct UTF-8 encoded domain name can be obtained.

One of the following values is returned:
mdn_success
mdn_buffer_overflow
mdn_invalid_name
mdn_failure

mdn_res_ucstolocal

mdn_result_t
mdn_res_ucstolocal(mdn_resconf_t conf, const char *ucs_name,
	char *local_name, size_t local_name_len)

Performs reverse conversion of mdn_res_localtoucs, i.e., converts the ucs_name domain name character string expressed in UTF-8 to the local encoding used by the application and stores the result in local_name. local_name_len is used to specify the size of the area secured for local_name beforehand.

conf is the client configuration context returned by resconf module. When conf is NULL, conversion is not performed and the contents of local_name is copied in ucs_name as is.

When conversion fails because a character that is not in the character set of the local encoding is contained in the ucs_name domain name, if the alternate encoding is set in the client configuration context conf, conversion to the alternate encoding is performed instead of to the local encoding. Because of this, even if the DNS server returns a domain name that includes a character that is not included in the local encoding, conversion is performed without error. Note that character strings converted to the alternate encoding can be returned to UTF-8 character strings by mdn_res_localtoucs.

One of the following values is returned:
mdn_success
mdn_buffer_overflow
mdn_invalid_name
mdn_failure

mdn_res_normalize

mdn_result_t
mdn_res_normalize(mdn_resconf_t conf, const char *name,
                  char *normalized_name, size_t normalized_name_len)

Executes normalization on the name domain name expressed in UTF-8 according to the client configuration context conf and stores the result in normalized_name. normalized_name_len is used to specify the size of the area secured for normalized_name beforehand.

When conf is NULL, normalization is not performed and the contents of name is copied in normalized_name as is.

One of the following values is returned:
mdn_success
mdn_buffer_overflow
mdn_invalid_encoding
mdn_nomemory

mdn_res_ucstodns

mdn_result_t
mdn_res_ucstodns(mdn_resconf_t conf, const char *ucs_name, char *dns_name,
	size_t dns_name_len)

Converts the ucs_name domain name expressed in UTF-8 to the encoding used in the DNS protocol per the conf client configuration context and stores the result in dns_name. dns_name_len is used to specify the size of the area secured for dns_name_len beforehand.

When conf is NULL, conversion is not performed and the contents of ucs_name are copied to dns_name as is.

One of the following values is returned:
mdn_success
mdn_buffer_overflow
mdn_invalid_encoding
mdn_invalid_name
mdn_failure

mdn_res_dnstoucs

mdn_result_t
mdn_res_dnstoucs(mdn_resconf_t conf, const char *dns_name, char *ucs_name,
	size_t ucs_name_len)

Performs reverse conversion of mdn_res_ucstodns, i.e., converts the dns_name domain name expressed in the encoding used in the DNS protocol to UTF-8 per the conf client configuration context and stores the result in ucs_name. ucs_name_len is used to specify the size of the area secured for ucs_name_len beforehand.

When conf is NULL, conversion is not peformed and the contents of dns_name are copied to ucs_name as is.

One of the following values is returned:
mdn_success
mdn_buffer_overflow
mdn_invalid_encoding
mdn_invalid_name
mdn_failure

resconf module

The resconf module loads the client configuration file referenced when a multilingual domain name is processed at the client side (by a resolver library or application) and executes initialization in accordance with the settings described in the file. It also provides a function to extract the setting information.

The resconf module uses the "client configuration context" concept. Settings described in the client configuration file are stored in this client configuration context, which is used as an argument to call API functions to extract the set values. The client configuration context is defined by mdn_resconf_t and is of the following opaque type.

typedef struct mdn_resconf *mdn_resconf_t;

This module can be used as a single module but it is designed so that by combining it with res module multilingual domain names can easily be processed at the client side.

This module provides the following API functions.

mdn_resconf_initialize

mdn_result_t
mdn_resconf_initialize(void)

Executes initialization required when processing multilingual domain names. Always call this function before calling other API functions of this module. Since this function initializes all other modules used by this module, it is not necessary to call another initialization function.

One of the following values is returned:
mdn_success
mdn_nomemory

mdn_resconf_create

mdn_result_t
mdn_resconf_create(mdn_resconf_t *ctxp)

Creates and initializes client configuration context and stores it in the area indicated by ctxp. In the initial status, the contents of the client configuration file are not loaded. To do so, mdn_resconf_loadfile must be executed.

One of the following values is returned:
mdn_success
mdn_nomemory

mdn_resconf_destroy

void
mdn_resconf_destroy(mdn_resconf_t ctx)

Deletes the client configuration context created by mdn_resconf_create and releases the allocated memory.

mdn_resconf_loadfile

mdn_result_t
mdn_resconf_loadfile(mdn_resconf_t ctx, const char *file)

Loads the contents of the client configuration file specified by file and stores the setting contents in the ctx client configuration context. When file is NULL, the contents of the default client configuration file are loaded.

If the configuration file has already been loaded and another configuration file is loaded, the previous configuration file contents stored in the client configuration context are erased and replaced with the newly loaded configuration file contents.

One of the following values is returned:
mdn_success
mdn_nofile
mdn_invalid_syntax
mdn_invalid_name
mdn_nomemory

mdn_resconf_defaultfile

char *
mdn_resconf_defaultfile(void)

Returns the path to the default client configuration file. This is determined by the settings set when mDNkit is compiled. The default path is as follows:

/usr/local/etc/mdnres.conf

mdn_resconf_localconverter

mdn_converter_t
mdn_resconf_localconverter(mdn_resconf_t ctx)

Based on the ctx client configuration context information, returns the code conversion context to perform character code conversion between the local encoding and UTF-8. NULL is returned if the local encoding cannot be determined.

For details of code conversion context, refer to the converter module section.

mdn_resconf_alternateconverter

mdn_converter_t
mdn_resconf_alternateconverter(mdn_resconf_t ctx)

Based on the ctx client configuration context information, returns the code conversion context to perform character code conversion between the alternate encoding and UTF-8. The alternate encoding is used instead of the local encoding when a domain name could not be converted to the local encoding. NULL is returned if the client configuration file has not been loaded or the encoding method is not specified in the configuration file.

For code conversion context, refer to converter module section.

mdn_resconf_serverconverter

mdn_converter_t
mdn_resconf_serverconverter(mdn_resconf_t ctx)

Based on the information of client configuration context ctx, returns the code conversion context to perform character code conversion between the encoding used on DNS protocol and UTF-8. NULL is returned if the client configuration file has not been loaded or the encoding method is not specified in the configuration file.

For code conversion context, refer to converter module section.

mdn_resconf_zld

const char *
mdn_resconf_zld(mdn_resconf_t ctx)

Based on the information in the ctx client configuration context, returns the ZLD character string used together with some encoding methods to differentiate between multilingual domain names and conventional domain names. NULL is returned when ZLD is not used.

By default, mDNkit does not support ZLD and this function always returns NULL. For details of how to set mKNkit to support ZLD, refer to configure execute in the mDNkit Installation Guide.

mdn_resconf_normalizer

mdn_normalizer_t
mdn_resconf_normalizer(mdn_resconf_t ctx)

Based on the information of client configuration context ctx, returns the normalization context used to normalize domain names. NULL is returned if the client configuration file has not been loaded or the normalization method is not specified in the configuration file.

For details of normalization context, refer to the normalizer module section.

result module

The result module handles the mdn_result_t type value returned by each function in the library and converts the value to the corresponding message code.

This module provides the following API functions.

char *
mdn_result_tostring(mdn_result_t result)

Returns the message character string corresponding to the value result of mdn_result_t type.

An unknown result code character string is returned for undefined code.

selectiveencode module

The selectiveencode module finds domain names that include non-ASCII characters in text such as zone master files. Generally speaking it is of course impossible to determine which part of the text is the domain name; in actuality, however, the following rough assumptions are used to approximately implement it.

Non-ASCII characters appear only in domain names.

Specifically, the following algorithm is used to detect the domain name area.

Scans the text and finds non-ASCII characters.
Check characters before and after found non-ASCII characters to determine a range consisting of only the found character and also other non-ASCII characters or characters that can be used for conventional (not internationalized) domain names.
Returns the found range as the domain name.

This module provides the following API functions.

mdn_selectiveencode_findregion

mdn_result_t
mdn_selectiveencode_findregion(const char *s,
	char **startp, char **endp)

Scans s UTF-8 encoded character strings and finds the area in the domain that includes the first appearance of a non-ASCII character, then stores a pointer indicating the beginning of the area at startp and a pointer indicating the end of the area in endp.

One of the following values is returned:
mdn_success
mdn_notfound

strhash module

The strhash module implements a hash table that uses a character string as a key. The hash table is used by other modules in the library such as the converter module and normalizer module. This is a very general hash table implementation in which registration can be performed but there is no deletion function because it is not needed with this library.

The size of the hash table increases as the total numer of elements increases.

As shown below, the hash table is expressed in opaque data of mdn_strhash_t type.

typedef struct mdn_strhash *mdn_strhash_t;

This module provides the following API functions.

mdn_strhash_create

mdn_result_t
mdn_strhash_create(mdn_strhash_t *hashp)

Creates an empty hash table and stores the handle to the area indicated by hashp.

One of the following values is returned:
mdn_success
mdn_nomemory

mdn_strhash_destroy

void
mdn_strhash_destroy(mdn_strhash_t hash)

Deletes the hash table created by mdn_strhash_create and releases the allocated memory.

mdn_strhash_put

mdn_result_t
mdn_strhash_put(mdn_strhash_t hash, const char *key,
	void *value)

Used to register a key and value set in the hash table created by mdn_strhash_create. Since character strings are copied, there is no influence even if the memory indicated by key is released or the contents of the character strings are changed after this function is called. Contrarily, the contents of value are not copied, so use care when working with this item. (If you think carefully about it, it will become obvious that this value is not copied.)

When the same key is used for registration more than once, only the most recently registered key is effective.

One of the following values is returned:
mdn_success
mdn_nomemory

mdn_strhash_get

mdn_result_t
mdn_strhash_get(mdn_strhash_t hash,
	const char *key, void **valuep)

Searches for elements that have key in the hash table; if a corresponding element is found, the value is stored in valuep.

One of the following values is returned:
mdn_success
mdn_noentry

mdn_strhash_exists

int
mdn_strhash_exists(mdn_strhash_t hash, const char *key)

Returns 1 if there is an element that has the key in the hash table, and returns 0 if no element is found.

translator module

The translator module converts domain names in accordance with the given parameters. Values can provided for the following parameters.

Encoding (local encoding) of domain name passed as input
Alternate encoding of domain name passed as input (local alternate encoding)
ZLD (local ZLD) of domain name passed as input
Normalization method
Encoding after conversion of domain name (target encoding)
Encoding used when conversion to target encoding failed (target alternate encoding)
ZLD after conversion of domain name (target ZLD)

The domain name conversion procedure is complicated, for the following reasons:

Multilingual domain names are not always passed as the domain name, and it is possible that conventional ASCII domain names may be passed and processing thus must be changed accordingly.
With regard to ASCII-compatible encoding, differentiating between multilingual domain names and conventional ASCII domain names is not simple and ZLD, etc. need to be referenced.
Domain names may be passed that are encoded using local alternate encoding.
If conversion to the target encoding fails, alternate encoding must be used instead.

Specifically, the following algorithm is used for conversion.

Checks if the local ZLD is defined (empty or not).
If it is defined, checks whether or not the passed domain name matches.
If it matches, the domain name is determined to be a multilingual doman name and ZLD is removed from the domain, then processing proceeds to code conversion processing (Step 6).
When the local ZLD is not defined, checks whether or not the local encoding is ASCII-conpatible and also that the passed domain name is a valid conventional ASCII domain name.
When ASCII compatible encoding is used or the passed domain name includes an invalid character in a conventional ASCII domain name, the passed domain name is assumed to be a multilingual domain name and the procedure proceeds to code conversion processing (Step 6).
For situations other than the above, the passed domain name is assumed to be a conventional ASCII-domain name and is copied as is, then processing ends.
When the local alternate encoding is defined, the code is first converted from the local alternate encoding to UTF-8. If this is successful, processing proceeds to normalization processing (Step 8).
Executes code conversion from local encoding to UTF-8.
Executes normalization processing.
Executes code conversion from UTF-8 to the target encoding.
If the conversion fails because there is a character in the domain name that is not included in the target character set, code conversion from UTF-8 to the alternate encoding is executed instead.
When the target ZLD is defined, it is added to the domain name.

The following flow chart explains the above procedure.

This module uses the converter module for encoding and the normalizer module for normalization.

This module provides the following API functions.

mdn_translator_translate

mdn_result_t
mdn_translator_translate(mdn_converter_t local_converter,
	mdn_converter_t local_alternate_converter,
	const char *local_zld,
	mdn_normalizer_t normalizer,
	mdn_converter_t target_converter,
	mdn_converter_t target_alternate_converter,
	const char *target_zld,
	const char *from, char *to, size_t tolen)

Local encoding, local alternate encoding, target encoding and target alternate encoding are not the actual names of types of encoding and are specified by the corresponding code conversion context, which is defined by the following converter module variables: local_converter,alternate_converter and target_converter.

The target_alternate_converter variable is used instead of the target encoding if conversion to the target encoding by target_converter fails because the domain name contains a character that is not included in the target character set.

Normalization is specified by normalization context defined by the normalizer variable of the normalizer module.

The local ZLD and target ZLD must have been converted to the standard format by mdn_translator_canonicalZLD.

One of the following values is returned:
mdn_success
mdn_buffer_overflow
mdn_invalid_encoding
mdn_nomemory

mdn_translator_canonicalzld

mdn_translator_canonicalZLD

mdn_result_t
mdn_translator_canonicalzld(const char *zld,
mdn_translator_canonicalZLD(const char *ZLD,
	char **canonicalizedp)

Converts ZLD ZLD to the standard format and stores a pointer in the area specified by canonicalizedp. Since the area for the converted character string (*canonicalizedp) is secured by malloc(), release it when it is no longer needed.

The standard format for ZLD mentioned is as follows:

The standard format for empty ZLD ("" or ".") is NULL
The period is removed when the beginning is a period (.)
A period is added when the ending is not a period (.)
Lowercase characters are all converted to uppercase characters

One of the following values is returned:
mdn_success
mdn_nomemory

mdn_translator_matchzld

mdn_translator_matchZLD

int
mdn_translator_matchzld(const char *domain,
	const char *zld)
mdn_translator_matchZLD(const char *domain,
	const char *ZLD)

Checks whether or not the domain variable and ZLD ZLD match, and returns a 1 if they match and a 0 if not.

The ZLD must have been converted to the standard format by mdn_translator_canonicalZLD.

unicode module

The unicode module obtains various character properties of Unicode described in UnicodeData.txt. For details of the data described in Unicode.txt and the file format, refer to UnicodeData File Format.

Many modules in this library handle Unicode data as UTF-8 encoded character strings but this module handles Unicode data as unsigned long type data. Includes UCS-4 values.

This module provides a mutual conversion function between uppercase and lowercase Unicode characters. This is defined by Unicode Technical Report #21: Case Mappings. Among Unicode characters, a few characters require context information when uppercase is converted to lowercase. This is specified by the following enumeration type data.

typedef enum {
	mdn__unicode_context_unknown,
	mdn__unicode_context_final,
	mdn__unicode_context_nonfinal
} mdn__unicode_context_t;

When the context is FINAL, mdn__unicode_context_final is specified and when it is NON_FINAL, mdn__unicode_context_nonfinal is specified. mdn__unicode_context_unknown indicates that the context is unknown (has not yet been checked). For a detailed discussion of context information, refer to the above references.

This module provides the following API functions.

mdn__unicode_canonicalclass

int
mdn__unicode_canonicalclass(unsigned long c);

Obtains Canonical Combining Class for Unicode character c. 0 is returned for characters for which Canonical Combining Class is not defined.

mdn__unicode_decompose

mdn_result_t
mdn__unicode_decompose(int compat,
	unsigned long *v, size_t vlen,
	unsigned long c, int *decomp_lenp)

Decomposes Unicode characters c in accordance with Character Decomposition Mapping of UnicodeData.txt and writes the result in the area specified by v and vlen. When the value of compat is true, Compatibility Decomposition is performed and when false, Canonical Decomposition is performed. Decompose is performed recursively, i.e. each character resolved in accordance with Character Decomposition Mapping is further decomposed.

One of the following values is returned:
mdn_success
mdn_notfound
mdn_nomemory

mdn__unicode_compose

mdn_result_t
mdn__unicode_compose(unsigned long c1,
	unsigned long c2, unsigned long *compp)

Composes a sequence of the two Unicode characters c1 and c2 per the Character Decomposition Mapping in UnicodeData.txt and writes the result in the area specified by compp. Canonical Composition is always peformed.

One of the following values is returned:
mdn_success
mdn_notfound

mdn__unicode_iscompositecandidate

int
mdn__unicode_iscompositecandidate(unsigned long c)

As there are only a small number of Unicode characters that can begin Canonical Composition, this can be used for pre-screening of data in order to decrease the search overhead of mdn__unicode_compose.

mdn__unicode_toupper

mdn_result_t
mdn__unicode_toupper(unsigned long c, mdn__unicode_context_t ctx,
	unsigned long *v, size_t vlen, int *convlenp)

Converts Unicode characters c to uppercase in accordance with the Uppercase Mapping information in UnicodeData.txt and SpecialCasing.txt, and stores the result in the area specified by v. vlen is the size of the area that is secured for v beforehand. The number of characters in the conversion result is returned to *convlenp. Note that the conversion result may be greater than one character and that locale-dependent conversion is not performed.

ctx is context information where character c appears. Since most characters do not require context information when they are converted, usually mdn__unicode_context_unknown can be specified. When context information is necessary, this function returns mdn_context_required as the return value, and it is possible to call it again after obtaining the context information. To obtain context information, mdn__unicode_getcontext is used.

If no corresponding uppercase character exists, c is stored in v as is.

One of the following values is returned:
mdn_success
mdn_context_required
mdn_buffer_overflow

mdn__unicode_tolower

mdn_result_t
mdn__unicode_tolower(unsigned long c, mdn__unicode_context_t ctx,
	unsigned long *v, size_t vlen, int *convlenp)

Converts Unicode character c to lowercase in accordance with Uppercase Mapping information of UnicodeData.txt and SpecialCasing.txt information, and stores the result in the area specified by v. vlen is the size of area that is secured for v beforehand. The number of characters of the conversion result is returned to *convlenp.

If no corresponding uppercase character exists, c is stored in v as is.

One of the following values is returned:
mdn_success
mdn_context_required
mdn_buffer_overflow

mdn__unicode_getcontext

mdn__unicode_context_t
mdn__unicode_getcontext(unsigned long c)

Returns context information used for conversion of uppercase/lowercase characters. To obtain context information, first the character following the uppercase/lowercase character conversion target character is obtained and this function is called. If the return value is mdn__unicode_context_final or mdn__unicode_context_nonfinal, that context information is the context information to obtain. If mdn__unicode_context_unknown is returned, the next character is obtained and the function is called. In this way, processing continues until either the value of mdn__unicode_context_final or mdn__unicode_context_nonfinal is obtained. When processing reaches the end of the character string, mdn__unicode_context_final becomes the context.

Specifically, this function does the following. Refers "General Category" properties of Unicode character c and if it is "Lu", "Ll" or "Lt" mdn__unicode_context_nonfinal is returned, if it is "Mn" mdn__unicode_context_unknown is returned, and if it is other than the above, mdn__unicode_context_final is returned.

unormalize module

The unormalize module performs the standard normalization defined by Unicode. Normalization of Unicode is defined in Unicode Technical Report #15: Unicode Normalization Forms. This module implements the four normalization forms mentioned in this document.

This module provides the following API functions.

mdn__unormalize_formc

mdn_result_t
mdn__unormalize_formc(const char *from, char *to, size_t tolen)

Applies Unicode Normalization Form C normalization to a UTF-8 encoded from character string and writes the result in the area specified by to and tolen.

One of the following values is returned:
mdn_success
mdn_invalid_encoding
mdn_buffer_overflow
mdn_nomemory

mdn__unormalize_formd

mdn_result_t
mdn__unormalize_formd(const char *from, char *to, size_t tolen)

Applies Unicode Normalization Form D normalization to a UTF-8 encoded from character string and writes the result in the area specified by to and tolen.

One of the following values is returned:
mdn_success
mdn_invalid_encoding
mdn_buffer_overflow
mdn_nomemory

mdn__unormalize_formkc

mdn_result_t
mdn__unormalize_formkc(const char *from, char *to, size_t tolen)

Applies Unicode Normalization Form KC normalization to a UTF-8 encoded from character string and writes the result in the area specified by to and tolen.

One of the following values is returned:
mdn_success
mdn_invalid_encoding
mdn_buffer_overflow
mdn_nomemory

mdn__unormalize_formkd

mdn_result_t
mdn__unormalize_formkd(const char *from, char *to, size_t tolen)

Applies Unicode Normalization Form KC normalization to a UTF-8 encoded from character string and writes the result in the area specified by to and tolen.

One of the following values is returned:
mdn_success
mdn_invalid_encoding
mdn_buffer_overflow
mdn_nomemory

utf5 module

The utf5 module performs basic processing for the proposed UTF-5 domain name encoding system.

This module provides the following API functions.

mdn_utf5_getwc

int
mdn_utf5_getwc(const char *s, size_t len,
	unsigned long *vp)

Extracts the leading character of length len byte UTF-5 encoded character string s, converts it to UCS-4 and stores it in the area specified by vp and also returns the number of bytes in the (UTF-5 encoded) character strintg. 0 is returned if len is too short and ends in the middle of a character or the encoding is invalid.

mdn_utf5_putwc

int
mdn_utf5_putwc(char *s, size_t len, unsigned long v)

Converts UCS-4 characters v to UTF-5 encoding, writes them in the area specified by s and len and returns the number of bytes written. 0 is returned if len is too short to write.

The written UTF-5 character string is not terminated with a NULL character.

utf8 module

The utf8 module performs the basic processing of UTF-8 encoded character strings.

This module provides the following API functions.

mdn_utf8_mblen

int
mdn_utf8_mblen(const char *s)

Returns the length (number of bytes) of the leading character in the UTF-8 character string s. 0 is returned if the leading byte indicated by s is not valid for UTF-8.

This function returns the length by checking the leading byte of s; there is therefore a possibility of invalid byte in the 2nd and later byte. In particular, NULL bytes may exist in the middle, so you have to be careful when it is not certain that s is a valid UTF-8 character string.

mdn_utf8_getmb

int
mdn_utf8_getmb(const char *s, size_t len, char *buf)

Copies the leading character of s UTF-8 character strings of length len and returns the number of copied bytes.

buf must be large enough to hold any UTF-8 encoding, i.e. it must be 6 bytes or larger.

The written UTF-8 character string is not terminated with a NULL character.

mdn_utf8_getwc

int
mdn_utf8_getwc(const char *s, size_t len,
	unsigned long *vp)

This is almost same as mdn_utf8_getmb with the difference being that characters extracted from s are converted to UCS-4 and stored in the area indicated by vp.

mdn_utf8_putwc

int
mdn_utf8_putwc(char *s, size_t len, unsigned long v)

Converts UCS-4 character v to UTF-8 encoding, writes it in the area specified by s and len and returns the number of written bytes. 0 is returned when the value of v is invalid or len is too short.

The written UTF-8 character string is not terminated with a NULL character.

mdn_utf8_isvalidstring

int
mdn_utf8_isvalidstring(const char *s)

Checks whether or not character string s terminated with a NULL character is valid UTF-8 encoding and returns 1 if so and 0 if not.

mdn_utf8_findfirstbyte

char *
mdn_utf8_findfirstbyte(const char *s,
	const char *known_top)

In the character string known_top, checks the leading byte of UTF-8 characters including the byte indicated by s and returns it. NULL is returned if there are any incorrectly encoded UTF-8 characters or there is no leading byte between known_top and s.

util module

The util module provides utility type functions used by other modules. The only function currently provided is a character string collation function that does not differentiate between uppercase and lowercase characters.

This module provides the following API functions.

mdn_util_casematch

int
mdn_util_casematch(const char *s1, const char *s2, size_t n)

Compares the maximum n bytes from the beginning of character strings s1 and s2 and determines whether or not they are identical. Uppercase and lowercase ASCII characters (i.e. A to Z and a to z) are assumed to be the same. 1 is returned if they are found to be identical and 0 is returned if not. With the exception of the return value specifications, this function provides almost the same features as strcasencmp, which is provided in many systems.

ZLDrule module

The ZLDrule module matches the domain name and ZLD. It has a list of ZLDs that are probably used for domain names and the list of encodings corresponding to each ZLD, and performs matching with the given doman name, then returns the matched ZLD and encoding.

The ZLDrule module uses "context" concept for matching. Prior to matching, a context is created and ZLD and encoding are registered for the context. When domain name matching is performed, this context is used to specify the ZLD and encoding lists used for matching. The type of context is mdn_zldrule_t and is defined as the following opaque type.

typedef struct mdn_zldrule *mdn_zldrule_t;

This module provides the following API functions.

mdn_zldrule_create

mdn_result_t
mdn_zldrule_create(mdn_zldrule_t *ctxp)

Creates a context for ZLD matching and stores it in the area indicated by ctxp.

One of the following values is returned:
mdn_success
mdn_nomemory

mdn_zldrule_destroy

void
mdn_zldrule_destroy(mdn_zldrule_t ctx)

Deletes the context created by mdn_zldrule_create and releases the allocated memory.

mdn_zldrule_add

mdn_result_t
mdn_zldrule_add(mdn_zldrule_t ctx, const char *zld,
	const char **encodings, int nencodings)

Registers the ZLD and encoding list set specified by encodings and nencodings in context ctx created by mdn_zldrule_create.

Empty ZLDs such as "" or "." match all domain names, therefore, by specifying an empty value for ZLD the default encoding can be specified in those cases in which no ZLD match is found.

One of the following values is returned:
mdn_success
mdn_nomemory

mdn_zldrule_select

mdn_result_t
mdn_zldrule_select(mdn_zldrule_t ctx, const char *domain,
	char **zldp, mdn_converter_t *convctxp)

When a ZLD match is found, the pointer to the ZLD match is stored in the area specified by ZLDp. Since the return pointer has already been converted to the standard form by mdn_translator_canonicalZLD, it can be passed as the argument as is.

When only one encoding method corresponds to the matched ZLD, the code conversion context corresponding to that encoding method is stored in the area specified by convctxp. If there is more than one valid encoding method, a check is performed from the top of the list to determine whether or not domain is valid for the encoding. Of the valid encoding methods found, the code conversion context of the first one found is stored in the area specified by convctxp. If no valid encoding method is found, nothing is written in convctxp and mdn_invalid_encoding is returned.

When there is no ZLD match, mdn_notfound is returned and processing ends.

One of the following values is returned:
mdn_success
mdn_notfound
mdn_invalid_encoding