URL

Uniform Resource Locator

URLs are used to ‘locate’ resources by providing an abstract identification of the resource location.

A generic URL consists of two main parts:

<name-of-scheme> ':' <scheme-specific-part>

Common Internet Scheme Syntax

URL schemes that involve the direct use of an IP-based protocol to a specified host on the Internet use a common syntax for the scheme-specific data. To indicate that a URL uses the common Internet scheme syntax, the scheme-specific data starts with a double slash “//”.

Only URLs that use the common Internet scheme syntax are supported by this implementation.
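The check described above can be sketched in C. The helper has_common_internet_syntax is hypothetical and not part of C-kern; it only splits at the first ':' and tests whether the scheme-specific part starts with “//”, without validating the scheme name itself.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of C-kern): splits "<scheme>:<rest>" at the
 * first ':' and reports whether <rest> starts with "//", i.e. whether the URL
 * uses the common Internet scheme syntax. If a scheme name is present,
 * *scheme_len receives its length. */
static bool has_common_internet_syntax(const char *url, size_t *scheme_len)
{
    const char *colon = strchr(url, ':');
    if (colon == NULL || colon == url) {
        return false; /* no scheme name at all */
    }
    *scheme_len = (size_t)(colon - url);
    return strncmp(colon + 1, "//", 2) == 0;
}
```

For “http://www.server.com/” the helper returns true with a scheme length of 4; for “mailto:joe@example.com” it returns false, because that scheme does not use the common Internet scheme syntax.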

<scheme>'://'<user>':'<passwd>'@'<hostname>':'<port>'/'<path>'?'<query>'#'<fragment>

The current implementation supports only the scheme ‘http’.
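A minimal sketch of splitting the scheme-specific part (the text after “<scheme>://”) into the seven components of the syntax line above. The names slice_t, split_parts and the P_* part order are assumptions for illustration, not the C-kern API. Undefined parts get addr NULL, defined but empty parts get size 0, no percent-decoding is done, and, following RFC 1738, the “/” between host (or port) and path is not counted as part of the path.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical slice type: points into the input string, not NUL-terminated. */
typedef struct slice_t {
    const char *addr;   /* NULL if the part is undefined */
    size_t      size;   /* 0 for a defined but empty part */
} slice_t;

/* Assumed part order, following the syntax line from left to right. */
enum { P_USER, P_PASSWD, P_HOSTNAME, P_PORT, P_PATH, P_QUERY, P_FRAGMENT, P_COUNT };

/* Splits "<user>:<passwd>@<hostname>:<port>/<path>?<query>#<fragment>"
 * (the text following "<scheme>://") into P_COUNT slices. */
static void split_parts(const char *s, slice_t parts[P_COUNT])
{
    size_t len = strlen(s);
    for (int i = 0; i < P_COUNT; ++i) {
        parts[i].addr = NULL;
        parts[i].size = 0;
    }
    /* Cut from the right: fragment, then query, then path. */
    const char *hash = memchr(s, '#', len);
    if (hash != NULL) {
        parts[P_FRAGMENT] = (slice_t){ hash + 1, len - (size_t)(hash + 1 - s) };
        len = (size_t)(hash - s);
    }
    const char *qmark = memchr(s, '?', len);
    if (qmark != NULL) {
        parts[P_QUERY] = (slice_t){ qmark + 1, len - (size_t)(qmark + 1 - s) };
        len = (size_t)(qmark - s);
    }
    const char *slash = memchr(s, '/', len);
    if (slash != NULL) {
        /* The '/' separator itself is not part of the path. */
        parts[P_PATH] = (slice_t){ slash + 1, len - (size_t)(slash + 1 - s) };
        len = (size_t)(slash - s);
    }
    /* What is left is [<user>[:<passwd>]@]<hostname>[:<port>]. */
    const char *host = s;
    const char *at = memchr(s, '@', len);
    if (at != NULL) {
        const char *pw_colon = memchr(s, ':', (size_t)(at - s));
        const char *user_end = (pw_colon != NULL) ? pw_colon : at;
        parts[P_USER] = (slice_t){ s, (size_t)(user_end - s) };
        if (pw_colon != NULL) {
            parts[P_PASSWD] = (slice_t){ pw_colon + 1, (size_t)(at - pw_colon - 1) };
        }
        host = at + 1;
    }
    size_t hostlen = len - (size_t)(host - s);
    const char *port_colon = memchr(host, ':', hostlen);
    if (port_colon != NULL) {
        parts[P_PORT] = (slice_t){ port_colon + 1, hostlen - (size_t)(port_colon + 1 - host) };
        hostlen = (size_t)(port_colon - host);
    }
    parts[P_HOSTNAME] = (slice_t){ host, hostlen };
}
```

Cutting from the right first means a '/' or ':' inside the query or fragment cannot be mistaken for a path or port separator.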

Summary
URL: URLs are used to ‘locate’ resources by providing an abstract identification of the resource location.
Copyright: This program is free software.
Files
C-kern/api/io/url.h: Header file URL.
C-kern/io/url.c: Implementation file URL impl.
Types
struct url_t: Exports url_t.
struct url_parts_t: Exports url_parts_t (array of 7 strings).
Enumerations
url_scheme_e
URL Syntax: See RFC 1738.
Functions
test
unittest_io_url: Unittest for parsing URLs from strings.
url_t: Describes a URL that uses the common Internet scheme syntax.
scheme: The URL scheme.
lifetime
new_url: Parses a full URL from an encoded string and fills in all components.
new2_url: Parses a URL without scheme prefix from an encoded string and fills in all components.
newparts_url: Fills in URL components from substrings describing the components.
delete_url: Frees all resources bound to url.
query
encode_url: Encodes all parts and combines them into one string.
getpart_url: Returns a part of the URL as a string.
fragment_url: Returns the anchor/fragment part of the URL.
hostname_url: Returns the name of the IP network node or NULL if undefined.
passwd_url: Returns the password or NULL if undefined.
path_url: Returns the path or NULL if undefined.
port_url: Returns the port or NULL if undefined.
query_url: Returns the query part of the URL.
user_url: Returns the user name or NULL if undefined.
url_parts_t: Defines url_parts_t as an array of 7 strings.
lifetime
url_parts_FREE: Static initializer.
inline implementation
Macros
getpart_url: Implements url_t.getpart_url.
fragment_url: Implements url_t.fragment_url.
hostname_url: Implements url_t.hostname_url.
passwd_url: Implements url_t.passwd_url.
path_url: Implements url_t.path_url.
port_url: Implements url_t.port_url.
query_url: Implements url_t.query_url.
user_url: Implements url_t.user_url.

Copyright

This program is free software.  You can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for more details.

Author

© 2011 Jörg Seebohn

Files

C-kern/api/io/url.h

Header file URL.

C-kern/io/url.c

Implementation file URL impl.

Types

struct url_t

typedef struct url_t url_t

Exports url_t.

struct url_parts_t

Exports url_parts_t (array of 7 strings).

Enumerations

url_scheme_e

url_scheme_HTTP: URL for the HTTP protocol. Example: “http://name:password@www.server.com/path/to/resource”

URL Syntax

See RFC 1738.

Encoded characters in a URL

Only alphanumerics, the special characters “$-_.+!*'(),”, and reserved characters used for their reserved purposes may be used unencoded within a URL. All other characters must be encoded.

The characters “;”, “/”, “?”, “:”, “@”, “=” and “&” (HTTP also “#”) are the characters which may be reserved for special meaning within a scheme.  No other characters may be reserved within a scheme.
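The rule for unencoded characters can be expressed as a predicate. is_unreserved_url_char is a hypothetical name, not a C-kern function. It uses explicit ASCII ranges instead of isalnum so that the result does not depend on the current locale. Reserved characters return false; whether one of them may stay unencoded depends on whether it serves its reserved purpose at that position, which the caller has to decide.

```c
#include <stdbool.h>
#include <string.h>

/* Returns true for alphanumerics and the special characters "$-_.+!*'(),",
 * which may always appear unencoded in a URL. Every other byte, including
 * reserved characters, yields false. */
static bool is_unreserved_url_char(unsigned char c)
{
    if ((c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')) {
        return true;
    }
    /* Guard against c == 0: strchr would match the string terminator. */
    return c != 0 && strchr("$-_.+!*'(),", c) != NULL;
}
```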

How to Encode a Character

Bytes are encoded by a character triplet consisting of the character “%” followed by the two hexadecimal digits forming the hexadecimal value of the byte.
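The triplet encoding of a single byte can be sketched as follows. percent_encode_byte is a hypothetical helper; it writes uppercase hexadecimal digits, although RFC 1738 accepts lowercase ones as well.

```c
/* Writes the triplet "%XY" for byte b into out, followed by a terminating NUL. */
static void percent_encode_byte(unsigned char b, char out[4])
{
    static const char hex[] = "0123456789ABCDEF";
    out[0] = '%';
    out[1] = hex[b >> 4];    /* high nibble */
    out[2] = hex[b & 0x0F];  /* low nibble */
    out[3] = '\0';
}
```

For example, a space (byte 0x20) becomes “%20” and the byte 0xAB becomes “%AB”.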

Some or all of the parts “<user>:<password>@”, “:<password>”, “:<port>”, and “/<url-path>” may be excluded.

Functions

Summary
test
unittest_io_urlUnittest for parsing URLs from strings.

test

unittest_io_url

int unittest_io_url(void)

Unittest for parsing URLs from strings.

url_t

struct url_t

Describes a URL that uses the common Internet scheme syntax. The resource it refers to must therefore be located on the Internet.

Any URL consists of two main parts:

<name-of-scheme> ':' <scheme-specific-part>

The scheme-specific data starts with a double slash “//” to indicate that it complies with the common Internet scheme syntax.

Undefined Fields vs.  Empty Fields

If a field is not defined, NULL is returned by the corresponding query function. For an empty field, the empty string “” is returned.
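A caller therefore has to test for NULL before looking at the characters of a returned part. describe_part below is a hypothetical helper that only illustrates the three cases; it would be applied to the result of a query function such as port_url.

```c
#include <stddef.h>

/* Classifies the result of a query function: NULL means the part is
 * undefined, "" means it is defined but empty, anything else is returned. */
static const char *describe_part(const char *part)
{
    if (part == NULL) {
        return "undefined";
    }
    if (part[0] == '\0') {
        return "empty";
    }
    return part;
}
```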

Summary
schemeThe url scheme.
lifetime
new_urlParses full url from encoded string and fills in all components.
new2_urlParses url from encoded string and fills in all components.
newparts_urlFills in url components from substrings describing components.
delete_urlFrees all resources bound to url.
query
encode_urlEncodes all parts and combines them into one string.
getpart_urlReturn part of url as string.
fragment_urlReturns the anchor/fragment part of the url.
hostname_urlReturns name of IP network node or NULL if undefined.
passwd_urlReturns password or NULL if undefined.
path_urlReturns password or NULL if undefined.
port_urlReturns port or NULL if undefined.
query_urlReturns the query part of the url.
user_urlReturns username or NULL if undefined.

scheme

uint16_t scheme

The url scheme.  See url_scheme_e for a list of all supported values.

lifetime

new_url

int new_url(/*out*/url_t **url,
const char *encodedstr)

Parses a full URL from an encoded string and fills in all components. The string must contain a scheme prefix. The encoded string is decoded (“%AB” -> character code 0xAB); no conversion into the current locale is made. Encoded URL strings should therefore be encoded from the UTF-8 character encoding.
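The decoding step can be sketched as below. percent_decode and hexval are hypothetical helpers, not the C-kern implementation. Decoded bytes are copied as they are and no charset or locale conversion takes place, so UTF-8 input such as “%C3%A4” yields the two UTF-8 bytes of “ä”. This sketch rejects malformed triplets; how new_url treats them is not documented here.

```c
#include <stddef.h>

/* Returns the value of a hexadecimal digit, or -1 if c is not one. */
static int hexval(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Decodes every "%XY" triplet in src into the byte 0xXY and copies all other
 * bytes unchanged. dst needs room for strlen(src)+1 bytes.
 * Returns 0 on success or -1 if a triplet is malformed. */
static int percent_decode(const char *src, char *dst)
{
    while (*src != '\0') {
        if (*src == '%') {
            int hi = hexval(src[1]);
            /* Read src[2] only if src[1] was a digit (and thus not the NUL). */
            int lo = (hi < 0) ? -1 : hexval(src[2]);
            if (hi < 0 || lo < 0) {
                return -1;
            }
            *dst++ = (char)(hi * 16 + lo);
            src += 3;
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
    return 0;
}
```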

new2_url

int new2_url(/*out*/url_t **url,
url_scheme_e scheme,
const char *encodedstr)

Parses a URL from an encoded string and fills in all components. The string must not contain a scheme prefix; the scheme is taken from parameter scheme instead. See also new_url.

newparts_url

int newparts_url(/*out*/url_t **url,
url_scheme_e scheme,
url_parts_t *parts,
bool are_parts_encoded)

Fills in URL components from substrings describing the components. If parameter are_parts_encoded is true, the substrings are treated as encoded and are decoded before the out parameter url is constructed.

delete_url

int delete_url(url_t **url)

Frees all resources bound to url.

query

encode_url

int encode_url(const url_t *url,
/*ret*/struct wbuffer_t *encoded_url_string)

Encodes all parts and combines them into one string.
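A rough sketch of what encoding and combining parts involves, written against a plain char buffer instead of struct wbuffer_t. append_encoded and build_http_url are hypothetical names; the sketch handles only hostname, port and path and leaves out user, password, query and fragment for brevity. Separators such as ':' and '/' pass through unencoded via the keep parameter, while the same characters inside a hostname or port would be encoded.

```c
#include <stddef.h>
#include <string.h>

/* Appends src to dst (capacity cap, current length *len), writing every byte
 * that is not alphanumeric, not one of "$-_.+!*'(),", and not listed in keep
 * as a "%XY" triplet. Returns 0, or -1 if dst is too small. */
static int append_encoded(char *dst, size_t cap, size_t *len,
                          const char *src, const char *keep)
{
    static const char hex[] = "0123456789ABCDEF";
    for (const unsigned char *p = (const unsigned char *)src; *p != 0; ++p) {
        int plain = (*p >= '0' && *p <= '9') || (*p >= 'A' && *p <= 'Z')
                 || (*p >= 'a' && *p <= 'z')
                 || strchr("$-_.+!*'(),", *p) != NULL
                 || strchr(keep, *p) != NULL;
        size_t need = plain ? 1 : 3;
        if (*len + need >= cap) {
            return -1; /* keep room for the terminating NUL */
        }
        if (plain) {
            dst[(*len)++] = (char)*p;
        } else {
            dst[(*len)++] = '%';
            dst[(*len)++] = hex[*p >> 4];
            dst[(*len)++] = hex[*p & 0x0F];
        }
        dst[*len] = '\0';
    }
    return 0;
}

/* Builds "http://<hostname>[:<port>][/<path>]" from decoded parts;
 * port and path may be NULL (undefined). Returns 0 or -1 on overflow. */
static int build_http_url(char *dst, size_t cap, const char *hostname,
                          const char *port, const char *path)
{
    if (cap < sizeof("http://")) {
        return -1;
    }
    strcpy(dst, "http://");
    size_t len = strlen(dst);
    if (append_encoded(dst, cap, &len, hostname, "")) return -1;
    if (port != NULL) {
        if (append_encoded(dst, cap, &len, ":", ":")
            || append_encoded(dst, cap, &len, port, "")) return -1;
    }
    if (path != NULL) {
        /* '/' separates path segments and therefore stays unencoded. */
        if (append_encoded(dst, cap, &len, "/", "/")
            || append_encoded(dst, cap, &len, path, "/")) return -1;
    }
    return 0;
}
```

With hostname “www.server.com”, port “8080” and path “path/to my file”, the sketch produces “http://www.server.com:8080/path/to%20my%20file”.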

getpart_url

const char * getpart_url(const url_t *url,
enum url_part_e part)

Returns the requested part of the URL as a string, or NULL if that part is undefined.

fragment_url

const char * fragment_url(const url_t *url)

Returns the anchor/fragment part of the url.  Returns “<fragment>” for the following url:

http://<host>/path?<query>#<fragment>.

hostname_url

const char * hostname_url(const url_t *url)

Returns the name of the IP network node or NULL if undefined.

passwd_url

const char * passwd_url(const url_t *url)

Returns password or NULL if undefined.

path_url

const char * path_url(const url_t *url)

Returns the path or NULL if undefined.

port_url

const char * port_url(const url_t *url)

Returns port or NULL if undefined.

query_url

const char * query_url(const url_t *url)

Returns the query part of the URL. Returns “<query>” for the following URL:

http://<host>/path?<query>#<fragment>.

user_url

const char * user_url(const url_t *url)

Returns username or NULL if undefined.

url_parts_t

Defines url_parts_t as an array of 7 strings.  See string_t.

Summary
lifetime
url_parts_FREEStatic initializer.

lifetime

url_parts_FREE

#define url_parts_FREE { string_FREE, string_FREE, string_FREE, string_FREE, string_FREE, string_FREE, string_FREE }

Static initializer.
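The static initializer pattern can be sketched as follows. The layout of string_t (a pointer plus a length) and string_FREE as { 0, 0 } are assumptions made for this example; the real C-kern definitions may differ. url_parts_t is modeled as a plain array of 7 strings, as described above, and all_parts_free is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed string_t layout: start address and length in bytes. */
typedef struct string_t {
    const uint8_t *addr;
    size_t         size;
} string_t;

#define string_FREE { 0, 0 }

/* Array of 7 strings, one per URL part. */
typedef string_t url_parts_t[7];

#define url_parts_FREE { string_FREE, string_FREE, string_FREE, string_FREE, string_FREE, string_FREE, string_FREE }

/* Returns 1 if every part is still in the freed (empty) state, else 0. */
static int all_parts_free(const url_parts_t parts)
{
    for (int i = 0; i < 7; ++i) {
        if (parts[i].addr != NULL || parts[i].size != 0) {
            return 0;
        }
    }
    return 1;
}
```

Declaring `url_parts_t parts = url_parts_FREE;` puts every part into a defined initial state before newparts_url-style code fills in individual entries.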

Macros

getpart_url

Implements url_t.getpart_url.

fragment_url

#define fragment_url(url) getpart_url(url, url_part_FRAGMENT)

Implements url_t.fragment_url.

hostname_url

#define hostname_url(url) getpart_url(url, url_part_HOSTNAME)

Implements url_t.hostname_url.

passwd_url

#define passwd_url(url) getpart_url(url, url_part_PASSWD)

Implements url_t.passwd_url.

path_url

#define path_url(url) getpart_url(url, url_part_PATH)

Implements url_t.path_url.

port_url

#define port_url(url) getpart_url(url, url_part_PORT)

Implements url_t.port_url.

query_url

#define query_url(url) getpart_url(url, url_part_QUERY)

Implements url_t.query_url.

user_url

#define user_url(url) getpart_url(url, url_part_USER)

Implements url_t.user_url.
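The macros above route every accessor through the single function getpart_url. The sketch below mirrors that design with a hypothetical stand-in type demo_url_t; the order of the url_part_e values and the url_part__NROF count are assumptions, since the actual enumeration is not listed in this documentation.

```c
#include <stddef.h>

/* Hypothetical part indices; the actual url_part_e values may differ. */
enum url_part_e {
    url_part_USER, url_part_PASSWD, url_part_HOSTNAME, url_part_PORT,
    url_part_PATH, url_part_QUERY, url_part_FRAGMENT, url_part__NROF
};

/* Hypothetical stand-in for url_t: one NUL-terminated string per part,
 * or NULL if the part is undefined. */
typedef struct demo_url_t {
    const char *parts[url_part__NROF];
} demo_url_t;

/* Single lookup function shared by all accessors. */
static const char *getpart_url(const demo_url_t *url, enum url_part_e part)
{
    return url->parts[part];
}

/* Each accessor is a thin macro over getpart_url, as in the header. */
#define fragment_url(url) getpart_url(url, url_part_FRAGMENT)
#define hostname_url(url) getpart_url(url, url_part_HOSTNAME)
#define passwd_url(url)   getpart_url(url, url_part_PASSWD)
#define path_url(url)     getpart_url(url, url_part_PATH)
#define port_url(url)     getpart_url(url, url_part_PORT)
#define query_url(url)    getpart_url(url, url_part_QUERY)
#define user_url(url)     getpart_url(url, url_part_USER)
```

With this design only one function has to be maintained; the trade-off is that the accessors cannot be taken by address like real functions.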

URLs are used to `locate’ resources, by providing an abstract identification of the resource location.
Implements URL.
typedef struct url_t url_t
Exports url_t.
struct url_t
Describes URL which has the common Internet scheme syntax.
Defines url_parts_t as an array of 7 strings.
int unittest_io_url(void)
Unittest for parsing URLs from strings.
uint16_t scheme
The url scheme.
int new_url(/*out*/url_t **url,
const char *encodedstr)
Parses full url from encoded string and fills in all components.
int new2_url(/*out*/url_t **url,
url_scheme_e scheme,
const char *encodedstr)
Parses url from encoded string and fills in all components.
int newparts_url(/*out*/url_t **url,
url_scheme_e scheme,
url_parts_t *parts,
bool are_parts_encoded)
Fills in url components from substrings describing components.
int delete_url(url_t **url)
Frees all resources bound to url.
int encode_url(const url_t *url,
/*ret*/struct wbuffer_t *encoded_url_string)
Encodes all parts and combines them into one string.
const char * getpart_url(const url_t *url,
enum url_part_e part)
Return part of url as string.
const char * fragment_url(const url_t *url)
Returns the anchor/fragment part of the url.
const char * hostname_url(const url_t *url)
Returns name of IP network node or NULL if undefined.
const char * passwd_url(const url_t *url)
Returns password or NULL if undefined.
const char * path_url(const url_t *url)
Returns password or NULL if undefined.
const char * port_url(const url_t *url)
Returns port or NULL if undefined.
const char * query_url(const url_t *url)
Returns the query part of the url.
const char * user_url(const url_t *url)
Returns username or NULL if undefined.
#define url_parts_FREE { string_FREE, string_FREE, string_FREE, string_FREE, string_FREE, string_FREE, string_FREE }
Static initializer.
#define fragment_url(url) getpart_url(url, url_part_FRAGMENT)
Implements url_t.fragment_url.
#define hostname_url(url) getpart_url(url, url_part_HOSTNAME)
Implements url_t.hostname_url.
#define passwd_url(url) getpart_url(url, url_part_PASSWD)
Implements url_t.passwd_url.
#define path_url(url) getpart_url(url, url_part_PATH)
Implements url_t.path_url.
#define port_url(url) getpart_url(url, url_part_PORT)
Implements url_t.port_url.
#define query_url(url) getpart_url(url, url_part_QUERY)
Implements url_t.query_url.
#define user_url(url) getpart_url(url, url_part_USER)
Implements url_t.user_url.
Close