Axiom Docs

Use the unicode_codepoints_from_string function in APL to convert a UTF-8 string into an array of Unicode code points. This function is useful when you want to analyze or transform strings at the character encoding level, especially in multilingual datasets, log inspection, or byte-level debugging. You can use this function to detect non-printable or non-ASCII characters, analyze internationalized content, or perform detailed comparisons between strings that look visually similar but differ in underlying code points.
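To make the point about visually similar strings concrete, here is a small Python sketch. It is not APL and does not reflect how Axiom implements the function. The helper name differing_codepoints and the sample strings are hypothetical. It compares two strings that render almost identically and reports where their code points differ.

```python
def differing_codepoints(a: str, b: str) -> list[tuple[int, int, int]]:
    """Return (index, code point in a, code point in b) for each position where a and b differ."""
    return [
        (i, ord(x), ord(y))
        for i, (x, y) in enumerate(zip(a, b))
        if x != y
    ]

latin = "paypal"
mixed = "p\u0430ypal"  # second character is CYRILLIC SMALL LETTER A (U+0430)

print(differing_codepoints(latin, mixed))  # [(1, 97, 1072)]
```

The two strings look the same in most fonts, but position 1 holds code point 97 in one and 1072 in the other.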

For users of other query languages

If you come from other query languages, this section explains how to adjust your existing queries to achieve the same results in APL.

Splunk SPL users

In Splunk SPL, working with Unicode code points requires using eval expressions with ord or custom logic, which can be verbose. APL offers a built-in function for this, making it concise and efficient.

| eval codepoints=split(mvjoin(map(split("abc", ""), ord('<<FIELD>>')), ","), ",")

ANSI SQL users

ANSI SQL does not have a native function to extract Unicode code points. You typically need to use platform-specific functions or procedural logic. In APL, this is a single function call.

-- Requires procedural logic or platform-specific functions like ASCII(), UNICODE(), etc.

Usage

Syntax

unicode_codepoints_from_string(source)

Parameters

Name	Type	Description
source	string	The input UTF-8 string to convert.

Returns

An array of integers, where each integer is the Unicode code point of the corresponding character in the input string.
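As a rough illustration of the return value, the following Python sketch models the conversion with the built-in ord. The helper name codepoints_from_string is hypothetical and is not part of APL. It only mirrors the documented behavior of one integer per character.

```python
def codepoints_from_string(source: str) -> list[int]:
    """Model of the documented behavior: one Unicode code point per character.

    This is an illustrative stand-in, not Axiom's implementation.
    """
    return [ord(ch) for ch in source]

print(codepoints_from_string("abc"))  # [97, 98, 99]
print(codepoints_from_string("£"))    # [163]
```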

Use case examples

Use this function to identify unusual characters in request URLs that might indicate obfuscated attacks or encoding issues.

Query

['sample-http-logs']
| limit 100
| extend codepoints = unicode_codepoints_from_string(uri)
| mv-expand codepoints
| where codepoints < 32 or codepoints > 126
| project _time, uri, codepoints

Output

_time	uri	codepoints
2025-07-27T12:00:00Z	/api/v1/textdata/background/change£	163

This query flags URIs that contain characters outside the printable ASCII range (code points below 32 or above 126), helping you identify suspicious or malformed requests.
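The filtering step in the query can be sketched in Python to show which code points it keeps. This is an illustrative model, not APL. The helper name non_printable_ascii is hypothetical, and the sample URI is taken from the output row above.

```python
def non_printable_ascii(uri: str) -> list[int]:
    """Return the code points in uri that fall outside the printable ASCII range 32..126."""
    return [cp for cp in map(ord, uri) if cp < 32 or cp > 126]

uri = "/api/v1/textdata/background/change£"
print(non_printable_ascii(uri))  # [163]
```

Only the trailing £ (code point 163) is outside the range, which matches the single row in the output table.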

List of related functions

array_concat: Combines multiple arrays. Useful when merging code point arrays from different strings.
array_length: Returns the number of elements in an array. Use it to check how many code points a string contains.
parse_path: Parses a path into components. Use it with unicode_codepoints_from_string when decoding or inspecting URL paths.
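unicode_codepoints_to_string: Converts an array of Unicode code points back into a string. Use it to reverse the output of unicode_codepoints_from_string.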
