Use the unicode_codepoints_from_string function in APL to convert a UTF-8 string into an array of Unicode code points. This function is useful when you want to analyze or transform strings at the character encoding level, especially in multilingual datasets, log inspection, or debugging of encoding issues. You can use it to detect non-printable or non-ASCII characters, analyze internationalized content, or compare strings that look visually similar but differ in their underlying code points.

For users of other query languages

If you come from other query languages, this section explains how to adjust your existing queries to achieve the same results in APL.

Usage

Syntax

unicode_codepoints_from_string(source)

Parameters

Name    Type    Description
source  string  The input UTF-8 string to convert.

Returns

An array of integers, where each integer is the Unicode code point of the corresponding character in the input string.
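
The shape of the return value can be pictured outside APL with a short Python sketch. This is an illustration only, not the APL implementation: the helper name codepoints_from_string is hypothetical, and the sketch assumes the APL function yields one integer per Unicode character, as Python's built-in ord does.

```python
from typing import List


def codepoints_from_string(source: str) -> List[int]:
    """Hypothetical stand-in: one integer code point per character."""
    return [ord(ch) for ch in source]


# One entry per character, not per UTF-8 byte: the pound sign is a single
# code point (163) even though it takes two bytes in UTF-8.
print(codepoints_from_string("change£"))  # [99, 104, 97, 110, 103, 101, 163]
print(len("£".encode("utf-8")))           # 2

# Visually similar strings can differ in code points:
print(codepoints_from_string("a"))        # Latin small letter a: [97]
print(codepoints_from_string("\u0430"))   # Cyrillic small letter a: [1072]
```

The last two calls show why comparing code point arrays can reveal differences that are invisible when the strings are rendered.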

Use case examples

Use this function to identify unusual characters in request URLs that might indicate obfuscated attacks or encoding issues.

Query
['sample-http-logs']
| limit 100
| extend codepoints = unicode_codepoints_from_string(uri)
| mv-expand codepoints
| where codepoints < 32 or codepoints > 126
| project _time, uri, codepoints
Output

_time                 uri                                  codepoints
2025-07-27T12:00:00Z  /api/v1/textdata/background/change£  163
This query flags URIs that contain characters outside the printable ASCII range (code points 32 to 126), helping you identify suspicious or malformed requests.
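
The filter condition in the query can be sketched in Python to show which code points it keeps. This is a rough model under stated assumptions, not APL: the helper name flag_unusual_codepoints is hypothetical, and ord stands in for unicode_codepoints_from_string.

```python
from typing import List


def flag_unusual_codepoints(uri: str) -> List[int]:
    """Hypothetical helper mirroring `where codepoints < 32 or codepoints > 126`."""
    return [cp for cp in map(ord, uri) if cp < 32 or cp > 126]


# The pound sign (code point 163) is the only character outside 32..126.
print(flag_unusual_codepoints("/api/v1/textdata/background/change£"))  # [163]

# Control characters such as a tab (code point 9) are flagged as well.
print(flag_unusual_codepoints("/a\tb"))  # [9]

# A plain ASCII path produces no matches.
print(flag_unusual_codepoints("/index.html"))  # []
```

Each flagged code point corresponds to one row in the query output, because mv-expand emits a separate row per array element.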

List of related functions

  • array_concat: Combines multiple arrays. Useful when merging code point arrays from different strings.
  • array_length: Returns the number of elements in an array. Use it to check how many code points a string contains.
  • parse_path: Parses a path into components. Use it with unicode_codepoints_from_string when decoding or inspecting URL paths.
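  • unicode_codepoints_to_string: Converts an array of Unicode code points back into a string. Use it to reverse the output of unicode_codepoints_from_string.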