Spark SQL Built-in Functions: String Functions

concat: concatenates strings

concat(str1, str2, ..., strN) - Returns the concatenation of str1, str2, ..., strN.

Examples:
> SELECT concat('Spark', 'SQL');
SparkSQL

concat_ws: concatenates strings with a separator inserted between them

concat_ws(sep, [str | array(str)]+) - Returns the concatenation of the strings separated by `sep`.

Examples:
> SELECT concat_ws(' ', 'Spark', 'SQL');
Spark SQL
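
To make the separator behavior concrete, here is a minimal plain-Python sketch of what concat_ws does. It is not Spark's implementation; the helper name mirrors the SQL function only for readability, and it assumes that null (None) inputs are skipped and that a null separator yields null.

```python
def concat_ws(sep, *args):
    """Join strings (or lists of strings) with sep, skipping None values."""
    if sep is None:
        return None  # assumption: a null separator gives a null result
    parts = []
    for arg in args:
        if arg is None:
            continue  # assumption: null arguments are skipped
        if isinstance(arg, (list, tuple)):
            # An array argument contributes each of its non-null elements
            parts.extend(s for s in arg if s is not None)
        else:
            parts.append(arg)
    return sep.join(parts)


print(concat_ws(" ", "Spark", "SQL"))  # Spark SQL
```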

decode: converts binary data to a string using a character set

decode(bin, charset) - Decodes the first argument using the second argument character set.

Examples:
> SELECT decode(encode('abc', 'utf-8'), 'utf-8');
abc

encode: converts a string to binary using a character set

encode(str, charset) - Encodes the first argument using the second argument character set.

Examples:
> SELECT encode('abc', 'utf-8');
abc

format_string / printf: formats a string

format_string(strfmt, obj, ...) - Returns a formatted string from printf-style format strings.

Examples:
> SELECT format_string("Hello World %d %s", 100, "days");
Hello World 100 days
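
The same printf-style substitution can be tried outside Spark. The sketch below uses Python's `%` operator as a stand-in; Spark relies on Java-style format strings, so only simple specifiers such as `%d` and `%s` should be assumed to behave the same way. The helper name is hypothetical.

```python
def format_string(fmt, *args):
    """Apply printf-style formatting (Python's % operator as a stand-in)."""
    return fmt % args


print(format_string("Hello World %d %s", 100, "days"))  # Hello World 100 days
```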

initcap: capitalizes the first letter of each word and lowercases the rest; lower converts everything to lowercase, upper to uppercase

initcap(str) - Returns `str` with the first letter of each word in uppercase. All other letters are in lowercase. Words are delimited by white space.

Examples:
> SELECT initcap('sPark sql');
Spark Sql

length: returns the length of a string (trailing spaces are counted)

Examples:
> SELECT length('Spark SQL ');
10

levenshtein: edit distance (the number of single-character edits needed to turn one string into another)

levenshtein(str1, str2) - Returns the Levenshtein distance between the two given strings.

Examples:
> SELECT levenshtein('kitten', 'sitting');
3
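
To see where the 3 comes from, the distance can be computed with the standard dynamic-programming recurrence. This is a plain-Python sketch of the textbook algorithm, not Spark's own code.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    # prev[j] = distance between the processed prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (or keep it)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

The three edits are k to s, e to i, and appending g.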

lpad: returns a string of fixed length, left-padded with the given characters if it is too short; rpad pads on the right

lpad(str, len, pad) - Returns `str`, left-padded with `pad` to a length of `len`. If `str` is longer than `len`, the return value is shortened to `len` characters.

Examples:
> SELECT lpad('hi', 5, '??');
???hi
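
The padding and truncation rules can be sketched in plain Python as follows. The function names mirror the SQL ones for readability only; the handling of an empty `pad` and of a non-positive `len` is an assumption of this sketch.

```python
def _padding(fill: int, pad: str) -> str:
    """Repeat pad and cut it to exactly `fill` characters."""
    return (pad * (fill // len(pad) + 1))[:fill]


def lpad(s: str, length: int, pad: str) -> str:
    length = max(length, 0)          # assumption: negative length behaves like 0
    if len(s) >= length or not pad:  # assumption: empty pad leaves nothing to add
        return s[:length]            # longer input is shortened to `length`
    return _padding(length - len(s), pad) + s


def rpad(s: str, length: int, pad: str) -> str:
    length = max(length, 0)
    if len(s) >= length or not pad:
        return s[:length]
    return s + _padding(length - len(s), pad)


print(lpad("hi", 5, "??"))  # ???hi
print(rpad("hi", 5, "??"))  # hi???
print(lpad("hi", 1, "??"))  # h
```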

ltrim: removes leading spaces or the given leading characters; rtrim removes them on the right, trim on both sides

ltrim(str) - Removes the leading space characters from `str`.

ltrim(trimStr, str) - Removes the leading characters of `str` that appear in `trimStr`.

Examples:

> SELECT ltrim(' SparkSQL ');  
SparkSQL
> SELECT ltrim('Sp', 'SSparkSQLS');  
arkSQLS

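regexp_extract: extracts the part of a string that matches a regular expression; regexp_replace: replaces the matches of a regular expression

Note: in Spark 2.0 and later the SQL parser unescapes string literals by default, so a backslash in a pattern may need to be written as \\ (for example '(\\d+)-(\\d+)').

Examples:
> SELECT regexp_extract('100-200', '(\d+)-(\d+)', 1);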
100

> SELECT regexp_replace('100-200', '(\d+)', 'num');
num-num
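
The two examples above can be reproduced with Python's `re` module as a rough stand-in. Spark uses Java regular expressions, so this sketch only covers simple patterns like these; replacement syntax in particular differs (Java writes group references as `$1`, Python as `\1`). The helper names are hypothetical.

```python
import re


def regexp_extract(s: str, pattern: str, idx: int) -> str:
    """Return capture group idx of the first match, or '' when nothing matches."""
    m = re.search(pattern, s)
    if m is None:
        return ""
    return m.group(idx) or ""


def regexp_replace(s: str, pattern: str, replacement: str) -> str:
    """Replace every match of pattern in s with replacement."""
    return re.sub(pattern, replacement, s)


print(regexp_extract("100-200", r"(\d+)-(\d+)", 1))  # 100
print(regexp_replace("100-200", r"(\d+)", "num"))    # num-num
```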

Calls can be nested, for example to strip the quotes around a JSON array and remove backslashes:
> SELECT regexp_replace(regexp_replace(regexp_replace("json_arr","\"\\[","\\["),"\\]\"","\\]"),"\\\\","");

repeat: repeats the given string n times

Examples:
> SELECT repeat('123', 2);
123123

instr / locate: return the position of a substring within a string

instr(str, substr) - Returns the (1-based) index of the first occurrence of `substr` in `str`.

Examples:
> SELECT instr('SparkSQL', 'SQL');
6

> SELECT locate('bar', 'foobarbar');
4
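
Note that the two functions take their arguments in opposite order. A plain-Python sketch of the 1-based lookup follows; the optional `pos` argument of locate and the 0 returned for a start position below 1 are assumptions of this sketch.

```python
def instr(s: str, substr: str) -> int:
    """1-based index of the first occurrence of substr in s, 0 if absent."""
    return s.find(substr) + 1  # str.find returns -1 when not found


def locate(substr: str, s: str, pos: int = 1) -> int:
    """Like instr, but substr comes first and the search starts at 1-based pos."""
    if pos < 1:
        return 0  # assumption: an invalid start position yields 0
    return s.find(substr, pos - 1) + 1


print(instr("SparkSQL", "SQL"))         # 6
print(locate("bar", "foobarbar"))       # 4
print(locate("bar", "foobarbar", 5))    # 7
```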

space: returns a string of n spaces (here used to put n spaces in front of a string)

space(n) - Returns a string consisting of `n` spaces.

Examples:
> SELECT concat(space(2), '1');
  1

split: splits a string around the characters that match a pattern

split(str, regex) - Splits `str` around occurrences that match `regex`.

Examples:
> SELECT split('oneAtwoBthreeC', '[ABC]');
["one","two","three",""]
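
The trailing empty string in the result is easy to overlook: the input ends with a delimiter, so an empty piece follows it. Python's `re.split` behaves the same way for this pattern, which makes it a convenient stand-in; Spark uses Java regular expressions, so more complex patterns (capturing groups, for instance) may differ.

```python
import re


def split(s: str, regex: str) -> list:
    """Split s around every match of regex (Python re as a stand-in)."""
    return re.split(regex, s)


print(split("oneAtwoBthreeC", "[ABC]"))  # ['one', 'two', 'three', '']
```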

substr: extracts a substring; substring_index: returns the part of a string before the given number of delimiter occurrences

Examples:
> SELECT substr('Spark SQL', 5);  
k SQL
> SELECT substr('Spark SQL', -3);  
SQL
> SELECT substr('Spark SQL', 5, 1);  
k
> SELECT substring_index('www.apache.org', '.', 2);  
www.apache
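
The position rules (1-based, negative values counting from the end) can be sketched in plain Python. This is an illustration of the semantics shown in the examples, not Spark's code; it ignores edge cases such as overlapping delimiters, and the behavior for a negative count in substring_index (counting delimiters from the right) is stated here as an assumption.

```python
def substr(s: str, pos: int, length=None) -> str:
    """1-based substring; a negative pos counts from the end of s."""
    if pos > 0:
        start = pos - 1
    elif pos < 0:
        start = len(s) + pos
    else:
        start = 0
    end = len(s) if length is None else start + length
    # Clamp to the string bounds; an empty slice results when end <= start
    return s[max(start, 0):max(end, 0)]


def substring_index(s: str, delim: str, count: int) -> str:
    """Part of s before the count-th delimiter (after it, from the right, if count < 0)."""
    if count == 0 or not delim:
        return ""
    parts = s.split(delim)
    if count > 0:
        return delim.join(parts[:count])
    return delim.join(parts[count:])  # assumption: negative count works from the right


print(substr("Spark SQL", 5))                     # k SQL
print(substr("Spark SQL", -3))                    # SQL
print(substr("Spark SQL", 5, 1))                  # k
print(substring_index("www.apache.org", ".", 2))  # www.apache
```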

translate: replaces individual characters with the characters at the same position in another string

Examples:
> SELECT translate('AaBbCc', 'abc', '123');
A1B2C3
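
translate works character by character rather than on whole substrings: each character of the second argument is mapped to the character at the same position in the third. A plain-Python sketch using `str.translate` follows; the rule that characters without a counterpart are removed is an assumption of this sketch.

```python
def translate(s: str, from_chars: str, to_chars: str) -> str:
    """Map each character in from_chars to the one at the same index in to_chars."""
    table = {}
    for i, ch in enumerate(from_chars):
        if ord(ch) in table:
            continue  # the first mapping for a character wins
        # Assumption: characters with no counterpart in to_chars are deleted
        table[ord(ch)] = to_chars[i] if i < len(to_chars) else None
    return s.translate(table)


print(translate("AaBbCc", "abc", "123"))  # A1B2C3
```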

get_json_object

get_json_object(json_txt, path) - Extracts a json object from `path`.

Examples:
> SELECT get_json_object('{"a":"b"}', '$.a');
b

unhex

unhex(expr) - Converts hexadecimal `expr` to binary.

Examples:
> SELECT decode(unhex('537061726B2053514C'), 'UTF-8');
Spark SQL
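
The round trip from hex digits to text can be checked in plain Python: each pair of hex digits is one byte (53 is S, 70 is p, and so on), and decoding the bytes as UTF-8 gives the string. The helper names mirror the SQL functions for readability only, and this sketch assumes an even number of hex digits.

```python
def unhex(hex_str: str) -> bytes:
    """Convert a string of hex digit pairs into raw bytes."""
    return bytes.fromhex(hex_str)  # requires an even number of digits


def decode(data: bytes, charset: str) -> str:
    """Interpret raw bytes as text in the given character set."""
    return data.decode(charset)


print(decode(unhex("537061726B2053514C"), "UTF-8"))  # Spark SQL
```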

to_json

to_json(expr[, options]) - Returns a json string with a given struct value

Examples:

> SELECT to_json(named_struct('a', 1, 'b', 2));   
{"a":1,"b":2}

> SELECT to_json(named_struct('time', to_timestamp('2015-08-26', 'yyyy-MM-dd')), map('timestampFormat', 'dd/MM/yyyy'));  
{"time":"26/08/2015"}

> SELECT to_json(array(named_struct('a', 1, 'b', 2)));
[{"a":1,"b":2}]

> SELECT to_json(map('a', named_struct('b', 1)));  
{"a":{"b":1}}

> SELECT to_json(map(named_struct('a', 1),named_struct('b', 2)));  
{"[1]":{"b":2}}

> SELECT to_json(map('a', 1));  
{"a":1}

> SELECT to_json(array((map('a', 1))));  
[{"a":1}]
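
For comparison, the compact output format above (no spaces after commas or colons) can be reproduced with Python's `json` module. This is only an analogy that uses dicts and lists in place of Spark structs, maps, and arrays; it does not cover options such as `timestampFormat`.

```python
import json


def to_json(value) -> str:
    """Serialize a dict or list to compact JSON, without extra whitespace."""
    return json.dumps(value, separators=(",", ":"))


print(to_json({"a": 1, "b": 2}))    # {"a":1,"b":2}
print(to_json([{"a": 1, "b": 2}]))  # [{"a":1,"b":2}]
print(to_json({"a": {"b": 1}}))     # {"a":{"b":1}}
```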