pyspark.sql.functions.array_except
pyspark.sql.functions.array_except(col1, col2)
Array function: returns a new array containing the elements present in col1 but not in col2, without duplicates.
New in version 2.4.0.
Changed in version 3.4.0: Supports Spark Connect.
Parameters
col1 : Column or str
    Name of column containing array.
col2 : Column or str
    Name of column containing array.
Returns
Column
    A new array containing the elements present in col1 but not in col2.
Notes
This function does not preserve the order of the elements in the input arrays.
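The result is also deduplicated: an element that appears several times in col1 is returned at most once. A minimal sketch illustrating this (not one of the official examples below; it assumes the same active spark session and wraps the result in sort_array() for a deterministic order):

>>> from pyspark.sql import Row, functions as sf
>>> df = spark.createDataFrame([Row(c1=["a", "a", "b", "c"], c2=["c"])])
>>> df.select(sf.sort_array(sf.array_except(df.c1, df.c2))).show()
+--------------------------------------+
|sort_array(array_except(c1, c2), true)|
+--------------------------------------+
|                                [a, b]|
+--------------------------------------+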
Examples
Example 1: Basic usage
>>> from pyspark.sql import Row, functions as sf
>>> df = spark.createDataFrame([Row(c1=["b", "a", "c"], c2=["c", "d", "a", "f"])])
>>> df.select(sf.array_except(df.c1, df.c2)).show()
+--------------------+
|array_except(c1, c2)|
+--------------------+
|                 [b]|
+--------------------+
Example 2: Except with no common elements
>>> from pyspark.sql import Row, functions as sf
>>> df = spark.createDataFrame([Row(c1=["b", "a", "c"], c2=["d", "e", "f"])])
>>> df.select(sf.sort_array(sf.array_except(df.c1, df.c2))).show()
+--------------------------------------+
|sort_array(array_except(c1, c2), true)|
+--------------------------------------+
|                             [a, b, c]|
+--------------------------------------+
Example 3: Except with all common elements
>>> from pyspark.sql import Row, functions as sf
>>> df = spark.createDataFrame([Row(c1=["a", "b", "c"], c2=["a", "b", "c"])])
>>> df.select(sf.array_except(df.c1, df.c2)).show()
+--------------------+
|array_except(c1, c2)|
+--------------------+
|                  []|
+--------------------+
Example 4: Except with null values
>>> from pyspark.sql import Row, functions as sf
>>> df = spark.createDataFrame([Row(c1=["a", "b", None], c2=["a", None, "c"])])
>>> df.select(sf.array_except(df.c1, df.c2)).show()
+--------------------+
|array_except(c1, c2)|
+--------------------+
|                 [b]|
+--------------------+
Example 5: Except with empty arrays
>>> from pyspark.sql import Row, functions as sf
>>> from pyspark.sql.types import ArrayType, StringType, StructField, StructType
>>> data = [Row(c1=[], c2=["a", "b", "c"])]
>>> schema = StructType([
...     StructField("c1", ArrayType(StringType()), True),
...     StructField("c2", ArrayType(StringType()), True)
... ])
>>> df = spark.createDataFrame(data, schema)
>>> df.select(sf.array_except(df.c1, df.c2)).show()
+--------------------+
|array_except(c1, c2)|
+--------------------+
|                  []|
+--------------------+
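Example 6: Calling the function from Spark SQL

array_except is also exposed under the same name as a Spark SQL function (since 2.4.0). A minimal sketch, assuming an active spark session as in the examples above; the alias r is arbitrary:

>>> spark.sql("SELECT array_except(array('b', 'a', 'c'), array('c', 'd', 'a', 'f')) AS r").show()
+---+
|  r|
+---+
|[b]|
+---+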