pyspark.sql.DataFrameNaFunctions.fill
- DataFrameNaFunctions.fill(value, subset=None)
- Returns a new DataFrame in which null values are replaced with the given value.
- DataFrame.fillna() and DataFrameNaFunctions.fill() are aliases of each other.
- New in version 1.3.1.
- Changed in version 3.4.0: Supports Spark Connect.
- Parameters
- value: int, float, string, bool or dict
- The value to replace null values with. If the value is a dict, then subset is ignored and value must be a mapping from column name (string) to replacement value. The replacement value must be an int, float, boolean, or string.
- subset: str, tuple or list, optional
- Optional list of column names to consider. Columns specified in subset that do not have matching data types are ignored. For example, if value is a string and subset contains a non-string column, the non-string column is simply ignored.
 
- Returns
- DataFrame
- DataFrame with replaced null values. 
 
- Examples

>>> df = spark.createDataFrame([
...     (10, 80.5, "Alice", None),
...     (5, None, "Bob", None),
...     (None, None, "Tom", None),
...     (None, None, None, True)],
...     schema=["age", "height", "name", "bool"])

Example 1: Fill null values in all numeric columns with 50.

>>> df.na.fill(50).show()
+---+------+-----+----+
|age|height| name|bool|
+---+------+-----+----+
| 10|  80.5|Alice|NULL|
|  5|  50.0|  Bob|NULL|
| 50|  50.0|  Tom|NULL|
| 50|  50.0| NULL|true|
+---+------+-----+----+

Example 2: Fill null values in all boolean columns with False.

>>> df.na.fill(False).show()
+----+------+-----+-----+
| age|height| name| bool|
+----+------+-----+-----+
|  10|  80.5|Alice|false|
|   5|  NULL|  Bob|false|
|NULL|  NULL|  Tom|false|
|NULL|  NULL| NULL| true|
+----+------+-----+-----+

Example 3: Fill null values with 50 and "unknown" in the 'age' and 'name' columns, respectively.

>>> df.na.fill({'age': 50, 'name': 'unknown'}).show()
+---+------+-------+----+
|age|height|   name|bool|
+---+------+-------+----+
| 10|  80.5|  Alice|NULL|
|  5|  NULL|    Bob|NULL|
| 50|  NULL|    Tom|NULL|
| 50|  NULL|unknown|true|
+---+------+-------+----+

Example 4: Fill null values in the 'name' column with "Spark".

>>> df.na.fill(value='Spark', subset='name').show()
+----+------+-----+----+
| age|height| name|bool|
+----+------+-----+----+
|  10|  80.5|Alice|NULL|
|   5|  NULL|  Bob|NULL|
|NULL|  NULL|  Tom|NULL|
|NULL|  NULL|Spark|true|
+----+------+-----+----+