Stub object necessary due to https://issues.scala-lang.org/browse/SI-8124
Documentation for ops.core.dataframe.text
can be found at software.uncharted.sparkpipe.ops.core.dataframe.text
Applies regex matching to text in a field, and includes rows with hits.
Column containing text to match against.
Regex pattern describing text matches.
DataFrame from previous stage
Filtered DataFrame
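The row-level regex filter described above can be sketched on plain Scala collections. This is an illustration of the semantics only, not the sparkpipe implementation: `Doc` and `includeByRegex` are hypothetical names, and the sketch assumes a "hit" means the pattern matches anywhere in the text.

```scala
import scala.util.matching.Regex

object IncludeByRegexSketch {
  // Hypothetical row type standing in for a DataFrame row.
  final case class Doc(id: Int, text: String)

  // Keep rows whose text contains at least one match of the pattern.
  // Assumption: a match anywhere in the text counts as a hit.
  def includeByRegex(pattern: Regex)(rows: Seq[Doc]): Seq[Doc] =
    rows.filter(row => pattern.findFirstIn(row.text).isDefined)
}
```

Writing the op with the configuration in a first parameter list and the input data in a second mirrors the "DataFrame from previous stage" parameter: the partially applied function can be passed along as a pipeline stage.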
Checks for keyword matches in a text field and keeps rows with hits.
Column containing text to match against.
Keywords to match.
True if matching should be case sensitive, False otherwise.
DataFrame from previous stage
Filtered DataFrame
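One way the keyword check and its case-sensitivity flag might behave is sketched below. The source does not say whether keywords match as whole words or as substrings; this sketch assumes whole-word matching, and `containsKeyword` is a hypothetical helper rather than part of the library.

```scala
import java.util.regex.Pattern

object KeywordFilterSketch {
  // Returns true if any keyword occurs in the text as a whole word.
  // Assumption: whole-word matching; the real op may match substrings instead.
  def containsKeyword(text: String, keywords: Seq[String], caseSensitive: Boolean): Boolean = {
    // Quote each keyword so regex metacharacters are treated literally.
    val alternation = keywords.map(Pattern.quote).mkString("|")
    val flags = if (caseSensitive) 0 else Pattern.CASE_INSENSITIVE
    keywords.nonEmpty &&
      Pattern.compile("\\b(?:" + alternation + ")\\b", flags).matcher(text).find()
  }
}
```

A row would be kept when `containsKeyword` returns true for its text column.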
Pipeline op to filter a string array column down to the terms that match a given pattern
The name of an ArrayType(StringType) column in the input DataFrame
A Regex pattern describing words to include
Input pipeline data to filter.
Transformed pipeline data, with non-matching words removed from the specified column
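Unlike the row filters, this op keeps every row and filters the terms inside each array. A minimal sketch on nested sequences follows; `keepMatchingTerms` is a hypothetical name, and the sketch assumes the pattern must match the whole term.

```scala
import scala.util.matching.Regex

object TermPatternSketch {
  // Keep only the terms that match the pattern; rows are never dropped.
  // Assumption: the pattern must match the entire term, not a substring.
  def keepMatchingTerms(pattern: Regex)(rows: Seq[Seq[String]]): Seq[Seq[String]] =
    rows.map(_.filter(term => pattern.pattern.matcher(term).matches()))
}
```

A row whose terms all fail the pattern ends up with an empty array rather than disappearing from the output.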
Pipeline op to filter a string array column down to terms of interest
The name of an ArrayType(StringType) column in the input DataFrame
A Set[String] of words to filter to
Input pipeline data to filter.
Transformed pipeline data, with the specified column filtered down to terms of interest
Apply a transformation to every String in an Array[String] column.
The name of an ArrayType(StringType) column in the input DataFrame
A transformation function String => O
Input pipeline data to transform
Transformed pipeline data, with the mapFcn applied to every term in every row of the Array[String] column
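The term mapping can be pictured as a nested `map` that is generic in the output type `O`. The sketch below uses plain sequences in place of the DataFrame column, and `mapTerms` is a hypothetical name chosen for illustration.

```scala
object MapTermsSketch {
  // Apply mapFcn to every term of every row, preserving row and term order.
  def mapTerms[O](mapFcn: String => O)(rows: Seq[Seq[String]]): Seq[Seq[O]] =
    rows.map(_.map(mapFcn))
}
```

For example, passing `(s: String) => s.toLowerCase` would normalize case, while `(s: String) => s.length` would turn a column of terms into a column of term lengths.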
Removes all occurrences of pattern in a String column
The name of a String column in the input DataFrame
A regular expression describing the text to remove
Input pipeline data to transform
Transformed pipeline data, with instances of the given pattern in input removed
Replaces all occurrences of pattern in a String column with sub
The name of a String column in the input DataFrame
A regular expression describing the text to replace
The string to substitute for each match of the pattern
Input pipeline data to transform
Transformed pipeline data, with instances of the given pattern in input replaced with sub
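Replacement and removal are closely related: removal behaves like replacement with an empty substitute. The sketch below illustrates that relationship on a plain sequence of strings, using `String.replaceAll` from the Java standard library; `replaceAll` and `removeAll` as written here are hypothetical helpers, not the library's ops.

```scala
object ReplaceSketch {
  // Replace every match of the regex `pattern` with `sub` in each string.
  // Note: String.replaceAll gives `$` and `\` in `sub` a special meaning.
  def replaceAll(pattern: String, sub: String)(column: Seq[String]): Seq[String] =
    column.map(_.replaceAll(pattern, sub))

  // Removal expressed as replacement with the empty string.
  def removeAll(pattern: String)(column: Seq[String]): Seq[String] =
    replaceAll(pattern, "")(column)
}
```

If the substitute may contain `$` or backslashes that should be taken literally, `java.util.regex.Matcher.quoteReplacement` can be applied to it first.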
Splits a String column into an Array[String] column using a delimiter (whitespace, by default)
The name of a String column in the input DataFrame
A delimiter to split the String column on (whitespace, by default)
Input pipeline data to transform
Transformed pipeline data, with the given string column split on the delimiter
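A short sketch of the split step on plain strings follows. The default of `"\\s+"` is an assumption standing in for "whitespace"; the source does not give the exact default value, and `splitColumn` is a hypothetical name.

```scala
object SplitSketch {
  // Split each string on a regex delimiter.
  // Assumption: the whitespace default is the regex \s+ (one or more whitespace characters).
  def splitColumn(delimiter: String = "\\s+")(column: Seq[String]): Seq[Seq[String]] =
    column.map(_.split(delimiter).toSeq)
}
```

With `String.split`, text that starts with the delimiter yields an empty first term, so trimming the input beforehand may be worthwhile.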
Applies regex matching to text in a field, and excludes rows with hits.
Column containing text to match against.
Regex pattern describing text matches.
DataFrame from previous stage
Filtered DataFrame
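Exclusion is the complement of inclusion: every row lands in exactly one of the two outputs. The sketch below makes that explicit with `partition` on plain strings. `partitionByRegex` is a hypothetical helper, and it assumes a hit is a match anywhere in the text.

```scala
import scala.util.matching.Regex

object ExcludeByRegexSketch {
  // Returns (hits, misses). An include-style filter would keep `hits`;
  // the exclude-style filter described above would keep `misses`.
  def partitionByRegex(pattern: Regex)(texts: Seq[String]): (Seq[String], Seq[String]) =
    texts.partition(text => pattern.findFirstIn(text).isDefined)
}
```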
Checks for keyword matches in a text field and removes rows with hits.
Column containing text to match against.
Keywords to match.
True if matching should be case sensitive, False otherwise.
DataFrame from previous stage
Filtered DataFrame
Pipeline op to remove terms matching a stop pattern from a string array column
The name of an ArrayType(StringType) column in the input DataFrame
A Regex pattern describing words to remove
Input pipeline data to filter.
Transformed pipeline data, with matching words removed from the specified column
Pipeline op to remove stop words from a string array column
The name of an ArrayType(StringType) column in the input DataFrame
A Set[String] of words to remove
Input pipeline data to filter.
Transformed pipeline data, with stop words removed from the specified column
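Stop word removal amounts to a set-membership test on each term. The sketch below assumes exact, case-sensitive comparison, which the source does not confirm; `removeStopWords` is a hypothetical name.

```scala
object StopWordsSketch {
  // Drop every term that appears in the stop word set.
  // Assumption: comparison is exact and case sensitive.
  def removeStopWords(stopWords: Set[String])(rows: Seq[Seq[String]]): Seq[Seq[String]] =
    rows.map(_.filterNot(stopWords.contains))
}
```

Using a `Set` keeps each lookup close to constant time, which matters when the stop list is long and the column holds many terms.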
Produces a Map[String, Int] of unique terms from an Array[String] column along with associated counts
The name of an ArrayType(StringType) column in the input DataFrame
Input pipeline data to analyze
the Map[String, Int] of unique terms and their counts
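The term count can be sketched as a flatten-and-group over all rows. This assumes the count is the total number of occurrences across the column rather than the number of rows containing the term; the source does not say which. `uniqueTerms` here is a hypothetical, in-memory stand-in for the op.

```scala
object UniqueTermsSketch {
  // Count how many times each distinct term occurs across all rows.
  // Assumption: counts are total occurrences, not per-row presence.
  def uniqueTerms(rows: Seq[Seq[String]]): Map[String, Int] =
    rows.flatten
      .groupBy(identity)
      .map { case (term, occurrences) => term -> occurrences.size }
}
```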
Common pipeline operations for dealing with textual data