package software.uncharted.sparkpipe.ops.core.dataframe.numeric

Numeric pipeline operations that operate on DataFrames.


Value Members

  1. object docs

    Stub object necessary due to https://issues.scala-lang.org/browse/SI-8124

    Documentation for ops.core.dataframe.numeric can be found at software.uncharted.sparkpipe.ops.core.dataframe.numeric

    Attributes
    protected[this]

    See also
    software.uncharted.sparkpipe.ops.core.dataframe.numeric

  2. def enumerate(input: DataFrame): DataFrame

    Convert all compatible columns within a DataFrame into Doubles. Supports source columns of the following types:

    - FloatType
    - DoubleType
    - IntegerType
    - LongType
    - DateType
    - TimestampType

    input
    Input DataFrame to convert

    returns
    Transformed DataFrame, where all suitable columns have been converted to Doubles and incompatible columns have been dropped.

    Exceptions thrown
    java.lang.IllegalArgumentException if the input DataFrame does not contain any compatible columns
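
    Example (a minimal usage sketch; the SparkSession setup and column names below are assumptions for illustration, not part of the library):

      import org.apache.spark.sql.SparkSession
      import software.uncharted.sparkpipe.ops.core.dataframe.numeric

      val spark = SparkSession.builder.appName("enumerate-example").master("local[*]").getOrCreate()
      import spark.implicits._

      // "count" (Int) and "score" (Float) can be converted; "name" (String) is dropped.
      val df = Seq((1, 0.5f, "alpha"), (2, 1.5f, "beta")).toDF("count", "score", "name")
      val doubled = numeric.enumerate(df)  // remaining columns are all DoubleType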

  3. def numericRangeFilter[T](filters: Seq[(String, T, T)], exclude: Boolean = false)(df: DataFrame)(implicit n: Numeric[T]): DataFrame

    A generalized n-dimensional range filter operation. Works on any value type compatible with Numeric.

    filters
    Sequence of (column name, min, max) tuples, one for each dimension of the data

    exclude
    When true, records whose values fall within the specified ranges are removed; when false (the default), records outside the ranges are removed

    df
    DataFrame to apply the filter to

    returns
    Transformed DataFrame, where records outside (or inside, when exclude is true) the specified ranges have been removed.
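
    Example (a minimal usage sketch; assumes a DataFrame df with numeric columns "x" and "y", which are hypothetical names):

      import software.uncharted.sparkpipe.ops.core.dataframe.numeric

      // Keep only records with 0.0 <= x <= 10.0 and 5.0 <= y <= 15.0
      val inRange = numeric.numericRangeFilter(Seq(("x", 0.0, 10.0), ("y", 5.0, 15.0)))(df)

      // Drop records that fall inside those ranges instead
      val outside = numeric.numericRangeFilter(Seq(("x", 0.0, 10.0), ("y", 5.0, 15.0)), exclude = true)(df)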

  4. def summaryStats(sc: SparkContext)(input: DataFrame): Seq[SummaryStats]

    Computes summary statistics using online algorithms for each compatible column in an input DataFrame. Null fields within a row are ignored without causing the summarizer for that column to return NaN. Statistics returned include:

    - min
    - max
    - mean
    - variance
    - normL1
    - normL2
    - numNonzeros

    sc
    SparkContext in which to run the computation

    input
    Input DataFrame to analyze

    returns
    a Seq[SummaryStats], with one OnlineStatSummarizer per compatible column, paired with its column name
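
    Example (a minimal usage sketch; reuses the hypothetical df and spark from the enumerate example, and assumes each SummaryStats pairs a column name with its summarizer as described in the returns section):

      import software.uncharted.sparkpipe.ops.core.dataframe.numeric

      val stats = numeric.summaryStats(spark.sparkContext)(df)
      stats.foreach { case (columnName, summarizer) =>
        // The summarizer carries the statistics listed above (min, max, mean, variance, ...);
        // here we simply print each column name alongside its summarizer.
        println(s"$columnName -> $summarizer")
      }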

  5. package util
