package software.uncharted.sparkpipe.ops.core.dataframe.numeric

Numeric pipeline operations that operate on DataFrames.


Value Members

  1. object docs

    Stub object necessary due to https://issues.scala-lang.org/browse/SI-8124

    Documentation for ops.core.dataframe.numeric can be found at software.uncharted.sparkpipe.ops.core.dataframe.numeric

    Attributes
    protected[this]

    See also
    software.uncharted.sparkpipe.ops.core.dataframe.numeric

  2. def enumerate(input: DataFrame): DataFrame

    Convert all compatible columns within a DataFrame into Doubles. Supports source columns of the following types:

    - FloatType
    - DoubleType
    - IntegerType
    - LongType
    - DateType
    - TimestampType

    input
    Input DataFrame to convert

    returns
    Transformed DataFrame, where all suitable columns have been converted to Doubles and incompatible columns have been dropped.

    Exceptions thrown
    java.lang.IllegalArgumentException if the input DataFrame does not contain any compatible columns
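
    Example (a minimal usage sketch; the SparkSession setup and column names below are assumptions for illustration, not part of the library):

      import org.apache.spark.sql.SparkSession
      import software.uncharted.sparkpipe.ops.core.dataframe.numeric

      val spark = SparkSession.builder.appName("enumerate-example").master("local[*]").getOrCreate()
      import spark.implicits._

      // "count" (Int) and "score" (Float) can be converted; "name" (String) is dropped.
      val df = Seq((1, 0.5f, "alpha"), (2, 1.5f, "beta")).toDF("count", "score", "name")
      val doubled = numeric.enumerate(df)  // remaining columns are all DoubleType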

  3. def numericRangeFilter[T](filters: Seq[(String, T, T)], exclude: Boolean = false)(df: DataFrame)(implicit n: Numeric[T]): DataFrame

    A generalized n-dimensional range filter operation. Works on any value type compatible with Numeric.

    filters
    Sequence of (column name, min, max) tuples, one for each dimension of the data

    exclude
    When true, records whose values fall within the specified ranges are removed; when false (the default), records outside the ranges are removed

    df
    DataFrame to apply the filter to

    returns
    Transformed DataFrame, where records outside (or inside, when exclude is true) the specified ranges have been removed.
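
    Example (a minimal usage sketch; assumes a DataFrame df with numeric columns "x" and "y", which are hypothetical names):

      import software.uncharted.sparkpipe.ops.core.dataframe.numeric

      // Keep only records with 0.0 <= x <= 10.0 and 5.0 <= y <= 15.0
      val inRange = numeric.numericRangeFilter(Seq(("x", 0.0, 10.0), ("y", 5.0, 15.0)))(df)

      // Drop records that fall inside those ranges instead
      val outside = numeric.numericRangeFilter(Seq(("x", 0.0, 10.0), ("y", 5.0, 15.0)), exclude = true)(df)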

  4. def summaryStats(sc: SparkContext)(input: DataFrame): Seq[SummaryStats]

    Computes summary statistics using online algorithms for each compatible column in an input DataFrame. Null fields within a row are ignored without causing the summarizer for that column to return NaN. Statistics returned include:

    - min
    - max
    - mean
    - variance
    - normL1
    - normL2
    - numNonzeros

    sc
    SparkContext in which to run the computation

    input
    Input DataFrame to analyze

    returns
    a Seq[SummaryStats], with one OnlineStatSummarizer per compatible column, paired with its column name
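
    Example (a minimal usage sketch; reuses the hypothetical df and spark from the enumerate example, and assumes each SummaryStats pairs a column name with its summarizer as described in the returns section):

      import software.uncharted.sparkpipe.ops.core.dataframe.numeric

      val stats = numeric.summaryStats(spark.sparkContext)(df)
      stats.foreach { case (columnName, summarizer) =>
        // The summarizer carries the statistics listed above (min, max, mean, variance, ...);
        // here we simply print each column name alongside its summarizer.
        println(s"$columnName -> $summarizer")
      }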

  5. package util
