prefuse.util
Class DataLib

java.lang.Object
  extended by prefuse.util.DataLib

public class DataLib
extends java.lang.Object

Functions for processing an iterator of tuples, including the creation of arrays of particular tuple data values and summary statistics (min, max, median, mean, standard deviation).

Author:
jeffrey heer

Constructor Summary
DataLib()
           
 
Method Summary
static int count(java.lang.Iterable<? extends Tuple<?>> tuples, java.lang.String field)
          Get the number of values in a data column.
static double deviation(java.lang.Iterable<? extends Tuple<?>> tuples, java.lang.String field)
          Get the standard deviation of a tuple data field.
static double deviation(java.lang.Iterable<? extends Tuple<?>> tuples, java.lang.String field, double mean)
          Get the standard deviation of a tuple data field, using a precomputed mean.
static java.lang.Class<?> inferType(TupleSet<? extends Tuple<?>> tuples, java.lang.String field)
          Infer the data field type across all tuples in a TupleSet.
static Tuple<?> max(java.lang.Iterable<? extends Tuple<?>> tuples, java.lang.String field)
          Get the Tuple with the maximum data field value.
static Tuple<?> max(java.lang.Iterable<? extends Tuple<?>> tuples, java.lang.String field, java.util.Comparator<java.lang.Object> cmp)
          Get the Tuple with the maximum data field value.
static Tuple<?> max(TupleSet<? extends Tuple<?>> tuples, java.lang.String field)
          Get the Tuple with the maximum data field value.
static Tuple<?> max(TupleSet<? extends Tuple<?>> tuples, java.lang.String field, java.util.Comparator<java.lang.Object> cmp)
          Get the Tuple with the maximum data field value.
static double mean(java.lang.Iterable<? extends Tuple<?>> tuples, java.lang.String field)
          Get the mean value of a tuple data field.
static Tuple<?> median(java.lang.Iterable<? extends Tuple<?>> tuples, java.lang.String field)
          Get the Tuple with the median data field value.
static Tuple<?> median(java.lang.Iterable<? extends Tuple<?>> tuples, java.lang.String field, java.util.Comparator<java.lang.Object> cmp)
          Get the Tuple with the median data field value.
static Tuple<?> median(TupleSet<? extends Tuple<?>> tuples, java.lang.String field)
          Get the Tuple with the median data field value.
static Tuple<?> median(TupleSet<? extends Tuple<?>> tuples, java.lang.String field, java.util.Comparator<java.lang.Object> cmp)
          Get the Tuple with the median data field value.
static Tuple<?> min(java.lang.Iterable<? extends Tuple<?>> tuples, java.lang.String field)
          Get the Tuple with the minimum data field value.
static Tuple<?> min(java.lang.Iterable<? extends Tuple<?>> tuples, java.lang.String field, java.util.Comparator<java.lang.Object> cmp)
          Get the Tuple with the minimum data field value.
static Tuple<?> min(TupleSet<? extends Tuple<?>> tuples, java.lang.String field)
          Get the Tuple with the minimum data field value.
static Tuple<?> min(TupleSet<? extends Tuple<?>> tuples, java.lang.String field, java.util.Comparator<java.lang.Object> cmp)
          Get the Tuple with the minimum data field value.
static java.lang.Object[] ordinalArray(java.lang.Iterable<? extends Tuple<?>> tuples, java.lang.String field)
          Get a sorted array containing all column values for a given tuple iterator and field.
static java.lang.Object[] ordinalArray(java.lang.Iterable<? extends Tuple<?>> tuples, java.lang.String field, java.util.Comparator<java.lang.Object> cmp)
          Get an array containing all column values for a given tuple iterator and field, sorted using the given comparator.
static java.lang.Object[] ordinalArray(TupleSet<? extends Tuple<?>> tuples, java.lang.String field)
          Get a sorted array containing all column values for a given TupleSet and field.
static java.lang.Object[] ordinalArray(TupleSet<? extends Tuple<?>> tuples, java.lang.String field, java.util.Comparator<java.lang.Object> cmp)
          Get an array containing all column values for a given TupleSet and field, sorted using the given comparator.
static java.util.Map<java.lang.Object,java.lang.Integer> ordinalMap(java.lang.Iterable<? extends Tuple<?>> tuples, java.lang.String field)
          Get a map from column values (as Object instances) to their ordinal index in a sorted array.
static java.util.Map<java.lang.Object,java.lang.Integer> ordinalMap(java.lang.Iterable<? extends Tuple<?>> tuples, java.lang.String field, java.util.Comparator<java.lang.Object> cmp)
          Get a map from column values (as Object instances) to their ordinal index in a sorted array.
static java.util.Map<java.lang.Object,java.lang.Integer> ordinalMap(TupleSet<? extends Tuple<?>> tuples, java.lang.String field)
          Get a map from column values (as Object instances) to their ordinal index in a sorted array.
static java.util.Map<java.lang.Object,java.lang.Integer> ordinalMap(TupleSet<? extends Tuple<?>> tuples, java.lang.String field, java.util.Comparator<java.lang.Object> cmp)
          Get a map from column values (as Object instances) to their ordinal index in a sorted array.
static double sum(java.lang.Iterable<? extends Tuple<?>> tuples, java.lang.String field)
          Get the sum of a tuple data field.
static java.lang.Object[] toArray(java.lang.Iterable<? extends Tuple<?>> tuples, java.lang.String field)
          Get an array containing all data values for a given tuple iterator and field.
static double[] toDoubleArray(java.lang.Iterable<? extends Tuple<?>> tuples, java.lang.String field)
          Get an array of doubles containing all column values for a given tuple iterator and field.
static int uniqueCount(java.lang.Iterable<? extends Tuple<?>> tuples, java.lang.String field)
          Get the number of distinct values in a data column.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

DataLib

public DataLib()

Method Detail

toArray

public static java.lang.Object[] toArray(java.lang.Iterable<? extends Tuple<?>> tuples,
                                         java.lang.String field)
Get an array containing all data values for a given tuple iterator and field.

Parameters:
tuples - an iterator over tuples
field - the column / data field name
Returns:
an array containing the data values

toDoubleArray

public static double[] toDoubleArray(java.lang.Iterable<? extends Tuple<?>> tuples,
                                     java.lang.String field)
Get an array of doubles containing all column values for a given tuple iterator and field. The Table.canGetDouble(String) method must return true for the given column name; otherwise an exception is thrown.

Parameters:
tuples - an iterator over tuples
field - the column / data field name
Returns:
an array of doubles containing the column values

ordinalArray

public static java.lang.Object[] ordinalArray(java.lang.Iterable<? extends Tuple<?>> tuples,
                                              java.lang.String field)
Get a sorted array containing all column values for a given tuple iterator and field.

Parameters:
tuples - an iterator over tuples
field - the column / data field name
Returns:
a sorted array containing the column values

ordinalArray

public static java.lang.Object[] ordinalArray(java.lang.Iterable<? extends Tuple<?>> tuples,
                                              java.lang.String field,
                                              java.util.Comparator<java.lang.Object> cmp)
Get an array containing all column values for a given tuple iterator and field, sorted using the given comparator.

Parameters:
tuples - an iterator over tuples
field - the column / data field name
cmp - a comparator for sorting the column contents
Returns:
a sorted array containing the column values

ordinalArray

public static java.lang.Object[] ordinalArray(TupleSet<? extends Tuple<?>> tuples,
                                              java.lang.String field)
Get a sorted array containing all column values for a given TupleSet and field.

Parameters:
tuples - a TupleSet
field - the column / data field name
Returns:
a sorted array containing the column values

ordinalArray

public static java.lang.Object[] ordinalArray(TupleSet<? extends Tuple<?>> tuples,
                                              java.lang.String field,
                                              java.util.Comparator<java.lang.Object> cmp)
Get an array containing all column values for a given TupleSet and field, sorted using the given comparator.

Parameters:
tuples - a TupleSet
field - the column / data field name
cmp - a comparator for sorting the column contents
Returns:
a sorted array containing the column values

ordinalMap

public static java.util.Map<java.lang.Object,java.lang.Integer> ordinalMap(java.lang.Iterable<? extends Tuple<?>> tuples,
                                                                           java.lang.String field)
Get a map from column values (as Object instances) to their ordinal index in a sorted array.

Parameters:
tuples - an iterator over tuples
field - the column / data field name
Returns:
a map mapping column values to their position in a sorted order of values

ordinalMap

public static java.util.Map<java.lang.Object,java.lang.Integer> ordinalMap(java.lang.Iterable<? extends Tuple<?>> tuples,
                                                                           java.lang.String field,
                                                                           java.util.Comparator<java.lang.Object> cmp)
Get a map from column values (as Object instances) to their ordinal index in a sorted array.

Parameters:
tuples - an iterator over tuples
field - the column / data field name
cmp - a comparator for sorting the column contents
Returns:
a map mapping column values to their position in a sorted order of values

ordinalMap

public static java.util.Map<java.lang.Object,java.lang.Integer> ordinalMap(TupleSet<? extends Tuple<?>> tuples,
                                                                           java.lang.String field)
Get a map from column values (as Object instances) to their ordinal index in a sorted array.

Parameters:
tuples - a TupleSet
field - the column / data field name
Returns:
a map mapping column values to their position in a sorted order of values

ordinalMap

public static java.util.Map<java.lang.Object,java.lang.Integer> ordinalMap(TupleSet<? extends Tuple<?>> tuples,
                                                                           java.lang.String field,
                                                                           java.util.Comparator<java.lang.Object> cmp)
Get a map from column values (as Object instances) to their ordinal index in a sorted array.

Parameters:
tuples - a TupleSet
field - the column / data field name
cmp - a comparator for sorting the column contents
Returns:
a map mapping column values to their position in a sorted order of values
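
The relationship between ordinalArray and ordinalMap can be illustrated with a minimal, self-contained sketch. It does not call prefuse: rows are plain java.util.Map instances standing in for tuples, and OrdinalSketch, NATURAL, row and sampleRows are hypothetical names. It also assumes that duplicate values are collapsed before sorting, which this page does not state.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative stand-in, not the prefuse implementation: rows are plain maps.
class OrdinalSketch {
    // Natural ordering for mutually comparable values (hypothetical helper).
    @SuppressWarnings("unchecked")
    static final Comparator<Object> NATURAL =
            (a, b) -> ((Comparable<Object>) a).compareTo(b);

    // Collect the distinct values of one field, then sort them with cmp.
    // Assumption: duplicates are collapsed before sorting.
    static Object[] ordinalArray(Iterable<Map<String, Object>> rows, String field,
                                 Comparator<Object> cmp) {
        Set<Object> distinct = new HashSet<>();
        for (Map<String, Object> row : rows) {
            distinct.add(row.get(field));
        }
        Object[] sorted = distinct.toArray();
        Arrays.sort(sorted, cmp);
        return sorted;
    }

    // Map each distinct value to its index in the sorted array.
    static Map<Object, Integer> ordinalMap(Iterable<Map<String, Object>> rows, String field,
                                           Comparator<Object> cmp) {
        Object[] sorted = ordinalArray(rows, field, cmp);
        Map<Object, Integer> index = new HashMap<>();
        for (int i = 0; i < sorted.length; i++) {
            index.put(sorted[i], i);
        }
        return index;
    }

    // Build a single-field row (hypothetical helper).
    static Map<String, Object> row(Object value) {
        return Collections.<String, Object>singletonMap("fruit", value);
    }

    static List<Map<String, Object>> sampleRows() {
        return Arrays.asList(row("pear"), row("apple"), row("pear"), row("fig"));
    }

    public static void main(String[] args) {
        Map<Object, Integer> index = ordinalMap(sampleRows(), "fruit", NATURAL);
        // prints "0 1 2"
        System.out.println(index.get("apple") + " " + index.get("fig") + " " + index.get("pear"));
    }
}
```

A map of this kind is convenient when a categorical field has to be turned into a position, for example to place categories along an axis in sorted order.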

count

public static int count(java.lang.Iterable<? extends Tuple<?>> tuples,
                        java.lang.String field)
Get the number of values in a data column. Duplicates will be counted.

Parameters:
tuples - an iterator over tuples
field - the column / data field name
Returns:
the number of values

uniqueCount

public static int uniqueCount(java.lang.Iterable<? extends Tuple<?>> tuples,
                              java.lang.String field)
Get the number of distinct values in a data column.

Parameters:
tuples - an iterator over tuples
field - the column / data field name
Returns:
the number of distinct values
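
The difference between count and uniqueCount can be shown with a small sketch over a plain list of values. This is an illustration of the described behavior only, not the prefuse code; CountSketch is a hypothetical name and the column is represented as a simple Iterable of values rather than tuples.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative stand-in, not the prefuse implementation.
class CountSketch {
    // Count every value, duplicates included.
    static int count(Iterable<?> values) {
        int n = 0;
        for (Object ignored : values) {
            n++;
        }
        return n;
    }

    // Count distinct values by collecting them into a set first.
    static int uniqueCount(Iterable<?> values) {
        Set<Object> seen = new HashSet<>();
        for (Object value : values) {
            seen.add(value);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        List<String> column = Arrays.asList("a", "b", "a", "c", "b");
        // prints "5 3"
        System.out.println(count(column) + " " + uniqueCount(column));
    }
}
```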

min

public static Tuple<?> min(java.lang.Iterable<? extends Tuple<?>> tuples,
                           java.lang.String field)
Get the Tuple with the minimum data field value.

Parameters:
tuples - an iterator over tuples
field - the column / data field name
Returns:
the Tuple with the minimum data field value

min

public static Tuple<?> min(java.lang.Iterable<? extends Tuple<?>> tuples,
                           java.lang.String field,
                           java.util.Comparator<java.lang.Object> cmp)
Get the Tuple with the minimum data field value.

Parameters:
tuples - an iterator over tuples
field - the column / data field name
cmp - a comparator for sorting the column contents
Returns:
the Tuple with the minimum data field value

min

public static Tuple<?> min(TupleSet<? extends Tuple<?>> tuples,
                           java.lang.String field,
                           java.util.Comparator<java.lang.Object> cmp)
Get the Tuple with the minimum data field value.

Parameters:
tuples - a TupleSet
field - the column / data field name
cmp - a comparator for sorting the column contents
Returns:
the Tuple with the minimum data field value

min

public static Tuple<?> min(TupleSet<? extends Tuple<?>> tuples,
                           java.lang.String field)
Get the Tuple with the minimum data field value.

Parameters:
tuples - a TupleSet
field - the column / data field name
Returns:
the Tuple with the minimum data field value

max

public static Tuple<?> max(java.lang.Iterable<? extends Tuple<?>> tuples,
                           java.lang.String field)
Get the Tuple with the maximum data field value.

Parameters:
tuples - an iterator over tuples
field - the column / data field name
Returns:
the Tuple with the maximum data field value

max

public static Tuple<?> max(java.lang.Iterable<? extends Tuple<?>> tuples,
                           java.lang.String field,
                           java.util.Comparator<java.lang.Object> cmp)
Get the Tuple with the maximum data field value.

Parameters:
tuples - an iterator over tuples
field - the column / data field name
cmp - a comparator for sorting the column contents
Returns:
the Tuple with the maximum data field value

max

public static Tuple<?> max(TupleSet<? extends Tuple<?>> tuples,
                           java.lang.String field,
                           java.util.Comparator<java.lang.Object> cmp)
Get the Tuple with the maximum data field value.

Parameters:
tuples - a TupleSet
field - the column / data field name
cmp - a comparator for sorting the column contents
Returns:
the Tuple with the maximum data field value

max

public static Tuple<?> max(TupleSet<? extends Tuple<?>> tuples,
                           java.lang.String field)
Get the Tuple with the maximum data field value.

Parameters:
tuples - a TupleSet
field - the column / data field name
Returns:
the Tuple with the maximum data field value
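
The min and max methods return the whole tuple rather than the bare field value. A minimal sketch of that idea follows, using plain maps as stand-ins for tuples. ExtremeSketch, NUMERIC, row and sampleRows are hypothetical names; returning null for empty input and keeping the first tuple on ties are assumptions, not documented prefuse behavior.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative stand-in, not the prefuse implementation: rows are plain maps.
class ExtremeSketch {
    // Compare two Number values by their double value (hypothetical helper).
    static final Comparator<Object> NUMERIC =
            (a, b) -> Double.compare(((Number) a).doubleValue(), ((Number) b).doubleValue());

    // Return the row whose field value is smallest under cmp.
    // Assumption: null is returned for empty input; the first row wins ties.
    static Map<String, Object> min(Iterable<Map<String, Object>> rows, String field,
                                   Comparator<Object> cmp) {
        Map<String, Object> best = null;
        for (Map<String, Object> row : rows) {
            if (best == null || cmp.compare(row.get(field), best.get(field)) < 0) {
                best = row;
            }
        }
        return best;
    }

    // The maximum is the minimum under the reversed comparator.
    static Map<String, Object> max(Iterable<Map<String, Object>> rows, String field,
                                   Comparator<Object> cmp) {
        return min(rows, field, cmp.reversed());
    }

    // Build a two-field row (hypothetical helper).
    static Map<String, Object> row(String name, int age) {
        Map<String, Object> r = new HashMap<>();
        r.put("name", name);
        r.put("age", age);
        return r;
    }

    static List<Map<String, Object>> sampleRows() {
        return Arrays.asList(row("ada", 36), row("bob", 19), row("cy", 52));
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = sampleRows();
        // prints "bob cy"
        System.out.println(min(rows, "age", NUMERIC).get("name") + " "
                + max(rows, "age", NUMERIC).get("name"));
    }
}
```

Because the whole row comes back, other fields of the extreme tuple (here the name) are available without a second lookup.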

median

public static Tuple<?> median(java.lang.Iterable<? extends Tuple<?>> tuples,
                              java.lang.String field)
Get the Tuple with the median data field value.

Parameters:
tuples - an iterator over tuples
field - the column / data field name
Returns:
the Tuple with the median data field value

median

public static Tuple<?> median(java.lang.Iterable<? extends Tuple<?>> tuples,
                              java.lang.String field,
                              java.util.Comparator<java.lang.Object> cmp)
Get the Tuple with the median data field value.

Parameters:
tuples - an iterator over tuples
field - the column / data field name
cmp - a comparator for sorting the column contents
Returns:
the Tuple with the median data field value

median

public static Tuple<?> median(TupleSet<? extends Tuple<?>> tuples,
                              java.lang.String field,
                              java.util.Comparator<java.lang.Object> cmp)
Get the Tuple with the median data field value.

Parameters:
tuples - a TupleSet
field - the column / data field name
cmp - a comparator for sorting the column contents
Returns:
the Tuple with the median data field value

median

public static Tuple<?> median(TupleSet<? extends Tuple<?>> tuples,
                              java.lang.String field)
Get the Tuple with the median data field value.

Parameters:
tuples - a TupleSet
field - the column / data field name
Returns:
the Tuple with the median data field value
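
One way to pick a median tuple is to sort the tuples by the field and take the middle element. The sketch below illustrates that approach with plain maps as stand-ins for tuples; it is not the prefuse code. MedianSketch, NUMERIC and sampleRows are hypothetical names, and both the choice of the upper middle element for an even count and the null result for empty input are assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Illustrative stand-in, not the prefuse implementation: rows are plain maps.
class MedianSketch {
    // Compare two Number values by their double value (hypothetical helper).
    static final Comparator<Object> NUMERIC =
            (a, b) -> Double.compare(((Number) a).doubleValue(), ((Number) b).doubleValue());

    // Sort a copy of the rows by the field and return the middle row.
    // Assumptions: null for empty input; index size/2 for an even count.
    static Map<String, Object> median(Iterable<Map<String, Object>> rows, String field,
                                      Comparator<Object> cmp) {
        List<Map<String, Object>> sorted = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            sorted.add(row);
        }
        if (sorted.isEmpty()) {
            return null;
        }
        sorted.sort((a, b) -> cmp.compare(a.get(field), b.get(field)));
        return sorted.get(sorted.size() / 2);
    }

    static List<Map<String, Object>> sampleRows() {
        List<Map<String, Object>> rows = new ArrayList<>();
        for (int x : new int[] {7, 1, 5, 3, 9}) {
            rows.add(Collections.<String, Object>singletonMap("x", x));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Sorted values are 1, 3, 5, 7, 9, so this prints "5".
        System.out.println(median(sampleRows(), "x", NUMERIC).get("x"));
    }
}
```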

mean

public static double mean(java.lang.Iterable<? extends Tuple<?>> tuples,
                          java.lang.String field)
Get the mean value of a tuple data field. If any tuple lacks the named field, or the field is not a numeric data type, NaN is returned.

Parameters:
tuples - an iterator over tuples
field - the column / data field name
Returns:
the mean value, or NaN if a non-numeric data type is encountered

deviation

public static double deviation(java.lang.Iterable<? extends Tuple<?>> tuples,
                               java.lang.String field)
Get the standard deviation of a tuple data field. If any tuple lacks the named field, or the field is not a numeric data type, NaN is returned.

Parameters:
tuples - an iterator over tuples
field - the column / data field name
Returns:
the standard deviation value, or NaN if a non-numeric data type is encountered

deviation

public static double deviation(java.lang.Iterable<? extends Tuple<?>> tuples,
                               java.lang.String field,
                               double mean)
Get the standard deviation of a tuple data field, using a precomputed mean. If any tuple lacks the named field, or the field is not a numeric data type, NaN is returned.

Parameters:
tuples - an iterator over tuples
field - the column / data field name
mean - the mean of the column, used to speed up accurate deviation calculation
Returns:
the standard deviation value, or NaN if a non-numeric data type is encountered
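
Passing a precomputed mean saves one pass over the data. The sketch below shows the arithmetic on a plain double array, as might be obtained from toDoubleArray. It is not the prefuse code: DeviationSketch is a hypothetical name, and the population form (dividing by n rather than n - 1) is an assumption, since this page does not say which form the library uses.

```java
// Illustrative stand-in, not the prefuse implementation.
class DeviationSketch {
    // Arithmetic mean; NaN for an empty array (0.0 / 0).
    static double mean(double[] xs) {
        double sum = 0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    // Standard deviation around a mean the caller already knows.
    // Assumption: population form, dividing by n.
    static double deviation(double[] xs, double mean) {
        double sumSq = 0;
        for (double x : xs) {
            double d = x - mean;
            sumSq += d * d;
        }
        return Math.sqrt(sumSq / xs.length);
    }

    // Convenience overload: one extra pass to compute the mean first.
    static double deviation(double[] xs) {
        return deviation(xs, mean(xs));
    }

    public static void main(String[] args) {
        double[] xs = {2, 4, 4, 4, 5, 5, 7, 9};
        // Mean is 5.0 and the squared differences sum to 32, so this prints "2.0".
        System.out.println(deviation(xs, mean(xs)));
    }
}
```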

sum

public static double sum(java.lang.Iterable<? extends Tuple<?>> tuples,
                         java.lang.String field)
Get the sum of a tuple data field. If any tuple lacks the named field, or the field is not a numeric data type, NaN is returned.

Parameters:
tuples - an iterator over tuples
field - the column / data field name
Returns:
the sum, or NaN if a non-numeric data type is encountered
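
The NaN convention shared by sum, mean and deviation can be sketched as follows. This illustrates the documented contract only and is not the prefuse code: rows are plain maps standing in for tuples, SumSketch and row are hypothetical names, and a missing field is modeled as a null lookup.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Illustrative stand-in, not the prefuse implementation: rows are plain maps.
class SumSketch {
    // Add up one field; give up with NaN on a missing or non-numeric value.
    static double sum(Iterable<Map<String, Object>> rows, String field) {
        double total = 0;
        for (Map<String, Object> row : rows) {
            Object value = row.get(field);
            if (!(value instanceof Number)) {
                return Double.NaN; // covers both null (missing) and non-numeric
            }
            total += ((Number) value).doubleValue();
        }
        return total;
    }

    // Build a single-field row (hypothetical helper).
    static Map<String, Object> row(Object value) {
        return Collections.<String, Object>singletonMap("value", value);
    }

    public static void main(String[] args) {
        List<Map<String, Object>> clean = Arrays.asList(row(1), row(2.5), row(3));
        List<Map<String, Object>> dirty = Arrays.asList(row(1), row("n/a"), row(3));
        System.out.println(sum(clean, "value")); // prints "6.5"
        System.out.println(sum(dirty, "value")); // prints "NaN"
    }
}
```

Callers should test the result with Double.isNaN, because NaN never compares equal to anything, including itself.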

inferType

public static java.lang.Class<?> inferType(TupleSet<? extends Tuple<?>> tuples,
                                           java.lang.String field)
Infer the data field type across all tuples in a TupleSet.

Parameters:
tuples - the TupleSet to analyze
field - the data field to type check
Returns:
the inferred data type
Throws:
java.lang.IllegalArgumentException - if incompatible types are used


Copyright © 2008 Regents of the University of California