T-Drive data sample from Microsoft Research, generated by Salt and visualized with Leaflet

Analyze and Share data, visually

Salt is a library for creating interactive visualizations of massive datasets.

Visualizing big data is hard. Usually this means creating highly aggregated visualizations that can conceal localized patterns behind sums and averages. Salt helps you scalably transform your data, so that it can be explored interactively at increasing levels of detail.

Salt generates scalable representations of billions of data points, enabling visualizations including geographic heatmaps, cross-plots or time series, and the layering of multiple data sources and dimensions for contextual overlay.

Scalable: Create interactive visualizations of datasets in the billions of records

Analytic Layers: Layer custom analytics and aggregations from multiple datasets

Tiles: Efficient, generalized data representations for more than just maps

Taxi pickup and dropoff locations with tile-based analytics overlay summarizing the top Twitter hashtags by region

Salt also enables the visualization of non-numeric data, with custom, flexible analytics for keywords, topics, sentiment, and more. Above, the top Twitter hashtags by region are plotted as word clouds over a dataset of 187 million New York taxi trips. Zooming in de-aggregates the word clouds, allowing you to explore local trends, while zooming out gives you a high-level overview of hashtag usage in the New York City area.
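To make the zoom behavior concrete, here is a minimal sketch of per-tile hashtag ranking in plain Python. It is not Salt's API: the function name `top_hashtags_by_tile`, the record layout `(x, y, hashtag)`, and the assumption that coordinates are already normalized to the unit square are all hypothetical and chosen for illustration only.

```python
from collections import Counter, defaultdict

def top_hashtags_by_tile(records, zoom, k=3):
    """Rank the k most frequent hashtags inside each tile at one zoom level.

    records: iterable of (x, y, hashtag), with x and y assumed to be
             normalized to [0, 1] (an assumption of this sketch).
    Returns a dict mapping (zoom, tile_x, tile_y) -> [(hashtag, count), ...].
    """
    n = 2 ** zoom  # tiles per side at this zoom level
    counts = defaultdict(Counter)
    for x, y, tag in records:
        # Clamp so that a coordinate of exactly 1.0 falls in the last tile.
        tile_x = min(int(x * n), n - 1)
        tile_y = min(int(y * n), n - 1)
        counts[(zoom, tile_x, tile_y)][tag] += 1
    return {key: counter.most_common(k) for key, counter in counts.items()}
```

Running the same records at a higher `zoom` splits each tile into four, so a hashtag that dominates the city-wide tile can give way to different local leaders in the smaller tiles.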

Flexible Inputs, Flexible Outputs

Building on the flexibility of Apache Spark, Salt can ingest data in a wide variety of formats and schemas. For example, a graph of 2 million nodes and 10 million links in GraphML format is rendered using Salt in a pannable, zoomable, web-map-like interface with trillions of points of resolution. We chose to render edges using binning rather than overplotting, which ensures that dense cluster structure remains visible.
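The difference between binning and overplotting can be illustrated with a small sketch. This is not how Salt implements edge rendering; `bin_edges` is a hypothetical helper that assumes edge endpoints are normalized to the unit square and uses simple point sampling along each edge, which can miss a bin that an edge only clips at a corner.

```python
def bin_edges(edges, grid_size):
    """Count how many edges pass through each cell of a grid_size x grid_size grid.

    edges: iterable of ((x0, y0), (x1, y1)), coordinates assumed in [0, 1].
    Returns a list of rows; grid[by][bx] is the number of edges touching that bin.
    """
    grid = [[0] * grid_size for _ in range(grid_size)]
    for (x0, y0), (x1, y1) in edges:
        # Sample roughly two points per bin along the longer axis of the edge.
        steps = max(1, int(2 * grid_size * max(abs(x1 - x0), abs(y1 - y0))))
        touched = set()
        for i in range(steps + 1):
            t = i / steps
            bx = min(int((x0 + t * (x1 - x0)) * grid_size), grid_size - 1)
            by = min(int((y0 + t * (y1 - y0)) * grid_size), grid_size - 1)
            touched.add((bx, by))
        # Each edge contributes at most once per bin, regardless of sampling density.
        for bx, by in touched:
            grid[by][bx] += 1
    return grid
```

With overplotting, a thousand edges drawn over the same pixels look the same as ten. With binning, each cell holds a count that can be mapped to color or opacity, so dense clusters stay distinguishable from sparse regions.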

Massive graphs with Salt

Salt divides results into tiles of aggregated data representations. The coordinate system (2D geographic, 2D Cartesian, 1D time series, etc.), data format, serialization, and storage are flexible and entirely up to you. For example, we took the same taxi data from above and generated tiled data output compatible with the Torque TileJSON format to create an animated, time-bucketed view of a day of taxi activity. Try it yourself!
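As a rough illustration of tiling for the 2D geographic case, the sketch below maps longitude/latitude points to a tile and to a bin inside that tile, then counts points per bin. It assumes the standard Web Mercator tiling scheme used by web maps. The names `tile_and_bin` and `build_tiles`, and the default of 4 bins per side, are hypothetical and do not reflect Salt's own classes or defaults.

```python
import math
from collections import defaultdict

def tile_and_bin(lon, lat, zoom, bins_per_side=4):
    """Return ((zoom, tile_x, tile_y), (bin_x, bin_y)) for one lon/lat point."""
    n = 2 ** zoom  # tiles per side at this zoom level
    # Normalized Web Mercator coordinates; y grows southward, as in web map tiles.
    x = (lon + 180.0) / 360.0
    y = (1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0
    # Clamp into [0, 1) so edge values stay inside the last tile.
    x = min(max(x, 0.0), 1.0 - 1e-12)
    y = min(max(y, 0.0), 1.0 - 1e-12)
    fx, fy = x * n, y * n
    tile_x, tile_y = int(fx), int(fy)
    # Fractional position inside the tile selects the bin.
    bin_x = int((fx - tile_x) * bins_per_side)
    bin_y = int((fy - tile_y) * bins_per_side)
    return (zoom, tile_x, tile_y), (bin_x, bin_y)

def build_tiles(points, zoom, bins_per_side=4):
    """Aggregate (lon, lat) points into per-tile grids of counts."""
    tiles = defaultdict(
        lambda: [[0] * bins_per_side for _ in range(bins_per_side)]
    )
    for lon, lat in points:
        key, (bin_x, bin_y) = tile_and_bin(lon, lat, zoom, bins_per_side)
        tiles[key][bin_y][bin_x] += 1
    return dict(tiles)
```

Calling `build_tiles` once per zoom level yields a pyramid in which every level holds the same total count at a different spatial resolution. Swapping the count for another aggregate, or the projection for a Cartesian or time axis, changes what a tile represents without changing this overall shape.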

Ready to see your data?

Try the Examples

Get the Code

Review the Docs

or contact us for more information.