The Python Package Index has libraries for practically every data visualization need, from Pastalog for real-time visualizations of neural network training to Gaze Parser for eye movement research. Some of these libraries can be used regardless of the field of application, while many others are narrowly focused on a specific task.
Below is an overview of 10 interdisciplinary Python data visualization libraries, ordered from most to least popular.
The Matplotlib library can generate both simple and powerful visualizations. More than a decade old, it is the most widely used plotting library in the Python community, and it can draw a wide range of graphs, from histograms to heatmaps to line plots.
Because it was the first Python data visualization library, many other libraries are built on top of Matplotlib or designed to work alongside it during analysis. Libraries such as pandas and Seaborn are “wrappers” over Matplotlib that give access to many of its methods with less code.
Matplotlib is versatile enough to produce many visualization types:
- Scatter plots
- Bar charts and histograms
- Line plots
- Pie charts
- Stem plots
- Contour plots
- Quiver plots
Because nearly everything is customizable, adding grids, labels, legends, and similar elements is easy.
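As a rough illustration of that customizability, the sketch below draws a line plot and a histogram side by side and adds a grid, axis labels, titles, and a legend. The data, file name, and figure size are arbitrary choices for the example; the Agg backend is selected only so the script can run without a display.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)
rng = np.random.default_rng(seed=0)
samples = rng.normal(size=1000)  # synthetic data for the histogram

fig, (ax_line, ax_hist) = plt.subplots(1, 2, figsize=(10, 4))

# Line plot with a grid, axis labels, a title, and a legend
ax_line.plot(x, np.sin(x), label="sin(x)")
ax_line.plot(x, np.cos(x), label="cos(x)", linestyle="--")
ax_line.set_xlabel("x (radians)")
ax_line.set_ylabel("value")
ax_line.set_title("Line plot")
ax_line.grid(True)
ax_line.legend()

# Histogram of the random samples
ax_hist.hist(samples, bins=30, color="steelblue", edgecolor="white")
ax_hist.set_xlabel("value")
ax_hist.set_ylabel("count")
ax_hist.set_title("Histogram")

fig.tight_layout()
out = Path("matplotlib_demo.png")
fig.savefig(out, dpi=100)
```

Every element here (line style, grid, colors, labels) is set explicitly, which is the trade-off Matplotlib makes: full control in exchange for more lines of code.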
Seaborn is a popular data visualization library built on top of Matplotlib. Its default styles and color palettes are much more sophisticated than Matplotlib’s. Seaborn is also a higher-level library, which makes certain kinds of plots easier to generate, including heat maps, time series, and violin plots.
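A minimal sketch of two of the plot types mentioned above, assuming seaborn 0.11 or later (for `sns.set_theme`). The grouped measurements and the correlated columns are synthetic data generated for the example, not a real dataset.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(seed=1)

# Synthetic long-format data: three groups of 200 measurements
groups = pd.DataFrame({
    "group": np.repeat(["A", "B", "C"], 200),
    "value": np.concatenate([
        rng.normal(0.0, 1.0, 200),
        rng.normal(1.0, 1.5, 200),
        rng.normal(-1.0, 0.5, 200),
    ]),
})

# Synthetic wide data with one deliberately correlated pair of columns (x and y)
wide = pd.DataFrame(rng.normal(size=(100, 4)), columns=["w", "x", "y", "z"])
wide["y"] = 0.8 * wide["x"] + rng.normal(scale=0.5, size=100)
corr = wide.corr()

sns.set_theme(style="whitegrid")  # apply seaborn's styling to all following plots
fig, (ax_violin, ax_heat) = plt.subplots(1, 2, figsize=(11, 4))

# One call per plot: seaborn handles grouping, kernel density estimation, and colors
sns.violinplot(data=groups, x="group", y="value", ax=ax_violin)
ax_violin.set_title("Violin plot")

sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1, ax=ax_heat)
ax_heat.set_title("Correlation heat map")

fig.tight_layout()
out = Path("seaborn_demo.png")
fig.savefig(out, dpi=100)
```

Both calls take an ordinary Matplotlib `Axes` via `ax=`, so seaborn plots can be mixed freely with plain Matplotlib customization.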
Ggplot is a Python visualization library based on R’s ggplot2 and the Grammar of Graphics. It lets you construct plots using a high-level grammar without thinking about implementation details. Ggplot works differently from Matplotlib: it lets users build a full plot by layering components. For example, the user can start with axes and then add points, a line, a trend line, and so on. The Grammar of Graphics has been hailed as an “intuitive” method for plotting, though seasoned Matplotlib users might need time to adjust to this mindset.
Unlike ggplot, Bokeh is native to Python rather than ported from R, but like ggplot it is based on the Grammar of Graphics. It also supports streaming and real-time data. Its unique selling proposition is the ability to create interactive, web-ready plots that can easily be output as JSON objects, HTML documents, or interactive web applications.
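A minimal sketch of two of those output routes, assuming a recent Bokeh release: the same interactive figure is saved as a standalone HTML document and serialized with `bokeh.embed.json_item` for embedding in an existing page. The data, file name, and `"myplot"` target id are arbitrary choices for the example.

```python
import json
from pathlib import Path

from bokeh.embed import json_item
from bokeh.io import save
from bokeh.plotting import figure
from bokeh.resources import CDN

x = list(range(10))
y = [v ** 2 for v in x]

# Interactive tools (pan, zoom, hover) are attached to the figure itself
p = figure(
    title="Interactive Bokeh plot",
    x_axis_label="x",
    y_axis_label="x squared",
    tools="pan,wheel_zoom,box_zoom,reset,hover",
)
p.line(x, y, line_width=2, legend_label="x squared")
p.scatter(x, y, size=8)

# Output 1: a standalone HTML document that loads BokehJS from a CDN
out = Path("bokeh_plot.html")
save(p, filename=str(out), resources=CDN, title="Bokeh demo")

# Output 2: a JSON description that BokehJS can render into an element with id "myplot"
item = json_item(p, target="myplot")
payload = json.dumps(item)
```

Serving the figure as a live web application (the third route) requires running a Bokeh server and is not shown here.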
Bokeh has three interfaces with varying degrees of control to accommodate different types of users. The topmost level is for creating charts quickly. It includes methods for creating common charts such as bar plots, box plots, and histograms. The middle level allows the user to control the basic building blocks of each chart (for example, the dots in a scatter plot) and has the same specificity as Matplotlib. The bottom level is geared toward developers and software engineers. It has no pre-set defaults and requires the user to define every element of the chart.
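To make the middle and bottom levels concrete, the sketch below builds the same scatter plot twice: once with `bokeh.plotting.figure`, which supplies axes, grids, and tools by default, and once with the low-level `bokeh.models.Plot`, where even the axes must be added by hand. It assumes a recent Bokeh release, does not cover the quick-charts level, and uses invented data.

```python
from bokeh.models import ColumnDataSource, DataRange1d, LinearAxis, Plot, Scatter
from bokeh.plotting import figure

source = ColumnDataSource(data={"x": [1, 2, 3, 4, 5], "y": [6, 7, 2, 4, 5]})

# Middle level (bokeh.plotting): one call per glyph; axes, grids, and tools come for free
p_mid = figure(title="bokeh.plotting")
p_mid.scatter("x", "y", source=source, size=10)

# Bottom level (bokeh.models): every element is an explicit model object
p_low = Plot(x_range=DataRange1d(), y_range=DataRange1d())
p_low.add_glyph(source, Scatter(x="x", y="y", size=10, fill_color="navy"))
p_low.add_layout(LinearAxis(), "below")  # x axis
p_low.add_layout(LinearAxis(), "left")   # y axis
```

Either object can be passed to `bokeh.io.show` or `bokeh.io.save`; the low-level version simply renders without the grid lines and toolbar that `figure` adds by default.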
While Plotly is widely known as an online platform for data visualization, fewer people know that it can also be used from a Python notebook. Like Bokeh, Plotly’s strength lies in making interactive plots, and it offers some chart types that not every library provides, such as contour plots.
Like Plotly and Bokeh, Pygal offers interactive plots that can be embedded in a web browser. Its prime differentiator is the ability to output charts as SVGs. For smaller datasets, SVGs work just fine; for charts with hundreds of thousands of data points, however, they become sluggish and have trouble rendering.
Because each chart type is packaged into a method and the built-in styles are attractive, a nice-looking chart takes only a few lines of code.
Altair is a declarative statistical visualization library for Python, based on Vega-Lite. Declarative means you only need to specify how data columns map to encoding channels, such as the x-axis, y-axis, and color; the remaining plotting details are handled automatically. This makes Altair simple, friendly, and consistent, and it lets you design effective, beautiful visualizations with a minimal amount of code.
Geoplotlib is a toolbox for plotting geographical data and creating maps. It can produce a variety of map types, such as choropleths, heatmaps, and dot density maps. Geoplotlib requires Pyglet, a cross-platform windowing and multimedia library for Python, to be installed.
Geoplotlib reduces the complexity of designing visualizations by providing built-in tools for the most common tasks, such as density visualization, spatial graphs, and shapefiles.
Since most Python data visualization libraries don’t offer maps, it’s good to have a library dedicated to them.
Dealing with missing data is cumbersome. Missingno lets you gauge the completeness of a dataset quickly with a visual summary instead of painstakingly searching through a table. You can filter and sort data by completeness, or spot correlations in missingness with a heat map or a dendrogram.
Leather is designed to work with all data types and renders charts as SVGs, so they can be scaled without losing image quality. Leather’s creator, Christopher Groskopf, puts it best: “Leather is the Python charting library for those who need charts now and don’t care if they’re perfect.” Because the library is relatively new, some of the documentation is still in progress. The charts it can make are fairly basic, but that is the intention.
Would you add any other Python data visualization libraries to this list? Please share your favorites in a comment below.
Guest Author – Quincy is part of the team at Springboard and is passionate about online learning and strong coffee.