This article is part of a series on Data Science.

Data visualization is the art and science of representing data graphically to communicate information clearly and efficiently. By leveraging visual elements such as charts, graphs, maps, and interactive dashboards, visualization transforms raw data into a form that is more accessible and understandable for both technical and non-technical audiences.

Within the broader field of data science, visualization plays a critical role. It enables data scientists to explore large and complex datasets, identify meaningful patterns, trends, and outliers, and communicate their findings effectively. Beyond mere presentation, visualization serves as an analytical tool that helps guide further investigations and decision-making.

The utility of data visualization spans diverse domains including business, healthcare, education, government, and environmental science. Whether monitoring key performance indicators in a corporation, tracking patient outcomes in medicine, or visualizing geographic data for urban planning, effective visual communication is indispensable.

By converting abstract numbers into visual stories, data visualization not only enhances comprehension but also fosters collaboration among stakeholders, supporting evidence-based decisions and strategic planning.

Common Data Visualization Techniques

Data visualization techniques vary widely depending on the nature of the data and the analytical goals. Selecting an appropriate visualization method is crucial to accurately represent the underlying information and to facilitate effective interpretation.

Some of the most commonly used techniques include:

In addition to these, there are specialized visualization methods such as network graphs to show relationships between entities, time series visualizations for detailed temporal analysis, and interactive visualizations that allow users to explore data dynamically.

Jacques Bertin’s Visual Variables

Jacques Bertin, a seminal figure in the field of cartography and data visualization, identified a set of fundamental visual variables that form the building blocks for graphical representation of data. These variables determine how visual elements encode information and help viewers interpret visualizations effectively.

Bertin’s visual variables include:

  1. Position: The spatial location of elements within a graphic. Position is the most precise visual variable and is often used to encode quantitative data, as humans excel at perceiving spatial differences. For example, in a scatter plot, each point’s x and y coordinates correspond to variable values.
  2. Size: The relative dimensions of graphical elements, such as the length of a bar or the area of a circle, to represent magnitude or quantity. Larger elements generally indicate greater values.
  3. Shape: The form or outline of an element used to distinguish categories or types. For example, different shapes (circles, squares, triangles) in a scatter plot can represent distinct groups.
  4. Value: The lightness or darkness of a color, often used to represent numeric intensity or density. Darker shades can indicate higher values, as seen in grayscale heatmaps.
  5. Color (Hue): The actual color or tint of elements, used to differentiate qualitative categories or highlight specific data points. For example, red and blue may represent different political affiliations.
  6. Orientation: The angle or direction of elements, such as lines or textures, which can convey trends or directional information. For example, the tilt of bars in a bar chart may suggest increasing or decreasing values.
  7. Texture (Pattern): The surface pattern or repetition of marks within an area, which can provide additional differentiation especially in black-and-white or print media. Examples include stripes, dots, or crosshatching.

Understanding and applying these visual variables allows designers and data scientists to create clear, effective, and interpretable visualizations. Bertin’s work laid the conceptual groundwork for later theories such as Leland Wilkinson’s Grammar of Graphics, which systematizes the construction of complex visualizations.

Visualization Examples by Analytical Purpose

Data visualizations serve different analytical goals, and selecting the appropriate type depends on the specific question being addressed. Below are common purposes for visualization along with examples of techniques that suit each purpose:

  1. Time Series Analysis: Visualizations of data points indexed over time, used to detect trends, cycles, and seasonal effects. Line graphs are the most common tool for this purpose, as they clearly show how values evolve.
  2. Ranking: Displaying data ordered by magnitude to identify leaders and laggards. Bar charts sorted by value are typically used to rank categories or entities, such as sales by region or performance scores.
  3. Part-to-Whole Relationships: Illustrate how individual parts contribute to a total. Pie charts and stacked bar charts effectively show proportions, such as market share or budget allocations.
  4. Deviation: Visual representations that highlight differences or variations from a baseline or expected value. Diverging bar charts or error bars can show deviations in measurements or survey responses.
  5. Sorting: Organizing data to uncover patterns or structures, often used in heatmaps or clustered bar charts where sorting reveals underlying groupings or trends.
  6. Frequency Distribution: Visualizations such as histograms reveal the distribution of values within a dataset, indicating modes, skewness, and outliers.
  7. Correlation Analysis: Scatter plots are fundamental to investigating relationships between two quantitative variables, identifying positive, negative, or no correlation patterns.
  8. Nominal Comparison: Comparing categories without an inherent order, using bar charts or icon arrays to highlight differences across groups.
  9. Geospatial Visualization: Mapping data onto geographic coordinates to analyze spatial patterns. Choropleth maps or point maps are common for demographic or environmental data.

Choosing the right visualization type is critical to effectively convey insights, avoid misinterpretation, and engage the audience. Understanding each type’s strengths and limitations enables data scientists to tailor their visual communication to the data’s story.

2D vs. 3D Visualization: Comparative Analysis and Use Cases

Data visualizations can be broadly classified into two categories based on dimensionality: two-dimensional (2D) and three-dimensional (3D). Each has its advantages, challenges, and ideal applications depending on the context and data complexity.

Two-Dimensional (2D) Visualization

2D visualizations represent data along two axes (typically x and y) and are the most common form used in data science. Their simplicity makes them highly interpretable, quick to generate, and easy to integrate into reports and dashboards.

Advantages of 2D Visualizations:

Common 2D Visualization Types: bar charts, line graphs, pie charts, histograms, scatter plots, heatmaps, and geographic maps.

Three-Dimensional (3D) Visualization

3D visualizations add depth (the z-axis) to the spatial representation of data, allowing the depiction of complex, multivariate datasets or spatial phenomena where a third dimension is inherent, such as geographic elevation or molecular structures.

Advantages of 3D Visualizations:

Challenges and Considerations:

Common 3D Visualization Types: 3D scatter plots, 3D surface plots, 3D bar charts, 3D line graphs, 3D heatmaps, and globe-based geographic visualizations.

Choosing Between 2D and 3D Visualization

The choice between 2D and 3D depends on the dataset, the message to be conveyed, and the audience’s familiarity with complex visual forms. For exploratory data analysis and most business applications, 2D visualizations generally suffice and are preferred for clarity and precision.

3D visualizations are advantageous when the data inherently involves three spatial dimensions or when interactive exploration can uncover insights not visible in 2D. However, designers must carefully weigh the cognitive and technical costs before opting for 3D representations.

Grammar of Graphics and Visualization Design

The Grammar of Graphics, first formalized by Leland Wilkinson, provides a systematic framework for constructing and understanding data visualizations. Rather than viewing charts as isolated types, this grammar decomposes visualizations into fundamental components, allowing flexible composition and deeper insight into how graphical representations encode data.

According to Wilkinson’s grammar, a graphic is composed of layers that include:

This layered approach allows the creation of complex and customized visualizations by combining components in different ways. It also supports principles such as consistency, reusability, and extensibility in visualization design.

The Grammar of Graphics is widely implemented in the ggplot2 library in R, which has become a gold standard for statistical graphics. ggplot2 allows users to declaratively specify how data should be mapped to visual properties and supports layering of geometric objects and statistical summaries seamlessly.

Other tools and libraries, such as Python’s plotnine (a ggplot2-inspired library), also adopt this grammar, enabling expressive and elegant visualization construction.

Fundamental graphical elements in the grammar include:

Understanding and applying the Grammar of Graphics framework enables data scientists and visualization designers to create clear, precise, and effective visual representations that can be tailored to diverse analytical needs.

References

  1. Data and information visualization — Wikipedia
  2. Jacques Bertin — Wikipedia
  3. Text Visualization Browser
  4. Kucher, K., and A. Kerren. Text Visualization Techniques: Taxonomy, Visual Survey, and Community Insights. 2015 IEEE Pacific Visualization Symposium (PacificVis), 2015.
  5. TrustMLVis Browser
  6. ggplot2 Documentation
  7. Matplotlib Documentation
  8. Plotly for Python
  9. D3.js Library
  10. Bertin, J. (1983). Semiology of Graphics: Diagrams, Networks, Maps. University of Wisconsin Press. A foundational text introducing visual variables in data visualization.
  11. Wilkinson, L. (2005). Grammar of Graphics. Springer. This book formalizes the systematic approach to creating statistical graphics.
  12. Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. A practical guide to the widely-used R package based on the Grammar of Graphics.
  13. Heer, J., Bostock, M., & Ogievetsky, V. (2010). A Tour Through the Visualization Zoo. Communications of the ACM, 53(6), 59–67. Link An accessible overview of a wide variety of visualization types.
  14. Plotnine: Grammar of Graphics for Python