
Lecture 7 - (03/03/2026)

Today we will delve deeper into making maps and visualizing spatial data with Altair, including:

  • Symbol Maps

  • Choropleth Maps

  • Cartographic Projections

Symbol Maps

Now that we’ve established that a map can be built from separate layers, let’s put the idea to use. We’ll examine the U.S. commercial flight network, considering both airports and flight routes. To do so, we’ll need three datasets.

For our base map, we’ll use a TopoJSON file for the United States at 10m resolution, containing features for states and counties:

import pandas as pd
import altair as alt
from vega_datasets import data
usa = data.us_10m.url
usa
'https://cdn.jsdelivr.net/npm/vega-datasets@v1.29.0/data/us-10m.json'

For the airports, we will use a dataset with fields for the longitude and latitude coordinates of each airport as well as the IATA airport code (for example, ‘JFK’ for John F. Kennedy International Airport).

airports = data.airports.url
airports
'https://cdn.jsdelivr.net/npm/vega-datasets@v1.29.0/data/airports.csv'

Finally, we will use a dataset of flight routes, which contains origin and destination fields with the IATA codes for the corresponding airports:

flights = data.flights_airport.url
flights
'https://cdn.jsdelivr.net/npm/vega-datasets@v1.29.0/data/flights-airport.csv'

Let’s start by creating a base map using the albersUsa projection, and add a layer that plots circle marks for each airport:

alt.layer(
    alt.Chart(alt.topo_feature(usa, 'states')).mark_geoshape(
        fill='#ddd', stroke='#fff', strokeWidth=1
    ),
    alt.Chart(airports).mark_circle(size=9).encode(
        latitude='latitude:Q',
        longitude='longitude:Q',
        tooltip='iata:N'
    )
).project(
    type='albersUsa'
).properties(
    width=900,
    height=500
).configure_view(
    stroke=None
)

Now, instead of showing all airports in an undifferentiated fashion, let’s identify major hubs by considering the total number of routes that originate at each airport. We’ll use the routes dataset as our primary data source: it contains a list of flight routes that we can aggregate to count the number of routes for each origin airport.

However, the routes dataset does not include the locations of the airports! To augment the routes data with locations, we need a new data transformation: lookup. The lookup transform takes a field value in a primary dataset and uses it as a key to look up related information in another table. In this case, we want to match the origin airport code in our routes dataset against the iata field of the airports dataset, then extract the corresponding latitude and longitude fields.
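Conceptually, `transform_aggregate` followed by `transform_lookup` behaves like a group-by count followed by a left join on the airport code. The plain-Python sketch below mimics that pipeline on a few hypothetical rows (shaped like the vega-datasets files but not taken from them, with approximate coordinates). The helper names `aggregate_routes` and `lookup` are illustrative only; in the chart, Vega-Lite does this work in the browser.

```python
from collections import Counter

# Hypothetical rows shaped like flights-airport.csv (origin, destination)
flights = [
    {"origin": "ORD", "destination": "JFK"},
    {"origin": "ORD", "destination": "LAX"},
    {"origin": "JFK", "destination": "LAX"},
    {"origin": "XYZ", "destination": "ORD"},  # code with no matching airport
]

# Hypothetical rows shaped like airports.csv (coordinates approximate)
airports = [
    {"iata": "ORD", "state": "IL", "latitude": 41.98, "longitude": -87.90},
    {"iata": "JFK", "state": "NY", "latitude": 40.64, "longitude": -73.78},
    {"iata": "LAX", "state": "CA", "latitude": 33.94, "longitude": -118.41},
]

def aggregate_routes(rows):
    """Like transform_aggregate(groupby=['origin'], routes='count()')."""
    counts = Counter(row["origin"] for row in rows)
    return [{"origin": code, "routes": n} for code, n in counts.items()]

def lookup(rows, key_field, table, table_key, fields):
    """Like transform_lookup: copy `fields` from the matching `table` row.

    Unmatched rows are kept with the looked-up fields set to None,
    mirroring Vega-Lite's default of null when a lookup fails.
    """
    index = {entry[table_key]: entry for entry in table}
    out = []
    for row in rows:
        match = index.get(row[key_field], {})
        out.append({**row, **{f: match.get(f) for f in fields}})
    return out

routes = lookup(aggregate_routes(flights), "origin",
                airports, "iata", ["state", "latitude", "longitude"])
for r in routes:
    print(r)
```

Because unmatched rows survive with empty coordinates, such rows simply cannot be placed on the map, which is one reason to check the join keys when symbols seem to be missing.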

alt.layer(
    alt.Chart(alt.topo_feature(usa, 'states')).mark_geoshape(
        fill='#ddd', stroke='#fff', strokeWidth=1
    ),
    alt.Chart(flights).mark_circle().transform_aggregate(
        groupby=['origin'],
        routes='count()'
    ).transform_lookup(
        lookup='origin',
        from_=alt.LookupData(data=airports, key='iata',
                             fields=['state', 'latitude', 'longitude'])
    ).encode(
        latitude='latitude:Q',
        longitude='longitude:Q',
        tooltip=['origin:N', 'routes:Q'],
        size=alt.Size('routes:Q', scale=alt.Scale(range=[0, 1000]), legend=None),
        order=alt.Order('routes:Q', sort='descending')
    )
).project(
    type='albersUsa'
).properties(
    width=900,
    height=500
).configure_view(
    stroke=None
)

Which sections of the US have the highest number of outgoing routes?

Now that we can see the airports, we may wish to interact with them to better understand the structure of the air traffic network. We can add a rule mark layer to represent paths from origin airports to destination airports, which requires two lookup transforms to retrieve coordinates for each end point. In addition, we can use a single selection to filter these routes, so that only the routes originating at the currently selected airport are shown.
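The rule layer combines a filter with two lookups against the same airport table: one fills `latitude`/`longitude` for the origin, the other fills `lat2`/`lon2` for the destination. A plain-Python sketch of that logic on hypothetical sample data (approximate coordinates; the function `route_segments` is illustrative, not part of Altair), where passing `None` as the selection mimics `empty='none'` by returning no routes:

```python
# Hypothetical sample data; coordinates are approximate (latitude, longitude).
airports = {
    "ORD": (41.98, -87.90),
    "JFK": (40.64, -73.78),
    "LAX": (33.94, -118.41),
}
flights = [("ORD", "JFK"), ("ORD", "LAX"), ("JFK", "LAX")]

def route_segments(flights, airports, selected_origin):
    """Filter to the selected origin, then look up coordinates for both ends."""
    segments = []
    for origin, destination in flights:
        if origin != selected_origin:  # like transform_filter(origin)
            continue
        lat, lon = airports.get(origin, (None, None))         # first lookup
        lat2, lon2 = airports.get(destination, (None, None))  # second lookup, as_=['lat2', 'lon2']
        segments.append({"origin": origin, "destination": destination,
                         "latitude": lat, "longitude": lon,
                         "lat2": lat2, "lon2": lon2})
    return segments

segs = route_segments(flights, airports, "ORD")
for s in segs:
    print(s)
```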

# interactive selection for origin airport
# select nearest airport to mouse cursor
origin = alt.selection_point(
    on='mouseover', nearest=True,
    fields=['origin'], empty='none'
)

# shared data reference for lookup transforms
foreign = alt.LookupData(data=airports, key='iata',
                         fields=['latitude', 'longitude'])
    
alt.layer(
    # base map of the United States
    alt.Chart(alt.topo_feature(usa, 'states')).mark_geoshape(
        fill='#ddd', stroke='#fff', strokeWidth=1
    ),
    # route lines from selected origin airport to destination airports
    alt.Chart(flights).mark_rule(
        color='#000', opacity=0.35
    ).transform_filter(
        origin # filter to selected origin only
    ).transform_lookup(
        lookup='origin', from_=foreign # origin lat/lon
    ).transform_lookup(
        lookup='destination', from_=foreign, as_=['lat2', 'lon2'] # dest lat/lon
    ).encode(
        latitude='latitude:Q',
        longitude='longitude:Q',
        latitude2='lat2',
        longitude2='lon2',
    ),
    # size airports by number of outgoing routes
    # 1. aggregate flights-airport data set
    # 2. lookup location data from airports data set
    # 3. remove Puerto Rico (PR) and Virgin Islands (VI)
    alt.Chart(flights).mark_circle().transform_aggregate(
        groupby=['origin'],
        routes='count()'
    ).transform_lookup(
        lookup='origin',
        from_=alt.LookupData(data=airports, key='iata',
                             fields=['state', 'latitude', 'longitude'])
    ).transform_filter(
        'datum.state !== "PR" && datum.state !== "VI"'
    ).add_params(
        origin
    ).encode(
        latitude='latitude:Q',
        longitude='longitude:Q',
        tooltip=['origin:N', 'routes:Q'],
        size=alt.Size('routes:Q', scale=alt.Scale(range=[0, 1000]), legend=None),
        order=alt.Order('routes:Q', sort='descending') # place smaller circles on top
    )
).project(
    type='albersUsa'
).properties(
    width=900,
    height=500
).configure_view(
    stroke=None
)

Choropleth Maps

A choropleth map uses shaded or textured regions to visualize data values. Sized symbol maps are often read more accurately, as people tend to be better at estimating proportional differences between the areas of circles than between color shades. Nevertheless, choropleth maps are popular in practice and are particularly useful when too many symbols would become perceptually overwhelming.
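In the symbol maps above, Vega-Lite’s `size` channel sets the area of each circle in square pixels, so `alt.Scale(range=[0, 1000])` maps route counts to areas between 0 and 1000 px². The sketch below assumes a linear scale whose domain starts at zero (a simplification of what Vega-Lite computes) and shows that the radius then grows only with the square root of the value:

```python
import math

def circle_area(value, max_value, max_area=1000.0):
    """Linear area scale: domain [0, max_value] -> range [0, max_area] px^2."""
    return max_area * value / max_value

def circle_radius(area):
    """Radius in px of a circle with the given area."""
    return math.sqrt(area / math.pi)

# A hub with 4x the routes gets 4x the area, but only 2x the radius.
big = circle_area(400, 400)
small = circle_area(100, 400)
print(big / small, circle_radius(big) / circle_radius(small))
```

Encoding the value in area rather than radius keeps the visual weight of a circle proportional to the data.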

For example, while the United States only has 50 states, it has thousands of counties within those states. Let’s build a choropleth map of the unemployment rate per county, back in the recession year of 2008. In some cases, input GeoJSON or TopoJSON files might include statistical data that we can directly visualize.

In this case, however, we have two files: our TopoJSON file that includes county boundary features (usa), and a separate text file that contains unemployment statistics:

unemp = data.unemployment.url
unemp
'https://cdn.jsdelivr.net/npm/vega-datasets@v1.29.0/data/unemployment.tsv'

To integrate our data sources, we will again need to use the lookup transform, augmenting our TopoJSON-based geoshape data with unemployment rates. We can then create a map that includes a color encoding for the looked-up rate field.

alt.Chart(alt.topo_feature(usa, 'counties')).mark_geoshape(
    stroke='#aaa', strokeWidth=0.25
).transform_lookup(
    lookup='id', from_=alt.LookupData(data=unemp, key='id', fields=['rate'])
).encode(
    alt.Color('rate:Q',
              scale=alt.Scale(domain=[0, 0.3], clamp=True), 
              legend=alt.Legend(format='%')),
    alt.Tooltip('rate:Q', format='.0%')
).project(
    type='albersUsa'
).properties(
    width=900,
    height=500
).configure_view(
    stroke=None
)
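In the choropleth above, `alt.Scale(domain=[0, 0.3], clamp=True)` fixes the color domain at 0% to 30%, and clamping makes any rate above 30% take the same color as 30% instead of running past the end of the color ramp. A minimal sketch of that normalization step, assuming a linear scale (the helper `normalize` is hypothetical; picking the actual color from the scheme is left to Vega-Lite):

```python
def normalize(value, domain=(0.0, 0.3), clamp=True):
    """Map a value to a fraction along a linear domain, as a color scale
    does before choosing a position on its color ramp."""
    lo, hi = domain
    t = (value - lo) / (hi - lo)
    if clamp:
        t = min(max(t, 0.0), 1.0)  # out-of-domain values stick to the ends
    return t

for rate in [0.05, 0.15, 0.3, 0.45]:
    print(rate, normalize(rate), normalize(rate, clamp=False))
```

Without clamping, a 45% rate would map to 1.5, beyond the end of the ramp.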

A major concern for choropleth maps is the choice of colors. Above, we used Altair’s default ‘yellowgreenblue’ scheme for heatmaps.

Here are some other schemes that can be used.

def map_(scheme):
    return alt.Chart().mark_geoshape().project(type='albersUsa').encode(
        alt.Color('rate:Q', scale=alt.Scale(scheme=scheme), legend=None)
    ).properties(width=305, height=200)

alt.hconcat(
    map_('magma'), map_('viridis'), map_('blueorange'),
    data=alt.topo_feature(usa, 'counties')
).transform_lookup(
    lookup='id', from_=alt.LookupData(data=unemp, key='id', fields=['rate'])
).configure_view(
    stroke=None
).resolve_scale(
    color='independent'
)

More color schemes are listed in the Vega color scheme documentation.

Cartographic Projections

Now that we have some experience creating maps, let’s take a closer look at cartographic projections.

  • All map projections necessarily distort the surface in some fashion. Depending on the purpose of the map, some distortions are acceptable and others are not; therefore, different map projections exist in order to preserve some properties of the sphere-like body at the expense of other properties.

Some of the properties we might wish to consider include:

  • Area: Does the projection distort region sizes?

  • Bearing: Does a straight line correspond to a constant direction of travel?

  • Distance: Do lines of equal length correspond to equal distances on the globe?

  • Shape: Does the projection preserve spatial relations (angles) between points?

Selecting an appropriate projection thus depends on the use case for the map. For example, if we are assessing land use and the extent of land matters, we might choose an area-preserving projection. If we want to visualize shockwaves emanating from an earthquake, we might focus the map on the quake’s epicenter and preserve distances outward from that point. Or, if we wish to aid navigation, the preservation of bearing and shape may be more important.
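For the earthquake example, an azimuthal equidistant projection centered on the epicenter preserves distances measured straight out from the center. The sketch below uses the standard spherical formulas, assuming a spherical Earth of radius 6371 km (real data may use an ellipsoid) and a hypothetical epicenter at (0°, 0°):

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: spherical Earth with this mean radius

def azimuthal_equidistant(lon, lat, lon0, lat0, radius=EARTH_RADIUS_KM):
    """Project (lon, lat) in degrees onto a plane centered at (lon0, lat0).

    Returns (x, y) in km; the distance from (0, 0) equals the great-circle
    distance from the center.
    """
    lam, phi = math.radians(lon), math.radians(lat)
    lam0, phi0 = math.radians(lon0), math.radians(lat0)
    cos_c = (math.sin(phi0) * math.sin(phi)
             + math.cos(phi0) * math.cos(phi) * math.cos(lam - lam0))
    c = math.acos(max(-1.0, min(1.0, cos_c)))  # angular distance from the center
    if math.isclose(c, math.pi):
        raise ValueError("the antipode of the center has no single projected point")
    k = 1.0 if c == 0 else c / math.sin(c)
    x = k * math.cos(phi) * math.sin(lam - lam0)
    y = k * (math.cos(phi0) * math.sin(phi)
             - math.sin(phi0) * math.cos(phi) * math.cos(lam - lam0))
    return radius * x, radius * y

# Hypothetical station 10 degrees due north of the epicenter at (0, 0).
x, y = azimuthal_equidistant(0.0, 10.0, 0.0, 0.0)
print(math.hypot(x, y))  # map distance in km, approximately 6371 * radians(10)
```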

Cylindrical projections map the sphere onto a surrounding cylinder, then unroll the cylinder. If the major axis of the cylinder is oriented north-south, meridians are mapped to straight lines. Pseudo-cylindrical projections represent a central meridian as a straight line, with other meridians “bending” away from the center.

world = data.world_110m.url

map = alt.layer(
    # use the sphere of the Earth as the base layer
    alt.Chart({'sphere': True}).mark_geoshape(
        fill='#e6f3ff'
    ),
    # add a graticule for geographic reference lines
    alt.Chart({'graticule': True}).mark_geoshape(
        stroke='#ffffff', strokeWidth=1
    ),
    # and then the countries of the world
    alt.Chart(alt.topo_feature(world, 'countries')).mark_geoshape(
        fill='#2a1d0c', stroke='#706545', strokeWidth=0.5
    )
).properties(
    width=600,
    height=400
)
minimap = map.properties(width=225, height=225)
alt.hconcat(
    minimap.project(type='equirectangular').properties(title='equirectangular'),
    minimap.project(type='mercator').properties(title='mercator'),
    minimap.project(type='transverseMercator').properties(title='transverseMercator'),
    minimap.project(type='naturalEarth1').properties(title='naturalEarth1')
).properties(spacing=10).configure_view(stroke=None)
  • Equirectangular: Map longitude and latitude values directly to x and y coordinates.

  • Mercator: Project onto a cylinder, using lon directly, but subjecting lat to a non-linear transformation. Straight lines preserve constant compass bearings (rhumb lines), making this projection well-suited to navigation. However, areas in the far north or south can be greatly distorted.

  • Transverse Mercator: A Mercator projection, but with the bounding cylinder rotated to a transverse axis. Whereas the standard Mercator projection has highest accuracy along the equator, the Transverse Mercator projection is most accurate along the central meridian.

  • Natural Earth: A pseudo-cylindrical projection designed for showing the whole Earth in one view.
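The difference between the equirectangular and Mercator projections can be made concrete with the standard spherical formulas on a unit sphere. In Mercator, the local scale factor is sec(lat), so areas are inflated by sec²(lat): four times at 60° latitude. A minimal sketch (these helpers are illustrative, not Altair functions):

```python
import math

def equirectangular(lon, lat):
    """x and y are simply longitude and latitude in radians (unit sphere)."""
    return math.radians(lon), math.radians(lat)

def mercator(lon, lat):
    """Spherical Mercator: x is longitude, y = ln(tan(pi/4 + lat/2)).
    Undefined at the poles, where y grows without bound."""
    phi = math.radians(lat)
    return math.radians(lon), math.log(math.tan(math.pi / 4 + phi / 2))

def mercator_area_factor(lat):
    """Local area inflation of Mercator relative to the equator: sec^2(lat)."""
    return 1.0 / math.cos(math.radians(lat)) ** 2

for lat in [0, 30, 60, 80]:
    print(lat, equirectangular(0, lat)[1], mercator(0, lat)[1], mercator_area_factor(lat))
```

The stretched y-values at high latitudes are what make Greenland look comparable in size to Africa on a Mercator map.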

Conic projections map the sphere onto a cone, and then unroll the cone onto the plane. Conic projections are configured by two standard parallels, which determine where the cone intersects the globe.

minimap = map.properties(width=180, height=130)
alt.hconcat(
    minimap.project(type='conicEqualArea').properties(title='conicEqualArea'),
    minimap.project(type='conicEquidistant').properties(title='conicEquidistant'),
    minimap.project(type='conicConformal', scale=35, translate=[90,65]).properties(title='conicConformal'),
    minimap.project(type='albers').properties(title='albers'),
    minimap.project(type='albersUsa').properties(title='albersUsa')
).properties(spacing=10).configure_view(stroke=None)
  • Conic Equal Area: Area-preserving conic projection. Shape and distance are not preserved, but are roughly accurate between the standard parallels.

  • Conic Equidistant: Conic projection that preserves distance along the meridians and standard parallels.

  • Conic Conformal: Conic projection that preserves shape (local angles), but not area or distance.

  • Albers: A variant of the conic equal area projection with standard parallels optimized for creating maps of the United States.

  • Albers USA: A hybrid projection for the 50 states of the United States of America. This projection stitches together three Albers projections with different parameters for the continental U.S., Alaska, and Hawaii.
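The role of the two standard parallels can be checked with the textbook spherical Albers equal-area formulas. The sketch below assumes parallels at 29.5°N and 45.5°N with a central meridian of 96°W and origin latitude of 23°N (values commonly used for the contiguous U.S.; d3’s `geoAlbers` uses the same pair of parallels). The scale factor along a parallel, k = nρ / cos(lat), equals 1 on both standard parallels, dips below 1 between them, and exceeds 1 outside them:

```python
import math

def cone_constants(parallels):
    """Cone constant n and constant C for two standard parallels (degrees)."""
    phi1, phi2 = (math.radians(p) for p in parallels)
    n = (math.sin(phi1) + math.sin(phi2)) / 2
    C = math.cos(phi1) ** 2 + 2 * n * math.sin(phi1)
    return n, C

def rho(lat, n, C):
    """Radius of the projected parallel at `lat` (unit sphere)."""
    return math.sqrt(C - 2 * n * math.sin(math.radians(lat))) / n

def albers(lon, lat, parallels=(29.5, 45.5), lon0=-96.0, lat0=23.0):
    """Spherical Albers equal-area conic; returns (x, y) on a unit sphere."""
    n, C = cone_constants(parallels)
    theta = n * math.radians(lon - lon0)
    r = rho(lat, n, C)
    return r * math.sin(theta), rho(lat0, n, C) - r * math.cos(theta)

def parallel_scale(lat, parallels=(29.5, 45.5)):
    """Scale factor along a parallel: k = n * rho / cos(lat)."""
    n, C = cone_constants(parallels)
    return n * rho(lat, n, C) / math.cos(math.radians(lat))

for lat in [25, 29.5, 37.5, 45.5, 60]:
    print(lat, round(parallel_scale(lat), 4))
```

Choosing parallels that bracket the region of interest keeps the distortion small across that region, which is why a map of the U.S. uses different parallels than a map of Europe would.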

Azimuthal projections map the sphere directly onto a plane.

minimap = map.properties(width=180, height=180)
alt.hconcat(
    minimap.project(type='azimuthalEqualArea').properties(title='azimuthalEqualArea'),
    minimap.project(type='azimuthalEquidistant').properties(title='azimuthalEquidistant'),
    minimap.project(type='orthographic').properties(title='orthographic'),
    minimap.project(type='stereographic').properties(title='stereographic'),
    minimap.project(type='gnomonic').properties(title='gnomonic')
).properties(spacing=10).configure_view(stroke=None)
  • Azimuthal Equal Area: Accurately projects area in all parts of the globe, but does not preserve shape (local angles).

  • Azimuthal Equidistant: Preserves proportional distance from the projection center to all other points on the globe.

  • Orthographic: Projects a visible hemisphere onto a distant plane. Approximately matches a view of the Earth from outer space.

  • Stereographic: Preserves shape, but not area or distance.

  • Gnomonic: Projects the surface of the sphere directly onto a tangent plane. Great circles around the Earth are projected to straight lines, showing the shortest path between points.
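The gnomonic property can be verified numerically: take two airports, compute the midpoint of the great-circle arc between them, project all three points, and check that they are collinear. The sketch uses the standard spherical formulas on a unit sphere; the airport coordinates are approximate and the projection center is an arbitrary assumption:

```python
import math

def gnomonic(lon, lat, lon0, lat0):
    """Gnomonic projection onto the plane tangent at (lon0, lat0), unit sphere.

    Returns None for points 90 degrees or more from the center, which cannot be shown.
    """
    lam, phi = math.radians(lon), math.radians(lat)
    lam0, phi0 = math.radians(lon0), math.radians(lat0)
    cos_c = (math.sin(phi0) * math.sin(phi)
             + math.cos(phi0) * math.cos(phi) * math.cos(lam - lam0))
    if cos_c <= 0:
        return None
    x = math.cos(phi) * math.sin(lam - lam0) / cos_c
    y = (math.cos(phi0) * math.sin(phi)
         - math.sin(phi0) * math.cos(phi) * math.cos(lam - lam0)) / cos_c
    return x, y

def great_circle_midpoint(a, b):
    """Midpoint of the shorter great-circle arc between two (lon, lat) points."""
    def to_vec(lon, lat):
        lam, phi = math.radians(lon), math.radians(lat)
        return (math.cos(phi) * math.cos(lam), math.cos(phi) * math.sin(lam), math.sin(phi))
    # The sum of two unit vectors bisects the arc (assumes a, b are not antipodal).
    x, y, z = (p + q for p, q in zip(to_vec(*a), to_vec(*b)))
    return math.degrees(math.atan2(y, x)), math.degrees(math.atan2(z, math.hypot(x, y)))

# Approximate (lon, lat) of JFK and LHR; the projection center is an arbitrary choice.
jfk, lhr = (-73.78, 40.64), (-0.45, 51.47)
center = (-40.0, 50.0)
mid = great_circle_midpoint(jfk, lhr)
p1, pm, p2 = (gnomonic(*pt, *center) for pt in (jfk, mid, lhr))

# Cross product of (pm - p1) and (p2 - p1): zero means the three points are collinear.
cross = (pm[0] - p1[0]) * (p2[1] - p1[1]) - (pm[1] - p1[1]) * (p2[0] - p1[0])
print(mid, cross)
```

On a gnomonic map, a ruler line between two cities traces the shortest route, which is why such maps have been used for planning long-distance routes.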

Let’s play around a bit with Vega-Lite cartographic projections.

The examples above all draw from the vega-datasets collection, including geometric (TopoJSON) and tabular (airports, unemployment rates) data. A common challenge in getting started with geographic visualization is collecting the necessary data for your task.

Many data providers exist, including services such as the United States Geological Survey and U.S. Census Bureau.

In many cases you may have existing data with a geographic component, but require additional measures or geometry. To help you get started, here is one workflow:

  1. Visit Natural Earth Data and browse to select data for regions and resolutions of interest. Download the corresponding zip file(s).

  2. Go to MapShaper and drop your downloaded zip file onto the page. Revise the data as desired, and then “Export” the generated TopoJSON or GeoJSON files.

  3. Load the exported data from MapShaper for use with Altair!
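Loading a TopoJSON export in Altair means passing it to `alt.topo_feature(url, feature)`, whose second argument must name one of the objects stored under the file’s `objects` key (as `'states'` and `'counties'` did for `us-10m.json`). The stdlib-only sketch below lists those names; the helper `topojson_object_names` and the tiny sample topology are hypothetical stand-ins for a real MapShaper export, which you would read from disk instead:

```python
import json

def topojson_object_names(text):
    """Return the names under a TopoJSON topology's "objects" key.

    These are the candidate values for the second argument of
    alt.topo_feature(url, feature).
    """
    topology = json.loads(text)
    if topology.get("type") != "Topology":
        raise ValueError("not a TopoJSON topology")
    return sorted(topology.get("objects", {}))

# A tiny hand-written topology standing in for a MapShaper export (hypothetical content).
sample = json.dumps({
    "type": "Topology",
    "objects": {
        "countries": {"type": "GeometryCollection", "geometries": []},
        "land": {"type": "GeometryCollection", "geometries": []},
    },
    "arcs": [],
})
print(topojson_object_names(sample))
```

With a real export you might then write `alt.topo_feature('my_export.json', 'countries')`, where `my_export.json` is a placeholder path that the notebook renderer must be able to load.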