Lecture 6 - (26/02/2026)
Today we will introduce the basics of creating maps and visualizing spatial data with Altair, including:
GeoJSON / TopoJSON
Geoshape Marks
Point Maps
GeoJSON and TopoJSON
GeoJSON models geographic features within a specialized JSON format. A GeoJSON feature can include geometric data – such as longitude, latitude coordinates that make up a country boundary – as well as additional data attributes.
Here is a GeoJSON feature object for the boundary of Manhattan:
{
"type": "Feature",
"properties": {
"name": "Manhattan",
"cartodb_id": 4,
"created_at": "2013-03-09T02:42:03.692Z",
"updated_at": "2013-03-09T02:42:03.989Z"
},
"geometry": {
"type": "MultiPolygon",
"coordinates": [
[[-74.010928, 40.684491],...]
]
}
}
{'type': 'Feature',
'properties': {'name': 'Manhattan',
'cartodb_id': 4,
'created_at': '2013-03-09T02:42:03.692Z',
'updated_at': '2013-03-09T02:42:03.989Z'},
'geometry': {'type': 'MultiPolygon',
'coordinates': [[[-74.010928, 40.684491], Ellipsis]]}}
The feature includes a properties object, which can include any number of data fields, plus a geometry object, which in this case contains a single polygon that consists of [longitude, latitude] coordinates for the borough boundary. The coordinates continue off to the right for a long time.
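As a minimal sketch of this structure, the snippet below builds a small GeoJSON feature as a Python dictionary and reads its properties and geometry. The feature name and the square outline are made up for illustration; they are not real boundary data.

```python
import json

# A hypothetical feature: a 1-degree square, not real boundary data.
feature = {
    "type": "Feature",
    "properties": {"name": "Example Square"},
    "geometry": {
        "type": "Polygon",
        # One linear ring of [longitude, latitude] pairs; the first and
        # last positions are identical so that the ring is closed.
        "coordinates": [
            [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.0, 0.0]]
        ],
    },
}

name = feature["properties"]["name"]
ring = feature["geometry"]["coordinates"][0]
longitudes = [lon for lon, lat in ring]
latitudes = [lat for lon, lat in ring]

# Bounding box as (min_lon, min_lat, max_lon, max_lat)
bbox = (min(longitudes), min(latitudes), max(longitudes), max(latitudes))

# GeoJSON is plain JSON, so it round-trips through the json module.
assert json.loads(json.dumps(feature)) == feature
print(name, len(ring), bbox)
```

The properties object is free-form, while the geometry object follows a fixed layout that depends on its type.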
import pandas as pd
import altair as alt
from vega_datasets import data
Let’s load a TopoJSON file of world countries (at 1:110,000,000 scale, abbreviated as 110m):
world = data.world_110m.url
world
'https://cdn.jsdelivr.net/npm/vega-datasets@v1.29.0/data/world-110m.json'
world_topo = data.world_110m()
world_topo.keys()
dict_keys(['type', 'transform', 'objects', 'arcs'])
world_topo['type']
'Topology'
world_topo['objects'].keys()
dict_keys(['land', 'countries'])
In the data above, the objects property indicates the named elements we can extract from the data: geometries for all countries, or a single polygon representing all land on Earth. Either of these can be unpacked to GeoJSON data we can then visualize.
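To give a rough idea of what this unpacking involves, here is a sketch that decodes one arc of a tiny, made-up topology. It assumes the standard TopoJSON encoding, in which arc positions are delta-encoded integers that the transform scales and translates back to longitude and latitude. The toy_topology dictionary and the decode_arc helper are hypothetical and only mirror the keys seen above; Altair and Vega-Lite perform this step for us.

```python
# A made-up topology with the same top-level keys as world_topo.
toy_topology = {
    "type": "Topology",
    "transform": {"scale": [0.5, 0.5], "translate": [-10.0, 40.0]},
    "objects": {
        "example": {
            "type": "GeometryCollection",
            "geometries": [{"type": "LineString", "arcs": [0]}],
        }
    },
    # First position is absolute, the rest are offsets from the previous one.
    "arcs": [[[0, 0], [2, 0], [0, 4]]],
}


def decode_arc(topology, index):
    """Return the [longitude, latitude] positions of one arc."""
    sx, sy = topology["transform"]["scale"]
    tx, ty = topology["transform"]["translate"]
    x = y = 0
    positions = []
    for dx, dy in topology["arcs"][index]:
        x += dx  # undo the delta encoding
        y += dy
        positions.append([x * sx + tx, y * sy + ty])
    return positions


line = decode_arc(toy_topology, 0)
print(list(toy_topology["objects"].keys()), line)
```

Because geometries refer to arcs by index, neighboring shapes can share a border without storing its coordinates twice.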
As TopoJSON is a specialized format, we need to instruct Altair to parse the TopoJSON format, indicating which named feature object we wish to extract from the topology. The following code indicates that we want to extract GeoJSON features from the world dataset for the countries object:
alt.topo_feature(world, 'countries')
UrlData({
format: TopoDataFormat({
feature: 'countries',
type: 'topojson'
}),
url: 'https://cdn.jsdelivr.net/npm/vega-datasets@v1.29.0/data/world-110m.json'
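})
This alt.topo_feature call expands to a Vega-Lite data definition with the same two parts shown above: the url of the file, and a format object specifying type 'topojson' and feature 'countries'.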
Geoshape Marks
To visualize geographic data, Altair provides the geoshape mark type. To create a basic map, we can create a geoshape mark and pass it our TopoJSON data, which is then unpacked into GeoJSON features, one for each country of the world:
alt.Chart(
alt.topo_feature(world, 'countries')
).mark_geoshape()
We can customize the colors and stroke widths using standard mark properties. Using the project method we can also add our own map projection:
alt.Chart(alt.topo_feature(world, 'countries')).mark_geoshape(
fill='black', stroke='white', strokeWidth=0.3
).project(
type='mercator'
)
alt.Chart(alt.topo_feature(world, 'countries')).mark_geoshape(
fill='black', stroke='white', strokeWidth=0.3
).project(
type='albers'
)
By default Altair automatically adjusts the projection so that all the data fits within the width and height of the chart.
We can also specify projection parameters, such as scale (zoom) and translate (moving), to customize the projection settings.
Let’s try to focus on Europe.
alt.Chart(alt.topo_feature(world, 'countries')).mark_geoshape(
fill='black', stroke='white', strokeWidth=0.3
).project(
type='mercator', scale=400, translate=[100, 550]
)
Note how the 110m resolution of the data becomes apparent at this scale. To see more detailed coast lines and boundaries, we need an input file with more fine-grained geometries.
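The effect of scale and translate can be sketched by hand. The function below is an illustration only: it assumes a spherical Mercator projection with the D3 conventions that Vega-Lite's projections follow, the default center of longitude 0 and latitude 0, and no rotation. The helper name mercator_px is hypothetical.

```python
import math


def mercator_px(lon, lat, scale, translate):
    """Project degrees to pixel coordinates with a spherical Mercator."""
    lam = math.radians(lon)
    phi = math.radians(lat)
    x = lam
    y = math.log(math.tan(math.pi / 4 + phi / 2))
    # scale multiplies the projected coordinates (zoom); translate is the
    # pixel position of longitude 0, latitude 0. Pixel y grows downward.
    return (translate[0] + scale * x, translate[1] - scale * y)


origin = mercator_px(0, 0, scale=400, translate=[100, 550])
north = mercator_px(0, 50, scale=400, translate=[100, 550])
zoomed = mercator_px(0, 50, scale=800, translate=[100, 550])
print(origin, north, zoomed)
```

With translate=[100, 550], the point at longitude 0 and latitude 0 sits near the bottom left of the chart, which pushes Europe into view. Doubling the scale doubles every pixel distance from that point.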
So far our map shows countries only. Using the layer operator, we can combine multiple map elements. Altair includes data generators we can use to create data for additional map layers:
The sphere generator ({'sphere': True}) provides a GeoJSON representation of the full sphere of the Earth. We can create an additional geoshape mark that fills in the shape of the Earth as a background layer.
The graticule generator ({'graticule': ...}) creates a GeoJSON feature representing a graticule: a grid formed by lines of latitude and longitude. The default graticule has meridians and parallels every 10° between ±80° latitude. For the polar regions, there are meridians every 90°. These settings can be customized using the stepMinor and stepMajor properties.
Let’s layer sphere, graticule, and country marks into a reusable map specification:
map = alt.layer(
# use the sphere of the Earth as the base layer
alt.Chart({'sphere': True}).mark_geoshape(
fill='LightBlue'
),
# add a graticule for geographic reference lines
alt.Chart({'graticule': True}).mark_geoshape(
stroke='white', strokeWidth=1
),
# and then the countries of the world
alt.Chart(alt.topo_feature(world, 'countries')).mark_geoshape(
fill='black', stroke='white', strokeWidth=0.3
)
).properties(
width=600,
height=400
)
We can extend the map with a desired projection and draw the result. The sphere layer provides the light blue background, and the graticule layer provides the white geographic reference lines.
map.project(
type='naturalEarth1', scale=110, translate=[300, 200]
).configure_view(stroke=None)
Point Maps
In addition to the geometric data provided by GeoJSON or TopoJSON files, many tabular datasets include geographic information in the form of fields for longitude and latitude coordinates, or references to geographic regions such as country names, state names, postal codes, etc., which can be mapped to coordinates using a geocoding service. In some cases, location data is rich enough that we can see meaningful patterns by projecting the data points alone!
Let’s look at a dataset of 5-digit zip codes in the United States, including longitude, latitude coordinates for each post office in addition to a zip_code field.
zipcodes = data.zipcodes.url
zipcodes
'https://cdn.jsdelivr.net/npm/vega-datasets@v1.29.0/data/zipcodes.csv'
We can visualize each post office location using a small (1-pixel) square mark.
While cartographic projections map (longitude, latitude) coordinates to (x, y) coordinates, they can do so in arbitrary ways. There is no guarantee, for example, that longitude maps to x and latitude maps to y! Instead, Altair includes special longitude and latitude encoding channels to handle geographic coordinates.
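To make this point concrete, here is a small sketch of an orthographic projection on a unit sphere, centered on longitude 0 and latitude 0. The helper is written by hand for illustration and is not part of Altair. Two points on the same meridian end up with different x coordinates.

```python
import math


def orthographic(lon, lat):
    """Unit-sphere orthographic projection centered on (0, 0)."""
    lam = math.radians(lon)
    phi = math.radians(lat)
    return (math.cos(phi) * math.sin(lam), math.sin(phi))


# Two points on the same meridian (longitude 60) at different latitudes
x_equator, _ = orthographic(60, 0)
x_north, _ = orthographic(60, 60)
print(round(x_equator, 3), round(x_north, 3))
```

Here x depends on both longitude and latitude, which is why the chart that follows uses the longitude and latitude channels rather than x and y.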
alt.Chart(zipcodes).mark_square(
size=1, opacity=1
).encode(
longitude='longitude:Q', # apply the field named 'longitude' to the longitude channel
latitude='latitude:Q' # apply the field named 'latitude' to the latitude channel
).project(
type='albersUsa'
).properties(
width=900,
height=500
).configure_view(
stroke=None
).interactive()
Plotting zip codes only, we can see the outline of the United States and discern meaningful patterns in the density of post offices, without a base map or additional reference elements!
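The albersUsa projection takes some liberties with the actual geometry of the Earth: it is a composite projection that repositions and rescales Alaska and Hawaii, and it is pre-scaled to fit the United States. With a standard projection such as mercator, we could achieve a similar framing by providing the scale and translate parameters ourselves.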
alt.Chart(zipcodes).mark_square(
size=1, opacity=1
).encode(
longitude='longitude:Q', # apply the field named 'longitude' to the longitude channel
latitude='latitude:Q' # apply the field named 'latitude' to the latitude channel
).project(
type='mercator'
).properties(
width=900,
height=500
).configure_view(
stroke=None
).interactive()
We can now go on to ask more questions of our dataset. For example, is there any rhyme or reason to the allocation of zip codes? To assess this question we can add a color encoding based on the first digit of the zip code. Let’s add a calculate transform to extract the first digit, and encode the result using the color channel:
map = alt.Chart(zipcodes).transform_calculate(
digit='datum.zip_code[0]'
).mark_square(
size=2,
opacity=1
).encode(
longitude='longitude:Q',
latitude='latitude:Q',
color='digit:N'
).project(
type='albersUsa'
).properties(
width=900,
height=500
).configure_view(
stroke=None
)
To zoom in on a specific digit, we can add a filter transform to limit the data shown.
import altair as alt
digit_sel = alt.selection_point(
fields=["digit"],
bind="legend", # bound to the legend: clicking a legend entry selects that digit
empty="all"
)
alt.Chart(zipcodes).transform_calculate(
digit="datum.zip_code[0]"
).transform_filter(
digit_sel
).mark_square(
size=2,
opacity=1
).encode(
longitude="longitude:Q",
latitude="latitude:Q",
color="digit:N"
).add_params( # the selection must be registered with the chart as a parameter
digit_sel
).project(
type="albersUsa"
).properties(
width=900,
height=500
).configure_view(
stroke=None
)
For a more polished version, take a look at Ben Fry’s Zipdecode Visualization.
Furthermore, we might wonder what a specific sequence of zip codes may indicate. One way to explore this is to connect consecutive zip codes using a line mark.
alt.Chart(zipcodes).transform_filter(
'-150 < datum.longitude && 22 < datum.latitude && datum.latitude < 55' # filters out Puerto Rico and American Samoa
).transform_calculate(
digit='datum.zip_code[0]'
).mark_line(
strokeWidth=0.5
).encode(
longitude='longitude:Q',
latitude='latitude:Q',
color='digit:N',
order='zip_code:O'
).project(
type='albersUsa'
).properties(
width=900,
height=500
).configure_view(
stroke=None
)
We can now see how zip codes further cluster into smaller areas, indicating a hierarchical allocation of codes by location, but with some notable variability within local clusters.
This visualization is based on Robert Kosara’s ZipScribble Visualization.
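The role of the order channel in the line chart can be sketched in plain Python: records are sorted by zip code, and each record is connected to the next one. The records below are made up for illustration, and the actual drawing is done by Vega-Lite rather than by code like this.

```python
# Made-up records; codes and coordinates are invented for illustration.
records = [
    {"zip_code": "90001", "longitude": -118.0, "latitude": 34.0},
    {"zip_code": "10002", "longitude": -74.0, "latitude": 40.7},
    {"zip_code": "20001", "longitude": -77.0, "latitude": 38.9},
    {"zip_code": "10001", "longitude": -74.0, "latitude": 40.8},
]

# Zip codes are fixed-width strings, so string order matches numeric order
# and leading zeros would be preserved.
path = sorted(records, key=lambda r: r["zip_code"])

# Each consecutive pair of records becomes one line segment.
segments = [(a["zip_code"], b["zip_code"]) for a, b in zip(path, path[1:])]
print(segments)
```

Long segments mark jumps between distant regions, while short segments mark codes that were allocated to neighboring places.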