A Technical Overview of the District Density Scale

Background

CityLab first calculated a Congressional Density Index in 2018 for the 116th Congress.¹ In 2022 the Dashboard team replicated CityLab’s methods for the 118th Congress, creating the District Density Index.² Most recently, in 2026, we updated this resource for the 119th Congress and made a few modifications to the original methods, creating the District Density Scale. Our methods and technical decisions are detailed below.

Reach out to us at [email protected] with any questions, suggestions, or issues you notice.

Categorizing Census Tract-Level Household Density

The first step in calculating the District Density Scale is to assign every census tract in the country to a household density category. Household density is calculated as the 2020 Decennial Census count of occupied households in a tract divided by the tract’s area in square miles. Census tract-level household density categories are as follows (see the Validation/Sensitivity Analyses section below to learn more about this decision):

Very low density: less than 102 households/sq. mile
Low density: 102 – 800 households/sq. mile
Medium density: 800 – 2213 households/sq. mile
High density: greater than 2213 households/sq. mile
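
The categorization above can be illustrated with a minimal Python sketch. It is not the Dashboard team’s code. The function and parameter names are hypothetical, and the handling of a value of exactly 800 (which appears in both the low and medium ranges) is an assumption.

```python
def household_density(occupied_households: float, area_sq_mi: float) -> float:
    """Occupied households per square mile for one census tract."""
    if area_sq_mi <= 0:
        raise ValueError("area_sq_mi must be positive")
    return occupied_households / area_sq_mi


def density_category(density: float) -> str:
    """Map a households-per-square-mile value to one of the four categories.

    Assumption: a density of exactly 800 is treated as medium density,
    because the published ranges list 800 in both neighboring categories.
    """
    if density < 102:
        return "Very low density"
    if density < 800:
        return "Low density"
    if density <= 2213:
        return "Medium density"
    return "High density"


# Hypothetical tract: 1,500 occupied households in 0.5 square miles.
example = density_category(household_density(1500, 0.5))
```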

Determining Percent of CD Population in Each Census Tract-Level Household Density Category

Each tract is then assigned to the congressional district(s) in which it is located. To account for the fact that census tracts do not always perfectly nest in congressional district boundaries, we calculated the percent of each congressional district’s total population that overlaps with each census tract (using 2020 Decennial Census counts).

From this information, we can calculate the percent of each CD’s population that is in each census tract-level household density category. We will refer to these percents as “CD-level household density distributions” in later sections. For example, Alaska’s At-Large CD:

Alaska’s At-Large CD's Household Density Distribution

Very low density	Low density	Medium density	High density
48%	23%	17%	12%
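
As a rough sketch of this step, assume each record gives the population of one tract-by-district overlap along with the tract’s density category. The record layout, the district label "XX-01", and the population figures are all invented for illustration. The figures are chosen only so that the shares match the Alaska percentages above.

```python
from collections import defaultdict

CATEGORIES = ["Very low density", "Low density", "Medium density", "High density"]


def cd_density_distribution(pieces):
    """Percent of each district's population in each tract density category.

    pieces: iterable of (district_id, tract_category, overlap_population),
    one record per tract-by-district overlap (hypothetical layout).
    """
    totals = defaultdict(float)
    by_category = defaultdict(lambda: dict.fromkeys(CATEGORIES, 0.0))
    for district, category, population in pieces:
        totals[district] += population
        by_category[district][category] += population  # KeyError on unknown category
    return {
        district: {
            category: 100.0 * by_category[district][category] / total
            for category in CATEGORIES
        }
        for district, total in totals.items()
        if total > 0
    }


# Invented overlap populations for a single made-up district.
pieces = [
    ("XX-01", "Very low density", 300),
    ("XX-01", "Very low density", 180),
    ("XX-01", "Low density", 230),
    ("XX-01", "Medium density", 170),
    ("XX-01", "High density", 120),
]
distribution = cd_density_distribution(pieces)
```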

Assigning CDs to District Density Scale Clusters

Once we have calculated CD-level household density distributions for all congressional districts, the next step is to use this information to assign each congressional district to a District Density Scale cluster. We use four clusters (see the Validation/Sensitivity Analyses section below to learn why we chose four clusters), which differs from CityLab’s original choice of six clusters. To make this assignment, we use a fuzzy c-means clustering algorithm to determine the degree to which each district belongs to each of the four clusters, based on the district’s CD-level household density distribution.

Fuzzy c-means clustering is an unsupervised algorithm that uses model inputs (in this case, CD-level household density distributions) to assign observations (in this instance, congressional districts) to clusters based on distance to the center of each cluster. Rather than deterministically assigning each point to a single ‘best fit’ cluster, fuzzy c-means assesses the degree to which a point belongs to every cluster (its “degree of similarity” to each cluster). In the standard formulation these degrees sum to 1 across clusters, so a district could, for example, have a degree of similarity of 0.7 to one cluster, 0.2 to another, and 0.05 to each of the remaining two. The method acknowledges that observations do not always fit perfectly into only one cluster, which is why it is called fuzzy. We ultimately assign districts to density clusters based on the district’s highest “degree of similarity” across the four clusters.
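
For readers who want to see the mechanics, here is a minimal sketch of the textbook fuzzy c-means updates in plain Python. It is not the implementation used for the District Density Scale. The fuzzifier m = 2, the random initialization, the stopping rule, and the six example distributions are all assumptions made for illustration, and the example uses two clusters rather than four to stay small.

```python
import math
import random


def fuzzy_c_means(points, n_clusters, m=2.0, max_iter=200, tol=1e-6, seed=0):
    """Textbook fuzzy c-means. Returns (centers, memberships).

    memberships[i][j] is the degree to which point i belongs to cluster j;
    each row sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Start from random memberships, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(max_iter):
        # Each center is the mean of all points weighted by membership ** m.
        centers = []
        for j in range(n_clusters):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / total for d in range(dim)]
            )
        # Memberships are proportional to distance ** (-2 / (m - 1)).
        new_u = []
        for p in points:
            dist = [math.dist(p, c) for c in centers]
            if any(d == 0.0 for d in dist):
                # A point sitting exactly on a center belongs fully to it.
                raw = [1.0 if d == 0.0 else 0.0 for d in dist]
            else:
                raw = [d ** (-2.0 / (m - 1.0)) for d in dist]
            total = sum(raw)
            new_u.append([v / total for v in raw])
        shift = max(abs(a - b) for ra, rb in zip(u, new_u) for a, b in zip(ra, rb))
        u = new_u
        if shift < tol:
            break
    return centers, u


# Invented density distributions (shares: very low, low, medium, high).
shares = [
    [0.80, 0.10, 0.05, 0.05],
    [0.75, 0.15, 0.05, 0.05],
    [0.85, 0.10, 0.03, 0.02],
    [0.02, 0.03, 0.15, 0.80],
    [0.05, 0.05, 0.10, 0.80],
    [0.00, 0.05, 0.15, 0.80],
]
centers, memberships = fuzzy_c_means(shares, n_clusters=2)
# Final assignment: the cluster with the highest membership for each row.
labels = [max(range(len(row)), key=row.__getitem__) for row in memberships]
```

The last line mirrors the assignment rule described above: each district goes to the cluster where its degree of similarity is highest.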

The graphic below summarizes CD-level household density distributions across all congressional districts assigned to each of the four clusters. Cluster names were inspired by the original CityLab names.

Edge/Cusp Cases

Though we ultimately assigned districts to density clusters based on the district’s highest degree of similarity, there are districts on the “cusp” of more than one cluster and/or with characteristics of another density cluster.

For example, California’s 18th District is categorized as “Predominantly Urban” (degree of similarity: 0.39) because of its significant overlap with dense areas like San Jose. But it also has characteristics very similar to the “Urban-Suburban Mix” cluster (degree of similarity: 0.35) because it includes suburban and semi-rural areas in Santa Clara, Santa Cruz, and San Benito counties.

As a slightly different example, Alaska’s At-Large district (i.e., one congressional district for the entire state) is in the “Predominantly Rural” cluster (degree of similarity: 0.83). It doesn’t have a strong similarity to any other density cluster, but about 12% of its population lives in census tracts classified as “High density” due to the presence of urban areas (like Anchorage) in the state.
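
One simple way to flag cusp cases is to compare a district’s two highest degrees of similarity. The sketch below is illustrative only. The 0.10 margin is an arbitrary threshold, not one used by the Dashboard team. The top values (0.39 and 0.35, and 0.83) mirror the two examples above, while every other value is a placeholder invented so that each row sums to 1.

```python
def top_two(memberships):
    """Return (best_cluster, runner_up_cluster, gap between their degrees)."""
    ranked = sorted(memberships.items(), key=lambda item: item[1], reverse=True)
    (best, best_u), (second, second_u) = ranked[0], ranked[1]
    return best, second, best_u - second_u


def is_cusp(memberships, margin=0.10):
    """True when the two highest degrees are closer than `margin` (hypothetical cutoff)."""
    return top_two(memberships)[2] < margin


# Top two values follow the California 18th example; the rest are placeholders.
cusp_like = {
    "Predominantly Urban": 0.39,
    "Urban-Suburban Mix": 0.35,
    "Rural-Suburban Mix": 0.16,  # invented placeholder
    "Predominantly Rural": 0.10,  # invented placeholder
}
# Top value follows the Alaska example; the rest are placeholders.
clear_cut = {
    "Predominantly Rural": 0.83,
    "Rural-Suburban Mix": 0.09,  # invented placeholder
    "Urban-Suburban Mix": 0.05,  # invented placeholder
    "Predominantly Urban": 0.03,  # invented placeholder
}
flags = (is_cusp(cusp_like), is_cusp(clear_cut))
```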

Validation/Sensitivity Analyses

Assigning tracts to household density categories
- We used the density categories originally chosen by CityLab. That categorization has potential limitations: the original categories come from a 2014 survey³ and may not reflect current household distributions, they were constructed at the zip-code level, and they were not peer-reviewed. We therefore completed a sensitivity analysis exploring the impact of different tract-level household density categories on the final District Density Scale. Different methods of categorization did not substantially change the final categories, and the minor changes we observed were districts moving to an adjacent density cluster, that is, moving over by one cluster.
Choosing the number of District Density Scale clusters
- CityLab originally chose six clusters because “six clusters fit well with understandings of density and demographics as well with trends in politics”.⁴ There is no universally agreed-upon way to define rural/suburban/urban areas. We decided to reassess the number of clusters because we found that some of our users had trouble distinguishing between the original four suburban clusters. Additionally, the original urban cluster was too narrow as it was nearly exclusively composed of congressional districts located in Los Angeles and New York City.
- We used various methods to identify the number of clusters most useful in understanding household density across congressional districts (Elbow, Gap Statistic, and Average Silhouette Methods)⁵ and found that most suggested two to four clusters, but there was no agreement across methods. After carefully reviewing national maps, Congressional District Health Dashboard metric distributions by cluster, and average number of CDs in each cluster, we settled on four clusters, or categories, to make up the District Density Scale. We think four categories—Predominantly Rural, Rural-Suburban Mix, Urban-Suburban Mix, and Predominantly Urban—best capture the underlying concepts we are trying to represent and are easy to understand.
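
To make one of these diagnostics concrete, the sketch below computes an average silhouette score for a given hard assignment of points to clusters, using Euclidean distance. It is a generic illustration with invented one-dimensional data, not the analysis we ran. In practice the score would be computed for each candidate number of clusters and the results compared.

```python
import math


def average_silhouette(points, labels):
    """Mean silhouette score over all points, for one hard cluster assignment."""
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            # Convention: a point alone in its cluster scores 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other points in the same cluster.
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the points of any other cluster.
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Invented data: two tight groups on a line.
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
good = average_silhouette(points, [0, 0, 1, 1])  # groups match the data
bad = average_silhouette(points, [0, 1, 0, 1])   # groups cut across the data
```

A score near 1 indicates compact, well-separated clusters, while a score near or below 0 indicates that points sit closer to another cluster than to their own.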

Download Data

Click this link to download an Excel file with the District Density Scale categories for every congressional district in the 119th Congress.

For interested users, we have also provided an Excel file that includes the degree of similarity for each of the four clusters for every congressional district. You can use this information to identify CDs that may have similarity to multiple clusters, even though we ultimately assigned each CD to the cluster with the highest degree of similarity.

References

1. Montgomery, D. (2018 November 20). CityLab’s Congressional Density Index. Bloomberg. https://www.bloomberg.com/news/articles/2018-11-20/citylab-s-congressional-density-index
2. Department of Population Health, NYU Langone Health. (2022). 2022 CDHD District Density Index. Congressional District Health Dashboard. https://www.congressionaldistricthealthdashboard.org/article/2022-cdhd-district-density-index
3. Kolko, J. (2015 May 21). How Suburban Are Big American Cities? FiveThirtyEight. https://fivethirtyeight.com/features/how-suburban-are-big-american-cities/
4. Montgomery, D. (2018 October). CityLab Congressional Density Index: Methodology. CityLab GitHub. https://github.com/theatlantic/citylab-data/blob/master/citylab-congress/methodology.md
5. Data Novia. (2018). Determining The Optimal Number Of Clusters: 3 Must Know Methods. https://www.datanovia.com/en/lessons/determining-the-optimal-number-of-clusters-3-must-know-methods/