During the high-energy, three-day marathon-style DataDive® event that took place March 4-7, 2021, volunteers on this project set out to do the following:
- Explore the Assessment, Cleanup, and Redevelopment Exchange System (ACRES) database, a little-explored public database published and maintained by the Environmental Protection Agency (EPA), which contains 25 years' worth of historical data for environmental cleanup sites in the U.S.
- Assess the ACRES dataset to identify which variables are most related to, or predictive of, the cost of a brownfield cleanup, and build prototype models to determine those variables
- Apply Natural Language Processing classification techniques to free-text fields in order to derive more detailed information about the category of property (school, hospital, railway, etc.) undergoing a cleanup; and use clustering techniques to produce groupings of similar cleanup sites
Among the findings from the event:
- While all properties were classified as brownfields, the results of the assessments, and the share of sites requiring cleanup, differed by state
- Both previous use (e.g., industrial vs. commercial) and future planned use (e.g., residential vs. nonresidential) may impact the cost of a brownfield cleanup
- Sites with contaminants found in sediments, soil, groundwater, and surface water are, on average, costlier than sites with contaminants found in building materials
- Certain contaminants (e.g., pesticides, other metals, PAHs), as well as economic indicators (e.g., percent below poverty line, vacant housing, median income) may be predictive of cleanup cost
All communities face complex social and environmental challenges. The impact of these challenges is most acute in historically underserved urban areas, economically depressed rural towns, and tribal communities, which are disproportionately home to minorities and people living on low incomes. Rehabilitating underutilized properties in “environmental justice” communities can be an opportunity to both clean up environmental contamination and stimulate economic growth and quality-of-life improvements. However, because unknown environmental conditions carry potentially high cleanup costs and liability, these properties (known as brownfields) can deter investment, and a property left abandoned for years exacerbates the decay of its neighborhood.
Impacts of brownfields extend beyond their burden on economic growth: exposure to chemicals and unsafe building conditions impact community health, increase cases of crime and vandalism, and attract illegal dumping. Children and pregnant women are the most vulnerable when living near contaminated properties, as documented through birth defects, higher rates of asthma, and elevated levels of lead in their blood.
Redevelopment of contaminated properties not only eliminates exposure to toxic chemicals in vulnerable populations; the reuse of these properties also presents unique opportunities to catalyze the revitalization of underserved communities. However, the cost of assessment and cleanup of these brownfields can vary wildly, and these processes are technically complex and subject to regulation, all of which can be a barrier to developers looking to invest in the area. To redevelop a site, owners need to go through up to three general stages, depending on the outcomes of each: Phase I Environmental Site Assessment, which typically costs between $2,000 and $5,000; Phase II Site Investigation, which typically costs $10,000 to $30,000 but can range up to $100,000 on larger sites; and finally, Cleanup and Remediation, which costs $50,000 to $1,000,000 but can exceed $10 million on more complicated sites. (See the EPA's Brownfields Road Map for more information.)
If the cost of investigation and cleanup of a property is understood early in the planning process, communities can make risk-based decisions, secure appropriate funding, and accelerate redevelopment of these sites. Furthermore, this understanding will enhance the ability of community organizations to invest in real estate, control redevelopment in their neighborhoods, and combat gentrification.
Community Lattice, one of the data.org Inclusive Growth and Recovery Challenge award winners, advances community development by equipping people with the data and resources they need to create economically sustainable, socially equitable, and resilient places. They work primarily with underserved urban and rural communities throughout the U.S. to identify and overcome environmental challenges inhibiting equitable growth.
Community Lattice sought DataKind’s help to better understand cleanup costs of brownfields, which are a significant strain on an area’s tax base and inhibit economic growth. Using the U.S. Environmental Protection Agency's (EPA) Assessment, Cleanup, and Redevelopment Exchange System (ACRES) data, which contains site, contaminant, and cost information on over 25 years of brownfields redevelopment projects, Community Lattice wanted to explore creating a Cleanup Cost Calculator that predicts the cost of brownfields cleanup and so addresses the environmental uncertainty and financial risk associated with brownfields redevelopments.
The model will use publicly available data and will allow users to enter site-specific information to see which similar sites have been cleaned up and at what cost. This site comparison and predictive model will enable disadvantaged communities to (a) better understand the environmental hazards surrounding them, (b) identify and prioritize brownfields redevelopment projects, (c) apply for grant funding to advance cleanup and redevelopment projects, and (d) attract public and private partners who can invest in redevelopment of these contaminated properties so that disenfranchised communities are able to benefit from economic revitalization. If successful, this could transform a community’s ability to secure funding, improve community health, and create economic opportunities.
During the DataDive event, volunteers dove head-first into a sparse, messy, and technically complex ACRES dataset in order to understand in more detail what aspects of a cleanup site are responsible for its cleanup cost.
Due to the highly specialized nature of the dataset, as well as the fact that there are no universal data dictionaries for the ACRES database, the first order of business was to conduct some research. Members of the volunteer team generated data dictionaries, reports, and presentations to orient current and future volunteers who were new to the subject area, highlighting what brownfields are, what steps comprise the cleanup process, what types of sites and contaminants are identified, and how cleanups are funded. With this background, volunteers were then able to assess the dataset with confidence.
Next, the team cleaned the data. They wrote cleaning code to standardize the ACRES dataset for analysis, re-coding categorical variables and information missing from the database, saving Community Lattice and others who will work with this dataset in the future valuable time.
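To give a feel for what this kind of cleaning involves, here is a minimal sketch in Python. It is not the volunteers' code: the field names (`site_id`, `past_use`, `cleanup_cost`), the sample values, and the list of "missing" tokens are all assumptions made for illustration, not the real ACRES schema.

```python
# Hypothetical raw records; field names and values are illustrative,
# not the real ACRES schema.
RAW_RECORDS = [
    {"site_id": "A1", "past_use": " Industrial ", "cleanup_cost": "125,000"},
    {"site_id": "A2", "past_use": "commercial", "cleanup_cost": ""},
    {"site_id": "A3", "past_use": "N/A", "cleanup_cost": "40000"},
]

# Tokens treated as "missing" (an assumption about how gaps might be encoded).
MISSING_TOKENS = {"", "n/a", "na", "unknown", "none"}


def clean_category(value):
    """Trim and lower-case a categorical value; return None when it is missing."""
    text = (value or "").strip().lower()
    return None if text in MISSING_TOKENS else text


def clean_cost(value):
    """Parse a dollar amount such as '125,000' or '$40000'; return None when missing."""
    text = (value or "").strip().replace("$", "").replace(",", "")
    if text.lower() in MISSING_TOKENS:
        return None
    return float(text)


def clean_record(record):
    """Return a cleaned copy of one record, leaving the input untouched."""
    return {
        "site_id": record["site_id"],
        "past_use": clean_category(record.get("past_use")),
        "cleanup_cost": clean_cost(record.get("cleanup_cost")),
    }


CLEANED = [clean_record(r) for r in RAW_RECORDS]
```

Encoding every gap as `None`, rather than leaving a mix of blanks and "N/A" strings, is what lets later steps count and handle missing values consistently.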
Third, the team explored the dataset. They took the ACRES database and conducted exploratory analysis of brownfield cleanups, broken down by:
- Phase (Phase I Assessment, Phase II Assessment, and Cleanup and Remediation)
- Change over time
- Type of redevelopment site
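A breakdown like the one above can be sketched with a small grouping helper. The records below are invented, and the field names (`phase`, `year`, `site_type`, `cost`) are assumptions rather than ACRES column names; the median is used here only because cost data tends to be skewed, not because the source says the team used it.

```python
from collections import defaultdict
from statistics import median

# Hypothetical cleaned records; values are made up for illustration.
RECORDS = [
    {"phase": "Phase I", "year": 2015, "site_type": "industrial", "cost": 3000.0},
    {"phase": "Phase I", "year": 2016, "site_type": "commercial", "cost": 4000.0},
    {"phase": "Phase II", "year": 2016, "site_type": "industrial", "cost": 20000.0},
    {"phase": "Cleanup", "year": 2017, "site_type": "industrial", "cost": 250000.0},
    {"phase": "Cleanup", "year": 2017, "site_type": "commercial", "cost": 150000.0},
]


def median_cost_by(records, key):
    """Group records by `key` and return {group: median cost}."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record["cost"])
    return {group: median(costs) for group, costs in groups.items()}


by_phase = median_cost_by(RECORDS, "phase")
by_year = median_cost_by(RECORDS, "year")
by_type = median_cost_by(RECORDS, "site_type")
```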
Then, the volunteer team worked on the cost modeling component of the project, conducting feature engineering on the ACRES dataset and attempting a number of prototype modeling approaches in order to isolate those variables most important to the cost of a brownfield cleanup. To this end, linear regression, logistic regression, and random forest modeling techniques were applied to the ACRES data.
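As a rough illustration of the simplest of these approaches, the sketch below fits a one-variable linear regression of log cost on each candidate feature and ranks the features by R². It is a deliberately reduced stand-in, not the team's models: it does not cover logistic regression or random forests, and the features (`area_acres`, `n_contaminants`) and all numbers are hypothetical.

```python
import math

# Hypothetical sites: feature names and values are invented for illustration.
SITES = [
    {"area_acres": 1.0, "n_contaminants": 1, "cost": 50_000},
    {"area_acres": 2.0, "n_contaminants": 3, "cost": 120_000},
    {"area_acres": 5.0, "n_contaminants": 2, "cost": 300_000},
    {"area_acres": 8.0, "n_contaminants": 5, "cost": 900_000},
    {"area_acres": 3.0, "n_contaminants": 4, "cost": 200_000},
]


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x; also returns R^2."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Cleanup costs span orders of magnitude, so model log10(cost) rather than raw cost.
log_cost = [math.log10(s["cost"]) for s in SITES]

r2_by_feature = {}
for feature in ("area_acres", "n_contaminants"):
    xs = [float(s[feature]) for s in SITES]
    _, _, r2 = fit_line(xs, log_cost)
    r2_by_feature[feature] = r2

# Rank candidate predictors by how much variance in log cost each explains alone.
ranking = sorted(r2_by_feature, key=r2_by_feature.get, reverse=True)
```

Single-variable R² ignores interactions between features, which is one reason a tree-based method such as a random forest is a natural next step on real data.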
Finally, the team applied Seeded LDA (Latent Dirichlet Allocation), a topic modeling technique used in NLP, to parse free-text fields describing the site being cleaned up and isolate terms indicative of the category or type of property (such as a former industrial site). By count, the topics most frequently tagged in the ACRES dataset were: “agriculture”, “mining”, and “oils”.
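The idea of seeding topics with known terms can be illustrated with a much simpler technique than Seeded LDA: counting seed-word matches in each description. The sketch below is that simplified keyword tagger, not a topic model. The three topic names come from the results above, but the seed words and the sample descriptions are hypothetical.

```python
import re
from collections import Counter

# Topic names come from the text above; the seed words are hypothetical.
SEED_WORDS = {
    "agriculture": {"farm", "orchard", "grain", "pesticide"},
    "mining": {"mine", "quarry", "tailings", "smelter"},
    "oils": {"oil", "petroleum", "gasoline", "fuel"},
}


def tag_topic(description):
    """Return the topic whose seed words appear most often, or None if no seed matches."""
    tokens = re.findall(r"[a-z]+", description.lower())
    counts = Counter()
    for topic, seeds in SEED_WORDS.items():
        counts[topic] = sum(1 for token in tokens if token in seeds)
    topic, hits = counts.most_common(1)[0]
    return topic if hits > 0 else None


# Invented site descriptions, for illustration only.
DESCRIPTIONS = [
    "Former gasoline station with leaking fuel tanks",
    "Abandoned quarry and smelter complex",
    "Vacant lot next to an elementary school",
]
TAGS = [tag_topic(d) for d in DESCRIPTIONS]
```

Unlike a real seeded topic model, this matches exact words only (no plurals or stemming) and cannot discover new terms associated with a topic, so it is best read as a baseline.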
All of the above proved to be valuable early exploration of this dataset. The lessons learned will be shared with representatives of the EPA to inform future usage and data collection, and will be built upon in a more in-depth iteration of the project (see the next section).
"[Everyone] has been so great and helpful! The data cleaning and EDA have been super helpful (definitely saving like a month of my time), and the idea to try and model whether something will go to cleanup was something I hadn’t thought of yet and was a good idea." -Shannon Loomis, Community Lattice
While the DataDive event represented a big step forward in the analysis of the ACRES database and construction of a cleanup cost model, it was only the first step in the ongoing collaboration between DataKind and Community Lattice. These insights will give users critical information about brownfields projects with similar challenges so community developers can better estimate time and cost of revitalization efforts. This information will be used to support community-based organizations in leveraging resources to address environmental issues and to fulfill their own vision for revitalization. We’re expanding the work from this project into a full DataCorps engagement, and we’re eager to kick off in late Spring 2021! Stay tuned for a May Call for Volunteers shared across our social media channels – for this project and a few others. Until then, start thinking of how you can share skills in these areas as a volunteer:
- R programming
- Exploratory data analysis (EDA)
- Data munging and cleaning
- Feature extraction
- Statistical modeling
- Principal component analysis (PCA) or other dimensionality reduction techniques
- Clustering: k-means or other methods
- NLP classification methods
- Data visualization
- Shiny application development
"With all the changes that are being brought about by the new administration, this is an ideal time to provide EPA with insights from this DataDive." -Dr. Cynthia Annett, Kansas State University & Tribal Technical Assistance to Brownfields
- Powering Public Data for Communities: Highlights from Virtual DataDive® Event
- About Community Lattice
- Brownfields 101
- DataKind & Community Lattice Participate in Human Rights and Data Science Conference
- About Kansas State University’s Tribal Technical Assistance to Brownfields Program
- How Can Data Science Drive Equitable Growth and Recovery? Our $10M Challenge Offers Some Insights into the Future.
Image above courtesy of Community Lattice.