Geospatial Scripts

This repository contains a collection of Python scripts for processing, cleaning, and cross-referencing geospatial and demographic data from IBGE (Brazilian Institute of Geography and Statistics), SISMIGRA, and CRAI. The scripts focus on automation and on generating geographic meshes (GeoPackage files) and heat maps for the state of São Paulo and for Brazil as a whole.

Methodology

The project uses the GeoPandas library as the main engine for spatial operations, alongside the following methods:

Official Data Extraction (APIs): Direct connections to IBGE V2 and V3 APIs to fetch territorial meshes (Shapefiles converted to GeoJSONs) of municipalities, states, and aggregated census data.
Spatial and Tabular Joins (Inner Joins): Cross-referencing government databases (SISMIGRA, CRAI Databases) with maps using normalized municipality and district names.
Graph Theory: Use of the NetworkX library with the Minimum Spanning Tree algorithm to connect isolated train and subway stations in São Paulo with the smallest total line length, and to draw the resulting railway paths.
Cleaning and Standardization: Cleaning algorithms to remove accents (Regex) and adjust text formatting using a base data dictionary, minimizing data loss when joining with the official IBGE database.
Demographics by Categories: Cross-filters applied using the Pandas library to classify gender densities (Demographic Proportion) and isolate specific nationalities.
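
The cleaning and join steps above can be sketched as follows. This is a minimal illustration, not the repository's actual code: the column names and sample rows are hypothetical, and accent removal here combines `unicodedata` decomposition with a regex that drops the combining marks.

```python
import re
import unicodedata

import pandas as pd


def normalize_name(name: str) -> str:
    """Return an accent-free, upper-case, single-spaced version of a place name."""
    # Decompose accented characters into base letter + combining mark (NFD).
    decomposed = unicodedata.normalize("NFD", name)
    # Drop the combining marks (Unicode range U+0300 to U+036F).
    without_accents = re.sub(r"[\u0300-\u036f]", "", decomposed)
    # Collapse repeated whitespace and standardize the case.
    return re.sub(r"\s+", " ", without_accents).strip().upper()


# Hypothetical sample standing in for the official IBGE municipality table.
ibge = pd.DataFrame(
    {
        "name": ["São Paulo", "Santo André", "Mogi das Cruzes"],
        "ibge_code": [3550308, 3547809, 3530607],
    }
)
# Hypothetical sample standing in for a SISMIGRA or CRAI extract.
records = pd.DataFrame(
    {
        "city": ["SAO PAULO", "santo  andre", "Cidade Inexistente"],
        "registrations": [120, 15, 3],
    }
)

ibge["key"] = ibge["name"].map(normalize_name)
records["key"] = records["city"].map(normalize_name)

# Inner join: only rows whose normalized key exists on both sides survive.
joined = ibge.merge(records, on="key", how="inner")
print(joined[["name", "ibge_code", "registrations"]])
```

Because an inner join silently drops unmatched rows, comparing `len(records)` with `len(joined)` is a quick way to measure how much data the name matching loses.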

Prerequisites

To run the scripts, it is recommended to install the following libraries:

pip install geopandas pandas requests networkx shapely

Tip: The scripts can also be run with the Python environment bundled with QGIS.

Script Index

🌍 IBGE Integrations

process_censo.py: Connects to the IBGE API, downloads the territorial limits (mesh) of São Paulo, and groups the total populations from the Demographic Census.
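
As a rough sketch of the mesh download, assuming the IBGE v3 `malhas` endpoint and its `formato`, `qualidade`, and `intrarregiao` query parameters: the helper names `build_mesh_url` and `fetch_mesh` are hypothetical, and the CRS passed to GeoPandas is an assumption that should be checked against the IBGE documentation.

```python
from urllib.parse import urlencode

IBGE_MESH_BASE = "https://servicodados.ibge.gov.br/api/v3/malhas"


def build_mesh_url(uf_code: int, quality: str = "intermediaria") -> str:
    """Build a v3 mesh URL for one state, subdivided into municipalities."""
    params = {
        "formato": "application/vnd.geo+json",  # request GeoJSON output
        "qualidade": quality,                    # level of geometric detail
        "intrarregiao": "municipio",             # split the state by municipality
    }
    return f"{IBGE_MESH_BASE}/estados/{uf_code}?{urlencode(params)}"


def fetch_mesh(uf_code: int):
    """Download the mesh and return it as a GeoDataFrame (needs network access)."""
    # Imported here so the URL helper stays usable without the heavier dependencies.
    import geopandas as gpd
    import requests

    response = requests.get(build_mesh_url(uf_code), timeout=60)
    response.raise_for_status()
    features = response.json()["features"]
    # Assumption: the coordinates are in SIRGAS 2000 (EPSG:4674).
    return gpd.GeoDataFrame.from_features(features, crs="EPSG:4674")


print(build_mesh_url(35))  # 35 is the IBGE code for the state of São Paulo
```

With network access, `fetch_mesh(35).to_file("sp_municipios.gpkg", driver="GPKG")` would save the result as a GeoPackage; the output filename is only an example.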

📊 SISMIGRA Processing

process_sismigra.py: Performs the basic SISMIGRA join, merging municipality data with the official São Paulo mesh.
process_sismigra_historico.py: Reads demographic data, counts records by city (filling zero-count municipalities with -1), and generates a layer of absolute counts.
process_sismigra_predominancia.py: Reads the socio-demographic file and classifies gender proportions (Male Majority, Female Majority, or Balanced).
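
The counting and classification described above might look like the sketch below. The column names, the sample rows, the `margin` threshold, and the "No data" label are assumptions made for illustration; the README does not state the cut-off the scripts actually use.

```python
import pandas as pd

# Hypothetical extract: one row per registered person.
records = pd.DataFrame(
    {
        "city": ["SAO PAULO"] * 5 + ["CAMPINAS"] * 4 + ["SANTOS"] * 2,
        "gender": [
            "M", "M", "M", "M", "F",  # Sao Paulo: 4 M, 1 F
            "F", "F", "F", "M",       # Campinas: 1 M, 3 F
            "M", "F",                 # Santos: 1 M, 1 F
        ],
    }
)
# Every municipality in the mesh, including one with no records at all.
all_cities = ["SAO PAULO", "CAMPINAS", "SANTOS", "BAURU"]

# Rows = cities, columns = genders, values = record counts.
counts = pd.crosstab(records["city"], records["gender"])
counts = counts.reindex(index=all_cities, columns=["M", "F"], fill_value=0)
counts["total"] = counts["M"] + counts["F"]


def classify(row: pd.Series, margin: float = 0.10) -> str:
    """Label a city by its male share; `margin` is an assumed tolerance."""
    if row["total"] == 0:
        return "No data"
    male_share = row["M"] / row["total"]
    if male_share > 0.5 + margin:
        return "Male Majority"
    if male_share < 0.5 - margin:
        return "Female Majority"
    return "Balanced"


counts["predominance"] = counts.apply(classify, axis=1)
# Sentinel for the absolute layer: -1 marks municipalities without records.
counts["total"] = counts["total"].replace(0, -1)
print(counts)
```

Reindexing against the full municipality list before classifying keeps cities with no records in the output, so they can be styled separately on the map instead of disappearing from the layer.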

🏠 CRAI Database

process_crai.py: Reads data from the CRAI Database, uses the Data Dictionary to standardize columns, and cross-references it with the official map of districts in the capital.
filter_bolivia.py: Filters the general CRAI layer, retaining exclusively data corresponding to Bolivian immigrants.
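
A minimal sketch of the column standardization and nationality filter, assuming a hypothetical data dictionary and hypothetical raw headers (the real CRAI column names are not given in this README):

```python
import pandas as pd

# Hypothetical raw CRAI extract with inconsistent values.
raw = pd.DataFrame(
    {
        "País de Nascimento": ["Bolívia", "Haiti", "BOLIVIA ", "Venezuela"],
        "Distrito": ["Bom Retiro", "Sé", "Brás", "República"],
    }
)

# Hypothetical data dictionary: raw header -> standardized column name.
DATA_DICTIONARY = {"País de Nascimento": "country", "Distrito": "district"}

crai = raw.rename(columns=DATA_DICTIONARY)

# Compare on a cleaned key so that case, accents, and stray spaces do not matter.
country_key = (
    crai["country"]
    .str.normalize("NFD")
    .str.replace(r"[\u0300-\u036f]", "", regex=True)
    .str.strip()
    .str.upper()
)
bolivians = crai[country_key == "BOLIVIA"].reset_index(drop=True)
print(bolivians)
```

Filtering on a cleaned key rather than the raw text avoids losing rows that differ only in spelling variants such as "Bolívia" and "BOLIVIA".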

💼 National Entrepreneurs Processing (MEIs)

process_meis_nacional.py: Queries the IBGE National API, processes the extraction of state acronyms (UFs), and maps entrepreneurs across the entire Brazilian territory.
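
The README does not say how the state acronyms are extracted. One plausible form, assuming location labels that end in a UF such as "Campinas - SP" or "Manaus (AM)", is sketched below; the pattern, the `extract_uf` helper, and the sample labels are all hypothetical.

```python
import re
from collections import Counter
from typing import Optional

# Matches a trailing two-letter acronym written as "Name - SP" or "Name (SP)".
UF_PATTERN = re.compile(r"(?:\s-\s|\s*\()([A-Z]{2})\)?\s*$")


def extract_uf(label: str) -> Optional[str]:
    """Return the state acronym at the end of a location label, or None."""
    match = UF_PATTERN.search(label)
    return match.group(1) if match else None


# Hypothetical labels in the two assumed formats, plus one without a UF.
labels = ["Campinas - SP", "Embu-Guaçu - SP", "Manaus (AM)", "Brasil"]

# Count how many labels fall in each state, skipping labels without a UF.
per_state = Counter(uf for uf in map(extract_uf, labels) if uf is not None)
print(per_state)
```

Requiring spaces around the hyphen keeps hyphenated municipality names such as "Embu-Guaçu" from being split in the wrong place.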

🚆 Urban Mobility (Graphs)

draw_rail_lines.py: Loads Shapefiles of isolated stations and draws connecting lines for trains and subways using the Minimum Spanning Tree algorithm.
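
The Minimum Spanning Tree step can be illustrated with NetworkX as follows. This is a sketch under stated assumptions: the station names and coordinates are made up, distances are plain Euclidean distances in a projected CRS, and every pair of stations is treated as a candidate segment.

```python
import itertools
import math

import networkx as nx

# Hypothetical station coordinates in a projected CRS (meters).
stations = {
    "A": (0.0, 0.0),
    "B": (1000.0, 0.0),
    "C": (2000.0, 0.0),
    "D": (1000.0, 1500.0),
}

# Complete graph: every pair of stations is a candidate segment, weighted by distance.
graph = nx.Graph()
for (name_a, xy_a), (name_b, xy_b) in itertools.combinations(stations.items(), 2):
    graph.add_edge(name_a, name_b, weight=math.dist(xy_a, xy_b))

# The MST keeps n - 1 segments that connect all stations with minimum total length.
tree = nx.minimum_spanning_tree(graph, weight="weight")

segments = sorted(tuple(sorted(edge)) for edge in tree.edges())
total_length = tree.size(weight="weight")
print(segments, total_length)
```

Each segment can then be turned into a `shapely.geometry.LineString` from its two endpoint coordinates and written to a layer. Note that an MST minimizes total length and contains no cycles, so the result approximates the network's shape and may differ from the real track layout; building one tree per line is likely to give cleaner results than one tree over all stations.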

🧹 Cleaning Utilities

extract_csv.py: Trims the large original CSV tables, keeping only the columns used in processing and adding blank checking columns.
test_agg.py: Support script used to test Pandas aggregation functions before inserting them into production pipelines.
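
A minimal sketch of the column extraction, assuming a semicolon-separated source file; the column names, the checking columns, and the in-memory sample are hypothetical stand-ins for the real tables.

```python
import io

import pandas as pd

# Stand-in for a heavy source file; in practice this would be a path on disk.
raw_csv = io.StringIO(
    "id;municipio;pais;sexo;obs\n"
    "1;SAO PAULO;BOLIVIA;F;x\n"
    "2;CAMPINAS;HAITI;M;y\n"
)

KEEP_COLUMNS = ["municipio", "pais", "sexo"]    # hypothetical column names
CHECK_COLUMNS = ["checked_by", "check_status"]  # hypothetical checking columns

# usecols makes pandas parse only the columns that later steps need.
slim = pd.read_csv(raw_csv, sep=";", usecols=KEEP_COLUMNS, dtype=str)

# Blank columns to be filled in during verification.
for column in CHECK_COLUMNS:
    slim[column] = ""

# Serialize to a string here; pass a file path to to_csv to write to disk instead.
output = slim.to_csv(index=False)
print(output)
```

Reading with `dtype=str` keeps codes and identifiers exactly as written in the source, which avoids losing leading zeros before the join steps.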
