Detecting Corruption in Procurement Data with Interactive Reporting Tools and GraphDBs
💻Live App 👈
As a Data Scientist for the Secretaría de Transparencia of the Presidency of Colombia, I collaborated on the creation of the transparency web portal Paco. The goal was to facilitate the understanding and exploration of government procurement and the early detection of corruption risk through an easy-to-use application.
Among my contributions were the setup of ETL pipelines from multiple data sources that feed the website and the creation of on-demand, automated PDF reports that users can generate to summarize up-to-date contracting activity for contractors, government entities, cities, and departments. I built the reports with SQL queries and the FPDF Python library. The city and department reports included a network of shared contractors, visualized with NetworkX, to assess the extent to which entities in the same geographic area hired the same contractors. I also implemented a zooming feature to help inspect densely connected nodes.
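The network described above can be thought of as a bipartite projection: government entities and the parties they contracted form a two-mode graph, and two entities are linked when they contracted the same party. The following is a minimal sketch of that idea with NetworkX, not the portal's actual code. The sample `awards` data, the function names `shared_contractor_network` and `zoom_on`, and the use of an ego graph for zooming are all assumptions made for illustration.

```python
import networkx as nx
from networkx.algorithms import bipartite

# Hypothetical (entity, contractor) award pairs. In the real reports these
# would come from SQL queries over the procurement data.
awards = [
    ("Alcaldia A", "Contractor 1"),
    ("Alcaldia A", "Contractor 2"),
    ("Alcaldia B", "Contractor 1"),
    ("Alcaldia B", "Contractor 2"),
    ("Alcaldia C", "Contractor 2"),
    ("Alcaldia D", "Contractor 3"),
]


def shared_contractor_network(pairs):
    """Link entities that hired the same contractor.

    Edge weights count how many contractors two entities share.
    Assumes entity names and contractor names never collide.
    """
    entities = {entity for entity, _ in pairs}
    contractors = {contractor for _, contractor in pairs}

    two_mode = nx.Graph()
    two_mode.add_nodes_from(entities, bipartite=0)
    two_mode.add_nodes_from(contractors, bipartite=1)
    two_mode.add_edges_from(pairs)

    # Project onto the entity side; weight = number of shared neighbors.
    return bipartite.weighted_projected_graph(two_mode, entities)


def zoom_on(graph, entity, radius=1):
    """Return the neighborhood of one entity (an assumed stand-in for zooming)."""
    return nx.ego_graph(graph, entity, radius=radius)


network = shared_contractor_network(awards)
for a, b, data in sorted(network.edges(data=True), key=lambda e: -e[2]["weight"]):
    print(f"{a} <-> {b}: {data['weight']} shared contractor(s)")
```

Restricting the drawing to the subgraph returned by `zoom_on` is one simple way to make a densely connected region readable; an interactive viewer could achieve the same effect differently.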
I also created a proposal for the early detection of corruption with graph data science on a Neo4j graph database of public procurement.