AMS-Data-Mine
A migration and ETL pipeline to move legacy FileMaker Pro data into PostgreSQL, ingest and consolidate DGdata and EENDR survey CSVs (2010–2015), and maintain automated backups.
Repository Structure
.
├── docker-compose.yml # Defines Postgres (and optional tunnel) services
├── .env # Environment variables (create locally)
├── csv_postgres.py # Python script to import CSV files into Postgres
├── clean.py # Utility to sanitize CSV column names
├── backup_postgres_single_db.sh # Backup script for Postgres (docker exec + pg_dump + gzip)
├── sql/ # SQL scripts for merging survey data
│ ├── merge_dgdata.sql
│ ├── merge_eendr.sql
│ └── merge_surveys.sql
└── README.md # This file
Prerequisites
- Docker & Docker Compose
- Python 3.7+
- Python packages:
pip install pandas psycopg2-binary python-dotenv
- (Optional) Cloudflare Tunnel token for securely exposing Postgres
1. Environment Setup
- Create a file named .env at the project root with the following:
POSTGRES_USER=your_pg_username
POSTGRES_PASSWORD=your_pg_password
POSTGRES_DB=your_db_name
# If using cloudflared tunnel:
TUNNEL_TOKEN=your_cloudflare_tunnel_token
- Start the services:
docker-compose up -d
This brings up:
- postgres: PostgreSQL 15, listening on port 5432
- cloudflared (if configured): runs tunnel run to expose Postgres through a Cloudflare Tunnel
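Before importing anything, it can help to confirm that the values in .env actually reach the database. A minimal connectivity-check sketch, assuming Postgres is reachable on localhost:5432 and the pip packages from the prerequisites are installed:
import os
import psycopg2
from dotenv import load_dotenv

# Pull the POSTGRES_* variables from the project's .env file
load_dotenv()

conn = psycopg2.connect(
    host="localhost",   # assumption: connecting from the Docker host; adjust if tunneled
    port=5432,
    user=os.environ["POSTGRES_USER"],
    password=os.environ["POSTGRES_PASSWORD"],
    dbname=os.environ["POSTGRES_DB"],
)
with conn, conn.cursor() as cur:
    cur.execute("SELECT version();")
    print(cur.fetchone()[0])
conn.close()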
2. Migrating FileMaker Pro Data
- Export each FileMaker Pro table as a CSV file.
- (Optional) Clean column names into valid SQL identifiers (a sketch of this step follows the list):
python3 clean.py path/to/input.csv path/to/output.csv
- Place your CSV files into the host directory mounted by Docker (default /home/ams/postgres/csv_files/).
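The exact rules clean.py applies are not reproduced here; a minimal sketch of this kind of sanitization (lowercase names, replace anything that is not a letter, digit, or underscore) might look like:
import re
import sys
import pandas as pd

def sanitize(name: str) -> str:
    # Lowercase, collapse invalid characters to underscores,
    # and avoid identifiers that start with a digit.
    clean = re.sub(r"[^0-9a-zA-Z_]+", "_", name.strip()).strip("_").lower()
    return clean if not clean[:1].isdigit() else f"col_{clean}"

if __name__ == "__main__":
    src, dst = sys.argv[1], sys.argv[2]   # input.csv output.csv
    df = pd.read_csv(src)
    df.columns = [sanitize(c) for c in df.columns]
    df.to_csv(dst, index=False)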
3. Ingesting CSV Data
Run the import script:
python3 csv_postgres.py
What it does:
- Reads all .csv files from /home/ams/postgres/csv_files/.
- Drops entirely empty columns and maps DataFrame dtypes to INTEGER, FLOAT, or TEXT.
- Creates a table named survey_data_<filename> for each file and inserts all rows.
- Moves processed CSVs to /home/ams/postgres/csv_files_old/.
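A simplified sketch of that flow is below. It is not the actual csv_postgres.py (the real script's type mapping, table-replacement behavior, and connection settings may differ), but it shows the shape of the pipeline:
import os
import shutil
import pandas as pd
import psycopg2
from dotenv import load_dotenv

load_dotenv()
CSV_DIR = "/home/ams/postgres/csv_files"
DONE_DIR = "/home/ams/postgres/csv_files_old"

def pg_type(dtype):
    # Map pandas dtypes onto the three SQL types used by the pipeline
    if pd.api.types.is_integer_dtype(dtype):
        return "INTEGER"
    if pd.api.types.is_float_dtype(dtype):
        return "FLOAT"
    return "TEXT"

conn = psycopg2.connect(
    host="localhost",   # assumption: run on the Docker host
    user=os.environ["POSTGRES_USER"],
    password=os.environ["POSTGRES_PASSWORD"],
    dbname=os.environ["POSTGRES_DB"],
)

for name in sorted(os.listdir(CSV_DIR)):
    if not name.endswith(".csv"):
        continue
    path = os.path.join(CSV_DIR, name)
    df = pd.read_csv(path).dropna(axis=1, how="all")      # drop entirely empty columns
    table = "survey_data_" + os.path.splitext(name)[0].lower()
    columns = ", ".join(f'"{c}" {pg_type(t)}' for c, t in df.dtypes.items())
    placeholders = ", ".join(["%s"] * len(df.columns))
    rows = [
        tuple(None if pd.isna(v) else (v.item() if hasattr(v, "item") else v) for v in rec)
        for rec in df.itertuples(index=False, name=None)
    ]
    with conn, conn.cursor() as cur:                       # one transaction per file
        cur.execute(f'DROP TABLE IF EXISTS "{table}"')     # sketch choice: replace any existing table
        cur.execute(f'CREATE TABLE "{table}" ({columns})')
        cur.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    shutil.move(path, os.path.join(DONE_DIR, name))        # archive the processed CSV

conn.close()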
4. Merging Survey Data with SQL
Example queries for merging tables:
- sql/merge_dgdata.sql
DROP TABLE IF EXISTS dgdata_merged;
CREATE TABLE dgdata_merged AS
SELECT * FROM survey_data_dgdata_2010
UNION ALL
SELECT * FROM survey_data_dgdata_2011
-- ...repeat through 2015...
;
- sql/merge_eendr.sql
DROP TABLE IF EXISTS eendr_merged;
CREATE TABLE eendr_merged AS
SELECT * FROM survey_data_eendr_2010
UNION ALL
-- ...through 2015...
;
- sql/merge_surveys.sql
DROP TABLE IF EXISTS surveys_final;
CREATE TABLE surveys_final AS
SELECT
  survey_id,        -- columns listed in USING are merged by the join, so the non-NULL side wins
  common_field1,
  common_field2,
  d.unique_dg_field,
  e.unique_eendr_field
FROM dgdata_merged d
FULL OUTER JOIN eendr_merged e
  USING (survey_id, common_field1, common_field2);
Columns not present in one survey will appear as NULL.
Run any merge script with:
psql -h localhost -U $POSTGRES_USER -d $POSTGRES_DB -f sql/merge_surveys.sql
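If psql is not available on the host, the same scripts can be executed from Python with psycopg2; a small sketch, reusing the .env values from step 1:
import os
import psycopg2
from dotenv import load_dotenv

load_dotenv()
conn = psycopg2.connect(
    host="localhost",   # assumption: run on the Docker host
    user=os.environ["POSTGRES_USER"],
    password=os.environ["POSTGRES_PASSWORD"],
    dbname=os.environ["POSTGRES_DB"],
)
for script in ("sql/merge_dgdata.sql", "sql/merge_eendr.sql", "sql/merge_surveys.sql"):
    with conn, conn.cursor() as cur, open(script) as f:
        cur.execute(f.read())   # each file's statements run in a single transaction
conn.close()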
5. Automated Backups
Backups are handled by backup_postgres_single_db.sh:
#!/bin/bash
# Load variables (resolve .env next to the script so cron can run it from any directory)
source "$(dirname "$0")/.env"
# Settings
CONTAINER_NAME=postgres
POSTGRES_USER=$POSTGRES_USER
POSTGRES_PASSWORD=$POSTGRES_PASSWORD
POSTGRES_DB=$POSTGRES_DB
BACKUP_DIR=/home/ams/postgres/backups
TIMESTAMP=$(date +"%Y%m%d%H%M%S")
BACKUP_FILE=$BACKUP_DIR/${POSTGRES_DB}_backup_$TIMESTAMP.sql
mkdir -p $BACKUP_DIR
# No -t (TTY) here, so the redirected dump is not corrupted by carriage returns
docker exec -e PGPASSWORD="$POSTGRES_PASSWORD" "$CONTAINER_NAME" \
  pg_dump -U "$POSTGRES_USER" "$POSTGRES_DB" > "$BACKUP_FILE"
gzip "$BACKUP_FILE"
# Optional retention:
# find $BACKUP_DIR -type f -name "${POSTGRES_DB}_backup_*.sql.gz" -mtime +7 -delete
Schedule daily backups (e.g., at 3 AM) via cron:
0 3 * * * /path/to/backup_postgres_single_db.sh
