AMS_DATA_MINE

This repository contains all the code and configuration needed to migrate a legacy FileMaker Pro database to PostgreSQL, import and consolidate CSV survey data, and manage automated backups.

Project Structure

├── postgres/               # PostgreSQL server setup and configuration (e.g., Docker Compose, init scripts)
│   ├── docker-compose.yml
│   └── initdb/             # SQL files to initialize the database schema
├── csv_import/             # Scripts for importing CSV files into staging tables
│   └── import_csvs.sh      # Bulk imports all survey CSVs for DGdata and EENDR
├── sql/                    # SQL migration and data-merging scripts
│   ├── migrate_fmp.sql     # Translates FileMaker Pro exports into PostgreSQL-ready tables
│   ├── merge_dgdata.sql    # Unifies DGdata surveys 2010–2015 into one table
│   ├── merge_eendr.sql     # Unifies EENDR surveys 2010–2015 into one table
│   └── merge_surveys.sql   # Combines DGdata and EENDR merged tables into a final survey table
├── backups/                # Backup scripts and cron job configuration
│   ├── backup.sh           # Runs pg_dump and prunes old backups
│   └── crontab.entry       # Cron definition for daily backups at 2AM
└── README.md               # This file

Prerequisites

  • PostgreSQL 12 or newer (local or Docker)
  • Docker & Docker Compose (if using containerized setup)
  • Bash shell
  • Cron daemon for scheduled backups

1. Set Up the PostgreSQL Server

  1. Clone this repository locally.
  2. Navigate to the postgres/ directory:
    cd postgres/
    
  3. Launch PostgreSQL (Docker Compose):
    docker-compose up -d
    
  4. Verify connectivity:
    psql -h localhost -U <your_user> -d <your_db>
    
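If you are using the containerized setup, the SQL files in postgres/initdb/ define the initial schema (with the official postgres image, scripts in that directory typically run automatically on first startup). The real DDL lives in that directory; purely as an illustration, a staging table for one survey year might look like the following, using the placeholder column names reused later in this README rather than the actual schema:

-- Illustrative sketch only; the actual schema is defined in postgres/initdb/.
CREATE TABLE IF NOT EXISTS dgdata_2010 (
    survey_id       integer,
    common_field1   text,
    common_field2   text,
    unique_dg_field text
);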

2. Data Migration from FileMaker Pro

  1. Export FileMaker tables as CSV files.
  2. Run the SQL in sql/migrate_fmp.sql to create the corresponding tables and load the data (a sketch of the general pattern follows below):
    psql -h localhost -U <user> -d <db> -f sql/migrate_fmp.sql
    
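The exact statements depend on your FileMaker schema, but a migration script of this kind generally creates one PostgreSQL table per exported FileMaker table and loads the matching CSV export. A minimal, hypothetical sketch of that pattern (the table, columns, and file path below are placeholders, not the contents of sql/migrate_fmp.sql):

-- Hypothetical example of the pattern; the real statements are in sql/migrate_fmp.sql.
CREATE TABLE fmp_contacts (
    contact_id integer,
    full_name  text,
    created_at date
);

-- \copy runs client-side in psql, so the exported CSV only needs to be
-- readable from the machine where psql is invoked.
\copy fmp_contacts FROM 'exports/contacts.csv' WITH (FORMAT csv, HEADER true)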

3. Importing Survey CSV Data

The csv_import/import_csvs.sh script bulk-loads all DGdata and EENDR CSVs (2010–2015) into staging tables.

cd csv_import/
./import_csvs.sh

This script assumes the CSV files follow the naming convention dgdata_<year>.csv and eendr_<year>.csv.
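Per file, the load presumably boils down to a client-side copy into a staging table of the same name, along the lines of the sketch below (the actual loop lives in the script itself):

-- Hypothetical per-file loads; see csv_import/import_csvs.sh for the real logic.
\copy dgdata_2010 FROM 'dgdata_2010.csv' WITH (FORMAT csv, HEADER true)
\copy eendr_2010  FROM 'eendr_2010.csv'  WITH (FORMAT csv, HEADER true)
-- ...and so on for each year through 2015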

4. Merging Survey Data

4.1 Merge DGdata (2010–2015)

File: sql/merge_dgdata.sql

-- Combine individual DGdata tables into one
DROP TABLE IF EXISTS dgdata_merged;
CREATE TABLE dgdata_merged AS
  SELECT * FROM dgdata_2010
  UNION ALL SELECT * FROM dgdata_2011
  UNION ALL SELECT * FROM dgdata_2012
  UNION ALL SELECT * FROM dgdata_2013
  UNION ALL SELECT * FROM dgdata_2014
  UNION ALL SELECT * FROM dgdata_2015;

4.2 Merge EENDR (2010–2015)

File: sql/merge_eendr.sql

DROP TABLE IF EXISTS eendr_merged;
CREATE TABLE eendr_merged AS
  SELECT * FROM eendr_2010
  UNION ALL SELECT * FROM eendr_2011
  UNION ALL SELECT * FROM eendr_2012
  UNION ALL SELECT * FROM eendr_2013
  UNION ALL SELECT * FROM eendr_2014
  UNION ALL SELECT * FROM eendr_2015;

4.3 Combine into Final Survey Table

File: sql/merge_surveys.sql

-- Create a unified survey table with NULLs for missing columns
DROP TABLE IF EXISTS surveys_final;
CREATE TABLE surveys_final AS
  SELECT
    d.survey_id,
    d.common_field1,
    d.common_field2,
    d.unique_dg_field,
    NULL          AS unique_eendr_field
  FROM dgdata_merged d
  UNION ALL
  SELECT
    e.survey_id,
    e.common_field1,
    e.common_field2,
    NULL         AS unique_dg_field,
    e.unique_eendr_field
  FROM eendr_merged e;

-- Alternatively, use a full outer join to align DGdata and EENDR rows
-- on survey_id and preserve all columns; the common fields are coalesced
-- so that rows present in only one source still populate them.
-- DROP TABLE IF EXISTS surveys_final;
-- CREATE TABLE surveys_final AS
-- SELECT COALESCE(d.survey_id, e.survey_id) AS survey_id,
--        COALESCE(d.common_field1, e.common_field1) AS common_field1,
--        COALESCE(d.common_field2, e.common_field2) AS common_field2,
--        d.unique_dg_field,
--        e.unique_eendr_field
-- FROM dgdata_merged d
-- FULL OUTER JOIN eendr_merged e USING (survey_id);
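A quick sanity check after running the UNION ALL merge (not part of the repository's scripts) is to confirm that every source row was carried over:

-- Optional verification: final_rows should equal dgdata_rows + eendr_rows.
SELECT
  (SELECT count(*) FROM dgdata_merged) AS dgdata_rows,
  (SELECT count(*) FROM eendr_merged)  AS eendr_rows,
  (SELECT count(*) FROM surveys_final) AS final_rows;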

5. Automated Backups

A daily backup is scheduled via cron to run at 2 AM. The crontab entry below assumes backup.sh has been installed at /usr/local/bin/backup.sh.

File: backups/crontab.entry

# m h dom mon dow command
0 2 * * * /usr/local/bin/backup.sh

File: backups/backup.sh

#!/usr/bin/env bash
set -euo pipefail

# PostgreSQL connection details
PGUSER="your_pg_user"
PGHOST="localhost"
PGPORT="5432"
DBNAME="ams_data_mine"
# pg_dump reads the password from the environment, so it must be exported
# (or use a ~/.pgpass file instead of hard-coding it here).
export PGPASSWORD="your_pg_password"

# Backup directory and rotation
BACKUP_DIR="/var/backups/ams_data_mine"
RETENTION_DAYS=30

# Create the backup directory if it doesn't exist
mkdir -p "$BACKUP_DIR"

# Generate a filename with today's date (YYYY-MM-DD)
DATE_STR=$(date +"%F")
BACKUP_FILE="$BACKUP_DIR/${DBNAME}_$DATE_STR.sql"

# Perform the dump
pg_dump -U "$PGUSER" -h "$PGHOST" -p "$PGPORT" "$DBNAME" > "$BACKUP_FILE"

# Remove this database's dumps older than the retention period
find "$BACKUP_DIR" -type f -name "${DBNAME}_*.sql" -mtime +"$RETENTION_DAYS" -delete
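Because pg_dump is invoked without a custom-format flag, the dumps are plain SQL files and can be restored with psql, for example: psql -h localhost -U <user> -d <db> -f /var/backups/ams_data_mine/ams_data_mine_<date>.sql.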

Contributing

  1. Fork the repository.
  2. Create a feature branch (git checkout -b feature/YourFeature).
  3. Commit your changes (git commit -m 'Add new feature').
  4. Push to the branch (git push origin feature/YourFeature).
  5. Open a pull request.

License

This project is released under the MIT License. See LICENSE for details.