# AMS_DATA_MINE

This repository contains all the code and configuration needed to migrate a legacy FileMaker Pro database to PostgreSQL, import and consolidate CSV survey data, and manage automated backups.

## Project Structure

```bash
├── postgres/               # PostgreSQL server setup and configuration (e.g., Docker Compose, init scripts)
│   ├── docker-compose.yml
│   └── initdb/             # SQL files to initialize the database schema
├── csv_import/             # Scripts for importing CSV files into staging tables
│   └── import_csvs.sh      # Bulk imports all survey CSVs for DGdata and EENDR
├── sql/                    # SQL migration and data-merging scripts
│   ├── migrate_fmp.sql     # Translates FileMaker Pro exports into PostgreSQL-ready tables
│   ├── merge_dgdata.sql    # Unifies DGdata surveys 2010–2015 into one table
│   ├── merge_eendr.sql     # Unifies EENDR surveys 2010–2015 into one table
│   └── merge_surveys.sql   # Combines DGdata and EENDR merged tables into a final survey table
├── backups/                # Backup scripts and cron job configuration
│   ├── backup.sh           # Runs pg_dump and prunes old backups
│   └── crontab.entry       # Cron definition for daily backups at 2 AM
└── README.md               # This file
```

## Prerequisites

- PostgreSQL 12 or newer (local or Docker)
- Docker & Docker Compose (if using the containerized setup)
- Bash shell
- Cron daemon for scheduled backups

## 1. Set Up the PostgreSQL Server

1. Clone this repository locally.
2. Navigate to the `postgres/` directory:

   ```bash
   cd postgres/
   ```

3. Launch PostgreSQL (Docker Compose):

   ```bash
   docker-compose up -d
   ```

4. Verify connectivity:

   ```bash
   psql -h localhost -U <your_user> -d <your_db>
   ```
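
For reference, a minimal `postgres/docker-compose.yml` along these lines would satisfy the steps above. The service name, credentials, and volume layout here are assumptions for illustration; adjust them to match the actual file in this repository:

```yaml
version: "3.8"
services:
  db:
    image: postgres:12
    environment:
      POSTGRES_USER: ams_user          # placeholder credentials
      POSTGRES_PASSWORD: change_me
      POSTGRES_DB: ams_data_mine
    ports:
      - "5432:5432"
    volumes:
      - ./initdb:/docker-entrypoint-initdb.d   # schema SQL runs on first startup
      - pgdata:/var/lib/postgresql/data
volumes:
  pgdata:
```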

## 2. Data Migration from FileMaker Pro

1. Export FileMaker tables as CSV files.
2. Use the SQL in `sql/migrate_fmp.sql` to create corresponding tables and load data:

   ```bash
   psql -h localhost -U <user> -d <db> -f sql/migrate_fmp.sql
   ```
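
The exact DDL depends on your FileMaker schema, but each export typically gets a `CREATE TABLE` plus a `\copy`. The table and column names below are placeholders, not the actual contents of `migrate_fmp.sql`:

```sql
-- Placeholder shape: real names and types come from your FileMaker export
CREATE TABLE IF NOT EXISTS fmp_contacts (
    contact_id  integer PRIMARY KEY,
    full_name   text,
    created_at  date
);

-- \copy runs client-side in psql, so the CSV path is local to your machine
\copy fmp_contacts FROM 'exports/contacts.csv' WITH (FORMAT csv, HEADER true)
```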

## 3. Importing Survey CSV Data

The `csv_import/import_csvs.sh` script bulk-loads all DGdata and EENDR CSVs (2010–2015) into staging tables.

```bash
cd csv_import/
./import_csvs.sh
```

This script assumes the CSV files follow the naming convention `dgdata_<year>.csv` and `eendr_<year>.csv`.
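
The script's internals are not reproduced here, but a minimal sketch consistent with that naming convention would loop over both prefixes and all six years. The staging-table names and connection details are assumptions:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Assumption: one staging table per CSV, named after the file
for prefix in dgdata eendr; do
  for year in 2010 2011 2012 2013 2014 2015; do
    csv="${prefix}_${year}.csv"
    echo "Loading $csv into ${prefix}_${year}..."
    psql -h localhost -U your_pg_user -d ams_data_mine \
      -c "\copy ${prefix}_${year} FROM '$csv' WITH (FORMAT csv, HEADER true)"
  done
done
```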

## 4. Merging Survey Data

### 4.1 Merge DGdata (2010–2015)

**File: `sql/merge_dgdata.sql`**

```sql
-- Combine individual DGdata tables into one
DROP TABLE IF EXISTS dgdata_merged;
CREATE TABLE dgdata_merged AS
SELECT * FROM dgdata_2010
UNION ALL SELECT * FROM dgdata_2011
UNION ALL SELECT * FROM dgdata_2012
UNION ALL SELECT * FROM dgdata_2013
UNION ALL SELECT * FROM dgdata_2014
UNION ALL SELECT * FROM dgdata_2015;
```
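
Note that `UNION ALL` requires every yearly table to have the same column order and types. If you also want to record which year each row came from, a tagged variant works; the `source_year` column below is an addition for illustration, not part of `merge_dgdata.sql`:

```sql
DROP TABLE IF EXISTS dgdata_merged;
CREATE TABLE dgdata_merged AS
SELECT 2010 AS source_year, * FROM dgdata_2010
UNION ALL SELECT 2011, * FROM dgdata_2011
UNION ALL SELECT 2012, * FROM dgdata_2012
UNION ALL SELECT 2013, * FROM dgdata_2013
UNION ALL SELECT 2014, * FROM dgdata_2014
UNION ALL SELECT 2015, * FROM dgdata_2015;
```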

### 4.2 Merge EENDR (2010–2015)

**File: `sql/merge_eendr.sql`**

```sql
-- Combine individual EENDR tables into one
DROP TABLE IF EXISTS eendr_merged;
CREATE TABLE eendr_merged AS
SELECT * FROM eendr_2010
UNION ALL SELECT * FROM eendr_2011
UNION ALL SELECT * FROM eendr_2012
UNION ALL SELECT * FROM eendr_2013
UNION ALL SELECT * FROM eendr_2014
UNION ALL SELECT * FROM eendr_2015;
```

### 4.3 Combine into Final Survey Table

**File: `sql/merge_surveys.sql`**

```sql
-- Create a unified survey table, with NULLs for the columns
-- that exist in only one source
DROP TABLE IF EXISTS surveys_final;
CREATE TABLE surveys_final AS
SELECT
    d.survey_id,
    d.common_field1,
    d.common_field2,
    d.unique_dg_field,
    NULL AS unique_eendr_field
FROM dgdata_merged d
UNION ALL
SELECT
    e.survey_id,
    e.common_field1,
    e.common_field2,
    NULL AS unique_dg_field,
    e.unique_eendr_field
FROM eendr_merged e;

-- Alternatively, use a FULL OUTER JOIN to align both sources on
-- survey_id and keep one row per survey; COALESCE on the common
-- fields keeps their values even when one side has no match
-- DROP TABLE IF EXISTS surveys_final;
-- CREATE TABLE surveys_final AS
-- SELECT COALESCE(d.survey_id, e.survey_id) AS survey_id,
--        COALESCE(d.common_field1, e.common_field1) AS common_field1,
--        COALESCE(d.common_field2, e.common_field2) AS common_field2,
--        d.unique_dg_field,
--        e.unique_eendr_field
-- FROM dgdata_merged d
-- FULL OUTER JOIN eendr_merged e USING (survey_id);
```
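
As a quick sanity check, the `UNION ALL` version should produce exactly the sum of the two merged tables' row counts:

```sql
SELECT
    (SELECT count(*) FROM dgdata_merged) AS dgdata_rows,
    (SELECT count(*) FROM eendr_merged)  AS eendr_rows,
    (SELECT count(*) FROM surveys_final) AS final_rows;
```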

## 5. Automated Backups

A daily backup is scheduled via cron to run at 2 AM. The entry assumes `backups/backup.sh` has been installed as `/usr/local/bin/backup.sh` and made executable.

**File: `backups/crontab.entry`**

```cron
# m h dom mon dow command
0 2 * * * /usr/local/bin/backup.sh
```
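
One way to install both pieces, assuming you want the script at the path the crontab entry expects:

```bash
# Copy the script to the location referenced by crontab.entry
sudo install -m 755 backups/backup.sh /usr/local/bin/backup.sh

# Append the entry to the current user's crontab
(crontab -l 2>/dev/null; cat backups/crontab.entry) | crontab -
```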

**File: `backups/backup.sh`**

```bash
#!/usr/bin/env bash
set -euo pipefail

# PostgreSQL connection details
PGUSER="your_pg_user"
export PGPASSWORD="your_pg_password"   # pg_dump reads this from the environment
PGHOST="localhost"
PGPORT="5432"
DBNAME="ams_data_mine"

# Backup directory and rotation
BACKUP_DIR="/var/backups/ams_data_mine"
RETENTION_DAYS=30

# Create backup directory if it doesn't exist
mkdir -p "$BACKUP_DIR"

# Generate filename with date (YYYY-MM-DD)
DATE_STR=$(date +"%F")
BACKUP_FILE="$BACKUP_DIR/${DBNAME}_$DATE_STR.sql"

# Perform the dump
pg_dump -U "$PGUSER" -h "$PGHOST" -p "$PGPORT" "$DBNAME" > "$BACKUP_FILE"

# Remove this database's dumps older than the retention period
find "$BACKUP_DIR" -type f -name "${DBNAME}_*.sql" -mtime +"$RETENTION_DAYS" -delete
```
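
To restore a dump into a fresh database, feed it back through `psql`; the database name and dump date below are placeholders:

```bash
createdb -h localhost -U your_pg_user ams_data_mine_restore
psql -h localhost -U your_pg_user -d ams_data_mine_restore \
  -f /var/backups/ams_data_mine/ams_data_mine_<date>.sql
```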

## Contributing

1. Fork the repository.
2. Create a feature branch (`git checkout -b feature/YourFeature`).
3. Commit your changes (`git commit -m 'Add new feature'`).
4. Push to the branch (`git push origin feature/YourFeature`).
5. Open a pull request.

## License

This project is released under the MIT License. See [LICENSE](LICENSE) for details.