# AMS_DATA_MINE
This repository contains all the code and configuration needed to migrate a legacy FileMaker Pro database to PostgreSQL, import and consolidate CSV survey data, and manage automated backups.
## Project Structure
```bash
├── postgres/             # PostgreSQL server setup and configuration (e.g., Docker Compose, init scripts)
│   ├── docker-compose.yml
│   └── initdb/           # SQL files to initialize the database schema
├── csv_import/           # Scripts for importing CSV files into staging tables
│   └── import_csvs.sh    # Bulk imports all survey CSVs for DGdata and EENDR
├── sql/                  # SQL migration and data-merging scripts
│   ├── migrate_fmp.sql   # Translates FileMaker Pro exports into PostgreSQL-ready tables
│   ├── merge_dgdata.sql  # Unifies DGdata surveys 2010–2015 into one table
│   ├── merge_eendr.sql   # Unifies EENDR surveys 2010–2015 into one table
│   └── merge_surveys.sql # Combines the DGdata and EENDR merged tables into a final survey table
├── backups/              # Backup scripts and cron job configuration
│   ├── backup.sh         # Runs pg_dump and prunes old backups
│   └── crontab.entry     # Cron definition for daily backups at 2 AM
└── README.md             # This file
```
## Prerequisites
- PostgreSQL 12 or newer (local or Docker)
- Docker & Docker Compose (if using containerized setup)
- Bash shell
- Cron daemon for scheduled backups
## 1. Set Up the PostgreSQL Server
1. Clone this repository locally.
2. Navigate to the `postgres/` directory:
```bash
cd postgres/
```
3. Launch PostgreSQL (Docker Compose):
```bash
docker-compose up -d
```
4. Verify connectivity:
```bash
psql -h localhost -U <your_user> -d <your_db>
```
## 2. Data Migration from FileMaker Pro
1. Export FileMaker tables as CSV files.
2. Run the SQL in `sql/migrate_fmp.sql` to create the corresponding tables and load the data:
```bash
psql -h localhost -U <user> -d <db> -f sql/migrate_fmp.sql
```
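The contents of `migrate_fmp.sql` are specific to the legacy schema, but the general pattern is a `CREATE TABLE` per FileMaker export followed by a `\copy` of its CSV. A minimal sketch, where the `contacts` table, its columns, and the export path are purely illustrative:
```bash
# Hypothetical example of the migration pattern; adjust the table,
# columns, and CSV path to match your actual FileMaker exports.
psql -h localhost -U <user> -d <db> <<'SQL'
CREATE TABLE IF NOT EXISTS contacts (
    contact_id integer PRIMARY KEY,
    full_name  text,
    email      text
);
\copy contacts FROM 'fmp_exports/contacts.csv' WITH (FORMAT csv, HEADER true)
SQL
```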
## 3. Importing Survey CSV Data
The `csv_import/import_csvs.sh` script bulk-loads all DGdata and EENDR CSVs (2010–2015) into staging tables.
```bash
cd csv_import/
./import_csvs.sh
```
This script assumes the CSV files follow the naming convention `dgdata_<year>.csv` and `eendr_<year>.csv`.
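The core of such a script is a loop over years and survey prefixes. A sketch of the idea (not the script verbatim; it assumes staging tables named after the files and the connection settings from step 1):
```bash
# Sketch of the bulk-load loop; connection flags and paths are illustrative.
for year in 2010 2011 2012 2013 2014 2015; do
  for prefix in dgdata eendr; do
    psql -h localhost -U <user> -d <db> \
      -c "\copy ${prefix}_${year} FROM '${prefix}_${year}.csv' WITH (FORMAT csv, HEADER true)"
  done
done
```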
## 4. Merging Survey Data
### 4.1 Merge DGdata (2010–2015)
**File: `sql/merge_dgdata.sql`**
```sql
-- Combine individual DGdata tables into one
DROP TABLE IF EXISTS dgdata_merged;
CREATE TABLE dgdata_merged AS
SELECT * FROM dgdata_2010
UNION ALL SELECT * FROM dgdata_2011
UNION ALL SELECT * FROM dgdata_2012
UNION ALL SELECT * FROM dgdata_2013
UNION ALL SELECT * FROM dgdata_2014
UNION ALL SELECT * FROM dgdata_2015;
```
### 4.2 Merge EENDR (2010–2015)
**File: `sql/merge_eendr.sql`**
```sql
DROP TABLE IF EXISTS eendr_merged;
CREATE TABLE eendr_merged AS
SELECT * FROM eendr_2010
UNION ALL SELECT * FROM eendr_2011
UNION ALL SELECT * FROM eendr_2012
UNION ALL SELECT * FROM eendr_2013
UNION ALL SELECT * FROM eendr_2014
UNION ALL SELECT * FROM eendr_2015;
```
### 4.3 Combine into Final Survey Table
**File: `sql/merge_surveys.sql`**
```sql
-- Create a unified survey table with NULLs for missing columns
DROP TABLE IF EXISTS surveys_final;
CREATE TABLE surveys_final AS
SELECT
d.survey_id,
d.common_field1,
d.common_field2,
d.unique_dg_field,
NULL AS unique_eendr_field
FROM dgdata_merged d
UNION ALL
SELECT
e.survey_id,
e.common_field1,
e.common_field2,
NULL AS unique_dg_field,
e.unique_eendr_field
FROM eendr_merged e;
-- Alternatively, use a full outer join to align rows on survey_id
-- and preserve all columns from both sources:
-- DROP TABLE IF EXISTS surveys_final;
-- CREATE TABLE surveys_final AS
-- SELECT COALESCE(d.survey_id, e.survey_id) AS survey_id,
--        COALESCE(d.common_field1, e.common_field1) AS common_field1,
--        COALESCE(d.common_field2, e.common_field2) AS common_field2,
--        d.unique_dg_field,
--        e.unique_eendr_field
-- FROM dgdata_merged d
-- FULL OUTER JOIN eendr_merged e USING (survey_id);
```
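The three scripts build on one another, so run the two per-source merges before the final combine:
```bash
psql -h localhost -U <user> -d <db> -f sql/merge_dgdata.sql
psql -h localhost -U <user> -d <db> -f sql/merge_eendr.sql
psql -h localhost -U <user> -d <db> -f sql/merge_surveys.sql
```
Note that the `UNION ALL` merges assume each year's table has an identical column list; if a year's schema drifted, list the columns explicitly instead of using `SELECT *`.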
## 5. Automated Backups
A daily backup is scheduled via cron to run at 2 AM.
**File: `backups/crontab.entry`**
```cron
# m h dom mon dow command
0 2 * * * /usr/local/bin/backup.sh
```
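The cron entry expects the script at `/usr/local/bin/backup.sh`, so install it there and register the entry. One way to do this (a sketch assuming sudo access; it appends to any existing crontab):
```bash
sudo install -m 755 backups/backup.sh /usr/local/bin/backup.sh
(crontab -l 2>/dev/null; cat backups/crontab.entry) | crontab -
```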
**File: `backups/backup.sh`**
```bash
#!/usr/bin/env bash
set -euo pipefail

# PostgreSQL connection details
PGUSER="your_pg_user"
PGHOST="localhost"
PGPORT="5432"
DBNAME="ams_data_mine"
# pg_dump reads the password from the environment, so it must be exported
export PGPASSWORD="your_pg_password"

# Backup directory and rotation
BACKUP_DIR="/var/backups/ams_data_mine"
RETENTION_DAYS=30

# Create the backup directory if it doesn't exist
mkdir -p "$BACKUP_DIR"

# Generate a filename with today's date
DATE_STR=$(date +"%F")
BACKUP_FILE="$BACKUP_DIR/${DBNAME}_$DATE_STR.sql"

# Perform the dump
pg_dump -U "$PGUSER" -h "$PGHOST" -p "$PGPORT" "$DBNAME" > "$BACKUP_FILE"

# Remove this database's dumps older than the retention period
find "$BACKUP_DIR" -type f -name "${DBNAME}_*.sql" -mtime +"$RETENTION_DAYS" -delete
```
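Since the dumps are plain SQL, a backup can be restored with `psql`. A sketch, assuming a hypothetical dump date and a freshly created target database (both names are illustrative):
```bash
createdb -h localhost -U your_pg_user ams_data_mine_restore
psql -h localhost -U your_pg_user -d ams_data_mine_restore \
  -f /var/backups/ams_data_mine/ams_data_mine_2025-04-21.sql
```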
## Contributing
1. Fork the repository.
2. Create a feature branch (`git checkout -b feature/YourFeature`).
3. Commit your changes (`git commit -m 'Add new feature'`).
4. Push to the branch (`git push origin feature/YourFeature`).
5. Open a pull request.
## License
This project is released under the MIT License. See [LICENSE](LICENSE) for details.