.gitignore | ||
intervention.md | ||
README.md |
TOC
Servers
ip | role |
---|---|
10.3.0.3 |
primary |
10.6.1.3 |
backup |
The problem
Just open the log give us information about the error:
The network name cannot be found.
2023-10-19 09:45:56.140 BST [4600] LOG: archive command failed with exit code 1
2023-10-19 09:45:56.140 BST [4600] DETAIL: The failed archive command was: copy "pg_wal\000000010000007C00000019" "\\10.6.1.3\archivelog\000000010000007C00000019"
2023-10-19 09:45:56.140 BST [4600] WARNING: archiving write-ahead log file "000000010000007C00000019" failed too many times, will try again later
The command executed by archive_command
is failing.
The archive_command
on PostgreSql main configuration file postgresql.conf
is set as:
archive_command = 'copy "%p" "\\\\10.6.1.3\\\archivelog\\%f"'
Additionally, We can see the following line:
The network name cannot be found.
Without connecting again, in my oppinion, the ip address: 10.6.1.3
(which is the backup server), either is not accessible via windows shared drive or the shared drive \\10.6.1.3\archivelog
is full (0% space available).
Anyway, the result of the archive_command
is ERROR. And the default behaviour of PostgreSql in such case is DON'T delete the WAL file until archive_command
succeed.
PostgreSql Documentation is clear at that point, including an example:
While designing your archiving setup, consider what will happen if the archive command fails repeatedly because some aspect requires operator intervention or the archive runs out of space. For example, this could occur if you write to tape without an autochanger; when the tape fills, nothing further can be archived until the tape is swapped. You should ensure that any error condition or request to a human operator is reported appropriately so that the situation can be resolved reasonably quickly. The pg_wal/ directory will continue to fill with WAL segment files until the situation is resolved. (If the file system containing pg_wal/ fills up, PostgreSQL will do a PANIC shutdown. No committed transactions will be lost, but the database will remain offline until you free some space.)
That's the cause of the problems you're facing. As the result of archive_command
fail, WAL are not being automatically deleted then the disk will run out of space. PostgreSql is waiting for manual intervention so archive_command
ends with OK and it can continue deleting old WAL files.
Solutions
Solution 1: Solve the connectivity problems
First solution should be fast and doesn't involve PostgreSql reconfiguration/restart.
You have to check:
- Connectivity from primary to the shared Network drive:
\\10.6.1.3\archivelog
- Disk space available on the backup server in the underlying drive of the shared network drive
\\10.6.1.3\archivelog
Additional actions on this solution
To avoid possible future failures.
Add check to the monitorization tool to check:
- Connectivity from primary server to the shared Network drive
- Connectivity from backup server to the shared Network drive
- Disk space available on the backup server in the underlying drive of the shared network drive
Solution 2: Change archive_command
This solution will involve PostgreSql server to be restarted to apply changes.
Modify archive_command
to copy WAL files locally:
archive_command = 'copy "%p" "C:\\APIS\\Postgres12\\primary_archivelog\\%f"'
Create the destination directory:
mkdir C:\APIS\Postgres12\primary_archivelog
And restart PostgreSql.
Note: This solution is not recommended because you won't have cross-repliaction. That is, in our initial recomendations we suggest that primary server copies the WAL files to the backup server to be able to execute a Point-in-time-recover (PITR), that is a best-practice.
Additional notes and recommendations
PostgreSQL version
You're running version 12 which has EOL (end-of-life) support on November 14, 2024.
Our recommendation is to upgrade to v16.
v16 will be on production state for many months when final upgrade happens, so we suggest to plan the version upgrade as soon as possible.
Archivelog location
Consider having a external shared storage shared with both primary and backup for archivelog storage.
Thales Hyderabad info
You can safely ignore this.
Archive: wetransfer_hyderabad-metro-apis-pg-logs_2023-10-19_0855.zip
Written using Zippy
Length Date Time Name
--------- ---------- ----- ----
9315434 2023-10-19 10:56 10.3.0.3 primary server PG log.zip
5903497 2023-10-19 10:56 10.6.1.3 backup server PG logs.zip
--------- -------
15218931 2 files