# intervention 20231027
## Troubleshoot
My first action was to check the shared drive where `archive_command` is supposed to send _WAL_ files (archivelog): `\\10.6.1.3\archivelog`.
That share does *not* exist anymore.
I checked both servers for such a share and neither of them has it.
## First solution implementation
Then I looked for a place to write archivelogs and saw that both servers have an `R:\` drive with plenty of space.
So I decided to use a cross copy between the two servers, that is:
* Primary will copy to backup as: `\\10.6.1.3\R$\postgresql`
* Backup will copy to primary as: `\\10.6.0.3\R$\postgresql`
With that approach each server holds the other's archived _WAL_ files, which is good practice and avoids switchover/failover issues in the future.
Then I tried adding a network drive, mapping the shared `R:\` to `Z:\`:
* primary's `Z:\` as: `\\10.6.1.3\R$\postgresql`
* backup's `Z:\` as : `\\10.6.0.3\R$\postgresql`
My idea at that stage was to have a single `postgresql.conf`, because `archive_command` would be the same on both servers:
```conf
archive_command = 'copy "%p" "Z:\\archivelog\\%f"'
```
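To make the placeholders concrete: before running `archive_command`, PostgreSQL replaces `%p` with the path of the WAL file to archive, `%f` with the file name only, and `%%` with a literal `%`. The Python sketch below only imitates that substitution so the final `copy` command can be inspected. The function name `expand_archive_command` and the sample segment name are illustrative assumptions, not part of PostgreSQL. The doubled backslashes in `postgresql.conf` are written as single backslashes here, which is how the value looks after the config file is parsed.

```python
def expand_archive_command(template: str, wal_path: str, wal_name: str) -> str:
    """Imitate the %p / %f / %% substitution applied to archive_command."""
    out = []
    i = 0
    while i < len(template):
        ch = template[i]
        if ch == "%" and i + 1 < len(template):
            nxt = template[i + 1]
            if nxt == "p":      # path of the WAL file to archive
                out.append(wal_path)
                i += 2
                continue
            if nxt == "f":      # file name only
                out.append(wal_name)
                i += 2
                continue
            if nxt == "%":      # literal percent sign
                out.append("%")
                i += 2
                continue
        out.append(ch)          # anything else is copied unchanged
        i += 1
    return "".join(out)


if __name__ == "__main__":
    # archive_command as seen after postgresql.conf un-doubles the backslashes.
    template = r'copy "%p" "Z:\archivelog\%f"'
    segment = "000000010000000000000001"  # illustrative segment name
    print(expand_archive_command(template, "pg_wal\\" + segment, segment))
```

Printing the expanded command this way makes it easy to paste the exact same `copy` into a shell and compare the result with what the PostgreSQL service gets.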
This setting also makes for a good configuration: we would not need any config changes on switchover/failover.
## The problem
I performed all my tests on the _backup_ server.
Summary: no matter which command I set as `archive_command` in `postgresql.conf`, PostgreSQL reported *Permission Denied*.
I tried all the options I could imagine:
* My preferred solution using `Z:\`
* Direct copy to `\\10.6.0.3\R$`
* Add a new shared drive on the _primary_, for example I shared `\\10.6.0.3\postgresql`
* Grant permissions to the `network service` Windows "user"
* Grant permissions to the `Everyone` Windows group
* Combinations of the above options
Until I ran out of options.
Of course, when I copied the file via PowerShell as the Admin user, it worked every time.
So I'm sure the problem comes from the user that runs the PostgreSQL service; I have faced similar problems in the past.
The problem is that I'm not a Windows admin and my knowledge here is limited. I tried everything I could think of, but maybe a Windows sysadmin will know how to solve that permission problem.
## Current config
It was late for me, so I decided on a temporary solution.
What I did was create a local folder on both servers:
```
R:\postgresql\local\archivelog
```
And use:
```conf
archive_command = 'copy "%p" "R:\\postgresql\\local\\archivelog\\%f"'
```
So both _primary_ and _backup_ can execute `archive_command` without problems.
That is far from recommended practice, since the archive sits on the same server as the data, but it stops `archive_command` from failing all the time.
As a consequence, PostgreSQL should start removing _WAL_ files from `pg_wal`.
I had to restart the _primary_ server to apply that config, sorry for that.
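One way to confirm that `pg_wal` is actually shrinking is to count the segment files in it before and after a few checkpoints. The Python sketch below is a minimal helper for that, assuming the standard WAL segment naming of 24 hexadecimal characters. The function name `wal_usage` is hypothetical, and the directory to inspect is passed on the command line because the data directory path is not given in this report. Some segments are recycled rather than deleted, so the count may settle at a small number instead of dropping to zero.

```python
import re
import sys
from pathlib import Path

# WAL segment file names consist of 24 hexadecimal characters.
WAL_SEGMENT = re.compile(r"^[0-9A-F]{24}$")


def wal_usage(directory):
    """Return (segment_count, total_bytes) for WAL segments directly in `directory`."""
    count = 0
    total = 0
    for entry in Path(directory).iterdir():
        # Skips subfolders such as archive_status and files such as *.history.
        if entry.is_file() and WAL_SEGMENT.match(entry.name):
            count += 1
            total += entry.stat().st_size
    return count, total


if __name__ == "__main__":
    # Pass the pg_wal directory (or the archive folder) as the first argument.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    count, total = wal_usage(target)
    print(f"{target}: {count} segments, {total / 1024 ** 2:.1f} MiB")
```

Running the same script against `R:\postgresql\local\archivelog` shows the other side of the picture: the local archive grows as segments are archived.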
## To be done
I modified `postgresql.conf` so `archive_command` is:
`archive_command = 'copy "%p" "Z:\\archivelog\%f`
```conf
#archive_command = 'copy "%p" "\\\\10.6.1.3\\\archivelog\\%f"' # command to use to archive a logfile segment
archive_command = 'copy "%p" "\\\\10.6.1.3\\\R\$\\postgresql\\archivelog\\%f"'
#archive_command = 'copy "%p" "Z:\\archivelog\\%f"'
#archive_command = 'copy "%p" "R:\\postgresql\\local\\archivelog\\%f"'
```
I tried many options but nothing worked; the failure was related to Windows permissions. Copying from one server to the other from PowerShell as the admin user worked.
I tried adding permissions (as many as I could remember), but nothing worked.
So in the end I decided to archive locally on `R:`.
Both servers are now archiving into `R:\postgresql\local\archivelog`.
I restarted the master instance of PostgreSQL to apply the new setup.