postgresql_disk_filling_report/intervention.md
2023-10-28 10:34:05 +02:00

<!-- vim-markdown-toc GFM -->
* [intervention 20231027](#intervention-20231027)
    * [Troubleshoot](#troubleshoot)
    * [first solution implementation](#first-solution-implementation)
    * [The problem](#the-problem)
    * [Current config](#current-config)
    * [To be done](#to-be-done)
        * [Option #1 for archiving (recommended)](#option-1-for-archiving-recommended)
        * [Option #2 for archiving](#option-2-for-archiving)
        * [Additional steps for any solution](#additional-steps-for-any-solution)
            * [Archivelog folder cleanup](#archivelog-folder-cleanup)
    * [old](#old)
<!-- vim-markdown-toc -->
# intervention 20231027
## Troubleshoot
My first action was to check the shared drive where `archive_command` is supposed to send _WAL_ files (archivelog): `\\10.6.1.3\archivelog`.
That share does *not* exist anymore.
I checked both servers for the existence of such a share and neither of them has it.
## first solution implementation
Then I looked for a place to write the archivelogs and saw that both servers have an `R:\` drive with plenty of space.
So I decided to use a cross copy between both servers, that is:
* Primary will copy to backup as: `\\10.6.1.3\R$\postgresql`
* Backup will copy to primary as: `\\10.6.0.3\R$\postgresql`

With that approach each server archives to the other one, which is good practice and avoids switchover/failover issues in the future.
Then I tried adding a network drive, mapping the shared `R:\` folder to `Z:\` as follows:
* primary's `Z:\` as: `\\10.6.1.3\R$\postgresql`
* backup's `Z:\` as: `\\10.6.0.3\R$\postgresql`

My idea at that stage was to have a single `postgresql.conf`, because `archive_command` would be the same on both servers:
```conf
archive_command = 'copy "%p" "Z:\\archivelog\\%f"'
```
This setup also keeps the configuration simple: a switchover/failover would not require any config change.
## The problem
I performed all my tests on the _backup_ server.
Summary: no matter which command I set as `archive_command` in `postgresql.conf`, PostgreSQL reported *Permission Denied*.
I tried all the options I could think of:
* My preferred solution using `Z:\`
* Direct copy to `\\10.6.0.3\R$`
* Add a new shared drive on the _primary_, for example I shared `\\10.6.0.3\postgresql`
* Grant permissions to the `network service` Windows "user"
* Grant permissions to the `Everyone` Windows group
* Combinations of the above options

Until I ran out of options.
Of course, when I copied the file via PowerShell as the Admin user, it worked. Every time.
So I'm sure the problem comes from the user that runs the PostgreSQL service; I have faced similar problems in the past.
The problem is that I'm not a Windows admin and my knowledge is limited here. I tried everything I could think of, but a Windows sysadmin may know how to solve that permission problem.
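
To narrow this down, the write test can be repeated under the account that runs the PostgreSQL service instead of the Admin user. The following is only a minimal sketch: it assumes Python is available on the servers (not verified), and the function name `can_write` and the probe file prefix are made up for illustration. A possible explanation to check, not a confirmed diagnosis: drive letters mapped in an interactive session are generally not visible to a Windows service, and administrative shares such as `R$` normally admit only administrators.

```python
import os
import sys
import tempfile


def can_write(directory):
    """Try to create and delete a probe file in `directory`.

    Returns (ok, detail). `can_write` is a hypothetical helper name.
    """
    try:
        fd, path = tempfile.mkstemp(prefix="archive_probe_", dir=directory)
        os.close(fd)
        os.remove(path)
        return True, "write and delete succeeded"
    except OSError as exc:
        # PermissionError and FileNotFoundError are both subclasses of OSError
        return False, f"{type(exc).__name__}: {exc}"


if __name__ == "__main__" and len(sys.argv) > 1:
    target = sys.argv[1]
    # USERNAME is set on Windows; USER is a fallback for other systems
    user = os.environ.get("USERNAME") or os.environ.get("USER", "unknown")
    ok, detail = can_write(target)
    print(f"user={user} target={target} ok={ok} ({detail})")
```

The script would have to be started by something that runs as the service account (for example a scheduled task configured with that account), passing the target folder as its only argument. If it fails there while the same copy works for the Admin user, the share or NTFS permissions for the service account are the place to look.
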
## Current config
It was late for me, so I decided to apply a temporary solution.
What I did was create a local folder on both servers:
```
R:\postgresql\local\archivelog
```
And use:
```conf
archive_command = 'copy "%p" "R:\\postgresql\\local\\archivelog\\%f"'
```
So both _primary_ and _backup_ could execute `archive_command` without problems.
That is far from a recommended practice, because the archive lives on the same server as the database, but it stops `archive_command` from failing all the time.
As a consequence, PostgreSQL should start removing _WAL_ files from `pg_wal`.
I had to restart the _primary_ server to apply that config, sorry for that.
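
To confirm that archiving is really catching up, the `archive_status` subfolder of `pg_wal` can be inspected: PostgreSQL leaves a `.ready` file there for every segment still waiting for `archive_command` and renames it to `.done` once the segment is archived. A minimal sketch, assuming Python is available on the servers; the location of `pg_wal` is not recorded in this report, so it has to be passed in, and `archive_backlog` is a hypothetical helper name.

```python
import sys
from pathlib import Path


def archive_backlog(pg_wal_dir):
    """Count segments waiting for archiving (.ready) and already archived (.done)."""
    status_dir = Path(pg_wal_dir) / "archive_status"
    ready = sum(1 for _ in status_dir.glob("*.ready"))
    done = sum(1 for _ in status_dir.glob("*.done"))
    return {"ready": ready, "done": done}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Argument: path to the pg_wal folder of the instance (assumption: known to the operator)
    print(archive_backlog(sys.argv[1]))
```

If the `ready` count keeps shrinking between two runs, the local `archive_command` is working and `pg_wal` should stop growing.
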
## To be done
### Option #1 for archiving (recommended)
As I said, the current configuration is far from a good solution.
In my opinion, the best option is the one I already mentioned: map a network drive from one server to the other as `Z:\` and use:
```conf
archive_command = 'copy "%p" "Z:\\archivelog\\%f"'
```
We should investigate permissions for this solution.
### Option #2 for archiving
If we can't achieve solution #1, I suggest keeping the current configuration and performing the synchronization via _scheduled_ tasks.
So, for example, we would launch `rsync R:\postgresql\local\archivelog 10.6.x.3\R:\postgresql\archivelog` (the syntax is surely wrong, I have never used `rsync` on Windows) to copy archivelogs from one server to the other.
Alternatives to `rsync`:
* [cwRsync](https://www.itefix.net/cwrsync)
* [robocopy](https://learn.microsoft.com/en-us/windows-server/administration/windows-commands/robocopy?redirectedfrom=MSDN)
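
The logic such a scheduled task needs is small, whichever tool ends up being used: copy every archived segment that the other side does not have yet and never expose a half-written file. A minimal sketch in Python, assuming Python is available on the servers and that the account running the task can write to the destination (which is exactly the open permission question); `sync_archive` and the `.part` suffix are hypothetical names.

```python
import os
import shutil
import sys


def sync_archive(src_dir, dst_dir):
    """Copy files that exist in src_dir but not in dst_dir; return the names copied."""
    copied = []
    existing = set(os.listdir(dst_dir))
    for name in sorted(os.listdir(src_dir)):
        src = os.path.join(src_dir, name)
        if name in existing or not os.path.isfile(src):
            continue  # already archived on the other side, or not a regular file
        tmp = os.path.join(dst_dir, name + ".part")
        shutil.copy2(src, tmp)  # copy under a temporary name first
        os.replace(tmp, os.path.join(dst_dir, name))  # rename only once the copy is complete
        copied.append(name)
    return copied


if __name__ == "__main__" and len(sys.argv) == 3:
    # Arguments: local archivelog folder, then the folder on the opposite server
    for name in sync_archive(sys.argv[1], sys.argv[2]):
        print("copied", name)
```

Existing files are never overwritten, so running the task repeatedly is harmless.
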
### Additional steps for any solution
#### Archivelog folder cleanup
A scheduled task should be deployed on both the _primary_ and _backup_ servers to keep the size of the _archivelog_ folder under control.
For example, using [this](https://jackworthen.com/2018/03/15/creating-a-scheduled-task-to-automatically-delete-files-older-than-x-in-windows/) solution.
The folder to clean up will be:
```
R:\postgresql\local\archivelog
```
Or if we achieve the `Z:\` drive solution:
```
R:\postgresql\archivelog
```
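
The cleanup itself boils down to deleting files whose modification time is older than a retention period. A minimal sketch in Python, assuming Python is available on the servers; `remove_older_than` is a hypothetical helper name and the retention value is not decided in this report. The retention must be longer than the age of the oldest base backup we still want to restore from, otherwise the archived _WAL_ needed for recovery would be deleted.

```python
import os
import sys
import time


def remove_older_than(directory, max_age_days, now=None):
    """Delete regular files in `directory` older than `max_age_days`; return their names."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds per day
    removed = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and os.path.getmtime(path) < cutoff:
            os.remove(path)
            removed.append(name)
    return removed


if __name__ == "__main__" and len(sys.argv) == 3:
    # Arguments: archivelog folder, retention in days (value to be agreed)
    for name in remove_older_than(sys.argv[1], float(sys.argv[2])):
        print("removed", name)
```

Subfolders are left untouched, so pointing it at the wrong level of the tree by mistake would only affect the files directly inside that folder.
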
## old
I modified `postgresql.conf`, trying these values for `archive_command` (including the `Z:\archivelog` target):
```conf
#archive_command = 'copy "%p" "\\\\10.6.1.3\\archivelog\\%f"' # command to use to archive a logfile segment
archive_command = 'copy "%p" "\\\\10.6.1.3\\R$\\postgresql\\archivelog\\%f"'
#archive_command = 'copy "%p" "Z:\\archivelog\\%f"'
#archive_command = 'copy "%p" "R:\\postgresql\\local\\archivelog\\%f"'
```
I tried many options but nothing worked; it was related to Windows permissions. Copying from one server to the other from PowerShell as the admin user did work.
I tried adding permissions (as many as I could remember), but nothing worked.
So in the end I decided to archive locally on `R:`.
Both servers are now archiving into `R:\postgresql\local\archivelog`.
I restarted the master instance of PostgreSQL to apply the new setup.