# intervention 20231027
## Troubleshoot
My first action was to check the shared drive where `archive_command` is supposed to send _WAL_ files (archivelog): `\\10.6.1.3\archivelog`.
That share does *not* exist anymore.
I checked both servers for that share, and neither of them has it.
## first solution implementation
Then I looked for a place to write archivelogs and saw that the servers have an `R:\` drive with plenty of space.
So I decided to cross-copy between the two servers, that is:
* Primary will copy to backup as: `\\10.6.1.3\R$\postgresql`
* Backup will copy to primary as: `\\10.6.0.3\R$\postgresql`

With this approach each server archives its WAL onto the other one, which is good practice: the archive survives the loss of the server that produced it, and switchover/failover should not cause archiving issues in the future.
Then I tried adding a network drive, mapping the other server's shared `R:\` to `Z:\`:
* primary's `Z:\` as: `\\10.6.1.3\R$\postgresql`
* backup's `Z:\` as: `\\10.6.0.3\R$\postgresql`

My idea at that stage was to have a single `postgresql.conf`, because `archive_command` would be the same on both servers:
```conf
archive_command = 'copy "%p" "Z:\\archivelog\\%f"'
```
This setting also makes the configuration symmetric: no config changes would be needed on switchover/failover.
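
For reference, PostgreSQL treats `archive_command` as successful only when it exits with status 0, and its documentation recommends that the command refuse to overwrite an existing archive file. If plain `copy` keeps being hard to debug, one option is to call a small script instead. The sketch below assumes Python 3 is installed on both servers; the script name `archive_wal.py` and its location are hypothetical.

```python
r"""Hypothetical archive_wal.py: copy one WAL segment for archive_command.

Assumed postgresql.conf usage (script path is a placeholder):
    archive_command = 'python C:\\scripts\\archive_wal.py "%p" "Z:\\archivelog\\%f"'
"""
import filecmp
import os
import shutil
import sys


def archive(src: str, dest: str) -> int:
    """Copy src to dest; return 0 only if dest now holds src's contents."""
    try:
        if os.path.exists(dest):
            # Never overwrite. An identical file is treated as a retry of a
            # segment that was already archived (a choice made in this sketch).
            return 0 if filecmp.cmp(src, dest, shallow=False) else 1
        tmp = dest + ".tmp"
        shutil.copyfile(src, tmp)  # write under a temporary name first...
        os.replace(tmp, dest)      # ...then move it into place
        return 0
    except OSError as exc:
        # Report on stderr so the reason can reach the server log.
        print(f"archive_wal: {exc}", file=sys.stderr)
        return 2


# Invoked by archive_command with exactly two arguments: %p and the target file.
if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(archive(sys.argv[1], sys.argv[2]))
```

A non-zero exit makes PostgreSQL keep the segment and retry later, so an unreachable destination shows up as repeated failures in the log rather than as lost WAL.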
## The problem
I performed all my tests on the _backup_ server.
Summary: no matter which command I set as `archive_command` in `postgresql.conf`, PostgreSQL reported *Permission denied*.
I tried every option I could imagine, until I ran out of them:
* My preferred solution, using `Z:\`
* Direct copy to `\\10.6.0.3\R$`
* Adding a new share on the _primary_; for example, I shared `\\10.6.0.3\postgresql`
* Granting permissions to the `network service` Windows "user"
* Granting permissions to the `Everyone` Windows group
* Combinations of the above

Of course, when I copied the file via PowerShell as the Admin user, it worked every time.
So I'm confident the problem comes from the account that runs the PostgreSQL service; I have faced similar problems in the past.
The problem is that I'm not a Windows admin and my knowledge here is limited. I tried everything I could think of, but a Windows sysadmin may know how to solve this permission problem.
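
To narrow the permission problem down without editing `postgresql.conf` for every attempt, a probe like the sketch below could be run under the same account as the PostgreSQL service (how to launch it as that account is part of the open Windows question). It assumes Python 3 is available; the candidate paths are the ones tried above, as seen from the backup server. Drive letters mapped in an interactive session are generally not visible to services, so the UNC paths are worth testing alongside `Z:\`.

```python
from __future__ import annotations

import os
import sys
import tempfile

# Paths tried in this report, as seen from the backup server (adjust as needed).
CANDIDATES = [
    r"Z:\archivelog",
    r"\\10.6.0.3\R$\postgresql",
    r"\\10.6.0.3\postgresql",
    r"R:\postgresql\local\archivelog",
]


def probe(path: str) -> tuple[bool, str]:
    """Try to create and delete a file in path; return (ok, detail)."""
    if not os.path.isdir(path):
        # isdir() also returns False when the account cannot access the path.
        return False, "not reachable as a directory (missing or no access)"
    try:
        fd, name = tempfile.mkstemp(dir=path, prefix="pgprobe_")
        os.close(fd)
        os.remove(name)
        return True, "writable"
    except OSError as exc:
        # e.g. PermissionError ("Permission denied")
        return False, f"{type(exc).__name__}: {exc}"


if __name__ == "__main__":
    # Probe the paths given on the command line, or the candidates above.
    for candidate in sys.argv[1:] or CANDIDATES:
        ok, detail = probe(candidate)
        print(("OK   " if ok else "FAIL ") + candidate + ": " + detail)
```

Comparing the output when run as the Admin user and as the service account should show which paths fail only for the service.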
## Current config
It was late for me, so I decided to implement a temporary solution.
I created a local folder on both servers:
```
R:\postgresql\local\archivelog
```
And use:
```conf
archive_command = 'copy "%p" "R:\\postgresql\\local\\archivelog\\%f"'
```
This way both _primary_ and _backup_ can execute `archive_command` without problems.
This is far from recommended practice, but it stops `archive_command` from failing all the time.
As a consequence, PostgreSQL should start removing _WAL_ files from `pg_wal`.
I restarted the PostgreSQL instance on the _primary_ to apply this config; sorry for that.
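
To confirm that archiving has caught up and `pg_wal` is shrinking, one can look at `pg_wal\archive_status`: PostgreSQL creates a `.ready` file there for each segment waiting to be archived and renames it to `.done` once `archive_command` succeeds. The `pg_stat_archiver` view (columns such as `failed_count` and `last_failed_wal`) reports the same from SQL. The sketch below only counts the status files; the data directory is passed as an argument.

```python
from __future__ import annotations

import os
import sys


def archive_backlog(data_dir: str) -> dict[str, int]:
    """Count .ready (waiting) and .done (archived) files in pg_wal/archive_status."""
    status_dir = os.path.join(data_dir, "pg_wal", "archive_status")
    counts = {"ready": 0, "done": 0}
    for name in os.listdir(status_dir):
        if name.endswith(".ready"):
            counts["ready"] += 1
        elif name.endswith(".done"):
            counts["done"] += 1
    return counts


if __name__ == "__main__":
    # Each argument is a PostgreSQL data directory.
    for data_dir in sys.argv[1:]:
        print(data_dir, archive_backlog(data_dir))
```

Running it a few times should show the `ready` count falling towards zero if `archive_command` is now succeeding.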
## To be done
### Option #1 for archiving (recommended)
As I said, this is far from a good solution.
In my opinion, the best option is the one I already mentioned: map a network drive from each server to the other as `Z:\` and use:
```conf
archive_command = 'copy "%p" "Z:\\archivelog\\%f"'
```
We should investigate the permissions needed for this solution.
### Option #2 for archiving
If we can't achieve solution #1, I suggest keeping the current configuration and performing the synchronization via _scheduled_ tasks.
For example, a task could launch `rsync R:\postgresql\local\archivelog 10.6.x.3\R:\postgresql\archivelog` (the syntax is probably wrong; I have never used `rsync` on Windows) to copy archivelogs from one server to the other.
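
As an illustration of what that scheduled task would need to do, the sketch below copies every file from the local archive folder that is not yet present on the other server, writing under a temporary name first so a half-copied file is never mistaken for a complete one. It is written in Python only because the `rsync` syntax on Windows is uncertain; it assumes Python 3 on both servers and that the destination folder is writable by the account running the task. The script name `sync_archive.py` is hypothetical.

```python
from __future__ import annotations

import os
import shutil
import sys


def sync_missing(src_dir: str, dst_dir: str) -> list[str]:
    """Copy files from src_dir that do not exist in dst_dir; return their names."""
    copied = []
    existing = set(os.listdir(dst_dir))
    for name in sorted(os.listdir(src_dir)):
        src = os.path.join(src_dir, name)
        # Skip files already copied, directories, and leftover temporary files.
        if name in existing or not os.path.isfile(src) or name.endswith(".tmp"):
            continue
        dst = os.path.join(dst_dir, name)
        tmp = dst + ".tmp"
        shutil.copy2(src, tmp)  # copy2 keeps timestamps (useful for age-based cleanup)
        os.replace(tmp, dst)    # publish the file only once it is complete
        copied.append(name)
    return copied


if __name__ == "__main__" and len(sys.argv) == 3:
    # Arguments: local archive folder, then the other server's archive folder.
    for name in sync_missing(sys.argv[1], sys.argv[2]):
        print("copied", name)
```

A task would then run something like `python sync_archive.py R:\postgresql\local\archivelog \\10.6.x.3\R$\postgresql\archivelog`, with the address completed for each server; it still hits the same permission question if the task runs as an account without access to the share.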
### Additional steps for any solution
#### Archivelog cleanup
A scheduled task should be deployed on both the _primary_ and _backup_ servers to keep the size of the _archivelog_ folder under control.
For example, using [this](https://jackworthen.com/2018/03/15/creating-a-scheduled-task-to-automatically-delete-files-older-than-x-in-windows/) solution.
The folder to clean up will be:
```
R:\postgresql\local\archivelog
```
Or if we achieve the `Z:\` drive solution:
```
R:\postgresql\archivelog
```
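
The "delete files older than X days" logic of such a task is small enough to sketch. The 14-day retention below is a placeholder, not a recommendation: the archive must keep every WAL file needed by the oldest base backup we may want to restore from. PostgreSQL also ships `pg_archivecleanup`, which removes archived WAL older than a given segment, as an alternative to age-based deletion.

```python
from __future__ import annotations

import os
import sys
import time

RETENTION_DAYS = 14  # placeholder value; choose it from the backup policy


def remove_older_than(folder: str, days: float, now: float | None = None) -> list[str]:
    """Delete regular files in folder whose mtime is older than `days`; return their names."""
    cutoff = (time.time() if now is None else now) - days * 86400
    with os.scandir(folder) as it:
        entries = list(it)  # list first, so we never delete while iterating
    removed = []
    for entry in entries:
        if entry.is_file() and entry.stat().st_mtime < cutoff:
            os.remove(entry.path)
            removed.append(entry.name)
    return sorted(removed)


if __name__ == "__main__":
    # Each argument is an archive folder, e.g. R:\postgresql\local\archivelog
    for folder in sys.argv[1:]:
        for name in remove_older_than(folder, RETENTION_DAYS):
            print("removed", name)
```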
## old
I modified `postgresql.conf` so that `archive_command` is:
```conf
#archive_command = 'copy "%p" "\\\\10.6.1.3\\archivelog\\%f"' # command to use to archive a logfile segment
archive_command = 'copy "%p" "\\\\10.6.1.3\\R$\\postgresql\\archivelog\\%f"'
#archive_command = 'copy "%p" "Z:\\archivelog\\%f"'
#archive_command = 'copy "%p" "R:\\postgresql\\local\\archivelog\\%f"'
```
I tried many options but nothing worked; the problem was related to Windows permissions. When I copied from PowerShell as the admin user, the copy from one server to the other worked.
I tried adding permissions (as many as I could think of) but nothing worked.
So in the end I decided to archive locally on `R:`.
Both servers are now archiving into `R:\postgresql\local\archivelog`.
I restarted the master PostgreSQL instance to apply the new setup.
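
Several `archive_command` values in this report escape backslashes by hand. In `postgresql.conf`, backslash acts as an escape character inside a quoted value and a single quote is written as `''`, so each backslash of the Windows command has to be doubled (a UNC path therefore starts with four). A small helper like the hypothetical `conf_value` below can generate the line from the command we actually want Windows to run, which avoids miscounting.

```python
def conf_value(raw: str) -> str:
    """Hypothetical helper: quote raw for postgresql.conf (double backslashes and single quotes)."""
    return "'" + raw.replace("\\", "\\\\").replace("'", "''") + "'"


if __name__ == "__main__":
    # The command Windows should execute, written once with single backslashes.
    command = r'copy "%p" "\\10.6.1.3\R$\postgresql\archivelog\%f"'
    print("archive_command = " + conf_value(command))
```

For this example it prints `archive_command = 'copy "%p" "\\\\10.6.1.3\\R$\\postgresql\\archivelog\\%f"'`, which can be pasted into `postgresql.conf`.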