postgresql_disk_filling_report/intervention.md

126 lines
4.9 KiB
Markdown
Raw Permalink Normal View History

2023-10-28 07:07:13 +00:00
2023-10-28 08:34:05 +00:00
<!-- vim-markdown-toc GFM -->
* [intervention 20231027](#intervention-20231027)
* [Troubleshoot](#troubleshoot)
* [first solution implementation](#first-solution-implementation)
* [The problem](#the-problem)
* [Current config](#current-config)
* [To be done](#to-be-done)
* [Option #1 for archiving (recommended)](#option-1-for-archiving-recommended)
* [Option #2 for archiving](#option-2-for-archiving)
* [Additional steps for any solution](#additional-steps-for-any-solution)
* [Archivelog folder cleanup](#archivelog-folder-cleanup)
<!-- vim-markdown-toc -->
2023-10-28 07:07:13 +00:00
# intervention 20231027
2023-10-28 07:14:22 +00:00
## Troubleshoot
2023-10-28 07:07:13 +00:00
My first action was check the shared drive where `archive_commad` is supposed to be sending _WAL_ files (archivelog): `\\10.6.1.3\archivelog`.
That share does *not* exist anymore.
I check both servers for the existance of such drive and noone of them has that share.
2023-10-28 07:14:22 +00:00
## first solution implementation
2023-10-28 07:07:13 +00:00
Then I look for a place to write archivelogs, I saw that servers have a `R:\` drive with plenty of space.
So I decided to use a cross copy between both servers, that is:
* Primary will copy to backup as: `\\10.6.1.3\R$\postgresql`
* Backup will copy to primary as: `\\10.6.0.3\R$\postgresql`
Using that approach it's a best practice to get archived from each other, and solve switchover/failover issues in the future.
Then I tried adding a network drive, mapping the shared `R:\` in to `Z:\` as:
* primary's `Z:\` as: `\\10.6.1.3\R$\postgresql`
* backup's `Z:\` as : `\\10.6.0.3\R$\postgresql`
My idea at that stage was have a unique `postgresql.conf` because `archive_command` will be the same for both servers:
```conf
archive_command = 'copy "%p" "Z:\\archivelog\\%f"'
```
This setting also create a good configuration, We will not care about switchover/failover in terms of config changes.
2023-10-28 07:14:22 +00:00
## The problem
I perform all my tests on the _backup_ server.
Summary: No matter which command I set on `postgresql.conf->archive_command`, postgresql report *Permission Denied* .
I try all the options I can imagine:
* My prefered solution using `Z:\`
* Direct copy to `\\10.6.0.3\R$`
2023-10-28 08:35:19 +00:00
* Add a new shared drive on the _primary_, for example I shared `\\10.6.0.3\postgresql` using `R:\postgresql\`
2023-10-28 07:14:22 +00:00
* Grant permissions to `network service` windows "user"
* Grant permissions to `Everyone` windows group.
2023-10-28 08:40:40 +00:00
* Combinations of the above options (yes, I performed +10 combinations)
2023-10-28 07:14:22 +00:00
Until I run out of options.
2023-10-28 08:40:40 +00:00
Of course, when I copied any file via powershell with the Admin user, it worked. All the time.
2023-10-28 07:14:22 +00:00
So I'm sure the problem comes from the user which runs PostgreSQL service, I had faced similar problems in the past.
2023-10-28 08:40:40 +00:00
The problem is that I'm not a windows admin, my knowledge is limited here, I tried everything I could think, but maybe a windows sysadmin will know how to solve that permission problem.
2023-10-28 07:14:22 +00:00
2023-10-28 07:15:56 +00:00
## Current config
2023-10-28 08:40:40 +00:00
I decide to do a temporary solution to bypass the current problem of `archive_command` failing.
What I did was creaete a local folder on both servers:
2023-10-28 07:15:56 +00:00
```
R:\postgresql\local\archivelog
```
And use:
```conf
archive_command = 'copy "%p" "R:\\postgresql\\local\\archivelog\\%f"'
```
2023-10-28 07:18:17 +00:00
So both _primary_ and _backup_ could execute `archive_command` without problems.
That is far from a recommended practice but solves the `archive_command` to be failing all the time.
As a consequence, PostgreSQL should start removing _WAL_ files from `pg_wal`.
I had to restart the _primary_ server to apply that config, sorrry for that.
## To be done
2023-10-28 07:23:54 +00:00
### Option #1 for archiving (recommended)
2023-10-28 07:20:15 +00:00
2023-10-28 07:19:17 +00:00
As I say, this is far to be a good solution.
2023-10-28 07:20:15 +00:00
In my opinion, the best option will be the one I already mention, map one network drive from one server to the other into `Z:\` and use:
2023-10-28 07:19:17 +00:00
```conf
archive_command = 'copy "%p" "Z:\\archivelog\\%f"'
```
2023-10-28 07:15:56 +00:00
2023-10-28 08:40:40 +00:00
We must solve the permission problem to use this solution.
2023-10-28 07:23:54 +00:00
### Option #2 for archiving
2023-10-28 08:40:40 +00:00
In the case we can't achieve the solution #1, I suggest to keep the current configuration and perform the synchronization via _scheduled_ tasks.
So, for example, we will launch `rsync R:\postgresql\local\archivelog 10.6.x.3\R:\postgresql\archivelog` (_warning_ syntax will be wrong, it's a linux command).
2023-10-28 07:23:54 +00:00
To copy archivelogs from one server to the opposite.
2023-10-28 07:14:22 +00:00
2023-10-28 08:40:40 +00:00
Alternatives for `rsync` on windows:
2023-10-28 08:33:35 +00:00
* [cwRsync](https://www.itefix.net/cwrsync)
* [robocopy](https://learn.microsoft.com/en-us/windows-server/administration/windows-commands/robocopy?redirectedfrom=MSDN)
2023-10-28 07:14:22 +00:00
2023-10-28 08:30:58 +00:00
### Additional steps for any solution
2023-10-28 08:33:35 +00:00
#### Archivelog folder cleanup
2023-10-28 08:30:58 +00:00
A scheduled tasks should be deployed on both _primary_ and _bakcup_ server to keep the side of the _archivelog_ folder under control.
For example, using [this](https://jackworthen.com/2018/03/15/creating-a-scheduled-task-to-automatically-delete-files-older-than-x-in-windows/) solution.
The folder to cleanup will be:
```
R:\postgresql\local\archivelog
```
Or if we achieve the `Z:\` drive solution:
```
R:\postgresql\archivelog
```