postgresql_disk_filling_report/intervention.md
2023-10-28 10:40:40 +02:00

intervention 20231027

Troubleshoot

My first action was to check the shared drive where archive_command is supposed to send WAL files (archivelog): \\10.6.1.3\archivelog.
That share does not exist anymore.
I checked both servers for the existence of such a drive, and neither of them has that share.

First solution implementation

Then I looked for a place to write archivelogs and saw that both servers have an R:\ drive with plenty of space.
So I decided to use a cross copy between both servers, that is:

  • Primary will copy to backup as: \\10.6.1.3\R$\postgresql
  • Backup will copy to primary as: \\10.6.0.3\R$\postgresql

With that approach each server holds the other's archived WAL files, which is a good practice and avoids switchover/failover issues in the future.

Then I tried adding a network drive, mapping the shared R:\ to Z:\ as:

  • primary's Z:\ as: \\10.6.1.3\R$\postgresql
  • backup's Z:\ as: \\10.6.0.3\R$\postgresql

My idea at that stage was to have a single postgresql.conf, because archive_command would be the same for both servers:

archive_command = 'copy "%p" "Z:\\archivelog\\%f"'

This setting also makes for a good configuration: we will not need any config changes on switchover/failover.
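To make it clearer what PostgreSQL actually runs, here is a minimal Python sketch of the placeholder substitution in archive_command: %p becomes the path of the WAL file to archive, %f its file name only, and %% a literal %. The function name expand_archive_command is hypothetical and the sketch only imitates the documented behaviour; it is not PostgreSQL code. Note that the doubled backslashes in postgresql.conf are escapes, so the command that runs contains single backslashes.

```python
def expand_archive_command(template: str, wal_path: str, wal_name: str) -> str:
    """Imitate how archive_command placeholders are expanded (illustrative only)."""
    out = []
    i = 0
    while i < len(template):
        ch = template[i]
        if ch == "%" and i + 1 < len(template):
            nxt = template[i + 1]
            if nxt == "p":  # %p: path of the WAL file to archive
                out.append(wal_path)
                i += 2
                continue
            if nxt == "f":  # %f: file name only
                out.append(wal_name)
                i += 2
                continue
            if nxt == "%":  # %%: literal percent sign
                out.append("%")
                i += 2
                continue
        # Any other character (or unknown %x sequence) is left untouched here.
        out.append(ch)
        i += 1
    return "".join(out)
```

Calling it with the template r'copy "%p" "Z:\archivelog\%f"' and a segment name shows the exact copy command that the PostgreSQL service account has to be able to run, which is the useful thing to test by hand.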

The problem

I performed all my tests on the backup server.

Summary: no matter which command I set in postgresql.conf -> archive_command, PostgreSQL reported Permission Denied.

I tried all the options I could imagine:

  • My preferred solution using Z:\
  • Direct copy to \\10.6.0.3\R$
  • Add a new shared drive on the primary, for example I shared \\10.6.0.3\postgresql using R:\postgresql\
  • Grant permissions to the Network Service Windows "user"
  • Grant permissions to the Everyone Windows group
  • Combinations of the above options (yes, I tried more than 10 combinations)

Until I ran out of options.
Of course, when I copied any file via PowerShell with the Admin user, it worked. All the time.
So I'm sure the problem comes from the user that runs the PostgreSQL service; I have faced similar problems in the past.
The problem is that I'm not a Windows admin and my knowledge is limited here. I tried everything I could think of, but maybe a Windows sysadmin will know how to solve that permission problem.
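One way a Windows sysadmin could narrow this down is to repeat the write test under the same account that runs the PostgreSQL service, instead of the Admin user. Two well-known Windows behaviours may be relevant, although neither was confirmed during this intervention: drive letters mapped in an interactive session (such as Z:\) are generally not visible to services, and administrative shares such as R$ are normally reachable only by administrators. The sketch below is a hypothetical helper (probe_write is not an existing tool) that tries to create and delete a small file in a target folder and reports which user it ran as.

```python
import getpass
import os
import uuid
from typing import Tuple


def probe_write(target_dir: str) -> Tuple[bool, str]:
    """Try to create and remove a probe file in target_dir; report the result."""
    try:
        user = getpass.getuser()
    except Exception:
        # Service accounts may not expose a user name through the environment.
        user = "unknown"
    probe = os.path.join(target_dir, "probe_" + uuid.uuid4().hex + ".tmp")
    try:
        with open(probe, "wb") as fh:
            fh.write(b"probe")
        os.remove(probe)
    except OSError as exc:
        return False, "user=%s dir=%s error=%s" % (user, target_dir, exc)
    return True, "user=%s dir=%s write OK" % (user, target_dir)
```

The result is only meaningful if the script runs as the service account, for example from a scheduled task configured with that account; run as Admin it will simply succeed, as the manual copies did.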

Current config

I decided to apply a temporary solution to bypass the current problem of archive_command failing.
What I did was create a local folder on both servers:

R:\postgresql\local\archivelog

And use:

archive_command = 'copy "%p" "R:\\postgresql\\local\\archivelog\\%f"'

So both primary and backup could execute archive_command without problems.

That is far from a recommended practice, but it stops archive_command from failing all the time.
As a consequence, PostgreSQL should start removing WAL files from pg_wal, since it only keeps a segment there until archive_command has succeeded for it.
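To verify that the backlog is really draining, one option is to look at pg_wal\archive_status: PostgreSQL creates a .ready file there for each segment waiting to be archived and renames it to .done once archive_command succeeds. A minimal sketch follows; the function name pending_wal_segments is hypothetical and the data directory path must be supplied by whoever runs it. The pg_stat_archiver view gives similar information from SQL.

```python
import os


def pending_wal_segments(data_dir: str) -> list:
    """Return the WAL segment names still marked .ready (not yet archived)."""
    status_dir = os.path.join(data_dir, "pg_wal", "archive_status")
    suffix = ".ready"
    return sorted(
        name[: -len(suffix)]
        for name in os.listdir(status_dir)
        if name.endswith(suffix)
    )
```

If the list keeps shrinking after the config change and stays short afterwards, archiving is working and pg_wal should stop growing.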

I had to restart the primary server to apply that config, sorry for that.

To be done

As I said, this is far from being a good solution.
In my opinion, the best option is the one I already mentioned: map a network drive from one server to the other as Z:\ and use:

archive_command = 'copy "%p" "Z:\\archivelog\\%f"'

We must solve the permission problem to use this solution.

Option #2 for archiving

In case we can't achieve solution #1, I suggest keeping the current configuration and performing the synchronization via scheduled tasks.
So, for example, we would launch rsync R:\postgresql\local\archivelog 10.6.x.3\R:\postgresql\archivelog (warning: the syntax will be wrong, it's a Linux command) to copy archivelogs from one server to the other.
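As an illustration of what such a scheduled task could do, here is a minimal one-way sync sketch in Python. It is an assumption of mine, not a tested replacement for rsync: the function name sync_archivelog, the min_age_seconds safety margin and the .tmp-sync suffix are all hypothetical choices. It copies files that are missing (or differ in size) at the destination, skips files modified very recently so a segment still being written is not picked up, and writes under a temporary name before renaming.

```python
import os
import shutil
import time


def sync_archivelog(src: str, dst: str, min_age_seconds: float = 60.0) -> list:
    """Copy files from src that are missing or size-mismatched in dst."""
    os.makedirs(dst, exist_ok=True)
    now = time.time()
    copied = []
    for name in sorted(os.listdir(src)):
        src_path = os.path.join(src, name)
        dst_path = os.path.join(dst, name)
        if not os.path.isfile(src_path):
            continue
        # Skip files that may still be being written by archive_command.
        if min_age_seconds > 0 and now - os.path.getmtime(src_path) < min_age_seconds:
            continue
        # Already present with the same size: assume it was copied before.
        if os.path.exists(dst_path) and os.path.getsize(dst_path) == os.path.getsize(src_path):
            continue
        # Copy under a temporary name, then rename, so readers never see a partial file.
        tmp_path = dst_path + ".tmp-sync"
        shutil.copy2(src_path, tmp_path)
        os.replace(tmp_path, dst_path)
        copied.append(name)
    return copied
```

Here src would be R:\postgresql\local\archivelog and dst a share on the opposite server. The scheduled task would still need to run under an account that can write to that share, so this moves the permission question to the task account rather than removing it.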

Alternatives for rsync on Windows:

Additional steps for any solution

Archivelog folder cleanup

A scheduled task should be deployed on both the primary and backup servers to keep the size of the archivelog folder under control.
For example, using this solution.
The folder to clean up will be:

R:\postgresql\local\archivelog

Or if we achieve the Z:\ drive solution:

R:\postgresql\archivelog
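A minimal sketch of what the cleanup task could look like, assuming a simple age-based retention. The function name cleanup_archivelog and the max_age_days parameter are hypothetical, and the retention period has to be chosen so that every archived WAL file needed since the oldest base backup we want to restore from is kept. PostgreSQL also ships pg_archivecleanup, which removes segments older than a given WAL file and may be a better fit.

```python
import os
import time


def cleanup_archivelog(folder: str, max_age_days: float, dry_run: bool = True) -> list:
    """Select (and, unless dry_run, delete) files older than max_age_days."""
    cutoff = time.time() - max_age_days * 86400
    selected = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        # Only plain files whose last modification is older than the cutoff.
        if os.path.isfile(path) and os.path.getmtime(path) < cutoff:
            selected.append(name)
            if not dry_run:
                os.remove(path)
    return selected
```

The default dry_run=True only lists what would be removed, so the first scheduled runs can be reviewed before any file is deleted.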