postgresql_disk_filling_report/intervention.md
2023-10-28 10:33:35 +02:00

intervention 20231027

Troubleshoot

My first action was to check the shared drive where archive_command is supposed to send WAL files (archivelog): \\10.6.1.3\archivelog.
That share no longer exists.
I checked both servers for that share and neither of them has it.

First solution implementation

Then I looked for a place to write archivelogs and saw that both servers have an R:\ drive with plenty of space.
So I decided to use a cross copy between the two servers, that is:

  • Primary will copy to backup as: \\10.6.1.3\R$\postgresql
  • Backup will copy to primary as: \\10.6.0.3\R$\postgresql

With this approach each server holds the other's archived WAL files, which is good practice and avoids switchover/failover issues in the future.

Then I tried adding a network drive, mapping the other server's shared R:\ to Z:\ as:

  • primary's Z:\ as: \\10.6.1.3\R$\postgresql
  • backup's Z:\ as: \\10.6.0.3\R$\postgresql

My idea at that stage was to have a single postgresql.conf, because archive_command would be the same on both servers:

archive_command = 'copy "%p" "Z:\\archivelog\\%f"'

This also makes for a clean configuration: a switchover/failover would not require any config changes.

The problem

I performed all my tests on the backup server.

Summary: no matter which command I set in postgresql.conf->archive_command, PostgreSQL reported Permission Denied.

I tried every option I could think of:

  • My preferred solution using Z:\
  • Direct copy to \\10.6.0.3\R$
  • Add a new shared drive on the primary, for example I shared \\10.6.0.3\postgresql
  • Grant permissions to the Windows "Network Service" account
  • Grant permissions to the Windows Everyone group
  • Combinations of the above options

Until I ran out of options.
Of course, when I copied the file via PowerShell as the Admin user, it worked every time.
So I'm sure the problem comes from the account that runs the PostgreSQL service; I have faced similar problems in the past.
The problem is that I'm not a Windows admin and my knowledge here is limited. I tried everything I could think of, but a Windows sysadmin may know how to solve this permission problem.

Current config

It was late for me, so I decided to apply a temporary solution.
I created a local folder on both servers:

R:\postgresql\local\archivelog

And use:

archive_command = 'copy "%p" "R:\\postgresql\\local\\archivelog\\%f"'

So both primary and backup could execute archive_command without problems.

This is far from recommended practice, but it stops archive_command from failing all the time.
As a consequence, PostgreSQL should start removing WAL files from pg_wal.
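
To verify that the backlog is really draining, one option is to count the segments still waiting to be archived. The following is a minimal sketch, not something I ran during the intervention: it relies on PostgreSQL keeping a `<segment>.ready` file in `pg_wal/archive_status` for each segment that archive_command has not yet handled. The function name `pending_wal_segments` and the `data_dir` argument are hypothetical.

```python
from pathlib import Path


def pending_wal_segments(data_dir):
    """Return the sorted names of WAL segments still waiting to be archived."""
    status_dir = Path(data_dir) / "pg_wal" / "archive_status"
    # PostgreSQL creates <segment>.ready for each segment awaiting archiving
    # and renames it to <segment>.done once archive_command has succeeded.
    return sorted(p.name[: -len(".ready")] for p in status_dir.glob("*.ready"))
```

Called with the PostgreSQL data directory of each server, the list should shrink over time now that archive_command succeeds. If it keeps growing, archiving is still failing.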

I had to restart the primary server to apply that config, sorry for that.

To be done

As I said, this is far from a good solution.
In my opinion, the best option is the one I already mentioned: map a network drive from one server to the other as Z:\ and use:

archive_command = 'copy "%p" "Z:\\archivelog\\%f"'

We should investigate permissions for this solution.

Option #2 for archiving

If we can't achieve solution #1, I suggest keeping the current configuration and performing the synchronization via scheduled tasks.
For example, we would launch rsync R:\postgresql\local\archivelog 10.6.x.3\R:\postgresql\archivelog (the syntax is surely wrong, I have never used rsync on Windows...) to copy archivelogs from one server to the other.
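
If rsync turns out to be awkward on Windows, the same one-way synchronization could be done by a small script launched from the scheduled task. This is only a sketch under assumptions: the function name `sync_archive` is hypothetical, and it has not been tested on these servers.

```python
import shutil
from pathlib import Path


def sync_archive(src_dir, dst_dir):
    """Copy files present in src_dir but missing in dst_dir; return their names."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for wal in sorted(src.iterdir()):
        target = dst / wal.name
        if not wal.is_file() or target.exists():
            continue  # skip sub-folders and segments already synchronized
        # Copy under a temporary name, then rename, so a partially copied
        # segment never appears under its final name on the destination.
        tmp = dst / (wal.name + ".part")
        shutil.copy2(wal, tmp)
        tmp.replace(target)
        copied.append(wal.name)
    return copied
```

On the primary the source would be R:\postgresql\local\archivelog and the destination a folder under \\10.6.1.3\R$\postgresql, and the reverse on the backup. The scheduled task would have to run under an account that can write to the remote share, which the Admin user could do in my tests but the PostgreSQL service account could not.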

Alternatives to rsync:

Additional steps for any solution

Archivelog folder cleanup

A scheduled task should be deployed on both the primary and the backup server to keep the size of the archivelog folder under control.
For example, using this solution.
The folder to clean up will be:

R:\postgresql\local\archivelog

Or if we achieve the Z:\ drive solution:

R:\postgresql\archivelog
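
As an illustration of what such a cleanup task could run, here is a minimal sketch that deletes archived files older than a given number of days. The function name `cleanup_archive` and the `max_age_days` parameter are hypothetical, and the retention period is an assumption to be agreed on.

```python
import time
from pathlib import Path


def cleanup_archive(archive_dir, max_age_days, now=None):
    """Delete files older than max_age_days from archive_dir; return their names."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # age limit as a Unix timestamp
    removed = []
    for entry in sorted(Path(archive_dir).iterdir()):
        # Only plain files are considered; sub-folders are left untouched.
        if entry.is_file() and entry.stat().st_mtime < cutoff:
            entry.unlink()
            removed.append(entry.name)
    return removed
```

The retention period should cover at least the time since the oldest base backup that must remain restorable, otherwise the WAL files needed to recover from that backup would be deleted.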

old

I modified postgresql.conf so archive_command is: archive_command = 'copy "%p" "Z:\\archivelog\\%f"'

#archive_command = 'copy "%p" "\\\\10.6.1.3\\\archivelog\\%f"'		# command to use to archive a logfile segment
archive_command = 'copy "%p" "\\\\10.6.1.3\\\R\$\\postgresql\\archivelog\\%f"'
#archive_command = 'copy "%p" "Z:\\archivelog\\%f"'
#archive_command = 'copy "%p" "R:\\postgresql\\local\\archivelog\\%f"'

I tried many options but nothing worked; the failures were related to Windows permissions. I tried copying from PowerShell as the admin user, and the copy from one server to the other worked. I tried adding permissions (as many as I could remember) but nothing worked.

So in the end I decided to archive locally on "R:"

So both servers are archiving into "R:\postgresql\local\archivelog"

I restarted the master instance of PostgreSQL to apply the new setup.