postgresql_disk_filling_report/intervention.md
2023-10-28 09:23:54 +02:00

intervention 20231027

Troubleshoot

My first action was to check the shared drive where archive_command is supposed to send WAL files (archivelog): \\10.6.1.3\archivelog.
That share no longer exists.
I checked both servers for such a share and neither of them has it.

First solution implementation

Then I looked for a place to write the archivelogs and saw that both servers have an R:\ drive with plenty of space.
So I decided to use a cross copy between the two servers, that is:

  • Primary will copy to backup as: \\10.6.1.3\R$\postgresql
  • Backup will copy to primary as: \\10.6.0.3\R$\postgresql

With that approach each server keeps the other's archived WAL files, which is good practice and avoids switchover/failover issues in the future.

Then I tried adding a network drive, mapping the other server's shared R:\ to Z:\ as:

  • primary's Z:\ as: \\10.6.1.3\R$\postgresql
  • backup's Z:\ as: \\10.6.0.3\R$\postgresql

My idea at that stage was to have a single postgresql.conf, because archive_command would be the same on both servers:

archive_command = 'copy "%p" "Z:\\archivelog\\%f"'

This setting also gives a clean configuration: we would not need any config changes on switchover/failover.

The problem

I performed all my tests on the backup server.

Summary: no matter which command I set in archive_command (postgresql.conf), PostgreSQL reported Permission Denied.

I tried all the options I could think of:

  • My preferred solution using Z:\
  • Direct copy to \\10.6.0.3\R$
  • Adding a new share on the primary, for example \\10.6.0.3\postgresql
  • Granting permissions to the Network Service Windows "user"
  • Granting permissions to the Everyone Windows group
  • Combinations of the above options

Until I ran out of options.
Of course, when I copied the file via PowerShell as the Admin user, it worked every time.
So I'm sure the problem comes from the user that runs the PostgreSQL service; I have faced similar problems in the past.
I'm not a Windows admin and my knowledge here is limited. I tried everything I could think of, but a Windows sysadmin may know how to solve this permission problem.
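To help whoever picks up the permission investigation, here is a minimal sketch of a probe that could stand in for the plain copy inside archive_command. Because PostgreSQL itself launches archive_command, the probe runs under the same account as the service and can record which account that is and the exact error it gets. The script name, the log path and the way the account name is read (from the USERNAME/USER environment variables) are assumptions for illustration, not something that was tested on these servers.

```python
import os
import shutil
import sys
from pathlib import Path


def archive_segment(src, dest_dir, log_path):
    """Copy one WAL segment into dest_dir.

    Returns 0 on success and 1 on failure, so PostgreSQL retries the segment.
    On failure, the account name and the error are appended to log_path.
    """
    src = Path(src)
    target = Path(dest_dir) / src.name
    try:
        # Refuse to overwrite a segment that was already archived.
        if target.exists():
            raise FileExistsError(f"{target} already exists")
        shutil.copy2(src, target)
        return 0
    except OSError as exc:
        # Assumption: the service environment exposes the account name here.
        user = os.environ.get("USERNAME") or os.environ.get("USER") or "unknown"
        with open(log_path, "a", encoding="utf-8") as log:
            log.write(f"user={user} target={target} error={exc!r}\n")
        return 1


# Only runs when called with exactly three arguments: <%p> <dest_dir> <log_path>
if __name__ == "__main__" and len(sys.argv) == 4:
    sys.exit(archive_segment(sys.argv[1], sys.argv[2], sys.argv[3]))
```

A hypothetical way to wire it in would be archive_command = 'python "C:\\scripts\\archive_probe.py" "%p" "Z:\\archivelog" "R:\\postgresql\\archive_probe.log"', where the script location and log file are placeholders. The log written under the service account could then be compared with the same copy run as the Admin user.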

Current config

It was late for me, so I decided on a temporary solution.
I created a local folder on both servers:

R:\postgresql\local\archivelog

And use:

archive_command = 'copy "%p" "R:\\postgresql\\local\\archivelog\\%f"'

So both primary and backup can execute archive_command without problems.

That is far from a recommended practice, but it stops archive_command from failing all the time.
As a consequence, PostgreSQL should start removing WAL files from pg_wal.
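One way to confirm that the backlog is draining: PostgreSQL marks every segment still waiting for archive_command with a `.ready` file under pg_wal/archive_status, and renames it to `.done` once the command succeeds. The sketch below lists the pending segments; the data directory path passed to it is an assumption and must be adjusted to the actual installation.

```python
from pathlib import Path


def pending_segments(data_dir):
    """Return the names of WAL segments still waiting to be archived."""
    status_dir = Path(data_dir) / "pg_wal" / "archive_status"
    # Each "<segment>.ready" marker means archive_command has not yet
    # succeeded for that segment.
    return sorted(p.stem for p in status_dir.glob("*.ready"))
```

Calling it a few minutes apart, for example with a hypothetical data directory such as pending_segments(r"D:\pgdata"), should return a shrinking list while the archiver catches up.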

I had to restart the primary server to apply that config, sorry for that.

To be done

As I said, this is far from a good solution.
In my opinion, the best option is the one I already mentioned: map a network drive from one server to the other as Z:\ and use:

archive_command = 'copy "%p" "Z:\\archivelog\\%f"'

We should investigate permissions for this solution.

Option #2 for archiving

If we can't achieve solution #1, I suggest keeping the current configuration and performing the synchronization via scheduled tasks.
For example, we would launch rsync R:\postgresql\local\archivelog 10.6.x.3\R:\postgresql\archivelog (the syntax is surely wrong, I have never used rsync on Windows) to copy the archivelogs from each server to the opposite one.
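As an alternative to rsync, the one-way synchronization could be sketched in a few lines of Python and launched from a scheduled task. This is only an illustration under assumptions: the function name is hypothetical, the task would have to run under an account that can write to the remote share (the copy did work as the Admin user), and the sketch does not guard against picking up a segment that archive_command is still writing.

```python
import shutil
from pathlib import Path


def sync_archive(local_dir, remote_dir):
    """Copy files that exist in local_dir but not in remote_dir.

    Returns the list of file names copied during this run.
    """
    local_dir, remote_dir = Path(local_dir), Path(remote_dir)
    copied = []
    for src in sorted(local_dir.iterdir()):
        if not src.is_file():
            continue
        target = remote_dir / src.name
        if target.exists():
            continue  # already synchronized, never overwrite
        # Copy under a temporary name first so a half-copied file never
        # appears under the final segment name.
        tmp = target.with_name(target.name + ".tmp")
        shutil.copy2(src, tmp)
        tmp.replace(target)
        copied.append(src.name)
    return copied
```

On the primary, a call could look like sync_archive(r"R:\postgresql\local\archivelog", r"\\10.6.1.3\R$\postgresql\archivelog"), with the addresses swapped on the backup; the remote folder is taken from the cross-copy layout described earlier and would need to exist beforehand.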

old

I modified postgresql.conf so archive_command is: archive_command = 'copy "%p" "Z:\\archivelog\\%f"'

#archive_command = 'copy "%p" "\\\\10.6.1.3\\\archivelog\\%f"'		# command to use to archive a logfile segment
archive_command = 'copy "%p" "\\\\10.6.1.3\\\R\$\\postgresql\\archivelog\\%f"'
#archive_command = 'copy "%p" "Z:\\archivelog\\%f"'
#archive_command = 'copy "%p" "R:\\postgresql\\local\\archivelog\\%f"'

I tried many options but nothing worked; it was related to Windows permissions. I tried copying from PowerShell as the admin user, and the copy from one server to the other worked. I tried adding permissions (as many as I could remember) but nothing worked.

So in the end I decided to archive locally on "R:".

So both servers are archiving into "R:\postgresql\local\archivelog".

I restarted the master instance of PostgreSQL to apply the new setup.