AirflowException: Celery command failed - The recorded hostname does not match this instance's hostname

Kyle Bridenstine

I'm running Airflow in a clustered environment on two AWS EC2 instances: one for the master and one for the worker. The worker node periodically throws this error while running "airflow worker":

[2018-08-09 16:15:43,553] {jobs.py:2574} WARNING - The recorded hostname ip-1.2.3.4 does not match this instance's hostname ip-1.2.3.4.eco.tanonprod.comanyname.io
Traceback (most recent call last):
  File "/usr/bin/airflow", line 27, in <module>
    args.func(args)
  File "/usr/local/lib/python3.6/site-packages/airflow/bin/cli.py", line 387, in run
    run_job.run()
  File "/usr/local/lib/python3.6/site-packages/airflow/jobs.py", line 198, in run
    self._execute()
  File "/usr/local/lib/python3.6/site-packages/airflow/jobs.py", line 2527, in _execute
    self.heartbeat()
  File "/usr/local/lib/python3.6/site-packages/airflow/jobs.py", line 182, in heartbeat
    self.heartbeat_callback(session=session)
  File "/usr/local/lib/python3.6/site-packages/airflow/utils/db.py", line 50, in wrapper
    result = func(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/airflow/jobs.py", line 2575, in heartbeat_callback
    raise AirflowException("Hostname of job runner does not match")
airflow.exceptions.AirflowException: Hostname of job runner does not match
[2018-08-09 16:15:43,671] {celery_executor.py:54} ERROR - Command 'airflow run arl_source_emr_test_dag runEmrStep2WaiterTask 2018-08-07T00:00:00 --local -sd /var/lib/airflow/dags/arl_source_emr_test_dag.py' returned non-zero exit status 1.
[2018-08-09 16:15:43,681: ERROR/ForkPoolWorker-30] Task airflow.executors.celery_executor.execute_command[875a4da9-582e-4c10-92aa-5407f3b46d5f] raised unexpected: AirflowException('Celery command failed',)
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/airflow/executors/celery_executor.py", line 52, in execute_command
    subprocess.check_call(command, shell=True)
  File "/usr/lib64/python3.6/subprocess.py", line 291, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'airflow run arl_source_emr_test_dag runEmrStep2WaiterTask 2018-08-07T00:00:00 --local -sd /var/lib/airflow/dags/arl_source_emr_test_dag.py' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/dist-packages/celery/app/trace.py", line 382, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/usr/lib/python3.6/dist-packages/celery/app/trace.py", line 641, in __protected_call__
    return self.run(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/airflow/executors/celery_executor.py", line 55, in execute_command
    raise AirflowException('Celery command failed')
airflow.exceptions.AirflowException: Celery command failed

When this error occurs, Airflow marks the task as failed, which in turn fails my DAG even though nothing actually went wrong in the task.

I'm using Redis as my queue and PostgreSQL as my metadata database. Both are external AWS services. All of this runs in my company's environment, which is why the full name of the server is ip-1.2.3.4.eco.tanonprod.comanyname.io. It looks like Airflow expects this full name somewhere, but I have no idea where to fix the value so that it records ip-1.2.3.4.eco.tanonprod.comanyname.io instead of just ip-1.2.3.4.

The really weird thing about this issue is that it doesn't always happen. It occurs sporadically when I run a DAG, and it affects all of my DAGs, not just one. I find that strange, because it means other task runs resolve the hostname just fine.

Note: I've changed the real IP address to 1.2.3.4 for privacy reasons.

Answer:

https://github.com/apache/incubator-airflow/pull/2484

This is exactly the problem I am having, and other Airflow users on AWS EC2 instances are experiencing it as well.

cwurtz

The hostname is recorded when the task instance runs, via self.hostname = socket.getfqdn(), where socket is Python's standard-library socket module.

The comparison that triggers this error is:

fqdn = socket.getfqdn()
if fqdn != ti.hostname:
    logging.warning("The recorded hostname {ti.hostname} "
        "does not match this instance's hostname "
        "{fqdn}".format(**locals()))
    raise AirflowException("Hostname of job runner does not match")
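To see why the log in the question trips this check, the comparison can be reproduced in isolation. This is only a minimal sketch: hostname_matches is a hypothetical helper, not part of Airflow, and the two hostnames are the ones from the warning line in the question.

```python
import socket
from typing import Optional


def hostname_matches(recorded: str, current: Optional[str] = None) -> bool:
    """Return True when the recorded hostname equals the current FQDN.

    Hypothetical helper for illustration only; it is not an Airflow API.
    """
    if current is None:
        # Same call Airflow uses in the excerpt above.
        current = socket.getfqdn()
    # Strict string equality with no normalisation, so a short name and
    # its fully qualified form are treated as different hosts.
    return recorded == current


# Values taken from the warning in the question.
recorded = "ip-1.2.3.4"
current = "ip-1.2.3.4.eco.tanonprod.comanyname.io"
print(hostname_matches(recorded, current))  # False
```

Because the check is a plain string comparison, a run where socket.getfqdn() returns the short name at record time and the fully qualified name at heartbeat time is enough to fail the task.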

It seems like the hostname on the EC2 instance is changing while the worker is running. Try setting the hostname manually, as described at https://forums.aws.amazon.com/thread.jspa?threadID=246906, and see if that sticks.
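One way to check whether the name really is changing is to poll socket.getfqdn() on the worker and collect the distinct values it returns. The following is a rough diagnostic sketch, not something from Airflow: sample_fqdn and its parameters are hypothetical names, and the resolver argument exists only so the function can be exercised without touching the real hostname.

```python
import socket
import time
from typing import Callable, Set


def sample_fqdn(samples: int = 10,
                interval: float = 1.0,
                resolver: Callable[[], str] = socket.getfqdn) -> Set[str]:
    """Call the resolver repeatedly and return the distinct names seen.

    Hypothetical diagnostic helper; more than one name in the result
    suggests the hostname is not stable on this machine.
    """
    seen: Set[str] = set()
    for i in range(samples):
        seen.add(resolver())
        # Sleep between samples, but not after the last one.
        if i < samples - 1:
            time.sleep(interval)
    return seen
```

Running sample_fqdn() on the worker over a window long enough to cover a failing task run, and finding both ip-1.2.3.4 and the fully qualified name in the result, would support the theory that the hostname flips while the worker is running.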
