How to solve a Python package conflict in Cloud Composer
Background
Google Cloud Composer is a fully managed workflow orchestration service built on Apache Airflow.
Airflow is a monolithic Python application, so every PyPI package a data pipeline (DAG) needs must be installed in the same Python environment as Airflow itself. For example, if a pipeline interacts with a third-party service such as Salesforce, a package such as simple-salesforce has to be installed alongside Airflow.
Usually this is not a problem. However, when a pipeline needs a very old (or very new) version of a package, that package's dependencies can conflict with the versions already installed alongside Airflow.
Issue
After following the guide to install a Python package (simple-salesforce==0.74.3), the Cloud Composer environment update failed with this error:
UPDATE operation on this environment failed 22 minutes ago with the following error message:
An error occurred before the new web server image has been created.
Analysis
Google documents a troubleshooting process for this issue. Following it, I learned that Cloud Composer uses Cloud Build to build a Docker image that installs the required Python packages on top of the Composer Airflow base image. Digging into the Cloud Build logs, I found the root cause in the output of pip's dependency check:
python3 -m pip check
requests 2.22.0 has requirement urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1, but you have urllib3 1.26.3.
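To see why the check fails, here is a minimal plain-Python sketch that tests urllib3 versions against the range requests 2.22.0 declares (at least 1.21.1, below 1.26, excluding 1.25.0 and 1.25.1). It assumes versions are simple dotted integers and ignores the rest of PEP 440 (pre-releases, epochs, wildcards, `~=`); `parse_version` and `satisfies` are hypothetical helpers, and real tooling should rely on pip or the `packaging` library instead.

```python
import operator

# Comparison operators, longest first so ">=" is not mistaken for ">".
_OPERATORS = (
    ("==", operator.eq),
    ("!=", operator.ne),
    ("<=", operator.le),
    (">=", operator.ge),
    ("<", operator.lt),
    (">", operator.gt),
)


def parse_version(text: str) -> tuple:
    """Turn "1.25.0" into (1, 25); trailing zeros are dropped so 1.25 == 1.25.0."""
    parts = [int(piece) for piece in text.strip().split(".")]
    while parts and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def satisfies(version: str, specifier: str) -> bool:
    """Return True if `version` meets every comma-separated clause in `specifier`."""
    current = parse_version(version)
    for clause in specifier.split(","):
        clause = clause.strip()
        for symbol, compare in _OPERATORS:
            if clause.startswith(symbol):
                if not compare(current, parse_version(clause[len(symbol):])):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


# The urllib3 range requests 2.22.0 declares, as reported by pip check.
REQUESTS_2_22_0_URLLIB3 = "!=1.25.0,!=1.25.1,<1.26,>=1.21.1"

if __name__ == "__main__":
    for candidate in ("1.26.3", "1.25.4"):
        verdict = "OK" if satisfies(candidate, REQUESTS_2_22_0_URLLIB3) else "conflict"
        print(f"urllib3 {candidate}: {verdict}")
```

Run as a script, it reports 1.26.3 as a conflict (it fails the `<1.26` clause) and 1.25.4 as OK, which is why pinning a 1.25.x release other than 1.25.0 or 1.25.1 clears the check.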
Solution
I added urllib3==1.25.4 to the environment's list of PyPI dependencies. This downgraded the pre-installed urllib3 from 1.26.3 to a version inside the range requests 2.22.0 accepts, and after that simple-salesforce==0.74.3 installed successfully.
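The same fix can be scripted. This is a minimal sketch, assuming the pins are applied with a requirements-style file passed to `gcloud composer environments update --update-pypi-packages-from-file`; the environment name and location are hypothetical placeholders, the helper names are illustrative, and the command is only printed unless `dry_run=False`. Whether the file replaces or extends the environment's existing package list is worth confirming in the gcloud reference before a real run.

```python
import shlex
import subprocess
from typing import Dict, List, Optional

# Pins from this post: the package the DAG needs plus the downgraded urllib3.
PINS: Dict[str, str] = {
    "simple-salesforce": "0.74.3",
    "urllib3": "1.25.4",  # inside requests 2.22.0's range: <1.26, not 1.25.0/1.25.1
}


def format_requirements(pins: Dict[str, str]) -> str:
    """Render pins in requirements.txt style, one "name==version" per line."""
    return "".join(f"{name}=={version}\n" for name, version in sorted(pins.items()))


def build_update_command(environment: str, location: str, requirements_path: str) -> List[str]:
    """Assemble a gcloud call that installs the PyPI packages listed in a file."""
    return [
        "gcloud", "composer", "environments", "update", environment,
        "--location", location,
        "--update-pypi-packages-from-file", requirements_path,
    ]


def run_update(command: List[str], dry_run: bool = True) -> Optional[subprocess.CompletedProcess]:
    """Print the command; only execute it when dry_run is False."""
    print(shlex.join(command))
    if dry_run:
        return None
    return subprocess.run(command, check=True)


if __name__ == "__main__":
    # Save this text as requirements.txt before doing a real (non-dry) run.
    print(format_requirements(PINS), end="")
    # "my-composer-env" and "us-central1" are hypothetical placeholders.
    run_update(build_update_command("my-composer-env", "us-central1", "requirements.txt"))
```

Keeping the update in a dry-run by default is deliberate: an environment update rebuilds the image through Cloud Build and can take a long time, so it helps to review the exact command first.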
Of course, I also ran regression tests to make sure the existing DAGs still worked with the downgraded urllib3 version.
Key learning
When installing Python packages in Cloud Composer fails, check the Cloud Build logs for the actual error (here, a failed pip check), then resolve the dependency conflict, for example by pinning a version that satisfies every package's requirements.
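Scanning a long build log by eye is tedious, so a small filter can help. This is a minimal sketch, assuming the log contains pip check messages in the "has requirement ..., but you have ..." form quoted in the Analysis section; other pip messages (such as missing packages) are ignored, and `Conflict` and `find_conflicts` are hypothetical names.

```python
import re
from typing import List, NamedTuple


class Conflict(NamedTuple):
    package: str     # package whose requirement is violated, e.g. "requests"
    version: str     # its installed version, e.g. "2.22.0"
    dependency: str  # the dependency it constrains, e.g. "urllib3"
    specifier: str   # the version range it needs, e.g. "<1.26,>=1.21.1"
    installed: str   # the dependency version actually installed


# Matches only the "has requirement ..., but you have ..." wording; the search is
# unanchored at the start so prefixes added by the build log are tolerated.
_PATTERN = re.compile(
    r"(?P<package>\S+) (?P<version>\S+) has requirement "
    r"(?P<dependency>[A-Za-z0-9][A-Za-z0-9._-]*)(?P<specifier>\S*?), "
    r"but you have \S+ (?P<installed>[^\s,]+?)\.?$"
)


def find_conflicts(log_text: str) -> List[Conflict]:
    """Return every pip check conflict found in a block of build-log text."""
    conflicts = []
    for line in log_text.splitlines():
        match = _PATTERN.search(line.strip())
        if match:
            conflicts.append(Conflict(**match.groupdict()))
    return conflicts


if __name__ == "__main__":
    sample = (
        "python3 -m pip check\n"
        "requests 2.22.0 has requirement urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1, "
        "but you have urllib3 1.26.3.\n"
    )
    for c in find_conflicts(sample):
        print(f"{c.package} {c.version} needs {c.dependency}{c.specifier}, found {c.installed}")
```

Each `Conflict` names the dependency to pin and the range the pin must fall in, which is exactly the information needed to pick a version like urllib3 1.25.4.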