I'm wondering how to pre-build the worker container while also using a setup.py file for multiple-file dependencies.
Even when I used the official template for this, I still got the Insight: "SDK worker container image pre-building: can be enabled". Is it a bug?
When you specify additional dependencies for a Python pipeline using flags like `--requirements_file`, `--setup_file`, or `--extra_package`, you can pre-build a container image and install these dependencies before starting a Dataflow Python worker. This is accomplished by supplying the `--prebuild_sdk_container_image` pipeline option; see: https://cloud.google.com/dataflow/docs/guides/build-container-image#pre-build_a_container_image_when_submitting_the_job . This option can be helpful for users who want to optimize worker startup time but don't want to build their own custom container manually.

However, when you use a custom container image, it is better to install the necessary dependencies directly in the image when you build it: https://cloud.google.com/dataflow/docs/guides/build-container-image#preinstall_using_a_dockerfile . If you do that, you no longer need to supply the pipeline options that install dependencies at runtime, and the pre-building workflow is not necessary.
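For illustration, a submission using pre-building might look like the sketch below. The project, region, bucket, and script names are placeholders; the pre-building flag name is taken from the answer above, so verify the exact options against the linked Dataflow page and your Beam SDK version.

```shell
# Hypothetical Dataflow submission with SDK container pre-building enabled.
# All resource names below are placeholders, not real values.
python my_pipeline.py \
  --runner=DataflowRunner \
  --project=my-project \
  --region=us-central1 \
  --temp_location=gs://my-bucket/tmp \
  --setup_file=./setup.py \
  --prebuild_sdk_container_image
```

With this, the dependencies declared in setup.py are installed into the worker image once at submission time instead of on every worker at startup.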
Custom containers provide more control over image customization and result in a more reproducible runtime environment compared to container pre-building.
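A minimal Dockerfile for that approach might look like the sketch below, following the pattern in the linked preinstall guide. The base image tag, file names, and paths are assumptions; pick the tag matching your Beam SDK and Python versions.

```dockerfile
# Hypothetical custom worker image; tag and paths are placeholders.
FROM apache/beam_python3.10_sdk:2.48.0

# Preinstall third-party dependencies so workers don't install them at startup.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Install the pipeline package itself, so --setup_file is not needed at runtime.
COPY . /pipeline
RUN pip install --no-cache-dir /pipeline
```

Building and pushing this image, then passing it with `--sdk_container_image`, replaces both the runtime dependency installation and the pre-building workflow.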
If the pipeline package is installed in the custom container image, supplying the `--setup_file` option is not necessary, unless you have made changes locally that are not yet reflected in the custom image. If you omit `--setup_file`, the insight will not be shown in the next observation period (the next day).