Bazel supports scaling out builds with a remote execution system. Unfortunately, it is very easy for ruleset authors to release rules that work when executed locally but fail when executed remotely. This blog post explains how ruleset authors can set up a simple remote execution system to verify that their rulesets work correctly.
The Bazel team has created a simple remote execution system named Buildfarm. While it isn’t as full-featured as other systems such as BuildBuddy, it is free and easy to get started with. Buildfarm makes an excellent base for testing our rulesets against remote executors.
While Buildfarm provides some examples that can be used to start up a remote build cluster locally, I found it easier to use docker-compose. With a little bit of docker-compose and buildfarm configuration magic, starting a Bazel remote execution cluster becomes as easy as running docker compose up.
To create a minimal buildfarm cluster, we need to create a docker-compose composition with three containers:
- A container named redis, which contains the necessary Redis instance
- A container named server, which contains the buildfarm server
- A container named worker, which contains the single buildfarm worker
The server and worker need a buildfarm configuration file, which we create as config/config.yml:
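As a rough sketch, a minimal config/config.yml consistent with the settings echoed in the startup log further down (redisUri=redis://redis:6379, a single cpu queue, and a worker publicName of worker:8981) might look like the following. The key names mirror the fields printed by BuildfarmConfigs in that log; the exact layout is an assumption and should be checked against the Buildfarm documentation for the version in use.

```yaml
# Sketch only: values are taken from the startup log in this post;
# the file layout is an assumption, not the author's original listing.
backplane:
  # "redis" resolves to the Redis container on the compose network.
  redisUri: "redis://redis:6379"
  queues:
    - name: "cpu"
      properties:
        - name: "min-cores"
          value: "*"
        - name: "max-cores"
          value: "*"
worker:
  # Address the server uses to reach the worker inside the compose network.
  publicName: "worker:8981"
```

Everything not listed here falls back to Buildfarm's defaults, which is why the log shows many more settings than this file contains.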
Then we can create a docker-compose.yml:
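A docker-compose.yml along the following lines would produce the three containers described above. This is a hedged sketch rather than the original listing: the image names, the container paths used for the config mount, and passing the config file path as the container command are all assumptions that should be verified against the Buildfarm image documentation. The Redis tag matches the version reported in the log output.

```yaml
# Sketch only: image names, mount paths, and commands are assumptions.
services:
  redis:
    image: redis:5.0.9

  server:
    image: bazelbuild/buildfarm-server        # assumed image name
    depends_on:
      - redis
    ports:
      - "8980:8980"                            # port Bazel connects to
    volumes:
      - ./config:/var/lib/buildfarm-server:ro  # assumed container path
    command: ["/var/lib/buildfarm-server/config.yml"]

  worker:
    image: bazelbuild/buildfarm-worker        # assumed image name
    depends_on:
      - redis
      - server
    volumes:
      - ./config:/var/lib/buildfarm-shard-worker:ro  # assumed container path
    command: ["/var/lib/buildfarm-shard-worker/config.yml"]
```

Only the server's port 8980 needs to be published to the host, since Bazel talks to the server and the server reaches the worker over the compose network.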
We can then start up the buildfarm cluster with a simple docker compose up, which looks like:
sengelha@P-DV-LAP-SENG:~/build-server % docker compose up
[+] Running 3/0
⠿ Container build-server-redis-1 Created 0.0s
⠿ Container build-server-server-1 Created 0.0s
⠿ Container build-server-worker-1 Created 0.0s
Attaching to build-server-redis-1, build-server-server-1, build-server-worker-1
build-server-redis-1 | 1:C 07 Mar 2023 16:07:53.677 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
build-server-redis-1 | 1:C 07 Mar 2023 16:07:53.677 # Redis version=5.0.9, bits=64, commit=00000000, modified=0, pid=1, just started
build-server-redis-1 | 1:C 07 Mar 2023 16:07:53.677 # Warning: no config file specified, using the default config. In order to specify a config file use redis-server /path/to/redis.conf
build-server-redis-1 | 1:M 07 Mar 2023 16:07:53.678 * Running mode=standalone, port=6379.
build-server-redis-1 | 1:M 07 Mar 2023 16:07:53.678 # Server initialized
build-server-redis-1 | 1:M 07 Mar 2023 16:07:53.678 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
build-server-redis-1 | 1:M 07 Mar 2023 16:07:53.678 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
build-server-redis-1 | 1:M 07 Mar 2023 16:07:53.702 * DB loaded from disk: 0.024 seconds
build-server-redis-1 | 1:M 07 Mar 2023 16:07:53.702 * Ready to accept connections
build-server-server-1 | Mar 07, 2023 4:07:54 PM build.buildfarm.common.config.BuildfarmConfigs loadConfigs
build-server-server-1 | INFO: BuildfarmConfigs(digestFunction=SHA256, defaultActionTimeout=600, maximumActionTimeout=3600, maxEntrySizeBytes=2147483648, prometheusPort=9090, server=Server(instanceType=SHARD, name=server, actionCacheReadOnly=false, port=8980, grpcMetrics=GrpcMetrics(enabled=false, provideLatencyHistograms=false), casWriteTimeout=3600, bytestreamTimeout=3600, sslCertificatePath=null, sslPrivateKeyPath=null, runDispatchedMonitor=true, dispatchedMonitorIntervalSeconds=1, runOperationQueuer=true, ensureOutputsPresent=false, maxRequeueAttempts=5, useDenyList=true, grpcTimeout=3600, executeKeepaliveAfterSeconds=60, recordBesEvents=false, admin=Admin(deploymentEnvironment=null, clusterEndpoint=null, enableGracefulShutdown=false), metrics=Metrics(publisher=LOG, logLevel=FINEST, topic=null, topicMaxConnections=0, secretName=null), maxCpu=0, clusterId=, cloudRegion=null, publicName=172.18.0.3:8980), backplane=Backplane(type=SHARD, redisUri=redis://redis:6379, jedisPoolMaxTotal=4000, workersHashName=Workers, workerChannel=WorkerChannel, actionCachePrefix=ActionCache, actionCacheExpire=2419200, actionBlacklistPrefix=ActionBlacklist, actionBlacklistExpire=3600, invocationBlacklistPrefix=InvocationBlacklist, operationPrefix=Operation, operationExpire=604800, preQueuedOperationsListName={Arrival}:PreQueuedOperations, processingListName={Arrival}:ProcessingOperations, processingPrefix=Processing, processingTimeoutMillis=20000, queuedOperationsListName={Execution}:QueuedOperations, dispatchingPrefix=Dispatching, dispatchingTimeoutMillis=10000, dispatchedOperationsHashName=DispatchedOperations, operationChannelPrefix=OperationChannel, casPrefix=ContentAddressableStorage, casExpire=604800, subscribeToBackplane=true, runFailsafeOperation=true, maxQueueDepth=100000, maxPreQueueDepth=1000000, priorityQueue=false, queues=[Queue(name=cpu, allowUnmatched=true, properties=[Property(name=min-cores, value=*), Property(name=max-cores, value=*)])], redisPassword=null, 
timeout=10000, redisNodes=[], maxAttempts=20, cacheCas=false), worker=Worker(port=8981, grpcMetrics=GrpcMetrics(enabled=false, provideLatencyHistograms=false), publicName=worker:8981, capabilities=Capabilities(cas=true, execution=true), root=/tmp/worker, inlineContentLimit=1048567, operationPollPeriod=1, dequeueMatchSettings=DequeueMatchSettings(acceptEverything=true, allowUnmatched=false), storages=[Cas(type=FILESYSTEM, path=cache, hexBucketLevels=0, maxSizeBytes=0, fileDirectoriesIndexInMemory=false, skipLoad=false, execRootCopyFallback=false, target=null)], executeStageWidth=16, executeStageWidthOffset=0, inputFetchStageWidth=3, inputFetchDeadline=60, linkInputDirectories=true, realInputDirectories=[external], execOwner=null, defaultMaxCores=0, limitGlobalExecution=false, onlyMulticoreTests=false, allowBringYourOwnContainer=false, errorOperationRemainingResources=false, executionPolicies=[]), executionWrappers=ExecutionWrappers(cgroups=/usr/bin/cgexec, unshare=/usr/bin/unshare, linuxSandbox=/app/build_buildfarm/linux-sandbox, asNobody=/app/build_buildfarm/as-nobody, processWrapper=/app/build_buildfarm/process-wrapper, skipSleep=/app/build_buildfarm/skip_sleep, skipSleepPreload=/app/build_buildfarm/skip_sleep_preload.so, delay=/app/build_buildfarm/delay.sh))
build-server-server-1 | SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
build-server-server-1 | SLF4J: Defaulting to no-operation (NOP) logger implementation
build-server-server-1 | SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
build-server-server-1 |
build-server-server-1 | . ____ _ __ _ _
build-server-server-1 | /\\ / ___'_ __ _ _(_)_ __ __ _ \ \ \ \
build-server-server-1 | ( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \
build-server-server-1 | \\/ ___)| |_)| | | | | || (_| | ) ) ) )
build-server-server-1 | ' |____| .__|_| |_|_| |_\__, | / / / /
build-server-server-1 | =========|_|==============|___/=/_/_/_/
build-server-server-1 | :: Spring Boot :: (v2.7.4)
build-server-server-1 |
build-server-worker-1 | Mar 07, 2023 4:07:55 PM build.buildfarm.common.config.BuildfarmConfigs loadConfigs
build-server-worker-1 | INFO: BuildfarmConfigs(digestFunction=SHA256, defaultActionTimeout=600, maximumActionTimeout=3600, maxEntrySizeBytes=2147483648, prometheusPort=9090, server=Server(instanceType=SHARD, name=server, actionCacheReadOnly=false, port=8980, grpcMetrics=GrpcMetrics(enabled=false, provideLatencyHistograms=false), casWriteTimeout=3600, bytestreamTimeout=3600, sslCertificatePath=null, sslPrivateKeyPath=null, runDispatchedMonitor=true, dispatchedMonitorIntervalSeconds=1, runOperationQueuer=true, ensureOutputsPresent=false, maxRequeueAttempts=5, useDenyList=true, grpcTimeout=3600, executeKeepaliveAfterSeconds=60, recordBesEvents=false, admin=Admin(deploymentEnvironment=null, clusterEndpoint=null, enableGracefulShutdown=false), metrics=Metrics(publisher=LOG, logLevel=FINEST, topic=null, topicMaxConnections=0, secretName=null), maxCpu=0, clusterId=, cloudRegion=null, publicName=172.18.0.4:8980), backplane=Backplane(type=SHARD, redisUri=redis://redis:6379, jedisPoolMaxTotal=4000, workersHashName=Workers, workerChannel=WorkerChannel, actionCachePrefix=ActionCache, actionCacheExpire=2419200, actionBlacklistPrefix=ActionBlacklist, actionBlacklistExpire=3600, invocationBlacklistPrefix=InvocationBlacklist, operationPrefix=Operation, operationExpire=604800, preQueuedOperationsListName={Arrival}:PreQueuedOperations, processingListName={Arrival}:ProcessingOperations, processingPrefix=Processing, processingTimeoutMillis=20000, queuedOperationsListName={Execution}:QueuedOperations, dispatchingPrefix=Dispatching, dispatchingTimeoutMillis=10000, dispatchedOperationsHashName=DispatchedOperations, operationChannelPrefix=OperationChannel, casPrefix=ContentAddressableStorage, casExpire=604800, subscribeToBackplane=true, runFailsafeOperation=true, maxQueueDepth=100000, maxPreQueueDepth=1000000, priorityQueue=false, queues=[Queue(name=cpu, allowUnmatched=true, properties=[Property(name=min-cores, value=*), Property(name=max-cores, value=*)])], redisPassword=null, 
timeout=10000, redisNodes=[], maxAttempts=20, cacheCas=false), worker=Worker(port=8981, grpcMetrics=GrpcMetrics(enabled=false, provideLatencyHistograms=false), publicName=worker:8981, capabilities=Capabilities(cas=true, execution=true), root=/tmp/worker, inlineContentLimit=1048567, operationPollPeriod=1, dequeueMatchSettings=DequeueMatchSettings(acceptEverything=true, allowUnmatched=false), storages=[Cas(type=FILESYSTEM, path=cache, hexBucketLevels=0, maxSizeBytes=242484730675, fileDirectoriesIndexInMemory=false, skipLoad=false, execRootCopyFallback=false, target=null)], executeStageWidth=16, executeStageWidthOffset=0, inputFetchStageWidth=3, inputFetchDeadline=60, linkInputDirectories=true, realInputDirectories=[external], execOwner=null, defaultMaxCores=0, limitGlobalExecution=false, onlyMulticoreTests=false, allowBringYourOwnContainer=false, errorOperationRemainingResources=false, executionPolicies=[]), executionWrappers=ExecutionWrappers(cgroups=/usr/bin/cgexec, unshare=/usr/bin/unshare, linuxSandbox=/app/build_buildfarm/linux-sandbox, asNobody=/app/build_buildfarm/as-nobody, processWrapper=/app/build_buildfarm/process-wrapper, skipSleep=/app/build_buildfarm/skip_sleep, skipSleepPreload=/app/build_buildfarm/skip_sleep_preload.so, delay=/app/build_buildfarm/delay.sh))
build-server-worker-1 | SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
build-server-worker-1 | SLF4J: Defaulting to no-operation (NOP) logger implementation
build-server-worker-1 | SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
build-server-worker-1 |
build-server-worker-1 | . ____ _ __ _ _
build-server-worker-1 | /\\ / ___'_ __ _ _(_)_ __ __ _ \ \ \ \
build-server-worker-1 | ( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \
build-server-worker-1 | \\/ ___)| |_)| | | | | || (_| | ) ) ) )
build-server-worker-1 | ' |____| .__|_| |_|_| |_\__, | / / / /
build-server-worker-1 | =========|_|==============|___/=/_/_/_/
build-server-worker-1 | :: Spring Boot :: (v2.7.4)
build-server-worker-1 |
build-server-server-1 | [2023-03-07 16:07:56.155] - 1 INFO [main] --- build.buildfarm.server.BuildFarmServer: buildfarm-server-172.18.0.3:8980-ba21b9c9-b46f-47ca-a966-b4ff73cf4d9e initialized
build-server-worker-1 | [2023-03-07 16:07:56.269] - 7 WARNING [main] --- build.buildfarm.admin.aws.AwsAdmin: Missing cloudRegion configuration. AWS Admin will not be enabled.
build-server-server-1 | [2023-03-07 16:07:56.360] - 1 INFO [Thread-1] --- build.buildfarm.instance.shard.DispatchedMonitor: DispatchedMonitor: Running
build-server-server-1 | [2023-03-07 16:07:56.441] - 1 INFO [main] --- build.buildfarm.metrics.prometheus.PrometheusPublisher: Started Prometheus HTTP Server on port 9090
build-server-server-1 | [2023-03-07 16:07:56.445] - 1 WARNING [main] --- build.buildfarm.admin.aws.AwsAdmin: Missing cloudRegion configuration. AWS Admin will not be enabled.
build-server-worker-1 | [2023-03-07 16:07:56.911] - 7 INFO [main] --- build.buildfarm.cas.cfc.CASFileCache: Initializing cache at: /tmp/worker/cache
build-server-worker-1 | [2023-03-07 16:07:56.968] - 7 INFO [main] --- build.buildfarm.cas.cfc.CASFileCache: Scanning Cache Root...
build-server-worker-1 | [2023-03-07 16:07:57.139] - 7 INFO [main] --- build.buildfarm.cas.cfc.CASFileCache: {"keys":27027,"dirs":105,"delete":105}
build-server-worker-1 | [2023-03-07 16:07:57.151] - 7 INFO [main] --- build.buildfarm.cas.cfc.CASFileCache: Populating Directories...
build-server-worker-1 | [2023-03-07 16:07:57.566] - 7 INFO [main] --- build.buildfarm.cas.cfc.CASFileCache: {"invalid dirs":0}
build-server-worker-1 | [2023-03-07 16:07:57.567] - 7 INFO [main] --- build.buildfarm.cas.cfc.CASFileCache: Creating Index
build-server-worker-1 | [2023-03-07 16:07:57.575] - 7 INFO [main] --- build.buildfarm.cas.cfc.CASFileCache: Index Created
build-server-worker-1 | [2023-03-07 16:07:57.576] - 7 INFO [main] --- build.buildfarm.cas.cfc.CASFileCache: Startup Time: 0s
build-server-worker-1 | [2023-03-07 16:07:57.916] - 7 INFO [main] --- build.buildfarm.metrics.prometheus.PrometheusPublisher: Started Prometheus HTTP Server on port 9090
build-server-worker-1 | [2023-03-07 16:07:57.923] - 7 INFO [main] --- build.buildfarm.worker.shard.Worker: buildfarm-worker-worker:8981-97af9529-dbbc-4db3-baa6-028c736c0590 initialized
Now simply test any Bazel workspace against the remote executors by adding a --remote_executor=grpc://localhost:8980 option to your bazel invocations.
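To avoid retyping the flag, it can be placed behind a named config in the workspace's .bazelrc. The config name buildfarm below is a hypothetical label chosen for illustration; only the --remote_executor value comes from this post.

```
# .bazelrc
# "buildfarm" is an arbitrary config name; pick whatever suits the project.
build:buildfarm --remote_executor=grpc://localhost:8980
```

With that in place, an invocation such as bazel build --config=buildfarm //... runs the build against the local cluster, while plain bazel build //... continues to execute locally.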