Troubleshoot CloudSmith

This page documents known issues, diagnostic steps, and remediation procedures for both on-premises and Azure PaaS deployments.


On-premises issues

Portal not reachable after install

Symptom: The installer completed without errors but navigating to https://<management-host-fqdn> returns a connection refused or timeout.

Check the Windows Firewall rule:

Get-NetFirewallRule -DisplayName 'CloudSmith Portal' | Select-Object DisplayName, Enabled, Direction, Action

If the rule is missing or disabled, recreate it:

New-NetFirewallRule -DisplayName 'CloudSmith Portal' -Direction Inbound -Protocol TCP -LocalPort 443 -Action Allow

Check the VM is running:

Get-VM -Name 'cloudsmith-docker' | Select-Object Name, State

If State is not Running, start it:

Start-VM -Name 'cloudsmith-docker'

Check container health inside the VM:

# SSH into the VM
ssh cloudsmith@192.168.100.10
sudo docker compose -f /opt/cloudsmith/docker-compose.yml ps

All six containers — cloudsmith-api, cloudsmith-portal, cloudsmith-postgres, cloudsmith-prometheus, cloudsmith-loki, cloudsmith-otel — should show status Up.


aspnet:9.0 image health check fails

Symptom: The cloudsmith-api container exits immediately after starting with an error about curl not being found, or the container restarts in a loop.

Cause: Earlier versions of the on-premises Docker Compose configuration used curl for the API container health check. The mcr.microsoft.com/dotnet/aspnet:9.0 base image does not include curl. This was corrected in the installer by switching the health check to use the .NET health probe directly.

Resolution: Update CloudSmith to a version that includes the corrected Compose file:

# On the management host
Update-CloudSmith

If Update-CloudSmith is not available, SSH into the VM and manually update the health check in /opt/cloudsmith/docker-compose.yml:

# Replace this:
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:8080/health/live"]

# With this:
healthcheck:
  test: ["CMD-SHELL", "wget -qO- http://localhost:8080/health/live || exit 1"]
  interval: 10s
  timeout: 5s
  retries: 3

Then restart the stack:

sudo docker compose -f /opt/cloudsmith/docker-compose.yml up -d

Azure PaaS issues

First health check cold start (ACA scale-to-zero)

Symptom: The portal or API takes 30–90 seconds to respond on the first request after a period of inactivity. Subsequent requests are fast.

Cause: Azure Container Apps with the Consumption plan scale to zero replicas when idle. The first request after scale-to-zero triggers a cold start — ACA provisions a container instance, pulls any cached layers, and starts the .NET process. This is normal behavior on dev/test deployments.

Resolution options:

  1. Set minimum replicas to 1 (recommended for production): Update apiAppMinReplicas and portalAppMinReplicas in your parameters file to 1 and redeploy. This incurs continuous compute cost but eliminates cold starts.

  2. Accept cold starts for dev/test: For dev environments, a cold start on the first request is acceptable. After the first request warms up the container, subsequent requests are fast.

  3. Use a health-check ping: Add an external monitoring service (Azure Monitor, UptimeRobot, etc.) that pings /health/ready every few minutes to keep the container warm.


Initial admin token expired (24-hour TTL)

Symptom: The first-run setup wizard returns: "The initial admin token has expired (24-hour TTL)."

Cause: The cloudsmith-initial-admin-token Key Vault secret is generated when the API starts for the first time. If more than 24 hours pass between the initial API start and the first wizard completion, the token is expired and rejected.

Resolution: Restart the API container app twice to purge the expired token and mint a fresh one:

# Get the active revision name
REV=$(az containerapp revision list \
  --name ca-cloudsmith-api-<env>-<region>-001 \
  --resource-group rg-cloudsmith-<env>-<region>-001 \
  --query "[?properties.active].name | [0]" -o tsv)

# First restart — purges the expired token hash
az containerapp revision restart \
  --name ca-cloudsmith-api-<env>-<region>-001 \
  --resource-group rg-cloudsmith-<env>-<region>-001 \
  --revision "$REV"

# Wait 30 seconds for the container to finish starting
Start-Sleep 30   # or: sleep 30

# Second restart — writes a fresh token to Key Vault
az containerapp revision restart \
  --name ca-cloudsmith-api-<env>-<region>-001 \
  --resource-group rg-cloudsmith-<env>-<region>-001 \
  --revision "$REV"

After the second restart, retrieve the new token:

az keyvault secret show \
  --vault-name <KEY_VAULT_NAME> \
  --name cloudsmith-initial-admin-token \
  --query value -o tsv

Paste this value into the wizard’s Initial admin token field and complete setup.


Container Apps can’t pull the image

Symptom: ACA shows a provisioning error or the container app stays in Deploying state indefinitely. The container logs show failed to pull image or unauthorized.

Check that the image tag exists:

docker manifest inspect ghcr.io/cloudsmith-cloud/cloudsmith-api:<your-tag>
docker manifest inspect ghcr.io/cloudsmith-cloud/cloudsmith-portal:<your-tag>

The API and portal have separate Git histories and therefore different commit SHAs. If you pinned to a specific SHA, confirm both SHAs exist in their respective image repositories.

Use latest for dev, explicit SHAs for production:

"imageTag": { "value": "latest" }

Tag-required policy blocks resource creation

Symptom: The Bicep deployment fails with RequestDisallowedByPolicy on a resource creation step.

Cause: The Bicep pack deploys Azure Policy assignments that require all 7 CAF mandatory tags. If commonTags in your parameters file is missing any of the 7 required tags, the policy denies the resource write.

Required tags: Environment, Workload, CostCenter, Owner, BusinessUnit, DataClassification, Criticality

Verify all 7 are present in your parameters file and redeploy.


Log locations

Component Log location
On-premises API ssh cloudsmith@<vm-ip>docker logs cloudsmith-api
On-premises Portal docker logs cloudsmith-portal
On-premises PostgreSQL docker logs cloudsmith-postgres
On-premises Relay docker logs cloudsmith-relay (on the Relay host)
PaaS API (ACA) Azure Portal → Container Apps → ca-cloudsmith-api-* → Log stream, or: az containerapp logs show --name ca-cloudsmith-api-<env>-<region>-001 --resource-group rg-cloudsmith-<env>-<region>-001 --follow
PaaS Portal (ACA) az containerapp logs show --name ca-cloudsmith-portal-<env>-<region>-001 ...
PaaS structured traces Log Analytics Workspace → log-cloudsmith-<env>-<region>-001 → ContainerAppConsoleLogs_CL

Getting help

If the steps on this page do not resolve your issue, open a discussion at github.com/cloudsmith-cloud or contact support.cloudsmith.cloud.

Include the following in any support request:

  • Deployment model (on-premises / Azure PaaS)
  • OS version (Windows Server 2022 / 2025)
  • CloudSmith version (from cs --version or the portal footer)
  • Relevant log output (from the log locations above)
  • The exact error message and any traceId value from API error responses