whackur

Hands-on notes on AI, blockchain (Web3 · Solidity), local LLMs, and physical AI.

Chainlink CCIP EVM Contracts: Architecture from Source Code

EVM chains are isolated by design. A contract on Arbitrum has no native way to call a contract on Base, and sending USDC from Ethereum Mainnet to Polygon is not a built-in operation. Cross-chain bridges filled that gap for years, but the category has seen repeated security incidents that highlight how wide the attack surface is.

Circle CCTP V2 EVM Contracts: A Source-Code Walkthrough

Moving USDC across blockchains comes down to two designs. Lock-and-mint freezes the original token on the source chain and issues a wrapped version elsewhere, concentrating a large TVL target inside a single smart contract. Burn-and-mint does the opposite: the source chain burns the original, and a trusted authority mints an equal amount on the destination. No lockbox, no wrapped variant.
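The burn-and-mint flow described above can be sketched as a toy ledger. This is a minimal illustration, not Circle's contract code: the `Chain` class, the `burn_and_mint` function, and the dictionary standing in for the trusted authority's attestation are all hypothetical names introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Chain:
    """Toy ledger for one chain: balances only, no consensus or signatures."""
    name: str
    balances: dict = field(default_factory=dict)

    def supply(self) -> int:
        """Total tokens currently existing on this chain."""
        return sum(self.balances.values())


def burn_and_mint(src: Chain, dst: Chain, sender: str, recipient: str, amount: int) -> dict:
    """Burn `amount` on the source chain, then mint the same amount on the destination."""
    if amount <= 0:
        raise ValueError("amount must be positive")
    if src.balances.get(sender, 0) < amount:
        raise ValueError("insufficient balance on source chain")

    # Step 1: the source chain burns the original tokens.
    src.balances[sender] -= amount

    # Step 2: a trusted authority attests to the burn.
    # (Hypothetical stand-in: a plain dict instead of a signed message.)
    attestation = {
        "source": src.name,
        "destination": dst.name,
        "recipient": recipient,
        "amount": amount,
    }

    # Step 3: the destination chain mints an equal amount.
    dst.balances[recipient] = dst.balances.get(recipient, 0) + attestation["amount"]
    return attestation
```

Nothing is held in escrow at any point in this sketch: the combined supply across both chains is the same before and after the transfer, which is the property that removes the lockbox as a target.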

ERC-8004 Agent Reputation: On-Chain Registration and Lookup

A prior post on this blog covered the ERC-8004/8126/8196 trust stack: three Ethereum registries that handle agent identity, reputation, and validation. If you want the conceptual overview first, start there. This post is about something narrower: how feedback actually gets written to the Reputation Registry, how to read it back, and what services are available today to browse and index that data.

Agentic Payments in June 2026: x402, UCP, and MPP Implementation Progress

Agentic payments in June 2026 moved from “can an agent pay” toward “who authorized the payment, how is it verified, and how is it traced.” The major open protocols (x402, UCP, MPP/pay.sh, and ACP) each had meaningful implementation changes. All figures and commit details below come from GitHub API, PyPI, and npm registry checks.

AI x Blockchain in 2026: Bittensor, DePIN, and Agent Finance

“AI plus blockchain” spent most of the early 2020s as a marketing phrase. By mid-2026, it has organized into four distinct layers, each with its own competitive dynamics. Which projects have staying power, and what are the actual evaluation criteria at this point?

Figure: From HELOC Lender to Blockchain-Native Capital Markets

Two companies go by the name Figure. One is Figure AI, the humanoid robotics company. The other is Figure Technology Solutions, which is what this post is about. They are unrelated.

Harness-1: Teaching Search Agents to Offload State

Search agents and the state problem: a search agent is an AI system that answers a question by iterating through multiple searches. Unlike a one-shot retrieval lookup, it reads intermediate results, adjusts its search strategy, compares candidate documents, and checks whether specific claims are actually supported by what it found. Tasks like analyzing financial filings, tracing multi-hop facts across sources, or interpreting complex regulations need this kind of iterative work. A single query won’t get you there.

Lighter: A ZK-SNARK Orderbook DEX on Ethereum

Decentralized exchanges come in two main forms. AMMs (Automated Market Makers) like Uniswap set prices algorithmically from pool ratios and work as permissionless swap venues. Orderbook DEXes match individual buy and sell orders by price and time priority, the same way centralized exchanges like Binance or Coinbase operate. Orderbooks enable limit orders, tighter spreads with active market-making, and more precise price discovery. AMMs do not.
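Price-time priority, as mentioned above, can be sketched with a heap of resting sell orders. This is a generic illustration and not Lighter's matching engine: the function names, the tuple layout, and the use of integer prices are assumptions made for the example.

```python
import heapq
from itertools import count

# Arrival counter: a lower number means the order arrived earlier (time priority).
_arrival = count()


def add_ask(asks: list, price: int, qty: int) -> None:
    """Rest a sell order on the book as a (price, arrival, qty) heap entry."""
    heapq.heappush(asks, (price, next(_arrival), qty))


def match_limit_buy(asks: list, limit_price: int, qty: int) -> list:
    """Fill a limit buy against resting asks, best price first, then earliest arrival.

    Returns a list of (price, quantity) fills. Any unfilled quantity is dropped
    here for brevity; a real book would rest it as a bid.
    """
    fills = []
    while qty > 0 and asks and asks[0][0] <= limit_price:
        price, arrival, available = heapq.heappop(asks)
        traded = min(qty, available)
        fills.append((price, traded))
        qty -= traded
        if available > traded:
            # A partially filled order keeps its original place in the queue.
            heapq.heappush(asks, (price, arrival, available - traded))
    return fills
```

Because heap entries compare by price first and arrival number second, two asks at the same price fill in the order they were placed, and the buy never trades above its limit.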

LLM Observability Without LangSmith: Five Open-Source Tools Compared

At some point in building LLM applications or agents, you need to know why a call failed, what the tool invocation looked like, or why the agent got stuck in a loop. LangSmith, LangChain’s commercial observability platform, has been the default answer for this: it covers trace visualization, prompt versioning, and evaluation in one place. Its usage-based pricing and cloud-hosted architecture are what push teams to look for alternatives.

The AI Agent Trust Stack: ERC-8004, ERC-8126, and ERC-8196

When an AI agent acts on behalf of a user (spending funds, calling contracts, accessing paid APIs), the obvious question is: how do you know this agent is safe? The Ethereum community has three standards that address different layers of that question. They don’t solve the whole problem, but they’re building toward a coherent stack.

DeepSWE: A Benchmark for Long-Horizon Coding Agents

SWE-bench has been the default coding-agent leaderboard for a while, but it has well-known weaknesses. Most tasks come from existing public issues and PR patches, so a high score might partly reflect memorization. Most tasks are also single-file bug fixes, which is not representative of the multi-file, long-horizon work that a coding agent does in practice.

Mixture of Agents: How Layering Open-Source LLMs Beat GPT-4 Omni

Instead of scaling a single model up, what happens when you stack multiple models in layers and have each one refine the previous layer’s output? Together AI’s research team answered that in June 2024 with arXiv:2406.04692. Using only open-source models, their Mixture of Agents (MoA) configuration scored 65.1% on AlpacaEval 2.0, versus 57.5% for GPT-4 Omni.
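The layering idea above can be sketched with plain functions standing in for models. This is a rough illustration under stated assumptions, not Together AI's implementation: the `Model` type, the `build_prompt` helper, and the prompt wording are hypothetical, and a real setup would call LLM APIs instead.

```python
from typing import Callable, List

# Assumption: a "model" is any function from prompt text to response text.
Model = Callable[[str], str]


def build_prompt(question: str, previous: List[str]) -> str:
    """Attach the previous layer's responses to the question, if there are any."""
    if not previous:
        return question
    numbered = "\n".join(f"{i}. {text}" for i, text in enumerate(previous, start=1))
    return f"{question}\n\nResponses from the previous layer:\n{numbered}"


def mixture_of_agents(question: str, layers: List[List[Model]], aggregator: Model) -> str:
    """Run each layer on the question plus the prior layer's outputs, then aggregate."""
    previous: List[str] = []
    for layer in layers:
        prompt = build_prompt(question, previous)
        # Every model in a layer sees the same prompt and answers independently.
        previous = [model(prompt) for model in layer]
    # A final model synthesizes the last layer's responses into one answer.
    return aggregator(build_prompt(question, previous))
```

Only the immediately preceding layer's outputs are passed forward in this sketch, so prompt length grows with the width of a layer and not with the number of layers.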

Open Knowledge Format: A Shared Vocabulary for Agent Knowledge

When AI agents fail in production, the model is often not the problem. The missing context is. Table schemas, metric definitions, runbooks, join paths between systems, and API deprecation notices are scattered across catalog vendors, internal wikis, code comments, and personal notes. Every agent developer solves the same context assembly problem from scratch.

Qwen3.6-35B-A3B: Community Reviews, Uncensored Variants, and MTP Benchmarks

Alibaba released Qwen3.6-35B-A3B in April 2026: a 35B-parameter MoE model with around 3B active per token, a 262K native context, and an official SWE-bench score of 73.4%. Two months in, it’s the most widely tested 35B-class model in the local LLM community.

Robot Learning: A Tutorial (From Classical Robotics to Generalist Policies)

“Robot Learning: A Tutorial” (arXiv:2510.12403) is a paper-length tutorial by Francesco Capuano, Caroline Pascal, Adil Zouitine, Thomas Wolf, and Michel Aractingi, from the University of Oxford and Hugging Face. It covers the full arc of robot learning methods, from classical dynamics-based control through reinforcement learning, imitation learning, and generalist vision-language-action models, using the Hugging Face LeRobot library throughout.

Secret Voting Architecture with FHE, SP1, and Groth16

On-chain secret voting creates three tensions at once. Votes must stay hidden while still being tallied. Off-chain computation cannot be trusted without proof, yet results need to land on-chain. And the EVM cannot run heavy cryptographic operations natively, but it still needs to verify them. FHE (Fully Homomorphic Encryption), SP1 zkVM, and Groth16 each take on one of these.

VibeThinker-3B: Packing Verifiable Reasoning into 3 Billion Parameters

“Small model beats big model” papers appear regularly. Usually the claim holds on a specific benchmark under specific conditions, not across the board. WeiboAI’s VibeThinker-3B, published June 15, 2026, follows a similar structure but draws a clearer boundary: the claim is not that a 3B model replaces a frontier generalist. The claim is that verifiable reasoning can be compressed into a small model, while open-domain knowledge and general dialogue still benefit from more parameters.

Future AGI: Evaluate, Observe, and Improve AI Agents in One Place

If you have shipped an AI agent, this will sound familiar. The demo runs fine. Then it hits production, the hallucinations start, and you can’t tell what went wrong or why. So you bolt on one tool for evals, another for tracing, another for guardrails. The real problem is that none of them talk to each other, so the loop you need to actually fix things never closes.