Links

Friday 2025-02-21 Assorted Links
Published: 2025-02-21

Assorted links for Friday, February 21:

  1. How to Store a Knowledge Graph in a Database

    A knowledge graph represents information as a set of nodes and the relationships between those nodes.

    When your source data consists of assets like technical documentation, research publications, or highly interconnected websites, a knowledge graph returns better results than a simple vector search. That’s because a knowledge graph search can traverse links between nodes, finding semantically relevant results two or more steps away from the first node.

  2. AI Agents Are About to Blow Up the Business Process Layer

    Agentic AI is all about autonomy (think self-driving cars), employing a system of agents to constantly adapt to dynamic environments and independently create, execute and optimize results.

    When agentic AI is applied to business process workflows, it can replace fragile, static business processes with dynamic, context-aware automation systems.

  3. Storing, querying and keeping embeddings updated: options and best practices
  4. Database and AI: solutions for keeping embeddings updated
    • Using a Database Trigger
    • Using Change Tracking
    • Using an Azure Functions SQL trigger binding
    • Using Azure Logic Apps
    • Using Change Data Capture
    • Using the new Change Event Stream
  5. Introducing Change Event Streaming: Join the Azure SQL Database Private Preview for Change Data Streaming
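The multi-hop traversal described under item 1 can be sketched as a breadth-first search over a small in-memory graph. This is only an illustration: the node names, the `links_to` relationship, and the `neighbors_within` helper are hypothetical and do not come from the linked article, which covers storing the graph in a database.

```python
from collections import deque

# Hypothetical toy graph: node -> list of (relationship, neighbor) pairs.
GRAPH = {
    "install-guide": [("links_to", "config-reference")],
    "config-reference": [("links_to", "tls-settings"), ("links_to", "logging")],
    "tls-settings": [("links_to", "cert-rotation")],
    "logging": [],
    "cert-rotation": [],
}


def neighbors_within(graph, start, max_hops):
    """Return {node: hop_count} for nodes reachable from start in at most max_hops steps."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand past the hop limit
        for _relation, neighbor in graph.get(node, []):
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    del hops[start]  # the start node is not a result
    return hops


result = neighbors_within(GRAPH, "install-guide", 2)
print(result)
```

With a limit of two hops, the search reaches `tls-settings` and `logging` even though neither is directly linked from `install-guide`, while `cert-rotation` (three hops away) is left out.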

Thursday 2025-02-20 Assorted Links
Published: 2025-02-20

Assorted links for Thursday, February 20:

  1. Data Infrastructure, Not AI Models, Will Drive IT Spend in 2025

    As organizations race to implement Artificial Intelligence (AI) initiatives, they’re encountering an unexpected bottleneck: the massive cost of data infrastructure required to support AI applications.

    I’m seeing organizations address these challenges through innovative architectural approaches. One promising direction is the adoption of leaderless architectures combined with object storage. This approach eliminates the need for expensive data movement by leveraging cloud-native storage solutions that simultaneously serve multiple purposes.

    Another key strategy involves rethinking how data is organized and accessed. Rather than maintaining separate infrastructures for streaming and batch processing, companies are moving toward unified platforms that can efficiently handle both workloads. This reduces infrastructure costs and simplifies data governance and access patterns.

  2. Object Store Apps: Cloud Native’s Freshest Architecture

    An increasing number of start-ups and end-users find that using cloud object storage as the persistence layer saves money and engineering time that would otherwise be needed to ensure consistency.

  3. The Feds Push for WebAssembly Security Over eBPF

    According to a National Institute of Standards and Technology (NIST) paper, “A Data Protection Approach for Cloud-Native Applications” (authors: Wesley Hales from LeakSignal; Ramaswamy Chandramouli, a supervisory computer scientist at NIST), WebAssembly could and should be integrated across the cloud native service mesh sphere in particular to enhance security.

  4. Deep Dive Into DeepSeek-R1: How It Works and What It Can Do

    During DeepSeek-R1’s training process, it became clear that rewarding accurate and coherent answers gave rise to nascent model behaviors like self-reflection, self-verification, long-chain reasoning and autonomous problem-solving. These behaviors point to the possibility of emergent reasoning that is learned over time rather than overtly taught, possibly paving the way for further breakthroughs in AI research.

  5. Use Azure Cosmos DB as a Docker container in CI/CD pipelines

    The Linux-based Azure Cosmos DB emulator is available as a Docker container and can run on a variety of platforms, including ARM64 architectures like Apple Silicon. It allows local development and testing of applications without needing an Azure subscription or incurring service costs.

Wednesday 2025-02-19 Assorted Links
Published: 2025-02-19

Assorted links for Wednesday, February 19:

  1. AI used to design a multi-step enzyme that can digest some plastics

    A new paper today describes a success in making a brand-new enzyme with the potential to digest plastics. But it also shows how even a simple enzyme may have an extremely complex mechanism—and one that’s hard to tackle, even with the latest AI tools.

  2. 3 takeaways from red teaming 100 generative AI products
    1. Generative AI systems amplify existing security risks and introduce new ones
    2. Humans are at the center of improving and securing AI
    3. Defense in depth is key for keeping AI systems safe
  3. AIs and Robots Should Sound Robotic

    We have a simple proposal: all talking AIs and robots should use a ring modulator. In the mid-twentieth century, before it was easy to create actual robotic-sounding speech synthetically, ring modulators were used to make actors’ voices sound robotic.

  4. 2025 OWASP Top 10 for LLM Applications: A Quick Guide
    1. LLM01: Prompt injection
    2. LLM02: Sensitive information disclosure
    3. LLM03: Supply chain
    4. LLM04: Data and model poisoning
    5. LLM05: Improper output handling
    6. LLM06: Excessive agency
    7. LLM07: System prompt leakage
    8. LLM08: Vector and embedding weaknesses
    9. LLM09: Misinformation
    10. LLM10: Unbounded consumption
  5. Cloud vs. On-Prem: Which Is Better for Your Kubernetes Cluster?

    Cloud solutions offer unparalleled flexibility and ease of scaling, while on-premises setups provide unmatched control and security for sensitive workloads.

Tuesday 2025-02-18 Assorted Links
Published: 2025-02-18

Assorted links for Tuesday, February 18:

  1. A brief and incomplete comparison of memory corruption detection tools

    ASAN detects a lot more types of memory errors, but it requires that you recompile everything. This can be limiting if you suspect that the problem is coming from a component you cannot recompile (say because you aren’t set up to recompile it, or because you don’t have the source code). Valgrind and AppVerifier have the advantage that you can turn them on for a process without requiring a recompilation.

  2. Why Mocks Fail: Real-Environment Testing for Microservices
    • Use mocks for edge cases and scenarios requiring controlled inputs.
    • Leverage real environments to validate integration flows, complex API behaviors and performance characteristics against real dependencies.
  3. Emerging Patterns in Building GenAI Products:
    • Direct Prompting: Send prompts directly from the user to a Foundation LLM
    • Embeddings: Transform large data blocks into numeric vectors so that embeddings near each other represent related concepts
    • Evals: Evaluate the responses of an LLM in the context of a specific task
    • Hybrid Retriever: Combine searches using embeddings with other search techniques
    • Query Rewriting: Use an LLM to create several alternative formulations of a query and search with all the alternatives
    • Reranker: Rank a set of retrieved document fragments according to their usefulness and send the best of them to the LLM
    • Retrieval Augmented Generation (RAG): Retrieve relevant document fragments and include these when prompting the LLM
  4. How Meta discovers data flows via lineage at scale

    In order to build high-quality data lineage, we developed different techniques to collect data flow signals across different technology stacks: static code analysis for different languages, runtime instrumentation, and input and output data matching, etc.

  5. Sam Altman lays out roadmap for OpenAI’s long-awaited GPT-5 model

    GPT-5 will be a system that brings together features from across OpenAI’s current AI model lineup, including conventional AI models, simulated reasoning (SR) models, and specialized models that do tasks like web search and research.
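
Two of the patterns listed under item 3 above, Embeddings and Hybrid Retriever, can be illustrated with a small sketch. Everything here is an assumption made for the example: the three-dimensional vectors are hand-written stand-ins for real embeddings, and `hybrid_search`, `keyword_score`, and the `alpha` weight are hypothetical names, not something taken from the linked article.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_score(query, text):
    """Fraction of query words that also appear in the text."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


# Hypothetical corpus; "vec" stands in for an embedding produced by a real model.
DOCS = [
    {"id": "a", "text": "rotate tls certificates", "vec": [0.9, 0.1, 0.0]},
    {"id": "b", "text": "configure logging levels", "vec": [0.1, 0.9, 0.0]},
    {"id": "c", "text": "renew expired certs", "vec": [0.8, 0.2, 0.1]},
]


def hybrid_search(query, query_vec, docs, alpha=0.5, top_k=2):
    """Blend embedding similarity with keyword overlap and return the best doc ids."""
    scored = []
    for doc in docs:
        score = (alpha * cosine(query_vec, doc["vec"])
                 + (1 - alpha) * keyword_score(query, doc["text"]))
        scored.append((score, doc["id"]))
    scored.sort(reverse=True)
    return [doc_id for _score, doc_id in scored[:top_k]]


top = hybrid_search("rotate certificates", [1.0, 0.0, 0.0], DOCS)
print(top)
```

Document `c` shares no words with the query, so a keyword search alone would miss it; its vector sits close to the query vector, which is why the blended score still ranks it second.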

Monday 2025-02-17 Assorted Links
Published: 2025-02-17

Assorted links for Monday, February 17:

  1. Time Bandit ChatGPT jailbreak bypasses safeguards on sensitive topics

    A ChatGPT jailbreak flaw, dubbed “Time Bandit,” allows you to bypass OpenAI’s safety guidelines when asking for detailed instructions on sensitive topics, including the creation of weapons, information on nuclear topics, and malware creation.

  2. A US Treasury Threat Intelligence Analysis Designates DOGE Staff as ‘Insider Threat’

    An internal email reviewed by WIRED calls DOGE staff’s access to federal payments systems “the single greatest insider threat risk the Bureau of the Fiscal Service has ever faced.”

  3. Optimizing for Developer Productivity Creates a Winning DevEx

    Developer productivity is not about having 50 tools. It’s about improving experience, speed and productivity with the right kinds of tools.

  4. Microsoft.Testing.Platform: Now Supported by All Major .NET Test Frameworks

    Microsoft.Testing.Platform is a lightweight and portable alternative to VSTest for running tests in all contexts, including continuous integration (CI) pipelines, CLI, Visual Studio Test Explorer, and VS Code Test Explorer. Microsoft.Testing.Platform is embedded directly in your test projects, and no other app dependencies, such as vstest.console or dotnet test, are needed to run your tests.

  5. OpenAI’s secret weapon against Nvidia dependence takes shape

    OpenAI is entering the final stages of designing its long-rumored AI processor with the aim of decreasing the company’s dependence on Nvidia hardware, according to a Reuters report released Monday. The ChatGPT creator plans to send its chip designs to Taiwan Semiconductor Manufacturing Co. (TSMC) for fabrication within the next few months, but the chip has not yet been formally announced.

Monday 2025-01-20 Assorted Links
Published: 2025-01-20

Assorted links for Monday, January 20:

  1. The Boring Option: Migrating Segment Efforts Storage at Strava
  2. How do you test your tests?
  3. How Facebook keeps its large-scale infrastructure hardware up and running
  4. JUring: Experimental IO_uring For Java With Big Performance Gains

    JUring is a high-performance Java library that provides bindings to Linux’s io_uring asynchronous I/O interface using Java’s Foreign Function & Memory API. For random reads, JUring achieves 33% better performance than Java NIO FileChannel operations for local files and 78% better performance for remote files.

  5. How we built the GitHub Skyline CLI extension using GitHub

Thursday 2025-01-16 Assorted Links
Published: 2025-01-16

Assorted links for Thursday, January 16:

  1. Fast commits for ext4

    The Linux 5.10 release included a change that is expected to significantly increase the performance of the ext4 filesystem; it goes by the name “fast commits” and introduces a new, lighter-weight journaling method.

  2. Building Faster AMD64 Memset Routines
  3. How NuGet resolves package dependencies
  4. Maximizing Developer Effectiveness
  5. What’s good about offset pagination; designing parallel cursor-based web APIs

Wednesday 2025-01-15 Assorted Links
Published: 2025-01-15

Assorted links for Wednesday, January 15:

  1. Cloud PUE: Comparing AWS, Azure and GCP Global Regions

    New data reveals how efficiently the major cloud providers run and cool their data centers – from AWS’s and Azure’s tropical struggles to Google’s industry-leading performance.

  2. The No-Order File System

    In this paper, we introduce the No-Order File System (NoFS), a simple, lightweight file system that employs a novel technique called backpointer based consistency to provide crash consistency without ordering writes as they go to disk.

  3. Whose Code is it Anyway?

    In order to measure the engineering effectiveness of Yelp, we need to measure the effectiveness of its organizations and the teams that make up those organizations. But how do we know what a team is responsible for? We needed a way to assign an owner to something (let’s call this an entity) that we want to measure. Once an entity has an owner, we can collect metrics on that entity and derive the health score (i.e., effectiveness) for that owner. These metrics can then be aggregated by team, organization, or even the entire Engineering division, so that we can identify areas that we can collectively improve. And this is how the Ownership microservice was born.

  4. How we ported Linux to the M1
  5. Nix + Bazel = fully reproducible, incremental builds