Links

Wednesday 2024-06-26 Assorted Links
Published: 2024-06-26

Assorted links for Wednesday, June 26:

  1. Speed Up Your CI/CD Pipeline with Change-Based Testing in a Yarn-Based Monorepo: I note that building and testing only what changed is one of the core value propositions of Bazel, but adopting Bazel often requires a large investment in engineering and training.
  2. What makes a good REST API?
  3. How to use DORA metrics to improve software delivery
  4. Don’t Get Lost in the Metrics Maze: A Practical Guide to SLOs, SLIs, Error Budgets, and Toil
  5. Static B-Trees

    In this section, we generalize the techniques we developed for binary search to static B-trees and accelerate them further using SIMD instructions. In particular, we develop two new implicit data structures:

    • The first is based on the memory layout of a B-tree, and, depending on the array size, it is up to 8x faster than std::lower_bound while using the same space as the array and only requiring a permutation of its elements.
    • The second is based on the memory layout of a B+ tree, and it is up to 15x faster than std::lower_bound while using just 6-7% more memory — or 6-7% of the memory if we can keep the original sorted array.
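
To make the B-tree layout from item 5 concrete, here is a minimal Python sketch, not the article's implementation. It permutes a sorted array into an implicit static B-tree and answers lower_bound queries by walking from node to node. The node size B = 4, the helper names (child, build, lower_bound), the infinity padding of the last node, and the use of bisect inside a node are all assumptions made for illustration. The SIMD acceleration the article describes is omitted.

```python
import bisect

B = 4                 # keys per node (hypothetical value chosen for readability)
INF = float("inf")    # padding for unused slots in the last-filled node


def child(k, i):
    """Index of the i-th child of node k in the implicit (B+1)-ary layout."""
    return k * (B + 1) + i + 1


def build(sorted_keys):
    """Permute a sorted list into a list of B-key nodes via in-order filling."""
    nblocks = (len(sorted_keys) + B - 1) // B
    tree = [[INF] * B for _ in range(nblocks)]
    it = iter(sorted_keys)

    def fill(k):
        if k >= nblocks:
            return
        for i in range(B):
            fill(child(k, i))            # smaller keys go into the left subtree
            tree[k][i] = next(it, INF)   # then the separator key itself
        fill(child(k, B))                # rightmost subtree

    fill(0)
    return tree


def lower_bound(tree, x):
    """Return the smallest key >= x, or None if no such key exists."""
    k, result = 0, INF
    while k < len(tree):
        node = tree[k]
        i = bisect.bisect_left(node, x)  # rank of x within this node
        if i < B:
            result = node[i]             # best candidate seen so far
        k = child(k, i)                  # descend into the matching subtree
    return None if result == INF else result


keys = list(range(0, 100, 3))
tree = build(keys)
print(lower_bound(tree, 10))  # prints 12
```

Each query touches one node per tree level, so the work per lookup is a handful of contiguous B-key blocks rather than scattered single elements, which is the property the article's layouts build on.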
Tuesday 2024-06-25 Assorted Links
Published: 2024-06-25

Assorted links for Tuesday, June 25:

  1. Radioactive drugs strike cancer with precision

    Pluvicto and Lutathera are both built around small protein sequences, known as peptides. These peptides specifically bind to target receptors on cancer cells—PSMA in the case of prostate cancer and somatostatin receptors in the case of Lutathera—and deliver radiation through the decay of unstable lutetium.

    Administered via infusion into the bloodstream, these drugs circulate throughout the body until they firmly attach to the surfaces of tumor cells they encounter. Anchored at these target sites, the lutetium isotope then releases two types of radiation that aid in cancer treatment. The primary emission consists of beta particles, high-energy electrons capable of penetrating tumors and surrounding cells, tearing into DNA and causing damage that ultimately triggers cell death.

  2. Amazon Exploring MM-Local Memory Allocations To Help With Current/Future Speculation Attacks

    Back in 2019 after various speculation-based CPU vulnerabilities began coming to light, Amazon engineers proposed process-local memory allocations for hiding KVM secrets. They were striving for an alternative mitigation for vulnerabilities like L1TF by essentially providing some memory regions for kernel allocations out of view/access from other kernel code. Amazon engineers this week laid out a new proposal after five years of ongoing Linux kernel improvements for MM-local memory allocations for dealing with current and future speculation-based cross-process attacks.

  3. TypeSpec: An API design language that either competes with, or augments, OpenAPI.
  4. Optimize Kubernetes Pods’ Startup Time Using VolumeSnapshots: If your K8S application uses enormous, static data sources, using VolumeSnapshots may speed up its launch time significantly.
  5. Building a GitOps CI/CD Pipeline with GitHub Actions (SOC 2)
Monday 2024-06-24 Assorted Links
Published: 2024-06-24

Assorted links for Monday, June 24:

  1. The time smart quotes prevented the entire Office division from committing code
  2. Video annotator: a framework for efficiently building video classifiers using vision-language models and active learning

    We introduce a novel framework, Video Annotator (VA), which leverages active learning techniques and zero-shot capabilities of large vision-language models to guide users to focus their efforts on progressively harder examples, enhancing the model’s sample efficiency and keeping costs low.

    VA seamlessly integrates model building into the data annotation process, facilitating user validation of the model before deployment, therefore helping with building trust and fostering a sense of ownership. VA also supports a continuous annotation process, allowing users to rapidly deploy models, monitor their quality in production, and swiftly fix any edge cases by annotating a few more examples and deploying a new model version.

  3. PVF: A novel metric for understanding AI systems’ vulnerability against SDCs in model parameters

    Parameter vulnerability factor (PVF) is a novel metric we’ve introduced with the aim to standardize the quantification of AI model vulnerability against parameter corruptions.

  4. Keeping main green in a monorepo
  5. Researchers describe how to tell if ChatGPT is confabulating

    …[T]he researchers focus on what they call semantic entropy. This evaluates all the statistically likely answers evaluated by the LLM and determines how many of them are semantically equivalent. If a large number all have the same meaning, then the LLM is likely uncertain about phrasing but has the right answer. If not, then it is presumably in a situation where it would be prone to confabulation and should be prevented from doing so.
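
The semantic entropy idea quoted above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the researchers' method: the function names are hypothetical, each sampled answer is weighted equally rather than by its model probability, and the toy_same_meaning judge (which only compares the final word) stands in for whatever equivalence check the researchers actually use.

```python
import math


def semantic_entropy(answers, same_meaning):
    """Group sampled answers by meaning, then take the entropy over groups."""
    clusters = []  # each cluster holds answers judged to mean the same thing
    for answer in answers:
        for cluster in clusters:
            if same_meaning(answer, cluster[0]):
                cluster.append(answer)
                break
        else:
            clusters.append([answer])
    total = len(answers)
    return -sum(
        (len(c) / total) * math.log(len(c) / total) for c in clusters
    )


def toy_same_meaning(a, b):
    """Hypothetical stand-in judge: answers match if their last words match."""
    def last_word(text):
        return text.lower().strip(" .!?").split()[-1]
    return last_word(a) == last_word(b)


# Different phrasing, same meaning: one cluster, entropy of zero.
consistent = ["Paris", "It is Paris.", "The capital is Paris"]
# Different meanings: three clusters, entropy of log(3).
scattered = ["Paris", "Lyon", "Marseille"]

low = semantic_entropy(consistent, toy_same_meaning)
high = semantic_entropy(scattered, toy_same_meaning)
print(low < high)  # prints True
```

Low entropy corresponds to the quoted case where the model varies its phrasing but agrees on the answer; high entropy corresponds to the case flagged as prone to confabulation.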

Friday 2024-06-21 Assorted Links
Published: 2024-06-21

Assorted links for Friday, June 21:

  1. MLow: Meta’s low bitrate audio codec

    After nearly two years of active development and testing, we are proud to announce Meta Low Bitrate audio codec, aka MLow, which achieves two-times-better quality than Opus (POLQA MOS 1.89 vs 3.9 @ 6kbps WB). Even more importantly, we are able to achieve this great quality while keeping MLow’s computational complexity 10 percent lower than that of Opus.

  2. Unlocking the power of unstructured data with RAG

    To make the most of their unstructured data, development teams are turning to retrieval-augmented generation, or RAG, a method for customizing large language models (LLMs). They can use RAG to keep LLMs up to date with organizational knowledge and the latest information available on the web. They can also use RAG and LLMs to surface and extract insights from unstructured data.

  3. LXC vs. Docker: Which One Should You Use?

    LXC is not typically used for application development but for scenarios requiring full OS functionality or direct hardware integration. Its ability to provide isolated and secure environments with minimal overhead makes it suitable for infrastructure virtualization where traditional VMs might be too resource-intensive.

    Docker’s utility in supporting rapid development cycles and complex architectures makes it a valuable tool for developers aiming to improve efficiency and operational consistency in their projects.

  4. AES-GCM and breaking it on nonce reuse
  5. Next-Level Boilerplate: An Inside Look Into Our .Net Clean Architecture Repo

    Clean architecture is a widely adopted opinionated way to structure your code and to separate the concerns of the application into layers. The main idea is to separate the business logic from the infrastructure and presentation layers.

Thursday 2024-06-20 Assorted Links
Published: 2024-06-20

Assorted links for Thursday, June 20:

  1. How we improved push processing on GitHub

    A push triggers a Kafka event, which is fanned out via independent consumers to many isolated jobs that can process the event without worrying about any other consumers.

  2. Leveraging Rust in High-Performance Web Services

    Rust’s ownership model is a fundamental feature that enhances both speed and safety. Every value in Rust has a unique owner, responsible for its cleanup when it’s no longer needed. This eliminates the need for a garbage collector and ensures efficient memory management. The ownership rules are enforced at compile time, which means there’s no runtime overhead.

  3. systemd 256 Released With run0, systemd-vpick, importctl & Other New Features
  4. Maintaining large-scale AI capacity at Meta

    Outside of special cases, Meta maintains its fleet of clusters using a technique called maintenance trains. This is used for all capacity, including compute and storage capacity. A small number of servers are taken out of production and maintained with all applicable upgrades. Trains provide the guarantee that all capacity minus one maintenance domain is up and running 24/7, thus providing capacity predictability. This is mandatory for all capacity that is used for online and recurring training.

  5. How Meta trains large language models at scale
Wednesday 2024-06-19 Assorted Links
Published: 2024-06-19

Assorted links for Wednesday, June 19:

  1. Arm64 on GitHub Actions: Powering faster, more efficient build systems

    Developers can now take advantage of Arm-based hardware hosted by GitHub to build and deploy their release assets anywhere Arm architecture is used. Best of all, these runners are priced at 37% less than our x64 Linux and Windows runners.

  2. Develop Kubernetes Operators in Java without Breaking a Sweat
  3. The Energy Footprint of Humans and Large Language Models

    Assuming an 8-hour workday and considering 260 workdays per year brings the annual energy cost of one person’s hour of daily work to around 6 kWh[a].

    Now for the energy cost of running an LLM. We have set a target of 250 words in an hour. LLMs generate tokens, parts of words, so if we use the standard ratio (for English) of 0.75 words per token, our target for one hour of work is around 333 tokens. Measurements with Llama 65B reported around 4 Joules per output token [4]. This leads to 1,332 Joules for 333 tokens, about 0.00037 kWh.

  4. Microsoft is reworking Recall after researchers point out its security problems

    Microsoft’s upcoming Recall feature in Windows 11 has generated a wave of controversy this week following early testing that revealed huge security holes. The initial version of Recall saves screenshots and a large plaintext database tracking everything that users do on their PCs, and in the current version of the feature, it’s trivially easy to steal and view that database and all of those screenshots for any user on a given PC, even if you don’t have administrator access. Recall also does little to nothing to redact sensitive information from its screenshots or that database.

    First and most significantly, the company says that Recall will be opt-in by default, so users will need to decide to turn it on. It may seem like a small change, but many users never touch the defaults on their PCs, and for Recall to be grabbing all of that data by default definitely puts more users at risk of having their data stolen unawares.

    The company also says it’s adding additional protections to Recall to make the data harder to access. You’ll need to enable Windows Hello to use Recall, and you’ll need to authenticate via Windows Hello (whether it’s a face-scanning camera, fingerprint sensor, or PIN) each time you want to open the Recall app to view your data.

  5. Building Generative AI apps with .NET 8
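
The LLM energy estimate quoted in item 3 above can be reproduced with a short calculation. The figures (250 words, 0.75 words per token, 4 Joules per token) come from the quoted passage; the function and constant names are hypothetical, and the only outside fact used is that 1 kWh equals 3.6 million Joules.

```python
JOULES_PER_KWH = 3.6e6  # 1 kWh = 1000 W * 3600 s


def llm_energy(words, words_per_token=0.75, joules_per_token=4.0):
    """Return (tokens, joules, kWh) needed to generate the given word count."""
    tokens = words / words_per_token
    joules = tokens * joules_per_token
    return tokens, joules, joules / JOULES_PER_KWH


tokens, joules, kwh = llm_energy(250)
print(f"{tokens:.0f} tokens, {joules:.0f} J, {kwh:.5f} kWh")
# prints "333 tokens, 1333 J, 0.00037 kWh"
```

The quoted passage rounds to 333 tokens before multiplying, which gives 1,332 Joules instead of 1,333; the result in kWh is the same to two significant figures.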
Tuesday 2024-06-18 Assorted Links
Published: 2024-06-18

Assorted links for Tuesday, June 18:

  1. Composable data management at Meta

    By providing a reusable, state-of-the-art execution engine that is engine- and dialect-agnostic (i.e., it can be integrated with any data system and extended to follow any SQL-dialect semantic), Velox quickly received attention from the open-source community. Beyond our initial collaborators from IBM/Ahana, Intel, and Voltron Data, today more than 200 individual collaborators from more than 20 companies around the world participate in Velox’s continued development.

  2. New warp drive concept does twist space, doesn’t move us very fast

    A team of physicists has discovered that it’s possible to build a real, actual, physical warp drive and not break any known rules of physics. One caveat: the vessel doing the warping can’t exceed the speed of light, so you’re not going to get anywhere interesting any time soon. But this research still represents an important advance in our understanding of gravity.

  3. Biggest Windows 11 update in 2 years nearly finalized, enters Release Preview

    Windows 11 24H2 includes an updated compiler, kernel, and scheduler, all lower-level system changes made at least in part to better support Arm-based PCs. Existing Windows-on-Arm systems should also see a 10 or 20 percent performance boost when using x86 applications, thanks to improvements in the translation layer (which Microsoft is now calling Prism).

    There are more user-visible changes, too. 24H2 includes Sudo for Windows, the ability to create TAR and 7-zip archives from the File Explorer, Wi-Fi 7 support, a new “energy saver” mode, and better support for Bluetooth Low Energy Audio. It also allows users to run the Copilot AI chatbot in a regular resizable window that can be pinned to the taskbar instead of always giving it a dedicated strip of screen space.

  4. BitKeeper, Linux, and licensing disputes: How Linus wrote Git in 14 days
  5. Another US state repeals law that protected ISPs from municipal competition

    Minnesota this week eliminated two laws that made it harder for cities and towns to build their own broadband networks. The state-imposed restrictions were repealed in an omnibus commerce policy bill signed on Tuesday by Gov. Tim Walz, a Democrat.

    Minnesota was previously one of about 20 states that imposed significant restrictions on municipal broadband. The number can differ depending on who’s counting because of disagreements over what counts as a significant restriction. But the list has gotten smaller in recent years because states including Arkansas, Colorado, and Washington repealed laws that hindered municipal broadband.

    The Minnesota bill enacted this week struck down a requirement that municipal telecommunications networks be approved in an election with 65 percent of the vote. The law is over a century old, the Institute for Local Self-Reliance’s Community Broadband Network Initiative wrote yesterday.

Monday 2024-06-17 Assorted Links
Published: 2024-06-17

Assorted links for Monday, June 17:

  1. General Availability of .NET Aspire: Simplifying .NET Cloud-Native Development

    .NET Aspire brings together tools, templates, and NuGet packages that help you build distributed applications in .NET more easily.

  2. .NET Announcements and Updates from Microsoft Build 2024

    Here’s a look at our updates & announcements:

    • Artificial Intelligence: End-to-end scenarios for building AI-enabled applications, embracing the AI ecosystem, and deep integration with cloud services.
    • .NET Aspire: for building cloud-native distributed applications, releasing today.
    • C# 13: Improvements to much loved C# features to make them even better for you.
    • Performance: Reducing memory and execution time with critical benchmarks.
    • Enhancements to .NET libraries and frameworks including ASP.NET Core, Blazor, .NET MAUI, and more.
  3. We get more useful energy out of renewables than fossil fuels

    A new study by researchers at the UK’s University of Leeds, however, suggests that … renewables already produce more net energy than the fossil fuels they’re displacing. The key to understanding why is that it’s much easier to do useful things with electricity than it is with a hunk of coal or a glob of crude oil.

  4. Docker Documentation Gets an AI-Powered Assistant

    We recently launched a new tool to enhance Docker documentation: an AI-powered documentation assistant incorporating kapa.ai. Docker Docs AI is designed to get you the information you need by providing instant, accurate answers to your Docker-related questions directly within our documentation pages.

  5. FUSE Adds VirtIO-FS Multi-Queue For ~5x Performance Win With Linux 6.10

    With making use of multiple queues, the VirtIO-FS file-system code can be up to 5~5.5x faster for read and write performance.

Friday 2024-06-14 Assorted Links
Published: 2024-06-14

Assorted links for Friday, June 14:

  1. Updated Intel Meteor Lake Tuning For Linux Shows Huge Performance/Power Improvements: The change is a minor tweak to the default Energy Performance Preference (EPP) value within the Intel P-State CPU frequency scaling driver.

    It’s like magic with one line of code changed in the Linux kernel that Intel is reporting up to 19% performance improvement for Intel Core Ultra “Meteor Lake” and up to an 11% improvement in performance per Watt. Or in another EPP mode, the power consumption during video playback can be reduced by 52%!

  2. These light paintings let us visualize invisible clouds of air pollution

    Light painting is a technique used in both art and science that involves taking long-exposure photographs while moving some kind of light source—a small flashlight, perhaps, or candles or glowsticks—to essentially trace an image with light. A UK collaboration of scientists and artists has combined light painting with low-cost air pollution sensors to visualize concentrations of particulate matter (PM) in select locations in India, Ethiopia, and Wales. The objective is to creatively highlight the health risks posed by air pollution, according to a new paper published in the journal Nature Communications.

  3. GPT-4 beats psychologists on a new test of social intelligence

    There were significant differences in SI between psychologists and AI’s ChatGPT-4 and Bing. ChatGPT-4 exceeded 100% of all the psychologists, and Bing outperformed 50% of PhD holders and 90% of bachelor’s holders. The differences in SI between Google Bard and bachelor students were not significant, whereas the differences with PhDs were significant: 90% of PhD holders outperformed Google Bard.

  4. Wasm vs. Docker: Performant, Secure, and Versatile Containers
  5. Battery Arbitrage

    NYTimes: Since 2020, California has installed more giant batteries than anywhere in the world apart from China. They can soak up excess solar power during the day and store it for use when it gets dark.

    Those batteries play a pivotal role in California’s electric grid, partially replacing fossil fuels in the evening. Between 7 p.m. and 10 p.m. on April 30, for example, batteries supplied more than one-fifth of California’s electricity and, for a few minutes, pumped out 7,046 megawatts of electricity, akin to the output from seven large nuclear reactors.