r/accelerate 16d ago

News Altman says young people today are the luckiest ever: AI will send them to space for work

fortune.com
53 Upvotes

r/accelerate 17d ago

News Doom, Inc.: The well-funded global movement that wants you to fear AI - The Logic

thelogic.co
63 Upvotes

r/accelerate 17d ago

News AI will forever transform the doctor-patient relationship

archive.ph
61 Upvotes

r/accelerate 11d ago

News Reuters: 71% of people are concerned AI will replace their job

reuters.com
84 Upvotes

Disconcerting numbers.

  • 71% concerned AI will take job
  • 66% concerned AI will replace relationships
  • 61% concerned about AI increasing electricity consumption

Questions for the Community:

  • Do these percentages line up with what you’re hearing IRL?

  • Which fear (job loss, social isolation, or energy drain) will move the political needle fastest and shape regulation?

  • If public sentiment turns sharply negative, how does that affect accelerated deployment timelines?

r/accelerate 5d ago

News The Hill: "Companies have invested billions into AI, 95% getting zero return" | This is a wildly misleading headline. Explanation included.

70 Upvotes

This is a wildly misleading headline that completely misrepresents what the report (which the vast majority of people sharing this article haven't even read) actually showed.

In reality, the study used a very small sample of 52 organizations (they never said which ones, or how these organizations were selected).

They found that, over the 6-month period the study covered, 90% of the custom enterprise AI solutions failed to show a return. Meanwhile, they also found that 40% of the integrations of general LLM tools (ChatGPT, etc.) DID show a positive return and that, moreover, 90% of their employees were using AI tools every day and finding them helpful for their jobs.

r/accelerate 2d ago

News Wojciech Zaremba: "It’s rare for competitors to collaborate. Yet that’s exactly what OpenAI and @AnthropicAI just did—by testing each other’s models with our respective internal safety and alignment evaluations. Today, we’re publishing the results. Frontier AI companies will inevitably compete on

x.com
55 Upvotes

r/accelerate 5d ago

News Ezra Klein's NYT piece on GPT-5's responses and their implications

nytimes.com
67 Upvotes

From the Article:

"The knock on GPT-5 is that it nudges the frontier of A.I. capabilities forward rather than obliterates previous limits. I’m not here to argue otherwise. OpenAI has been releasing new models at such a relentless pace — the powerful o3 model came out four months ago — that it has cannibalized the shock we might have felt if there had been nothing between the 2023 release of GPT-4 and the 2025 release of GPT-5.

But GPT-5, at least for me, has been a leap in what it feels like to use an A.I. model. It reminds me of setting up thumbprint recognition on an iPhone: You keep lifting your thumb on and off the sensor, watching a bit more of the image fill in each time, until finally, with one last touch, you have a full thumbprint. GPT-5 feels like a thumbprint."

r/accelerate 8d ago

News OpenAI Teams Up with Retro Biosciences to Boost Longevity with Advanced Yamanaka Factors

x.com
58 Upvotes

Exciting news from OpenAI and Retro Biosciences! They’ve used AI (GPT-4b micro) to enhance Yamanaka factors, achieving a 50x boost in reprogramming efficiency to rewind cells to a youthful state, with improved DNA repair potential.

r/accelerate 7d ago

News Free veo generations this weekend only. Post your creations in this sub.

46 Upvotes

r/accelerate 4d ago

News Elon Musk's xAI secretly dropped its benefit corporation status while fighting OpenAI

cnbc.com
19 Upvotes

r/accelerate 11d ago

News Sam Altman admits OpenAI ‘totally screwed up’ its GPT-5 launch and says the company will spend trillions of dollars on data centers

fortune.com
48 Upvotes

r/accelerate 1d ago

News Daily AI Archive 8/28/2025

16 Upvotes
  • OpenAI launched a $50M People-First AI Fund to support U.S.-based nonprofits and community organizations, with applications open from Sept 8 to Oct 8, 2025. The grants aim to foster innovation and resilience, especially in areas like education, healthcare, and economic opportunity, with a focus on creative uses of AI. https://openai.com/index/supporting-nonprofit-and-community-innovation/
  • OpenAI GA’d the Realtime API and introduced gpt-realtime (speech-to-speech) with MCP server support, image input, SIP calling, reusable prompts, async function calls, context controls, and two new voices (Cedar, Marin); internal evals: Big Bench Audio 82.8%, MultiChallenge 30.5%, ComplexFuncBench 66.5%; pricing cut ~20% to $32/1M audio input tokens ($0.40 cached) and $64/1M audio output; EU data residency and safety guardrails. https://openai.com/index/introducing-gpt-realtime/
  • Anthropic is adding a revocable opt-in that lets chats and Claude Code from Free/Pro/Max accounts train new LMs and extends retention from 30 days to 5 years for opted-in sessions, applying only to new or resumed activity; Work, Gov, Education, and API traffic stay excluded. Users must pick a setting by September 28, 2025 to continue; you can change it anytime, and if you later turn it off, Anthropic stops using future data but cannot pull your data from models already trained or runs already underway. https://www.anthropic.com/news/updates-to-our-consumer-terms; https://www.anthropic.com/legal/non-user-privacy-policy
  • Microsoft released two in-house models: MAI-Voice-1, a high-fidelity, multi-speaker TTS that generates ~60 s of audio in <1 s on a single GPU, now powering Copilot Daily and Podcasts and available in Copilot Labs; and MAI-1-preview, an instruction-following MoE foundation LM trained end-to-end and post-trained across ~15,000 NVIDIA H100s, now live for public eval on LMArena, with limited API access for trusted testers and near-term Copilot text deployments. Voice-1 targets expressive narration and dialogue; the preview LM focuses on helpful, aligned responses, with rapid iteration planned through user feedback. MAI emphasizes a product strategy that orchestrates multiple specialized models, not a single monolith, mixing in-house, partner, and open-source systems. The org’s next-gen GB200 cluster is operational, signaling aggressive scaling beyond H100 and a pipeline for larger, faster updates. https://microsoft.ai/news/two-new-in-house-models/
  • xAI released grok-code-fast-1, a fast, low-cost reasoning LM for agentic coding, built from a new architecture with programming-heavy pretraining and post-training on real PRs, and it natively drives grep, terminal, and file edits in IDEs. Serving is tuned for low-latency tool loops with >90% prompt-cache hit rates in partner integrations, yielding a feel where dozens of tools fire before you finish the first paragraph of the thinking trace. It is strong across TS, Python, Java, Rust, C++, and Go, handling zero-to-one builds, codebase Q&A, and surgical bug fixes with minimal oversight. Availability: free for a limited time on GitHub Copilot, Cursor, Cline, Roo Code, Kilo Code, opencode, and Windsurf; API pricing is $0.20 per 1M input tokens, $1.50 per 1M output tokens, and $0.02 per 1M cached input tokens. Reported results include 70.8% on SWE-Bench-Verified via an internal harness, a stealth rollout as “sonic” with multiple checkpoints, and a near-term variant in training for multimodal inputs, parallel tool calling, and longer context; if these hold in real IDE loops, iteration time collapses and agentic coding trends toward default-grade automation. https://x.ai/news/grok-code-fast-1
  • AI2 released OLMoASR, a fully open ASR family (39M–1.5B params) trained from scratch on a curated 1M-hour dataset distilled from a 3M-hour pool, with every layer—data, filtering code, model weights, and evaluation—public. Across 21 unseen short- and long-form tests, the models match or nearly match Whisper’s zero-shot WER (e.g., OLMoASR-medium ≈ Whisper-medium; large-v2 closes the gap to ~0.4%), highlighting data curation as the main driver and providing a reproducible platform for ASR research. https://allenai.org/blog/olmoasr; models: https://huggingface.co/allenai/OLMoASR; code: https://github.com/allenai/OLMoASR
  • Apple (holy hell Apple releasing a PAPER?) | MobileCLIP2: Improving Multi-Modal Reinforced Training - MobileCLIP2 upgrades multi-modal reinforced training end to end: swap the base to DFN, replace OpenAI+DataComp teachers with a tuned DFN ensemble (ViT-L/14 + s39b) using per-teacher temperature for contrastive KD, pretrain CoCa on DFN-2B then fine-tune on MSCOCO-38k (plus ablate DOCCI/GBC/DCI) to boost caption diversity without hurting robustness, and pack the reinforced DFNDR datasets with 30 image augmentations and 5 captions per image so offline distillation stays compute-flat but 3.3–5× more sample-efficient than prior DataComp/DFN baselines and up to 1.7× at 13B seen. Architecture-wise, new 5-stage FastViT encoders (MCi3/4) shift heavy ops deeper to shrink latency at higher input resolutions and fill the speed/size gap between S2 and L; beam search and longer caption contexts bring no gain, while mixing captions from multiple captioners yields only additive but small improvements. Results: MobileCLIP2-S4 hits SigLIP-SO400M/14 zero-shot on IN-1k at half the parameters and outruns DFN ViT-L/14 at 2.5× lower latency; MobileCLIP2-B adds 2.2% IN-1k over MobileCLIP-B; S0/S2 set SoTA in the 3–7 ms regimes. Released code and scalable DR tooling make spinning new teacher ensembles and datasets trivial, pushing on-device VLM toward ubiquitous, low-latency intelligence without ceding accuracy. https://arxiv.org/abs/2508.20691; models: https://huggingface.co/collections/apple/mobileclip2-68ac947dcb035c54bcd20c47
  • StepFun released Step-Audio 2, a SoTA end-to-end audio LM that ingests raw speech and emits interleaved text+audio tokens, coupling a frozen 25 Hz encoder with a 2× adaptor to 12.5 Hz, a CosyVoice 2 tokenizer (+6.6k audio tokens), and a flow-matching detokenizer with HiFi-GAN; history is prefilled for streaming, and external tools include web, weather, time, and a large audio search for timbre/style retrieval. Training stacks 1.356T tokens over 21 days: 100B ASR to align the adaptor, then 128B text + 128B audio to embed audio tokens, then 800B mixed data spanning ASR, TTS, S2TT, S2ST, continuations, and speech conversation, then a 200B cooldown with multilingual ASR, paralinguistics, and synthetic dialogues across ~50k speakers. SFT adds 4B tokens over curated ASR, AudioSet/AudioCaps QA, detailed paralinguistic captioning, CoVoST2 and CVSS pairs, scripted tool-call dialogues, and conversation synthesis. RL sharpens reasoning via two-stage PPO that rewards concise thinking, then learned preference scoring, followed by 400-iteration GRPO; actor lr 1e−6, critic lr 2.5e−6, batch 64. Results: SoTA or parity on ASR, paralinguistics (StepEval-Audio-Paralinguistic), audio understanding (MMAU), zh↔en S2TT and S2ST, tool calling (StepEval-Audio-Toolcall), and URO-Bench speech conversation. Step-Audio 2 mini (8.32B, Apache 2.0), initialized from Qwen2.5-7B with the Qwen2-Audio encoder, reproduces most gains with only web tool support and is available with scripts for local and realtime demos. This design proves that fully interleaved token generation plus retrieval-equipped tooling and RL can unlock low-latency, expressive, knowledge-grounded voice agents that scale with data and crush legacy cascades. https://arxiv.org/abs/2507.16632; Models: https://huggingface.co/collections/stepfun-ai/step-audio-2-68b003c3a47b273fffaf67a8
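To make the grok-code-fast-1 pricing above concrete, here is a rough cost sketch. The three per-1M-token prices are the ones quoted in that item; the function name, its parameters, and the idea of applying a single cache hit rate to all input tokens are assumptions for illustration, not xAI's billing logic.

```python
# Prices per 1M tokens as quoted in the grok-code-fast-1 item (USD).
PRICE_INPUT = 0.20
PRICE_CACHED_INPUT = 0.02
PRICE_OUTPUT = 1.50


def request_cost(input_tokens: int, output_tokens: int, cache_hit_rate: float) -> float:
    """Estimate the USD cost of a workload.

    cache_hit_rate is the fraction of input tokens served from the prompt
    cache. It is a hypothetical knob; 0.9 mirrors the ">90%" figure quoted
    for partner integrations.
    """
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    cached = input_tokens * cache_hit_rate
    uncached = input_tokens - cached
    total = (
        uncached * PRICE_INPUT
        + cached * PRICE_CACHED_INPUT
        + output_tokens * PRICE_OUTPUT
    )
    return total / 1_000_000


if __name__ == "__main__":
    # 1M input tokens and 100k output tokens, with and without caching.
    print(request_cost(1_000_000, 100_000, cache_hit_rate=0.9))
    print(request_cost(1_000_000, 100_000, cache_hit_rate=0.0))
```

Under these assumptions, the example workload comes to roughly $0.19 with a 90% cache hit rate versus $0.35 with no caching, so output tokens dominate the bill once the cache is warm.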

let me know if I missed anything
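The OLMoASR item compares models by word error rate (WER). For readers who have not computed it before, the sketch below shows the usual definition: word-level edit distance (substitutions, insertions, deletions) divided by the number of reference words. It splits on whitespace only and skips the text normalization that real ASR evaluations apply, so it is an illustration rather than the evaluation code AI2 released.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors / 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```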

r/accelerate 3d ago

News Daily AI Archive 8/26/2025

21 Upvotes
  • Google has released gemini-2.5-flash-image-preview (codename: nano-banana) after lots of teasing with bananas on Twitter, and it's insanely good. It has pixel-perfect editing, and since it's a native model, it's really smart too, unlike most other image editing models. However, it does have some flaws compared to GPT-4o. For example, it's watermarked, which is super annoying, it can’t make transparent images, it doesn't know as many concepts, it's super low resolution, and it pretty much requires reference images. It's super censored (yes, even compared to GPT-4o, which is already really censored), but it's super FAST and has the best consistency I’ve ever seen. So if pixel-perfect consistency is important for your use case, definitely use this. It's amazing for that, absolutely no competition. If not, GPT-4o is probably still better. https://x.com/googleaistudio/status/1960344388560904213; https://blog.google/products/gemini/updated-image-editing-model/
  • Anthropic says educators are adopting AI tools like Claude primarily for curriculum development, research support, and administrative tasks, often using AI as a collaborator rather than full automation. However, grading remains contentious: nearly half of grading-related uses show heavy automation despite faculty viewing it as AI’s least effective and most ethically fraught application. https://www.anthropic.com/news/anthropic-education-report-how-educators-use-claude
  • AI2 launches Asta, a full-stack scientific agent ecosystem spanning agentic research assistants, AstaBench, and Asta resources, engineered for transparent, reproducible, cost-aware science: agents plan, execute, iterate, and cite every claim; AstaBench standardizes evaluation across 2,400+ problems in literature, code+execution, data analysis, and end-to-end discovery, reports Pareto frontiers over accuracy vs compute cost, enforces date-restricted retrieval on a 200M+ paper corpus, and runs in an Inspect-powered environment with agent-eval for time-invariant pricing and traceable logs; initial tests of 57 agents across 22 architectures show only 18 handle all tasks, with Asta v0 (mixture-of-LMs routed to 5 specialist helpers using claude-sonnet-4, gemini-2.0-flash, o3, gpt-4.1, gpt-4o) at 53%, ~10 points above ReAct-gpt-5, while cheap ReAct-claude-3-5-haiku hits 20% at $0.03 per problem and ReAct-gpt-5-mini reaches 31% at $0.04, revealing steep cost-accuracy tradeoffs; data analysis is hardest (<34%), literature understanding is most mature, Asta Paper Finder and Scholar QA lead search and QA, and model-agent interactions are nontrivial, with open-weight models far behind and gpt-5 seemingly tuned for ReAct control; Asta resources ships open agents, post-trained science LMs, the Scientific Corpus Tool exposing dense and sparse search plus graph-walking via MCP, and a sandboxed Computational Notebook, with upcoming skills for experiment replication, hypothesis generation, and scientific programming; net effect is a rigorous, open, production-grade substrate to compress the science loop from question to verified insight while making capability and cost legible, accelerating the removal of human-only research bottlenecks. https://allenai.org/blog/asta; https://allenai.org/blog/astabench; https://huggingface.co/spaces/allenai/asta-bench-leaderboard; https://www.datocms-assets.com/64837/1756213171-astabench-16.pdf
  • Qwen released Wan2.2-S2V-14B, which converts audio plus a single reference image into cinematic human video by training a 14B DiT-based S2V model with Flow Matching on 3D-VAE latents, injecting audio using Wav2Vec with learnable layer fusion, causal temporal compression, and per-frame segment attention to visual tokens, which preserves tight lip sync and expressive micro-gestures without the cost of full 3D cross-attention; long-horizon stability comes from Motion Frames and FramePack, which compresses older context more aggressively so more history conditions each clip, maintaining identity, motion direction, and camera continuity across segments; prompts steer global scene and camera while audio controls local expressions and limb dynamics, with optional pose_video for explicit choreography; data is built via human-centric mining and rigorous filtering, including pose tracking (ViTPose→DWPose), clarity and motion scoring, face/hand sharpness checks, aesthetic ranking, subtitle-occlusion OCR, active-speaker verification (Light-ASD), and dense motion-centric captions from Qwen-VL2.5-72B; training uses hybrid parallelism, combining FSDP sharding with Context Parallelism (RingAttention+Ulysses) on 8×80GB, cutting iteration time from ~100 s to ~12 s, supporting variable-length tokens and up to 48 frames at 1024×768 through a staged schedule from audio-encoder pretrain to SFT; results surpass OmniHuman and Hunyuan-Avatar on identity consistency under large motion and reach SOTA on frame and video quality with strong sync and identity metrics, while specialized baselines may retain advantages on certain hand-motion statistics; inference supports 480p or 720p, automatic length by audio, num_clip for previews, and pose+audio drives for precise edits and long-form continuity, making S2V a practical route from raw audio to studio-grade sequences.
If these claims hold under open replication, S2V compresses the pipeline for audio-driven, multi-shot, cinema-consistent character video and accelerates end-to-end automated content production. https://huggingface.co/Wan-AI/Wan2.2-S2V-14B; paper: https://humanaigc.github.io/wan-s2v-webpage/content/wan-s2v.pdf
  • Helping people when they need it most - OpenAI are planning to broaden interventions beyond self-harm, adding reality-grounding for risky states (e.g., mania), making safeguards persistent across long/multi-session chats, tightening classifiers, and localizing resources with one-click emergency access. They aim to connect people earlier to human help via direct access to licensed therapists and one-click outreach to trusted contacts, with an opt-in for the assistant to notify a designated person in severe cases. For teens, they’ll add age-aware guardrails and parental controls and allow a teen-designated emergency contact; these upgrades are supported by GPT-5’s “safe completions.” https://openai.com/index/helping-people-when-they-need-it-most/
  • Google Translate is adding Gemini-powered real-time live conversation translation in 70+ languages (available today in the U.S., India, and Mexico) and a customizable speaking/listening practice beta that adapts to skill level (initially for English speakers learning Spanish/French and for Spanish, French, and Portuguese speakers learning English), with improvements to quality, multimodal translation, and TTS. Basically, Google Translate is Duolingo now, I guess, which is cool. https://blog.google/products/translate/language-learning-live-translate/
  • You can now customize the emoji in your NotebookLM notebooks… cool… I guess? https://x.com/NotebookLM/status/1960430881203712472
  • OpenAI has made some improvements to the Responses API: (1) domain filtering to focus on specific sources; (2) source reporting; (3) pricing of $10/1K calls (down from $25, which is pretty huge actually). https://x.com/OpenAIDevs/status/1960425260576334274
  • Nous Research released Hermes 4 today (the technical report came out yesterday but was only announced today). Hermes 4 is a family of open-weight hybrid reasoner LMs with structured multi-step reasoning and strong instruction following; all weights are public. It trains on ~5M samples (19B tokens) combining 3.5M reasoning with 1.6M non-reasoning items, enabling ~16k-token thinking traces. DataForge generates tasks via random walks on a PDDL-style DAG of struct→struct nodes; seed data is deduped by ModernBERT at 0.7 cosine and filtered by an LM judge. Verified trajectories are built by rejection sampling against ~1k task verifiers in Atropos, with environments for strict answer-formatting, dynamic JSON schema validation, and interleaved tool use inside <think>. Training initializes from Llama 3.1 405B/70B and Qwen3 14B on modified TorchTitan; First-Fit Decreasing pre-packing and Flex Attention isolate per-sample attention, loss applies only to assistant tokens; runs use 192 B200s with a cosine schedule and 9k steps. Overlong reasoning is controlled by a second SFT that forces </think> at 30k tokens while masking everything except </think> and <eos>, teaching a counting policy that cuts length with minor accuracy tradeoffs. A single OpenAI-compatible endpoint standardizes lighteval and Atropos evals, and behavior shows frontier-level math/code with fewer refusals on RefusalBench plus higher contextual fidelity than peers. TL;DR: it's not SoTA on intelligence, but it's highly uncensored and good at creative writing and following instructions. Kinda disappointing that they based it on Llama 3 instead of Qwen 3, which would have been way better. Models and paper: https://huggingface.co/collections/NousResearch/hermes-4-collection-68a731bfd452e20816725728; evals: https://huggingface.co/collections/NousResearch/hermes-4-evaluations-68a72e80ad150b5dcf7586b6
  • Anthropic is testing a Claude extension for Chrome that lets Claude take actions in the browser with 1,000 Max plan users. Early experiments showed vulnerabilities to prompt injection attacks, but new safeguards such as permissions, confirmations, blocked sites, and classifiers reduced attack success rates from 23.6% to 11.2% and some browser-specific attacks to 0%. The research preview seeks real-world feedback to refine defenses before wider release, with testers advised to avoid sensitive use cases. https://www.anthropic.com/news/claude-for-chrome
  • New OpenAI Codex update 0.24.0: added message queuing, image copy/paste & drag-drop, transcript mode, resume/edit conversations, and explicit web search. TUI improvements include hiding CoT, better diff display, simpler command approval, unified interrupt handling, and a PowerShell paste fix. Tooling changes add support for long-running commands, more reliable patching, capped retries, and better caching. Misc updates cover GPT-5 verbosity config, improved git/agents handling, and clearer error messages. https://github.com/openai/codex/releases/tag/rust-v0.24.0
  • OpenAI has clarified that political content aimed at broad or unspecified audiences is now allowed, so long as it is not manipulative toward a specific group or individual, and general persuasive political content is also permitted under the same condition. They explicitly declined to allow tailored or individualized political content because of risks around manipulation, and while they acknowledge broad support for erotica for consenting adults, they are deferring it until they can address safety and deployment concerns. Looking ahead, they plan to revisit erotica with the goal of enabling it responsibly, maintain a cautious stance on political personalization, and explore offering multiple sets of default model behaviors that reflect different value systems rather than a single universal default. TL;DR: lots of people want erotic content for ChatGPT, and OpenAI said they aren't opposed to it, but they want to take more time to make sure they can make it safe, so ChatGPT may get an erotic mode in the possibly near future. https://openai.com/index/collective-alignment-aug-2025-updates/
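The Hermes 4 item mentions First-Fit Decreasing pre-packing of training samples. As a rough illustration of that bin-packing heuristic only (the function name and the toy lengths are hypothetical, and this is not Nous Research's implementation): sort sample lengths from longest to shortest, then drop each one into the first packed sequence that still has room, opening a new one when none does.

```python
def first_fit_decreasing(lengths: list[int], capacity: int) -> list[list[int]]:
    """Pack sample lengths into bins of at most `capacity` tokens each."""
    bins: list[list[int]] = []  # sample lengths placed in each packed sequence
    free: list[int] = []        # remaining room in each packed sequence

    for length in sorted(lengths, reverse=True):
        if length > capacity:
            raise ValueError(f"sample of length {length} exceeds capacity {capacity}")
        for k, room in enumerate(free):
            if length <= room:
                bins[k].append(length)
                free[k] -= length
                break
        else:
            # No existing bin had room, so open a new one.
            bins.append([length])
            free.append(capacity - length)
    return bins


if __name__ == "__main__":
    # Toy sample lengths packed into sequences of at most 10 tokens.
    print(first_fit_decreasing([5, 7, 3, 2, 8, 1], capacity=10))
```

With the toy input, the six samples fit into three full sequences, which is the point of pre-packing: fewer padding tokens per batch. Keeping attention from leaking between samples that share a packed sequence is a separate step (the item attributes that to Flex Attention) and is not shown here.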

pretty big day, but let me know if I missed anything else to make it even bigger!
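The AstaBench item says results are reported as Pareto frontiers over accuracy versus compute cost. A minimal sketch of what that means: an agent is on the frontier if no other agent is both at least as cheap and more accurate. The first two data points below are the ones quoted in the item; the two `hypothetical-*` entries are made up purely to show a dominated point and a pricier, more accurate one.

```python
def pareto_frontier(points: list[tuple[str, float, float]]) -> list[tuple[str, float, float]]:
    """Return the (name, cost, accuracy) entries not dominated by any other.

    Entries are scanned from cheapest to most expensive; one is kept only if
    it is more accurate than everything at the same or lower cost.
    """
    frontier = []
    best_accuracy = float("-inf")
    for name, cost, accuracy in sorted(points, key=lambda p: (p[1], -p[2])):
        if accuracy > best_accuracy:
            frontier.append((name, cost, accuracy))
            best_accuracy = accuracy
    return frontier


if __name__ == "__main__":
    agents = [
        ("ReAct-claude-3-5-haiku", 0.03, 20.0),  # quoted: 20% at $0.03 per problem
        ("ReAct-gpt-5-mini", 0.04, 31.0),        # quoted: 31% at $0.04 per problem
        ("hypothetical-a", 0.05, 25.0),          # made up: dominated by gpt-5-mini
        ("hypothetical-b", 0.50, 40.0),          # made up: costlier but more accurate
    ]
    for name, cost, accuracy in pareto_frontier(agents):
        print(f"{name}: {accuracy}% at ${cost:.2f}")
```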

r/accelerate 2d ago

News Daily AI Archive 8/27/2025

7 Upvotes
  • Anthropic paper | Detecting and countering misuse of AI: August 2025 - Agentic LMs now execute full-spectrum intrusion and fraud: a vibe hacking crew ran Claude Code with a persistent CLAUDE.md to encode TTPs, automate OSINT targeting, scan VPNs, enumerate AD, steal creds, move laterally, build evasion malware (obfuscated Chisel, new TCP proxies masked as MSBuild.exe), exfiltrate data, price ransoms, and drop boot-embedded HTML notes; NK operators simulate competence to pass interviews and ship daily work; a UK no-code RaaS ships ChaCha20+RSA with FreshyCalls/RecycledGate and shadow copy wipes; a China actor spans 12 ATT&CK tactics; AI now powers MCP stealer-log profiling, carding stores, romance bots, and synthetic IDs. Mitigations include bans, tailored classifiers, malware-gen detection, and IOC sharing, but the skill curve is collapsing to zero, so defense must field autonomous, continuously learning counter-agents at internet scale. https://www.anthropic.com/news/detecting-countering-misuse-aug-2025; https://www-cdn.anthropic.com/b2a76c6f6992465c09a6f2fce282f6c0cea8c200.pdf
  • Anthropic launched a National Security Advisory Council with 11 senior U.S. natsec leaders to shape AI use in defense, intelligence, and science, tied to Claude Gov models, a $200M DoD deal, 10k LLNL users, NNSA safeguards, $1 gov access, and joint model stress-testing for bio, cyber, and R&D risks. https://www.anthropic.com/news/introducing-the-anthropic-national-security-and-public-sector-advisory-council
  • Google has integrated Gemini CLI into the Zed code editor, allowing developers to generate, refactor, and review code with AI directly in their IDE while maintaining full control. https://developers.googleblog.com/en/gemini-cli-is-now-integrated-into-zed/
  • OpenAI + Anthropic ran cross-lab safety tests on each other’s public models. Claude 4 excelled at instruction hierarchy + prompt-extraction but was weaker on jailbreaks and often refused answers in hallucination tests; OpenAI o3/o4-mini resisted jailbreaks better, answered more, but hallucinated more; GPT-4o/4.1 were more jailbreak-prone yet sometimes best at person-hallucination accuracy. Scheming results were mixed across labs; reasoning sometimes helped, sometimes worsened. OpenAI says GPT-5 improved sycophancy, hallucinations, and misuse resistance; cross-lab testing surfaced useful gaps, showing value of ongoing joint safety evals. https://openai.com/index/openai-anthropic-safety-evaluation/
  • You will soon be able to branch conversations in ChatGPT, splitting off a new conversation after any response. https://x.com/btibor91/status/1960623245956411548
  • OpenAI has open-sourced its HealthBench benchmark under the MIT license on Hugging Face today. https://huggingface.co/datasets/openai/healthbench
  • PixVerse has released V5 of its video generation model. It places 2nd on I2V and 3rd on T2V on Artificial Analysis, above Veo 3 in both cases but slightly behind SeeDance 1.0. The upside is that it's significantly cheaper than Veo 3 and even cheaper than SeeDance, which gives it an amazing price-to-performance ratio. https://x.com/PixVerse_/status/1960730919993799024
  • OpenAI released big Codex updates: https://help.openai.com/en/articles/6825453-chatgpt-release-notes#h_dcaac4ec67
    • IDE Extension: The new extension brings Codex into VS Code, Cursor, and other VS Code forks, so that you can seamlessly preview local changes and edit code
    • Sign in with ChatGPT: Available in both the IDE and CLI, eliminating API key setup and providing access directly through your existing ChatGPT plan
    • Seamless Local ↔ Cloud Handoff: Developers can pair with Codex locally and then delegate tasks to the cloud to execute asynchronously without losing state
    • Upgraded Codex CLI: Refreshed UI, new commands, and bug fixes
    • Code reviews in GitHub: Set up Codex to automatically review new PRs in a repo, or mention @codex in PRs to get reviews and suggested fixes
  • Prime Intellect launched the Environments Hub, an open community platform for creating, sharing, and scaling RL environments to advance open-source AGI. The hub, along with their open-source RL infrastructure (prime-rl), aims to lower barriers to training and serving large agentic models by providing accessible compute, tools, and RFT. They also released SYNTHETIC-2, a planetary-scale dataset of four million verified reasoning traces, and introduced the Prime Collective Communications Library (PCCL) for decentralized global training. https://www.primeintellect.ai/blog/environments
  • Kimi released a new text-to-slides feature. Pretty self-explanatory, but cool, and free of course. https://x.com/crystalsssup/status/1960912750068273186
  • Tencent released HunyuanVideo-Foley, which builds a TV2A stack that fixes data scarcity, modality imbalance, and mediocre audio by scaling a 100k-hour pipeline (8 s chunking, silence/SNR/bandwidth filters, AudioBox-aesthetics gating, ImageBind/AV-align checks, GenAU captions), then training a flow-matching hybrid with N1 dual-stream MMDiT blocks and N2 audio-only DiT blocks modulated by Synchformer sync features and interleaved RoPE for frame-level A/V coupling; text enters later via cross-attention to prevent text dominance. A REPA loss aligns mid-layer DiT states to ATST-Frame features through cosine similarity, stabilizing training and boosting fidelity; an enhanced DAC-VAE swaps RVQ for continuous 128-dim, 50 Hz latents at 48 kHz to improve reconstruction. Trained at scale (18 MMDiT + 36 DiT, d=1536, 12 heads, CFG 0.1), it lands SoTA on audio quality, visual-semantic alignment, and sync on Kling-Audio-Eval and MovieGen-Audio-Bench, with VGGSound distribution gaps likely due to its low-grade audio. Ablations show joint A/V self-attention followed by text cross-attention, interleaved RoPE, and shallow-layer REPA on the unimodal branch (ATST > EAT, EAT+ATST harmful) drive the gains. If reproducibility holds, this is a serious step toward fully automatic, pro-grade Foley for any video stream, compressing human post-production into a programmable primitive. https://huggingface.co/tencent/HunyuanVideo-Foley; paper: https://arxiv.org/abs/2508.16930; code: https://github.com/Tencent-Hunyuan/HunyuanVideo-Foley
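The HunyuanVideo-Foley item describes a REPA loss that aligns mid-layer DiT states to ATST-Frame features through cosine similarity. The general shape of such an alignment term can be sketched as the negative mean cosine similarity between per-frame hidden states and the frozen target features. This is a plain-Python illustration under that assumption: the learned projection that normally maps hidden states into the target feature space is omitted, and the paper's exact formulation may differ.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def alignment_loss(hidden_states: list[list[float]], target_features: list[list[float]]) -> float:
    """Negative mean per-frame cosine similarity (lower means better aligned).

    hidden_states: one vector per frame from an intermediate model layer,
        assumed here to already live in the target feature space.
    target_features: one vector per frame from a frozen audio encoder.
    """
    if not hidden_states or len(hidden_states) != len(target_features):
        raise ValueError("need the same non-zero number of frames on both sides")
    sims = [cosine_similarity(h, f) for h, f in zip(hidden_states, target_features)]
    return -sum(sims) / len(sims)


if __name__ == "__main__":
    # Targets are scaled copies of the hidden states, so the loss is at its minimum.
    print(alignment_loss([[1.0, 2.0], [3.0, 4.0]], [[2.0, 4.0], [6.0, 8.0]]))
```

Because cosine similarity ignores vector magnitude, the term only pushes the direction of each hidden state toward the encoder's feature, which is one reason it can be added alongside the main generative objective.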

let me know if I missed anything

r/accelerate 14d ago

News DeepSeek’s next AI model delayed by attempt to use Chinese chips | "DeepSeek was encouraged by authorities to adopt Huawei’s Ascend processor rather than use Nvidia...after R1"

archive.ph
24 Upvotes

r/accelerate 4d ago

News Daily AI Archive 8/25/2025

11 Upvotes
  • OpenAI launched the Learning Accelerator in India, partnering with IIT Madras, AICTE, and the Ministry of Education to expand AI research, training, and access—distributing ~500,000 ChatGPT licenses and AI training programs nationwide. The initiative includes study tools like ChatGPT’s study mode, an India-specific subscription Go tier, enhanced Indic language support, and leadership under Raghav Gupta to advance AI-enabled education across India and Asia Pacific. https://openai.com/global-affairs/learning-accelerator/
  • Video Overviews are now available in 80 languages globally, and they upgraded all Audio Overviews to be more comprehensive and in-depth. Non-English Audio Overviews will now mirror the rich, detailed experience of the English version https://blog.google/technology/google-labs/notebook-lm-audio-video-overviews-more-languages-longer-content/
  • InternVL3.5 was released today in like 50 billion sizes. It's an open multimodal family that meaningfully scales both reasoning and throughput via two concrete systems: Cascade RL and ViR+DvD. Cascade RL runs a coarse-to-fine post-training loop, first doing offline MPO (preference+DPO, quality+BCO, and LM generation terms) to warm-start then online GSPO without a reference model using normalized per-query advantages, yielding large reasoning gains versus InternVL3 with far less GPU time. ViR chooses per-patch visual token budgets (256 or 64) and is trained by ViCO: a consistency stage distills outputs from a frozen 256-token reference using KL at compression rates 1/4 and 1/16, then a router stage learns binary decisions from a loss ratio r_i≥τ to keep or compress, cutting visual tokens by roughly 50% with near-uniform accuracy retention. Decoupled vision-language deployment (DvD) splits ViT+MLP(+ViR) and the LM across servers, ships BF16 features over TCP or RDMA, and pipelines vision processing, transfer, and LM prefilling/decoding asynchronously, eliminating cross-blocking and pushing multimodal prefilling toward LM-only speeds. Training uses CPT→SFT→CascadeRL with NTP on response tokens, square-root averaging to de-bias length, JPEG perturbation, 32K context, and curated thinking data built by InternVL3-78B descriptions fed to DeepSeek-R1 with incorrect rollouts filtered; capability data add GUI and embodied skills. Test-time scaling exposes explicit deep thinking (a system prompt toggles stepwise reasoning with do_sample and temperature 0.6) and breadth via Best-of-N using a VisualPRM critic; authors report using TTS only for reasoning since perception already saturates. Models span 1B to 241B (Qwen3 and GPT-OSS LMs, InternViT encoders, dynamic high-res tiling) with practical deployment notes (30B fits on one A100, 38B needs two, 241B uses eight, vLLM recommended for 20B). 
Results claim open-source SoTA across general, reasoning, text, and agentic suites, with the 241B variant approaching top closed models. If these engineering choices replicate externally, InternVL3.5 materially lowers the cost of high-accuracy multimodal reasoning at scale while expanding agentic capability, accelerating open-source parity. You can get the 33 (!!!) models here: https://huggingface.co/collections/OpenGVLab/internvl35-68ac87bd52ebe953485927fb
  • Microsoft released VibeVoice, which scales long-form, multi-speaker TTS by pairing an LM with a token-level diffusion head that predicts continuous acoustic VAE features per token, driven by a hybrid context of role-tagged text and voice prompts, then decoded by a 7.5 Hz σ-VAE acoustic tokenizer that compresses 3200× while preserving fidelity and a ≈2:1 speech-to-text token ratio, with a parallel ASR-trained semantic tokenizer aligning content to text. Training freezes both tokenizers and learns only the LM and diffusion head, uses Qwen2.5 at 1.5B and 7B, a sequence-length curriculum from 4,096 to 65,536, CFG 1.3, and DPM-Solver++ in 10 steps; inference streams segments up to 90 minutes within a 64K context with up to 4 speakers, capturing the conversational “vibe.” Subjective MOS shows SoTA preference, realism, and richness versus Eleven v3 alpha and Gemini 2.5 Pro preview TTS, with strong WER and speaker similarity, and the ultra-low-frame-rate tokenizer achieves leading PESQ and UTMOS despite extreme compression; short-utterance tests generalize well with fewer decoding steps. Caveats: compact human eval set, closed baselines lack prompt control, English and Chinese transcripts only, no overlapping speech, no non-speech audio, and deepfake risk; code, models, and demos are released for research. This architecture shows that next-token diffusion plus ultra-efficient speech tokens unlocks hour-scale, controllable conversational audio, accelerating the path to fully multimodal agents that speak, remember, and coordinate in real time. https://huggingface.co/microsoft/VibeVoice-1.5B; tokenizer: https://huggingface.co/microsoft/VibeVoice-Tokenizer
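
Quick sanity check on the VibeVoice numbers above, since they're easy to verify by hand. This is only back-of-the-envelope arithmetic on the figures quoted in the bullet (7.5 Hz, 3200×, 90 minutes, 65,536 tokens, ≈2:1). Two assumptions that are mine and not from the paper: the 3200× compression means 3200 audio samples per acoustic frame, and the ≈2:1 ratio means the transcript costs about half as many tokens as the speech while sharing the same context window.

```python
# Figures quoted in the VibeVoice bullet above.
FRAME_RATE_HZ = 7.5        # acoustic tokenizer frame rate
COMPRESSION = 3200         # assumed: audio samples folded into one frame
CONTEXT_TOKENS = 65_536    # longest sequence length in the curriculum ("64K")
MAX_MINUTES = 90           # longest streamed segment
SPEECH_PER_TEXT = 2        # the quoted ~2:1 speech-to-text token ratio


def implied_sample_rate(frame_rate_hz, compression):
    """Sample rate implied by the frame rate and samples-per-frame."""
    return frame_rate_hz * compression


def acoustic_frames(minutes, frame_rate_hz):
    """Number of acoustic frames needed to cover `minutes` of audio."""
    return round(minutes * 60 * frame_rate_hz)


sample_rate = implied_sample_rate(FRAME_RATE_HZ, COMPRESSION)
speech_tokens = acoustic_frames(MAX_MINUTES, FRAME_RATE_HZ)
# Assumed reading of the 2:1 ratio: text tokens are half the speech tokens.
text_tokens = speech_tokens // SPEECH_PER_TEXT
total_tokens = speech_tokens + text_tokens

print(f"implied sample rate: {sample_rate:.0f} Hz")
print(f"speech tokens for {MAX_MINUTES} min: {speech_tokens}")
print(f"speech + text tokens: {total_tokens} of {CONTEXT_TOKENS}")
```

Under those assumptions the 90-minute claim is at least self-consistent: speech plus transcript lands under the 65,536-token curriculum ceiling, with a few thousand tokens left for voice prompts and role tags.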

Let me know if I missed any news

r/accelerate 7d ago

News Daily AI Archive 8/22/2025

12 Upvotes
  • Kimi-k2-turbo-preview got another speed boost now at 60T/s https://x.com/Kimi_Moonshot/status/1958810602027327616
  • OpenAI announced plans for an OpenAI office in New Delhi opening later this year https://x.com/sama/status/1958922390731464805
  • GPT-4b micro, a scaled-down GPT-4o LM specialized for protein engineering, was trained on protein sequences plus biological text and tokenized 3D structure enriched with coevolutionary homologs, interaction groups, and descriptive context, enabling 64k-token controllable prompts and strong handling of intrinsically disordered Yamanaka targets. Prompted to generate diverse RetroSOX and RetroKLF sequences, it delivered high-hit, deeply edited designs (>100 aa average changes) that beat wild-type in human fibroblast screens, with >30% of SOX2 and ~50% of KLF4 suggestions outperforming baseline, far above conventional few-residue screens (<10% hits) and prior expert KLF4 attempts (1/19). Combining top variants produced >50× expression of reprogramming markers, earlier appearance of late markers (TRA-1-60, NANOG) by several days, and AP-positive colonies by day 10. Switching to mRNA delivery in mesenchymal stromal cells from three donors yielded >30% marker expression by day 7, colonies by day 12, and >85% endogenous activation of OCT4, NANOG, SOX2, and TRA-1-60, with derived iPSC lines showing trilineage differentiation, healthy karyotypes, and genomic stability surpassing contract-research baselines. γ-H2AX after doxorubicin stress dropped significantly for RetroSOX/KLF versus OSKM and GFP controls (p from 0.01 to 0.04), indicating stronger DNA damage repair and heightened rejuvenation potential. Development showed clean scaling laws on perplexity and downstream protein tasks, but the key is translation: promptable sequence generation, wide and deep edits, tight wet-lab screening, replication across donors, cell types, and delivery modes. This is a working blueprint for domain LMs coupled to fast experimentation that collapses protein search timelines and accelerates reprogramming and rejuvenation therapies. https://openai.com/index/accelerating-life-sciences-research-with-retro-biosciences/
  • Filter CBRN info at the source: score every pretraining document with a harmfulness classifier, drop those above a threshold, then pretrain from scratch on the filtered pool while size-matching the controls; measure deltas on WMDP and on usefulness spans (MMLU Natural Science, general MMLU, Prose, Code, Math) with a unified relative score so lower means safer and higher means preserved capability. Six detectors were tested: a small finetuned Constitutional LM, two prompted Constitutional variants with Claude 3.5 Sonnet, holdout loss via canary LMs (finetuned and prompted), FastText, and named entity string match. Prompted Sonnet tops synthetic F1 at 0.96 and the small finetuned model reaches 0.94, but cost constraints mandate scanning the corpus with the small finetuned model; hybrid reranking of its top 1 percent with the prompted classifier unexpectedly raised CBRN scores, likely due to distribution shift and prompting brittleness, so the best tradeoff uses the finetuned classifier alone, with named entity merging offering no consistent gain. Threshold sweeps show benign performance stable across Prose, Code, general MMLU and often Natural Science, while Math is noisy; at threshold 0.939 accuracy on harmful evals drops 33 percent relative to the margin above the random baseline (33.7±0.4 to 30.8±0.4 with random at 25, i.e. 2.9 of the 8.7 points above chance) with no significant loss on the benign suite. https://alignment.anthropic.com/2025/pretraining-data-filtering/
  • AllenAI has released an open source version of Paper Finder https://x.com/allen_ai/status/1958560139159486692; code: https://github.com/allenai/asta-paper-finder
  • ChatGPT now has project-only memory, which is important since GPT-5 is pretty sensitive to memories, so this is big for keeping them separate and getting more performance https://help.openai.com/en/articles/6825453-ChatGPT-release-notes#h_fb3ac52750
  • Codex in ChatGPT has a new, currently hidden settings section enabling Codex to "Auto-review my pull requests" ("Allow Codex to run a code review on your initial PRs") which is pretty huge for autonomous coding https://x.com/btibor91/status/1959028131903545841
  • Google released Veo 3 on the FREE tier of Gemini (albeit only for this weekend which kinda sucks) https://x.com/GeminiApp/status/1959035394483503581
  • OpenAI increased Codex CLI Plus limits by 50% and also said, quote, “More transparency coming next week as things settle.” I wonder what that means https://x.com/embirico/status/1959057942445269141
  • Meta partnered with Midjourney, but considering how pathetically behind both Meta and Midjourney are, it seems kinda strange, like a last-ditch effort from both companies https://x.com/alexandr_wang/status/1958983843169673367
  • Jules now intelligently renders images within the diff viewer, providing an immediate visual context for your modifications. https://jules.google/docs/changelog/#render-images-in-the-diff-viewer
  • Sakana AI | Competition and Attraction Improve Model Fusion - M2N2 replaces fixed merge boundaries with evolutionary split points and SLERP mixing inside a live archive: pick parent A by an implicit fitness sharing objective that caps per-sample reward c/(z+ε), pick parent B by an attraction score g that prefers models that excel where A fails and where competition is low, then fuse by concatenating SLERP-interpolated parameter slices before and after a sampled split index; diversity emerges from resource competition tuned by α in f = ∑ s/(z^α+ε)·c, and coverage remains high while entropy rises then tapers as weak niches die; archive size trades early speed for final quality; warmup performs random merges; no gradients, low memory, cross-objective compatible. From scratch on MNIST, M2N2 outperforms other merge-based search and is more compute-efficient than CMA-ES; on LM fusion (WizardMath-7B + AgentEvol-7B), split-point and attraction materially beat GA, MAP-Elites, and CMA-ES, yielding stronger average across GSM8k and WebShop while maintaining coverage; for diffusion, it merges only U-Nets from JSDXL and SDXL-family seeds, keeps JSDXL tokenizer/text encoder, treats attention blocks as independently splittable chromosomes, trains with Normalized CLIP Similarity using per-sample worst subtraction to intensify competition, attains SoTA FID and NCS against seeds and CMA-ES, and preserves bilingual semantics with superior cross-lingual consistency without catastrophic forgetting. Limitation: mergeability collapses when seeds diverge too far, motivating compatibility metrics and attraction-aware co-evolution. This is how we accelerate model recombination at scale: gradient-free fusion that composes specialized skills into composite systems, turning the open model zoo into an ever-faster recombinatorial search engine. https://arxiv.org/abs/2508.16204; code: https://github.com/SakanaAI/natural_niches 
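
For anyone who wants to see the M2N2 merge step from the Sakana bullet in concrete form, here is a minimal toy sketch on flat parameter lists. It is not the code from the linked repo. The function names are made up for illustration, mixing with t before the split and 1 - t after it is an assumption, and `shared_fitness` is just one reading of the f = ∑ s/(z^α+ε)·c formula, where z is the population's total score on a sample and c is that sample's capacity. The attraction score g isn't specified in enough detail above, so it's left out.

```python
import math
import random


def slerp(a, b, t):
    """Spherical interpolation between two equal-length parameter slices."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Angle is undefined for a zero (or empty) slice: mix linearly instead.
        return [(1 - t) * x + t * y for x, y in zip(a, b)]
    cos_omega = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    omega = math.acos(max(-1.0, min(1.0, cos_omega)))
    sin_omega = math.sin(omega)
    if abs(sin_omega) < 1e-8:
        # Nearly parallel or opposite slices: mix linearly for stability.
        return [(1 - t) * x + t * y for x, y in zip(a, b)]
    w_a = math.sin((1 - t) * omega) / sin_omega
    w_b = math.sin(t * omega) / sin_omega
    return [w_a * x + w_b * y for x, y in zip(a, b)]


def split_point_merge(parent_a, parent_b, split, t):
    """Concatenate SLERP-mixed slices before and after the split index.

    Assumption: weight t before the split and 1 - t after it.
    """
    head = slerp(parent_a[:split], parent_b[:split], t)
    tail = slerp(parent_a[split:], parent_b[split:], 1 - t)
    return head + tail


def shared_fitness(scores, capacity, alpha=1.0, eps=1e-8):
    """Fitness sharing: scores[i][j] is model i's score on sample j.

    Each sample's reward is divided by the population's total score on it,
    so crowded samples pay less than ones few models solve.
    """
    n_samples = len(capacity)
    z = [sum(row[j] for row in scores) for j in range(n_samples)]
    return [
        sum(row[j] / (z[j] ** alpha + eps) * capacity[j] for j in range(n_samples))
        for row in scores
    ]


def random_child(parent_a, parent_b, rng):
    """Sample a split index and mixing ratio, then merge the two parents."""
    split = rng.randrange(len(parent_a) + 1)
    t = rng.random()
    return split_point_merge(parent_a, parent_b, split, t)


if __name__ == "__main__":
    rng = random.Random(0)
    a = [1.0, 0.0, 0.0, 1.0]
    b = [0.0, 1.0, 1.0, 0.0]
    print(random_child(a, b, rng))
    print(shared_fitness([[1.0, 0.0], [1.0, 1.0]], capacity=[1.0, 1.0]))
```

In the toy fitness call, the second model scores higher because it is the only one solving the second sample, which is the resource-competition effect the bullet describes. Real checkpoints would be merged per tensor (or per attention block in the diffusion case) rather than as one flat list.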

Let me know if I missed anything, especially any cool papers

r/accelerate 2d ago

News Zoltan X Yang collabo soon?

Thumbnail x.com
4 Upvotes

Let’s make it happen

r/accelerate 15d ago

News 🚨 Catch up with the AI industry, August 15, 2025

12 Upvotes