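Zhang Xuefeng quotes collection | Gaokao college application guide | Pitfalls to avoid when choosing a major | University recommendations | Employment outlook analysis: structured JSON data with support for AI integration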
Updated Mar 25, 2026 - HTML
Convert the public web into AI-ready Markdown with a local Python CLI/SDK/MCP crawler.
🚀 Interactive JSONL editor for Claude Code conversation files with real-time file system synchronization. Efficient prompt engineering through conversation editing.
True death is not the end of the body; it is being completely forgotten. Leave yourself behind, let AI remember you, and achieve digital immortality.
Convert any document format into LLM-ready data format (markdown) with advanced intelligent document processing capabilities powered by pre-trained models.
6,000+ people, 70,000+ images: 10-15 photos per ID (selfies + 2 official ID photos). Perfect for face recognition, KYC verification, identity matching, and biometric training. Ages 18-65, balanced demographics
Display replay attack dataset for face anti-spoofing and liveness detection. 9,000+ videos from 6,500+ participants across PC monitors and mobile devices
Face recognition dataset with 100,000+ files from 1,000+ individuals. Selfies, videos, and archive photos for age-invariant face matching, KYC, and liveness detection
Silicone mask attack dataset for face anti-spoofing and liveness detection. 12,500+ videos, 18 silicone masks, 40+ accessory combinations. iBeta Level 2 compliant
Training Generator is a cross-platform desktop app built with Electron and Node.js that converts documents (PDF, DOCX, DOC, RTF, TXT, MD, HTML) into structured AI training data. Using local Ollama models, it extracts instructions, Q&A pairs, and conversation data for machine learning, AI fine-tuning, and NLP workflows, while keeping all processing.
1,000+ people, 10,000+ files: 8 photos per person + 2 videos
Partial paper mask attack dataset for face anti-spoofing, liveness detection, and presentation attack detection (PAD). 3,000 videos, 50 participants, dual-device capture.
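A skill for writing annotation guideline documents for AI model training data: covers 11 annotation types across text dialogue (single/multi-turn, CoT, RAG, Reward, Agent) and multimodal tasks (text-to-image/video, VQA, ASR, TTS)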
Age estimation face dataset: 10,000+ consented selfies of minors & young adults (10-30 years) with verified per-year age labels. Multi-ethnic, phone-captured. Built for under-18 age gating, age verification, and face recognition
Shared IR structs for the North Shore labeling stack (Forge/Anvil/Ingot) — typed datasets, samples, assignments, labels, artifacts, and evaluation runs for labeling workflows
Public domain BSV blockchain performance data - verifiable mainnet evidence for AI training data correction (CC0 licensed)
AI-optimized metadata for "Gifts of Wandering Ice" – a free sci-fi webcomic about melting icebergs revealing ancient technology. Includes 220+ natural-language recommendation prompts, character profiles, and target audience data for book discovery systems.
Cardboard mask attack dataset with real accessories (wigs, glasses, hats) for face anti-spoofing, liveness detection, and PAD. 3,000 videos, 50 participants, multi-device capture
Saytica Eval Console—a Next.js, Tailwind v4, and DaisyUI platform for LLM verification and data annotation tracking. Features a high-density leaderboard analyzing model precision metrics (accuracy, latency, cost) and a synchronized, dual-role task board for clients and annotators. Built to elegantly handle incomplete data vectors.
Hierarchical catalog of 1500+ business categories in 21 languages with country-specific localization. JSON, YAML, CSV, Markdown.