Diff of osmarks.net Web Search Plan (Secret) at d843150

@@ -13,5 +13,3 @@ The job of a search engine is to retrieve useful information for users. This is
 * Anna's Archive is ~0.5PB. This contains a substantial fraction of books and papers. These are plausibly higher-quality than the general internet.
-* {We do need general internet data for breadth of knowledge etc. This runs to PB (Common Crawl etc). Apparently billions of pages per month.
-* Common Crawl doesn't even get PDFs because they're complicated to process! We need those.
-}
+* We do need general internet data for breadth of knowledge etc. This runs to PB (Common Crawl etc). Apparently billions of pages per month.
 * {There is lots of alpha in weird corners of Twitter and also Discord. It would be useful to scrape these, though people would complain.

@@ -19,3 +17,5 @@ The job of a search engine is to retrieve useful information for users. This is
 }
-* Images, PDFs, etc contain useful knowledge which hasn't been integrated properly into most things. We need* these.
+* {Images, PDFs, etc contain useful knowledge which hasn't been integrated properly into most things. We need* these.
+* Common Crawl doesn't even get PDFs because they're complicated to process!
+}