HN: The Good Parts
Insightful and otherwise interesting parts of HN.
The problem is that the internet we remember from when Google was good no longer exists. Blogs are dead. Personal websites are dead. Noncommercial, informational or niche interest websites are dead. Search sucks primarily because there's nothing worth searching for anymore.
Yes, most of us can name real websites that we still read & rely on. When did those sites start publishing? How many of the creator-controlled, non-commercial websites/blogs you read began less than, say, 5 years ago? I bet the number rounds down to 0.
Google search sure as hell played their part in creating this world, but fixing search isn’t going to bring back an internet worth searching.
Pretty slick.
It extends the post[0] from a year-ish ago about hosting read-only SQLite on a static server, adding a tiny backend that allows writes.
I am going to have to take a closer look at some point:
https://github.com/ansiwave/wavecore/search?q=sqlite
[0] https://phiresky.github.io/blog/2021/hosting-sqlite-database...
If you're going to be naming a lot of computers, it's surprisingly important to pick a naming format that is (1) expandable and (2) trivially parseable. The naming scheme that seems simple when you're in a garage can become constraining when there are too many machines to track in a spreadsheet.
My favored format is somewhat complex in terms of layout, but is compact and easy to read once you get used to it:
* IATA code (https://en.wikipedia.org/wiki/IATA_airport_code)
* Cluster number (digits)
* 'r' (for "rack"), if meaningful for you (ignore for EC2/GCP)
* Rack number, if meaningful for you (ignore for EC2/GCP)
* 'm' (for "machine")
* Machine number
An example hostname might be `dls1r56m10.mycompany-prod.com`; a parsing sketch follows the list below.
Alternatives that don't work as well:
* Don't use a fixed-width field anywhere. Google used two-letter cluster names, and when those ran out they discovered that the two-letter assumption had worked its way into every layer of the stack. One of the important core services had `uint16_t cluster` in its wire protocol.
* Don't make up your own cluster names. Don't use names like "northwest" or "east". IATA codes are your friend and you will love them because someone else already decided what they should be and wrote them down.
* Don't use fields without delimiters. Being able to say "read digits until the next non-digit" is incredibly useful when writing ad-hoc parsers in shell scripts, because those parsers won't break when you bring up the first datacenter with more than 99 racks. If you tell people not to write hacky ad-hoc parsers in shell scripts, they will (1) do so anyway and (2) not tell you.
* Don't leave off the cluster number. Yes, you only have one cluster in us-west-2 right now, but maybe in five years you'll need to have more than one because you want to run 30,000 EC2 instances there but all your per-cluster infrastructure software falls over at 20,000 instances. Then you can just turn up "pdx2" instead of trying to explain to Hashicorp engineers why you want to run the world's biggest Consul cluster.
* Do not put the production hostnames under a subdomain of your corporate website. If you are ACME LLC then your hostnames should end with `.acme-prod.com` instead of `.prod.acme.com`. The same is true of corporate IT assets like laptops or workstations (`.acme-corp.com` -- NEVER `.corp.acme.com`). Why? Browser cookies.
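As a rough illustration of how little code "trivially parseable" requires, here is a minimal C sketch that splits an example hostname like `dls1r56m10` into its fields; nothing beyond the format described above is assumed.

```c
#include <stdio.h>

int main(void) {
    const char *host = "dls1r56m10";   /* IATA + cluster + 'r' + rack + 'm' + machine */
    char site[4];                      /* 3-letter IATA code plus NUL */
    unsigned cluster, rack, machine;

    /* "Read letters until a digit, digits until the next non-digit" is the
     * whole parser, because every field is delimited and variable-width. */
    if (sscanf(host, "%3[a-z]%ur%um%u", site, &cluster, &rack, &machine) == 4)
        printf("site=%s cluster=%u rack=%u machine=%u\n",
               site, cluster, rack, machine);
    else
        printf("unparseable hostname: %s\n", host);
    return 0;
}
```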
The real problem is complexity, which has gone exponential.
When I joined the workforce in 2000, my working life was, by comparison, stunningly simple. Just a few guys in the same room. Barely any process or documentation. Email was still new, so the concept of an outside world barely existed. Chat did not exist, but it wouldn't have made sense anyway. We talked a bit here and there, but 80% of the time was actually doing the work, not talking about it. Management had no idea what we were doing, and metric porn did not yet exist.
A lot has changed. More complicated tech stacks mean deeper specializations, which require more handovers. A lot is outsourced now, so you may need vendors to move things. You may have off-shored things. Nobody has clarity on what you need to do, so you have to hop around the organization to find out the details. You need to pass legal and the privacy office. You need to report status constantly to an army of bean counters. Testing has become amazingly complicated, and so has system administration.
It requires superhuman effort to move things by an inch. So no, "collaboration" is not a force multiplier. Collaboration isn't a product or an outcome. Ideally you'd have an absolute minimum of it. The ideal workflow is that you create a clear and detailed work package, hand it over to the worker, whom you then leave alone to actually do it.
Your company's purpose is to ship software or whatever else it does; it isn't to ship emails, chat, status updates, approvals and documents.
It is absolutely baffling to me how highly paid office workers' productivity is pissed away like this without intervention. Don't send them to a yoga class to cope, fix the fucking problem. You're setting your money on fire.
I give it a 95% chance that an AI winter is coming. Winter in the sense that there won't be any new ways to move forward towards AGI. The current crop of AIs will be very useful, but they won't lead to the scary AGI people predict.
Reasons:
1) We are currently mining just about all the internet data that's available. We are heading towards a limit and the AIs aren't getting much better.
2) There's a limit to the processing power that can be used to build LLMs, and the more that's used, the more it will cost.
3) People will guard their data more and will be less willing to share it.
4) The basic theory that got us to the current AI crop was defined decades ago and no new workable theories have been put forth that will move us closer to an AGI.
It won't be a huge deal since we probably have decades of work to sort out what we have now. We need to figure out its impact on society. Things like how to best use it and how to limit its harm.
Like they say, "interesting times are ahead."
Tangential, and probably preaching to the choir here, but I really hate the modern web design trends.
I check up on the websites of current and former employers, and they've basically all turned into the same template: the text is vague and lofty while telling you nothing about the company or service ("CloudProduct from Tech Corp is the best way to transform your data operations for next generation workloads"), the graphics are all flat Corporate Memphis art or stock images (no screenshots or demo videos of the actual product), and the pages all do that annoying thing where effects/elements appear and disappear as you scroll down the page.
I don't know, maybe this is the sort of thing that works on product/marketing people but to me it just seems like pointless fluff and makes me not want to look any deeper into the company or product.
Back when the Science Twitter was a thing:
Start by following people who do interesting stuff. Avoid the ones who post every day, because it doesn't scale. Once you are following a few hundred accounts, the frequent posters will drown out everyone else and the signal-to-noise ratio will be terrible.
Don't read the comments. Approximately 100% of them are garbage. Comments from the people you follow will appear in your feed anyway.
If the people you follow frequently retweet stuff from someone, consider following them. But only if you find their posts interesting and they don't post too often.
I guess this should still work, if the people you want to follow are still on Twitter.
Webrings, we need webrings back.
Search is fine for people searching for something, but when just "surfing" or browsing for interesting stuff, we end up on this site or some other centralised site that constantly tries to pull value from the creator back to the platform itself (like Reddit, Facebook, etc.).
A creator owned way to surf and discover (like the old webrings) would be great.
People are realizing that social media is draining, predatory, and entirely superfluous.
Of course there are employees here of social media corporations who would want to stem the tide of this mass exodus, but it's useless. Social media corporations have overstepped their boundaries and become a net negative on human society.
Deleting your social media accounts results in an immediate improvement in quality of life and mental wellbeing. These sites are intentionally designed with predatory psychological mechanisms. They are designed by hackers like ourselves, but hackers who see "social engineering" as a perfectly ethical practice and not simply as psychological manipulation.
These services are designed to be addictive, full stop. Addiction is not healthy, and neither is social media. Maybe this will bring SV back to its roots, real technological progress for the nation and not desperate bids for data mining based on cheap psychological tricks.
People are growing sick of the endless scroll of psychologically disturbing viral content combined with the false positivity of human interest stories. It is deepening social divisions, racial conflicts, political partisanship, and general misery. We don't need social media; what we need is real social connection in an increasingly isolated society, and social media stands in the way of that.
Entrepreneurship is like one of those carnival games where you throw darts or something.
Middle class kids can afford one throw. Most miss. A few hit the target and get a small prize. A very few hit the center bullseye and get a bigger prize. Rags to riches! The American Dream lives on.
Rich kids can afford many throws. If they want to, they can try over and over and over again until they hit something and feel good about themselves. Some keep going until they hit the center bullseye, then they give speeches or write blog posts about "meritocracy" and the salutary effects of hard work.
Poor kids aren't visiting the carnival. They're the ones working it.
Oracle Database 12.2.
It is close to 25 million lines of C code.
What an unimaginable horror! You can't change a single line of code in the product without breaking 1000s of existing tests. Generations of programmers have worked on that code under difficult deadlines and filled the code with all kinds of crap.
Very complex pieces of logic, memory management, context switching, etc. are all held together with thousands of flags. The whole code is ridden with mysterious macros that one cannot decipher without picking up a notebook and expanding relevant parts of the macros by hand. It can take a day or two to really understand what a macro does.
Sometimes one needs to understand the values and the effects of 20 different flags to predict how the code would behave in different situations. Sometimes hundreds! I am not exaggerating.
The only reason why this product is still surviving and still works is due to literally millions of tests!
Here is how the life of an Oracle Database developer is:
- Start working on a new bug.
- Spend two weeks trying to understand the 20 different flags that interact in mysterious ways to cause this bug.
- Add one more flag to handle the new special scenario. Add a few more lines of code that check this flag and work around the problematic situation, avoiding the bug.
- Submit the changes to a test farm consisting of about 100 to 200 servers that would compile the code, build a new Oracle DB, and run the millions of tests in a distributed fashion.
- Go home. Come the next day and work on something else. The tests can take 20 hours to 30 hours to complete.
- Go home. Come the next day and check your farm test results. On a good day, there would be about 100 failing tests. On a bad day, there would be about 1000 failing tests. Pick some of these tests randomly and try to understand what went wrong with your assumptions. Maybe there are some 10 more flags to consider to truly understand the nature of the bug.
- Add a few more flags in an attempt to fix the issue. Submit the changes again for testing. Wait another 20 to 30 hours.
- Rinse and repeat for another two weeks until you get the mysterious incantation of the combination of flags right.
- Finally one fine day you would succeed with 0 tests failing.
- Add a hundred more tests for your new change to ensure that the next developer who has the misfortune of touching this new piece of code never ends up breaking your fix.
- Submit the work for one final round of testing. Then submit it for review. The review itself may take another 2 weeks to 2 months. So now move on to the next bug to work on.
- After 2 weeks to 2 months, when everything is complete, the code would be finally merged into the main branch.
The above is a non-exaggerated description of the life of a programmer in Oracle fixing a bug. Now imagine what horror it is going to be to develop a new feature. It takes 6 months to a year (sometimes two years!) to develop a single small feature (say something like adding a new mode of authentication like support for AD authentication).
The fact that this product even works is nothing short of a miracle!
I don't work for Oracle anymore. Will never work for Oracle again!
I remember that all the web shops in my town that did Ruby on Rails sites efficiently felt they had to switch to Angular at about the same time, and they never regained their footing in the Angular age, although it seems they can finally get things sorta kinda done with React.
Client-side validation is used as an excuse for React, but we were doing client-side validation in 1999 with plain ordinary Javascript. If the real problem was "not writing the validation code twice", surely the answer would have been some kind of DSL that code-generated or interpreted the validation rules for the back end and front end, not the fantastically complex Rube Goldberg machine of modern Javascript: wait wait wait and wait some more for the build machine, then users wait wait wait wait wait for React and 60,000 files worth of library code to load, and then wait even more for completely inscrutable reasons later on (e.g. it's amazing how long you have to wait for Windows to delete the files in your node_modules directory).
I don't think Rust can enforce referential transparency, nor is there much focus on maintaining it manually. But I would say referential transparency is one of the most important properties in functional programming, if not the most distinguishing one.
Referential transparency is the one feature that makes reasoning about a program easy. You can think about referentially transparent programs purely in terms of the substitution model, and you can move referentially transparent expressions around freely as you please.
You can't think of a Rust program this way. It's inherently procedural.
Rust gets a lot of things quite right! But it's not an FP language. It's a better C. It's about shoveling bits and bytes around, as safely and efficiently as possible.
> The primitive numerical type that should be used instead of floating point is the rational. Rationals have their own problems (no numeric type is perfect) but their problems are much easier to manage than float's.
Rationals are not closed under the operations you need; they don't support exponentials, radicals, or logarithms. A lot of numerical algorithms require algebraic operations on real numbers to compute their results -- for example, computing eigenvalues, or numerical approaches to root finding. If you're going to argue for using symbolic notation, well, closed-form solutions cannot exist for several of the kinds of problems we want to solve.
Another issue is that rationals are fundamentally more expensive to compute with than floating point; normalizing a rational requires computing a gcd (not really parallelizable at the bit level, so it can't be done in one cycle), while normalizing a floating-point number requires count-leading-zeros (which can be).
As a case in point, the easiest way to find a solution to a rational linear programming problem is to... solve it in floating-point to find a basis, and then adjust that basis using rational arithmetic (usually finding that the floating-point basis was indeed optimal!). Trying to start with rational arithmetic makes it slower by a factor of ~10000×.
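To make the cost comparison concrete, here is a toy (and deliberately naive) rational type in C; the struct and function names are illustrative, not anything from the comment. Every arithmetic result has to be renormalized with a gcd loop whose iteration count depends on the operands, whereas a float multiply renormalizes with a single shift driven by count-leading-zeros.

```c
#include <stdint.h>
#include <stdio.h>

typedef struct { int64_t num, den; } rational;

/* Euclid's algorithm: an inherently sequential, data-dependent loop,
 * unlike the one-shot count-leading-zeros that renormalizes a float. */
static int64_t gcd(int64_t a, int64_t b) {
    while (b != 0) { int64_t t = a % b; a = b; b = t; }
    return a < 0 ? -a : a;
}

static rational normalize(rational r) {
    int64_t g = gcd(r.num, r.den);
    if (g != 0) { r.num /= g; r.den /= g; }
    if (r.den < 0) { r.num = -r.num; r.den = -r.den; }
    return r;
}

/* The other hidden cost: numerators and denominators grow with every
 * operation, so int64_t overflows quickly and a real implementation
 * needs arbitrary-precision integers. */
static rational mul(rational a, rational b) {
    return normalize((rational){ a.num * b.num, a.den * b.den });
}

int main(void) {
    rational r = mul((rational){ 1, 3 }, (rational){ 9, 2 });  /* 9/6 -> 3/2 */
    printf("%lld/%lld\n", (long long)r.num, (long long)r.den);
    return 0;
}
```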
There are four main rules about pointers in C. The first two are pretty basic rules that shouldn't be controversial:
* You cannot use a pointer outside of its lifetime (e.g., use-after-free is UB).
* You cannot advance a pointer from one object to another object (so out-of-bounds is UB, even if there is another live object there).
The third rule is one that causes issues, but needs to exist given how C code works in practice:
* The pointer just past the end of the object is a valid pointer for the object, but it cannot be dereferenced. It may be identical in value to a pointer for another object, but even then, it still cannot be used to access the second object.
The final rule is simultaneously necessary for optimization to occur, not explicitly stated in C itself, and stated only vaguely here, in large part because trying to come up with a formal definition is insanely challenging:
* You cannot materialize a pointer to an object out of thin air; you have to be "told" about it somehow.
So the immediate corollary of rule 4, and the most obvious instantiation of it: if a variable never has its address taken, then no pointer may modify it without reaching UB. And that's why the rule is necessary to state: without it, anything that might modify memory would be a complete barrier to optimization. In a language without integer-to-pointer conversions, there is no way to violate this rule without also violating rules 1-3. But with integer-to-pointer conversions, it is possible to adhere to rules 1-3 and still violate this rule, and thus it becomes an important headache for any language that permits this kind of conversion.
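Here is a short C sketch of that corollary (an added illustration, not the commenter's code); the program deliberately contains the UB being described. `x` never has its address taken, so the compiler may keep it in a register and assume no store can touch it, even if the one-past-the-end pointer from rule 3 happens to equal `&x` at run time.

```c
#include <stdio.h>

int main(void) {
    int x = 1;          /* address never taken anywhere in the program */
    int y = 2;
    int *p = &y + 1;    /* a valid one-past-the-end pointer (rule 3)... */

    /* ...but writing through it is UB: the store carries provenance for y
     * only, even if &y + 1 compares equal to &x (rules 3 and 4). */
    *p = 42;

    printf("%d\n", x);  /* a conforming compiler may well print 1 */
    return 0;
}
```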
So how do we actually give it a formal semantics? Well, the first cut is the simple rule that no pointer may access a no-address-taken variable. Except that's not really sufficient for optimization purposes; in LLVM, all variables start with their address taken, so the optimizer needs to reason about when all uses of the address are known. So you take it to the next level and rule that so long as the address doesn't escape and you can therefore track all known uses, it's illegal for anyone to come up with any other use. So now you need to define escaping, and the classic definition suddenly shifts back to describing a data-dependent relationship.
Let me take a little detour. In the C++11 memory model, one of the modes that was introduced was the release/consume mode, which expressed a release/acquire relationship for any load data-dependent on the consume load. This was added to model the cases where you only need a fence on the Alpha processors. It turns out that no compiler implements this mode; all of them pessimize it to a release/acquire. That's because implementing release/consume would require eliminating every optimization that might not preserve data dependence, of which there is a surprising number. You could get away without doing that if you first proved that the code wasn't in a chain that required preserving data dependence, but that's not really possible for any peephole-level optimization.
And this is where the tension really comes into play. For pointers, it's easy to understand that preserving data dependence is necessary, and special-case them. But now your semantics to adhere to rule 4 also says that you need to do the same to integers, which is basically a non-starter for many optimizations. So the consequence is that the burden of the mismatch needs to lie on integer-to-pointer conversions (which, as I've established before, is already the element that causes the pain in the first place; additionally, in terms of how you compute alias analysis internally in the compiler, it's also where you're going to be dealing with the fallout anyways).
In summary, as you work through the issues to develop a formal semantics, you find that a) pointers have provenance, and need to have some sort of provenance; b) compilers are unwilling to give integers provenance; c) therefore pointers aren't integers, and everything assuming such is wrong (this affects both user code and compiler optimizations!); and d) this is all really hard and at the level of needing academic-level research into semantics.
Is N2676 the final word on pointer provenance? No, it's not; as I said, it's hard and there's still more research that needs to be done on different options. The status quo, in terms of semantics, is broken. The solution needs to minimize the amount of user code that is broken. Maybe N2676 is that solution; maybe it isn't. But to refuse clarification of the situation is unacceptable, and suggests to me noncomprehension of the (admittedly complex!) issues involved.