2017 week 04 in programming

The Line of Death

In web browsers, the browser itself usually fully controls the top of the window, while pixels under the top are under control of the site. If a user trusts pixels above the line of death, the thinking goes, they’ll be safe, but if they can be convinced to trust the pixels below the line, they’re gonna die. The bigger problem is that some attacker data is allowed above the LoD; while trusting the content below the LoD will kill your security, there are also areas of death above the line. “It’s a picture of an IE7 browser running on Windows Vista in the transparent Aero Glass theme with a page containing a JPEG of an IE7 browser running on Windows XP in the Luna aka Fisher Price theme?” I pointed out. Personally, my favorite approach was Tyler Close’s idea that the browser should use PetNames for site identity (think of them as a Gravatar icon for salted certificate hashes). Not only would they make every HTTPS site’s identity look unique to each user, but this could also be used as a means of detecting fraudulent or misissued certificates. One enterprising security tester in Windows made a visually perfect spoofing site of PayPal, where even the user gestures that displayed the ephemeral browser UI were intercepted and fake indicators were shown. Historically, some operating systems have attempted to mitigate the problem by introducing a secure user gesture that always shows trusted UI, but such measures tend to confuse users and often get “optimized away” when the UX team’s designers get ahold of the product.

I analyzed ~2TB of code to build an index of the most common words in programming languages


Naughty Strings: A list of strings likely to cause issues as user-input data

The Big List of Naughty Strings is an evolving list of strings which have a high probability of causing issues when used as user-input data. The Big List of Naughty Strings is intended to help reveal such issues. blns.txt consists of newline-delimited strings and comments which are preceded with #. The comments divide the strings into sections for easy manual reading and copy/pasting into input forms. Please do not send pull requests with very long strings, as that makes the list much more difficult to view. The Big List of Naughty Strings is intended to be used for software you own and manage. Some of the Naughty Strings can indicate security vulnerabilities, and as a result using such strings with third-party software may be a crime. The Big List of Naughty Strings is not a fully comprehensive substitute for formal security/penetration testing for your service.
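
The file format described above (newline-delimited strings, with comments preceded by #) could be consumed roughly as follows. This is a minimal sketch, not code from the project: the function name load_naughty_strings and the inline sample are illustrative assumptions, and in practice the text would come from a local copy of the list.

```python
def load_naughty_strings(text):
    """Return the test strings from text in the format described above.

    Assumption: lines starting with '#' are comments and blank lines are
    only separators, so both are skipped.
    """
    strings = []
    for line in text.split("\n"):
        if not line or line.startswith("#"):
            continue
        strings.append(line)
    return strings


# Hypothetical sample in the same shape as the list (not copied from it).
sample = "#\tReserved Strings\n\nundefined\nnull\n\n#\tNumeric Strings\n\n0\n1.00\n"

for candidate in load_naughty_strings(sample):
    # Feed each string into the input handling of software you own.
    print(repr(candidate))
```

Skipping blank lines also drops the empty string as a test case, so a real harness would probably want to add that case back explicitly.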

Django 2.0 will not support Python 2


Code That Doesn’t Exist Is The Code You Don’t Need To Debug

The less code we have the less complexity we need to handle and the fewer concerns we need to take care of. The important aspect is the response contains more data than what the code actually uses. There’s no unused response in the test. The dataset won’t be huge, so it’s easier to reason about it. If the code starts requiring more data from the response because of another test on the same function, the test will fail and we can start adding the rest of the response on an ad-hoc basis. If we always change code by changing the tests first, when the code starts requiring less data, we will always be removing it from the tests first in order to keep the minimum amount of code necessary to test it. TDD forces us to write the minimum amount of code that satisfies a use case. In the last example, if we had used TDD it would have forced us to write the minimum amount of code in the dataset until we needed more data. The act of deleting code helps drive the system to a state where there’s only code that is necessary, a state where there’s less Software Entropy. Removing useless code should be rewardable. Besides complexity, useless code can also represent part of a functionality that doesn’t provide any value. What we need is to reinforce the culture of removing code, because the code that doesn’t exist is the code you don’t need to worry about.

How Discord Stores Billions of Messages

Voice chat heavy Discord servers send almost no messages. Private text chat heavy Discord servers send a decent number of messages, easily reaching between 100 thousand and 1 million messages a year. Large public Discord servers send a lot of messages. We looked at the largest channels on Discord and determined that if we stored about 10 days of messages within a bucket, we could comfortably stay under 100MB. Buckets had to be derivable from the message id or a timestamp. We believe this will continue to work for a long time, but as Discord continues to grow there is a distant future where we are storing billions of messages per day. We went from over 100 million total messages to more than 120 million messages a day, with performance and stability staying consistent. In a follow-up to this post we will explore how we make billions of messages searchable.
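
A rough sketch of how a bucket could be derived from either a message id or a timestamp, as the summary requires. The assumptions here are loud ones: message ids are taken to be snowflake-style integers whose bits above the lowest 22 hold milliseconds since an epoch of 2015-01-01 UTC, and the names make_bucket, DISCORD_EPOCH_MS and BUCKET_SIZE_MS are illustrative.

```python
import time

# Assumption: ids count milliseconds from 2015-01-01T00:00:00Z.
DISCORD_EPOCH_MS = 1420070400000
# About 10 days of messages per bucket, as in the text above.
BUCKET_SIZE_MS = 1000 * 60 * 60 * 24 * 10


def make_bucket(message_id=None, now_ms=None):
    """Derive a bucket number from a message id, or from a timestamp."""
    if message_id is None:
        if now_ms is None:
            now_ms = int(time.time() * 1000)
        timestamp = now_ms - DISCORD_EPOCH_MS
    else:
        # Assumption: the low 22 bits are not part of the timestamp.
        timestamp = message_id >> 22
    return timestamp // BUCKET_SIZE_MS


# A made-up id whose timestamp falls a little way into the fourth bucket.
example_id = (BUCKET_SIZE_MS * 3 + 5) << 22
print(make_bucket(message_id=example_id))
```

Because the bucket is a pure function of the id, a reader can work out which buckets cover a time range without any lookup table.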

Caching at Reddit

At Reddit, we cache a wide variety of things, including database objects, query results, and memoization of function calls. The cache key is generated based on the contents of the context provided to the template; if that changes, a cache miss happens and the resulting template is re-cached. Our memoize cache provides an easy way for developers to cache the results of function calls. For the cached /r/all results we talked about earlier, sets are replicated to all available caches, and reads happen from random caches with failover going to a different random cache. Cache requests for the trophy list for a user would first go to the new caches, a miss would occur, and the data would be retrieved from the old caches and added to the new ones. The subsequent update of the new trophies goes to the new cache, but the old cache is left un-updated. Though Reddit still has a relatively small engineering team, we do manage to make a lot of changes, and cache patterns can certainly change over the course of months.
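
The trophy-list scenario above can be sketched with two plain dictionaries standing in for the old and new cache pools. This is only an illustration of the read-through and write pattern the summary describes; the class name CacheChain and the key format are assumptions, not Reddit's code.

```python
class CacheChain:
    """Read from the new cache first, fall back to the old one on a miss."""

    def __init__(self, new_cache, old_cache):
        self.new_cache = new_cache
        self.old_cache = old_cache

    def get(self, key):
        if key in self.new_cache:
            return self.new_cache[key]
        if key in self.old_cache:
            # Miss on the new cache: copy the value over from the old one.
            value = self.old_cache[key]
            self.new_cache[key] = value
            return value
        return None

    def set(self, key, value):
        # Updates only reach the new cache; the old cache is left un-updated.
        self.new_cache[key] = value


old = {"trophies:alice": ["one-year club"]}
new = {}
chain = CacheChain(new, old)

print(chain.get("trophies:alice"))   # filled from the old cache
chain.set("trophies:alice", ["one-year club", "verified email"])
print(new["trophies:alice"])         # current value
print(old["trophies:alice"])         # stale value still in the old cache
```

The last line is the hazard: if reads ever go back to the old caches, they see the stale trophy list.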

Reading Uber’s Internal Emails [Uber Bug Bounty report worth $10,000]

This is a small proof of concept regarding “Reflective Cross-Site Scripting [ R-XSS ]” which I had found on eBay. I am not an active participant in bug bounty programs, but one day I had finished all my office work, so I was surfing on Facebook and received a message from my brother, Samir, asking for advice regarding some musical instruments. Once on eBay, I logged into the site to view details and suddenly noticed the “Help & Contact” menu. I followed that menu and went to the “Customer Service” page, where I saw a search field. I decided to check for a “Cross-Site Scripting [ XSS ]” vulnerability and unexpectedly found a POST-type R-XSS. Testing for XSS: as all security researchers do, I also have certain pathways to find vulnerabilities. I always use ‘>Test12345<” as it contains number, letter and syntax. This allows me to see how a website handles user inputs. Some questions like “Is the user input sanitized? How sensitive is user input?” can be answered from this idea.
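
The question “Is the user input sanitized?” can be illustrated on a page you own by checking whether a probe string comes back unescaped. This is a hedged sketch: the probe is a simplified form of the one quoted above, and render_unsafely, render_safely and reflects_unescaped are made-up helpers standing in for a real search page.

```python
import html

# Simplified probe: letters, digits and markup characters (an assumption).
PROBE = ">Test12345<"


def render_unsafely(user_input):
    # Echoes the search term straight into the page.
    return "<p>You searched for: " + user_input + "</p>"


def render_safely(user_input):
    # html.escape turns < and > into entities before the term is echoed.
    return "<p>You searched for: " + html.escape(user_input) + "</p>"


def reflects_unescaped(response_body, probe=PROBE):
    """True if the probe appears in the response exactly as it was sent."""
    return probe in response_body


print(reflects_unescaped(render_unsafely(PROBE)))
print(reflects_unescaped(render_safely(PROBE)))
```

A reflected, unescaped probe is only a hint that output encoding is missing; it is a starting point for a closer look, not proof of an exploitable bug.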

How do I declare a function pointer in C?


Pixie - A small, fast, native lisp with “magical” powers

Pixie is a lightweight lisp suitable for both general use as well as shell scripting. It is written in RPython and as such supports a fairly fast GC and an amazingly fast tracing JIT. Pixie implements its own virtual machine. If you like Clojure, but are unhappy with the start-up time, or if you want something outside of the JVM ecosystem, then Pixie may be for you. make build_with_jit will compile Pixie using the PyPy toolchain. The guts are written in RPython, just like the guts of most lisp interpreters are written in C. At runtime the only thing that is interpreted is the Pixie bytecode, that is until the JIT kicks in…. What’s this bit about “magical powers”? First of all, the word “magic” is in quotes as it’s partly a play on words: pixies are small, light and often considered to have magical powers. The performance penalty of such a polymorphic call is completely removed by the RPython generated JIT. Influencing the JIT from user code. The origins of Pixie: “I’ve always had it in the back of my head that a lisp on RPython would be a good project.”

Announcing Rust Language Server Alpha Release

The RLS has now reached a level of maturity where it should be able to run against most Cargo-based Rust projects. Rather than leaving each editor plugin to have to parse and understand the types in your program and provide you with capabilities like refactoring, the RLS centralizes all this logic and provides it to the editor via a standard language server protocol. The alpha release of the RLS has been run successfully on Linux, Mac, and Windows. Step 3: Set the RLS_ROOT environment variable to point to where you checked out the RLS: export RLS_ROOT=/Source/rls. The current version of the RLS is built from a combination of two tools: racer and the Rust compiler. In the future, as we improve the RLS integration with the main Rust tools, this will no longer be an issue. Contribute an editor plugin: there are already at least six plugins that follow the Language Server Protocol that the RLS uses, each in its own state of completion.

Trying to build a Mac OS 8 application on macOS Sierra (featuring Think C, BeOS, CodeWarrior, PowerPlant and other Mac programming nostalgia from the 1990s).

Motorola completely fumbled the Mac OS X transition and Mac programmers largely moved to Apple’s Project Builder. PowerPlant requires the Mac OS X 10.6 SDK or earlier to build - it won’t build against the 10.7 SDK and certainly won’t against the 10.12 SDK. The last versions of Xcode to include the 10.6 SDK were the Xcode 4.3 series for Mac OS X 10.7 Lion. Unlike PowerPlant - which had been updated for Mac OS X - I had not tried to run or compile the Mines application since Mac OS 8.5.1 back in 1999. I’ve never ported any apps from classic Mac OS to Mac OS X so I don’t know how smooth it will be. Of course, none of this has ever applied to Mac OS X. All the HLock, MoveHHi, MoreMasterPointers and other related Handle functions have always been no-ops on Mac OS X and Handles are now non-relocatable, relying on paging and better heap allocators to mediate fragmentation. Remember that you still need to obtain a copy of the Mac OS X 10.6 SDK and modify the Mac OS X platform Info.plist file or you’ll get errors about a missing SDK when you try to build. Updating my own Mines application code for PowerPlant and Mac OS changes required less than 20 minutes.

RethinkDB: why we failed

In the HN discussion thread people proposed many reasons for why RethinkDB failed, from inexplicable perversity of human nature and clever machinations of MongoDB’s marketing people, to failure to build an experienced go-to-market team, to lack of numeric type support beyond 64-bit float. New companies aren’t getting built on top of Oracle, so there is a window of opportunity to build a new infrastructure company. If we build a product that captures some of that market, we’ll end up building a very successful company. We set out to build a good database system, but users wanted a good way to do X. It’s not that we didn’t try to ship quickly, make RethinkDB fast, and build the ecosystem around it to make doing useful work easy. We had no intuition for products or markets, so we’d go through the motions of building a company without actually understanding what we were doing. Engineers love building developer tools, so they badly want developer tools companies to thrive. If you do set out to build a developer tools company, tread carefully.

How to Make a Neural Network - Intro to Deep Learning

How do we learn? In this video, I’ll discuss our brain’s biological neural network, then we’ll talk about how an artificial neural network works. We’ll create our own single layer feedforward network in Python, demo it, and analyze the implications of our results. This is the 2nd weekly video in my intro to deep learning series. Github.io/2015/07/27…. The guy at the beginning is my Jeet Kune Do instructor. Special thanks to Catherine Olsson of OpenAI for being the hook to my backpropagation rap.
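
For readers who want a feel for what a single layer feedforward network in Python looks like, here is a minimal sketch. It is not the code from the video: the toy dataset, the zero-initialized weights, and the names sigmoid, predict and train are all assumptions chosen to keep the example deterministic and dependency-free.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict(weights, row):
    # One layer: a weighted sum of the inputs passed through a sigmoid.
    return sigmoid(sum(w * x for w, x in zip(weights, row)))


def train(inputs, targets, iterations=10000):
    weights = [0.0] * len(inputs[0])
    for _ in range(iterations):
        outputs = [predict(weights, row) for row in inputs]
        # Error scaled by the slope of the sigmoid at each output.
        deltas = [(t - o) * o * (1.0 - o) for t, o in zip(targets, outputs)]
        for j in range(len(weights)):
            weights[j] += sum(d * row[j] for d, row in zip(deltas, inputs))
    return weights


# Toy data: the target simply equals the first input column.
inputs = [[0, 0, 1], [1, 1, 1], [1, 0, 1], [0, 1, 1]]
targets = [0, 1, 1, 0]

weights = train(inputs, targets)
for row in inputs:
    print(row, round(predict(weights, row), 3))
```

After training, the weight on the first input dominates, which is how the network ends up reproducing the pattern in the targets.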

Toolkits for the Mind: How Tech Companies Are Shaped By Programming Languages

Software developers as a species tend to be convinced that programming languages have a grip on the mind strong enough to change the way you approach problems, and even to change which problems you think to solve. If you want to know why Facebook looks and works the way it does and what kinds of things it can do for and to us next, you need to know something about PHP, the programming language Mark Zuckerberg built it with. Among programmers, PHP is perhaps the least respected of all programming languages. The programming language PHP created and sustains Facebook’s move-fast, hacker-oriented corporate culture. Programs written with a type system tend to be far more reliable than those written without one, which is useful when a program might trade $30 billion on a big day. The language’s rigor is like catnip to some people, giving Jane Street an unusual advantage in the tight hiring market for programmers. Programming-language designer Guido van Rossum, who spent seven years at Google and now works at Dropbox, says that once a software company gets to be a certain size, the only way to stave off chaos is to use a language that requires more from the programmer up front.

I made a crash course on web security, for people who have never dealt with it before

Securing a web application is hard, but it’s also extremely important. There is so much to learn, and the learning curve is so steep, that newcomers to web development are often overwhelmed when they look at all that goes into making a simple login authenticated website. I made this crash course because I too faced a lot of trouble in learning and implementing most of the security features that come with standard web applications. This was because most of the resources on these topics are scattered and explained in a way not suitable for people just getting started. This course is aimed at newcomers who want to get up to speed with some of the most basic and important concepts like password management, session cookies, and some of the most common types of attacks. All posts are kept short, contain examples, and can be completed in a few days. These topics are compiled based on my experience in web development.

Ranges: the STL to the Next Level

Ranges provide a different approach to the STL that solves these two issues in a very elegant manner. Ranges were already used in some way by code using the STL before the Range concept was defined, but clumsily. Most of the time, what you are really trying to represent is a range, which corresponds better to the level of abstraction of your code. Such algorithms reuse the STL versions in their implementation, by forwarding the begin and the end of the range to the native STL versions. A range adaptor is an object that can be combined with a range in order to produce a new range. An important thing to note is that the ranges resulting from associations with range adaptors, although they are merely views over the ranges they adapt and don’t actually store elements, answer to the range interface, so they are themselves ranges. Ranges raise the level of abstraction of code using the STL, thereby clearing superfluous iterators out of that code.

The lost art of 3D rendering without shaders

The model is made up of triangles because triangles are easy to draw. A vertex describes a position in 3D space, but also the color of the triangle at that vertex and a normal vector for lighting calculations. To implement the 3D model for the cube we just need to provide a list of these triangles. The cube not only has vertices and triangles that determine its shape, but it also has a position in the 3D world, a scale, and an orientation that is given by three rotation angles. Step 3 of the render() function: take the cube, place it in the 3D world, adjust the viewpoint for the camera, and project everything to two-dimensional triangles. In math terms, we need to project our triangles from 3D to 2D somehow. The depth buffer makes sure that a triangle that is further away does not obscure a triangle that is closer to the camera.
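
The two ideas at the end of that summary, projecting from 3D to 2D and using a depth buffer, can be sketched in a few lines. This is an illustration under stated assumptions rather than the article's code: the camera sits at the origin looking down the positive z axis, a simple perspective divide is used, and the names project and DepthBuffer are made up.

```python
def project(point, focal_length=1.0, width=320, height=240):
    """Perspective-project a 3D point to integer pixel coordinates."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point is behind the camera")
    # Perspective divide: farther points move toward the screen center.
    sx = x * focal_length / z
    sy = y * focal_length / z
    # Map from the range [-1, 1] to pixels; screen y grows downward.
    px = int((sx + 1.0) * 0.5 * width)
    py = int((1.0 - sy) * 0.5 * height)
    return px, py


class DepthBuffer:
    """Keeps, per pixel, the color of the closest fragment seen so far."""

    def __init__(self, width, height):
        self.width = width
        self.depth = [float("inf")] * (width * height)
        self.color = [None] * (width * height)

    def plot(self, px, py, z, color):
        index = py * self.width + px
        if z < self.depth[index]:
            self.depth[index] = z
            self.color[index] = color
            return True
        return False  # something closer is already drawn here


buffer = DepthBuffer(320, 240)
px, py = project((0.0, 0.0, 5.0))
buffer.plot(px, py, 5.0, "red")     # first fragment is kept
buffer.plot(px, py, 9.0, "blue")    # farther away, so it is rejected
print((px, py), buffer.color[py * 320 + px])
```

A real rasterizer runs this depth test for every pixel inside each projected triangle, which is why the drawing order of the triangles stops mattering.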

How a robot got Super Mario 64 and Portal “running” on an SNES

Can you really, playably emulate games like Super Mario 64 and Portal on a stock standard SNES only by hacking in through the controller ports? The answer is still no, but for a brief moment at this week’s Awesome Games Done Quick speedrunning marathon, it certainly looked like the impossible finally became possible. TASBot moved on to a few “total control runs,” exploiting known glitches in Super Mario Bros. The method was taken to ridiculous extremes last year, when TASBot managed to “beat” Super Mario Bros. After a few minutes of setup, the Zelda screen faded out, then faded back in on a bordered window with an ersatz logo for the “Super N64.” Without any forthcoming explanation from the runners on stage, TASBot started apparently playing through a glitch-filled speedrun of Super Mario 64 on the Super NES, following it up with a similar glitch-filled speedrun through Valve’s PC classic Portal. Streaming audio to the NES. To unwind how the TASBot team “played” relatively modern games through 25-year-old SNES hardware, we need to go back to China’s Geekpwn hacking conference a few months ago. After taking total control of the SNES through a known Link to the Past glitch, TASBot essentially turns the SNES into a dumb pipe for video data. While the AGDQ SM64 and Portal TASBot runs were really just pre-recorded video and audio streaming through the SNES, Cecil said it would be theoretically possible to let a speedrunner actually play those games live on the SNES. Using a fixed, “good enough” palette of 256 colors and no pre-processing, TASBot would be able to pipe through 10 fps gameplay with about 100 ms of latency, he said.

Webpack 2 is finally here


The first few chapters of my free book on implementing programming languages are up now

This book contains everything you need to implement a full-featured, efficient scripting language. You’ll learn both high-level concepts around parsing and semantics and gritty details like bytecode representation and garbage collection. Starting from main(), you’ll build a language that features rich syntax, dynamic typing, garbage collection, lexical scope, first-class functions, closures, classes, and inheritance. I got bitten by the language bug eight years ago when I was on paternity leave with a lot of free time between middle of the night feedings. I’ve cobbled together a number of languages of various ilk before worming my way into an honest-to-God full-time programming language job. Before I fell in love with languages, I was a game developer for eight years at Electronic Arts. If you like the book, you’ll probably like it too.

Everything you need to know about HTTP security headers

This article explains what secure headers are and how to implement these headers in frameworks and servers such as Rails, Django, Go, Nginx, and Apache. For X-XSS-Protection: in JS, use helmet; in Go, use unrolled/secure; in Nginx, use the directive add_header X-XSS-Protection “1; mode=block”; and in Apache, use Header always set X-XSS-Protection “1; mode=block”. Content Security Policy can be thought of as a much more advanced version of the X-XSS-Protection header above. If you’re an intrepid reader and went ahead and checked the headers appcanary.com returns, you’ll see that we don’t have CSP implemented yet. The HSTS header solves the meta-problem: how do you know if the person you’re talking to actually supports encryption? For X-Frame-Options: in JS, use helmet; in Go, use unrolled/secure; in Nginx, use the directive add_header X-Frame-Options “deny”; and in Apache, use Header always set X-Frame-Options “deny”. For X-Content-Type-Options, send X-Content-Type-Options: nosniff. Why? The X-Content-Type-Options header exists to tell the browser to shut up and set the damn content type to what I tell you, thank you.
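
For a plain Python application, the three header values quoted above could be attached with a small WSGI wrapper. This is a sketch under assumptions, not something the article provides: with_security_headers and demo_app are hypothetical names, and only the header names and values come from the text.

```python
# Header names and values as quoted in the summary above.
SECURITY_HEADERS = [
    ("X-XSS-Protection", "1; mode=block"),
    ("X-Frame-Options", "deny"),
    ("X-Content-Type-Options", "nosniff"),
]


def with_security_headers(app):
    """Wrap a WSGI app so every response carries the headers above."""

    def wrapped(environ, start_response):
        def start_with_headers(status, headers, exc_info=None):
            present = {name.lower() for name, _ in headers}
            # Do not override a header the application set itself.
            extra = [(name, value) for name, value in SECURITY_HEADERS
                     if name.lower() not in present]
            return start_response(status, list(headers) + extra, exc_info)

        return app(environ, start_with_headers)

    return wrapped


def demo_app(environ, start_response):
    # Hypothetical application used only to exercise the wrapper.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


captured = {}


def fake_start_response(status, headers, exc_info=None):
    captured["status"] = status
    captured["headers"] = headers


body = with_security_headers(demo_app)({}, fake_start_response)
for name, value in captured["headers"]:
    print(name + ": " + value)
```

Setting the headers in one wrapper keeps them consistent across every route, which is the same reason the article points to helmet or server-level directives.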

And that, kids, is why we call it a “Patch”


The Practice of Programming: 18 Years Later

Over the new year holiday time I had a chance to get away from it all, and snuck up to Finland to sit in a lodge on the Gulf of Finland, sip coffee, take saunas and read. I brought along a few books, the only programming one being Brian W. Kernighan and Rob Pike’s “The Practice of Programming.” You want to provide the owner with as good a test case as you can manage. As a fan of testing, this chapter stood out; not just for its methodical evaluation of how, when and why to write tests, but also its use of data validation and test automation. It’s easy to delude yourself about how carefully you are testing, so try to ignore the code and think of the hard cases, not the easy ones. To quote Don Knuth describing how he creates tests for the TeX formatter, “I get into the meanest, nastiest frame of mind that I can manage, and I write the nastiest [testing] code I can think of; then I turn around and embed that in even nastier constructions that are almost obscene.” How many times have I written the obvious test instead of devoting a day or a few hours figuring out how to break my own code? 2. Code to roll your own RegEx parser in C. Telnetting from machine to machine to copy files and using checksum to test if the copy was properly performed.

The Problem With AMP

Google’s Accelerated Mobile Pages or AMP is a markup language similar to HTML that allows publishers to write mobile-optimized content that loads “instantly”. The goal of the AMP project is to make websites faster by essentially disabling external JavaScript and caching all pages on Google’s CDN. While the intentions of the project seem good, there are a number of issues with AMP that both promote lock-in and provide a poor user experience. These bugs are simply unacceptable for a standard that has been in use and promoted by Google for over a year, and there have been numerous complaints from users about AMP. The largest complaint by far is that the URLs for AMP links differ from the canonical URLs for the same content, making sharing difficult. Clicking on an AMP link feels like you never even leave the search page, and links to AMP content are displayed prominently in Google’s news carousel. Google has the ability to further change the AMP HTML specification to keep publishers in their ecosystem. Despite touting AMP HTML as an open standard, every one of the AMP Project’s core developers appears to be a Google employee. The AMP HTML Specification states that all AMP HTML pages must load a JavaScript file from https://cdn.
