A small fix to syntax highlighting of PHP comments in emacs

If you're like me, you use emacs to edit a lot of PHP, and if you're like me, you sometimes run across code like this:


/* This is a comment
$foo = 'this is some commented out code'; # Bla bla bla
*/

That's not so bad, really, but the php-mode that I had been using got confused and ended the comment at the newline after the hash mark. Argh. It took a little searching and a bit of trial and error, but eventually, I found some documentation of syntax tables within emacs. From what I can tell, emacs has a little lexer-like routine that goes character by character during syntax highlighting. The built-in functionality seems like it might have some shortcuts over what might be hidden inside your favorite compiler suite, but whatever.

Let's start with what I added inside of php-mode.el. I put it right after the block introduced by the comment ";; Specify that cc-mode recognize Javadoc comment style", but I have a feeling it could live just about anywhere within that code block.


;; This works for me in GNU Emacs 21.4.1 and 22.2.1 on CentOS and Ubuntu, respectively.

(progn
  (modify-syntax-entry ?/ ". 124b" php-mode-syntax-table)
  (modify-syntax-entry ?* ". 23" php-mode-syntax-table)
  (modify-syntax-entry ?# "< b" php-mode-syntax-table)
  (modify-syntax-entry ?\n "> b" php-mode-syntax-table))

Okay, what does that mean?

(modify-syntax-entry ?/ ". 124b" php-mode-syntax-table)

modify-syntax-entry is a command that takes two or three arguments, as follows:

  1. ?/: This is the character whose lexing you are modifying

  2. ". 124b": This is how you are defining that character.

    • . The dot means that the character, a slash, can be used as punctuation

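    • " " The single space fills the matching-character slot of the descriptor and means that the slash has no matching character (only paired delimiters such as parentheses use that slot). It is not a whitespace marker: the flags have to start at the third character, which is why things did not work without the space.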

    • 1 The slash character can be the first character in a pair of opening comment characters: // or /*

    • 2 It can also be the second character in a pair of opening comment characters: //

    • 4 It can be the second character in a pair of closing comment characters: */

    • b This signifies that when the slash is used as the second character in a comment-opening pair, the comment belongs to the "b" class of comments, so it must be closed by a "b" closer. We have defined \n as a "b" closer (the only one?), so there you go. // gets closed by \n. You can write this stuff as regular expressions and functions, but it won't be fun. There might be a time when you want another set of comment openers and closers, or any arbitrary number, really, but as far as I can tell, you only get "a" and "b".

  3. php-mode-syntax-table : This argument is optional. If you leave it out, the change applies to the syntax table of the current buffer, so naming the table makes sure you are modifying the one you mean.

?* ". 23" The star gets used as the second character in a comment opening pair /*, and the first character in a comment closing pair */.

?# "< b" This sets the hash mark as a comment starter with no matching character (the space again) and makes it part of the b-team. Roughly the same thing happens to the newline, except that "> b" makes it a comment ender, so it closes both # and // comments.
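For what it's worth, the Emacs Lisp manual calls these strings "syntax descriptors": the first character names the syntax class, the second is the matching-character slot (a space means "none"), and everything after that is a flag. Here is a rough sketch, in JavaScript rather than elisp, that pulls a descriptor apart. The function name parseSyntaxDescriptor and the little class table are my own inventions, cover only the three classes used in this post, and are not part of Emacs.

```javascript
// Hypothetical helper: splits an Emacs syntax descriptor into its parts.
// Only the three syntax classes used in this post are listed here.
const SYNTAX_CLASSES = {
  '.': 'punctuation',
  '<': 'comment starter',
  '>': 'comment ender',
};

function parseSyntaxDescriptor(descriptor) {
  const classChar = descriptor.charAt(0);
  const matchChar = descriptor.charAt(1);
  return {
    syntaxClass: SYNTAX_CLASSES[classChar] || 'unknown (' + classChar + ')',
    // A space (or nothing) in the second slot means "no matching character".
    matchingChar: matchChar === ' ' || matchChar === '' ? null : matchChar,
    // Everything from the third character on is a flag.
    flags: descriptor.slice(2).split(''),
  };
}

console.log(parseSyntaxDescriptor('. 124b')); // the slash
console.log(parseSyntaxDescriptor('< b'));    // the hash mark
```

Read this way, ". 124b" is punctuation, no matching character, and the flags 1, 2, 4 and b.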

That's about it.

A rough dissection of a recent sync between live and stage

A wanted our staging version of our internal site to get synced with production. Technically, we all did, but it was A's request that really kicked off the process. In the past, doing this meant that someone had to manually copy all of the files from the production side to the development side, after first making a copy so that developers could find and reapply their pending changes. With svn, things were a little more straightforward, but still required a certain amount of work.

B and I went into the production side and ran

$ svn diff

This spit out screens full of changes, and then errored. Someone had deleted a directory without telling svn. The solution was pretty easy

$ cd dir
$ svn revert missing_dir
$ svn rm missing_dir
$ svn commit missing_dir

Appropriate check in comments were left, and away we went to the next missing directory. Finally, svn diff ran all the way through, but it was a huge number of files to commit all at once -- and surely they didn't all relate to the same individual changes, right?

An easy way to see the list of changed files is

$ svn status

You can pipe that through grep to see *only* the modified files:

$ svn status | egrep '^M'

You can replace the M with other statuses. I'll leave that as an exercise for the reader. In any case, ideally, we would have taken a list of modified files and figured out which ones belonged to which sets of changes, but some bulk checkins were done under a blanket "getting up to date" style checkin. This isn't optimal, but it works.

$ svn commit

Pow! Now the subversion repository has copies of everything that was in the production side.

$ cd ../stage

Let's pretend that that command put us in the correct directory.

Now, what kind of status is stage in? If only there were a way to tell what had changed. Oh!

$ svn diff

Pages and pages later, we realize that a lot has been changed. We want to update, but we don't want to lose our changes.

If we didn't care, we would just do:

$ svn revert -R .

and subversion would throw away all of our local changes, restoring every file to the pristine copy it saved at the last checkout or update. (svn revert needs a path and is not recursive by default, hence the -R . to cover the whole tree.) Follow that with an svn update and, since we just checked in the production side, stage would match it fairly exactly (minus generated files). But, we don't want to lose our changes, so instead we just do:

$ svn update

Assuming that nothing tragic happens and there are no other problems like filesystem permissions or lock contention, subversion should go through every file and directory and apply changes to bring the stage side up to date without overwriting anything that had already changed on stage. The merging of changes happens not on a file-by-file basis, but on a line-by-line basis.

Let's explain that again, because it's important. I will use a completely fictional example. A and B are both checked out of the repository at the same time. This is not exactly how our production and stage servers were put into svn, but the operation is very similar, so bear with me.

$ svn co path_to_svn/some_project A
$ svn co path_to_svn/some_project B
$ cat A/monkey.txt
foo
bar
baz
$ emacs A/monkey.txt
$ cat A/monkey.txt
foo
monkey
baz
$ echo pirate >>B/monkey.txt
$ cat B/monkey.txt
foo
bar
baz
pirate
$ svn commit A/monkey.txt
(svn messages omitted)
$ svn update B/monkey.txt
(svn messages omitted)
$ cat B/monkey.txt
foo
monkey
baz
pirate

So, you can see that after the various svn magicks, B/monkey.txt has the changes from A and B.
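To make "line-by-line" a little more concrete, here is a toy merge in JavaScript. It is emphatically not Subversion's algorithm: it only copes with lines edited in place and lines appended at the end of a file, and the function name mergeLines is made up for this illustration. It does reproduce the monkey.txt result, though.

```javascript
// Toy line-by-line merge. NOT Subversion's algorithm: it only handles
// lines edited in place and lines appended at the end of the file.
function mergeLines(base, mine, theirs) {
  if (mine.length < base.length || theirs.length < base.length) {
    throw new Error('this sketch does not handle deleted lines');
  }
  const merged = [];
  for (let i = 0; i < base.length; i++) {
    if (mine[i] === theirs[i] || theirs[i] === base[i]) {
      merged.push(mine[i]);   // same on both sides, or only mine changed
    } else if (mine[i] === base[i]) {
      merged.push(theirs[i]); // only theirs changed
    } else {
      throw new Error('conflict on line ' + (i + 1));
    }
  }
  // Lines past the end of the base were appended by one side or the other.
  const mineExtra = mine.slice(base.length);
  const theirsExtra = theirs.slice(base.length);
  if (mineExtra.length > 0 && theirsExtra.length > 0 &&
      mineExtra.join('\n') !== theirsExtra.join('\n')) {
    throw new Error('conflict in appended lines');
  }
  return merged.concat(mineExtra.length > 0 ? mineExtra : theirsExtra);
}

const base = ['foo', 'bar', 'baz'];           // what both checkouts started with
const theirs = ['foo', 'monkey', 'baz'];      // what A committed
const mine = ['foo', 'bar', 'baz', 'pirate']; // B's uncommitted edit
console.log(mergeLines(base, mine, theirs).join('\n'));
```

If both sides had changed "bar" to different things, the sketch throws instead of guessing, which is roughly the situation in which svn reports a conflict.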

Let's do one more thing, just for show:

$ svn revert B/monkey.txt
(svn messages omitted)
$ cat B/monkey.txt
foo
monkey
baz

This is important, because sometimes you want to throw away changes that you have made. With our stage site, it was assumed that we started with all files up to date with the live side, but it is possible that some things were *older* than on production, rather than *newer*. As such, those files might have to be reverted to be overwritten with the right stuff.

But let's get back from the theoretical to the reality.

$ svn update
(svn messages omitted)

Now, some of those omitted messages involved conflicts. There *were* conflicts. Ain't no thing: we can fix them! But how do we find them?

$ svn status | egrep '^C'
C some_dir/some_conflicting_file.php

I went through and fixed a bunch of conflicts one at a time. I find it useful to search files for <<<<, since that will get to the start of a conflict block. Conflicts also show up when you do an

$ svn diff

so, you can look for them that way, too. I used my best judgment to get files up to date, but there are many files that differ from the live site.
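If you would rather script the hunt for conflict blocks, something like this sketch would do it. It assumes each conflict block opens with a run of at least four "<" characters at the very start of a line; findConflictStarts and the sample file contents are made up for the example.

```javascript
// Returns the 1-based line numbers where a conflict block starts.
// Assumes a block opens with four or more '<' at the start of a line.
function findConflictStarts(text) {
  const starts = [];
  text.split('\n').forEach(function (line, index) {
    if (/^<{4,}/.test(line)) {
      starts.push(index + 1);
    }
  });
  return starts;
}

// Made-up file contents with a single conflict block.
const sampleFile = [
  '<?php',
  '<<<<<<< .mine',
  'echo "stage";',
  '=======',
  'echo "live";',
  '>>>>>>> .r31',
].join('\n');

console.log(findConflictStarts(sampleFile)); // one conflict, starting on line 2
```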

What is the final product?

Stage now has all the changes made to the production side since I first checked everything in. In most places, stage matches production exactly, but there are many unaccounted-for changes to stage that need to be audited and either checked in or thrown away. Do an

$ svn diff

today! The web sites will thank you for it!

Some notes about Subversion

Here's something I wrote up at work to start getting people up to speed on subversion. It's very basic, but you have to start somewhere.

With many people editing a site, it can be easy to lose track of who is editing what and what version of each file is in any given place. Let's imagine that we have a live site and a testing site which are both checked in to svn (subversion). We'll also imagine three developers, A, B, and C.

The live and test sites start out completely in sync with the repository.
A makes a change to live/index.php
B makes a change to test/monkey.php

Now, A should have made the change in test/index.php, tested it, and then pushed the changes to the live site, but perhaps it was a real emergency, so we can overlook it. More importantly, C has come along and wants to know if test/ is up to date. C can do an "svn status" to check.

test $ svn status
M monkey.php

The "M" stands for "modified." Here's a cheat sheet: http://knaddison.com/sites/knaddison.com/files/svn_codes.png
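As a rough companion to the cheat sheet, here is a small JavaScript sketch that reads svn status output and labels each line. The table is a partial list of the first-column codes, parseStatus is a made-up name, and the sketch assumes the simple case where only the first status column is set.

```javascript
// First-column codes from `svn status`; a partial list, not the full set.
const STATUS_MEANINGS = {
  A: 'added',
  C: 'conflicted',
  D: 'deleted',
  M: 'modified',
  '?': 'not under version control',
  '!': 'missing',
};

// Hypothetical helper: turns raw `svn status` text into {code, meaning, path}.
// Assumes only the first status column is set on each line.
function parseStatus(output) {
  return output
    .split('\n')
    .filter(function (line) { return line.trim() !== ''; })
    .map(function (line) {
      const code = line.charAt(0);
      return {
        code: code,
        meaning: STATUS_MEANINGS[code] || 'other',
        path: line.slice(1).trim(),
      };
    });
}

// Made-up sample output; the filter is roughly `svn status | egrep '^M'`.
const sample = 'M monkey.php\n? notes.txt\nC index.php\n';
const modified = parseStatus(sample).filter(function (entry) {
  return entry.code === 'M';
});
console.log(modified);
```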

So, C does an "svn update"

test $ svn update
M monkey.php

However, C hasn't gotten the changes from the live site. This is A's fault. If A edits a file on the live site (surely only in an emergency), then A should check those changes in.

live $ svn commit index.php

An editor should open giving A the chance to explain their changes. A, being a thorough developer, types out a clear explanation.
"Fixed the spelling of our product name in the header."

Oh, I can see why we'd want that changed right away.

Now, A or C can go back to test and do an svn update to get the changes from the live server. No copying or backups are needed.

test $ svn update index.php

It's worth noting here that index.php was specifically updated. You can also do an svn update and get the updates for all of the subdirectories and their files.

What about monkey.php? Did that just get overwritten? No! Subversion will do its best to merge changes and only complains when it can't. monkey.php was not changed on the live server, so there are no changes to merge back into the test server.

Let's imagine that more development is going on, and C changes monkey.php on the live site. In this instance, it was not an emergency, so C's manager should yell at C for not following procedures -- but it isn't the end of the world. Lots of places work this way, and while they are not making the best use of svn, they are still better off than they would be in the same situation without svn. C can at least recover.

live $ svn commit monkey.php

In the editor, C gives the following reason for their edit: "Added blink tag to body text." Wow, C, really? C should be in extra trouble for that. Now let's go to the test site:

test $ svn update monkey.php
OMGWTFBBQLOLCANHASCONFLICTS!

That's not the actual error message. The actual error message is more subtle:

test $ svn update monkey.php
C monkey.php

C is for conflict, it's good enough for me. There are also some new files in test/ now:

monkey.php.mine
monkey.php.r30
monkey.php.r31

So, you've got all you need to figure out what happened. You can also just edit monkey.php and see what the conflict looks like. It will look something like this:

<<<<<<< .mine
<marquee>We are awesome!</marquee>
=======
<blink>We are awesome!</blink>
>>>>>>> .r31

Obviously, the blink tag is superior to the marquee tag, so C removes the other line and all of svn's conflict junk, leaving just the finished product. It's a good thing that this was done on test/ and not live/, since the extra angle brackets break html and/or php. With it fixed, C can do this:

test $ svn resolved monkey.php

and then

test $ svn commit monkey.php

Assuming that there were other changes, they should now be checked in.

== Some questions and answers ==

Whew, that's a lot of stuff to digest!
Can't A and B just make C do all the work, getting everything checked in and out in all the right places? Yes and no. They are the only ones who know what they have changed.

Should I make backups before making changes? No, you probably shouldn't, and if you do, they should *not* be in svn-controlled directories.

Can I put my log files right there in the svn controlled directory? You can, but you shouldn't. For one thing, we're svn controlling the web root, and you shouldn't put your log files in the web root. Just don't do it. Beyond that, it is possible to ignore files based on filename patterns, but those patterns require maintenance.

Is svn magic? No. It won't do things for you.

Can we make our live site off limits to users so that they can only edit the test site and then push their changes to the live site? Yes! We sure can, but first we have to make the live site amenable to that style of management. That means that our team has to first manage its own changes in a professional, responsible way. We are all responsible for committing our own changes to svn and generally cleaning up the cruft that has been collecting on our customer-facing web sites for years.

That means finding a new home for ad hoc and regular log files, ad hoc backups, and temp files. Every change on live and test needs to be checked in atomically, and we need to make use of svn's tools for manipulating files. "svn rm" and "svn mv" allow svn to make the same changes to files on the live and test servers and track those changes for us.

WHERE IS THE SVN MAGICK YOU PROMISED?!?!?!?!!?!?!?!!????!?!!!!!!QUESTION MARKS!!!!!! I told you, it's not magic, but in not too long we can make some nice stuff happen automagically for us. :)

PHP, easy as APC

I moved my web services from Mediatemple to a box in a rack somewhere in Atlanta. I'm leasing the 1U from Chris Kelly, who is leasing several other Us, as well as power and bandwidth. My nerdier friends demand to know the specs (I ordered the box in parts from Newegg with "reasonable" as my only goal), but I'm not focused on such things -- merely being glad to be out of the Mediatemple grid server ghetto.

Okay, fine, it's an AMD 64X2 5400+ with 4G of RAM and a terabyte of storage. Another 4G of RAM is on its way. Mmm, RAM.

That is neither here nor there, since it's the personal (if physically distant) control that I wanted. Case in point: PHP opcode caching, in the form of APC. (I tried XCache, but things went haywire in ways that might not have been XCache's fault. When you're in a community like Gallery's, it's often useful to stick with what other people are doing rather than reinventing the wheel.)

I'd like to mention that XCache has an "isset" function and APC does not, meaning that if you want to store a FALSE (presumably the result of a complicated but memoizable computation) you have to wrap it in something else. You probably have to wrap everything then, but that's something that can be worked out.
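Here is a sketch of that wrapping idea, written in JavaScript to match the other code on this page rather than PHP. The MissReturnsFalseCache class is a made-up stand-in for any cache whose fetch returns false on a miss; it is not APC's API, and store, fetch, storeWrapped and fetchWrapped are just illustrative names.

```javascript
// Made-up stand-in for a cache whose fetch() returns false on a miss.
class MissReturnsFalseCache {
  constructor() { this.items = new Map(); }
  store(key, value) { this.items.set(key, value); }
  fetch(key) { return this.items.has(key) ? this.items.get(key) : false; }
}

// Wrap every value in an envelope so a stored false is not mistaken for a miss.
function storeWrapped(cache, key, value) {
  cache.store(key, { value: value });
}

// Returns { hit: true, value: ... } on a hit or { hit: false } on a miss.
function fetchWrapped(cache, key) {
  const envelope = cache.fetch(key);
  if (envelope === false) {
    return { hit: false };
  }
  return { hit: true, value: envelope.value };
}

const cache = new MissReturnsFalseCache();
storeWrapped(cache, 'is_prime_91', false);        // an expensive answer that happens to be false
console.log(fetchWrapped(cache, 'is_prime_91'));  // a hit whose value is false
console.log(fetchWrapped(cache, 'never_stored')); // a miss
```

The cost is that every value gets wrapped, not just the awkward ones, since the reader has no way to know in advance which kind it is fetching.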

So, Zach wanted to see benchmarks of APC versus not-APC. Enter our old friend ab.
I ran some tests from a neighboring box over gigabit ethernet because I wanted to get a real maximum requests number, including a bare minimum network overhead.

This information was common to each run:


This is ApacheBench, Version 2.0.40-dev Revision: 1.146 apache-2.0
Server Software: Apache/2.2.8
Server Hostname: gallery2.jpmullan.com
Server Port: 80
Concurrency Level: 1
Complete requests: 100
Failed requests: 0
Write errors: 0

APC On

$ ab -n 100 http://gallery2.jpmullan.com/

Time taken for tests: 13.41635 seconds
Total transferred: 694500 bytes
HTML transferred: 652400 bytes
Requests per second: 7.67 [#/sec] (mean)
Time per request: 130.416 [ms] (mean)
Time per request: 130.416 [ms] (mean, across all concurrent requests)
Transfer rate: 51.99 [Kbytes/sec] received

Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 0
Processing: 127 129 3.6 130 161
Waiting: 127 129 3.6 130 161
Total: 127 129 3.6 130 161

APC Off

$ ab -n 100 http://gallery2.jpmullan.com/

Time taken for tests: 33.743932 seconds
Total transferred: 694500 bytes
HTML transferred: 652400 bytes
Requests per second: 2.96 [#/sec] (mean)
Time per request: 337.439 [ms] (mean)
Time per request: 337.439 [ms] (mean, across all concurrent requests)
Transfer rate: 20.09 [Kbytes/sec] received

Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 0
Processing: 323 336 11.1 336 410
Waiting: 322 336 11.1 336 410
Total: 323 336 11.1 336 410

However, the front page only says so much. Let's view a large album inside of my gallery (incidentally full of pictures from my recent vacation in San Francisco).

APC On

$ ab -n 100 http://gallery2.jpmullan.com/v/scans/2008/04/20080420/

Time taken for tests: 26.385255 seconds
Total transferred: 1542900 bytes
HTML transferred: 1503000 bytes
Requests per second: 3.79 [#/sec] (mean)
Time per request: 263.853 [ms] (mean)
Time per request: 263.853 [ms] (mean, across all concurrent requests)
Transfer rate: 57.08 [Kbytes/sec] received

Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 0
Processing: 259 263 10.6 261 355
Waiting: 256 260 10.4 258 350
Total: 259 263 10.6 261 355

APC Off

$ ab -n 100 http://gallery2.jpmullan.com/v/scans/2008/04/20080420/

Time taken for tests: 52.541065 seconds
Total transferred: 1542900 bytes
HTML transferred: 1503000 bytes
Requests per second: 1.90 [#/sec] (mean)
Time per request: 525.411 [ms] (mean)
Time per request: 525.411 [ms] (mean, across all concurrent requests)
Transfer rate: 28.66 [Kbytes/sec] received

Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 1.9 0 19
Processing: 517 524 5.1 525 548
Waiting: 504 510 4.7 510 533
Total: 517 524 6.3 525 567

Let's summarize: on both pages, APC let my server answer requests roughly 2 to 2.6 times as fast. That seems like reason enough to keep it.

APC State   Requests per second   Speedup
On          7.67                  259%
Off         2.96
On          3.79                  199%
Off         1.90
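In case the speedup column looks like it came out of nowhere: it is requests per second with APC on, divided by requests per second with APC off, as a percentage. A tiny sketch of the arithmetic (speedupPercent is just a name picked for this example):

```javascript
// Speedup as a percentage: requests/sec with APC on, relative to APC off.
function speedupPercent(onRps, offRps) {
  return Math.round((onRps / offRps) * 100);
}

console.log(speedupPercent(7.67, 2.96)); // front page: 259
console.log(speedupPercent(3.79, 1.90)); // large album: 199
```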

Worse Than Youtube Comments

I like Youtube, but reading Youtube comments is like performing trepanation on your own head with a pencil eraser. I tell myself not to read them, but often they are in all caps, so they sneak into my brain anyway. I struggled with this pain for months before coming to a stunning realization: I can block Youtube comments with a tiny Greasemonkey script! However, simply shutting off comments didn't fill me with joy and hope. Fortunately, while discussing the dilemma with Zach, the solution presented itself: reduce the text of each comment to "Bla bla bla..." Genius! It may even have been Zach's idea.

I searched the web a bit until I found a script to donate some code -- feyntube replaces Youtube comments with quotes from Richard Feynman. Of course there is very little of the original code left, because I wanted each word to be replaced with a "bla" -- hopefully with similar capitalization and punctuation.

Before:

Picture 4

Wow, this is fairly egregiously nonsensical.

After:

Picture 3

Here's the code, with copious comments.

// ==UserScript==
// @name Youtube blablabla
// @namespace http://jpmullan.com
// @description transform YouTube comments into reasonable and smart ones
// @include http://*youtube*/*
// ==/UserScript==

// Based on feyntube, Written 2008 by Julien Oster <feyntube@julien-oster.de>,
// find the newest version at http://www.julien-oster.de/projects/feyntube

function blanode(node) {
    var text = "";
    // if the node has children, loop through them
    if (node.hasChildNodes()) {
        var children = node.childNodes;
        for (var i = 0; i < children.length; i++) {
            blanode(children[i]);
        }
    } else if (node.nodeName == "#text") {
        node.oldText = node.nodeValue;
        node.newText = blatext(node.nodeValue);
        node.nodeValue = node.newText;
        /* Uncomment the following if you hate life only a little and want a
         * reason to end it, like the ability to see the source text of
         * comments. I do not recommend this course of action, but it is there.
         */
        /*
        node.parentNode.addEventListener('mouseover', function (event) {
            node.nodeValue = node.oldText;
        }, false);
        node.parentNode.addEventListener('mouseout', function (event) {
            node.nodeValue = node.newText;
        }, false);
        */
    }
    return text;
}

function blatext(input_string) {
    /* split the input on word boundaries */
    var input_parts = input_string.split(/\b/);
    var output_string = '';
    var part;
    for (var j = 0; j < input_parts.length; j++) {
        part = input_parts[j];
        if (part.match(/^[A-Z]+$/)) {
            /* words that are all caps */
            output_string += 'BLA';
        } else if (part.match(/^[A-Z]\w+$/)) {
            /* words that are capitalized */
            output_string += 'Bla';
        } else if (part.match(/^[a-z][a-z]+$/)) {
            /* words that are all lowercase and have more than one character
             * this leaves possessives and the like intact, so you get bla's
             * instead of bla'bla, which is the wrong kind of funny */
            output_string += 'bla';
        } else if ('I' == part) {
            /* self reference? whatever */
            output_string += 'Bla';
        } else if (part.match(/^[0-9]+th$/)) {
            /* fix ordinal numbers! */
            output_string += 'blath';
        } else if (part.match(/^[0-9]+rd$/)) {
            output_string += 'blard';
        } else if (part.match(/^[0-9]+st$/)) {
            output_string += 'blast';
        } else {
            /* Okay, that's plenty. Anything else would be just mean. */
            output_string += part;
        }
    }
    return output_string;
}

var allDivs = document.getElementsByTagName('div');
for (var i = 0; i < allDivs.length; i++) {
    /* declare div locally so it does not leak into the page as a global */
    var div = allDivs[i];
    if (div.hasAttribute('class')
        && div.getAttribute('class').match(/commentBody/)) {
        blanode(div);
    }
}

That's the meat of it, but this isn't a programming magazine from the 1980s that requires you to type ten thousand lines of BASIC into your C64 -- this is the internet, and it is full of many magicks. If you are the type of person to enjoy a good greasemonkey script from time to time, you might enjoy this one:

http://jpmullan.com/greasemonkey/youtubeblablabla.user.js

But wait, there's more! I read Boing Boing, and they have comments, too. Most of the time their commenters are smart and reasonable people, but occasionally I read something from some troll on there that makes me temporarily lose my mind. I could just block specific people, but then I might still be caught unawares. The solution is to nuke the site from orbit. It's the only way to be sure.

Before:

Picture 2

After:

Picture 1

http://jpmullan.com/greasemonkey/boingboingblablabla.user.js
