Proper HTTP authentication for Wget

Summary
=======

The first part of the project aims to make Wget follow the RFC 2617
specification of the HTTP authentication process. The second part aims
to make Wget handle the authentication process in a more flexible and
secure way for the user. Both parts involve writing unit tests to
prevent any regression.

Benefits
========

Enhancing Wget's authentication process would help in several ways:

- more logical and secure communication with HTTP servers
- a safer and more flexible user-land

Deliverables
============

A significant contribution to Wget's current HTTP authentication
process, most likely a patch mainly touching 'http.c' that brings
RFC 2617 compliance. In addition, the manual will be extended to make
users aware of Wget's new behaviour.

Problem
=======

While the current HTTP authentication is operational and can satisfy
users' needs, the way it is done could be greatly improved. I propose
to improve it with the following focus, by order of priority:

(a) Security
(b) Standard compliance (RFC 2617)
(c) User-land flexibility

Note 1: Wget currently records challenges by host rather than by
realm. As a result, it can be confused when dealing with different
realms on the same host, and it cannot reliably decide where to send
authentication information.

Note 2: (a), (b) and (c) are interdependent, but are separated here to
ease the approach.

Solution
========

To address the points listed above, I propose the following
enhancements.

Concerning (a):

1. When a username/password is specified, Wget records this
   information but sends nothing by default.

2. Once a challenge is received, Wget should first record the
   authentication scheme, the realm and the protection space. The
   protection space determines the domain over which credentials can
   be applied automatically.

3. For any newly requested URI, Wget should be able to tell quickly
   whether it falls within a protection space (and thus a realm)
   already recorded, in order to decide whether or not to add the
   Authorization header.

4. The username/password should never be sent through the Referer
   header (RFC 2616 compliance, section 15.1.3).

Concerning (b):

1. Generality.
   When Wget receives a WWW-Authenticate header with several
   challenges, it must use the challenge with the strongest
   auth-scheme it understands. If another WWW-Authenticate header is
   received after the username/password has been sent, Wget should
   check whether it contains new challenges and address them, if any.

2. Basic Auth.
   The protection space is the URI for which the 401 code was first
   received and all URIs at or below the directory containing it, plus
   any URI on the same host for which the server issues a challenge
   with the same realm. Once a request has been authorized, the same
   credentials can be reused automatically for all other requests
   within that protection space. A protection space never extends
   beyond its own server.

3. Digest Auth.
   The protection space is either given explicitly by the 'domain'
   directive or, if that directive is absent, is the entire server.
   Within a matching protection space, the Authorization header may be
   sent preemptively to avoid extra round trips for authentication
   challenges. Wget should remember the nonce, nonce count and opaque
   values associated with an authentication session in order to build
   the Authorization header of future requests within the relevant
   protection space.

   If 'stale' is TRUE, the previous request was rejected only because
   its nonce was stale, so Wget can retry with the new nonce without
   asking for new credentials. If 'stale' is FALSE, anything other
   than TRUE, or absent, the username/password are invalid and new
   values must be obtained.

   If the 'algorithm' directive is not understood, the challenge
   should be ignored.

   If the 'qop-options' directive is present in the WWW-Authenticate
   header:
     - Wget records it to fill in the 'qop' value of the Authorization
       header
     - the 'cnonce' directive is appended
     - the 'nc' (nonce-count) directive is appended. Its value is the
       hexadecimal count of the number of requests (including the
       current request) that the client has sent with the nonce value
       of this request.

   Depending on the 'qop' directive, Wget computes the appropriate
   request-digest (RFC 2617, sections 3.2.2.1 to 3.2.2.3).

   When the response to an authenticated request carries an
   Authentication-Info header with a 'nextnonce' directive, Wget
   should use that value as the 'nonce' of the Authorization header in
   its next request.

Concerning (c):

1. When the user does not provide a username and/or password, Wget
   should offer to read them from the terminal (with echo off).

2. The user should be able to authenticate using .wgetrc files, the
   --{http-,}{user,password} options, or by giving the credentials
   explicitly in the URI with the 'username:password@' syntax. (Only
   the latter case needs work.)

Implementation
==============

To implement the points above ((a), (b) and (c)), my work will mainly
target several places in 'http.c', including but not limited to the
'gethttp()' function.

Plan
====

April 14 - May 26: Study RFCs 2616 and 2617 closely, become fluent
  with the unit test framework, and study the current authentication
  process in depth.

May 26: Google SoC 2008 official kick-off.

Weeks 1-3 (May 26 - June 16): Implement (a) 1, 2, 3 and 4.

Weeks 4-6 (June 17 - July 8): Implement (b) 1 and 2.

July 7 - July 14: Mid-term evaluation. By then, (a) 1-4 should be
  done and fully working, and (b) 1-2 should be finished or need no
  more than one more week of work.

Weeks 7-9 (July 9 - August 6): Implement (b) 3.

Week 10 (August 7 - August 14): Implement (c) 1 and 2.

Weeks 11-12 (August 15 - August 29): Testing, bug fixing, listening to
  the community and gathering feedback. In this last step I will also
  update the manual so that it reflects the new features.
September 3: SoC upload date.

Communication
=============

Although I would like to meet my mentor for real-life talks, we live
on two different continents, which makes that impossible. Given this
situation, I propose the following means of communication:

- a weekly email report on what has been done during the week
  (problems met, new ideas, suggestions, etc.) and what comes next
- IRC conversations for discussing smaller issues and questions
- email for bigger issues, with explanations, opinions, etc.

Moreover, to keep my mentor informed as I write code, I propose to set
up a new branch in the repository where I could commit, with a
commit-often policy so that he can follow my work closely.

Qualification
=============

I am a 22-year-old French student who has been interested in Computer
Science for ages. While still pursuing a Master of Science, I have had
several opportunities over the last three years to put my knowledge to
work in companies. I took full-time summer internships as well as
part-time internships in various parts of the world, as I like
multicultural environments, and I now feel comfortable with that kind
of teamwork.

I have been using Wget for years and reading its sources for months,
and the bit of hacking I did while writing this proposal makes me
confident that I can succeed with this task. Wget is exactly the kind
of software I like: you can sum up what it does in three words ("it
fetches files"), yet it involves non-obvious choices and coding skills
in several areas. Moreover, it is a well-known tool in the Unix world,
as it ships by default with many distributions, which means that
enhancing its behavior or adding features may impact thousands of
users. This makes me excited and ready to open my text editor.
Once the project is "finished", I simply plan to keep doing what I did
for Wget before the SoC: contributing my small share to the project by
submitting and discussing patches and answering users' questions on
the mailing list.

Before getting my hands on Wget I had never worked on a Free Software
project, but I have used and hacked on many of them. Wget is written
in C and its functionality tests are written in Perl, two languages I
have practiced in several school and personal projects and have read
reference books about.
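Appendix: protection-space sketch
=================================

As a rough illustration of the bookkeeping described under Solution
((a) 2-3, (b) 2 and (b) 3), here is a minimal C sketch. It is not
Wget code. 'struct prot_space' and every function below are
hypothetical names chosen for this example. The Basic rule assumes the
common reading of RFC 2617, section 2: the space covers the directory
of the URI that returned 401 and everything below it. The sketch also
shows how the Digest 'nc' value is formatted (8 lowercase hex digits)
and how the 'stale' directive is interpreted.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record of one protection space. It is keyed by host,
   scheme and realm instead of by host alone (see Note 1). */
struct prot_space {
  const char *host;    /* server that issued the challenge */
  const char *scheme;  /* "Basic" or "Digest" */
  const char *realm;   /* realm value, compared case-sensitively */
  char prefix[256];    /* path prefix covered by this space */
};

/* Case-insensitive ASCII equality, used for host names and "TRUE". */
static int
ascii_caseeq (const char *a, const char *b)
{
  for (; *a && *b; a++, b++)
    if (tolower ((unsigned char) *a) != tolower ((unsigned char) *b))
      return 0;
  return *a == *b;   /* equal only if both strings ended together */
}

/* Basic: derive the prefix from the path that got the 401 response.
   "/private/docs/index.html" yields "/private/docs/". PATH is assumed
   to start with '/'. Returns -1 instead of truncating, since a shorter
   prefix would silently widen the protection space. */
static int
basic_space_prefix (const char *path, char *out, size_t size)
{
  const char *slash;
  size_t len;

  assert (path[0] == '/');
  slash = strrchr (path, '/');
  len = (size_t) (slash - path) + 1;   /* keep the trailing '/' */
  if (len >= size)
    return -1;
  memcpy (out, path, len);
  out[len] = '\0';
  return 0;
}

/* Does a request for HOST and PATH fall inside PS? A protection space
   never extends beyond its own server. For Digest without a 'domain'
   directive, a prefix of "/" covers the whole server. */
static int
in_prot_space (const struct prot_space *ps, const char *host,
               const char *path)
{
  return ascii_caseeq (ps->host, host)
         && strncmp (path, ps->prefix, strlen (ps->prefix)) == 0;
}

/* Digest: format the 'nc' value as exactly 8 lowercase hex digits. */
static void
format_nc (unsigned long count, char out[9])
{
  snprintf (out, 9, "%08lx", count & 0xffffffffUL);
}

/* Digest: only stale=TRUE (in any case) means "retry with the new
   nonce and the same credentials". FALSE, any other value or a
   missing directive means the credentials must be asked for again. */
static int
stale_allows_retry (const char *stale)
{
  return stale != NULL && ascii_caseeq (stale, "true");
}
```

In this sketch, a caller would check 'in_prot_space()' against each
recorded space before sending an Authorization header preemptively.
A Digest 'domain' list, which may name several URIs, would need one
prefix per URI. It is left out here for brevity.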