mroeder

PagePusher: Web Site Content Management System

Introduction

PagePusher manages the movement of web site assets between the development, staging/QA, and production servers.

This document begins with a Development Scenario which forms the basis of the subsequent Specification sections.

Development Scenario: Web Site Section

Sections in gray are not essential because they are already handled well by other authoring systems. They may be implemented in Version 2.

Albert the Manager orders the creation of a new section of the corporate web site. A new record is created in the CMS database to contain the name of the section, the fully qualified URL of its root page, and notes about its specification.

On the Dev server, Bob the Developer creates templates, directories, and sample pages for the new web site section. Each template gets a record in the database. These records will keep lists of which pages use the templates so that when a template is updated and the pages drawn from it are regenerated, those pages can be pushed. He notifies Albert of the sample pages' URL on the Dev server. (PagePusher does not need to manage templates. Dreamweaver does that fine.)

Albert looks at the sample pages and a development cycle ensues.

Once the templates are the way they need to be, Bob starts making actual pages. Each page's URL is entered into the page table in the database. Each page's row records its URL, title, template, update date, version number, and development state. Fields are kept for the page's versions on the developer's workstation and on the Dev, QA, and production servers. As the page is pushed forward, these fields get filled in. Version numbers start at 1 and are incremented each time a version is checked in.

When Bob has a set of pages written on his workstation that he feels look good and work well together, he pushes them onto the development server. The database is updated to reflect the new version numbers and the new versions are copied into the version tracking system. Albert and the rest of the team working on this project are notified of the new pages. They can now check in and check out pages as in a software development system. With each checkout, a page is marked Checked Out and no other developer may check it out. When a page is checked back in, the old page is copied to an archive and the page record is updated.
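The checkout rule above can be sketched in a few lines of Python. This is only an illustration, not PagePusher's implementation: the PageRecord class and the check_out and check_in functions are hypothetical names, the record lives in memory rather than in the database, and the copy of the old page to the archive is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PageRecord:
    """Hypothetical in-memory stand-in for one row of the page table."""
    url: str
    version: int = 1                      # version numbers start with 1
    checked_out_by: Optional[str] = None  # None means the page is not locked


def check_out(page: PageRecord, developer: str) -> None:
    """Lock the page for one developer; refuse if someone already holds it."""
    if page.checked_out_by is not None:
        raise RuntimeError(
            f"{page.url} is already checked out by {page.checked_out_by}"
        )
    page.checked_out_by = developer


def check_in(page: PageRecord, developer: str) -> int:
    """Release the lock, increment the version number, and return it."""
    if page.checked_out_by != developer:
        raise RuntimeError(f"{page.url} is not checked out by {developer}")
    page.checked_out_by = None
    page.version += 1
    return page.version
```

A second developer calling check_out on a locked page gets an error instead of a silent overwrite, which is the behavior the scenario asks for.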

When the web site section is declared finished, Albert pushes it to the Staging/QA server. The system looks at the records for all the pages in the new section and compares their dates on the QA server with the dates on the Dev server. Where Dev has something newer, it copies the files to the QA server. Any pages that are replaced on the QA server are copied to a dated delta-archive directory. The record for each page is updated with the new information about which versions are where. The system notifies Calvin the QA Manager that new pages are on the QA server.
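The date comparison in this step might look something like the following Python sketch. It makes some assumptions that the scenario does not: it walks the Dev directory instead of reading page records from the database, it compares file modification times, and the function name push_newer_files and its parameters are invented for the example.

```python
import shutil
from pathlib import Path
from typing import List


def push_newer_files(dev_root: Path, qa_root: Path, archive_root: Path) -> List[Path]:
    """Copy files that are newer on Dev to QA, archiving any file they replace.

    Returns the relative paths that were copied.
    """
    pushed = []
    for src in sorted(dev_root.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(dev_root)
        dst = qa_root / rel
        if dst.exists() and dst.stat().st_mtime >= src.stat().st_mtime:
            continue  # the QA copy is already as new as the Dev copy
        if dst.exists():
            # Save the QA file that is about to be replaced, at the same
            # relative path inside the dated delta-archive directory.
            saved = archive_root / rel
            saved.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(dst, saved)
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also copies the modification time
        pushed.append(rel)
    return pushed
```

The returned list is what the system would use to update each page's record with the new information about which versions are where.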

Calvin assigns pages to members of his team. They receive emails about their assignments and the page records are marked with their names. The QA team reviews the pages. When they find bugs in the web pages, they write them up in a bug database. Each record in the bug database contains a link to the page, a summary, a detailed problem description, the name of the QA engineer (QAE), and various other tracking information. A debug cycle ensues.

Since these pages are a set, they can only be pushed all at once. The system will not push them to the production server until all the pages in the set are certified by QA. When the push happens, the pages' records in the database are updated to reflect their new state. Any pages on the production server that were replaced are first copied into a dated delta-archive directory.
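The all-or-nothing rule for a set can be expressed as a simple check. The sketch below is one way to do it in Python, assuming the QA state of each page is available as a string and that "approved" (from the QA_state set in the Pages table) is what "certified by QA" means. The function names are hypothetical.

```python
from typing import Dict, List

APPROVED = "approved"  # assumed to be the state that means "certified by QA"


def uncertified_pages(qa_states: Dict[str, str]) -> List[str]:
    """Return the URLs in the set that QA has not yet approved."""
    return sorted(url for url, state in qa_states.items() if state != APPROVED)


def can_push_set(qa_states: Dict[str, str]) -> bool:
    """A set may go to production only when every page in it is approved."""
    return not uncertified_pages(qa_states)
```

Listing the uncertified pages separately lets the system tell the manager why a push was refused.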

What PagePusher is Not

PagePusher is not a content authoring system. You can use whatever tools you want to create the content for your web site. Whether you code your web pages by hand using Notepad, or let Dreamweaver manage your templates for you, PagePusher will work with you and stay out of the way. Likewise for images. Whether you use SuperPaint and GifConverter or Photoshop, PagePusher doesn't care.

Developers who use Dreamweaver can elect to use its checkin/checkout system and let PagePusher guess about page updates by file modification dates. PagePusher's primary purpose is to manage the movement of files from Dev to QA to Production. It should stay out of your way otherwise.

PagePusher is not a web log analysis program. There are plenty of those already.

Infrastructure Servers and Network Design

System      Description
Dev         Development server runs LAMP. Content. This could be a web server and a set of web development workstations, or it could be a single workstation with its own Apache server accessed through localhost. Whatever database it uses could be on this server or it could be on some other server.
QA          Quality Assurance / Staging server runs LAMP. Content. This could be a web server and a set of web development workstations, or it could be a single workstation with its own Apache server accessed through localhost. Whatever database it uses could be on this server or it could be on some other server.
Prod        Production server runs LAMP. Content.
PagePusher  CMS server runs LAMP. Infrastructure.

These four servers could all be on the same hardware, or each could be on its own server farm. Prod really should be its own separate server from the rest of the system(?). Ideally the Dev and QA servers should be separate systems configured with the minimum functionality that will correctly reproduce the idiosyncrasies of the Prod server. The PagePusher server should be its own machine with its own LAMP installation that serves only PagePusher and none of the content pages. Your network design will depend on the needs and size of your organization and web sites. Clever network design can isolate the servers and their clients from each other and enforce the workflow rules. However, these issues are all independent of PagePusher. As long as PagePusher can see the server directories that it needs to read and write to, it will be happy.

Archives

A delta-archive is made by duplicating the folder structure of the web site directory and copying only the changed files to their corresponding places. Only the folders that actually contain changed pages appear in the delta-archive. A complete archive, of course, copies everything. In both cases the root folder is named with the name of the web site followed by an eight-digit date code in yyyymmdd format.
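As a rough sketch of the archive layout, the Python below builds the root folder name and copies a list of changed files into it. The function names and parameters are made up for the example, and joining the site name and date code with no separator is an assumption, since the text does not say whether one is used.

```python
import shutil
from datetime import date
from pathlib import Path
from typing import Iterable


def delta_archive_root(archive_dir: Path, site_name: str, day: date) -> Path:
    """Build the archive root: site name followed by a yyyymmdd date code."""
    return archive_dir / f"{site_name}{day:%Y%m%d}"


def write_delta_archive(site_root: Path, changed: Iterable[Path], archive_root: Path) -> None:
    """Copy only the changed files, creating just the folders that hold them."""
    for rel in changed:
        target = archive_root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(site_root / rel, target)
```

For a site named Timberwoof archived on 9 March 2004, delta_archive_root gives a folder named Timberwoof20040309. Folders with no changed files are never created, so the archive stays small.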

Tables

Table      Field        Type         Description
Users                                Each user will have one row here.
           key          longint
           name         varchar 80
           e-mail       varchar 80   e-mail address
           telephone    varchar 12
           password     char 12      stored with MD5
Sites                                Each web site maintained by the system will have a row in this table.
           Key          longint
           name         varchar 80   For instance, Timberwoof, Infernosoft, Spagthorpe
           devserver    varchar 80
           qaserver     varchar 80
           prodserver   varchar 80
Server
           key          longint
           site         longint
           httpserver   varchar 80
           httproot     varchar 80
           ftpserver    varchar 80
           ftproot      varchar 80
           ftpuser      char 12
           ftppassword  char 12      not stored as MD5
SiteUsers                            For each site, each user will have recorded information about that user's role on that site.
           user         longint      link to users.key
           site         longint      link to site.key
           role         longint      link to roles.key
Pages      Site                      There will be many pages for each site.
           Key          longint
           URL          varchar 256  PagePusher.html
           dev_state    integer      {assigned, checked_out, checked_in, ready for QA}
           dev_version  longint
           dev_date     date         when this state was assigned
           Dev_Eng      longint      link to Users.key
           QA_state     integer      {in QA, assigned, in_progress, approved, rejected, in Production}
           QA_version   longint
           QA_date      date         when this state was assigned
           QA_eng       longint      link to Users.key
           prod_version longint
           prod_date    date         when this state was assigned
Bugs                                 There could be multiple bugs for each page.
           Key          longint      bug number
           Site         longint      link
           Page         longint      link
           date         date         date
           version      longint
           state        integer      set
           summary      text
           description  text
           Dev_Eng      longint      link to Users.key
           QA_eng       longint      link to Users.key
Roles
           key          longint
           description  char 12      {Dev, QA, Manager}
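To show how the SiteUsers table ties users, sites, and roles together, here is a small Python sketch using the standard sqlite3 module. It is an illustration under several assumptions: SQLite stands in for the MySQL database of a LAMP server, longint and varchar are mapped to INTEGER and TEXT, only a few of the fields are included, and the roles_for_site function is a hypothetical helper.

```python
import sqlite3

# Cut-down versions of four tables; "key" is quoted because it is an SQL keyword.
SCHEMA = """
CREATE TABLE Users (
    "key" INTEGER PRIMARY KEY,
    name  TEXT
);
CREATE TABLE Sites (
    "key" INTEGER PRIMARY KEY,
    name  TEXT
);
CREATE TABLE Roles (
    "key"       INTEGER PRIMARY KEY,
    description TEXT
);
CREATE TABLE SiteUsers (
    user INTEGER REFERENCES Users("key"),
    site INTEGER REFERENCES Sites("key"),
    role INTEGER REFERENCES Roles("key")
);
"""


def roles_for_site(conn: sqlite3.Connection, site_name: str) -> list:
    """Return (user name, role description) pairs for one site, by user name."""
    rows = conn.execute(
        """
        SELECT Users.name, Roles.description
        FROM SiteUsers
        JOIN Users ON Users."key" = SiteUsers.user
        JOIN Sites ON Sites."key" = SiteUsers.site
        JOIN Roles ON Roles."key" = SiteUsers.role
        WHERE Sites.name = ?
        ORDER BY Users.name
        """,
        (site_name,),
    )
    return rows.fetchall()
```

Because the role lives in SiteUsers rather than in Users, the same person can be a Manager on one site and a Dev on another.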

Actions

Who        What                       Description
Developer
           check out page             mark page as locked and owned by that developer
           check in page              mark page as not locked but not released for QA
           reassign page              send page to another developer
           assert page ready          mark page as ready for QA
           roll back page from QA     get page from QA server, overwriting this file
           roll back page from Prod   get page from Production server, overwriting this file
QAE
           get next page to check     get next page from list of pages assigned to me
           approve page               mark page as ready for Production
           reject page - enter bug    mark page as not ready. Add information to bugbase.
           reassign page              send page to another QAE
Manager
           create/edit people
           create/edit page metadata
           assign page                Determine which DEV or QAE should work on this page
           promote Dev pages          Look through list of pages. Any page that's marked by a developer as Ready for QA, copy to the QA server and mark the page as In QA.
           promote QA pages           Look through list of pages. Any page that's marked by a QAE as Ready for Production, copy to the Production server and mark the page as In Production.
           demote QA pages            Look through list of pages. Any page that's marked by a QAE as Rejected, assign to Dev engineer.
           import site                parse source directory to populate pages table
           move page                  deal with moved or renamed pages
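The three Manager actions that promote and demote pages are state changes over the list of pages. A minimal Python sketch of them follows. It assumes each page is a dictionary keyed by the Pages field names, that states are stored as strings rather than integers, and that "approved" is the state meaning Ready for Production. The file copying between servers is left out, and the function names are hypothetical.

```python
from typing import Dict, List

Page = Dict[str, str]


def _transition(pages: List[Page], field: str, from_state: str,
                set_field: str, to_state: str) -> List[str]:
    """Move every page whose `field` equals `from_state`; return their URLs."""
    moved = []
    for page in pages:
        if page[field] == from_state:
            page[set_field] = to_state
            moved.append(page["URL"])
    return moved


def promote_dev_pages(pages: List[Page]) -> List[str]:
    """Pages a developer marked ready for QA become In QA."""
    return _transition(pages, "dev_state", "ready for QA", "QA_state", "in QA")


def promote_qa_pages(pages: List[Page]) -> List[str]:
    """Pages a QAE approved become In Production."""
    return _transition(pages, "QA_state", "approved", "QA_state", "in Production")


def demote_qa_pages(pages: List[Page]) -> List[str]:
    """Pages a QAE rejected go back to a Dev engineer as assigned."""
    return _transition(pages, "QA_state", "rejected", "dev_state", "assigned")
```

Each function returns the URLs it touched, which is the list of files a real implementation would then copy to the next server.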
 

Copyright 2004 by Michael Roeder.