Managing a Large Site

There is no limitation with the number of webs and users a TWiki site can have. But there are several considerationgs needed to run a site having thousands of webs and tens of thousands of users. This topic discusses those.

User Management

The default user management scheme using TWikiUserMappingContrib and TWiki::Users::HtPasswdUser is not suitable for tens of thousands of users because of the following factors.

  • TWikiUserMappingContrib maintains the list of users on the TWiki.TWikiUsersTemplate topic
  • TWiki::Users::HtPasswdUser stores user account data on a text file

You need to use e.g. LdapContrib for TWiki::Users::LdapUserMapping and TWiki::LdapPasswdUser. Or you need to implement your own user mapping manager which is scalalbe. If your environment provides intranet single sign-on and user directory (e.g. LDAP or ActiveDirectory based), it's worth considering getting rid of user registration.

Web Management

If you have thousands of webs, you face the following issues.

  • Getting the list of all webs takes a long time due to directory traversal. This happens e.g. when you move a topic.
  • The frequency of administrator help requests increases. Each web should be made as self-service as possible.
The first issue can be solved by MetadataRepository with $TWiki::cfg{Mdrepo}{WebRecordRequired} = 1;

The second issue can be solved by making webs autonomous following AutonomousWebs.

Eliminating Impractical Operations

If you have thousands of webs, some operations take too long. Here are those costly operations and how to suppress them.

In all public webs

Setting the {NoInAllPublicWebs} configuration parameter to true has the following effects

  • On the "More topic actions" page, "in all public webs" links are suppressed since they are likely to time out.
  • On te WebSearch and WebSearchAdvance topics on all webs, the "All public webs" checkbox is suppressed.

SiteChanges

TWiki.SiteChanges, the topic showing all recent changes across all webs, should be deleted.

Statistics script use from browser

This is not about the number of webs, but about the number of accesses. If there are millions of page views in a month, the statistics script takes too longe and a times out would occur if it's invoked from browser.

Setting {Stats}{DisableInvocationFromBrowser} configuration parameter to true disable invocation of the statistics script from browser.

Multiple servers

For higher performance and availability, you may have multiple TWiki servers behind a load balancer for a single TWiki site. By having $TWiki::cfg{DataDir} and $TWiki::cfg{PubDir} on NFS or other file sharing mechanisms, you can have multiple servers for a single TWiki site easily. If a topic is saved simultaneously by two or more people, on different servers sharing $TWiki::cfg{DataDir}, something may break - cases of broken RCS files are reported though their causes haven't been identified.

Even if $TWiki::cfg{DataDir} and $TWiki::cfg{PubDir} are shared by multiple servers, log files should not be because of the frequency they are updated. For example:

use Sys::Hostname;
$TWiki::cfg{LogFile} = '/var/twiki/logs/log%DATE%.' . hostname . '.txt';
logYYYMM.SERVER_HOSTNAME.txt

If each server has its own log file, the statistics script needs to see log files of all the servers to provide real data. If {Stats}{LogFileGlob} configuration parameter is set as shown below, the statistics script reads access log files matching the file glob (wildcard) instead of the file specified by {LogFileName}.

$TWiki::cfg{Stats}{LogFileGlob} = "/var/twiki/logs/log%DATE%.*.txt";

Locking down the Main web

If you have tens of thousand of users, to prevent unaccounted-for topics from accumulating in the Main web, you should lock it down. Specifically, you should allow users to have only FirstLast, FirstLastLeftBar, and FirstLastBookmarks. This can be achieved by forbidding ordinary users CHANGE operation in the %USERWEB% web while customizing the isAdmin() method of the user mapping manager to make the user admin of their topics.

Rotating Trash and Sandbox

The Trash web accumulates deleted topics, attachments, and webs. To prevent the Trash web from growing indefinitely, it needs to be cleaned up periodically. One way to achieve it is to rotate Trash just like you do with log files. For example, a new Trash is created from the Trash template after Trash is moved to Trash1 after Trash1 is moved to Trash2 ... after Trash9 is moved to Trash10 after Trash10 is deleted. If you do this rotation daily, you keep deleted topics, attachments, and webs for 10 days.

In addition to Trash, the Sandbox web should be rotated as well. Otherwise, users may accumulate random stuff in Sandbox. Some of them may depend on that random stuff. Rotating Sandbox once a week in the following manner should be appropriate - create a new Sandbox from the Sandbox template after moving Sandbox to Sandbox1 after moving Sandbox1 to Sandbox2 after deleting Sandbox2.

User subwebs

If you have many users, a good number of them may want to have a web for their own use rather than for a team use. In that case, providing them with their own subweb in the Main web might be a good idea.

You can see how to do it at UserSubwebs.

User Masquerading

If you have thousands of webs, TWiki administrators (typically TWikiAdminGroup members) have a big power. Their administrative operations may need to be audited.

UserMasquerading provides a means to minimize the amount of time an administrator exercising the privilege and audit their activities.

In addition, UserMasquerading enables web owners to check access restriction settings on their own.

Multiple Disks

A single disk may not be able to house all webs. UsingMultipleDisks provides a way to use multiple disks.

Related Topics: AdminDocumentationCategory, MetadataRepository, AutonomousWebs, UsingMultipleDisks, UserSubwebs, UserMasquerading

r1 - 23 Aug 2013 - 06:31:00 - TWikiContributor
 
Linux & Open Source for AT91 Microchip Microprocessors

Copyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.

Linux® is the registered trademark of Linus Torvalds in the U.S. and other countries.

Microchip and others, are registered trademarks or trademarks of Microchip Technology Inc. and its subsidiaries. This site is powered by the TWiki collaboration platform

Arm® and others are registered trademarks or trademarks of Arm Limited (or its affiliates). Other terms and product names may be trademarks of others.

Ideas, requests, contributions ? Connect to LinksToCommunities page.

Syndicate this siteRSS ATOM