What is Content-Disposition? According to the Mozilla developer guide: "In a regular HTTP response, the Content-Disposition response header is a header indicating if the content is expected to be displayed inline in the browser, that is, as a Web page or as part of a Web page, or as an attachment, that is downloaded and saved locally."
In AEM, the Content Disposition Filter is a security feature that protects against XSS attacks through SVG files.
Different values for the Content-Disposition headers
inline (This is the default value - indicating it can be displayed inside the Web page, or as the Web page)
attachment (which indicates it should be downloaded).
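To make the two values concrete, here is a minimal sketch that builds and reads a Content-Disposition header value using only Python's standard library. The helper names build_disposition and parse_disposition and the file name report.pdf are made up for illustration; they are not part of AEM or Sling.

```python
from email.message import Message


def build_disposition(disposition, filename=None):
    """Return a header value such as: attachment; filename="report.pdf"."""
    msg = Message()
    if filename is None:
        msg.add_header("Content-Disposition", disposition)
    else:
        msg.add_header("Content-Disposition", disposition, filename=filename)
    return msg["Content-Disposition"]


def parse_disposition(header_value):
    """Return (disposition, filename); filename is None when it is absent."""
    msg = Message()
    msg["Content-Disposition"] = header_value
    return msg.get_content_disposition(), msg.get_filename()


if __name__ == "__main__":
    # A response that should be saved locally under a suggested name.
    print(parse_disposition(build_disposition("attachment", "report.pdf")))
    # A response that the browser may render in the page.
    print(parse_disposition("inline"))
```

A browser that receives the attachment form offers a download, while the inline form lets it render the content in the page.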
How does AEM support content disposition? A common complaint on AEM websites is that a PDF or an image that is supposed to be downloaded opens in a new tab instead (usually on the dispatcher URL).
AEM has a configuration for this in the OSGi console - 'org.apache.sling.security.impl.ContentDispositionFilter'
In AEM we can configure the Content Disposition Filter in multiple ways
Content Disposition Paths
This option lets us configure a list of paths where the content disposition filter will be applied, each followed by a list of mime-types to exclude on that path.
Some examples given below:
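/content/*:image/png - This will apply the filter to every node under /content except PNG images
/*:image/png,image/svg+xml - This will apply the filter to every node in the repository except PNG and SVG images
/content/*:audio/mpeg - This will apply the filter to every node under /content except MPEG audio
/content/*:application/pdf - This will apply the filter to every node under /content except PDF files, so PDFs are not forced to download. To have PDFs downloaded instead of opening in another tab, the path must be covered by the filter and application/pdf must not be in the excluded list.
Each path must be an absolute path and can contain a wildcard ('*') at the end, to match every resource path with the given path prefix.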
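The entry format described above can be modelled in a few lines. This is only a simplified sketch of the semantics as explained in this post (path, optional trailing wildcard, comma separated excluded mime-types), not the actual Sling implementation. The function names and the first-matching-entry-wins behaviour are my own assumptions.

```python
def parse_entry(entry):
    """Split 'path:type1,type2' into (path, set of excluded mime-types)."""
    path, _, types = entry.partition(":")
    excluded = {t.strip() for t in types.split(",") if t.strip()}
    return path, excluded


def path_matches(pattern, resource_path):
    """A trailing '*' means prefix match; otherwise the path must be equal."""
    if pattern.endswith("*"):
        return resource_path.startswith(pattern[:-1])
    return resource_path == pattern


def filter_applies(entries, resource_path, mime_type):
    """Return True when the filter would cover this resource.

    Assumption: the first entry whose path matches decides the outcome.
    """
    for entry in entries:
        pattern, excluded = parse_entry(entry)
        if path_matches(pattern, resource_path):
            return mime_type not in excluded
    return False


if __name__ == "__main__":
    entries = ["/content/*:image/png"]
    print(filter_applies(entries, "/content/dam/site/logo.png", "image/png"))
    print(filter_applies(entries, "/content/dam/site/icon.svg", "image/svg+xml"))
```

With the single entry above, the PNG is excluded from the filter while the SVG under the same path is covered by it.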
Excluded Resource Paths
We can list a set of resource paths to be excluded from the filter. Each resource path must be given as an absolute and fully qualified path. In this case prefix matching/wildcards are not supported.
Enable For All Resource Paths
This feature flag enables the filter for all paths, except for the excluded paths defined by Excluded Resource Paths. If we set this to true, the Content Disposition Paths setting is ignored and the filter applies to every resource path which has a property named 'jcr:data' or 'jcr:content/jcr:data'.
The Content Disposition details can be found in url
The new 'AEM Assets as a Cloud Service', which is part of AEM as a Cloud Service (a platform-as-a-service solution), provides Digital Asset Management capabilities (storage, online metadata management, versioning, upload and download) with the extended features below.
Based on asset microservices(asset ingestion and processing).
Smart capabilities, such as AI/ML
Highly scalable
Always current
Always available
Auto scaled, deployed and monitored
In older AEM versions all the asset operations happened on the AEM Author instance, which consumed considerable CPU, memory, and I/O resources. Asset processing and storage demand resources, which in turn created performance issues that impacted the authoring and browsing experience of end users.
A High-level Architecture of Assets as a Cloud Service can be seen below
The generic steps, in sequence, are:
The client sends an upload request, then starts uploading the binary directly to the binary cloud storage
Once the direct upload is completed, the client notifies AEM
AEM then sends a processing request to the asset microservices
The asset microservices start processing the asset (based on the rendition request from AEM) and run the relevant microservices for this. They access the binary from the binary cloud storage, and the processed assets are also placed in the binary cloud storage.
The asset microservices then notify AEM that the renditions are available.
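The sequence above can be pictured with a toy, in-memory simulation. Nothing here calls a real AEM or Adobe API. The classes BinaryCloud, AssetMicroservice and Aem, and the rendition names, are invented purely to show who talks to whom and in what order.

```python
from dataclasses import dataclass, field


@dataclass
class BinaryCloud:
    """Stand-in for the binary cloud storage."""
    blobs: dict = field(default_factory=dict)


@dataclass
class AssetMicroservice:
    """Stand-in for the asset microservices; reads and writes the cloud directly."""
    cloud: BinaryCloud

    def process(self, name, renditions):
        original = self.cloud.blobs[name]
        produced = []
        for rendition in renditions:
            key = f"{name}/{rendition}"
            self.cloud.blobs[key] = f"{rendition} of {len(original)} bytes"
            produced.append(key)
        return produced


class Aem:
    """Stand-in for AEM, which only coordinates and never touches the binary."""

    def __init__(self, service):
        self.service = service
        self.log = []

    def upload_completed(self, name):
        self.log.append(f"processing requested for {name}")
        renditions = self.service.process(name, ["thumbnail", "web"])
        self.log.append(f"renditions available for {name}")
        return renditions


def upload(aem, cloud, name, data):
    cloud.blobs[name] = data            # the client uploads straight to the cloud
    return aem.upload_completed(name)   # the client then notifies AEM


if __name__ == "__main__":
    cloud = BinaryCloud()
    aem = Aem(AssetMicroservice(cloud))
    print(upload(aem, cloud, "photo.jpg", b"raw bytes"))
    print(aem.log)
```

The point of the sketch is that the binary is written to the cloud storage by the client and read from there by the microservice, while the AEM stand-in only exchanges notifications.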
Assets as a Cloud Service Vs AEM Asset upload on premise
Assets as a Cloud Service uses the direct binary access principle for upload and download - Previously assets were uploaded directly to the AEM Author instance for processing.
Assets as a Cloud Service uses 'asset microservices' for asset processing, which are external to AEM - In older AEM versions, all processing happened within AEM.
In Assets as a Cloud Service the DAM Update Asset workflow is not available [asset microservices provide a scalable, readily available service that covers most of the default asset processing (renditions, metadata extraction, text extraction for indexing)] - In older AEM we had the DAM Update Asset workflow as default.
Assets as a Cloud Service comes with post-processing workflows which can be used for customizations (where additional processing of assets is required that cannot be achieved using the processing profiles) - In older AEM we had default + customized workflow steps (even though that looks like an advantage, it used AEM for all processing).
In Assets as a Cloud Service the standard asset upload interface is the Touch-enabled UI - In older versions the Classic UI was also available.
In Assets as a Cloud Service only the new upload APIs are supported - The older AEM Assets HTTP API (AEM 6.5) and the AssetManager Java API are deprecated now.
Advantages of new cloud
The uploaded binaries do not go through AEM, which now simply coordinates the upload process with the binary cloud storage configured for the deployment. Clients finally get direct access to the binaries to carry out their work. This minimizes the load on networks and the duplication of stored binaries.
Binary cloud storage is fronted by a Content Delivery Network (CDN, Edge Network), which brings the upload endpoint closer to the client, helping to improve upload performance and user experience, especially for distributed teams uploading assets.
More scalable and performant handling of asset uploads.
Ways of uploading Assets to Assets as a Cloud Service
Upload using the web interface, Adobe Asset Link, the AEM desktop app, or custom applications which use the new HTTP API.
Post-processing workflows
There are cases where we need additional processing that is not done by the asset microservices (e.g. generating a rendition which requires an integration with another application). For these cases, additional post-processing workflows can be added to the configuration.
Post-processing workflows, once configured, are automatically executed by AEM after the microservices processing finishes. There is no need to add workflow launchers manually to trigger them.
Some examples for Post Processing workflow use cases are:
Custom workflow steps to process assets.
Additional processing done by external services.
Integrations to add metadata or properties to assets from external systems
How to create Post-Processing Workflows: Steps involved
Create one or more workflow models - these are regular AEM workflow models
Add specific workflow steps to these models.
Add the 'DAM Update Asset Workflow Completed' process step at the end (to inform AEM once the processing is done)
Create a configuration for the Custom Workflow Runner Service (an OSGi service configuration) - This ensures that a post-processing workflow model is executed for assets selected either by a path (folder location) or by a regular expression.
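To illustrate the "by path or by regular expression" idea from the last step, here is a small sketch of how a workflow model could be picked for an asset. This is not the Custom Workflow Runner Service itself. The folder names, model paths, the function select_workflow, and the rule that a folder match is checked before the expressions are all assumptions made for the example.

```python
import re

# Hypothetical configuration values, for illustration only.
WORKFLOWS_BY_PATH = {
    "/content/dam/marketing": "/var/workflow/models/marketing-post-processing",
}
WORKFLOWS_BY_EXPRESSION = [
    (r"/content/dam/.*/seasonal/.*", "/var/workflow/models/seasonal-post-processing"),
]


def select_workflow(asset_path,
                    by_path=WORKFLOWS_BY_PATH,
                    by_expression=WORKFLOWS_BY_EXPRESSION):
    """Return the workflow model for an asset, or None when nothing matches."""
    # Folder match: the asset must live in or below the configured folder.
    # Assumption: when several folders match, the longest (deepest) one wins.
    folders = [f for f in by_path
               if asset_path == f or asset_path.startswith(f.rstrip("/") + "/")]
    if folders:
        return by_path[max(folders, key=len)]
    # Expression match: the whole asset path must match the pattern.
    for pattern, model in by_expression:
        if re.fullmatch(pattern, asset_path):
            return model
    return None


if __name__ == "__main__":
    print(select_workflow("/content/dam/marketing/banner.jpg"))
    print(select_workflow("/content/dam/brand/seasonal/winter.png"))
    print(select_workflow("/content/dam/other/photo.png"))
```

A folder based rule is easier to reason about, while an expression based rule helps when the same folder name (such as a seasonal folder) appears under many parents.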
Supported File Formats
Adobe formats - AI, COLLAGE, DN, IDEAS, INDD, INDT, PDF, PROTO, PSB, PSD, XD
Imaging file formats - BMP, EPS, GIF, JPEG, PNG, SVG, TIFF
Image formats in Dynamic Media - PNG, GIF, TIFF, JPEG, BMP, PSD, EPS, PICT
3D formats - DN, gLB, gLTF, OBJ, STL, USDz
Camera Raw file formats - 3FR, ARW, CR2, CR3, CRW, DCR, DNG, ERF, FFF, GPR, IIQ, KDC, MEF, MFW, MOS, MRW, NEF, NRW, ORF, PEF, RAF, RAW, RW2, RWL, SRF, SRW, X3F
Document formats - PDF, DOCX, DOC, PPTX, PPT, XLSX, XLS, ODF, OFG, ODM, ODP, ODS, ODT, EPUB, HTML, PS, RTF, TXT, XML
Document formats in Dynamic Media - AI, PDF, INDD
Video formats - 3G2, 3GP, AVI, DIVX, F4V, FLV, M2T, M2TS, M2V, M4V, MKV, MOV, MP4, MPEG, MPG, MTS, OGV, QT, R3D, SWF, WEBM, WMV
Video formats in Dynamic Media for transcoding - MP4, MOV, QT, FLV, F4V, WMV, MPG, VOB, M2V, MP2, M4V, AVI, WebM, OGV, OGG, MXF, MTS, MKV, R3D, RM, RAM, FLAC, MJ2
Audio formats - AIF, ASF, M4A, MP3, WAV, and WMA
When we think about AEM websites, SEO is one of the major considerations. To ensure the crawlers are crawling our website, we need a sitemap.xml and a robots.txt which points the crawler to the corresponding sitemap.xml
A robots.txt file lives in the root folder of the website. The role of robots.txt in any website is as follows: it acts as an entry point to the website and ensures the crawlers access only the relevant items which we have defined.
robots.txt in AEM websites
Let us see how we can implement a robots.txt file in our AEM website. There are many ways to do this, but below is one of the easiest ways to achieve it.
Say we have multiple websites(multi-lingual) with language roots /en, /fr, /gb, /in
Let us see how we can enable robots.txt in our case.
Add robots.txt in Author
Log in to CRXDE and create a file called 'robots.txt' under the path /content/dam/[sitename]. Ensure the following lines are added to the 'robots.txt' on the AEM Author instance, then publish the robots.txt
# Any search crawler can crawl our site
User-agent: *
Add OSGi configurations for url mapping
Now add the entry below in the OSGi console > configMgr - 'Apache Sling Resource Resolver Factory'
Add the mapping below in the 'URL Mappings' section
/content/dam/sitename/robots.txt>/robots.txt$
Add rewrite rule/ allow access to robots.txt via dispatcher
Allow the crawlers to access robots.txt via the dispatcher
Add an allow rule for robots.txt in the dispatcher
/0010 { /type "allow" /url "/robots.txt" }
When you hit www.[sitename]/robots.txt you should see the robots.txt file on the public domain.
Now any search engine which tries to access our site will find the robots.txt and learn whether the crawler has permission to crawl the site and which areas of the site it may crawl. Some sample usages of robots.txt are given below
# Disallow googlebot accessing example.com/directory1/... and example.com/directory2/...
# but allow access to subdirectories -> directory2/subdirectory1/...
# All other directories on the site are allowed by default.
User-agent: googlebot
Disallow: /directory1/
Disallow: /directory2/
Allow: /directory2/subdirectory1/

# Block the entire site from xyzcrawler.
User-agent: xyzcrawler
Disallow: /
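To see why the googlebot sample allows /directory2/subdirectory1/ even though /directory2/ is disallowed, here is a minimal sketch of the "most specific rule wins" evaluation, where the longest matching path decides and Allow wins a tie. It only handles plain path prefixes (no wildcards), and the function name is_allowed and the rule lists are made up for the example.

```python
def is_allowed(rules, path):
    """Evaluate robots.txt rules for one user agent.

    rules is a list of (directive, path_prefix) tuples, where directive
    is 'allow' or 'disallow'. The longest matching prefix decides; when
    two matching prefixes have the same length, 'allow' wins. A path
    that matches no rule is allowed by default.
    """
    best_len = -1
    allowed = True
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue
        longer = len(prefix) > best_len
        tie_for_allow = len(prefix) == best_len and directive == "allow"
        if longer or tie_for_allow:
            best_len = len(prefix)
            allowed = directive == "allow"
    return allowed


GOOGLEBOT_RULES = [
    ("disallow", "/directory1/"),
    ("disallow", "/directory2/"),
    ("allow", "/directory2/subdirectory1/"),
]
XYZCRAWLER_RULES = [("disallow", "/")]

if __name__ == "__main__":
    for url_path in ("/directory1/a.html",
                     "/directory2/subdirectory1/page.html",
                     "/fr/home.html"):
        print(url_path, is_allowed(GOOGLEBOT_RULES, url_path))
    print("/fr/home.html", is_allowed(XYZCRAWLER_RULES, "/fr/home.html"))
```

Note that some parsers apply rules in file order instead of by specificity, so it is worth testing the published robots.txt with the tools of the search engines you care about.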
Let me know via the comments section if you find a better way to do this.
Before we start any AEM upgrade we should ensure that a detailed study of the release notes is done.
If the upgrade is planned to the next direct version (say AEM 6.4 to AEM 6.5), we can just read the release notes of AEM 6.5 and proceed with the upgrade. But if the case is different (AEM 6.3 to AEM 6.5), ensure we compare the release notes of each version in between.
For example, say we are upgrading from AEM 6.3 to AEM 6.5. We know there was an AEM 6.4 in between. So during the upgrade, first understand the release notes of AEM 6.4 and observe the changes from AEM 6.3 to AEM 6.4, then do the same comparison from AEM 6.4 to AEM 6.5. This process ensures that we identify every change and accommodate it, taking care not to break anything during the upgrade.
Notes: AEM content is being restructured out of /etc to other folders in the repository, along with guidelines on what content goes where, adhering to the following high-level rules:
• AEM product code will always be placed in /libs, which must not be overwritten by custom code
• Custom code should be placed in /apps, /content, and /conf
As a best practice, we should ensure that '/content/sitename' does not appear in URLs on the public domain.
Here I am going to explain one of the best approaches for achieving this. We will have to configure things on both publish and dispatcher.
Configurations on PUBLISH:
Configure the Apache Sling Resource Resolver Factory to ensure the URLs are rewritten on the PUBLISH server.
Once you save the configuration and hit the webpage with http://<hostname>:<port>/fr/home.html, you will be able to see the home page on the publish instance (without /content/sitename).
Note: After saving, it takes some time for the relevant bundles to restart automatically.
Configurations on DISPATCHER: Now we are able to hit the publish server without the content path. Let us see how the same can be achieved on the dispatcher, that is, on the public domain URL.
Step 1: Ensure the 'mod_rewrite' module is loaded in Apache.
#
# This file loads most of the modules included with the Apache HTTP
#
LoadModule rewrite_module modules/mod_rewrite.so
Step 2: Set the 'DispatcherUseProcessedURL' property to 1 - this ensures the dispatcher uses processed URLs.
<IfModule disp_apache.c>
# This is enabled to ensure re-writes take effect
DispatcherUseProcessedURL 1
</IfModule>
Step 3: Now update the virtual host file and add the rules below to it. (Usually this configuration file sits in the conf.d folder)
<IfModule mod_rewrite.c>
RewriteEngine on
# Remove a trailing slash
RewriteRule ^(.+)/$ $1
# Shorten the URL
RewriteRule ^/content/sitename/(.*)\.html$ $1.html [R,L]
# Redirect the root folder to the fr home page
RewriteRule ^/?$ fr/home.html [R,L]
</IfModule>
Now, after restarting Apache, hit the public domain URL https://[websitename.com]/fr/home.html and you can see the home page loading.
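Before touching Apache, it can help to check what the three rewrite patterns do to a few sample paths. The sketch below only mimics the regular expression part of the rules with Python's re module. It is not how mod_rewrite works internally, the function name rewrite is made up, and the leading slash on the returned path is my own choice for readability.

```python
import re

TRAILING_SLASH = re.compile(r"^(.+)/$")
SHORTEN = re.compile(r"^/content/sitename/(.*)\.html$")
ROOT = re.compile(r"^/?$")


def rewrite(path):
    """Approximate the result of the three rules for a request path."""
    # Rule 1: strip a trailing slash, then keep evaluating the next rules.
    match = TRAILING_SLASH.match(path)
    if match:
        path = match.group(1)
    # Rule 2: drop /content/sitename from page URLs (a redirect in Apache).
    match = SHORTEN.match(path)
    if match:
        return "/" + match.group(1) + ".html"
    # Rule 3: send the site root to the fr home page (a redirect in Apache).
    if ROOT.match(path):
        return "/fr/home.html"
    return path


if __name__ == "__main__":
    for sample in ("/content/sitename/fr/home.html", "/", "/fr/about/"):
        print(sample, "->", rewrite(sample))
```

Trying a handful of real page paths this way is a quick check that a pattern is not broader than intended, for example that the dot before html is matched literally.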
Recently I appeared for the AEM Certification exam and thought I would share my experience with you all.
How did I register for exam?
I went through the AEM certification site and registered myself, choosing a convenient date when I could make myself completely free.
- There were two options, PSI and Examity. I chose one.
- Due to the COVID lock-down, the majority of the exams are happening online
Procedures on exam day.
The exam notification email said I could log in to the exam system half an hour before. Since this was my first experience, I logged into the system 1 hour before.
System Checks: The exam site asked me to install a secure browser. Once it is installed, they check the system requirements. They do a set of checks on system resources, camera, internet speed, browser used etc.
Check-in Personal Information: Once that was done, I was asked to take 1) a picture of myself, 2) a picture of my government issued ID, and 3) a video scan of my room by rotating the laptop around, including the desk where the laptop was placed.
Once those files were checked into the system, I was asked to wait for the scheduled time. My recommendation here: if you are confident enough about your system, internet etc., log in just half an hour before the exam - else you will have to wait a lot.
You can even disconnect and log back in, but since the camera was on, I did not attempt that.
Waiting for proctor
Now my scheduled time came. I had scheduled the exam for 10 AM, but I was still getting a message like 'Your exam will start once proctor joins on the scheduled time'. It went on for 10 minutes. I saw an option to chat with an executive and pinged the executive via the chat option. Even connecting to the customer care executive took some time.
The executive told me he could reschedule, but the replies were very slow, so I was worried about the confirmation.
Check-in Expert Verifying my details:
Fortunately my on-screen message changed to 'Check-in verification expert is analysing details'. So I asked the chat agent to hold off on rescheduling.
The verification agent (proctor) asked me to re-take the ID proof photo, which was not clear according to him. I did that and re-uploaded it.
Starting the exam now:
After waiting for a few minutes, my screen changed to "starting with exam". Then the proctor started sending me messages.
The proctor asked me to scan the room again, rotating through all 4 sides of the room. (He saw my ID card was on my desk and pinged me to remove it.) Once that was done, the proctor shared the terms and conditions and then started the exam.
Notes: Even though the exam was scheduled at a specific time, it started quite late after all these procedures. The same process is carried out for everyone who is taking the exam in parallel, and this is the reason the proctor may not be able to start our exam at the scheduled time. So I personally ask everyone who takes the test to have patience and wait till the procedures are completed before taking the exam.
My suggestions
- Ensure uninterrupted internet and power connectivity
- Ensure it is a peaceful space where no one disturbs you.
I will be providing more tips for the AEM certification via my YouTube channel - Link is provided on right side of the webpage.
There are cases during development where we may need to set up an https connection on our existing AEM instance.
By following the procedure below we can have both http and https on the same AEM instance. This is very helpful while testing some of the AEM features which require SSL connections.
To start with, we need keys and certificates to configure SSL on AEM. We will use OpenSSL to set up the keys and certificates. The method is tested on Windows, but should work the same way on any other OS.
How to setup OpenSSL on Windows
Download OpenSSL - Ensure it is relevant to your OS (including 32-bit (x86) vs 64-bit)
Unzip it.
Add the OpenSSL bin folder to the PATH environment variable
Place the conf file in the below path (else you may get an error that the OpenSSL conf cannot be found)
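The signing and conversion commands below assume that the private key (localhostprivate.key) and the certificate signing request (localhost.csr) already exist. If they do not, they can typically be created first with the two commands at the top (the key size and the CN value are example values for a local setup).
### Create the private key
$ openssl genrsa -out localhostprivate.key 4096
### Generate the certificate signing request using the private key
$ openssl req -sha256 -new -key localhostprivate.key -out localhost.csr -subj "/CN=localhost"
### Generate the SSL certificate and sign with the private key, will expire one year from now
$ openssl x509 -req -days 365 -in localhost.csr -signkey localhostprivate.key -out localhost.crt
### Convert Private Key to DER format - SSL wizard requires key to be in DER format
$ openssl pkcs8 -topk8 -inform PEM -outform DER -in localhostprivate.key -out localhostprivate.der -nocrypt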
You will now have the certificates on the local drive as shown below.
Use the SSL Wizard in AEM
Now log in to AEM at http://localhost:4502/aem/start.html
Go to Tools > Security > SSL Configuration
For store credentials, provide the Key store and Trust store passwords. [I have used admin for all, since it is a localhost]