"""
bookmaker.py is a helper for optimizing PDFs of books for the production of small self-printed, self-bound physical books. Towards this goal it offers various PDF manipulation options that may also be used independently and for other purposes.
"""
help_epilogue = """
Concatenate two PDFs A.pdf and B.pdf to COMBINED.pdf:
    bookmaker.py --input_file A.pdf --input_file B.pdf --output_file COMBINED.pdf

Produce OUTPUT.pdf containing all pages of (inclusive) page number range 3-7 from INPUT.pdf:
    bookmaker.py -i INPUT.pdf --page_range 3-7 -o OUTPUT.pdf

Produce COMBINED.pdf from A.pdf's first 7 pages, B.pdf's pages except its first two, and all pages of C.pdf:
    bookmaker.py -i A.pdf -p start-7 -i B.pdf -p 3-end -i C.pdf -o COMBINED.pdf

Crop each page 5cm from the left, 10cm from the bottom, 2cm from the right, and 0cm from the top:
    bookmaker.py -i INPUT.pdf -o OUTPUT.pdf --crops "5,10,2,0"

Include all pages from INPUT.pdf, but crop pages 10-20 by 5cm each from bottom and top:
    bookmaker.py -i INPUT.pdf -c "10-20:0,5,0,5" -o OUTPUT.pdf

Same crops for pages 10-20, but also crop all pages 30 and later by 3cm each from left and right:
    bookmaker.py -i INPUT.pdf -o OUTPUT.pdf -c "10-20:0,5,0,5" -c "30-end:3,0,3,0"

Rotate by 90° pages 3, 5, 7; rotate page 7 once more by 90° (i.e. 180° in total):
    bookmaker.py -i INPUT.pdf -o OUTPUT.pdf --rotate_page 3 -r 5 -r 7 -r 7

Initially declare 5cm crop from the left and 1cm crop from the right, but alternate direction between even and odd pages:
    bookmaker.py -i INPUT.pdf -o OUTPUT.pdf -c "5,0,1,0" -s

Quarter each OUTPUT.pdf page to carry 4 pages from INPUT.pdf, draw stencils into inner margins for cuts to carry binding strings:
    bookmaker.py -i INPUT.pdf -o OUTPUT.pdf --nup4

Same --nup4, but define a printable-region margin of 1.3cm to limit the space for the INPUT.pdf pages in OUTPUT.pdf page quarters:
    bookmaker.py -i INPUT.pdf -o OUTPUT.pdf -n --print_margin 1.3

Same --nup4, but draw lines marking printable-region margins, page quarters, spine margins:
    bookmaker.py -i INPUT.pdf -o OUTPUT.pdf -n --analyze

For arguments like -p, page numbers are assumed to start with 1 (not 0, which is treated as an invalid page number value).

The target page shape so far is assumed to be A4 in portrait orientation; bookmaker.py normalizes all pages to this format before applying crops, and removes any source PDF /Rotate commands (which would otherwise produce landscape orientations).

For --nup4, the -c cropping instructions do not so much erase content outside the cropped area as zoom into the page, so that the cropped area fills as much as possible of the available per-page area between the printable-area margins and the borders to the other quartered pages. If the zoomed cropped area does not fit neatly into its per-page area, additional page content is preserved.

The --nup4 quartering puts pages into a specific order optimized for no-tumble duplex print-outs that can easily be folded and cut into pages of a small A6 book. Each unit of 8 pages from the source PDF is mapped thus onto two subsequent pages (i.e. front and back of a printed A4 paper):

 (front)      (back)
+-------+   +-------+
| 4 | 1 |   | 2 | 3 |
|-------|   |-------|
| 8 | 5 |   | 6 | 7 |
+-------+   +-------+

To facilitate this layout, --nup4 also pads the input PDF pages to a total number that is a multiple of 8, by adding empty pages if necessary.

(To turn the double-sided example page above into a tiny 8-page book: Cut the paper in two along its horizontal middle line. Fold the two halves along their vertical middle lines, with pages 3-2 and 7-6 on the folds' insides. This creates two 4-page books of pages 1-4 and pages 5-8. Fold them both closed and (counter-intuitively) put the book of pages 5-8 on top of the other one (creating a temporary page order of 5,6,7,8,1,2,3,4). A binding cut stencil should be visible on the top left of this stack; cut it out (with all pages folded together) to add the same inner-margin upper cut to each page. Turn your 8-page stack around to find the mirror image of the aforementioned stencil on the bottom of the stack's back, and cut that out too. Each page now has binding cuts at the top and bottom of its inner margin. Swap the order of both books (back to the final page order of 1,2,3,4,5,6,7,8), and you now have an 8-page book that can be "bound" in its binding cuts with a rubber band or the like. Repeat with the next 8-page double-page, et cetera. With just 8 pages, the paper may curl under the pressure of a rubber band, but go up to 32 pages or so and the result becomes quite stable.)
"""
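import argparse
import io
import os
import sys

def handled_error_exit(msg):
    print(f"ERROR: {msg}")
    sys.exit(1)  # non-zero exit status (the exact value is an assumption)

try:
    import pypdf
except ImportError:
    handled_error_exit("Can't run at all without pypdf installed.")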

# some general paper geometry constants
POINTS_PER_CM = 10 * 72 / 25.4
A4_WIDTH = 21 * POINTS_PER_CM
A4_HEIGHT = 29.7 * POINTS_PER_CM
A4 = (A4_WIDTH, A4_HEIGHT)
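
# A minimal sketch, not part of bookmaker.py itself: how the geometry constants
# above can turn a set of -c crops (given in cm) into one uniform zoom factor,
# following the min/max rule the script applies to crops further down. The helper
# name zoom_for_crops is hypothetical; the constants are repeated only so that
# the block runs on its own.

```python
POINTS_PER_CM = 10 * 72 / 25.4
A4_WIDTH = 21 * POINTS_PER_CM
A4_HEIGHT = 29.7 * POINTS_PER_CM


def zoom_for_crops(left_cm, bottom_cm, right_cm, top_cm):
    """Hypothetical helper: uniform zoom fitting the cropped area into A4."""
    cropped_width = A4_WIDTH - (left_cm + right_cm) * POINTS_PER_CM
    cropped_height = A4_HEIGHT - (bottom_cm + top_cm) * POINTS_PER_CM
    zoom_horizontal = A4_WIDTH / cropped_width
    zoom_vertical = A4_HEIGHT / cropped_height
    # One axis wants to enlarge while the other wants to shrink: no single zoom fits.
    if (zoom_horizontal > 1 and zoom_vertical < 1) or (zoom_horizontal < 1 and zoom_vertical > 1):
        raise ValueError("crops would create opposing zoom directions")
    # When enlarging, the smaller factor keeps the whole cropped area visible;
    # when shrinking, the larger factor is the one closer to 1.
    if zoom_horizontal + zoom_vertical > 2:
        return min(zoom_horizontal, zoom_vertical)
    return max(zoom_horizontal, zoom_vertical)
```

# With crops "5,10,2,0" the horizontal factor (about 1.5) is the smaller one and
# therefore limits the zoom.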

# constants specifically for --nup4
A4_HALF_WIDTH = A4_WIDTH / 2
A4_HALF_HEIGHT = A4_HEIGHT / 2
CUT_DEPTH = 1.95 * POINTS_PER_CM
CUT_WIDTH = 1.05 * POINTS_PER_CM
MIDDLE_POINT_DEPTH = 0.4 * POINTS_PER_CM
SPINE_LIMIT = 1 * POINTS_PER_CM
QUARTER_SCALE_FACTOR = 0.5
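
# A small sketch of the --nup4 page arithmetic, independent of any PDF library:
# pad a page list to a multiple of 8, then reorder each group of 8 into the
# front/back slot order that the script's "page front/back, upper/lower
# left/right" assignments describe. The names pad_to_multiple_of_8, nup4_order
# and EIGHT_PACK_ORDER are hypothetical, and None stands in for a blank page.

```python
# Index within each group of 8, listed in slot order:
# front upper left, front upper right, front lower left, front lower right,
# back upper left, back upper right, back lower left, back lower right.
EIGHT_PACK_ORDER = (3, 0, 7, 4, 1, 2, 5, 6)


def pad_to_multiple_of_8(pages, blank=None):
    """Return a copy of pages, extended with blanks to a multiple of 8."""
    padded = list(pages)
    remainder = len(padded) % 8
    if remainder:
        padded += [blank] * (8 - remainder)
    return padded


def nup4_order(pages):
    """Return pages in the order in which they fill the output sheet quarters."""
    padded = pad_to_multiple_of_8(pages)
    ordered = []
    for offset in range(0, len(padded), 8):
        ordered += [padded[offset + k] for k in EIGHT_PACK_ORDER]
    return ordered
```

# For source pages 1 to 8 this yields 4, 1, 8, 5 for the front and 2, 3, 6, 7
# for the back of one printed sheet.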

class HandledException(Exception):
    pass

def validate_page_range(p_string, err_msg_prefix):
    prefix = f"{err_msg_prefix}: page range string"
    if '-' not in p_string:
        raise HandledException(f"{prefix} lacks '-': {p_string}")
    tokens = p_string.split("-")
    if len(tokens) > 2:
        raise HandledException(f"{prefix} has too many '-': {p_string}")
    for i, token in enumerate(tokens):
        if token == "":  # assumed: an empty token is an open end, as in parse_page_range
            continue
        if i == 0 and token == "start":
            continue
        if i == 1 and token == "end":
            continue
        try:
            int(token)
        except ValueError:
            raise HandledException(f"{prefix} carries value neither integer, nor 'start', nor 'end': {p_string}")
        if int(token) < 1:
            raise HandledException(f"{prefix} carries page number <1: {p_string}")
    # non-positive placeholders (values assumed) skip the order check for open ends
    start = end = -1
    try:
        start = int(tokens[0])
        end = int(tokens[1])
    except ValueError:
        pass
    if start > 0 and end > 0 and start > end:
        raise HandledException(f"{prefix} has higher start than end value: {p_string}")
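
# A self-contained sketch of the page range rules that the error messages above
# spell out: exactly one '-', each side an integer >= 1 or the keyword 'start'
# (left) / 'end' (right), and start not above end. The function name
# check_page_range, the use of ValueError, and the treatment of an empty side as
# an open end are assumptions made for illustration.

```python
def check_page_range(p_string):
    """Hypothetical validator: return (start, end), with None for an open end."""
    if '-' not in p_string:
        raise ValueError(f"page range lacks '-': {p_string}")
    tokens = p_string.split('-')
    if len(tokens) > 2:
        raise ValueError(f"page range has too many '-': {p_string}")
    bounds = []
    for i, token in enumerate(tokens):
        # Keywords are only valid on their own side of the '-'.
        if token == "" or (i == 0 and token == "start") or (i == 1 and token == "end"):
            bounds.append(None)
            continue
        try:
            number = int(token)
        except ValueError:
            raise ValueError(f"neither integer, nor 'start', nor 'end': {p_string}") from None
        if number < 1:
            raise ValueError(f"page number <1: {p_string}")
        bounds.append(number)
    start, end = bounds
    if start is not None and end is not None and start > end:
        raise ValueError(f"higher start than end value: {p_string}")
    return start, end
```

# check_page_range("3-end") gives (3, None), while "7-3", "0-4" and "1-2-3" are
# all rejected.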

def split_crops_string(c_string):
    initial_split = c_string.split(':')
    if len(initial_split) > 1:
        page_range = initial_split[0]
        crops = initial_split[1]
    else:
        page_range = None  # assumed placeholder: no range prefix means all pages
        crops = initial_split[0]
    return page_range, crops
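
# A short sketch of how a complete -c argument such as "10-20:0,5,0,5" breaks
# down into an optional page range and four crop values in cm (left, bottom,
# right, top). The helper name parse_crops_spec is hypothetical; it folds the
# split above and the number checks into one standalone function.

```python
def parse_crops_spec(c_string):
    """Hypothetical helper: return (page_range_or_None, (left, bottom, right, top))."""
    parts = c_string.split(':')
    if len(parts) > 2:
        raise ValueError(f"cropping string has multiple ':': {c_string}")
    page_range = parts[0] if len(parts) == 2 else None
    numbers = parts[-1].split(',')
    if len(numbers) != 4:
        raise ValueError(f"cropping does not contain exactly three ',': {c_string}")
    # float() itself raises ValueError for a non-numeric crop.
    return page_range, tuple(float(n) for n in numbers)
```

# parse_crops_spec("5,10,2,0") has no range prefix, so its first element is None.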

def parse_page_range(range_string, pages):
    # returns a zero-based start index and an exclusive end index into pages
    start_page = 0
    end_page = len(pages)
    if range_string:
        start, end = range_string.split('-')
        if not (len(start) == 0 or start == "start"):
            start_page = int(start) - 1
        if not (len(end) == 0 or end == "end"):
            end_page = int(end)
    return start_page, end_page
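
# A worked sketch of the index arithmetic: the user-facing range is 1-based and
# inclusive, while the returned pair is meant for range(start, end), i.e.
# 0-based with an exclusive end. The name to_slice_bounds is hypothetical, and
# it takes a page count instead of a page list so that it runs without pypdf.

```python
def to_slice_bounds(range_string, page_count):
    """Hypothetical helper: map '3-7' style ranges onto range() bounds."""
    start_page, end_page = 0, page_count
    if range_string:
        start, end = range_string.split('-')
        if start not in ("", "start"):
            start_page = int(start) - 1  # 1-based inclusive to 0-based
        if end not in ("", "end"):
            end_page = int(end)  # inclusive 1-based end equals exclusive 0-based end
    return start_page, end_page
```

# For a 10-page input, "3-7" selects indices 2, 3, 4, 5, 6, which are pages 3 to 7.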

def draw_cut(canvas, x_spine_limit, direction):
    outer_start_x = x_spine_limit - 0.5 * CUT_WIDTH * direction
    inner_start_x = x_spine_limit + 0.5 * CUT_WIDTH * direction
    middle_point_y = A4_HALF_HEIGHT + MIDDLE_POINT_DEPTH * direction
    end_point_y = A4_HALF_HEIGHT + CUT_DEPTH * direction
    canvas.line(inner_start_x, A4_HALF_HEIGHT, x_spine_limit, end_point_y)
    canvas.line(x_spine_limit, end_point_y, x_spine_limit, middle_point_y)
    canvas.line(x_spine_limit, middle_point_y, outer_start_x, A4_HALF_HEIGHT)
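
# A sketch that computes the three stencil segments drawn above as plain
# coordinate tuples, so the shape can be inspected without reportlab. The name
# cut_segments is hypothetical; the constants repeat the script's values so the
# block runs on its own.

```python
POINTS_PER_CM = 10 * 72 / 25.4
A4_HALF_HEIGHT = 29.7 * POINTS_PER_CM / 2
CUT_DEPTH = 1.95 * POINTS_PER_CM
CUT_WIDTH = 1.05 * POINTS_PER_CM
MIDDLE_POINT_DEPTH = 0.4 * POINTS_PER_CM


def cut_segments(x_spine_limit, direction):
    """Hypothetical helper: return the stencil as three (x1, y1, x2, y2) segments."""
    outer_start_x = x_spine_limit - 0.5 * CUT_WIDTH * direction
    inner_start_x = x_spine_limit + 0.5 * CUT_WIDTH * direction
    middle_point_y = A4_HALF_HEIGHT + MIDDLE_POINT_DEPTH * direction
    end_point_y = A4_HALF_HEIGHT + CUT_DEPTH * direction
    return [
        # slanted line from the horizontal middle line up (or down) to the tip
        (inner_start_x, A4_HALF_HEIGHT, x_spine_limit, end_point_y),
        # straight line along the spine limit, back from the tip to the middle point
        (x_spine_limit, end_point_y, x_spine_limit, middle_point_y),
        # short slanted line from the middle point back to the horizontal middle line
        (x_spine_limit, middle_point_y, outer_start_x, A4_HALF_HEIGHT),
    ]
```

# direction=1 points the stencil above the sheet's horizontal middle line,
# direction=-1 mirrors it below that line and swaps its inner and outer side.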


def main():
    parser = argparse.ArgumentParser(description=__doc__, epilog=help_epilogue, formatter_class=argparse.RawDescriptionHelpFormatter)
    parser.add_argument("-i", "--input_file", action="append", required=True, help="input PDF file")
    parser.add_argument("-o", "--output_file", required=True, help="output PDF file")
    parser.add_argument("-p", "--page_range", action="append", help="page range, e.g., '2-9' or '3-end' or 'start-14'")
    parser.add_argument("-c", "--crops", action="append", help="cm crops left, bottom, right, top – e.g., '10,10,10,10'; prefix with ':'-delimited page range to limit effect")
    parser.add_argument("-r", "--rotate_page", type=int, action="append", help="rotate page of number by 90° (usable multiple times on same page!)")
    parser.add_argument("-s", "--symmetry", action="store_true", help="alternate horizontal crops between odd and even pages")
    parser.add_argument("-n", "--nup4", action='store_true', help="puts 4 input pages onto 1 output page, adds binding cut stencil")
    parser.add_argument("-a", "--analyze", action="store_true", help="in --nup4, print lines identifying spine, page borders")
    parser.add_argument("-m", "--print_margin", type=float, default=0.43, help="print margin for --nup4 in cm (default 0.43)")
    args = parser.parse_args()

    # some basic input validation
    for filename in args.input_file:
        if not os.path.isfile(filename):
            raise HandledException(f"-i: {filename} is not a file")
        try:
            with open(filename, 'rb') as file:
                pypdf.PdfReader(file)
        except pypdf.errors.PdfStreamError:
            raise HandledException(f"-i: cannot interpret {filename} as PDF file")
    if args.page_range:
        for p_string in args.page_range:
            validate_page_range(p_string, "-p")
        if len(args.page_range) > len(args.input_file):
            raise HandledException("-p: more --page_range arguments than --input_file arguments")
    if args.crops:
        for c_string in args.crops:
            initial_split = c_string.split(':')
            if len(initial_split) > 2:
                raise HandledException(f"-c: cropping string has multiple ':': {c_string}")
            page_range, crops = split_crops_string(c_string)
            crops = crops.split(",")
            if page_range:
                validate_page_range(page_range, "-c")
            if len(crops) != 4:
                raise HandledException(f"-c: cropping does not contain exactly three ',': {c_string}")
            for crop in crops:
                try:
                    float(crop)
                except ValueError:
                    raise HandledException(f"-c: non-number crop in: {c_string}")
    if args.rotate_page:
        for r in args.rotate_page:
            try:
                int(r)
            except ValueError:
                raise HandledException(f"-r: non-integer value: {r}")
            if r < 1:
                raise HandledException(f"-r: value must not be <1: {r}")
    try:
        float(args.print_margin)
    except ValueError:
        raise HandledException(f"-m: non-float value: {args.print_margin}")
    # reportlab is only needed (and only imported) for --nup4
    if args.nup4:
        try:
            import reportlab.pdfgen.canvas
        except ImportError:
            raise HandledException("-n: need reportlab library installed for --nup4")

    # select pages from input files
    pages_to_add = []
    opened_files = []
    new_page_num = 0
    for i, input_file in enumerate(args.input_file):
        file = open(input_file, 'rb')
        opened_files += [file]
        reader = pypdf.PdfReader(file)
        range_string = None
        if args.page_range and len(args.page_range) > i:
            range_string = args.page_range[i]
        start_page, end_page = parse_page_range(range_string, reader.pages)
        if end_page > len(reader.pages):  # no need to test start_page cause start_page > end_page is checked above
            raise HandledException(f"-p: page range goes beyond pages of input file: {range_string}")
        for old_page_num in range(start_page, end_page):
            new_page_num += 1
            page = reader.pages[old_page_num]
            pages_to_add += [page]
            print(f"-i, -p: read in {input_file} page number {old_page_num+1} as new page {new_page_num}")

    # we can do some more input validations now that we know how many pages output should have
    if args.crops:
        for c_string in args.crops:
            page_range, _ = split_crops_string(c_string)
            if page_range:
                start, end = parse_page_range(page_range, pages_to_add)
                if end > len(pages_to_add):
                    raise HandledException(f"-c: page range goes beyond number of pages we're building: {page_range}")
    if args.rotate_page:
        for r in args.rotate_page:
            if r > len(pages_to_add):
                raise HandledException(f"-r: page number beyond number of pages we're building: {r}")

    # rotate page canvas (as opposed to using PDF's /Rotate command)
    if args.rotate_page:
        for rotate_page in args.rotate_page:
            page = pages_to_add[rotate_page - 1]
            # rotate around the A4 page centre: move centre to origin, rotate, move back
            page.add_transformation(pypdf.Transformation().translate(tx=-A4_WIDTH/2, ty=-A4_HEIGHT/2))
            page.add_transformation(pypdf.Transformation().rotate(-90))
            page.add_transformation(pypdf.Transformation().translate(tx=A4_WIDTH/2, ty=A4_HEIGHT/2))
            print(f"-r: rotating (by 90°) page {rotate_page}")

    # if necessary, pad pages to multiple of 8
    if args.nup4:
        mod_to_8 = len(pages_to_add) % 8
        if mod_to_8 > 0:
            print(f"-n: number of input pages {len(pages_to_add)} not multiple of 8, padding to that")
            for _ in range(8 - mod_to_8):
                new_page = pypdf.PageObject.create_blank_page(width=A4_WIDTH, height=A4_HEIGHT)
                pages_to_add += [new_page]

    # normalize all pages to portrait A4
    for page in pages_to_add:
        if "/Rotate" in page:
            page.rotate(360 - page["/Rotate"])
        page.mediabox.left = 0
        page.mediabox.bottom = 0
        page.mediabox.top = A4_HEIGHT
        page.mediabox.right = A4_WIDTH
        page.cropbox = page.mediabox

    # determine page crops, zooms, crop symmetry
    crops_at_page = [(0,0,0,0)]*len(pages_to_add)
    zoom_at_page = [1]*len(pages_to_add)
    if args.crops:
        for c_string in args.crops:
            page_range, crops = split_crops_string(c_string)
            start_page, end_page = parse_page_range(page_range, pages_to_add)
            crop_left_cm, crop_bottom_cm, crop_right_cm, crop_top_cm = [float(x) for x in crops.split(',')]
            crop_left = crop_left_cm * POINTS_PER_CM
            crop_bottom = crop_bottom_cm * POINTS_PER_CM
            crop_right = crop_right_cm * POINTS_PER_CM
            crop_top = crop_top_cm * POINTS_PER_CM
            prefix = "-c, -s" if args.symmetry else "-c"
            suffix = " (but alternating left and right crop between even and odd pages)" if args.symmetry else ""
            print(f"{prefix}: to pages {start_page + 1} to {end_page} applying crops: left {crop_left_cm}cm, bottom {crop_bottom_cm}cm, right {crop_right_cm}cm, top {crop_top_cm}cm{suffix}")
            cropped_width = A4_WIDTH - crop_left - crop_right
            cropped_height = A4_HEIGHT - crop_bottom - crop_top
            zoom_horizontal = A4_WIDTH / (A4_WIDTH - crop_left - crop_right)
            zoom_vertical = A4_HEIGHT / (A4_HEIGHT - crop_bottom - crop_top)
            if (zoom_horizontal > 1 and zoom_vertical < 1) or (zoom_horizontal < 1 and zoom_vertical > 1):
                raise HandledException("-c: crops would create opposing zoom directions")
            elif zoom_horizontal + zoom_vertical > 2:
                zoom = min(zoom_horizontal, zoom_vertical)
            else:
                zoom = max(zoom_horizontal, zoom_vertical)
            for page_num in range(start_page, end_page):
                # page_num is zero-based, so odd indices are even-numbered pages
                if args.symmetry and page_num % 2:
                    crops_at_page[page_num] = (crop_right, crop_bottom, crop_left, crop_top)
                else:
                    crops_at_page[page_num] = (crop_left, crop_bottom, crop_right, crop_top)
                zoom_at_page[page_num] = zoom

    writer = pypdf.PdfWriter()
    if not args.nup4:
        print("building 1-input-page-per-output-page book")
        odd_page = True
        for i, page in enumerate(pages_to_add):
            crop_left, crop_bottom, crop_right, crop_top = crops_at_page[i]
            zoom = zoom_at_page[i]
            # shift the cropped area's lower left corner to the origin, then zoom
            page.add_transformation(pypdf.Transformation().translate(tx=-crop_left, ty=-crop_bottom))
            page.add_transformation(pypdf.Transformation().scale(zoom, zoom))
            cropped_width = A4_WIDTH - crop_left - crop_right
            cropped_height = A4_HEIGHT - crop_bottom - crop_top
            page.mediabox.right = cropped_width * zoom
            page.mediabox.top = cropped_height * zoom
            writer.add_page(page)
            odd_page = not odd_page
            print(f"built page number {i+1} (of {len(pages_to_add)})")
    else:
        print("-n: building 4-input-pages-per-output-page book")
        print(f"-m: applying printable-area margin of {args.print_margin}cm")
        if args.analyze:
            print("-a: drawing page borders, spine limits")
        printable_margin = args.print_margin * POINTS_PER_CM
        printable_scale = (A4_WIDTH - 2 * printable_margin)/A4_WIDTH
        spine_part_of_page = (SPINE_LIMIT / A4_HALF_WIDTH) / printable_scale
        bonus_shrink_factor = 1 - spine_part_of_page
        # reorder pages in groups of 8 so they land on the right sheet quarters
        new_page_order = []
        new_i_order = []
        eight_pack = []
        i = 0
        n_eights = 0
        for page in pages_to_add:
            if i == 0:
                eight_pack = []
            eight_pack += [page]
            i += 1
            if i == 8:
                i = 0
                new_i_order += [8 * n_eights + 3,
                                8 * n_eights + 0,
                                8 * n_eights + 7,
                                8 * n_eights + 4,
                                8 * n_eights + 1,
                                8 * n_eights + 2,
                                8 * n_eights + 5,
                                8 * n_eights + 6]
                n_eights += 1
                new_page_order += [eight_pack[3]]  # page front, upper left
                new_page_order += [eight_pack[0]]  # page front, upper right
                new_page_order += [eight_pack[7]]  # page front, lower left
                new_page_order += [eight_pack[4]]  # page front, lower right
                new_page_order += [eight_pack[1]]  # page back, upper left
                new_page_order += [eight_pack[2]]  # page back, upper right
                new_page_order += [eight_pack[5]]  # page back, lower left
                new_page_order += [eight_pack[6]]  # page back, lower right
        i = 0
        page_count = 0
        front_page = True
        for j, page in enumerate(new_page_order):
            if i == 0:
                new_page = pypdf.PageObject.create_blank_page(width=A4_WIDTH, height=A4_HEIGHT)

            # in-section transformations: align pages on top, left-hand pages to left, right-hand to right
            new_i = new_i_order[j]
            crop_left, crop_bottom, crop_right, crop_top = crops_at_page[new_i]
            zoom = zoom_at_page[new_i]
            page.add_transformation(pypdf.Transformation().translate(ty=(A4_HEIGHT / zoom - (A4_HEIGHT - crop_top))))
            if i == 0 or i == 2:
                page.add_transformation(pypdf.Transformation().translate(tx=-crop_left))
            elif i == 1 or i == 3:
                page.add_transformation(pypdf.Transformation().translate(tx=(A4_WIDTH / zoom - (A4_WIDTH - crop_right))))
            page.add_transformation(pypdf.Transformation().scale(zoom * bonus_shrink_factor, zoom * bonus_shrink_factor))
            # assumed condition: lower-row pages move down by one printable margin,
            # giving them the same top margin as the upper row once the sheet is cut
            if i == 2 or i == 3:
                page.add_transformation(pypdf.Transformation().translate(ty=-2*printable_margin/printable_scale))

            # outer section transformations
            page.add_transformation(pypdf.Transformation().translate(ty=(1-bonus_shrink_factor)*A4_HEIGHT))
            if i == 0 or i == 1:
                y_section = A4_HEIGHT
                page.mediabox.bottom = A4_HALF_HEIGHT
                page.mediabox.top = A4_HEIGHT
            if i == 2 or i == 3:
                y_section = 0
                page.mediabox.bottom = 0
                page.mediabox.top = A4_HALF_HEIGHT
            if i == 0 or i == 2:
                x_section = 0
                page.mediabox.left = 0
                page.mediabox.right = A4_HALF_WIDTH
            if i == 1 or i == 3:
                page.add_transformation(pypdf.Transformation().translate(tx=(1-bonus_shrink_factor)*A4_WIDTH))
                x_section = A4_WIDTH
                page.mediabox.left = A4_HALF_WIDTH
                page.mediabox.right = A4_WIDTH
            page.add_transformation(pypdf.Transformation().translate(tx=x_section, ty=y_section))
            page.add_transformation(pypdf.Transformation().scale(QUARTER_SCALE_FACTOR, QUARTER_SCALE_FACTOR))
            new_page.merge_page(page)
            page_count += 1
            print(f"merged page number {page_count} (of {len(pages_to_add)})")
            i += 1
            if i > 3:
                if args.analyze:
                    # draw borders of the sheet and of its four quarters
                    packet = io.BytesIO()
                    c = reportlab.pdfgen.canvas.Canvas(packet, pagesize=A4)
                    c.line(0, A4_HEIGHT, A4_WIDTH, A4_HEIGHT)
                    c.line(0, A4_HALF_HEIGHT, A4_WIDTH, A4_HALF_HEIGHT)
                    c.line(0, 0, A4_WIDTH, 0)
                    c.line(0, A4_HEIGHT, 0, 0)
                    c.line(A4_HALF_WIDTH, A4_HEIGHT, A4_HALF_WIDTH, 0)
                    c.line(A4_WIDTH, A4_HEIGHT, A4_WIDTH, 0)
                    c.save()
                    new_pdf = pypdf.PdfReader(packet)
                    new_page.merge_page(new_pdf.pages[0])
                # shrink the assembled sheet into the printable area
                printable_offset_x = printable_margin
                printable_offset_y = printable_margin * A4_HEIGHT / A4_WIDTH
                new_page.add_transformation(pypdf.Transformation().scale(printable_scale, printable_scale))
                new_page.add_transformation(pypdf.Transformation().translate(tx=printable_offset_x, ty=printable_offset_y))
                x_left_spine_limit = A4_HALF_WIDTH * bonus_shrink_factor
                x_right_spine_limit = A4_WIDTH - x_left_spine_limit
                if args.analyze or front_page:
                    packet = io.BytesIO()
                    c = reportlab.pdfgen.canvas.Canvas(packet, pagesize=A4)
                if args.analyze:
                    c.line(x_left_spine_limit, A4_HEIGHT, x_left_spine_limit, 0)
                    c.line(x_right_spine_limit, A4_HEIGHT, x_right_spine_limit, 0)
                if front_page:
                    draw_cut(c, x_left_spine_limit, (1))
                    draw_cut(c, x_right_spine_limit, (-1))
                if args.analyze or front_page:
                    c.save()
                    new_pdf = pypdf.PdfReader(packet)
                    new_page.merge_page(new_pdf.pages[0])
                writer.add_page(new_page)
                i = 0
                front_page = not front_page

    for file in opened_files:
        file.close()
    with open(args.output_file, 'wb') as output_file:
        writer.write(output_file)


if __name__ == "__main__":
    try:
        main()
    except HandledException as e:
        handled_error_exit(e)